That's it for a good start. I hope you smiled a bit.
🗝️ Quick Bytes:
‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says
OpenAI, the developer of AI tools like ChatGPT, has stated that it cannot create its products without using copyrighted material, amidst growing scrutiny over the content AI firms use for training. This comes in the wake of a lawsuit from the New York Times and a group of authors, including George RR Martin and John Grisham, accusing OpenAI and Microsoft of unlawful use of their work.
OpenAI defended its use of copyrighted content, arguing that limiting training data to public domain materials would result in inadequate AI systems. The company leans on the legal doctrine of "fair use", which allows the use of copyrighted content under certain circumstances without requiring the owner's permission.
In response to concerns about AI safety, OpenAI supports independent analysis of its security measures, including "red-teaming", where third-party researchers emulate rogue actors to test product safety. The company has agreed to collaborate with governments on safety testing of its most powerful models before and after deployment.
Artists are making creative companies apologize for using AI
Popular drawing tablet brand Wacom and Dungeons & Dragons publisher Wizards of the Coast both faced backlash after using AI-generated images in their marketing materials. This fueled mistrust from digital artists who see AI art as a threat to their livelihoods.
The controversies highlighted the difficulty of detecting AI art as the tools become more advanced and integrated into mainstream software like Adobe Photoshop. WotC admitted the AI imagery slipped through via a third-party vendor, underscoring how hard it is to guarantee human-made art across a supply chain.
The inability to reliably avoid or identify AI art has caused anxiety among artists, some even abandoning their careers. While companies like Wacom and WotC apologized and promised more oversight, the persistence of generative AI means ongoing pressure campaigns from creatives wanting accountability and transparency around its use.
How Microsoft found a potential new battery material using AI
Microsoft and Pacific Northwest National Laboratory used AI and cloud computing to rapidly discover a promising new solid-state battery electrolyte material that could enable safer, less lithium-reliant batteries. Their platform screened 32 million candidates suggested by AI, narrowing the field to just 23 materials in 80 hours - a feat impossible without AI acceleration.
One candidate was synthesized and used to build a working prototype battery to power small electronics, demonstrating potential to cut lithium use by up to 70%. However, its conductivity was lower than predicted, so more research is needed to optimize performance.
While exciting, many battery material discoveries never make it to market. But the speed of AI and computing can compress decades of materials science into years, helping develop the next-gen batteries urgently needed to transition to renewable energy and combat climate change. There's also a need to power data centers on clean energy to limit AI's own emissions impact.
🎛️ Algorithm Command Line
I know that you sometimes feel like a prisoner in your mind.
This approach will allow you to reframe your thoughts:
I am an overthinker. Like, hell yeah, overthinker.
It was always my big struggle.
So I started to experiment with things that can help me reframe my thoughts and uncover certain patterns to focus on.
And of course, it’s a prompt.
But the best part is that you can use it too, just by changing one sentence in it.
It goes like that:
I need you to do a task that will require a step-by-step approach.
STEPS:
1. You will hire a group of experts to interview me.
2. There will be a personal coach, a business coach, a psychologist, a diabetologist, and an expert from the tech industry who has worked at popular startups.
3. You will start asking me questions from the perspective of each of these experts.
4. One question, one answer.
5. After each answer you will ask a contrarian question that helps deepen the perspective.
6. Each of these experts should look for patterns and behaviours that need to change to achieve a healthier lifestyle.
7. After each round of questions from ALL the experts you will provide your conclusions about me, formatted as a table for better clarity.
8. You will keep asking questions and providing conclusions until I write "stop".
Are you ready to begin?
This prompt is designed not only to ask you relevant questions, but also to offer contrarian views that broaden your perspective and, after every round of questions, to give you conclusions formatted as a clear table.
You just need to swap the experts in step 2 for the ones you want.
Use your imagination.
And let me know how it worked for you.
PS.
Yes, I am aware that it will not replace real experts and specialists - especially for psychology-related topics.
But playing ping-pong with your own thoughts is sometimes a good experiment.
💡Explained
Language models may not be few-shot anymore
LLMs are well-known for their remarkable ability in zero-shot and few-shot tasks, often outperforming dedicated models in diverse benchmarks. However, a recent study on task contamination in LLMs introduces a critical perspective: these impressive benchmark results might be skewed by the inclusion of test data during training.
☣️ Understanding task contamination
Task contamination, also referred to as data contamination or data leakage, occurs when test examples are used during model training. This leads to higher performance on familiar data compared to unseen data. The paper identifies two primary contamination sources:
Test data contamination: defined as incorporating test data examples with labels in the model's pre-training data.
Task contamination: when task training examples are included in pre-training data, undermining the validity of zero or few-shot evaluations*.
*Zero-shot evaluation is an evaluation where a model has seen zero examples for the task. Few-shot, or 𝑁-shot, where 𝑁 is a small number, is where the model has seen 𝑁 examples for the task.
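To make the footnote concrete, here is a minimal sketch of how a zero-shot prompt differs from an N-shot one. The sentiment task, labels, and examples are made up purely for illustration; they are not from the paper or any specific benchmark.

```python
# Hypothetical sentiment task, purely to illustrate zero-shot vs. N-shot prompting.
TASK_INSTRUCTION = "Classify the sentiment of the review as Positive or Negative."

DEMONSTRATIONS = [
    ("The battery lasts forever, love it.", "Positive"),
    ("Screen cracked after two days.", "Negative"),
    ("Fast shipping and great build quality.", "Positive"),
]

def build_prompt(review: str, n_shots: int = 0) -> str:
    """Zero-shot when n_shots == 0; N-shot when N labeled demonstrations are prepended."""
    parts = [TASK_INSTRUCTION]
    for text, label in DEMONSTRATIONS[:n_shots]:
        parts.append(f"Review: {text}\nSentiment: {label}")
    parts.append(f"Review: {review}\nSentiment:")
    return "\n\n".join(parts)

print(build_prompt("The keyboard feels cheap.", n_shots=0))  # zero-shot prompt
print(build_prompt("The keyboard feels cheap.", n_shots=3))  # 3-shot prompt
```

If labeled examples of a task like this already appeared in the model's pre-training data, a high zero- or few-shot score stops being clean evidence of in-context ability - which is exactly the contamination concern here.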
👩‍🔬 Methodology
The authors proposed four different methods for measuring task contamination:
Training data inspection: Analyzing training data for the presence of task training examples.
Task example extraction: Extracting task examples from existing models (refer to our previous post for more details).
Membership inference: Determining if model-generated content for a specific input is an exact match of the original sample (a minimal sketch of this check follows below).
Chronological analysis: Evaluating models trained at different times on datasets with known release dates, seeking chronological evidence of contamination.
The first three methods, while precise, are limited by low recall - the absence of contamination evidence does not guarantee its non-existence. Chronological analysis, on the other hand, offers high recall but low precision: it can effectively identify contamination but may be influenced by other performance factors.
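As a rough illustration of the membership inference check mentioned above, the core idea is an exact-match comparison between what the model generates for a test input and the reference output. This is only a sketch under my own assumptions, not the authors' code; `generate_fn` stands in for whatever model API you query.

```python
from typing import Callable, Iterable, Tuple

def exact_match_rate(
    generate_fn: Callable[[str], str],
    dataset: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (input, reference) pairs the model reproduces verbatim.

    A high exact-match rate is evidence that those pairs were seen in training.
    """
    hits, total = 0, 0
    for input_text, reference_output in dataset:
        completion = generate_fn(input_text)
        hits += int(completion.strip() == reference_output.strip())
        total += 1
    return hits / max(total, 1)

# Toy usage: a fake "model" that has memorised one of the two test examples.
memorised = {"Translate to English: bonjour": "hello"}
test_pairs = [
    ("Translate to English: bonjour", "hello"),
    ("Translate to English: merci", "thank you"),
]
print(exact_match_rate(lambda prompt: memorised.get(prompt, ""), test_pairs))  # 0.5
```

As the precision/recall caveat says, a rate of zero here does not prove a model is clean; it only means this particular probe found nothing.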
📊 Results
The authors found evidence that some LLMs have seen task examples during pre-training for a range of tasks. Task contamination potentially inflates the zero-shot or few-shot performance of closed-source models, rendering them unreliable as baselines in these contexts, especially for models enhanced with instruction fine-tuning or RLHF. Moreover, when there was no task contamination, LLMs did not show statistically significant improvements over majority baselines in either zero- or few-shot settings. This might suggest that the performance increases over time in GPT-3 series models on various tasks are likely attributable to task contamination.
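For context, the majority baseline mentioned above is the accuracy you get by always predicting the most frequent label in the test set; it is the bar an LLM has to clear to show any real task ability. A quick sketch with a made-up label distribution:

```python
from collections import Counter

def majority_baseline_accuracy(labels: list) -> float:
    """Accuracy of always predicting the most common label in the set."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Made-up label distribution, for illustration only.
labels = ["positive"] * 70 + ["negative"] * 30
print(majority_baseline_accuracy(labels))  # 0.7, the score a model must beat here
```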
Conclusions
The research does not necessarily imply that LLMs are "bad". They still perform great. However, it does expose problems with existing benchmarks and with how LLMs are compared. Investigating task contamination seems to be a must-have if we want to better understand LLM performance.
🗞️ Longreads
- The New York Times’ AI Opportunity (read)