Imagine advanced AI systems like LaMDA and ChatGPT as intricate clockworks, where each gear and spring (algorithm and data point) works in harmony to drive the hands of the clock (AI responses). In 2022, when Google engineer Blake Lemoine suggested that LaMDA might possess consciousness, it was akin to asking whether a clock could not just tell time but also understand its passage, a leap from mechanical function to a form of awareness.
To investigate this possibility, a team of 19 experts from diverse fields came together, much like master clockmakers, philosophers, and physicists would to explore the mysteries of a clock that might perceive time. They didn't seek a single definitive sign of consciousness in the clockwork but rather developed a checklist, akin to a series of tests to examine the clock's various mechanisms and interactions.
Their method was like analyzing the clock's inner workings, not just observing the movement of its hands. They delved into the AI's structure, looking at how each algorithmic gear and data spring interlocked and interacted, trying to discern if these interactions could resemble those of a conscious being.
The 14 criteria they devised were like various aspects of clockwork to be inspected – from the precision of the gears (algorithmic efficiency) to the resilience of the springs (data processing robustness). They applied these criteria to different AI systems, examining if any of these 'clockworks' showed signs of consciousness, of not just marking time but sensing it.
Their conclusion was that no AI system yet shows the full depth and intricacy of consciousness; each is like a clock that keeps time masterfully but does not perceive its passage. Their framework, however, offers a blueprint for future exploration: a guide for examining these complex 'clockworks' of AI as they evolve and potentially develop features akin to a form of awareness, like a clock that begins to understand the very time it measures.
🗝️ Quick Bytes:
Anthropic released Claude 2.1
Anthropic's Claude 2.1, the latest version of its large language model, doubles the context window to 200K tokens, enough to process roughly 150,000 words or about 500 pages. This significantly improves its ability to handle tasks like summarization and Q&A over long documents. Additionally, the model can call developer-defined tools and APIs, easing incorporation into users' tech stacks.
The new version boasts a 2x reduction in hallucination rates, making it more truthful and reliable. It demonstrates a 30% decrease in incorrect answers and is less likely to erroneously affirm a document's support for a specific claim. Unlike some models, Claude 2.1 is more inclined to express uncertainty rather than provide incorrect information.
Claude 2.1 supports system prompts, allowing developers to guide the model for specific tasks or personas, enhancing user experience and consistency. The cost for using this advanced model is set at $8 per million tokens for input prompts and $24 per million tokens for the model's output, positioning it as a competitive alternative in the market, especially considering the current challenges faced by rival OpenAI.
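For a sense of what a system prompt looks like in practice, here is a minimal sketch assuming Anthropic's Python SDK and its text-completions interface. For the Claude 2.x family, the system prompt is simply text placed before the first Human turn; the persona and question below are illustrative examples of mine, not from Anthropic's docs:

```python
import anthropic

# Minimal sketch, assuming the Anthropic Python SDK's text-completions
# interface for Claude 2.1. A "system prompt" for this model family is
# plain text placed before the first Human turn. The persona and question
# are illustrative, not from Anthropic's documentation.
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

system_prompt = "You are a meticulous contract analyst. Answer only from the provided document."
document = "..."  # up to ~200K tokens of source material

response = client.completions.create(
    model="claude-2.1",
    max_tokens_to_sample=512,
    prompt=(
        f"{system_prompt}"
        f"{anthropic.HUMAN_PROMPT} Here is a document:\n{document}\n\n"
        "What termination clauses does it contain?"
        f"{anthropic.AI_PROMPT}"
    ),
)
print(response.completion)
```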
Amazon is working on a custom AI model, “Olympus”
Amazon is developing a large language model named "Olympus" with 2 trillion parameters, potentially surpassing OpenAI's GPT-4 in size. This project, led by former Alexa head Rohit Prasad, aims to position Amazon as a major player in the field of artificial intelligence.
The initiative involves integrating Amazon's various AI efforts, leveraging expertise from the Alexa AI and Amazon science teams. Amazon's goal with Olympus is to enhance its Amazon Web Services (AWS) offerings, making them more appealing to enterprise clients seeking access to top-tier AI models.
This push toward larger AI models signals a strategic shift in resource allocation: greater investment in AI, offset by cuts to fulfillment and transportation within Amazon's retail operations. Amazon has also collaborated with AI startups such as Anthropic and AI21 Labs, integrating their models into AWS.
UnitedHealthcare shows how easily algorithms can cause harm
UnitedHealthcare, a major U.S. health insurer, faces a lawsuit for allegedly using a flawed AI algorithm, 'nH Predict', to override doctors' decisions and deny necessary health coverage to elderly patients. This has reportedly led to premature discharges from care facilities, forcing patients to spend their own savings on care that should have been covered under their Medicare Advantage plans.
Developed by UnitedHealth subsidiary NaviHealth, 'nH Predict' assesses post-acute care needs based on a database of medical cases, but has been criticized for not considering vital factors like comorbidities. The algorithm's decisions often result in significantly reduced care periods, with patients rarely staying in nursing homes for more than 14 days, despite being eligible for up to 100 days of covered care.
The lawsuit highlights that over 90% of the algorithm's denials are reversed on appeal, suggesting consistent inaccuracies in its coverage decisions. Despite this, UnitedHealth insists that 'nH Predict' does not determine coverage and is used only as a guideline. The case raises serious concerns about the ethical and practical implications of AI in healthcare decision-making.
🎛️ ChatGPT Command Line
You can drown in thousands of AI tools.
What if I told you that you need only 3 of them?
Today, I want to share something a bit different – my top three AI tools that streamline my daily workflow. Believe me, you don't need many tools; just a few efficient ones can make a difference in productivity and research.
First up is Perplexity AI. Developed by former Google engineers, this tool redefines web searches by aggregating information from diverse sources like Reddit, academic papers, and more into one comprehensive answer. It’s a game-changer for content creation and research, consolidating insights from multiple resources in one go. And the best part? Many are still unaware of this gem, making it my secret weapon for efficiency.
Next, for macOS users, there's Mac Whisper Pro. This sleek app, powered by OpenAI's Whisper model, is perfect for converting video and audio recordings to text. I often use it to transcribe webinars and podcasts quickly, fueling my content creation with fresh ideas. Its versatility is unmatched, fitting various needs from meeting transcriptions to podcast editing.
Finally, Descript is my go-to for video projects. It’s not your usual video editing tool; it allows you to edit videos based on transcripts, making the process intuitive and swift. From enhancing audio quality to tweaking visuals, Descript is incredibly efficient and affordable, cutting down my LinkedIn video editing time to just 15-20 minutes.
So, that's my Saturday share – a relaxed insight into the tools behind my content and videos. Are you familiar with these tools, or do you have others to recommend? I'm always curious to learn about new tools that can make our professional lives more manageable.
Looking forward to hearing about your go-to tools. Is it just GPT-4, or are there other hidden gems you've discovered?
💡Explained
Removing irrelevant text for better answer generation
In just a few months, retrieval-augmented generation (RAG) has become a standard in almost every LLM service. However, models still make mistakes, especially when irrelevant information in the context skews their output. To address this, researchers from Meta AI proposed System 2 Attention (S2A), a new approach in which the model regenerates the context to include only relevant information before answering.
⚙️ How It Works
S2A has two steps. First, the model regenerates the context to remove irrelevant or distracting information, filtering out noise and ensuring that only important details remain. This is done by prompting the model to rewrite the text and extract only the useful parts.
Then, we generate the response using only the regenerated context. This focuses attention on what's relevant.
For example, on a factual QA task, if opinions are provided that suggest incorrect answers, S2A removes those opinions from the regenerated context. The essence of S2A is its ability to make the AI's attention process more efficient and targeted.
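To make the two steps concrete, here is a minimal Python sketch. The `s2a_answer` function and prompt wording are my paraphrase of the idea, not the paper's exact templates; `llm` stands for any text-in, text-out model call:

```python
from typing import Callable

# Minimal sketch of System 2 Attention (S2A) as two LLM calls.
# `llm` is any text-in/text-out model function you supply; the prompt
# wording below paraphrases the idea, not the paper's exact template.

def s2a_answer(llm: Callable[[str], str], context: str, question: str) -> str:
    # Step 1: regenerate the context, keeping only text that is relevant
    # and unbiased with respect to the question.
    regenerated = llm(
        "Extract from the text below only the parts that are relevant and "
        "unbiased with respect to the question. Do not add anything.\n\n"
        f"Text: {context}\n\nQuestion: {question}\n\nRelevant text:"
    )
    # Step 2: answer using only the regenerated context, so attention
    # is focused on the filtered information.
    return llm(f"Context: {regenerated}\n\nQuestion: {question}\n\nAnswer:")
```

In practice the same model handles both calls, which is where the extra inference cost discussed below comes from.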
📊 Results
S2A was evaluated on three tasks: it increased accuracy from 62.8% to 80.3% on factual QA, from 51.7% to 61.3% on math word problems, and improved objectivity by 57.4% on argument generation.
💡Advantages and potential limitations
By regenerating the context to focus on what's relevant, S2A improves performance across diverse tasks in terms of accuracy, factuality, and objectivity. Another advantage is that S2A is not just a standalone approach but is also complementary to other reasoning methods. For instance, in the math problem experiment, chain-of-thought reasoning was also applied to the context generated by S2A.
S2A's performance depends on the quality of the regeneration step, and hence on the LLM's size; smaller models are more prone to errors. The extra regeneration step also adds computational cost, much as chain-of-thought and other methods that produce intermediate generations do, and that cost grows with the length of the context being regenerated. The paper suggests potential speedups, such as regenerating only the parts that change or referencing labels for large sections, but leaves these for future work.
To sum up, S2A regenerates the input context to include only the relevant portions, then attends to that regenerated context to produce the final response. In experiments, S2A outperformed standard attention-based LLMs on three tasks containing opinionated or irrelevant information: QA, math word problems, and long-form generation. Across these, S2A increased factuality and objectivity and decreased sycophancy.
What do you think, does S2A seem like a useful technique for real-world applications? How else could we improve attention and reasoning in AI systems? 👩‍🔬
🗞️ Longreads
- Hugging Face Removes Singing AI Models of Xi Jinping But Not of Biden (read)