The earliest known stone tools, dating back 3.3 million years, were unearthed in Kenya. These Lomekwian tools, primarily used for hammering, were likely crafted by hominin species such as Australopithecus afarensis or Kenyanthropus.
The adoption of tools marked a pivotal moment in human evolution. Stone tools, for instance, allowed our ancestors to explore new dietary options, like meat from large animals, catalyzing dietary shifts that significantly influenced our evolutionary path.
The creation of increasingly complex tools demanded refined motor control and coordination, sparking the evolution of our brains and cognitive functions.
Fast forward 3.3 million years, and AI is revolutionizing our approach to information processing, decision-making, and problem-solving.
Consider these figures:
ChatGPT emerged in the public sphere in November 2022.
Since then, OpenAI has climbed to a valuation nearing $100 billion.
In December 2023, Mistral AI secured approximately $487 million in its latest funding round, reaching a valuation of about $2 billion.
Anthropic stands at an $18.4 billion valuation.
Microsoft's market capitalization? A staggering $2.784 trillion.
The future isn't merely being written; it's being meticulously programmed.
🗝️ Quick Bytes:
Microsoft launches free Copilot app for Android
Microsoft has launched a dedicated Copilot app for Android, which is now available in the Google Play Store. The app provides access to Microsoft's AI-powered Copilot without the need for the Bing mobile app.
The Android version of Copilot, which has been available for nearly a week, is similar to ChatGPT, offering chatbot capabilities, image generation through DALL-E 3, and the ability to draft text for emails and documents.
It also includes free access to OpenAI's latest GPT-4 model. The launch of the Copilot app for Android comes a little over a month after Microsoft rebranded Bing Chat to Copilot.
OpenAI buffs safety team and gives board veto power on risky AI
OpenAI is enhancing its internal safety measures to mitigate the risks of harmful AI. They have established a new "safety advisory group" that will oversee the technical teams and provide recommendations to the leadership.
The board has also been given veto power, although it remains to be seen how and when this power will be exercised. These changes are part of OpenAI's updated "Preparedness Framework", which likely underwent revisions following a leadership shake-up in November.
Despite the typically private nature of such policy changes, the recent leadership changes and ongoing discussions about AI risk make these developments noteworthy.
Apple’s iPhone design chief enlisted by Jony Ive, Sam Altman to work on AI devices
Tang Tan, Apple's iPhone Design Chief, is set to leave Apple in February 2024 to work on a new artificial intelligence hardware project.
He will join LoveFrom, the design studio of legendary designer Jony Ive, and collaborate with Sam Altman of OpenAI. The project aims to create devices with the latest AI capabilities. The design and functionality of these new products will be shaped by LoveFrom, while Altman will provide the software underpinnings.
The information was provided by sources who wished to remain anonymous as the project is not yet public.
🎛️ Algorithm Command Line
Do you struggle to generate long-form content with ChatGPT?
It's because of this one thing:
It comes down to a fundamental principle behind LLMs: AI doesn't see words the way we humans do.
It’s all in the tokens – a concept we often overlook.
Think of tokens as puzzle pieces of language. Unlike the straightforward words we use, AI breaks down language into these smaller units – tokens. It’s like giving AI a magnifying glass to examine the nuances of language.
A single token is roughly equivalent to four English characters or three-quarters of a word.
1-2 sentences ≈ 30 tokens.
1 paragraph ≈ 100 tokens.
1,500 words ≈ 2048 tokens.
This insight is crucial when you’re playing the role of a content architect, building your structure one block at a time.
Imagine you have a canvas of 4097 tokens to paint your ideas. How you distribute these tokens between your prompt and ChatGPT’s response can be a game-changer in content generation.
The token-to-char ratio varies with languages. For instance, ‘Cómo estás’ splits into 5 tokens for 10 characters. This is like adjusting your lens when switching between different linguistic landscapes.
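If you want to check these numbers yourself, OpenAI's open-source `tiktoken` library (a handy extra, not something required by ChatGPT itself) counts tokens for a given encoding. Exact counts vary between encodings, so your figures may differ slightly from the rules of thumb above.

```python
# Counting tokens with OpenAI's tiktoken library (pip install tiktoken).
# cl100k_base is the encoding used by the GPT-3.5/GPT-4 chat models;
# older models use different encodings, so counts can differ.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["How are you doing today?", "Cómo estás"]:
    tokens = enc.encode(text)
    print(f"{text!r}: {len(text)} characters -> {len(tokens)} tokens")
```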
How do I approach long-form content in ChatGPT? I keep these 5 things in mind (a rough code sketch follows the list):
🟡 Treat your content like episodes of a series. Break it down into digestible segments within token limits.
🟡 Start with a solid foundation and add layers progressively. It’s like sculpting; each stroke adds more definition to your creation.
🟡 Direct ChatGPT to delve deeper into specific areas. Each prompt is a spotlight focusing on a different part of your content landscape.
🟡 Use summarization to create a blueprint, then ask ChatGPT to fill in the details. It’s like sketching before painting.
🟡 Ensure each segment links back to the previous one, maintaining a seamless flow. This is like creating a breadcrumb trail through your content forest.
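Here is the sketch promised above, putting those five ideas together with the `openai` Python client (v1.x). The model name, prompts, and token budgets are placeholders to adapt, not a definitive recipe.

```python
# A rough sketch of the "episodes" approach: generate an outline, then expand
# each section within a token budget, carrying a short summary forward so each
# new segment links back to what came before.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo"  # placeholder model name

def ask(prompt, max_tokens=700):
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,  # keep each "episode" well inside the context window
    )
    return response.choices[0].message.content

topic = "How tokenization shapes long-form content generation"
outline = ask(f"Write a 5-point outline for an article about: {topic}", max_tokens=200)

article, summary_so_far = [], ""
for point in [line for line in outline.splitlines() if line.strip()]:
    section = ask(
        f"Article topic: {topic}\n"
        f"Summary of what has been written so far: {summary_so_far or 'nothing yet'}\n"
        f"Write the next section, covering: {point}\n"
        f"Make it flow naturally from the previous sections."
    )
    article.append(section)
    # Compress the running summary so the next prompt stays within budget.
    summary_so_far = ask(
        f"Summarize in 3 sentences:\n{summary_so_far}\n{section}", max_tokens=120
    )

print("\n\n".join(article))
```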
Generating long-form content with ChatGPT is less about wrestling with a complex tool and more about learning a new language of tokens. Remember, it’s not the arsenal of tools but the skill in using them that crafts exceptional content.
💡Explained
What could an LLM do with your smartphone?
Researchers from Tencent proposed AppAgent: Multimodal Agents as Smartphone Users, an approach that differs from existing phone assistants like Siri, which operate through system back-end access and function calls. Instead, they propose an LLM agent that interacts with smartphone apps in a human-like manner, by tapping and swiping on the screen.
⚙️ How does it work?
The process involves two phases: first, an exploration phase, where the agent interacts with apps through predefined actions and learns from their outcomes; second, a deployment phase, where the agent uses the acquired knowledge to perform a task.
Exploration Phase 🧭
In this phase, the agent figures out the app's functionality via trial-and-error interactions, observing the outcomes. It tries different actions and watches the resulting changes in the app interface to understand how it works. The LLM agent attempts to figure out the functions of UI elements and the effects of specific actions by analyzing screenshots taken before and after each action (actions: tap, long press, swipe on an element, text input, back navigation). This information is then compiled into a document that records the effects of actions applied to different elements. When a UI element is acted upon multiple times, the agent updates the document based on the existing entries and the current observation to improve its quality.
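To make the exploration loop concrete, here is a minimal sketch in Python. The `device` wrapper and the `llm.choose`/`llm.describe_effect` helpers are hypothetical interfaces invented for illustration, not the authors' actual code (which is available on GitHub).

```python
# A minimal sketch of the exploration loop, assuming a hypothetical `device`
# wrapper (screenshot, ui_elements, perform) and an `llm` helper object.
# These names are illustrative only, not the paper's actual implementation.

ACTIONS = ["tap", "long_press", "swipe", "input_text", "back"]

def explore_app(device, llm, steps=50):
    docs = {}  # element_id -> accumulated notes on the element's observed effects
    for _ in range(steps):
        before = device.screenshot()
        # Let the LLM pick which UI element to try and with which action.
        element, action = llm.choose(
            elements=device.ui_elements(),
            actions=ACTIONS,
        )
        device.perform(action, element)
        after = device.screenshot()
        # Infer the element's function from the before/after screenshots,
        # merging with any earlier notes about the same element.
        docs[element.id] = llm.describe_effect(
            before=before,
            after=after,
            action=action,
            previous=docs.get(element.id, ""),
        )
    return docs
```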
Deployment Phase 🛠️
The agent follows a systematic, step-by-step methodology for task execution. In each step, it first describes its observations of the current UI, then articulates its thought process about the task in light of those observations. At every step, the agent has access to a screenshot of the current UI, the dynamically generated document detailing the functions of UI elements and the effects of actions on the current page, and a prompt listing the possible actions. After all that observing and planning, the agent executes an action by invoking one of the available functions. After each action, the agent summarizes the interaction history and the actions taken during the current step; this summary is incorporated into the next prompt, giving the agent a form of memory.
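And a matching sketch of the deployment loop, reusing the same hypothetical `device`/`llm` interfaces. The `FINISH` signal and all function names are assumptions for illustration.

```python
# A minimal sketch of the deployment loop, reusing the hypothetical
# `device`/`llm` interfaces from the exploration sketch above.

def run_task(device, llm, task, docs, max_steps=20):
    memory = ""  # rolling summary of previous steps (the agent's "memory")
    for _ in range(max_steps):
        screenshot = device.screenshot()
        # Observe the UI, reason about the task, then pick the next action,
        # grounded in the UI documentation built during exploration.
        decision = llm.decide(
            task=task,
            screenshot=screenshot,
            ui_docs=docs,
            history=memory,
            actions=["tap", "long_press", "swipe", "input_text", "back", "FINISH"],
        )
        if decision.action == "FINISH":
            break
        device.perform(decision.action, decision.element)
        # Summarize this step and fold it into memory for the next prompt.
        memory = llm.summarize(history=memory, last_step=decision)
    return memory
```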
📊 Results
The approach was evaluated on 50 tasks across 10 diverse apps, including Google Maps, Twitter, Telegram, YouTube, email, shopping, and even image editing. Results showed that the agent can effectively handle a wide variety of high-level tasks on unfamiliar apps, with success rates from 73.3% to 95.6% depending on how the exploration phase was performed. The best results were achieved with manually crafted documents (i.e., the exploration phase is skipped and a human builds the document), followed by watching human demonstrations, with autonomous exploration coming last.
Conclusions
The key innovation is the learning approach: the agent learns to use apps via autonomous exploration or by observing human demos. This eliminates the need for system back-end access, which benefits both security and flexibility. The proposed exploration-based learning strategy allows the agent to adapt to new applications with unfamiliar user interfaces, making it a versatile tool for various tasks. Additionally, the code is open-sourced on GitHub. It is worth noting that the large gap in results between autonomous exploration and manually crafted documents suggests the exploration phase can be further improved, potentially leading to much better results.
However, the swiping/tapping approach itself is not as novel as the authors suggest: a similar approach was proposed in the paper "Empowering LLM to use Smartphone for Intelligent Task Automation" in August 2023. Its main components included a functionality-aware UI representation method that helps the LLM understand the UI, exploration-based memory injection techniques that augment the LLM's app-specific domain knowledge, and a multi-granularity query optimization module that reduces the cost of model inference. Sounds similar, right? Still, both papers are worth exploring, as they show how we can use LLMs to operate different devices without connecting them to the back end.
🗞️ Longreads
- How well can GPT-4 simulate an acid trip in 1963? (read)