Ian Kelk wrote more about this in his article and even created fantastic illustrations to help you visualize exactly how GPT operates in a 'normal' conversation.
First of all, ChatGPT has no persistent memory or state - it doesn't remember previous parts of the conversation. Each time you send it a message, it re-reads the entire chat history to generate a response. This gives the illusion of it recalling context.
As conversations get very long, ChatGPT starts truncating the chat history it processes, keeping only a rolling window of recent context. This means it can eventually 'forget' what was said at the start of a long conversation.
Ever experienced this? Yeah, exactly.
Being stateless is a good thing technically, but from the perspective of the average user it can be frustrating - it almost always leads to worse responses and a worse final result for the 'conversation'.
Is there a solution then? Not really, because GPT will always hit a context limit of around 4,096 tokens, which is approximately 3,000 words.
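To make that rolling window concrete, here is a minimal sketch (not OpenAI's actual implementation) of how a stateless chat client might rebuild the prompt on every turn and drop old messages to stay under the token budget; the `count_tokens` helper and its crude word-based estimate are illustrative assumptions.

```python
# Illustrative sketch only: a stateless chat client rebuilds the prompt each
# turn from the stored history and trims it to fit a fixed token budget.

MAX_TOKENS = 4096  # the approximate context limit discussed above

def count_tokens(message: dict) -> int:
    # Crude stand-in for a real tokenizer: assume ~4 tokens per 3 words.
    return len(message["content"].split()) * 4 // 3

def build_prompt(history: list[dict], new_message: dict) -> list[dict]:
    """Return the message list actually sent to the model this turn."""
    messages = history + [new_message]
    # Drop the oldest messages until the whole prompt fits the budget.
    while sum(count_tokens(m) for m in messages) > MAX_TOKENS and len(messages) > 1:
        messages.pop(0)  # the start of a long conversation is "forgotten" first
    return messages

history = [{"role": "user", "content": "Hi!"},
           {"role": "assistant", "content": "Hello! How can I help?"}]
prompt = build_prompt(history, {"role": "user", "content": "What did I say first?"})
```

Once the oldest messages are dropped to make room, there is simply nothing left in the prompt for the model to 'remember' them from.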
Things will change for sure in the future but now you can be happy and relieved - AI doesn't know who you are and has absolutely no idea what you wanted from it a couple of days ago.
🗝️ Quick Bytes:
Google unveiled Gemini: new, advanced AI model
Google has introduced a new conversational AI model called Gemini that can summarize web pages, answer follow-up questions, and admit when it doesn't know something.
Gemini is more powerful than previous models like LaMDA and can have a coherent, multi-turn dialogue while staying on topic and providing accurate, up-to-date information. Google developed new techniques like sparse access and self-supervised learning to make Gemini safer, more grounded in reality, and able to differentiate fact from fiction.
Gemini will help improve Google Search and provide users with clearer, more helpful information. Google plans to incorporate Gemini into products over time as they ensure it meets their standards.
Meta’s new AI image generator was trained on 1.1 billion Instagram and Facebook photos
Meta released a free AI image generator website called "Imagine with Meta AI" which uses an AI model called Emu that was trained on 1.1 billion publicly available Instagram and Facebook photos.
The model can create new images from text prompts entered by users, similar to other AI image generators like DALL-E and Stable Diffusion. To use the tool you need a Meta account connected to Facebook or Instagram, and it generates four images per prompt, which can be downloaded.
Meta says this expands access to its AI image generation technology beyond the messaging apps where it was previously available, opening it up as a standalone "creative tool" for hobbyists. However, there are still issues with photorealism and the potential to generate inappropriate content.
AMD releases new chips to power faster AI training
AMD has announced new AI accelerator chips - the Instinct MI300X and the Instinct MI300A APU - with higher memory capacity and energy efficiency to power faster training and inference of large language models compared to previous AMD chips and competing Nvidia H100 chips.
The MI300X offers 1.5x more memory than prior AMD chips, while the MI300A APU combines CPU and GPU for 30x better energy efficiency and 1.6x more memory than the Nvidia offering. AMD also announced the Ryzen 8040 mobile processor with integrated AI acceleration, claiming 1.6x more AI performance, to bring more on-device AI capabilities.
AMD CEO Lisa Su asserted that these new chips establish AMD as the highest-performing AI accelerator provider and expand its $45 billion addressable data center market as AI adoption grows. The chips better support the scale and complexity needs of evolving large language models from cloud providers and enterprises.
🎛️ ChatGPT Command Line
I pushed ChatGPT to break the limits of time.
The limitations surprised me. There have been a lot of these experiments lately (especially on Reddit), with users pushing DALL-E 3 to its limits, not only visualizing the past but also predicting the future.
All of that using nothing but images. I got inspired by these and tried it out myself.
How far back in time, or forward into the future, can we go from a modern lab environment? I noticed that the limit of this kind of "teleportation" is around 10 images. Not 12, not 15, just 10.
We've reached the singularity, guys. 🤪 What should I test next? Any ideas?
💡Explained
Mamba: Selective State Space Model that outperformed transformers
For the past few years, transformers have been the leading architecture in deep learning. Foundation models are almost universally based on the Transformer architecture and its core attention module, and when scaled up they show huge capabilities - ChatGPT and Claude are both based on transformer-like architectures. But is this the final, or even the best, architecture? Transformers have limitations, such as the inability to model beyond a finite window and quadratic scaling of cost with window length. What if we could create more efficient variants of attention, or alternatives to it, without losing effectiveness?
A few days ago, researchers from Carnegie Mellon and Princeton published a new architecture called Mamba, described in the paper "Mamba: Linear-Time Sequence Modeling with Selective State Spaces".
🎯 Motivation
The authors argue that the fundamental problem of sequence modeling is compressing context into a smaller state. The key ingredient is selectivity: the context-aware ability to focus on or filter out inputs as they flow into that sequential state. One way to build a selection mechanism into a model is to let the parameters that govern interactions along the sequence be input-dependent. An existing point of comparison is the convolution kernel of a CNN, a hardware-friendly design whose parameters are fixed rather than input-dependent.
🤖 Selective State Space Models
State space models (SSMs) can be interpreted as a combination of recurrent neural networks (RNNs) and convolutional neural networks (CNNs), with inspiration from classical state space models (Kalman, 1960). This class of models can be computed very efficiently as either a recurrence or a convolution, with linear or near-linear scaling in sequence length.
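As a rough illustration of the recurrent view, here is a minimal NumPy sketch of a discrete linear state space layer processing a 1-D input sequence; the shapes and random parameters are toy assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

# Toy sketch of a (non-selective) discrete SSM run as a recurrence:
#   h_t = A_bar @ h_{t-1} + B_bar * x_t,   y_t = C @ h_t
# All parameters are random placeholders purely for illustration.

rng = np.random.default_rng(0)
N, L = 16, 100                      # state size, sequence length
x = rng.standard_normal(L)          # 1-D input sequence

A_bar = 0.9 * np.eye(N)             # fixed state transition
B_bar = rng.standard_normal(N)      # fixed input projection
C = rng.standard_normal(N)          # fixed output projection

h = np.zeros(N)
y = np.empty(L)
for t in range(L):                  # one step per token: cost grows linearly with L
    h = A_bar @ h + B_bar * x[t]    # update the hidden state
    y[t] = C @ h                    # read out the output
```

Because A, B and C here do not depend on the input, the same computation can equivalently be unrolled as one long convolution over the sequence - that is the CNN view. The selective variant described next gives up that equivalence in exchange for selectivity.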
Selection Mechanism. The authors proposed a simple selection mechanism by parameterizing the SSM parameters based on the input. This allowed the model to filter out irrelevant information and remember relevant information indefinitely.
Hardware-aware. The authors proposed a hardware-aware algorithm that computes the model recurrently with a scan instead of convolution, resulting in up to 3× faster calculations on A100 GPUs.
Architecture. They simplified prior deep sequence model architectures by combining the design of prior SSM architectures with the MLP block of Transformers into a single block, leading to a simple and homogeneous architecture design (Mamba) incorporating selective state spaces.
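To show what input-dependent SSM parameters look like in practice, here is a toy continuation of the sketch above in which the step size Δ and the projections B and C are computed from each input element before the same recurrence is applied; the simple linear maps and softplus used here only gesture at the idea and are not Mamba's exact parameterization.

```python
import numpy as np

# Toy sketch of a *selective* SSM: delta, B and C are functions of the input,
# so each token can strengthen or suppress its own contribution to the state.

rng = np.random.default_rng(0)
N, L = 16, 100
x = rng.standard_normal(L)

A = -np.abs(rng.standard_normal(N))   # diagonal continuous-time state matrix
W_delta = rng.standard_normal()       # maps x_t -> step size delta_t
W_B = rng.standard_normal(N)          # maps x_t -> input projection B_t
W_C = rng.standard_normal(N)          # maps x_t -> output projection C_t

def softplus(z):
    return np.log1p(np.exp(z))

h = np.zeros(N)
y = np.empty(L)
for t in range(L):
    delta = softplus(W_delta * x[t])    # per-token step size: "how much to update"
    A_bar = np.exp(delta * A)           # discretize A for this token (elementwise)
    B_t = W_B * x[t]                    # input-dependent input projection
    C_t = W_C * x[t]                    # input-dependent output projection
    h = A_bar * h + delta * B_t * x[t]  # selective state update
    y[t] = C_t @ h                      # selective readout
```

Because Δ, B and C now change from token to token, the convolution shortcut no longer applies, which is exactly why the hardware-aware scan mentioned above matters for speed.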
🏆 Results
By making SSM parameters input-dependent, Mamba efficiently manages sequence data, selectively focusing on relevant information. Mamba significantly outperforms Transformers in processing speed and scales linearly with sequence length, while showing reasonably good results. It was tested on language, audio, and genomics tasks. Interestingly, the ~3B Mamba model matches or exceeds the capabilities of Transformers of similar or larger size: averaged over multiple datasets, Mamba-2.8B achieves 63.3% accuracy, whereas GPT-Neo 2.7B achieves 56.5%, Pythia-6.9B 61.7%, and OPT-6.7B 62.9%.
The authors released the code publicly; you can run a demo on Google Colab and find the model on HuggingFace (mamba-chat). More results and responses to reviewers are available on the OpenReview platform.
It is exciting to see that different architectures can not only match the performance of transformers but also beat them. Mamba, even at a small 3B size, outperforms some 7B open-source models. If this architecture scales as well as transformers do, maybe we will see more efficient 70B models with similar or even better performance. This gives hope for future development.
🗞️ Longreads
- Ego, Fear and Money: How the A.I. Fuse Was Lit (read)