I'm shocked by how huge an AI bubble I live in 🫧
It’s a pure example of how algorithms can put you in a box.
I came to the realization behind this newsletter’s title very recently, when my current approach to business as a freelancer hit a wall on two different levels.
The first one is the content I consume and the people I follow online (or the clients I work with). Most of them are forward-thinking tech industry veterans or companies developing high-tech solutions. It’s natural that within this bubble we share common interests and a common language. It led me to assume that everyone around us has the same amount of context and understanding.
The second one is the “real” world outside. When I was preparing a prompt engineering workshop agenda that we are selling under TestArmy - Cybersecurity and Software Testing Services and testuj.pl, we sent it to IT professionals. More than half of them (!) hadn’t even started exploring LLMs and their potential. For them, there’s ChatGPT and, for a long time, nothing else. A recent poll I conducted also revealed that the most popular choice for people wanting to explore AI is prompt design or engineering.
A similar thing struck me at some of the recent tech conferences and events I’ve attended. People have NO IDEA how to work with AI, let alone how to prompt it effectively to get things done. There’s a huge craving for explanations of simple use cases around work or life, no matter if it’s marketing, law, coding, or just brainstorming food recipes for a friends’ gathering.
People are overwhelmed and confused. The common emotions around AI that I’ve noticed are frustration and FOMO. In private conversations, I’ve also noticed a sense of guilt stemming from a lack of understanding. Everyone seems to brag about how they exploit this tech, yet a huge group of people doesn’t even know where to start.
What can be the solution here besides education? How can we close this gap effectively?
What do you think?
🗝️ Quick Bytes:
Mistral AI seeks $6B valuation
Mistral AI, a Paris-based company, is seeking to raise $600 million at a valuation of $6 billion, marking a significant increase from its previous target of $5 billion. The company's rapid valuation growth is attributed to its successful product launches, competitive performance in benchmark tests, and strategic partnerships with major tech players like Microsoft. Mistral AI's open-source approach to AI development and its ability to attract significant investor interest have positioned it as a notable competitor to established Silicon Valley firms in the AI space. The current funding round is expected to include contributions from returning investors such as General Catalyst and Lightspeed Venture Partners, as well as new investor DST Global.
OpenAI explores NSFW content
OpenAI is exploring the possibility of allowing its AI models to generate NSFW content in a responsible and age-appropriate manner, marking a shift from its previous stance. The organization aims to provide users with the flexibility to use its services for diverse creative scenarios while adhering to strict guidelines to prevent misuse. OpenAI's exploration has sparked discussions within the developer and user communities, with some expressing interest in the potential for creative expression and others raising concerns about ethical implications.
DeepMind model predicts life molecules
AlphaFold 3, developed by Google DeepMind and Isomorphic Labs, is a groundbreaking AI model that can predict the structures and interactions of all life's molecules with fantastic accuracy. By modeling the joint 3D structure of proteins, DNA, RNA, ligands, and other organic elements, AlphaFold 3 offers a comprehensive understanding of life at the molecular level. The model's capabilities have the potential to revolutionize various scientific fields, particularly accelerating drug discovery and the development of new treatments for diseases. AlphaFold 3 has been made accessible to researchers through the AlphaFold Server.
🎛️ Algorithm Command Line
In this issue, I won’t show you a tip. Instead, I want to plant one idea in your head.
Recently, I realized that AI gave me a huge unexpected benefit.
It struck me while I was eating breakfast with my parents last weekend. The thought just popped up out of nowhere.
Both of my parents work in academia, and we were talking about how education is evolving, not only because of AI but also because of technology in general. My mom shared her recent struggles with building a gamified process for effective learning, and I noticed that in just a couple of minutes, I intuitively pinpointed the things that could be solved quicker or enhanced by LLMs.
I realized that the hours I've put into prompt engineering and understanding LLMs reshaped my brain. It forced me naturally to break down problems, think more abstractly, and communicate with almost laser precision using proper words.
Not that I didn’t have these skills before. I had them thanks to writing and being a journalist for almost a decade. But they were not as sharp and fast as they are now. It feels like building a new muscle in my brain.
The input problem
By its nature, prompt engineering forces you to be specific, detailed, and intentional. We as humans are constantly trying to solve the biggest problem within our species: the INPUT. We are limited there by language and by the level of understanding of others.
People remember 10% of what they read, 20% of what they hear, 30% of what they see, 50% of what they see and hear, 70% of what they say and write, and 90% of what they do.
(Just think how the world would look if you could transfer your idea, with full context, to another person in under a second.)
There are a lot of things that are lost in translation.
This isn't about human vs. machine; it's human + machine, and it's incredibly powerful.
Did you notice something similar in your way of doing things?
💡Explained
While everyone argues about how to make RAG better, the xLSTM paper (Extended Long Short-Term Memory) just came out. It extends one of the best-known neural network architectures, the LSTM. One of the co-authors is Sepp Hochreiter, who, together with Schmidhuber, authored the original LSTM paper released in '97.
So what's new in xLSTM?
1️⃣ Residual stacking (each layer feeds into the next with a residual connection, improving gradient flow during training) + 2️⃣ LayerNorm + 3️⃣ Matrix memory. The matrix memory (a 2D array) can store more complex patterns and relationships between different parts of the input data than classic 1D vectors can.
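To make the matrix-memory idea concrete, here’s a tiny toy sketch in plain Python (my own illustration, not code from the paper; the dimension and gate values are made up). It stores a value under a unit-norm key with an outer-product update, the rule behind the paper's mLSTM matrix memory, and then reads the value back by querying with the same key:

```python
import math
import random

random.seed(0)
d = 8  # toy dimension (assumption, not from the paper)

def outer(v, k):
    # Outer product v k^T: a d x d matrix from two d-vectors.
    return [[vi * kj for kj in k] for vi in v]

def matvec(M, x):
    # Matrix-vector product: how the memory is queried.
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

# Matrix memory: a d x d array instead of a 1-D cell state.
C = [[0.0] * d for _ in range(d)]

# Store a key/value pair with an outer-product update:
#   C_t = f_t * C_{t-1} + i_t * v_t k_t^T
key = [random.gauss(0, 1) for _ in range(d)]
norm = math.sqrt(sum(x * x for x in key))
key = [x / norm for x in key]            # unit-norm key
value = [random.gauss(0, 1) for _ in range(d)]

f_gate, i_gate = 1.0, 1.0                # gates fixed to 1 for clarity
vk = outer(value, key)
C = [[f_gate * C[i][j] + i_gate * vk[i][j] for j in range(d)]
     for i in range(d)]

# Retrieval: querying with the key reads the value back out,
# because C @ key = value * (key . key) = value for a unit-norm key.
retrieved = matvec(C, key)
print(all(abs(r - v) < 1e-9 for r, v in zip(retrieved, value)))  # True
```

With more stored pairs the recall becomes approximate rather than exact, which is exactly the "compressive" property the authors mention below.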
As the authors say: "contrary to Transformers, xLSTM networks have a linear computation and a constant memory complexity with respect to the sequence length. Since the xLSTM memory is compressive, it is well suited for industrial applications and implementations on the edge."
Transformers, by contrast, typically have quadratic complexity with respect to sequence length due to their attention mechanism.
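A back-of-the-envelope comparison makes the difference tangible. This is my own toy cost model (counting only the dominant terms, with a made-up state dimension of 64), not a benchmark:

```python
def attention_score_memory(seq_len: int) -> int:
    # Self-attention materializes an n x n score matrix: quadratic in n.
    return seq_len * seq_len

def matrix_memory(seq_len: int, d: int = 64) -> int:
    # A recurrent d x d matrix memory is constant with respect to n.
    return d * d

for n in (1_000, 8_000, 16_000):
    print(n, attention_score_memory(n), matrix_memory(n))
# Attention cost grows quadratically with sequence length;
# the recurrent memory stays at d*d = 4096 entries no matter how long
# the sequence gets.
```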
🕵♀️
Take a look at this very interesting line on page 11: "In contrast to other methods, xLSTM models maintain low perplexities for longer contexts" - What does this mean?
Perplexity basically measures how "surprised" a model is by the input; it checks how well the model can predict the next word in a sequence. The lower, the better.
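In code, perplexity is just the exponential of the average negative log-likelihood the model assigns to the true next tokens. A minimal sketch with made-up probabilities:

```python
import math

def perplexity(probs):
    # probs: probabilities the model gave to the tokens that actually
    # came next. Perplexity = exp(mean negative log-likelihood).
    nll = -sum(math.log(p) for p in probs) / len(probs)
    return math.exp(nll)

confident = [0.9, 0.8, 0.95]   # model usually right about the next word
uncertain = [0.1, 0.2, 0.05]   # model frequently "surprised"

print(perplexity(confident) < perplexity(uncertain))  # True
```

A perfect predictor (probability 1.0 on every true token) would score a perplexity of exactly 1; the more surprised the model is, the higher the number climbs.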
So first, the models were trained on sequences 2048 tokens long. After training, they were evaluated on much longer sequences, up to 16384 tokens, to see if they still performed well.
Compared to others like RWKV-4, Llama, and Mamba, the xLSTM model did pretty well: it kept its predictions accurate even as the sequences got much longer.
💡
The disadvantage lies in the parallelization. While both xLSTM and Transformers can leverage GPU parallelization, the memory mixing feature in xLSTMs limits their parallelizability compared to Transformers (at least for now).
👩🔬
The paper closes with the following words, which fill us with hope: "(…) How far do we get in language modeling when scaling LSTM to billions of parameters? So far, we can answer: 'At least as far as current technologies like Transformers or State Space Models.' We have enhanced LSTM to xLSTM by exponential gating with memory mixing and a new memory structure. xLSTM models perform favorably on language modeling when compared to state-of-the-art methods like Transformers and State Space Models. The scaling laws indicate that larger xLSTM models will be serious competitors to current Large Language Models that are built with the Transformer technology. xLSTM has the potential to considerably impact other deep learning fields like Reinforcement Learning, Time Series Prediction, or the modeling of physical systems."
What do you think about it? Will xLSTMs be the next "thing"?