We are entering the era of multiple personalities 👥
Will the ability to create digital clones of ourselves impact our sense of identity?
It seems obvious: we live with masks, especially in the era of building a high-impact personal brand. The further we go, the more the lines between our real and digital identities blur - at some point they may become invisible altogether. Will the future, with the help of sophisticated algorithms, allow us to create multiple personas and juggle them based on tasks, relationships, and goals? How might this abundance of digital personalities affect our online and offline relationships? And how do these technologies intersect with mental health, which is already getting worse across society?
Tools now allow you to create an avatar that speaks multiple languages and almost perfectly mimics your face. But we are still far from digital clones that can fully replicate human cognition and emotions.
Also, current legal systems are hardly equipped to handle the complexities of digital clones acting independently. Not to mention that the commodification of digital selves could create new economic models and disrupt existing ones, potentially redistributing wealth and power - providing even more junk food for an evolving digital capitalism.
Given the profound implications for privacy, identity, and society, there could be considerable public resistance to adopting these technologies.
But are we sure we are ready for what is coming?
🗝️ Quick Bytes:
Sony's Advanced AI Research: A Push for Inclusive Skin Tone Recognition
Sony AI's recent research suggests a more comprehensive approach to measuring skin color in AI algorithms, beyond the traditional focus on lightness and darkness. The study, led by William Thong, Alice Xiang, and Przemyslaw Joniak, emphasizes the inclusion of red and yellow skin hues to enhance AI system diversity.
Existing scales, like the Monk Skin Tone Scale and the Fitzpatrick scale, primarily evaluate skin tones on a light-to-dark spectrum. Sony's study found that these measures can overlook biases against certain demographics, like East Asians and Hispanics. For instance, AI algorithms showed a preference for redder skin and misinterpreted those with a redder hue as "more smiley."
Sony proposes an automated method based on the CIELAB color standard to address these biases. Despite the simplicity and good intentions behind the Monk Skin Tone Scale, major AI players like Google and Amazon are now considering Sony's multidimensional approach to skin tone evaluation.
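For intuition, here is a minimal sketch (not Sony's implementation) of what a multidimensional measurement could look like: converting skin pixels to CIELAB with scikit-image and reading off both perceptual lightness and the hue angle that separates redder from yellower tones. The `patch` values below are made up purely for illustration.

```python
# A rough sketch (not Sony's code) of measuring skin color along two axes in
# CIELAB space: perceptual lightness L* and hue angle (red vs. yellow).
# `patch` is a made-up example of RGB skin-pixel values in [0, 1].
import numpy as np
from skimage.color import rgb2lab

def skin_tone_descriptors(skin_pixels):
    """Return (mean lightness L*, mean hue angle in degrees) for an array of RGB skin pixels."""
    lab = rgb2lab(skin_pixels.reshape(1, -1, 3))          # channels: L*, a*, b*
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    lightness = float(L.mean())                           # the light-to-dark axis existing scales capture
    hue_angle = float(np.degrees(np.arctan2(b.mean(), a.mean())))  # lower = redder, higher = yellower
    return lightness, hue_angle

patch = np.array([[0.85, 0.62, 0.48],
                  [0.82, 0.60, 0.45],
                  [0.88, 0.66, 0.52]])
print(skin_tone_descriptors(patch))
```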
LinkedIn's AI Revamp: Enhancements Across Recruitment, Learning, and Marketing
LinkedIn, the Microsoft-owned professional networking platform, is integrating new AI features into its suite of products to enhance job hunting, marketing, and sales functionalities. This move builds upon the platform's past implementations of AI, including AI-driven writing suggestions and AI-created job descriptions.
The latest additions encompass "Recruiter 2024," which employs generative AI to aid recruiters in refining their candidate searches using conversational language. Furthermore, "LinkedIn Learning" is introducing a chatbot-like "learning coach" offering advice and course recommendations. Marketing initiatives are also getting an AI upgrade with the "Accelerate" tool, although its use is confined to LinkedIn's ecosystem.
Another focal area is B2B sales, where AI will facilitate salespeople in discovering potential connections and initiating discussions with leads. While AI has profoundly influenced sales elsewhere, LinkedIn's adoption in this segment appears to be a delayed yet significant move.
The AI Challenge with Watermarks: Striving for Accuracy and Inclusivity
Watermarks, traditionally used to prevent counterfeiting, have taken on a new significance in the realm of AI, serving as a means to identify AI-generated content and combat deep fakes and misinformation. Tech giants like Google, OpenAI, Meta, and Amazon are actively working on watermarking technologies for this purpose.
Researchers at the University of Maryland found that current watermarking methods are easily evaded and can even be falsely added to non-AI content. However, the same team developed a robust watermark almost impossible to remove without harming the content's intellectual property. Separate research by the University of California, Santa Barbara and Carnegie Mellon University revealed that watermarks can be removed through both destructive and constructive attacks, affecting image quality.
As AI-generated content, including deep fakes, gains prominence, especially with upcoming events like the 2024 U.S. presidential election, there's an urgent need for improved watermarking. Tools like Google's SynthID aim to address this challenge, though advancements in the space remain a race against hackers and potential misuse.
🎛️ ChatGPT Command Line
It is finally here. Through its partnership with Microsoft, OpenAI has implemented the DALL-E 3 model natively into ChatGPT's interface.
If you are wondering what you can do with it - you will be amazed.
Let’s dive in.
What is DALL-E 3?
DALL-E 3 stands at the forefront of AI image generation, designed to transform textual descriptions into vivid images. With ChatGPT Plus, you can now use it natively and intuitively, opening a realm of visual possibilities right at your fingertips.
How Does It Work?
At its core, DALL-E 3 is built on neural networks trained on a massive dataset of image-text pairs, which is how it learns to associate words with visual patterns. A diffusion process then turns those learned associations into a finished image, and the tight integration with ChatGPT helps translate your conversational request into a prompt the model can follow closely.
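If you would rather script it than chat, the same model is also exposed through the OpenAI API. Here is a minimal sketch using the official Python SDK (openai>=1.0); it assumes an `OPENAI_API_KEY` is set in your environment, and it is the API route rather than the ChatGPT interface itself.

```python
# Minimal sketch of calling DALL-E 3 via the OpenAI Python SDK (openai>=1.0).
# Inside ChatGPT Plus none of this is needed - you just type the prompt -
# but the API is handy for scripting. Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A serene lakeside sunset with vibrant oranges reflecting on calm waters",
    size="1024x1024",
    quality="standard",
    n=1,
)

print(response.data[0].url)  # URL of the generated image
```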
Want to create images like these? Keep the following in mind:
Crafting Your Prompt:
Be Detailed: the magic lies in the details. Guide DALL-E 3 by describing every aspect of your desired image. Instead of "a sunset", paint a picture with words like, "A serene lakeside sunset with vibrant oranges reflecting on calm waters."
Portrait shot of a white girl with short brown hair, looking stylish in her yellow Nike hoodie layered over a white t-shirt, and paired with black shorts. Her look is completed with white oval glasses. The camera used for this shot is the Sony Alpha III, with a 50mm portrait lens at an aperture of f/1.4. The background features imposing brutalist architecture, contrasted by the gentle glow of the setting sun.
Captured in a vertical frame, hooded artists are seen working on a colorful graffiti masterpiece on the side of a metro train. The graffiti's vivid colors pop against the train's exterior. Beyond the train, the Berlin skyline stands illuminated at night, with its city lights casting captivating reflections on the train, merging art and urban beauty.
In this 50mm lens capture, a detective embodies the essence of noir films, standing in a rain-drenched city illuminated by vibrant neon lights. The glow from the neons contrasts with the dark moodiness of the scene. As the detective smokes his cigarette, the rising smoke further accentuates the atmosphere. His hat is strategically positioned, ensuring his eyes remain obscured.
Experiment with Styles: unleash your creativity. Play with different artistic styles. Fancy a quirky scene? "Cartoon of a cat playing a guitar". Or perhaps a classic touch? "Oil painting of a snowy mountain peak."
Best Practices:
Avoid Ambiguity: clarity is key. The more specific you are, the better DALL-E 3 can translate your vision.
Ethical Use: remember to respect copyright boundaries. Avoid generating images of sensitive or controversial topics.
Customization & Fine-tuning:
Specify Image Type: set the tone right from the start. Whether you want a "photo", "cartoon", or "watercolor", make it known in your prompt.
Iterative Refinement: if the first output isn't quite right, use it as feedback. Refine your description based on what you see and guide DALL-E 3 closer to your vision.
Feedback Loop: understand how DALL-E 3 interprets your instructions. This insight will help you adjust and perfect your prompts over time.
Pitfalls to Avoid:
Over-Complexity: while details are good, avoid cramming too many elements without clear context. This can confuse the model.
Cultural Nuances: while DALL-E 3 is advanced, it might not always capture localized or cultural subtleties. Be mindful and check the generated images for any inadvertent misrepresentations.
If you have any questions, don't hesitate to ask - just reply to this email.
💡Explained
The Technology behind Illusion-Like Generated Images
Recently, there has been a surge of interest in generating illusion-like images. Although the trend peaked in popularity only over the last few weeks, it traces back to the paper "Adding Conditional Control to Text-to-Image Diffusion Models", published in February of this year. Around April, a new model was uploaded to the Hugging Face Hub, accompanied by a popular Reddit post; both showed how diffusion models can be used to craft creative QR codes. More recently, the internet buzzed with high-quality, attention-grabbing illusion-like images, ranging from intricate swirls to chessboards, all generated with these models.
⚙️ How does it work?
The paper introduced a "supporting" model named ControlNet. As the name suggests, ControlNet's role is to guide and control the image generation process, e.g. to steer the diffusion model into generating photographs based on a provided condition image. In terms of architecture, ControlNet is a trainable copy of the diffusion model's blocks, connected to the original (frozen) diffusion model through "zero convolution" layers. So, in essence, ControlNet is just an additional part attached to the diffusion model.
Illustration from the ControlNet paper showing the proposed model working together with a diffusion model ("neural network block")
In my opinion, the huge advantage of ControlNet is that it is a separate module that can easily be attached to or detached from the diffusion model, and we can control (via parameters) how strongly it influences the output. This architecture results in a potent yet controllable generation model.
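To make this concrete, here is a minimal inference sketch with Hugging Face's `diffusers` library. The checkpoint names are public examples from the Hub rather than the exact setup behind the viral images, and `controlnet_conditioning_scale` is the parameter mentioned above that controls how much ControlNet influences the output.

```python
# Minimal sketch of plugging a ControlNet into Stable Diffusion with `diffusers`.
# Checkpoint names are public Hugging Face repos; swap in whichever base model
# or ControlNet you prefer.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# The "supporting" model: a Canny-edge ControlNet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)

# Attach it to a frozen Stable Diffusion base model
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

condition = load_image("edges.png")  # your condition image (e.g. Canny edges or a QR code)

image = pipe(
    prompt="a cozy brick house in autumn, photorealistic",
    image=condition,
    controlnet_conditioning_scale=0.8,  # how strongly ControlNet steers the output
    num_inference_steps=30,
).images[0]

image.save("controlled_output.png")
```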
🤖 How was it trained?
First, let's look into how the training dataset was built.
Dataset: we need pairs of images: a condition image that serves as the input, and a target image that should be generated based on that condition. The original authors showed that such datasets can be built automatically by reverse engineering, e.g. take a photograph and extract its edges (e.g. with the Canny method), or simply take segmentation masks from a selected open dataset.
Then we can use the prepared data - for instance, image edges - to train the model to generate the source images, as in the sketch below.
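Here is a rough sketch of that dataset-preparation step, assuming a folder of photos and OpenCV's Canny edge detector; the folder names and thresholds are illustrative only, not the original training setup.

```python
# Rough sketch of the "reverse engineering" step: turning ordinary photos into
# (condition, target) training pairs by extracting Canny edges with OpenCV.
import cv2
from pathlib import Path

def make_canny_pair(photo_path, out_dir, low=100, high=200):
    target = cv2.imread(str(photo_path))          # the image the model should learn to generate
    gray = cv2.cvtColor(target, cv2.COLOR_BGR2GRAY)
    condition = cv2.Canny(gray, low, high)        # the condition image (edge map)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cv2.imwrite(str(out / f"{Path(photo_path).stem}_condition.png"), condition)
    cv2.imwrite(str(out / f"{Path(photo_path).stem}_target.png"), target)

for photo in Path("photos").glob("*.jpg"):
    make_canny_pair(photo, "controlnet_dataset")
```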
Training: during training, we freeze the original diffusion model weights and train only the ControlNet. This way we get a model that can steer the original diffusion model into generating realistic-looking photographs that follow the shape of the drawn edges.
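A minimal sketch of that training split, loosely following how `diffusers` structures ControlNet training (the actual training loop and loss are omitted):

```python
# Sketch of the setup described above: the base diffusion model's U-Net stays
# frozen while only the ControlNet copy receives gradients. Not the authors'
# full training script - just the weight-freezing logic.
import torch
from diffusers import ControlNetModel, UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
controlnet = ControlNetModel.from_unet(unet)   # trainable copy initialized from the U-Net

unet.requires_grad_(False)   # freeze the original diffusion model weights
controlnet.train()           # only ControlNet is trained

optimizer = torch.optim.AdamW(controlnet.parameters(), lr=1e-5)
# ...the usual diffusion training loop (noise-prediction loss) goes here...
```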
💡Possibilities and Limitations
In conclusion, the ability to control generated outputs with prompts as well as input images opens up exciting possibilities, not only for art or marketing but also for training data generation and modification. The technology lets a user generate artistic yet still scannable QR codes, illusion-like images, or photographs from scribbles. However, to achieve great results we still have to experiment a lot with prompt engineering and parameter tuning. I tried a couple of times to generate our own QR code for the Keygen, but I ended up with codes that were either uninteresting or unreadable. The technology is promising, but it still needs additional work to become easier to use.
If you are interested in trying it yourself, I've prepared a Colab notebook for you.
Maybe you will have more luck and your illusion-like image will go viral?
🗞️ Longreads
- Scaling GAIA-1: 9-billion parameter generative world model for autonomous driving (read)