The Week in AI: The Empire Strikes Back (100K Tokens, Transformer Agents, One Embedding to Bind Them All)
After seemingly struggling to announce and release useful AI products, Google is striking back with a vengeance.
I was never sure how much of the internet rumors, gossip, and dramatic “code red” stories you could believe.
In my estimation, and knowing some of what Google has been cooking behind the scenes for years, there was truth to the stories of Google scrambling.
To be sure, much of it was pure speculation and adversarial narratives, propagated by competitors trying to gain an edge and by investors looking to scoop up cheap Google stock.
Now, with Generative AI, activist investors and hostile competitors can spin up fake news, fake stories, fake announcements, fake images, fake videos, fake podcasts—fake everything—and take over the news cycle for a while. Just long enough to achieve an objective.
Previously, you’d have to be working for one of the three-letter agencies in America, or the UK equivalent, to wield this sort of propaganda power.
Now? You can sit in your basement and unleash any deepfake you want, aided and abetted by bots you’ve coded.
Google’s revenge tour consists of:
Removing the waitlist for Bard and making it available in 180 countries. They’ve added support for new languages, dark mode, export functions, and visual search. Google says the upgraded Bard is particularly good at tackling coding queries, including debugging and explaining chunks of code, in more than 20 programming languages.
The Google Search experience will soon come with AI-generated answers prioritized at the top of the page.
Generative AI Creation Tools in Workspace. Google is integrating AI into its Workspace apps like Docs, Sheets, and Slides: create job descriptions, write creative stories, auto-generate spreadsheets for tracking information, and build out whole presentations, with AI suggesting text for slides or instantly generating custom visual elements like photos.
Launching PaLM 2, a suite of 4 language models which excel at common-sense reasoning, mathematics, coding, and logic tasks. Some models are so lightweight that they can run on mobile devices. The key quote: “What we found in our work is that it’s not really the sort of size of model — that the larger is not always better.”
An update to Google’s photo-editing feature, now called Magic Editor; it’s like a quick mobile version of Photoshop. You can change nearly every element of a photo, including adjusting lighting, removing unwanted foreground elements like backpack straps, and even moving the subject of the photo into other parts of the frame.
And even more exciting, MusicLM. Turn text descriptions into music.
There’s a lot more, of course. You can gorge yourself on all things Google AI here.
Meanwhile, Anthropic expands Claude’s context window to 100,000 tokens (via API).
This is far more than any other language model offers right now, and it will open up use cases that no other model can match.
OpenAI is slowly making 32,000 tokens available (via API), but with an emphasis on slowly.
We’ll see what Bard and other models do now.
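For a sense of what that unlocks in practice, here’s a minimal sketch of feeding a book-sized document to Claude in one shot. It assumes Anthropic’s official Python SDK and Messages API; the model name and the file are illustrative assumptions, not from Anthropic’s announcement:

```python
# A minimal sketch: shoving ~100K tokens of "reality" into one Claude prompt.
# Assumes the official anthropic SDK (pip install anthropic) and a
# long-context model ID; check Anthropic's docs for current model names.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("annual_report.txt") as f:  # hypothetical book-length document
    document = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # assumption: any long-context Claude model
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": document + "\n\nSummarize the key points of this document.",
    }],
)
print(response.content[0].text)
```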
Fundamentally, what all this means is that models with huge context windows can now consume more of the reality you’re willing to feed them. As mentioned already, AGI continues to morph into a thing mostly because humans are on track to make it a thing.
We’re happily stuffing these models with more of reality, real or imagined, through ever-larger context windows. As those windows expand, the boundary between AI and reality evaporates. Which also means that reality is becoming more malleable and subject to changes no one can begin to understand yet.
Or, as Anthropic helpfully explains:
“The average person can read 100,000 tokens of text in ~5+ hours, and then they might need substantially longer to digest, remember, and analyze that information. Claude can now do this in less than a minute.”
(The arithmetic holds up: 100,000 tokens is roughly 75,000 words, and at a typical ~250 words per minute that works out to about five hours of reading.)
When a large language model can consume, understand, and manipulate reality faster and better than you—you won’t even understand the changes happening all around, and inside, you.
Just imagine the effects on education and the formation of young minds.
Here’s why this is good news:
You can curate and control your reality, completely. Instead of relying on another Augustine rewriting the rise and fall of Rome—you can rewrite and reframe any history, any present reality, in any way you want.
Now here’s why this is bad news:
You can curate and control your reality, completely. Instead of relying on another Augustine rewriting the rise and fall of Rome—you can rewrite and reframe any history, any present reality, in any way you want.
Perhaps soon, you’ll be dating an AI girlfriend, modeled from a real person—for $1 per minute.
All you need is to train and fine-tune a model on the right data.
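As a sketch of what “fine-tune a model on data” means here, the following continues training a small causal language model on exported chat logs using HuggingFace’s Trainer. The model choice, the chat_logs.txt file, and the hyperparameters are all illustrative assumptions, not anyone’s actual recipe:

```python
# A minimal sketch of the "fine-tune on someone's chat logs" idea.
# Everything here is a placeholder; no real product's pipeline is implied.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # stand-in; a real companion bot would start from a chat-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# chat_logs.txt: one conversation snippet per line (hypothetical export)
dataset = load_dataset("text", data_files={"train": "chat_logs.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False -> plain next-token prediction; the collator also builds the labels
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="companion-model", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```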
AI girlfriends (and maybe boyfriends?) will be a massive market within a year. I’m willing to bet a large sum that, just like on OnlyFans, many of these “girlfriends” will be creations of chubby males sitting alone in dark rooms.
The fundamental questions that are coming at us faster than the speed of AI are:
Who owns you?
Who owns your image?
Who owns your touch?
Who owns your voice?
Who owns your person?
Who owns your mind?
Who owns your thoughts?
Who owns your feelings?
Who has rights to any of it, besides you?
Surveillance capitalism has so far laid claim to your whole digital self, without asking you and without telling you the true meaning of its complicated Terms of Service or of all the data it has been harvesting.
But now, in the Age of AI, when you can be cloned and deepfaked in seconds:
Who owns you?
This must be discussed and legislated now, because it’s already too late.
Even your DNA is up for grabs if nothing changes.
In other AI news around the internet
Let’s lighten up the mood a little bit with bits and pieces from the great beyond, the internet.
Airtable is putting a stake in the ground and aiming to become a focal point for various AI apps. In their own words:
“Fully custom-implemented AI is expensive to build, iterate, and deploy across the range of use cases – and the agility each of them demands – in an organization. What's needed is a 100x faster way to build highly engaging AI apps. That’s where Airtable comes in.”
Meta brings Generative AI to advertisers. They’re calling it the AI Sandbox, and the suite of tools includes alternative ad copy variations, generated backgrounds, image cropping for Facebook and Instagram ads, and a lot more.
Along with the AI Sandbox, there’s ImageBind, “one embedding to bind them all”, the first AI model capable of:
“Binding data from six modalities at once, without the need for explicit supervision. By recognizing the relationships between these modalities — images and video, audio, text, depth, thermal and inertial measurement units (IMUs) — this breakthrough helps advance AI by enabling machines to better analyze many different forms of information, together.”
In other words, it’s truly multimodal, which could lead to the creation of a metaverse.
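To make “binding” concrete, here’s roughly how the released checkpoint is queried, loosely following the README of Meta’s facebookresearch/ImageBind repo; the file paths are placeholders, and the exact packaging is an assumption that may have drifted since launch:

```python
# A sketch of embedding three modalities into ImageBind's shared space,
# loosely following the facebookresearch/ImageBind README. Paths are
# placeholders; the API may differ from this assumption.
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

# One shared embedding space for text, images, and audio
inputs = {
    ModalityType.TEXT: data.load_and_transform_text(["a dog", "a car"], device),
    ModalityType.VISION: data.load_and_transform_vision_data(["dog.jpg", "car.jpg"], device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(["dog.wav", "car.wav"], device),
}

with torch.no_grad():
    embeddings = model(inputs)

# Cross-modal similarity: which sound matches which image?
print(torch.softmax(
    embeddings[ModalityType.VISION] @ embeddings[ModalityType.AUDIO].T, dim=-1))
```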
HuggingFace launches Transformers Agent, which pairs a code-generating language model with the HuggingFace ecosystem so it can call other models on the fly for multimodal tasks. Create an agent on top of an LLM (OpenAssistant, StarCoder, OpenAI, etc.) and start talking to transformers and diffusers. It responds to complex queries and offers a chat mode: create images from your words, have the agent read the summary of a website out loud, or ask it to read through a PDF.
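Here’s a minimal sketch of what that looks like, following the pattern from the launch documentation; the StarCoder inference endpoint URL is the one shown at launch and may have moved since, so treat the specifics as assumptions:

```python
# A minimal sketch of Transformers Agent (transformers >= 4.29).
# The endpoint URL is the StarCoder endpoint the launch docs used;
# treat it as an assumption that may have changed.
from transformers import HfAgent

agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")

# run() has the agent write and execute code that calls other HuggingFace
# models (diffusers, speech, vision, ...) to satisfy the request.
picture = agent.run("Draw me a picture of rivers and lakes.")

# chat() keeps state across turns: the "chat mode" mentioned above.
agent.chat("Now transform the picture so there is a boat on the water.")
```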
The question remains how fast these AI leaps will begin to affect everyone and everything. And now that a behemoth like Google has jumped into the fray, it seems likely we’ll get the answer soon.
And yet people will be more lonely, miserable and stressed out than ever.