The Week in AI: GPT-4 Launches, Everyone Else Piles In
AI is arriving in everything, everywhere, all at once.
This week is one for the history books in the world of AI, Machine Learning, and Large Language Models.
And I’m not only talking about GPT-4, which is yet another defining moment (in a daily cacophony of stunning AI releases).
On Monday, Stanford released Alpaca 7B, an instruction-tuned version of LLaMA 7B that "behaves similarly to OpenAI's text-davinci-003" but runs on much less powerful hardware, like smartphones.
On Tuesday alone:
GPT-4 was released to the public (after months of testing by select partners).
Anthropic launched Claude, an AI chatbot “that’s easier to talk to” and produces “less harmful content.”
Google opened up its AI language model PaLM to challenge OpenAI and GPT-3.
Google started adding generative AI tools to Workspace (Gmail, Docs, Calendar, etc.).
AdeptAI raised $350 million as part of its Series B to develop an AI assistant that can turn text commands into sets of actions.
On Wednesday, PyTorch 2.0 (a widely used machine learning framework) and Midjourney V5 (the newest version of the popular AI image generator's neural network) were made available.
And on Thursday, Microsoft introduced 365 Copilot for apps like Word, Excel, PowerPoint, Outlook, Teams, and more.
But the news that’s overshadowing them all is still GPT-4:
The OpenAI Developer Livestream got 1.5 million views in about 20 hours.
The announcement tweet got 4x more likes than ChatGPT's, itself the biggest story of 2022.
GPT-4 is already the 11th-most upvoted Hacker News story of all time.
Let’s look at what we know (and don’t know) about GPT-4.
GPT-4 Executive Summary
Currently, GPT-4 is available to ChatGPT Plus users, and API access is being rolled out. Free access via ChatGPT will follow at some point.
Training, dataset, and method
OpenAI did not disclose what it used or how it trained the model, including the energy costs and hardware used, making GPT-4 the company’s most secretive release thus far.
Inside the 98-page technical report, OpenAI says:
“Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method."
We don't know how GPT-4 differs from GPT-3 in terms of data, compute, hardware, settings, or training method.
But we know that GPT-4's training began two years ago and finished in August 2022, and that its data cutoff is September 2021.
Of course, there are tightening limits around “safety” and “harmful content” (20-30% fewer hallucinations and unsafe content). The favoring of a certain kind of political bias continues unabated.
OpenAI focused instead on what GPT-4 can do. They worked with other companies to launch it and showed examples of how it's being used:
Duolingo showed new ways to learn Spanish and French, like explaining answers and role-playing. GPT-4 can help with other languages too.
Intercom launched a chatbot called Fin that can better understand what people want, give clear answers, and even hand over to a real person when needed.
It turns out Bing was GPT-4 all along: Microsoft revealed that the model it had secretly codenamed "Prometheus" was GPT-4.
Morgan Stanley's wealth management division is deploying GPT-4 to organize its vast knowledge base.
And the list will continue to grow as API access rolls out.
Quick summary of various capabilities:
In general, GPT-4 is more reliable and more creative than GPT-3.5, and it can handle much more nuanced instructions.
It shows expanded creativity, superior reasoning and problem-solving, broader general knowledge, better association, and higher output quality.
When I asked GPT-4 what it could do, it gave me a “short list” with 15 examples:
GPT-4 passes many standardized exams (the bar exam, LSAT, AP, GRE, etc.)
This is important because of what it implies: reasoning abilities that, in some instances and under the right circumstances, come close to human-level performance.
GPT-4 is multimodal. The API accepts images as inputs to generate captions and analyses.
Given a photograph, chart, or diagram, GPT-4 can provide a detailed, paragraph-long description of the image and answer questions about its contents.
In the demo, Greg Brockman (President and Co-founder of OpenAI) had GPT-4 create website code from a pencil mockup made in a physical notebook.
It can describe a screenshot of a Discord app:
It can summarize images of a paper and answer questions about figures:
Imagine taking a photo of your fridge and getting recipes and meal ideas:
Now imagine taking a photo of your closet and getting outfit recommendations.
Imagine taking a photo of your living room and getting interior design advice.
GPT-4 can even explain why an image is funny, though it doesn’t understand humor:
The implications of this can’t be stressed enough: Does it matter that it doesn’t “understand” humor if it can accurately describe what’s funny—and write jokes?
Either way, the multimodal visual API capability is exclusive to Be My Eyes for now.
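OpenAI hasn't published the image-input API format yet (it's limited to Be My Eyes for now), so here is a purely hypothetical sketch of what a mixed text-and-image request might look like, modeled on the existing chat-completions payload. The model name and the message shape are assumptions, not documented parameters:

```python
# Hypothetical sketch: a chat-completions-style payload that mixes a
# text prompt with an image URL. The "image_url" content part and the
# vision-capable model name are assumptions; OpenAI has not documented
# the public image API yet.
import json

def build_vision_request(prompt: str, image_url: str) -> dict:
    """Assemble a (hypothetical) text-plus-image request payload."""
    return {
        "model": "gpt-4",  # assumed vision-capable model name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_vision_request(
    "Describe this chart and answer: which series grows fastest?",
    "https://example.com/chart.png",
)
print(json.dumps(payload, indent=2))
```

Whatever the final format, the key shift is that a single prompt can now carry both modalities, so "answer questions about this diagram" becomes one API call instead of a separate captioning pipeline.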
The token input upgrade is one of the more impressive functions. You can include the text of full documents within a single prompt.
You can use 8x more context than ChatGPT: about 25,000 words, or 50 pages. That unlocks better AI-assisted coding (paste entire docs into the prompt) and better chat (paste whole Wikipedia articles, or compare two of them).
This also expands the capability to organize huge knowledge bases and "interact" with them instantly. Imagine having your entire company's knowledge in one place, accessible via chat.
A lawyer can put an entire case history, documents, precedents, and more into a prompt and uncover legal arguments.
Your doctor or nutritionist can put everything about your health into a prompt and find new ideas for treatments or food.
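As a back-of-the-envelope illustration of that budget, here's a rough fit check using the common ~4-characters-per-token heuristic for English text. The heuristic and the reply-budget parameter are assumptions for illustration, not a real tokenizer:

```python
# Rough check of whether a document fits the larger GPT-4 context
# window (~32k tokens, roughly 25k words / 50 pages). The 4-chars-per-
# token figure is a common rule of thumb for English prose, not an
# exact tokenizer count.
CONTEXT_TOKENS = 32_000   # larger GPT-4 context variant
CHARS_PER_TOKEN = 4       # heuristic for English text

def estimated_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, reserve_for_reply: int = 500) -> bool:
    """True if the prompt plus a reply budget fits the window."""
    return estimated_tokens(text) + reserve_for_reply <= CONTEXT_TOKENS

doc = "word " * 25_000  # ~25k words, roughly a 50-page document
print(estimated_tokens(doc), fits_in_context(doc))
```

For anything longer than the window (a full case file, a company wiki), you'd still need to chunk the text and feed it in pieces; the upgrade just makes the pieces dramatically bigger.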
GPT-4 examples in the wild
Of course, Twitter and the internet have been flooded with examples ever since, and it’s a never-ending stream.
Pong recreated in under 60 seconds:
Scanning a live Ethereum contract for vulnerabilities:
Your personal investment assistant:
Making art and animations:
New possibilities for dating apps as matchmaker:
Get recommendations for what to watch tonight:
Over the next few days, weeks, and months, this will become a hurricane (a “GPT-nado”?) of applications across every industry.
AI is coming in everything, everywhere, all at once. How this technology behaves now that it's loose is hard to predict.
So, where does this end?
For now, the only apparent limit to what’s possible is whatever you can imagine, at least when it comes to convenience.
This brings me to a question I keep asking but have no answer to, yet:
With all these convenience upgrades, when will we see something truly revolutionary? Don’t get me wrong, what we’re seeing is incredible. But revolutionary would be a cancer cure, flying to Mars, solving complex threats to our privacy, and so on.
When everyone can do anything, in whatever degree of quality, what would be revolutionary?
And when you remove the friction and struggle of creativity and thinking, can you still create masterpieces?
Time will tell. And time is speeding up.
Alpaca was the biggest news last week for me. It doesn't run on phones, though; that's LLaMA 7B quantized to 4 bits. Alpaca is a fine-tuned version of the original LLaMA 7B, and it should be able to run inference with 16 GB of system RAM on any hardware.
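To make the quantization point concrete, here is a minimal pure-Python sketch of symmetric 4-bit quantization. Real implementations quantize per block of weights and pack two 4-bit values per byte; this only illustrates the round-trip and why the footprint shrinks roughly 4x versus 16-bit floats:

```python
# Minimal sketch of symmetric, per-tensor 4-bit weight quantization:
# map each float to a signed 4-bit integer in [-8, 7] via one shared
# scale, then reconstruct. Real schemes (e.g. per-block quantization
# with packing) are more sophisticated; this shows only the idea.

def quantize_4bit(weights):
    """Quantize floats to signed 4-bit ints with one shared scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate floats from the 4-bit codes."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.07, -0.21]
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(q, round(max_err, 3))
```

Each weight now needs 4 bits instead of 16, at the cost of a small reconstruction error per weight, which is why a 7B-parameter model's memory footprint drops enough to fit on modest hardware.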