New AIs will need new types of institutions
So it turns out ChatGPT is woke and spits out activist word salad in response to hot-button questions:
Or wait, no, it’s actually pretty based:
Or it was based, then they adjusted it to make it woke, but then they tweaked it back toward the center, probably by bringing in a crack team of centrist commandos from the Niskanen Center to calibrate it to that perfect intellectual sweet spot.
What is happening here?
It’s called reinforcement learning from human feedback (RLHF), and the team at OpenAI is constantly using this technique to tweak and prune ChatGPT’s latent space so that the model’s output can consistently hit square in the center of that classic, four-quadrant political map. Every time the model misses that center mark — every time it sins with its virtual mouth — a human feeds it a little correction that makes it less likely to sin that way again; and when its aim is true, a human feeds it a little encouragement. In this way, the model has its tongue tamed, its unruly evil brought under control.
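As a rough illustration of that correct-and-encourage loop, here is a toy Python sketch. It is not how OpenAI actually trains ChatGPT: real RLHF fits a separate reward model to human preference rankings and then fine-tunes the network's weights with a policy-gradient method such as PPO. The three response labels, the `apply_feedback` helper, and the learning rate are all invented for this example; the only thing it shares with the real process is the basic shape of "penalize the misses, reward the hits."

```python
import math


def softmax(logits):
    """Turn raw preference scores into probabilities that sum to 1."""
    top = max(logits.values())
    exps = {name: math.exp(score - top) for name, score in logits.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


def apply_feedback(logits, response, reward, lr=0.5):
    """Nudge the scores after one piece of human feedback.

    reward is +1 (encouragement) or -1 (correction). The update follows the
    gradient of the log-probability of the chosen response, scaled by reward.
    """
    probs = softmax(logits)
    updated = {}
    for name, score in logits.items():
        grad = (1.0 if name == response else 0.0) - probs[name]
        updated[name] = score + lr * reward * grad
    return updated


# Hypothetical response styles; the toy "model" starts with no preference.
logits = {"woke": 0.0, "based": 0.0, "centrist": 0.0}

# Hypothetical trainers who punish both extremes and reward the center.
feedback = [("woke", -1), ("based", -1), ("centrist", +1)] * 5

for response, reward in feedback:
    logits = apply_feedback(logits, response, reward)

probs = softmax(logits)
for name, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{name}: {p:.2f}")
```

Swap in a different set of trainers with a different `feedback` list and the same starting model drifts somewhere else entirely, which is the whole reason it matters who is doing the catechizing.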
Who catechizes the bots?
Who are these mysterious humans who are catechizing this bot? What values do they have? What morals? What hopes and dreams?
I don’t know these bot trainers’ identities, and I think nobody does outside the team at OpenAI whose job it is to find humans to do this work. I’m talking about this OpenAI team right here:
A team that was assembled at least in part by this guy (the OpenAI co-founder):
My point in surfacing these tweets is that I don’t think there are many Niskanen bros at OpenAI, but that crowd also doesn't seem particularly trad (per the first tweet above) or particularly woke (per the second tweet). My own vague sense, formed almost entirely from Twitter and thus possibly wrong, is that the company is dominated by various flavors of rationalists, effective altruists, accelerationists, low-key anti-wokes, and low-key wokes. So why, then, are they aiming so hard for the center with ChatGPT? There’s an easy, two-word answer: risk mitigation.
The job of the RLHF team, at least with regard to America’s culture wars, is to keep the heat off the OpenAI engineering team so the company can stay focused on its main mission. And I suspect OpenAI’s main mission is something like the following:
Build an AGI
before China does,
and make it not obviously malevolent,
so that we can figure out all the other messy ideological stuff — what is the good life, and what does it mean to live it, and so on — at a later date when the singularity has freed us from all labor and we have plenty of time to sit around and debate these abstractions while being waited on by robots.
That image above, with the pin dropped right at the center of the political map, is what it looks like when you’re just trying to keep your head down for long enough to get across the AGI finish line. Here’s my own version of it:
I guess this is all fine, as far as it goes, but those of us in other, non-Niskanen quadrants of the political map want more from the robots that are writing letters to our kids in the voice of Santa's surveillance elf or reading them bedtime stories:
I think very few people really want a single, godlike, centrist AI educating the next generation. In the future, there will be many different models that represent many different points of view because the RLHF phase of their training was done by many different tribes of humans.
AI needs editors
Just as there are publishing houses, think tanks, religious orders, and other organized groups of humans that produce, transcribe, edit, and curate bodies of literature from a particular ideological standpoint, so will there be groups that will do this for large language models (LLMs).
One day in the very near future, you’ll be at a small gathering of literary types, and among the librarians, publishers, writers, editors, and agents will be a new type of editor who manages a team of RLHF trainers at some institution. Her institution may be an imprint, a talent network, a school, or a for-profit education startup, etc., but whatever it is, it will have a perspective on the Big Questions that it seeks to promote, and a big part of that promotion will involve the ongoing maintenance of an LLM that consistently produces output reflecting that perspective.
This will definitely happen because the models and tools will all be open-source. The hard part of this equation isn’t the technology, but rather the assembly and curation of a network of experts who share the same values and perspectives and who can reliably train models in a way that a community finds beneficial.
In other words, the most immediate near-term challenges AI poses for most of us are fundamentally editorial in nature: challenges of curation, collection, selection, evaluation, and human judgment. And when I say “immediate,” I mean they are presently before us at scale, whether we see it or not:
The fact that all these disparate people are interfacing with the same Mecha-Niskanenzilla bred in a lab by an EA macropolycule is just a weird artifact of the present moment, and one that we’ll move past with haste in 2023 as open-source competitors trickle out.