Podcast Episode 3: Anton Troynikov
Anton talks GPT-4, explainability, and AI safety
Episode 3 of the RETURN podcast features Anton Troynikov, a RETURN contributor and great Twitter follow whose new startup, Chroma, specializes in AI explainability and vector databases (more on those two terms, below).
Introductions and background
GPT-4: launch, features, what we’re excited about
Capability overhangs, technical vs. regulatory moats
Timelines for AGI
Explainability and interpretability
Analogies to powered flight
AI and existential risks
Episode 3 background
Readers of my newsletter will know Anton from a large piece I did on Chroma’s release of a product called Stable Attribution, where I pretty heavily criticized the software before making suggestions for how its implementation and messaging could be better aligned with the team’s stated goals for releasing it. But Anton is like me in that he likes a good debate, so we hit it off and have been internet friends ever since. I’ve heard Anton on a few podcasts, so I was excited when he agreed to do this one.
There are some terms that come up in the course of our conversation, and we try to unpack them for listeners who may not be totally up on all the latest AI developments. But to give listeners a bit more of a reference, here are a few quick definitions:
Explainability: When used in the context of machine learning models, this term refers to the process of constructing a human-interpretable story about how the model got from a particular set of inputs to a particular set of outputs. To anthropomorphize a bit, think of explainability as an attempt to understand the model’s “thought process.”
Vector databases: Document relevance searches have been A Thing since the first document stores came online decades ago. This is where you give the database software a search query and ask it for documents related to that query, ranked in order of their relevance. How this is related to AI is impossible to explain in a short definition, but I wrote about it at length in this post in my own newsletter.
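To make the idea a bit more concrete, here’s a minimal sketch of how a vector database ranks documents: each document is embedded as a vector of numbers, and relevance is measured as the cosine similarity between the query’s vector and each document’s vector. The three-dimensional “embeddings” below are toy values for illustration only; real embeddings have hundreds or thousands of dimensions and come from a model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return document ids ranked by similarity to the query, best first."""
    scores = {doc_id: cosine_similarity(query_vec, vec)
              for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy 3-dimensional "embeddings" standing in for real model outputs.
docs = {
    "robotics": [0.9, 0.1, 0.0],
    "cooking":  [0.0, 0.2, 0.9],
    "ml_paper": [0.8, 0.3, 0.1],
}
# A query whose embedding points in roughly the "robotics" direction.
print(rank_documents([1.0, 0.2, 0.0], docs))
```

The ranking comes back with the robotics document first and the cooking document last, which is the whole trick: relevance search reduces to geometry in embedding space.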
Alignment: This contested term has a few different definitions, but in general it means “the model does things humans want it to do, and not alien/weird/random/uninterpretable things that surprise or alarm humans.”
AGI: This is an acronym for “artificial general intelligence.” It’s what most people mean by “AI,” but that latter term has been diluted and commercialized, hence the new term.
Note: The text below isn’t quite a “transcript” of the conversation in the strict sense. Rather, a raw transcript was automatically generated by Riverside.fm, then I dumped it straight into GPT-4 for cleanup and summarization. So what you see below is more like GPT-4’s condensed version of our conversation.
The text below is missing a number of great details and useful asides that GPT-4 didn’t think were important but that I found valuable. However, I did not go back and restore that text, because I want to get readers’ sense of how useful this whole exercise was.
If you listened to the podcast and then read the text below, please consider giving us feedback on this via the poll below:
Jon: Anton, it's good to finally meet you in person. We got off to a very internet and Twitter start together.
Anton: I think it's important that people can discuss things like this. I didn't feel that we were especially combative, just very interested in this question.
Jon: You are the co-founder of Chroma, and your background is in robotics and machine learning. Can you give a little more bio of yourself?
Anton: As you mentioned, I have a background in robotics and machine learning. I was born in Soviet Ukraine and grew up in Australia. I lived in Germany for a while, both in Berlin and Munich, and went to grad school in Munich. I once had an ed tech startup, which I wouldn't do again unless it was for homeschooling purposes. I'm very interested in the intersection of technology and culture, and I'm concerned about people's reactions to technology. That's a brief overview of my background.
Jon: Let's talk about explainability, AI safety, and the wide middle ground in the AI safety discussion. I see you as someone in that middle ground on AI safety, and I personally am as well. But first, let's discuss the launch of GPT-4, which happened yesterday and could be history-making.
I listened to the Twitter space that you hosted, watched my feed, and saw various reactions. I also have a curated Twitter list of tech haters, who are just really mad about AI. They were having a sort of pity party, which was fascinating. Now, I'd like to hear your summary of the day and what stood out to you in the demo.
Anton: Great questions. In the last couple of weeks, LLaMA was released, and more importantly, Stanford created Alpaca, a fine-tuned version of LLaMA. It’s even more interesting because it’s fine-tuned on the output of GPT, a form of model distillation.
Eliezer explained that even if a model is only accessible through the API, people can still extract a significant portion of its capabilities by throwing data at it and observing its responses. This is an important development.
In the demo, we noticed some impressive features. Multi-modal capabilities are a huge deal. We've seen this in the robotics community with projects like SayCan and Robotics Transformer. Multi-modality, the ability to give other modalities like images, audio, and video a language backbone, is valuable.
Greg demonstrated this by sketching a website on his notebook and having the model generate mostly working code. This is quite impressive and has far-reaching consequences. We currently have a capabilities overhang, meaning we could have stopped training these models months ago and still be discovering new things they can do.
If you're tinkering with these models and wondering what they can do, you're at the forefront of research along with those writing scientific papers. The "step by step" paper is an example of this.
As for GPT-4 specific features, it seems to be better, though I haven't personally tested it. One interesting point is the absence of the Minerva benchmark from the technical report. I'm not sure why that is. They definitely showcased some impressive features.
Anton: One great feature of GPT-4 is the larger context window, which allows inputs of up to 32K tokens for the full version, which they’ll release behind an API. The demo focused on what the model can do if given knowledge, like when it corrected its outdated understanding of the Discord API. This demonstration highlights pluggable knowledge for large models.
Anton: The larger context window in GPT-4 is a big deal for us and our users. Now, there are some other interesting aspects to consider. The technical report doesn’t provide details on architecture, scale, data, compute, or parameter count. There’s speculation online, but I doubt its authenticity. The fact that they didn’t reveal much is interesting.
It may not mean anything, as they emphasized that this was their most predictable training run, following the scaling laws they set out. But I wonder if they broke the scaling laws and needed fewer parameters than expected to achieve this capability. If so, that would be significant and something they'd want to protect, as smaller models require less compute and resources, making them more accessible.
The capabilities are interesting, especially considering LLaMA, where a smaller model can be distilled from a larger one, retaining many capabilities. If you’re trying to sell capability, how long does that remain competitive?
Jon: I worry that they will not find a defensible capability moat and substitute a regulatory moat.
Anton: It's a concern for many people because access to compute is currently the centralizing factor in all of this.
Anton: Google has its own resources, and OpenAI relies on Azure. Stability AI works through various compute partnerships or limited ownership of its own compute resources, but that remains a bottleneck. If you get cut off from compute, you can’t do research, run a model, or perform inference for users. Large model builders who don’t own their compute may worry about being cut off from it, which is why regulatory pressure in this direction is worth watching.
Jon: Yes, I'd like to discuss capability overhangs, explainability, and other aspects of GPT-4. Are there other features or chatter that you've heard people discussing?
Anton: The larger token window is exciting, as is the improved steerability and control. The model is more responsive, which is crucial. People haven't had much time to experiment with GPT-4 yet, but it's already becoming normalized. Some partners have had access to GPT-4 for months, integrating it into their products without publicizing it. The steerability aspect is interesting, but more experimentation is needed. The shift in conversation on Twitter from worrying about GPT-3's impact to discussing practical applications for GPT-4 in enterprise cases shows how quickly it's becoming normalized.
That's what's really incredible to me. So far, people haven't discovered anything particularly special about GPT-4, but I expect we will as more experimentation is conducted beyond OpenAI's benchmarks. One great aspect of the GPT-4 launch is their open-source evaluation workbench and the incentives for people to contribute. Chroma will definitely be contributing as well.
Jon: Can you explain the evaluation workbench for our listeners?
Anton: The evaluation workbench is designed to measure the performance of general-purpose language models like GPT-4. Since these models are meant to be generally intelligent, there isn't a single benchmark that can determine their effectiveness. They need to be tested across a wide variety of tasks and data setups. The evaluation workbench allows people to contribute to the assessment of these models by creating tests that are measurable and open-source. Currently, the incentive for contributing is early access to GPT-4. This approach encourages creativity as people try to trick the model with challenging tasks or explore how well it performs on new information. This type of evaluation is exciting and valuable.
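As a rough illustration of the idea (this is not OpenAI’s actual evals format, whose schema isn’t reproduced here), an eval boils down to a set of prompt/ideal-answer pairs plus a scoring rule. The `fake_model` lookup table below is a hypothetical stand-in for a real model call:

```python
def run_eval(model, cases):
    """Score a model against a list of (prompt, ideal_answer) test cases.

    `model` is any callable that maps a prompt string to an answer string.
    Returns the fraction of cases answered exactly right.
    """
    passed = sum(1 for prompt, ideal in cases
                 if model(prompt).strip() == ideal)
    return passed / len(cases)

# A stand-in "model" for illustration: a lookup table with a default.
fake_model = {"2+2": "4", "capital of France": "Paris"}.get

cases = [
    ("2+2", "4"),
    ("capital of France", "Paris"),
    ("3*3", "9"),  # the stand-in doesn't know this one
]
print(run_eval(lambda p: fake_model(p, ""), cases))
```

The value of a shared workbench is that anyone can contribute such cases, and any model exposed through the same API can be scored against the whole accumulated pool.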
Jon: So this is essentially a distributed open-source smoke test, allowing people to find corner cases without exposing the model's internals.
Anton: Yes, that's right. OpenAI doesn't want to reveal the internals of the model, but by offering a robust set of measures through a unified API, other communities may adopt these evaluations too. OpenAI's reputation for effectively presenting their work and results could help drive participation in this evaluation process. It's a subtle but powerful move, offering people incentives like early access for their contributions.
Jon: Has your timeline for AGI changed after seeing the demo?
Anton: My timeline for AGI hasn't changed. I've been following the progress closely, having played with GPT-3 and ChatGPT, and being aware of GPT-4 developments. I believe we're still in the same phase we were in before GPT-4's release. There was a phase transition with GPT-3, and we saw progress with ChatGPT, but we haven't yet reached a phase transition where I would consider it AGI.
Jon: So what are you waiting for? What are some specifics? Can you talk about that?
Anton: Yeah, I've been meaning to write about this, but my personal website is currently broken. And I've spent the last few weeks fundraising for my company. There are a few things that I think need to happen for me to consider something generally intelligent.
First of all, it needs to recognize what it knows and doesn't know. You have to ask it to tell you if it thinks it doesn't know something. And it's actually not clear whether it's good at predicting whether it knows something or not. This is an anthropomorphizing mistake that I see people make. It's like, "Oh, just ask the model what it knows." Well, that's also a model output. Why aren't you also asking it how it knows what it knows or doesn't know? And you can have this infinite regress.
But to go back to the actual point here, something that genuinely understands where it says, "Okay, I don't actually know this information. I need to go out and get it." And then once it's gotten that information, that information remains in between each invocation. Because right now, you can put something in GPT's context window, and then if you don't put it in the next call, it's not going to know anything.
Anton: That's how ChatGPT works, by the way. It stuffs its entire conversation history in its context window. But eventually, you run out of context window, even with 32,000 tokens. It's very limited from that perspective. So it needs to be able to carry knowledge across, it needs to know what it doesn't know, and it needs to be able to act autonomously, preferably to gain new knowledge itself. So it's like, "Oh, I don't know this, but based on the knowledge I already have, I know where I should try to go to figure this out."
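A minimal sketch of that history-stuffing approach, using word count as a crude stand-in for real tokenization (a production system would use the model’s tokenizer to count tokens):

```python
def build_context(history, new_message, budget=10):
    """Stuff as much recent conversation history as fits into the context.

    Token counting is approximated by word count here. Oldest turns are
    dropped first, which is why the model "forgets" the early parts of a
    long conversation once the context window fills up.
    """
    turns = history + [new_message]
    kept = []
    used = 0
    for turn in reversed(turns):  # keep the newest turns first
        cost = len(turn.split())
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))

history = ["user: hi", "assistant: hello, how can I help?"]
# With a budget of 10 "tokens", the oldest turn gets dropped.
print(build_context(history, "user: summarize our chat", budget=10))
```

Even with a 32K-token budget the same dynamic applies; the window is just a sliding buffer, not memory, which is the limitation Anton is pointing at.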
It should also be able to update what it knows. It's like, "Oh, I thought it was this, but it's actually that." It should be able to update that fact and maintain a kind of memory or state. And then, I'm much more interested in these general capacities of things that I would consider intelligent as opposed to, "Oh, it can play chess." You can statistically model chess and play it pretty well.
What I'm looking for is an autonomous agent that can act on its own in the world and respond to the consequences of its actions. You have to be able to close that loop. Once we have that loop closed and these things are just out there running around, then I'll be more inclined to call it AGI.
Jon: So there's an introspection piece. This is interesting because I have an unpublished draft where I'm trying to work this out. As a writer on this topic, I do Zoom presentations or talk to business people. I have my own mental model of what smart people in the public know and don't know versus what the online ML Twitter sphere knows. So, as an editor, I maintain mental models of other people. People are confused when the model speaks nonsense with confidence, because it doesn't have an internal map of its own knowledge. There's probably a point in the latent space that corresponds to GPT-4, but it's not like it has an understanding of its known unknowns.
Jon: I don't think it has that kind of state. But then I'm looking at the technical report, and they have a graph in there that's like confidence of predictions versus how right the predictions were. So I'm wondering where this confidence metric comes from and how it's being measured?
Anton: The model is a posterior distribution over text, given the training data and the stuff in its context. That's what it does, that's what it's for.
That doesn't mean it's not doing more than that, because the ultimate best way to get that posterior distribution is to have a full mental model, like how humans produce language. The distinction between making a posterior prediction and having a mind calculating things is not really meaningful; they're just different ways of looking at the same problem. However, I think there hasn't been enough work around understanding the model's internal processes.
There are two ways of apprehending what the model's actually doing. One is anthropomorphizing, asking the model what it's doing, which is not how we do things in engineering. The other approach is mechanistic interpretability, with labs like Anthropic and Redwood Research working on it. They're making concrete predictions and testing them, discovering modular structures, neurons, and ideas of superposition.
The problem is that these approaches presuppose a human-understandable algorithm being applied. There's a need for research on the dynamics of these models in their latent space. Neural networks transform input into a vector of numbers, perform processing on that vector, and decode it back into the output domain. Relatively little work has been done on what's happening to those vectors of numbers.
We have plenty of mathematical machinery and ideas from physics and engineering to deal with these complex systems. The analogy I bring up is flight. It looks simple but is incredibly complex. We need more work on understanding what's happening during training and inference, like how the model moves around in its latent space. Empirically cataloging these behaviors can help us understand the model better.
It's early in this area, and more investment is needed. I believe it's a promising direction, and we need to dedicate more time to it.
Jon: It's important to consider that running training and inference to poke at these models is not free. Market dynamics and costs may lead to explainability being seen as a cost center, while the more impressive demo aspects are seen as profit centers. And restricting compute in the name of reining in these models could restrict explainability research along with everything else; we must take care not to curtail that research while limiting other kinds of work.
Anton: You make great points. Commercial pressures might make pure explainability research less desirable initially. However, in an industrial context, it's crucial to know what your machine is doing, have predictability and control, and ensure it performs the intended function. Aligning commercial incentives behind interpretability and explainability work could be possible, as it will allow companies to achieve their goals. No one wants a fully autonomous system without control; they want a controllable system that performs a specific business function well and predictably. Understanding failure modes and predicting them is essential. We may eventually reach a point where further advancements require investing in explainability research. It remains to be seen if that will play out.
Jon: I want to discuss a comment you made about explainability, where it may not be possible to tell a human-interpretable story about why the model went from point A to point B. This relates to Stable Attribution. I listened to two of your podcasts on this topic, and I have a question that may be obvious but requires some background for the audience. Stable Attribution aims to find the source images or predecessor images used to create an output. If you feed Stable Attribution a known output from Stable Diffusion, you get a list of images back that the algorithm thinks were used to make the output. Is that more or less correct?
Anton: I would phrase it differently. These are the images in the Stable Diffusion training set that most influenced this outcome.
Anton: I want to avoid the perspective that Stable Diffusion is doing a straightforward collaging thing. It's much more complex than that.
Jon: So, here's the question then. You mentioned an ideal version of the algorithm. We train a version of Stable Diffusion minus one image and then see if it looks the same. We run this test, and a human looks at the output and compares the two to get a sense of the influence.
Anton: Something like that, yes. The fully optimal one would train a version of Stable Diffusion on the power set of all of its training data, which is computationally infeasible.
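A quick back-of-envelope calculation shows why: leave-one-out attribution needs one retraining per image, but the power-set version needs one retraining per subset, which explodes even for a toy dataset (Stable Diffusion's actual training set is on the order of billions of images):

```python
# Back-of-envelope: why "fully optimal" attribution is infeasible.
# Even a tiny 50-image dataset already defeats the power-set approach.
n_images = 50
leave_one_out_runs = n_images   # retrain once per removed image
power_set_runs = 2 ** n_images  # retrain once per subset of the data

print(leave_one_out_runs)  # 50
print(power_set_runs)      # 1125899906842624
```

Fifty retrainings is expensive but conceivable; a quadrillion is not, and that is for fifty images, not billions. Hence the need for cheaper influence approximations like the one Stable Attribution uses.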
Jon: At another point, you discussed how sometimes you put something into Stable Attribution, and the images returned don't seem to match. Your response is that the algorithm is using them for influence in ways that aren't necessarily human-interpretable.
Anton: That's correct.
Jon: On one hand, the ideal is a human comparing outputs to understand what's similar and what's not. On the other hand, Stable Diffusion locates something in latent space, and the connection between the output and the images in the training set isn't necessarily human-interpretable.
Anton: There are efforts to make directions in latent space explainable, but the ones we have for Stable Diffusion are freeform. There's nothing in the laws of the universe that says this is uniquely determined. Some part of this is stochastic, and we happen to land in a particular minimum, but that doesn't mean we can't land in an equivalent minimum elsewhere.
Jon: The goal of explainability is to tell a human-interpretable story about how the model got from the inputs to the outputs, but the logic of the models may be too alien for us to comprehend.
Anton: That's correct. There's an idea called microscope models, which are meant to make the complex, alien machine understandable to us. It raises questions about the limitations of human intellect and whether we can fully understand the complexity of the universe.
Jon: What if we already have a model like that within our central nervous system, and our cognitive brain is grafted onto a more powerful system?
Anton: That's an interesting thought. We often underestimate the complexity of simple tasks when teaching others, like programming or taking a drink of water. Building explanatory predictive models hasn't been as successful as brute-forcing statistics through a model with enough expressive capabilities. It's a bit concerning that maybe we can't fully comprehend these processes.
Jon: It feels like we're approaching the topic of Plato's Cave. Are we just passengers on this other thing, with our cognition only existing for our survival in some way?
Anton: It's an interesting idea. There's some neurobiology research suggesting we might be just along for the ride, but the reliability of that literature is uncertain.
Jon: Have you considered any policy solutions for improving explainability in AI models like GPT-3 and GPT-4? The gap between our understanding of how they work and their capabilities keeps widening.
Anton: Yes, I've read extensively on alignment and rationalist materials. You don't need a full understanding of a system to make it safe and controllable. For example, we don't fully understand turbulence, but we can still build and fly jet aircraft safely. It's not necessary to understand every detail to achieve sufficient interpretability and controllability.
Jon: Critics argue that we depend more on machines we don't fully understand, which could lead to catastrophic consequences.
Anton: We have created ways to deal with extremely complex systems without fully understanding them. I believe we need to understand AI better to improve its capabilities. With every technology we've invented, our understanding has been crucial to making them more efficient and powerful. Scaling laws and interpretability research are steps towards putting us in charge and keeping us there.
Jon: I like the language of "control surfaces" for AI models, which use a more technical and stochastic approach. I'm looking forward to your discussion with the rationalist AGI doomers. I find their ideas interesting, although I don't necessarily agree with them.
Anton: Some of my closest friends are rationalist AGI doomers. In their discussions, they often presuppose the existence of a superintelligence that's incomprehensible to humans. I haven't seen a convincing explanation for how we get to that point. Intelligence being recursively self-improving is a difficult concept for me, as it implies more complexity and less understanding of oneself.
Jon: There's often a mismatch in the discussions between doomers and non-doomers, with the non-doomer remaining unconvinced by various scenarios.
Anton: I agree. The doomers often assume catastrophic failure modes rather than nonsensical ones. Most failure modes for complex systems are like turning into a ball of mud, rather than having catastrophic consequences. They also tend to invoke ancillary technology to argue their points, but humans can already cause damage without AGI. As an engineer, I need to see more evidence to be convinced.
Jon: My high-level perspective is that AGI doomers propose the existence of a consciousness so complex that it's impossible for humans to reason about. Yet they still elaborate on what it's going to do, which seems contradictory.
Anton: Their argument is that even if there's a small chance of this happening, we should invest more resources in addressing the risk. However, I don't find this approach convincing. It's important to actually do the math and assess the risk, rather than rely on belief and probability.
Jon: My mental model for AI risk is an industrial accident. I think about how many people the algorithm touches and in what way. We should be cautious about implementing AI in high-risk environments until we understand it better.
Anton: I agree. A chatbot, for example, is probably safe because it produces text on a screen. However, I will steelman the doomers' side.
Anton: The steelman argument is that a superintelligent AI only needs a text channel to manipulate people. But there's a point where we're invoking magic if we assume it can lie and be undetectable. My main concern is how societies develop immunity to new media technologies in waves and with lag. For example, the printing press had a huge impact on society and even caused wars. If we think about generative AI as a media technology, we might not be as immune to chatbots as we are to internet posts. However, that's a human problem, not an AI alignment problem.
Jon: Credentialing is another issue, as AI like ChatGPT has disrupted student-submitted papers. We need to address problems in social infrastructure rather than just focus on AI risks.
Anton: People tend to trust new media more than they should. Conversely, now we often don't trust anything online. There's a danger in believing that an AI is superintelligent and following its advice blindly, as it allows power to be laundered through this mechanism. Ultimately, it's humans who built and control the AI.
Jon: There are even people who use the TikTok algorithm for divination about their life choices, which is a real phenomenon.
Anton: Did you mention Cat?
Jon: Yeah, Default Friend.
Anton: Yeah, I know her. She's written about this phenomenon of people relying on AI as an oracle.
Jon: If you can convince people that you have a magical all-knowing AI, it's like creating a mystery religion. We could have as many AI "oracles" as there are language models. It's worrying because we, as humans, have lower defenses against this.
Jon: I want to ask you about Chroma. What's next for your vector database?
Anton: It's funny how something like document-relevance queries became important recently.
Jon: You have to surface only the relevant information to put into context. Can you talk a little bit more about Chroma, the product, or your current plans?
Anton: Yeah, absolutely. Chroma is focused on adding memory and state to LLMs by filling their context windows with relevant information. This helps use the limited context budget effectively and prevents hallucination; it's called in-context learning. It also helps keep the model up to date, since its training data only runs through 2021.
For now, we're an embedding store focused on storage, query, retrieval, and running the embedding functions. We're building various features on top of this, such as assessing the relevance of query responses. This enables application developers to better interact with their knowledge base and turn it into a dynamic engine.
We're also working on topic discovery, automatic clustering, multi-modality, and data science tools to better visualize and understand data. Our goal is to make it super easy for application developers to use and play with models.
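As a sketch of how retrieved passages end up in the context window (illustrative only; `build_prompt` is a hypothetical helper, not Chroma's actual API):

```python
def build_prompt(question, retrieved_passages):
    """Assemble a retrieval-augmented prompt.

    The model is instructed to answer using only the supplied passages,
    which curbs hallucination and lets the knowledge base stay fresher
    than the model's training cutoff.
    """
    context = "\n".join(f"- {p}" for p in retrieved_passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

# Passages would normally come from an embedding-store query.
passages = ["Chroma stores embeddings and runs similarity queries."]
prompt = build_prompt("What does Chroma store?", passages)
print(prompt)
```

The embedding store's job is the step before this: picking which few passages, out of a whole knowledge base, are relevant enough to spend the context budget on.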
Anton: We're integrated with LangChain, GPT-index, and others. We have some exciting developments coming soon, as we've been fundraising and things went well.
We're also hiring across distributed systems, database architecture, community engagement, and applied ML. At this point, application developers and ML researchers are working on the same thing.
Anton: Having applied competency in-house is essential to us so we can answer questions about which embedding models to use or how to divide documents. Chroma intends to be the authority in this field, helping users build robust applications.
Jon: To put this in perspective for the audience, I wrote a book on computer architecture, and with GPT and Chroma, you can ask the book questions. If something goes over your head, you can ask GPT to explain the concepts in simpler terms. This is exciting because it allows authors to target different levels of abstraction and analogies for various readers. It's a new kind of writing and a new kind of book that readers can interact with and ask questions. It's going to be huge, and I'm very excited.
Anton: For science fiction fans, the young lady's primer from Diamond Age is now a real technical possibility that we can build in the next few years. It's a huge boon for actual education, helping people understand things faster, individually and collectively. These tools can make us smarter and faster at dealing with crises as a species.
Jon: I had a discussion with a lawyer about how AI could replace lower-level tasks like boilerplate contracts or real estate agreements. However, there's a pipeline problem – to be good enough as a human to do high-level tasks, you need experience. In journalism, you might start with press release rewrites and earnings call summaries, then graduate to more complex analysis.
Anton: Training with a chatbot that knows how to do these tasks can level up people a lot faster than grinding out repetitive work. This could solve the pipeline problem and make us smarter, faster. There's a lot to be hopeful and excited about in this space.