The chief technology officer of a robotics startup told me earlier this year, “We thought we’d have to do a lot of work to build ‘ChatGPT for robotics.’ Instead, it turns out that, in a lot of cases, ChatGPT is ChatGPT for robotics.”
Until recently, AI models were specialized tools. Using AI in a particular area, like robotics, meant spending time and money creating AI models specifically and only for that area. For example, Google DeepMind’s AlphaFold, an AI model for predicting protein folding, was trained using protein structure data and is only useful for working with protein structures.
So this CTO thought that to benefit from generative AI, the robotics company would need to create its own specialized generative AI models for robotics. Instead, the team discovered that in many cases, they could use off-the-shelf ChatGPT to control their robots, even though the AI had never been specifically trained for the task.
I’ve heard similar things from technologists working on everything from health insurance to semiconductor design. To create ChatGPT, a chatbot that lets humans use generative AI by simply having a conversation, OpenAI needed to modify large language models (LLMs) like GPT-3 to be more responsive to human interaction.
But perhaps inadvertently, these same changes let the successors to GPT-3, like GPT-3.5 and GPT-4, be used as powerful, general-purpose information-processing tools—tools that aren’t dependent on the knowledge the AI model was originally trained on or the applications the model was trained for. This requires using the AI models in a completely different way—programming instead of chatting, new data instead of training. But it’s opening the way for AI to become general purpose rather than specialized, more of an “anything tool.”
How did we get here?
Fundamentals: Probability, gradient descent, and fine-tuning
Let’s take a moment to touch on how the LLMs that power generative AI work and how they’re trained.
LLMs like GPT-4 are probabilistic; they take an input and predict the probability of words and phrases relating to that input. They then generate an output that is most likely to be appropriate given the input. It’s like a very sophisticated auto-complete: Take some text, and give me what comes next. Fundamentally, it means that generative AI doesn’t live in a context of “right and wrong” but rather “more and less likely.”
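To make the auto-complete idea concrete, here is a toy sketch of next-word prediction. The probability table is entirely made up for illustration; a real LLM computes these probabilities with a neural network over its whole vocabulary.

```python
# Toy next-word prediction: given a context, look up a (made-up) table
# of continuation probabilities and emit the most likely next word.
probs = {
    "the cat sat on the": {"mat": 0.62, "floor": 0.21, "roof": 0.09, "moon": 0.01},
}

def complete(context: str) -> str:
    # "More likely", not "right": we simply pick the top-probability word.
    candidates = probs[context]
    return max(candidates, key=candidates.get)

print(complete("the cat sat on the"))  # -> mat
```

A real model would also *sample* from this distribution rather than always taking the top word, which is one source of the variability in generative AI’s outputs.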
Being probabilistic has strengths and weaknesses. The weaknesses are well-known: Generative AI can be unpredictable and inexact, prone to not just producing bad output but producing it in ways you’d never expect. But it also means the AI can be unpredictably powerful and flexible in ways that traditional, rule-based systems can’t be. We just need to shape that randomness in a useful way.
Here’s an analogy. Before quantum mechanics, physicists thought the universe worked in predictable, deterministic ways. The randomness of the quantum world came as a shock at first, but we learned to embrace quantum weirdness and then use it practically. Quantum tunneling is fundamentally stochastic, but it can be guided so that particles jump in predictable patterns. This is what led to semiconductors and the chips powering the device you’re reading this article on. Don’t just accept that God plays dice with the universe—learn how to load the dice.
The same thing applies to AI. We train the neural networks that LLMs are made of using a technique called “gradient descent.” Gradient descent looks at the outputs a model is producing, compares that with training data, and then calculates a “direction” to adjust the neural network’s parameters so that the outputs become “more” correct—that is, to look more like the training data the AI is given. In the case of our magic auto-complete, a more correct answer means output text that is more likely to follow the input.
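The loop described above can be sketched in a few lines. This is a deliberately minimal example, fitting a single invented parameter with mean squared error rather than training a real neural network, but the produce-compare-adjust cycle is the same.

```python
# Minimal gradient descent: nudge one parameter w so the tiny model
# y = w * x better matches the training data (the true w here is 2.0).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs

w = 0.0    # start from a "gibberish" parameter
lr = 0.01  # learning rate: how far to step each iteration

for _ in range(1000):
    # Gradient of mean squared error with respect to w: the "direction"
    # in which adjusting w makes the outputs more correct.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # small, incremental improvement

print(round(w, 3))  # -> 2.0
```

Real training does this over billions of parameters at once, but each step is the same idea: compare outputs with training data, compute a direction, and adjust.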
Probabilistic math is a great way for computers to deal with words; computing how likely some words are to follow other words is just counting, and “how many” is a lot easier for a computer to work with than “more right or more wrong.” Produce output, compare with the training data, and adjust. Rinse and repeat, making many small, incremental improvements, and eventually you’ll turn a neural network that spits out gibberish into something that produces coherent sentences. And this technique can also be adapted to pictures, DNA sequences, and more.
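The counting idea can be shown directly. The sketch below builds a crude “auto-complete” by tallying, over a tiny invented corpus, which word most often follows each word; this is a bigram model, far simpler than an LLM, but it shows how “how many” turns into “how likely.”

```python
from collections import Counter, defaultdict

# Counting-based auto-complete: tally which word follows which in a
# tiny corpus, then predict the most frequent follower.
corpus = "the cat sat on the mat and the cat slept".split()

follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1  # "how many" -- just counting

def predict(word: str) -> str:
    # The most common follower is the "most likely" next word.
    return follows[word].most_common(1)[0][0]

print(predict("the"))  # "cat" follows "the" twice, "mat" once -> cat
```

Dividing each count by the total would turn these tallies into probabilities, which is the same quantity an LLM estimates, just with a vastly more powerful machine.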