November 2023

What on earth is Q* (Q-Star), you might ask yourself.

The leak about OpenAI's Q* learning has sent everyone into a frenzy, and it is indeed a pretty significant deal, no doubt about that.

However, the hysteria surrounding it, with lots of people claiming OpenAI has achieved AGI, is completely unfounded.

To better understand what Q* actually is and remove some of the hysteria, let's step back a little. There are three main ways of training AI models:

Supervised Learning: When you know what you want the model to learn. This is the most basic of the modern approaches to AI/machine learning. Models are trained on labeled data about a specific subject; let's say cats. By showing it pictures of what is and isn't a cat, the model learns to recognize cats. This is great for labeling things, but the model is only as good as the data it has been trained on.

Unsupervised Learning: When you don't know what you want the model to learn. This works on unlabeled data and is used for finding patterns without knowing in advance what those patterns might be. It can be used for things like clustering, association, and dimensionality reduction.

Reinforcement Learning: This is the type of learning where you are not looking for the model's ability to recognize patterns but for its ability to make decisions. You do so by giving it a reward/penalty system, and through that it learns to optimize for what it gets rewarded for. This is part of how ChatGPT is trained, by the way (reinforcement learning from human feedback), and why it behaves quite differently from the underlying GPT-4 model.

Q* learning is basically the last of the three, but on steroids.

To illustrate how it's different, let's imagine a maze.

With classic reinforcement learning, the AI agent receives a reward when it reaches the end of the maze and possibly negative feedback for hitting walls. It learns through trial and error, which means that every time the maze changes, the AI agent has to learn the new maze through trial and error again.

With Q*-based reinforcement learning, on the other hand, the AI agent not only learns from rewards but also estimates the best possible reward it can achieve from each point in the maze, considering future steps (Q*). In other words, Q* guides the agent to make decisions that give it long-term rewards, not just immediate ones. To simplify it even more, Q* learning teaches the model something general about mazes, not just the specific maze. That means that every time the maze changes, the agent won't have to relearn everything through trial and error again. It has a general concept of mazes.
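For reference, in standard reinforcement learning notation Q* usually denotes the optimal action-value function. Assuming that is the meaning intended here (OpenAI has not published details), "estimating the best possible reward from each point, considering future steps" is commonly written as the Bellman optimality equation. The symbols are the usual textbook ones and not taken from the leak: s is the current position, a is an action, r is the immediate reward, s' is the next position, and gamma is a discount factor between 0 and 1.

```latex
Q^*(s, a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^*(s', a') \,\right]
```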

And so Q* leads to a more strategic approach to learning, enabling the AI to quickly identify efficient routes even in new or altered maze configurations.
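To make the maze idea concrete, here is a minimal sketch of textbook tabular Q-learning, the classic algorithm in which Q* names the value being estimated. It is not OpenAI's method, which has not been published. The maze layout, the reward values, and the names MAZE, ACTIONS, step, train, and greedy_path are all made up for illustration.

```python
import random

# Hypothetical 4x4 maze: 'S' start, 'G' goal, '#' wall, '.' free cell.
MAZE = [
    "S..#",
    ".#..",
    "...#",
    "#..G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply an action; bumping into a wall or the edge leaves the agent in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    in_bounds = 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0])
    if not in_bounds or MAZE[nr][nc] == "#":
        return state, -1.0, False    # assumed penalty for hitting a wall
    if MAZE[nr][nc] == "G":
        return (nr, nc), 10.0, True  # assumed reward for reaching the exit
    return (nr, nc), -0.1, False     # assumed small cost per move


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of estimated long-term rewards by trial and error."""
    rng = random.Random(seed)
    q = {}  # maps (state, action) -> estimated long-term reward
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(4)  # explore: random move
            else:                          # exploit: best known move
                action = max(range(4), key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            # Best estimated value reachable from the next cell (0 at the exit).
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in range(4))
            old = q.get((state, action), 0.0)
            # Q-learning update: nudge the estimate toward reward + discounted future value.
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-valued action from the start and record the cells visited."""
    state, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q.get((state, a), 0.0))
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    print(greedy_path(train()))
```

Note that a plain table like this is tied to the one maze it was trained on. The kind of generalization to new mazes described above would need something more, such as a learned function in place of the table, which this sketch does not attempt.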

Just to name a few industries that greatly benefit from this: Robotics, Autonomous Vehicles, Healthcare, Finance, Supply Chain and Logistics, Energy Management, and Education and Training Simulations.

So of course this is a big deal to OpenAI, but it's NOT AGI, even though if you squint you can see how some could confuse it for that.