The Future Belongs To Four-Year-Olds
AI is about to make top-tier forecasts available to all. What comes next?
Last October, I gave a talk on AI and forecasting with the intention of writing it up here. But intentions, like forecasts, often fail. I never got around to it.
This week, The Atlantic published a good article covering much of the background on AI and forecasting, so in this piece I’ll summarize that and go from there.
What’s happening is fascinating. But the implications? I predict these strange days are about to get a lot stranger.
From The Atlantic:
For years, some elite forecasters have been competing against one another in tournaments where they answer questions about events that will happen—or not—in the coming months or years. The questions span diverse subject matter because they’re meant to measure general forecasting ability, not narrow expertise. Players may be asked whether a coup will occur in an unstable country, or to project the future deforestation rate in some part of the Amazon. They may be asked how many songs from a forthcoming Taylor Swift album will top the streaming charts. The forecaster who makes the most accurate predictions, as early as possible, can earn a cash prize and, perhaps more important, the esteem of the world’s most talented seers.
The author does not note that these tournaments were pioneered by the eminent psychologist Philip Tetlock in two programs of research: the first spanned the 1980s and 1990s; the second, the Bush and Obama years. To popularize Phil’s work, he and I published Superforecasting in 2015. I think it’s accurate, and not immodest, to say Superforecasting was the spark — or at least a spark — that created the explosion in competitive forecasting over the past ten years. It really should be mandatory to mention Phil in any article of this sort.
These tournaments have become much more popular during the recent boom of prediction markets such as Polymarket and Kalshi, where hundreds of thousands of people around the world now trade billions of dollars a month on similar sorts of forecasting questions. And now AIs are playing in them, too. At first, the bots didn’t fare too well: At the end of 2024, no AI had even managed to place 100th in one of the major competitions. But they have since vaulted up the leaderboards. AIs have already proved that they can make superhuman predictions within the bounded context of a board game, but they may soon be better than us at divining the future of our entire messy, contingent world.
That’s the key here. AI is tested relentlessly against all sorts of yardsticks, but these are mostly artificial benchmarks where the staggering volumes of information they’re trained on may tilt the scales. If AI beats a human at Jeopardy, so what? But forecasting tournaments involve questions whose answers are not yet known by anyone, anywhere. And the questions are about human affairs. Sometimes they are trivial. But more often, they are questions that matter, questions that people really care about. They are complicated. They are messy. They are damned hard to get right.
So if AI can do that as well as the best humans? It’s time to sit up and pay attention.
Three times a year, the forecasting platform Metaculus hosts a tournament that is known to have especially difficult questions. It generally attracts the more serious forecasters, Ben Shindel, a materials scientist who ranked third among participants in a recent competition, told me. Last year, at its Summer Cup, a London-based start-up called Mantic entered an AI prediction engine. Like other participants, the Mantic AI had to answer 60 questions by assigning probabilities to certain outcomes. The AI had to guess how the battle lines in Ukraine would shift. It had to pick the winner of the Tour de France and estimate Superman’s global box-office gross during its opening weekend. It had to say whether China would ban the export of a rare earth element, and predict whether a major hurricane would strike the Atlantic coast before September. It had to figure out whether Elon Musk and Donald Trump would disparage each other, in public, within a certain range of dates.
A few months later, the guesses from Mantic’s prediction engine and the other tournament participants were scored against the real-life outcomes and one another. The AI placed eighth out of more than 500 entrants, a new record for a bot. “It was an unexpected breakthrough,” according to Toby Shevlane, Mantic’s CEO. Shevlane told me that he left a cushy gig as a research scientist at Google DeepMind to co-found the company. He wanted to celebrate the AI’s triumph, but he worried that it had been the product of some lucky guesses. He and his team entered a new version of it into the Metaculus Fall Cup. That bot did even better. Not only did it finish fourth, another record, it beat a weighted average of all human-forecaster predictions. It proved itself wiser than the wisdom of a pretty wise crowd.
If the hair on the back of your neck isn’t standing up, read that again.
This is not a computer coming close to beating the world champion of chess or go. This is a computer becoming one of the best in the world at predicting human affairs.
This is big.
And I think it’s even bigger than The Atlantic article makes it out to be. While that article focuses on the purpose-built prediction engine from the start-up Mantic, researchers have been putting other AI models — including the off-the-shelf varieties that help draft your resume or suggest recipes — to the same sort of test.
Way back in July 2023 — half a century ago in AI terms — researchers with a consortium known as the Forecasting Research Institute (Phil Tetlock is chief scientist) found that AI was not as good as ordinary humans, but the gap was small. The researchers also used another benchmark, “superforecasters,” people with substantial track records of forecasting accuracy that put them in the top tier in the world (and the source of the title for our book). Superforecasters are to forecasters what Connor McDavid and the Canadian Olympic hockey team are to hockey players. And in July 2023, there was a wide gap in performance between the average human and superforecasters, and an even wider gap between AI and superforecasters.
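A quick aside on how “accuracy” is measured in this research: forecasts are probabilities, and they are graded against outcomes with a proper scoring rule. The classic yardstick in Phil’s tournaments is the Brier score (platforms like Metaculus use their own variants). Here is a minimal sketch, with made-up forecasts purely for illustration:

```python
# Brier score: mean squared error between probability forecasts and
# what actually happened (1 = it happened, 0 = it didn't).
# Lower is better: 0.0 is perfect; 0.25 is what a hedger who says
# "50%" to everything earns.

def brier_score(forecasts, outcomes):
    """forecasts: probabilities in [0, 1]; outcomes: 0 or 1."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical forecaster who leans the right way on four questions:
print(brier_score([0.9, 0.8, 0.1, 0.3], [1, 1, 0, 0]))  # ~0.04
# A hedger stuck at 50% on the same questions:
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0]))  # 0.25
```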
That hasn’t lasted. Following is a recently updated chart mapping AI’s progress. (See here for a full discussion.)
Extrapolate that line forward and AI will equal superforecasters around December 2026.
That’s this December.
When I first saw that chart, I got a little dizzy.
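If you want to see the arithmetic behind that kind of extrapolation, here is a minimal sketch. The data points below are hypothetical placeholders, not the chart’s actual numbers; the only thing being illustrated is the method, a straight-line fit solved for the crossover point:

```python
import numpy as np

# Hypothetical placeholder data: months since July 2023, and AI skill
# as a fraction of the superforecaster benchmark (1.0 = parity).
# NOT the real chart's numbers; illustration only.
months = np.array([0, 6, 12, 18, 24])
skill = np.array([0.55, 0.62, 0.69, 0.76, 0.82])

slope, intercept = np.polyfit(months, skill, 1)  # least-squares line
parity = (1.0 - intercept) / slope               # where the line hits 1.0

print(f"Parity ~{parity:.0f} months after July 2023")
# With these placeholders, parity lands around month 40, i.e. late
# 2026 -- the same kind of reading that yields "December 2026" from
# the real chart.
```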
Forecasting is not something done only in forecasting departments. Nor is it a quirky competition for nerds who suck at sports. Forecasting is fundamental to human life. Some psychologists have even dubbed humanity “homo prospectus” on the grounds that looking ahead and planning is so central to our existence.
Even at the prosaic level of dollars and cents, forecasting is staggeringly important. Someone whose forecasts are consistently a few percent better than the vast majority can rake in billions on stock markets. Or millions on prediction markets. In much of the business world, forecasting is as central to what people do and how they do it as laptops and email.
Imagine the value of having hundreds of provably elite forecasters at your beck and call, 24 hours a day, seven days a week. It seems that every person with an Internet connection will soon have exactly that.
By the way, I asked Gemini (Google’s AI) when AI would get better at forecasting than even the best humans. Its response: “I predict that AI will consistently outperform the world’s best human forecasters by June 2027.”
Are you dizzy? I’m dizzy.
Now, I don’t want to get carried away. “We expect AI will excel in certain categories of questions, like monthly inflation rates,” Warren Hatch told The Guardian last year. Hatch is the chief executive of Good Judgment Inc., Phil’s forecasting company. “For categories with sparse data that require more judgment, humans retain the edge. The main point for us is that the answer isn’t human or AI, but instead human and AI to get the best forecast possible as quickly as possible.”
I suspect that’s right. But I may be biased: Way back in 2015 — a century and a half ago in AI terms — Phil and I concluded Superforecasting with pretty much exactly that expectation.
But still. Even if the Connor McDavids of forecasting are not at risk of unemployment, the potential for change is enormous. Just try to wrap your head around a world in which every executive, investor, politician, entrepreneur, advisor, journalist, commentator — everyone on the whole damned planet — can quickly get top-tier forecasts on almost any question imaginable at close to zero cost.
I asked Gemini what it made of the preceding paragraph. It responded: “That statement hits on a fascinating (and slightly terrifying) paradox of the AI age: the democratization of elite intelligence. Comparing top-tier forecasters to Connor McDavid—the NHL’s gold standard—is a great way to frame it. Even if the ‘superstars’ remain relevant for their intuition and edge-case handling, the ‘floor’ of global competence is about to be raised to an unprecedented height.”
Not bad. (But stop sucking up, Gemini.) The AI then went on to provide a lengthy analysis of the ramifications of this shift — i.e., it made predictions — which I will not share. Because they were good. And one touched on the main point I want to get to below.
Fuck. Wow. In equal measure.
I’m about to start hyperventilating. So let’s step back and note three crucial caveats.
First, superforecasters are not omniscient. As they would be the first to tell you, their proven ability to reliably forecast with better-than-a-flipped-coin accuracy extends perhaps a year or two into the future. Reality just won’t permit more than that: The further into the future you attempt to peer, the greater the complexity, the harder it gets to see anything accurately, until you reach the “forecasting horizon” (a distance that varies from subject to subject). Look beyond that and you may as well flip that coin. With superior compute, AI may be able to push that horizon out somewhat, but reality is always going to win this fight; our desire to forecast the future will forever dwarf our ability to forecast. (Yes, that’s a forecast. If I’m wrong, look me up at the end of time and gloat.)
Second, the extent to which superforecasters can anticipate black swan events — low probability, high impact — is not known, for the simple reason that such events happen far too rarely to generate the numbers needed to determine it. Similarly, I see no reason to think AI can save us from black swans, at least not simply by forecasting such events.
Third, there is the big problem of reflexivity.
Superforecasters make forecasts about events they have no control over, so they act as detached observers whose forecasts have no practical effect in the world. But that’s often not true of people and forecasts. If you are the only person in the world who expects the stock market to crash on Friday, you can sell everything on Thursday, buy after the crash, and clean up when your forecast proves accurate. But if anybody who asks AI can get the same forecast, what happens? They are all going to try to do the same. Which will cause the market to crash not on Friday but Thursday or Wednesday. Ah! But you are clever! You recognize this problem, so you sell on Tuesday. Whoops! There are millions of clever people in the world and they all spot the problem and do the same, so the market will crash on Monday. And so on.
Reflexivity is common, so a world in which everyone has access to superb forecasting may not be as predictable as we might assume. Instead, the problem will shift from forecasting per se to wrangling reflexivity, which gets gnarly fast.
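The unraveling in that crash example can even be written down as a toy loop: everyone gets the same forecast, everyone tries to sell one day ahead of everyone else, and each round of cleverness pulls the crash earlier. A sketch, not a market model; the day numbers are arbitrary:

```python
def crash_day(forecast_day: int, rounds_of_reasoning: int) -> int:
    """Each round, traders sell one day ahead of the expected crash,
    which moves the crash itself one day earlier."""
    day = forecast_day
    for _ in range(rounds_of_reasoning):
        if day == 0:      # can't act earlier than right now
            break
        day -= 1
    return day

print(crash_day(forecast_day=5, rounds_of_reasoning=2))    # day 3
print(crash_day(forecast_day=5, rounds_of_reasoning=100))  # day 0: now
# The better known the forecast, the less it describes what happens.
```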
Those are three big asterisks.
But still. Everyone having an elite-tier forecasting department on call 24/7 is transformative stuff. You have questions about the future? You get answers. Boom. Just like that. Pretty amazing. And not only you but also the several billion other people connected to the Internet.
I think at least one implication of this transformation is clear. And it’s right there in the key word of the preceding paragraph.
“Questions.”
This will be a world in which answers are abundant for all. When supply rises, cost falls. When supply is effectively limitless, cost is effectively zero. Answers are going to be cheap, cheap, cheap.
But there are no answers without questions, and AI only answers the questions you pose. Thus, the supply of questions won’t change even as the cost of answers plunges.
So what will be valuable in future? Questions.
When we published Superforecasting, I spent years on the road giving talks and discussing the book, and one thing that stood out for me was what didn’t come up in the Q and A: People never asked where the questions came from. That was just a given. The ability to answer the questions was what everyone wanted to talk about.
And that’s a problem. Because questions have always had a value independent of the answers they generate.
Imagine it’s September 15th, 2001. You ask the question, “Will there be major terrorist attacks in the United States over the next six months?” The answer is all that matters. The question is a given. After all, the 9/11 attack occurred mere days earlier and everyone is asking that same question. It takes not the slightest insight to ask it.
But now imagine it’s September 1st, 2001. You ask the same question. Here, the answer doesn’t particularly matter. Barring inside information, any answer will be some very low probability that will tell you little. But you deserve a round of applause simply for asking the question because prior to 9/11, terrorism was on the radar of very few people. If you asked that question, on that day, you were at least aware of the possibility and its importance, so by asking the question you demonstrated a clearer grasp of reality than most. And if you asked that question of others, you alerted them, too.
In Superforecasting, we only briefly mentioned that answering questions and posing questions were different skills, and we didn’t delve into the latter. If AI is indeed taking us to a future in which answers are superabundant, we really have to start appreciating the value of questions. And figure out how to get better at asking them.
I know a good place to start.
Long-time readers of PastPresentFuture will know that I’m a fan of an “applied history” book called Thinking In Time. Published in the 1980s, it was written by Richard Neustadt and Ernest May, a political scientist and a historian, respectively, at Harvard. Thinking In Time explores how executives should use the human capacity to switch between present, past, and future to enrich their thinking and make better decisions. They called this “oscillation.” Good decision-makers, they argued, do it constantly. (It also inspired the name of PastPresentFuture because temporal jumping is my jam.)
Drawing on historical records as well as Neustadt’s personal experience as an advisor to multiple presidents, Neustadt and May argued there is a classic mistake executives make when confronted with some new problem. “What do we do?” they ask. The conversation proceeds from there.
Never do that, Neustadt and May advised. Instead, always begin by asking “what’s the situation?”
That question shifts your mindset from action to exploration. When you do that, you’ll start asking questions about the situation now but those answers will, in turn, raise questions about the past: “If X is true now, how did X come to be? When did it happen? Why?”
See what’s happening already? Questions, questions, and more questions. That’s how we explore.
Neustadt and May have excellent illustrations of how this approach produces better decisions (although the stories in which people went straight to “what do we do?” and screwed up as a result are more entertaining). But I came across a story that, I think, tops them all.
It involves Paul Van Riper, a retired Marine Corps general who is something of a legend in the Corps. In the late 1960s, Van Riper was a decorated captain commanding a company of Marines in Vietnam when he was ordered to solve a seemingly impossible situation in which every conceivable solution had already been tried multiple times and failed. So Van Riper called a meeting. And he set unusual ground rules for the meeting: Everyone was forbidden from talking about what had been done to solve the problem, or what they could do to solve the problem. Instead, Van Riper started from square one by asking a series of completely naive questions — as if he knew nothing whatsoever.
I published the whole story a few years ago. I urge you to read it. How Van Riper turned naive questions into a completely novel solution that worked brilliantly still blows me away.
What connects Neustadt and May with Paul Van Riper is the deliberate choice to assume ignorance. What is it that you don’t know? You don’t know what you don’t know! So you must assume your ignorance runs broad and deep. If you start there, it’s obvious that what you must do is shut up about what you think. Set that aside. Instead, ask naive questions.
Nothing about that is natural. In fact, as the late, great Daniel Kahneman wrote, the default assumption of the human brain is “WYSIATI” — what you see is all there is. So we naturally feel that what we already know is all there is to know. That’s why executives immediately go to “what do we do?” If you already know all there is to know, all that’s left to decide is what to do about it.
Assuming there’s much more to learn, and asking naive questions, is the foundation of “design thinking,” an approach widely used in industry and the military. Crucially, designing comes before planning. It’s about discovery, not action. Done well, the plan for action emerges from that. “The solution emerges from the discourse,” Paul Van Riper told me. “When you begin to see the logic of the problem, the counter-logic emerges.”
Years ago, I researched a lovely illustration involving the Dutch medical technology company Philips.
Philips makes MRIs. The key to a good MRI image is clarity. A blurry image can’t be used for diagnosis; it wastes staff time and ties up an expensive machine. Philips wanted to get rid of blurry images.
This sounds like a problem for the engineers and scientists who design and build MRIs. Tell them to improve the machine. That’s obvious.
Philips worked on that. But it also did something quite different.
Philips had its own team of highly trained anthropologists examine the problem the way any good anthropologist would: They studied the whole MRI process with a focus on people — the patients, nurses, doctors, technicians and others — and the places those people lived and worked in. All along the way, they set aside their own assumptions and beliefs and asked naive questions.
To put a label on it, they were “radically curious.”
Much of what the anthropologists learned was not terribly surprising. Patients who get MRIs have a known or suspected medical problem. This is upsetting. They are often on edge.
Then they come to the hospital. For most patients, the hospital is a strange, busy, alien environment. Their anxiety spikes. They undress and get in the strange tube of the MRI. The machine makes horrendous noises. It can be frightening.
And what do people do when they are afraid and uncertain? Their muscles contract. They get tense.
Being tense makes it hard for them to remain perfectly still for the extended periods required for a clear image. So the MRI produces a blurry image.
See where this is going? Philips realized that more than the machine mattered to the outcome. Which gave the company a whole new way to approach the problem.
Philips started to experiment with ambient lighting, sounds, and images on the walls and they discovered these can be highly effective at calming patients. Calm patients relax their muscles. They stay still better. And the images come out sharper.
In short order, this approach went from crazy new idea to a common feature in hospitals the world over. And it never would have happened if the researchers hadn’t started by assuming ignorance and approaching the problem with radical curiosity.
There’s a whole, elaborate literature on how to ask probing, fruitful questions, but I think the core of it is right there.
If you want something even simpler, consider the “five whys,” Toyota’s legendary practice for ensuring that the root causes of problems are identified and fixed. The name is the technique.
“Why did the machine stop?”
The fuse blew due to an overload.
“Why was there an overload?”
The bearing wasn’t sufficiently lubricated.
And so on. One question after another, going deeper and deeper. Until you get to the root cause.
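As a programmer’s doodle, the whole technique is a short walk down a chain of causes. The first two links below come from the Toyota example above; the later links follow the textbook continuation of that story (worn pump shaft, missing strainer), included only to show the recursion:

```python
# Hypothetical cause chain; the first two links from the example
# above, the rest from the textbook version of the Toyota story.
causes = {
    "machine stopped": "the fuse blew due to an overload",
    "the fuse blew due to an overload": "the bearing wasn't sufficiently lubricated",
    "the bearing wasn't sufficiently lubricated": "the lubrication pump wasn't pumping enough oil",
    "the lubrication pump wasn't pumping enough oil": "the pump shaft was worn",
    "the pump shaft was worn": "there was no strainer, so metal scrap got in",
}

def five_whys(problem: str, chain: dict, max_depth: int = 5) -> str:
    """Ask 'why?' up to max_depth times; return the deepest known cause."""
    cause = problem
    for _ in range(max_depth):
        if cause not in chain:
            break  # no deeper cause recorded: treat this as the root
        cause = chain[cause]
        print(f"Why? Because {cause}.")
    return cause

root = five_whys("machine stopped", causes)
print(f"Fix this: {root}")
```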
If you’ve ever spent time with a four-year-old, you probably recognize the pattern.
“Why?”
Answer.
“Why?”
Answer.
“Why?”
It doesn’t take long before the four-year-old with his naive little questions is forcing you to wrestle with difficult problems of science, ethics, and the meaning of life. And forcing you to realize how limited your own knowledge is. It’s damned annoying.
Four-year-olds do that because they are radically curious. They assume there are whole worlds they know nothing about. And it never occurs to them that asking naive questions might make them look ignorant or dumb, or that looking ignorant or dumb is something to avoid.
So here’s my prediction for a brave new world in which AI delivers answers to all who ask questions: That world will belong to adults who can think like four-year-olds.