
The Three Forms of Scientific Intelligence: A Conversation with DeepMind's Pushmeet Kohli

How to pick the problems worth solving, biology's scaling laws, and what AI-driven science actually looks like in practice.

Pablo Lubroth
May 01, 2026
Cross-posted by Decoding Science

Welcome to Decoding Science: every other week our writing collective highlights notable news, from the latest scientific papers to the latest funding rounds in AI for Science, and everything in between. All in one place.

I had the pleasure of speaking with Pushmeet Kohli, VP of Science and Strategic Initiatives at Google DeepMind, where he leads the unit behind some of the most consequential AI-for-science work of the past decade: AlphaFold, AlphaGenome, AlphaProof, AlphaEvolve, and AI Co-Scientist, alongside work spanning fusion, materials science, weather, and quantum computing. Before DeepMind, he spent over a decade at Microsoft Research working on computer vision and machine learning.

DeepMind pursues problems the field agrees are transformative and inevitable but believes are decades away, betting that they can be solved in a third of the time. Pushmeet is leading one of the most ambitious AI for Science teams in the world, tackling important problems. I was keen to understand how he and his team select those problems, and how he sees science advancing in the near future.

In this interview we discuss:

  • The four-condition framework DeepMind uses to select problems, and why the unit only takes on roughly two new ones per year.

  • Why the distinction between “predictive” and “visionary” AI may be a category error.

  • The bottlenecks on virtual cells (and virtual organisms) and scaling laws in biology.

  • The three forms of intelligence and why fixating on the AlphaFold-style cases underrates how the other two will reshape research.

This was a live interview and the transcript has been edited for readability.


Pablo Lubroth: Tell us about your role at DeepMind and your background.

Pushmeet Kohli: I lead the Science and Strategic Initiatives unit at DeepMind. The whole purpose of the unit is to go after problems that can have transformative impact. There is a very simple algorithm we follow to isolate problems the unit works on:

  • The first step is to find problems where there is complete consensus in the community that, if solved, they will be transformative: solving them will completely change the way something is done.

  • The second requirement is that there should also be consensus that solving this problem is inevitable; we don’t want to work on infeasible things.

  • The third requirement is that there should be almost a consensus that, even though it’s inevitable and transformative, it’s not going to be solved in the next five to ten years.

  • The fourth condition is that we believe the community consensus is correct on the first two points and incorrect on the third: that we can solve the problem in half or one third of the time the community believes will be needed.

If those four conditions are met, then we take on those problems. We started with key problems in biology, materials science, and quantum chemistry, and over the last eight to nine years really expanded across fusion, mathematics, algorithmic discovery, quantum computing, weather, geospatial understanding, and many other projects.


You span mathematics, machine learning, computer vision, drug discovery, quantum computing. Most scientists go deep in one area. How do you actually develop enough domain competence to make original contributions across fields, and what’s your strategy?

When you’re looking at any problem, you really need to have an understanding of why this problem is interesting, and how you measure progress on the problem. There’s the what and the how of the problem. The what cannot be short-circuited. You really need to understand the problem deeply, including what metrics will quantify progress. So we don’t start these projects ten at a time; we take them on one at a time. Over the last eight, nine years, we’ve only done roughly two each year, because there is a lot of work needed to define the problem crisply.

On the how, we approach the whole thing in a multidisciplinary fashion, assembling the best team to tackle the problem and think through various plans of attack. We hire some of the best people in the area who know what techniques have been used until now, but also how those techniques can be combined with a machine learning-based approach and what data would be needed. My experience in machine learning is useful in that context, complemented by team members coming from the respective disciplines who have a really good understanding of what has been tried earlier.


Some writers distinguish between predictive AI, which gets better and better at answering questions within existing paradigms, and “visionary” AI: systems capable of generating entirely new conceptual vocabularies the way Einstein replaced the luminiferous ether with special relativity. Do you think that distinction is real? And if it is, does any of the work coming out of DeepMind today actually cross that line, or are we still firmly in the business of building better maps of known territory?

I don’t see that distinction so explicitly, because once you think about a very complicated solution, there are always ways to compress it and come up with a new concept. Concepts can be used to compress solutions. Even the theory of relativity, you could say, is based on a new series of operations. We gave a specific series of operations a name (this particular equation became really famous), but we could have just said it’s multiplications and additions. If particular solutions reappear across many different problems, then you can compress them and call them different concepts.

We are making new discoveries in the sense that, if you look at what something like AlphaEvolve does, it discovers new kinds of algorithms that are currently very difficult to figure out. Maybe there are ways to compress those solutions and conceptualize them, and that requires more work on interpretability. But I don’t see the distinction as being that we are searching in a narrower space of solutions, or that there are certain types of problems we will never solve by searching in that narrow space.


In a 2024 paper by DeepMind authors called “Open-Endedness Is Essential for Artificial Superhuman Intelligence”, it is mentioned that “no foundation model, regardless of scale, can be truly open-ended, because a system trained on a fixed dataset will always eventually exhaust its epistemic novelty for any observer.” FunSearch is cited as one of the more promising steps toward open-ended AI, using evolutionary search over programs to discover new mathematics. AlphaEvolve has now taken its place. Are systems like these actually open-ended in the sense the paper requires, or are they constrained by the closed datasets and proxy evaluators (a correctness checker, a fitness function) they rely on?

At some point you have to ground scientific progress. How do we quantify the advancement of science? You can say, “now I have a better understanding of what I have observed about the world.” So it’s about collecting data about the world, whether through experiments or through a simulator with certain behaviors you want to explain. If you can come up with theories or solutions that are more effective at explaining those measurements, then you have a better model.

There is open-endedness in the sense that the only closed thing is the set of measurements you are working with. But if you let your agent interact with the world and take more measurements, then it is not closed. It is actually open-ended; it can try to explain more things. In doing so, it can leverage the fact that it can go beyond its known search space. In the case of AlphaProof, it can not only try existing proofs, it can go beyond existing rules to different proof concepts and different tools to solve the problems it is looking at.

It’s similar to what AlphaGo did with move 37, which was almost never seen in human gameplay. The fact that the policy function of AlphaGo, when coupled with Monte Carlo Tree Search, was able to home in on that solution shows that you can discover new knowledge. That’s what AlphaEvolve and AlphaProof have demonstrated time and again, with new algorithms for matrix multiplication, new Ramsey numbers, and many more discoveries that did not exist in the literature.
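
To make concrete the kind of loop the question above alludes to, here is a minimal sketch of evolutionary search over programs scored by a proxy fitness function. It is a toy symbolic-regression loop, not FunSearch or AlphaEvolve (both of which use a large language model to propose candidate programs and far richer evaluators); every name and number in it is an illustrative assumption.

```python
# Toy evolutionary search over programs against a proxy fitness function:
# propose candidate expressions, score them with an evaluator, keep and
# mutate the best. Hypothetical sketch only, not any DeepMind system.
import random

random.seed(0)

TERMINALS = ["x", "1", "2"]
OPS = ["+", "-", "*"]

def random_expr(depth=3):
    """Grow a small random arithmetic expression over x."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    return f"({random_expr(depth - 1)} {random.choice(OPS)} {random_expr(depth - 1)})"

def fitness(expr, target=lambda x: x * x + 1):
    """Proxy evaluator: negative squared error against the target on a small grid."""
    try:
        return -sum((eval(expr, {"x": x}) - target(x)) ** 2 for x in range(-5, 6))
    except Exception:
        return float("-inf")

def mutate(expr):
    """Crude mutation: extend the candidate with a fresh subexpression, or restart it."""
    if random.random() < 0.5:
        return f"({expr} {random.choice(OPS)} {random_expr(2)})"
    return random_expr()

population = [random_expr() for _ in range(50)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]  # keep the fittest programs
    population = survivors + [mutate(random.choice(survivors)) for _ in range(40)]

best = max(population, key=fitness)
print("best expression:", best, "score:", fitness(best))
```

The fitness function here is exactly the kind of proxy evaluator the question asks about: the search is only as open-ended as the measurements it is scored against, which is what Kohli's answer addresses next.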


Is the end state of AI-driven science a world where humans are primarily curators and validators, or hermeneuticists, or do you think there’s a permanent role for human intuition in hypothesis generation that won’t be automated away?

There are two parts to the activity of science: defining what the problem is, and how you solve that problem. The what, which is how you formalize the problem, how you crisply say “this is what I’m really interested in,” is a very human thing. You want to define it in terms of “this is what I, as a human, don’t understand.” I, as a human scientist, do not understand the structure of this protein. I, as a human, don’t know what is going to happen if I use these transcription factors on this cell. I, as a human, don’t know the worst-case complexity of this computing problem.

Eventually, AI will take on much of the manual labour of the “how”. You will have systems that are much faster, with much broader knowledge because of their ability to search vast literature and reason about many different solution search spaces, far more than a single human mind or even a whole community can look at.

Eventually you could ask the AI, “what is the best problem humanity should care about?” But for an AI to answer that, it will need to have a model not just of a single human but of society itself, and that is not possible at the moment. So for the foreseeable future, human scientists will be there to define what is the problem, to formalize the key problems that AI should help us solve.


Looking at the biology portfolio (AlphaFold, AlphaGenome): if the destination is virtual cell to virtual organism to virtual human, what’s the constraint?

Biology is a great domain for AI because of its inherent complexity. It’s one of the most sophisticated information-processing systems we know of. Just trying to quantify the number of processes happening every millisecond within the human body is fascinating. AI is the perfect tool for resolving the complexity of biology.

The real challenge is to give the model the right experience. In some places, like protein structure prediction and even protein function, we have collected a lot of data as a community, like the Protein Data Bank. We also had evaluation mechanisms like the Critical Assessment of Structure Prediction (CASP). A lot of progress was based on the fact that these resources existed and enabled the training of systems like AlphaFold.

In genomics, we have collected a lot of data as a community, which is why AlphaGenome has been very successful in solving certain problems. But if you go up the complexity hierarchy and think about cells or whole organisms, the same kind of data does not exist in terms of quality or scale. There are also differences. A protein measured through cryo-EM in Asia versus Europe versus Africa is roughly the same thing, but cells can behave quite differently. If you look at datasets in cell genomics like CELLxGENE, there is so much variability and noise that it’s very difficult for models to handle. We will need to invest much more in curating high-quality data for those domains.


Can you build a virtual cell from constituent parts at all?

One option is to get the relevant experience in the form of a costly simulation, where you know the constituent pieces but don’t really understand the emergent phenomena. You can simulate it, generate a lot of synthetic data, and let the model compress that data to understand what kind of phenomena will emerge.

On the other hand, you may not have an accurate characterization of the system to simulate. It’s very hard to simulate these things. Even for protein folding, people attempted to build supercomputers to simulate molecular dynamics over millisecond timeframes, and those were not enough. We are very far from simulating a whole cell.

If that simulation is not feasible, the other source of experience is collecting data about the cell. And what is the right amount? We have to approach this in a scientific manner. As a community, we will need to build the scaling laws, similar to what the machine learning community has figured out for large language models. We need a similar set of scaling laws for reasoning about what kind and what amount of data we need to understand biological systems at the level of a cell, an organism, a tissue. It’s not just about size; it’s also about coverage, including different interventions and different cell lines. There needs to be evidence we build using systematic analysis, not shortcuts.
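
As a rough illustration of what such scaling laws could look like in practice, here is a minimal sketch that fits a power-law curve of held-out error against dataset size, the functional form used in LLM scaling-law work, and extrapolates how much data a target error might require. The form, the numbers, and the scipy-based fit are illustrative assumptions, not DeepMind's methodology.

```python
# Hypothetical sketch: fit error(N) = E_inf + A * N^(-alpha) to pilot data
# and extrapolate the dataset size needed for a target error.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n, irreducible_error, coefficient, exponent):
    """Assumed power-law form borrowed from LLM scaling-law analyses."""
    return irreducible_error + coefficient * n ** (-exponent)

# Made-up (dataset size, held-out error) pairs standing in for pilot experiments.
n_obs = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
err_obs = np.array([0.42, 0.35, 0.29, 0.25, 0.22])

popt, _ = curve_fit(scaling_law, n_obs, err_obs, p0=[0.1, 1.0, 0.3])
e_inf, a, alpha = popt
print(f"irreducible error ~ {e_inf:.2f}, exponent ~ {alpha:.2f}")

# Extrapolate: roughly how many measurements for a target error of 0.15?
target = 0.15
n_needed = ((target - e_inf) / a) ** (-1.0 / alpha) if target > e_inf else float("inf")
print(f"measurements needed for error {target}: ~{n_needed:.2e}")
```

A curve like this only answers the size question; as the answer above stresses, coverage across interventions and cell lines matters as much, so any real analysis would need separate curves per regime rather than a single fit.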


How much of this is about experimental automation? Is that part of your work in the biology portfolio?

Our philosophy has been to leverage AI to the max. Wherever possible we try to take the in silico machine learning approach and see how much further we can push it. But at the end of the day, there are things you have to test in the carbon substrate, in a lab. We have a wet lab, and we recently announced we are building a new autonomous materials lab.

The purpose of those labs is validation. But incorporating them into an active learning loop, where you generate data that guides the training of these systems, becomes possible once you make the whole process much more efficient: you make predictions, get the results, see where you were different, and use that as feedback for the model.
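
In a very reduced form, that loop can be sketched in code. The example below assumes a hypothetical run_wet_lab_experiment interface standing in for the lab, and uses disagreement across a random-forest ensemble as the uncertainty signal for choosing the next experiment; it illustrates the predict, measure, compare, retrain cycle described above, not DeepMind's actual pipeline.

```python
# Hypothetical active-learning loop: predict on candidates, run the most
# uncertain one in the (stand-in) lab, and feed the measurement back.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# A pool of candidate experiments (feature vectors) and a placeholder "lab".
candidate_pool = rng.normal(size=(500, 16))

def run_wet_lab_experiment(x):
    """Hypothetical lab interface: returns a noisy measured outcome."""
    return float(x[:4].sum() + rng.normal(scale=0.1))

# Seed dataset from an initial batch of experiments.
X = candidate_pool[:20].copy()
y = np.array([run_wet_lab_experiment(x) for x in X])
unlabeled_idx = list(range(20, len(candidate_pool)))

for round_idx in range(5):
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

    # Predict each untested candidate with every tree; the spread across
    # trees serves as a cheap uncertainty estimate.
    remaining = candidate_pool[unlabeled_idx]
    per_tree = np.stack([tree.predict(remaining) for tree in model.estimators_])
    uncertainty = per_tree.std(axis=0)

    # Run the most uncertain candidate in the lab and add it to the training set.
    pick = unlabeled_idx[int(np.argmax(uncertainty))]
    y_new = run_wet_lab_experiment(candidate_pool[pick])
    X = np.vstack([X, candidate_pool[pick]])
    y = np.append(y, y_new)
    unlabeled_idx.remove(pick)
    print(f"round {round_idx}: candidate {pick} measured at {y_new:.2f}")
```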


A recent paper in Nature showed that scientists who engage in AI-augmented research publish 3x more papers, receive 5x more citations, and become research project leaders 1.4 years earlier than those who do not. By contrast, AI adoption shrinks the collective volume of scientific topics studied by 5% and decreases scientists’ engagement with one another by 22%. AI adoption in science presents what seems to be a paradox: an expansion of individual scientists’ impact but a contraction in collective science’s reach, as AI-augmented work moves toward areas richest in data. With reduced follow-on engagement, AI tools seem to automate established fields rather than explore new ones, creating a tension between personal advancement and collective scientific progress. Do you think this is due to “early-phase” adoption? If so, what will lead to a shift toward less explored areas in science, perhaps with less data to mine?

It’s a classic correlation versus causation phenomenon. There are certain elements where machine learning requires access to data. Fields that are very quantitative, where there’s a lot of data, and where scientists are already familiar with computational methods will see much earlier uptake of AI-based approaches. But this is not to say that other problems will not be affected. They will be; it’s just that it will take time as we approach those problems more computationally, gather data, and build the right infrastructure to measure progress.

It’s a matter of time. We will see discrepancies in the short term, but in the long term, it’s a tide that will lift all boats. All fields will be affected. But of course, we will need to become more quantitative to be able to use these techniques.


What’s a scientific problem you’ve worked on that you now believe is unsolvable: not merely hard, but fundamentally beyond what any system, AI or human, will ever answer?

I haven’t come across a problem that is truly infeasible. I would need to see a proof. If I could come up with a lower bound on a mathematical or computational problem, I could say, “you can’t have a matrix multiplication algorithm that is N-squared because the lower bound is this high number.” But I haven’t encountered such proofs for specific problems in science.

In cell genomics, when we have approached these problems, we have realized we need much more data, or that the quality of data in current repositories is not there. But that just implies we need much more investment and more time to collect this data. It does not point to the fact that they cannot be solved. So I wouldn’t be able to give you examples of problems we will not be able to solve. There are problems that will require more work and more data, particularly in biology and cell genomics.


AI Co-Scientist re-discovered an unpublished bacterial gene transfer mechanism in 48 hours that took a lab 10+ years. Is this representative of what’s coming? What does it imply for how we should be allocating human scientific effort?

The rate-determining step will be coming up with the right problem and crisp formalizations. Specifying the problem is going to become very important, and so is validating the result. AI will increasingly take on the how. This is where AI systems will really accelerate solutions. But the what, which problem to solve, how do we define success, how do we detect whether we have solved the problem, those are the things that will determine the rate of progress.


How do DeepMind’s agentic systems like AI Co-Scientist fit into the portfolio, and how do you think they will reshape science?

A lot of our early work was on specialized domain-specific models like AlphaFold and AlphaGenome. More recently, we are looking at more general agentic systems, things like AlphaEvolve and Co-Scientist, which are not there to solve one specific problem but to accelerate the scientific process. That’s an interesting way of describing the role of AI in science: both as systems that can solve unsolvable problems, and as agents that can accelerate the scientific process.

When people think about intelligence, they think about it in a homogeneous way. They imagine a conversational agent, or AlphaFold, or AlphaEvolve. But there are different forms of intelligence. There is common intelligence, the ability to look at images and detect what’s in an image, to transcribe things, to do calculations, to extract numbers from a table. Many people can do it, and now AI systems can do it at scale. What used to take a PhD student or a collection of PhD students two years to tabulate from the literature can be done in a few hours by an agent.

Then there is expert-level intelligence: the ability to make a diagnosis, or the work of lawyers, or of mathematicians who can solve IMO-level problems. That helps accelerate science because you can use that mathematical ability to prove certain theorems at scale, formalize mathematics, and many other things.

At the far end of the spectrum is superhuman intelligence, where no human has the ability to solve the problem. AlphaFold is the archetypal example: from a single protein sequence, you want to understand the 3D structure. All three types of intelligence will affect science in different ways. AI is going to affect the scientific process writ large, rather than just one specific problem.


Thank you, Pushmeet. I really appreciated the conversation.


Would you like to contribute to Decoding Science by writing a guest post? Drop us a note here or chat with us on X.


© 2026 Decoding Bio, LLC · Privacy ∙ Terms ∙ Collection notice