Guest essay by Eric Worrall
Scientists claim DeepMind’s reinforcement learning AI has demonstrated superior control of fusion plasmas. But they may also have inadvertently revealed a critical weakness in their approach.
DeepMind Has Trained an AI to Control Nuclear Fusion
The Google-backed firm taught a reinforcement learning algorithm to control the fiery plasma inside a tokamak nuclear fusion reactor.
The inside of a tokamak—the doughnut-shaped vessel designed to contain a nuclear fusion reaction—presents a special kind of chaos. Hydrogen atoms are smashed together at unfathomably high temperatures, creating a whirling, roiling plasma that’s hotter than the surface of the sun. Finding smart ways to control and confine that plasma will be key to unlocking the potential of nuclear fusion, which has been mooted as the clean energy source of the future for decades. At this point, the science underlying fusion seems sound, so what remains is an engineering challenge. “We need to be able to heat this matter up and hold it together for long enough for us to take energy out of it,” says Ambrogio Fasoli, director of the Swiss Plasma Center at École Polytechnique Fédérale de Lausanne in Switzerland.
That’s where DeepMind comes in. The artificial intelligence firm, backed by Google parent company Alphabet, has previously turned its hand to video games and protein folding, and has been working on a joint research project with the Swiss Plasma Center to develop an AI for controlling a nuclear fusion reaction.
DeepMind has developed an AI that can control the plasma autonomously. A paper published in the journal Nature describes how researchers from the two groups taught a deep reinforcement learning system to control the 19 magnetic coils inside TCV, the variable-configuration tokamak at the Swiss Plasma Center, which is used to carry out research that will inform the design of bigger fusion reactors in the future. “AI, and specifically reinforcement learning, is particularly well suited to the complex problems presented by controlling plasma in a tokamak,” says Martin Riedmiller, control team lead at DeepMind.
…Read more: https://www.wired.com/story/deepmind-ai-nuclear-fusion/
The abstract of the paper:
Magnetic control of tokamak plasmas through deep reinforcement learning
Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a promising path towards sustainable energy. A core challenge is to shape and maintain a high-temperature plasma within the tokamak vessel. This requires high-dimensional, high-frequency, closed-loop control using magnetic actuator coils, further complicated by the diverse requirements across a wide range of plasma configurations. In this work, we introduce a previously undescribed architecture for tokamak magnetic controller design that autonomously learns to command the full set of control coils. This architecture meets control objectives specified at a high level, at the same time satisfying physical and operational constraints. This approach has unprecedented flexibility and generality in problem specification and yields a notable reduction in design effort to produce new plasma configurations. We successfully produce and control a diverse set of plasma configurations on the Tokamak à Configuration Variable, including elongated, conventional shapes, as well as advanced configurations, such as negative triangularity and ‘snowflake’ configurations. Our approach achieves accurate tracking of the location, current and shape for these configurations. We also demonstrate sustained ‘droplets’ on TCV, in which two separate plasmas are maintained simultaneously within the vessel. This represents a notable advance for tokamak feedback control, showing the potential of reinforcement learning to accelerate research in the fusion domain, and is one of the most challenging real-world systems to which reinforcement learning has been applied.
Read more: https://www.nature.com/articles/s41586-021-04301-9
What is this critical weakness I mentioned?
DeepMind is a tremendously powerful AI. But by virtue of its architecture, the DeepMind system is stateless. It has no memory of the past.
DeepMind accepts an input, such as the current state of the reactor. It processes the input, provides an output, and then forgets everything it has just done. Every day is the first day for DeepMind.
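To make the idea concrete, here is a minimal sketch of what "stateless" means in practice. This is an illustration, not DeepMind's actual code: the weights and observation values are made up, and a real controller would be a much larger network. The point is that the policy is a pure function of the current observation, so nothing carries over between calls.

```python
# Illustrative sketch of a stateless policy: a pure function from the
# current observation to an action. Weights and inputs are invented for
# illustration; nothing persists between calls.

def stateless_policy(observation, weights):
    """Map sensor readings to commands with a single feedforward pass."""
    # One hidden layer with ReLU activation, computed from scratch each call.
    hidden = [max(0.0, sum(w * x for w, x in zip(row, observation)))
              for row in weights["hidden"]]
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights["out"]]

weights = {
    "hidden": [[0.5, -0.2], [0.1, 0.8]],
    "out": [[1.0, -1.0]],
}
obs = [0.3, 0.7]

# Same observation in, same action out - every time, in any order,
# on any clone of the machine.
assert stateless_policy(obs, weights) == stateless_policy(obs, weights)
```

Because the function keeps no internal state, any identical copy of it, running anywhere, produces the same answer for the same input.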
This is fine for playing a game of chess, because it is perfectly possible to evaluate the current state of the chess board, and generate a technically perfect move. A chess AI does not have to know the past, all it has to do is evaluate the present, and decide the best move based on the current layout of the pieces.
This living-in-the-moment model starts to break down when you attempt to control real-world processes.
Imagine trying to train an AI to catch a baseball by showing the robot pictures of the field. You can’t catch a ball by moving the robot hand to where the ball is now; you have to intercept the ball in flight, by predicting where the ball will be in the time it takes to move the robot’s hand into position. This requires not just knowledge of where the ball is, but the ability to evaluate the flight of the ball – the velocity and direction in which it is moving. That is knowledge the robot can only obtain by remembering where the ball was, and how quickly and from which direction it moved towards its current position.
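The arithmetic behind the baseball example is simple, and worth spelling out. With hypothetical numbers (positions and timings invented for illustration), two remembered observations are enough to recover velocity and extrapolate ahead; a single snapshot is not:

```python
# Hypothetical numbers: a ball observed at two moments. One snapshot gives
# position only; two snapshots let us estimate velocity and predict ahead.

def predict_position(p_prev, p_now, dt, lookahead):
    """Extrapolate linearly using velocity recovered from a past observation."""
    velocity = [(now - prev) / dt for prev, now in zip(p_prev, p_now)]
    return [now + v * lookahead for now, v in zip(p_now, velocity)]

p_prev = [0.0, 2.0]   # (x, height) observed a tenth of a second ago
p_now  = [1.0, 1.8]   # current observation

# Where will the ball be in 0.3 seconds - i.e. where the hand must go?
intercept = predict_position(p_prev, p_now, dt=0.1, lookahead=0.3)
# roughly [4.0, 1.2]: well ahead of where the ball is right now
```

Drop `p_prev` and the calculation is impossible: velocity simply cannot be recovered from one frame. That is the author's point about memory in a nutshell.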
Only knowledge of the past can give the robot the ability to truly manage a real world process.
I’m not dissing what Google and the Swiss Plasma Center achieved – they demonstrated AIs have a role in managing fusion plasmas. What I am questioning is whether the DeepMind reinforcement learning architecture is the best solution.
Because there is another class of AI architectures which can learn from their mistakes just like DeepMind, but which can also evolve a memory of the past. For example, NEAT, or NeuroEvolution of Augmenting Topologies.
Unlike DeepMind-style architectures, which have a finite, well defined path from stimulus to response, after which the neural network forgets everything until presented with a new stimulus, NEAT systems are messy. They evolve their own network of connections, even adding new neurons and layers if needed, and those connections can include ones which flow backwards. A signal can enter a NEAT network and kind of bounce around, affecting the process, never truly being forgotten until the information is no longer relevant. Unlike DeepMind, NEAT networks can respond differently to the same stimulus, depending on the network’s memory of the past. NEAT can catch the baseball.
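A toy example shows what a backwards-flowing connection buys you. The single recurrent unit below has hand-picked weights (a real NEAT network would evolve its topology and weights rather than have them written by hand), but it demonstrates the key behaviour: because some of the previous output is fed back in, the same stimulus produces different responses depending on what came before.

```python
# A toy recurrent unit in the spirit of NEAT's evolved feedback connections.
# Weights are hand-picked for illustration, not evolved.

class RecurrentUnit:
    def __init__(self, w_in=0.6, w_rec=0.5):
        self.w_in = w_in      # weight on the incoming stimulus
        self.w_rec = w_rec    # feedback weight on the unit's own last output
        self.state = 0.0      # memory carried across calls

    def step(self, stimulus):
        # New output depends on the stimulus AND on the previous output.
        self.state = self.w_in * stimulus + self.w_rec * self.state
        return self.state

unit = RecurrentUnit()
first = unit.step(1.0)    # about 0.6
second = unit.step(1.0)   # about 0.9: same stimulus, different response
assert first != second
```

The echo of each signal decays through the feedback loop rather than being wiped after every call – a crude version of "never truly being forgotten until the information is no longer relevant".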
But NEAT style architectures do not fit well with Google’s AI business model.
Stateless neural net AI architectures are much easier to manage from a business perspective: they allow Google to create vast arrays of independent computers which are all perfect clones of each other, and to assign the next inbound processing request – the next set of plasma sensor readings – to any computer in the array.
NEAT systems, by contrast, need a dedicated computer. If the specific NEAT solution has a feedback loop, the current state matters. The current state of the NEAT network cannot be purged after the task is complete and rehydrated back to its initial state on whatever computer is available; it has to remember what happened before.
In Google’s world, this would be an absolute nightmare. One option is for each client to get their own dedicated computer. This would completely mess up their business model, because when an experiment ends, the client won’t necessarily tell Google they no longer need the dedicated computer. Assigning dedicated computers ties up resources, depleting the pool of computers available for other clients.
The other option is for the current state of the NEAT network to be saved somewhere and passed around Google’s network – which would require an enormous increase in storage and network capacity over what Google’s current stateless business model demands.
It is going to be very interesting to see how far Google can carry their stateless DeepMind model when it comes to process control. I’m impressed they got it to work at all. Perhaps they simulated holding state somehow, by asking the client to remember the previous result and pass those results back through their network.
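That client-carries-the-state pattern is easy to sketch. The following is purely an assumption about how such a scheme might look – nothing in the Nature paper says DeepMind works this way – but it shows how a stateless server can fake memory by making the caller responsible for it:

```python
# An assumed client-carries-the-state scheme (not a description of
# DeepMind's actual system): the server is a pure function, and the client
# sends the previous result back in with each new request.

def stateless_step(observation, carried_state):
    """Any clone of the server can run this: all memory arrives as input."""
    smoothed = 0.8 * carried_state + 0.2 * observation  # running average
    action = -smoothed                                  # toy control law
    # The client must store `smoothed` and send it with the next request.
    return action, smoothed

state = 0.0
for obs in [1.0, 1.0, 1.0]:
    action, state = stateless_step(obs, state)
# The "memory" lives on the client side, not in the network.
```

Each request is still independent from the server's point of view, so Google can route it to any clone in the array – but only for whatever state a human designer decided to carry in the protocol, which is exactly the limitation discussed below.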
Or perhaps plasmas are almost like a chess board – most of the time, the current state of the plasma is enough information to calculate the next step required to maintain control.
But the Google DeepMind experiment was not a complete success. I suspect the last missing piece of the puzzle will be to reject DeepMind’s stateless architecture, and embrace a neural network architecture which can catch the baseball.
Simulated state, in which a human tries to guess what memory of the past is required to catch the baseball, cannot match the flexibility of a NEAT-style neural net architecture which can evolve its own memory of the past – one capable of making up its own mind about what state it has to keep to perform its task, and how long that state remains relevant.