If you know even a little about science, you have probably heard molecules called “the building blocks of life.” Made of groups of atoms bonded together, molecules make up all kinds of materials, but they behave very differently from macroscopic objects. Picture a LEGO model built from many teeny tiny bricks: it’s easy for us to move those bricks around, but if you think of the bricks as molecules, moving them becomes much harder, because each one essentially requires its own separate set of instructions.
A team of researchers from Germany and Korea is working to change this. They have created an artificial intelligence (AI) system that learns to selectively grip and move individual molecules by autonomously operating a scanning tunneling microscope (STM), an instrument used for imaging surfaces at the atomic level. To return to the LEGO metaphor, the team built an autonomous robot that can play with LEGO bricks at the nanoscale, which could have major ramifications for molecular 3D printing.
Dr. Christian Wagner, head of the ERC working group on molecular manipulation at Forschungszentrum Jülich, explained, “If this concept could be transferred to the nanoscale to allow individual molecules to be specifically put together or separated again just like LEGO bricks, the possibilities would be almost endless, given that there are around 10⁶⁰ conceivable types of molecule.”
Researchers from Forschungszentrum Jülich, Jülich Aachen Research Alliance (JARA), RWTH Aachen University, Technische Universität Berlin, the Max Planck Institute for Informatics, and Korea University make up the team. They recently published a paper, “Autonomous robotic nanofabrication with reinforcement learning,” describing their method in Science Advances.
The abstract states, “The ability to handle single molecules as effectively as macroscopic building blocks would enable the construction of complex supramolecular structures inaccessible to self-assembly. The fundamental challenges obstructing this goal are the uncontrolled variability and poor observability of atomic-scale conformations. Here, we present a strategy to work around both obstacles and demonstrate autonomous robotic nanofabrication by manipulating single molecules. Our approach uses reinforcement learning (RL), which finds solution strategies even in the face of large uncertainty and sparse feedback.”
Figure 1. Subtractive manufacturing with an RL agent. (A) PTCDA molecules can spontaneously bind to the SPM tip and be removed from a monolayer upon tip retraction along a suitable trajectory. Bond formation and breaking cause strong increases or decreases in the tunneling current (left inset). Removal is challenging, because PTCDA is retained in the layer by a network of hydrogen bonds (dotted lines, right inset). The RL agent can repeatedly choose from the five indicated actions a′₁–₅ (green arrows) to find a suitable trajectory (action set A: Δz = 0.1 Å step plus ±0.3 Å step in the x or y direction, or no lateral movement). (B) STM image of a PTCDA layer with 16 vacancies created by the RL agent. (C) Probability of bond rupture in intervals of 0.5 Å around tip height z as a function of z, based on all bond-breaking events accumulated during RL agent experiments (inset). (D) The Q function is approximated by a neural network with 30 neurons in the first hidden layer and 2 × 15 neurons in the second. This dueling network architecture (39) features separate outputs Aᵢ and V, with Qᵢ = V + Aᵢ for actions a′ᵢ, i = 1…5. The actually performed action is then randomly chosen from A with probabilities computed with the policy π.
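The dueling architecture in Figure 1D can be sketched in a few lines of NumPy: a shared hidden layer feeds two separate streams, one producing a scalar state value V and one producing a per-action advantage Aᵢ, combined as Qᵢ = V + Aᵢ. The layer sizes (30, then 2 × 15) follow the caption; the input dimension, activation, and randomly initialized weights standing in for trained parameters are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 5   # the five tip moves: a dz step plus +/-0.3 A in x or y, or none
STATE_DIM = 3   # e.g. the tip position (x, y, z); an assumption for this sketch

def relu(x):
    return np.maximum(0.0, x)

# Randomly initialized weights stand in for trained parameters.
W1 = rng.normal(scale=0.5, size=(STATE_DIM, 30))   # shared first hidden layer
Wv = rng.normal(scale=0.5, size=(30, 15))          # value stream (15 neurons)
Wa = rng.normal(scale=0.5, size=(30, 15))          # advantage stream (15 neurons)
wv_out = rng.normal(scale=0.5, size=(15, 1))
wa_out = rng.normal(scale=0.5, size=(15, N_ACTIONS))

def q_values(state):
    h = relu(state @ W1)
    V = relu(h @ Wv) @ wv_out   # scalar state value
    A = relu(h @ Wa) @ wa_out   # one advantage per action
    return (V + A).ravel()      # Q_i = V + A_i

q = q_values(np.array([0.1, -0.2, 2.0]))
print(q.shape)  # one Q value per candidate tip movement
```

Separating V and A lets the network learn how good a tip position is overall independently of which of the five small moves is best from there.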
To guide the rigid cone at the end of the STM tip so that individual molecules are arranged in a specified manner, rather than just moved back and forth, a special recipe is needed. And because nanoscale mechanics are so complicated, that recipe is not something a scientist can easily figure out or calculate.
“To date, such targeted movement of molecules has only been possible by hand, through trial and error,” explained Prof. Dr. Stefan Tautz, head of Jülich’s Quantum Nanoscience institute. “But with the help of a self-learning, autonomous software control system, we have now succeeded for the first time in finding a solution for this diversity and variability on the nanoscale, and in automating this process.”
So, what makes this possible? A branch of machine learning called reinforcement learning (RL), which studies how software agents should act in an environment so as to maximize a cumulative reward. In this case, the researchers used robotics and RL to automate a manipulation task, moving molecules, at the nanoscale. The algorithm keeps working at its assigned task and learns from its experience with every attempt.
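The trial-and-error loop at the heart of RL can be shown with a minimal tabular Q-learning sketch. The toy environment below (a short 1-D “retraction” track with a reward at the end and a small penalty per move) is invented purely for illustration and is far simpler than the actual STM experiment; only the learning pattern, reward success, penalize wasted effort, update value estimates, matches the idea described above.

```python
import random

random.seed(42)

N_STATES = 6        # positions 0..5; start at 0, goal at 5
ACTIONS = [-1, +1]  # move left or right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

# Q[(state, action)]: the agent's estimate of long-term reward
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    if s2 == N_STATES - 1:
        return s2, 1.0, True    # success: positive reward, episode ends
    return s2, -0.01, False     # small penalty for every move

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit current knowledge, sometimes explore
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# After training, the greedy policy should always move toward the goal.
policy = [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES - 1)]
print(policy)
```

No solution path is programmed in: the agent starts with zero knowledge, wanders, collects rewards and penalties, and its value table converges to a policy that heads straight for the goal.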
Prof. Dr. Klaus-Robert Müller, head of the Machine Learning department at TU Berlin, explained, “We do not prescribe a solution pathway for the software agent, but rather reward success and penalize failure.”
Figure 2. Training and performance of RL agents. (A) Map [2D slice through the 3D system] of synthetic bond rupture criteria used to study the RL agent’s behavior under controlled conditions. The criteria are based on a successful experimental trajectory around which a corridor of variable diameter has been created (light red) beyond which the bond ruptures (blue). The corridor diameter is chosen to approximately reproduce the experimental bond-rupture probabilities (Fig. 1C). One successful trajectory [see (C)] is indicated in green. (B) Probability of agent failure in z intervals of 0.5 Å in the simulation in (A). (C) Learning progress of one RL agent. Six plots show 2D cuts (y = 0) through the color-encoded value function V after the number of episodes indicated in the upper right corner. A 2D projection of the agent’s trajectory in each episode is shown as a black line. Crosses indicate bond-breaking events triggered according to the criteria in (A). (D) Swarm plot comparing performance of different RL agents acting in the simulation (A). Plotted is the number of episodes n required to accomplish the removal task for four sets of 80 simulated experiments each. An experiment was considered a failure after 150 unsuccessful episodes. The respective probabilities of agent failure are indicated in the upper part of the graph.
You might be familiar with the AI system AlphaGo Zero, a version of DeepMind’s AlphaGo software. In 2017, AlphaGo Zero was able to autonomously come up with strategies to win a complicated game, without having to watch humans play it; the system was able to beat professional players within a few days. But this team demonstrated its RL approach by autonomously removing molecules from a supramolecular (made of many molecules) structure with a scanning probe microscope.
“In our case, the agent was given the task of removing individual molecules from a layer in which they are held by a complex network of chemical bonds,” Dr. Wagner explained. “To be precise, these were perylene molecules, such as those used in dyes and organic light-emitting diodes.”
One difficulty the researchers ran into was that the force needed to pull a molecule out of the layer must not exceed “the strength of the bond with which the tip of the STM attracts the molecule,” or that bond breaks and the molecule is lost. At first, the software agent moved the tip at random and repeatedly broke the bond between the molecule and the microscope tip, but over time it derived its own set of rules for avoiding this.
Figure 3. Performance of RL agent in experiment. (A) Swarm plot of number of episodes n required to accomplish removal. Groups of at least three data points acquired with the same tip are identically colored (except black). If a tip capable of removal (proven by a successful experiment) failed in another experiment, the respective data point is labeled as “agent fail.” Points labeled as “tip fail” denote tips with which the removal task has never been accomplished, notwithstanding that this could, in principle, also be an agent failure. (B) Density of (x, y) positions where all (ultimately successful) tip trajectories pass through the z region of highest bond-rupture probability (z = 2 Å; Fig. 1C). The positions for Tip D (strong) and Tip E (weak) are indicated by dots. (C) (x, y) projections of all bond ruptures occurring within the first 10 episodes for R- and P-agents. Cross sizes indicate rupture heights z. The quoted numbers give the percentage of rupture points located in each of the four quadrants of the coordinate system. The green curve shows the last trajectory chosen by the P-agent during its pretraining. Its direction indicates why the P-agents have a clear preference to explore the promising (B) lower left quadrant, which explains their performance edge (A).
Another issue arising from the use of RL at the nanoscale is that the metal atoms forming the STM’s tip can shift slightly, which changes the bond strength between tip and molecule. The researchers addressed this as well, by having the software learn “a simple model of the environment” in parallel with the first manipulation cycles. The agent then trains in reality and within its own model simultaneously, which considerably speeds up its learning process.
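Learning alongside an internal model can be illustrated with a Dyna-style loop: the agent records every real transition it observes and replays remembered transitions as cheap simulated experience between real steps. This is a generic sketch of model-assisted RL on the same kind of invented toy task as before, not the paper’s actual implementation.

```python
import random

random.seed(0)

N_STATES, GOAL = 6, 5
ACTIONS = [-1, +1]
ALPHA, GAMMA, EPS, PLANNING_STEPS = 0.5, 0.9, 0.1, 10

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}  # (s, a) -> (s', reward, done): the learned environment model

def real_step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == GOAL else -0.01), s2 == GOAL

def update(s, a, s2, r, done):
    best = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])

for episode in range(50):
    s, done = 0, False
    while not done:
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = real_step(s, a)
        update(s, a, s2, r, done)
        model[(s, a)] = (s2, r, done)  # remember what happened
        # Planning: extra simulated updates from the learned model, so each
        # costly real step is squeezed for much more learning.
        for _ in range(PLANNING_STEPS):
            (ps, pa), (ps2, pr, pdone) = random.choice(list(model.items()))
            update(ps, pa, ps2, pr, pdone)
        s = s2

policy = [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES - 1)]
print(policy)
```

Because every real step is followed by several free model-based updates, the agent converges in far fewer real interactions, which matters when real experience can go stale at any moment, as with a shifting STM tip.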
“Every new attempt makes the risk of a change and thus the breakage of the bond between tip and molecule greater. The software agent is therefore forced to learn particularly quickly, since its experiences can become obsolete at any time. It’s a little as if the road network, traffic laws, bodywork, and rules for operating the vehicle are constantly changing while driving autonomously,” stated Prof. Dr. Stefan Tautz.
“Up until now, this has only been a ‘proof of principle’. However, we are confident that our work will pave the way for the robot-assisted automated construction of functional supramolecular structures, such as molecular transistors, memory cells, or qubits — with a speed, precision, and reliability far in excess of what is currently possible.”