A ‘Google’ for chemistry invents best path to new compounds in seconds
August 24, 2012
Northwestern University scientists have connected 250 years of organic chemical knowledge into one giant computer network called Chematica — a chemical “Google” on steroids.
A decade in the making, the software optimizes syntheses of drug molecules and other important compounds, combines long (and expensive) syntheses of compounds into shorter and more economical routes, and identifies suspicious chemical recipes that could lead to chemical weapons.
The number of possible synthetic pathways leading to the desired target of a synthesis can be astronomical (10¹⁹ pathways within five synthetic steps).
“I realized that if we could link all the known chemical compounds and reactions between them into one giant network, we could create not only a new repository of chemical methods but an entirely new knowledge platform where each chemical reaction ever performed and each compound ever made would give rise to a collective ‘chemical brain,’” said Bartosz A. Grzybowski, who led the work.
“The brain then could be searched and analyzed with algorithms akin to those used in Google or telecom networks.”
The Chematica network comprises some seven million chemicals connected by a similar number of reactions. A family of algorithms that searches and analyzes the network allows the chemist at his or her computer to easily tap into this vast compendium of chemical knowledge. And the system learns from experience, as more data and algorithms are added to its knowledge base.
Details and demonstrations of the system are published in three back-to-back papers in the Aug. 6 issue of the journal Angewandte Chemie.
Grzybowski is the senior author of all three papers. He is the Kenneth Burgess Professor of Physical Chemistry and Chemical Systems Engineering in the Weinberg College of Arts and Sciences and the McCormick School of Engineering and Applied Science.
Searching billions of chemical syntheses for a desired molecule in seconds
In the Angewandte paper titled “Parallel Optimization of Synthetic Pathways Within the Network of Organic Chemistry,” the researchers demonstrate algorithms that find optimal syntheses leading to drug molecules and other industrially important chemicals.
“The way we coded our algorithms allows us to search within a fraction of a second billions of chemical syntheses leading to a desired molecule,” Grzybowski said. “This is very important since within even a few synthetic steps from a desired target the number of possible syntheses is astronomical and clearly beyond the search capabilities of any human chemist.”
Chematica can evaluate every synthesis encoded in its network, not only the few a particular chemist might have an interest in. In this way, the algorithms find the optimal known ways of making desired chemicals.
The software already has been used in industrial settings, Grzybowski said, to design more economical syntheses of companies’ products. Synthesis can be optimized with various constraints, such as avoiding reactions involving environmentally dangerous compounds. Using the Chematica software, such green chemistry optimizations are just one click away.
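To make the idea of a constrained search concrete, here is a minimal sketch in Python. It is not Chematica's algorithm, which the article does not describe; it is a simple cost-relaxation over a toy reaction network, and every name in it (Reaction, cheapest_synthesis, the compounds "A", "B", "C", "I1", "T" and "toxic", and all prices and step costs) is a hypothetical placeholder. It assumes every price and step cost is positive.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    reactants: tuple   # names of the compounds consumed
    product: str       # name of the compound made
    cost: float        # hypothetical cost of running this step (must be > 0)


def cheapest_synthesis(target, reactions, prices, forbidden=frozenset()):
    """Return (cost, ordered list of reactions) for the cheapest route to target.

    prices maps purchasable compounds to their price. Any reaction that
    touches a compound in forbidden is ignored, which models a constraint
    such as "avoid environmentally dangerous compounds".
    """
    # Start from what can be bought, then relax costs until nothing improves.
    best = {c: p for c, p in prices.items() if c not in forbidden}
    chosen = {}  # compound -> reaction that currently makes it most cheaply
    changed = True
    while changed:
        changed = False
        for rxn in reactions:
            if rxn.product in forbidden or any(r in forbidden for r in rxn.reactants):
                continue
            if not all(r in best for r in rxn.reactants):
                continue
            total = rxn.cost + sum(best[r] for r in rxn.reactants)
            if total < best.get(rxn.product, math.inf):
                best[rxn.product] = total
                chosen[rxn.product] = rxn
                changed = True

    if target not in best:
        return math.inf, []

    plan = []

    def unwind(compound):
        # Walk back from the target, listing each step after its inputs.
        rxn = chosen.get(compound)
        if rxn is None:
            return  # purchased directly
        for r in rxn.reactants:
            unwind(r)
        if rxn not in plan:
            plan.append(rxn)

    unwind(target)
    return best[target], plan


# Toy network: three competing routes to the target "T".
PRICES = {"A": 1.0, "B": 2.0, "C": 5.0, "toxic": 0.5}
REACTIONS = [
    Reaction(("A", "B"), "I1", 1.0),
    Reaction(("I1",), "T", 2.0),
    Reaction(("toxic", "A"), "T", 1.0),
    Reaction(("C",), "T", 3.0),
]

cost, plan = cheapest_synthesis("T", REACTIONS, PRICES)
green_cost, green_plan = cheapest_synthesis("T", REACTIONS, PRICES, forbidden={"toxic"})
print(cost, [r.product for r in plan])
print(green_cost, [r.product for r in green_plan])
```

In this toy network the unconstrained search picks the one-step route through "toxic"; forbidding that compound switches the answer to the two-step route through "I1". A shared intermediate would be counted once per use in this sketch, and a real system would need far more efficient search than repeated passes over all reactions.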
“One-pot” reactions to eliminate multiple steps
Another important area of application is the shortening of synthetic pathways into so-called “one-pot” reactions. One of the holy grails of organic chemistry has been to design methods in which all the starting materials could be combined at the very beginning and then the process would proceed in one pot, much like cooking a stew, all the way to the final product.
The Northwestern researchers detail how this can be done in the Angewandte paper titled “Rewiring Chemistry: Algorithmic Discovery and Experimental Validation of One-Pot Reactions in the Network of Organic Chemistry.”
The chemists have taught their network some 86,000 chemical rules that check — again, in a fraction of a second — whether a sequence of individual reactions can be combined into a one-pot procedure. Thirty predictions of one-pot syntheses were tested and fully validated. Each synthesis proceeded as predicted and had excellent yields.
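The article does not say what the rules look like. As a loose illustration only, the sketch below in Python assumes each rule is a pair of reagent classes that cannot share a flask, and checks every pair of steps in a sequence against those rules. The rule set, the class labels and the function name one_pot_conflicts are all hypothetical.

```python
from itertools import combinations

# Hypothetical incompatibility rules: pairs of reagent classes that
# should not be present in the same pot.
INCOMPATIBLE = {
    frozenset({"strong_acid", "strong_base"}),
    frozenset({"oxidant", "reductant"}),
    frozenset({"water", "water_sensitive"}),
}


def one_pot_conflicts(steps):
    """Return a list of (step_i, step_j, class_i, class_j) rule violations.

    steps is a list of sets; each set holds the reagent classes used in
    one step of the sequence. An empty result means no rule objects to
    running the whole sequence in one pot.
    """
    conflicts = []
    for (i, a), (j, b) in combinations(enumerate(steps), 2):
        for x in sorted(a):
            for y in sorted(b):
                if frozenset({x, y}) in INCOMPATIBLE:
                    conflicts.append((i, j, x, y))
    return conflicts


compatible = [{"mild_base"}, {"oxidant"}, {"nucleophile"}]
clashing = [{"strong_acid"}, {"oxidant"}, {"strong_base"}]

print(one_pot_conflicts(compatible))
print(one_pot_conflicts(clashing))
```

Each check is a set lookup, so even a large rule table can be applied to a short sequence very quickly, which is consistent with the "fraction of a second" claim but says nothing about how the real rules are encoded.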
In one striking example, Grzybowski and his team synthesized an anti-asthma drug using the one-pot method. Making the drug typically would take four consecutive synthesis and purification steps.
“Our algorithms told us this sequence could be combined into just one step, and we were naturally curious to check it out in a flask,” Grzybowski said. “We performed the one-pot reaction and obtained the drug in excellent yield and at a fraction of the cost the individual steps otherwise would have accrued.”
Identifying possible chemical weapons
The third area of application is the use of the Chematica network approach for predicting and monitoring syntheses leading to chemical weapons. This is reported in the Angewandte paper titled “Chemical Network Algorithms for the Risk Assessment and Management of Chemical Threats.”
“Since we now have this unique ability to scrutinize all possible synthetic strategies, we also can identify the ones that a potential terrorist might use to make a nerve gas, an explosive or another toxic agent,” Grzybowski said.
Algorithms known from game theory are first applied to identify the strategies that are hardest for the federal government to detect: for example, the use of substances such as kitchen salt, clarifiers, grain alcohol and a fertilizer, all freely available from a local convenience store. Characteristic combinations of seemingly innocuous chemicals, such as those in this example, are red flags.
This strategy is very different from the government’s current approach of monitoring and regulating individual substances, Grzybowski said. Chematica can be used to monitor patterns of chemicals that together become suspicious, instead of monitoring individual compounds. Grzybowski is working with the federal government to implement the software.
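The difference between watching individual substances and watching combinations can be shown with a short Python sketch. It is an illustration under stated assumptions, not the method in the paper: the pattern names, the item names and the function flagged_patterns are hypothetical placeholders, and the game-theory step that would select the patterns is not modeled.

```python
def flagged_patterns(purchases, patterns, min_fraction=1.0):
    """Return the names of watched patterns that a set of purchases matches.

    patterns maps a pattern name to the set of items that are suspicious
    only in combination. A pattern is flagged when at least min_fraction
    of its items appear among the purchases.
    """
    basket = set(purchases)
    hits = []
    for name, required in patterns.items():
        covered = len(required & basket) / len(required)
        if covered >= min_fraction:
            hits.append(name)
    return sorted(hits)


# Placeholder patterns: no single item is regulated on its own.
PATTERNS = {
    "pattern_1": {"item_a", "item_b", "item_c"},
    "pattern_2": {"item_c", "item_d"},
}

print(flagged_patterns(["item_a", "item_c"], PATTERNS))
print(flagged_patterns(["item_a", "item_b", "item_c"], PATTERNS))
print(flagged_patterns(["item_a", "item_c"], PATTERNS, min_fraction=0.5))
```

With the default threshold, only a complete combination is flagged; lowering min_fraction trades more false alarms for earlier warning.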
Chematica now is being commercialized. “We chose this name,” Grzybowski said, “because networks will do to chemistry what Mathematica did to scientific computing. Our approach will accelerate synthetic design and discovery and will optimize synthetic practice at large.”