In his essay “Generating Crazy Structures,” Derek Lowe writes of a problem that “everyone would like to do and no one can be quite sure that they’re able to manage yet.” That problem is predicting the structures of molecules that we could synthesize easily and that would have good industrial applications.
I’m most interested in molecules that can be made by synthetic biology. For example, yeast can produce ethanol quite well. Other organisms generate different outputs, and we can engineer them to get novel molecules. However, the chemical architecture of some of these molecules makes them much more difficult to produce. The question is, can we develop computational models that predict which molecules will work?
“There are plenty of ways to approach the problem, and these range from ‘indistinguishable from trivial’ all the way to ‘indistinguishable from magic.’” (Derek Lowe)
The problem is multifaceted. We want molecules that will work for us in pharma, agriculture, and industry. But it’s hard to predict how a new molecule will behave – as, say, a drug or bioplastic. Chemical space is enormous. Any chemistry student knows the complexity of atomic interactions, with their hydrogen bonds and van der Waals forces and the like. Add biology, and we find large sets of homologous genes for any given step in a chemical synthesis pathway. Some of the enzymes they encode are more efficient than others. It’s hard to predict which one will work best in any given cell.
Both the chemistry and the biology might work against you. And in the meantime, we’re trying to come up with algorithms for predicting which of these molecules we should make. Many of the computational methods developed so far can predict potentially useful molecules, but they too often ignore whether the chemistry is feasible. Gao & Coley (2020) recognized this and attempted to incorporate a sense of practicality into their method. Their criteria:
- Does the molecule do something useful?
- Can the molecule be synthesized easily in a chemistry lab?
- Can we train deep learning algorithms to improve predictions?
In other words, we have to consider (1) application, (2) chemical feasibility, and (3) computational optimization. All three feed into a computational method for generating molecule suggestions. What about the science of it? Could experimental validation be fed back in to improve the model?
Synthetic biologists want to go a step further. Can those molecules be produced by a cell? What if we could get an Alzheimer’s drug as easily as we get beer from yeast fermentation? But the output these models generate hasn’t been terribly helpful – yet. They give you a “crazy list” that scientists end up amending:
“When you get one of those crazy lists, applying the synthesizability screen throws out so many compounds that the ones that are left may not even be anything that the algorithms found very interesting in the first place.” (Derek Lowe)
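To make that complaint concrete, here is a toy sketch in Python. It is not Gao & Coley's method or any real model: the candidate names, both scores, the threshold, and the weighting are all made-up assumptions for illustration. It contrasts screening a ranked list after the fact with ranking on usefulness and synthesizability together.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                # hypothetical label, not a real compound
    usefulness: float        # made-up predicted property score in [0, 1]
    synthesizability: float  # made-up ease-of-synthesis score in [0, 1]


def screen_after_generation(candidates, min_synth):
    """Rank by usefulness alone, then drop anything too hard to make."""
    ranked = sorted(candidates, key=lambda c: c.usefulness, reverse=True)
    return [c for c in ranked if c.synthesizability >= min_synth]


def rank_jointly(candidates, weight=0.5):
    """Rank by a weighted blend of usefulness and synthesizability."""
    def score(c):
        return weight * c.usefulness + (1 - weight) * c.synthesizability
    return sorted(candidates, key=score, reverse=True)


# Illustrative numbers only: the "most useful" candidates are the hardest to make.
candidates = [
    Candidate("mol_a", 0.95, 0.10),
    Candidate("mol_b", 0.90, 0.20),
    Candidate("mol_c", 0.60, 0.80),
    Candidate("mol_d", 0.40, 0.90),
]

survivors = screen_after_generation(candidates, min_synth=0.5)
joint = rank_jointly(candidates)

print([c.name for c in survivors])
print([c.name for c in joint])
```

With these invented numbers, the after-the-fact screen discards the two candidates the usefulness ranking liked most, which is the situation Lowe describes. The joint ranking keeps every candidate in view but pushes the impractical ones down the list. Real scores would have to come from trained models and actual synthesis planning, which this sketch does not attempt.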
The chemical space is vast and the biology contains many unknowns. There’s a need for improvement here – exciting research to be sure.
Lowe, Derek. (2020). “Generating Crazy Structures.” In the Pipeline, Science Translational Medicine. blogs.sciencemag.org/pipeline/archives/2020/09/30/generating-crazy-structures
Gao, W., & Coley, C. W. (2020). The Synthesizability of Molecules Proposed by Generative Models. Journal of Chemical Information and Modeling. https://doi.org/10.1021/acs.jcim.0c00174