Ben Birnbaum

@benbirnbaum

Machine learning engineer. Former lead of the Machine Learning Team at @flatironhealth, now diving into drug discovery and computational chemistry.

Joined 16-02-2009

316 Tweets · 468 Followers · 911 Following

Ben Birnbaum (@benbirnbaum):

But these planners are very slow, so this turns out not to be a practical fix.

The authors solve this problem by going bottom-up rather than top-down.


Instead of learning to generate molecules that optimize one or more properties and then filtering to the ones that can be synthesized, they learn to synthesize analogs of a molecule and then use an optimization algorithm (in this case a genetic algorithm) to improve those analogs.


Their method is parameterized by which molecules can be used as building blocks and which reaction templates can be used for combining molecules. I love that these inputs are customizable, since in practice groups will often have their own building blocks and reaction templates that they want to use.
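To make the idea concrete, here's a minimal sketch of how these two customizable inputs might be represented. The SMILES strings, SMARTS pattern, and class names are illustrative placeholders, not the paper's actual sets (which contain many thousands of entries):

```python
# Hypothetical sketch of the two customizable inputs. All entries here
# are illustrative, not the paper's actual building-block catalog or
# template library.

from dataclasses import dataclass

@dataclass(frozen=True)
class ReactionTemplate:
    name: str
    smarts: str        # reaction SMARTS describing how reactants combine
    n_reactants: int   # uni- or bi-molecular

# A group swaps in its own purchasable catalog here...
building_blocks = [
    "CCO",        # ethanol (illustrative)
    "c1ccccc1N",  # aniline (illustrative)
]

# ...and its own preferred chemistry here.
reaction_templates = [
    ReactionTemplate("amide_coupling",
                     "[C:1](=O)[OH].[N:2]>>[C:1](=O)[N:2]", 2),
]
```

Everything downstream (plan generation, model training, decoding) is defined relative to these two lists, which is what makes them easy to swap out.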


The first step of their method is model training. They use the building blocks and reaction templates to randomly generate a bunch of synthetic plans and then train models to predict each step in those plans (e.g. which reaction template and/or building block will be used).
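The data-generation step above can be sketched in a few lines. This is a toy version: real plans operate on molecules via reaction templates, while here strings stand in for both, and `random_plan` is a hypothetical name:

```python
# Toy sketch of training-data generation: randomly chain reaction
# templates over a building-block pool to produce synthetic plans,
# then record each step as a (state, action) training example.
# All names are hypothetical stand-ins for the paper's procedure.

import random

def random_plan(building_blocks, templates, max_steps=3, rng=None):
    rng = rng or random.Random(0)
    mol = rng.choice(building_blocks)          # start from a random block
    plan = []
    for _ in range(rng.randint(1, max_steps)):
        template = rng.choice(templates)
        partner = rng.choice(building_blocks)  # second reactant, if needed
        plan.append({"state": mol, "template": template, "block": partner})
        mol = f"{template}({mol},{partner})"   # stand-in for the product
    return mol, plan

target, plan = random_plan(["A", "B", "C"], ["amide", "suzuki"])
# Each recorded step becomes one training example: given the state,
# predict which template and building block come next.
```

The final `target` from each random plan then serves as the "goal" molecule the step-prediction models are conditioned on during training.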


The trick is that the models have access not only to the inputs at each step but also to an embedding of the target molecule.


So, with a representation of what has been synthesized so far, as well as a representation of where the synthesis should go, the model has, at least in theory, what it needs to predict the next reaction step.
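A minimal sketch of this conditioning idea: every step predictor sees both an embedding of the partial synthesis and an embedding of the target. The fixed-size vectors, the concatenation, and the scoring "heads" below are placeholders for the paper's learned networks:

```python
# Sketch: a step predictor conditioned on both the current state and
# the target molecule. The lambda "heads" are toy stand-ins for a
# learned network mapping features to template scores.

def predict_next_template(state_emb, target_emb, template_scores):
    features = state_emb + target_emb          # concatenate the two views
    scores = {name: fn(features) for name, fn in template_scores.items()}
    return max(scores, key=scores.get)         # greedy choice

# Toy usage: two fake heads scoring a 4-dim concatenated feature.
heads = {
    "amide_coupling": lambda f: sum(f),
    "suzuki": lambda f: -sum(f),
}
choice = predict_next_template([0.2, 0.3], [0.4, 0.1], heads)
# sum of features is 1.0 > -1.0, so the greedy pick is "amide_coupling"
```

The key design point is simply that the target embedding is an input at every step, so the model can steer each reaction choice toward the goal.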


Once the model is trained, it can be run on a set of target compounds to find synthetic plans for those compounds. Sometimes the model will succeed, and sometimes it will fail.


But this failure is actually a feature, not a bug. The compounds that are generated instead will tend to be analogous to the target, since they are close to it in embedding space, and they will also be synthesizable, since only the supplied building blocks and reaction templates were used.
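One way to see why "failure" produces useful analogs: decoding is constrained to molecules reachable from the building blocks and templates, so its net effect is roughly a nearest-neighbor search in embedding space over the reachable set. This toy sketch (the embeddings, names, and the explicit reachable set are all placeholders; the real decoding is stepwise) captures that intuition:

```python
# Sketch: when the exact target is unreachable, constrained decoding
# returns the reachable molecule whose embedding is closest to the
# target's. The reachable set and embeddings are toy placeholders.

def decode_analog(target_emb, reachable):
    # reachable: {molecule_name: embedding}
    def dist(emb):
        return sum((a - b) ** 2 for a, b in zip(emb, target_emb))
    return min(reachable, key=lambda m: dist(reachable[m]))

reachable = {"analog_1": [0.9, 0.1], "analog_2": [0.2, 0.8]}
best = decode_analog([1.0, 0.0], reachable)  # analog_1 is closest
```

Synthesizability comes for free, because every member of the reachable set was built only from the supplied blocks and templates.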


The final step is to layer in an optimization algorithm like a genetic algorithm. The procedure above is used to generate a bunch of analogs, which are then scored according to whatever metrics are of interest (e.g. docking, ML property prediction, MPO, etc.).


Each analog is represented by its embedding, and new embeddings are created via mating and mutating the embeddings of the best molecules generated. These new embeddings can then be used to guide the synthesis of new molecules, and the whole process repeats until convergence.
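The outer loop can be sketched as a standard genetic algorithm over embedding vectors. Here `synthesize` and `score` are hypothetical stand-ins for the constrained decoding procedure and the property oracle, and the crossover/mutation operators are generic choices, not necessarily the paper's:

```python
# Toy sketch of the outer GA loop: score analogs, keep the best
# embeddings, cross them over and mutate, then decode the children
# into the next round of molecules. All names are illustrative.

import random

def evolve(embeddings, synthesize, score, n_generations=5, rng=None):
    rng = rng or random.Random(0)
    for _ in range(n_generations):
        ranked = sorted(embeddings, key=lambda e: score(synthesize(e)),
                        reverse=True)
        elite = ranked[: max(2, len(ranked) // 2)]       # keep the best
        children = []
        while len(children) < len(embeddings):
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, len(a))               # one-point crossover
            child = a[:cut] + b[cut:]
            child = [x + rng.gauss(0, 0.05) for x in child]  # mutation
            children.append(child)
        embeddings = children
    return max(embeddings, key=lambda e: score(synthesize(e)))

# Toy objective: prefer embeddings whose coordinates sum high.
best = evolve([[0.1, 0.2], [0.5, 0.4], [0.3, 0.9]],
              synthesize=lambda e: e, score=sum)
```

The appealing property is that every candidate the GA ever evaluates went through the synthesis model, so the whole search stays inside synthesizable space by construction.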
