Supervised Machine Learning
We develop state-of-the-art deep-learning-based model architectures to predict important molecular properties for lead discovery and optimization. We use large-scale distributed training and transfer learning to build special-purpose models.
In-House Data Generation via Active Learning
Data from discovery programs is not as good as data for discovery programs. We use a range of statistical methods, including active learning, to build high-quality in-house datasets for bioactivity, ADMET, and PK.
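To make the active-learning idea concrete, here is a minimal sketch of one common strategy, query by committee: several models predict a property for each unlabeled compound, and the compounds on which they disagree most are measured next. This is an illustration only, not a description of any production pipeline. The compound IDs, the predicted values, and the function name `select_for_assay` are all hypothetical.

```python
from statistics import pvariance


def select_for_assay(committee_predictions, batch_size):
    """Return the IDs of the compounds whose predictions disagree most.

    committee_predictions maps a compound ID to a list of predicted values,
    one per model in the committee (hypothetical data layout).
    """
    # Disagreement is measured as the population variance of the predictions.
    disagreement = {
        compound_id: pvariance(preds)
        for compound_id, preds in committee_predictions.items()
    }
    # Highest-variance compounds first; ties are broken by ID for determinism.
    ranked = sorted(disagreement, key=lambda cid: (-disagreement[cid], cid))
    return ranked[:batch_size]


# Made-up predicted pIC50 values from three models for four unlabeled compounds.
pool = {
    "cmpd-001": [6.1, 6.0, 6.2],
    "cmpd-002": [4.5, 7.5, 6.0],
    "cmpd-003": [5.0, 5.1, 5.0],
    "cmpd-004": [7.0, 5.5, 6.4],
}
next_batch = select_for_assay(pool, batch_size=2)
print(next_batch)  # ['cmpd-002', 'cmpd-004']
```

Measuring the compounds where the models disagree tends to add more information per assay than measuring compounds the models already agree on, which is the motivation for generating data this way rather than relying only on historical program data.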
Billion-Scale Virtual Library Construction
We regularly enumerate and screen virtual libraries of millions to billions of compounds for lead optimization. We use state-of-the-art retrosynthesis and reaction planning methods to accelerate preclinical discovery for both early- and late-stage compounds.
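As a toy illustration of why enumerated libraries grow so quickly, the sketch below combines two small sets of building blocks into amide products by joining SMILES fragments as text. It is a simplification under stated assumptions: the fragment lists and the function name `enumerate_amides` are hypothetical, and plain string concatenation stands in for the reaction templates a real cheminformatics toolkit would apply.

```python
from itertools import islice, product


def enumerate_amides(acyl_fragments, amine_fragments):
    """Lazily yield one product SMILES per (acyl, amine) pair."""
    for acyl, amine in product(acyl_fragments, amine_fragments):
        # Joining "...C(=O)" to "N..." writes the amide bond as plain text.
        yield acyl + amine


# Hypothetical building blocks, written so that concatenation gives valid SMILES.
acyl_fragments = ["CC(=O)", "c1ccccc1C(=O)", "OCC(=O)"]
amine_fragments = ["NC", "NCC", "N1CCCCC1"]

# The library size is the product of the building-block counts.
library_size = len(acyl_fragments) * len(amine_fragments)

# The generator is lazy, so a slice can be inspected without building everything.
first_three = list(islice(enumerate_amides(acyl_fragments, amine_fragments), 3))
print(library_size, first_three)
```

Because the size is multiplicative, a few thousand building blocks per position already yields millions of products for a two-component reaction and billions for a three-component one. Generating lazily keeps memory use flat while candidates are streamed to a screening step.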
Massive-Scale Compute
We are a cloud-first company. Our data scientists can rapidly train models on thousands of machines, and our medicinal chemists can enumerate, search, and prioritize millions of compounds every week, allowing us to iterate incredibly quickly.
Representation and Visualization
We invest heavily in building rich representations for our data. From string-based SMILES methods to custom 4D QM-based representations, we use a diverse range of featurizations for small molecules and proteins. We focus on using interpretable representations to inform decision-making.
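At the simple end of that range, a string-based featurization can be as small as the sketch below, which splits a SMILES string into tokens and counts them against a fixed vocabulary. This is a minimal example, not a full SMILES parser: the regular expression handles bracket atoms and the two-letter halogens Cl and Br but ignores other multi-character tokens, and the vocabulary and function names are hypothetical.

```python
import re
from collections import Counter

# Bracket atoms and two-letter halogens are matched before single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens (simplified)."""
    return SMILES_TOKEN.findall(smiles)


def count_features(smiles, vocabulary):
    """Return one token count per vocabulary entry, in vocabulary order."""
    counts = Counter(tokenize(smiles))
    return [counts[token] for token in vocabulary]


# Hypothetical vocabulary; a real one would be built from the training set.
vocabulary = ["C", "c", "N", "O", "Cl", "=", "(", ")", "1"]

smiles = "CC(=O)Nc1ccc(Cl)cc1"
tokens = tokenize(smiles)
features = count_features(smiles, vocabulary)
print(features)  # [2, 6, 1, 1, 1, 1, 2, 2, 2]
```

A count vector like this is easy to interpret, since each position maps to a named token, but it discards connectivity. That trade-off is one reason to keep richer representations alongside it.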
Algorithmic Molecular Generation
We use computational techniques, from genetic algorithms to continuous latent models, to generate new ideas for molecules. We place a strong emphasis on generating molecules that will be synthesizable, stable, and optimized against multiple criteria.
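The genetic-algorithm end of that spectrum can be sketched in a few lines. In the toy example below, fixed-length bit strings stand in for molecules and a made-up two-term score stands in for multi-criteria optimization. Every name and number is hypothetical, and a real generator would operate on molecular graphs or SMILES and score candidates with property models and synthesizability filters.

```python
import random


def score(candidate):
    """Toy multi-criteria score: reward set bits, penalize adjacent set bits."""
    potency_proxy = sum(candidate)
    liability_proxy = sum(a and b for a, b in zip(candidate, candidate[1:]))
    return potency_proxy - 2 * liability_proxy


def evolve(population_size=20, length=12, generations=30, seed=0):
    """Run a small genetic algorithm and return the best candidate and history."""
    rng = random.Random(seed)
    population = [
        tuple(rng.randint(0, 1) for _ in range(length))
        for _ in range(population_size)
    ]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        best_per_generation.append(score(population[0]))
        parents = population[: population_size // 2]  # truncation selection
        children = [population[0]]  # elitism: keep the current best unchanged
        while len(children) < population_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = list(mother[:cut] + father[cut:])
            site = rng.randrange(length)  # one bit-flip mutation per child
            child[site] = 1 - child[site]
            children.append(tuple(child))
        population = children
    best = max(population, key=score)
    return best, best_per_generation


best, history = evolve()
print(score(best), history[0], history[-1])
```

Because the best candidate is copied unchanged into each new generation, the best score never decreases from one generation to the next. Combining the criteria in a single weighted score is the simplest option; Pareto-based selection is a common alternative when the criteria should not be traded off at a fixed rate.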