A Roivant Sciences spinout says it has built what is, to its knowledge, the first AI model that predicts biomolecule structures and generates new molecules at the same time.
The diffusion-based model, called Neo-1, can generate new molecular glues, or small molecules that get two proteins to stick together, VantAI’s leaders exclusively told Endpoints News. The biotech will publish a 6,000-word blog post Friday describing the model’s performance and broader potential in drug discovery. All told, VantAI sees Neo-1 as going beyond the capabilities of AlphaFold 3, the state-of-the-art model developed by Google DeepMind and Isomorphic Labs, especially to help make new glues.
“It tackles the big next step after AlphaFold 3,” VantAI’s chief technology officer Luca Naef said in an interview. “What we’re really excited about is unlocking the next chapter of this modality, so not just degradation but going beyond.”
Neo-1’s unveiling is the most substantive update yet on what VantAI has been working on since its 2019 founding. The startup has previously announced research partnerships to discover new molecular glues with Johnson & Johnson, Bristol Myers Squibb and Blueprint Medicines. Neo-1 could allow VantAI to stand out in an increasingly crowded glue space, since it can generate new candidates from scratch.
Roivant and the Korean conglomerate SK Group hold equity in VantAI, according to regulatory filings. VantAI now has about 50 employees with offices in New York and Zurich, as well as a data-generation facility in Berlin, CEO Zachary Carpenter said.
Neo-1’s release figures as a key part of VantAI’s pitch to potential investors. The ability to predict a structure and generate a new molecule at the same time may crack a unique, thorny problem related to finding glues.
Glues work by linking two proteins in a cell, creating a biomolecular complex — the two proteins and the linking glue molecule — that doesn’t otherwise form. Today’s leading AI models, such as AlphaFold 3 and similar open-source versions like Boltz-1, struggle with these complicated structures, particularly because there are so few examples of them in datasets like the Protein Data Bank.
Neo-1, by contrast, can take a mix of sequences and structures of biomolecules as an input to predict the overall structure and generate a new molecule that fits that prediction.
“It’s impossible to generate something if you don’t know the molecule that will form it,” Carpenter said. “Having a model that can take two sequences as input and generate that molecule unlocks a lot of potential in this space, and that’s exactly why this model was built.”
VantAI is betting that Neo-1 can deliver a whole slew of new glues. The broader field uses the same group of 10 or 15 glues, Carpenter said, because it’s been so difficult to discover or create new ones.
VantAI’s own biological dataset, called NeoLink, was key in training the model. The idea somewhat mirrors shotgun sequencing, where DNA fragments are reassembled into a picture of the genome. In this case, VantAI tested massive numbers of chemical linkers and proteins, creating a training dataset of its own fragments of proteins and molecules that only holds value for AI models, rather than human researchers.
“Five years ago, this data was rather useless,” Naef said. “Now we have folding models where we can put this as input and use it to reconstruct the complex in cases where this is not possible without any data.”
Michael Bronstein, a leading figure in AI bio who is also chief scientist-in-residence of VantAI, called NeoLink a “very good example of blackbox data.” Bronstein and Naef wrote a 10,000-plus word treatise last year on the future of blackbox data, or datasets that are not interpretable or valuable to human eyes.
A dataset like NeoLink “makes sense only in conjunction with the appropriate machine-learning model,” Bronstein said.
For now, Neo-1 and its results are being released only in a technical blog post. Carpenter and Naef said a richer technical paper is on the horizon. They are also considering releasing a portion of the NeoLink dataset for academic research.
The blog post acknowledges that the results it describes come from retrospective tests of the model, and the team plans to share prospective and experimentally validated results in the future.