10 Protein Models You Should Know

Protein modeling has been transformed over the past few years. What began as a longstanding challenge in structural biology, predicting how a linear sequence of amino acids folds into a three-dimensional shape, has grown into an ecosystem of models for structure prediction, sequence design, and generative biology.

The turning point came in 2020, when AlphaFold2 delivered near-experimental accuracy in protein structure prediction for the first time at CASP14. Since then, follow-ups from academia, tech companies, and biotech startups have pushed the field into new territory. Below are 10 models that mark the most important milestones in this story.

1. AlphaFold

The story began with AlphaFold1 at CASP13 in 2018, the first deep learning model to surpass traditional structure prediction methods. It set the stage for AlphaFold2 in 2020, which stunned the field by reaching near-experimental accuracy in CASP14 and effectively solving the folding problem for many proteins. Most recently, AlphaFold3 (2024) extended the framework to complexes, ligands, and nucleic acids. That marked a decisive shift from structure prediction toward modeling molecular interactions and firmly established AlphaFold as the turning point in modern computational biology.

2. OpenFold

OpenFold is a community-driven, trainable reimplementation of AlphaFold2 in PyTorch. It was developed to provide transparency, reproducibility, and modularity, making it easier for researchers to extend or retrain the model. DeepMind open-sourced AlphaFold2’s inference code and weights but not its training code, so OpenFold has become a standard reference for groups wanting AlphaFold-like performance with a pipeline they can train and modify themselves.

3. RoseTTAFold

Released by the Baker Lab soon after AlphaFold2, RoseTTAFold provided an independent, open-source architecture with a three-track design that integrated sequence, distance, and coordinate information. It not only democratized high-accuracy folding but also laid the groundwork for later Baker Lab projects, including RoseTTAFold All-Atom and RoseTTAFoldNA for protein–nucleic acid complexes.
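One way to picture the three tracks is as three views of the same protein: a 1D sequence, a 2D residue-pair map, and 3D coordinates. The sketch below builds toy versions of each. The four-residue sequence, the coordinates, and the function name are made up for illustration, and RoseTTAFold itself learns and exchanges these representations with neural network layers rather than computing them from a known structure.

```python
import math

# 1D track: one entry per residue (toy sequence, not a real protein).
sequence = "MKTA"

# 3D track: one (x, y, z) position per residue, in angstroms (made-up values).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

def distance_map(points):
    """2D track: pairwise Euclidean distances between all residue positions."""
    return [[math.dist(p, q) for q in points] for p in points]

pair_distances = distance_map(coords)

# Each track describes the same L residues at a different dimensionality.
print(len(sequence), len(pair_distances), len(pair_distances[0]), len(coords))
```

The point of the sketch is only the shapes: L entries for the sequence, L x L for the pair map, and L x 3 for the coordinates.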

4. RFdiffusion

A landmark in protein design, RFdiffusion used diffusion models (familiar from image generation) to create new protein backbones rather than just predict existing ones. It allowed researchers to generate proteins with new topologies and functions. RFdiffusion2 (2025) extended the approach to scaffolding enzyme active sites specified at the atomic level, making generative protein design more practical for real biotech applications.
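To make the diffusion idea concrete, here is a toy sketch of the forward (noising) half of a DDPM-style process applied to C-alpha coordinates. It is not RFdiffusion's code: the linear beta schedule, the function names, and the four-residue backbone are assumptions chosen for illustration. The real model also diffuses residue orientations and, crucially, trains a neural network to run the process in reverse, turning noise into a plausible backbone.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (values assumed for illustration)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) up to and including step t (0-indexed)."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod

def noise_backbone(coords, betas, t, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_bar) * x_0, (1 - a_bar) * I)."""
    a_bar = alpha_bar(betas, t)
    signal = math.sqrt(a_bar)
    sigma = math.sqrt(1.0 - a_bar)
    return [
        tuple(signal * c + sigma * rng.gauss(0.0, 1.0) for c in xyz)
        for xyz in coords
    ]

# Toy straight-line C-alpha trace with 3.8 angstrom spacing (made-up values).
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
betas = linear_beta_schedule(1000)
rng = random.Random(0)

slightly_noised = noise_backbone(backbone, betas, 0, rng)    # almost unchanged
heavily_noised = noise_backbone(backbone, betas, 999, rng)   # close to pure noise
```

At the first step the coordinates barely move; by the last step almost none of the original signal remains. Generation starts from that noisy end and denoises step by step.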

5. ProteinMPNN

ProteinMPNN is a graph neural network for sequence design: given a backbone structure, it efficiently generates compatible amino acid sequences. It quickly became a core tool in many design pipelines because of its speed and reliability. Its extension, LigandMPNN, adapts the same principles for protein–ligand interactions, moving closer to rational drug design.

6. ESM

Meta’s FAIR team released the ESM family of protein language models, trained on massive sequence datasets. ESMFold extended these models to predict structures directly from a single sequence, without a multiple sequence alignment, offering extremely fast inference. In 2023, the team left Meta to found a startup, EvolutionaryScale (ESM itself stands for Evolutionary Scale Modeling), to develop next-generation models with drug discovery applications in mind.

7. ProtTrans

The ProtTrans paper (2020) was a landmark in applying large transformer architectures to protein sequences. It introduced several models, including ProtBERT, ProtT5, and ProtXLNet, trained on the UniRef and BFD databases, the largest of which contains billions of sequences. These models provided embeddings useful for a wide range of downstream tasks, such as function prediction, mutation effects, and enzyme classification, and remain heavily used in bioinformatics pipelines.
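As a small illustration of how sequences are typically prepared before embedding: the published ProtBERT and ProtT5 checkpoints expect residues separated by spaces, with the rare or ambiguous residues U, Z, O, and B mapped to X. The helper name below is hypothetical, and the sketch covers only that preprocessing step; tokenization and embedding extraction would follow with the model's own tokenizer.

```python
import re

def prepare_for_prottrans(sequence: str) -> str:
    """Format a raw amino acid string the way ProtTrans checkpoints expect."""
    # Drop whitespace (e.g. line breaks from a FASTA record) and normalize case.
    cleaned = re.sub(r"\s+", "", sequence).upper()
    # Map rare or ambiguous residues to the unknown token X.
    cleaned = re.sub(r"[UZOB]", "X", cleaned)
    # Separate residues with spaces so each one becomes its own token.
    return " ".join(cleaned)

example = prepare_for_prottrans("MKTUZAOB")
print(example)  # M K T X X A X X
```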

8. ProtGPT2

Inspired by natural language generation, ProtGPT2 was one of the first attempts to generate entirely new protein sequences from scratch using autoregressive transformers. It highlighted the possibility of exploring vast “dark” regions of protein space beyond naturally occurring sequences, raising intriguing questions about how generative AI might invent new biomolecules.
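Autoregressive generation means building a sequence one token at a time, with each new token sampled from a distribution conditioned on everything generated so far. The sketch below shows that loop with a hand-written stand-in for the model. The weighting function is hypothetical and has no biological meaning; ProtGPT2 itself uses a trained transformer over subword tokens, not single residues.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_next_residue_weights(prefix):
    """Hypothetical stand-in for a trained model's next-token distribution.

    It only down-weights the two most recent residues; a real model would
    compute these weights from the whole prefix with a neural network.
    """
    recent = prefix[-2:]
    return [0.2 if aa in recent else 1.0 for aa in AMINO_ACIDS]

def sample_sequence(length, rng, start="M"):
    """Sample residues one at a time, each conditioned on the current prefix."""
    sequence = start
    while len(sequence) < length:
        weights = toy_next_residue_weights(sequence)
        sequence += rng.choices(AMINO_ACIDS, weights=weights, k=1)[0]
    return sequence

generated = sample_sequence(30, random.Random(42))
print(generated)
```

Swapping the toy weighting function for a trained language model's output distribution gives the same loop that text generators use.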

9. Boltz

Boltz is an open-source family of biomolecular structure prediction models introduced by researchers at MIT. Boltz-1 (2024) approached AlphaFold3-level accuracy on biomolecular complexes with openly released code and weights, and Boltz-2 (2025) added binding affinity prediction alongside structure prediction. These models represent the push toward openly available tools for therapeutic applications such as drug discovery.

10. Chroma

Generate Biomedicines developed Chroma, a diffusion-based generative model designed to create proteins at scale. Chroma focuses on controllability, allowing researchers to bias generation toward desired functions or properties, and illustrates how biotech companies are moving beyond structure prediction into programmable biology.

Summary

From AlphaFold’s historic breakthrough to recent models such as RFdiffusion2 and Boltz-2, protein modeling has expanded from structure prediction into design and interaction modeling. The ecosystem now includes open-source community projects, academic breakthroughs, and industry-driven platforms aimed at drug discovery and synthetic biology. If the last five years are any indication, the next five will transform how we design enzymes, therapeutics, and even entire biological systems.