ESM-2

ESM-2 is a suite of large transformer protein language models from Meta's FAIR team, designed to learn structure and function from raw protein sequences at evolutionary scale. Trained on UniRef50 clusters and related datasets, ESM-2 models range from lightweight (t6/t12) to production-scale (t33_650M, t36_3B, t48_15B). The models produce per-residue and per-sequence embeddings, unsupervised attention maps that correlate with residue contacts, and representations that can be used directly or frozen as input to downstream predictors. ESM-2 and its companion ESMFold demonstrate that large protein language models can match or exceed prior single-sequence structure prediction methods and power end-to-end folding from sequence.

Capabilities include: extracting high-dimensional per-token and pooled sequence embeddings for downstream supervised learning; unsupervised contact prediction derived from attention patterns; high-throughput structure prediction when combined with the ESMFold structure module; and support for design and variant analyses, either directly (zero-shot scoring) or via adjacent models (e.g., ESM-IF1 for inverse folding). Pretrained checkpoints are provided at multiple scales (e.g., esm2_t33_650M_UR50D, esm2_t36_3B_UR50D, esm2_t48_15B_UR50D) so users can trade off accuracy against compute. The repository also ships models for related tasks (ESM-1v for zero-shot variant effects, ESM-IF1 for inverse folding, the MSA Transformer) and utilities for bulk embedding extraction, contact prediction, and structure batching.

The project provides practical tooling and integration paths for research and production: a pip-installable package (fair-esm), PyTorch Hub helpers (torch.hub.load), and Hugging Face Transformers wrappers for standardized inference. Command-line utilities are included (esm-extract to compute embeddings in bulk from FASTA; esm-fold to predict structures from FASTA in batch), along with example scripts and notebooks demonstrating variant prediction, contact prediction, inverse-folding sampling and scoring, and supervised fine-tuning on embeddings. For structure prediction, ESMFold is available as a packaged model and is also integrated into ColabFold and the ESM Metagenomic Atlas (esmatlas.com), which exposes an API and bulk downloads of predicted structures. For large-model inference on limited hardware, the repository documents CPU offloading and Fully Sharded Data Parallel (FSDP) approaches using fairscale, so a single GPU can run larger checkpoints.

Typical use-cases:
1. Rapid structure prediction for uncharacterized sequences or metagenomic ORFs using ESMFold or the Atlas API.
2. Feature extraction: compute per-residue and pooled embeddings to train small downstream classifiers for activity, localization, or stability with limited labels (see the sketch after this list).
3. Variant effect and mutational scanning: use zero-shot scoring protocols or supervised models built on ESM embeddings to prioritize mutations.
4. Protein design: combine ESM language models and ESMFold/ESM-IF1 for de novo sequence design, conditional sampling, or fixed-backbone sequence generation and scoring.
5. Contact and interaction inference: exploit attention-derived contact maps for structural hypotheses or to augment docking pipelines.

These workflows are supported by example notebooks and scripts in the repository to accelerate adoption.
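For the feature-extraction workflow, the snippet below is a minimal sketch following the repository's documented fair-esm quickstart pattern: load a checkpoint, tokenize a (label, sequence) pair, and pull per-residue embeddings plus an attention-derived contact map. The label and sequence are placeholders.

```python
import torch
import esm

# Load a mid-sized ESM-2 checkpoint and its alphabet (weights download on first use).
model, alphabet = esm.pretrained.esm2_t33_650M_UR50D()
batch_converter = alphabet.get_batch_converter()
model.eval()  # disable dropout for deterministic inference

# Placeholder input: a list of (label, sequence) tuples.
data = [
    ("example_protein", "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"),
]
batch_labels, batch_strs, batch_tokens = batch_converter(data)

with torch.no_grad():
    # repr_layers=[33] requests the final-layer representations of this 33-layer model;
    # return_contacts=True additionally returns attention-derived contact maps.
    results = model(batch_tokens, repr_layers=[33], return_contacts=True)

# Per-residue embeddings (positions include BOS/EOS special tokens).
token_representations = results["representations"][33]

# Mean-pool over the real residues (positions 1..len) to get one vector per sequence,
# usable as features for a small downstream classifier.
sequence_embedding = token_representations[0, 1 : len(batch_strs[0]) + 1].mean(0)

# Attention-derived contact map for the first sequence.
contacts = results["contacts"][0]
print(sequence_embedding.shape, contacts.shape)
```

The same per-token and mean-pooled embeddings can be produced in bulk with the esm-extract command documented in the repository, e.g. `esm-extract esm2_t33_650M_UR50D proteins.fasta output_dir/ --repr_layers 33 --include mean per_tok`.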
Operational notes and licensing: ESM-2 requires PyTorch, and some workflows need GPU acceleration (especially ESMFold with its OpenFold dependencies). Installing ESMFold carries extra requirements (OpenFold and an nvcc-compatible CUDA toolchain), while many embedding and smaller-model tasks run on a CPU or a modest GPU. The larger checkpoints (3B–15B parameters) benefit from FSDP/sharding or CPU offload; prototyping is easier with the 150M–650M models. The codebase is MIT licensed; the ESM Metagenomic Atlas data is available under CC BY 4.0, subject to Meta's open-source terms for reuse. The repository includes thorough examples, model download links, and citations to the foundational papers (ESM/ESM-2/ESMFold/ESM-IF1) to guide reproducible research and responsible use.
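Once ESMFold and its OpenFold dependencies are installed, single-sequence structure prediction with the packaged model follows the pattern below. This is a minimal sketch of the repository's documented usage, assuming a CUDA GPU is available; the sequence is a placeholder, and set_chunk_size is the memory-saving knob for smaller GPUs.

```python
import torch
import esm

# Load ESMFold (requires the optional esmfold dependencies, including OpenFold).
model = esm.pretrained.esmfold_v1()
model = model.eval().cuda()

# Optional: chunk the axial attention to lower peak memory on smaller GPUs
# (smaller chunks trade speed for memory).
model.set_chunk_size(128)

sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"  # placeholder

with torch.no_grad():
    pdb_string = model.infer_pdb(sequence)  # predicted structure as PDB-format text

with open("prediction.pdb", "w") as handle:
    handle.write(pdb_string)
```

Batch prediction from a FASTA file is handled by the esm-fold command-line tool mentioned above, and the FSDP/CPU-offload route documented in the repository applies when running the 3B and 15B language-model checkpoints on a single GPU.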