InterPro

InterPro is a curated, integrated database and analysis resource for protein sequence annotation. It aggregates predictive models (called signatures) from several member databases into a single searchable framework, allowing users to classify proteins into families, identify conserved domains, repeats and sites, and infer likely functions. The resource is maintained at EMBL-EBI, forms part of the ELIXIR infrastructure and the Global Biodata Coalition, and is widely used for automated annotation of genomes, proteomes and individual protein sequences.
At the core of InterPro are signature matches: patterns, profiles and HMM-based models from the member databases are mapped onto protein sequences. These matches are reconciled and curated into unified entries that describe families, domains and functional sites, and that are linked to controlled vocabularies such as Gene Ontology (GO) terms.
InterPro provides a web interface for interactive searching by protein, family, domain, keyword or GO term, as well as programmatic access via a JSON-based API with endpoints for entries, proteins, structures, taxonomy and proteomes. The resource is also discoverable through EBI Search, and visualisation components such as ProtVista/Nightingale are used to display sequence feature maps in the browser.
InterProScan is the accompanying software package for local batch annotation: it takes protein (and, in some cases, nucleotide) sequences and scans them against the InterPro signatures to produce integrated annotation for each input. It is designed for high-throughput use, typically the annotation of genomes, metagenomes or large proteomes, and runs on Linux systems. For many users, running InterProScan locally is the practical way to annotate private or large datasets prior to deposition or downstream analysis, while the web interface and API are convenient for single-sequence queries, ad hoc exploration and integration into web services and pipelines.
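As a rough illustration of consuming local InterProScan results, the sketch below parses tab-separated output and collects the InterPro entries matched by each protein. It assumes the commonly documented TSV column order (protein ID, MD5, length, analysis, signature accession, signature description, start, stop, score, status, date, then optional InterPro accession, InterPro description and GO terms). The `Match` type, the function names and the sample rows are invented for illustration, and the column layout should be checked against the InterProScan version in use.

```python
import csv
import io
from collections import defaultdict
from typing import Iterator, NamedTuple, Optional, Tuple


class Match(NamedTuple):
    """One signature match (hypothetical record type for this sketch)."""
    protein: str
    analysis: str            # member database, e.g. "Pfam"
    signature: str           # member database accession
    start: int
    end: int
    interpro: Optional[str]  # integrated InterPro entry, if any
    go_terms: Tuple[str, ...]


def parse_tsv(handle) -> Iterator[Match]:
    """Yield matches from InterProScan-style TSV (assumed column order)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 11:
            continue  # skip blank or malformed lines

        def optional(index: int) -> Optional[str]:
            # Trailing columns may be absent or filled with "-".
            if len(row) > index and row[index] not in ("", "-"):
                return row[index]
            return None

        go_field = optional(13)
        # Some versions append a source suffix such as "(InterPro)" to GO IDs.
        go_terms = tuple(t.split("(")[0] for t in go_field.split("|")) if go_field else ()
        yield Match(
            protein=row[0],
            analysis=row[3],
            signature=row[4],
            start=int(row[6]),
            end=int(row[7]),
            interpro=optional(11),
            go_terms=go_terms,
        )


def entries_by_protein(matches) -> dict:
    """Group the set of integrated InterPro accessions by protein ID."""
    grouped = defaultdict(set)
    for match in matches:
        if match.interpro:
            grouped[match.protein].add(match.interpro)
    return dict(grouped)


# Invented sample rows, for illustration only (not real InterProScan output).
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ["seq1", "md5-placeholder", "300", "Pfam", "PF00069", "Protein kinase domain",
     "10", "250", "1.2E-50", "T", "01-01-2024", "IPR000719", "Protein kinase domain",
     "GO:0004672(InterPro)|GO:0005524(InterPro)", "-"],
    ["seq1", "md5-placeholder", "300", "MobiDBLite", "mobidb-lite",
     "consensus disorder prediction", "260", "295", "-", "T", "01-01-2024",
     "-", "-", "-", "-"],
    ["seq2", "md5-placeholder", "120", "Pfam", "PF00000", "Hypothetical signature",
     "5", "100", "3.0E-10", "T", "01-01-2024"],
])

if __name__ == "__main__":
    parsed = list(parse_tsv(io.StringIO(SAMPLE)))
    print(entries_by_protein(parsed))
```

Reading from a real file would replace `io.StringIO(SAMPLE)` with an open file handle. Matches without an integrated InterPro entry are kept in the parsed list but left out of the grouping.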
Typical use cases include rapid functional annotation of predicted proteins from genome assemblies, motif and domain detection to guide experimental design, and automated assignment of GO terms for downstream enrichment and pathway analysis. InterPro is also useful for variant interpretation: mapping mutations onto known domains and active sites helps predict potential functional impact. Researchers performing metagenomics or microbiome studies rely on InterPro to characterise the functional potential of community proteomes, and developers integrate InterPro results into annotation pipelines, knowledgebases and visualisation portals to provide consolidated protein-level insight.
InterPro's outputs are designed to slot into analysis workflows. The JSON API provides machine-readable endpoints for entries, proteins, structures, taxonomy and proteomes, enabling direct programmatic retrieval of entry definitions, protein matches, structural annotations and proteome summaries, and the public web pages link out to supporting evidence and cross-references. Visualisation libraries developed in the same ecosystem (the ProtVista and Nightingale components) allow feature-rich, interactive displays of domain architecture, sequence features and cross-links to structure models and literature, which helps with interpreting and presenting results.
The resource is actively curated and regularly updated; recent releases highlight AI-driven classification, intended to help keep pace with the growing protein sequence universe, along with website and data-interface enhancements for easier browsing. InterPro is distributed as part of EMBL-EBI's open-data portfolio and should be cited in publications using the resource (the InterPro team requests citation of their latest publication).
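To make the programmatic access described above more concrete, here is a minimal sketch that builds API URLs for the five endpoint types and fetches one as JSON. The base URL and the `endpoint/source/accession` path layout reflect the public API as commonly documented, but they, the `metadata` response key, and the helper names `build_url` and `fetch_json` are assumptions to verify against the current API documentation.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public InterPro API.
BASE_URL = "https://www.ebi.ac.uk/interpro/api"

# The five endpoint types named in the text.
ENDPOINTS = {"entry", "protein", "structure", "taxonomy", "proteome"}


def build_url(endpoint: str, source: Optional[str] = None,
              accession: Optional[str] = None) -> str:
    """Build an API URL such as .../entry/interpro/IPR000719/ (assumed layout)."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    if accession and not source:
        raise ValueError("an accession requires a source database")
    parts = [endpoint]
    parts += [quote(part, safe="") for part in (source, accession) if part]
    return f"{BASE_URL}/{'/'.join(parts)}/"


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Fetch a URL and decode the JSON body (no retries or paging handled)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    url = build_url("entry", "interpro", "IPR000719")
    data = fetch_json(url)  # requires network access
    # "metadata" is the assumed top-level key for single-record responses.
    print(url, data.get("metadata", {}).get("accession"))
```

Keeping URL construction separate from the network call makes the path logic easy to check offline; a production client would also want error handling, rate limiting and support for paginated list responses.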
Whether you need to annotate a single protein, classify a whole proteome, or embed trusted functional annotation into a bioinformatics pipeline, InterPro provides integrated, maintained and interoperable protein functional annotation backed by a large community of contributing databases and curators.
