ChEMBL

ChEMBL is a manually curated, open-access database of bioactive, drug-like small molecules and their measured activities against biological targets. It is maintained by EMBL-EBI and is recognised as an ELIXIR Core Data Resource and a Global Core Biodata Resource. ChEMBL aggregates chemical structures, assay and ADME/Tox data, target annotations, and cross-references to genomic and structural resources, making experimental pharmacology and medicinal chemistry knowledge findable and reusable under permissive data licences. The database covers multiple bioactivity data types and several decades of published results. It brings together primary literature, deposited screening datasets and chemistry extracted from patents to give a broad view of compound–target relationships.

ChEMBL also provides thematic portals and subresources for particular user needs. Examples include ChEMBL-NTD, a dedicated repository for rapid deposition and distribution of neglected tropical disease screening and medicinal chemistry data, and a collection of patent-derived molecules from SureChEMBL. Some contributed datasets are published under CC0, while the core ChEMBL releases are distributed under a Creative Commons Attribution-ShareAlike licence. Individual depositions typically include citation guidance and attribution information.

ChEMBL is designed for both human and programmatic access. The project maintains RESTful web services and bulk downloads, so researchers can query compounds, assays, targets and activity values at scale, or build ChEMBL lookups into pipelines. Compounds can be queried by structure or by identifier (SMILES, InChI, InChIKey and other connectivity-based identifiers). The UniChem service provides large-scale, non-redundant cross-references between ChEMBL and other chemistry resources (e.g., PubChem, ChEBI, PDBe), enabling federated workflows and lookups across databases.
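The following minimal Python sketch shows programmatic access using only the standard library. It makes several assumptions. It uses the activity endpoint `https://www.ebi.ac.uk/chembl/api/data/activity.json` with its `target_chembl_id`, `standard_type` and `limit` query filters. It also expects paginated responses that carry records in an `activities` list and a `page_meta.next` link to the following page. These details reflect the public web services as we understand them, so check the current API documentation before relying on them. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed host and root path of the public ChEMBL REST web services.
BASE = "https://www.ebi.ac.uk"
API_ROOT = BASE + "/chembl/api/data"


def activity_url(target_chembl_id, standard_type="IC50", limit=100):
    """Build a JSON query URL for activity records measured against one target."""
    params = {
        "target_chembl_id": target_chembl_id,
        "standard_type": standard_type,
        "limit": limit,
    }
    return f"{API_ROOT}/activity.json?{urllib.parse.urlencode(params)}"


def next_page_url(page):
    """Return the absolute URL of the next page, or None on the last page.

    Assumes a 'page_meta' object whose 'next' field is a server-relative
    path or null; urljoin also leaves an absolute URL unchanged.
    """
    nxt = page.get("page_meta", {}).get("next")
    return urllib.parse.urljoin(BASE, nxt) if nxt else None


def fetch_activities(target_chembl_id, standard_type="IC50", max_pages=5):
    """Yield activity records page by page (this function performs network requests)."""
    url = activity_url(target_chembl_id, standard_type)
    for _ in range(max_pages):
        with urllib.request.urlopen(url, timeout=30) as resp:
            page = json.load(resp)
        yield from page.get("activities", [])
        url = next_page_url(page)
        if url is None:
            break
```

For example, `list(fetch_activities("CHEMBL203"))` would collect up to five pages of IC50 records for that target ID, which is used here purely as an example. Capping `max_pages` keeps an exploratory query from walking a very large result set.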
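UniChem performs its cross-referencing server-side. As a much cruder local illustration of connectivity-level matching, the sketch below groups records from different sources by the first block of the standard InChIKey. That block hashes the connectivity (skeleton) layer of the InChI, so stereoisomers and isotopologues that differ only in later layers fall into the same group. This is not UniChem's algorithm. The `(source, identifier, inchikey)` record layout and the function names are hypothetical. Hash collisions in the first block are rare but possible, so treat each group as a set of candidates rather than confirmed matches.

```python
import re
from collections import defaultdict

# Standard InChIKey layout: 14-letter connectivity hash, a 10-letter block
# (8-letter hash of the remaining layers + standard flag + version letter),
# then a single protonation letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def connectivity_block(inchikey):
    """Return the first (connectivity) block of a standard InChIKey."""
    key = inchikey.strip().upper()
    if not INCHIKEY_RE.match(key):
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return key.split("-")[0]


def group_by_connectivity(records):
    """Group (source, identifier, inchikey) records that share a skeleton hash."""
    groups = defaultdict(list)
    for source, ident, key in records:
        groups[connectivity_block(key)].append((source, ident))
    return dict(groups)
```

For example, a ChEMBL record for CHEMBL25 (InChIKey `BSYNRYMUTXBXSQ-UHFFFAOYSA-N`) and an internal record with the same key would land in one group keyed by `BSYNRYMUTXBXSQ`.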
The web interfaces include search and report pages for compounds, assays, targets and ATC classifications, while the programmatic endpoints return structured JSON for downstream analysis.

Typical use cases for ChEMBL include:
- target identification and validation, by aggregating bioactivity evidence across assays and species;
- SAR and medicinal chemistry work, by inspecting compound series, activity cliffs and potency trends;
- hit triage and prioritisation using assay metadata and ADME/Tox flags;
- drug repurposing searches based on multi-target profiles and clinical annotations;
- open drug discovery for neglected diseases through ChEMBL-NTD datasets, such as curated screening collections, Pathogen Box results and other community-contributed screening hits.

Academic teams, biotech R&D groups and computational chemists commonly combine ChEMBL data with internal screening results, docking outputs or machine-learning models to speed up hit-to-lead decisions.

Integration with external resources and community standards is a core strength of ChEMBL. UniChem maps identifiers across chemistry-centric resources, and ChEMBL target entries link protein targets to structural and genomic databases. The REST web services and downloadable releases make it straightforward to include ChEMBL lookups in bioinformatics pipelines, cheminformatics toolchains and machine-learning training sets. Thematic portals and community collections also let specialist datasets appear quickly; for example, rapid-deposit NTD data do not have to wait for a full ChEMBL release.

Users should distinguish curated ChEMBL data from uncurated deposit repositories. ChEMBL's core releases are curated to harmonise assays and standardise activity values. Rapid-deposit resources such as ChEMBL-NTD, by contrast, may be supplied by depositors without additional curation and carry the depositor's own accuracy statement.
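Part of ChEMBL's activity standardisation is exposed as the pChEMBL value, the negative base-10 logarithm of a molar potency measurement such as IC50, EC50, Ki or Kd. For a value in nM, pX = -log10(value × 10^-9) = 9 - log10(value), so 100 nM gives 7.0.

The minimal sketch below performs this conversion for local filtering and per-compound summaries. It assumes records shaped like ChEMBL activity JSON, with the fields `standard_type`, `standard_relation`, `standard_value`, `standard_units`, `data_validity_comment` and `molecule_chembl_id`. It approximates the calculation rather than reproducing ChEMBL's own pChEMBL computation. Where a record already has a populated `pchembl_value`, prefer that. The function names and the type list are assumptions.

```python
import math
from statistics import median

# Activity types assumed to qualify for a pChEMBL-style value (check the current docs).
PCHEMBL_TYPES = {"IC50", "XC50", "EC50", "AC50", "Ki", "Kd", "Potency"}


def p_activity(record):
    """Return -log10(molar activity) for an exact nM measurement, else None."""
    if record.get("standard_type") not in PCHEMBL_TYPES:
        return None
    # Only exact values in nM are converted; censored values such as '>' are skipped.
    if record.get("standard_relation") != "=" or record.get("standard_units") != "nM":
        return None
    # Skip rows that carry a curator data-validity flag.
    if record.get("data_validity_comment"):
        return None
    value = record.get("standard_value")
    if value is None:
        return None
    value = float(value)  # JSON payloads may deliver numbers as strings
    if value <= 0:
        return None
    return 9.0 - math.log10(value)  # equivalent to -log10(value * 1e-9)


def median_potency(records):
    """Map each molecule_chembl_id to the median of its usable p-activity values."""
    by_mol = {}
    for rec in records:
        p = p_activity(rec)
        if p is not None:
            by_mol.setdefault(rec["molecule_chembl_id"], []).append(p)
    return {mol: round(median(ps), 2) for mol, ps in by_mol.items()}
```

A per-compound median limits the influence of a single outlying assay. Pooling different types such as IC50 and Ki in one summary is a simplification worth revisiting for careful SAR work.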
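For the SAR use case above, one common way to flag activity cliffs is the Structure-Activity Landscape Index (SALI). SALI is the absolute potency difference of a compound pair divided by (1 - similarity). The sketch below makes two assumptions. Fingerprints are computed elsewhere (for example with a cheminformatics toolkit) and passed in as sets of on-bit indices. Potencies are on a -log10 molar scale. The data layout and function names are hypothetical, and this is a generic analysis technique rather than a ChEMBL feature.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def sali_pairs(compounds, min_sim=0.0):
    """Rank compound pairs by SALI, highest (steepest cliff) first.

    compounds: dict mapping id -> (fingerprint_bits, p_activity).
    Pairs with identical fingerprints are skipped because SALI is undefined
    there; pairs below min_sim similarity are ignored.
    """
    scored = []
    for (i, (fi, ai)), (j, (fj, aj)) in combinations(compounds.items(), 2):
        sim = tanimoto(fi, fj)
        if sim >= 1.0 or sim < min_sim:
            continue
        scored.append((abs(ai - aj) / (1.0 - sim), i, j))
    return sorted(scored, reverse=True)
```

Because SALI grows without bound as similarity approaches 1, a `min_sim` threshold keeps the ranking focused on close analogues rather than on unrelated pairs with large potency gaps.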
The site and dataset pages provide citation guidance and terms of use; the developers encourage reuse, redistribution and contribution of value-added annotations back to the resource. For teams building drug-discovery workflows or mining public pharmacology, ChEMBL provides a comprehensive, well-connected starting point to assemble assay evidence, trace chemical provenance and integrate public screening data into reproducible analyses.

Links