biotite

Biotite is a Swiss-army knife for bioinformatics: a comprehensive, open-source Python package that gathers common tasks in sequence and structural bioinformatics behind a unified, high-performance API. It is designed to let users skip repetitive work (file parsing, data conversion, glue code) and focus on analysis and custom algorithms. The project is documented with tutorials and an extensive example gallery, and it is available on PyPI and conda-forge. If you use Biotite in a publication, the authors ask that you cite Kunzmann & Hamacher (BMC Bioinformatics, 2018).

At its core, Biotite stores most data as NumPy ndarrays, which gives users an intuitive, array-oriented interface along with the performance and interoperability of the scientific Python stack. This design allows direct, low-level access when you need it, while common operations are wrapped in higher-level objects (e.g., ProteinSequence, AtomArray).

For sequence analysis, Biotite supports any sequence alphabet, including nucleotide, amino-acid, and structural alphabets such as 3Di. It supplies fast, modular alignment routines, scoring matrices, and Matplotlib-based visualization helpers (alignment rendering, feature maps).

For structural work, Biotite handles both macromolecules and small molecules in a NumPy-like, sliceable structure representation. It implements a variety of analysis routines: coordinate transformations, filtering and selection, surface area and contact calculations, secondary-structure detection, disulfide and contact-site identification, and trajectory/ensemble support.

Biotite also emphasizes interoperability and input/output. It reads and writes many commonly used file formats (PDB, mmCIF/BinaryCIF, MOL/SDF, and multiple trajectory formats) and provides functions to construct, edit, and export objects in memory, without round-tripping through intermediate files on disk.
On the data side, Biotite can fetch and query records from public biological databases such as NCBI Entrez, UniProt, the PDB, and PubChem. Queries can be composed with Python's logical operators, so you rarely need to write raw REST calls.

Where Biotite's built-in functionality is not enough, the library provides interfaces to external software, for example for multiple sequence alignment or secondary-structure annotation. File creation and command-line execution are handled under the hood, so you pass in Python objects and get Python objects back. The project ecosystem also includes community-maintained bridges and helper packages for tools such as OpenMM, MDTraj, and PyMOL, which ease simulation preparation, trajectory analysis, and visualization.

Typical use cases range from small analysis scripts to components of larger bioinformatics software. Practical examples in the documentation show workflows such as downloading sequences from Entrez and aligning them with affine-gap scoring, creating structure-derived feature maps, detecting conserved regions across a protein family, computing per-residue solvent-accessible surface area, examining contact networks, and preparing input for docking or molecular dynamics. Because data are held in ndarrays, Biotite is especially well suited to pipelines that mix in custom NumPy/SciPy code, machine-learning preprocessing, or large-scale batch processing.

The project maintains an example gallery, an API reference, and a tutorial to help new users get started quickly, and it has an active community (GitHub, Discord) for support and contributions.
