Bio3D (R)

Bio3D is a comprehensive family of R packages for structural bioinformatics and for the comparative analysis of protein structures, sequences and molecular dynamics (MD) trajectories. Developed by Barry Grant, Xin-Qiu Yao, Lars Skjaerven and colleagues, it provides utilities to read, write and manipulate PDB files and trajectory data, to perform atom selection and superposition, and to apply multivariate and physical analyses such as principal component analysis (PCA), normal mode analysis (NMA) and ensemble-level NMA. The package builds on R’s statistical and graphical capabilities to support exploratory and reproducible workflows for investigating structure-dynamics relationships.

At its core, Bio3D supports the typical structure-processing tasks: reading PDB files with read.pdb (which accepts local files or RCSB IDs), inspecting the atom and coordinate data, and selecting subsets of atoms via atom.select.pdb (for example calpha, backbone, ligand or water). From these building blocks you can identify rigid cores, re-orient and superpose structures, compute distance matrices and torsion statistics, and derive PCA modes (pca.xyz) that describe the dominant conformational variation across an ensemble.

For dynamics, Bio3D provides basic trajectory handling as a starting point for MD analysis and integrates normal mode methods (nma), including ensemble NMA for comparing predicted flexibility across multiple structures or species. Higher-level analysis modules cover correlation and protein structure network analysis, ensemble difference distance matrix (eDDM) comparisons, and conservation analysis that links sequence variation to structural dynamics.

Typical use cases include comparing conformational states from experimental structures (X-ray/NMR) or MD simulations to identify conserved cores and flexible regions, mapping correlated motions and allosteric networks, and contrasting functional dynamics across homologous proteins using ensemble NMA.

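The building blocks described above (reading structures and trajectories, atom selection, superposition and PCA) can be sketched roughly as follows. This is a minimal illustration rather than a recommended protocol: it assumes bio3d is already installed, uses the HIV-1 protease example files (hivp.pdb and hivp.dcd) that ship with the package so that no network access is needed, and the variable names are purely illustrative.

```r
library(bio3d)

# Example files bundled with bio3d (assumed to be present in the
# installed package's "examples" directory).
pdb_file <- system.file("examples/hivp.pdb", package = "bio3d")
dcd_file <- system.file("examples/hivp.dcd", package = "bio3d")

pdb <- read.pdb(pdb_file)   # reference structure
trj <- read.dcd(dcd_file)   # trajectory: one row per frame, 3N columns

# Atom records are stored as a data frame, coordinates as a numeric matrix.
head(pdb$atom)
dim(pdb$xyz)

# Select C-alpha atoms; the result carries both atom and xyz indices.
ca <- atom.select(pdb, elety = "CA")

# Superpose every trajectory frame onto the reference using the C-alpha atoms.
xyz_fit <- fit.xyz(fixed = pdb$xyz, mobile = trj,
                   fixed.inds = ca$xyz, mobile.inds = ca$xyz)

# PCA on the superposed C-alpha coordinates.
pc <- pca.xyz(xyz_fit[, ca$xyz])

# Percentage of the total variance captured by the first three PCs.
round(100 * pc$L[1:3] / sum(pc$L), 1)
```

Superposing before PCA matters here: without it, the leading components would mostly describe overall rotation and translation of the molecule rather than internal conformational change.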
Example workflows are straightforward: fetch a structure with read.pdb("1hel"), select Cα atoms with atom.select.pdb(..., string = "calpha"), align and superpose multiple conformers, then run pca.xyz to extract collective modes or nma to compute theoretical fluctuation profiles. The Bio3D vignettes and worked examples demonstrate extended applications such as ensemble NMA of E. coli DHFR, cross-species DHFR comparisons, protein structure network construction, and eDDM analyses of functionally important conformational changes.

Bio3D is cross-platform and is distributed as platform-independent source on CRAN (with development versions on Bitbucket); a Windows binary is also provided. It runs inside R (R >= 3.1 is recommended; recent vignettes use R 4.x) and installs like any other R package. Some advanced features require optional external programs or R packages: MUSCLE or Clustal Omega for sequence alignment, DSSP or STRIDE for secondary structure annotation, NetCDF support for binary trajectory I/O, and PyMOL or VMD for 3D visualization. A minimal installation provides a large subset of the functionality, while installing the listed external tools enables full trajectory I/O, secondary-structure parsing and richer visualization.

Extensive documentation, help pages for every function, and multiple vignettes (Getting started, Trajectory analysis, Correlation network analysis, Normal mode analysis, eDDM, etc.) are bundled with the package and available from the Bio3D website, making it suitable for both interactive exploration and scripted pipelines.

Bio3D is aimed at researchers who are comfortable with basic R usage and who want to combine structural bioinformatics with rigorous statistical analysis and publication-ready graphics. Its modular design (core bio3d functions plus separate packages for ensemble NMA, network analysis and web-app support) makes it adaptable from single-structure exploration up to large-scale comparative studies.

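As a rough illustration of the single-structure part of the example workflow mentioned above (read 1hel, select Cα atoms, run NMA), consider the sketch below. It assumes bio3d has been installed from CRAN, and it reads the copy of 1hel that ships among the package's example files instead of downloading it, so that it runs offline; calling read.pdb("1hel") would fetch the same entry from the RCSB. Variable names are illustrative.

```r
# install.packages("bio3d")   # one-time installation from CRAN
library(bio3d)

# Bundled copy of PDB entry 1hel (assumed to be present in the installed
# package's "examples" directory); read.pdb("1hel") would download it instead.
pdb <- read.pdb(system.file("examples/1hel.pdb", package = "bio3d"))

# C-alpha selection: one atom per protein residue.
ca <- atom.select(pdb, string = "calpha")
length(ca$atom)

# Normal mode analysis; by default nma() works on the C-alpha atoms.
modes <- nma(pdb)

# Summary of the modes and the predicted per-residue fluctuation profile.
print(modes)
head(modes$fluctuations)
```

The fluctuation vector has one value per Cα atom, so it can be plotted against residue position to highlight the regions predicted to be most flexible.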
The package is well cited in the literature and is supported by a series of tutorials and example datasets; users are also encouraged to contribute code and report issues via the project repository to support ongoing development.