AUGUSTUS

Background
AUGUSTUS is a mature, widely used gene prediction suite developed by the Bioinformatics Group at the University of Greifswald. It implements a generalized hidden Markov model with an improved intron submodel and a flexible mechanism for incorporating extrinsic evidence. Because it can be trained for species-specific parameters, it frequently ranks among the most accurate ab initio gene finders for genomes it has been trained on. Extrinsic evidence such as EST/mRNA alignments, protein homology, RNA‑Seq-derived hints and comparative alignments can improve its predictions further.

Core capabilities
At its core, AUGUSTUS predicts exon–intron structures and transcripts, and it can report 5' and 3' UTRs for species with trained UTR parameters. It can predict alternative splicing by reporting multiple transcript models per locus, and it provides per-transcript and per-feature probabilities. AUGUSTUS also supports reporting suboptimal gene structures, which lets users explore alternative models and tune sensitivity. The software accepts a variety of extrinsic “hints” (CDS and intron hints from alignments, spliced read evidence, protein-family profiles) that guide predictions and improve accuracy beyond pure ab initio calls.

Advanced features and extensions
AUGUSTUS includes a protein-profile extension (PPX) that uses protein-family conservation to identify gene family members. PPX predicts exon–intron boundaries guided by a block profile derived from a multiple sequence alignment. For comparative gene prediction, AUGUSTUS can predict genes simultaneously in multiple genomes using whole-genome alignments and syntenic information. It also integrates RNA‑Seq data by consuming intron and exon hints derived from short-read alignments, which improves detection of expressed isoforms. Training utilities and optimization scripts are provided to estimate model parameters from curated training sets and to tune meta-parameters (e.g., splice window sizes) for best performance.

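To make the idea of intron hints from spliced reads more concrete, the following is a minimal sketch of how junctions might be collapsed into GFF-style hint lines. The function name, the "sketch" program label and the input tuple layout are hypothetical; the attribute keys (mult, pri, src) and the source letter "E" follow the hints format as commonly documented for AUGUSTUS, but they should be checked against the AUGUSTUS documentation and the extrinsic configuration file in use.

```python
from collections import Counter


def intron_hints(junctions, source="E", priority=4, program="sketch"):
    """Collapse spliced-read junctions into GFF-style intron hint lines.

    junctions: iterable of (seqname, start, end, strand) tuples with
    1-based inclusive intron coordinates, one tuple per supporting read.
    Assumption: 'source' must match a source key defined in the
    extrinsic configuration file used for the prediction run.
    """
    # Count identical junctions so each intron appears once with a multiplicity.
    counts = Counter(junctions)
    lines = []
    for (seqname, start, end, strand), mult in sorted(counts.items()):
        attributes = f"mult={mult};pri={priority};src={source}"
        # Nine tab-separated GFF columns; score is set to 0 and frame to '.'.
        fields = [seqname, program, "intron", str(start), str(end),
                  "0", strand, ".", attributes]
        lines.append("\t".join(fields))
    return lines
```

Writing the returned lines to a file, one per line, gives a hints file of the kind that can be supplied alongside the genome sequence. Collapsing duplicates into a multiplicity keeps the file small when many reads support the same intron.
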
Typical use cases
AUGUSTUS is commonly used for de novo annotation of newly assembled eukaryotic genomes, refinement of existing annotations by incorporating RNA‑Seq or protein evidence, and discovery of gene-family members using PPX. It is suitable for single-genome annotation projects as well as comparative annotation across multiple related genomes. A typical workflow consists of (1) preparing or curating a training set for the species, (2) running AUGUSTUS with extrinsic hints from ESTs, mRNA, proteins or RNA‑Seq, and (3) inspecting and exporting the predicted models for downstream analyses such as functional annotation or visualization in genome browsers.

Integration, deployment and outputs
AUGUSTUS is open source and hosted on GitHub. It is distributed under the Artistic License and is packaged for Debian and Ubuntu (installable via apt). The project provides Docker and Singularity recipes for containerized deployments, and Windows users can run AUGUSTUS via the Windows Subsystem for Linux (WSL). A web interface (WebAUGUSTUS) and web services are available for training and prediction. Outputs can be visualized in genome browsers such as GBrowse or incorporated into automated annotation pipelines such as BRAKER. AUGUSTUS writes predictions in standard annotation formats such as GFF, together with detailed per-feature probabilities, so its results are compatible with downstream annotation, comparative genomics and manual curation workflows.

Provenance and support
AUGUSTUS has a long publication history describing its HMM framework, its intron submodel and its strategies for incorporating external evidence, and it has been applied in many genome projects spanning fungi, plants, animals and parasites. The project provides documentation, example parameter sets for many species, and community support via its GitHub repository and web server documentation. Because prediction accuracy depends strongly on the quality of the training set, users are encouraged to curate training genes iteratively and to use available transcript and protein evidence to maximize annotation quality.
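
Step (2) of the typical workflow, running AUGUSTUS with hints, can be sketched as a small helper that assembles the command line. This is an illustration under stated assumptions rather than a recommended invocation: the helper name and file names are hypothetical, and the options shown (--species, --hintsfile, --extrinsicCfgFile, --gff3) should be verified against the help output of the installed AUGUSTUS version.

```python
def build_augustus_command(genome_fasta, species, hints_file=None,
                           extrinsic_cfg=None, gff3=True):
    """Assemble an argument list for an AUGUSTUS prediction run.

    All paths are placeholders supplied by the caller; 'species' must name
    a parameter set available in the local AUGUSTUS configuration.
    """
    cmd = ["augustus", f"--species={species}"]
    if hints_file:
        # Extrinsic evidence, e.g. intron hints derived from RNA-Seq alignments.
        cmd.append(f"--hintsfile={hints_file}")
    if extrinsic_cfg:
        # Configuration that weights each hint source.
        cmd.append(f"--extrinsicCfgFile={extrinsic_cfg}")
    if gff3:
        cmd.append("--gff3=on")
    # The genome sequence file is given as the final positional argument.
    cmd.append(genome_fasta)
    return cmd
```

The returned list can be passed to subprocess.run with stdout redirected to a file to capture the predictions. Building a list instead of a shell string avoids quoting problems when paths contain spaces.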
