Libraries and SDKs, Bioinformatics Tools

HTSeq

HTSeq is a mature, community-developed Python package for analysis of high-throughput sequencing (HTS) experiments. Originally described in the journal Bioinformatics and updated in Putri et al. ("Analysing high-throughput sequencing data in Python with HTSeq 2.0"), HTSeq provides both a programmatic API and ready-to-use command-line utilities for common HTS tasks while remaining flexible enough for custom workflows. It is distributed under the GNU GPLv3 license and is maintained by contributors who focus on reproducible, scriptable analyses in the Python ecosystem.

At its core, HTSeq reads and manipulates aligned sequencing data and performs feature-level quantification. A widely used component is htseq-count, a script that quantifies gene expression by counting reads that overlap annotated genomic features, a standard step in bulk RNA-Seq pipelines. HTSeq also supports workflows specific to single-cell RNA-Seq: it can count reads that carry cell barcodes and UMIs (for example with the htseq-count-barcodes script), enabling deduplication and per-cell expression matrices from barcode-tagged reads. In addition to counting, HTSeq provides a utility for simple quality-assessment tasks (htseq-qa) and handles various HTS file formats programmatically through its Python API.

HTSeq is designed to be used both as a library for developers and as a set of command-line tools for analysts. The Python API lets you parse alignment files, iterate over features (from GFF/GTF-style annotations), and implement custom counting rules or filtering logic, which is useful when standard tools do not fit a particular experimental design. The command-line scripts provide convenient, reproducible entry points for common tasks (feature counting, barcode-aware counting, and basic QA), making it straightforward to incorporate HTSeq into analysis pipelines or to run single-step quantification on a per-sample basis.
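
The idea of counting reads that overlap annotated features can be illustrated with a small, dependency-free sketch. It is not HTSeq's implementation and does not use the HTSeq API. It only mirrors the general logic of a "union"-style overlap rule: a read is assigned to a gene when it overlaps exactly one feature, and is otherwise tallied as having no feature or as ambiguous. The toy annotation, the read coordinates, and the function names are all hypothetical. Strand, spliced alignments, and mapping-quality filters are deliberately ignored.

```python
from collections import Counter

# Hypothetical toy annotation: gene name -> (chromosome, start, end).
# Coordinates are 0-based and half-open, i.e. [start, end).
FEATURES = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 180, 300),
    "geneC": ("chr2", 50, 150),
}


def overlapping_features(chrom, start, end, features=FEATURES):
    """Return the names of all features sharing at least one base with the read."""
    return {
        name
        for name, (f_chrom, f_start, f_end) in features.items()
        if f_chrom == chrom and start < f_end and f_start < end
    }


def count_reads(reads, features=FEATURES):
    """Assign each read to a gene only if it overlaps exactly one feature."""
    counts = Counter()
    for chrom, start, end in reads:
        hits = overlapping_features(chrom, start, end, features)
        if not hits:
            counts["__no_feature"] += 1   # read overlaps nothing
        elif len(hits) > 1:
            counts["__ambiguous"] += 1    # read overlaps several features
        else:
            counts[next(iter(hits))] += 1
    return counts


# Hypothetical aligned reads as (chromosome, start, end) tuples.
reads = [
    ("chr1", 110, 160),  # inside geneA only
    ("chr1", 185, 195),  # overlaps geneA and geneB
    ("chr1", 250, 290),  # inside geneB only
    ("chr2", 10, 40),    # overlaps no feature
    ("chr2", 100, 140),  # inside geneC
]
counts = count_reads(reads)
for name in sorted(counts):
    print(name, counts[name], sep="\t")
```

A real analysis would read alignments and annotations from files rather than from in-memory tuples, and a linear scan over all features would be replaced by an indexed lookup. The sketch only shows where a custom inclusion or exclusion rule would plug into the counting loop.
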
Installation is straightforward: HTSeq requires Python 3.7 or newer and is available on PyPI (with platform binaries for Linux and macOS). Some optional capabilities, such as manipulating BigWig files, require additional dependencies. While HTSeq ships ready-to-run scripts for most routine tasks, a source package is available for development, and contributors are welcome. The project notes that Windows support is not officially maintained, though community contributions to expand platform support are encouraged.

Common use cases for HTSeq include generating gene-level count tables from bulk RNA-Seq alignments, producing per-cell expression matrices from single-cell experiments with barcodes and UMIs, implementing custom exclusion/inclusion rules for reads and features, and integrating read counting as a reproducible step in larger pipelines. Because HTSeq combines a documented Python API with command-line convenience, it is often embedded in automated workflows (for example, as a step in Snakemake or other pipeline managers) or used interactively in notebooks for exploratory analysis and method development.

HTSeq integrates naturally into the Python-based genomics stack: it complements aligners and annotation resources by handling the read-to-feature mapping and counting step, and its output fits downstream differential-expression and single-cell analysis tools. Users are encouraged to cite the appropriate publications (including the HTSeq 2.0 paper) when using the software in published work. Documentation, API reference, tutorials, and installation instructions are available from the project website and Read the Docs pages, and the project welcomes contributions and issue reports from the community.
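
The per-cell expression matrices mentioned among the use cases rest on UMI deduplication: reads that share a cell barcode, a UMI, and a gene are treated as copies of one molecule and counted once. The following is a minimal sketch of that idea in plain Python. It is an assumption-laden illustration, not the logic of htseq-count-barcodes: the record layout, the barcode and UMI strings, and the function name are hypothetical, and sequencing errors in barcodes or UMIs are not corrected.

```python
from collections import defaultdict


def umi_count_matrix(records):
    """Collapse duplicate UMIs and return {cell: {gene: unique UMI count}}.

    `records` is an iterable of (cell_barcode, umi, gene) tuples, one per
    read that has already been assigned to a gene.
    """
    # Collect the distinct UMIs observed for every (cell, gene) pair.
    umis = defaultdict(set)
    for cell, umi, gene in records:
        umis[(cell, gene)].add(umi)

    # Each distinct UMI counts as one molecule.
    matrix = defaultdict(dict)
    for (cell, gene), seen in umis.items():
        matrix[cell][gene] = len(seen)
    return dict(matrix)


# Hypothetical barcode-tagged reads.
records = [
    ("AAAC", "TTGA", "geneA"),
    ("AAAC", "TTGA", "geneA"),  # duplicate of the previous read
    ("AAAC", "CCGT", "geneA"),  # same cell and gene, different molecule
    ("AAAC", "TTGA", "geneB"),  # same UMI, but a different gene
    ("GGTT", "TTGA", "geneA"),  # same UMI, but a different cell
]
matrix = umi_count_matrix(records)
for cell in sorted(matrix):
    print(cell, matrix[cell])
```

Keying the sets on the (cell, gene) pair means a UMI that recurs in another cell or on another gene is still counted as a separate molecule, which is the behavior the sketch assumes.
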