Logo
Workflow Orchestration

Cromwell

Date Published

Cromwell is a mature, open‑source Workflow Management System originally developed at the Broad Institute to run scientific and bioinformatics pipelines. It is purpose‑built to execute workflows written in the Workflow Description Language (WDL) and to translate WDL source into an internal Workflow Object Model (WOM) for scheduling, expression evaluation and job orchestration. Cromwell focuses on reproducibility, portability and reliable execution of complex, branching workflows common to genomics, imaging and other data‑intensive research. Architecturally, Cromwell provides a WDL parser, graph construction and a runtime engine that evaluates WDL expressions, expands subworkflows, resolves call inputs/outputs and maps logical tasks to platform‑specific jobs. The engine includes abstractions such as workflow, subworkflow and job stores, plus a job key/value store for tracking state and metadata during execution. Developers can extend Cromwell with custom engine functions and benefit from detailed expression evaluation and WOM debugging aids when authoring or troubleshooting WDL workflows. Cromwell supports multiple execution backends so the same WDL workflow can run on a laptop, on HPC clusters, or at cloud scale. Official and community backends include Google Cloud (with documentation for migrating to GCP Batch), Microsoft Azure (notably via the CromwellOnAzure project), and an AWS backend labelled beta. Users can deploy Cromwell as a downloadable JAR or a Docker image for self‑managed instances, or run WDL workflows through managed platforms such as Terra which embed Cromwell for a turnkey experience. Backend maintenance and development effort vary by demand, with core teams prioritizing commonly used cloud integrations and the community contributing support for other environments. Typical use cases for Cromwell include large‑scale genomic pipelines (alignment, variant calling, joint genotyping), batch image or signal processing workflows, and multi‑step analyses where provenance and reproducibility are critical. Teams use Cromwell to standardize pipeline execution across environments, move from exploratory scripts to productionized workflows, and to leverage cloud elasticity for parallel job bursts. Operationally, users submit WDL plus inputs and choose a backend configuration; Cromwell handles task scheduling, retries, logging and file staging to backends, helping simplify lifecycle tasks like resuming failed runs, capturing metadata and exporting execution logs for auditing. Cromwell is BSD 3‑Clause licensed and backed by documentation, tutorials and an active community ecosystem. New users can follow quick‑start tutorials in the documentation site and join community Slack channels for help. The project is maintained on GitHub where the team accepts reproducible bug reports; security issues have a dedicated contact. For teams that need enterprise support or a managed service, platforms such as Terra provide a hosted Cromwell experience. Contributors are encouraged via the project’s contribution guide, and a growing set of ecosystem projects extend Cromwell’s capabilities for diverse scientific workloads.