Common Workflow Language (CWL)

Common Workflow Language (CWL) is a vendor-neutral open standard for describing how to run command-line tools and connect them into dataflow-style batch workflows. Designed by a multi-vendor community and governed through an open process, CWL makes the inputs, outputs, and execution details of each step explicit so that workflows can be shared, reused, and reproduced across different execution environments. Its focus is on portable, declarative descriptions of tools and workflows rather than on a single runtime implementation.
At its core, CWL describes tools and workflows in a machine-readable YAML/JSON syntax with first-class support for inputs/outputs, types (including records and enums), optional parameters, and value transformations. Workflows express data dependencies rather than a strict linear order: steps run when their inputs are ready, enabling automatic parallelization and scatter/gather patterns without changing the underlying tool descriptions. CWL also supports inline JavaScript expressions and external libraries for flexible value computations, explicit file handling (renaming inputs/outputs, creating files at runtime), and containerized execution, most commonly with Docker; many runners can also execute Docker-format images using Singularity where needed.
Because CWL is a specification rather than a single product, there is an ecosystem of implementations and tooling. The reference runner (cwltool) and other engines such as Toil, Arvados, and several commercial or institutional offerings can execute CWL descriptions and integrate with cluster schedulers, cloud providers, and HPC systems. The standards have been adopted in bioinformatics, medical imaging, astronomy, high-energy physics, and machine learning pipelines: in short, any domain that benefits from batch, command-line-driven analyses.
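To make the dataflow model described above more concrete, here is a minimal Python sketch. It writes a small workflow in CWL's JSON form as a Python dictionary and groups the steps into "waves" that could run in parallel once their inputs are ready. The step names, the `*.cwl` file names, and the parameter ids are hypothetical, and `execution_waves` is an illustrative helper, not how cwltool or any other runner actually schedules work. It only handles step inputs written as plain source strings.

```python
import json

# A small workflow in CWL's JSON form. Step names, tool file names (*.cwl),
# and parameter ids are hypothetical and exist only for illustration.
workflow = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "inputs": {"reads": "File", "reference": "File"},
    "outputs": {"report": {"type": "File", "outputSource": "summarize/report"}},
    "steps": {
        # Deliberately listed first: order in the file does not decide run order.
        "summarize": {
            "run": "summarize.cwl",
            "in": {"aligned": "align/bam", "qc": "quality/metrics"},
            "out": ["report"],
        },
        "align": {
            "run": "align.cwl",
            "in": {"reads": "reads", "reference": "reference"},
            "out": ["bam"],
        },
        "quality": {
            "run": "quality.cwl",
            "in": {"reads": "reads"},
            "out": ["metrics"],
        },
    },
}


def execution_waves(wf):
    """Group steps into waves; steps within one wave do not depend on each other.

    A source written as "step/output" depends on that step. Any other source
    is treated as a workflow-level input that is available from the start.
    """
    deps = {}
    for name, step in wf["steps"].items():
        sources = step["in"].values()
        deps[name] = {s.split("/")[0] for s in sources if "/" in s}

    waves, done = [], set()
    while len(done) < len(deps):
        # A step is ready when every step it depends on has finished.
        ready = sorted(n for n, d in deps.items() if n not in done and d <= done)
        if not ready:
            raise ValueError("cyclic dependency between steps")
        waves.append(ready)
        done.update(ready)
    return waves


if __name__ == "__main__":
    print(json.dumps(execution_waves(workflow)))
    # prints [["align", "quality"], ["summarize"]]
```

In this sketch `align` and `quality` land in the same wave because both need only workflow-level inputs, while `summarize` waits for the outputs of both, even though it appears first in the description.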
CWL's design emphasizes reusability: individual tool descriptions can be embedded in many workflows, and the same workflow file can run on a laptop, a shared cluster, a cloud batch service, or within managed platforms that support CWL.
Typical use cases include complex sequencing pipelines where many command-line tools must be chained and parallelized; machine learning preprocessing and training workflows that need to run reproducibly on different hardware; and large-scale image or signal processing that must scale from single-machine tests to distributed runs on HPC. CWL helps capture provenance (which tools and versions were run, and how they were wired together), supports automation of repetitive analyses, and reduces friction when sharing workflows between collaborators or publishing methods.
Practical features and FAQs in the CWL documentation address common needs: how to rename inputs/outputs, reference local scripts or containerized tools, make inputs optional, define mutually exclusive or dependent parameters with record types, and handle command-line argument construction using valueFrom. The docs also cover debugging inline JavaScript, running tools inside Docker, using Singularity with CWL runners, and tips for avoiding filename problems (e.g., spaces or hyphens). These examples and tutorials are backed by an active community, a mailing list, a Matrix chatroom, and a suite of example tool descriptions that help new users get started.
CWL is open-source and community-led, with governance and contributions coordinated via public forums and GitHub repositories. Instructional material and much of the project's content are available under open licenses, and the project is hosted under non-profit stewardship to maintain its independence and openness.
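The optional inputs and command-line argument construction mentioned above can be illustrated with a short Python sketch. It defines a tool description in CWL's JSON form and a deliberately simplified `build_command` helper that turns input values into an argument list. The `grep` wrapper, its parameter ids, and `build_command` itself are hypothetical. The real binding rules in the CWL specification also cover arrays, booleans, records, expressions, and valueFrom, none of which are handled here.

```python
# A tool description in CWL's JSON form. The wrapped tool and the
# parameter ids are hypothetical examples.
tool = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": ["grep"],
    "inputs": {
        "pattern": {"type": "string", "inputBinding": {"position": 1}},
        "infile": {"type": "File", "inputBinding": {"position": 2}},
        # A trailing "?" on the type marks the input as optional.
        "max_count": {
            "type": "int?",
            "inputBinding": {"position": 0, "prefix": "-m"},
        },
    },
    "outputs": {"matches": {"type": "stdout"}},
}


def build_command(tool, job):
    """Build an argument list from input values (simplified illustration).

    Assumes every input type is a plain string such as "string" or "int?".
    """
    bound = []
    for name, spec in tool["inputs"].items():
        value = job.get(name)
        if value is None:
            if not spec["type"].endswith("?"):
                raise ValueError(f"missing required input: {name}")
            continue  # an omitted optional input adds nothing to the command
        binding = spec.get("inputBinding")
        if binding is None:
            continue  # inputs without a binding do not appear on the command line
        if isinstance(value, dict) and value.get("class") == "File":
            value = value["path"]  # File values are objects; use their path
        args = [str(value)]
        if "prefix" in binding:
            args.insert(0, binding["prefix"])
        bound.append((binding.get("position", 0), name, args))
    bound.sort()  # order by position, then by input name
    return list(tool["baseCommand"]) + [a for _, _, args in bound for a in args]


if __name__ == "__main__":
    job = {"pattern": "error", "infile": {"class": "File", "path": "app.log"}}
    print(build_command(tool, job))
    # prints ['grep', 'error', 'app.log']
    print(build_command(tool, dict(job, max_count=5)))
    # prints ['grep', '-m', '5', 'error', 'app.log']
```

Leaving `max_count` out of the job simply drops `-m` from the command, which is the behavior that makes optional parameters convenient to declare once and reuse across workflows.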
For teams that need an interoperable, declarative way to describe command-line batch analyses that will run reliably across infrastructure, CWL provides a mature, well-documented path to portability, reproducibility and scale.