
I don't get why you claim that something like Airflow doesn't bridge the gap well for researchers who write code. I've worked with WDL extensively, and I still think Airflow is the superior tool. The second I need any sort of branching logic in my pipeline, WDL's ways of solving it feel like working against the tool, not with it.
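
For what it's worth, branching is where Airflow feels natural to me. A minimal sketch (task names and the tumor/normal condition are invented for illustration; assumes Airflow 2.x):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator
    from airflow.operators.python import BranchPythonOperator

    def choose_branch(**context):
        # Return the task_id to follow; a real callable might inspect
        # an XCom or run config instead of this made-up param.
        return "tumor_calling" if context["params"].get("tumor") else "germline_calling"

    with DAG(dag_id="branching_sketch", start_date=datetime(2024, 1, 1),
             schedule=None, params={"tumor": False}) as dag:
        branch = BranchPythonOperator(task_id="branch", python_callable=choose_branch)
        branch >> [EmptyOperator(task_id="tumor_calling"),
                   EmptyOperator(task_id="germline_calling")]

The branch that isn't chosen just gets marked as skipped, instead of you having to restructure the workflow around the condition.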



The bioinformatics workflow managers are designed around the quirkiness of bioinformatics, and they remove a lot of boilerplate. That makes them easier to grok for someone who doesn't have a strong programming background, at the cost of some flexibility.

Some features that bridge the gap:

1. Command-line tools are often used in steps of a bioinformatics pipeline. The workflow managers expect this and make them easier to use (e.g. https://github.com/snakemake/snakemake-wrappers).

2. Using file I/O to explicitly construct a DAG is built-in, which seems easier for researchers to understand than constructing DAGs from functions (see the sketch after this list).

3. Built-in support for executing on a cluster through something like SLURM.

4. Running "hacky" shell or R scripts in steps of the pipeline is well-supported. As an aside, it's surprising how often a mis-implemented subprocess.run() or os.system() call causes issues (forgetting check=True, for instance, lets a failed command pass silently).

5. There's a strong community building open-source bioinformatics pipelines for each workflow manager (e.g. nf-core, warp, snakemake workflows).
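
To make 1, 2 and 4 concrete, here is roughly what two steps look like in Snakemake (the file layout, reference and commands are invented for illustration):

    # The DAG comes from matching filenames: asking for sorted/A.bam
    # makes Snakemake run bwa_map to produce mapped/A.bam first.
    rule bwa_map:
        input:
            "reads/{sample}.fastq"
        output:
            "mapped/{sample}.bam"
        shell:
            "bwa mem ref.fa {input} | samtools view -b - > {output}"

    rule samtools_sort:
        input:
            "mapped/{sample}.bam"
        output:
            "sorted/{sample}.bam"
        shell:
            "samtools sort -o {output} {input}"

And pushing the same workflow onto a cluster is a command-line flag (something like --executor slurm in recent releases, --cluster in older ones) rather than a code change, which is point 3.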

Airflow and the other less ___domain-specific workflow managers are arguably better for people with a stronger software engineering background. For someone who moved from the wet lab to the dry lab and is learning to code on the side, I think the bioinformatics workflow managers lower the barrier to entry.


> are arguably better for people with a stronger software engineering background

As someone who is a software developer in the bioinformatics space (as opposed to the other way around) and who has spent over ten years deep in the weeds of both the bioinformatics workflow engines and more standard ones like Airflow, I would still reach for a bioinfx engine in that ___domain.

But what I find most exciting is a newer class of workflow tools that appear to bridge the gap, e.g. Dagster. From observation it seems like a case of parallel evolution: the research side of the ML world has similar needs. Either way, I could see this space pulling eyeballs away from the traditional bioinformatics workflow world.
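
To sketch what I mean (asset names invented; recent Dagster API): dependencies come from naming upstream assets as function parameters, which is closer in spirit to Snakemake's file matching than to hand-wiring tasks:

    from dagster import asset

    @asset
    def mapped_reads():
        # A real asset would shell out to an aligner and return a path.
        return "mapped.bam"

    @asset
    def sorted_reads(mapped_reads):
        # Dagster infers the edge because the parameter name matches
        # the upstream asset's name.
        return f"sorted:{mapped_reads}"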


The problem with Airflow is that each step of the DAG in a bioinformatics workflow is generally going to run a command-line tool. That tool expects its input files to have been staged in and living in exactly the right spot, and downstream steps expect its outputs to have been staged out from exactly the right spot.

This can all be done with Airflow, but the bioinformatics workflow engines understand that this is a first-class use case for these users, and make it simpler.
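
To illustrate, hand-rolled staging in plain Airflow ends up looking something like this (the bucket, paths and bwa invocation are made up; assumes Airflow 2.x):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(dag_id="align_sample", start_date=datetime(2024, 1, 1),
             schedule=None) as dag:
        # Stage in, run the tool, stage out -- every step by hand,
        # and every downstream task has to agree on these exact paths.
        align = BashOperator(
            task_id="align",
            bash_command=(
                "aws s3 cp s3://my-bucket/reads/s1.fastq /tmp/s1.fastq && "
                "bwa mem ref.fa /tmp/s1.fastq > /tmp/s1.sam && "
                "aws s3 cp /tmp/s1.sam s3://my-bucket/aligned/s1.sam"
            ),
        )

With something like Snakemake or Nextflow, you declare the inputs and outputs and the engine handles the staging for you.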



