Sambamba is a suite of programs for processing high-throughput sequencing data quickly and efficiently. It is functionally similar to Samtools but is written in the D language, which allows for faster performance while remaining easy to use.

Supported commands:

  • markdup


This module parses key phrases in the output log files to count duplicate and unique reads, then calculates the duplicate rate per sample. It works for both single-end and paired-end data. The absolute number of reads of each type is displayed in a stacked bar plot, and duplicate rates are shown in the general statistics table.
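
The key-phrase parsing described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the regular expressions, the function name parse_markdup_log, and the sample log text are assumptions modeled on typical sambamba markdup output, and only the "finding positions of the duplicate reads in the file" phrase is taken from this document.

```python
import re

# Hypothetical patterns for the counts used in the duplicate-rate formulas.
# The exact phrases are assumed, so check them against a real log file.
PATTERNS = {
    "sorted_end_pairs": re.compile(r"sorted (\d+) end pairs"),
    "single_ends": re.compile(r"and (\d+) single ends"),
    "single_unmatched_pairs": re.compile(r"among them (\d+) unmatched pairs"),
    "duplicate_reads": re.compile(r"found (\d+) duplicates"),
}


def parse_markdup_log(text):
    """Return a dict of the integer counts found in a markdup log."""
    counts = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            counts[name] = int(match.group(1))
    return counts


# Illustrative log text (assumed layout, made-up numbers).
SAMPLE_LOG = """\
finding positions of the duplicate reads in the file...
  sorted 95 end pairs
     and 20 single ends (among them 10 unmatched pairs)
  collecting indices of duplicate reads...   done in 0 ms
  found 50 duplicates
"""

counts = parse_markdup_log(SAMPLE_LOG)
print(counts)
```

Keys missing from a log are simply left out of the result, so a caller can tell "not reported" apart from a count of zero.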

Duplicate rates are calculated as follows:

Paired end

duplicate_rate = duplicateReads / (sortedEndPairs * 2 + singleEnds - singleUnmatchedPairs) * 100

Single end

duplicate_rate = duplicateReads / singleEnds * 100
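
As a worked example, the two formulas above can be written as one small Python function. This is a sketch under stated assumptions: the function name, the snake_case argument names, and the rule for choosing between the formulas (paired-end whenever at least one end pair was sorted) are illustrative and not confirmed by this document.

```python
def duplicate_rate(duplicate_reads, single_ends,
                   sorted_end_pairs=0, single_unmatched_pairs=0):
    """Return the duplicate rate as a percentage."""
    if sorted_end_pairs > 0:
        # Paired end: each sorted pair contributes two reads.
        total_reads = (sorted_end_pairs * 2
                       + single_ends
                       - single_unmatched_pairs)
    else:
        # Single end: every read is counted as a single end.
        total_reads = single_ends
    if total_reads <= 0:
        # Guard against division by zero for empty inputs (assumed behaviour).
        return 0.0
    return duplicate_reads / total_reads * 100


# Paired end: 50 / (95 * 2 + 20 - 10) * 100
paired = duplicate_rate(duplicate_reads=50, single_ends=20,
                        sorted_end_pairs=95, single_unmatched_pairs=10)

# Single end: 25 / 100 * 100
single = duplicate_rate(duplicate_reads=25, single_ends=100)

print(f"paired: {paired:.1f}%  single: {single:.1f}%")
```

With these made-up counts both cases give a duplicate rate of 25%.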

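If Sambamba Markdup is invoked using Snakemake, the following bare-bones rule should work fine once the input, output, and log directives are defined for your workflow. The redirect sends both stdout and stderr to the log file that this module parses:

rule markdup:
    shell:
        "sambamba markdup {input} {output} > {log} 2>&1"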

File search patterns

  contents: finding positions of the duplicate reads in the file
  num_lines: 50