Sambamba

Description

Sambamba is a suite of programs for quickly and efficiently processing high-throughput sequencing data. It is functionally similar to Samtools, but its source code is written in the D language, which allows for faster performance while remaining easy to use.

Supported commands:

markdup

markdup

This module parses key phrases in the output log files to find duplicate and unique reads, and then calculates the duplicate rate per sample. It works for both single and paired-end data. The absolute numbers of reads by type are displayed in a stacked bar plot, and duplicate rates are shown in the general statistics table.
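As a rough illustration of the phrase-based parsing described above, the sketch below pulls the four counts out of a log with regular expressions. The patterns, the function name `parse_markdup_log`, and the sample log text are assumptions made for this example; they are modelled on typical `sambamba markdup` output and are not taken from the module's source.

```python
import re

# Assumed key phrases; the real module may use different patterns.
PATTERNS = {
    "sorted_end_pairs": re.compile(r"sorted (\d+) end pairs"),
    "single_ends": re.compile(r"and (\d+) single ends"),
    "single_unmatched_pairs": re.compile(r"among them (\d+) unmatched pairs"),
    "duplicate_reads": re.compile(r"found (\d+) duplicates"),
}


def parse_markdup_log(text):
    """Return a dict of the counts found in the log text (hypothetical helper)."""
    counts = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            counts[name] = int(match.group(1))
    return counts


# Illustrative log excerpt with made-up numbers (an assumption, not real output).
SAMPLE_LOG = """finding positions of the duplicate reads in the file...
  sorted 1000 end pairs
     and 50 single ends (among them 10 unmatched pairs)
  collecting indices of duplicate reads...   done in 5 ms
  found 200 duplicates
"""

counts = parse_markdup_log(SAMPLE_LOG)
```

Any phrase missing from the log is simply absent from the returned dict, so a caller can decide whether to skip the sample or report an error.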

Duplicate rates are calculated as follows:

Paired end

duplicate_rate = duplicateReads / (sortedEndPairs * 2 + singleEnds - singleUnmatchedPairs) * 100

Single end

duplicate_rate = duplicateReads / singleEnds * 100
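
The two formulas above can be sketched as a single Python function. This is a minimal example, not the module's own code: the function name, the explicit `paired_end` flag, and the choice to return 0.0 when there are no reads are all assumptions made for illustration.

```python
def duplicate_rate(duplicate_reads, sorted_end_pairs, single_ends,
                   single_unmatched_pairs, paired_end):
    """Return the duplicate rate as a percentage (hypothetical helper)."""
    if paired_end:
        # Each sorted end pair contributes two reads; unmatched pairs are
        # subtracted from the single ends, as in the paired-end formula.
        total_reads = sorted_end_pairs * 2 + single_ends - single_unmatched_pairs
    else:
        total_reads = single_ends

    if total_reads <= 0:
        # Assumption of this sketch: report 0.0 rather than divide by zero.
        return 0.0
    return duplicate_reads / total_reads * 100


# Made-up counts for illustration only.
paired_rate = duplicate_rate(200, 1000, 50, 10, paired_end=True)   # 200 / 2040 * 100
single_rate = duplicate_rate(25, 0, 500, 0, paired_end=False)      # 25 / 500 * 100
```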

If Sambamba markdup is invoked using Snakemake, the following bare-bones rule should work; it redirects both stdout and stderr to the log file that this module parses:

rule markdup:
  input:
    "data/align/{sample}.bam"
  output:
    "data/markdup/{sample}.markdup.bam"
  log:
    "data/logs/{sample}.log"
  shell:
    "sambamba markdup {input} {output} > {log} 2>&1"

File search patterns

sambamba/markdup:
  contents: finding positions of the duplicate reads in the file
  num_lines: 50
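
To show what this search pattern implies, the sketch below checks whether the key phrase appears within the first 50 lines of a file. It is an assumed, simplified reading of the `contents` and `num_lines` settings; the function name `looks_like_markdup_log` is hypothetical and the actual file search is done by the reporting framework, not by this code.

```python
from itertools import islice

# Values copied from the search pattern above.
SEARCH_STRING = "finding positions of the duplicate reads in the file"
NUM_LINES = 50


def looks_like_markdup_log(lines):
    """Return True if the key phrase occurs in the first NUM_LINES lines.

    `lines` is any iterable of strings, for example an open text file.
    """
    return any(SEARCH_STRING in line for line in islice(lines, NUM_LINES))


# Illustrative inputs (made up): phrase near the top vs. phrase after line 50.
early = ["sambamba markdup", SEARCH_STRING + "..."]
late = ["filler"] * 60 + [SEARCH_STRING + "..."]
early_match = looks_like_markdup_log(early)
late_match = looks_like_markdup_log(late)
```

With these inputs, a log that prints the phrase near the top is matched, while one that only reaches it after line 50 is not, which is why wrapper output placed ahead of the Sambamba log can prevent detection.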
