Welcome to the MultiQC docs.

These docs are bundled with the MultiQC download for your convenience, so you can also read them in your installation or on GitHub.

Using MultiQC


MultiQC is a reporting tool that parses summary statistics from results and log files generated by other bioinformatics tools. MultiQC doesn't run other tools for you - it's designed to be placed at the end of analysis pipelines or to be run manually when you've finished running your tools.

When you launch MultiQC, it recursively searches through any provided file paths and finds files that it recognises. It parses relevant information from these and generates a single stand-alone HTML report file. It also saves a directory of data files with all parsed data for further downstream use.

Installing MultiQC

System Python

Before we start - a quick note that using the system-wide installation of Python is not recommended. This often causes problems and it's a little risky to mess with it. If you find yourself prepending sudo to any MultiQC commands, take a step back and think about Python virtual environments / conda instead (see below).

Installing Python

To see if you have Python installed, run python --version on the command line. MultiQC needs Python version 2.7+, 3.4+ or 3.5+ (though note that Python 2 is no longer officially supported - see the Python 2 section below).

We recommend using virtual environments to manage your Python installation. Our favourite is conda, a cross-platform tool to manage Python environments. You can find installation instructions for Miniconda here.

Once conda is installed, you can create a Python environment with the following commands:

conda create --name py3.7 python=3.7
conda activate py3.7

You'll want to add the conda activate py3.7 line to your .bashrc file so that the environment is loaded every time you load the terminal.
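If you're not sure how to do that, one simple approach is to append the line to the end of the file (assuming a bash shell, and that conda init has already been run for your shell):

echo "conda activate py3.7" >> ~/.bashrc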

Installing with conda

If you're using conda as described above, you can install MultiQC from the bioconda channel as follows:

conda install -c bioconda -c conda-forge multiqc

Please see the Bioconda documentation for more details.

Installation with pip

This is the easiest way to install MultiQC. pip is the package manager for the Python Package Index (PyPI). It comes bundled with recent versions of Python; otherwise you can find installation instructions here.

You can now install MultiQC from PyPI as follows:

pip install multiqc

If you would like the development version, the command is:

pip install git+https://github.com/ewels/MultiQC.git

Note that if you have problems with read-only directories, you can install to your home directory with the --user parameter (though it's probably better to use virtual environments, as described above).

pip install --user multiqc

Manual installation

If you'd rather not use either of these tools, you can clone the code and install it yourself:

git clone https://github.com/ewels/MultiQC.git
cd MultiQC
pip install .

git not installed? No problem - just download the flat files:

curl -LOk https://github.com/ewels/MultiQC/archive/master.zip
unzip master.zip
cd MultiQC-master
pip install .

Note that it is not recommended to use the command python setup.py install as this has been superseded by pip and does not correctly handle some package management, such as pre-releases.

Updating MultiQC

You can update MultiQC from PyPI at any time by running the following command:

pip install --upgrade multiqc

To update the development version, use:

pip install --force-reinstall git+https://github.com/ewels/MultiQC.git

If you cloned the git repo, just pull the latest changes and install:

cd MultiQC
git pull
pip install .

If you downloaded the flat files, just repeat the installation procedure.

Using a specific python interpreter

If you prefer, you can also run MultiQC with a specific python interpreter. The command line usage and flags are then exactly the same as if you ran just multiqc.

For example:

python -m multiqc .
python3 -m multiqc .
~/my_env/bin/python -m multiqc .

Using with a Python script

You can import and run MultiQC from within a Python script, using the multiqc.run() function as follows:

import multiqc
multiqc.run("/path/to/analysis_dir")

Installing on Windows

MultiQC has primarily been designed for use on Unix systems (Linux, macOS). However, it should work on Windows too. Indeed, automated continuous integration tests run using GitHub Actions to check compatibility (see the test config here).

Note that support for using the base multiqc command was improved in MultiQC version 1.8.

Using the Docker container

A Docker container is provided on Docker Hub called ewels/multiqc. It's based on czentye/matplotlib-minimal to give the smallest size I could manage (~80MB).

To use it, call docker run with your current working directory mounted as a volume and set as the working directory:

docker run -v `pwd`:`pwd` -w `pwd` ewels/multiqc

By default, docker will use the :latest tag. For MultiQC, this is set to be the most recent release. To use the most recent development code, use ewels/multiqc:dev. You can also specify particular versions, eg: ewels/multiqc:1.9.

You can also specify additional MultiQC parameters as normal:

docker run -v `pwd`:`pwd` -w `pwd` ewels/multiqc . --title "My amazing report" -b "This was made with docker"

Note that all files on the command line (eg. config files) must be mounted in the docker container to be accessible. For more help, look into the Docker documentation.

Using Singularity

Although there is no dedicated Singularity image available for MultiQC, you can use the above Docker container.

First, build a singularity container image from the docker image (where 1.9 is the MultiQC version):

singularity build multiqc-1.9.sif docker://ewels/multiqc:1.9

Then, use singularity run to run the image with the normal MultiQC arguments:

singularity run multiqc-1.9.sif my_results/ --title "Report made using Singularity"

Import errors with Singularity

Sometimes, Singularity can be over-ambitious with sharing file paths, which can result in the Python environment on your local system interacting with the Python installation inside the image. This can give rise to ImportError errors for numpy and other packages.

The giveaway for when this is the problem is that the traceback will list Python package paths which are on your system and look different to those of MultiQC inside the container (eg. /usr/lib/python3.8/site-packages/multiqc/).

To fix this, run the command export PYTHONNOUSERSITE=1 before running MultiQC. This variable tells Python not to add site-packages to the system path when loading, which should avoid the conflicts.
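Putting it together, a typical invocation might then look like this (file names as in the example above):

export PYTHONNOUSERSITE=1
singularity run multiqc-1.9.sif my_results/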

Python 2

As of MultiQC version 1.9, Python 2 is no longer officially supported. Automatic CI tests will no longer run with Python 2 and Python 2 specific workarounds are no longer guaranteed.

Whilst it may be possible to continue using MultiQC with Python 2 for a short time by pinning dependencies, MultiQC compatibility for Python 2 will now slowly drift and start to break. If you haven't already, you need to switch to Python 3 now.

Python 2 had its official sunset date on January 1st 2020, meaning that it will no longer be developed by the Python community. Part of the python.org statement reads:

That means that we will not improve it anymore after that day, even if someone finds a security problem in it. You should upgrade to Python 3 as soon as you can.

Very many Python packages no longer support Python 2 and, whilst the MultiQC code is currently compatible with both Python 2 and Python 3, it is increasingly difficult to maintain compatibility with the dependency packages it uses, such as MatPlotLib, numpy and more.

Using MultiQC through Galaxy

On the main Galaxy instance

The easiest and fastest way to use MultiQC is on the usegalaxy.org main Galaxy instance, where you will find the MultiQC Galaxy tool under the NGS: QC and manipulation tool panel section.

On your instance

You can install MultiQC on your own Galaxy instance through your Galaxy admin space, searching on the main Toolshed for the MultiQC repository available under the visualization, statistics and Fastq Manipulation sections.

Installing on FreeBSD

If you're using FreeBSD you can install MultiQC via the FreeBSD ports system:

pkg install py36-multiqc

(or py27-multiqc, py37-multiqc, or any other currently mainstream python version).

This will install a prebuilt binary using only highly-portable optimizations, much like apt, yum, etc.

FreeBSD ports can also be built and installed from source:

cd /usr/ports/biology/py-multiqc
make install

To report issues with a FreeBSD port, please submit a report on the FreeBSD bug reports page. For more information, visit https://www.freebsd.org/ports/

Installing as an environment module

Many people using MultiQC will be working in an HPC environment. Every server / cluster is different, and you're probably best off asking your friendly sysadmin to install MultiQC for you. However, with that in mind, here are a few general tips for installing MultiQC into an environment module system:

MultiQC comes in two parts - the multiqc python package and the multiqc executable script. The former must be available in $PYTHONPATH and the script must be available on the $PATH.

A typical installation procedure with an environment module Python install might look like this: (Note that $PYTHONPATH must be defined before pip installation.)

module load python/2.7.6
mkdir $INST
export PYTHONPATH=$INST/lib/python2.7/site-packages
pip install --install-option="--prefix=$INST" multiqc

Once installed, you'll need to create an environment module file. Again, these vary between systems a lot, but here's an example:

## MultiQC

set components [ file split [ module-info name ] ]
set version [ lindex $components 1 ]
set modroot /path/to/software/multiqc/$version

proc ModulesHelp { } {
    global version modroot
    puts stderr "\tMultiQC - use MultiQC $version"
    puts stderr "\n\tVersion $version\n"
}

module-whatis   "Loads MultiQC environment."

# load required modules
module load python/2.7.6

# only one version at a time
conflict multiqc

# Make the directories available
prepend-path    PATH        $modroot/bin
prepend-path    PYTHONPATH  $modroot/lib/python2.7/site-packages

Running MultiQC

Once installed, just go to your analysis directory and run multiqc, followed by a list of directories to search. At its simplest, this can just be . (the current working directory):

multiqc .

That's it! MultiQC will scan the specified directories and produce a report based on details found in any log files that it recognises.

See Using MultiQC Reports for more information about how to use the generated report.

For a description of all command line parameters, run multiqc --help.

Choosing where to scan

You can supply MultiQC with as many directories or files as you like. Above, we supply . - just the current directory, but all of these would work too:

multiqc data/
multiqc data/ ../proj_one/analysis/ /tmp/results
multiqc data/*_fastqc.zip
multiqc data/sample_1*

You can also ignore files using the -x/--ignore flag (can be specified multiple times). This takes a string which it matches using glob expansion to filenames, directory names and entire paths:

multiqc . --ignore *_R2*
multiqc . --ignore run_two/
multiqc . --ignore */run_three/*/fastqc/*_R2.zip

Some modules get sample names from the contents of the file and not the filename (for example, stdout logs can contain multiple samples). In this case, you can skip samples by name instead:

multiqc . --ignore-samples sample_3*

These strings are matched using glob logic (* and ? are wildcards).

All of these settings can be saved in a MultiQC config file so that you don't have to type them on the command line for every run.
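For example, a small config file carrying the same sort of patterns might look like this (fn_ignore_files and sample_names_ignore are the config options described later in these docs; the patterns are just examples):

fn_ignore_files:
    - '*_R2*'
sample_names_ignore:
    - 'sample_3*'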

Finally, you can supply a file containing a list of file paths, one per row. MultiQC will only search the listed files.

multiqc --file-list my_file_list.txt
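Such a file simply contains one path per line, for example (these paths are hypothetical):

results/sample_1_fastqc.zip
results/sample_2_fastqc.zip
../proj_one/analysis/sample_3_fastqc.zip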

Renaming reports

The report is called multiqc_report.html by default. Tab-delimited data files are created in multiqc_data/, containing additional information. You can use a custom name for the report with the -n/--filename parameter, or instruct MultiQC to create them in a subdirectory using the -o/--outdir parameter.

Note that different MultiQC templates may have different defaults.

Overwriting existing reports

It's quite common to repeatedly create new reports as new analysis results are generated. Instead of manually deleting old reports, you can just specify the -f parameter and MultiQC will overwrite any conflicting report filenames.

Sample names prefixed with directories

Sometimes, the same samples may be processed in different ways. If MultiQC finds log files with the same sample name, the previous data will be overwritten (this can be inspected by running MultiQC with -v/--verbose).

To avoid this, run MultiQC with the -d/--dirs parameter. This will prefix every sample name with the directory path for that log file. As such, sample names should now be unique, and not overwrite one-another.

By default, --dirs will prepend the entire path to each sample name. You can choose which directories are added with the -dd/--dirs-depth parameter. Set to a positive integer to use that many directories at the end of the path. A negative integer takes directories from the start of the path.

For example:

$ multiqc -d .
# analysis_1 | results | type | sample_1 | file.log
# analysis_2 | results | type | sample_2 | file.log
# analysis_3 | results | type | sample_3 | file.log

$ multiqc -d -dd 1 .
# sample_1 | file.log
# sample_2 | file.log
# sample_3 | file.log

$ multiqc -d -dd -1 .
# analysis_1 | file.log
# analysis_2 | file.log
# analysis_3 | file.log

Using different templates

MultiQC is built around a templating system. You can produce reports with different styling by using the -t/--template option. The available templates are listed with multiqc --help.

If you're interested in creating your own custom template, see the writing new templates section.

PDF Reports

Whilst HTML is definitely the format of choice for MultiQC reports due to the interactive features that it can offer, PDF files are an integral part of some people's workflows. To try to accommodate this, MultiQC has a --pdf command line flag which will try to create a PDF report for you.
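For example, a typical invocation might look like this (the directory name is just an example):

multiqc ./my_analysis --pdf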

To do this, MultiQC uses the simple template. This uses flat plots, has no navigation or toolbar and strips out all JavaScript. The resulting HTML report is pretty basic, but this simplicity is helpful when generating PDFs.

Once the report is generated MultiQC attempts to call Pandoc, a command line tool able to convert documents between different file formats. You must have Pandoc already installed for this to work. If you don't have Pandoc installed, you will get an error message that looks like this:

Error creating PDF - pandoc not found. Is it installed? http://pandoc.org/

Please note that Pandoc is a complex tool and uses LaTeX / XeLaTeX for PDF generation. Please make sure that you have the latest version of Pandoc and that it can successfully convert basic HTML files to PDF before reporting any errors. Also note that not all plots have flat image equivalents, so some will be missing (at time of writing: FastQC sequence content plot, beeswarm dot plots, heatmaps).

Printing to stdout

If you would like to generate MultiQC reports on the fly, you can print the output to standard out by specifying -n stdout. Note that the data directory will not be generated and the template used must create stand-alone HTML reports.
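For example, you could redirect the report straight into a file or another tool (the filename here is arbitrary):

multiqc . -n stdout > my_report.html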

Parsed data directory

By default, MultiQC creates a directory alongside the report containing tab-delimited files with the parsed data. This is useful for downstream processing, especially if you're running MultiQC with very large numbers of samples.

Typically, these files are tab-delimited tables. However, you can get JSON or YAML output for easier downstream parsing by specifying -k/--data-format on the command line or data_format in your configuration file.

You can also choose whether to produce the data by specifying either the --data-dir or --no-data-dir command line flags or the make_data_dir variable in your configuration file. Note that the data directory is never produced when printing the MultiQC report to stdout.

To zip the data directory, use the -z/--zip-data-dir flag.
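As a sketch, the equivalent config file settings might look like this (data_format and make_data_dir are named above; zip_data_dir is assumed to be the config counterpart of --zip-data-dir):

data_format: 'json'
make_data_dir: true
zip_data_dir: true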

Exporting Plots

In addition to the HTML report, it's also possible to get MultiQC to save plots as stand alone files. You can do this with the -p/--export command line flag. By default, plots will be saved in a directory called multiqc_plots as .png, .svg and .pdf files. Raw data for the plots are also saved to files.

You can instruct MultiQC to always do this by setting the export_plots config option to true, though note that this will add a few seconds on to execution time. The plots_dir_name changes the default directory name for plots and the export_plot_formats specifies what file formats should be created (must be supported by MatPlotLib).
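For example, a config snippet using these options might look like this (the values are illustrative):

export_plots: true
plots_dir_name: 'multiqc_plots'
export_plot_formats:
    - 'png'
    - 'svg'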

Note that not all plot types are yet supported, so you may find some plots are missing.

Note: You can always save static image versions of plots from within MultiQC reports, using the Export toolbox in the side bar.

Choosing which modules to run

Sometimes, it's desirable to choose which MultiQC modules run. This could be because you're only interested in one type of output and want to keep the reports small. Or perhaps the output from one module is misleading in your situation.

You can do this by using -m/--modules to explicitly define which modules you want to run. Alternatively, use -e/--exclude to run all modules except those listed.

You can get a group of modules by using --tag followed by a tag e.g. RNA or DNA.
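For example (module names and the tag are just examples):

multiqc . -m fastqc -m cutadapt
multiqc . -e general_stats
multiqc . --tag RNA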

Using MultiQC Reports

Once MultiQC has finished, you should have an HTML report file called multiqc_report.html (or something similar, depending on how you ran MultiQC). You can open this report with open multiqc_report.html on the command line, or by double-clicking the file in a file browser.

Browser compatibility

MultiQC reports should work in any modern browser. They have been tested using OSX Chrome, Firefox and Safari. If you find any report bugs, please report them as a GitHub issue.

Report layout

MultiQC reports have three main page sections:

  • The navigation menu (left side)
    • Links to the different module sections in the report
    • Click the logo to go to the top of the page
  • The toolbox (right side)
    • Contains various tools to modify the report data (see below)
  • The report (middle)
    • This is what you came here for, the data!

Note that if you're viewing the report on a mobile device / small window, the content will be reformatted to fit the screen.

General Statistics table

At the top of every MultiQC report is the 'General Statistics' table. This shows an overview of key values, taken from all modules. The aim of the table is to bring together stats for each sample from across the analysis so that you can see it in one place.

Hovering over column headers will show a longer description, including which module produced the data. Clicking a header will sort the table by that value. Clicking it again will change the sort direction. You can shift-click multiple headers to sort by multiple columns.

sort column

Above the table there is a button called 'Configure Columns'. Clicking this will launch a modal window with more detailed information about each column, plus options to show/hide and change the order of columns.

configure columns


MultiQC modules can plot more extensive data in the sections below the General Statistics table.

Interactive plots

Plots in MultiQC reports are usually interactive, using the HighCharts JavaScript library.

You can hover the mouse over data to see a tooltip with more information about that dataset. Clicking and dragging on line graphs will zoom into that area.

plot zoom

To reset the zoom, use the button in the top right:

reset zoom

Plots have a grey bar along their base; clicking and dragging this will resize the plot's height:

plot zoom

You can force reports to use interactive plots instead of flat by specifying the --interactive command line option (see below).

Flat plots

Reports with large numbers of samples may contain flat plots. These are rendered when the MultiQC report is generated using MatPlotLib and are non-interactive (flat) images within the report. The reason for generating these is that large sample numbers can make MultiQC reports very data-intensive and unresponsive (crashing people's browsers in extreme cases). Plotting data in flat images is scalable to any number of samples, however.

Flat plots in MultiQC have been designed to look as similar to their interactive versions as possible. They are also copied to multiqc_data/multiqc_plots

You can force reports to use flat plots with the --flat command line option.

See the Large sample numbers section of the Configuring MultiQC docs for more on how to customise the flat / interactive plot behaviour.

Exporting plots

If you want to use the plot elsewhere (eg. in a presentation or paper), you can export it in a range of formats. Just click the menu button in the top right of the plot:

plot zoom

This opens the MultiQC Toolbox Export Plots panel with the current plot selected. You have a range of export options here. When deciding on output format bear in mind that SVG is a vector format, so can be edited in tools such as Adobe Illustrator or the free tool Inkscape. This makes it ideal for use in publications and manual customisation / annotation. The Plot scaling option changes how large the labels are relative to the plot.

Dynamic plots

Some plots have buttons above them which allow you to change the data that they show or their axis. For example, many bar plots have the option to show the data as percentages instead of counts:

percentage button


Toolbox

MultiQC reports come with a 'toolbox', accessible by clicking the buttons on the right hand side of the report:

toolbox buttons

Active toolbox panels have their button highlighted with a blue outline. You can hide the toolbox by clicking the open panel button a second time, or pressing Escape on your keyboard.

Highlight Samples

If you run MultiQC with a lot of samples, plots can become very data-heavy. This makes it difficult to find specific samples, or subsets of samples.

To help with this, you can use the Highlight Samples tool to colour datasets of interest. Simply enter some text which will match the samples you want to highlight and press enter (or click the add button). If you like, you can also customise the highlight colour.

toolbox highlight

To make it easier to match groups of samples, you can use regular expressions by turning on 'Regex mode'. You can test regexes using a nice tool at regex101.com. See a nice introduction to regexes here. Note that pattern delimiters are not needed (use pattern, not /pattern/).

Here, we highlight any sample names that end in _1:

highlight regex

Note that a new button appears above the General Statistics table when samples are highlighted, allowing you to sort the table according to highlights.

Search patterns can be changed after creation, just click to edit. To remove, click the grey cross on the right hand side.

Searching for an empty string will match all samples.

Renaming Samples

Sample names are typically generated based on processed file names. These file names are not always informative. To help with this, you can do a search and replace within sample names. Here, we remove the SRR1067 and _1 parts of the sample names, which are the same for all samples:

rename samples

Again, regular expressions can be used. See above for details. Note that regex groups can be used - define a group match with parentheses and use the matching value with $1, $2 etc. For example - a search string SRR283(\d{3}) and replace string $1_SRR283 would move the final three digits of matching sample names to the start of the name.
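With that pattern, a (hypothetical) sample name would be renamed like this:

SRR283001_1  ->  001_SRR283_1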

Often, you may have a spreadsheet with filenames and informative sample names. To avoid having to manually enter each name, you can paste from a spreadsheet using the 'bulk import' tool:

bulk rename

Hiding Samples

Sometimes, you want to focus on a subset of samples. To temporarily hide samples from the report, enter a search string as described above into the 'Hide Samples' toolbox panel.

Here, we hide all samples with _trimmed in their sample name: (Note that plots will tell you how many samples have been hidden)

hide samples


Export Plots

This panel allows you to download MultiQC plots as images or as raw data. You can configure the size and characteristics of exported plot images: Width and Height set the output size of the images, scale sets how "zoomed-in" they should look (typically you want the plot to be more zoomed for printing). The tick boxes below these settings allow you to download multiple plots in one go.

Plots with multiple tabs will all be exported as files when using the Data tab. When exporting images, only the currently visible tab will be exported.

Note: You can also save static plot images when you run MultiQC. See Exporting Plots for more information.

Save Settings

To avoid having to re-enter the same toolbox setup repeatedly, you can save your settings using the 'Save Settings' panel. Just pick a name and click save. To load, choose your set of settings and press load (or delete). Loaded settings are applied on top of current settings. All configs are saved in browser local storage - they do not travel with the report and may not work in older browsers.

Configuring MultiQC

Whilst most MultiQC settings can be specified on the command line, MultiQC is also able to parse system-wide and personal config files. At run time, it collects the configuration settings from the following places in this order (overwriting at each step if a conflicting config variable is found):

  1. Hardcoded defaults in MultiQC code
  2. System-wide config in <installation_dir>/multiqc_config.yaml
    • Manual installations only, not pip or conda
  3. User config in ~/.multiqc_config.yaml
  4. File path set in environment variable MULTIQC_CONFIG_PATH
    • For example, define this in your ~/.bashrc file and keep the file anywhere you like
  5. Config file in the current working directory: multiqc_config.yaml
  6. Config file paths specified in the command with --config / -c
    • You can specify multiple files like this, they can have any filename.
  7. Command line config (--cl_config)
  8. Specific command line options (e.g. --force)

Sample name cleaning

MultiQC typically generates sample names by taking the input or log file name, and 'cleaning' it. To do this, it uses the fn_clean_exts settings and looks for any matches. If it finds any matches, everything to the right is removed. For example, consider the following config:

fn_clean_exts:
    - '.gz'
    - '.fastq'

This would make the following sample names:

mysample.fastq.gz  ->  mysample
secondsample.fastq.gz_trimming_log.txt  ->  secondsample
thirdsample.fastq_aligned.sam.gz  ->  thirdsample

There is also a config list called fn_clean_trim which just removes strings if they are present at the start or end of the sample name.

Usually you don't want to overwrite the defaults (though you can). Instead, add to the special variable names extra_fn_clean_exts and extra_fn_clean_trim:

extra_fn_clean_exts:
    - '.myformat'
    - '_processedFile'

extra_fn_clean_trim:
    - '#'
    - '.myext'

Other search types

File name cleaning can also remove exact strings (instead of removing with truncation). Regex patterns can also be supplied to match patterns and remove or keep matching substrings.

truncate (default)

If you just supply a string, the default behaviour is to truncate: the filename is cut at the matching string, removing it and everything after it.

extra_fn_clean_exts:
    - '.fastq'

The above is equivalent to the more explicit:

extra_fn_clean_exts:
    - type: 'truncate'
      pattern: '.fastq'

This rule would produce the following sample names:

mysample.fastq.gz  ->  mysample
thirdsample.fastq_aligned.sam.gz  ->  thirdsample

remove (formerly replace)

The remove type allows you to remove the exact match from the filename.

extra_fn_clean_exts:
    - type: remove
      pattern: .sorted

This rule would produce the following sample names:

secondsample.sorted.deduplicated  ->  secondsample.deduplicated


regex

You can also remove a substring with a regular expression. Here's a good resource to interactively try it out.

extra_fn_clean_exts:
    - type: regex
      pattern: '^processed.'

This rule would produce the following sample names:

processed.thirdsample.processed  ->  thirdsample.processed


regex_keep

If you'd rather keep the match of a regular expression, you can use the regex_keep type. This simplifies things if you can, for example, directly target sample names.

extra_fn_clean_exts:
    - type: regex_keep
      pattern: '[A-Z]{3}[1-9]{2}'

This rule would produce the following sample names:

merged.recalibrated.XZY97.alignment.bam  ->  XZY97


module

This key will tell MultiQC to only apply the pattern to a specific MultiQC module. This should be a string that matches the module's anchor - the #module bit when you click the main module heading in the sidebar (remove the #).

For example, to truncate all sample names to 5 characters for just Kallisto:

extra_fn_clean_exts:
    - type: regex_keep
      pattern: '^.{5}'
      module: kallisto

You can also supply a list of multiple module anchors if you wish:

extra_fn_clean_exts:
    - type: regex_keep
      pattern: '^.{5}'
      module:
        - kallisto
        - cutadapt

Clashing sample names

This process of cleaning sample names can sometimes result in exact duplicates. A duplicate sample name will overwrite previous results. Warnings showing these events can be seen with verbose logging using the --verbose/-v flag, or in multiqc_data/multiqc.log.

Problems caused by this will typically be discovered by finding fewer results than expected. If you're ever unsure about where the data within MultiQC reports comes from, have a look at multiqc_data/multiqc_sources.txt, which lists the path to the file used for every section of the report.

Directory names

One scenario where clashing names can occur is when the same file is processed in different directories. For example, if sample_1.fastq is processed with four sets of parameters in four different directories, they will all have the same name - sample_1. Only the last will be shown. If the directories are different, this can be avoided with the --dirs/-d flag.

For example, given the following files:

├── analysis_1
│   └── sample_1.fastq.gz.aligned.log
├── analysis_2
│   └── sample_1.fastq.gz.aligned.log
└── analysis_3
    └── sample_1.fastq.gz.aligned.log

Running multiqc -d . will give the following sample names:

analysis_1 | sample_1
analysis_2 | sample_1
analysis_3 | sample_1

Filename truncation

If the problem is with filename truncation, you can also use the --fullnames/-s flag, which disables all sample name cleaning. For example:

├── sample_1.fastq.gz.aligned.log
└── sample_1.fastq.gz.subsampled.fastq.gz.aligned.log

Running multiqc -s . will give the following sample names:

sample_1.fastq.gz.aligned.log
sample_1.fastq.gz.subsampled.fastq.gz.aligned.log

You can turn off sample name cleaning permanently by setting fn_clean_sample_names to false in your config file.

Module search patterns

Many bioinformatics tools have standard output formats, filenames and other signatures. MultiQC uses these to find output; for example, the FastQC module looks for files that end in _fastqc.zip.

This works well most of the time, until someone has an automated processing pipeline that renames things. For this reason, as of version v0.3.2 of MultiQC, the file search patterns are loaded as part of the main config. This means that they can be overwritten in <installation_dir>/multiqc_config.yaml or ~/.multiqc_config.yaml. So if you always rename your _fastqc.zip files to _qccheck.zip, MultiQC can still work.

To see the default search patterns, see the search_patterns.yaml file. Copy the section for the program that you want to modify and paste this into your config file. Make sure you make it part of a dictionary called sp as follows:

sp:
    mymodule:                   # replace with the search pattern key for your module
        fn: '_mysearch.txt'

Search patterns can specify a filename match (fn) or a file contents match (contents), as well as a number of additional search keys. See below for the full reference.

Ignoring Files

MultiQC begins by indexing all of the files that you specified and building a list of the ones it will use. You can specify files and directories to skip on the command line using -x/--ignore, or for more permanent memory, with the following config file options: fn_ignore_files, fn_ignore_dirs and fn_ignore_paths (the command line option simply adds to all of these).

For example, given the following files:

├── analysis_1
│   └── sample_1.fastq.gz.aligned.log
├── analysis_2
│   └── sample_1.fastq.gz.aligned.log
└── analysis_3
    └── sample_1.fastq.gz.aligned.log

You could specify the following relevant config options:

fn_ignore_files:
    - '*.log'

fn_ignore_dirs:
    - 'analysis_1'
    - 'analysis_2'

fn_ignore_paths:
    - '*/analysis_*/sample_1*'

Note that the searched file paths will usually be relative to the working directory and can be highly variable, so you'll typically want to start patterns with a * to match any preceding directory structure.

Ignoring samples

Some modules get sample names from the contents of the file and not the filename (for example, stdout logs can contain multiple samples). You can skip samples by their resolved sample names (after cleaning) with two config options: sample_names_ignore and sample_names_ignore_re. The first takes a list of strings to be used for glob pattern matching (same behaviour as the command line option --ignore-samples), the latter takes a list of regex patterns. For example:

sample_names_ignore:
    - 'SRR*'

sample_names_ignore_re:
    - '^SR{2}\d{7}_1$'

Large sample numbers

MultiQC has been written with the intention of being used for any number of samples. This means that it should work well with 6 samples or 6000. Very large sample numbers are becoming increasingly common, for example with single cell data.

Producing reports with data from many hundreds or thousands of samples provides some challenges, both technically and also in terms of data visualisation and report usability.

Disabling on-load plotting

One problem with large reports is that the browser can hang when the report is first loaded. This is because it is loading and processing the data for all plots at once. To mitigate this, large reports may show plots as grey boxes with a "Show Plot" button. Clicking this will render the plot as normal and prevents the browser from trying to do everything at once.

By default this behaviour kicks in when a plot has 50 samples or more. This can be customised by changing the num_datasets_plot_limit config option.

Flat / interactive plots

Reports with many samples start to need a lot of data for plots. This results in inconvenient report file sizes (can be 100s of megabytes) and worse, web browser crashes. To allow MultiQC to scale to these sample numbers, most plot types have two plotting functions in the code base - interactive (using HighCharts) and flat (rendered with MatPlotLib). Flat plots take up the same disk space irrespective of sample number and do not consume excessive resources to display.

By default, MultiQC generates flat plots when there are 100 or more samples. This cutoff can be changed by changing the plots_flat_numseries config option. This behaviour can also be changed by running MultiQC with the --flat / --interactive command line options or by setting the plots_force_flat / plots_force_interactive config options to True.

Tables / Beeswarm plots

Report tables with thousands of samples (table rows) can quickly become impossible to use. To avoid this, tables with large numbers of rows are instead plotted as a Beeswarm plot (aka. a strip chart / jitter plot). These plots have fixed dimensions with any number of samples. Hovering on a dot will highlight the same sample in other rows.

By default, MultiQC starts using beeswarm plots when a table has 500 rows or more. This can be changed by setting the max_table_rows config option.
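All three thresholds can be set in a MultiQC config file. A snippet simply restating the defaults described above would look like this:

num_datasets_plot_limit: 50
plots_flat_numseries: 100
max_table_rows: 500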

Coloured log output

As of MultiQC version 1.8, log output is coloured using the coloredlogs Python package. The code attempts to detect if the logs on the terminal are being redirected to a file or piped to another tool and will disable colours if so. If the colours annoy you or you're ending up with weird characters in your MultiQC output, you can disable this feature with the command line flag --no-ansi. Sadly it's not possible to set this in a config file, as the logger is initialised before configs are loaded.

Command-line config

Sometimes it's useful to specify a single small config option just once, where creating a config file for the occasion may be overkill. In these cases you can use the --cl_config option to supply additional config values on the command line.

Config variables should be given as a YAML string. You will usually need to enclose this in quotes. If MultiQC is unable to understand your config you will get an error message saying Could not parse command line config.

As an example, the following command configures the coverage levels to use for the Qualimap module: (as described in the docs)

multiqc ./datadir --cl_config "qualimap_config: { general_stats_coverage: [20,40,200] }"

Optimising run-time

Usually, MultiQC run time is fairly insignificant - in the order of seconds. Unless you are running MultiQC on many thousands of analysis files, the optimisations described below will have limited practical benefit.

In other words, if you're running with 15 RNAseq samples, you may as well save yourself some time and stick with the defaults.

Profile your MultiQC run time

As of version 1.9, MultiQC has a command line option to profile what it spends its time doing: --profile-runtime (config.profile_runtime). Whilst you're writing your pipeline / setting up your analysis, you can specify this flag and MultiQC will add a section to the bottom of your report describing how much time it spent searching files and what it did with those files. You'll also get a breakdown in the command-line log of how long the different steps of MultiQC execution took:

[INFO   ]         multiqc : MultiQC complete
[INFO   ]         multiqc : Run took 35.28 seconds
[INFO   ]         multiqc :  - 31.01s: Searching files
[INFO   ]         multiqc :  - 1.75s: Running modules
[INFO   ]         multiqc :  - 0.96s: Compressing report data
[INFO   ]         multiqc : For more information, see the 'Run Time' section in multiqc_report.html

If MultiQC is finishing in a few seconds or minutes, you probably don't need to do anything. If you are working with huge numbers of files then it may be worth looking into these results to see if you can speed up MultiQC. The documentation below explains how to do this.

Be picky with which modules are run

Probably the easiest way to speed up MultiQC is to only use the modules that you know you have files for. MultiQC supports a lot of different tools and searches for matching files for all of them every time you run it.

You can do this with the -m / --module flag (can be repeated) or in a MultiQC config file by using config.module_order. See Order of modules.
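For example, if you know that your pipeline only produces Picard and Samtools logs, something like this would skip the file search for every other module (module names here are just examples):

multiqc . -m picard -m samtools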

Optimise file search patterns

Secondly, think about customising the search patterns of the slowest searches.

As an example, logs from Picard are published to STDOUT and so can have any file name. Some people concatenate logs, so the contents can be anywhere in the file and the files must also be searched by subsequent tools in case they contain multiple outputs. If you know that all of your Picard MarkDuplicate log files have the filename mysamplename_markduplicates.log then you can safely customise that search pattern with the following MultiQC config:

sp:
    picard/markdups:
        fn: '*_markduplicates.log'

If you know that this is the only type of Picard output that you're interested in, you can also change all of the other Picard search patterns to use skip: True:

sp:
    picard/markdups:
        fn: '*_markduplicates.log'
    picard/alignment_metrics:
        skip: true
    picard/gcbias:
        skip: true
    picard/insertsize:
        skip: true
    # ...and so on, with skip: true for each of the other Picard search patterns
    # (the full list of keys is in search_patterns.yaml)

This can speed up execution a bit if you really want to squeeze that running time. The MultiQC Modules documentation shows the search patterns for every module.

Note that it's only worth using skip: true on search patterns if you want to use one from a module that has several. Usually it's better to just specify which modules you want to run instead.

Force interactive plots

One step that can take some time is running MatPlotLib to generate static-image plots (see Flat / interactive plots). You can force MultiQC to skip this and only use interactive plots by using the --interactive command line option (config.plots_force_interactive).

This approach is not recommended if you have a very large number of samples, as this can produce a huge report file with all of the embedded plot data and crash your browser when opening it. If you are running MultiQC just for the multiqc_data folder and never intend to look at the report, it can speed things up though.

Customising Reports

MultiQC offers a few ways to customise reports to easily add your own branding and some additional report-level information. These features are primarily designed for core genomics facilities.

Note that much more extensive customisation of reports is possible using custom templates.

Titles and introductory text

You can specify a custom title for the report using the -i/--title command line option. The -b/--comment option can be used to add a longer comment to the top of the report at run time.

You can also specify the title and comment, as well as a subtitle and the introductory text in your config file:

title: "My Title"
subtitle: "A subtitle to go underneath in grey"
intro_text: "MultiQC reports summarise analysis results."
report_comment: "This is a comment about this report."

Note that if intro_text is None the template will display the default introduction sentence. Set this to False to hide this, or set it to a string to use your own text.

Report time and analysis paths

It's not always appropriate to include the file paths that MultiQC was run with in a report, for example if sharing reports with others outside your organisation.

If you wish, you can disable the analysis paths and/or time in the report header with the following config parameters:

show_analysis_paths: False
show_analysis_time: False

Custom logo

To add your own custom logo to reports, you can add the following three lines to your MultiQC configuration file:

custom_logo: '/abs/path/to/logo.png'
custom_logo_url: 'https://www.example.com'
custom_logo_title: 'Our Institute Name'

Only custom_logo is needed. The URL will make the logo open up a new web browser tab with your address and the title sets the mouse hover title text.

Project level information

You can add custom information at the top of reports by adding key:value pairs to the config option report_header_info. Note that if you have a file called multiqc_config.yaml in the working directory, this will automatically be parsed and added to the config. For example, if you have the following saved:

report_header_info:
    - Contact E-mail: 'phil.ewels@scilifelab.se'
    - Application Type: 'RNA-seq'
    - Project Type: 'Application'
    - Sequencing Platform: 'HiSeq 2500 High Output V4'
    - Sequencing Setup: '2x125'

Then this will be displayed at the top of reports:

report project info

Note that you can also specify a path to a config file using -c.

Bulk sample renaming

Although it is possible to rename samples manually and in bulk using the report toolbox, it's often desirable to embed such renaming patterns into the report so that they can be shared with others. For example, a typical case could be for a sequencing centre that has internal sample IDs and also user-supplied sample names. Or public sample identifiers such as SRA numbers as well as more meaningful names.

It's possible to supply a file with one or more sets of sample names using the --sample-names command line option. This file should be a tab-delimited file with a header row (used for the report button labels) and then any number of renamed sample identifiers. For example:

MultiQC Names   Proper Names    AWESOME NAMES
SRR1067503_1    Sample_1    MYBESTSAMP_1
SRR1067505_1    Sample_2    MYBESTSAMP_2
SRR1067510_1    Sample_3    MYBESTSAMP_3

If supplied, buttons will be generated at the top of the report with your labels. Clicking these will populate and apply the Toolbox renaming panel.

NB: Sample renaming works with partial substrings - these will be replaced!

It's also possible to supply such renaming patterns within a config file (useful if you're already generating a config file for a run). In this case, you need to set the variables sample_names_rename_buttons and sample_names_rename. For example:

    - "MultiQC Names"
    - "Proper Names"
    - ["SRR1067503_1", "Sample_1", "MYBESTSAMP_1"]
    - ["SRR1067505_1", "Sample_2", "MYBESTSAMP_2"]
    - ["SRR1067510_1", "Sample_3", "MYBESTSAMP_3"]

Show / Hide samples buttons

It is possible to filter which samples are visible through the report toolbox, but it can be desirable to embed such patterns into the report so that they can be shared with others. One example can be to add filters for batches, to easily scan if certain quality metrics overlap between these batches.

It's possible to supply a file with one or more patterns to filter samples on using the --sample-filters command line option. This file should be a tab-delimited file with each row containing the button name, whether the pattern should be hidden (hide) or shown (show) and the patterns to be applied (all subsequent columns).

For example, to filter on read pair groups, you could use the following file:

Read Group 1    show    _R1
Read Group 2    show    _R2

To filter on controls and sample groups you could use:

Controls    show    input_
Conditions  show    group_1_    group_2_    group_3_

MultiQC automatically adds a Show all button at the start, which reverts back to showing all samples.

If you prefer, you can also add these buttons using a MultiQC config file:

show_hide_buttons:
    - Read Group 1
    - Read Group 2
    - Controls
    - Conditions

show_hide_mode:
    - show
    - show
    - show
    - show

show_hide_patterns:
    - _R1
    - _R2
    - input_
    - [ "group_1_", "group_2_", "group_3_" ]

Module and section comments

Sometimes you may want to add a custom comment above specific sections in the report. You can do this with the config option section_comments as follows:

section_comments:
    featurecounts: 'This comment is for a module header, but should still work'
    star_alignments: 'This new way of commenting above sections is **awesome**!'

Comments can be written in Markdown. The section_comments keys should correspond to the HTML IDs of the report section. You can find these by clicking on a navigation link in the report and seeing the #section_id at the end of the browser URL.

Removing modules or sections

If you don't want an entire module to be used in a MultiQC report, use the -e/--exclude command line flags to skip running that tool. You can also use the config option exclude_modules:

exclude_modules:
    - fastqc
    - cutadapt

If you want to run only specific modules, you can do that with -m/--module or the config option run_modules:

run_modules:
    - fastqc
    - cutadapt

If you would like to remove just one section of a module report, you can do so with the remove_sections config option as follows:

remove_sections:
    - section-id-one
    - second-section-id

The section ID is the string appended to the URL when clicking a report section in the navigation. For example, the GATK module has a section with the title "Compare Overlap". When clicking that in the report's left hand side navigation, the web browser URL has #gatk-compare-overlap appended. Here, you would add gatk-compare-overlap to the remove_sections config.

Removing General Statistics

The General Statistics is a bit of a special case in MultiQC, but there is added code to make it behave well with the above mechanism. On the command line, you can specify -e general_stats. Alternatively, you can set the following config flag in your MultiQC config:

skip_generalstats: true

Order of modules

By default, modules are included in the report in the order specified in config.module_order. Any modules found which aren't in this list are appended at the top of the report.

Top modules

To specify certain modules that should always come at the top of the report, you can configure config.top_modules in your MultiQC configuration file. For example, to always have the FastQC module at the top of reports, add the following to your ~/.multiqc_config.yaml file:

top_modules:
    - 'fastqc'

Running modules multiple times

A module can be specified multiple times in either config.module_order or config.top_modules, causing it to be run multiple times. By itself you'll just get two identical report sections. However, you can also supply configuration options to the modules as follows:

module_order:
    - moduleName:
        name: 'Module (filtered)'
        info: 'This section shows the module with different files'
        path_filters:
            - '*_special.txt'
            - '*_others.txt'
    - moduleName:
        name: 'Module (not-special)'
        path_filters_exclude:
            - '*_special.txt'

These options overwrite the defaults that are hardcoded in the module code, with path_filters and path_filters_exclude being the exception: these filter the file searches against a given list of glob filename patterns:

  • * matches everything
  • ? matches any single character
  • [seq] matches any character in seq
  • [!seq] matches any character not in seq

Note that exclusion supersedes inclusion for the path filters.

The other available configuration options are:

  • name: Section name
  • anchor: Section report ID
  • target: Intro link text
  • href: Intro link URL
  • info: Intro text
  • extra: Additional HTML after intro.
  • custom_config: Custom module-level settings. Translated into config.moduleName, but specifically for this section.

For example, to run the FastQC module twice, before and after adapter trimming, you could use the following config:

module_order:
    - fastqc:
        name: 'FastQC (trimmed)'
        anchor: 'fastqc_trimmed'
        info: 'This section of the report shows FastQC results after adapter trimming.'
        target: ''
        path_filters:
            - '*_1_trimmed_fastqc.zip'
    - cutadapt
    - fastqc:
        name: 'FastQC (raw)'
        anchor: 'fastqc_raw'
        path_filters:
            - '*_1_fastqc.zip'

Note that if you change the name then you will get multiple columns in the General Statistics table. If unchanged, the topmost module run may overwrite output from the first iteration.

If you set a custom anchor, then this can be used for other configuration options. For example, using the anchors above and the report_section_order described below:

report_section_order:
    fastqc_trimmed:
        before: fastqc_raw

NB: Currently, you cannot list a module name in both top_modules and module_order. Let me know if this is a problem.

Order of module and module subsection output

The module_order config changes the order in which each MultiQC module is executed. However, sometimes it's desirable to customise the order of specific sections in a report, independent of the order of module execution. For example, the custom_content module can generate multiple sections from different input files. Also, module_order does not allow you to change the sequence of sections within a MultiQC module.

To change the order of MultiQC outputs, follow a link in a report navigation to skip to the section you want to move (either a major section header or a subheading). Find the ID of that section by looking at the URL. For example, clicking on FastQC changes the URL to multiqc_report.html#fastqc - the ID is the text after (not including) the # symbol: fastqc. The FastQC Status Checks subsection is multiqc_report.html#fastqc_status_checks and has the id fastqc_status_checks.

Next, specify the report_section_order option in your MultiQC config file. Modules and sections in the report are given a number ranging from 10 (section at bottom of report), incrementing by +10 for each section. You can change this number (eg. a very low number to always get at the bottom of the report or very high to always be at the top), or you can move a section to before or after another existing section (has no effect if the other named ID is not in the report).

Note that module sub-sections can only be moved within their module. So you can't have the FastQC Adapter Content section shown under the GATK module header.

You can also use this config option to completely remove module sub-sections. To do this, just set the subsection ID to remove (NB: no : or -). This only works for module subsections. To remove an entire module, use the -e/--exclude flag.

For example, you could add the following to your MultiQC config file:

report_section_order:
    # the keys here are placeholder section IDs
    section_one:
        order: -1000
    section_two:
        after: 'diffsection'
    section_three:
        before: 'othersection'

Customising plots

Almost every plot in MultiQC reports is created using standard plotting functions and a plot config. You can override any plot config variable you like for any plot to customise how these are generated.

To do this, first find the plot that you would like to customise and copy its unique ID. You can find this by clicking export - the name next to the checkbox is the ID.

Next, you need to find the plot config key(s) that you would like to change. You can find these by reading the MultiQC documentation below.

For example, to set a new limit for the Picard InsertSizeMetrics x-axis, you can use the following:

custom_plot_config:
    picard_insert_size:
        xmax: 300

You can customise multiple variables for multiple plots:

custom_plot_config:
    # Show the percentages tab by default for the FastQC sequence counts plot
    fastqc_sequence_counts_plot:
        cpswitch_c_active: False

    # Only show up to 20bp on the x axis for cutadapt, change the title
    cutadapt_plot:
        xmax: 20
        title: "How many base pairs have been removed from the data"

    # Add a coloured band in the background to show what is a good result
    # Yes I know this doesn't make sense for this plot, it's just an example ;)
    bismark_mbias:
        yPlotBands:
            - from: 0
              to: 40
              color: '#e6c3c3'
            - from: 40
              to: 80
              color: '#e6dcc3'
            - from: 80
              to: 100
              color: '#c3e6c3'

As of version 1.8, this also works for customising the config of bargraph categories:

custom_plot_config:
    # bar graph plot ID, then its category names (replace these placeholders with your own)
    my_bargraph_plot_id:
        first_category:
            color: '#d84e2f'
        second_category:
            color: '#f2e63f'
        third_category:
            color: '#8bbc21'

Customising tables

Hiding columns

Report tables such as the General Statistics table can get quite wide. To help with this, columns in the report can be hidden. Some MultiQC modules include columns which are hidden by default, others may be uninteresting to some users.

To allow customisation of this behaviour, the defaults can be changed by adding to your MultiQC config file. This is done with the table_columns_visible value. Open a MultiQC report and click Configure Columns above a table. Make a note of the Group and ID for the column that you'd like to alter. For example, to make the % Duplicate Reads column from FastQC hidden by default, the Group is FastQC and the ID is percent_duplicates. These are then added to the config as follows:

table_columns_visible:
    FastQC:
        percent_duplicates: False

You can also specify a value for an entire module / table namespace. This will then show or hide all columns for that module. For example:

table_columns_visible:
    FastQC: False

Note that you can set these values to True to show columns that would otherwise be hidden by default.

Column order

In the same way, you can force a column to appear at the start or end of the table, or indeed impose a custom ordering on all the columns, by setting the table_columns_placement config option. High values push columns to the right hand side of the table and low to the left. The default value is 1000. For example:

table_columns_placement:
    Samtools:
        reads_mapped: 900
        properly_paired: 1010
        secondary: 1020

In this case, since the default placement weighting is 1000, the reads_mapped column will end up as the leftmost column and the other two will end up as the final columns on the right of the table.

The columns are organised by either namespace or table ID, then column ID. In the above example, Samtools is the namespace in the General Statistics table - the text that is at the start of the tooltip. For custom tables, the ID may be easier to use.

Conditional formatting

It's possible to highlight values in tables based on their value. This is done using the table_cond_formatting_rules config setting. Rules can be applied to every table column, or to specific columns only, using that column's unique ID.

The default rules are as follows:

table_cond_formatting_rules:
    all_columns:
        pass:
            - s_eq: 'pass'
            - s_eq: 'true'
        warn:
            - s_eq: 'warn'
            - s_eq: 'unknown'
        fail:
            - s_eq: 'fail'
            - s_eq: 'false'

These make any table cells that match the string pass or true have text with a green background, orange for warn, red for fail and so on. There can be multiple tests for each style of formatting - if there is a match for any, it will be applied. The following comparison operators are available:

  • s_eq - String exactly equals (case insensitive)
  • s_contains - String contains (case insensitive)
  • s_ne - String does not equal (case insensitive)
  • eq - Value equals
  • ne - Value does not equal
  • gt - Value is greater than
  • lt - Value is less than

To have matches for a specific column, use that column's ID instead of all_columns (see below for how to find the ID). For example, for a column with the ID mqc-generalstats-Assigned:

table_cond_formatting_rules:
    mqc-generalstats-Assigned:
        pass:
            - gt: 80
        warn:
            - lt: 80
        fail:
            - lt: 70

Note that the formatting is done in a specific order - pass/warn/fail by default, so that anything matching both warn and fail will be formatted as fail for example. This can be customised with table_cond_formatting_colours (see below).

To find the unique ID for your column, right click a table cell in a report and inspect its HTML (Inspect in Chrome). It should look something like <td class="data-coloured mqc-generalstats-Assigned">, where the mqc-generalstats-Assigned bit is the unique ID.

I know this isn't the same method of IDs as above and isn't super easy to do. Sorry!

It's possible to highlight matches in any number of colours. MultiQC comes with the following defaults:

table_cond_formatting_colours:
    - blue: '#337ab7'
    - lbue: '#5bc0de'
    - pass: '#5cb85c'
    - warn: '#f0ad4e'
    - fail: '#d9534f'

These can be overridden or added to with any string / CSS hex colour combinations you like. You can generate hex colour codes with lots of tools, for example http://htmlcolorcodes.com/

Note that the different sets of rules are applied in order, so if a value matches both pass and fail then it will be formatted as a fail.

Number base (multiplier)

To make numbers in the General Statistics table easier to read and compare quickly, MultiQC sometimes divides them by one million (typically read counts). If your samples have very low read counts then this can result in the table showing counts of 0.0, which isn't very helpful.

To change this behaviour, you can customise three config variables in your MultiQC config. The defaults are as follows:

read_count_multiplier: 0.000001
read_count_prefix: 'M'
read_count_desc: 'millions'

So, to show thousands of reads instead of millions, change these to:

read_count_multiplier: 0.001
read_count_prefix: 'K'
read_count_desc: 'thousands'

The same options are also available for numbers of base pairs:

base_count_multiplier: 0.000001
base_count_prefix: 'Mb'
base_count_desc: 'millions'

And for long reads:

long_read_count_multiplier: 0.001
long_read_count_prefix: 'K'
long_read_count_desc: 'thousands'

Number formatting

By default, the interactive HighCharts plots in MultiQC reports use spaces for thousand separators and points for decimal places (e.g. 1 234 567.89). Different countries have different preferences for this, so you can customise the two using a couple of configuration parameters - decimalPoint_format and thousandsSep_format.

For example, the following config would format the same number as 1234567,89 instead:

decimalPoint_format: ','
thousandsSep_format: ''

This formatting currently only applies to the interactive charts. It may be extended to apply elsewhere in the future (submit a new issue if you spot somewhere where you'd like it).


One tricky bit that caught me out whilst writing this is the different type casting between Python, YAML and Jinja2 templates. This is especially true when using an empty variable:

# Python
my_var = None
# YAML
my_var: null
# Jinja2
if myvar is none # Note - Lower case!

Using MultiQC in pipelines

MultiQC has been designed to be placed at the end of bioinformatics workflows and works well when it's a final step in a pipeline. Here you can find a few tips about integration with different workflow tools.

I'll use FastQC as the input in all of these examples, as it's a common use case. However, the concepts should apply for any of the tools that MultiQC supports.

Remember that you can use the Custom Content feature to easily collect pipeline-specific metadata (software version numbers, pipeline run-time data, links to documentation) into a format that can be inserted into your report.
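
For example, your pipeline can write a small file with a name ending in _mqc.yaml next to its other outputs and MultiQC will pick it up as Custom Content. A minimal sketch (the id, section name and values below are made up for illustration):

# pipeline_info_mqc.yaml
id: 'my_pipeline_info'
section_name: 'Pipeline Information'
description: 'Metadata collected by the pipeline at run time.'
plot_type: 'table'
data:
    run_1:
        Pipeline version: '1.0'
        Run time: '2h 13m'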

If you know exactly which modules will be used by MultiQC, you can use the -m/--modules flag to specify just these. This speeds MultiQC up a little, though it will probably only make a noticeable difference if your pipeline has thousands of input files for MultiQC.
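
For example, you could run multiqc with -m fastqc -m cutadapt to restrict it to just the FastQC and Cutadapt modules, or set the equivalent list in a MultiQC config file using the run_modules option:

run_modules:
    - fastqc
    - cutadapt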

In the following, we show examples for Nextflow and Snakemake integration.


Nextflow

An example mini-pipeline for nextflow which runs FastQC and MultiQC is shown below. See the nf-core pipelines for lots of examples of full nextflow pipelines that use MultiQC.

#!/usr/bin/env nextflow

params.reads = "data/*{1,2}.fastq.gz"
Channel.fromFilePairs( params.reads ).set { read_files_fastqc }

process fastqc {
    input:
    set val(name), file(reads) from read_files_fastqc

    output:
    file "*_fastqc.{zip,html}" into fastqc_results

    script:
    """
    fastqc -q $reads
    """
}

process multiqc {
    input:
    file ('fastqc/*') from fastqc_results.collect().ifEmpty([])

    output:
    file "multiqc_report.html" into multiqc_report
    file "multiqc_data"

    script:
    """
    multiqc .
    """
}

You will need to specify channels for all files that MultiQC needs to run, so that nextflow stages them correctly in the MultiQC work directory.

Note that .collect() is needed to make MultiQC run once for all upstream outputs.

The .ifEmpty([]) add-on isn't really needed here, but is helpful in larger pipelines where some processes may be optional. Without it, if any of the input channels is empty then the MultiQC process won't run.

Clashing input filenames

If a nextflow process tries to stage more than one input file with an identical filename, it will throw an error. Putting inputs into their own subfolder (file ('fastqc/*')) in the above examples is not really needed, but reduces the chance of input filename clashes.

If you're using a tool that gives the same filename to each file that MultiQC uses, you'll need to tell nextflow to rename the inputs to prevent clashes.

For example, StringTie prints statistics to STDOUT that MultiQC uses to generate reports. We can easily collect this in nextflow by using the .command.log file that nextflow saves as an output file from the process (file ".command.log" into stringtie_log). However, now every sample has the same filename for MultiQC.

We get around this by using dynamic input file names with nextflow:

file ('stringtie/stringtie_log*') from stringtie_log.collect().ifEmpty([])

This file pattern renames each stringtie log file to stringtie_log1, stringtie_log2 and so on, ensuring that we avoid any filename clashes.

Note that MultiQC finds output from some tools based on their filename, so use with caution (you may need to define some custom module search patterns).

Custom run name

When you launch nextflow, you can use the -name command line flag (single hyphen) to give a name to that specific pipeline run. You can configure your pipeline to pass this on to MultiQC. It can then be used as the report title and filename.

Here's an example snippet for this:

custom_runName = params.name
if( !(workflow.runName ==~ /[a-z]+_[a-z]+/) ){
  custom_runName = workflow.runName
}

process multiqc {
    input:
    file ('fastqc/*') from ch_fastqc_results_for_multiqc.collect().ifEmpty([])

    output:
    file "*_report.html" into ch_multiqc_report
    file "*_data"

    script:
    rtitle = custom_runName ? "--title \"$custom_runName\"" : ''
    rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : ''
    """
    multiqc $rtitle $rfilename .
    """
}

Note that we use params.name as a placeholder; this gives the benefit that both -name and the common typo --name work for this case.

There isn't an easy way to know if a custom value for -name has been given to nextflow, but all default names are two lowercase words with a single underscore so if the name matches this pattern then we ignore it.

MultiQC config file

It can be nice to use a config file for MultiQC to add some static content to reports about your pipeline (eg. report_comment). Remember that this config file should also be passed in a nextflow channel, so that nextflow correctly stages it (especially important when running on the cloud).

This snippet works with a params variable again, so that pipeline users can replace the config file with one of their own if they wish.

params.multiqc_config = "$baseDir/assets/multiqc_config.yaml"
Channel.fromPath(params.multiqc_config, checkIfExists: true).set { ch_config_for_multiqc }

process multiqc {
    input:
    file multiqc_config from ch_config_for_multiqc
    file ('fastqc/*') from fastqc_results.collect().ifEmpty([])

    output:
    file "multiqc_report.html" into multiqc_report
    file "multiqc_data"

    script:
    """
    multiqc --config $multiqc_config .
    """
}


Snakemake

The best practice for using MultiQC in Snakemake pipelines is to use a predefined Snakemake wrapper. For example, see the following mini-pipeline:

configfile: "config.yaml"

rule all:
    input: "multiqc_report.html"

rule fastqc:
    input: "data/{sample}.fastq.gz"
    output:
        html="fastqc/{sample}.html",
        zip="fastqc/{sample}_fastqc.zip"
    wrapper: "0.31.1/bio/fastqc"   # any recent snakemake-wrappers release

rule multiqc:
    input: expand("fastqc/{sample}.html", sample=config["samples"])
    output: "multiqc_report.html"
    wrapper: "0.31.1/bio/multiqc"

Snakemake wrappers not only deliver predefined and unit tested code for generating the requested output with the respective tool, but also define the required software stack in terms of conda packages.

You can run the above workflow as follows:

snakemake --use-conda

This first installs all the required tools into isolated conda environments, and then executes all necessary steps to create the target that is given in the top rule. In other words, it will execute FastQC twice (creating fastqc/1.html and fastqc/2.html) and MultiQC once, creating multiqc_report.html.

Of course, using conda is optional, but it greatly increases reproducibility. Snakemake is not limited to wrappers (although its wrapper repository provides many in the field of bioinformatics), but also supports direct execution of shell commands and integration of custom scripts (e.g., for plotting).

If you prefer to use MultiQC without a snakemake wrapper, you can see a minimal example on GitHub: jakevc/snakemake_multiqc. This has an example script and some test data for you to play with.


Troubleshooting

Hopefully MultiQC will be easy to use and run without any hitches. If you have any problems, please do get in touch with the developer (Phil Ewels) by e-mail or by submitting an issue on GitHub. Before you do, here are a few previously encountered problems that may help...

Not enough samples found

In this scenario, MultiQC finds some logs for the bioinformatics tool in question, but not all of your samples appear in the report. This is the most common question I get regarding MultiQC operation.

Usually, this happens because sample names collide. This often happens innocently - MultiQC overwrites previous results of the same name, so you only get the last one seen in the report. You can see warnings about this by running MultiQC in verbose mode with the -v flag, or by looking at the generated log file in multiqc_data/multiqc.log. If you are unsure about which log file ended up in the report, look at multiqc_data/multiqc_sources.txt, which lists each source file used.

To solve this, try running MultiQC with the -d and -s flags. The Clashing sample names section of the docs explains this in more detail.
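
If you would rather set these in a MultiQC config file than on the command line, a sketch of the equivalent options (assuming the standard config keys for these two flags):

prepend_dirs: true            # equivalent to the -d flag
fn_clean_sample_names: false  # equivalent to the -s / --fullnames flag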

Big log files

Another reason that log files can be skipped is if the log filesize is very large. For example, this could happen with very long concatenated standard out files. By default, MultiQC skips any file that is larger than 10MB to keep execution fast. The verbose log output (-v or multiqc_data/multiqc.log) will show you if files are being skipped with messages such as these:

[DEBUG  ]  Ignoring file as too large: filename.txt

You can configure the threshold and parse your files by changing the log_filesize_limit config option. For example, to parse files up to 2GB in size, add the following to your MultiQC config file:

log_filesize_limit: 2000000000

No logs found for a tool

In this case, you have run a bioinformatics tool and have some log files in a directory. When you run MultiQC with that directory, it finds nothing for the tool in question.

There are a couple of things you can check here:

  1. Is the tool definitely supported by MultiQC? If not, why not open an issue to request it!
  2. Did your bioinformatics tool definitely run properly? I've spent quite a bit of time debugging MultiQC modules only to realise that the output files from the tool were empty or incomplete. If your data is missing, take a look at the raw files and make sure that there's something to see!

If everything looks fine, then MultiQC probably needs extending to support your data. Tools have different versions, different parameters and different output formats that can confuse the parsing code. Please open an issue with your log files and we can get it fixed.

Error messages about mkl trial mode / licences

In this case you run MultiQC and get something like this:

$ multiqc .

Vendor:  Continuum Analytics, Inc.
Package: mkl
Message: trial mode EXPIRED 2 days ago

    You cannot run mkl without a license any longer.
    A license can be purchased it at: http://continuum.io
    We are sorry for any inconveniences.


The mkl library provides optimisations for numpy, a requirement of MatPlotLib. Recent versions of Conda have a bundled version which should come with a licence and remove the warning. See this page for more info. If you already have Conda installed you can get the updated version by running:

conda remove mkl-rt
conda install -f mkl

Another way around it is to uninstall mkl. It seems that numpy works fine without it:

$ conda remove --features mkl

Problem solved! See more here and here.

If you're not using Conda already, try installing MultiQC with Conda instead. You can find instructions here.

Locale Error Messages

Two MultiQC dependencies have been known to throw errors due to problems with the Python locale settings, or rather the lack of those settings.

MatPlotLib can complain that some strings (such as en_SE) aren't allowed. Running MultiQC gives the following error:

$ multiqc --version
# ..long traceback.. #
 File "/sw/comp/python/2.7.6_milou/lib/python2.7/locale.py", line 443, in _parse_localename
   raise ValueError, 'unknown locale: %s' % localename
ValueError: unknown locale: UTF-8

Click can have a similar problem if the locale isn't set when using Python 3. That generates an error that looks like this:

# ..truncated traceback.. #
File "click/_unicodefun.py", line 118, in _verify_python3_env 'for mitigation steps.' + extra)

RuntimeError: Click will abort further execution because Python 3 was configured to use ASCII
as encoding for the environment.  Consult http://click.pocoo.org/python3/for mitigation steps.

You can fix both of these problems by changing your system locale to something that will be recognised. One way to do this is by adding these lines to your .bashrc in your home directory (or .bash_profile):

export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8

Other locale strings are also fine, as long as the variables are set and valid.

MultiQC Modules


Adapter Removal

This program searches for and removes remnant adapter sequences from High-Throughput Sequencing (HTS) data and (optionally) trims low quality bases from the 3' end of reads following adapter removal. AdapterRemoval can analyze both single end and paired end data, and can be used to merge overlapping paired-end reads into (longer) consensus sequences. Additionally, AdapterRemoval may be used to recover a consensus adapter sequence for paired-end data, for which this information is not available.

The AdapterRemoval module parses *.settings logs generated by AdapterRemoval, a tool for rapid adapter trimming, identification, and read merging.

Supported settings file results:

  • single end
  • paired end noncollapsed
  • paired end collapsed
Adapter Removal file search patterns See docs
  fn: '*.settings'
  contents: AdapterRemoval
  num_lines: 1


The AfterQC module parses results generated by AfterQC. AfterQC can simply go through all fastq files in a folder and then output three folders: good, bad and QC folders, which contain good reads, bad reads and the QC results of each fastq file/pair.

AfterQC file search patterns See docs
  fn: '*.json'
  contents: allow_mismatch_in_poly


There are two versions of this software: bcl2fastq for MiSeq and HiSeq sequencing systems running RTA versions earlier than 1.8, and bcl2fastq2 for Illumina sequencing systems running RTA version 1.18.54 and above. This module currently only covers output from the latter.

bcl2fastq file search patterns See docs
  fn: Stats.json
  contents: DemuxResults
  num_lines: 300

BioBloom Tools

BioBloom Tools (BBT) provides the means to create filters for a given reference and then to categorize sequences. This methodology is faster than alignment but does not provide mapping locations. BBT was initially intended to be used for pre-processing and QC applications like contamination detection, but is flexible to accommodate other purposes. This tool is intended to be a pipeline component to replace costly alignment steps.

BioBloom Tools file search patterns See docs
  contents: >
    filter_id	hits	misses	shared	rate_hit	rate_miss	rate_shared
  num_lines: 2

Cluster Flow

Cluster Flow is a simple and flexible bioinformatics pipeline tool. It's designed to be quick and easy to install, with flexible configuration and simple customization.

Cluster Flow is easy enough to set up and use for non-bioinformaticians (given a basic knowledge of the command line), and its simplicity makes it great for low to medium throughput analyses.

The MultiQC module for Cluster Flow parses *_clusterflow.txt logs and finds consensus commands executed by modules in each pipeline run.

The Cluster Flow *.run files are also parsed and pipeline information shown (some basic statistics plus the pipeline steps / params used).

Cluster Flow file search patterns See docs
  fn: '*_clusterFlow.txt'
  shared: true
  fn: '*.run'
  contents: Cluster Flow Run File
  num_lines: 2


The Cutadapt module parses results generated by Cutadapt, a tool to find and remove adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.

This module should be able to parse logs from a wide range of versions of Cutadapt. It's been tested with log files from v1.2.1, 1.6 and 1.8. Note that you will need to change the search pattern for very old log files (such as v.1.2) with the following MultiQC config:

sp:
    cutadapt:
        contents: 'cutadapt version'

See the module search patterns section of the MultiQC documentation for more information.

Cutadapt file search patterns See docs
  contents: This is cutadapt
  shared: true


ClipAndMerge is an application to clip adapter sequences and merge reads in ancient DNA analysis. Note that versions < 1.7.8 use the basename of the file path to distinguish samples, whereas newer versions produce logfiles with a sample identifier that gets parsed by MultiQC.

ClipAndMerge file search patterns See docs
  contents: ClipAndMerge (
  num_lines: 5

FastQ Screen

The FastQ Screen module parses results generated by FastQ Screen, a tool that allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.

By default, the module creates a plot that emulates the FastQ Screen output with blue and red stacked bars showing unique and multimapping read counts. This plot only works for a handful of samples however, so if # samples * # organisms >= 160, a simpler stacked barplot is shown. This is also shown when generating flat-image plots.

To always show this style of plot, add the following line to a MultiQC config file:

fastqscreen_simpleplot: true
FastQ Screen file search patterns See docs
  fn: '*_screen.txt'


The FastQC module parses results generated by FastQC, a quality control tool for high throughput sequence data written by Simon Andrews at the Babraham Institute.

FastQC generates an HTML report, which is what most people use when they run the program. However, it also helpfully generates a file called fastqc_data.txt which is relatively easy to parse.

A typical run will produce files like the following:

mysample_fastqc.html
mysample_fastqc/
    fastqc_data.txt
    fastqc_report.html
    summary.txt

Sometimes the directory is zipped, with just mysample_fastqc.zip.

The FastQC MultiQC module looks for files called fastqc_data.txt or ending in _fastqc.zip. If the zip files are found, they are read in memory and fastqc_data.txt parsed.

Note: The directory and zip file are often both present. To speed up MultiQC execution, zip files will be skipped if the file name suggests that they will share a sample name with data that has already been parsed.

You can customise the patterns used for finding these files in your MultiQC config (see Module search patterns). The below code shows the default file patterns:

        fn: 'fastqc_data.txt'
        fn: '*_fastqc.zip'

Note: Sample names are discovered by parsing the line beginning Filename in fastqc_data.txt, not based on the FastQC report names.

Theoretical GC Content

It is possible to plot a dashed line showing the theoretical GC content for a reference genome. MultiQC comes with genome and transcriptome guides for Human and Mouse. You can use these in your reports by adding the following MultiQC config keys (see Configuring MultiQC):

fastqc_config:
    fastqc_theoretical_gc: 'hg38_genome'

Only one theoretical distribution can be plotted. The following guides are available (txome = transcriptome):

  • hg38_genome
  • hg38_txome
  • mm10_genome
  • mm10_txome

Alternatively, a custom theoretical guide can be used in reports. To do this, create a file with fastqc_theoretical_gc in the filename and place it with your analysis files. It should be tab delimited with the following format (column 1 = %GC, column 2 = % of genome):

# FastQC theoretical GC content curve: YOUR REFERENCE NAME
0   0.005311768
1   0.004108502
2   0.004060371
3   0.005066476

You can generate these files using an R package called fastqcTheoreticalGC written by Mike Love. Please see the package readme for more details.

Result files from this package are searched for with the following search pattern (can be customised as described above):

sp:
    fastqc/theoretical_gc:
        fn: '*fastqc_theoretical_gc*'

If you want to always use a specific custom file for MultiQC reports without having to add it to the analysis directory, add the full file path to the same MultiQC config variable described above:

fastqc_config:
    fastqc_theoretical_gc: '/path/to/your/custom_fastqc_theoretical_gc.txt'

Changing the order of sections

Remember that it is possible to customise the order in which the different module sections appear in the report if you wish. See the docs for more information.

For example, to show the Status Checks section at the top, use the following config:

report_section_order:
    fastqc_status_checks:
        order: -1000
FastQC file search patterns See docs
  fn: fastqc_data.txt
  fn: '*_fastqc.zip'
  fn: '*fastqc_theoretical_gc*'


The Fastp module parses results generated by Fastp. Fastp can simply go through all fastq files in a folder and perform a series of quality control and filtering steps. Quality control and reporting are displayed both before and after filtering, allowing for a clear depiction of the consequences of the filtering process. Notably, the latter can be conducted on a variety of parameters including quality scores, length, as well as the presence of adapters, polyG, or polyX tailing.

Fastp file search patterns See docs
  fn: '*fastp.json'


The FLASh module parses the log messages generated by the FLASh read merger. To create a log file, you can use tee. From the FLASh help:

flash reads_1.fq reads_2.fq 2>&1 | tee logfilename.log

The sample name is set by the first input filename listed in the log. However, this can be changed to using the first output filename (i.e. if you used FLASh's --output-prefix=PREFIX option) by using the following config:

flash:
    use_output_name: true

The module can also parse the .hist numeric histograms output by FLASh.

Note that the histogram's file format and extension are too generic by themselves, which could result in the accidental parsing of a file output by another tool. To get around this, the MultiQC module only parses files with the filename pattern *flash*.hist.

To customise this (for example, enabling for any file ending in *.hist), use the following config change:

sp:
    flash/hist:
        fn: '*.hist'
FLASh file search patterns See docs
  contents: '[FLASH]'
  shared: true
  fn: '*flash*.hist'


Flexbar preprocesses high-throughput sequencing data efficiently. It demultiplexes barcoded runs and removes adapter sequences. Moreover, trimming and filtering features are provided. Flexbar increases read mapping rates and improves genome as well as transcriptome assemblies.

Flexbar file search patterns See docs
  contents: 'Flexbar - flexible barcode and adapter removal'
  shared: true


This module parses the output from the InterOp Summary executable and creates a table view. The aim is to replicate the Run & Lane Metrics table from the Illumina Basespace interface. The executable used can easily be installed from the BioConda channel using conda install -c bioconda illumina-interop.

The MultiQC interop module can parse the outputs of the interop_summary and interop_index-summary executables. Note that these must be run with the --csv=1 option.

InterOp file search patterns See docs
  contents: 'Level,Yield,Projected Yield,Aligned,Error Rate,Intensity C1,%>=Q30'
  contents: 'Total Reads,PF Reads,% Read Identified (PF),CV,Min,Max'


iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing.

This module parses the output from the ivar trim command and creates a table view. Both output from V1 and V2 of the tool are supported and parsed accordingly.

iVar file search patterns See docs
  contents: Number of references
  num_lines: 8


JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.

The MultiQC module for Jellyfish parses only *_jf.hist files. For its output to be parsed by the MultiQC module, the general usage of jellyfish needs to be:

  • gunzip -c file.fastq.gz | jellyfish count -o file.jf -m ...
  • jellyfish histo -o file_jf.hist -f file.jf

To customise the matching pattern for jellyfish, multiqc can be run with the option --cl_config "sp: { jellyfish: { fn: 'PATTERN' } }" where PATTERN is the pattern to be matched. For example:

multiqc . --cl_config "sp: { jellyfish: { fn: '*.hist' } }"
Jellyfish file search patterns See docs
  fn: '*_jf.hist'


The KAT multiqc module interprets output from KAT distribution analysis json files, which typically contain information such as estimated genome size and heterozygosity rates from your k-mer spectra.

KAT file search patterns See docs
  fn: '*.dist_analysis.json'


leeHom is a Bayesian maximum a posteriori algorithm for stripping sequencing adapters and merging overlapping portions of reads. The algorithm is mostly aimed at ancient DNA and Illumina data but can be used for any dataset.

leeHom file search patterns See docs
  contents: Adapter dimers/chimeras
  shared: true


The MinIONQC module parses results generated by MinIONQC. It uses the sequencing_summary.txt files produced by ONT (Oxford Nanopore Technologies) long-read base-callers to perform QC on the reads. It allows quick-and-easy comparison of data from multiple flowcells.

The MultiQC module parses data in the summary.yaml MinIONQC output files.

minionqc file search patterns See docs
  fn: summary.yaml
  contents: total.gigabases


PycoQC relies on the sequencing_summary.txt file generated by Albacore and Guppy, but if needed it can also generate a summary file from basecalled fast5 files.

The package supports 1D and 1D2 runs generated with MinION, GridION and PromethION devices and basecalled with Albacore 1.2.1+ or Guppy 2.1.3+.

pycoQC file search patterns See docs
  contents: '"pycoqc":'
  num_lines: 2


The SeqyClean module visualises the results from SeqyClean, a comprehensive preprocessing software pipeline. SeqyClean removes noise from Fastq files to improve de-novo genome assembly and genome mapping.

The module parses the *SummaryStatistics.tsv files that result from a SeqyClean run.

SeqyClean file search patterns See docs
  fn: '*_SummaryStatistics.tsv'


The Sickle module parses standard error generated by Sickle, a windowed adaptive trimming tool for fastq files using quality. StdErr can be captured by directing it to a file e.g. sickle command 2> sickle_out.log

The module generates the sample names based on the filenames.

Sickle file search patterns See docs
  contents_re: 'FastQ \w*\s?records kept: .*'
  num_lines: 2


The Skewer module parses results generated by Skewer, an adapter trimming tool specially designed for processing next-generation sequencing (NGS) paired-end sequences.

Skewer file search patterns See docs
  contents: 'maximum error ratio allowed (-r):'
  shared: true


SortMeRNA is a tool for filtering, mapping and OTU-picking NGS reads in metatranscriptomic and metagenomic data. The core algorithm is based on approximate seeds and allows for fast and sensitive analyses of nucleotide sequences. The main application of SortMeRNA is filtering ribosomal RNA from metatranscriptomic data.

The MultiQC module parses the log files, which are created when SortMeRNA is run with the --log option.

The default header in the 'General Statistics' table is '% rRNA'. Users can override this using the configuration option:

sortmerna:
    tab_header: 'My database hits'
SortMeRNA file search patterns See docs
  contents: Minimal SW score based on E-value
  shared: true


The Trimmomatic module parses standard error generated by Trimmomatic, a flexible read trimming tool for Illumina NGS data. StdErr can be captured by directing it to a file e.g. trimmomatic command 2> trim_out.log

By default, the module generates the sample names based on the command line used by Trimmomatic. If you prefer, you can tell the module to use the filenames as sample names instead. To do so, use the following config option:

trimmomatic:
    s_name_filenames: true
Trimmomatic file search patterns See docs
  contents: Trimmomatic
  shared: true



The BISCUIT module parses logs generated by BISCUIT and the quality control script, QC.sh, included with the BISCUIT software.

Note - as of MultiQC v1.9, the module supports only BISCUIT version v0.3.16 and onwards. If you have BISCUIT data from before this, please use MultiQC v1.8.

Insert Size Distribution

The second tab of this plot uses the config option read_count_multiplier, so if millions of reads is not useful for your data you can customise this.

See Number base (multiplier) in the documentation.

BISCUIT file search patterns See docs
  fn: '*_mapq_table.txt'
  contents: BISCUITqc Mapping Quality Table
  num_lines: 3
  fn: '*_strand_table.txt'
  contents: BISCUITqc Strand Table
  num_lines: 3
  fn: '*_isize_table.txt'
  contents: BISCUITqc Insert Size Table
  num_lines: 3
  fn: '*_dup_report.txt'
  contents: BISCUITqc Read Duplication Table
  num_lines: 3
  fn: '*_cv_table.txt'
  contents: BISCUITqc Uniformity Table
  num_lines: 3
  fn: '*_covdist_all_base_botgc_table.txt'
  fn: '*_covdist_all_base_table.txt'
  fn: '*_covdist_all_base_topgc_table.txt'
  fn: '*_covdist_q40_base_botgc_table.txt'
  fn: '*_covdist_q40_base_table.txt'
  fn: '*_covdist_q40_base_topgc_table.txt'
  fn: '*_covdist_all_cpg_botgc_table.txt'
  fn: '*_covdist_all_cpg_table.txt'
  fn: '*_covdist_all_cpg_topgc_table.txt'
  fn: '*_covdist_q40_cpg_botgc_table.txt'
  fn: '*_covdist_q40_cpg_table.txt'
  fn: '*_covdist_q40_cpg_topgc_table.txt'
  fn: '*_CpGRetentionByReadPos.txt'
  fn: '*_CpHRetentionByReadPos.txt'
  fn: '*_totalBaseConversionRate.txt'
  fn: '*_totalReadConversionRate.txt'


The Bismark module parses logs generated by Bismark, a tool to map bisulfite converted sequence reads and determine cytosine methylation states.

Bismark file search patterns See docs
  fn: '*_[SP]E_report.txt'
  fn: '*.deduplication_report.txt'
  fn: '*_splitting_report.txt'
  fn: '*M-bias.txt'
  fn: '*.nucleotide_stats.txt'

Bowtie 1

The Bowtie 1 module parses results generated by Bowtie, an ultrafast, memory-efficient short read aligner.

Bowtie 1 file search patterns See docs
  contents: '# reads processed:'
    - bowtie.left_kept_reads.log
    - bowtie.left_kept_reads.m2g_um.log
    - bowtie.left_kept_reads.m2g_um_seg1.log
    - bowtie.left_kept_reads.m2g_um_seg2.log
    - bowtie.right_kept_reads.log
    - bowtie.right_kept_reads.m2g_um.log
    - bowtie.right_kept_reads.m2g_um_seg1.log
    - bowtie.right_kept_reads.m2g_um_seg2.log
  shared: true

Bowtie 2

The Bowtie 2 module parses results generated by Bowtie 2, an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.

Please note that the Bowtie 2 logs are difficult to parse as they don't contain much extra information (such as what the input data was). A typical log looks like this:

314537 reads; of these:
  314537 (100.00%) were paired; of these:
    111016 (35.30%) aligned concordantly 0 times
    193300 (61.46%) aligned concordantly exactly 1 time
    10221 (3.25%) aligned concordantly >1 times
    111016 pairs aligned concordantly 0 times; of these:
      11377 (10.25%) aligned discordantly 1 time
    99639 pairs aligned 0 times concordantly or discordantly; of these:
      199278 mates make up the pairs; of these:
        112779 (56.59%) aligned 0 times
        85802 (43.06%) aligned exactly 1 time
        697 (0.35%) aligned >1 times
82.07% overall alignment rate

Bowtie 2 logs are from STDERR - some pipelines (such as Cluster Flow) print the Bowtie 2 command before this, so MultiQC looks to see if this can be recognised in the same file. If not, it takes the filename as the sample name.

Bowtie 2 is used by other tools too, so if your log file contains the word bisulfite, MultiQC will assume that this is actually Bismark and ignore the Bowtie 2 logs.

Bowtie 2 file search patterns See docs
  contents: 'reads; of these:'
    - bisulfite
    - HiC-Pro
  shared: true


The BBMap module produces summary statistics from the BBMap suite of tools. The module can summarise data from the following BBMap output files (descriptions from command line help output):

  • stats
    • BBDuk filtering statistics.
  • covstats (not yet implemented)
    • Per-scaffold coverage info.
  • rpkm (not yet implemented)
    • Per-scaffold RPKM/FPKM counts.
  • covhist
    • Histogram of # occurrences of each depth level.
  • basecov (not yet implemented)
    • Coverage per base location.
  • bincov (not yet implemented)
    • Print binned coverage per location (one line per X bases).
  • scafstats (not yet implemented)
    • Statistics on how many reads mapped to which scaffold.
  • refstats
    • Statistics on how many reads mapped to which reference file; only for BBSplit.
  • bhist
    • Base composition histogram by position.
  • qhist
    • Quality histogram by position.
  • qchist
    • Count of bases with each quality value.
  • aqhist
    • Histogram of average read quality.
  • bqhist
    • Quality histogram designed for box plots.
  • lhist
    • Read length histogram.
  • gchist
    • Read GC content histogram.
  • indelhist
    • Indel length histogram.
  • mhist
    • Histogram of match, sub, del, and ins rates by read location.
  • statsfile (not yet implemented)
    • Mapping statistics are printed here.

Additional information on the BBMap tools is available on SeqAnswers.

BBMap file search patterns See docs
  contents: '#Name	Reads	ReadsPct'
  num_lines: 4
  contents: '#Quality	count1	fraction1	count2	fraction2'
  num_lines: 1
  contents: '#Pos	A	C	G	T	N'
  num_lines: 1
  contents: '#RefName	Cov	Pos	RunningPos'
  num_lines: 3
  contents: '#BaseNum	count_1	min_1	max_1	mean_1	Q1_1	med_1	Q3_1	LW_1	RW_1	count_2	min_2	max_2	mean_2	Q1_2	med_2	Q3_2	LW_2	RW_2'
  num_lines: 1
  contents: '#Coverage	numBases'
  num_lines: 1
  contents: '#ID	Avg_fold'
  num_lines: 1
  contents: '#Errors	Count'
  num_lines: 1
  contents: '#GC	Count'
  num_lines: 5
  contents: '#Mean_reads'
  num_lines: 1
  contents: '#InsertSize	Count'
  num_lines: 6
  contents: '#Length	Deletions	Insertions'
  num_lines: 1
  contents: '#Length	Count'
  num_lines: 1
  contents: '#BaseNum	Match1	Sub1	Del1	Ins1	N1	Other1	Match2	Sub2	Del2	Ins2	N2	Other2'
  num_lines: 1
  contents: '#Deviation'
  num_lines: 1
  contents: '#BaseNum	Read1_linear	Read1_log	Read1_measured	Read2_linear	Read2_log	Read2_measured'
  num_lines: 1
  contents: '#File	'
  num_lines: 1
  contents: Reads Used=
  num_lines: 1
  contents: 'Reads Used:'
  num_lines: 1


The HiCUP module parses results generated by HiCUP (Hi-C User Pipeline), a tool for mapping and performing quality control on Hi-C data.

HiCUP file search patterns See docs
  fn: 'HiCUP_summary_report*'


The HiC-Pro module parses results generated by HiC-Pro, a tool for efficient processing and quality control of Hi-C data.

HiC-Pro file search patterns See docs
  fn: '*.mmapstat'
  fn: '*.mpairstat'
  fn: '*.mergestat'
  fn: '*.mRSstat'
  fn: '*.assplit.stat'


HISAT2 is a fast and sensitive alignment program for mapping NGS reads (both DNA and RNA) to a population of human genomes (as well as to a single reference genome).

The HISAT2 MultiQC module parses summary statistics generated by versions >= v2.1.0 where the command line option --new-summary has been specified.

Note that running HISAT2 without this option (and older versions) gives log output identical to Bowtie2. These logs are indistinguishable and summary statistics will appear in MultiQC reports labelled as Bowtie2. See GitHub issues on the HISAT2 repository and the MultiQC repository for more information.

HISAT2 does not report the input file names in the log, so MultiQC takes the filename as the sample. Note that if you specify --summary-file when running HISAT2 the same summary output appears both there and in the stdout. So if you save both with different names you may end up with duplicate samples in your MultiQC report.

HISAT2 file search patterns See docs
  contents: 'HISAT2 summary stats:'
  shared: true


The Kallisto module parses logs generated by Kallisto, a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads.

Note - MultiQC parses the standard out from Kallisto, not any of its output files (abundance.h5, abundance.tsv, and run_info.json). As such, you must capture the Kallisto stdout to a file when running to use the MultiQC module.

Kallisto file search patterns See docs
  contents: '[quant] finding pseudoalignments for the reads'
  shared: true


Currently supported Longranger pipelines:

  • wgs
  • targeted
  • align


longranger wgs --fastqs=/path/to/fastq --id=NA12878
multiqc /path/to/NA12878

This module will look for the files _invocation and summary.csv in the NA12878 folder, i.e. the output folder of Longranger in this example. The file summary.csv is required. If the file _invocation is not found the sample will receive a generic name in the MultiQC report (longranger#1), instead of NA12878 or whatever was given by the --id parameter.

Longranger file search patterns See docs
  fn: '*summary.csv'
  contents: >
  num_lines: 2
  fn: _invocation
  contents: call PHASER_SVCALLER_CS(
  max_filesize: 2048


The Salmon module parses results generated by Salmon, a tool for quantifying the expression of transcripts using RNA-seq data.

Salmon file search patterns See docs
  fn: meta_info.json
  contents: salmon_version
  fn: flenDist.txt


STAR is an ultrafast universal RNA-seq aligner.

This MultiQC module parses summary statistics from the Log.final.out log files. Sample names are taken from the filename prefix (sampleNameLog.final.out) when set with --outFileNamePrefix in STAR. If there is no filename prefix, the sample name is set to the name of the directory containing the file.

In addition to this summary log file, the module parses ReadsPerGene.out.tab files generated with --quantMode GeneCounts, if found.

STAR file search patterns See docs
  fn: '*Log.final.out'
  fn: '*ReadsPerGene.out.tab'


The TopHat module parses results generated by TopHat, a fast splice junction mapper for RNA-Seq reads that aligns RNA-Seq reads to mammalian-sized genomes.

TopHat file search patterns See docs
  fn: '*align_summary.txt'
  shared: true


Illumina DRAGEN is a Bio-IT Platform that provides ultra-rapid secondary analysis of sequencing data using field-programmable gate array technology (FPGA).

DRAGEN has a number of different pipelines and outputs, including base calling, DNA and RNA alignment, post-alignment processing and variant calling, covering virtually all stages of typical NGS data processing. For each stage, it generates QC files with metrics resembling those of samtools-stats, mosdepth, bcftools-stats and the like. This MultiQC module supports some of the output but not all.

  • <output prefix>.wgs_fine_hist_<tumor|normal>.csv
    • Coverage distribution and cumulative coverage plots
  • <output prefix>.mapping_metrics.csv
    • General stats table, a dedicated table, and a few barplots
  • <output prefix>.wgs_coverage_metrics_<tumor|normal>.csv
    • General stats table and a dedicated table
  • <output prefix>.wgs_contig_mean_cov_<tumor|normal>.csv
    • A histogram like in mosdepth, with each chrom as a category on X axis, plus a category for autosomal chromosomes average
  • <output prefix>.fragment_length_hist.csv
    • A histogram plot
  • <output prefix>.ploidy_estimation_metrics.csv
    • Add just Ploidy estimation into the general stats table
  • <output prefix>.vc_metrics.csv
    • A dedicated table and the total number of Variants into the general stats table

Each QC output adds a section into the report if a corresponding QC file is found.

DRAGEN file search patterns See docs
  fn: '*.vc_metrics.csv'
  fn: '*.ploidy_estimation_metrics.csv'
  fn_re: '.*\.wgs_contig_mean_cov_?(tumor|normal)?\.csv'
  fn_re: '.*\.wgs_coverage_metrics_?(tumor|normal)?\.csv'
  fn_re: '.*\.wgs_fine_hist_?(tumor|normal)?\.csv'
  fn: '*.fragment_length_hist.csv'
  fn: '*.mapping_metrics.csv'
  contents: >
    Number of unique reads (excl. duplicate
    marked reads)
  num_lines: 50


MALT performs alignment of metagenomic reads against a database of reference sequences (such as NR, GenBank or Silva) and produces a MEGAN RMA file as output.

The MALT MultiQC module reads the header of the MALT log files and produces three MultiQC sections:

  • A MALT summary statistics table
  • A Mappability bargraph
  • A Taxonomic assignment success bargraph
MALT file search patterns See docs
  contents: 'MaltRun - Aligns sequences using MALT (MEGAN alignment tool)'
  num_lines: 2



The Bamtools module parses bamtools stats logs generated by Bamtools, a programmer's API and an end-user's toolkit for handling BAM files.

Supported commands: stats

Bamtools file search patterns See docs
  contents: 'Stats for BAM file(s):'
  shared: true
  num_lines: 10


The Bcftools module parses results generated by Bcftools, a suite of programs for interacting with variant call data.

Supported commands: stats

Collapse complementary substitutions

In non-strand-specific data, reporting the total numbers of occurrences for both changes in a complementary pair - like A>C and T>G - might not bring any additional information. To collapse such statistics in the substitutions plot, you can add the following section into your configuration:

bcftools:
    collapse_complementary_changes: true

MultiQC will sum up all complementary changes and show only A>* and C>* substitutions in the resulting plot.

Bcftools file search patterns See docs
  contents: This file was produced by bcftools stats
  shared: true


Currently, the biobambam2 module only processes output from the bamsormadup command. Not only that, but it cheats by using the module code from Picard/MarkDuplicates. The output is so similar that the code simply sets up a module with unique name and filename search pattern and then uses the parsing code from the Picard module.

Apart from behind the scenes coding, this module should work in exactly the same way as all other MultiQC modules.

biobambam2 file search patterns See docs
  contents: '# bamsormadup'
  num_lines: 2


BUSCO v2 provides quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB v9.

The MultiQC module parses the short_summary_[samplename].txt files and plots the proportion of BUSCO types found. MultiQC has been tested with output from BUSCO v1.22 - v2.

BUSCO file search patterns See docs
  fn: 'short_summary*'
  contents: 'BUSCO version is:'
  num_lines: 1


Conpair is a fast and robust method dedicated to human tumour-normal studies to perform concordance verification (i.e. samples coming from the same individual), as well as cross-individual contamination level estimation in whole-genome and whole-exome sequencing experiments.

Conpair file search patterns See docs
  contents: 'markers (coverage per marker threshold : '
  num_lines: 3
  contents: 'Tumor sample contamination level: '
  num_lines: 3


A tool for DNA damage pattern retrieval for ancient DNA analysis and verification.

DamageProfiler file search patterns See docs
  fn: '*dmgprof.json'


Improved Duplicate Removal for merged/collapsed reads in ancient DNA analysis

By default, tables show read counts in thousands. To customise this, you can set the following MultiQC config variables:

ancient_read_count_prefix: 'K'
ancient_read_count_desc: 'thousands'
ancient_read_count_multiplier: 0.001
DeDup file search patterns See docs
  fn: '*dedup.json'


deepTools addresses the challenge of handling the large amounts of data that are now routinely generated from DNA sequencing centers. deepTools contains useful modules to process the mapped reads data for multiple quality checks, creating normalized coverage files in standard bedGraph and bigWig file formats, that allow comparison between different files (for example, treatment and control). Finally, using such normalized and standardized files, deepTools can create many publication-ready visualizations to identify enrichments and for functional annotations of the genome.

The MultiQC module for deepTools parses a number of the text files that deepTools can produce. In particular, the following are supported:

  • bamPEFragmentSize --table
  • bamPEFragmentSize --outRawFragmentLengths
  • estimateReadFiltering
  • plotCoverage --outRawCounts (as well as the content written normally to the console)
  • plotEnrichment --outRawCounts
  • plotFingerprint --outQualityMetrics --outRawCounts
  • plotPCA --outFileNameData
  • plotCorrelation --outFileCorMatrix
  • plotProfile --outFileNameData

Please be aware that some tools (namely, plotFingerprint --outRawCounts and plotCoverage --outRawCounts) are only supported as of deepTools version 2.6. For earlier output from plotCoverage --outRawCounts, you can use #'chr' 'start' 'end' in utils/search_patterns.yaml (see here for more details). Also for these types of files, you may need to increase the maximum file size supported by MultiQC (log_filesize_limit in the MultiQC configuration file). You can find details regarding the configuration file location here.

Note that sample names are parsed from the text files themselves, they are not derived from file names.

deepTools file search patterns See docs
  contents: >
    	Frag. Sampled	Frag. Len. Min.	Frag.
    Len. 1st. Qu.	Frag. Len. Mean	Frag. Len.
    Median	Frag. Len. 3rd Qu.
  num_lines: 1
  contents: '#bamPEFragmentSize'
  num_lines: 1
  contents: >
    Sample	Total Reads	Mapped
    Reads	Alignments in blacklisted
    regions	Estimated mapped reads
  num_lines: 1
  contents: '#plotCorrelation --outFileCorMatrix'
  num_lines: 1
  contents: 'sample	mean	std	min	25%	50%	75%	max'
  num_lines: 1
  contents: '#plotCoverage --outRawCounts'
  num_lines: 1
  contents: >
    file	featureType	percent	featureReadCount	totalReadCount
  num_lines: 1
  contents: '#plotFingerprint --outRawCounts'
  num_lines: 1
  contents: >
    Sample	AUC	Synthetic
    AUC	X-intercept	Synthetic
    X-intercept	Elbow Point	Synthetic Elbow
  num_lines: 1
  contents: '#plotPCA --outFileNameData'
  num_lines: 1
  contents: bin labels
  num_lines: 1


Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem. Both a Python and C++ implementation are offered.

The MultiQC module for Disambiguate parses the summary files generated by Disambiguate.

Disambiguate file search patterns See docs
  contents: unique species A pairs
  num_lines: 2


The featureCounts module parses results generated by featureCounts, a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoter, gene bodies, genomic bins and chromosomal locations.

As of MultiQC v1.10, the module should also work with output from Rsubread. Note that your filenames must end in .summary to be discovered. See Module search patterns for how to customise this.

Please note that if files are in "Rsubread mode" then lines will be split by any whitespace, instead of tab characters. As such, filenames with spaces in will cause the parsing to fail.

featureCounts file search patterns See docs
  fn: '*.summary'
  shared: true


The fgbio MultiQC module currently supports the following tool outputs:

Fgbio file search patterns See docs
  contents: fraction_gt_or_eq_family_size
  num_lines: 3
  contents: >
    read_number	position	bases_total	errors	error_rate	a_to_c_error_rate	a_to_g_error_rate	a_to_t_error_rate	c_to_a_error_rate	c_to_g_error_rate	c_to_t_error_rate
  num_lines: 3


Developed by the Data Science and Data Engineering group at the Broad Institute, the GATK toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping. Its powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Supported tools:

  • BaseRecalibrator
  • VariantEval


BaseRecalibrator is a tool for detecting systematic errors in read base quality scores of aligned high-throughput sequencing reads. It outputs a base quality score recalibration table that can be used in conjunction with the PrintReads tool to recalibrate base quality scores.


VariantEval is a general-purpose tool for variant evaluation. It gives information about percentage of variants in dbSNP, genotype concordance, Ti/Tv ratios and a lot more.

GATK file search patterns See docs
  contents: '#:GATKTable:TiTvVariantEvaluator'
  shared: true
  contents: '#:GATKTable:Arguments:Recalibration'
  num_lines: 3

goleft indexcov

The goleft indexcov module parses results generated by goleft indexcov. It uses the PED and ROC data files to create diagnostic plots of coverage per sample, helping to identify sample gender and coverage issues.

By default, we attempt to only plot chromosomes using standard human-like naming (chr1, chr2... chrX or 1, 2 ... X) but you can specify chromosomes for detailed ROC plots for alternative naming schemes in your configuration with:

    - I
    - II
    - III

The number of plotted chromosomes is limited to 50 by default, you can customise this with the following:

  max_chroms: 80
goleft indexcov file search patterns See docs
  fn: '*-indexcov.roc'
  fn: '*-indexcov.ped'


Hap.py file search patterns See docs
  fn: '*.summary.csv'
  contents: Type,Filter,TRUTH


The HiCExplorer module parses results generated by HiCExplorer's hicBuildMatrix, a tool to create an interaction matrix out of mapped Hi-C reads.

HiCExplorer file search patterns See docs
  contents: Min rest. site distance
  max_filesize: 4096
  num_lines: 26


This module takes the JSON output of the HOPS postprocessing R script (version >= 0.34) to recreate the possible positives heatmap, with the heat intensity representing the number of 'ancient DNA characteristics' categories (small edit distance, damage, both edit distance and aDNA damage) that a particular taxon has.

HOPS file search patterns See docs
  fn: heatmap_overview_Wevid.json


HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for Motif Discovery and next-gen sequencing analysis. HOMER contains many useful tools for analyzing ChIP-Seq, GRO-Seq, RNA-Seq, DNase-Seq, Hi-C and numerous other types of functional genomics sequencing data sets.

The HOMER MultiQC module currently only parses output from the findPeaks tool. If you would like support to be added for other HOMER tools, please open a new issue on the MultiQC GitHub page.


The HOMER findPeaks MultiQC module parses the summary statistics found at the top of HOMER peak files. Three key statistics are shown in the General Statistics table, all others are saved to multiqc_data/multiqc_homer_findpeaks.txt.


The HOMER tag directory submodule parses output from tag directory output files, generating a number of diagnostic plots.

HOMER file search patterns See docs
  contents: '# HOMER Peaks'
  num_lines: 3
  fn: tagGCcontent.txt
  fn: genomeGCcontent.txt
  fn: 'petagRestrictionDistribution.*.txt'
  fn: tagLengthDistribution.txt
  fn: tagInfo.txt
  fn: petag.FreqDistribution_1000.txt


HTSeq is a general purpose Python package that provides infrastructure to process data from high-throughput sequencing assays. htseq-count is a tool that is part of the main HTSeq package - it takes a file with aligned sequencing reads, plus a list of genomic features and counts how many reads map to each feature.

HTSeq file search patterns See docs
  contents: __too_low_aQual


The Kaiju module parses output generated by kaiju2table from Kaiju, a fast and sensitive taxonomic classification tool for metagenomics. For example:

kaiju -i R1.fq.gz -j R2.fq.gz -o output_kaiju.txt
kaiju2table -t nodes.dmp -n names.dmp -r species -o kaiju2table_species.txt output_kaiju.txt
kaiju2table -t nodes.dmp -n names.dmp -r phylum -o kaiju2table_phylum.txt output_kaiju.txt
Kaiju file search patterns See docs
  contents_re: >
  num_lines: 1


The MultiQC module supports outputs from both Kraken and Kraken 2.

It works with report files generated using the --report flag, that look like the following:

11.66   98148   98148   U   0   unclassified
88.34   743870  996 -   1   root
88.22   742867  0   -   131567    cellular organisms
88.22   742866  2071    D   2       Bacteria
87.95   740514  2914    P   1239          Firmicutes

A bar graph is generated that shows the number of fragments for each sample that fall into the top categories for each taxa rank. The top categories are calculated by summing the library percentages across all samples.

Kraken file search patterns See docs
  contents_re: '^\s{1,2}(\d{1,2}\.\d{1,2})\t(\d+)\t(\d+)\t([\dUDKPCOFGS-]{1,3})\t(\d+)\s+(.+)'
  num_lines: 2


MACS2 (Model-based Analysis of ChIP-Seq) is a tool for identifying transcription factor binding sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions.

The MACS2 MultiQC module reads the header of the *_peaks.xls results files and prints the redundancy rates in the General Statistics table. Numerous additional values are parsed and saved to multiqc_data/multiqc_macs2.txt.

MACS2 file search patterns See docs
  fn: '*_peaks.xls'


The methylQA module parses results generated by methylQA, a methylation sequencing data quality assessment tool.

methylQA file search patterns See docs
  fn: '*.report'
  shared: true


Mosdepth performs fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.

It can generate several output files all with a common prefix and different endings:

  • per-base depth ({prefix}.per-base.bed.gz),
  • mean per-window depth given a window size ({prefix}.regions.bed.gz, if a window size is provided with --by),
  • mean per-region given a BED file of regions ({prefix}.regions.bed.gz, if a BED file is provided with --by),
  • a distribution of proportion of bases covered at or above a given threshold for each chromosome and genome-wide ({prefix}.mosdepth.global.dist.txt and {prefix}.mosdepth.region.dist.txt),
  • quantized output that merges adjacent bases as long as they fall in the same coverage bins ({prefix}.quantized.bed.gz),
  • threshold output to indicate how many bases in each region are covered at the given thresholds ({prefix}.thresholds.bed.gz)

The MultiQC module plots coverage distributions from 2 kinds of outputs:

  • {prefix}.mosdepth.region.dist.txt
  • {prefix}.mosdepth.global.dist.txt

Using "region" if exists, otherwise "global". Plotting 3 figures:

  • Distribution of the number of locations in the genome with a given depth of coverage.
  • Absoulute number of locations in the genome with a given depth of coverage.
  • Average coverage per contig/chromosome.

Also plotting the percentage of the genome covered at a threshold in the General Stats section. The default thresholds are 1, 5, 10, 30, 50, which can be customised in the config as follows:

        - 10
        - 20
        - 40
        - 200
        - 30000

You can also specify which columns would be hidden when the report loads (by default, all values are hidden except 30X):

        - 10
        - 20
        - 200
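Taken together, a complete config block covering both of these settings might look like the sketch below. Note that the mosdepth_config, general_stats_coverage and general_stats_coverage_hidden key names are assumptions here (they are not shown above), so check the mosdepth module documentation before relying on them:

    # key names below are assumed / illustrative
    mosdepth_config:
      general_stats_coverage:
        - 10
        - 20
        - 40
        - 200
        - 30000
      general_stats_coverage_hidden:
        - 10
        - 20
        - 200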
mosdepth file search patterns See docs
  fn: '*.mosdepth.global.dist.txt'
  fn: '*.mosdepth.region.dist.txt'


The miRTrace module parses results generated by miRTrace, a quality control software for small RNA sequencing data.

miRTrace performs adapter trimming and discards the reads that fail to pass the QC filters. miRTrace specifically addresses sequencing quality, read length, sequencing depth and miRNA complexity and also identifies the presence of both miRNAs and undesirable sequences derived from tRNAs, rRNAs, or Illumina artifact sequences.

miRTrace also profiles clade-specific miRNAs based on a comprehensive catalog of clade-specific miRNA families identified previously. With this information, miRTrace can detect exogenous miRNAs, which could be contamination derived, e.g. index mis-assignment on sample demultiplexing, or biologically derived, e.g. parasitic RNAs.

miRTrace file search patterns See docs
  fn: mirtrace-results.json
  fn: mirtrace-stats-length.tsv
  fn: mirtrace-stats-contamination_basic.tsv
  fn: mirtrace-stats-mirna-complexity.tsv


This tool is dedicated to the creation and management of miRNA alignment output using the standardized GFF3 format (https://github.com/miRTop/mirGFF3). A unified miRNA alignment format allows you to easily compare the output of different alignment tools.

Currently, mirtop can convert into mirGFF3 the outputs of commonly used pipelines, such as seqbuster, isomiR-SEA, sRNAbench, Prost! as well as BAM files.

mirtop file search patterns See docs
  fn: '*_mirtop_stats.log'


A method to compute mitochondrial to nuclear reads ratios for NGS data.

MTNucRatio file search patterns See docs
  fn: '*mtnuc.json'


MultiVCFanalyzer reads multiple VCF files as produced by GATK UnifiedGenotyper, performs filtering based on a number of criteria, and provides the combined genotype calls in a number of formats that are suitable for follow-up analyses such as phylogenetic reconstruction, SNP effect analyses, population genetic analyses etc.

MultiVCFAnalyzer file search patterns See docs
  fn: MultiVCFAnalyzer.json


Used to generate three quality metrics: NSC, RSC, and PBC. The NSC (Normalized strand cross-correlation) and RSC (relative strand cross-correlation) metrics use cross-correlation of stranded read density profiles to measure enrichment independently of peak calling. The PBC (PCR bottleneck coefficient) is an approximate measure of library complexity. PBC is the ratio of (non-redundant, uniquely mappable reads)/(uniquely mappable reads).

phantompeakqualtools file search patterns See docs
  fn: '*.spp.out'


Peddy compares familial-relationships and sexes as reported in a PED file with those inferred from a VCF.

It samples the VCF at about 25000 sites (plus chrX) to accurately estimate relatedness, IBS0, heterozygosity, sex and ancestry. It uses 2504 samples from the 1000 Genomes Project as background to calibrate the relatedness calculation and to make ancestry predictions.

It does this very quickly by sampling, by using C for computationally intensive parts, and by parallelization.

Peddy file search patterns See docs
  fn: '*.peddy.ped'
  fn: '*.het_check.csv'
  fn: '*.ped_check.csv'
  fn: '*.sex_check.csv'
  fn: '*.background_pca.json'


The Picard module parses results generated by Picard, a set of Java command line tools for manipulating high-throughput sequencing data.

Supported commands:

  • MarkDuplicates
  • InsertSizeMetrics
  • GcBiasMetrics
  • HsMetrics
  • OxoGMetrics
  • BaseDistributionByCycle
  • RnaSeqMetrics
  • AlignmentSummaryMetrics
  • RrbsSummaryMetrics
  • ValidateSamFile
  • VariantCallingMetrics
  • QualityByCycleMetrics
  • QualityScoreDistributionMetrics
  • QualityYieldMetrics


If a BAM file contains multiple read groups, Picard MarkDuplicates generates a report with multiple metric lines, one for each "library".

By default, MultiQC will sum the values for every library it finds and recompute the PERCENT_DUPLICATION and ESTIMATED_LIBRARY_SIZE fields, giving a single set of results for each BAM file.

If instead you would prefer each library to be treated as a separate sample, you can do so by setting the following MultiQC config:

    markdups_merge_multiple_libraries: False

This prevents the merge and recalculation and appends the library name to the sample name.

This behaviour is present in MultiQC since version 1.9. Before this, only the metrics from the first library were taken and all others were ignored.


By default, the insert size plot is smoothed to contain a maximum of 500 data points per sample. This is to prevent the MultiQC report from being very large with big datasets. If you would like to customise this value to get a better resolution you can set the following MultiQC config values, with the new maximum number of points:

    insertsize_smooth_points: 10000

Coverage Levels

It's possible to customise the HsMetrics "Target Bases 30X" coverage and WgsMetrics "Fraction of Bases over 30X" values that are shown in the General Statistics table. These must correspond to field names in the Picard report, such as PCT_TARGET_BASES_2X / PCT_10X. Any numbers not found in the reports will be ignored.

The coverage levels available for HsMetrics are typically 1, 2, 10, 20, 30, 40, 50 and 100X.

The coverage levels available for WgsMetrics are typically 1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90 and 100X.

To customise this, add the following to your MultiQC config:

        - 10
        - 50
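A complete snippet might look like the following sketch; the picard_config and general_stats_target_coverage key names are assumptions (they are not shown above), so verify them against the Picard module documentation:

    # key names assumed / illustrative
    picard_config:
      general_stats_target_coverage:
        - 10
        - 50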

ValidateSamFile Search Pattern

Generally, Picard adds identifiable content to the output of its tools. This is not the case for ValidateSamFile, so in order to identify logs the MultiQC Picard submodule ValidateSamFile searches for filenames that contain 'validatesamfile' or 'ValidateSamFile'. You can customise the search pattern used by overwriting the picard/sam_file_validation pattern in your MultiQC config. For example:

        fn: '*[Vv]alidate[Ss]am[Ff]ile*'
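Module search patterns are normally overridden under the top-level sp key of a MultiQC config, so a full override would likely look like the sketch below (the sp nesting is an assumption based on general MultiQC search-pattern configuration; the pattern itself is the one shown above):

    # nesting assumed; pattern value as above
    sp:
      picard/sam_file_validation:
        fn: '*[Vv]alidate[Ss]am[Ff]ile*'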


Note that the Target Region Coverage plot is generated using the PCT_TARGET_BASES_ table columns from the HsMetrics output (not immediately obvious when looking at the log files).

You can customise the columns shown in the HsMetrics table with the config keys HsMetrics_table_cols and HsMetrics_table_cols_hidden. For example:

        - OFF_BAIT_BASES
        - ON_BAIT_BASES

Only values listed in HsMetrics_table_cols will be included in the table. Anything listed in HsMetrics_table_cols_hidden will be hidden by default.
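Putting the two keys together, a configuration sketch could look like the following. Nesting these keys under picard_config is an assumption, and FOLD_ENRICHMENT is just an illustrative column name for the hidden list:

    # nesting and example column names assumed / illustrative
    picard_config:
      HsMetrics_table_cols:
        - ON_BAIT_BASES
        - OFF_BAIT_BASES
      HsMetrics_table_cols_hidden:
        - FOLD_ENRICHMENT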

Picard file search patterns See docs
  contents: AlignmentSummaryMetrics
  shared: true
  contents: BaseDistributionByCycleMetrics
  shared: true
  contents: GcBias
  shared: true
  contents: HsMetrics
  shared: true
  contents: InsertSizeMetrics
  shared: true
  contents: DuplicationMetrics
  shared: true
  contents: OxoGMetrics
  shared: true
  contents: TargetedPcrMetrics
  shared: true
  contents_re: '[Qq]uality[Bb]y[Cc]ycle'
  contents: MEAN_QUALITY
  shared: true
  contents_re: '[Qq]uality[Ss]core[Dd]istribution'
  contents: COUNT_OF_Q
  shared: true
  contents: QualityYieldMetrics
  shared: true
  contents_re: '[Rr]na[Ss]eq[Mm]etrics'
  contents: '## METRICS CLASS'
  shared: true
  contents: RrbsSummaryMetrics
  shared: true
  fn: '*[Vv]alidate[Ss]am[Ff]ile*'
  fn: '*.variant_calling_detail_metrics'
  contents: CollectVariantCallingMetrics
  shared: true
  contents: CollectWgsMetrics
  shared: true


The Preseq module parses results generated by Preseq, a tool that estimates the complexity of a library, showing how many additional unique reads are sequenced for increasing total read count.

When preseq lc_extrap is run with the default parameters, the extrapolation points reach 10 billion molecules making the plot difficult to interpret in most scenarios. It also includes a lot of data in the reports, which can unnecessarily inflate report file sizes. To avoid this, MultiQC trims back the x axis until each dataset shows 80% of its maximum y-value (unique molecules).

To disable this feature and show all of the data, add the following to your MultiQC configuration:

    notrim: true
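This option would normally sit inside the module's own config section; a minimal sketch, assuming the parent key is called preseq:

    # parent key assumed
    preseq:
      notrim: true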

Using coverage instead of read counts

Preseq reports its numbers as "Molecule counts". This isn't always very intuitive, and it's often easier to talk about sequencing depth in terms of coverage. You can plot the estimated coverage instead by specifying the reference genome or target size, and the read length in your MultiQC configuration:

    genome_size: 3049315783
    read_length: 300

These parameters make MultiQC take every molecule count and divide it by (genome_size / read_length).

MultiQC comes with effective genome size presets for Human and Mouse, so you can provide the genome build name instead, like this: genome_size: hg38_genome. The following values are supported: hg19_genome, hg38_genome, mm10_genome.

When the genome and read sizes are provided, MultiQC will plot the molecule counts on the X axis ("total" data) and coverages on the Y axis ("unique" data). However, you can customize what to plot on each axis (counts or coverage), e.g.:

    x_axis: counts
    y_axis: coverage
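Putting these options together, a combined configuration sketch might look like this (assuming, as above, that they nest under a preseq section; the genome preset name is one of those listed earlier):

    # nesting assumed / illustrative values
    preseq:
      genome_size: hg38_genome   # or an explicit number, e.g. 3049315783
      read_length: 300
      x_axis: counts
      y_axis: coverage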

Plotting externally calculated read counts

To mark on the plot the read counts calculated externally from BAM or fastq files, create a file with preseq_real_counts in the filename and place it with your analysis files. It should be space or tab delimited with 2 or 3 columns (column 1 = preseq file name, column 2 = real read count, optional column 3 = real unique read count). For example:

Sample_1.preseq.txt 3638261 3638011
Sample_2.preseq.txt 1592394 1592133

You can generate a line for such a file using samtools:

echo "Sample_1.preseq.txt "$(samtools view -c -F 4 Sample_1.bam)" "$(samtools view -c -F 1028 Sample_1.bam)
Preseq file search patterns See docs
    num_lines: 2
    contents: distinct_reads
    num_lines: 2
  fn: '*preseq_real_counts*'


The Prokka module analyses summary results from the Prokka annotation pipeline for prokaryotic genomes. The Prokka module accepts three configuration options:

  • prokka_table: default False. Show a table in the report.
  • prokka_barplot: default True. Show a barplot in the report.
  • prokka_fn_snames: default False. Use filenames for sample names (see below).

Sample names are generated using the first line in the prokka reports:

organism: Helicobacter pylori Sample1

The module assumes that the first two words are the organism name and the third is the sample name. So the above will give a sample name of Sample1.

If you prefer, you can set config.prokka_fn_snames to True and MultiQC will instead use the log filename as the sample name.

Prokka file search patterns See docs
  contents: 'contigs:'
  num_lines: 2


The QoRTs software package is a fast, efficient, and portable multifunction toolkit designed to assist in the analysis, quality control, and data management of RNA-Seq datasets. Its primary function is to aid in the detection and identification of errors, biases, and artifacts produced by paired-end high-throughput RNA-Seq technology. In addition, it can produce count data designed for use with differential expression and differential exon usage tools, as well as individual-sample and/or group-summary genome track files suitable for use with the UCSC genome browser.

QoRTs file search patterns See docs
  contents: BENCHMARK_MinutesOnSamIteration
  num_lines: 100


The Qualimap module parses results generated by Qualimap, a platform-independent application to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.

The MultiQC module supports the Qualimap commands BamQC and RNASeq. Note that Qualimap must be run with the -outdir option as well as -outformat HTML (which is on by default). MultiQC uses files found within the raw_data_qualimapReport folder (as well as genome_results.txt).

Qualimap adds lots of columns to the General Statistics table. To avoid making the table too wide and bloated, some of these are hidden by default (Error Rate, M Aligned, M Total reads). You can override these defaults in your MultiQC config file - for example, to show Error Rate by default and hide Ins. size by default, add the following:

        general_error_rate: True
        median_insert_size: False
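As a sketch, the override described above would normally be structured under the table_columns_visible config section; the exact QualiMap namespace name used here is an assumption:

    # namespace name assumed / illustrative
    table_columns_visible:
      QualiMap:
        general_error_rate: True
        median_insert_size: False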

See the relevant section of the documentation for more detail.

In addition to this, it's possible to customise which coverage thresholds from the Qualimap BamQC module are used (default: 1, 5, 10, 30, 50) and which of these are hidden in the General Statistics table when the report loads (default: all hidden except 30X).

To do this, add something like the following to your MultiQC config file:

        - 10
        - 20
        - 40
        - 200
        - 30000
        - 10
        - 20
        - 200
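As with the mosdepth module above, a complete config block for both lists might look like the sketch below; the qualimap_config, general_stats_coverage and general_stats_coverage_hidden key names are assumptions, so check the Qualimap module documentation:

    # key names assumed / illustrative
    qualimap_config:
      general_stats_coverage:
        - 10
        - 20
        - 40
        - 200
        - 30000
      general_stats_coverage_hidden:
        - 10
        - 20
        - 200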
Qualimap file search patterns See docs
  fn: genome_results.txt
  fn: coverage_histogram.txt
  fn: insert_size_histogram.txt
  fn: genome_fraction_coverage.txt
  fn: mapped_reads_gc-content_distribution.txt
  fn: rnaseq_qc_results.txt
  fn: coverage_profile_along_genes_(total).txt


QUAST evaluates genome assemblies by computing various metrics, including

  • N50, length for which the collection of all contigs of that length or longer covers at least 50% of assembly length
  • NG50, which is the same but relative to the length of the reference genome rather than the assembly
  • NA50 and NGA50, where aligned blocks are used instead of contigs
  • Misassemblies, misassembled and unaligned contigs or contig bases
  • Genes and operons covered

The QUAST MultiQC module parses the report.tsv files generated by QUAST and adds key metrics to the report General Statistics table. All statistics for all samples are saved to multiqc_data/multiqc_quast.txt.


By default, the QUAST module is configured to work with large de-novo genome assemblies, displaying contig counts in thousands, lengths in kilo- and megabase pairs, and other sensible defaults.

If these aren't appropriate for your genomes, you can configure them as follows:

    contig_length_multiplier: 0.001
    contig_length_suffix: 'Kbp'
    total_length_multiplier: 0.000001
    total_length_suffix: 'Mbp'
    total_number_contigs_multiplier: 0.001
    total_number_contigs_suffix: 'K'

The default module values are shown above. See the main MultiQC documentation for more information about how to configure MultiQC.
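These options would usually be grouped under a parent key in the MultiQC config; a compact sketch, assuming the parent key is called quast_config:

    # parent key assumed
    quast_config:
      contig_length_multiplier: 0.001
      contig_length_suffix: 'Kbp'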


The QUAST module will also parse output from MetaQUAST runs (metaquast.py).

The combined_reference/report.tsv file is parsed, and folders runs_per_reference and not_aligned are ignored.

If you want to run MultiQC against auxiliary MetaQUAST runs, you must explicitly pass these files to MultiQC:

multiqc runs_per_reference/reference_1/report.tsv

Note that you can pass as many file paths to MultiQC as you like and use glob expansion (eg. runs_per_reference/*/report.tsv).

QUAST file search patterns See docs
  fn: report.tsv
  shared: true


The RNA-SeQC module parses results generated by RNA-SeQC (not to be confused with RSeQC, which MultiQC also supports).

RNA-SeQC is a Java program which computes a series of quality control metrics for RNA-seq data.

This module shows the Spearman correlation heatmap if both Spearman and Pearson's are found. To plot Pearson's by default instead, add the following to your MultiQC config file:

    default_correlation: pearson
RNA-SeQC file search patterns See docs
  fn: '*metrics.tsv'
  contents: 'Sample	Note	'
  shared: true
  fn: '*metrics.tsv'
  contents: High Quality Ambiguous Alignment Rate
  shared: true
  fn_re: meanCoverageNorm_(high|medium|low)\.txt
  fn_re: corrMatrix(Pearson|Spearman)\.txt


Rockhopper aligns reads to coding sequences, rRNAs, tRNAs, and miscellaneous RNAs on both the sense and anti-sense strand. These statistics are summarized in the Rockhopper bar plot in this module.

Rockhopper file search patterns See docs
  fn: summary.txt
  contents: >
    Number of gene-pairs predicted to be
    part of the same operon
  max_filesize: 500000


The RSEM module parses results generated by RSEM, a software package for estimating gene and isoform expression levels from RNA-Seq data.

Supported scripts:

  • rsem-calculate-expression

This module searches for the .cnt file created by RSEM inside the directory named PREFIX.stat.

RSEM file search patterns See docs
  fn: '*.cnt'


The RSeQC module parses results generated by RSeQC, a package that provides a number of useful modules that can comprehensively evaluate high throughput RNA-seq data.

Supported scripts:

  • bam_stat
  • gene_body_coverage
  • infer_experiment
  • inner_distance
  • junction_annotation
  • junction_saturation
  • read_distribution
  • read_duplication
  • read_gc

You can choose to hide sections of RSeQC output and customise their order. To do this, add and customise the following to your MultiQC config file:

    - read_distribution
    - gene_body_coverage
    - inner_distance
    - read_gc
    - read_duplication
    - junction_annotation
    - junction_saturation
    - infer_experiment
    - bam_stat

Change the order to rearrange sections, or remove entries to hide them from the report.
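A config sketch for the list above, assuming the key is called rseqc_sections (check the RSeQC module documentation for the exact name):

    # key name assumed
    rseqc_sections:
      - read_distribution
      - gene_body_coverage
      # [...] remaining sections as listed above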

RSeQC file search patterns See docs
  contents: 'Proper-paired reads map to different chrom:'
  max_filesize: 500000
  fn: '*.geneBodyCoverage.txt'
  fn: '*.inner_distance_freq.txt'
  contents: 'Partial Novel Splicing Junctions:'
  max_filesize: 500000
  fn: '*.junctionSaturation_plot.r'
  fn: '*.GC.xls'
  contents: 'Group               Total_bases         Tag_count           Tags/Kb'
  max_filesize: 500000
  fn: '*.pos.DupRate.xls'
    fn: '*infer_experiment.txt'
    contents: Fraction of reads explained by
    max_filesize: 500000


The Samblaster module parses results generated by Samblaster, a tool to mark duplicates and extract discordant and split reads from sam files.

Samblaster file search patterns See docs
  contents: 'samblaster: Version'
  shared: true


The Samtools module parses results generated by Samtools, a suite of programs for interacting with high-throughput sequencing data.

Supported commands:

  • stats
  • flagstat
  • idxstats
  • rmdup


samtools idxstats prints its results to standard output (so there is no consistent file name) and the output has no header lines (so it cannot be recognised from the file contents). As such, idxstats result files must have the string idxstat somewhere in the filename.

There are a few MultiQC config options that you can add to customise how the idxstats module works. A typical configuration could look as follows:

# Always include these chromosomes in the plot
    - X
    - Y

# Never include these chromosomes in the plot
    - MT

# Threshold where chromosomes are ignored in the plot.
# Should be a fraction, default is 0.001 (0.1% of total)
samtools_idxstats_fraction_cutoff: 0.001

# Name of the X and Y chromosomes.
# If not specified, MultiQC will search for any chromosome
# names that look like x, y, chrx or chry (case insensitive search)
samtools_idxstats_xchr: myXchr
samtools_idxstats_ychr: myYchr
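For reference, the chromosome include / exclude lists shown above also need their own config keys. A sketch of a full configuration, where the samtools_idxstats_always and samtools_idxstats_ignore key names are assumptions and the remaining keys are as shown above:

    # list key names assumed / illustrative
    samtools_idxstats_always:
      - X
      - Y
    samtools_idxstats_ignore:
      - MT
    samtools_idxstats_fraction_cutoff: 0.001
    samtools_idxstats_xchr: myXchr
    samtools_idxstats_ychr: myYchr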
Samtools file search patterns See docs
  contents: This file was produced by samtools stats
  shared: true
  contents: >
    in total (QC-passed reads + QC-failed
  shared: true
  fn: '*idxstat*'
  contents: '[bam_rmdup'
  shared: true


The sargasso module parses results generated by Sargasso, a tool for separating mixed-species RNA-seq reads according to their species of origin.

Sargasso file search patterns See docs
  fn: overall_filtering_summary.txt


A python script to calculate the relative coverage of X and Y chromosomes, and their associated error bars, from the depth of coverage at specified SNPs.

Sex.DetErrMine file search patterns See docs
  fn: sexdeterrmine.json


Slamdunk is a tool to analyze data from the SLAM-Seq sequencing protocol.

This module should be able to parse logs from v0.2.2-dev onwards.

Slamdunk file search patterns See docs
  contents: '# slamdunk summary'
  num_lines: 1
  contents: '# slamdunk PCA'
  num_lines: 1
  contents: '# slamdunk rates'
  num_lines: 1
  contents: '# slamdunk utrrates'
  num_lines: 1
  contents: '# slamdunk tcperreadpos'
  num_lines: 1
  contents: '# slamdunk tcperutr'
  num_lines: 1


The SnpEff module parses results generated by SnpEff, a genetic variant annotation and effect prediction toolbox. It annotates and predicts the effects of variants on genes (such as amino acid changes).

MultiQC parses the summary .csv file that is generated by SnpEff. Note that you must run SnpEff with -csvStats <filename> for this to be generated. See the SnpEff documentation for more information.

SnpEff file search patterns See docs
  contents: SnpEff_version
  max_filesize: 5000000


Currently only the "Allele-tagging" and "Allele-sorting" reports are supported. The log files from the genome creation steps are not parsed and there are no plots/tables produced from the "SNP coverage" report.

Differences between the numbers in the tagging and sorting reports are due to paired-end reads. For these, if only a single mate in a pair is assigned to a genome then it will "rescue" its mate and both will be "sorted" into that genome (even though only one of them was tagged). Conversely, if the mates in a pair are tagged as arising from different genomes, then the pair as a whole is unassignable.

SNPsplit file search patterns See docs
  contents: 'Writing allele-flagged output file to:'
  num_lines: 2
  fn: '*SNPsplit_report.yaml'


Somalier can be used to find sample swaps or duplicates in cancer projects, where there is often no jointly-called VCF across samples.

It is also extremely efficient and so can be used to find all-vs-all relatedness estimates for thousands of samples.

It also outputs information on sex, depth, heterozygosity and ancestry to be used for general QC.

Somalier file search patterns See docs
  fn: '*.somalier-ancestry.tsv'
  fn: '*.samples.tsv'
  contents: '#family_id'
  num_lines: 5
  fn: '*.pairs.tsv'
  contents: hom_concordance
  num_lines: 5


Important notes

Due to the size of the histogram_kmer_count.json files, MultiQC is likely to skip these files. To be able to display these you will need to change the MultiQC configuration to allow for larger logfiles, see the MultiQC documentation. For instance, if you run MultiQC as part of an analysis pipeline, you can create a multiqc_config.yaml file in the working directory, containing the following line:

log_filesize_limit: 100000000

General Notes

The Supernova module parses the reports from an assembly run. As a bare minimum it requires the file report.txt, found in the folder sampleID/outs/, to function. Note! If you are anything like the author (@remiolsen), you might only have renamed copies of this file lying around (e.g. sampleID-report.txt) due to disk space limitations and for ease of sharing with your colleagues. This module will therefore search for *report*.txt. If available, the stats in the report file will be superseded by the higher-precision numbers found in the file sampleID/outs/assembly/stats/summary.json. In the same folder, this module will search for the following plots and render them:

  • histogram_molecules.json -- Inferred molecule lengths
  • histogram_kmer_count.json -- Kmer multiplicity

This module has been tested using Supernova versions 1.1.4 and 1.2.0

Supernova file search patterns See docs
  fn: '*report*.txt'
  num_lines: 100
  contents: '- assembly checksum ='
  fn: summary.json
  num_lines: 120
  contents: '"lw_mean_mol_len":'
  fn: histogram_molecules.json
  num_lines: 10
  contents: '"description": "molecules",'
  fn: histogram_kmer_count.json
  num_lines: 10
  contents: '"description": "kmer_count",'


Very important note

This module will only work with Stacks version 2.1 or greater. Furthermore, this module is designed to parse only some of the output from the denovo_map pipeline. If you are missing some functionality, please submit an issue on the MultiQC GitHub page.

Stacks file search patterns See docs
  fn: gstacks.log.distribs
  contents: BEGIN effective_coverages_per_sample
  fn: populations.log.distribs
  contents: BEGIN missing_samples_per_loc_prefilters
  fn: '*.sumstats_summary.tsv'
  contents: '# Pop ID	Private	Num_Indv	Var	StdErr	P	Var'
  max_filesize: 1000000


THeTA2 (Tumor Heterogeneity Analysis) is an algorithm that estimates the tumour purity and clonal / subclonal copy number aberrations directly from high-throughput DNA sequencing data.

The THeTA2 MultiQC module plots the % germline and % tumour subclone for each sample. Note that each sample can have multiple maximum likelihood solutions - the MultiQC module plots proportions for the first one in the results file (*.BEST.results). Also note that if there are more than 5 tumour subclones, their percentages are summed.

THeTA2 file search patterns See docs
  fn: '*.BEST.results'


VarScan is a platform-independent mutation caller for targeted, exome, and whole-genome resequencing data generated on Illumina, SOLiD, Life/PGM, Roche/454, and similar instruments. The newest version, VarScan 2, is written in Java, so it runs on most operating systems.

VarScan can be used to detect different types of variation:

  • Germline variants (SNPs and indels) in individual samples or pools of samples.
  • Multi-sample variants (shared or private) in multi-sample datasets (with mpileup).
  • Somatic mutations, LOH events, and germline variants in tumor-normal pairs.
  • Somatic copy number alterations (CNAs) in tumor-normal exome data.

The MultiQC module can read output from mpileup2cns, mpileup2snp and mpileup2indel logfiles.

VarScan2 file search patterns See docs
  contents: Only SNPs will be reported
  num_lines: 3
  contents: Only indels will be reported
  num_lines: 3
  contents: Only variants will be reported
  num_lines: 3


Important General Note

  • Depending on the size and density of the variant data (vcf), some of the stat files generated by vcftools can be very large. If you find that some of your input files are missing, increase the config.log_filesize_limit so that the large file(s) will not be skipped by MultiQC. Note, however, that this might make MultiQC very slow!

This module parses the outputs from VCFTools' various commands:


  • relatedness2
    • Plots a heatmap of pairwise sample relatedness.
    • Not to be confused with the similarly-named command relatedness
  • TsTv-by-count
    • Plots the transition to transversion ratio as a function of alternative allele count (using only bi-allelic SNPs).
  • TsTv-by-qual
    • Plots the transition to transversion ratio as a function of SNP quality threshold (using only bi-allelic SNPs).
  • TsTv-summary
    • Plots a bargraph of the summary counts of each type of transition and transversion SNPs.

To do

VCFTools has a number of outputs not yet supported in MultiQC which would be good to add. Please check GitHub if you'd like these added or (better still), would like to contribute!

VCFTools file search patterns See docs
  fn: '*.relatedness2'
  fn: '*.TsTv.count'
  fn: '*.TsTv.qual'
  fn: '*.TsTv.summary'


A key step in any genetic analysis is to verify whether data being generated matches expectations. verifyBamID checks whether reads in a BAM file match previous genotypes for a specific sample. In addition, it detects possible sample mixture from population allele frequency only, which can be particularly useful when the genotype data is not available.

Using a mathematical model that relates observed sequence reads to a hypothetical true genotype, verifyBamID tries to decide whether sequence reads match a particular individual, or are more likely to be contaminated (including a small proportion of foreign DNA), derived from a closely related individual, or derived from a completely different individual.

This module currently only imports data from the .selfSM output. The chipmix and freemix columns are imported into the general statistics table. A verifyBAMID section is then added, with a table containing the entire selfSM file.

If no chip data was parsed, these columns will not be added to the MultiQC report.

Should you wish to remove one of these columns from the General Statistics table, add the lines below to the table_columns_visible section of your config file:

        CHIPMIX: False
        FREEMIX: False
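A sketch of the full structure (the VerifyBAMID namespace name under table_columns_visible is an assumption here):

    # namespace name assumed / illustrative
    table_columns_visible:
      VerifyBAMID:
        CHIPMIX: False
        FREEMIX: False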

This module was designed to work with verifyBamID v1.1.3 (January 2018).

VerifyBAMID file search patterns See docs
  fn: '*.selfSM'

Custom Content


Bioinformatics projects often include non-standardised analyses, with results from custom scripts or in-house packages. It can be frustrating to have a MultiQC report describing results from 90% of your pipeline but missing the final key plot. To help with this, MultiQC has a special "custom content" module.

Custom content parsing is a little more restricted than standard modules. Specifically:

  • Only one plot per section is possible
  • Plot customisation is more limited

All plot types can be generated using custom content - see the test files for examples of how data should be structured.

Data from a released tool

If your data comes from a released bioinformatics tool, you shouldn't be using this feature of MultiQC! Sure, you can probably get it to work, but it's better if a fully-fledged core MultiQC module is written instead. That way, other users of MultiQC can also benefit from results parsing.

Note that proper MultiQC modules are more robust and powerful than this custom-content feature. You can also write modules in MultiQC plugins if they're not suitable for general release.


As of MultiQC v1.7, you can import custom images into your MultiQC reports. Simply add _mqc to the end of the filename for .png, .jpg or .jpeg files, for example: my_image_file_mqc.png or summary_diagram_mqc.jpeg.

Images will be embedded within the HTML file, so the report remains self-contained. Note that this means it is very easy to make the HTML file extremely large if this feature is over-used!

The report section name and description will be generated automatically from the filename.

MultiQC-specific data file

If you can choose exactly how your data output looks, then the easiest way to parse it is to use a MultiQC-specific format. If the filename ends in *_mqc.(yaml|yml|json|txt|csv|tsv|log|out|png|jpg|jpeg|html) then it will be found by any standard MultiQC installation with no additional customisation required (v0.9 onwards).

These files contain configuration information specifying how the data should be parsed, alongside the data. If you want to use YAML, this is an example of how it should look:

id: 'my_pca_section'
section_name: 'PCA Analysis'
description: 'This plot shows the first two components from a principal component analysis.'
plot_type: 'scatter'
pconfig:
    id: 'pca_scatter_plot'
    title: 'PCA Plot'
    xlab: 'PC1'
    ylab: 'PC2'
data:
    sample_1: {x: 12, y: 14}
    sample_2: {x: 8, y: 6}
    sample_3: {x: 5, y: 11}
    sample_4: {x: 9, y: 12}

The file format can also be JSON:

    "id": "custom_data_lineplot",
    "section_name": "Custom JSON File",
    "description": "This plot is a self-contained JSON file.",
    "plot_type": "linegraph",
    "pconfig": {
        "id": "custom_data_linegraph",
        "title": "Output from my JSON file",
        "ylab": "Number of things",
        "xDecimals": false
    "data": {
        "sample_1": { "1": 12, "2": 14, "3": 10, "4": 7, "5": 16 },
        "sample_2": { "1": 9, "2": 11, "3": 15, "4": 18, "5": 21 }

Note that if you're using plot_type: html then data just takes a string, with no sample keys.

For maximum compatibility with other tools, you can also use comma-separated or tab-separated files. Include commented header lines with plot configuration in YAML format:

# id: "Output from my script'
# section_name: 'Custom data file'
# description: 'This output is described in the file header. Any MultiQC installation will understand it without prior configuration.'
# format: 'tsv'
# plot_type: 'bargraph'
# pconfig:
#    id: 'custom_bargraph_w_header'
#    ylab: 'Number of things'
Category_1    374
Category_2    229
Category_3    39
Category_4    253

You can easily inject custom HTML snippets by ending the filename with _mqc.html - again, the embedded config works in a similar way, but inside an HTML comment:

<!--
id: 'custom-html'
section_name: 'Custom HTML'
description: 'This section is created using a custom HTML file'
-->
<p>Some custom HTML content here.</p>

If no configuration is given, MultiQC will do its best to guess how to visualise your data appropriately. To see examples of typical file structures which are understood, see the test data used to develop this code. Something will probably be shown, but it may produce unexpected results.

Data as part of MultiQC config

If you are already using a MultiQC config file to add data to your report (for example, titles / introductory text), you can give data within this file too. This can be in any MultiQC config file (for example, passed on the command line with -c my_yaml_file.yaml). This is useful as you can keep everything contained within a single file (including stuff unrelated to this specific custom content feature of MultiQC).

To be understood by MultiQC, the custom_data key must be found. This must contain a section with a unique id, specific to your new report section. Finally, the contents of this second dictionary will look the same as the above stand-alone YAML files. For example:

custom_data:
    my_data_type:
        id: 'mqc_config_file_section'
        section_name: 'My Custom Section'
        description: 'This data comes from a single multiqc_config.yaml file'
        plot_type: 'bargraph'
        pconfig:
            id: 'barplot_config_only'
            title: 'MultiQC Config Data Plot'
            ylab: 'Number of things'
        data:
            sample_a:
                first_thing: 12
                second_thing: 14
            sample_b:
                first_thing: 8
                second_thing: 6
            sample_c:
                first_thing: 11
                second_thing: 5
            sample_d:
                first_thing: 12
                second_thing: 9

Or to add data to the General Statistics table:

custom_data:
    my_genstats:
        plot_type: 'generalstats'
        pconfig:
            - col_1:
                max: 100
                min: 0
                scale: 'RdYlGn'
                suffix: '%'
            - col_2:
                min: 0
        data:
            sample_a:
                col_1: 14.32
                col_2: 1.2
            sample_b:
                col_1: 84.84
                col_2: 1.9

Note: Use a list of headers in pconfig (keys prepended with -) to specify the order of columns in the General Statistics table.

See the general statistics docs for more information about configuring data for the General Statistics table.

Separate configuration and data files

It's not always possible or desirable to include MultiQC configuration within a data file. If this is the case, you can add to the MultiQC configuration to specify how input files should be parsed.

As described in the above Data as part of MultiQC config section, this configuration should be held within a section called custom_data with a section-specific id. The only difference is that no data subsection is given and a search pattern for the given id must be supplied.

Search patterns are added as with any other module. Ensure that the search pattern key is the same as your custom_data section ID.

For example, a MultiQC config file could look as follows:

# Other MultiQC config stuff here
custom_data:
    example_files:
        file_format: 'tsv'
        section_name: 'Coverage Decay'
        description: 'This plot comes from files accompanied by a multiqc_config.yaml file for configuration'
        plot_type: 'linegraph'
        pconfig:
            id: 'example_coverage_lineplot'
            title: 'Coverage Decay'
            ylab: 'X Coverage'
            ymax: 100
            ymin: 0
sp:
    example_files:
        fn: 'example_files_*'

This configuration would work with the following data file, example_files_Sample_1.txt:

0   98.22076066
1   97.96764159
2   97.78227175
3   97.61262195
# [...]

This kind of customisation should work with most Custom Content types. For example, using an image called some_science_mqc.jpeg gives us a report section some_science, which we can then add a nicer name and description to:

custom_data:
    some_science:
        section_name: 'Some real science'
        description: 'This description comes from multiqc_config.yaml and helps to annotate the Custom Content image.'

As mentioned above - if no configuration is given, MultiQC will do its best to guess how to visualise your data appropriately. To see examples of typical file structures which are understood, see the test data used to develop this code.


Grouping sections and subsections

If you have multiple content types that you would like to group together with MultiQC sub-sections, you can do so using the following keys:

parent_id: custom_section
parent_name: 'Some grouped data'
parent_description: 'This parent section contains one or more sub-sections below it'

Any custom-content files that share the same parent_id will be grouped.

Note that some things, such as parent_name, are taken from the first file that MultiQC finds with this parent_id. So it's a good idea to specify this in every file. parent_description and extra content are taken from the first file in which they are set.

Order of sections

If you have multiple different Custom Content sections, their order will be random and may vary between runs. To avoid this, you can specify an order in your MultiQC config as follows:

    - first_cc_section
    - second_cc_section
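A sketch of what this might look like in full, assuming the relevant config keys are custom_content and order (check the main documentation for the exact names):

    # key names assumed
    custom_content:
      order:
        - first_cc_section
        - second_cc_section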

Each section name should be the ID assigned to that section. You can explicitly set this (see below), or the Custom Content module will automatically assign an ID. To find out what your custom content section ID is, generate a report and click your section in the side navigation. The browser URL should update, ending in a # followed by the section ID - for example, #my_cc_section.

The section ID is the part after the # (my_cc_section in the example above).

Note that any Custom Content sections found that are not specified in the config will be placed at the top of the report.

Section configuration

See below for how these config options can be specified (either within the data file or in a MultiQC config file). All of these configuration parameters are optional, and MultiQC will do its best to guess sensible defaults if they are not specified.

All possible configuration keys and their default values are shown below:

id: null                # Unique ID for report section.
section_anchor: <id>    # Used in report section #soft-links
section_name: <id>      # Nice name used for the report section header
section_href: null      # External URL for the data, to find more information
description: null       # Introductory text to be printed under the section header
section_extra: null     # Custom HTML to add after the section description
file_format: null       # File format of the data (eg. csv / tsv)
plot_type: null         # The plot type to visualise the data with.
                        # generalstats | table | bargraph | linegraph | scatter | heatmap | beeswarm
pconfig: {}             # Configuration for the plot.

Note that any custom content data found with the same section id will be merged into the same report section / plot. The other section configuration keys are merged for each file, with identical keys overwriting what was previously parsed.

This approach means that it's possible to have a single file containing data for multiple samples, but it's also possible to have one file per sample and still have all of them summarised.

If you're using plot_type: 'generalstats' then a report section will not be created and most of the configuration keys above are ignored.

Data types generalstats and beeswarm are only possible by setting the above configuration keys (these can't be guessed by data format).

Plot configuration

Configuration of specific plots follows the same syntax as used when writing modules. To find out more, please see the later docs. Specifically, the plot config docs for bar graphs, line graphs, scatter plots, tables, beeswarm plots and heatmaps.

Wherever you see pconfig, any key can be used within the above syntax.

Tricky extras

Because of the way this module works, there are a few specifics that can trip you up. Most of these should probably be fixed one day. Feel free to complain on gitter or submit a pull request! I'll try to keep a list here to help the wary...

Differences between Tables and General Stats

Although they're both tables, note that general stats configures columns with a list in the pconfig scope (see above example). Files that are just tables use headers instead.

First columns in tables are special

The first column in every table is reserved for the sample name. As such, it shouldn't contain data. All header configuration will be ignored for the first column. The only exception is name: this can be tweaked using the somewhat tricky col1_header field in the pconfig scope (see table docs).


MultiQC has been developed to be as forgiving as possible and will handle lots of invalid or ignored configurations. This is useful for most users but can make life difficult when getting MultiQC to work with a new custom content format.

To help with this, you can run with the --lint flag, which will give explicit warnings about anything that is not optimally configured. For example:

multiqc --lint test_data


Probably the best way to get to grips with Custom Content is to see some examples. The MultiQC automated testing runs with a bunch of different files, and I try to add to these all the time.

You can see these examples here: https://github.com/ewels/MultiQC_TestData/tree/master/data/custom_content

For example, to see a file which generates a table in a report by itself, you can have a look at embedded_config/table_headers_mqc.txt.

Coding with MultiQC

Writing New Modules


Writing a new module can at first seem a daunting task. However, MultiQC has been written (and refactored) to provide a lot of functionality as common functions.

Provided that you are familiar with writing Python and you have a read through the guide below, you should be on your way in no time!

If you have any problems, feel free to contact the author - details here: @ewels

Core modules / plugins

New modules can either be written as part of MultiQC or in a stand-alone plugin. If your module is for a publicly available tool, please add it to the main program and contribute your code back when complete via a pull request.

If your module is for something very niche, which no-one else can use, you can write it as part of a custom plugin. The process is almost identical, though it keeps the code bases separate. For more information about this, see the docs about MultiQC Plugins below.


MultiQC has been developed to be as forgiving as possible and will handle lots of invalid or ignored code. This is useful most of the time but can be difficult when writing new MultiQC modules (especially during pull-request reviews).

To help with this, you can run with the --lint flag, which will give explicit warnings about anything that is not optimally configured. For example:

multiqc --lint test_data

Note that the automated MultiQC continuous integration testing runs in this mode, so you will need to pass all lint tests for those checks to pass. This is required for any pull-requests.

Initial setup

MultiQC file structure

The source code for MultiQC is separated into different folders. Most of the files you won't have to touch - the relevant files that you will need to edit or create are as follows:

├── docs
│   ├── README.md
│   └── modules
│       └── <your_module>.md
├── multiqc
│   ├── modules
│   |   └── <your_module>
│   │       ├── __init__.py
│   │       └── <your_module>.py
│   └── utils
│       └── search_patterns.yaml
└── setup.py

These files are described in more detail below.


MultiQC modules are Python submodules - as such, they need their own directory in /multiqc/modules/ with an __init__.py file. The directory should share its name with the module. To follow common practice, the module code itself usually then goes in a separate Python file (also with the same name, i.e. multiqc/modules/modname/modname.py) which is then imported by the __init__.py file with:

from __future__ import absolute_import
from .modname import MultiqcModule

Entry points

Once your submodule files are in place, you need to tell MultiQC that they are available as an analysis module. This is done within setup.py using entry points. In setup.py you will see some code that looks like this:

entry_points = {
    'multiqc.modules.v1': [
        'bismark = multiqc.modules.bismark:MultiqcModule',
        # [...] other module entry points
    ]
}

Copy one of the existing module lines and change it to use your module name. The order is irrelevant, so stick to alphabetical if in doubt. Once this is done, you will need to update your installation of MultiQC:

pip install -e .

MultiQC config

So that MultiQC knows what order modules should be run in, you need to add your module to the core config file.

In multiqc/utils/config_defaults.yaml you should see a list variable called module_order. This contains the name of modules in order of precedence. Add your module here in an appropriate position.
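For instance, adding a hypothetical module called mymod might look like the sketch below (the neighbouring module name is just an illustrative placeholder for the existing entries):

    module_order:
      # [...]
      - mymod
      - fastqc
      # [...]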


Next up, you need to create a documentation file for your module. The reason for this is twofold: firstly, docs are important to help people to use, debug and extend MultiQC (you're reading this, aren't you?). Secondly, having the file there with the appropriate YAML front matter will make the module show up on the MultiQC homepage so that everyone knows it exists. This process is automated once the file is added to the core repository.

This docs file should be placed in docs/modules/<your_module_name>.md and should have the following structure:

---
Name: Tool Name
URL: http://www.amazing-bfx-tool.com
Description: >
    This amazing tool does some really cool stuff. You can describe it
    here and split onto multiple lines if you want. Not too long though!
---

Your documentation goes here. Feel free to use markdown and write whatever
you think would be helpful. Please avoid using heading levels 1 to 3.

Make a reference to this in the YAML frontmatter list at the top of docs/README.md - this allows the website to find the file to build the documentation.


Last but not least, remember to add your new module to the CHANGELOG.md, so that people know that it's there.

MultiqcModule Class

If you've copied one of the other entry point statements, it will have ended in :MultiqcModule - this tells MultiQC to try to execute a class or function called MultiqcModule.

To use the helper functions bundled with MultiQC, you should extend this class from multiqc.modules.base_module.BaseMultiqcModule in your module code file (i.e. multiqc/modules/modname/modname.py). This will give you access to a number of functions on the self namespace. For example:

from multiqc.modules.base_module import BaseMultiqcModule

class MultiqcModule(BaseMultiqcModule):
    def __init__(self):
        # Initialise the parent object
        super(MultiqcModule, self).__init__(name='My Module', anchor='mymod',
        info="is an example analysis module used for writing documentation.")

Ok, that should be it! The __init__() function will now be executed every time MultiQC runs. Try adding a print("Hello World!") statement and see if it appears in the MultiQC logs at the appropriate time...

Note that the __init__ variables are used to create the header, URL link, analysis module credits and description in the report.


Last thing - MultiQC modules have a standardised way of producing output, so you shouldn't really use print() statements for your Hello World in your module code ;).

Instead, use the logger module as follows:

import logging
log = logging.getLogger(__name__)
# Initialise your class and so on
log.info('Hello World!')

Log messages can come in a range of formats:

  • log.debug
    • These only show if MultiQC is run in -v/--verbose mode
  • log.info
    • For more important status updates
  • log.warning
    • Alert user about problems that don't halt execution
  • log.error and log.critical
    • Not often used, these are for show-stopping problems

Step 1 - Find log files

The first thing that your module will need to do is to find analysis log files. You can do this by searching for a filename fragment, or a string within the file. It's possible to search for both (a match on either will return the file) and also to have multiple strings possible.

First, add your default patterns to multiqc/utils/search_patterns.yaml

Each search has a YAML key, with one or more search criteria.

The yaml key must begin with the name of your module. If you have multiple search patterns for a single module, follow the module name with a forward slash and then any string. For example, see the fastqc module search patterns:

fastqc/data:
    fn: 'fastqc_data.txt'
fastqc/zip:
    fn: '_fastqc.zip'

The following search criteria sub-keys can then be used:

  • fn
    • A glob filename pattern, used with the Python fnmatch function
  • fn_re
    • A regex filename pattern
  • contents
    • A string to match within the file contents (checked line by line)
  • contents_re
    • A regex to match within the file contents (checked line by line)
    • NB: The regex must match the entire line (add .* to the start and end of the pattern if you only want to match part of the line)
  • exclude_fn
    • A glob filename pattern which will exclude a file if matched
  • exclude_fn_re
    • A regex filename pattern which will exclude a file if matched
  • exclude_contents
    • A string which will exclude the file if matched within the file contents (checked line by line)
  • exclude_contents_re
    • A regex which will exclude the file if matched within the file contents (checked line by line)
  • num_lines
    • The number of lines to search through for the contents string. Default: all lines.
  • shared
    • By default, once a file has been assigned to a module it is not searched again. Specify shared: true when your file can be shared between multiple tools (for example, part of a stdout stream).
  • max_filesize
    • Files larger than the log_filesize_limit config key (default: 10MB) are skipped. If you know your files will be smaller than this and need to search by contents, you can specify this value (in bytes) to skip any files larger than this limit.

Please try to use num_lines and max_filesize where possible as they will speed up MultiQC execution time.

Note that exclude_ keys are tested after a file is detected with one or more of the other patterns.

For example, two typical modules could specify search patterns as follows:

mymod:
    fn: '_myprogram.txt'
myothermod:
    contents: 'This is myprogram v1.3'

You can also supply a list of different patterns for a single log file type if needed. If any of the patterns are matched, the file will be returned:

mymod:
    - fn: 'mylog.txt'
    - fn: 'different_fn.out'

You can use AND logic by specifying keys within a single list item. For example:

mymod:
    fn: 'mylog.txt'
    contents: 'mystring'
myothermod:
    - fn: 'different_fn.out'
      contents: 'This is myprogram v1.3'
    - fn: 'another.txt'
      contents: 'What are these files anyway?'

Here, a file must have the filename mylog.txt and contain the string mystring.

You can match subsets of files by using exclude_ keys as follows:

mymod:
    fn: '*.myprog.txt'
    exclude_fn: 'not_these_*'
myothermod:
    fn: 'mylog.txt'
    exclude_fn:
        - 'trimmed'
        - 'sorted'

Note that the exclude_ patterns can have either a single value or a list of values. They are always considered using OR logic - any matches will reject the file.

Remember that users can overwrite these defaults in their own config files. This is helpful as people have weird and wonderful processing pipelines with their own conventions.

Once your strings are added, you can find files in your module with the base function self.find_log_files(), using the key you set in the YAML:

self.find_log_files('mymod')

This function yields a dictionary with various information about each matching file. The f key contains the contents of the matching file:

# Find all files for mymod
for myfile in self.find_log_files('mymod'):
    print( myfile['f'] )       # File contents
    print( myfile['s_name'] )  # Sample name (from cleaned filename)
    print( myfile['fn'] )      # Filename
    print( myfile['root'] )    # Directory file was in

If filehandles=True is specified, the f key contains a file handle instead:

for f in self.find_log_files('mymod', filehandles=True):
    # f['f'] is now a filehandle instead of contents
    for l in f['f']:
        print( l )

This is good if the file is large, as Python doesn't read the entire file into memory in one go.

Step 2 - Parse data from the input files

What most MultiQC modules do once they have found matching analysis files is to pass the matched file contents to another function, responsible for parsing the data from the file. How this parsing is done will depend on the format of the log file and the type of data being read. See below for a basic example, based loosely on the preseq module:

class MultiqcModule(BaseMultiqcModule):
    def __init__(self):
        # [...]
        self.mod_data = dict()
        for f in self.find_log_files('mymod'):
            self.mod_data[f['s_name']] = self.parse_logs(f['f'])

    def parse_logs(self, f):
        data = {}
        for l in f.splitlines():
            s = l.split()
            data[s[0]] = s[1]
        return data

Filtering by parsed sample names

MultiQC users can use the --ignore-samples flag to skip sample names that match specific patterns. As sample names are generated in a different way by every module, this filter has to be applied after log parsing.

There is a core function to do this task - assuming that your data is in a dictionary with the first key as sample name, pass it through the self.ignore_samples function as follows:

self.yourdata = self.ignore_samples(self.yourdata)

This will remove any dictionary keys where the sample name matches a user pattern.

If your data structure is not in the sample_name: data format then you can check each sample name individually using the self.is_ignore_sample() function:

if self.is_ignore_sample(f['s_name']):
    print("We will not use this sample!")

Note that this function should be used after cleaning the sample name with self.clean_s_name().

No files found

If your module cannot find any matching files, it needs to raise an exception of type UserWarning. This tells the core MultiQC program that no modules were found. For example:

if len(self.mod_data) == 0:
    raise UserWarning

Note that this has to be raised as early as possible, so that it halts the module progress. For example, if no logs are found then the module should not create any files or try to do any computation.

Custom sample names

Typically, sample names are taken from cleaned log filenames (the default f['s_name'] value returned). However, if possible, it's better to use the name of the input file (allowing for concatenated log files). To do this, you should use the self.clean_s_name() function, as this will prepend the directory name if requested on the command line:

input_fname = s[3] # Or parsed however
s_name = self.clean_s_name(input_fname, f['root'])

This function has already been applied to the contents of f['s_name'].

self.clean_s_name() must be used on sample names parsed from the file contents. Without it, features such as prepending directories (--dirs) will not work.

Identical sample names

If modules find samples with identical names, then the previous sample is overwritten. It's good to print a log statement when this happens, for debugging. However, most of the time it makes sense - programs often create log files and print to stdout for example.

if f['s_name'] in self.bowtie_data:
    log.debug("Duplicate sample name found! Overwriting: {}".format(f['s_name']))

Printing to the sources file

Finally, once you've found your file, you should add this information to the multiqc_sources.txt file in the MultiQC report data directory. This lists every sample name and the file from which the data came. It is especially useful if sample names are being overwritten, as it lists the source used. This code is typically written immediately after the above warning.

If you've used the self.find_log_files function, writing to the sources file is as simple as passing the log file variable to the self.add_data_source function:

for f in self.find_log_files('mymod'):
    self.add_data_source(f)

If you have different files for different sections of the module, or are customising the sample name, you can tweak the fields. The default arguments are as shown:

self.add_data_source(f=None, s_name=None, source=None, module=None, section=None)
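
For example, a module with more than one log type could record sources per section like this (a sketch; the search pattern and section name here are illustrative):

for f in self.find_log_files('mymod/alignment'):
    s_name = self.clean_s_name(f['s_name'], f['root'])
    self.add_data_source(f, s_name=s_name, section='alignment')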

Step 3 - Adding to the general statistics table

Now that you have your parsed data, you can start inserting it into the MultiQC report. At the top of every report is the 'General Statistics' table. This contains metrics from all modules, allowing cross-module comparison.

There is a helper function to add your data to this table. It can take a lot of configuration options, but most have sensible defaults. At its simplest, it works as follows:

data = {
    'sample_1': {
        'first_col': 91.4,
        'second_col': '78.2%'
    },
    'sample_2': {
        'first_col': 138.3,
        'second_col': '66.3%'
    }
}
self.general_stats_addcols(data)
To give more informative table headers and configure things like data scales and colour schemes, you can supply an extra dict:

from collections import OrderedDict

headers = OrderedDict()
headers['first_col'] = {
    'title': 'First',
    'description': 'My First Column',
    'scale': 'RdYlGn-rev'
}
headers['second_col'] = {
    'title': 'Second',
    'description': 'My Second Column',
    'max': 100,
    'min': 0,
    'scale': 'Blues',
    'suffix': '%'
}
self.general_stats_addcols(data, headers)

Here are all options for headers, with defaults:

headers['name'] = {
    'namespace': '',                # Module name. Auto-generated for core modules in General Statistics.
    'title': '[ dict key ]',        # Short title, table column title
    'description': '[ dict key ]',  # Longer description, goes in mouse hover text
    'max': None,                    # Maximum value in range, for bar / colour coding
    'min': None,                    # Minimum value in range, for bar / colour coding
    'scale': 'GnBu',                # Colour scale for colour coding. Set to False to disable.
    'suffix': None,                 # Suffix for value (eg. '%')
    'format': '{:,.1f}',            # Output format() string
    'shared_key': None,             # See below for description
    'modify': None,                 # Lambda function to modify values
    'hidden': False,                # Set to True to hide the column on page load
    'placement' : 1000.0,           # Alter the default ordering of columns in the table
}

  • namespace
    • This prepends the column title in the mouse hover: Namespace: Title.
    • The 'Configure Columns' modal displays this under the 'Group' column.
    • It's automatically generated for core modules in the General Statistics table, though this can be overwritten (useful for example with custom-content).
  • scale
    • Colour scales are the names of ColorBrewer palettes. See below for available scales.
    • Add -rev to the name of a colour scale to reverse it
    • Set to False to disable colouring and background bars
  • shared_key
    • Any string can be specified here, if other columns are found that share the same key, a consistent colour scheme and data scale will be used in the table. Typically this is set to things like read_count, so that the read count in a sample can be seen varying across analysis modules.
  • modify
    • A python lambda function to change the data in some way when it is inserted into the table.
  • hidden
    • Setting this to True will hide the column when the report loads. It can then be shown through the Configure Columns modal in the report. This is useful for columns that are only sometimes of interest. For example, some modules show "percentage aligned" on page load but hide "number of reads aligned".
  • placement
    • If you feel that the results from your module should appear on the left side of the table, set this value to less than 1000. To move the column to the right, set it to greater than 1000. This value can be any float.

The typical use for modify is to divide large numbers such as read counts, to make them easier to interpret. If handling read counts, there are three config variables that should be used to allow users to change the multiplier for read counts: read_count_multiplier, read_count_prefix and read_count_desc. For example:

'title': '{} Reads'.format(config.read_count_prefix),
'description': 'Number of reads ({})'.format(config.read_count_desc),
'modify': lambda x: x * config.read_count_multiplier,

Similar config options apply for base pairs: base_count_multiplier, base_count_prefix and base_count_desc.

And for the read count of long reads: long_read_count_multiplier, long_read_count_prefix and long_read_count_desc.
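
These follow the same pattern as the read count example above. For instance, a base pair column header might contain (a sketch):

'title': '{} Bases'.format(config.base_count_prefix),
'description': 'Number of base pairs ({})'.format(config.base_count_desc),
'modify': lambda x: x * config.base_count_multiplier,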

A third parameter can be passed to this function, namespace. This is usually not needed - MultiQC automatically takes the name of the module that is calling the function and uses this. However, sometimes it can be useful to overwrite this.

Table colour scales

Colour scales are taken from ColorBrewer2. Colour scales can be reversed by adding the suffix -rev to the name. For example, RdYlGn-rev.

The following scales are available:

[Image: available ColorBrewer colour scales]

Step 4 - Writing data to a file

In addition to printing data to the General Stats, MultiQC modules typically also write to text-files to allow people to easily use the data in downstream applications. This also gives the opportunity to output additional data that may not be appropriate for the General Statistics table.

Again, there is a base class function to help you with this - just supply it with a dictionary and a filename:

data = {
    'sample_1': {
        'first_col': 91.4,
        'second_col': '78.2%'
    },
    'sample_2': {
        'first_col': 138.3,
        'second_col': '66.3%'
    }
}
self.write_data_file(data, 'multiqc_mymod')

If your output has a lot of columns, you can supply the additional argument sort_cols = True to have the columns alphabetically sorted.

This function will also pay attention to the default / command line supplied data format and behave accordingly. So the written file could be a tab-separated file (default), JSON or YAML.

Note that any keys with more than 2 levels of nesting will be ignored when being written to tab-separated files.

Step 5 - Create report sections

Great! It's time to start creating sections of the report with more information. To do this, use the self.add_section() helper function. This supports the following arguments:

  • name: Name of the section, used for the title
  • anchor: The URL anchor - must be unique, used when clicking the name in the side-nav
  • description: A very short descriptive text to go above the plot (markdown).
  • comment: A comment to add under the description. Big and blue text, mostly for users to customise the report (markdown).
  • helptext: Longer help text explaining what users should look for (markdown).
  • plot: Results from one of the MultiQC plotting functions
  • content: Any custom HTML
  • autoformat: Default True. Automatically format the description, comment and helptext strings.
  • autoformat_type: Default markdown. Autoformat text type. Currently only markdown supported.

For example:

self.add_section (
    name = 'Second Module Section',
    anchor = 'mymod-second',
    plot = linegraph.plot(data2)
)
self.add_section (
    name = 'First Module Section',
    anchor = 'mymod-first',
    description = 'My amazing module output, from the first section',
    helptext = """
        If you're not sure _how_ to interpret the data, we can help!
        Most modules use multi-line strings for these text blocks,
        with triple quotation marks.

        * Markdown
        * Lists
        * Are
        * `Great`
    """,
    plot = bargraph.plot(data)
)
self.add_section (
    content = '<p>Some custom HTML.</p>'
)

If a module has more than one section, these will automatically be labelled and linked in the left side-bar navigation (unless name is not specified).

Step 6 - Plot some data

Ok, you have some data, now the fun bit - visualising it! Each of the plot types is described in the Plotting Functions section of the docs.


User configuration

Instead of hardcoding defaults, it's a great idea to allow users to configure the behaviour of MultiQC module code.

It's pretty easy to use the built in MultiQC configuration settings to do this, so that users can set up their config as described above in the docs.

To do this, just assume that your configuration variables are available in the MultiQC config module and have sensible defaults. For example:

from multiqc import config

mymod_config = getattr(config, 'mymod_config', {})
my_custom_config_var = mymod_config.get('my_custom_config_var', 5)

You now have a variable my_custom_config_var with a default value of 5, but that can be configured by a user as follows:

mymod_config:
    my_custom_config_var: 200

Please be sure to use a unique top-level config name to avoid clashes - prefixing with your module name is a good idea as in the example above. Keep all module config options under the same top-level name for clarity.

Finally, don't forget to document the usage of your module-specific configuration in docs/modules/mymodule.md so that people know how to use it.

Profiling Performance

It's important that MultiQC runs quickly and efficiently, especially on big projects with large numbers of samples. The recommended method to check this is by using cProfile to profile the code execution.

To do this, first find out where your copy of MultiQC is located:

$ which multiqc

Then run MultiQC with this path and the cProfile module as follows (the flags at the end can be any regular MultiQC flags):

python -m cProfile -o multiqc_profile.prof /Users/you/anaconda/envs/myenv/bin/multiqc -f .

You can create a .bashrc alias to make this easier to run:

alias profile_multiqc='python -m cProfile -o multiqc_profile.prof /Users/you/anaconda/envs/myenv/bin/multiqc '
profile_multiqc -f .

MultiQC should run as normal, but produce the additional binary file multiqc_profile.prof. This can then be visualised with software such as SnakeViz.

To install SnakeViz and visualise the results, do the following:

pip install snakeviz
snakeviz multiqc_profile.prof

A web page should open where you can explore the execution times of different nested functions. It's a good idea to run MultiQC with a comparable number of results from other tools (eg. FastQC) to have a reference to compare against for how long the code should take to run.

Adding Custom CSS / Javascript

If you would like module-specific CSS and / or JavaScript added to the template, just add to the self.css and self.js dictionaries that come with the BaseMultiqcModule class. The key should be the filename that you want your file to have in the generated report folder (this is ignored in the default template, which includes the content file directly in the HTML). The dictionary value should be the path to the desired file. For example, see how it's done in the FastQC module:

self.css = {
    'assets/css/multiqc_fastqc.css' :
        os.path.join(os.path.dirname(__file__), 'assets', 'css', 'multiqc_fastqc.css')
}
self.js = {
    'assets/js/multiqc_fastqc.js' :
        os.path.join(os.path.dirname(__file__), 'assets', 'js', 'multiqc_fastqc.js')
}

Plotting Functions

MultiQC plotting functions are held within multiqc.plots submodules. To use them, simply import the modules you want, eg.:

from multiqc.plots import bargraph, linegraph

Once you've done that, you will have access to the corresponding plotting functions:

bargraph.plot()
linegraph.plot()
scatter.plot()
table.plot()
beeswarm.plot()
heatmap.plot()

These have been designed to work in a similar manner to each other - you pass a data structure to them, along with optional extras such as categories and configuration options, and they return a string of HTML to add to the report. You can add this to the module introduction or sections as described above. For example:

self.add_section (
    name = 'Module Section',
    anchor = 'mymod_section',
    description = 'This plot shows some really nice data.',
    helptext = 'This longer string (can be **markdown**) helps explain how to interpret the plot',
    plot = bargraph.plot(self.parsed_data, categories, pconfig)
)

Common options

All plots should as a minimum have a config with an id and a title. MultiQC is written to work with sensible defaults, so won't complain if you don't supply these, but it's good practice for usability (the ID is used as a filename when exporting plots, and all plots should have a title when exported).

Plot titles should use the format Module name: Plot name (this is partly for ease of use within MegaQC and other downstream tools).
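
For example, a minimal plot config might look like this (the id and title shown here are illustrative; the same pattern applies to all plot types below):

pconfig = {
    'id': 'mymod-coverage-plot',              # HTML ID, also used as the export filename
    'title': 'My Module: Coverage Histogram'  # "Module name: Plot name" format
}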

Bar graphs

Simple data can be plotted in bar graphs. Many MultiQC modules make use of stacked bar graphs. Here, the bargraph.plot() function comes to the rescue. A basic example is as follows:

from multiqc.plots import bargraph
data = {
    'sample 1': {
        'aligned': 23542,
        'not_aligned': 343,
    },
    'sample 2': {
        'not_aligned': 7328,
        'aligned': 1275,
    }
}
html_content = bargraph.plot(data)

To specify the order of categories in the plot, you can supply a list of dictionary keys. This can also be used to exclude a key from the plot.

cats = ['aligned', 'not_aligned']
html_content = bargraph.plot(data, cats)

If cats is given as a dict instead of a list, you can specify a nice name and a colour too. Make it an OrderedDict to specify the order:

from collections import OrderedDict
cats = OrderedDict()
cats['aligned'] = {
    'name': 'Aligned Reads',
    'color': '#8bbc21'
}
cats['not_aligned'] = {
    'name': 'Unaligned Reads',
    'color': '#f7a35c'
}

Finally, a third variable should be supplied with configuration variables for the plot. The defaults are as follows:

config = {
    # Building the plot
    'id': '<random string>',                # HTML ID used for plot
    'cpswitch': True,                       # Show the 'Counts / Percentages' switch?
    'cpswitch_c_active': True,              # Initial display with 'Counts' specified? False for percentages.
    'cpswitch_counts_label': 'Counts',      # Label for 'Counts' button
    'cpswitch_percent_label': 'Percentages', # Label for 'Percentages' button
    'logswitch': False,                     # Show the 'Log10' switch?
    'logswitch_active': False,              # Initial display with 'Log10' active?
    'logswitch_label': 'Log10',             # Label for 'Log10' button
    'hide_zero_cats': True,                 # Hide categories where data for all samples is 0
    # Customising the plot
    'title': None,                          # Plot title - should be in format "Module Name: Plot Title"
    'xlab': None,                           # X axis label
    'ylab': None,                           # Y axis label
    'ymax': None,                           # Max y limit
    'ymin': None,                           # Min y limit
    'yCeiling': None,                       # Maximum value for automatic axis limit (good for percentages)
    'yFloor': None,                         # Minimum value for automatic axis limit
    'yMinRange': None,                      # Minimum range for axis
    'yDecimals': True,                      # Set to false to only show integer labels
    'ylab_format': None,                    # Format string for x axis labels. Defaults to {value}
    'stacking': 'normal',                   # Set to None to have category bars side by side
    'use_legend': True,                     # Show / hide the legend
    'click_func': None,                     # Javascript function to be called when a point is clicked
    'cursor': None,                         # CSS mouse cursor type.
    'tt_decimals': 0,                       # Number of decimal places to use in the tooltip number
    'tt_suffix': '',                        # Suffix to add after tooltip number
    'tt_percentages': True,                 # Show the percentages of each count in the tooltip
}

The keys id and title should always be passed as a minimum. The id is used for the plot name when exporting. If left unset, the Plot Export panel will call the filename mqc_hcplot_gtucwirdzx.png (with some other random string). Plots should always have titles, especially as they can stand alone when exported. The title should have the format Module name: Plot Name.

Switching datasets

It's possible to have single plot with buttons to switch between different datasets. To do this, give a list of data objects (same formats as described above). Also add the following config options to supply names to the buttons:

config = {
    'data_labels': ['Reads', 'Bases']
}

You can also customise the y-axis label and min/max values for each dataset:

config = {
    'data_labels': [
        {'name': 'Reads', 'ylab': 'Number of Reads'},
        {'name': 'Bases', 'ylab': 'Number of Base Pairs', 'ymax':100}
    ]
}

If supplying multiple datasets, you can also supply a list of category objects. Make sure that they are in the same order as the data.

Categories should contain data keys, so if you're supplying a list of two datasets, you should supply a list of two sets of keys for the categories. MultiQC will try to guess categories from the data keys if categories are missing.

For example, with two datasets supplied as above:

cats = [
    ['aligned_reads', 'unaligned_reads'],
    ['aligned_base_pairs', 'unaligned_base_pairs'],
]

Or with additional customisation such as name and colour:

from collections import OrderedDict
cats = [OrderedDict(), OrderedDict()]
cats[0]['aligned_reads'] =        {'name': 'Aligned Reads',        'color': '#8bbc21'}
cats[0]['unaligned_reads'] =      {'name': 'Unaligned Reads',      'color': '#f7a35c'}
cats[1]['aligned_base_pairs'] =   {'name': 'Aligned Base Pairs',   'color': '#8bbc21'}
cats[1]['unaligned_base_pairs'] = {'name': 'Unaligned Base Pairs', 'color': '#f7a35c'}
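
Putting this together, the plot call for a two-dataset bar graph might look like this (a sketch; data_reads and data_bases are hypothetical dicts in the format shown earlier):

html_content = bargraph.plot([data_reads, data_bases], cats, config)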

Interactive / Flat image plots

Note that the bargraph.plot() function can generate both interactive JavaScript (HighCharts) powered report plots and flat image plots made using MatPlotLib. This choice is made within the function based on config variables such as number of dataseries and command line flags.

Note that both plot types should come out looking pretty much identical. If you spot something that's missing in the flat image plots, let me know.

Line graphs

This base function works much like the above, but for two-dimensional data, to produce line graphs. It expects a dictionary with sample identifiers, each containing numeric x:y points. For example:

from multiqc.plots import linegraph
data = {
    'sample 1': {
        '<x val 1>': '<y val 1>',
        '<x val 2>': '<y val 2>'
    },
    'sample 2': {
        '<x val 1>': '<y val 1>',
        '<x val 2>': '<y val 2>'
    }
}
html_content = linegraph.plot(data)

Additionally, a configuration dict can be supplied. The defaults are as follows:

from multiqc.plots import linegraph
config = {
    # Building the plot
    'id': '<random string>',     # HTML ID used for plot
    'categories': False,         # Set to True to use x values as categories instead of numbers.
    'colors': dict(),            # Provide dict with keys = sample names and values colours
    'smooth_points': None,       # Supply a number to limit number of points / smooth data
    'smooth_points_sumcounts': True, # Sum counts in bins, or average? Can supply list for multiple datasets
    'logswitch': False,          # Show the 'Log10' switch?
    'logswitch_active': False,   # Initial display with 'Log10' active?
    'logswitch_label': 'Log10',  # Label for 'Log10' button
    'extra_series': None,        # See section below
    # Plot configuration
    'title': None,               # Plot title - should be in format "Module Name: Plot Title"
    'xlab': None,                # X axis label
    'ylab': None,                # Y axis label
    'xCeiling': None,            # Maximum value for automatic axis limit (good for percentages)
    'xFloor': None,              # Minimum value for automatic axis limit
    'xMinRange': None,           # Minimum range for axis
    'xmax': None,                # Max x limit
    'xmin': None,                # Min x limit
    'xLog': False,               # Use log10 x axis?
    'xDecimals': True,           # Set to false to only show integer labels
    'yCeiling': None,            # Maximum value for automatic axis limit (good for percentages)
    'yFloor': None,              # Minimum value for automatic axis limit
    'yMinRange': None,           # Minimum range for axis
    'ymax': None,                # Max y limit
    'ymin': None,                # Min y limit
    'yLog': False,               # Use log10 y axis?
    'yDecimals': True,           # Set to false to only show integer labels
    'yPlotBands': None,          # Highlighted background bands. See http://api.highcharts.com/highcharts#yAxis.plotBands
    'xPlotBands': None,          # Highlighted background bands. See http://api.highcharts.com/highcharts#xAxis.plotBands
    'yPlotLines': None,          # Highlighted background lines. See http://api.highcharts.com/highcharts#yAxis.plotLines
    'xPlotLines': None,          # Highlighted background lines. See http://api.highcharts.com/highcharts#xAxis.plotLines
    'xLabelFormat': '{value}',   # Format string for the axis labels
    'yLabelFormat': '{value}',   # Format string for the axis labels
    'tt_label': '{point.x}: {point.y:.2f}', # Use to customise tooltip label, eg. '{point.x} base pairs'
    'tt_decimals': None,         # Tooltip decimals when categories = True (when false use tt_label)
    'tt_suffix': None,           # Tooltip suffix when categories = True (when false use tt_label)
    'pointFormat': None,         # Replace the default HTML for the entire tooltip label
    'click_func': None,          # Javascript function to be called when a point is clicked
    'cursor': None,              # CSS mouse cursor type. Defaults to pointer when 'click_func' specified
    'reversedStacks': False      # Reverse the order of the category stacks. Defaults True for plots with Log10 option
}
html_content = linegraph.plot(data, config)

The keys id and title should always be passed as a minimum. The id is used for the plot name when exporting. If left unset, the Plot Export panel will call the filename mqc_hcplot_gtucwirdzx.png (with some other random string). Plots should always have titles, especially as they can stand alone when exported. The title should have the format Module name: Plot Name.

Switching datasets

You can also have a single plot with buttons to switch between different datasets. To do this, just supply a list of data dicts instead (same formats as described above). For example:

data = [
    {
        'sample 1': { '<x val 1>': '<y val 1>', '<x val 2>': '<y val 2>' },
        'sample 2': { '<x val 1>': '<y val 1>', '<x val 2>': '<y val 2>' }
    },
    {
        'sample 1': { '<x val 1>': '<y val 1>', '<x val 2>': '<y val 2>' },
        'sample 2': { '<x val 1>': '<y val 1>', '<x val 2>': '<y val 2>' }
    }
]

You'll also want to add the following configuration options to give names to the buttons and graph labels:

config = {
    'data_labels': [
        {'name': 'DS 1', 'ylab': 'Dataset 1', 'xlab': 'x Axis 1'},
        {'name': 'DS 2', 'ylab': 'Dataset 2', 'xlab': 'x Axis 2'}
    ]
}

All of these config values are optional, the function will default to sensible values if things are missing.

Additional data series

Sometimes, it's good to be able to specify specific data series manually. To do this, use config['extra_series']. For a single extra line this can be a dict (as below). For multiple lines, use a list of dicts. For multiple dataset plots, use a list of list of dicts.

For example, to add a dotted x = y reference line:

from multiqc.plots import linegraph
config = {
    'extra_series': {
        'name': 'x = y',
        'data': [[0, 0], [max_x_val, max_y_val]],
        'dashStyle': 'Dash',
        'lineWidth': 1,
        'color': '#000000',
        'marker': { 'enabled': False },
        'enableMouseTracking': False,
        'showInLegend': False,
    }
}
html_content = linegraph.plot(data, config)

Scatter Plots

Scatter plots work in almost exactly the same way as line plots. Most (if not all) config options are shared between the two. The data structure is similar but not identical:

from multiqc.plots import scatter
data = {
    'sample 1': {
        'x': '<x val>',
        'y': '<y val>'
    },
    'sample 2': {
        'x': '<x val>',
        'y': '<y val>'
    }
}
html_content = scatter.plot(data)

Note that you must use the keys x and y for each data point.

If you want more than one data point per sample, you can supply a list of dictionaries instead. You can also optionally specify point colours and sample name suffixes (these are appended to the sample name):

data = {
    'sample 1': [
        { 'x': '<x val>', 'y': '<y val>', 'color': '#a6cee3', 'name': 'Type 1' },
        { 'x': '<x val>', 'y': '<y val>', 'color': '#1f78b4', 'name': 'Type 2' }
    ],
    'sample 2': [
        { 'x': '<x val>', 'y': '<y val>', 'color': '#b2df8a', 'name': 'Type 1' },
        { 'x': '<x val>', 'y': '<y val>', 'color': '#33a02c', 'name': 'Type 2' }
    ]
}

Remember that MultiQC reports can contain large numbers of samples, so this plot type is not suitable for large quantities of data - 20,000 genes might look good for one sample, but when someone runs MultiQC with 500 samples, it will crash the browser and be impossible to interpret.

See the above docs about line plots for most config options. The scatter plot has a handful of unique ones in addition:

pconfig = {
    'marker_colour': 'rgba(124, 181, 236, .5)', # string, base colour of points (recommend rgba / semi-transparent)
    'marker_size': 5,               # int, size of points
    'marker_line_colour': '#999',   # string, colour of point border
    'marker_line_width': 1,         # int, width of point border
    'square': False                 # Force the plot to stay square? (Maintain aspect ratio)
}

Creating a table

Tables should work just like the functions above (most like the bar graph function). As a minimum, the function takes a dictionary containing data - the first keys will be sample names (row headers) and each key contained within will be a table column header.

You can also supply a list of key names to restrict the data in the table to certain keys / columns. This also specifies the order that columns should be displayed in.

For more customisation, the headers can be supplied as a dictionary. Each key should match the keys used in the data dictionary, but values can customise the output. If you want to specify the order of the columns, you must use an OrderedDict.

Finally, the function accepts a config dictionary as a third parameter. This can set global options for the table (eg. a title) and can also hold default values to customise the output of all table columns.

The default header keys are:

single_header = {
    'namespace': '',                # Name for grouping. Prepends desc and is in Config Columns modal
    'title': '[ dict key ]',        # Short title, table column title
    'description': '[ dict key ]',  # Longer description, goes in mouse hover text
    'max': None,                    # Maximum value in range, for bar / colour coding
    'min': None,                    # Minimum value in range, for bar / colour coding
    'ceiling': None,                # Maximum value for automatic bar limit
    'floor': None,                  # Minimum value for automatic bar limit
    'minRange': None,               # Minimum range for automatic bar
    'scale': 'GnBu',                # Colour scale for colour coding. False to disable.
    'colour': '<auto>',             # Colour for column grouping
    'suffix': None,                 # Suffix for value (eg. '%')
    'format': '{:,.1f}',            # Value format string - default 1 decimal place
    'shared_key': None,             # See below for description
    'modify': None,                 # Lambda function to modify values
    'hidden': False                 # Set to True to hide the column on page load
}

A third parameter can be specified with settings for the whole table:

table_config = {
    'namespace': '',                         # Name for grouping. Prepends desc and is in Config Columns modal
    'id': '<random string>',                 # ID used for the table
    'table_title': '<table id>',             # Title of the table. Used in the column config modal
    'save_file': False,                      # Whether to save the table data to a file
    'raw_data_fn': 'multiqc_<table_id>_table', # File basename to use for raw data file
    'sortRows': True,                        # Whether to sort rows alphabetically
    'only_defined_headers': True,            # Only show columns that are defined in the headers config
    'col1_header': 'Sample Name',            # The header used for the first column
    'no_beeswarm': False                     # Force a table to always be plotted (beeswarm by default if many rows)
}

Most of the header keys can also be specified in the table config (namespace, scale, format, colour, hidden, max, min, ceiling, floor, minRange, shared_key, modify). These will then be applied to all columns prior to applying column-specific heading config.

A very basic example of creating a table is shown below:

data = {
    'sample 1': {
        'aligned': 23542,
        'not_aligned': 343,
    },
    'sample 2': {
        'aligned': 1275,
        'not_aligned': 7328,
    }
}
table_html = table.plot(data)

A more complicated version with ordered columns, defaults and column-specific settings (eg. no decimal places):

data = {
    'sample 1': {
        'aligned': 23542,
        'not_aligned': 343,
        'aligned_percent': 98.563952271
    },
    'sample 2': {
        'aligned': 1275,
        'not_aligned': 7328,
        'aligned_percent': 14.820411484
    }
}
headers = OrderedDict()
headers['aligned_percent'] = {
    'title': '% Aligned',
    'description': 'Percentage of reads that aligned',
    'suffix': '%',
    'max': 100,
    'format': '{:,.0f}' # No decimal places please
}
headers['aligned'] = {
    'title': '{} Aligned'.format(config.read_count_prefix),
    'description': 'Aligned Reads ({})'.format(config.read_count_desc),
    'shared_key': 'read_count',
    'modify': lambda x: x * config.read_count_multiplier
}
config = {
    'namespace': 'My Module',
    'min': 0,
    'scale': 'GnBu'
}
table_html = table.plot(data, headers, config)

Table decimal places

You can customise how many decimal places a number has by using the format config key for that column. The default format string is '{:,.1f}', which specifies a float number with a single decimal place. To remove decimals use '{:,.0f}'. To have two decimal places, use '{:,.2f}'.

Table colour scales

Colour scales are taken from ColorBrewer2. Colour scales can be reversed by adding the suffix -rev to the name. For example, RdYlGn-rev.

The following scales are available:

[Image: available ColorBrewer colour scales]

Beeswarm plots (dot plots)

Beeswarm plots work from the exact same data structure as tables, so the usage is just the same. Except instead of calling table, call beeswarm:

data = {
    'sample 1': {
        'aligned': 23542,
        'not_aligned': 343,
    },
    'sample 2': {
        'not_aligned': 7328,
        'aligned': 1275,
    }
}
beeswarm_html = beeswarm.plot(data)

The function also accepts the same headers and config parameters.


Heatmaps

Heatmaps expect data in the structure of a list of lists. Then, a list of sample names for the x-axis, and optionally for the y-axis (defaults to the same as the x-axis).

heatmap.plot(data, xcats, ycats, pconfig)

A simple example:

hmdata = [
    [0.9, 0.87, 0.73, 0.6, 0.2, 0.3],
    [0.87, 1, 0.7, 0.6, 0.9, 0.3],
    [0.73, 0.8, 1, 0.6, 0.9, 0.3],
    [0.6, 0.8, 0.7, 1, 0.9, 0.3],
    [0.2, 0.8, 0.7, 0.6, 1, 0.3],
    [0.3, 0.8, 0.7, 0.6, 0.9, 1],
]
names = [ 'one', 'two', 'three', 'four', 'five', 'six' ]
hm_html = heatmap.plot(hmdata, names)

Much like the other plots, you can change the way that the heatmap looks using a config dictionary:

pconfig = {
    'title': None,                 # Plot title - should be in format "Module Name: Plot Title"
    'xTitle': None,                # X-axis title
    'yTitle': None,                # Y-axis title
    'min': None,                   # Minimum value (default: auto)
    'max': None,                   # Maximum value (default: auto)
    'square': True,                # Force the plot to stay square? (Maintain aspect ratio)
    'xcats_samples': True,         # Is the x-axis sample names? Set to False to prevent report toolbox from affecting.
    'ycats_samples': True,         # Is the y-axis sample names? Set to False to prevent report toolbox from affecting.
    'colstops': [],                # Scale colour stops. See below.
    'reverseColors': False,        # Reverse the order of the colour axis
    'decimalPlaces': 2,            # Number of decimal places for tooltip
    'legend': True,                # Colour axis key enabled or not
    'borderWidth': 0,              # Border width between cells
    'datalabels': True,            # Show values in each cell. Defaults True when less than 20 samples.
    'datalabel_colour': '<auto>',  # Colour of text for values. Defaults to auto contrast.
}

The colour stops are a bit special and can be used to define a custom colour scheme. These should be defined as a list of lists, with a number between 0 and 1 and a HTML colour. The default is RdYlBu from ColorBrewer:

pconfig = {
    'colstops': [
        [0, '#313695'],
        [0.1, '#4575b4'],
        [0.2, '#74add1'],
        [0.3, '#abd9e9'],
        [0.4, '#e0f3f8'],
        [0.5, '#ffffbf'],
        [0.6, '#fee090'],
        [0.7, '#fdae61'],
        [0.8, '#f46d43'],
        [0.9, '#d73027'],
        [1, '#a50026'],
    ]
}

Javascript Functions

The javascript bundled in the default MultiQC template has a number of helper functions to make your life easier.

NB: The MultiQC Python functions make use of these, so it's very unlikely that you'll need to use any of this. But it's here for reference.

Plotting line graphs

plot_xy_line_graph (target, ds)

Plots a line graph with multiple series of (x,y) data pairs. Used by the linegraph.plot() python function.

Data and configuration must be added to the document level mqc_plots variable on page load, using the target as the key. The variables used are as follows:

mqc_plots[target]['plot_type'] = 'xy_line';

Multiple datasets can be added in the ['datasets'] array. The supplied variable ds specifies which is plotted (defaults to 0).

Available config options with default vars:

config = {
    title: undefined,            // Plot title
    xlab: undefined,             // X axis label
    ylab: undefined,             // Y axis label
    xCeiling: undefined,         // Maximum value for automatic axis limit (good for percentages)
    xFloor: undefined,           // Minimum value for automatic axis limit
    xMinRange: undefined,        // Minimum range for axis
    xmax: undefined,             // Max x limit
    xmin: undefined,             // Min x limit
    xDecimals: true,             // Set to false to only show integer labels
    yCeiling: undefined,         // Maximum value for automatic axis limit (good for percentages)
    yFloor: undefined,           // Minimum value for automatic axis limit
    yMinRange: undefined,        // Minimum range for axis
    ymax: undefined,             // Max y limit
    ymin: undefined,             // Min y limit
    yDecimals: true,             // Set to false to only show integer labels
    yPlotBands: undefined,       // Highlighted background bands. See http://api.highcharts.com/highcharts#yAxis.plotBands
    xPlotBands: undefined,       // Highlighted background bands. See http://api.highcharts.com/highcharts#xAxis.plotBands
    tt_label: '{point.x}: {point.y:.2f}', // Use to customise tooltip label, eg. '{point.x} base pairs'
    pointFormat: undefined,      // Replace the default HTML for the entire tooltip label
    click_func: function(){},    // Javascript function to be called when a point is clicked
    cursor: undefined            // CSS mouse cursor type. Defaults to pointer when 'click_func' specified
}

An example of the markup expected, with the function being called:

<div id="my_awesome_line_graph" class="hc-plot"></div>
<script type="text/javascript">
    mqc_plots['#my_awesome_line_graph']['plot_type'] = 'xy_line';
    mqc_plots['#my_awesome_line_graph']['datasets'] = [
        {
            name: 'Sample 1',
            data: [[1, 1.5], [1.5, 3.1], [2, 6.4]]
        },
        {
            name: 'Sample 2',
            data: [[1, 1.7], [1.5, 4.3], [2, 8.4]]
        }
    ];
    mqc_plots['#my_awesome_line_graph']['config'] = {
        "title": "Best Plot Ever",
        "ylab": "Pings",
        "xlab": "Pongs"
    };
    $(function () {
        plot_xy_line_graph('#my_awesome_line_graph');
    });
</script>

Plotting bar graphs

plot_stacked_bar_graph (target, ds)

Plots a bar graph with multiple series containing multiple categories. Used by the bargraph.plot() python function.

Data and configuration must be added to the document level mqc_plots variable on page load, using the target as the key. The variables used are as follows:

mqc_plots[target]['plot_type'] = 'bar_graph';

All available config options with default vars:

config = {
    title: undefined,           // Plot title
    xlab: undefined,            // X axis label
    ylab: undefined,            // Y axis label
    ymax: undefined,            // Max y limit
    ymin: undefined,            // Min y limit
    yDecimals: true,            // Set to false to only show integer labels
    ylab_format: undefined,     // Format string for x axis labels. Defaults to {value}
    stacking: 'normal',         // Set to null to have category bars side by side (None in python)
    xtype: 'linear',            // Axis type. 'linear' or 'logarithmic'
    use_legend: true,           // Show / hide the legend
    click_func: undefined,      // Javascript function to be called when a point is clicked
    cursor: undefined,          // CSS mouse cursor type. Defaults to pointer when 'click_func' specified
    tt_percentages: true,       // Show the percentages of each count in the tooltip
    reversedStacks: false,      // Reverse the order of the categories in the stack.
}

An example of the markup expected, with the function being called:

<div id="my_awesome_bar_plot" class="hc-plot"></div>
<script type="text/javascript">
    mqc_plots['#my_awesome_bar_plot']['plot_type'] = 'bar_graph';
    mqc_plots['#my_awesome_bar_plot']['samples'] = ['Sample 1', 'Sample 2']
    mqc_plots['#my_awesome_bar_plot']['datasets'] = [{"data": [4, 7], "name": "Passed Test"}, {"data": [2, 3], "name": "Failed Test"}]
    mqc_plots['#my_awesome_bar_plot']['config'] = {
        "title": "My Awesome Plot",
        "ylab": "# Observations",
        "ymin": 0,
        "stacking": "normal"
    $(function () {

Switching counts and percentages

If you're using the plotting functions above, it's easy to add a button which switches between percentages and counts. Just add the following HTML above your plot:

<div class="btn-group switch_group">
    <button class="btn btn-default btn-sm active" data-action="set_numbers" data-target="#my_plot">Counts</button>
    <button class="btn btn-default btn-sm" data-action="set_percent" data-target="#my_plot">Percentages</button>

NB: This markup is generated automatically by the Python bargraph.plot() function.

Switching plot datasets

Much like the counts / percentages buttons above, you can add a button which switches the data displayed in a single plot. Make sure that both datasets are stored in named javascript variables, then add the following markup:

<div class="btn-group switch_group">
    <button class="btn btn-default btn-sm active" data-action="set_data" data-ylab="First Data" data-newdata="data_var_1" data-target="#my_plot">Data 1</button>
    <button class="btn btn-default btn-sm" data-action="set_data" data-ylab="Second Data" data-newdata="data_var_2" data-target="#my_plot">Data 2</button>

Note the CSS class active which specifies which button is 'pressed' on page load. data-ylab and data-xlab can be used to specify the new axes labels. data-newdata should be the name of the javascript object with the new data to be plotted and data-target should be the CSS selector of the plot to change.

Custom event triggers

Some of the events that take place in the general javascript code trigger jQuery events which you can hook into from within your module's code. This allows you to take advantage of events generated by the global theme whilst keeping your code modular.

$(document).on('mqc_highlights', function(e, f_texts, f_cols, regex_mode){
    // This trigger is called when the highlight strings are
    // updated. Three variables are given - an array of search
    // strings (f_texts), an array of colours with corresponding
    // indexes (f_cols) and a boolean var saying whether the
    // search should be treated as a string or a regex (regex_mode)
});

$(document).on('mqc_renamesamples', function(e, f_texts, t_texts, regex_mode){
    // This trigger is called when samples are renamed
    // Three variables are given - an array of search
    // strings (f_texts), an array of replacements with corresponding
    // indexes (t_texts) and a boolean var saying whether the
    // search should be treated as a string or a regex (regex_mode)
});

$(document).on('mqc_hidesamples', function(e, f_texts, regex_mode){
    // This trigger is called when the Hide Samples filters change.
    // Two variables are given - an array of search strings
    // (f_texts) and a boolean saying whether the search should
    // be treated as a string or a regex (regex_mode)
});

$('#YOUR_PLOT_ID').on('mqc_plotresize', function(){
    // This trigger is called when a plot handle is pulled,
    // resizing the height
});

$('#YOUR_PLOT_ID').on('mqc_original_series_click', function(e, name){
    // A plot able to show original images has had a point clicked.
    // 'name' contains the name of the series that was clicked
});

$('#YOUR_PLOT_ID').on('mqc_original_chg_source', function(e, name){
    // A plot with original images has had a request to change the
    // original image source (eg. pressing Prev / Next)
});

$('#YOUR_PLOT_ID').on('mqc_plotexport_image', function(e, cfg){
    // A trigger to export an image of the plot. cfg contains
    // config variables for the requested image.
});

$('#YOUR_PLOT_ID').on('mqc_plotexport_data', function(e, cfg){
    // A trigger to export a data file of the plot. cfg contains
    // config variables for the requested data.
});

MultiQC Plugins

MultiQC is written around a system designed for extensibility and plugins. These features allow custom code to be written without polluting the central code base.

Please note that we want MultiQC to grow as a community tool! So if you're writing a module or theme that can be used by others, please keep it within the main MultiQC framework and submit a pull request.

Entry Points

The plugin system works using setuptools entry points. In setup.py you will see a section of code that looks like this (truncated):

entry_points = {
    'multiqc.modules.v1': [
        'qualimap = multiqc.modules.qualimap:MultiqcModule',
    ],
    'multiqc.templates.v1': [
        'default = multiqc.templates.default',
    ],
    # 'multiqc.cli_options.v1': [
        # 'my-new-option = myplugin.cli:new_option'
    # ],
    # 'multiqc.hooks.v1': [
        # 'before_config = myplugin.hooks:before_config',
        # 'config_loaded = myplugin.hooks:config_loaded',
        # 'execution_start = myplugin.hooks:execution_start',
        # 'before_modules = myplugin.hooks:before_modules',
        # 'after_modules = myplugin.hooks:after_modules',
        # 'execution_finish = myplugin.hooks:execution_finish',
    # ]
}

These sets of entry points can each be extended to add functionality to MultiQC:

  • multiqc.modules.v1
    • Defines the module classes. Used to add new modules.
  • multiqc.templates.v1
    • Defines the templates. Can be used for new templates.
  • multiqc.cli_options.v1
    • Allows plugins to add new custom command line options
  • multiqc.hooks.v1
    • Code hooks for plugins to add new functionality

Any Python program can create entry points with the same name; once installed, MultiQC will find these and run them accordingly. For an example of this in action, see the MultiQC_NGI setup file:

entry_points = {
        'multiqc.templates.v1': [
            'ngi = multiqc_ngi.templates.ngi',
            'genstat = multiqc_ngi.templates.genstat',
        ],
        'multiqc.cli_options.v1': [
            'project = multiqc_ngi.cli:pid_option'
        ],
        'multiqc.hooks.v1': [
            'after_modules = multiqc_ngi.hooks:ngi_metadata',
        ]
}
Here, two new templates are added, a new command line option and a new code hook.


Modules

List items added to multiqc.modules.v1 specify new modules. They should be described as follows:

modname = python_mod.dirname.submodname:classname

Once this is done, everything else should be the same as described in the writing modules documentation.
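
For example, a plugin's setup.py could register a new module like this (a sketch; the plugin and module names are hypothetical):

entry_points = {
    'multiqc.modules.v1': [
        'mytool = myplugin.modules.mytool:MultiqcModule',
    ]
}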


Templates

As above, though there is no need to specify a class name at the end. See the writing templates documentation for further instructions.

Command line options

MultiQC handles command line interaction using the click framework. You can use the multiqc.cli_options.v1 entry point to add new click decorators for command line options. For example, the MultiQC_NGI plugin uses the entry point above with the following code in cli.py:

import click
pid_option = click.option('--project', type=str)

The values given from additional command line arguments are parsed by MultiQC and put into config.kwargs. The above plugin later reads the value given by the user with the --project flag in a hook:

if config.kwargs['project'] is not None:
  # do some stuff

See the click documentation or the main MultiQC script for more information and examples of adding command line options.


Hooks

Hooks are a little more complicated - these define points in the core MultiQC code where you can run custom functions. This can be useful as your code is able to access data generated by other parts of the program. For example, you could tie into the after_modules hook to insert data processed by MultiQC modules into a database automatically.

Here, the entry point names are the hook titles, described as commented out lines in the core MultiQC setup.py: execution_start, config_loaded, before_modules, after_modules and execution_finish.

These should point to a function in your code which will be executed when that hook fires. Your custom code can import the core MultiQC modules to access configuration and loggers. For example:

#!/usr/bin/env python
""" MultiQC hook functions - we tie into the MultiQC
core here to add in extra functionality. """

import logging
from multiqc.utils import report, config

log = logging.getLogger('multiqc')

def after_modules():
  """ Plugin code to run when MultiQC modules have completed  """
  num_modules = len(report.modules_output)
  status_string = "MultiQC hook - {} modules reported!".format(num_modules)
  log.info(status_string)

Writing New Templates

MultiQC is built around a templating system that uses the Jinja python package. This makes it very easy to create new report templates that fit your needs.

Core or plugin?

If your template could be of use to others, it would be great if you could add it to the main MultiQC package. You can do this by creating a fork of the MultiQC GitHub repository, adding your template and then creating a pull request to merge your changes back to the main repository.

If it's a very specific template, you can create a new Python package which acts as a plugin. For more information about this, see the plugins documentation.

Creating a template skeleton

For a new template to be recognised by MultiQC, it must be a Python submodule directory with an __init__.py file. This must be referenced in the setup.py installation script as an entry point.

You can see the bundled templates defined in this way:

entry_points = {
    'multiqc.templates.v1': [
        'default = multiqc.templates.default',
        'default_dev = multiqc.templates.default_dev',
        'simple = multiqc.templates.simple',
        'geo = multiqc.templates.geo',
    ]
}

Note that these entry points can point to any Python modules, so if you're writing a plugin module you can specify your module name instead. Just make sure that multiqc.templates.v1 is the same.

Once you've added the entry point, remember to install the package again:

pip install -e .

Using -e tells pip to softlink the plugin files instead of copying, so changes made whilst editing files will be reflected when you run MultiQC.

The __init__.py files must define two variables - the path to the template directory and the main jinja template file:

import os

template_dir = os.path.dirname(__file__)
base_fn = 'base.html'

Child templates

The default MultiQC template contains a lot of code. Importantly, it includes 1448 lines of custom JavaScript (at time of writing) which powers the plotting and dynamic functions in the report. You probably don't want to rewrite all of this for your template, so to make your life easier you can create a child template.

To do this, add an extra variable to your template's __init__.py:

template_parent = 'default'

This tells MultiQC to use the template files from the default template unless a file with the same name is found in your child template. For instance, if you just want to add your own logo in the header of the reports, you can create your own header.html which will overwrite the default header.

Files within the default template have comments at the top explaining what part of the report they generate.
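
Putting it together, a minimal child template __init__.py might look like this (a sketch; the values shown are illustrative):

import os

template_dir = os.path.dirname(__file__)   # Path to this template's directory
base_fn = 'base.html'                      # Main Jinja template file
template_parent = 'default'                # Inherit everything else from the default template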

Extra init variables

There are a few extra variables that can be added to the __init__.py file to change how the report is generated.

Setting output_subdir instructs MultiQC to put the report and its contents into a subdirectory. Set the string to your desired name. Note that this will be prefixed if -p/--prefix is set at run time.

Secondly, you can copy additional files with your report when it is generated. This is usually used to copy required images or scripts with the report. These should be a list of file or directory paths, relative to the __init__.py file. Directory contents will be copied recursively.

You can also override config options in the template. For example, setting the value of config.plots_force_flat can force the report to only have static image plots.

from multiqc.utils import config

output_subdir = 'multiqc_report'
copy_files = ['assets']
config.plots_force_flat = True

Jinja template variables

There are a number of variables that you can use within your Jinja template. Two namespaces are available - report and config. You can print these using the Jinja curly brace syntax, eg. {{ config.version }}. See the Jinja2 documentation for more information.

The default MultiQC template includes dependencies in the HTML so that the report is standalone. If you would like to do the same, use the include_file function. For example:

<script>{{ include_file('js/jquery.min.js') }}</script>
<img src="data:image/png;base64,{{ include_file('img/logo.png', b64=True) }}">


Custom plotting functions

If you don't like the default plotting functions built into MultiQC, you can write your own! If you create a callable variable in a template called either bargraph or linegraph, MultiQC will use that instead. For example:

def custom_linegraph(plotdata, pconfig):
    return '<h1>Awesome line graph here</h1>'
linegraph = custom_linegraph

def custom_bargraph(plotdata, plotseries, pconfig):
    return '<h1>Awesome bar graph here</h1>'
bargraph = custom_bargraph

These particular examples don't do very much, but hopefully you get the idea. Note that you have to set the variable linegraph or bargraph to your function.

Updating for compatibility

When releasing new versions of MultiQC we aim to maintain compatibility so that your existing modules and plugins will keep working. However, in some cases we have to make changes that require code to be modified. This section summarises the changes by MultiQC release.

v1.0 Updates

MultiQC v1.0 brings a few changes in the way that MultiQC modules and plugins are written. Most are backwards-compatible, but there are a couple that could break external plugins.

Module imports

New MultiQC module imports have been refactored to make them less inter-dependent and fragile. This has a bunch of advantages, notably allowing better, more modular, unit testing (and hopefully more reliable and maintainable code).

All MultiQC modules and plugins will need to change some of their import statements.

There are two things that you probably need to change in your plugin modules to make them work with the updated version of MultiQC, both to do with imports. Instead of this style of importing modules:

from multiqc import config, BaseMultiqcModule, plots

You now need this:

from multiqc import config
from multiqc.plots import bargraph   # Load specific plot types here
from multiqc.modules.base_module import BaseMultiqcModule

Modules that directly reference multiqc.BaseMultiqcModule instead need to reference multiqc.modules.base_module.BaseMultiqcModule.

Secondly, modules that use import plots now need to import the specific plots needed. You will also need to update any plotting functions, removing the plot. prefix.

For example, change this:

import plots
return plots.bargraph.plot(data, keys, pconfig)

to this:

from multiqc.plots import bargraph
return bargraph.plot(data, keys, pconfig)

These changes have been made to simplify the module imports within MultiQC, allowing specific parts of the codebase to be imported into a Python script on their own. This enables small, atomic, clean unit testing.

If you have any questions, please open an issue.

Many thanks to @tbooth at @EdinburghGenomics for his patient work with this.

Searching for files

The core find_log_files function has been rewritten and now works a little differently. Instead of searching all analysis files each time it's called (by every module), all files are searched once at the start of the MultiQC execution. This makes MultiQC run much faster.

To use the new syntax, add your search pattern to config.sp using the new before_config plugin hook:


entry_points = {
  # [..]
  'multiqc.hooks.v1': [
    'before_config = myplugin.mymodule:load_config'
  ]
}


from multiqc.utils import config
def load_config():
    my_search_patterns = {
        'my_plugin/my_mod': {'fn': '*_somefile.txt'},
        'my_plugin/my_other_mod': {'fn': '*other_file.txt'},
    }
    config.update_dict(config.sp, my_search_patterns)

This will add in your search patterns to the default MultiQC config, before user config files are loaded (allowing people to overwrite your defaults as with other modules).

Now, you can find your files much as before, using the string specified above:

for f in self.find_log_files('my_plugin/my_mod'):
  # do something

The old syntax (supplying a dict instead of a string to the function without any previous config setup) will still work, but you will get a deprecation notice. This functionality may be removed in the future.

Adding report sections

Until now, report sections were added by creating a list called self.sections and adding to it. If you only had a single section, the routine was to instead append to the self.intro string.

These methods have been deprecated in favour of a new function called self.add_section(). For example, instead of the previous:

self.sections = list()
self.sections.append({
  'name': 'My Section',
  'anchor': 'my-html-id',
  'content': '<p>Description of what this plot shows.</p>' +
             linegraph.plot(data, pconfig)
})

the syntax is now:

self.add_section(
  name = 'My Section',
  anchor = 'my-html-id',
  description = 'Description of what this plot shows.',
  helptext = 'More extensive help text about how to interpret this.',
  plot = linegraph.plot(data, pconfig)
)

Note that content should now be split up into three new keys: description, helptext and plot. This will allow consistent formatting and future developments with improved module help text. Text is wrapped in <p> tags by the function, so these are no longer needed. Raw content can still be provided in a content string as before if required.

All fields are optional. If name is omitted then the end result will be the same as previously done with self.intro += content.

Updated number formatting

A couple of minor updates to how numbers are handled in tables may affect your configs. Firstly, format strings looking like {:.1f} should now be {:,.1f} (note the extra comma). This enables customisable number formatting with separated thousand groups.

Secondly, any table columns reporting a read count should use new config options to allow user-configurable multipliers. For example, instead of this:

headers['read_counts'] = {
  'title': 'M Reads',
  'description': 'Read counts (millions)',
  'modify': lambda x: x / 1000000,
  'format': '{:,.2f} M',
  'shared_key': 'read_count'
}

you should now use this:

headers['read_counts'] = {
  'title': '{} Reads'.format(config.read_count_prefix),
  'description': 'Total raw sequences ({})'.format(config.read_count_desc),
  'modify': lambda x: x * config.read_count_multiplier,
  'format': '{:,.2f} ' + config.read_count_prefix,
  'shared_key': 'read_count'
}

Not as pretty, but it allows users to adjust the read count multiplier, which is useful for viewing samples with very low sequencing depth.