Using MultiQC in scripts

Importing MultiQC as a library in scripts and notebooks

Using MultiQC within scripts

Although MultiQC is primarily run from the command line, it can also be imported as a Python module to build a report interactively, for example in custom Python scripts or in a Jupyter notebook (see an example notebook).

MultiQC provides a set of commands to iteratively parse logs and add sections to a report. All of them are available via importing MultiQC as a module:

import multiqc

Parse logs

Find files that MultiQC recognizes in analysis_dir and parse them, without generating a report. Data can be accessed with other methods: list_modules, show_plot, get_summarized_data, etc.

def parse_logs(*analysis_dir, **kwargs)

Parameters:

  • analysis_dir: Path(s) to search for files to parse
  • verbose: Print more information to the console
  • file_list: Supply a file containing a list of file paths to be searched, one per row
  • prepend_dirs: Prepend directory to sample names
  • dirs_depth: Prepend n directories to sample names. Negative number to take from start of path
  • fn_clean_sample_names: Clean the sample names. Set to False to leave them as the full file name
  • require_logs: Require all explicitly requested modules to have log files. If not, MultiQC will exit with an error
  • use_filename_as_sample_name: Use the log filename as the sample name
  • strict: Don’t catch exceptions, run additional code checks to help development
  • quiet: Only show log warnings
  • no_ansi: Disable coloured log output
  • profile_runtime: Add analysis of how long MultiQC takes to run to the report
  • no_version_check: Disable checking the latest MultiQC version on the server
  • ignore: Ignore analysis files
  • ignore_samples: Ignore sample names
  • run_modules: Use only these modules
  • exclude_modules: Do not use these modules
  • config_files: Specific config file to load, after those in MultiQC dir / home dir / working dir
  • module_order: Names of modules in order of precedence to show in report
  • extra_fn_clean_exts: Extra file extensions to clean from sample names
  • extra_fn_clean_trim: Extra strings to clean from sample names
  • preserve_module_raw_data: Preserve raw data from modules in the report - besides plots. Useful to use later interactively. Defaults to True. Set to False to save memory.

Examples

Parse logs found in the data directory.

multiqc.parse_logs('data')

Parse logs found in the data/fastp directory, the data/SAMPLE1.cutadapt.log file, and a data_mqc.tsv MultiQC custom content file.

multiqc.parse_logs('data/fastp', 'data/SAMPLE1.cutadapt.log', "data_mqc.tsv")

Parse logs found in the data directory for only the specified modules, and use an additional pattern to clean sample names.

multiqc.parse_logs(
    'data',
    run_modules=["fastp", "spades", "quast", "pangolin"],
    extra_fn_clean_exts=[".unclassified"],
)

Parse logs found in the data directory and run FastQC module twice for two sets of files - raw and trimmed reads - according to the provided path pattern (see Order of modules for details).

multiqc.parse_logs('data',
    module_order=[
        dict(
            fastqc=dict(
                name="FastQC (trimmed)",
                anchor="fastqc_trimmed",
                path_filters=["*_1_trimmed_fastqc.zip"],
            )
        ),
        dict(
            fastqc=dict(
                name="FastQC (raw)",
                anchor="fastqc_raw",
                path_filters=["*_1_fastqc.zip"],
            )
        ),
    ],
)
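To see why the two path_filters patterns above select different files, the following sketch applies them to a few hypothetical file names using Python's fnmatch. It only approximates glob-style matching for illustration; it is an assumption, not a description of how MultiQC evaluates path_filters internally.

```python
from fnmatch import fnmatch

# Patterns copied from the module_order example above.
FILTERS = {
    "FastQC (trimmed)": "*_1_trimmed_fastqc.zip",
    "FastQC (raw)": "*_1_fastqc.zip",
}

# Hypothetical file names, for illustration only.
files = [
    "data/SAMPLE1_1_fastqc.zip",
    "data/SAMPLE1_1_trimmed_fastqc.zip",
    "data/SAMPLE1_2_fastqc.zip",
]


def assign(files, filters):
    """Map each section name to the files whose path matches its pattern."""
    return {
        name: [f for f in files if fnmatch(f, pattern)]
        for name, pattern in filters.items()
    }


assignment = assign(files, FILTERS)
for name, matched in assignment.items():
    print(name, matched)
```

Because each pattern requires a distinct file name suffix, no file is picked up by both module runs, and the read 2 file is picked up by neither.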

Load JSON dump data

Try to find the multiqc_data.json generated by a previous MultiQC run in the given directory, and load it into the report.

def parse_data_json(path: str | Path)

Parameters:

  • path: Path to the directory containing multiqc_data.json or the path to the file itself.

Example:

multiqc.parse_data_json('multiqc_data/multiqc_data.json')
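Since path may be either a directory or the file itself, a script may want to check that the JSON dump exists before loading it. A minimal sketch follows; find_data_json is a hypothetical helper, not part of MultiQC, and it does not claim to mirror MultiQC's own lookup.

```python
from pathlib import Path


def find_data_json(path):
    """Return the multiqc_data.json file for a directory or file path, or None."""
    path = Path(path)
    # A directory is expected to contain the dump under its default name.
    candidate = path / "multiqc_data.json" if path.is_dir() else path
    return candidate if candidate.is_file() else None
```

The returned path, when not None, can then be passed to multiqc.parse_data_json.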

List what’s loaded

Return list of the modules that have been loaded, ordered according to config:

def list_modules() -> list[str]

Return dict of plot names that have been loaded, indexed by module name and section:

def list_plots() -> dict[str, list[str | dict[str, str]]]

Example:

multiqc.list_plots()
{'fastp': ['Filtered Reads',
  'Insert Sizes',
  {'Sequence Quality': ['Read 1: Before filtering',
    'Read 1: After filtering',
    'Read 2: Before filtering',
    'Read 2: After filtering']},
  {'GC Content': ['Read 1: Before filtering',
    'Read 1: After filtering',
    'Read 2: Before filtering',
    'Read 2: After filtering']},
  {'N content': ['Read 1: Before filtering',
    'Read 1: After filtering',
    'Read 2: Before filtering',
    'Read 2: After filtering']}]}
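The nested result can be awkward to iterate over, because a section is either a plain string or a dict of dataset (tab) labels. The sketch below flattens it into tuples; flatten_plots is a hypothetical helper, and the input is an abbreviated copy of the output shown above.

```python
def flatten_plots(plots):
    """Yield (module, section, dataset) tuples; dataset is None for single-tab plots."""
    for module, sections in plots.items():
        for entry in sections:
            if isinstance(entry, dict):
                # A dict maps one section name to its dataset (tab) labels.
                for section, datasets in entry.items():
                    for dataset in datasets:
                        yield module, section, dataset
            else:
                yield module, entry, None


# Abbreviated copy of the output shown above.
plots = {
    "fastp": [
        "Filtered Reads",
        "Insert Sizes",
        {"GC Content": ["Read 1: Before filtering", "Read 1: After filtering"]},
    ]
}

rows = list(flatten_plots(plots))
for row in rows:
    print(row)
```

In a live session, the same helper could be applied to the value returned by multiqc.list_plots().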

Return list of clean sample names that have loaded data:

def list_samples() -> list[str]

Example:

multiqc.list_samples()
['SAMPLE1_PE', 'SAMPLE2_PE']

Return list of found log files corresponding to the loaded data:

def list_data_sources() -> list[str]

Example:

multiqc.list_data_sources()
['data/SAMPLE1_PE.fastp.json', 'data/SAMPLE2_PE.fastp.json']

Access loaded data

There are several methods to access the data loaded by parse_logs.

Return parsed module data, indexed (if available) by data key, then by sample. Module is either the module name, or the anchor:

def get_module_data(module: str = None, sample: str = None, key: str = None) -> dict

The function takes data from report.saved_raw_data, which is populated by self.write_data_file() calls in individual modules. This data is not necessarily normalized: for example, a number may be stored as a string or as a number, depending on the individual module's behaviour.

Example:

multiqc.get_module_data(module="fastp", sample="SAMPLE1_PE")["summary"]
{'fastp_version': '0.23.2',
 'sequencing': 'paired end (301 cycles + 301 cycles)',
 'before_filtering': {'total_reads': 55442,
  'total_bases': 16571632,
  'q20_bases': 16267224,
  'q30_bases': 15853021,
  'gc_content': 0.38526},
 'after_filtering': {'total_reads': 48270,
  'total_bases': 14363465,
  'q20_bases': 14323363,
  'q30_bases': 14199841,
  'gc_content': 0.383991}}
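Because values are not guaranteed to be normalized, downstream calculations may need to coerce them first. The sketch below computes the fraction of reads retained by filtering from a summary shaped like the one above; to_number and retained_fraction are hypothetical helpers, and one value is deliberately given as a string to mimic an unnormalized module.

```python
def to_number(value):
    """Return value as int or float when it looks numeric, else unchanged."""
    if isinstance(value, (int, float)):
        return value
    try:
        return int(value)
    except (TypeError, ValueError):
        pass
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def retained_fraction(summary):
    """Fraction of reads kept by filtering, given a fastp 'summary' dict."""
    before = to_number(summary["before_filtering"]["total_reads"])
    after = to_number(summary["after_filtering"]["total_reads"])
    return after / before


# Read counts copied from the example output above; one is given as a
# string to mimic a module that does not normalize its numbers.
summary = {
    "before_filtering": {"total_reads": "55442"},
    "after_filtering": {"total_reads": 48270},
}
fraction = retained_fraction(summary)
print(f"{fraction:.1%} of reads retained")
```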

Similarly, return parsed general stats data, indexed by sample, then by data key. If sample is specified, return only data for that sample.

def get_general_stats_data(sample: str = None) -> dict
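A mapping indexed by sample and then by data key converts naturally into a table with one row per sample. The sketch below renders such a mapping as CSV text; stats_to_csv is a hypothetical helper, and the sample values and key names are made up to match the documented shape rather than taken from real MultiQC output.

```python
import csv
import io


def stats_to_csv(stats):
    """Render a {sample: {key: value}} mapping as CSV text, one row per sample."""
    # Collect every key seen in any sample, keeping first-seen order.
    keys = []
    for values in stats.values():
        for key in values:
            if key not in keys:
                keys.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["sample", *keys], restval="", lineterminator="\n"
    )
    writer.writeheader()
    for sample, values in stats.items():
        # Keys missing for a sample are written as empty cells (restval).
        writer.writerow({"sample": sample, **values})
    return buffer.getvalue()


# Hypothetical values in the documented shape (sample, then data key).
stats = {
    "SAMPLE1_PE": {"pct_duplication": 1.2, "after_filtering_gc_content": 38.4},
    "SAMPLE2_PE": {"pct_duplication": 0.9},
}
csv_text = stats_to_csv(stats)
print(csv_text)
```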

Adding custom content

You can also add a custom section to the report using multiqc.BaseMultiqcModule, either by subclassing it or, as in the example below, by instantiating it directly. This can be used to add a custom table or other content.

Example

Create a table (see plotting for more detail) and add it to the report.

import multiqc
from multiqc.plots import table
 
plot = table.plot(
    data=...,
    headers=...,
    pconfig={
        "id": "my_metrics_table",
        "title": "My metrics",
    },
)
module = multiqc.BaseMultiqcModule(
    name="my-module",
    anchor="custom_data",
)
module.add_section(
    plot=plot,
    name="My metrics",
    anchor="my_metrics_section",
    description=...,
)
multiqc.report.modules.append(module)

Get plot object

Get a plot object for a specific module and section. For a list of available plots, use multiqc.list_plots().

def get_plot(module: str, section: str) -> Plot

Examples

Get plot object for the “GC Content” plot in the “fastp” module.

plot = multiqc.get_plot("fastp", "GC Content")

Get plot object for the “Number of Contigs” plot in the “QUAST” module.

plot = multiqc.get_plot("QUAST", "Number of Contigs")

Show plot

Show plot in the notebook cell.

class Plot:
    def show(self, dataset_id: int | str = 0, flat=False, **kwargs)

Parameters:

  • dataset_id: Dataset index or label, used when the plot has several tabs
  • flat: Show plot as static images without any interactivity
  • kwargs: Additional arguments passed to the plot

Examples

Create a bar graph and show it in the notebook cell:

from multiqc.plots import bargraph
plot = bargraph.plot(...)
plot.show()

Get “fastp GC Content” plot and show it in the notebook cell. Since it has multiple tabs, we can select which tab to show with the dataset_id option (defaults to the first tab):

plot = multiqc.get_plot("fastp", "GC Content")
plot.show(dataset_id="Read 2: Before filtering")

Show Samtools alignment stats as a violin plot, rendered as a flat image without interactivity.

plot = multiqc.get_plot("Flagstat", "Alignment stats")
plot.show("Read counts", violin=True, flat=True)

Save plot to file

Similarly, you can save a plot to a file instead of showing it in a notebook.

class Plot:
    def save(self, filename, dataset_id: int | str = 0, **kwargs)

Parameters:

  • filename: Path to save the plot
  • dataset_id: Dataset index or label, used when the plot has several tabs
  • kwargs: Additional arguments passed to the plot

Examples:

Save the “Number of Contigs” plot for the QUAST module to a file.

plot = multiqc.get_plot("QUAST", "Number of Contigs")
plot.save("quast_contigs.html")

Save the GC Content plot for the dataset labeled “Read 2: Before filtering” to a file, make it flat.

plot = multiqc.get_plot("fastp", "GC Content")
plot.save(
    "fastp_gc_content.png",
    dataset_id="Read 2: Before filtering",
    flat=True,
)

Save Samtools alignment stats as a violin plot to a file.

plot = multiqc.get_plot("Flagstat", "Alignment stats")
plot.save(
    "flagstat_alignment_stats.html",
    dataset_id="Read counts",
    violin=True,
)
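When a plot has several tabs, it can be convenient to save each dataset to its own file. The sketch below relies only on the save(filename, dataset_id=...) signature documented above; slugify and save_all_datasets are hypothetical helpers, and the file naming scheme is an arbitrary choice.

```python
import re


def slugify(label):
    """Turn a dataset label into a lowercase, file-name-friendly string."""
    return re.sub(r"[^a-z0-9]+", "_", label.lower()).strip("_")


def save_all_datasets(plot, labels, prefix, extension="html"):
    """Call plot.save() once per dataset label and return the file names used."""
    filenames = []
    for label in labels:
        filename = f"{prefix}_{slugify(label)}.{extension}"
        plot.save(filename, dataset_id=label)
        filenames.append(filename)
    return filenames
```

For example, with plot obtained from multiqc.get_plot("fastp", "GC Content") and the labels taken from multiqc.list_plots(), calling save_all_datasets(plot, labels, "fastp_gc") would write one HTML file per tab.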

Writing report

Render HTML from parsed module data, and write a report along with auxiliary data files to disk.

def write_report(**kwargs)

Parameters:

  • title: Report title. Printed as page header, used for filename if not otherwise specified
  • report_comment: Custom comment, will be printed at the top of the report
  • template: Report template to use
  • output_dir: Create report in the specified output directory
  • filename: Report filename. Use ‘stdout’ to print to standard out
  • make_data_dir: Force the parsed data directory to be created
  • data_format: Output parsed data in a different format
  • zip_data_dir: Compress the data directory
  • force: Overwrite existing report and data directory
  • make_report: Generate the report HTML. Defaults to True, set to False to only export data and plots
  • export_plots: Export plots as static images in addition to the report
  • plots_force_flat: Use only flat plots (static images)
  • plots_force_interactive: Use only interactive plots (in-browser Javascript)
  • strict: Don’t catch exceptions, run additional code checks to help development
  • development: Development mode. Do not compress and minimise JS, export uncompressed plot data
  • make_pdf: Create PDF report. Requires Pandoc to be installed
  • no_megaqc_upload: Don’t upload generated report to MegaQC, even if MegaQC options are found
  • quiet: Only show log warnings
  • verbose: Print more information to the console
  • no_ansi: Disable coloured log output
  • profile_runtime: Add analysis of how long MultiQC takes to run to the report
  • no_version_check: Disable checking the latest MultiQC version on the server
  • run_modules: Use only these modules
  • exclude_modules: Do not use these modules
  • config_files: Specific config file to load, after those in MultiQC dir / home dir / working dir
  • custom_css_files: Custom CSS files to include in the report
  • module_order: Names of modules in order of precedence to show in report

Example:

multiqc.write_report(
    force=True,
    output_dir="my_multiqc_report",
    title="My Report",
    filename="report.html",
)

Load config from file

Load config on top of the current config from a MultiQC config file.

def load_config(config_file: str | Path)

Reset session

Reset the report to start fresh. Drops all previously parsed and loaded data.

def reset()
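The functions on this page can be combined into a small end-to-end workflow. The sketch below uses only reset, parse_logs, list_modules and write_report with the parameters documented above; build_report is a hypothetical wrapper, and the choice to skip writing a report when no modules are loaded is an assumption about what a script might want, not MultiQC behaviour.

```python
def build_report(data_dir, output_dir):
    """Parse logs in data_dir and write a report to output_dir.

    Returns the list of modules that were loaded.
    """
    # Imported inside the function so this sketch can be loaded
    # even where MultiQC is not installed.
    import multiqc

    multiqc.reset()  # drop anything parsed earlier in this session
    multiqc.parse_logs(data_dir)
    modules = multiqc.list_modules()
    if modules:
        # Only write a report when at least one module found data.
        multiqc.write_report(output_dir=output_dir, force=True)
    return modules
```

Calling reset first keeps repeated runs in the same notebook session from accumulating data from earlier calls.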