The Picard module parses results generated by Picard, a set of Java command line tools for manipulating high-throughput sequencing data.
It’s possible to customise the HsMetrics “Target Bases 30X” coverage and WgsMetrics “Fraction of Bases over 30X” that are shown in the general statistics table. The chosen coverage levels must correspond to field names in the Picard report, such as PCT_10X. Any numbers not found in the reports will be ignored.
The coverage levels available for HsMetrics are typically 1, 2, 10, 20, 30, 40, 50 and 100X.
The coverage levels available for WgsMetrics are typically 1, 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90 and 100X.
To customise this, add the following to your MultiQC config:
    picard_config:
      general_stats_target_coverage:
        - 10
        - 50
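As a rough illustration of how requested coverage levels relate to report fields, the sketch below builds candidate field names from a list of levels and keeps only those present in a parsed metrics row. The helper `select_coverage_fields`, the `PCT_{}X` name template and the sample values are assumptions made for this example, not MultiQC's internal code.

```python
def select_coverage_fields(levels, metrics_row, template="PCT_{}X"):
    """Return {level: value} for requested levels whose field exists in the row."""
    selected = {}
    for level in levels:
        field = template.format(level)
        if field in metrics_row:  # levels without a matching field are ignored
            selected[level] = float(metrics_row[field])
    return selected


# Hypothetical values from one parsed metrics line
row = {"PCT_1X": "0.99", "PCT_10X": "0.95", "PCT_50X": "0.40"}
coverage = select_coverage_fields([10, 50, 75], row)
print(coverage)  # 75 has no PCT_75X field in this row, so it is dropped
```

Requesting a level that the report does not contain is therefore harmless, which matches the behaviour described above.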
For CrosscheckFingerprints, in addition to adding a table of results, the module adds a Crosschecks All Expected column to the General Statistics table. If all comparisons for a sample were Expected, then the value of the field will be True and green. If not, it will be False and red.
You can customize the columns shown in the CrosscheckFingerprints table with the config keys CrosscheckFingerprints_table_cols and CrosscheckFingerprints_table_cols_hidden. For example:
    picard_config:
      CrosscheckFingerprints_table_cols:
        - RESULT
        - LOD_SCORE
      CrosscheckFingerprints_table_cols_hidden:
        - LEFT_LANE
        - RIGHT_LANE
The column names will be normalized for display, for example LOD_SCORE -> Lod score.
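A minimal sketch of one way such a normalisation could work is shown here. The function `normalise_column_name` is hypothetical; it reproduces the LOD_SCORE example but is not taken from the MultiQC source.

```python
def normalise_column_name(field):
    """Turn a Picard-style field name such as LOD_SCORE into a display title."""
    # Underscores become spaces; only the first letter stays upper case.
    return field.replace("_", " ").capitalize()


titles = [normalise_column_name(f) for f in ("RESULT", "LOD_SCORE", "LEFT_LANE")]
print(titles)
```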
Note that if CALCULATE_TUMOR_AWARE_RESULTS was set to true on the CLI for any of the CrosscheckFingerprints result files, then the LOD_SCORE_NORMAL_TUMOR column will be displayed.
Note that the Target Region Coverage plot is generated using the PCT_TARGET_BASES_ table columns from the HsMetrics output (not immediately obvious when looking at the log files).
You can customize the columns shown in the HsMetrics table with the config keys HsMetrics_table_cols and HsMetrics_table_cols_hidden. For example:
    picard_config:
      HsMetrics_table_cols:
        - NEAR_BAIT_BASES
        - OFF_BAIT_BASES
        - ON_BAIT_BASES
      HsMetrics_table_cols_hidden:
        - MAX_TARGET_COVERAGE
        - MEAN_BAIT_COVERAGE
        - MEAN_TARGET_COVERAGE
Only values listed in HsMetrics_table_cols will be included in the table. Anything listed in HsMetrics_table_cols_hidden will be hidden by default.
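The two rules above can be sketched as a small filter. This is an illustration only: the function `build_columns` is hypothetical, and it assumes that columns in the hidden list remain part of the table but start collapsed, which is one plausible reading of the text rather than confirmed behaviour.

```python
def build_columns(available, cols, cols_hidden):
    """Return [(column_name, hidden)] for the columns that make it into the table."""
    shown = set(cols)
    hidden = set(cols_hidden)
    columns = []
    for name in available:
        if name in hidden:
            columns.append((name, True))   # present, but hidden by default
        elif name in shown:
            columns.append((name, False))  # present and visible
        # any other field is left out of the table
    return columns


# A few HsMetrics field names, in report order
available = [
    "BAIT_SET", "ON_BAIT_BASES", "NEAR_BAIT_BASES", "OFF_BAIT_BASES",
    "MEAN_BAIT_COVERAGE", "MEAN_TARGET_COVERAGE", "MAX_TARGET_COVERAGE",
]
columns = build_columns(
    available,
    cols=["NEAR_BAIT_BASES", "OFF_BAIT_BASES", "ON_BAIT_BASES"],
    cols_hidden=["MAX_TARGET_COVERAGE", "MEAN_BAIT_COVERAGE", "MEAN_TARGET_COVERAGE"],
)
print(columns)
```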
A similar config is available for customising the HsMetrics columns in the General Stats table:
    picard_config:
      HsMetrics_genstats_table_cols:
        - NEAR_BAIT_BASES
      HsMetrics_genstats_table_cols_hidden:
        - MAX_TARGET_COVERAGE
By default, the insert size plot is smoothed to contain a maximum of 500 data points per sample. This is to prevent the MultiQC report from being very large with big datasets. If you would like to customise this value to get a better resolution you can set the following MultiQC config values, with the new maximum number of points:
    picard_config:
      insertsize_smooth_points: 10000
The plotted maximum insert size can be set with:
    picard_config:
      insertsize_xmax: 10000
If a BAM file contains multiple read groups, Picard MarkDuplicates generates a report with multiple metric lines, one for each “library”.
By default, MultiQC will sum the values for every library it finds and recompute the ESTIMATED_LIBRARY_SIZE fields, giving a single set of results.
If instead you would prefer each library to be treated as a separate sample, you can do so by setting the following MultiQC config:
    picard_config:
      markdups_merge_multiple_libraries: False
This prevents the merge and recalculation and appends the library name to the sample name.
This behaviour is present in MultiQC since version 1.9. Before this, only the metrics from the first library were taken and all others were ignored.
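The difference between the two modes can be sketched as follows. This is a simplified illustration under stated assumptions: `combine_libraries` is a hypothetical helper, the " - " separator used to append the library name is a guess, and the recalculation of derived fields such as ESTIMATED_LIBRARY_SIZE from the summed counts is deliberately left out.

```python
COUNT_FIELDS = (
    "UNPAIRED_READS_EXAMINED",
    "READ_PAIRS_EXAMINED",
    "UNPAIRED_READ_DUPLICATES",
    "READ_PAIR_DUPLICATES",
)


def combine_libraries(sample, libraries, merge=True, sep=" - "):
    """libraries: {library_name: {field: count}} parsed from one report."""
    if merge:
        # One result per sample: sum each count field over all libraries.
        totals = {f: sum(lib[f] for lib in libraries.values()) for f in COUNT_FIELDS}
        return {sample: totals}
    # One result per library: the library name is appended to the sample name.
    return {f"{sample}{sep}{name}": dict(lib) for name, lib in libraries.items()}


libraries = {
    "libA": dict(zip(COUNT_FIELDS, (10, 100, 1, 5))),
    "libB": dict(zip(COUNT_FIELDS, (20, 200, 2, 10))),
}
merged = combine_libraries("sample1", libraries)
split = combine_libraries("sample1", libraries, merge=False)
print(merged)
print(sorted(split))
```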
ValidateSamFile Search Pattern
Generally, Picard adds identifiable content to the output of its tools. This is not the case for ValidateSamFile. To identify these logs, the MultiQC Picard ValidateSamFile submodule searches for filenames that contain ‘validatesamfile’ or ‘ValidateSamFile’. You can customise the search pattern by overwriting the picard/sam_file_validation pattern in your MultiQC config. For example:
    sp:
      picard/sam_file_validation:
        fn: "*[Vv]alidate[Ss]am[Ff]ile*"
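To check which filenames a pattern like this would pick up before running MultiQC, the sketch below tests a few made-up filenames with Python's standard `fnmatch` module. It assumes the `fn` pattern is a shell-style glob; the filenames are invented for the example.

```python
from fnmatch import fnmatchcase

PATTERN = "*[Vv]alidate[Ss]am[Ff]ile*"

filenames = [
    "sample1_validatesamfile.txt",
    "sample2.ValidateSamFile.log",
    "sample3_validate_sam_file.txt",  # underscores break the pattern
    "sample4_VALIDATESAMFILE.txt",    # only V, S and F may be upper case
]
# fnmatchcase is case-sensitive on every platform
matches = [name for name in filenames if fnmatchcase(name, PATTERN)]
print(matches)
```

Only the first two names match, so files named with other conventions need a custom pattern.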
The coverage histogram from Picard typically shows a normal distribution with a very long tail. To make the plot easier to view, by default the module plots the line up to 99% of the data. This typically removes the long tail and gives a more useful graph.
If you would like, you can set a specific value for the maximum coverage to cut the graph at. By setting this to a very large value, you will disable the cutting (the graph will automatically limit the axis at the maximum data point). You can do this as follows:
    picard_config:
      wgsmetrics_histogram_max_cov: 500
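One way to pick a cut-off that covers 99% of the data is to walk the histogram in order of coverage and stop once the cumulative count reaches that fraction. The sketch below illustrates the idea with a hypothetical `coverage_cutoff` function and invented counts; it is not the module's own implementation.

```python
def coverage_cutoff(histogram, fraction=0.99):
    """histogram: {coverage: count}. Return the smallest coverage at which
    the cumulative count reaches `fraction` of the total, or None if empty."""
    if not histogram:
        return None
    total = sum(histogram.values())
    running = 0
    for coverage in sorted(histogram):
        running += histogram[coverage]
        if running >= fraction * total:
            return coverage
    return max(histogram)


# Invented counts: most bases sit at low coverage, with a long sparse tail
hist = {0: 50, 1: 200, 2: 500, 3: 200, 4: 45, 5: 4, 250: 1}
cutoff = coverage_cutoff(hist)
print(cutoff)
```

Here the plot would stop at coverage 4, and the sparse tail out to 250 is left off the axis.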
If running with very high coverage samples or using certain Picard options, the coverage histogram can become very large indeed. For example, if reporting coverages of 1 million, it will have 1 million data points per sample. That can crash the browser and make MultiQC take a long time to run.
There are two MultiQC customisation options to help with this.
Firstly, MultiQC will automatically “smooth” the histogram to a maximum of 1000 data points by binning. This should stop the browser from crashing. You can tweak how many bins are used with the following:
    picard_config:
      wgsmetrics_histogram_smooth: 1000
Change 1000 to whatever number you want. If you don’t want any smoothing, set it to a very high number, bigger than the number of data points you have.
Secondly, if you would prefer to instead simply skip the histogram, you can set the following:
    picard_config:
      wgsmetrics_skip_histogram: True
This will omit that section from the report entirely, and also skip parsing the histogram data. By specifying this option you may speed up the run time for MultiQC with these types of files significantly.
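The smoothing-by-binning idea described above can be sketched as averaging runs of consecutive points so that no more than a fixed number remain. The function `smooth_histogram` is hypothetical and uses a simple mean per bin, which may differ from how MultiQC actually bins the data.

```python
def smooth_histogram(points, max_points=1000):
    """points: list of (x, y) sorted by x. Average consecutive points into
    at most max_points bins; short inputs are returned unchanged."""
    if len(points) <= max_points:
        return list(points)
    bin_size = -(-len(points) // max_points)  # ceiling division
    smoothed = []
    for start in range(0, len(points), bin_size):
        chunk = points[start:start + bin_size]
        mean_x = sum(x for x, _ in chunk) / len(chunk)
        mean_y = sum(y for _, y in chunk) / len(chunk)
        smoothed.append((mean_x, mean_y))
    return smoothed


raw = [(i, 1.0) for i in range(10000)]
smoothed = smooth_histogram(raw, max_points=1000)
print(len(raw), "->", len(smoothed))
```

Setting `max_points` above the number of input points leaves the data untouched, which mirrors the advice to use a very high value when no smoothing is wanted.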
File search patterns
    picard/alignment_metrics:
      contents: AlignmentSummaryMetrics
      shared: true
    picard/basedistributionbycycle:
      contents: BaseDistributionByCycleMetrics
      shared: true
    picard/crosscheckfingerprints:
      contents: CrosscheckFingerprints
      shared: true
    picard/gcbias:
      contents: GcBias
      shared: true
    picard/hsmetrics:
      contents: HsMetrics
      shared: true
    picard/insertsize:
      contents: InsertSizeMetrics
      shared: true
    picard/markdups:
      contents: DuplicationMetrics
      shared: true
    picard/oxogmetrics:
      contents: OxoGMetrics
      shared: true
    picard/pcr_metrics:
      contents: TargetedPcrMetrics
      shared: true
    picard/quality_by_cycle:
      contents_re: "[Qq]uality[Bb]y[Cc]ycle"
      contents: MEAN_QUALITY
      shared: true
    picard/quality_score_distribution:
      contents_re: "[Qq]uality[Ss]core[Dd]istribution"
      contents: COUNT_OF_Q
      shared: true
    picard/quality_yield_metrics:
      contents: QualityYieldMetrics
      shared: true
    picard/rnaseqmetrics:
      contents_re: "[Rr]na[Ss]eq[Mm]etrics"
      contents: "## METRICS CLASS"
      shared: true
    picard/rrbs_metrics:
      contents: RrbsSummaryMetrics
      shared: true
    picard/sam_file_validation:
      fn: "*[Vv]alidate[Ss]am[Ff]ile*"
    picard/variant_calling_metrics:
      fn: "*.variant_calling_detail_metrics"
      contents: CollectVariantCallingMetrics
      shared: true
    picard/wgs_metrics:
      contents: CollectWgsMetrics
      shared: true
    picard/collectilluminabasecallingmetrics:
      contents: CollectIlluminaBasecallingMetrics
      shared: true
    picard/collectilluminalanemetrics:
      contents: CollectIlluminaLaneMetrics
      shared: true
    picard/extractilluminabarcodes:
      contents: ExtractIlluminaBarcodes
      shared: true
    picard/markilluminaadapters:
      contents: MarkIlluminaAdapters
      shared: true