Reports reference

Reports at three nested levels.

MEEGqc generates self-contained HTML reports at three scopes: subject-level (one recording), dataset-level (all subjects in one dataset, QA + QC variants), and multi-dataset-level (cross-dataset). They share the same tab grammar, lobe colour conventions, and set of interactive controls.

Shared interactive controls

Every interactive figure across every report supports:

  • Hover: subject / recording IDs and exact metric values.
  • Zoom: click-drag to zoom, double-click to reset.
  • Lobe legend: click an item to toggle, double-click to isolate.
  • Cap on / off: on 3D topomaps, toggles a solid head model behind the sensors.
  • Size controls: line, point, and text size for printable export.
  • Export: save each plot to PNG via the Plotly toolbar.
Lobe colours. Lobe colours are shared across every plot and every report scope. The full palette is documented on the metrics page.

The three control panels

Every interactive figure exposes a control strip that scales the visual primitives (point size, line width, label size) and toggles overlays. Three flavours, one per plot type:

General plot controls
Standard distribution / line-plot controls.
3D topomap controls
3D topomap: rotate, zoom, toggle cap, swap projection.
Channel x epoch heatmap controls
Channel x epoch heatmap: colour scale, channel sort, marginal profiles.

Subject-level QA

The subject-level QA report shows the full quality profile of one subject across all of its runs and tasks. Use it for isolated triage and for explaining why a recording ranked low in the dataset-level QC report.

Open a live example report. Real subject-level QA + QC reports rendered from the two tutorial datasets. Download, unzip, then open the .html file in any browser. The report is self-contained; no MEEGqc install is needed.
MEG subject report → EEG subject report →

Tab hierarchy

Level  Content
1      Top tabs: Overview, STD, PtP (manual), PtP (auto), PSD, ECG, EOG, Muscle, Head, Stimulus, QC summary.
2      Run / task subtabs within each metric tab.
3      Channel-type subtabs: MAG, GRAD, EEG, plus General for ECG / EOG signal views.
4      Plot subtabs (3D topomap, distribution, channel x epoch heatmap, profile, etc.).

Explore the tabs

Each top tab is described below, in the order listed above: what the report shows in that tab and which views are available.

Overview. Subject metadata, recording header details (acquisition date, sampling rate, hardware filters, channel counts), a run x metric availability table, and a 3D sensor geometry view colour-coded by lobe.

Subject report recording header
Recording header for the selected run. The chips at the top (e.g. deduction (run-1), induction (run-1)) switch between this subject's runs. The panel below is the MNE raw-info dump for that run: measurement date, experimenter, digitised points, good / bad channel counts per type (Gradiometers, Magnetometers, EEG, EOG, ECG, Stimulus, CHPI...), sampling frequency, and the online hardware high-pass / low-pass filters that were already applied at acquisition time.
Subject summary card
Subject summary card. Four facts: dataset name, subject ID (sub-XXX), number of recordings (runs) included in this report, and how many QA metrics actually produced figures for the subject. No thresholding happens here; this is a build-time summary of what made it into the report.
Recording x metric availability
Recording x metric availability table. Rows are this subject's runs, columns are the QA metrics. Each cell is Yes when the metric ran for that run, No otherwise. A No flags a metric the calculation step skipped or could not produce figures for (e.g. Head on EEG-only data, ECG when the reference channel failed the sanity check, or a metric you disabled in settings.ini).
3D sensor positions for the selected run, rendered from the raw file's channel coordinates and colour-coded by lobe. Drag to rotate; hover any sensor to see its channel name. The geometry is shown once here in Overview so it's not repeated under every metric tab.
Lobe legend interaction: single-click a lobe to toggle its sensors off / on, double-click to isolate just that lobe. Identical interaction model to every other topomap and lobe-coloured plot in the report.

STD. Channel-wise standard deviation. Three subtabs: 3D topomap, distribution boxplot (one dot per channel), and channel x epoch heatmap with marginal profiles. Persistently high rows are bad-channel candidates; sparse vertical spikes are candidates for selective epoch rejection.

STD tab subtabs
The three STD subtabs: 3D topomap, distribution, channel x epoch heatmap. Same shape on every metric tab.
3D topomap of per-channel STD. Bright sensors = high amplitude dispersion. Hover for the channel name + exact value.
Boxplot of channel STD with one dot per channel, coloured by lobe. Dots well above the upper whisker are bad-channel candidates.
Channel x epoch heatmap with marginal profiles. Persistently bright rows = bad channels; sparse bright vertical stripes = epochs to reject. The marginal traces show the per-channel and per-epoch summaries.

PtP (manual) and PtP (auto). Peak-to-peak amplitude per epoch (max(s) - min(s)). Catches transient bursts and outlier excursions that STD averages out. Two flavours: PtP (manual) uses the MEEGqc Numba-accelerated path, PtP (auto) uses MNE's automatic epoch annotation.

PtP tab subtabs
The three PtP subtabs, same shape as STD.
PtP 3D topomap. Catches transient bursts that STD averages out: a sensor that is mostly quiet with occasional huge excursions shows up here even when its STD is normal.
PtP distribution, one dot per channel, lobe-coloured. Outliers above the upper whisker are candidates for epoch-level rejection rather than whole-channel rejection.
Channel x epoch heatmap, PtP variant. Cross-reference the STD heatmap: a channel that lights up here but not in STD is bursty rather than uniformly noisy.
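The difference between the two metrics can be sketched with a toy channel. This is an illustrative standard-library example, not the MEEGqc Numba path or MNE's annotation; the helper names, the epoch length, and the signal are assumptions made for the sketch.

```python
from statistics import pstdev

def ptp(samples):
    """Peak-to-peak amplitude of one epoch: max(s) - min(s)."""
    return max(samples) - min(samples)

def per_epoch(samples, epoch_len, func):
    """Apply func to consecutive, non-overlapping epochs of epoch_len samples."""
    return [func(samples[start:start + epoch_len])
            for start in range(0, len(samples) - epoch_len + 1, epoch_len)]

# Toy channel: a steady +/-1 background with a single large excursion
# placed in the middle epoch (hypothetical values).
signal = [1.0, -1.0] * 150      # 300 samples = 3 epochs of 100
signal[150] = 20.0              # one burst sample in epoch 1

ptp_values = per_epoch(signal, 100, ptp)      # [2.0, 21.0, 2.0]
std_values = per_epoch(signal, 100, pstdev)   # roughly [1.0, 2.23, 1.0]
```

In the burst epoch PtP rises about tenfold while STD only roughly doubles, which is why a bursty channel can stand out in the PtP views and still look unremarkable in STD.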

PSD. Power spectral density. Spot narrowband interference (mains harmonics at 50 / 60 / 100 / 120 Hz) and broadband contamination. Four views: SNR triage, Welch curves, relative-band amplitude, and a frequency-selectable topomap.

PSD tab subtabs
The four PSD subtabs, as labelled in the report: Channel-wise PSD topomap (3D), PSD curves by channel, Relative power (noise frequencies), and Relative power (canonical bands).
PSD relative power at noise frequencies
Relative power (noise frequencies) subtab: per-channel share of total spectral power concentrated at mains and its harmonics (50 / 60 Hz and multiples). Channels stacked at the top of the bar plot are the ones most contaminated by line noise.
Relative power per canonical band
Relative power (canonical bands) subtab: per-channel share of total power in the classical EEG / MEG bands (delta, theta, alpha, beta, gamma). Useful to spot channels with an abnormal band distribution relative to the rest of the recording.
PSD curves by channel subtab: the Welch PSD estimate for every channel overlaid as one line. Narrow tall peaks at 50 / 60 Hz and multiples = mains contamination; broad elevation across many channels = environmental broadening or motion-related noise.
Channel-wise PSD topomap (3D) subtab: PSD at the frequency you pick on the slider, drawn on the sensor layout. Hot spots at a mains harmonic frequency tell you where the noise is geographically concentrated on the cap / dewar.
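The "share of total spectral power at mains and its harmonics" idea can be illustrated on a synthetic signal. The sketch below uses a plain DFT periodogram rather than the Welch estimate the report is based on, and the function names, the 50 Hz default, and the choice to exclude the DC bin are assumptions of this example, not MEEGqc's actual computation.

```python
import cmath
import math

def power_spectrum(samples):
    """One-sided power spectrum via a direct DFT (fine for short toy signals)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def noise_power_fraction(samples, sfreq, mains=50.0):
    """Share of non-DC power in the bins at mains and its harmonics below Nyquist."""
    n = len(samples)
    power = power_spectrum(samples)
    noise_bins = set()
    harmonic = mains
    while harmonic < sfreq / 2:
        noise_bins.add(round(harmonic * n / sfreq))
        harmonic += mains
    total = sum(power[1:])  # skip the DC bin
    return sum(power[k] for k in noise_bins) / total

sfreq = 200.0
times = [i / sfreq for i in range(200)]   # 1 s of data, so bin k sits at k Hz
# 10 Hz "brain" rhythm plus a half-amplitude 50 Hz line-noise component.
signal = [math.sin(2 * math.pi * 10 * t) + 0.5 * math.sin(2 * math.pi * 50 * t)
          for t in times]
fraction = noise_power_fraction(signal, sfreq)
```

For this signal the fraction comes out at about 0.2: the 50 Hz component has a quarter of the power of the 10 Hz component, so it holds one fifth of the total.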

ECG. Cardiac contamination. The General subtab visualises the ECG channel itself (peaks, BPM, mean R-wave waveform). The MAG / GRAD / EEG subtabs show how strongly the ECG signal correlates with each sensor, with three buckets (most / moderately / least affected).

The ECG subtabs: General for the reference channel itself, then one tab per sensor type (MAG / GRAD / EEG) for the per-sensor correlation breakdown.
Raw ECG channel with the R-peaks MEEGqc detected. This is what the three-condition sanity check (amplitudes / breaks / bursts) is computed on.
Mean R-wave: the average of all detected peak-locked epochs. It is the reference template the per-channel correlation is measured against.
3D topomap of abs(corr_coef) between each sensor's average artifact and the mean R-wave. Brighter sensors carry more cardiac contamination.
Most affected (top third by abs(corr_coef)). Channel waves track the mean R-wave closely.
Moderately affected (middle third). Partial tracking.
Least affected (bottom third). Sensors here are essentially uncorrelated with the cardiac signature.
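The three buckets amount to ranking channels by abs(corr_coef) and cutting the ranking into thirds. A minimal sketch follows, assuming a plain dict of channel name to correlation coefficient; the channel names, the values, and the rounding rule for channel counts not divisible by three are hypothetical.

```python
import math

def bucket_by_abs_corr(corr_by_channel):
    """Split channels into thirds by descending abs(corr_coef)."""
    ranked = sorted(corr_by_channel,
                    key=lambda name: abs(corr_by_channel[name]),
                    reverse=True)
    n = len(ranked)
    # Assumption: when n is not divisible by 3, earlier buckets get the extra channel.
    cut1, cut2 = math.ceil(n / 3), math.ceil(2 * n / 3)
    return {
        "most": ranked[:cut1],
        "moderately": ranked[cut1:cut2],
        "least": ranked[cut2:],
    }

# Hypothetical correlation of each sensor's average artifact with the mean R-wave.
corr = {
    "MEG0111": 0.91, "MEG0121": -0.85,
    "MEG0131": 0.40, "MEG0141": -0.35,
    "MEG0211": 0.05, "MEG0221": -0.02,
}
buckets = bucket_by_abs_corr(corr)
```

Taking the absolute value first means a strongly negative correlation (here MEG0121) lands in the most-affected bucket, matching the abs(corr_coef) ordering used in the topomap.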

EOG. Ocular contamination (blinks, saccades). Same structure as ECG: General subtab for the raw EOG channel, MAG / GRAD / EEG subtabs for sensor correlation buckets. Strongest impact typically frontal.

The EOG subtabs: same structure as ECG (General + MAG / GRAD / EEG).
Raw EOG channel with detected blink events. The three-condition sanity check runs on this trace (with blink-specific bounds).
Mean blink wave averaged across all detected blink-locked epochs. Reference template for the per-channel correlation.
3D topomap of abs(corr_coef): ocular contamination typically concentrates on frontal sensors.
Most affected (top third by abs(corr_coef)). Tight tracking of the mean blink, usually frontal.
Moderately affected (middle third). Partial tracking.
Least affected (bottom third). Sensors here are largely unaffected by blinks.

Muscle. High-frequency muscle noise, measured at 110-140 Hz for MEG and 20-100 Hz for EEG, and driven by bursts from jaw clenches, neck tension, and similar artifacts. The plot shows z-scored high-frequency power across time.

Z-scored high-frequency activity across time. Excursions above the threshold are flagged as muscle events; the recording's muscle event ratio feeds the mus family of the GQI.
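The flagging step can be sketched as z-scoring a power trace and counting contiguous runs above a threshold. The threshold value, the toy trace, and the helper names below are assumptions of this sketch; the actual threshold and the exact definition of the muscle event ratio are those documented for the metric, not the ones shown here.

```python
from statistics import mean, pstdev

def zscore(values):
    """Z-score a trace (assumes the trace is not constant)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def count_events(z, threshold):
    """Count contiguous runs of samples whose z-score exceeds the threshold."""
    events, inside = 0, False
    for value in z:
        above = value > threshold
        if above and not inside:
            events += 1
        inside = above
    return events

# Toy high-frequency power trace: flat background with two bursts.
power = [1.0] * 40
power[10:12] = [9.0, 9.0]   # a two-sample burst
power[30] = 12.0            # a one-sample burst

THRESHOLD = 2.5             # hypothetical value for illustration
z = zscore(power)
n_events = count_events(z, THRESHOLD)
flagged_samples = sum(v > THRESHOLD for v in z)
```

Here the two adjacent burst samples count as one event, so the trace yields two events from three flagged samples.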

Head. MEG-only. Movement across the recording derived from continuous head localisation (cHPI) data: six motion parameters (three translations and three rotation quaternion components) plus a derived summary. EEG recordings show a banner explaining that the metric is skipped because cHPI is MEG-specific.

Head motion parameters derived from cHPI: three translations + three rotations across the recording. Sudden jumps mark gross-motion events you may want to annotate.

Stimulus. Reads events from the BIDS _events.tsv first and falls back to the raw stim channels when needed. Confirms event count, timing, and trial-type distribution.

Stimulus tab header
Event count summary: total events, distinct trial types, and which source MEEGqc used (_events.tsv or fallback stim channel reconstruction).
BIDS event timeline
Event timeline reconstructed from _events.tsv. Colour-coded by trial type; useful as a sanity check that triggers landed where the protocol expected.
Stim channel timeline
Fallback: raw stim-channel reconstruction when _events.tsv is missing or incomplete.
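The "events file first, stim channel second" order can be sketched as follows. This is a hedged illustration, not MEEGqc's reader: the function names and the (sample, code) tuple format for stim-channel events are hypothetical, and it only treats a missing or empty _events.tsv as a reason to fall back, without trying to detect an incomplete file. The onset / duration / trial_type columns follow the BIDS events convention.

```python
import csv
import io
from collections import Counter

def summarise_events(tsv_text):
    """Count events and trial types in the text of a BIDS _events.tsv file."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    trial_types = Counter(row.get("trial_type") or "n/a" for row in rows)
    return {"n_events": len(rows), "trial_types": dict(trial_types)}

def pick_event_source(events_tsv_text, stim_channel_events):
    """Prefer _events.tsv; fall back to stim-channel events if it is missing or empty."""
    if events_tsv_text:
        summary = summarise_events(events_tsv_text)
        if summary["n_events"] > 0:
            return "_events.tsv", summary
    codes = Counter(str(code) for _sample, code in stim_channel_events)
    return "stim channel", {"n_events": len(stim_channel_events),
                            "trial_types": dict(codes)}

# Toy inputs (hypothetical content).
EVENTS_TSV = (
    "onset\tduration\ttrial_type\n"
    "0.5\t0.2\tdeduction\n"
    "1.5\t0.2\tinduction\n"
    "2.5\t0.2\tdeduction\n"
)
STIM_EVENTS = [(100, 1), (300, 2), (500, 1)]

source, summary = pick_event_source(EVENTS_TSV, STIM_EVENTS)
fallback_source, fallback_summary = pick_event_source(None, STIM_EVENTS)
```

The returned source label plays the role of the "which source was used" field in the event count summary.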

QC summary. Compact metric-by-metric distillation of the QA results for this subject plus its GQI score across attempts. Short text summary, then auditable detail tables, then exact paths to the underlying derivatives.

QC summary subtabs
QC summary subtabs: one per metric plus the top-level GQI card.
Subject GQI summary
Subject GQI: composite 0-100 score plus the decomposition into the four penalty families (ch, corr, mus, psd). Read the dominant family to know what to fix.
STD QC summary: noisy / flat channel %, noisy / flat epoch % for this subject.
PtP QC summary: same shape as STD on the peak-to-peak metric.
PSD QC summary: percentage of total spectral power concentrated at mains and its harmonics.
ECG QC summary
ECG QC summary: high-correlation channel % per sensor type.
EOG QC summary
EOG QC summary: same metric on the ocular channel.
Muscle QC summary
Muscle QC summary: event count and event ratio for this recording.

Dataset-level QA

The dataset-level QA report aggregates quality assessment across all subjects in one dataset. Use it to identify patterns shared across the subjects, outlier subjects, and task-dependent shifts.

Open a live example report. Real dataset-level QA report rendered from one of the tutorial datasets. Download, unzip, open the .html file.
Dataset-level QA report →
Dataset-level QA report provenance header
Report-provenance block at the top of every dataset-level QA report: QA group report: <dataset_name> title, the timestamp when the report was generated, the MEEGqc version that produced it, the epoch label, and a full Settings snapshot: the exact settings.ini block used for this build (every metric's parameters, channel-type selection, epoching, plot flags). The closing Important note reminds the reader that the Cohort QA overview combines global cohort footprints with subject-aware summaries; metric-level panels keep recording identity in their hover text. Use this header to confirm exactly which build of the report you're looking at before drawing any conclusion.

Tab hierarchy

Level  Content
1      Channel-type tabs: Combined (mag+grad), MAG, GRAD, EEG (when present).
2      Section tabs (5 of them, listed below).
3      Metric subtabs within section 4 (Metric details).
4+     Measures (Median, Mean, Upper Tail) and figure types (Boxplot, Violin, Histogram, Density).
Channel-type tabs
Channel-type tabs at the top of the report: Combined (mag + grad together), MAG, GRAD, and EEG when present. Switching tabs re-scopes every section below to that channel type; a recording's quality often differs sharply between sensor types, so check each before drawing conclusions.
Dataset section tabs
The five section tabs that organise the QA report: Summary distributions, Cohort overview, Metrics across tasks, Metric details, and ECDFs. Each tab answers a different question; the practical reading order below covers when to use which.

Explore the five sections

Summary distributions. Violin + box plots of every metric across every recording in the dataset, with each recording plotted as a hoverable dot. Pooled 3D topomaps summarise where issues concentrate.

Violin / box plot controls
Plot-type and summary controls: flip between violin (full shape of the distribution) and box (median + IQR + whiskers), and switch the per-recording summary statistic between Median, Mean, and Upper Tail. Upper-tail is the most useful for spotting outliers driven by a few bad epochs.
Per-metric violin distributions (first group of metrics). Each violin is the spread of one metric across all recordings in the dataset; each dot inside is one recording (hover for subject + run identity). Long thin tails point at a few outlier recordings dragging the dataset-level summaries down.
Per-metric violin distributions (remaining metrics). Same grammar as part 1; the split into two parts is purely cosmetic so the page isn't crushed horizontally.
Pooled 3D topomaps across all subjects in the dataset: where each metric concentrates spatially. Hotspots in the same location across metrics often point at a consistently bad sensor.

Cohort overview. General dataset information, subject ranking table (worst first), and recording x metric / subject x metric matrices.

Cohort general information tiles
Three-tile dataset-level overview at the top of the section. Dataset summary: total subject count, total task-level row count, and the list of distinct task labels present. Subjects per task / condition: per-task breakdown of how many subjects and how many derivative rows contributed to each task. Column availability: for every metric column written into the per-task TSVs (GQI_*, STD_*, PTP_*, PSD_noise_*, ECG_*, EOG_*, Muscle_*), how many of those rows have a non-missing value vs how many are missing. A non-zero Missing n on a single column points at a metric that did not compute for some recordings; non-zero across many columns usually indicates a calculation step that crashed or was skipped for a subset of the subjects.
Subject ranking table, sorted by an aggregated quality summary (a normalised composite across every metric). Worst recordings at the top. Click any row to jump to that subject's subject-level QA report; this is the primary entry point for "which recordings should I look at first?".
Recording x metric and subject x metric matrices. Each cell is a normalised metric value colour-coded by severity. Vertical stripe of bright cells = one bad recording dragging multiple metrics down; horizontal stripe = a specific metric that is systematically degraded across every subject in the dataset (often an acquisition-site-wide issue like mains contamination or shielding problems).
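The Column availability tile boils down to counting non-missing versus missing values per column. A minimal sketch under stated assumptions: the column names below are placeholders rather than real MEEGqc column names, and treating empty cells, "n/a", and "nan" as missing is a guess at the convention, not a documented rule.

```python
import csv
import io

MISSING_MARKERS = {"", "n/a", "nan"}   # assumed markers for a missing value

def column_availability(tsv_text):
    """Return {column: {"present": n, "missing": n}} for a TSV given as text."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    columns = list(rows[0].keys()) if rows else []
    result = {}
    for column in columns:
        missing = sum(
            1 for row in rows
            if (row[column] or "").strip().lower() in MISSING_MARKERS
        )
        result[column] = {"present": len(rows) - missing, "missing": missing}
    return result

# Toy per-task table with hypothetical column names.
TSV = (
    "subject\tSTD_example\tECG_example\n"
    "sub-001\t0.12\t0.30\n"
    "sub-002\t0.15\tn/a\n"
    "sub-003\t\t0.28\n"
)
availability = column_availability(TSV)
```

A single column with a non-zero missing count, as for each metric column in this toy table, corresponds to the "metric did not compute for some recordings" reading described above.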

Metrics across tasks. Line plots: one line per subject, dark line for the median across subjects. Reveals task-dependent shifts.

Metric trajectories across tasks, per subject. Sharp shifts at a specific task point at a task-dependent issue (instructions, condition, stim equipment).

Metric details. Per-metric deep dive with three panel perspectives (recording / epoch-per-channel / channel-per-epoch) and four figure types (boxplot, violin, histogram, density). Switch between Raw and Normalized modes to compare across conditions without changing rank.

Metric details subtabs
One subtab per metric inside this section. Switching subtab re-scopes the controls and plots below to that metric; STD / PtP / PSD have the richest sub-views since they support both channel-per-epoch and epoch-per-channel perspectives.
Distribution plot controls
Distribution-plot controls: pick the figure type (boxplot, violin, histogram, density) and the summary measure (median / mean / upper tail). Toggle Raw vs Normalized to compare across conditions without letting absolute scale change the ranking.
Distribution plots, cycling through the available figure types and summary measures for one metric. The same data, four lenses: long tails are easiest to spot on the box, modality of the distribution is easiest on the violin, exact density shape on the histogram / density curve.
Per-task channel x epoch heatmap. The same grammar as the subject-level heatmap, but pooled across subjects for one task at a time.
Per-task 3D topomap. Useful to confirm whether a hotspot in the pooled topomap is driven by one task or persists across all of them.

ECDFs. Empirical cumulative distribution functions. Read "what percentage of recordings sit below value X" directly off the curve to support threshold-selection decisions. One ECDF per QA metric:

STD ECDF
STD ECDF across all recordings. Y axis is the cumulative fraction of recordings; X axis is the per-recording STD summary. Read "what % of recordings fall below a candidate threshold" directly off the curve. See the STD metric reference.
PtP ECDF
PtP ECDF. Same grammar as STD, but on peak-to-peak amplitude. PtP ECDF typically has a heavier right tail than STD because bursts inflate PtP more than STD. See the PtP metric reference.
PSD ECDF
PSD-noise ECDF. A pronounced "knee" far to the right typically reflects a subset of recordings with strong mains or environmental contamination. See the PSD metric reference.
ECG ECDF
ECG-correlation ECDF. The X axis is the fraction of channels with abs(corr_coef) above the operational threshold; the curve answers "what fraction of recordings have at most X % cardiac-affected channels?". See the ECG metric reference.
EOG ECDF
EOG-correlation ECDF. Same grammar as the ECG version on the ocular metric. EOG often shows a steeper curve because frontal sensors dominate the affected set in most recordings. See the EOG metric reference.
Muscle ECDF
Muscle-noise ECDF. X axis is the per-recording muscle event ratio; recordings far to the right are the ones with the most jaw / neck artifacts. See the Muscle metric reference.
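Reading a candidate threshold off an ECDF is the same as asking what fraction of recordings sit at or below that value. A minimal sketch, assuming a plain list of per-recording summaries; the values and the candidate threshold are made up for illustration.

```python
from bisect import bisect_right

def ecdf_at(values, x):
    """Fraction of values that are less than or equal to x."""
    ordered = sorted(values)
    return bisect_right(ordered, x) / len(ordered)

# Hypothetical per-recording summaries of one metric.
summaries = [1.2, 0.8, 3.5, 1.0, 0.9, 1.1, 4.2, 1.3, 0.7, 1.4]

candidate_threshold = 1.5
kept = ecdf_at(summaries, candidate_threshold)   # 0.8
flagged = 1 - kept                               # about 0.2
```

With these numbers a threshold of 1.5 keeps 80 % of the recordings and flags the two in the right tail, which is the kind of trade-off the curves are meant to make visible.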

Dataset-level QC

Dataset-level QC reports centre on the Global Quality Index (GQI). Unlike QA reports, QC reports summarise quality decisions based on configurable thresholds. Each recording gets a single 0-100 score; component breakdowns explain what dragged the score down.

Open a live example report. Real dataset-level QC report rendered from one of the tutorial datasets, with the default GQI thresholds. Download, unzip, open the .html file.
Dataset-level QC report →
Dataset-level QC report header
Header of the dataset-level QC report: dataset name, GQI attempt number used to populate every chart on this page, and the analysis profile in effect. Different attempts can produce different decisions on the same data, so always confirm the attempt before drawing conclusions.

Dataset overview

QC dataset general information
General dataset-level information: subject count, recording count, tasks, channel-type breakdown. The QC view also surfaces the GQI parameter snapshot (global_quality_index_<n>.ini) tied to the current attempt, so the decision criteria are visible alongside the data.
Metrics availability matrix
Per-recording metrics-availability matrix. Each row is a recording; each column is a QC metric. Blank cells flag a metric that the QC layer could not compute for that recording (e.g. Head on EEG-only data, ECG when the reference channel failed the sanity check) and that is therefore neutralised in the GQI for that row.

Inputs and attempts

QC Group reads GQI results from attempt-indexed, per-modality TSVs.

Attempt resolution order:

  1. Explicit --input_tsv path.
  2. Explicit --attempt <n>.
  3. Latest available attempt (default).
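The resolution order above can be expressed as a small precedence function. This is a sketch of the documented order only: the function name, its argument names, and the return format are hypothetical and do not mirror MEEGqc's internal code.

```python
def resolve_attempt(available_attempts, input_tsv=None, attempt=None):
    """Apply the precedence: explicit TSV path, then explicit attempt, then latest."""
    if input_tsv is not None:
        return ("input_tsv", input_tsv)
    if attempt is not None:
        if attempt not in available_attempts:
            raise ValueError(f"attempt {attempt} not found in {sorted(available_attempts)}")
        return ("attempt", attempt)
    if not available_attempts:
        raise ValueError("no GQI attempts available")
    return ("latest", max(available_attempts))

attempts_on_disk = [1, 2, 3]   # hypothetical attempt numbers

default_choice = resolve_attempt(attempts_on_disk)
explicit_attempt = resolve_attempt(attempts_on_disk, attempt=2)
explicit_path = resolve_attempt(attempts_on_disk, input_tsv="custom.tsv", attempt=2)
```

When both an explicit path and an attempt number are given, the path wins, as in the last call.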

Tab hierarchy

Level  Content
1      Channel-type tabs: Combined (mag+grad), MAG, GRAD, EEG.
2      Metric tabs (below).
QC channel-type tabs
Channel-type tabs at the top of the QC report (Combined / MAG / GRAD / EEG). The GQI tab uses the combined score; the per-metric tabs let you check whether a problem is sensor-type specific (e.g. magnetometers worse than gradiometers).
QC main tabs
Per-metric tabs: GQI (the composite) followed by each component metric (STD / PtP / PSD / ECG / EOG / Muscle). The drill-down workflow is "start at GQI, then open the tab matching the dominant penalty family".

Summary distributions

The summary distribution tab shows box / violin plots for every QC metric across all subjects in the dataset, so a single glance shows where the dataset sits relative to the configured thresholds.

QC box / violin plot controls
Plot-type and measure controls for the QC distributions. Toggle between violin and box; switch the per-recording aggregate (Median / Mean / Upper Tail). Upper-tail is the most diagnostic for picking GQI thresholds because it answers "how bad are my worst recordings?" rather than "how is the median doing?".
GQI score distribution across all recordings in the dataset. Each dot is a recording; hover for subject + task. The shape of the violin tells you whether the dataset is uniformly good (a tight pile near 100), bimodal (two clusters suggest a subset of subjects with consistent problems), or has a long low-quality tail (a few bad recordings dragging the mean down).
Per-metric QC distributions (first group). These are the components that feed the GQI: noisy / flat channel %, high-correlation channel %, muscle event ratio, PSD noise %. The dotted line on each shows the threshold the current attempt is using; bars beyond it are the recordings being penalised.
Per-metric QC distributions (remaining metrics). Same grammar as part 1; pairing the two views shows you whether one penalty family is dominating the GQI more than its weight suggests it should.

Metric details: across tasks + subject ranking

The metric-details tab adds two layers on top of the summary distributions: a per-task trajectory view and a worst-first subject ranking, both shared across every metric subtab.

QC metric trajectories across tasks: one line per subject, dark line for the median across subjects. Use it to decide whether a metric should have a different threshold for different tasks (e.g. muscle artifacts naturally higher in a speech production task than in resting state).
Per-task subject ranking
Subject ranking table per task and metric, worst at the top. Same identity as the QA ranking but scoped to the active metric tab; this is the table you screenshot for a meeting when you need to justify dropping specific recordings.

Explore the metric tabs

GQI. Score distribution, penalty decomposition across four families (ch, corr, mus, psd), and recording-level ranking. Penalty math and threshold defaults are on the metrics + GQI page.

Range        Interpretation
90-100 %     Excellent. Minimal artifacts.
70-89 %      Good. Some artifacts present.
50-69 %      Moderate. Notable artifact contamination.
Below 50 %   Poor. Significant issues.
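The interpretation bands map directly onto a lookup. A minimal sketch with a hypothetical function name; because the table lists whole-number ranges, assigning a fractional score such as 89.5 to the lower band is an assumption of this example.

```python
def gqi_band(score):
    """Map a 0-100 GQI score to its interpretation band."""
    if not 0 <= score <= 100:
        raise ValueError("GQI score must lie between 0 and 100")
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Moderate"
    return "Poor"

# Hypothetical scores for a handful of recordings.
scores = {"sub-001": 95, "sub-002": 70, "sub-003": 69, "sub-004": 42.5}
bands = {name: gqi_band(value) for name, value in scores.items()}
```

The band names follow the table; the boundary handling for fractional scores is the only part not stated there.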
GQI trajectory across tasks, per subject. Each line is one subject across their tasks. Crossing lines = subjects whose relative rank changes between tasks (often a condition-driven issue: same brain, different quality). Flat low lines = consistently low-quality recordings to consider dropping entirely. Wide spread at one task = subjects are heterogeneous on that task, and the task may need its own threshold.

STD. Noisy / flat channel %, noisy / flat epoch %. Distributions across recordings + ranking per task.

Per-task STD QC trajectory: one line per subject. Y axis is the noisy / flat channel percentage. A sustained shift up at a particular task (across many subjects) typically means the task induces a specific artifact (jaw movement during speech, eye motion during reading, etc.) rather than hardware drift.

PtP. Same content as STD but for peak-to-peak amplitude.

Per-task PtP QC trajectory. Burst-driven artifacts (e.g. occasional motion spikes) hit PtP harder than STD because PtP picks up the single worst excursion in each epoch. Compare with the STD tab: a task that elevates PtP but not STD is a task with transient bursts rather than uniformly noisy data.

PSD. PSD noise percentage per recording (fraction of total spectral power at mains and its harmonics), distribution + ranking per task.

Per-task PSD QC trajectory. Spikes at a specific task can signal task-correlated environmental noise (e.g. a stimulus device that emits broadband interference).

ECG. High-correlation channel % per recording, per task.

Per-task ECG QC
Per-task ECG QC: how the high-correlation fraction shifts task-to-task. A task with consistently higher cardiac correlation across subjects suggests a fixed condition (e.g. an instructed-rest block) where the heart rate effect is more visible.

EOG. High-correlation channel % per recording, per task.

Per-task EOG QC
Per-task EOG QC. Tasks that involve more visual fixation (e.g. fixation crosses) typically show fewer blinks here.

Muscle. Muscle event count, event rate, and GQI muscle component per task.

Per-task Muscle QC
Per-task Muscle QC. Speaking / moving-jaw tasks tend to show consistently elevated muscle event rates here.

Drill-down workflow

  1. Start with the GQI tab. Identify low-scoring recordings and the dominant penalty family.
  2. If ch is high, open the STD / PtP tabs.
  3. If corr is high, open ECG / EOG.
  4. If mus is high, open Muscle.
  5. Switch MAG vs GRAD to see if the issue is sensor-type specific.
  6. Open the subject-level QA report for the flagged recording for the full channel x epoch picture.
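Steps 1 to 4 of the workflow amount to finding the dominant penalty family and looking up which tabs to open. The sketch below assumes a dict of penalty values per family; the numbers are invented, and the psd to PSD mapping is an assumption added for completeness, since the numbered steps only spell out ch, corr, and mus.

```python
TABS_BY_FAMILY = {
    "ch": ["STD", "PtP"],
    "corr": ["ECG", "EOG"],
    "mus": ["Muscle"],
    "psd": ["PSD"],   # assumption: not listed in the numbered steps
}

def tabs_to_open(penalties):
    """Return the dominant penalty family and the metric tabs to inspect next."""
    dominant = max(penalties, key=penalties.get)
    return dominant, TABS_BY_FAMILY[dominant]

# Hypothetical penalty decomposition for one low-scoring recording.
penalties = {"ch": 4.0, "corr": 12.5, "mus": 1.0, "psd": 0.5}
family, tabs = tabs_to_open(penalties)
```

For this recording the correlation family dominates, so the next stop would be the ECG and EOG tabs, followed by the MAG versus GRAD comparison and the subject-level QA report.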

Multi-dataset-level reports

Multi-dataset-level reports compare two or more datasets side by side. They come in two flavours that mirror QA / QC: QA Multi-dataset compares raw signal profiles; QC Multi-dataset compares GQI scores.

Open a live example report. Real multi-dataset-level reports rendered across the two tutorial datasets (one MEG, one EEG) so you can see how the cross-dataset views render with heterogeneous modalities. Download, unzip, open the .html file.
Multi-dataset QA report → Multi-dataset QC report →

Multi-site studies

Compare data quality across acquisition sites; spot systematic shifts.

Longitudinal waves

Track quality changes across collection waves; detect equipment degradation and protocol drift.

Harmonization

Decide whether datasets can be pooled, or whether each needs its own QC threshold.

Benchmarking

Compare a new dataset against a reference to validate collection procedures.

Structure

Both multi-dataset reports follow the same shape as their single-dataset cousins, plus per-dataset subtabs:

QA Multi-dataset
  Top tabs: Combined | MAG | GRAD | EEG
    Section tabs:
      1. Summary distributions       (pooled across datasets)
      2. Cohort overview             (one subtab per dataset)
      3. Metrics across tasks        (one subtab per dataset)
      4. Metric details              (shared distributions + per-dataset heatmaps)
      5. ECDFs                       (pooled across datasets)

QC Multi-dataset
  Top tabs: Combined | MAG | GRAD | EEG
    Metric tabs: GQI | STD | PtP | PSD | ECG | EOG | Muscle
      Cross-dataset distribution comparisons

Preconditions

  • At least 2 datasets.
  • Compatible analysis profiles (same metrics enabled).
  • Similar GQI parameterisation for fair score comparison.
  • Comparable tasks / conditions for task-dependent views.
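The first two preconditions are mechanical and can be checked before building a comparison. A minimal sketch, assuming each dataset is described by a hypothetical dict with a name and a list of enabled metrics; the remaining two preconditions (similar GQI parameterisation, comparable tasks) need human judgement and are not covered here.

```python
def check_preconditions(datasets):
    """Return a list of problems that would make a multi-dataset comparison unfair."""
    problems = []
    if len(datasets) < 2:
        problems.append("need at least 2 datasets")
    metric_sets = {frozenset(dataset["metrics"]) for dataset in datasets}
    if len(metric_sets) > 1:
        problems.append("datasets enable different metrics")
    return problems

# Hypothetical dataset descriptors.
site_a = {"name": "site_a", "metrics": ["STD", "PtP", "PSD"]}
site_b = {"name": "site_b", "metrics": ["STD", "PtP"]}

mismatch = check_preconditions([site_a, site_b])
ok = check_preconditions([site_a, dict(site_a, name="site_a_wave2")])
```

An empty list means the mechanical checks passed; it does not by itself establish that the datasets are comparable.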

Practical reading order

The fastest path through a fresh dataset:

  1. Open the dataset-level QC report. Look at the GQI distribution and the penalty family ranking.
  2. Pick the worst few recordings; open them in the subject-level QA report.
  3. Open the dataset-level QA report's task-wise view to check if problems concentrate in one condition.
  4. For multi-site studies, finish with the multi-dataset-level report to inform harmonisation decisions.

Metric-by-metric interpretation and the GQI math are on the metrics + GQI page. Threshold tuning is covered under the [GlobalQualityIndex] section of the settings reference.