MRI primary dataset

The dataset the tutorial walks through.

A 3-folder Oldenburg Siemens dataset that clusters to two BIDS subjects. T1w + T2w anatomicals, BOLD with a multiband variant, DWI with fmap pair, and Siemens CMRR physio. The cleanest end-to-end example, picked as the GUI walkthrough's backing data because every datatype in the workflow appears at least once.

The point of this dataset: subject clustering.

DICOM folder names rarely match BIDS subject identities. Here, sub-001 was scanned twice (pre and post) and the scanner exported each session under its own folder. BIDS Manager clusters folders into subjects by their DICOM PatientID tag, so OL_0001 and OL_0002 become two sessions of one subject, while OL_0003 is a different patient with no session label.

Example data

Oldenburg neuroimaging unit (3 folders / 2 BIDS subjects)

Three source folders. The first two (OL_0001, OL_0002) share a PatientID and collapse into sub-001 with ses-pre / ses-post. The third (OL_0003) stands alone as sub-002 with no session entity.


Dataset overview

3 folders → 2 BIDS subjects.

sub-001 / ses-pre · OL_0001 · 45 DICOMs · 9 series

First scanning session for sub-001. T1w, two fMRI tasks (sparse, rest) and two fieldmap pairs.

sub-001 / ses-post · OL_0002 · 62 DICOMs · 10 series

Same patient, post-intervention. Multiband BOLD, one fieldmap pair, and a physio log. Cluster identified by shared PatientID; session order from StudyDate.

sub-002 · OL_0003 · 287 DICOMs · 17 series

Different patient. T2w, three fMRI tasks (with multiband SBRefs), DWI with a PA b0 rerouted into fmap/, and two physio logs. No session label.

How the subject clustering works

The subject_identity.cluster_subjects module reads each folder's first DICOM, pulls PatientID + PatientName, then groups folders that share both. The folders in this dataset cluster like this:

PatientID XX00XX00 — OL_0001 and OL_0002 share a PatientID, so they collapse into one cluster. StudyDate orders them as ses-pre (older) and ses-post (newer).
PatientID YY11YY11 — OL_0003 is a separate cluster. Single session, so no ses entity is written.

The clustering logic itself is modality-agnostic, but EEG / MEG subject identities come from path heuristics instead, because EDF / FIF have no PatientID in their header.
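The grouping described above can be sketched in a few lines of Python. This is a minimal illustration, not the actual subject_identity.cluster_subjects code: the FolderTags container, the cluster_folders function, the PatientName values and the StudyDate values are all made up for the example, and the tags are passed in directly rather than read from DICOM files. How BIDS Manager chooses the ses-pre / ses-post labels is not described here, so the sketch uses numbered sessions as a stand-in.

```python
from collections import defaultdict
from typing import NamedTuple


class FolderTags(NamedTuple):
    """Identity tags taken from a folder's first DICOM (hypothetical container)."""
    folder: str
    patient_id: str
    patient_name: str
    study_date: str  # DICOM DA format (YYYYMMDD), so string order is date order


def cluster_folders(rows):
    """Group folders that share PatientID and PatientName, ordered by StudyDate."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row.patient_id, row.patient_name)].append(row)

    assignments = {}
    # Subjects are numbered in order of first appearance (an assumption of this sketch).
    for index, members in enumerate(groups.values(), start=1):
        subject = f"sub-{index:03d}"
        members.sort(key=lambda r: r.study_date)
        for position, row in enumerate(members, start=1):
            # A single-folder cluster gets no session entity.
            session = f"ses-{position:02d}" if len(members) > 1 else None
            assignments[row.folder] = (subject, session)
    return assignments


# PatientIDs are from this page; names and dates are invented for the example.
rows = [
    FolderTags("OL_0002", "XX00XX00", "NAME_A", "20240301"),
    FolderTags("OL_0001", "XX00XX00", "NAME_A", "20240115"),
    FolderTags("OL_0003", "YY11YY11", "NAME_B", "20240210"),
]
result = cluster_folders(rows)
```

With these inputs, OL_0001 and OL_0002 land in the same subject with the older study first, and OL_0003 gets its own subject with no session.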

What you'll see

Real numbers from the MRI primary pipeline.

The four CLI commands you saw in the CLI walkthrough produce the following on this dataset:

Scan: 33 inv rows / 12 skipped

--probe-convert runs in ~25 s on a Mac. 21 keepers across both subjects, 12 auto-skipped (scouts, Phoenix reports, calibrations).

Convert: 21 NIfTI files written

21 NIfTI files (anat / func / dwi / fmap), one physio TSV.gz (from the CMRR physio log), 28 sidecar JSONs. No conversion failures on this dataset.

Validate: 55 ok / 7 warn / 0 err

Zero errors. The 7 warnings are TODO placeholders the enrichment cannot infer: License / Authors in dataset_description.json, plus Instructions / TaskDescription on the BOLD sidecars. Fillable in the Editor in one pass.
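To list the remaining placeholders outside the Editor, a short script along these lines could help. It is a sketch under one loud assumption: that the placeholder values contain the literal text "TODO". The find_todo_fields helper is hypothetical and not part of BIDS Manager.

```python
import json
from pathlib import Path


def find_todo_fields(bids_root, marker="TODO"):
    """Map each JSON file (relative path) to the top-level keys whose value contains marker."""
    root = Path(bids_root)
    hits = {}
    for path in sorted(root.rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            continue
        # Serialising the value lets the check cover strings and lists alike,
        # e.g. "License": "TODO" as well as "Authors": ["TODO"].
        keys = [key for key, value in data.items() if marker in json.dumps(value)]
        if keys:
            hits[path.relative_to(root).as_posix()] = keys
    return hits
```

Calling find_todo_fields on the converted dataset's root folder returns a dictionary of files and field names that can be worked through in one pass.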

Notable classifier moves on this dataset

Subject identity clustering. Three folders → two clusters → two BIDS subjects with the appropriate session split (the central reason this dataset is the primary tutorial sample).
B0-reference reroute. The PA-direction single-volume DWI on sub-002 is rerouted to fmap/ with the _epi suffix so it can serve as the B0 reference for distortion correction of the AP-direction DWIs. No heuristic file needed.
Phoenix and scouts auto-skipped. 12 rows on this dataset (AAHead Scouts + PhoenixZIPReports) come pre-marked with bids_guess_skip = true. The user can re-include any of them, but the default is "drop these".
CMRR physio. The CMRR physio log is dispatched to the PhysioDcmBackend, which uses the vendored bidsphysio to write the BIDS-conformant physio.tsv.gz + sidecar pair.
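Two of the moves above, the auto-skip and the B0-reference reroute, can be illustrated as simple rules over inventory rows. The sketch below is an assumption-laden illustration, not the classifier BIDS Manager ships: the InventoryRow fields (other than bids_guess_skip), the regular expression, the apply_moves function and the series descriptions are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern; the real skip rules are not shown on this page.
SKIP_PATTERN = re.compile(r"scout|phoenixzipreport", re.IGNORECASE)


@dataclass
class InventoryRow:
    series_description: str
    datatype: str                      # e.g. "anat", "func", "dwi", "fmap"
    n_volumes: int = 1
    phase_dir: Optional[str] = None    # e.g. "AP" or "PA"
    suffix: str = ""
    bids_guess_skip: bool = False


def apply_moves(rows):
    """Pre-mark scouts / Phoenix reports and reroute a reversed-phase single-volume DWI."""
    # Phase-encoding directions used by the multi-volume DWI series.
    main_dirs = {r.phase_dir for r in rows if r.datatype == "dwi" and r.n_volumes > 1}
    for row in rows:
        if SKIP_PATTERN.search(row.series_description):
            # Default is "drop these"; the flag can be cleared to re-include a row.
            row.bids_guess_skip = True
        elif (
            row.datatype == "dwi"
            and row.n_volumes == 1
            and row.phase_dir is not None
            and main_dirs
            and row.phase_dir not in main_dirs
        ):
            # Single volume with a phase direction the main DWIs do not use:
            # treat it as the B0 reference and send it to fmap/ as an _epi file.
            row.datatype, row.suffix = "fmap", "epi"
    return rows


# Series descriptions and volume counts are invented for the example.
rows = apply_moves([
    InventoryRow("AAHead_Scout", "anat"),
    InventoryRow("PhoenixZIPReport", "anat"),
    InventoryRow("dwi_dir-AP", "dwi", n_volumes=65, phase_dir="AP", suffix="dwi"),
    InventoryRow("dwi_dir-PA_b0", "dwi", n_volumes=1, phase_dir="PA", suffix="dwi"),
])
```

Keeping the skip as a flag rather than deleting the row mirrors the behaviour described above, where the user can re-include any pre-marked series.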
