The dataset the tutorial walks through.
A 3-folder Oldenburg Siemens dataset that clusters to two BIDS subjects. T1w + T2w anatomicals, BOLD with a multiband variant, DWI with fmap pair, and Siemens CMRR physio. The cleanest end-to-end example, picked as the GUI walkthrough's backing data because every datatype in the workflow appears at least once.
DICOM folder names rarely match BIDS subject identities.
Here, sub-001 was scanned twice (pre and post) and the scanner exported each session under its own folder. BIDS Manager clusters folders into subjects by their DICOM PatientID tag, so OL_0001 and OL_0002 become two sessions of one subject, while OL_0003 is a different patient with no session label.
Oldenburg neuroimaging unit (3 folders / 2 BIDS subjects)
Three source folders. The first two (OL_0001, OL_0002) share a PatientID and collapse into sub-001 with ses-pre / ses-post. The third (OL_0003) stands alone as sub-002 with no session entity.
3 folders → 2 BIDS subjects.
First scanning session for sub-001. T1w, two fMRI tasks (sparse, rest) and two fieldmap pairs.
Same patient, post-intervention. Multiband BOLD, one fieldmap pair, and a physio log. Cluster identified by shared PatientID; session order from StudyDate.
Different patient. T2w, three fMRI tasks (with multiband SBRefs), DWI with a PA b0 rerouted into fmap/, and two physio logs. No session label.
How the subject clustering works
The subject_identity.cluster_subjects module reads each folder's first DICOM, pulls PatientID + PatientName, then groups folders that share both. The folders in this dataset cluster like this:
- PatientID XX00XX00: OL_0001 and OL_0002 share a PatientID, so they collapse into one cluster. StudyDate orders them as ses-pre (older) and ses-post (newer).
- PatientID YY11YY11: OL_0003 is a separate cluster. Single session, so no ses entity is written.
The same logic is modality-agnostic for DICOM sources. EEG / MEG subjects come from path heuristics instead, because EDF / FIF files have no PatientID in their header.
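The grouping described above can be sketched in a few lines. This is a minimal illustration, not the actual subject_identity.cluster_subjects code: the function name cluster_folders, the session_labels parameter, the subject numbering rule, and the PatientName and StudyDate values are all assumptions made for the example, and a plain dict stands in for the header read from each folder's first DICOM.

```python
from collections import defaultdict

# Illustrative header values only: PatientName and StudyDate are invented here,
# and the dict stands in for whatever is read from each folder's first DICOM.
FOLDER_HEADERS = {
    "OL_0002": {"PatientID": "XX00XX00", "PatientName": "XX00XX00", "StudyDate": "20240301"},
    "OL_0001": {"PatientID": "XX00XX00", "PatientName": "XX00XX00", "StudyDate": "20240105"},
    "OL_0003": {"PatientID": "YY11YY11", "PatientName": "YY11YY11", "StudyDate": "20240112"},
}


def cluster_folders(headers, session_labels=("pre", "post")):
    """Map each folder to (subject, session); session is None for single-session subjects."""
    # Group folders that share both PatientID and PatientName.
    clusters = defaultdict(list)
    for folder, hdr in headers.items():
        clusters[(hdr["PatientID"], hdr["PatientName"])].append(folder)

    mapping = {}
    # Assumption: subjects are numbered by the earliest folder name in each cluster.
    for index, folders in enumerate(sorted(clusters.values(), key=min), start=1):
        subject = f"sub-{index:03d}"
        # DICOM dates are YYYYMMDD strings, so string order is chronological order.
        folders = sorted(folders, key=lambda name: headers[name]["StudyDate"])
        if len(folders) == 1:
            # A single session gets no ses entity.
            mapping[folders[0]] = (subject, None)
            continue
        if len(folders) > len(session_labels):
            raise ValueError(f"{subject}: more sessions than labels")
        for folder, label in zip(folders, session_labels):
            mapping[folder] = (subject, f"ses-{label}")
    return mapping


mapping = cluster_folders(FOLDER_HEADERS)
for folder in sorted(mapping):
    print(folder, mapping[folder])
```

With the sample values, OL_0001 and OL_0002 land in sub-001 as ses-pre and ses-post, and OL_0003 becomes sub-002 with no session. How the real module picks session labels beyond ordering by StudyDate is not stated here, so the labels are passed in explicitly.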
Real numbers from the MRI primary pipeline.
The four CLI commands you saw in the CLI walkthrough produce the following on this dataset:
--probe-convert runs in ~25 s on a Mac. 21 keepers across both subjects, 12 auto-skipped (scouts, Phoenix reports, calibrations).
21 NIfTI files (anat / func / dwi / fmap), one physio TSV.gz (from the CMRR physio log), 28 sidecar JSONs. No conversion failures on this dataset.
Zero errors. The 7 warnings are TODO placeholders the enrichment cannot infer: License / Authors in dataset_description.json, plus Instructions / TaskDescription on the BOLD sidecars. Fillable in the Editor in one pass.
Notable classifier moves on this dataset
- Subject identity clustering. Three folders → two clusters → two BIDS subjects with the appropriate session split (the central reason this dataset is the primary tutorial sample).
- B0-reference reroute. The PA-direction single-volume DWI on sub-002 is rerouted to fmap/_epi so it can serve as the B0 reference for distortion correction of the AP-direction DWIs. No heuristic file needed.
- Phoenix and scouts auto-skipped. 12 rows on this dataset (AAHead Scouts + PhoenixZIPReports) come pre-marked with bids_guess_skip = true. The user can re-include any of them, but the default is "drop these".
- CMRR physio. The CMRR physio log is dispatched to the PhysioDcmBackend, which uses the vendored bidsphysio to write the BIDS-conformant physio.tsv.gz + sidecar pair.
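The skip flag and the B0-reference reroute can be pictured as two simple rules over the inventory rows. The sketch below is hypothetical throughout: the annotate function, the SKIP_HINTS substrings, the row keys other than bids_guess_skip, and the sample rows (including the volume count) are invented for illustration and are not taken from the BIDS Manager classifier.

```python
# Hypothetical substrings for series that are dropped by default.
SKIP_HINTS = ("scout", "phoenix")


def annotate(rows):
    """Return copies of rows with bids_guess_skip set and lone reverse-PE b0s moved to fmap."""
    # Phase-encoding directions used by the multi-volume DWI runs.
    main_dirs = {r["pe_dir"] for r in rows if r["datatype"] == "dwi" and r["n_volumes"] > 1}
    annotated = []
    for row in rows:
        row = dict(row)  # leave the caller's rows untouched
        description = row["series_description"].lower()
        # Pre-mark scouts and Phoenix reports; the user can still re-include them.
        row["bids_guess_skip"] = any(hint in description for hint in SKIP_HINTS)
        # A single-volume DWI whose direction differs from the main runs
        # is treated as a B0 reference and routed to fmap with the epi suffix.
        is_lone_b0 = row["datatype"] == "dwi" and row["n_volumes"] == 1
        if is_lone_b0 and main_dirs and row["pe_dir"] not in main_dirs:
            row["datatype"], row["suffix"] = "fmap", "epi"
        annotated.append(row)
    return annotated


# Invented sample rows; series names and volume counts are placeholders.
rows = [
    {"series_description": "AAHead_Scout", "datatype": None, "suffix": None, "n_volumes": 1, "pe_dir": None},
    {"series_description": "PhoenixZIPReport", "datatype": None, "suffix": None, "n_volumes": 1, "pe_dir": None},
    {"series_description": "dwi_AP", "datatype": "dwi", "suffix": "dwi", "n_volumes": 65, "pe_dir": "AP"},
    {"series_description": "dwi_PA_b0", "datatype": "dwi", "suffix": "dwi", "n_volumes": 1, "pe_dir": "PA"},
]

result = annotate(rows)
for row in result:
    print(row["series_description"], row["datatype"], row["suffix"], row["bids_guess_skip"])
```

Keeping the skip decision as a flag rather than deleting the row matches the behaviour described above, where the default is to drop these series but the user can re-include any of them.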