MitoEM (Instance Segmentation)

This tutorial reproduces 3D mitochondria instance segmentation on the MitoEM dataset released by Wei et al. in 2020. The recipe lives under tutorials/mitoEM/ with three dataset-specific entry points (R.yaml for MitoEM-Rat, H.yaml for MitoEM-Human, HR.yaml for joint training) that share common.yaml.

The pipeline is multi-task: predict short-range affinity, long-range affinity (radius 5), and a skeleton-aware EDT head, then decode with a distance-watershed step. Evaluation uses Adapted Rand and Variation of Information.
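To make the difference between the short-range and long-range affinity targets concrete, here is a minimal 1D sketch. It is a toy illustration, not the PyTC target generator: the function name affinity_1d and the convention that the affinity is stored at the later voxel of each pair are assumptions.

```python
def affinity_1d(labels, offset):
    """Affinity along one axis: 1 where a voxel and the voxel `offset`
    steps before it carry the same non-zero instance id, else 0.
    (Storing the value at the later voxel is an assumed convention.)"""
    aff = [0] * len(labels)
    for i in range(offset, len(labels)):
        same_instance = labels[i] == labels[i - offset]
        aff[i] = int(same_instance and labels[i] != 0)
    return aff


# Toy label row: background, instance 1 (7 voxels), 1-voxel gap, instance 2 (4 voxels).
row = [0, 1, 1, 1, 1, 1, 1, 1, 0, 2, 2, 2, 2]

short_range = affinity_1d(row, offset=1)  # analogous to the radius-1 head
long_range = affinity_1d(row, offset=5)   # analogous to the radius-5 head

print(short_range)  # [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1]
print(long_range)   # [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0]
```

The long-range target is positive only where an instance is at least six voxels wide along the axis, so it carries shape information that the short-range target does not.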

Goal

The pipeline pins the following setup (encoded in tutorials/mitoEM/common.yaml and inherited by R.yaml / H.yaml / HR.yaml):

Input: [32, 256, 256] patches at native MitoEM resolution 30 × 8 × 8 nm.
Model: MedNeXt-M, kernel size 3, checkpoint_style: outside_block, three output heads:
- aff_r1: 3-channel short-range affinity at offsets (0, 0, 1) / (0, 1, 0) / (1, 0, 0);
- aff_r5: 3-channel long-range affinity at offsets (0, 0, 5) / (0, 5, 0) / (5, 0, 0);
- sdt: 1-channel skeleton-aware EDT head.
Loss: per-channel BCE on each affinity head plus a SmoothL1 (tanh: true) on the EDT head, balanced by uncertainty loss-balancing.
Augmentation: aug_em_neuron_fast profile with rotations on all three axes.
Optimization: warmup_cosine_lr profile, 200 epochs × 1000 steps, accumulate_grad_batches=4, precision=bf16-mixed.
Inference: sliding window 32 × 256 × 256, sw_batch_size=1, 50 % overlap, bump blending, replicate-padding mode; head set to aff_r1 for the saved primary output.
Decoder: decode_distance_watershed over the EDT channel (distance_channels=[6], distance_threshold=[0.5, 0], min_seed_size=100, min_instance_size=50).
Metric: adapted_rand + voi.
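The decoder setting distance_channels=[6] follows from how the three heads stack into one prediction. The sketch below works out the channel ranges; the concatenation order (aff_r1, aff_r5, sdt) is an assumption that happens to be consistent with the EDT head landing on channel 6, and channel_layout is a hypothetical helper, not a PyTC function.

```python
# Head names and channel counts come from the setup list above.
# The stacking order is an assumption consistent with distance_channels=[6].
HEADS = [("aff_r1", 3), ("aff_r5", 3), ("sdt", 1)]


def channel_layout(heads):
    """Map each head name to its (start, stop) channel range in the stacked output."""
    layout = {}
    start = 0
    for name, n_channels in heads:
        layout[name] = (start, start + n_channels)
        start += n_channels
    return layout, start


layout, total = channel_layout(HEADS)
for name, (lo, hi) in layout.items():
    print(f"{name}: channels {list(range(lo, hi))}")
print("total channels:", total)  # total channels: 7
```

Under that ordering the affinity heads occupy channels 0-2 and 3-5, and the single EDT channel is index 6.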

1 - Get the data

The MitoEM dataset is publicly available at the project page and the MitoEM Challenge. On the lab cluster it is staged at:

/projects/weilab/dataset/mito/mitoEM/
    EM30-R/                  # rat
        im_train.h5,  mito_train-v2.h5
        im_val.h5,    mito_val-v2.h5
        im_test.h5,   mito_test-v2.h5
    EM30-H/                  # human
        (same layout)

Each split is a 4096 × 4096 × {400|100|500} HDF5 stack at 30 × 8 × 8 nm. The train.data.root_path field in common.yaml points at this directory; override at the CLI if you stage data elsewhere.
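For a sense of scale, the split sizes translate into physical extents and voxel counts as follows. This is plain arithmetic on the numbers above; it assumes the 400/100/500 dimension is the z axis (the 30 nm direction), and the helper names are illustrative only.

```python
VOXEL_NM = (30, 8, 8)  # (z, y, x) voxel size in nm, from the text
YX = 4096              # in-plane size of every split
SPLIT_SLICES = {"train": 400, "val": 100, "test": 500}  # assumed to be z slices


def extent_um(n_slices):
    """Physical (z, y, x) extent of one split in micrometres."""
    shape = (n_slices, YX, YX)
    return tuple(n * nm / 1000 for n, nm in zip(shape, VOXEL_NM))


def voxel_count(n_slices):
    """Total number of voxels in one split."""
    return n_slices * YX * YX


for split, n_slices in SPLIT_SLICES.items():
    z, y, x = extent_um(n_slices)
    print(f"{split}: {z:.1f} x {y:.1f} x {x:.1f} um, {voxel_count(n_slices):,} voxels")
```

The test split alone holds about 8.4 billion voxels, which is why patch-based training and sliding-window inference are used rather than whole-volume passes.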

The test labels for MitoEM challenge submission are not publicly released; mito_test-v2.h5 here refers to the locally maintained v2 labels for offline development.

2 - Run training

Pick the dataset variant and run:

conda activate pytc

# MitoEM-Rat
python scripts/main.py --config tutorials/mitoEM/R.yaml

# MitoEM-Human
python scripts/main.py --config tutorials/mitoEM/H.yaml

# Joint (rat + human in the same training run)
python scripts/main.py --config tutorials/mitoEM/HR.yaml

The config sets system.num_gpus: -1 and system.num_workers: -1, so PyTC fans out across every visible GPU.

Training schedule:

max_epochs=200, n_steps_per_epoch=1000 → 200 k training steps total (50 k optimizer updates if n_steps_per_epoch counts micro-batches, given accumulate_grad_batches=4).
accumulate_grad_batches=4 with batch_size=1 per GPU → effective batch size 4 × num_gpus.
checkpoint.monitor=val_loss_total with mode=min, save_top_k=3.
Image previews on the aff_r1 head every 10 epochs.
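The schedule numbers above can be checked with a few lines of arithmetic. This is a sketch, not PyTC code: schedule_summary is a hypothetical helper, num_gpus=4 is an example value, and the last figure assumes n_steps_per_epoch counts micro-batches rather than optimizer updates.

```python
def schedule_summary(max_epochs, steps_per_epoch, accumulate, batch_size, num_gpus):
    """Derive step counts and effective batch size from the schedule settings."""
    total_steps = max_epochs * steps_per_epoch
    return {
        "total_steps": total_steps,
        # Samples contributing to one optimizer update across all GPUs.
        "effective_batch_size": batch_size * accumulate * num_gpus,
        # Only meaningful if steps_per_epoch counts micro-batches (assumption).
        "updates_if_steps_are_micro_batches": total_steps // accumulate,
    }


summary = schedule_summary(
    max_epochs=200,
    steps_per_epoch=1000,
    accumulate=4,
    batch_size=1,
    num_gpus=4,  # example value; the config uses every visible GPU
)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With four GPUs the effective batch size is 16; on a single GPU it drops to 4, which may call for a longer run or a different learning rate.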

Outputs land in outputs/mitoem30{r,h,hr}_mednext_sdt_multitask/<timestamp>/.

Monitor with TensorBoard:

just tensorboard mitoem30r_mednext_sdt_multitask

3 - Inference, decoding, evaluation

Run the combined test mode:

python scripts/main.py --config tutorials/mitoEM/R.yaml \
    --mode test \
    --checkpoint outputs/mitoem30r_mednext_sdt_multitask/<timestamp>/checkpoints/last.ckpt

What happens, in order:

Inference. Sliding window 32 × 256 × 256, 50 % overlap, bump blending, padding_mode=replicate. The primary head aff_r1 is selected at save time, and per-channel sigmoid is applied. The raw 7-channel multi-head prediction is saved as test_im_prediction.h5.
Decoding. decode_distance_watershed runs on the EDT channel (channel 6), seeded at distance > 0.5, growing until the distance hits 0, with seeds < 100 voxels and instances < 50 voxels filtered out. The fast EDT path is enabled with edt_parallel=8.
Evaluation. Adapted Rand and Variation of Information against the test labels; written next to the segmentation.
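The decoding step can be pictured with a 1D toy: seed wherever the distance exceeds the seed threshold, then grow each seed outward while the distance stays above the foreground threshold. The code below is an illustrative sketch of that idea only. It is not the implementation of decode_distance_watershed, decode_1d is a hypothetical name, min_seed_size defaults to 1 because the toy profile is tiny, and the min_instance_size filter is omitted.

```python
import heapq


def decode_1d(dist, seed_thr=0.5, fg_thr=0.0, min_seed_size=1):
    """Toy seeded growth on a 1D distance profile; returns one label per voxel."""
    n = len(dist)
    labels = [0] * n

    # 1) Seeds: runs of consecutive voxels with distance above seed_thr.
    next_id, i = 1, 0
    while i < n:
        if dist[i] > seed_thr:
            j = i
            while j < n and dist[j] > seed_thr:
                j += 1
            if j - i >= min_seed_size:  # drop seeds that are too small
                for k in range(i, j):
                    labels[k] = next_id
                next_id += 1
            i = j
        else:
            i += 1

    # 2) Growth: flood from the seeds, visiting high-distance voxels first,
    #    and stop where the distance falls to fg_thr or below.
    heap = [(-dist[k], k) for k in range(n) if labels[k]]
    heapq.heapify(heap)
    while heap:
        _, k = heapq.heappop(heap)
        for nb in (k - 1, k + 1):
            if 0 <= nb < n and labels[nb] == 0 and dist[nb] > fg_thr:
                labels[nb] = labels[k]
                heapq.heappush(heap, (-dist[nb], nb))
    return labels


# Two touching objects: the distance dips to 0.4 between the two peaks.
profile = [0.0, 0.3, 0.7, 0.9, 0.7, 0.4, 0.7, 0.9, 0.6, 0.2, 0.0]

default_labels = decode_1d(profile, seed_thr=0.5)
low_thr_labels = decode_1d(profile, seed_thr=0.3)

print(default_labels)  # [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 0]
print(low_thr_labels)  # [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
```

In this toy the default threshold keeps the two objects apart, while 0.3 fuses their seeds into one instance because the dip between them (0.4) no longer falls below the threshold. On real data a lower threshold can also give small objects a seed they would otherwise miss, so the net effect depends on the volume.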

To swap in the aff_r5 head as the primary inference output:

python scripts/main.py --config tutorials/mitoEM/R.yaml \
    --mode test --checkpoint <ckpt> \
    inference.model.head=aff_r5

4 - Submitting to the MitoEM Challenge

The Grand Challenge accepts segmentation HDF5 volumes. After --mode test produces the segmentation under outputs/.../results_step=<N>/, follow the formatting rules at https://mitoem.grand-challenge.org/ and submit. Performance on the challenge test split is only computable on the Grand Challenge website because public ground truth is not released for that split.

Per-volume offline evaluation on the validation split (provided in EM30-{R,H}/mito_val-v2.h5) uses the same adapted_rand + voi metrics described above; just point test.data.test at the val volumes.

5 - Reference behavior

A few sanity-check signals:

Training loss has three components (aff_r1, aff_r5, sdt), each scaled by an uncertainty-balanced weight. The train_loss_term_*_weighted scalars logged in TensorBoard are the most informative: uncertainty balancing typically down-weights the aff_r5 term more than aff_r1, so its weighted value falls faster, because the long-range task is harder.
Validation loss is checked at every epoch boundary; the best-3 checkpoints by val_loss_total are kept.
Inference on the 4096 × 4096 × 500 test volume is the dominant cost; expect roughly 1-2 hours on a single A100/H100 with sw_batch_size=1.
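Decoder threshold (distance_threshold[0]) is the primary knob for over- / under-segmentation. The default 0.5 is a reasonable starting point; lowering it (e.g. to 0.3) enlarges the seed regions, which can give small instances a seed but can also merge the seeds of touching instances, while raising it does the opposite.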
An Adapted Rand error (lower is better) below ~0.05 on the validation split is in the ballpark of the published MitoEM-Rat baseline. The challenge itself ranks submissions by AP-75 (average precision at IoU 0.75), which is computed by the Grand Challenge submission system.
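The inference-cost estimate above is easier to reason about once the number of sliding-window positions is known. The sketch below counts them for the test volume, assuming a stride of window × (1 − overlap) per axis and a (z, y, x) shape of (500, 4096, 4096); the real inferer may place windows slightly differently at the borders, and n_windows is a hypothetical helper.

```python
import math


def n_windows(size, window, overlap):
    """Number of window positions along one axis for a fractional overlap."""
    if size <= window:
        return 1
    stride = int(window * (1 - overlap))
    return math.ceil((size - window) / stride) + 1


volume = (500, 4096, 4096)  # test split as (z, y, x), assumed axis order
window = (32, 256, 256)     # sliding-window size from the config
overlap = 0.5

per_axis = [n_windows(s, w, overlap) for s, w in zip(volume, window)]
total = math.prod(per_axis)
print("windows per axis:", per_axis)  # windows per axis: [31, 31, 31]
print("total windows:", total)        # total windows: 29791

# Per-window time implied by the 1-2 hour estimate with sw_batch_size=1.
for hours in (1, 2):
    print(f"{hours} h -> {hours * 3600 / total:.2f} s per window")
```

Roughly 30 k forward passes fit in the quoted 1-2 hours only if each window takes a fraction of a second, so a higher sw_batch_size is the first thing to try if memory allows.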