FAQ

This is a list of Frequently Asked Questions about PyTorch Connectomics. Feel free to suggest new entries!

  1. Why are the model input sizes usually 2^n+1 (33, 129, 257, etc.) instead of 2^n (32, 128, 256, etc.)?

    Based on Figure 11 of Mind the Pad – CNNs Can Develop Blind Spots (arXiv), using 2^n+1 input sizes for models with zero-padding layers yields a symmetric foveation map, whereas 2^n input sizes lead to an asymmetric one.

  2. Why can the activation functions during training and inference be different?

    During training, loss functions such as CrossEntropyLoss and BCEWithLogitsLoss operate on raw logits and do not require softmax or sigmoid activations, but during inference (and visualization) those activations are needed to produce probabilities. In addition, multiple losses with different activations can be applied to a single target during training.
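    A minimal sketch of this split (the tensor shapes here are arbitrary):

      import torch
      import torch.nn as nn

      logits = torch.randn(2, 1, 8, 32, 32)  # raw model output, no activation
      target = torch.randint(0, 2, (2, 1, 8, 32, 32)).float()

      # Training: BCEWithLogitsLoss fuses the sigmoid into the loss for
      # numerical stability, so the logits are passed in directly.
      loss = nn.BCEWithLogitsLoss()(logits, target)

      # Inference/visualization: apply the sigmoid explicitly to obtain
      # probabilities in [0, 1].
      probs = torch.sigmoid(logits)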

  3. How do I finetune from a saved checkpoint instead of resuming training?

    To start from a saved checkpoint, add --checkpoint checkpoint_xxxxx.pth.tar to the training command. By default, the trainer also loads the state of the optimizer and learning-rate scheduler and resumes training at the saved iteration. To finetune from the beginning (e.g., on a different dataset), set SOLVER.ITERATION_RESTART to True.
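    For example (a sketch only: the script and config paths below are placeholders, and passing SOLVER.ITERATION_RESTART on the command line assumes the launcher accepts yacs-style overrides; otherwise set it in the YAML config):

      # Resume training: optimizer, scheduler, and iteration are restored.
      python scripts/main.py --config-file configs/my_config.yaml \
          --checkpoint checkpoint_xxxxx.pth.tar

      # Finetune from iteration 0 with the same weights.
      python scripts/main.py --config-file configs/my_config.yaml \
          --checkpoint checkpoint_xxxxx.pth.tar \
          SOLVER.ITERATION_RESTART True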

  4. What are the differences between VolumeDataset and TileDataset?

    VolumeDataset loads a list of 3D arrays and samples random subvolumes during training or streams sliding-window subvolumes during inference. Since large volumes (e.g., MitoEM) cannot be completely loaded into memory and are usually stored as individual PNG images, we implemented TileDataset, which reads the metadata of a large dataset and processes it chunk by chunk. TileDataset inherits from VolumeDataset, and each chunk is handled by VolumeDataset. See more information here.
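    A minimal sketch of the chunk-based idea (chunk_coordinates is a hypothetical helper, not the actual TileDataset API):

      # Cover a large volume with fixed-size chunks, each small enough to
      # be handled like a regular VolumeDataset volume.
      def chunk_coordinates(vol_shape, chunk_shape):
          """Yield (z, y, x) start offsets of non-overlapping chunks."""
          for z in range(0, vol_shape[0], chunk_shape[0]):
              for y in range(0, vol_shape[1], chunk_shape[1]):
                  for x in range(0, vol_shape[2], chunk_shape[2]):
                      yield z, y, x

      vol_shape = (500, 4096, 4096)    # a MitoEM-scale volume (z, y, x)
      chunk_shape = (100, 1024, 1024)  # one chunk that fits in memory
      for z, y, x in chunk_coordinates(vol_shape, chunk_shape):
          # TileDataset would stitch the PNG tiles covering this region
          # into an array and hand it to a VolumeDataset for sampling.
          pass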

  5. What does isotropy mean in the model definition?

    For volumetric images generated by serial-sectioning electron microscopy (EM), the resolution of the xy-plane is much higher than that along the z-axis (e.g., 4 nm vs. 30 nm); such volumes are denoted as anisotropic. It becomes less reasonable to use symmetric convolutions and the same downsampling ratios for all three axes in the model. Therefore we use the isotropy keyword argument to control the types of convolutions and the downsampling ratios at different CNN stages. For initial stages, where the input is anisotropic, we use 2D convolutions and only downsample the xy-plane. For later stages, where the features become roughly isotropic after the xy-plane has been downsampled multiple times, we apply symmetric 3D convolutions and downsample all three dimensions with the same ratio (see connectomics.model.arch.UNet3D). Transformer-based architectures can follow a similar design.
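    A hedged sketch of how such a per-stage flag can select kernel and downsampling shapes (the exact flags and values in connectomics.model.arch.UNet3D may differ):

      isotropy = [False, False, True, True, True]  # one flag per CNN stage

      for stage, iso in enumerate(isotropy):
          if iso:
              kernel, pool = (3, 3, 3), (2, 2, 2)  # symmetric 3D conv and pooling
          else:
              kernel, pool = (1, 3, 3), (1, 2, 2)  # 2D conv, downsample xy only
          print(f"stage {stage}: conv kernel {kernel}, downsampling {pool}")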

  6. What are the differences between 2D and 3D affinity maps?

    The affinity between two pixels (usually two adjacent ones) is 1 if and only if they share the same segment index and that index is not the background label (usually 0). For 2D images, the affinity map has 2 channels: one for the affinity between (x,y) and (x+1,y), the other for the affinity between (x,y) and (x,y+1). Thus the prediction of a 2D affinity model commonly has shape (2,h,w). For 3D volumes, the affinity map has 3 channels: the first for the affinity between (x,y,z) and (x+1,y,z), the second for (x,y,z) and (x,y+1,z), and the third for (x,y,z) and (x,y,z+1). Therefore the prediction of a 3D affinity model commonly has shape (3,d,h,w).
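    A minimal sketch of converting a segmentation into a 3D affinity map (seg_to_affinity is a hypothetical helper; the (z, y, x) array axis order is an assumption):

      import numpy as np

      def seg_to_affinity(seg):
          """Turn a (d, h, w) label volume into a (3, d, h, w) affinity map."""
          aff = np.zeros((3,) + seg.shape, dtype=np.float32)
          fg = seg != 0  # exclude background pixels (index 0)
          # Channel 0: affinity between (x, y, z) and (x+1, y, z).
          aff[0, :, :, :-1] = (seg[:, :, :-1] == seg[:, :, 1:]) & fg[:, :, :-1]
          # Channel 1: affinity between (x, y, z) and (x, y+1, z).
          aff[1, :, :-1, :] = (seg[:, :-1, :] == seg[:, 1:, :]) & fg[:, :-1, :]
          # Channel 2: affinity between (x, y, z) and (x, y, z+1).
          aff[2, :-1, :, :] = (seg[:-1, :, :] == seg[1:, :, :]) & fg[:-1, :, :]
          return aff

      seg = np.random.randint(0, 3, size=(8, 16, 16))
      aff = seg_to_affinity(seg)  # shape (3, 8, 16, 16)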