FAQ

This is a list of Frequently Asked Questions about PyTorch Connectomics. Feel free to suggest new entries!

  1. Why are the model input sizes usually 2^n+1 (33, 129, 257, etc.) instead of 2^n (32, 128, 256, etc.)?

    Based on Figure 11 of Mind the Pad – CNNs Can Develop Blind Spots (arXiv), using 2^n+1 input sizes for models with zero-padding layers yields a symmetric foveation map, whereas 2^n input sizes lead to an asymmetric one.

  2. Why can the activation functions during training and inference be different?

    During training, loss functions such as CrossEntropyLoss and BCEWithLogitsLoss operate on raw logits and do not require softmax or sigmoid activations, but during inference (and visualization) those activations are needed to produce probabilities. In addition, multiple losses with different activations can be applied to a single target during training.
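    A minimal sketch of this split (the tensor shapes here are arbitrary):

      import torch
      import torch.nn as nn

      logits = torch.randn(2, 1, 8, 32, 32)  # raw model output, no activation
      target = torch.randint(0, 2, (2, 1, 8, 32, 32)).float()

      # Training: BCEWithLogitsLoss fuses the sigmoid into the loss for
      # numerical stability, so the logits are passed in directly.
      loss = nn.BCEWithLogitsLoss()(logits, target)

      # Inference/visualization: apply the sigmoid explicitly to obtain
      # probabilities in [0, 1].
      probs = torch.sigmoid(logits)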

  3. How do I finetune from a saved checkpoint instead of resuming training?

    To start from a saved checkpoint, add --checkpoint checkpoint_xxxxx.pth.tar to the training command. By default, the trainer also loads the state of the optimizer and learning-rate scheduler and resumes training at the saved iteration. To finetune from the beginning (e.g., on a different dataset), set SOLVER.ITERATION_RESTART to True.
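    For example (a sketch only: the script and config paths below are placeholders, and passing SOLVER.ITERATION_RESTART on the command line assumes the launcher accepts yacs-style overrides; otherwise set it in the YAML config):

      # Resume training: optimizer, scheduler, and iteration are restored.
      python scripts/main.py --config-file configs/my_config.yaml \
          --checkpoint checkpoint_xxxxx.pth.tar

      # Finetune from iteration 0 with the same weights.
      python scripts/main.py --config-file configs/my_config.yaml \
          --checkpoint checkpoint_xxxxx.pth.tar \
          SOLVER.ITERATION_RESTART True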

  4. What are the differences between VolumeDataset and TileDataset?

    VolumeDataset loads a list of 3D arrays and samples random subvolumes during training or streams sliding-window subvolumes during inference. Since large volumes (e.g., MitoEM) cannot be completely loaded into memory and are usually stored as individual PNG images, we implemented TileDataset, which reads the metadata of a large dataset and processes it chunk by chunk. TileDataset inherits from VolumeDataset, and each chunk is handled by VolumeDataset. See more information here.
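    A minimal sketch of the chunk-based idea (chunk_coordinates is a hypothetical helper, not the actual TileDataset API):

      # Cover a large volume with fixed-size chunks, each small enough to
      # be handled like a regular VolumeDataset volume.
      def chunk_coordinates(vol_shape, chunk_shape):
          """Yield (z, y, x) start offsets of non-overlapping chunks."""
          for z in range(0, vol_shape[0], chunk_shape[0]):
              for y in range(0, vol_shape[1], chunk_shape[1]):
                  for x in range(0, vol_shape[2], chunk_shape[2]):
                      yield z, y, x

      vol_shape = (500, 4096, 4096)    # a MitoEM-scale volume (z, y, x)
      chunk_shape = (100, 1024, 1024)  # one chunk that fits in memory
      for z, y, x in chunk_coordinates(vol_shape, chunk_shape):
          # TileDataset would stitch the PNG tiles covering this region
          # into an array and hand it to a VolumeDataset for sampling.
          pass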

  5. What does isotropy mean in the model definition?

    For volumetric images generated by serial-sectioning electron microscopy (EM), the resolution of the xy-plane is much higher than that along the z-axis (e.g., 4 nm vs. 30 nm); such volumes are denoted as anisotropic. It becomes less reasonable to use symmetric convolutions and the same downsampling ratios for all three axes in the model. Therefore we use the isotropy keyword argument to control the types of convolutions and the downsampling ratios at different CNN stages. For initial stages, where the input is anisotropic, we use 2D convolutions and only downsample the xy-plane. For later stages, where the features become roughly isotropic after the xy-plane has been downsampled multiple times, we apply symmetric 3D convolutions and downsample all three dimensions with the same ratio (see connectomics.model.arch.UNet3D). Transformer-based architectures can follow a similar design.
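    A hedged sketch of how such a per-stage flag can select kernel and downsampling shapes (the exact flags and values in connectomics.model.arch.UNet3D may differ):

      isotropy = [False, False, True, True, True]  # one flag per CNN stage

      for stage, iso in enumerate(isotropy):
          if iso:
              kernel, pool = (3, 3, 3), (2, 2, 2)  # symmetric 3D conv and pooling
          else:
              kernel, pool = (1, 3, 3), (1, 2, 2)  # 2D conv, downsample xy only
          print(f"stage {stage}: conv kernel {kernel}, downsampling {pool}")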

  6. What are the differences between 2D and 3D affinity maps?

    The affinity between two pixels (usually two adjacent ones) is 1 if and only if they share the same segment index and that index is not the background label (usually 0). For 2D images, the affinity map has 2 channels: one for the affinity between (x,y) and (x+1,y), the other for the affinity between (x,y) and (x,y+1). Thus the prediction of a 2D affinity model commonly has shape (2,h,w). For 3D volumes, the affinity map has 3 channels: the first for the affinity between (x,y,z) and (x+1,y,z), the second for (x,y,z) and (x,y+1,z), and the third for (x,y,z) and (x,y,z+1). Therefore the prediction of a 3D affinity model commonly has shape (3,d,h,w).
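    A minimal sketch of converting a segmentation into a 3D affinity map (seg_to_affinity is a hypothetical helper; the (z, y, x) array axis order is an assumption):

      import numpy as np

      def seg_to_affinity(seg):
          """Turn a (d, h, w) label volume into a (3, d, h, w) affinity map."""
          aff = np.zeros((3,) + seg.shape, dtype=np.float32)
          fg = seg != 0  # exclude background pixels (index 0)
          # Channel 0: affinity between (x, y, z) and (x+1, y, z).
          aff[0, :, :, :-1] = (seg[:, :, :-1] == seg[:, :, 1:]) & fg[:, :, :-1]
          # Channel 1: affinity between (x, y, z) and (x, y+1, z).
          aff[1, :, :-1, :] = (seg[:, :-1, :] == seg[:, 1:, :]) & fg[:, :-1, :]
          # Channel 2: affinity between (x, y, z) and (x, y, z+1).
          aff[2, :-1, :, :] = (seg[:-1, :, :] == seg[1:, :, :]) & fg[:-1, :, :]
          return aff

      seg = np.random.randint(0, 3, size=(8, 16, 16))
      aff = seg_to_affinity(seg)  # shape (3, 8, 16, 16)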