
connectomics.data

Datasets

class connectomics.data.dataset.TileDataset(chunk_num=[2, 2, 2], chunk_ind=None, chunk_ind_split=None, chunk_iter=-1, chunk_stride=True, volume_json=['path/to/image.json'], label_json=None, valid_mask_json=None, mode='train', pad_size=[0, 0, 0], data_scale=[1.0, 1.0, 1.0], coord_range=None, do_relabel=True, **kwargs)[source]

Dataset class for large-scale tile-based datasets. Large-scale volumetric datasets are usually stored as individual tiles. Directly loading them as a single array for training and inference is infeasible. This class reads the paths of the tiles and constructs smaller chunks for processing.

Parameters
  • chunk_num (list) – volume splitting parameters in \((z, y, x)\) order. Default: \([2, 2, 2]\)

  • chunk_ind (list) – predefined list of chunks. Default: None

  • chunk_ind_split (list) – rank and world_size for splitting chunk_ind in multi-processing. Default: None

  • chunk_iter (int) – number of iterations on each chunk. Default: -1

  • chunk_stride (bool) – allow overlap between chunks. Default: True

  • volume_json (str, list) – json file (or list of files) for the input image. Default: ['path/to/image.json']

  • label_json (str, optional) – json file for label. Default: None

  • valid_mask_json (str, optional) – json file for valid mask. Default: None

  • mode (str) – 'train', 'val' or 'test'. Default: 'train'

  • pad_size (list) – padding parameters in \((z, y, x)\) order. Default: \([0, 0, 0]\)

  • data_scale (list) – volume scaling factors in \((z, y, x)\) order. Default: \([1.0, 1.0, 1.0]\)

  • coord_range (list) – the valid coordinate range of volumes. Default: None

  • do_relabel (bool) – relabel the mask indices in a sampled label volume so that they are consecutive. This option should be set to False for semantic segmentation, otherwise the class indices can shift. Default: True

Note

To run inference on multiple nodes in an asynchronous manner, chunk_ind_split specifies the number of parts to split the total number of chunks into at inference time, and which part the current node/process should handle. For example, chunk_ind_split = "0-5" means that the chunks are split into 5 parts (and can thus be processed asynchronously on 5 nodes), and that the current node/process handles the first (0-based) part of the chunks.
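
A minimal sketch of how such a split string can be interpreted, assuming a simple contiguous partition (an illustration of the convention, not necessarily the exact logic inside the class):

>>> rank, world_size = map(int, "0-5".split("-"))  # part 0 of 5
>>> chunk_ind = list(range(8))  # e.g., all chunks of a 2x2x2 split
>>> per_part = (len(chunk_ind) + world_size - 1) // world_size
>>> chunk_ind[rank * per_part:(rank + 1) * per_part]
[0, 1]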

Note

The coord_range option specifies the region of a volume to use. Suppose the first input volume has a size of \((1000, 10000, 10000)\) voxels, and only the center subvolume of size \((400, 2000, 2000)\) needs to be used for training or inference; then set coord_range=[[300, 700, 4000, 6000, 4000, 6000]].
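
A hedged usage sketch (the JSON path is a placeholder):

>>> from connectomics.data.dataset import TileDataset
>>> dataset = TileDataset(
>>>     chunk_num=[2, 2, 2],
>>>     volume_json=['path/to/image.json'],
>>>     mode='test',
>>>     coord_range=[[300, 700, 4000, 6000, 4000, 6000]])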

get_coord_name()[source]

Return the filename suffix based on the chunk coordinates.

loadchunk()[source]

Load the chunk based on current coordinates and construct a VolumeDataset for processing.

updatechunk(do_load=True)[source]

Update the coordinates to a new chunk in the large volume.

class connectomics.data.dataset.VolumeDataset(volume, label=None, valid_mask=None, valid_ratio=0.5, sample_volume_size=(8, 64, 64), sample_label_size=(8, 64, 64), sample_stride=(1, 1, 1), augmentor=None, target_opt=['1'], weight_opt=[['1']], erosion_rates=None, dilation_rates=None, mode='train', do_2d=False, iter_num=-1, do_relabel=True, reject_size_thres=0, reject_diversity=0, reject_p=0.95, data_mean=0.5, data_std=0.5, data_match_act='none')[source]

Dataset class for volumetric image datasets. At training time, subvolumes are randomly sampled from all the large input volumes with (optional) rejection sampling to increase the frequency of foreground regions in a batch. At inference time, subvolumes are yielded in a sliding-window manner with overlap to counter border artifacts.

Parameters
  • volume (list) – list of image volumes.

  • label (list, optional) – list of label volumes. Default: None

  • valid_mask (list, optional) – list of valid masks. Default: None

  • valid_ratio (float) – volume ratio threshold for valid samples. Default: 0.5

  • sample_volume_size (tuple, int) – model input size.

  • sample_label_size (tuple, int) – model output size.

  • sample_stride (tuple, int) – stride size for sampling.

  • augmentor (connectomics.data.augmentation.composition.Compose, optional) – data augmentor for training. Default: None

  • target_opt (list) – list of the model targets generated from segmentation labels.

  • weight_opt (list) – list of options for generating pixel-wise weight masks.

  • mode (str) – 'train', 'val' or 'test'. Default: 'train'

  • do_2d (bool) – load 2d samples from 3d volumes. Default: False

  • iter_num (int) – total number of training iterations (-1 for inference). Default: -1

  • do_relabel (bool) – relabel the mask indices in a sampled label volume so that they are consecutive. This option should be set to False for semantic segmentation, otherwise the class indices can shift. Default: True

  • reject_size_thres (int, optional) – threshold to decide if a sampled volume contains foreground objects. Default: 0

  • reject_diversity (int, optional) – threshold to decide if a sampled volume contains multiple objects. Default: 0

  • reject_p (float, optional) – probability of rejecting non-foreground volumes. Default: 0.95

  • data_mean (float) – mean of pixels for images normalized to (0,1). Default: 0.5

  • data_std (float) – standard deviation of pixels for images normalized to (0,1). Default: 0.5

  • data_match_act (str) – the data is normalized to match the range of an activation. Default: 'none'

Note

For relatively small volumes, the total number of possible subvolumes can be smaller than the total number of samples required in training (the product of the total iterations and the mini-batch size), which raises StopIteration. Therefore the dataset length is also decided by the training settings.
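
A minimal training-mode sketch, assuming volume and label are lists of 3D numpy.ndarray and leaving the target and weight options at their defaults (the array shapes are illustrative):

>>> import numpy as np
>>> from connectomics.data.dataset import VolumeDataset
>>> volume = [np.random.rand(100, 256, 256).astype(np.float32)]
>>> label = [np.zeros((100, 256, 256), dtype=np.uint8)]
>>> dataset = VolumeDataset(volume, label=label,
>>>                         sample_volume_size=(8, 64, 64),
>>>                         sample_label_size=(8, 64, 64),
>>>                         mode='train', iter_num=1000)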

class connectomics.data.dataset.VolumeDatasetCond(volume, label, label_type='syn', augmentor=None, sample_size=(9, 65, 65), weight_opt=[['1']], mode='train', iter_num=-1, data_mean=0.5, data_std=0.5)[source]

Dataset class for volumetric images in conditional segmentation. The label volumes are always required for this class.

Parameters
  • volume (list) – list of image volumes.

  • label (list) – list of label volumes.

  • label_type (str) – type of the annotation. Default: 'syn'

  • augmentor (connectomics.data.augmentation.composition.Compose, optional) – data augmentor for training. Default: None

  • sample_size (tuple) – model input size. Default: (9, 65, 65)

  • weight_opt (list) – list of options for generating pixel-wise weight masks.

  • mode (str) – 'train', 'val' or 'test'. Default: 'train'

  • iter_num (int) – total number of training iterations (-1 for inference). Default: -1

  • data_mean (float) – mean of pixels for images normalized to (0,1). Default: 0.5

  • data_std (float) – standard deviation of pixels for images normalized to (0,1). Default: 0.5
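
A minimal sketch, with volume and label as in the VolumeDataset example above (the 'syn' label type is the default):

>>> from connectomics.data.dataset import VolumeDatasetCond
>>> dataset = VolumeDatasetCond(volume, label, label_type='syn',
>>>                             sample_size=(9, 65, 65), mode='train')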

class connectomics.data.dataset.VolumeDatasetRecon(volume, label=None, valid_mask=None, valid_ratio=0.5, sample_volume_size=(8, 64, 64), sample_label_size=(8, 64, 64), sample_stride=(1, 1, 1), augmentor=None, target_opt=['1'], weight_opt=[['1']], erosion_rates=None, dilation_rates=None, mode='train', do_2d=False, iter_num=-1, do_relabel=True, reject_size_thres=0, reject_diversity=0, reject_p=0.95, data_mean=0.5, data_std=0.5, data_match_act='none')[source]
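
A variant of VolumeDataset that additionally prepares the input subvolume as a reconstruction target. It shares the arguments of VolumeDataset, which are described above.
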
connectomics.data.dataset.build_dataloader(cfg, augmentor=None, mode='train', dataset=None, rank=None, dataset_class=<class 'connectomics.data.dataset.dataset_volume.VolumeDataset'>, dataset_options={}, cf=<function collate_fn_train>)[source]

Prepare dataloader for training and inference.
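
A hedged usage sketch, assuming cfg is a yacs.config.CfgNode and augmentor was built with build_train_augmentor (both are placeholders here):

>>> from connectomics.data.dataset import build_dataloader
>>> train_loader = build_dataloader(cfg, augmentor=augmentor, mode='train')
>>> for iteration, sample in enumerate(train_loader):
>>>     pass  # run the training step on the sampled batch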

connectomics.data.dataset.get_dataset(cfg, augmentor, mode='train', rank=None, dataset_class=<class 'connectomics.data.dataset.dataset_volume.VolumeDataset'>, dataset_options={}, dir_name_init=None, img_name_init=None)[source]

Prepare dataset for training and inference.

Parameters
  • dir_name_init (Optional[list]) –

  • img_name_init (Optional[list]) –

Augmentations

class connectomics.data.augmentation.Compose(transforms=[], input_size=(8, 256, 256), smooth=True, keep_uncropped=False, keep_non_smoothed=False, additional_targets=None)[source]

Composing a list of data transforms.

The sample size of the composed augmentor can be larger than the specified input size of the model to ensure that all pixels are valid after center-crop.

Parameters
  • transforms (list) – list of transformations to compose.

  • input_size (tuple) – input size of model in \((z, y, x)\) order. Default: \((8, 256, 256)\)

  • smooth (bool) – smoothing the object mask with Gaussian filtering. Default: True

  • keep_uncropped (bool) – keep uncropped image and label. Default: False

  • keep_non_smoothed (bool) – keep the non-smoothed object mask. Default: False

  • additional_targets (dict, optional) – additional targets to augment. Default: None

Examples::
>>> # specify additional targets besides 'image'
>>> kwargs = {'additional_targets': {'label': 'mask'}}
>>> augmentor = Compose([Rotate(p=1.0, **kwargs),
>>>                      Flip(p=1.0, **kwargs),
>>>                      Elastic(alpha=12.0, p=0.75, **kwargs),
>>>                      Grayscale(p=0.75, **kwargs),
>>>                      MissingParts(p=0.9, **kwargs)],
>>>                      input_size = (8, 256, 256), **kwargs)
>>> sample = {'image':input, 'label':label}
>>> augmented = augmentor(sample)
>>> out_input, out_label = augmented['image'], augmented['label']
class connectomics.data.augmentation.CopyPasteAugmentor(aug_thres=0.7, p=0.8, additional_targets={'label': 'mask'}, skip_targets=[])[source]

Copy-paste augmentor (experimental).

The input can be a numpy.ndarray or torch.Tensor of shape \((C, Z, Y, X)\) or \((Z, Y, X)\).

Parameters
  • aug_thres (float) – Maximum fractional size of the object occupying the volume. If the object is too large it is not augmented. Default: 0.7

  • p (float) – probability of applying the augmentation. Default: 0.8

  • additional_targets (dict, optional) – additional targets to augment. Default: {'label': 'mask'}

  • skip_targets (list) –
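
A hedged usage sketch, assuming the sample-dict calling convention shown in the Compose example (array shapes are illustrative):

>>> import numpy as np
>>> from connectomics.data.augmentation import CopyPasteAugmentor
>>> aug = CopyPasteAugmentor(aug_thres=0.7, p=0.8)
>>> sample = {'image': np.random.rand(8, 64, 64).astype(np.float32),
>>>           'label': np.zeros((8, 64, 64), dtype=np.uint8)}
>>> augmented = aug(sample)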

copy_paste_single(rot_label, neuron_tensor)[source]

Find the rotation of the pasted object that has the least overlap with the ground-truth label; if multiple rotations have no overlap, choose the one with the least distance from the ground truth.

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.CutBlur(length_ratio=0.25, down_ratio_min=2.0, down_ratio_max=8.0, downsample_z=False, p=0.5, additional_targets=None, skip_targets=[])[source]

3D CutBlur data augmentation, adapted from https://arxiv.org/abs/2004.00448.

Randomly downsample a cuboid region in the volume to force the model to learn super-resolution when making predictions. This augmentation is only applied to images.

Parameters
  • length_ratio (float) – the ratio of the cuboid length compared with volume length.

  • down_ratio_min (float) – minimal downsample ratio to generate low-res region.

  • down_ratio_max (float) – maximal downsample ratio to generate low-res region.

  • downsample_z (bool) – downsample along the z-axis. Default: False

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –
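
The core operation can be sketched as follows. This is a simplified, standalone illustration based on scipy.ndimage.zoom, not the library's exact implementation: a selected cuboid region is downsampled in the xy-plane and then upsampled back, simulating a low-resolution region.

>>> import numpy as np
>>> from scipy.ndimage import zoom
>>> def cutblur_region(region, down_ratio=4.0):
>>>     # downsample in-plane, then upsample back to the original size
>>>     low = zoom(region, (1, 1.0 / down_ratio, 1.0 / down_ratio), order=1)
>>>     return zoom(low, (1, region.shape[1] / low.shape[1],
>>>                       region.shape[2] / low.shape[2]), order=0)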

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.CutNoise(length_ratio=0.25, mode='uniform', scale=0.2, p=0.5, additional_targets=None, skip_targets=[])[source]

3D CutNoise data augmentation.

Randomly add noise to a cuboid region in the volume to force the model to learn denoising when making predictions. This augmentation is only applied to images.

Parameters
  • length_ratio (float) – the ratio of the cuboid length compared with volume length.

  • mode (string) – the distribution of the noise pattern. Default: 'uniform'.

  • scale (float) – scale of the random noise. Default: 0.2.

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.DataAugment(p=0.5, additional_targets=None, skip_targets=[])[source]

DataAugment interface. A data augmentor needs to conduct the following steps:

  1. Set sample_params at initialization to compute required sample size.

  2. Randomly generate augmentation parameters for the current transform.

  3. Apply the transform to a pair of images and corresponding labels.

All the real data augmentations (except the mix-up and test-time augmentors) should be subclasses of this class.

Parameters
  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –

abstract set_params()[source]

Calculate the appropriate sample size with data augmentation.

Some data augmentations (wrap, misalignment, etc.) require a larger sample size than the original, depending on the augmentation parameters that are randomly chosen. This function takes the data augmentation parameters and returns an updated data sampling size accordingly.
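
A minimal subclass skeleton following these steps. The __call__ signature is an assumption inferred from the usage examples on this page; consult the source for the exact interface:

>>> from connectomics.data.augmentation import DataAugment
>>> class NoOpAugment(DataAugment):
>>>     def __init__(self, p=0.5, additional_targets=None, skip_targets=[]):
>>>         super().__init__(p, additional_targets, skip_targets)
>>>         self.set_params()  # step 1: compute the required sample size
>>>     def set_params(self):
>>>         pass  # this transform needs no extra sample margin
>>>     def __call__(self, sample, random_state=None):
>>>         # steps 2-3: draw random parameters, then transform the
>>>         # images and the corresponding labels (a no-op here)
>>>         return sample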

class connectomics.data.augmentation.Elastic(alpha=16.0, sigma=4.0, p=0.5, additional_targets=None, skip_targets=[])[source]

Elastic deformation of images as described in [Simard2003] (with modifications). The implementation is based on https://gist.github.com/erniejunior/601cdf56d2b424757de5. This augmentation is applied to both images and masks.

Simard2003

Simard, Steinkraus and Platt, “Best Practices for Convolutional Neural Networks applied to Visual Document Analysis”, in Proc. of the International Conference on Document Analysis and Recognition, 2003.

Parameters
  • alpha (float) – maximum pixel-moving distance of elastic deformation. Default: 16.0

  • sigma (float) – standard deviation of the Gaussian filter. Default: 4.0

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –

set_params()[source]

The elastic deformation is applied only to the xy-plane. The required sample size before the transformation needs to be larger, as determined by the maximum pixel-moving distance (self.alpha).

class connectomics.data.augmentation.Flip(do_ztrans=0, p=0.5, additional_targets=None, skip_targets=[])[source]

Randomly flip along z-, y- and x-axes as well as swap y- and x-axes for anisotropic image volumes. For learning on isotropic image volumes set do_ztrans to 1 to swap z- and x-axes (the inputs need to be cubic). This augmentation is applied to both images and masks.

Parameters
  • do_ztrans (int) – set to 1 to swap z- and x-axes for isotropic data. Default: 0

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.Grayscale(contrast_factor=0.3, brightness_factor=0.3, mode='mix', invert=False, invert_p=0.0, p=0.5, additional_targets=None, skip_targets=[])[source]

Grayscale intensity augmentation, adapted from ELEKTRONN (http://elektronn.org/).

Randomly adjust contrast/brightness, randomly invert the color space and apply gamma correction. This augmentation is only applied to images.

Parameters
  • contrast_factor (float) – intensity of contrast change. Default: 0.3

  • brightness_factor (float) – intensity of brightness change. Default: 0.3

  • mode (string) – one of '2D', '3D' or 'mix'. Default: 'mix'

  • invert (bool) – whether to invert the images. Default: False

  • invert_p (float) – probability of inverting the images. Default: 0.0

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.MisAlignment(displacement=16, rotate_ratio=0.0, p=0.5, additional_targets=None, skip_targets=[])[source]

Mis-alignment data augmentation of image stacks. This augmentation is applied to both images and masks.

Parameters
  • displacement (int) – maximum pixel displacement in xy-plane. Default: 16

  • rotate_ratio (float) – ratio of rotation-based mis-alignment. Default: 0.0

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –

set_params()[source]

The mis-alignment augmentation is applied only to the xy-plane. The required sample size before the transformation needs to be larger, as determined by self.displacement.

class connectomics.data.augmentation.MissingParts(iterations=64, p=0.5, additional_targets=None, skip_targets=[])[source]

Missing-parts augmentation of image stacks. This augmentation is only applied to images.

Parameters
  • iterations (int) – number of iterations in binary dilation. Default: 64

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.MissingSection(num_sections=2, p=0.5, additional_targets=None, skip_targets=[])[source]

Missing-section augmentation of image stacks. This augmentation is applied to both images and masks.

Parameters
  • num_sections (int) – number of missing sections. Default: 2

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –

set_params()[source]

The missing-section augmentation is applied only to the z-axis. The required sample size before the transformation needs to be larger, as determined by self.num_sections.

class connectomics.data.augmentation.MixupAugmentor(min_ratio=0.7, max_ratio=0.9, num_aug=2)[source]

Mixup augmentor (experimental). Conduct linear interpolation between two image samples. The segmentation mask of the sample with higher weight should be used with the augmented output.

The input can be a numpy.ndarray or torch.Tensor of shape \((B, C, Z, Y, X)\).

Parameters
  • min_ratio (float) – minimal interpolation ratio of the target volume. Default: 0.7

  • max_ratio (float) – maximal interpolation ratio of the target volume. Default: 0.9

  • num_aug (int) – number of volumes to be augmented in a batch. Default: 2

Examples::
>>> from connectomics.data.augmentation import MixupAugmentor
>>> mixup_augmentor = MixupAugmentor(num_aug=2)
>>> volume = mixup_augmentor(volume)
>>> pred = model(volume)
class connectomics.data.augmentation.MotionBlur(sections=2, kernel_size=11, p=0.5, additional_targets=None, skip_targets=[])[source]

Motion blur data augmentation of image stacks. This augmentation is only applied to images.

Parameters
  • sections (int) – number of sections along z dimension to apply motion blur. Default: 2

  • kernel_size (int) – kernel size for motion blur. Default: 11

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.Rescale(low=0.8, high=1.25, fix_aspect=False, p=0.5, additional_targets=None, skip_targets=[])[source]

Rescale augmentation. This augmentation is applied to both images and masks.

Parameters
  • low (float) – lower bound of the random scale factor. Default: 0.8

  • high (float) – higher bound of the random scale factor. Default: 1.25

  • fix_aspect (bool) – fix aspect ratio or not. Default: False

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –

set_params()[source]

The rescale augmentation is applied only to the xy-plane. The required sample size before the transformation needs to be larger, as determined by the lowest scaling factor (self.low).

class connectomics.data.augmentation.Rotate(rot90=True, p=0.5, additional_targets=None, skip_targets=[])[source]

Continuous rotation of the xy-plane.

If the rotation degree is arbitrary, the sample size for the x- and y-axes should be at least \(\sqrt{2}\) times larger than the input size to ensure that no invalid region remains after the center-crop. This augmentation is applied to both images and masks.
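
For example, with an arbitrary rotation degree and an input size of \(256 \times 256\) in the xy-plane, the sampled patch should be at least \(\lceil 256\sqrt{2} \rceil = 363\) pixels along each of the y- and x-axes.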

Parameters
  • rot90 (bool) – rotate the sample by only 90 degrees. Default: True

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

  • skip_targets (list) –

set_params()[source]

The rotation augmentation is applied only to the xy-plane. If self.rot90=True, there is no change in sample size. For an arbitrary rotation degree, the required sample size before the transformation needs to be \(\sqrt{2}\) times larger.

class connectomics.data.augmentation.TestAugmentor(mode='mean', do_2d=False, num_aug=None, scale_factors=[1.0, 1.0, 1.0], inference_act=None)[source]

Test-time spatial augmentor.

Our test-time augmentation includes horizontal/vertical flips over the xy-plane, swap of x and y axes, and flip in z-dimension, resulting in 16 variants. Considering inference efficiency, we also provide the option to apply only horizontal/vertical flips over the xy-plane, resulting in 4 variants. The augmentation can also be applied to 2D outputs without the z-flip. By default the test-time augmentor returns the pixel-wise mean value of the predictions.

Parameters
  • mode (str) – one of 'min', 'max' or 'mean'. Default: 'mean'

  • do_2d (bool) – the test-time augmentation is applied to 2d images. Default: False

  • num_aug (int, optional) – number of data augmentation variants: 4, 8 or 16 (3D only). Default: None

  • scale_factors (List[float]) – scale factors for resizing the model output. Default: [1.0, 1.0, 1.0]

Examples::
>>> from connectomics.data.augmentation import TestAugmentor
>>> test_augmentor = TestAugmentor(mode='mean', num_aug=16)
>>> output = test_augmentor(model, inputs) # output is a numpy.ndarray on CPU
classmethod build_from_cfg(cfg, activation=True)[source]

Build a TestAugmentor from configs.

update_name(name)[source]

Update the name of the output file to indicate applied test-time augmentations.

connectomics.data.augmentation.build_train_augmentor(cfg, keep_uncropped=False, keep_non_smoothed=False)[source]

Build the training augmentor based on the options specified in the configuration file.

Parameters
  • cfg (yacs.config.CfgNode) – YACS configuration options.

  • keep_uncropped (bool) – keep uncropped data in the output. Default: False

  • keep_non_smoothed (bool) – keep the masks before smoothing in the output. Default: False

Note

The two arguments, keep_uncropped and keep_non_smoothed, are used only for debugging; they are False by default and cannot be adjusted in the config file.
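
A hedged usage sketch, assuming cfg is the YACS configuration node and using the sample-dict convention of the Compose example (image and label are placeholder arrays):

>>> from connectomics.data.augmentation import build_train_augmentor
>>> augmentor = build_train_augmentor(cfg)
>>> augmented = augmentor({'image': image, 'label': label})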

Utility Functions