connectomics.data¶

Datasets¶

class connectomics.data.dataset.TileDataset(chunk_num=[2, 2, 2], chunk_ind=None, chunk_ind_split=None, chunk_iter=- 1, chunk_stride=True, volume_json=['path/to/image.json'], label_json=None, valid_mask_json=None, mode='train', pad_size=[0, 0, 0], data_scale=[1.0, 1.0, 1.0], coord_range=None, do_relabel=True, **kwargs)[source]¶

Dataset class for large-scale tile-based datasets. Large-scale volumetric datasets are usually stored as individual tiles, and loading them directly as a single array for training or inference is infeasible. This class reads the paths of the tiles and constructs smaller chunks for processing.

Parameters

chunk_num (list) – volume splitting parameters in \((z, y, x)\) order. Default: \([2, 2, 2]\)
chunk_ind (list) – predefined list of chunks. Default: None
chunk_ind_split (list) – rank and world_size for splitting chunk_ind in multi-processing. Default: None
chunk_iter (int) – number of iterations on each chunk. Default: -1
chunk_stride (bool) – allow overlap between chunks. Default: True
volume_json (str) – json file for input image. Default: ['path/to/image.json']
label_json (str, optional) – json file for label. Default: None
valid_mask_json (str, optional) – json file for valid mask. Default: None
mode (str) – 'train', 'val' or 'test'. Default: 'train'
pad_size (list) – padding parameters in \((z, y, x)\) order. Default: \([0, 0, 0]\)
data_scale (list) – volume scaling factors in \((z, y, x)\) order. Default: \([1.0, 1.0, 1.0]\)
coord_range (list) – the valid coordinate range of volumes. Default: None
do_relabel (bool) – reduce the mask indices in a sampled label volume. This option should be set to False for semantic segmentation, otherwise the classes can shift. Default: True
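
The effect of do_relabel can be illustrated with a minimal sketch. The helper relabel_sequential below is hypothetical and is not the library's implementation; it assumes that relabeling maps the IDs present in a sample to consecutive integers while keeping 0 as background.

```python
def relabel_sequential(labels):
    """Map the non-zero IDs in a flat list of labels to 1, 2, 3, ...

    Hypothetical helper for illustration; 0 is assumed to be background.
    """
    mapping = {0: 0}
    for value in sorted(set(labels)):
        if value != 0:
            mapping[value] = len(mapping)
    return [mapping[value] for value in labels]


# Instance IDs 5 and 9 become 1 and 2. For a semantic mask the same
# mapping would silently turn class 9 into class 2.
sample = [0, 5, 5, 9, 0, 9]
relabeled = relabel_sequential(sample)
```

Under this assumption the instance identities are preserved but the numeric values are not, which is harmless for instance segmentation and harmful when the values encode classes.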

Note

To run inference on multiple nodes asynchronously, chunk_ind_split specifies how many parts the full set of chunks is split into and which part the current node/process handles. For example, chunk_ind_split = "0-5" means the chunks are split into 5 parts (and can thus be processed asynchronously on 5 nodes), and the current node/process handles the first part (index 0, 0-based).
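
A rough sketch of the idea follows. The function split_chunks is hypothetical, and both the parsing of the "rank-world_size" string and the round-robin assignment are assumptions; the library may partition the chunks differently.

```python
from itertools import product


def split_chunks(chunk_num, chunk_ind_split):
    """Return the (z, y, x) chunk indices assigned to one process.

    Assumptions: chunk_ind_split is a "rank-world_size" string and the
    chunks are dealt out round-robin.
    """
    rank, world_size = (int(v) for v in chunk_ind_split.split("-"))
    all_chunks = list(product(*(range(n) for n in chunk_num)))
    return all_chunks[rank::world_size]


# 2 x 2 x 2 = 8 chunks split into 5 parts; part 0 gets the 1st and 6th chunk.
part0 = split_chunks([2, 2, 2], "0-5")
```

Whatever the exact assignment rule, the parts should be disjoint and together cover every chunk, so that each chunk is processed exactly once across all nodes.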

Note

The coord_range option specifies the region of a volume to use. Suppose the first input volume has a size of (1000, 10000, 10000) voxels and only the center subvolume of size (400, 2000, 2000) is needed for training or inference; then set coord_range=[[300, 700, 4000, 6000, 4000, 6000]].
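
The arithmetic behind this example can be written out as a short sketch. The helper center_coord_range is hypothetical and only reproduces the numbers above; it assumes the [z0, z1, y0, y1, x0, x1] layout shown in the example.

```python
def center_coord_range(volume_size, sub_size):
    """Build [z0, z1, y0, y1, x0, x1] for a centered subvolume."""
    coords = []
    for full, sub in zip(volume_size, sub_size):
        start = (full - sub) // 2
        coords.extend([start, start + sub])
    return coords


# One entry per input volume, hence the outer list.
coord_range = [center_coord_range((1000, 10000, 10000), (400, 2000, 2000))]
```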

get_coord_name()[source]¶: Return the filename suffix based on the chunk coordinates.

loadchunk()[source]¶: Load the chunk based on current coordinates and construct a VolumeDataset for processing.

updatechunk(do_load=True)[source]¶: Update the coordinates to a new chunk in the large volume.

class connectomics.data.dataset.VolumeDataset(volume, label=None, valid_mask=None, valid_ratio=0.5, sample_volume_size=(8, 64, 64), sample_label_size=(8, 64, 64), sample_stride=(1, 1, 1), augmentor=None, target_opt=['1'], weight_opt=[['1']], erosion_rates=None, dilation_rates=None, mode='train', do_2d=False, iter_num=- 1, do_relabel=True, reject_size_thres=0, reject_diversity=0, reject_p=0.95, data_mean=0.5, data_std=0.5, data_match_act='none')[source]¶

Dataset class for volumetric image datasets. At training time, subvolumes are randomly sampled from all the large input volumes with (optional) rejection sampling to increase the frequency of foreground regions in a batch. At inference time, subvolumes are yielded in a sliding-window manner with overlap to counter border artifacts.

Parameters

volume (list) – list of image volumes.
label (list, optional) – list of label volumes. Default: None
valid_mask (list, optional) – list of valid masks. Default: None
valid_ratio (float) – volume ratio threshold for valid samples. Default: 0.5
sample_volume_size (tuple, int) – model input size.
sample_label_size (tuple, int) – model output size.
sample_stride (tuple, int) – stride size for sampling.
augmentor (connectomics.data.augmentation.composition.Compose, optional) – data augmentor for training. Default: None
target_opt (list) – list the model targets generated from segmentation labels.
weight_opt (list) – list of options for generating pixel-wise weight masks.
mode (str) – 'train', 'val' or 'test'. Default: 'train'
do_2d (bool) – load 2d samples from 3d volumes. Default: False
iter_num (int) – total number of training iterations (-1 for inference). Default: -1
do_relabel (bool) – reduce the mask indices in a sampled label volume. This option should be set to False for semantic segmentation, otherwise the classes can shift. Default: True
reject_size_thres (int, optional) – threshold to decide if a sampled volume contains foreground objects. Default: 0
reject_diversity (int, optional) – threshold to decide if a sampled volume contains multiple objects. Default: 0
reject_p (float, optional) – probability of rejecting non-foreground volumes. Default: 0.95
data_mean (float) – mean of pixels for images normalized to (0,1). Default: 0.5
data_std (float) – standard deviation of pixels for images normalized to (0,1). Default: 0.5
data_match_act (str) – the data is normalized to match the range of an activation. Default: 'none'
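
How reject_size_thres and reject_p might interact can be sketched as follows. The function keep_sample is hypothetical, and the decision rule is an assumption based on the parameter descriptions above, not the library's code.

```python
import random


def keep_sample(num_foreground, reject_size_thres, reject_p, rng):
    """Decide whether a sampled subvolume is kept.

    Assumed rule: samples with more than reject_size_thres foreground
    voxels are always kept; the others are rejected with probability
    reject_p.
    """
    if num_foreground > reject_size_thres:
        return True
    return rng.random() >= reject_p


rng = random.Random(0)
# With reject_p=0.95, roughly 5% of background-only samples survive.
kept = sum(keep_sample(0, 100, 0.95, rng) for _ in range(10000))
```

A rule of this kind raises the share of foreground samples in a batch without excluding background samples entirely.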

Note

For relatively small volumes, the total number of possible subvolumes can be smaller than the total number of samples required in training (the product of total iterations and mini-batch size), which raises StopIteration. Therefore the dataset length is also decided by the training settings.
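
The comparison described in this note can be made concrete with a small calculation. The helper count_subvolumes is hypothetical and assumes that every stride-aligned window position counts as one possible subvolume; the iteration count and batch size are made-up values.

```python
def count_subvolumes(volume_size, sample_size, stride):
    """Number of sliding-window positions in a (z, y, x) volume."""
    total = 1
    for full, sample, step in zip(volume_size, sample_size, stride):
        total *= (full - sample) // step + 1
    return total


available = count_subvolumes((16, 128, 128), (8, 64, 64), (1, 1, 1))
required = 100000 * 8  # hypothetical: 100k iterations, mini-batch size 8
needs_more_samples = available < required
```

Here a (16, 128, 128) volume offers far fewer distinct positions than the training run requests, which is the situation the note describes.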

class connectomics.data.dataset.VolumeDatasetCond(volume, label, label_type='syn', augmentor=None, sample_size=(9, 65, 65), weight_opt=[['1']], mode='train', iter_num=- 1, data_mean=0.5, data_std=0.5)[source]¶

Dataset class for volumetric images in conditional segmentation. The label volumes are always required for this class.

Parameters

label (list) – list of label volumes.
volume (list) – list of image volumes.
label_type (str) – type of the annotation. Default: 'syn'
augmentor (connectomics.data.augmentation.composition.Compose, optional) – data augmentor for training. Default: None
sample_size (tuple) – model input size. Default: (9, 65, 65)
weight_opt (list) – list of options for generating pixel-wise weight masks.
mode (str) – 'train', 'val' or 'test'. Default: 'train'
iter_num (int) – total number of training iterations (-1 for inference). Default: -1
data_mean (float) – mean of pixels for images normalized to (0,1). Default: 0.5
data_std (float) – standard deviation of pixels for images normalized to (0,1). Default: 0.5

class connectomics.data.dataset.VolumeDatasetRecon(volume, label=None, valid_mask=None, valid_ratio=0.5, sample_volume_size=(8, 64, 64), sample_label_size=(8, 64, 64), sample_stride=(1, 1, 1), augmentor=None, target_opt=['1'], weight_opt=[['1']], erosion_rates=None, dilation_rates=None, mode='train', do_2d=False, iter_num=- 1, do_relabel=True, reject_size_thres=0, reject_diversity=0, reject_p=0.95, data_mean=0.5, data_std=0.5, data_match_act='none')[source]¶

connectomics.data.dataset.build_dataloader(cfg, augmentor=None, mode='train', dataset=None, rank=None, dataset_class=<class 'connectomics.data.dataset.dataset_volume.VolumeDataset'>, dataset_options={}, cf=<function collate_fn_train>)[source]¶: Prepare dataloader for training and inference.

connectomics.data.dataset.get_dataset(cfg, augmentor, mode='train', rank=None, dataset_class=<class 'connectomics.data.dataset.dataset_volume.VolumeDataset'>, dataset_options={}, dir_name_init=None, img_name_init=None)[source]¶

Prepare dataset for training and inference.

Parameters

dir_name_init (Optional[list]) –
img_name_init (Optional[list]) –

Augmentations¶

class connectomics.data.augmentation.Compose(transforms=[], input_size=(8, 256, 256), smooth=True, keep_uncropped=False, keep_non_smoothed=False, additional_targets=None)[source]¶

Composing a list of data transforms.

The sample size of the composed augmentor can be larger than the specified input size of the model to ensure that all pixels are valid after center-crop.

Parameters

transforms (list) – list of transformations to compose.
input_size (tuple) – input size of model in \((z, y, x)\) order. Default: \((8, 256, 256)\)
smooth (bool) – smoothing the object mask with Gaussian filtering. Default: True
keep_uncropped (bool) – keep uncropped image and label. Default: False
keep_non_smoothed (bool) – keep the non-smoothed object mask. Default: False
additional_targets (dict, optional) – additional targets to augment. Default: None

Examples::

>>> # specify additional targets besides 'image'
>>> kwargs = {'additional_targets': {'label': 'mask'}}
>>> augmentor = Compose([Rotate(p=1.0, **kwargs),
>>>                      Flip(p=1.0, **kwargs),
>>>                      Elastic(alpha=12.0, p=0.75, **kwargs),
>>>                      Grayscale(p=0.75, **kwargs),
>>>                      MissingParts(p=0.9, **kwargs)],
>>>                      input_size = (8, 256, 256), **kwargs)
>>> sample = {'image': input, 'label': label}
>>> augmented = augmentor(sample)
>>> out_input, out_label = augmented['image'], augmented['label']

class connectomics.data.augmentation.CopyPasteAugmentor(aug_thres=0.7, p=0.8, additional_targets={'label': 'mask'}, skip_targets=[])[source]¶

Copy-paste augmentor (experimental).

The input can be a numpy.ndarray or torch.Tensor of shape \((C, Z, Y, X)\) or \((Z, Y, X)\).

Parameters

aug_thres (float) – Maximum fractional size of the object occupying the volume. If the object is too large it is not augmented. Default: 0.7
p (float) –
additional_targets (Optional[dict]) –
skip_targets (list) –

copy_paste_single(rot_label, neuron_tensor)[source]¶: Find the rotation with the least overlap with the ground truth (GT). If multiple rotations have no overlap, choose the one with the least distance from the GT.

set_params()[source]¶: There is no change in sample size.

class connectomics.data.augmentation.CutBlur(length_ratio=0.25, down_ratio_min=2.0, down_ratio_max=8.0, downsample_z=False, p=0.5, additional_targets=None, skip_targets=[])[source]¶

3D CutBlur data augmentation, adapted from https://arxiv.org/abs/2004.00448.

Randomly downsample a cuboid region in the volume to force the model to learn super-resolution when making predictions. This augmentation is only applied to images.

Parameters

length_ratio (float) – the ratio of the cuboid length compared with volume length.
down_ratio_min (float) – minimal downsample ratio to generate low-res region.
down_ratio_max (float) – maximal downsample ratio to generate low-res region.
downsample_z (bool) – downsample along the z axis. Default: False
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –

set_params()[source]¶: There is no change in sample size.

class connectomics.data.augmentation.CutNoise(length_ratio=0.25, mode='uniform', scale=0.2, p=0.5, additional_targets=None, skip_targets=[])[source]¶

3D CutNoise data augmentation.

Randomly add noise to a cuboid region in the volume to force the model to learn denoising when making predictions. This augmentation is only applied to images.

Parameters

length_ratio (float) – the ratio of the cuboid length compared with volume length.
mode (string) – the distribution of the noise pattern. Default: 'uniform'.
scale (float) – scale of the random noise. Default: 0.2.
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –
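
The idea can be sketched in one dimension. The function cut_noise_1d is a simplified, hypothetical analogue of the 3D cuboid case; it assumes values normalized to [0, 1], additive uniform noise in [-scale, scale], and clipping back to [0, 1], none of which is confirmed by the description above.

```python
import random


def cut_noise_1d(signal, length_ratio=0.25, scale=0.2, rng=None):
    """Add uniform noise to one random contiguous region of a 1D signal."""
    rng = rng or random.Random()
    length = int(length_ratio * len(signal))
    start = rng.randint(0, len(signal) - length)
    noisy = list(signal)
    for i in range(start, start + length):
        value = signal[i] + rng.uniform(-scale, scale)
        noisy[i] = min(1.0, max(0.0, value))  # clip to the assumed range
    return noisy, (start, start + length)


signal = [0.5] * 16
noisy, (lo, hi) = cut_noise_1d(signal, rng=random.Random(0))
```

Only the selected region changes, so the model sees clean context around a corrupted patch.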

set_params()[source]¶: There is no change in sample size.

class connectomics.data.augmentation.DataAugment(p=0.5, additional_targets=None, skip_targets=[])[source]¶

DataAugment interface. A data augmentor needs to conduct the following steps:

Set sample_params at initialization to compute required sample size.
Randomly generate augmentation parameters for the current transform.
Apply the transform to a pair of images and corresponding labels.

All the real data augmentations (except mix-up augmentor and test-time augmentor) should be a subclass of this class.

Parameters

p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –

abstract set_params()[source]¶

Calculate the appropriate sample size with data augmentation.

Some data augmentations (wrap, misalignment, etc.) require a larger sample size than the original, depending on the augmentation parameters that are randomly chosen. This function takes the data augmentation parameters and returns an updated data sampling size accordingly.
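
The three steps of the interface can be sketched with a toy class hierarchy. ToyAugment and ToyShift are hypothetical stand-ins, not the library classes, and the layout of sample_params (a 'ratio' and an 'add' entry per axis) is an assumption made for illustration.

```python
import random
from abc import ABC, abstractmethod


class ToyAugment(ABC):
    """Hypothetical stand-in for the interface, not the library class."""

    def __init__(self, p=0.5):
        self.p = p
        self.sample_params = {"ratio": [1.0, 1.0, 1.0], "add": [0, 0, 0]}
        self.set_params()  # step 1: fix the required sample size

    @abstractmethod
    def set_params(self):
        """Update self.sample_params for this transform."""

    @abstractmethod
    def __call__(self, sample, rng):
        """Steps 2 and 3: draw parameters and transform the sample."""


class ToyShift(ToyAugment):
    """Shift a 1D sample by up to max_shift, so it needs extra margin."""

    def __init__(self, max_shift=2, p=0.5):
        self.max_shift = max_shift
        super().__init__(p)

    def set_params(self):
        self.sample_params["add"] = [0, 0, 2 * self.max_shift]

    def __call__(self, sample, rng):
        shift = rng.randint(-self.max_shift, self.max_shift)
        start = self.max_shift + shift
        return sample[start:start + len(sample) - 2 * self.max_shift]


aug = ToyShift(max_shift=2)
out = aug(list(range(12)), random.Random(0))  # 12 = 8 + required margin
```

The sampler would request the enlarged size (8 + 4 here), and the transform returns a valid sample of the original size regardless of the shift drawn.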

class connectomics.data.augmentation.Elastic(alpha=16.0, sigma=4.0, p=0.5, additional_targets=None, skip_targets=[])[source]¶

Elastic deformation of images as described in [Simard2003] (with modifications). The implementation is based on https://gist.github.com/erniejunior/601cdf56d2b424757de5. This augmentation is applied to both images and masks.

Simard2003: Simard, Steinkraus and Platt, “Best Practices for Convolutional Neural Networks applied to Visual Document Analysis”, in Proc. of the International Conference on Document Analysis and Recognition, 2003.

Parameters

alpha (float) – maximum pixel-moving distance of elastic deformation. Default: 16.0
sigma (float) – standard deviation of the Gaussian filter. Default: 4.0
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –

set_params()[source]¶: The elastic augmentation is only applied to the xy-plane. The required sample size before transformation needs to be larger, as decided by the maximum pixel-moving distance (self.alpha).

class connectomics.data.augmentation.Flip(do_ztrans=0, p=0.5, additional_targets=None, skip_targets=[])[source]¶

Randomly flip along z-, y- and x-axes as well as swap y- and x-axes for anisotropic image volumes. For learning on isotropic image volumes set do_ztrans to 1 to swap z- and x-axes (the inputs need to be cubic). This augmentation is applied to both images and masks.

Parameters

do_ztrans (int) – set to 1 to swap z- and x-axes for isotropic data. Default: 0
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –

set_params()[source]¶: There is no change in sample size.

class connectomics.data.augmentation.Grayscale(contrast_factor=0.3, brightness_factor=0.3, mode='mix', invert=False, invert_p=0.0, p=0.5, additional_targets=None, skip_targets=[])[source]¶

Grayscale intensity augmentation, adapted from ELEKTRONN (http://elektronn.org/).

Randomly adjust contrast/brightness, randomly invert the color space and apply gamma correction. This augmentation is only applied to images.

Parameters

contrast_factor (float) – intensity of contrast change. Default: 0.3
brightness_factor (float) – intensity of brightness change. Default: 0.3
mode (string) – one of '2D', '3D' or 'mix'. Default: 'mix'
invert (bool) – whether to invert the images. Default: False
invert_p (float) – probability of inverting the images. Default: 0.0
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –

set_params()[source]¶: There is no change in sample size.

class connectomics.data.augmentation.MisAlignment(displacement=16, rotate_ratio=0.0, p=0.5, additional_targets=None, skip_targets=[])[source]¶

Mis-alignment data augmentation of image stacks. This augmentation is applied to both images and masks.

Parameters

displacement (int) – maximum pixel displacement in xy-plane. Default: 16
rotate_ratio (float) – ratio of rotation-based mis-alignment. Default: 0.0
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –

set_params()[source]¶: The mis-alignment augmentation is only applied to the xy-plane. The required sample size before transformation needs to be larger, as decided by self.displacement.

class connectomics.data.augmentation.MissingParts(iterations=64, p=0.5, additional_targets=None, skip_targets=[])[source]¶

Missing-parts augmentation of image stacks. This augmentation is only applied to images.

Parameters

iterations (int) – number of iterations in binary dilation. Default: 64
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –

set_params()[source]¶: There is no change in sample size.

class connectomics.data.augmentation.MissingSection(num_sections=2, p=0.5, additional_targets=None, skip_targets=[])[source]¶

Missing-section augmentation of image stacks. This augmentation is applied to both images and masks.

Parameters

num_sections (int) – number of missing sections. Default: 2
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –

set_params()[source]¶: The missing-section augmentation is only applied to the z-axis. The required sample size before transformation needs to be larger, as decided by self.num_sections.

class connectomics.data.augmentation.MixupAugmentor(min_ratio=0.7, max_ratio=0.9, num_aug=2)[source]¶

Mixup augmentor (experimental). Conduct linear interpolation between two image samples. The segmentation mask of the sample with higher weight should be used with the augmented output.

The input can be a numpy.ndarray or torch.Tensor of shape \((B, C, Z, Y, X)\).

Parameters

min_ratio (float) – minimal interpolation ratio of the target volume. Default: 0.7
max_ratio (float) – maximal interpolation ratio of the target volume. Default: 0.9
num_aug (int) – number of volumes to be augmented in a batch. Default: 2

Examples::

>>> from connectomics.data.augmentation import MixupAugmentor
>>> mixup_augmentor = MixupAugmentor(num_aug=2)
>>> volume = mixup_augmentor(volume)
>>> pred = model(volume)
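
The interpolation itself can be sketched on flat lists. The function mixup_pair is hypothetical; it assumes the ratio is drawn uniformly from [min_ratio, max_ratio] and applied to the target sample, which matches the parameter descriptions but is not taken from the library's code.

```python
import random


def mixup_pair(target, other, min_ratio=0.7, max_ratio=0.9, rng=None):
    """Blend two equally sized flat samples with a random ratio."""
    rng = rng or random.Random()
    ratio = rng.uniform(min_ratio, max_ratio)
    mixed = [ratio * t + (1.0 - ratio) * o for t, o in zip(target, other)]
    return mixed, ratio


mixed, ratio = mixup_pair([1.0, 1.0], [0.0, 0.0], rng=random.Random(0))
```

Because the ratio stays above 0.5 with the default bounds, the target sample always has the higher weight, so its segmentation mask remains the one to use.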

class connectomics.data.augmentation.MotionBlur(sections=2, kernel_size=11, p=0.5, additional_targets=None, skip_targets=[])[source]¶

Motion blur data augmentation of image stacks. This augmentation is only applied to images.

Parameters

sections (int) – number of sections along z dimension to apply motion blur. Default: 2
kernel_size (int) – kernel size for motion blur. Default: 11
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –

set_params()[source]¶: There is no change in sample size.

class connectomics.data.augmentation.Rescale(low=0.8, high=1.25, fix_aspect=False, p=0.5, additional_targets=None, skip_targets=[])[source]¶

Rescale augmentation. This augmentation is applied to both images and masks.

Parameters

low (float) – lower bound of the random scale factor. Default: 0.8
high (float) – higher bound of the random scale factor. Default: 1.25
fix_aspect (bool) – fix aspect ratio or not. Default: False
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –

set_params()[source]¶: The rescale augmentation is only applied to the xy-plane. The required sample size before transformation needs to be larger, as decided by the lowest scaling factor (self.low).

class connectomics.data.augmentation.Rotate(rot90=True, p=0.5, additional_targets=None, skip_targets=[])[source]¶

Continuous rotation of the xy-plane.

If the rotation degree is arbitrary, the sample size for x- and y-axes should be at least \(\sqrt{2}\) times larger than the input size to ensure there is no non-valid region after center-crop. This augmentation is applied to both images and masks.

Parameters

rot90 (bool) – rotate the sample by only 90 degrees. Default: True
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –

set_params()[source]¶: The rotation augmentation is only applied to the xy-plane. If self.rot90=True, then there is no change in sample size. For an arbitrary rotation degree, the required sample size before transformation needs to be \(\sqrt{2}\) times larger.
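
The size requirement can be sketched as follows. The helper rotate_sample_size is hypothetical, and rounding up to the next integer is an assumption; the library may round differently.

```python
import math


def rotate_sample_size(input_size, rot90=True):
    """Return the (z, y, x) size to sample before rotating the xy-plane."""
    if rot90:
        return tuple(input_size)
    z, y, x = input_size
    return (z, math.ceil(y * math.sqrt(2)), math.ceil(x * math.sqrt(2)))


size = rotate_sample_size((8, 256, 256), rot90=False)
```

The factor \(\sqrt{2}\) comes from the diagonal of a square: a 45-degree rotation needs source pixels up to that distance from the center to fill the cropped output.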

class connectomics.data.augmentation.TestAugmentor(mode='mean', do_2d=False, num_aug=None, scale_factors=[1.0, 1.0, 1.0], inference_act=None)[source]¶

Test-time spatial augmentor.

Our test-time augmentation includes horizontal/vertical flips over the xy-plane, swap of x and y axes, and flip in z-dimension, resulting in 16 variants. Considering inference efficiency, we also provide the option to apply only horizontal/vertical flips over the xy-plane, resulting in 4 variants. The augmentation can also be applied to 2D outputs without the z-flip. By default the test-time augmentor returns the pixel-wise mean value of the predictions.

Parameters

mode (str) – one of 'min', 'max' or 'mean'. Default: 'mean'
do_2d (bool) – the test-time augmentation is applied to 2d images. Default: False
num_aug (int, optional) – number of data augmentation variants: 4, 8 or 16 (3D only). Default: None
scale_factors (List[float]) – scale factors for resizing the model output. Default: [1.0, 1.0, 1.0]

Examples::

>>> from connectomics.data.augmentation import TestAugmentor
>>> test_augmentor = TestAugmentor(mode='mean', num_aug=16)
>>> output = test_augmentor(model, inputs) # output is a numpy.ndarray on CPU
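
The variant counts quoted above follow from combining independent binary choices. The function tta_variants is a hypothetical sketch that only enumerates the settings; the ordering of the tuples is an assumption and no model is run.

```python
from itertools import product


def tta_variants(xy_flips_only=False):
    """List (x_flip, y_flip, z_flip, swap_xy) settings for each variant."""
    flags = [False, True]
    if xy_flips_only:
        return [(xf, yf, False, False) for xf, yf in product(flags, flags)]
    return list(product(flags, flags, flags, flags))


full = tta_variants()                        # 2 * 2 * 2 * 2 combinations
xy_only = tta_variants(xy_flips_only=True)   # 2 * 2 combinations
```

Each prediction would be transformed back to the original orientation before the pixel-wise aggregation selected by mode is applied.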

classmethod build_from_cfg(cfg, activation=True)[source]¶: Build a TestAugmentor from configs.

update_name(name)[source]¶: Update the name of the output file to indicate applied test-time augmentations.

connectomics.data.augmentation.build_train_augmentor(cfg, keep_uncropped=False, keep_non_smoothed=False)[source]¶

Build the training augmentor based on the options specified in the configuration file.

Parameters

cfg (yacs.config.CfgNode) – YACS configuration options.
keep_uncropped (bool) – keep uncropped data in the output. Default: False
keep_non_smoothed (bool) – keep the masks before smoothing in the output. Default: False

Note

The two arguments, keep_uncropped and keep_non_smoothed, are used only for debugging. They are False by default and cannot be adjusted in the config file.
