connectomics.data
Datasets
-
class connectomics.data.dataset.TileDataset(chunk_num=[2, 2, 2], chunk_ind=None, chunk_ind_split=None, chunk_iter=-1, chunk_stride=True, volume_json=['path/to/image.json'], label_json=None, valid_mask_json=None, mode='train', pad_size=[0, 0, 0], data_scale=[1.0, 1.0, 1.0], coord_range=None, do_relabel=True, **kwargs)
Dataset class for large-scale tile-based datasets. Large-scale volumetric datasets are usually stored as individual tiles, so loading them directly as a single array for training and inference is infeasible. This class reads the paths of the tiles and constructs smaller chunks for processing.
- Parameters
chunk_num (list) – volume splitting parameters in \((z, y, x)\) order. Default: \([2, 2, 2]\)
chunk_ind (list) – predefined list of chunks. Default: None
chunk_ind_split (list) – rank and world_size for splitting chunk_ind in multi-processing. Default: None
chunk_iter (int) – number of iterations on each chunk. Default: -1
chunk_stride (bool) – allow overlap between chunks. Default: True
volume_json (list) – JSON file(s) for the input image. Default: ['path/to/image.json']
label_json (str, optional) – json file for label. Default: None
valid_mask_json (str, optional) – json file for valid mask. Default: None
mode (str) – 'train', 'val' or 'test'. Default: 'train'
pad_size (list) – padding parameters in \((z, y, x)\) order. Default: \([0, 0, 0]\)
data_scale (list) – volume scaling factors in \((z, y, x)\) order. Default: \([1.0, 1.0, 1.0]\)
coord_range (list) – the valid coordinate range of volumes. Default: None
do_relabel (bool) – reduce the mask indices in a sampled label volume. This option should be set to False for semantic segmentation; otherwise the class indices can shift. Default: True
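To make chunk_num and chunk_ind_split concrete, here is a minimal sketch. The helpers `enumerate_chunks` and `split_chunks` are hypothetical, not part of the library, and two behaviors are assumptions: chunks are enumerated in (z, y, x) raster order, and a "rank-world_size" string assigns each node one contiguous, near-equal part of the chunk list.

```python
# Hypothetical helpers illustrating chunk_num and chunk_ind_split; not library API.
import itertools


def enumerate_chunks(chunk_num):
    """List (z, y, x) chunk indices for a chunk_num grid, in raster order (assumed)."""
    return list(itertools.product(*(range(n) for n in chunk_num)))


def split_chunks(chunks, chunk_ind_split):
    """Return the contiguous part of `chunks` that one node/process handles.

    `chunk_ind_split` is a "rank-world_size" string such as "0-5".
    """
    rank, world_size = (int(v) for v in chunk_ind_split.split("-"))
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    base, extra = divmod(len(chunks), world_size)
    # The first `extra` parts receive one additional chunk each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return chunks[start:stop]


chunks = enumerate_chunks([2, 2, 2])  # 8 chunks in total
print(split_chunks(chunks, "0-5"))    # [(0, 0, 0), (0, 0, 1)]
```

With 8 chunks and 5 parts, the first three parts get two chunks each and the last two get one, so every chunk is processed exactly once across the 5 nodes.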
Note
To run inference on multiple nodes asynchronously, chunk_ind_split specifies how many parts the chunks are split into and which part the current node/process handles. For example, chunk_ind_split = "0-5" means the chunks are split into 5 parts (which can thus be processed asynchronously on 5 nodes), and the current node/process handles the first (0-based) part.
Note
The coord_range option specifies the region of a volume to use. Suppose the first input volume has a shape of (1000, 10000, 10000) voxels and only the center subvolume of size (400, 2000, 2000) is needed for training or inference; then set coord_range=[[300, 700, 4000, 6000, 4000, 6000]].
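The numbers in the coord_range example follow from centering the subvolume along each axis. The sketch below reproduces them with a hypothetical helper, `centered_coord_range`. It assumes each pair is a [start, stop) range, which agrees with 700 - 300 = 400 in the example.

```python
def centered_coord_range(volume_shape, sub_shape):
    """Return [z0, z1, y0, y1, x0, x1] for a subvolume centered in the volume."""
    coords = []
    for full, sub in zip(volume_shape, sub_shape):
        start = (full - sub) // 2          # equal margins on both sides
        coords.extend([start, start + sub])
    return coords


# One entry per input volume, matching the nested-list form in the note.
coord_range = [centered_coord_range((1000, 10000, 10000), (400, 2000, 2000))]
print(coord_range)  # [[300, 700, 4000, 6000, 4000, 6000]]
```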
-
class connectomics.data.dataset.VolumeDataset(volume, label=None, valid_mask=None, valid_ratio=0.5, sample_volume_size=(8, 64, 64), sample_label_size=(8, 64, 64), sample_stride=(1, 1, 1), augmentor=None, target_opt=['1'], weight_opt=[['1']], erosion_rates=None, dilation_rates=None, mode='train', do_2d=False, iter_num=-1, do_relabel=True, reject_size_thres=0, reject_diversity=0, reject_p=0.95, data_mean=0.5, data_std=0.5, data_match_act='none')
Dataset class for volumetric image datasets. At training time, subvolumes are randomly sampled from all the large input volumes with (optional) rejection sampling to increase the frequency of foreground regions in a batch. At inference time, subvolumes are yielded in a sliding-window manner with overlap to counter border artifacts.
- Parameters
volume (list) – list of image volumes.
label (list, optional) – list of label volumes. Default: None
valid_mask (list, optional) – list of valid masks. Default: None
valid_ratio (float) – volume ratio threshold for valid samples. Default: 0.5
augmentor (connectomics.data.augmentation.composition.Compose, optional) – data augmentor for training. Default: None
target_opt (list) – list of model targets generated from segmentation labels.
weight_opt (list) – list of options for generating pixel-wise weight masks.
mode (str) – 'train', 'val' or 'test'. Default: 'train'
do_2d (bool) – load 2d samples from 3d volumes. Default: False
iter_num (int) – total number of training iterations (-1 for inference). Default: -1
do_relabel (bool) – reduce the mask indices in a sampled label volume. This option should be set to False for semantic segmentation; otherwise the class indices can shift. Default: True
reject_size_thres (int, optional) – threshold to decide if a sampled volume contains foreground objects. Default: 0
reject_diversity (int, optional) – threshold to decide if a sampled volume contains multiple objects. Default: 0
reject_p (float, optional) – probability of rejecting non-foreground volumes. Default: 0.95
data_mean (float) – mean of pixels for images normalized to (0,1). Default: 0.5
data_std (float) – standard deviation of pixels for images normalized to (0,1). Default: 0.5
data_match_act (str) – the data is normalized to match the range of an activation. Default: 'none'
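The rejection-sampling parameters (reject_size_thres, reject_diversity, reject_p) can be read as follows. This is a sketch of the idea, not the library's code. The helpers `should_reject` and `sample_subvolume` are hypothetical, and so is the exact test: "foreground" means nonzero voxels, and "objects" means distinct nonzero label ids.

```python
import numpy as np


def should_reject(label_patch, reject_size_thres=0, reject_diversity=0,
                  reject_p=0.95, rng=None):
    """Decide whether to discard one sampled label subvolume (hypothetical logic)."""
    rng = np.random.default_rng() if rng is None else rng
    # Too little foreground (nonzero voxels)?
    too_small = reject_size_thres > 0 and np.count_nonzero(label_patch) < reject_size_thres
    # Too few distinct objects (nonzero ids, i.e. background excluded)?
    n_objects = np.count_nonzero(np.unique(label_patch))
    too_uniform = reject_diversity > 0 and n_objects < reject_diversity
    # Failing samples are rejected with probability reject_p, so a few are kept.
    return bool((too_small or too_uniform) and rng.random() < reject_p)


def sample_subvolume(label, size, rng, max_tries=1000, **reject_kwargs):
    """Randomly crop `size` from `label`, retrying while the crop is rejected."""
    for _ in range(max_tries):
        start = [int(rng.integers(0, s - p + 1)) for s, p in zip(label.shape, size)]
        crop = tuple(slice(a, a + p) for a, p in zip(start, size))
        if not should_reject(label[crop], rng=rng, **reject_kwargs):
            return crop
    raise RuntimeError("no acceptable subvolume found")


label = np.zeros((8, 64, 64), dtype=np.uint16)
label[:, 20:40, 20:40] = 1                      # one 8 x 20 x 20 object
crop = sample_subvolume(label, (8, 32, 32), np.random.default_rng(0),
                        reject_size_thres=3200, reject_p=1.0)
```

Because reject_p is below 1 by default, a small fraction of background-only samples still reaches the batch. That keeps the model exposed to empty regions.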
Note
For relatively small volumes, the total number of possible subvolumes can be smaller than the total number of samples required in training (the product of the total iterations and the mini-batch size), which raises StopIteration. Therefore, the dataset length is also determined by the training settings.
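The note above can be checked with simple arithmetic: count the subvolume positions that fit inside a volume and compare the result with iter_num times the batch size. The helper `num_subvolumes` and the iteration and batch numbers are hypothetical. The count also ignores partial windows at the borders.

```python
def num_subvolumes(volume_shape, sample_size, stride=(1, 1, 1)):
    """Count the subvolume positions that fit entirely inside one volume."""
    n = 1
    for v, s, st in zip(volume_shape, sample_size, stride):
        n *= (v - s) // st + 1
    return n


available = num_subvolumes((20, 100, 100), (8, 64, 64))   # 13 * 37 * 37 = 17797
required = 10000 * 8   # hypothetical iter_num * mini-batch size
print(available, required, available < required)          # 17797 80000 True
```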
-
class connectomics.data.dataset.VolumeDatasetCond(volume, label, label_type='syn', augmentor=None, sample_size=(9, 65, 65), weight_opt=[['1']], mode='train', iter_num=-1, data_mean=0.5, data_std=0.5)
Dataset class for volumetric images in conditional segmentation. The label volumes are always required for this class.
- Parameters
label (list) – list of label volumes.
volume (list) – list of image volumes.
label_type (str) – type of the annotation. Default: 'syn'
augmentor (connectomics.data.augmentation.composition.Compose, optional) – data augmentor for training. Default: None
sample_size (tuple) – model input size. Default: (9, 65, 65)
weight_opt (list) – list of options for generating pixel-wise weight masks.
mode (str) – 'train', 'val' or 'test'. Default: 'train'
iter_num (int) – total number of training iterations (-1 for inference). Default: -1
data_mean (float) – mean of pixels for images normalized to (0,1). Default: 0.5
data_std (float) – standard deviation of pixels for images normalized to (0,1). Default: 0.5
-
class connectomics.data.dataset.VolumeDatasetRecon(volume, label=None, valid_mask=None, valid_ratio=0.5, sample_volume_size=(8, 64, 64), sample_label_size=(8, 64, 64), sample_stride=(1, 1, 1), augmentor=None, target_opt=['1'], weight_opt=[['1']], erosion_rates=None, dilation_rates=None, mode='train', do_2d=False, iter_num=-1, do_relabel=True, reject_size_thres=0, reject_diversity=0, reject_p=0.95, data_mean=0.5, data_std=0.5, data_match_act='none')
-
connectomics.data.dataset.build_dataloader(cfg, augmentor=None, mode='train', dataset=None, rank=None, dataset_class=<class 'connectomics.data.dataset.dataset_volume.VolumeDataset'>, dataset_options={}, cf=<function collate_fn_train>)
Prepare the dataloader for training and inference.
Augmentations
-
class connectomics.data.augmentation.Compose(transforms=[], input_size=(8, 256, 256), smooth=True, keep_uncropped=False, keep_non_smoothed=False, additional_targets=None)
Composing a list of data transforms.
The sample size of the composed augmentor can be larger than the specified input size of the model to ensure that all pixels are valid after center-crop.
- Parameters
transforms (list) – list of transformations to compose.
input_size (tuple) – input size of model in \((z, y, x)\) order. Default: \((8, 256, 256)\)
smooth (bool) – smoothing the object mask with Gaussian filtering. Default: True
keep_uncropped (bool) – keep uncropped image and label. Default: False
keep_non_smoothed (bool) – keep the non-smoothed object mask. Default: False
additional_targets (dict, optional) – additional targets to augment. Default: None
- Examples::
>>> from connectomics.data.augmentation import (Compose, Elastic, Flip,
...     Grayscale, MissingParts, Rotate)
>>> # specify additional targets besides 'image'
>>> kwargs = {'additional_targets': {'label': 'mask'}}
>>> augmentor = Compose([Rotate(p=1.0, **kwargs),
...                      Flip(p=1.0, **kwargs),
...                      Elastic(alpha=12.0, p=0.75, **kwargs),
...                      Grayscale(p=0.75, **kwargs),
...                      MissingParts(p=0.9, **kwargs)],
...                     input_size=(8, 256, 256), **kwargs)
>>> sample = {'image': input, 'label': label}
>>> augmented = augmentor(sample)
>>> out_input, out_label = augmented['image'], augmented['label']
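The Compose description says the composed sample size can exceed the model input size so that a center crop contains only valid voxels. The sketch below illustrates that bookkeeping under a stated assumption: each transform reports a per-axis multiplicative `ratio` and an additive `add` margin, applied one transform at a time. `composed_sample_size`, `center_crop`, and the two parameter dicts are hypothetical; they are not the library's sample_params format.

```python
import math

import numpy as np


def composed_sample_size(input_size, sample_params):
    """Grow the model input size so every transform still leaves a valid center crop.

    Hypothetical: each transform reports a per-axis multiplicative `ratio` and an
    additive `add` margin, accumulated one transform at a time.
    """
    size = list(input_size)
    for params in sample_params:
        size = [math.ceil(s * r) + a for s, r, a in zip(size, params["ratio"], params["add"])]
    return tuple(size)


def center_crop(volume, input_size):
    """Crop the central `input_size` region of a (Z, Y, X) array."""
    starts = [(s - t) // 2 for s, t in zip(volume.shape, input_size)]
    return volume[tuple(slice(a, a + t) for a, t in zip(starts, input_size))]


rotate = {"ratio": (1.0, math.sqrt(2), math.sqrt(2)), "add": (0, 0, 0)}   # arbitrary-angle rotation
misalign = {"ratio": (1.0, 1.0, 1.0), "add": (0, 16, 16)}                 # 16-pixel displacement
size = composed_sample_size((8, 256, 256), [rotate, misalign])
print(size)                                               # (8, 379, 379)
print(center_crop(np.zeros(size), (8, 256, 256)).shape)  # (8, 256, 256)
```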
-
class connectomics.data.augmentation.CopyPasteAugmentor(aug_thres=0.7, p=0.8, additional_targets={'label': 'mask'}, skip_targets=[])
Copy-paste augmentor (experimental).
The input can be a numpy.ndarray or torch.Tensor of shape \((C, Z, Y, X)\) or \((Z, Y, X)\).
- Parameters
-
class connectomics.data.augmentation.CutBlur(length_ratio=0.25, down_ratio_min=2.0, down_ratio_max=8.0, downsample_z=False, p=0.5, additional_targets=None, skip_targets=[])
3D CutBlur data augmentation, adapted from https://arxiv.org/abs/2004.00448.
Randomly downsample a cuboid region in the volume to force the model to learn super-resolution when making predictions. This augmentation is only applied to images.
- Parameters
length_ratio (float) – ratio of the cuboid length to the volume length. Default: 0.25
down_ratio_min (float) – minimum downsampling ratio used to generate the low-res region. Default: 2.0
down_ratio_max (float) – maximum downsampling ratio used to generate the low-res region. Default: 8.0
downsample_z (bool) – downsample along the z axis. Default: False
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –
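Here is a minimal NumPy sketch of the CutBlur idea: pick a random cuboid, downsample it, and write it back at the original resolution. `cut_blur` is a hypothetical function, not the library's implementation. It simplifies resampling to striding plus nearest-neighbor repetition and rounds the random ratio to an integer.

```python
import numpy as np


def cut_blur(volume, length_ratio=0.25, down_ratio_min=2.0, down_ratio_max=8.0,
             downsample_z=False, rng=None):
    """CutBlur-style sketch on a (Z, Y, X) array.

    A random cuboid is downsampled by striding and upsampled back by
    nearest-neighbor repetition (a simplification of smoother resampling).
    """
    rng = np.random.default_rng() if rng is None else rng
    out = volume.copy()
    sizes = [max(1, int(s * length_ratio)) for s in volume.shape]
    starts = [int(rng.integers(0, s - c + 1)) for s, c in zip(volume.shape, sizes)]
    region = tuple(slice(a, a + c) for a, c in zip(starts, sizes))
    cuboid = volume[region]
    ratio = max(1, int(round(rng.uniform(down_ratio_min, down_ratio_max))))
    factors = (ratio if downsample_z else 1, ratio, ratio)
    low = cuboid[::factors[0], ::factors[1], ::factors[2]]  # low-res region
    up = low
    for axis, f in enumerate(factors):
        up = np.repeat(up, f, axis=axis)                     # back to at least the cuboid size
    out[region] = up[:cuboid.shape[0], :cuboid.shape[1], :cuboid.shape[2]]
    return out


vol = np.random.default_rng(0).random((16, 64, 64))
aug = cut_blur(vol, rng=np.random.default_rng(1))
```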
-
class connectomics.data.augmentation.CutNoise(length_ratio=0.25, mode='uniform', scale=0.2, p=0.5, additional_targets=None, skip_targets=[])
3D CutNoise data augmentation.
Randomly add noise to a cuboid region in the volume to force the model to learn denoising when making predictions. This augmentation is only applied to images.
- Parameters
length_ratio (float) – the ratio of the cuboid length compared with volume length.
mode (string) – the distribution of the noise pattern. Default: 'uniform'
scale (float) – scale of the random noise. Default: 0.2
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –
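A sketch of CutNoise with the 'uniform' mode follows. It assumes images normalized to [0, 1] and noise drawn from [-scale, scale]. `cut_noise` is a hypothetical function, and the clipping step is an assumption too.

```python
import numpy as np


def cut_noise(volume, length_ratio=0.25, scale=0.2, rng=None):
    """Add uniform noise in [-scale, scale] to a random cuboid of a (Z, Y, X) array in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    out = volume.astype(np.float64, copy=True)
    sizes = [max(1, int(s * length_ratio)) for s in volume.shape]
    starts = [int(rng.integers(0, s - c + 1)) for s, c in zip(volume.shape, sizes)]
    region = tuple(slice(a, a + c) for a, c in zip(starts, sizes))
    noise = rng.uniform(-scale, scale, size=out[region].shape)
    out[region] = np.clip(out[region] + noise, 0.0, 1.0)  # stay in the image range
    return out


vol = np.random.default_rng(0).random((16, 64, 64))
noisy = cut_noise(vol, rng=np.random.default_rng(1))
```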
-
class connectomics.data.augmentation.DataAugment(p=0.5, additional_targets=None, skip_targets=[])
DataAugment interface. A data augmentor needs to conduct the following steps:
1. Set sample_params at initialization to compute the required sample size.
2. Randomly generate augmentation parameters for the current transform.
3. Apply the transform to a pair of images and corresponding labels.
All concrete data augmentations (except the mix-up augmentor and the test-time augmentor) should be subclasses of this class.
- Parameters
-
abstract set_params()
Calculate the appropriate sample size with data augmentation.
Some data augmentations (warping, misalignment, etc.) require a larger sample size than the original, depending on the augmentation parameters that are randomly chosen. This function takes the data augmentation parameters and returns an updated data sampling size accordingly.
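The three steps of the interface can be seen in a self-contained toy. `AugmentSketch` and `RandomShiftXY` are hypothetical stand-ins, not the library's DataAugment. The `ratio`/`add` layout of sample_params is also an assumption.

```python
from abc import ABC, abstractmethod

import numpy as np


class AugmentSketch(ABC):
    """Hypothetical stand-in for the DataAugment interface (not the library class)."""

    def __init__(self, p=0.5):
        self.p = p
        # Step 1: describe how much larger than the model input a sample must be.
        self.sample_params = {"ratio": np.ones(3), "add": np.zeros(3, dtype=int)}
        self.set_params()

    @abstractmethod
    def set_params(self):
        """Update self.sample_params from the augmentation parameters."""

    @abstractmethod
    def __call__(self, sample, rng):
        """Augment a dict holding 'image' and 'label' arrays."""


class RandomShiftXY(AugmentSketch):
    """Toy augmentor: crop an xy window at a random offset of up to `max_shift` voxels."""

    def __init__(self, max_shift=4, p=0.5):
        self.max_shift = max_shift  # must exist before the base class calls set_params()
        super().__init__(p)

    def set_params(self):
        self.sample_params["add"] = np.array([0, self.max_shift, self.max_shift])

    def __call__(self, sample, rng):
        image, label = sample["image"], sample["label"]
        # Step 2: draw random parameters for this call (central crop if not applied).
        if rng.random() < self.p:
            dy, dx = (int(v) for v in rng.integers(0, self.max_shift + 1, size=2))
        else:
            dy = dx = self.max_shift // 2
        h = image.shape[1] - self.max_shift
        w = image.shape[2] - self.max_shift
        # Step 3: apply the identical transform to the image and its label.
        crop = (slice(None), slice(dy, dy + h), slice(dx, dx + w))
        return {"image": image[crop], "label": label[crop]}


aug = RandomShiftXY(max_shift=4, p=1.0)
image = np.arange(8 * 36 * 36).reshape(8, 36, 36)
out = aug({"image": image, "label": image.copy()}, np.random.default_rng(0))
```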
-
class connectomics.data.augmentation.Elastic(alpha=16.0, sigma=4.0, p=0.5, additional_targets=None, skip_targets=[])
Elastic deformation of images as described in [Simard2003] (with modifications). The implementation is based on https://gist.github.com/erniejunior/601cdf56d2b424757de5. This augmentation is applied to both images and masks.
[Simard2003] Simard, Steinkraus and Platt, “Best Practices for Convolutional Neural Networks applied to Visual Document Analysis”, in Proc. of the International Conference on Document Analysis and Recognition, 2003.
- Parameters
alpha (float) – maximum pixel-moving distance of elastic deformation. Default: 16.0
sigma (float) – standard deviation of the Gaussian filter. Default: 4.0
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –
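The following sketch shows a Simard-style elastic deformation built from SciPy's `gaussian_filter` and `map_coordinates`. `elastic_xy` is hypothetical. It assumes one xy displacement field shared by all z-sections, linear interpolation for images, and nearest-neighbor interpolation for masks so that label ids survive. Smoothing uniform noise in [-1, 1] keeps the field inside [-1, 1]. Scaling by `alpha` therefore bounds each per-axis displacement by `alpha` pixels, which matches the parameter description.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates


def elastic_xy(image, mask, alpha=16.0, sigma=4.0, rng=None):
    """Simard-style elastic deformation applied identically to every z-section."""
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = image.shape
    # Smoothed random displacement fields in [-1, 1], scaled to at most alpha pixels.
    dy = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha
    dx = gaussian_filter(rng.uniform(-1, 1, size=(h, w)), sigma) * alpha
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([yy + dy, xx + dx])
    # Linear interpolation for intensities, nearest neighbor to keep label ids intact.
    warped_image = np.stack([map_coordinates(s, coords, order=1, mode="reflect") for s in image])
    warped_mask = np.stack([map_coordinates(s, coords, order=0, mode="reflect") for s in mask])
    return warped_image, warped_mask


img = np.random.default_rng(0).random((2, 32, 32))
lbl = (img > 0.5).astype(np.uint8)
warped_img, warped_lbl = elastic_xy(img, lbl, rng=np.random.default_rng(1))
```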
-
class connectomics.data.augmentation.Flip(do_ztrans=0, p=0.5, additional_targets=None, skip_targets=[])
Randomly flip along the z-, y- and x-axes and swap the y- and x-axes for anisotropic image volumes. For learning on isotropic image volumes, set do_ztrans to 1 to swap the z- and x-axes (the inputs need to be cubic). This augmentation is applied to both images and masks.
- Parameters
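The sketch below shows one way to express the Flip rule. Each operation gets one random coin, and the same rule is applied to the image and the mask. `random_flip` is hypothetical, and the independent coin per operation is an assumption.

```python
import numpy as np


def random_flip(image, mask, do_ztrans=0, rng=None):
    """Apply one random flip/transpose rule identically to image and mask (Z, Y, X)."""
    rng = np.random.default_rng() if rng is None else rng
    rule = rng.integers(0, 2, size=5)  # one coin per operation

    def apply(x):
        if rule[0]:
            x = x[::-1, :, :]            # flip z
        if rule[1]:
            x = x[:, ::-1, :]            # flip y
        if rule[2]:
            x = x[:, :, ::-1]            # flip x
        if rule[3]:
            x = x.transpose(0, 2, 1)     # swap y and x
        if do_ztrans and rule[4]:
            x = x.transpose(2, 1, 0)     # swap z and x (needs a cubic input)
        return np.ascontiguousarray(x)

    return apply(image), apply(mask)


img = np.arange(4 * 6 * 6).reshape(4, 6, 6)
flipped_img, flipped_mask = random_flip(img, img.copy(), rng=np.random.default_rng(0))
```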
-
class connectomics.data.augmentation.Grayscale(contrast_factor=0.3, brightness_factor=0.3, mode='mix', invert=False, invert_p=0.0, p=0.5, additional_targets=None, skip_targets=[])
Grayscale intensity augmentation, adapted from ELEKTRONN (http://elektronn.org/).
Randomly adjust contrast/brightness, randomly invert the color space and apply gamma correction. This augmentation is only applied to images.
- Parameters
contrast_factor (float) – intensity of contrast change. Default: 0.3
brightness_factor (float) – intensity of brightness change. Default: 0.3
mode (string) – one of '2D', '3D' or 'mix'. Default: 'mix'
invert (bool) – whether to invert the images. Default: False
invert_p (float) – probability of inverting the images. Default: 0.0
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –
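A sketch of the contrast, brightness, and gamma steps for the whole-volume ('3D') case follows. It assumes one set of random parameters per volume and images in [0, 1]. `grayscale_3d` is hypothetical. The exact formulas, including the [0.5, 2) gamma range, are assumptions modeled on the ELEKTRONN-style recipe rather than copied from the library. The invert option is omitted.

```python
import numpy as np


def grayscale_3d(volume, contrast_factor=0.3, brightness_factor=0.3, rng=None):
    """One random contrast/brightness/gamma change for a whole volume in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    out = volume.astype(np.float32, copy=True)
    out *= 1 + (rng.random() - 0.5) * contrast_factor    # contrast
    out += (rng.random() - 0.5) * brightness_factor      # brightness
    np.clip(out, 0.0, 1.0, out=out)
    out **= 2.0 ** (rng.random() * 2 - 1)                # gamma in [0.5, 2)
    return out


vol = np.random.default_rng(0).random((4, 16, 16))
adjusted = grayscale_3d(vol, rng=np.random.default_rng(1))
```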
-
class connectomics.data.augmentation.MisAlignment(displacement=16, rotate_ratio=0.0, p=0.5, additional_targets=None, skip_targets=[])
Mis-alignment data augmentation of image stacks. This augmentation is applied to both images and masks.
- Parameters
displacement (int) – maximum pixel displacement in xy-plane. Default: 16
rotate_ratio (float) – ratio of rotation-based mis-alignment. Default: 0.0
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –
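Translation-based misalignment (the rotate_ratio=0 case) shows why such transforms need a larger sample. The input must be `displacement` voxels larger in y and x than the output. `translate_misalign` is a hypothetical sketch: all sections from a random z index onward are cropped at a different xy offset than the sections before it.

```python
import numpy as np


def translate_misalign(image, mask, displacement=16, rng=None):
    """Shift all sections from a random z index on by a random xy offset.

    Input (Z, H, W) yields output (Z, H - displacement, W - displacement); Z >= 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    z, h, w = image.shape
    out_h, out_w = h - displacement, w - displacement
    idx = int(rng.integers(1, z))                     # first misaligned section
    y0, x0, y1, x1 = (int(v) for v in rng.integers(0, displacement + 1, size=4))

    def apply(x):
        out = np.empty((z, out_h, out_w), dtype=x.dtype)
        out[:idx] = x[:idx, y0:y0 + out_h, x0:x0 + out_w]   # aligned part
        out[idx:] = x[idx:, y1:y1 + out_h, x1:x1 + out_w]   # shifted part
        return out

    return apply(image), apply(mask)


img = np.arange(8 * 64 * 64).reshape(8, 64, 64)
mis_img, mis_mask = translate_misalign(img, img.copy(), rng=np.random.default_rng(0))
```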
-
class connectomics.data.augmentation.MissingParts(iterations=64, p=0.5, additional_targets=None, skip_targets=[])
Missing-parts augmentation of image stacks. This augmentation is only applied to images.
- Parameters
-
class connectomics.data.augmentation.MissingSection(num_sections=2, p=0.5, additional_targets=None, skip_targets=[])
Missing-section augmentation of image stacks. This augmentation is applied to both images and masks.
- Parameters
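The description does not say how sections go missing. One plausible realization is to delete whole sections, which would explain why masks are transformed too. Under that reading, the sample needs num_sections extra sections. The sketch below assumes deletion of random interior sections. `delete_sections` is hypothetical.

```python
import numpy as np


def delete_sections(image, mask, num_sections=2, rng=None):
    """Remove the same random interior z-sections from image and mask."""
    rng = np.random.default_rng() if rng is None else rng
    z = image.shape[0]
    # Keep the first and last sections; requires z >= num_sections + 2.
    idx = rng.choice(np.arange(1, z - 1), size=num_sections, replace=False)
    return np.delete(image, idx, axis=0), np.delete(mask, idx, axis=0)


img = np.arange(10 * 8 * 8).reshape(10, 8, 8)
short_img, short_mask = delete_sections(img, img.copy(), rng=np.random.default_rng(0))
```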
-
class connectomics.data.augmentation.MixupAugmentor(min_ratio=0.7, max_ratio=0.9, num_aug=2)
Mixup augmentor (experimental). Conducts linear interpolation between two image samples. The segmentation mask of the sample with the higher weight should be used with the augmented output.
The input can be a numpy.ndarray or torch.Tensor of shape \((B, C, Z, Y, X)\).
- Parameters
- Examples::
>>> from connectomics.data.augmentation import MixupAugmentor
>>> mixup_augmentor = MixupAugmentor(num_aug=2)
>>> volume = mixup_augmentor(volume)
>>> pred = model(volume)
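A NumPy sketch of the mixup step on a (B, C, Z, Y, X) batch follows, assuming B >= 2. `mixup` is hypothetical, and choosing a random partner for each mixed sample is an assumption. With the default ratios, each mixed sample keeps at least 70% of its own intensity, so its own mask stays the appropriate target.

```python
import numpy as np


def mixup(batch, min_ratio=0.7, max_ratio=0.9, num_aug=2, rng=None):
    """Mix `num_aug` samples of a (B, C, Z, Y, X) batch with other samples."""
    rng = np.random.default_rng() if rng is None else rng
    out = batch.astype(np.float64, copy=True)
    b = batch.shape[0]
    for i in rng.choice(b, size=min(num_aug, b), replace=False):
        j = rng.choice([k for k in range(b) if k != i])   # partner sample
        ratio = rng.uniform(min_ratio, max_ratio)
        # Sample i keeps the larger weight, so its own mask remains the target.
        out[i] = ratio * batch[i] + (1 - ratio) * batch[j]
    return out


batch = np.stack([np.full((1, 2, 4, 4), float(k)) for k in range(4)])
mixed = mixup(batch, rng=np.random.default_rng(0))
```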
-
class connectomics.data.augmentation.MotionBlur(sections=2, kernel_size=11, p=0.5, additional_targets=None, skip_targets=[])
Motion blur data augmentation of image stacks. This augmentation is only applied to images.
- Parameters
sections (int) – number of sections along z dimension to apply motion blur. Default: 2
kernel_size (int) – kernel size for motion blur. Default: 11
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –
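Motion blur can be approximated by convolving each row of a few random sections with a box kernel. `motion_blur` is hypothetical. It assumes a horizontal kernel and rows at least kernel_size long. It returns the blurred indices only so the result can be checked.

```python
import numpy as np


def motion_blur(volume, sections=2, kernel_size=11, rng=None):
    """Blur `sections` random z-sections of a (Z, Y, X) array along x."""
    rng = np.random.default_rng() if rng is None else rng
    out = volume.astype(np.float64, copy=True)
    kernel = np.ones(kernel_size) / kernel_size       # horizontal box kernel
    blurred = rng.choice(volume.shape[0], size=sections, replace=False)
    for z in blurred:
        # Convolve every row (axis 1 of the 2D section) with the kernel.
        out[z] = np.apply_along_axis(
            lambda row: np.convolve(row, kernel, mode="same"), 1, out[z])
    return out, blurred


vol = np.ones((4, 8, 32))
blurry, blurred_idx = motion_blur(vol, rng=np.random.default_rng(0))
```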
-
class connectomics.data.augmentation.Rescale(low=0.8, high=1.25, fix_aspect=False, p=0.5, additional_targets=None, skip_targets=[])
Rescale augmentation. This augmentation is applied to both images and masks.
- Parameters
low (float) – lower bound of the random scale factor. Default: 0.8
high (float) – upper bound of the random scale factor. Default: 1.25
fix_aspect (bool) – fix aspect ratio or not. Default: False
p (float) – probability of applying the augmentation. Default: 0.5
additional_targets (dict, optional) – additional targets to augment. Default: None
skip_targets (list) –
-
class connectomics.data.augmentation.Rotate(rot90=True, p=0.5, additional_targets=None, skip_targets=[])
Continuous rotation of the xy-plane.
If the rotation degree is arbitrary, the sample size for x- and y-axes should be at least \(\sqrt{2}\) times larger than the input size to ensure there is no non-valid region after center-crop. This augmentation is applied to both images and masks.
- Parameters
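The √2 factor comes from a simple geometric fact. A square of side S rotated by θ contains a centered, axis-aligned square of side S / (|cos θ| + |sin θ|). The sample side must therefore be at least L (|cos θ| + |sin θ|) for an input side L. This peaks at L√2 for θ = 45°. The helper names below are hypothetical.

```python
import math


def required_side(input_side, angle_deg):
    """Sample side so a rotated sample still contains an input_side x input_side center crop."""
    t = math.radians(angle_deg)
    return input_side * (abs(math.cos(t)) + abs(math.sin(t)))


def sample_side_any_angle(input_side):
    """Worst case over all angles (45 degrees): sqrt(2) * input_side, rounded up."""
    return math.ceil(math.sqrt(2) * input_side)


print(sample_side_any_angle(256))  # 363
```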
-
class connectomics.data.augmentation.TestAugmentor(mode='mean', do_2d=False, num_aug=None, scale_factors=[1.0, 1.0, 1.0], inference_act=None)
Test-time spatial augmentor.
Our test-time augmentation includes horizontal/vertical flips over the xy-plane, swap of x and y axes, and flip in z-dimension, resulting in 16 variants. Considering inference efficiency, we also provide the option to apply only horizontal/vertical flips over the xy-plane, resulting in 4 variants. The augmentation can also be applied to 2D outputs without the z-flip. By default the test-time augmentor returns the pixel-wise mean value of the predictions.
- Parameters
mode (str) – one of 'min', 'max' or 'mean'. Default: 'mean'
do_2d (bool) – the test-time augmentation is applied to 2d images. Default: False
num_aug (int, optional) – number of data augmentation variants: 4, 8 or 16 (3D only). Default: None
scale_factors (List[float]) – scale factors for resizing the model output. Default: [1.0, 1.0, 1.0]
- Examples::
>>> from connectomics.data.augmentation import TestAugmentor
>>> test_augmentor = TestAugmentor(mode='mean', num_aug=16)
>>> output = test_augmentor(model, inputs)  # output is a numpy.ndarray on CPU
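The 16 test-time variants are the combinations of 4 binary choices: z-flip, y-flip, x-flip, and the y/x swap. The NumPy sketch below uses a hypothetical function `tta` on a single (Z, Y, X) volume rather than the library's batched (B, C, Z, Y, X) tensors. It applies each variant, runs `predict`, undoes the transform, and reduces with mean, min, or max. A pointwise `predict` returns the plain prediction. A real model is not flip-equivariant, which is why combining the variants helps.

```python
import itertools

import numpy as np


def tta(predict, volume, mode="mean"):
    """Combine predictions over 16 flip/transpose variants of a (Z, Y, X) volume."""
    outputs = []
    for fz, fy, fx, swap in itertools.product([False, True], repeat=4):
        x = volume
        if fz:
            x = x[::-1]
        if fy:
            x = x[:, ::-1]
        if fx:
            x = x[:, :, ::-1]
        if swap:
            x = x.transpose(0, 2, 1)
        y = predict(np.ascontiguousarray(x))
        # Undo the transforms in reverse order so every output aligns with the input.
        if swap:
            y = y.transpose(0, 2, 1)
        if fx:
            y = y[:, :, ::-1]
        if fy:
            y = y[:, ::-1]
        if fz:
            y = y[::-1]
        outputs.append(y)
    reduce = {"mean": np.mean, "min": np.min, "max": np.max}[mode]
    return reduce(np.stack(outputs), axis=0)


vol = np.random.default_rng(0).random((4, 6, 10))
pred = tta(lambda x: 2 * x, vol)
```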
-
connectomics.data.augmentation.build_train_augmentor(cfg, keep_uncropped=False, keep_non_smoothed=False)
Build the training augmentor based on the options specified in the configuration file.
- Parameters
Note
The two arguments keep_uncropped and keep_non_smoothed are used only for debugging; they default to False and cannot be set in the config file.