scvi.dataloaders.BatchDistributedSampler

class scvi.dataloaders.BatchDistributedSampler(dataset, batch_size=128, drop_last=False, drop_dataset_tail=False, **kwargs)

EXPERIMENTAL: a sampler that restricts data loading to a subset of the dataset.

In contrast to DistributedSampler, this sampler retrieves a full minibatch of indices with a single call to the dataset's __getitem__, enabling efficient access to sparse data.

Parameters:
  • dataset (Dataset) – Dataset instance to sample from.

  • batch_size (int (default: 128)) – Minibatch size to load each iteration for each replica. Thus, the effective minibatch size is batch_size * num_replicas.

  • drop_last (bool (default: False)) – If True, the last incomplete batch is dropped when the per-replica indices are not evenly divisible by batch_size. If False, the last batch is kept and will be smaller than batch_size.

  • drop_dataset_tail (bool (default: False)) – If True, the sampler drops the tail of the dataset to make it evenly divisible by the number of replicas. If False, the sampler adds extra indices to make the dataset evenly divisible by the number of replicas.

  • **kwargs – Additional keyword arguments passed into DistributedSampler.
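The batching behavior described above can be illustrated with a minimal, self-contained sketch. Note that `batch_replica_indices` is a hypothetical helper written for illustration, not part of the scvi-tools API; it mimics how a distributed sampler assigns interleaved indices to a replica and then chunks them so that each chunk can be passed to the dataset's __getitem__ in one call.

```python
def batch_replica_indices(dataset_len, num_replicas, rank, batch_size, drop_last=False):
    """Hypothetical sketch of per-replica minibatch index grouping.

    Indices are assigned to this replica in an interleaved fashion
    (as in torch.utils.data.DistributedSampler), then chunked into
    lists of at most `batch_size` indices each.
    """
    # Interleaved slice of the dataset belonging to this replica.
    indices = list(range(rank, dataset_len, num_replicas))
    # One index list per __getitem__ call.
    batches = [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]
    # Optionally drop a trailing incomplete batch.
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches

# Two replicas over a 10-element dataset, batch_size=3, rank 0:
print(batch_replica_indices(10, num_replicas=2, rank=0, batch_size=3))
# → [[0, 2, 4], [6, 8]]  (last batch smaller, since drop_last=False)
```

With num_replicas=2 each replica loads batch_size indices per iteration, so the effective minibatch size across replicas is batch_size * num_replicas, matching the parameter description above.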

Methods table

set_epoch(epoch)

Set the epoch for this sampler.

Methods

BatchDistributedSampler.set_epoch(epoch)

Set the epoch for this sampler.

When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

Parameters:

epoch (int) – Epoch number.

Return type:

None
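The role of the epoch in per-epoch shuffling can be sketched without any distributed setup. The helper below, `epoch_permutation`, is hypothetical and only illustrates the principle: seeding the shuffle with the epoch number makes every replica derive the same permutation for a given epoch, while advancing the epoch yields a fresh ordering; if the epoch is never updated, the same ordering repeats.

```python
import random

def epoch_permutation(dataset_len, epoch, base_seed=0):
    """Hypothetical sketch of epoch-seeded shuffling.

    All replicas seed their RNG with (base_seed + epoch), so for a
    fixed epoch they agree on one permutation; a new epoch produces
    a new permutation.
    """
    rng = random.Random(base_seed + epoch)
    indices = list(range(dataset_len))
    rng.shuffle(indices)
    return indices

# Same epoch on every replica → identical ordering; this is why a
# training loop should call sampler.set_epoch(epoch) before each epoch.
assert epoch_permutation(8, epoch=1) == epoch_permutation(8, epoch=1)
```

In a typical training loop, set_epoch is called once at the start of each epoch, before iterating over the dataloader.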