scvi.dataloaders.AnnDataLoader

class scvi.dataloaders.AnnDataLoader(adata_manager, indices=None, batch_size=128, shuffle=False, sampler=None, drop_last=False, drop_dataset_tail=False, data_and_attributes=None, iter_ndarray=False, distributed_sampler=False, load_sparse_tensor=False, **kwargs)

DataLoader for loading tensors from AnnData objects.

Parameters:
  • adata_manager (AnnDataManager) – AnnDataManager object with a registered AnnData object.

  • indices (Union[list[int], list[bool], None] (default: None)) – The indices of the observations in adata_manager.adata to load.

  • batch_size (int (default: 128)) – Minibatch size to load each iteration. If distributed_sampler is True, refers to the minibatch size per replica. Thus, the effective minibatch size is batch_size * num_replicas.

  • shuffle (bool (default: False)) – Whether the dataset should be shuffled prior to sampling.

  • sampler (Optional[Sampler] (default: None)) – Defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If a sampler is given, shuffle must not be set. By default, a custom sampler is used that retrieves a whole minibatch of data with one call to __getitem__.

  • drop_last (bool (default: False)) – If True and the dataset is not evenly divisible by batch_size, the last incomplete batch is dropped. If False and the dataset is not evenly divisible by batch_size, then the last batch will be smaller than batch_size.

  • drop_dataset_tail (bool (default: False)) – Only used if distributed_sampler is True. If True, the sampler drops the tail of the dataset to make it evenly divisible by the number of replicas. If False, the sampler adds extra indices to make the dataset evenly divisible by the number of replicas.

  • data_and_attributes (Union[list[str], dict[str, dtype], None] (default: None)) – Either a dictionary or a list. As a dictionary, each key is a key in the data registry (adata_manager.data_registry) and each value is the numpy dtype to load that field as (later converted to a torch tensor). As a list, it holds only such keys, which is useful for loading a subset of fields when more tensors than needed have been registered. If None, defaults to all registered data.

  • iter_ndarray (bool (default: False)) – Whether to iterate over numpy arrays instead of torch tensors.

  • distributed_sampler (bool (default: False)) – EXPERIMENTAL. Whether to use BatchDistributedSampler as the sampler. If True, sampler must be None.

  • load_sparse_tensor (bool (default: False)) – EXPERIMENTAL. If True, loads data with sparse CSR or CSC layout as a Tensor with the same layout. Can lead to speedups in data transfers to GPUs, depending on the sparsity of the data.

  • **kwargs – Additional keyword arguments passed into DataLoader.
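The interplay of batch_size, drop_last, distributed_sampler, and drop_dataset_tail described above comes down to a little integer arithmetic. The following is a minimal sketch of that arithmetic in plain Python, not scvi-tools code: the helper names num_batches, samples_per_replica, and effective_batch_size are hypothetical, and the per-replica count assumes the tail is either truncated or padded up to the next multiple of the replica count, as the parameter description states.

```python
import math


def num_batches(n_obs: int, batch_size: int, drop_last: bool) -> int:
    """Number of minibatches produced for n_obs observations (hypothetical helper)."""
    if drop_last:
        # The incomplete final batch is discarded.
        return n_obs // batch_size
    # The final batch is kept and may be smaller than batch_size.
    return math.ceil(n_obs / batch_size)


def samples_per_replica(n_obs: int, num_replicas: int, drop_dataset_tail: bool) -> int:
    """Observations assigned to each replica under distributed sampling (assumed behavior)."""
    if drop_dataset_tail:
        # Tail observations are dropped so the split is even.
        return n_obs // num_replicas
    # Extra indices are added so the split is even.
    return math.ceil(n_obs / num_replicas)


def effective_batch_size(batch_size: int, num_replicas: int) -> int:
    """With a distributed sampler, batch_size is per replica."""
    return batch_size * num_replicas


# 1000 observations with the default batch_size of 128.
kept = num_batches(1000, 128, drop_last=False)
dropped = num_batches(1000, 128, drop_last=True)
last_batch_size = 1000 - dropped * 128
print(kept, dropped, last_batch_size)
print(samples_per_replica(1000, 3, drop_dataset_tail=True))
print(samples_per_replica(1000, 3, drop_dataset_tail=False))
print(effective_batch_size(128, 4))
```

With drop_last=False the final batch here holds the leftover observations; with drop_last=True those observations are not visited in that pass.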

Notes

If sampler is not specified, a BatchSampler instance is passed in as the sampler, which retrieves a minibatch of data with one call to __getitem__(). This is useful for fast access to sparse matrices, since retrieving single observations and then collating them is inefficient.
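The difference between batched and per-observation access can be illustrated with a toy example. This is a sketch in plain Python under stated assumptions, not the scvi-tools implementation: ToyBatchSampler and ToyDataset are hypothetical stand-ins that only mimic the idea of a sampler yielding a whole list of indices, so that the dataset is indexed once per minibatch.

```python
from typing import Iterator, List, Sequence


class ToyBatchSampler:
    """Hypothetical sampler that yields a full list of indices per step."""

    def __init__(self, indices: Sequence[int], batch_size: int, drop_last: bool = False):
        self.indices = list(indices)
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self) -> Iterator[List[int]]:
        for start in range(0, len(self.indices), self.batch_size):
            batch = self.indices[start : start + self.batch_size]
            if self.drop_last and len(batch) < self.batch_size:
                break  # skip the incomplete final batch
            yield batch


class ToyDataset:
    """Hypothetical dataset whose __getitem__ accepts a list of indices."""

    def __init__(self, rows: List[List[int]]):
        self.rows = rows
        self.getitem_calls = 0  # counts how often the dataset is indexed

    def __getitem__(self, batch_indices: List[int]) -> List[List[int]]:
        self.getitem_calls += 1
        # One call returns every row of the minibatch at once.
        return [self.rows[i] for i in batch_indices]


dataset = ToyDataset(rows=[[i, i * 10] for i in range(10)])
sampler = ToyBatchSampler(indices=range(10), batch_size=4)

batches = [dataset[batch_indices] for batch_indices in sampler]
batch_lengths = [len(batch) for batch in batches]
print(batch_lengths, dataset.getitem_calls)
```

Ten observations are covered with three indexing calls here, whereas fetching observations one at a time and collating them would index the dataset ten times. For a sparse matrix, each such call is a row-slicing operation, which is why fewer, larger slices tend to be cheaper.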

Attributes table

multiprocessing_context

dataset

batch_size

num_workers

pin_memory

drop_last

timeout

sampler

pin_memory_device

prefetch_factor

Methods table

check_worker_number_rationality()

Attributes

AnnDataLoader.multiprocessing_context
AnnDataLoader.dataset: Dataset[T_co]
AnnDataLoader.batch_size: Optional[int]
AnnDataLoader.num_workers: int
AnnDataLoader.pin_memory: bool
AnnDataLoader.drop_last: bool
AnnDataLoader.timeout: float
AnnDataLoader.sampler: Union[Sampler, Iterable]
AnnDataLoader.pin_memory_device: str
AnnDataLoader.prefetch_factor: Optional[int]

Methods

AnnDataLoader.check_worker_number_rationality()