torchuq.dataset Subpackage

This section is the Python API reference for the torchuq.dataset subpackage, which provides common evaluation datasets for uncertainty quantification (UQ).

torchuq.dataset.classification Module

torchuq.dataset.classification.get_classification_datasets(name, val_fraction=0.2, test_fraction=0.2, split_seed=0, normalize=True, verbose=True)

Returns the train, validation, and test splits of a classification dataset as torch Datasets.

Parameters
  • name (str) – name of the dataset.

  • val_fraction (float) – fraction of the dataset used for the validation set; if 0, the validation dataset is None.

  • test_fraction (float) – fraction of the dataset used for the test set; if 0, the test dataset is None.

  • split_seed (int) – seed used to generate the train/val/test split; if split_seed=-1, the dataset is not shuffled.

  • normalize (bool) – if True then normalize the dataset to have zero mean and unit variance.

  • verbose (bool) – if True then print additional messages.

Returns
  • train_dataset (torch.utils.data.Dataset) – training dataset.

  • val_dataset (torch.utils.data.Dataset) – validation dataset, None if val_fraction=0.0.

  • test_dataset (torch.utils.data.Dataset) – test dataset, None if test_fraction=0.0.
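The interaction between val_fraction, test_fraction, and split_seed described above can be illustrated with a small standalone sketch. This is not the library's implementation: split_indices is a hypothetical helper, and the rounding rule and the order of the three splits are assumptions. It only demonstrates the documented behavior, namely that a fraction of 0 yields None and that split_seed=-1 disables shuffling.

```python
import random


def split_indices(n, val_fraction=0.2, test_fraction=0.2, split_seed=0):
    """Hypothetical helper: partition range(n) into train/val/test index lists."""
    indices = list(range(n))
    if split_seed != -1:
        # A seeded shuffle makes the split reproducible; -1 keeps the original order.
        random.Random(split_seed).shuffle(indices)

    # Assumption: split sizes are rounded down, and the remainder goes to training.
    n_val = int(n * val_fraction)
    n_test = int(n * test_fraction)
    n_train = n - n_val - n_test

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val] if val_fraction > 0 else None
    test = indices[n_train + n_val:] if test_fraction > 0 else None
    return train, val, test


# With split_seed=-1 the original ordering is preserved.
train, val, test = split_indices(10, split_seed=-1)
print(train, val, test)
```

Because callers receive None for an empty split, code that consumes the validation or test set should check for None before wrapping it in a DataLoader.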

torchuq.dataset.ham Module

class torchuq.dataset.ham.HAM10000(df, transform=None)

The class for the HAM10000 dataset that inherits the torch Dataset interface.

__init__(df, transform=None)

torchuq.dataset.ham.get_ham10000(data_dir='.', test_fraction=0.2, val_fraction=0.2, split_seed=0, balance_train=True, verbose=True, input_size=224)

Retrieve the HAM10000 dataset.

To use this function, download the dataset folders HAM10000_images_part_1 and HAM10000_images_part_2 and the metadata file from https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000. Put them all into the same folder, and point data_dir to that folder.
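Before calling get_ham10000, it can help to confirm that the download landed where expected. The sketch below is an illustration only: missing_ham_folders is a hypothetical helper, not part of torchuq, and it checks just the two image folders named above. The metadata file is not checked because its file name is not stated here.

```python
from pathlib import Path

# Folder names taken from the description above.
REQUIRED_FOLDERS = ("HAM10000_images_part_1", "HAM10000_images_part_2")


def missing_ham_folders(data_dir="."):
    """Hypothetical helper: list the required image folders absent from data_dir."""
    root = Path(data_dir)
    return [name for name in REQUIRED_FOLDERS if not (root / name).is_dir()]


# Intended use (requires torchuq and the downloaded data; the path is a placeholder):
#
#   from torchuq.dataset.ham import get_ham10000
#   if not missing_ham_folders("/path/to/ham10000"):
#       train, val, test = get_ham10000(data_dir="/path/to/ham10000")
```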

Parameters
  • data_dir (str) – the data folder.

  • val_fraction (float) – fraction of the dataset used for the validation set; if 0, the validation dataset is None.

  • test_fraction (float) – fraction of the dataset used for the test set; if 0, the test dataset is None.

  • split_seed (int) – seed used to generate train/test split.

  • balance_train (bool) – if True then over-sample under-represented classes in the training set, so that all classes have approximately the same number of samples.

  • verbose (bool) – if True then print additional messages.

  • input_size (int) – the size of the image.

Returns
  • train_dataset (torch.utils.data.Dataset) – training dataset.

  • val_dataset (torch.utils.data.Dataset) – validation dataset, None if val_fraction=0.0.

  • test_dataset (torch.utils.data.Dataset) – test dataset, None if test_fraction=0.0.
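The effect of balance_train can be pictured with a minimal over-sampling sketch. It is an assumption about the general technique, not the code torchuq runs: oversample_indices is a hypothetical helper that draws extra samples with replacement until every class matches the largest one. The class labels are illustrative.

```python
import random
from collections import Counter, defaultdict


def oversample_indices(labels, seed=0):
    """Hypothetical helper: return sample indices with minority classes over-sampled."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        # Draw the shortfall with replacement from the same class.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


labels = ["nv"] * 6 + ["mel"] * 2 + ["bkl"]
balanced = oversample_indices(labels)
print(Counter(labels[i] for i in balanced))
```

This sketch equalizes the class counts exactly, whereas the parameter description above only promises approximately equal counts. Over-sampling applies to the training set only, so the validation and test sets keep the original class distribution.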

torchuq.dataset.regression Module

IO module for UCI datasets for regression.

Much of the code is adapted from: https://github.com/aamini/evidential-regression/blob/c0823f18ff015f5eb46a23f0039f4d62b76bc8d1/data_loader.py

torchuq.dataset.regression.get_regression_datasets(name, val_fraction=0.2, test_fraction=0.2, split_seed=0, normalize=True, verbose=True)

Returns the train, validation, and test splits of a UCI regression dataset as torch Datasets.

Parameters
  • name (str) – name of the dataset.

  • val_fraction (float) – fraction of the dataset used for the validation set; if 0, the validation dataset is None.

  • test_fraction (float) – fraction of the dataset used for the test set; if 0, the test dataset is None.

  • split_seed (int) – seed used to generate train/test split, if split_seed=-1 then the dataset is not shuffled.

  • normalize (bool) – if True then normalize the dataset to have zero mean and unit variance.

  • verbose (bool) – if True then print additional messages.

Returns
  • train_dataset (torch.utils.data.Dataset) – training dataset.

  • val_dataset (torch.utils.data.Dataset) – validation dataset, None if val_fraction=0.0.

  • test_dataset (torch.utils.data.Dataset) – test dataset, None if test_fraction=0.0.
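The normalize option refers to standardization: each column is shifted to zero mean and scaled to unit variance. The sketch below shows that transformation in plain Python. fit_standardizer and standardize are hypothetical helpers. Which split the statistics are computed on, and how constant columns are handled, are assumptions here rather than documented behavior.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Hypothetical helper: compute per-column mean and standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Assumption: constant columns (zero deviation) are left unscaled.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply (x - mean) / std to every column of every row."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


train_rows = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
means, stds = fit_standardizer(train_rows)
print(standardize(train_rows, means, stds))
```

A common design choice, assumed in this sketch, is to fit the statistics on the training rows and reuse them for the validation and test rows, so that no information from held-out data leaks into preprocessing.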