torchuq.dataset Subpackage

This section is the Python API reference for the torchuq.dataset subpackage, which provides common evaluation datasets for uncertainty quantification (UQ).

torchuq.dataset.classification Module

torchuq.dataset.classification.get_classification_datasets(name, val_fraction=0.2, test_fraction=0.2, split_seed=0, normalize=True, verbose=True)

Returns a classification dataset in the form of a torch Dataset.

Parameters

name (str) – name of the dataset.
val_fraction (float) – fraction of the dataset used for the validation set; if 0 then the validation dataset is returned as None.
test_fraction (float) – fraction of the dataset used for the test set; if 0 then the test dataset is returned as None.
split_seed (int) – seed used to generate the train/test split; if split_seed=-1 then the dataset is not shuffled.
normalize (bool) – if True then normalize the dataset to have zero mean and unit variance.
verbose (bool) – if True then print additional messages.

Returns

train_dataset (torch.utils.data.Dataset) – training dataset.
val_dataset (torch.utils.data.Dataset) – validation dataset, None if val_fraction=0.0.
test_dataset (torch.utils.data.Dataset) – test dataset, None if test_fraction=0.0.

Return type

tuple of (train_dataset, val_dataset, test_dataset)
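The split parameters above can be illustrated with a small sketch. It is not torchuq's implementation: `split_indices` is a hypothetical helper, and the rounding and ordering of the partitions are assumptions made for illustration. It only mirrors the documented behaviour, namely that a fraction of 0 yields None and that split_seed=-1 disables shuffling.

```python
import random
from typing import List, Optional, Tuple


def split_indices(
    n: int,
    val_fraction: float = 0.2,
    test_fraction: float = 0.2,
    split_seed: int = 0,
) -> Tuple[List[int], Optional[List[int]], Optional[List[int]]]:
    """Hypothetical helper: partition range(n) into train/val/test index lists."""
    indices = list(range(n))
    if split_seed != -1:
        # A seeded shuffle makes the split reproducible; -1 keeps the original order.
        random.Random(split_seed).shuffle(indices)

    # Assumption: partition sizes are rounded down, and train takes the remainder.
    n_val = int(n * val_fraction)
    n_test = int(n * test_fraction)
    n_train = n - n_val - n_test

    train = indices[:n_train]
    # A fraction of 0 is reported as None rather than as an empty split.
    val = indices[n_train:n_train + n_val] if val_fraction > 0 else None
    test = indices[n_train + n_val:] if test_fraction > 0 else None
    return train, val, test


# No shuffling and no test set: the last 20% of the samples become the validation set.
train_idx, val_idx, test_idx = split_indices(
    10, val_fraction=0.2, test_fraction=0.0, split_seed=-1
)
```

Callers that set either fraction to 0 should check for None before wrapping the corresponding split in a DataLoader.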

torchuq.dataset.ham Module

class torchuq.dataset.ham.HAM10000(df, transform=None)

Dataset class for HAM10000; it implements the torch Dataset interface.

__init__(df, transform=None)

torchuq.dataset.ham.get_ham10000(data_dir='.', test_fraction=0.2, val_fraction=0.2, split_seed=0, balance_train=True, verbose=True, input_size=224)

Retrieve the HAM10000 dataset.

To use this function, download the dataset folders HAM10000_images_part_1 and HAM10000_images_part_2 and the metadata file from https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000. Put them all in the same folder and point data_dir to that folder.

Parameters

data_dir (str) – the folder containing the downloaded image folders and metadata file.
val_fraction (float) – fraction of the dataset used for the validation set; if 0 then the validation dataset is returned as None.
test_fraction (float) – fraction of the dataset used for the test set; if 0 then the test dataset is returned as None.
split_seed (int) – seed used to generate the train/test split.
balance_train (bool) – if True then over-sample under-represented classes in the training set, so that all classes have approximately the same number of samples.
verbose (bool) – if True then print additional messages.
input_size (int) – the size of the image.

Returns

train_dataset (torch.utils.data.Dataset) – training dataset.
val_dataset (torch.utils.data.Dataset) – validation dataset, None if val_fraction=0.0.
test_dataset (torch.utils.data.Dataset) – test dataset, None if test_fraction=0.0.

Return type

tuple of (train_dataset, val_dataset, test_dataset)
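The effect of balance_train can be pictured with a minimal over-sampling sketch. This is an illustration only: `oversample_indices` is a hypothetical helper, and it equalizes class counts exactly by sampling with replacement, whereas the documentation above only promises approximately equal counts, so the library's procedure may differ.

```python
import random
from collections import defaultdict
from typing import List, Sequence


def oversample_indices(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Hypothetical helper: repeat minority-class indices until every class
    has as many samples as the largest class. Expects a non-empty sequence."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    target = max(len(members) for members in by_class.values())

    balanced: List[int] = []
    for members in by_class.values():
        balanced.extend(members)  # keep every original sample once
        # Draw the missing samples with replacement from the same class.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


labels = [0, 0, 0, 0, 1, 1, 2]
balanced = oversample_indices(labels)
counts = {c: sum(1 for i in balanced if labels[i] == c) for c in set(labels)}
```

Over-sampling applies to the training set only, per the parameter description, so validation and test sets keep the original class proportions.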

torchuq.dataset.regression Module

IO module for UCI datasets for regression.

Much of the code is adapted from: https://github.com/aamini/evidential-regression/blob/c0823f18ff015f5eb46a23f0039f4d62b76bc8d1/data_loader.py

torchuq.dataset.regression.get_regression_datasets(name, val_fraction=0.2, test_fraction=0.2, split_seed=0, normalize=True, verbose=True)

Returns a UCI regression dataset in the form of a torch Dataset.

Parameters

name (str) – name of the dataset.
val_fraction (float) – fraction of the dataset used for the validation set; if 0 then the validation dataset is returned as None.
test_fraction (float) – fraction of the dataset used for the test set; if 0 then the test dataset is returned as None.
split_seed (int) – seed used to generate the train/test split; if split_seed=-1 then the dataset is not shuffled.
normalize (bool) – if True then normalize the dataset to have zero mean and unit variance.
verbose (bool) – if True then print additional messages.

Returns

train_dataset (torch.utils.data.Dataset) – training dataset.
val_dataset (torch.utils.data.Dataset) – validation dataset, None if val_fraction=0.0.
test_dataset (torch.utils.data.Dataset) – test dataset, None if test_fraction=0.0.

Return type

tuple of (train_dataset, val_dataset, test_dataset)
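A possible calling pattern is sketched below. The keyword arguments come from the signature above. The two wrappers, `load_regression_splits` and `split_sizes`, are hypothetical helpers written for this example. The sketch assumes the three datasets come back as one tuple and that each non-None split supports len(). No dataset name is assumed; pass one that torchuq recognizes.

```python
from typing import Optional, Sequence, Sized, Tuple


def split_sizes(splits: Sequence[Optional[Sized]]) -> Tuple[int, ...]:
    """Hypothetical helper: size of each split, counting a None split as 0."""
    return tuple(0 if split is None else len(split) for split in splits)


def load_regression_splits(name: str):
    """Hypothetical wrapper: load a dataset with no validation split."""
    # Imported lazily so the sketch can be read without torchuq installed.
    from torchuq.dataset.regression import get_regression_datasets

    # With val_fraction=0.0 the second returned element is documented to be None.
    return get_regression_datasets(
        name,
        val_fraction=0.0,
        test_fraction=0.2,
        split_seed=0,
        normalize=True,
        verbose=False,
    )


# Stand-in splits show how a None validation set is handled.
example_sizes = split_sizes(([1.0, 2.0, 3.0], None, [4.0]))
```

With torchuq installed, `split_sizes(load_regression_splits(name))` would report the three split sizes for a valid dataset name.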