torchuq.dataset Subpackage
This section contains the Python API reference for the torchuq.dataset subpackage, which provides loaders for common uncertainty quantification (UQ) evaluation datasets.
torchuq.dataset.classification Module
- torchuq.dataset.classification.get_classification_datasets(name, val_fraction=0.2, test_fraction=0.2, split_seed=0, normalize=True, verbose=True)
Returns the train, validation, and test splits of a classification dataset as torch Datasets.
- Parameters
name (str) – name of the dataset.
val_fraction (float) – fraction of the dataset used for the validation set, if 0 then val dataset will return None.
test_fraction (float) – fraction of the dataset used for the test set, if 0 then test dataset will return None.
split_seed (int) – seed used to generate the train/validation/test split, if split_seed=-1 then the dataset is not shuffled.
normalize (bool) – if True then normalize the dataset to have zero mean and unit variance.
verbose (bool) – if True then print additional messages.
- Returns
train_dataset (torch.utils.data.Dataset) – training dataset.
val_dataset (torch.utils.data.Dataset) – validation dataset, None if val_fraction=0.0.
test_dataset (torch.utils.data.Dataset) – test dataset, None if test_fraction=0.0.
- Return type
tuple
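The splitting behavior described by val_fraction, test_fraction, and split_seed can be illustrated with plain index lists. The sketch below uses a hypothetical helper, split_indices, and assumes that splits are taken as contiguous slices of a (possibly shuffled) index list. It only mirrors the documented semantics and is not torchuq's actual implementation, whose ordering and rounding may differ.

```python
import random
from typing import List, Optional, Tuple


def split_indices(
    n: int,
    val_fraction: float = 0.2,
    test_fraction: float = 0.2,
    split_seed: int = 0,
) -> Tuple[List[int], Optional[List[int]], Optional[List[int]]]:
    """Hypothetical helper: split range(n) into train/val/test index lists.

    Mirrors the documented parameter semantics only; not torchuq's code.
    """
    indices = list(range(n))
    # split_seed=-1 keeps the original order, as the docs describe.
    if split_seed != -1:
        random.Random(split_seed).shuffle(indices)

    n_val = int(n * val_fraction)
    n_test = int(n * test_fraction)

    # A fraction of 0 yields None for that split instead of an empty list.
    test_idx = indices[:n_test] if test_fraction > 0 else None
    val_idx = indices[n_test:n_test + n_val] if val_fraction > 0 else None
    # Everything not assigned to validation or test goes to training.
    train_idx = indices[n_test + n_val:]
    return train_idx, val_idx, test_idx
```

Index lists like these could then be wrapped with torch.utils.data.Subset(dataset, indices) to obtain one Dataset object per split.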
torchuq.dataset.ham Module
- class torchuq.dataset.ham.HAM10000(df, transform=None)
Dataset class for HAM10000; it subclasses torch.utils.data.Dataset.
- __init__(df, transform=None)
- torchuq.dataset.ham.get_ham10000(data_dir='.', test_fraction=0.2, val_fraction=0.2, split_seed=0, balance_train=True, verbose=True, input_size=224)
Retrieve the HAM10000 dataset.
To use this function, download the image folders HAM10000_images_part_1 and HAM10000_images_part_2 and the metadata file from https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000. Put them into the same folder, and point data_dir to this folder.
- Parameters
data_dir (str) – path to the folder containing the downloaded image folders and metadata file.
val_fraction (float) – fraction of dataset to use for validation, if 0 then val dataset will return None.
test_fraction (float) – fraction of the dataset used for the test set, if 0 then test dataset will return None.
split_seed (int) – seed used to generate the train/validation/test split.
balance_train (bool) – if True then over-sample under-represented classes in the training set, so that all classes have approximately the same number of samples.
verbose (bool) – if True then print additional messages.
input_size (int) – size of the input images, in pixels.
- Returns
train_dataset (torch.utils.data.Dataset) – training dataset.
val_dataset (torch.utils.data.Dataset) – validation dataset, None if val_fraction=0.0.
test_dataset (torch.utils.data.Dataset) – test dataset, None if test_fraction=0.0.
- Return type
tuple
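The effect of balance_train=True can be pictured as random over-sampling: samples from under-represented classes are repeated until each class is about as large as the largest one. The helper oversample_balanced below is hypothetical and assumes integer class labels and sampling with replacement. It makes every class exactly equal in size, whereas the docs only promise approximately equal counts, so torchuq's own procedure may differ.

```python
import random
from typing import Dict, List, Sequence


def oversample_balanced(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Hypothetical helper: return training indices in which every class
    appears as often as the most frequent class.

    Illustrates the idea behind balance_train=True; not torchuq's code.
    """
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    if not by_class:
        return []

    target = max(len(idxs) for idxs in by_class.values())
    balanced: List[int] = []
    for idxs in by_class.values():
        balanced.extend(idxs)  # keep every original sample once
        # Draw extra copies with replacement until the class reaches target.
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    rng.shuffle(balanced)
    return balanced
```

Over-sampling only touches the training split; validation and test data keep their original class proportions, which is why the parameter is named balance_train.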
torchuq.dataset.regression Module
I/O module for UCI regression datasets.
Much of the code is adapted from: https://github.com/aamini/evidential-regression/blob/c0823f18ff015f5eb46a23f0039f4d62b76bc8d1/data_loader.py
- torchuq.dataset.regression.get_regression_datasets(name, val_fraction=0.2, test_fraction=0.2, split_seed=0, normalize=True, verbose=True)
Returns a UCI regression dataset in the form of a torch Dataset.
- Parameters
name (str) – name of the dataset.
val_fraction (float) – fraction of the dataset used for the validation set, if 0 then val dataset will return None.
test_fraction (float) – fraction of the dataset used for the test set, if 0 then test dataset will return None.
split_seed (int) – seed used to generate the train/validation/test split, if split_seed=-1 then the dataset is not shuffled.
normalize (bool) – if True then normalize the dataset to have zero mean and unit variance.
verbose (bool) – if True then print additional messages.
- Returns
train_dataset (torch.utils.data.Dataset) – training dataset.
val_dataset (torch.utils.data.Dataset) – validation dataset, None if val_fraction=0.0.
test_dataset (torch.utils.data.Dataset) – test dataset, None if test_fraction=0.0.
- Return type
tuple
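Setting normalize=True rescales the data to zero mean and unit variance (z-scoring). The hypothetical standardize helper below sketches this for features stored as lists of rows. It assumes statistics are estimated on the training split and reused for held-out rows, a common convention that the docs above do not confirm; whether torchuq also normalizes the regression targets is likewise not stated here.

```python
import statistics
from typing import List, Sequence, Tuple


def standardize(
    train_rows: Sequence[Sequence[float]],
    other_rows: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical helper: z-score each feature column.

    Assumption: means and standard deviations come from train_rows only
    and are reused for other_rows; not confirmed as torchuq's behavior.
    """
    columns = list(zip(*train_rows))
    means = [statistics.fmean(col) for col in columns]
    # pstdev is the population standard deviation; constant columns fall
    # back to 1.0 so we never divide by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]

    def apply(rows: Sequence[Sequence[float]]) -> List[List[float]]:
        return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

    return apply(train_rows), apply(other_rows)
```

Reusing the training statistics keeps validation and test data on the same scale as the training data without letting held-out values influence the transform.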