torchuq.evaluate Subpackage

This section contains the Python API reference for the torchuq.evaluate subpackage, which provides code for evaluating and visualizing predictions.

torchuq.evaluate.distribution Module

torchuq.evaluate.distribution.compute_crps(predictions, labels, reduction='mean', resolution=500)

Compute the CRPS score.

The continuous ranked probability score (CRPS) is a proper scoring rule that measures the quality of a distribution prediction; lower values are better.

Parameters

predictions (distribution) – a batch of distribution predictions.
labels (tensor) – array [batch_size] of labels.
reduction (str) – the method to aggregate the results across the batch. Can be ‘none’, ‘mean’, ‘sum’, ‘median’, ‘min’, or ‘max’.
resolution (int) – the number of discretization bins, where higher resolution increases estimation accuracy but also requires more memory/compute.

Returns

The CRPS score. When reduction is ‘none’ the shape is [batch_size]; otherwise the shape is [].

Return type

tensor with shape [batch_size] or []
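To make the quantity concrete, here is a library-independent sketch of CRPS for a single Gaussian prediction, assuming the standard definition CRPS(F, y) = ∫ (F(x) − 1{x ≥ y})² dx. The helpers normal_cdf, crps_numeric and crps_normal are hypothetical and not part of torchuq. The numerical version splits an interval of the real line into `resolution` bins, which mirrors the role of the resolution parameter above, although torchuq's actual discretization may differ; the closed form for a Gaussian serves as a check.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def crps_numeric(cdf, label, lo, hi, resolution=500):
    """Approximate CRPS(F, y) = integral of (F(x) - 1{x >= y})^2 dx over [lo, hi].

    Uses the midpoint rule with `resolution` equal-width bins.
    """
    step = (hi - lo) / resolution
    total = 0.0
    for i in range(resolution):
        x = lo + (i + 0.5) * step
        indicator = 1.0 if x >= label else 0.0
        total += (cdf(x) - indicator) ** 2 * step
    return total

def crps_normal(mu, sigma, label):
    """Closed-form CRPS of N(mu, sigma^2) evaluated at `label`."""
    z = (label - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return sigma * (z * (2.0 * normal_cdf(z) - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

# A standard normal prediction scored against the label 0.5.
approx = crps_numeric(lambda x: normal_cdf(x, 0.0, 1.0), 0.5, -10.0, 10.0, resolution=2000)
exact = crps_normal(0.0, 1.0, 0.5)
```

Raising resolution shrinks the gap between approx and exact at the cost of more CDF evaluations, which is the same accuracy/compute trade-off described for the resolution parameter.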

torchuq.evaluate.distribution.compute_ece(predictions, labels, debiased=False)

Compute the (weighted) expected calibration error (ECE) as in https://arxiv.org/abs/1807.00263.

Note that the gradients of this function are biased because the sorting it relies on is not differentiable.

Parameters

predictions (distribution) – a batch of distribution predictions.
labels (tensor) – array [batch_size] of labels.
debiased (bool) – if True, the estimation bias is subtracted, so that for perfectly calibrated predictions this function returns 0 in expectation.

Returns

The ECE score.

Return type

tensor with shape []
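The cited paper calls a regression model calibrated when labels fall below the predicted p-quantile a fraction p of the time, which is equivalent to the probability integral transform (PIT) values F_i(y_i) being uniform on [0, 1]. The sketch below, which is an assumption about the general technique and not torchuq's code, measures the average gap between the sorted PIT values and evenly spaced uniform quantiles; the sort is the non-differentiable step mentioned above. The names normal_cdf and pit_ece are hypothetical.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a Gaussian N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def pit_ece(pit_values):
    """Mean absolute gap between sorted PIT values and uniform quantiles.

    For calibrated predictions the PIT values are roughly Uniform(0, 1),
    so the i-th smallest of n values should be close to (i + 0.5) / n.
    """
    n = len(pit_values)
    ordered = sorted(pit_values)  # non-differentiable sorting step
    return sum(abs(p - (i + 0.5) / n) for i, p in enumerate(ordered)) / n

labels = [-1.5, -0.5, 0.5, 1.5]
# Well-spread predictions: N(0, 1) for every sample.
calibrated = pit_ece([normal_cdf(y, 0.0, 1.0) for y in labels])
# Overconfident predictions: N(0, 0.25^2) pushes PIT values toward 0 and 1.
overconfident = pit_ece([normal_cdf(y, 0.0, 0.25) for y in labels])
```

With only a handful of samples the score is noisy even for a calibrated model, which is the estimation bias that the debiased option is meant to remove.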

torchuq.evaluate.distribution.compute_mean(predictions, reduction='mean', resolution=500)

Compute the mean of the distribution predictions.

Parameters

predictions (distribution) – a batch of distribution predictions.
reduction (str) – the method to aggregate the results across the batch. Can be ‘none’, ‘mean’, ‘sum’, ‘median’, ‘min’, or ‘max’.
resolution (int) – the number of discretization bins, where higher resolution increases estimation accuracy but also requires more memory/compute.

Returns

The computed mean. When reduction is ‘none’ the shape is [batch_size]; otherwise the shape is [].

Return type

tensor with shape [batch_size] or []

torchuq.evaluate.distribution.compute_mean_std(predictions, reduction='mean', resolution=500)

Equivalent to compute_mean and compute_std combined into a single function, which is more efficient than calling both separately.

Parameters

predictions (distribution) – a batch of distribution predictions.
reduction (str) – the method to aggregate the results across the batch. Can be ‘none’, ‘mean’, ‘sum’, ‘median’, ‘min’, or ‘max’.
resolution (int) – the number of discretization bins, where higher resolution increases estimation accuracy but also requires more memory/compute.

Returns

The mean and the standard deviation. When reduction is ‘none’ each has shape [batch_size]; otherwise each has shape [].

Return type

tuple of two tensors with shape [batch_size] or []
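One way to see why computing both moments together can be cheaper: if the moments are estimated from the quantile function at `resolution` levels, the same evaluations serve both. The sketch below assumes this discretization (E[X] = ∫₀¹ F⁻¹(u) du) purely for illustration; torchuq's internal method may differ. The helper mean_std_from_icdf is hypothetical, and Python's statistics.NormalDist (3.8+) stands in for a single distribution prediction.

```python
from statistics import NormalDist

def mean_std_from_icdf(icdf, resolution=500):
    """Estimate mean and standard deviation from an inverse CDF.

    Evaluates the inverse CDF once at `resolution` midpoint quantile levels
    and reuses those values for both moments.
    """
    levels = [(i + 0.5) / resolution for i in range(resolution)]
    values = [icdf(u) for u in levels]
    mean = sum(values) / resolution
    variance = sum((v - mean) ** 2 for v in values) / resolution
    return mean, variance ** 0.5

mean, std = mean_std_from_icdf(NormalDist(mu=2.0, sigma=3.0).inv_cdf)
```

The standard deviation estimate is slightly too small because the extreme tails fall outside the outermost quantile levels; increasing resolution reduces this error.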

torchuq.evaluate.distribution.compute_std(predictions, reduction='mean', resolution=500)

Compute the standard deviation of the distribution predictions.

Parameters

predictions (distribution) – a batch of distribution predictions.
reduction (str) – the method to aggregate the results across the batch. Can be ‘none’, ‘mean’, ‘sum’, ‘median’, ‘min’, or ‘max’.
resolution (int) – the number of discretization bins, where higher resolution increases estimation accuracy but also requires more memory/compute.

Returns

The standard deviation. When reduction is ‘none’ the shape is [batch_size]; otherwise the shape is [].

Return type

tensor with shape [batch_size] or []

torchuq.evaluate.distribution.plot_cdf(predictions, labels=None, ax=None, max_count=30, resolution=200)

Plot the CDFs of the distribution predictions.

Parameters

predictions (distribution) – a batch of distribution predictions.
labels (tensor with shape [batch_size]) – the true labels. If None the true labels are not plotted.
ax (axes) – the axes to plot the figure on. If None, a figure with the recommended size is created automatically.
max_count (int) – the maximum number of CDFs to plot.
resolution (int) – the number of points at which each CDF is evaluated. Higher resolution leads to a more accurate plot, but also requires more computation.

Returns

the ax on which the plot is made.

Return type

matplotlib.axes.Axes

torchuq.evaluate.distribution.plot_cdf_sequence(predictions, labels=None, ax=None, max_count=20, resolution=200)

Plot the CDFs of a sequence of distribution predictions.

Parameters

predictions (distribution) – a batch of distribution predictions.
labels (tensor with shape [batch_size]) – the true labels. If None the true labels are not plotted.
ax (axes) – the axes to plot the figure on. If None, a figure with the recommended size is created automatically.
max_count (int) – the maximum number of CDFs to plot.
resolution (int) – the number of points at which each CDF is evaluated. Higher resolution leads to a more accurate plot, but also requires more computation.

Returns

the ax on which the plot is made.

Return type

matplotlib.axes.Axes

torchuq.evaluate.distribution.plot_density_sequence(predictions, labels=None, max_count=100, ax=None, resolution=100, smooth_bw=0)

Plot the PDFs of the predictions together with the labels.

For aesthetics, each PDF is reflected along the y axis to form a symmetric, violin-shaped plot.

Parameters

predictions (distribution) – a batch of distribution predictions.
labels (tensor with shape [batch_size]) – the true labels. If None the true labels are not plotted.
ax (axes) – the axes to plot the figure on. If None, a figure with the recommended size is created automatically.
max_count (int) – the maximum number of PDFs to plot.
resolution (int) – the number of points at which each density is computed. Higher resolution leads to a more accurate plot, but also requires more computation.
smooth_bw (int) – smooth the PDF with a uniform kernel whose bandwidth is smooth_bw / resolution.

Returns

the ax on which the plot is made.

Return type

matplotlib.axes.Axes

torchuq.evaluate.distribution.plot_icdf(predictions, labels=None, ax=None, max_count=30, resolution=200)

Plot the inverse CDFs of the distribution predictions.

Parameters

predictions (distribution) – a batch of distribution predictions.
labels (tensor with shape [batch_size]) – the true labels. If None the true labels are not plotted.
ax (axes) – optional matplotlib.axes.Axes, the axes to plot the figure on. If None, automatically creates a figure with recommended size.
max_count (int) – the maximum number of inverse CDFs to plot.
resolution (int) – the number of points at which each inverse CDF is evaluated. Higher resolution leads to a more accurate plot, but also requires more computation.

Returns

the ax on which the plot is made.

Return type

matplotlib.axes.Axes

torchuq.evaluate.distribution.plot_reliability_diagram(predictions, labels, ax=None)

Plot the reliability diagram as in https://arxiv.org/abs/1807.00263.

Parameters

predictions (distribution) – a batch of distribution predictions.
labels (tensor with shape [batch_size]) – the true labels.
ax (axes) – optional matplotlib.axes.Axes, the axes to plot the figure on. If None, automatically creates a figure with recommended size.

Returns

the ax on which the plot is made.

Return type

matplotlib.axes.Axes

torchuq.evaluate.interval Module

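torchuq.evaluate.interval.compute_coverage(predictions, labels, reduction='mean')

Compute the empirical coverage, that is, whether each label falls inside its predicted interval; with reduction=‘mean’ this is the fraction of labels covered. This function is not differentiable.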

Parameters

predictions (tensor) – a batch of interval predictions, which is an array [batch_size, 2].
labels (tensor) – the labels, an array of shape [batch_size].
reduction (str) – the method to aggregate the results across the batch. Can be ‘none’, ‘mean’, ‘sum’, ‘median’, ‘min’, or ‘max’.

Returns

the coverage, an array with shape [batch_size] or shape [] depending on the reduction.

Return type

tensor
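A minimal sketch of the coverage computation in plain Python, assuming each row of the [batch_size, 2] array is (lower, upper) and that both endpoints count as inside the interval; torchuq's exact boundary convention is not stated here. The function interval_coverage is hypothetical and only handles the ‘none’ and ‘mean’ reductions. The 0/1 indicator is a step function of the inputs, which is why coverage is not differentiable.

```python
def interval_coverage(intervals, labels, reduction="mean"):
    """Per-sample indicator of lower <= label <= upper, optionally averaged.

    Each interval is a (lower, upper) pair, like one row of the
    [batch_size, 2] predictions array.
    """
    hits = [
        1.0 if lower <= y <= upper else 0.0
        for (lower, upper), y in zip(intervals, labels)
    ]
    if reduction == "none":
        return hits
    return sum(hits) / len(hits)

intervals = [(0.0, 1.0), (0.0, 1.0), (2.0, 3.0), (-1.0, 0.0)]
labels = [0.5, 1.5, 2.5, 0.0]
coverage = interval_coverage(intervals, labels)
```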

torchuq.evaluate.interval.compute_length(predictions, reduction='mean')

Compute the length of each interval prediction; with reduction=‘mean’ this is the average length.

Parameters

predictions (tensor) – a batch of interval predictions, which is an array [batch_size, 2].
reduction (str) – the method to aggregate the results across the batch. Can be ‘none’, ‘mean’, ‘sum’, ‘median’, ‘min’, or ‘max’.

Returns

the interval length, an array with shape [batch_size] or shape [] depending on the reduction.

Return type

tensor
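Most functions in this subpackage share the same reduction argument. The sketch below illustrates, for interval lengths, what each documented reduction value does to the per-sample results; apply_reduction and interval_length are hypothetical helpers written for illustration, assuming each interval is a (lower, upper) pair, and are not torchuq's implementation.

```python
import statistics

def apply_reduction(values, reduction="mean"):
    """Aggregate per-sample values according to a `reduction` string."""
    if reduction == "none":
        return list(values)
    ops = {
        "mean": statistics.mean,
        "sum": sum,
        "median": statistics.median,
        "min": min,
        "max": max,
    }
    if reduction not in ops:
        raise ValueError(f"unknown reduction: {reduction!r}")
    return ops[reduction](values)

def interval_length(intervals, reduction="mean"):
    """Length (upper - lower) of each (lower, upper) interval, then reduced."""
    return apply_reduction([upper - lower for lower, upper in intervals], reduction)

intervals = [(0, 1), (0, 3), (2, 4)]
per_sample = interval_length(intervals, reduction="none")
average = interval_length(intervals)
```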

torchuq.evaluate.interval.plot_interval_sequence(predictions, labels=None, ax=None, max_count=100)

Plot a sequence of interval predictions together with the labels.

Parameters

predictions (tensor) – a batch of interval predictions, which is an array [batch_size, 2].
labels (tensor) – the labels, an array of shape [batch_size]. If None, the labels are not plotted.
ax (axes) – the axes to plot the figure on. If None, a figure with the recommended size is created automatically.
max_count (int) – the maximum number of intervals to plot.

Returns

the ax on which the plot is made.

Return type

axes

torchuq.evaluate.interval.plot_length_cdf(predictions, ax=None, plot_median=True)

Plot the CDF of interval length.

Parameters

predictions (tensor) – a batch of interval predictions, which is an array [batch_size, 2].
ax (axes) – the axes to plot the figure on. If None, a figure with the recommended size is created automatically.
plot_median (bool) – if True, also plot the median interval length.

Returns

the ax on which the plot is made.

Return type

axes

torchuq.evaluate.point Module

torchuq.evaluate.point.compute_huber_loss(predictions, labels, reduction='mean', delta=None)

Compute the Huber loss.

Parameters

predictions (tensor) – a batch of point predictions.
labels (tensor) – the labels, an array of shape [batch_size].
reduction (str) – the method to aggregate the results across the batch. Can be ‘none’, ‘mean’, ‘sum’, ‘median’, ‘min’, or ‘max’.
delta (float) – the delta parameter of the Huber loss. If None, delta is set automatically to the absolute error at the boundary of the top 20% largest absolute errors.

Returns

the Huber loss, an array with shape [batch_size] or shape [] depending on the reduction.

Return type

tensor
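For reference, the standard Huber penalty is quadratic for errors up to delta and linear beyond it. The sketch below uses the common definition with a factor of 1/2 (torchuq may scale it differently) and reads the automatic delta as the 80th-percentile absolute error; that reading of "top 20% largest absolute error" is an assumption. huber and huber_loss are hypothetical names.

```python
def huber(error, delta):
    """Huber penalty: quadratic for |error| <= delta, linear beyond."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

def huber_loss(predictions, labels, delta=None):
    """Mean Huber loss over a batch of point predictions."""
    errors = [p - y for p, y in zip(predictions, labels)]
    if delta is None:
        # Assumed default: the absolute error at the 80th percentile, i.e. the
        # threshold above which roughly the largest 20% of errors lie.
        ranked = sorted(abs(e) for e in errors)
        delta = ranked[int(0.8 * (len(ranked) - 1))]
    return sum(huber(e, delta) for e in errors) / len(errors)

fixed = huber_loss([0.0, 0.0], [1.0, 3.0], delta=1.0)
auto = huber_loss([0] * 5, [1, 2, 3, 4, 5])
```

Choosing delta from the data keeps most errors in the quadratic regime while limiting the influence of the largest ones.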

torchuq.evaluate.point.compute_l2_loss(predictions, labels, reduction='mean')

Compute the L2 loss.

Parameters

predictions (tensor) – a batch of point predictions.
labels (tensor) – the labels, an array of shape [batch_size].
reduction (str) – the method to aggregate the results across the batch. Can be ‘none’, ‘mean’, ‘sum’, ‘median’, ‘min’, or ‘max’.

Returns

the L2 loss, an array with shape [batch_size] or shape [] depending on the reduction.

Return type

tensor

torchuq.evaluate.point.compute_pinball_loss(predictions, labels, alpha=0.5, reduction='mean')

Compute the pinball loss for the alpha-th quantile.

Parameters

predictions (tensor) – a batch of point predictions.
labels (tensor) – the labels, an array of shape [batch_size].
reduction (str) – the method to aggregate the results across the batch. Can be ‘none’, ‘mean’, ‘sum’, ‘median’, ‘min’, or ‘max’.
alpha (float) – the quantile level, between 0 and 1, for which to compute the pinball loss.

Returns

the pinball loss, an array with shape [batch_size] or shape [] depending on the reduction.

Return type

tensor
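The pinball loss for level alpha penalizes under-prediction by alpha per unit and over-prediction by (1 − alpha) per unit, so it is minimized by the alpha-quantile of the label distribution. A plain-Python sketch of the standard formula, with the mean as the reduction; the function name pinball_loss here is illustrative and not torchuq's implementation.

```python
def pinball_loss(predictions, labels, alpha=0.5):
    """Mean pinball (quantile) loss of point predictions for level alpha."""
    total = 0.0
    for q, y in zip(predictions, labels):
        diff = y - q
        # diff > 0 (under-prediction) costs alpha per unit;
        # diff < 0 (over-prediction) costs (1 - alpha) per unit.
        total += max(alpha * diff, (alpha - 1.0) * diff)
    return total / len(labels)

under = pinball_loss([0.0], [1.0], alpha=0.9)  # predicted too low
over = pinball_loss([2.0], [1.0], alpha=0.9)   # predicted too high
```

With alpha=0.5 the loss reduces to half the mean absolute error.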

torchuq.evaluate.point.plot_conditional_bias(predictions, labels, ax=None, knn=None, conditioning='label')

Make the conditional bias diagram as described in [TODO: add paper reference].

Parameters

predictions (tensor) – a batch of point predictions.
labels (tensor) – the labels, an array of shape [batch_size].
ax (axes) – the axes to plot the figure on. If None, a figure with the recommended size is created automatically.
knn (int) – the number of nearest neighbors to average over. If None, knn is set automatically.
conditioning (str) – the variable to condition on, either ‘label’ or ‘prediction’.

Returns

the ax on which the plot is made.

Return type

axes
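Since the reference for this diagram is still missing, the following is only a hypothetical reading of the knn and conditioning parameters: a k-nearest-neighbor estimate of the conditional bias E[prediction − label | conditioning value], where each point's bias averages the residuals of the knn points whose label (or prediction) is closest. The function conditional_bias is illustrative and not torchuq's code; the plotting step is omitted.

```python
def conditional_bias(predictions, labels, knn=3, conditioning="label"):
    """kNN estimate of E[prediction - label | conditioning value].

    Returns (value, bias) pairs sorted by the conditioning value.
    """
    if conditioning == "label":
        keys = list(labels)
    elif conditioning == "prediction":
        keys = list(predictions)
    else:
        raise ValueError("conditioning must be 'label' or 'prediction'")
    residuals = [p - y for p, y in zip(predictions, labels)]
    curve = []
    for k in keys:
        # Indices of the knn points whose conditioning value is closest to k.
        nearest = sorted(range(len(keys)), key=lambda j: abs(keys[j] - k))[:knn]
        curve.append((k, sum(residuals[j] for j in nearest) / len(nearest)))
    return sorted(curve)

# A model that shrinks every prediction toward 0 over-predicts small labels
# and under-predicts large ones.
labels = [-2.0, -1.0, 1.0, 2.0]
shrunk = [0.5 * y for y in labels]
curve = conditional_bias(shrunk, labels, knn=1)
```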

torchuq.evaluate.point.plot_scatter(predictions, labels, ax=None)

Plot the scatter plot between the point predictions and the labels.

Parameters

predictions (tensor) – a batch of point predictions.
labels (tensor) – the labels, an array of shape [batch_size].
ax (axes) – the axes to plot the figure on. If None, a figure with the recommended size is created automatically.

Returns

the ax on which the plot is made.

Return type

axes

torchuq.evaluate.quantile Module

torchuq.evaluate.quantile.compute_pinball_loss(predictions, labels, reduction='mean')

Compute the pinball loss, which is a proper scoring rule for quantile predictions.

Parameters

predictions (tensor) – a batch of quantile predictions, which is an array with shape [batch_size, n_quantiles] or [batch_size, 2, n_quantiles].
labels (tensor) – the labels, an array of shape [batch_size].
reduction (str) – the method to aggregate the results across the batch. Can be ‘none’, ‘mean’, ‘sum’, ‘median’, ‘min’, or ‘max’.

Returns

the pinball loss, an array with shape [batch_size] or shape [] depending on the reduction.

Return type

tensor
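For a quantile prediction the pinball loss is evaluated at each quantile level and then averaged. The sketch below passes the quantile levels explicitly as an assumption, because the [batch_size, n_quantiles] form does not carry them and the levels torchuq assumes in that case are not documented here. quantile_pinball_loss is a hypothetical name, and averaging over both samples and levels is one reasonable convention, not necessarily torchuq's.

```python
def quantile_pinball_loss(predictions, labels, levels):
    """Pinball loss averaged over quantile levels and over the batch.

    predictions[i][j] is the predicted levels[j]-quantile for sample i.
    """
    total = 0.0
    for row, y in zip(predictions, labels):
        for q, tau in zip(row, levels):
            diff = y - q
            total += max(tau * diff, (tau - 1.0) * diff)
    return total / (len(labels) * len(levels))

loss = quantile_pinball_loss([[0.0, 1.0, 2.0]], [1.0], [0.25, 0.5, 0.75])
```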

torchuq.evaluate.quantile.plot_quantile_calibration(predictions, labels, ax=None)

Plot the reliability diagram for quantiles.

Parameters

predictions (tensor) – a batch of quantile predictions, which is an array with shape [batch_size, n_quantiles] or [batch_size, 2, n_quantiles].
labels (tensor) – the labels, an array of shape [batch_size].
ax (axes) – the axes to plot the figure on. If None, a figure with the recommended size is created automatically.

Returns

the ax on which the plot is made.

Return type

axes
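A quantile reliability diagram compares each nominal quantile level with the observed fraction of labels at or below the predicted quantile; calibrated predictions lie on the diagonal. The sketch below computes those points without plotting, again assuming the quantile levels are supplied explicitly. quantile_calibration is a hypothetical helper, not torchuq's implementation.

```python
def quantile_calibration(predictions, labels, levels):
    """(nominal level, observed frequency) pairs for a reliability diagram.

    The observed frequency for level j is the fraction of labels that are
    at or below the predicted levels[j]-quantile.
    """
    n = len(labels)
    return [
        (tau, sum(1 for row, y in zip(predictions, labels) if y <= row[j]) / n)
        for j, tau in enumerate(levels)
    ]

levels = [0.25, 0.5, 0.75]
predictions = [[1.0, 2.0, 3.0]] * 4
labels = [0.5, 1.5, 2.5, 3.5]
points = quantile_calibration(predictions, labels, levels)
```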

torchuq.evaluate.quantile.plot_quantile_sequence(predictions, labels=None, ax=None, max_count=100)

Plot a sequence of quantile predictions together with the labels.

Parameters

predictions (tensor) – a batch of quantile predictions, which is an array with shape [batch_size, n_quantiles] or [batch_size, 2, n_quantiles].
labels (tensor) – the labels, an array of shape [batch_size]. If None, the labels are not plotted.
ax (axes) – the axes to plot the figure on. If None, a figure with the recommended size is created automatically.
max_count (int) – the maximum number of quantile predictions to plot.

Returns

the ax on which the plot is made.

Return type

axes