API Reference
This section documents the public API of the bbstat package.
bbstat Package
bbstat: Bayesian Bootstrap Utilities
This package provides tools for performing and evaluating the Bayesian bootstrap, a resampling method based on the Bayesian interpretation of uncertainty.
Main Features
- `bootstrap`: Run the Bayesian bootstrap on compatible data structures.
- `BootstrapDistribution`: A frozen data class representing the resulting distribution of a bootstrap resampling procedure.
- `BootstrapSummary`: A frozen data class that holds the summary (mean, credible interval, and level) of a Bayesian bootstrap procedure's result.
- `resample`: Generate weighted samples using the Dirichlet distribution.
- `statistics`: Collection of built-in weighted statistics.
- `BootstrapResult`: A data class that holds bootstrap estimates, computes the mean, and automatically evaluates the credible interval.
Supported Statistic Functions
Custom statistic functions must accept the signature:
`(data: ..., weights: numpy.typing.NDArray[numpy.floating], **kwargs) -> float`
Compatible examples in bbstat.statistics include:
- `compute_weighted_entropy`: Weighted entropy
- `compute_weighted_eta_square_dependency`: Weighted eta-squared for categorical group differences
- `compute_weighted_log_odds`: Weighted log-odds of a selected state
- `compute_weighted_mean`: Weighted mean estimate
- `compute_weighted_median`: Weighted median estimate
- `compute_weighted_mutual_information`: Weighted mutual information
- `compute_weighted_pearson_dependency`: Weighted Pearson correlation
- `compute_weighted_percentile`: Weighted percentile estimate
- `compute_weighted_probability`: Weighted probability of a selected state
- `compute_weighted_quantile`: Weighted quantile estimate
- `compute_weighted_self_information`: Weighted self-information of a selected state
- `compute_weighted_spearman_dependency`: Weighted Spearman correlation
- `compute_weighted_std`: Weighted standard deviation estimate
- `compute_weighted_sum`: Weighted sum estimate
- `compute_weighted_variance`: Weighted variance estimate
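A custom statistic only needs to follow the signature above; here is a minimal sketch (the function name `compute_weighted_rms` and the data are illustrative, not part of the package), which can then be passed directly as `statistic_fn` to `bootstrap`:

```python
import numpy as np
import numpy.typing as npt

from bbstat.bootstrap import bootstrap


def compute_weighted_rms(
    data: npt.NDArray[np.floating],
    weights: npt.NDArray[np.floating],
    **kwargs,
) -> float:
    """Weighted root-mean-square, assuming the weights sum to 1."""
    return float(np.sqrt(np.dot(weights, data**2)))


data = np.random.randn(100)
distribution = bootstrap(data, statistic_fn=compute_weighted_rms)
print(distribution.summarize())
```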
Modules:
| Name | Description |
|---|---|
| `bootstrap` | Core logic for Bayesian bootstrap |
| `evaluate` | Tools for summarizing bootstrap results |
| `plot` | Tool for visualizing bootstrap results |
| `registry` | Registry for built-in statistic functions |
| `resample` | Weighted resampling function |
| `statistics` | Built-in statistic functions |
| `utils` | Utility functions |
bootstrap Module
Bayesian bootstrap resampling for statistical estimation and uncertainty quantification.
This module provides the bootstrap function, which applies the Bayesian bootstrap
resampling method to estimate a statistic (such as the mean or median) along with its
credible interval. It supports flexible input data formats, user-defined or
registered statistic functions, and additional customization via keyword arguments.
The function is designed for use in probabilistic data analysis workflows, where quantifying uncertainty through resampling is critical. It is particularly well-suited for small to moderate datasets and non-parametric inference.
Main Features
- Resampling via the Bayesian bootstrap method.
- Support for scalar or multivariate data inputs.
- Use of string-based or function-based statistic definitions.
- Configurable number of resamples and credible interval level.
- Optional blockwise resampling for structured data.
- Random seed control for reproducibility.
Example
import numpy as np
from bbstat.bootstrap import bootstrap
data = np.random.randn(100)
distribution = bootstrap(data, statistic_fn="mean")
print(distribution)
print(distribution.summarize())
See the function-level docstring of bootstrap for full details.
bootstrap(data, statistic_fn, n_boot=1000, seed=None, blocksize=None, fn_kwargs=None)
Performs Bayesian bootstrap resampling to estimate a statistic.
This function performs Bayesian bootstrap resampling by generating n_boot resamples from
the provided data and applying the specified statistic function (statistic_fn).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | The data to be resampled. It can be a 1D array, a tuple, or a list of arrays where each element represents a different group of data to resample. | required |
| `statistic_fn` | `Union[str, StatisticFunction]` | The statistic function to be applied on each bootstrap resample. It can either be the name of a registered statistic function or the function itself. | required |
| `n_boot` | `int` | The number of bootstrap resamples to generate. Default is 1000. | `1000` |
| `seed` | `int` | A seed for the random number generator to ensure reproducibility. Default is `None`. | `None` |
| `blocksize` | `int` | The block size for resampling. If provided, resampling weights are generated in blocks of this size. Defaults to `None`. | `None` |
| `fn_kwargs` | `Dict[str, Any]` | Additional keyword arguments to be passed to the `statistic_fn`. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `BootstrapDistribution` | `BootstrapDistribution` | An object containing the array with the resampled statistics. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If any data array is not 1D or if the dimensions of the input arrays do not match. |
Example
data = np.random.randn(100)
statistic_fn = "mean"
result = bootstrap(data, statistic_fn)
print(result)
print(result.summarize())
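For multivariate statistics, `data` can be a tuple of arrays, and extra keyword arguments reach the statistic through `fn_kwargs`. The following is a minimal sketch (the sample sizes, seed, and parameter values are illustrative):

```python
import numpy as np
from bbstat.bootstrap import bootstrap

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)

# Bivariate statistic: data is a tuple of two 1D arrays; `ddof` is forwarded
# to the registered "pearson_dependency" statistic through fn_kwargs.
distribution = bootstrap(
    (x, y),
    statistic_fn="pearson_dependency",
    n_boot=2000,
    seed=42,
    fn_kwargs={"ddof": 0},
)
print(distribution.summarize(level=0.95))
```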
Notes
- The `data` argument can be a single 1D array, or a tuple or list of 1D arrays where each array represents a feature of the data.
- The `statistic_fn` can either be the name of a registered function (as a string) or the function itself. If a string is provided, it must match the name of a function in the `statistics.registry`.
- The function uses the `resample` function to generate bootstrap resamples and apply the statistic function to each resample.

Source code in bbstat/bootstrap.py
evaluate Module
Evaluation utilities for summarizing bootstrap resampling results.
This module provides a data structure for interpreting and summarizing the output of Bayesian bootstrap resampling procedures.
Main Features
- `BootstrapDistribution`: A frozen data class representing the resulting distribution of a bootstrap resampling procedure.
- `BootstrapSummary`: A frozen data class that holds the summary (mean, credible interval, and level) of a Bayesian bootstrap procedure's result.
Example
import numpy as np
from bbstat.evaluate import BootstrapDistribution
distribution = BootstrapDistribution(estimates=np.array([5.0, 2.3, 2.9]))
print(distribution) # => BootstrapDistribution(mean=3.4, size=3)
summary = distribution.summarize(level=0.95)
print(summary) # => BootstrapSummary(mean=3.4, ci_low=2.33, ci_high=4.895, level=0.95)
Notes
- This module is designed to be used alongside the `bootstrap` and `resample` modules to provide complete statistical summaries of resampled data.
BootstrapDistribution
dataclass
A class representing the resulting distribution of a bootstrap resampling procedure.
This class stores the distribution resulting from a Bayesian bootstrap analysis, and provides a method to summarize the result.
Attributes:

| Name | Type | Description |
|---|---|---|
| `estimates` | `FArray` | The array of bootstrap resample estimates. |

Methods:

| Name | Description |
|---|---|
| `__post_init__` | Validates and locks the `estimates` array. |
| `__len__` | Returns the length of the `estimates` array. |
| `__str__` | Returns a string representation of the object. |
| `summarize` | Returns a `BootstrapSummary` object. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `estimates` is empty, not a 1D array, or contains NaN values. |

Source code in bbstat/evaluate.py
__len__()
Returns the length of the estimates array.
Source code in bbstat/evaluate.py
__post_init__()
Post-initialization method to validate and lock the estimates array.
Raises:

| Type | Description |
|---|---|
| `ValueError` | If `estimates` is empty, not a 1D array, or contains NaN values. |

Source code in bbstat/evaluate.py
__str__()
Returns a human-readable string representation of the bootstrap distribution.
This method formats the mean and size of the bootstrap distribution for display.
Returns:

| Name | Type | Description |
|---|---|---|
| `str` | `str` | A formatted string representing the bootstrap distribution. |

Source code in bbstat/evaluate.py
summarize(level=0.87)
Returns a BootstrapSummary object.
This method is a wrapper for BootstrapSummary.from_estimates.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `level` | `float` | The desired level for the credible interval (must be between 0 and 1). | `0.87` |

Returns:

| Name | Type | Description |
|---|---|---|
| `BootstrapSummary` | `BootstrapSummary` | The summary object. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `level` is not between 0 and 1 (exclusive). |

Source code in bbstat/evaluate.py
BootstrapSummary
dataclass
A class representing the summary of a Bayesian bootstrap resampling procedure.
This class stores the mean, the credible interval, and level.
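Example — a minimal sketch of creating and using a summary (the estimate values are illustrative):

```python
import numpy as np
from bbstat.evaluate import BootstrapSummary

estimates = np.array([5.0, 2.3, 2.9])
summary = BootstrapSummary.from_estimates(estimates, level=0.95)
print(summary.mean)      # mean of the estimates
print(summary.ci_width)  # ci_high - ci_low
print(summary.round())   # rounded copy; precision inferred from ci_width
```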
Attributes:

| Name | Type | Description |
|---|---|---|
| `mean` | `float` | The mean of the bootstrap estimates. |
| `ci_low` | `float` | The lower bound of the credible interval. |
| `ci_high` | `float` | The upper bound of the credible interval. |
| `level` | `float` | The desired level for the credible interval (between 0 and 1). |
| `ci_width` | `float` | The width of the credible interval (property). |

Methods:

| Name | Description |
|---|---|
| `__post_init__` | Validates the `mean`, `ci_low`, `ci_high`, and `level` attributes. |
| `round` | Returns a new version of the summary with rounded values. |
| `from_estimates` | Creates a summary object from estimates. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `mean`, `ci_low`, or `ci_high` contains NaN values. |
| `ValueError` | If the bounds are swapped, i.e. `ci_low > ci_high`. |
| `ValueError` | If `level` is not between 0 and 1 (exclusive). |

Source code in bbstat/evaluate.py
ci_width
property
Returns the width of the credible interval.
__post_init__()
Post-initialization method to validate the mean, ci_low,
ci_high, and level attributes.
Raises:

| Type | Description |
|---|---|
| `ValueError` | If `mean`, `ci_low`, or `ci_high` contains NaN values. |
| `ValueError` | If the bounds are swapped, i.e. `ci_low > ci_high`. |
| `ValueError` | If `level` is not between 0 and 1 (exclusive). |

Source code in bbstat/evaluate.py
from_estimates(estimates, *, level=0.87)
classmethod
Creates a summary object from estimates.
This method computes the mean and credible interval bounds ci_low and
ci_high, and creates a BootstrapSummary object.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `estimates` | `FArray` | The estimated values from a Bayesian bootstrap procedure. | required |
| `level` | `float` | The desired level for the credible interval (between 0 and 1), default is 0.87. | `0.87` |

Returns:

| Name | Type | Description |
|---|---|---|
| `BootstrapSummary` | `BootstrapSummary` | The summary of a Bayesian bootstrap procedure's result. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `estimates` is empty, not a 1D array, or contains NaN values. |
| `ValueError` | If `level` is not between 0 and 1 (exclusive). |

Source code in bbstat/evaluate.py
round(precision=None)
Returns a new version of the summary with rounded values.
When `precision` is given, the mean and credible interval bounds are rounded to this number of digits. If `precision=None` (default), the precision is computed from the width of the credible interval.
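A minimal sketch (the field values are illustrative; the summary is constructed directly here only for demonstration):

```python
from bbstat.evaluate import BootstrapSummary

summary = BootstrapSummary(mean=3.4333, ci_low=2.33, ci_high=4.895, level=0.87)
print(summary.round(precision=2))  # round mean and bounds to 2 digits
print(summary.round())             # precision derived from the interval width
```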
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `precision` | `int` | The desired precision for rounding. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `BootstrapSummary` | `BootstrapSummary` | The summary of a Bayesian bootstrap procedure's result. |

Source code in bbstat/evaluate.py
plot Module
Plotting utility for bootstrap resampling results.
This module provides a function for visually interpreting and summarizing the output of Bayesian bootstrap resampling procedures.
Main Features
- `plot`: Visualizes the result of a bootstrap resampling procedure.
Notes
- The credible interval is calculated using quantiles of the empirical distribution of bootstrap estimates.
- This module is designed to be used alongside the `evaluate` module to provide complete statistical summaries of resampled data.
plot(bootstrap_distribution, level, *, ax=None, n_grid=200, label=None, precision=None)
Plot the kernel density estimate (KDE) of bootstrap estimates with credible interval shading and a vertical line at the mean.
If an axis is provided, the plot is drawn on it; otherwise, a new figure and axis are created. Displays a shaded credible interval and labels the plot with a formatted mean and credible interval. If no axis is provided, the figure is further annotated with a title and a ylabel, the lower y-limit is set to zero, the legend is added, and a tight layout is applied.
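A minimal usage sketch, following the signature above (the data and level are illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

from bbstat.bootstrap import bootstrap
from bbstat.plot import plot

data = np.random.randn(100)
distribution = bootstrap(data, statistic_fn="mean")

# Draw the KDE on an existing axis; since an axis is passed, we add the
# legend ourselves (see the note above about axis handling).
fig, ax = plt.subplots()
plot(distribution, level=0.87, ax=ax, label="mean", precision="auto")
ax.legend()
plt.show()
```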
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `bootstrap_distribution` | `BootstrapDistribution` | The result of a bootstrap resampling procedure. | required |
| `level` | `float` | Credible interval level (e.g., 0.95 for 95% CI). | required |
| `ax` | `Axes` | Matplotlib axis to draw the plot on. If None, a new axis is created. | `None` |
| `n_grid` | `int` | Number of grid points to use for evaluating the KDE, default is 200. | `200` |
| `label` | `str` | Optional label for the line. If provided, the label is extended to include the mean and credible interval. | `None` |
| `precision` | `int`, `"auto"`, or `None` | Optional precision for rounding the summary values (mean and credible interval). If None (default), no rounding is done; if "auto", the precision is computed from the width of the credible interval; if an integer, values are rounded to that many digits. | `None` |

Returns:

| Type | Description |
|---|---|
| `Axes` | `plt.Axes`: The axis object containing the plot. |

Source code in bbstat/plot.py
registry Module
Statistic function registry and protocol definition.
This module defines a strict Protocol (StatisticFunction) for all supported
statistical aggregation functions used in the system. It also provides a typed
mapping of statistic function names to their concrete implementations and a
lookup function (get_statistic_fn) for retrieving them by name.
All registered functions are callable with specific combinations of arguments
(e.g. data, weights, and optional parameters like ddof, factor, or
sorter) depending on the computation type. Static typing ensures correct
usage of each registered function.
StatisticFunction
Bases: Protocol
A protocol defining the interface for all statistical computation functions.
Each implementing function must take data and weights arrays and may
accept additional keyword-only arguments depending on the computation type.
Overloads:
- `aggregate`: accepts `data: FArray`, `weights: FArray`, and optional `factor: float`
- `mean`, `sum`: accept only `data: FArray`, `weights: FArray`
- `variance`, `std`: accept optional `weighted_mean: float` and `ddof: int`
- `quantile`: requires `quantile: float` and optional `sorter: IArray`
- `percentile`: requires `percentile: float` and optional `sorter: IArray`
- `median`: accepts optional `sorter`
- `mutual_information`: accepts `data: IIArray` and `weights: FArray`, and `normalize: bool = True`
- `pearson_dependency`, `spearman_dependency`: take a tuple of two float arrays (`FFArray`) and `ddof`
- `eta_square_dependency`: takes a tuple of an integer array and a float array (`IFArray`)
- `entropy`: accepts `data: IArray` and `weights: FArray`
- `probability`, `self_information`, `log_odds`: accept `data: IArray`, `weights: FArray`, and `state: int`
Source code in bbstat/registry.py
get_statistic_fn(name)
Retrieve a registered statistic function by name.
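A minimal sketch of looking up and calling a registered statistic (the data values are illustrative):

```python
import numpy as np
from bbstat.registry import get_statistic_fn, get_statistic_fn_names

print(get_statistic_fn_names())  # names of all registered statistics

fn = get_statistic_fn("mean")    # look up the registered weighted-mean function
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(fn(data, weights))         # => 2.1
```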
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The lowercase name of the statistic function to retrieve. Must be one of: `"aggregate"`, `"entropy"`, `"eta_square_dependency"`, `"log_odds"`, `"mean"`, `"median"`, `"mutual_information"`, `"pearson_dependency"`, `"percentile"`, `"probability"`, `"quantile"`, `"self_information"`, `"spearman_dependency"`, `"std"`, `"sum"`, `"variance"`. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `StatisticFunction` | `StatisticFunction` | The corresponding function implementation. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the name does not correspond to a registered function. |

Source code in bbstat/registry.py
get_statistic_fn_names()
Retrieve the names of registered statistic functions.
Returns:

| Type | Description |
|---|---|
| `Tuple[str, ...]` | The names of the available statistic functions. |

Source code in bbstat/registry.py
resample Module
Bootstrap resampling utilities using Dirichlet-distributed weights.
This module provides functionality for generating bootstrap resamples via the Bayesian bootstrap method, where resamples are weighted using samples from a Dirichlet distribution. It is intended for internal use within higher-level resampling and estimation workflows.
The function resample yields weighted resamples suitable for estimating
statistics under uncertainty without making parametric assumptions.
Main Features
- Dirichlet-based resampling for Bayesian bootstrap.
- Support for blockwise resample generation to control memory usage.
- Optional random seed for reproducibility.
- Generator interface for efficient streaming of resample weights.
Example
from bbstat.resample import resample
for weights in resample(n_boot=1000, n_data=50):
# Apply weights to compute statistic
...
Notes
- The function is designed to scale to large numbers of resamples.
- It is most useful as a low-level utility within a bootstrap framework.
See the resample function docstring for complete usage details.
resample(n_boot, n_data, seed=None, blocksize=None)
Generates bootstrap resamples with Dirichlet-distributed weights.
This function performs resampling by generating weights from a Dirichlet distribution.
The number of resamples is controlled by the n_boot argument, while the size of
each block of resamples can be adjusted using the blocksize argument. The seed
argument allows for reproducible results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n_boot` | `int` | The total number of bootstrap resamples to generate. | required |
| `n_data` | `int` | The number of data points to resample (used for the dimension of the Dirichlet distribution). | required |
| `seed` | `int` | A random seed for reproducibility (default is `None`). | `None` |
| `blocksize` | `int` | The number of resamples to generate in each block. If `None`, all resamples are generated in a single block. | `None` |

Yields:

| Type | Description |
|---|---|
| `FArray` | `Generator[FArray, None, None]`: A generator that yields each resample (a 1D array of floats) as it is generated. Each resample contains Dirichlet-distributed weights for the given `n_data`. |
Example
for r in resample(n_boot=10, n_data=5):
print(r)
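A further sketch showing blockwise generation with a fixed seed (the block size and data are illustrative):

```python
import numpy as np
from bbstat.resample import resample

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Stream Dirichlet weight vectors in blocks of 100 resamples each and
# compute a weighted mean for every resample.
estimates = [
    float(np.dot(weights, data))
    for weights in resample(n_boot=1000, n_data=len(data), seed=7, blocksize=100)
]
print(np.mean(estimates))
```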
Notes
- If `blocksize` is specified, the resampling will be performed in smaller blocks, which can be useful for parallelizing or limiting memory usage.
- The function uses NumPy's `default_rng` to generate random numbers, which provides a more flexible and efficient interface compared to `np.random.seed`.
Raises:

| Type | Description |
|---|---|
| `ValueError` | If `n_boot`, `n_data`, or `blocksize` is not a positive integer. |

Source code in bbstat/resample.py
statistics Module
Type definitions and weighted statistical functions for use in bootstrap resampling and analysis.
This module defines types for statistical functions that operate on weighted data, particularly in the context of Bayesian bootstrap procedures. It provides a collection of pre-defined weighted statistics (e.g., mean, variance, quantile).
Main Features
- Type aliases for data and weights.
- A library of built-in weighted statistical functions (e.g., mean, std, quantile, etc.)
Type Aliases:

| Name | Description |
|---|---|
| `FArray` | Alias for `NDArray[np.floating]`. |
| `IArray` | Alias for `NDArray[np.integer]`. |
| `FFArray`, `IFArray`, `IIArray` | Tuples of data arrays used in bivariate computations. |
Built-in Functions
"compute_weighted_aggregate": Weighted dot product, optionally scaled by a factor (internal use only)."compute_weighted_entropy": Weighted entropy."compute_weighted_eta_square_dependency": Effect size for categorical-continuous variable relationships."compute_weighted_mean": Weighted arithmetic mean."compute_weighted_median": Weighted median."compute_weighted_mutual_information": Weighted mutual information."compute_weighted_pearson_dependency": Weighted Pearson correlation for two variables."compute_weighted_probability": Weighted probability of a state."compute_weighted_quantile"/"compute_weighted_percentile": Weighted quantile estimation."compute_weighted_spearman_dependency": Weighted Spearman correlation."compute_weighted_std": Weighted standard deviation."compute_weighted_sum": Weighted sum."compute_weighted_variance": Weighted variance with optional degrees of freedom correction.
Notes
- All functions assume normalized weights (i.e., sum to 1).
- Functions raise `ValueError` for invalid shapes, mismatched dimensions, or inappropriate input types.
- This module is intended for use with `bootstrap`, which applies these functions across bootstrap resamples.
FArray = NDArray[np.floating]
module-attribute
FFArray = Tuple[FArray, FArray]
module-attribute
IArray = NDArray[np.integer]
module-attribute
IFArray = Tuple[IArray, FArray]
module-attribute
compute_weighted_aggregate(data, weights, *, factor=None)
Computes a weighted aggregate of the input data.
This function calculates the dot product of the input data and weights.
The function assumes that both data and weights are 1D arrays of the same
length and that the weights sum to 1. If a factor is provided, the dot product
is multiplied with it.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `FArray` | A 1D array of numeric values representing the data to be aggregated. | required |
| `weights` | `FArray` | A 1D array of numeric values representing the weights for the data. | required |
| `factor` | `float` | A scalar factor to multiply with the computed aggregate (default is None). | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The computed weighted aggregate, potentially scaled by the `factor`. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `data` or `weights` is not a 1D array. |
| `ValueError` | If the shapes of `data` and `weights` do not match. |
Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_aggregate(data, weights)) # => 2.1
print(compute_weighted_aggregate(data, weights, factor=1.5)) # => 3.15
Notes
The weighted aggregate is computed using the dot product between data and weights.
The optional factor scales the result of this dot product. If no factor is given,
the aggregation computes the weighted arithmetic mean of the data; if instead the factor
equals the length of the data array, the aggregation computes the weighted sum.
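A small check of the relation described above (assuming both functions are imported from `bbstat.statistics`):

```python
import numpy as np
from bbstat.statistics import compute_weighted_aggregate, compute_weighted_sum

data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])

# With factor=len(data), the weighted mean (2.1) is scaled to the weighted sum.
print(compute_weighted_aggregate(data, weights, factor=float(len(data))))  # => 6.3
print(compute_weighted_sum(data, weights))                                 # => 6.3
```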
Source code in bbstat/statistics.py
compute_weighted_entropy(data, weights)
Computes a weighted entropy of 1D code data.
This function calculates the weighted entropy by first computing the weighted distribution, dropping the zero elements (which contribute zero to the following sum), and computing the negative dot product between the distribution and the log-distribution.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `IArray` | A 1D array of numeric values representing the sample data in code format. | required |
| `weights` | `FArray` | A 1D array of numeric weights corresponding to the data. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The weighted entropy value. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the input arrays are not 1D or have mismatched lengths. |
Example
data = np.array([1, 0, 0])
weights = np.array([0.4, 0.2, 0.4])
print(compute_weighted_entropy(data, weights)) # => 0.673...
Source code in bbstat/statistics.py
compute_weighted_eta_square_dependency(data, weights)
Computes the weighted eta-squared (η²) statistic to assess dependency between a categorical and a numerical variable.
Eta-squared measures the proportion of total variance in the numerical variable that is explained by the categorical grouping. It is commonly used in ANOVA-like analyses and effect size estimation. The value is bounded between 0 and 1.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `IFArray` | A tuple `(data_cat, data_num)` of a 1D integer array with the categorical group codes and a 1D float array with the numerical values. | required |
| `weights` | `FArray` | A 1D array of non-negative weights, same length as each array in `data`. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | Weighted eta-squared value in the range [0, 1], where higher values indicate stronger association between the categorical and numeric variable. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If input arrays are not 1D or do not have matching shapes. |
Example
data_cat = np.array([0, 0, 1, 1])
data_num = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([0.25, 0.25, 0.25, 0.25])
print(compute_weighted_eta_square_dependency((data_cat, data_num), weights)) # => 0.8
Notes
- Internally, η² is computed as the ratio of weighted between-group variance to the total weighted variance.
- The statistic is sensitive to group sizes and imbalance in weights.
- When all group means equal the global mean, η² is 0.
- When groups are perfectly separated by the numeric variable, η² is 1.
Source code in bbstat/statistics.py
compute_weighted_log_odds(data, weights, state)
Computes a weighted log-odds of a state within 1D code data.
This function calculates the weighted probability of a state, p(state) by summing
the weights which coincide with data == state. The log-odds then
computes as logarithm of the odds p(state) / (1 - p(state)).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `IArray` | A 1D array of numeric values representing the sample data in code format. | required |
| `weights` | `FArray` | A 1D array of numeric weights corresponding to the data. | required |
| `state` | `int` | The state for which we estimate the log-odds. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The weighted log-odds value. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the input arrays are not 1D or have mismatched lengths. |
Example
data = np.array([1, 0, 0])
weights = np.array([0.4, 0.2, 0.4])
print(compute_weighted_log_odds(data, weights, state=0)) # => 0.405...
Source code in bbstat/statistics.py
compute_weighted_mean(data, weights)
Computes a weighted mean of the input data.
This function calculates the weighted arithmetic mean of the input data
and weights via the compute_weighted_aggregate function.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `FArray` | A 1D array of numeric values representing the data to be averaged. | required |
| `weights` | `FArray` | A 1D array of numeric values representing the weights for the data. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The computed weighted mean. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `data` or `weights` is not a 1D array. |
| `ValueError` | If the shapes of `data` and `weights` do not match. |
Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_mean(data, weights)) # => 2.1
Source code in bbstat/statistics.py
compute_weighted_median(data, weights, *, sorter=None)
Computes a weighted median of 1D data using linear interpolation.
This function calculates the weighted median of the given data array
based on the provided weights via compute_weighted_quantile with parameter
quantile=0.5.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `FArray` | A 1D array of numeric values representing the sample data. | required |
| `weights` | `FArray` | A 1D array of numeric weights corresponding to the data. | required |
| `sorter` | `Optional[NDArray[integer]]` | Optional array of indices that sorts `data`; if omitted, it is computed internally. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The interpolated weighted median value. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the input arrays are not 1D or have mismatched lengths. |
Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.4, 0.2, 0.4])
print(compute_weighted_median(data, weights)) # => 2.25
Source code in bbstat/statistics.py
compute_weighted_mutual_information(data, weights, *, normalize=True)
Computes the weighted mutual information (dependency) between two 1D arrays.
This function calculates the mutual information dependency between two integer variables
using the direct mutual information formula on weighted joint distribution estimates.
The inputs data_1 and data_2 are expected to be 1D arrays of the same length, provided
as a tuple data. Each data point is assigned a weight from the weights array.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `IIArray` | A tuple of two 1D integer arrays `(data_1, data_2)`. | required |
| `weights` | `FArray` | A 1D float array of weights, same length as each array in `data`. | required |
| `normalize` | `bool` | Whether or not to normalize the mutual information to the range [0, 1]. Default is True. | `True` |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The weighted mutual information. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the input arrays are not 1D or have mismatched lengths. |
Example
data_1 = np.array([0, 0, 1])
data_2 = np.array([0, 1, 1])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_mutual_information((data_1, data_2), weights)) # => 0.146...
Notes
- The normalized result is bounded by [0, 1], where 0 indicates perfect independence and 1 indicates perfect dependence.
- The unnormalized result is bounded between 0 (perfect independence) and `min(H(data_0), H(data_1))`, where `H(data_0)` and `H(data_1)` are the entropies of the two variables. If we reach the upper bound and `H(data_0) == H(data_1)`, we have perfect dependence.
Source code in bbstat/statistics.py
compute_weighted_pearson_dependency(data, weights, *, ddof=0)
Computes the weighted Pearson correlation coefficient (dependency) between two 1D arrays.
This function calculates the linear dependency between two variables using a weighted
version of Pearson's correlation coefficient. The inputs data_1 and data_2 are
expected to be 1D arrays of the same length, provided as a tuple data. Each data point
is assigned a weight from the weights array.
The function normalizes both variables by subtracting their weighted means and dividing by their weighted standard deviations, then computes the weighted mean of the element-wise product of these normalized arrays.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `FFArray` | A tuple of two 1D float arrays `(data_1, data_2)`. | required |
| `weights` | `FArray` | A 1D float array of weights, same length as each array in `data`. | required |
| `ddof` | `int` | Delta degrees of freedom for standard deviation. Defaults to 0 (population formula). Use 1 for sample-based correction. | `0` |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The weighted Pearson correlation coefficient in the range [-1, 1]. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the input arrays are not 1D or have mismatched lengths. |
Example
data_1 = np.array([1.0, 2.0, 3.0])
data_2 = np.array([1.0, 2.0, 2.9])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_pearson_dependency((data_1, data_2), weights)) # => 0.998...
Notes
- The function relies on `compute_weighted_mean` and `compute_weighted_std`.
- The correlation is computed using the formula `corr = weighted_mean(z1 * z2)`, where `z1` and `z2` are the standardized variables.
- The result is bounded between -1 (perfect negative linear relationship) and 1 (perfect positive linear relationship), with 0 indicating no linear dependency.
Source code in bbstat/statistics.py
compute_weighted_percentile(data, weights, *, percentile, sorter=None)
Computes a weighted percentile of 1D data using linear interpolation.
This function calculates the weighted percentile of the given data array
based on the provided weights via compute_weighted_quantile with parameter
quantile=0.01 * percentile.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `FArray` | A 1D array of numeric values representing the sample data. | required |
| `weights` | `FArray` | A 1D array of numeric weights corresponding to the data. | required |
| `percentile` | `float` | The desired percentile in the interval [0, 100]. | required |
| `sorter` | `Optional[NDArray[integer]]` | Optional array of indices that sorts `data`; if omitted, it is computed internally. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The interpolated weighted percentile value. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the input arrays are not 1D or have mismatched lengths. |
Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_percentile(data, weights, percentile=70)) # => 2.2
Source code in bbstat/statistics.py
compute_weighted_probability(data, weights, state)
Computes a weighted probability of a state within 1D code data.
This function calculates the weighted probability of a state by summing
the weights which coincide with data == state.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `IArray` | A 1D array of numeric values representing the sample data in code format. | required |
| `weights` | `FArray` | A 1D array of numeric weights corresponding to the data. | required |
| `state` | `int` | The state for which we estimate the probability. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The weighted probability value. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the input arrays are not 1D or have mismatched lengths. |
Example
data = np.array([1, 0, 0])
weights = np.array([0.4, 0.2, 0.4])
print(compute_weighted_probability(data, weights, state=0)) # => 0.6
Source code in bbstat/statistics.py
compute_weighted_quantile(data, weights, *, quantile, sorter=None)
Computes a weighted quantile of 1D data using linear interpolation.
This function calculates the weighted quantile of the given data array
based on the provided weights. It uses a normalized cumulative weight
distribution to determine the interpolated quantile value. The computation
assumes both data and weights are 1D arrays of equal length.
A precomputed sorter (array of indices that would sort data) can be
optionally provided to avoid recomputing it internally.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `FArray` | A 1D array of numeric values representing the sample data. | required |
| `weights` | `FArray` | A 1D array of numeric weights corresponding to the data. | required |
| `quantile` | `float` | The desired quantile in the interval [0, 1]. | required |
| `sorter` | `Optional[NDArray[integer]]` | Optional array of indices that sorts `data`; if omitted, it is computed internally. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The interpolated weighted quantile value. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the input arrays are not 1D or have mismatched lengths. |
Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_quantile(data, weights, quantile=0.7)) # => 2.2
Notes
- If `quantile` is less than or equal to the minimum cumulative weight, the smallest data point is returned.
- If `quantile` is greater than or equal to the maximum cumulative weight, the largest data point is returned.
- Linear interpolation is used between the two closest surrounding data points.
- Providing a precomputed `sorter` can optimize performance in repeated calls.
Source code in bbstat/statistics.py
compute_weighted_self_information(data, weights, state)
Computes a weighted self-information of a state within 1D code data.
This function calculates the weighted probability of a state by summing
the weights which coincide with data == state. The self-information then
computes as negative logarithm of the weighted probability.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `IArray` | A 1D array of numeric values representing the sample data in code format. | required |
| `weights` | `FArray` | A 1D array of numeric weights corresponding to the data. | required |
| `state` | `int` | The state for which we estimate the self-information. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The weighted self-information value. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the input arrays are not 1D or have mismatched lengths. |
Example
data = np.array([1, 0, 0])
weights = np.array([0.4, 0.2, 0.4])
print(compute_weighted_self_information(data, weights, state=0)) # => 0.510...
Source code in bbstat/statistics.py
compute_weighted_spearman_dependency(data, weights, *, ddof=0)
Computes the weighted Spearman rank correlation coefficient between two 1D arrays.
This function measures the monotonic relationship between two variables by computing the weighted Pearson correlation between their ranked values (i.e., their order statistics). It is particularly useful for assessing non-linear relationships.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `FFArray` | A tuple of two 1D float arrays `(data_1, data_2)`. | required |
| `weights` | `FArray` | A 1D float array of weights, same length as each array in `data`. | required |
| `ddof` | `int` | Delta degrees of freedom for standard deviation. Defaults to 0 (population formula). Use 1 for sample-based correction. | `0` |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The weighted Spearman rank correlation coefficient in the range [-1, 1]. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If the input arrays are not 1D or have mismatched lengths. |
Example
data_1 = np.array([1.0, 2.0, 3.0])
data_2 = np.array([0.3, 0.2, 0.1])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_spearman_dependency((data_1, data_2), weights)) # => -0.9999...
Notes
- Internally, ranks are computed using `scipy.stats.rankdata`, which handles ties by assigning average ranks.
- The Spearman coefficient is equivalent to the Pearson correlation between rank-transformed data.
- Output is bounded between -1 (perfect inverse monotonic relationship) and 1 (perfect direct monotonic relationship), with 0 indicating no monotonic correlation.
- Weights are applied after ranking.
Source code in bbstat/statistics.py
compute_weighted_std(data, weights, *, weighted_mean=None, ddof=0)
Computes a weighted standard deviation of the input data.
This function calculates the weighted standard deviation of the
input data and weights via the square root of the
compute_weighted_variance function.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `FArray` | A 1D array of numeric values representing the data for which we want the standard deviation. | required |
| `weights` | `FArray` | A 1D array of numeric values representing the weights for the data. | required |
| `weighted_mean` | `float` | The weighted mean of the data (default is `None`). If missing, this value is computed via `compute_weighted_mean`. | `None` |
| `ddof` | `int` | Delta degrees of freedom. Defaults to 0 (population formula). Use 1 for sample-based correction. | `0` |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The computed weighted standard deviation. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `data` or `weights` is not a 1D array. |
| `ValueError` | If the shapes of `data` and `weights` do not match. |
Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_std(data, weights)) # => 0.7
Source code in bbstat/statistics.py
compute_weighted_sum(data, weights)
Computes a weighted sum of the input data.
This function calculates the weighted sum of the input data
and weights via the compute_weighted_aggregate function with
factor=len(data).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `FArray` | A 1D array of numeric values representing the data to be summed. | required |
| `weights` | `FArray` | A 1D array of numeric values representing the weights for the data. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The computed weighted sum. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `data` or `weights` is not a 1D array. |
| `ValueError` | If the shapes of `data` and `weights` do not match. |
Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_sum(data, weights)) # => 6.3
Source code in bbstat/statistics.py
compute_weighted_variance(data, weights, *, weighted_mean=None, ddof=0)
Computes a weighted variance of the input data.
This function calculates the weighted variance of the input data
and weights via the compute_weighted_aggregate function with
factor=len(data) / (len(data) - ddof), where ddof specifies the
delta degrees of freedom.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `FArray` | A 1D array of numeric values representing the data for which we want the variance. | required |
| `weights` | `FArray` | A 1D array of numeric values representing the weights for the data. | required |
| `weighted_mean` | `float` | The weighted mean of the data (default is `None`). If missing, this value is computed via `compute_weighted_mean`. | `None` |
| `ddof` | `int` | Delta degrees of freedom. Defaults to 0 (population formula). Use 1 for sample-based correction. | `0` |

Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The computed weighted variance. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `data` or `weights` is not a 1D array. |
| `ValueError` | If the shapes of `data` and `weights` do not match. |
Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_variance(data, weights)) # => 0.49
print(compute_weighted_variance(data, weights, ddof=1)) # => 0.735
Source code in bbstat/statistics.py
utils Module
Utilities for bootstrap-related tasks.
This module provides functions that aid in interpreting and summarizing the output of Bayesian bootstrap resampling procedures. It includes tools to compute credible intervals for statistical estimates and to gauge the appropriate precision for rounding mean and credible-interval values from the width of the latter.
Main Features
- `compute_credible_interval`: Computes a credible interval from a set of estimates.
- `get_precision_for_rounding`: Gauges the precision for rounding from the width of the credible interval.
Notes
- The credible interval is calculated using quantiles of the empirical distribution of bootstrap estimates.
- This module is designed to be used alongside the `evaluate` module to provide complete statistical summaries of resampled data.
compute_credible_interval(estimates, level=0.87)
Compute the credible interval for a set of estimates.
This function calculates the credible interval of the given estimates array,
which is a range of values that contains a specified proportion of the data,
determined by the level parameter.
The credible interval is calculated by determining the quantiles at
(1 - level) / 2 and 1 - (1 - level) / 2 of the sorted estimates data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `estimates` | `FArray` | A 1D array of floating-point numbers representing the estimates from which the credible interval will be calculated. | required |
| `level` | `float` | The proportion of data to be included in the credible interval. Must be between 0 and 1 (exclusive). Default is 0.87. | `0.87` |

Returns:

| Type | Description |
|---|---|
| `Tuple[float, float]` | A tuple containing the lower and upper bounds of the credible interval, with the lower bound corresponding to the `(1 - level) / 2` quantile and the upper bound to the `1 - (1 - level) / 2` quantile. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `level` is not between 0 and 1 (exclusive). |
Example
import numpy as np
estimates = np.array([1.1, 2.3, 3.5, 2.9, 4.0])
compute_credible_interval(estimates, 0.6) # => (2.06, 3.6)
Source code in bbstat/utils.py
get_precision_for_rounding(ci_width)
Returns number of digits for rounding.
This method computes the precision (number of digits) for rounding mean and credible interval values for better readability. If the credible interval has width zero, we round to zero digits. Otherwise, we take one minus the floored order of magnitude of the width.
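Example — assuming the rule above, the precision for a nonzero width is `1 - floor(log10(ci_width))`:

```python
from bbstat.utils import get_precision_for_rounding

# floor(log10(0.05)) = -2, so the precision is 1 - (-2) = 3 digits.
print(get_precision_for_rounding(0.05))  # => 3

# floor(log10(25.0)) = 1, so the precision is 1 - 1 = 0 digits.
print(get_precision_for_rounding(25.0))  # => 0
```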
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `ci_width` | `float` | The width of the credible interval. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `int` | `int` | The number of digits for rounding. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `ci_width` is negative or not finite. |
Source code in bbstat/utils.py