API Reference

This section documents the public API of the bbstat package.


bbstat Package

bbstat: Bayesian Bootstrap Utilities

This package provides tools for performing and evaluating the Bayesian bootstrap, a resampling method based on the Bayesian interpretation of uncertainty.

Main Features
  • bootstrap: Run the Bayesian bootstrap on compatible data structures.
  • BootstrapDistribution: A frozen data class representing the resulting distribution of a bootstrap resampling procedure.
  • BootstrapSummary: A frozen data class that holds the summary (mean, credible interval, and level) of a Bayesian bootstrap procedure's result.
  • resample: Generate weighted samples using the Dirichlet distribution.
  • statistics: Collection of built-in weighted statistics.
  • BootstrapResult: A data class that holds bootstrap estimates, computes the mean, and automatically evaluates the credible interval.
Supported Statistic Functions

Custom statistic functions must accept the signature:

(data: ..., weights: numpy.typing.NDArray[numpy.floating], **kwargs) -> float

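A minimal sketch of a custom statistic that satisfies this signature (the function name and the root-mean-square statistic are illustrative, not part of bbstat; the weights are assumed to be normalized so they sum to 1):

```python
import numpy as np
from numpy.typing import NDArray


def compute_weighted_rms(
    data: NDArray[np.floating],
    weights: NDArray[np.floating],
    **kwargs,
) -> float:
    """Hypothetical weighted root-mean-square of 1D data."""
    return float(np.sqrt(np.dot(weights, data**2)))
```

Such a function can be passed directly as the statistic_fn argument of bootstrap, and extra options can be routed to it through fn_kwargs.
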
Compatible examples in bbstat.statistics include:

  • compute_weighted_entropy: Weighted entropy
  • compute_weighted_eta_square_dependency: Weighted eta-squared for categorical group differences
  • compute_weighted_log_odds: Weighted log-odds of a selected state
  • compute_weighted_mean: Weighted mean estimate
  • compute_weighted_median: Weighted median estimate
  • compute_weighted_mutual_information: Weighted mutual information
  • compute_weighted_pearson_dependency: Weighted Pearson correlation
  • compute_weighted_percentile: Weighted percentile estimate
  • compute_weighted_probability: Weighted probability of a selected state
  • compute_weighted_quantile: Weighted quantile estimate
  • compute_weighted_self_information: Weighted self-information of a selected state
  • compute_weighted_spearman_dependency: Weighted Spearman correlation
  • compute_weighted_std: Weighted standard deviation estimate
  • compute_weighted_sum: Weighted sum estimate
  • compute_weighted_variance: Weighted variance estimate

Modules:

| Name | Description |
| --- | --- |
| `bootstrap` | Core logic for Bayesian bootstrap |
| `evaluate` | Tools for summarizing bootstrap results |
| `plot` | Tool for visualizing bootstrap results |
| `registry` | Registry for built-in statistic functions |
| `resample` | Weighted resampling function |
| `statistics` | Built-in statistic functions |
| `utils` | Utility functions |


bootstrap Module

Bayesian bootstrap resampling for statistical estimation and uncertainty quantification.

This module provides the bootstrap function, which applies the Bayesian bootstrap resampling method to estimate a statistic (such as the mean or median) along with its credible interval. It supports flexible input data formats, user-defined or registered statistic functions, and additional customization via keyword arguments.

The function is designed for use in probabilistic data analysis workflows, where quantifying uncertainty through resampling is critical. It is particularly well-suited for small to moderate datasets and non-parametric inference.

Main Features
  • Resampling via the Bayesian bootstrap method.
  • Support for scalar or multivariate data inputs.
  • Use of string-based or function-based statistic definitions.
  • Configurable number of resamples and credible interval level.
  • Optional blockwise resampling for structured data.
  • Random seed control for reproducibility.
Example
import numpy as np
from bbstat.bootstrap import bootstrap
data = np.random.randn(100)
distribution = bootstrap(data, statistic_fn="mean")
print(distribution)
print(distribution.summarize())

See the function-level docstring of bootstrap for full details.

bootstrap(data, statistic_fn, n_boot=1000, seed=None, blocksize=None, fn_kwargs=None)

Performs Bayesian bootstrap resampling to estimate a statistic.

This function performs Bayesian bootstrap resampling by generating n_boot resamples from the provided data and applying the specified statistic function (statistic_fn).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `Any` | The data to be resampled. It can be a 1D array, a tuple, or a list of arrays where each element represents a different group of data to resample. | required |
| `statistic_fn` | `Union[str, StatisticFunction]` | The statistic function to be applied on each bootstrap resample. It can either be the name of a registered statistic function or the function itself. | required |
| `n_boot` | `int` | The number of bootstrap resamples to generate. | `1000` |
| `seed` | `int` | A seed for the random number generator to ensure reproducibility. `None` means no fixed seed. | `None` |
| `blocksize` | `int` | The block size for resampling. If provided, resampling weights are generated in blocks of this size; `None` means all resampling weights are generated at once. | `None` |
| `fn_kwargs` | `Dict[str, Any]` | Additional keyword arguments passed to the `statistic_fn` for each resample. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `BootstrapDistribution` | An object containing the array with the resampled statistics. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If any data array is not 1D or if the dimensions of the input arrays do not match. |

Example
data = np.random.randn(100)
statistic_fn = "mean"
result = bootstrap(data, statistic_fn)
print(result)
print(result.summarize())
Notes
  • The data argument can be a single 1D array, or a tuple or list of 1D arrays where each array represents a feature of the data (see the sketch after these notes).
  • The statistic_fn can either be the name of a registered function (as a string) or the function itself. If a string is provided, it must match the name of a function in the statistics.registry.
  • The function uses the resample function to generate bootstrap resamples and apply the statistic function to each resample.
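
The multivariate case follows the same pattern; a brief sketch, assuming two equally long 1D arrays and the registered "pearson_dependency" statistic, with its ddof argument forwarded via fn_kwargs (the data values are illustrative):

```python
import numpy as np
from bbstat.bootstrap import bootstrap

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

# data is passed as a tuple of 1D arrays; each resample's Dirichlet weights
# are handed to the statistic function together with the unchanged data.
distribution = bootstrap(
    (x, y),
    statistic_fn="pearson_dependency",
    n_boot=2000,
    seed=42,
    fn_kwargs={"ddof": 1},
)
print(distribution.summarize(level=0.95))
```
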
Source code in bbstat/bootstrap.py
def bootstrap(
    data: Any,
    statistic_fn: Union[str, Callable],
    n_boot: int = 1000,
    seed: Optional[int] = None,
    blocksize: Optional[int] = None,
    fn_kwargs: Optional[Dict[str, Any]] = None,
) -> BootstrapDistribution:
    """
    Performs Bayesian bootstrap resampling to estimate a statistic.

    This function performs Bayesian bootstrap resampling by generating `n_boot` resamples from
    the provided `data` and applying the specified statistic function (`statistic_fn`).

    Args:
        data (Any): The data to be resampled. It can be a 1D array, a tuple,
            or a list of arrays where each element represents a different group of data to resample.
        statistic_fn (Union[str, StatisticFunction]): The statistic function to be applied on each
            bootstrap resample. It can either be the name of a registered statistic function or the
            function itself.
        n_boot (int, optional): The number of bootstrap resamples to generate. Default is 1000.
        seed (int, optional): A seed for the random number generator to ensure reproducibility.
            Default is `None`, which means no fixed seed.
        blocksize (int, optional): The block size for resampling. If provided, resampling weights
            are generated in blocks of this size. Defaults to `None`, meaning all resampling weights
            are generated at once.
        fn_kwargs (Dict[str, Any], optional): Additional keyword arguments to be passed to
            the `statistic_fn` for each resample. Default is `None`.

    Returns:
        BootstrapDistribution: An object containing the array with the resampled statistics.

    Raises:
        ValueError: If any data array is not 1D or if the dimensions of the input arrays do not match.

    Example:
        ```python
        data = np.random.randn(100)
        statistic_fn = "mean"
        result = bootstrap(data, statistic_fn)
        print(result)
        print(result.summarize())
        ```

    Notes:
        - The `data` argument can be a single 1D array, or a tuple or list of 1D arrays where each array
          represents a feature of the data.
        - The `statistic_fn` can either be the name of a registered function (as a string) or the function
          itself. If a string is provided, it must match the name of a function in the `statistics.registry`.
        - The function uses the `resample` function to generate bootstrap resamples and apply the statistic
          function to each resample.
    """
    if isinstance(data, np.ndarray):
        if data.ndim != 1:
            raise ValueError(f"Invalid parameter {data.ndim=:}: must be 1.")
        n_data: int = len(data)
    elif isinstance(data, (tuple, list)):
        n_data = len(data[0])
        for i, array in enumerate(data):
            if array.ndim != 1:
                raise ValueError(f"Invalid parameter {data[i].ndim=:}: must be 1.")
            if n_data != len(array):
                raise ValueError(
                    f"Invalid parameter {data[i].shape[0]=:}: must be {n_data=:}."
                )

    if isinstance(statistic_fn, str):
        statistic_fn = get_statistic_fn(statistic_fn)
    estimates = np.array(
        [
            statistic_fn(data=data, weights=weights, **(fn_kwargs or {}))
            for weights in resample(
                n_boot=n_boot,
                n_data=n_data,
                seed=seed,
                blocksize=blocksize,
            )
        ]
    )
    return BootstrapDistribution(estimates=estimates)

evaluate Module

Evaluation utilities for summarizing bootstrap resampling results.

This module provides a data structure for interpreting and summarizing the output of Bayesian bootstrap resampling procedures.

Main Features
  • BootstrapDistribution: A frozen data class representing the resulting distribution of a bootstrap resampling procedure.
  • BootstrapSummary: A frozen data class that holds the summary (mean, credible interval, and level) of a Bayesian bootstrap procedure's result.
Example
import numpy as np
from bbstat.evaluate import BootstrapDistribution
distribution = BootstrapDistribution(estimates=np.array([5.0, 2.3, 2.9]))
print(distribution)  # => BootstrapDistribution(mean=3.4, size=3)
summary = distribution.summarize(level=0.95)
print(summary)  # => BootstrapSummary(mean=3.4, ci_low=2.33, ci_high=4.895, level=0.95)
Notes
  • This module is designed to be used alongside the bootstrap and resample modules to provide complete statistical summaries of resampled data.

BootstrapDistribution dataclass

A class representing the resulting distribution of a bootstrap resampling procedure.

This class stores the distribution resulting from a Bayesian bootstrap analysis, and provides a method to summarize the result.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `estimates` | `FArray` | The array of bootstrap resample estimates. |

Methods:

| Name | Description |
| --- | --- |
| `__post_init__` | Validates and locks the `estimates` attribute. |
| `__len__` | Returns the length of the `estimates` array. |
| `__str__` | Returns a string representation of the object. |
| `summarize` | Returns a `BootstrapSummary` object. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `estimates` is not a 1D array or contains NaN values. |

Source code in bbstat/evaluate.py
@dataclass(frozen=True)
class BootstrapDistribution:
    """
    A class representing the resulting distribution of a bootstrap resampling procedure.

    This class stores the distribution resulting from a Bayesian bootstrap analysis,
    and provides a method to summarize the result.

    Attributes:
        estimates (FArray): The array of bootstrap resample estimates.

    Methods:
        __post_init__: Validates and locks the `estimates` attribute.
        __len__: Returns the length of the `estimates` array.
        __str__: Returns a string representation of the object.
        summarize: Returns a `BootstrapSummary` object.

    Raises:
        ValueError: If `estimates` is not a 1D array or contains NaN values.
    """

    estimates: FArray

    def __post_init__(self):
        """
        Post-initialization method to validate and lock the estimates array.

        Raises:
            ValueError: If `estimates` is not a 1D array or contains NaN values.
        """
        if self.estimates.ndim != 1:
            raise ValueError(f"Invalid parameter {self.estimates.ndim=}: must be 1.")
        if len(self.estimates) < 1:
            raise ValueError("Invalid parameter estimates: must not be empty.")
        if np.isnan(self.estimates).any():
            raise ValueError(
                "Invalid parameter estimates: must not contain NaN values."
            )
        estimates_copy = np.array(self.estimates, copy=True)
        estimates_copy.setflags(write=False)
        object.__setattr__(self, "estimates", estimates_copy)

    def __len__(self) -> int:
        """Returns the length of the estimates array."""
        return len(self.estimates)

    def __str__(self) -> str:
        """
        Returns a human-readable string representation of the bootstrap distribution.

        This method formats the mean and size of the bootstrap distribution for display.

        Returns:
            str: A formatted string representing the bootstrap distribution.
        """
        mean = self.summarize().mean
        size = len(self)
        return f"BootstrapDistribution({mean=:}, {size=:})"

    def summarize(self, level: float = 0.87) -> BootstrapSummary:
        """
        Returns a `BootstrapSummary` object.

        This method is a wrapper for `BootstrapSummary.from_estimates`.

        Args:
            level (float): The desired level for the credible interval
                (must be between 0 and 1).

        Returns:
            BootstrapSummary: the summary object.

        Raises:
            ValueError: If the `level` is not between 0 and 1.
        """
        return BootstrapSummary.from_estimates(self.estimates, level=level)

__len__()

Returns the length of the estimates array.

Source code in bbstat/evaluate.py
def __len__(self) -> int:
    """Returns the length of the estimates array."""
    return len(self.estimates)

__post_init__()

Post-initialization method to validate and lock the estimates array.

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `estimates` is not a 1D array or contains NaN values. |

Source code in bbstat/evaluate.py
def __post_init__(self):
    """
    Post-initialization method to validate and lock the estimates array.

    Raises:
        ValueError: If `estimates` is not a 1D array or contains NaN values.
    """
    if self.estimates.ndim != 1:
        raise ValueError(f"Invalid parameter {self.estimates.ndim=}: must be 1.")
    if len(self.estimates) < 1:
        raise ValueError("Invalid parameter estimates: must not be empty.")
    if np.isnan(self.estimates).any():
        raise ValueError(
            "Invalid parameter estimates: must not contain NaN values."
        )
    estimates_copy = np.array(self.estimates, copy=True)
    estimates_copy.setflags(write=False)
    object.__setattr__(self, "estimates", estimates_copy)

__str__()

Returns a human-readable string representation of the bootstrap distribution.

This method formats the mean and size of the bootstrap distribution for display.

Returns:

| Type | Description |
| --- | --- |
| `str` | A formatted string representing the bootstrap distribution. |

Source code in bbstat/evaluate.py
def __str__(self) -> str:
    """
    Returns a human-readable string representation of the bootstrap distribution.

    This method formats the mean and size of the bootstrap distribution for display.

    Returns:
        str: A formatted string representing the bootstrap distribution.
    """
    mean = self.summarize().mean
    size = len(self)
    return f"BootstrapDistribution({mean=:}, {size=:})"

summarize(level=0.87)

Returns a BootstrapSummary object.

This method is a wrapper for BootstrapSummary.from_estimates.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `level` | `float` | The desired level for the credible interval (must be between 0 and 1). | `0.87` |

Returns:

| Type | Description |
| --- | --- |
| `BootstrapSummary` | The summary object. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the `level` is not between 0 and 1. |

Source code in bbstat/evaluate.py
def summarize(self, level: float = 0.87) -> BootstrapSummary:
    """
    Returns a `BootstrapSummary` object.

    This method is a wrapper for `BootstrapSummary.from_estimates`.

    Args:
        level (float): The desired level for the credible interval
            (must be between 0 and 1).

    Returns:
        BootstrapSummary: the summary object.

    Raises:
        ValueError: If the `level` is not between 0 and 1.
    """
    return BootstrapSummary.from_estimates(self.estimates, level=level)

BootstrapSummary dataclass

A class representing the summary of a Bayesian bootstrap resampling procedure.

This class stores the mean, the credible interval, and level.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `mean` | `float` | The mean of the bootstrap estimates. |
| `ci_low` | `float` | The lower bound of the credible interval. |
| `ci_high` | `float` | The upper bound of the credible interval. |
| `level` | `float` | The desired level for the credible interval (between 0 and 1). |
| `ci_width` | `float` | The width of the credible interval (property). |

Methods:

| Name | Description |
| --- | --- |
| `__post_init__` | Validates the `mean`, `ci_low`, `ci_high`, and `level` attributes. |
| `round` | Returns a new version of the summary with rounded values. |
| `from_estimates` | Creates a summary object from estimates. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `mean`, `ci_low`, `ci_high`, or `level` are NaN. |
| `ValueError` | If the bounds are swapped, `ci_low > ci_high`. |
| `ValueError` | If `level` is not between 0 and 1 (exclusive). |

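A brief usage sketch of building a summary directly from estimates and rounding it (the estimate values are illustrative; the printed numbers are not asserted here):

```python
import numpy as np
from bbstat.evaluate import BootstrapSummary

estimates = np.array([2.3, 2.9, 3.1, 3.6, 5.0])
summary = BootstrapSummary.from_estimates(estimates, level=0.9)

print(summary.mean, summary.ci_low, summary.ci_high, summary.level)
print(summary.ci_width)   # ci_high - ci_low
print(summary.round())    # precision derived from the credible interval width
```
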
Source code in bbstat/evaluate.py
@dataclass(frozen=True)
class BootstrapSummary:
    """
    A class representing the summary of a Bayesian bootstrap resampling procedure.

    This class stores the mean, the credible interval, and level.

    Attributes:
        mean (float): The mean of the bootstrap estimates.
        ci_low (float): The lower bound of the credible interval.
        ci_high (float): The upper bound of the credible interval.
        level (float): The desired level for the credible interval (between 0 and 1).
        ci_width (float): The width of the credible interval (property).

    Methods:
        __post_init__: Validates the `mean`, `ci_low`, `ci_high`, and `level` attributes.
        round: Returns a new version of the summary with rounded values.
        from_estimates: Creates a summary object from estimates.

    Raises:
        ValueError: If `mean`, `ci_low`, `ci_high`, or `level` are NaN.
        ValueError: If the bounds are swapped, `ci_low > ci_high`.
        ValueError: If `level` is not between 0 and 1 (exclusive).
    """

    mean: float
    ci_low: float
    ci_high: float
    level: float

    def __post_init__(self) -> None:
        """
        Post-initialization method to validate the `mean`, `ci_low`,
        `ci_high`, and `level` attributes.

        Raises:
            ValueError: If `mean`, `ci_low`, `ci_high`, or `level` are NaN.
            ValueError: If the bounds are swapped, `ci_low > ci_high`.
            ValueError: If `level` is not between 0 and 1 (exclusive).
        """
        if np.isnan(self.mean):
            raise ValueError("Invalid parameter mean: must not be NaN.")
        if np.isnan(self.ci_low):
            raise ValueError("Invalid parameter ci_low: must not be NaN.")
        if np.isnan(self.ci_high):
            raise ValueError("Invalid parameter ci_high: must not be NaN.")
        if np.isnan(self.level):
            raise ValueError("Invalid parameter level: must not be NaN.")
        if self.ci_low > self.ci_high:
            raise ValueError(
                f"Invalid parameters {self.ci_low=:} and {self.ci_high=:}: "
                "higher end is smaller than lower end."
            )
        if self.level <= 0 or self.level >= 1:
            raise ValueError(
                f"Invalid parameter {self.level=:}: must be within (0, 1)."
            )
        if self.mean < self.ci_low or self.mean > self.ci_high:
            raise ValueError(
                f"Invalid parameter {self.mean=:}: is outside the credible interval "
                f"{self.ci_low=:}, {self.ci_high}."
            )

    @property
    def ci_width(self) -> float:
        """Returns the width of the credible interval."""
        return self.ci_high - self.ci_low

    def round(self, precision: Optional[int] = None) -> "BootstrapSummary":
        """
        Returns a new version of the summary with rounded values.

        When `precision` is given, the mean and credible interval bounds are rounded
        to this number of digits. If `precision=None` (default), the precision is
    computed from the width of the credible interval.

        Args:
            precision (int, optional): The desired precision for rounding.

        Returns:
            BootstrapSummary: The summary of a Bayesian bootstrap procedure's result.
        """
        if precision is None:
            precision = get_precision_for_rounding(self.ci_width)
        return self.__class__(
            mean=round(self.mean, precision),
            ci_low=round(self.ci_low, precision),
            ci_high=round(self.ci_high, precision),
            level=self.level,
        )

    @classmethod
    def from_estimates(
        cls,
        estimates: FArray,
        *,
        level: float = 0.87,
    ) -> "BootstrapSummary":
        """
        Creates a summary object from estimates.

        This method computes the `mean` and credible interval bounds `ci_low` and
        `ci_high`, and creates a `BootstrapSummary` object.

        Args:
            estimates (FArray): The estimated values from a Bayesian bootstrap procedure.
            level (float): The desired level for the credible interval (between 0 and 1),
                default is 0.87.

        Returns:
            BootstrapSummary: The summary of a Bayesian bootstrap procedure's result.

        Raises:
            ValueError: If estimates is empty, not a 1D array, or contains NaN values.
            ValueError: If level is not between 0 and 1 (exclusive).
        """
        if estimates.ndim != 1:
            raise ValueError(f"Invalid parameter {estimates.ndim=}: must be 1.")
        if len(estimates) < 1:
            raise ValueError("Invalid parameter estimates: must not be empty.")
        if np.isnan(estimates).any():
            raise ValueError(
                "Invalid parameter estimates: must not contain NaN values."
            )
        mean = np.mean(estimates).item()
        ci_low, ci_high = compute_credible_interval(estimates=estimates, level=level)
        return cls(mean=mean, ci_low=ci_low, ci_high=ci_high, level=level)

ci_width property

Returns the width of the credible interval.

__post_init__()

Post-initialization method to validate the mean, ci_low, ci_high, and level attributes.

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `mean`, `ci_low`, `ci_high`, or `level` are NaN. |
| `ValueError` | If the bounds are swapped, `ci_low > ci_high`. |
| `ValueError` | If `level` is not between 0 and 1 (exclusive). |

Source code in bbstat/evaluate.py
def __post_init__(self) -> None:
    """
    Post-initialization method to validate the `mean`, `ci_low`,
    `ci_high`, and `level` attributes.

    Raises:
        ValueError: If `mean`, `ci_low`, `ci_high`, or `level` are NaN.
        ValueError: If the bounds are swapped, `ci_low > ci_high`.
        ValueError: If `level` is not between 0 and 1 (exclusive).
    """
    if np.isnan(self.mean):
        raise ValueError("Invalid parameter mean: must not be NaN.")
    if np.isnan(self.ci_low):
        raise ValueError("Invalid parameter ci_low: must not be NaN.")
    if np.isnan(self.ci_high):
        raise ValueError("Invalid parameter ci_high: must not be NaN.")
    if np.isnan(self.level):
        raise ValueError("Invalid parameter level: must not be NaN.")
    if self.ci_low > self.ci_high:
        raise ValueError(
            f"Invalid parameters {self.ci_low=:} and {self.ci_high=:}: "
            "higher end is smaller than lower end."
        )
    if self.level <= 0 or self.level >= 1:
        raise ValueError(
            f"Invalid parameter {self.level=:}: must be within (0, 1)."
        )
    if self.mean < self.ci_low or self.mean > self.ci_high:
        raise ValueError(
            f"Invalid parameter {self.mean=:}: is outside the credible interval "
            f"{self.ci_low=:}, {self.ci_high}."
        )

from_estimates(estimates, *, level=0.87) classmethod

Creates a summary object from estimates.

This method computes the mean and credible interval bounds ci_low and ci_high, and creates a BootstrapSummary object.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `estimates` | `FArray` | The estimated values from a Bayesian bootstrap procedure. | required |
| `level` | `float` | The desired level for the credible interval (between 0 and 1). | `0.87` |

Returns:

| Type | Description |
| --- | --- |
| `BootstrapSummary` | The summary of a Bayesian bootstrap procedure's result. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `estimates` is empty, not a 1D array, or contains NaN values. |
| `ValueError` | If `level` is not between 0 and 1 (exclusive). |

Source code in bbstat/evaluate.py
@classmethod
def from_estimates(
    cls,
    estimates: FArray,
    *,
    level: float = 0.87,
) -> "BootstrapSummary":
    """
    Creates a summary object from estimates.

    This method computes the `mean` and credible interval bounds `ci_low` and
    `ci_high`, and creates a `BootstrapSummary` object.

    Args:
        estimates (FArray): The estimated values from a Bayesian bootstrap procedure.
        level (float): The desired level for the credible interval (between 0 and 1),
            default is 0.87.

    Returns:
        BootstrapSummary: The summary of a Bayesian bootstrap procedure's result.

    Raises:
        ValueError: If estimates is empty, not a 1D array, or contains NaN values.
        ValueError: If level is not between 0 and 1 (exclusive).
    """
    if estimates.ndim != 1:
        raise ValueError(f"Invalid parameter {estimates.ndim=}: must be 1.")
    if len(estimates) < 1:
        raise ValueError("Invalid parameter estimates: must not be empty.")
    if np.isnan(estimates).any():
        raise ValueError(
            "Invalid parameter estimates: must not contain NaN values."
        )
    mean = np.mean(estimates).item()
    ci_low, ci_high = compute_credible_interval(estimates=estimates, level=level)
    return cls(mean=mean, ci_low=ci_low, ci_high=ci_high, level=level)

round(precision=None)

Returns a new version of the summary with rounded values.

When precision is given, the mean and credible interval bounds are rounded to this number of digits. If precision=None (default), the precision is computed from the width of the credible interval.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `precision` | `int` | The desired precision for rounding. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `BootstrapSummary` | The summary of a Bayesian bootstrap procedure's result. |

Source code in bbstat/evaluate.py
def round(self, precision: Optional[int] = None) -> "BootstrapSummary":
    """
    Returns a new version of the summary with rounded values.

    When `precision` is given, the mean and credible interval bounds are rounded
    to this number of digits. If `precision=None` (default), the precision is
    computed from the width of the credible interval.

    Args:
        precision (int, optional): The desired precision for rounding.

    Returns:
        BootstrapSummary: The summary of a Bayesian bootstrap procedure's result.
    """
    if precision is None:
        precision = get_precision_for_rounding(self.ci_width)
    return self.__class__(
        mean=round(self.mean, precision),
        ci_low=round(self.ci_low, precision),
        ci_high=round(self.ci_high, precision),
        level=self.level,
    )

plot Module

Plotting utility for bootstrap resampling results.

This module provides a function for visually interpreting and summarizing the output of Bayesian bootstrap resampling procedures.

Main Features
  • plot: Visualizes the result of a bootstrap resampling procedure.
Notes
  • The credible interval is calculated using quantiles of the empirical distribution of bootstrap estimates.
  • This module is designed to be used alongside the evaluate module to provide complete statistical summaries of resampled data.

plot(bootstrap_distribution, level, *, ax=None, n_grid=200, label=None, precision=None)

Plot the kernel density estimate (KDE) of bootstrap estimates with credible interval shading and a vertical line at the mean.

If an axis is provided, the plot is drawn on it; otherwise, a new figure and axis are created. The plot shows a shaded credible interval and labels the line with a formatted mean and credible interval. If no axis is provided, the figure is additionally given a title and y-label, the lower y-limit is set to zero, the legend is added, and a tight layout is applied.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `bootstrap_distribution` | `BootstrapDistribution` | The result of a bootstrap resampling procedure. | required |
| `level` | `float` | Credible interval level (e.g., 0.95 for 95% CI). | required |
| `ax` | `plt.Axes` | Matplotlib axis to draw the plot on. If `None`, a new axis is created. | `None` |
| `n_grid` | `int` | Number of grid points to use for evaluating the KDE. | `200` |
| `label` | `str` | Optional label for the line. If provided, the label is extended to include the mean and credible interval. | `None` |
| `precision` | `int`, `"auto"`, or `None` | Precision for rounding the summary values (mean and credible interval). If `None` (default), no rounding is done; if `"auto"`, the precision is computed from the width of the credible interval; if an integer, values are rounded to that many digits. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `plt.Axes` | The axis object containing the plot. |

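A usage sketch, assuming a working matplotlib backend (the data and statistic are illustrative):

```python
import matplotlib.pyplot as plt
import numpy as np

from bbstat.bootstrap import bootstrap
from bbstat.plot import plot

data = np.random.randn(500)
distribution = bootstrap(data, statistic_fn="mean", seed=7)

# New figure with title, legend, and auto-rounded mean/CI in the label.
plot(distribution, level=0.95, label="mean", precision="auto")
plt.show()
```
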
Source code in bbstat/plot.py
def plot(
    bootstrap_distribution: BootstrapDistribution,
    level: float,
    *,
    ax: Optional[plt.Axes] = None,
    n_grid: int = 200,
    label: Optional[str] = None,
    precision: Optional[Union[int, Literal["auto"]]] = None,
) -> plt.Axes:
    """
    Plot the kernel density estimate (KDE) of bootstrap estimates with
    credible interval shading and a vertical line at the mean.

    If an axis is provided, the plot is drawn on it; otherwise, a new figure and axis are created.
    Displays a shaded credible interval and labels the plot with a formatted mean
    and credible interval. If no axis is provided, the figure is additionally given a title and ylabel,
    the lower y-limit is set to zero, the legend is added, and a tight layout is applied.

    Args:
        bootstrap_distribution (BootstrapDistribution): The result of a bootstrap resampling procedure.
        level (float): Credible interval level (e.g., 0.95 for 95% CI).
        ax (plt.Axes, optional): Matplotlib axis to draw the plot on. If None, a new axis is created.
        n_grid (int): Number of grid points to use for evaluating the KDE, default is 200.
        label (str, optional): Optional label for the line. If provided, the label is
            extended to include the mean and credible interval.
        precision (int or "auto" or None, optional): Optional precision for rounding the summary
            values (mean and credible interval). If None (default), no rounding is done; if "auto",
            the precision is computed from the width of the credible interval; if integer, we round to
            this many digits.

    Returns:
        plt.Axes: The axis object containing the plot.
    """
    if ax is None:
        fig, ax = plt.subplots(figsize=(8, 4.5))
    else:
        fig = None

    summary = bootstrap_distribution.summarize(level)

    if precision is not None:
        if precision == "auto":
            summary = summary.round()
        else:
            summary = summary.round(precision)

    param_str = f"{summary.mean} ({summary.ci_low}, {summary.ci_high})"

    if label is not None:
        param_str = f"{label}={param_str}"

    p = gaussian_kde(bootstrap_distribution.estimates)

    x_grid = np.linspace(
        bootstrap_distribution.estimates.min(), bootstrap_distribution.estimates.max(), n_grid
    )
    within_ci = np.logical_and(x_grid >= summary.ci_low, x_grid <= summary.ci_high)
    y_grid = p(x_grid)
    y_mean = p([summary.mean]).item()

    (line,) = ax.plot(x_grid, y_grid, label=param_str)
    color = line.get_color()

    ax.fill_between(
        x_grid[within_ci],
        0,
        y_grid[within_ci],
        facecolor=color,
        alpha=0.5,
    )
    ax.plot([summary.mean] * 2, [0, y_mean], "--", color=color)
    ax.plot([summary.mean], [y_mean], "o", color=color)

    if fig is not None:
        ax.set_title(
            f"Bayesian bootstrap  •  {len(bootstrap_distribution)} resamples, {level * 100:.0f}% CI"
        )
        ax.set_ylim(0, ax.get_ylim()[1])
        ax.set_ylabel("Distribution of estimates")
        ax.legend()
        fig.tight_layout()
    return ax

registry Module

Statistic function registry and protocol definition.

This module defines a strict Protocol (StatisticFunction) for all supported statistical aggregation functions used in the system. It also provides a typed mapping of statistic function names to their concrete implementations and a lookup function (get_statistic_fn) for retrieving them by name.

All registered functions are callable with specific combinations of arguments (e.g. data, weights, and optional parameters like ddof, factor, or sorter) depending on the computation type. Static typing ensures correct usage of each registered function.

StatisticFunction

Bases: Protocol

A protocol defining the interface for all statistical computation functions.

Each implementing function must take data and weights arrays and may accept additional keyword-only arguments depending on the computation type.

Overloads:

  • aggregate: accepts data: FArray, weights: FArray, and optional factor: float
  • mean, sum: accept only data: FArray, weights: FArray
  • variance, std: accept optional weighted_mean: float and ddof: int
  • quantile: requires quantile: float and optional sorter: IArray
  • percentile: requires percentile: float and optional sorter: IArray
  • median: accepts optional sorter
  • mutual_information: accepts data: IIArray and weights: FArray, and normalize: bool = True
  • pearson_dependency, spearman_dependency: take tuple of two float arrays (FFArray) and ddof
  • eta_square_dependency: takes a tuple of an integer array and a float array (IFArray)
  • entropy: accepts data: IArray and weights: FArray
  • probability, self_information, log_odds: accept data: IArray, weights: FArray, and state: int
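
As a concrete illustration of one of these overloads, a sketch calling the probability statistic directly (the values are illustrative and the weights are assumed to be normalized):

```python
import numpy as np
from bbstat.statistics import compute_weighted_probability

data = np.array([0, 1, 1, 0, 1])   # integer-coded states
weights = np.full(5, 0.2)          # normalized weights

# Matches the probability overload: data, weights, and the selected state.
print(compute_weighted_probability(data=data, weights=weights, state=1))  # => 0.6
```
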
Source code in bbstat/registry.py
class StatisticFunction(Protocol):
    """
    A protocol defining the interface for all statistical computation functions.

    Each implementing function must take `data` and `weights` arrays and may
    accept additional keyword-only arguments depending on the computation type.

    Overloads:

    - `aggregate`: accepts `data: FArray`, `weights: FArray`, and optional `factor: float`
    - `mean`, `sum`: accept only `data: FArray`, `weights: FArray`
    - `variance`, `std`: accept optional `weighted_mean: float` and `ddof: int`
    - `quantile`: requires `quantile: float` and optional `sorter: IArray`
    - `percentile`: requires `percentile: float` and optional `sorter: IArray`
    - `median`: accepts optional `sorter`
    - `mutual_information`: accepts `data: IIArray` and `weights: FArray`, and `normalize: bool = True`
    - `pearson_dependency`, `spearman_dependency`: take tuple of two
      float arrays (`FFArray`) and `ddof`
    - `eta_square_dependency`: takes a tuple of an integer array and a float
      array (`IFArray`)
    - `entropy`: accepts `data: IArray` and `weights: FArray`
    - `probability`, `self_information`, `log_odds`: accept `data: IArray`,
      `weights: FArray`, and `state: int`
    """

    # aggregate
    @overload
    def __call__(
        self,
        data: FArray,
        weights: FArray,
        *,
        factor: Optional[float],
    ) -> float: ...

    # mean, sum
    @overload
    def __call__(
        self,
        data: FArray,
        weights: FArray,
    ) -> float: ...

    # entropy
    @overload
    def __call__(
        self,
        data: IArray,
        weights: FArray,
    ) -> float: ...

    # mutual_information
    @overload
    def __call__(
        self,
        data: IIArray,
        weights: FArray,
        *,
        normalize: bool = True,
    ) -> float: ...

    # probability, log_odds, self_information
    @overload
    def __call__(
        self,
        data: IArray,
        weights: FArray,
        state: int,
    ) -> float: ...

    # variance, std
    @overload
    def __call__(
        self,
        data: FArray,
        weights: FArray,
        *,
        weighted_mean: Optional[float],
        ddof: int,
    ) -> float: ...

    # quantile
    @overload
    def __call__(
        self,
        data: FArray,
        weights: FArray,
        *,
        quantile: float,
        sorter: Optional[IArray],
    ) -> float: ...

    # percentile
    @overload
    def __call__(
        self,
        data: FArray,
        weights: FArray,
        *,
        percentile: float,
        sorter: Optional[IArray],
    ) -> float: ...

    # median
    @overload
    def __call__(
        self,
        data: FArray,
        weights: FArray,
        *,
        sorter: Optional[IArray],
    ) -> float: ...

    # pearson_dependency, spearman_dependency
    @overload
    def __call__(
        self,
        data: FFArray,
        weights: FArray,
        *,
        ddof: int,
    ) -> float: ...

    # eta_squared_dependency
    @overload
    def __call__(
        self,
        data: IFArray,
        weights: FArray,
    ) -> float: ...

get_statistic_fn(name)

Retrieve a registered statistic function by name.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | The lowercase name of the statistic function to retrieve. Must be one of: `"aggregate"`, `"entropy"`, `"eta_square_dependency"`, `"log_odds"`, `"mean"`, `"median"`, `"mutual_information"`, `"pearson_dependency"`, `"percentile"`, `"probability"`, `"quantile"`, `"self_information"`, `"spearman_dependency"`, `"std"`, `"sum"`, `"variance"`. | required |

Returns:

| Type | Description |
| --- | --- |
| `StatisticFunction` | The corresponding function implementation. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the name does not correspond to a registered function. |

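A short sketch of looking up a registered function and handling an unknown name (the name "mode" is just an example of an unregistered statistic):

```python
import numpy as np
from bbstat.registry import get_statistic_fn, get_statistic_fn_names

print(get_statistic_fn_names())      # all registered names

mean_fn = get_statistic_fn("mean")
data = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.full(4, 0.25)           # normalized weights
print(mean_fn(data=data, weights=weights))  # => 2.5

try:
    get_statistic_fn("mode")         # not registered
except ValueError as err:
    print(err)
```
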
Source code in bbstat/registry.py
def get_statistic_fn(name: str) -> StatisticFunction:
    """
    Retrieve a registered statistic function by name.

    Parameters:
        name (str): The lowercase name of the statistic function to retrieve.
            Must be one of:
            - "aggregate"
            - "entropy"
            - "eta_square_dependency"
            - "log_odds"
            - "mean"
            - "median"
            - "mutual_information"
            - "pearson_dependency"
            - "percentile"
            - "probability"
            - "quantile"
            - "self_information"
            - "spearman_dependency"
            - "std"
            - "sum"
            - "variance"

    Returns:
        StatisticFunction: The corresponding function implementation.

    Raises:
        ValueError: If the name does not correspond to a registered function.
    """
    try:
        return STATISTIC_FUNCTIONS[name.lower()]
    except KeyError:
        raise ValueError(
            f"Invalid {name=:}: choose from {list(STATISTIC_FUNCTIONS.keys())}"
        )

get_statistic_fn_names()

Retrieve the names of registered statistic functions.

Returns:

| Type | Description |
| --- | --- |
| `Tuple[str, ...]` | The names of the available statistic functions. |

Source code in bbstat/registry.py
def get_statistic_fn_names() -> Tuple[str, ...]:
    """
    Retrieve the names of registered statistic functions.

    Returns:
        Tuple[str, ...]: The names of the available statistic functions.
    """
    return tuple(STATISTIC_FUNCTIONS.keys())

resample Module

Bootstrap resampling utilities using Dirichlet-distributed weights.

This module provides functionality for generating bootstrap resamples via the Bayesian bootstrap method, where resamples are weighted using samples from a Dirichlet distribution. It is intended for internal use within higher-level resampling and estimation workflows.

The function resample yields weighted resamples suitable for estimating statistics under uncertainty without making parametric assumptions.

Main Features
  • Dirichlet-based resampling for Bayesian bootstrap.
  • Support for blockwise resample generation to control memory usage.
  • Optional random seed for reproducibility.
  • Generator interface for efficient streaming of resample weights.
Example
from bbstat.resample import resample
for weights in resample(n_boot=1000, n_data=50):
    # Apply weights to compute statistic
    ...
Notes
  • The function is designed to scale to large numbers of resamples.
  • It is most useful as a low-level utility within a bootstrap framework.

See the resample function docstring for complete usage details.

resample(n_boot, n_data, seed=None, blocksize=None)

Generates bootstrap resamples with Dirichlet-distributed weights.

This function performs resampling by generating weights from a Dirichlet distribution. The number of resamples is controlled by the n_boot argument, while the size of each block of resamples can be adjusted using the blocksize argument. The seed argument allows for reproducible results.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `n_boot` | `int` | The total number of bootstrap resamples to generate. | required |
| `n_data` | `int` | The number of data points to resample (used for the dimension of the Dirichlet distribution). | required |
| `seed` | `int` | A random seed for reproducibility (`None` for random seeding). | `None` |
| `blocksize` | `int` | The number of resamples to generate in each block. If `None`, the entire number of resamples is generated in one block. | `None` |

Yields:

| Type | Description |
| --- | --- |
| `FArray` | Each resample is yielded as it is generated: a 1D array of Dirichlet-distributed weights of length `n_data`. |

Example
for r in resample(n_boot=10, n_data=5):
    print(r)
Notes
  • If blocksize is specified, the resampling will be performed in smaller blocks, which can be useful for parallelizing or limiting memory usage.
  • The function uses NumPy's default_rng to generate random numbers, which provides a more flexible and efficient interface compared to np.random.seed.

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `n_boot` is less than 1 or `n_data` is less than 1. |

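A small sketch showing that each yielded weight vector is a probability vector over the data points and that a fixed seed reproduces the stream:

```python
import numpy as np
from bbstat.resample import resample

first = list(resample(n_boot=3, n_data=4, seed=123, blocksize=2))
second = list(resample(n_boot=3, n_data=4, seed=123, blocksize=2))

for w in first:
    print(w.shape, np.isclose(w.sum(), 1.0))                  # (4,) True

print(all(np.allclose(a, b) for a, b in zip(first, second)))  # True
```
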
Source code in bbstat/resample.py
def resample(
    n_boot: int,
    n_data: int,
    seed: Optional[int] = None,
    blocksize: Optional[int] = None,
) -> Generator[FArray, None, None]:
    """
    Generates bootstrap resamples with Dirichlet-distributed weights.

    This function performs resampling by generating weights from a Dirichlet distribution.
    The number of resamples is controlled by the `n_boot` argument, while the size of
    each block of resamples can be adjusted using the `blocksize` argument. The `seed`
    argument allows for reproducible results.

    Args:
        n_boot (int): The total number of bootstrap resamples to generate.
        n_data (int): The number of data points to resample (used for the dimension of the
            Dirichlet distribution).
        seed (int, optional): A random seed for reproducibility (default is `None` for
            random seeding).
        blocksize (int, optional): The number of resamples to generate in each block.
            If `None`, the entire number of resamples is generated in one block.
            Defaults to `None`.

    Yields:
        Generator[FArray, None, None]: A generator that yields each resample
            (a 1D array of floats) as it is generated. Each resample contains Dirichlet-
            distributed weights for the given `n_data`.

    Example:
        ```python
        for r in resample(n_boot=10, n_data=5):
            print(r)
        ```

    Notes:
        - If `blocksize` is specified, the resampling will be performed in smaller blocks,
          which can be useful for parallelizing or limiting memory usage.
        - The function uses NumPy's `default_rng` to generate random numbers, which provides
          a more flexible and efficient interface compared to `np.random.seed`.

    Raises:
        ValueError: If `n_boot` is less than 1 or `n_data` is less than 1.
    """
    if n_boot < 1:
        raise ValueError(f"Invalid parameter {n_boot=:}: must be positive.")
    if n_data < 1:
        raise ValueError(f"Invalid parameter {n_data=:}: must be positive.")
    rng = np.random.default_rng(seed)
    alpha = np.ones(n_data)
    if blocksize is None:
        blocksize = n_boot
    else:
        blocksize = min(max(1, blocksize), n_boot)
    remainder = n_boot
    while remainder > 0:
        size = min(blocksize, remainder)
        weights = rng.dirichlet(alpha=alpha, size=size)
        for w in weights:
            yield w
        remainder -= size

statistics Module

Type definitions and weighted statistical functions for use in bootstrap resampling and analysis.

This module defines types for statistical functions that operate on weighted data, particularly in the context of Bayesian bootstrap procedures. It provides a collection of pre-defined weighted statistics (e.g., mean, variance, quantile).

Main Features
  • Type aliases for data and weights.
  • A library of built-in weighted statistical functions (e.g., mean, std, quantile, etc.)

Type Aliases:

| Name | Description |
| --- | --- |
| `FArray` | Alias for `numpy.typing.NDArray[numpy.floating]`, used for floating-point data and weights. |
| `IArray` | Alias for `numpy.typing.NDArray[numpy.integer]`, used for index arrays. |
| `FFArray`, `IFArray`, `IIArray` | Tuples of data arrays used in bivariate computations. |

Built-in Functions
  • compute_weighted_aggregate: Weighted dot product, optionally scaled by a factor (internal use only).
  • compute_weighted_entropy: Weighted entropy.
  • compute_weighted_eta_square_dependency: Effect size for categorical-continuous variable relationships.
  • compute_weighted_log_odds: Weighted log-odds of a selected state.
  • compute_weighted_mean: Weighted arithmetic mean.
  • compute_weighted_median: Weighted median.
  • compute_weighted_mutual_information: Weighted mutual information.
  • compute_weighted_pearson_dependency: Weighted Pearson correlation for two variables.
  • compute_weighted_probability: Weighted probability of a state.
  • compute_weighted_quantile / compute_weighted_percentile: Weighted quantile estimation.
  • compute_weighted_self_information: Weighted self-information of a selected state.
  • compute_weighted_spearman_dependency: Weighted Spearman correlation.
  • compute_weighted_std: Weighted standard deviation.
  • compute_weighted_sum: Weighted sum.
  • compute_weighted_variance: Weighted variance with optional degrees of freedom correction.
Notes
  • All functions assume normalized weights (i.e., sum to 1).
  • Functions raise ValueError for invalid shapes, mismatched dimensions, or inappropriate input types.
  • This module is intended for use with bootstrap, which applies these functions across bootstrap resamples.
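
A minimal sketch with a hand-built, normalized weight vector (per the notes above, all functions assume the weights sum to 1):

```python
import numpy as np
from bbstat.statistics import compute_weighted_mean, compute_weighted_sum

data = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([0.1, 0.2, 0.3, 0.4])   # sums to 1

print(compute_weighted_mean(data=data, weights=weights))  # => 3.0
print(compute_weighted_sum(data=data, weights=weights))
```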

FArray = NDArray[np.floating] module-attribute

FFArray = Tuple[FArray, FArray] module-attribute

IArray = NDArray[np.integer] module-attribute

IFArray = Tuple[IArray, FArray] module-attribute

compute_weighted_aggregate(data, weights, *, factor=None)

Computes a weighted aggregate of the input data.

This function calculates the dot product of the input data and weights. The function assumes that both data and weights are 1D arrays of the same length and that the weights sum to 1. If a factor is provided, the dot product is multiplied with it.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `FArray` | A 1D array of numeric values representing the data to be aggregated. | required |
| `weights` | `FArray` | A 1D array of numeric values representing the weights for the data. | required |
| `factor` | `float` | A scalar factor to multiply with the computed aggregate. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `float` | The computed weighted aggregate, potentially scaled by the `factor`. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `data` or `weights` are not 1D arrays. |
| `ValueError` | If the shapes of `data` and `weights` do not match. |

Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_aggregate(data, weights))  # => 2.1
print(compute_weighted_aggregate(data, weights, factor=1.5))  # => 3.15
Notes

The weighted aggregate is computed using the dot product between data and weights. The optional factor scales the result of this dot product. If no factor is given, the aggregation computes the weighted arithmetic mean of the data; if instead the factor equals the length of the data array, the aggregation computes the weighted sum.

Source code in bbstat/statistics.py
def compute_weighted_aggregate(
    data: FArray,
    weights: FArray,
    *,
    factor: Optional[float] = None,
) -> float:
    """
    Computes a weighted aggregate of the input data.

    This function calculates the dot product of the input `data` and `weights`.
    The function assumes that both `data` and `weights` are 1D arrays of the same
    length and that the weights sum to 1. If a `factor` is provided, the dot product
    is multiplied with it.

    Args:
        data (FArray): A 1D array of numeric values representing the
            data to be aggregated.
        weights (FArray): A 1D array of numeric values representing
            the weights for the data.
        factor (float, optional): A scalar factor to multiply with the computed
            aggregate (default is None).

    Returns:
        float: The computed weighted aggregate, potentially scaled by the `factor`.

    Raises:
        ValueError: If `data` or `weights` are not 1D arrays.
        ValueError: If the shapes of `data` and `weights` do not match.

    Example:
        ```python
        data = np.array([1.0, 2.0, 3.0])
        weights = np.array([0.2, 0.5, 0.3])
        print(compute_weighted_aggregate(data, weights))  # => 2.1
        print(compute_weighted_aggregate(data, weights, factor=1.5))  # => 3.15
        ```

    Notes:
        The weighted aggregate is computed using the dot product between `data` and `weights`.
        The optional `factor` scales the result of this dot product. If no factor is given,
        the aggregation computes the weighted arithmetic mean of the data; if instead the factor
        equals the length of the data array, the aggregation computes the weighted sum.

    """
    validate_array(data=data, weights=weights)
    aggregate = np.dot(weights, data)
    if factor is not None:
        aggregate *= factor
    return aggregate.item()

compute_weighted_entropy(data, weights)

Computes a weighted entropy of 1D code data.

This function calculates the weighted entropy by first computing the weighted distribution, dropping the zero elements (which contribute zero to the sum), and then computing the negative dot product between the distribution and its log-distribution.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `IArray` | A 1D array of numeric values representing the sample data in code format. | required |
| `weights` | `FArray` | A 1D array of numeric weights corresponding to the data. | required |

Returns:

| Type | Description |
| --- | --- |
| `float` | The weighted entropy value. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If `data` and `weights` have different shapes or are not 1D. |

Example
data = np.array([1, 0, 0])
weights = np.array([0.4, 0.2, 0.4])
print(compute_weighted_entropy(data, weights))  # => 0.673...
Source code in bbstat/statistics.py
def compute_weighted_entropy(
    data: IArray,
    weights: FArray,
) -> float:
    """
    Computes a weighted entropy of 1D code data.

    This function calculates the weighted entropy by first computing
    the weighted distribution, dropping the zero elements (contribute
    zero to the following sum), and computing the negative dot product
    between the distribution and log-distribution.

    Args:
        data (IArray): A 1D array of numeric values representing
            the sample data in code format.
        weights (FArray): A 1D array of numeric weights corresponding
            to the data.

    Returns:
        float: The weighted entropy value.

    Raises:
        ValueError: If `data` and `weights` have different shapes or are not 1D.

    Example:
        ```python
        data = np.array([1, 0, 0])
        weights = np.array([0.4, 0.2, 0.4])
        print(compute_weighted_entropy(data, weights))  # => 0.673...
        ```
    """
    validate_array(data=data, weights=weights)
    active_set = get_active_set(data=data)
    distribution = weighted_discrete_distribution(data=active_set, weights=weights)
    distribution = distribution[distribution > 0.0]
    return -np.dot(distribution, np.log(distribution)).item()

compute_weighted_eta_square_dependency(data, weights)

Computes the weighted eta-squared (η²) statistic to assess dependency between a categorical and a numerical variable.

Eta-squared measures the proportion of total variance in the numerical variable that is explained by the categorical grouping. It is commonly used in ANOVA-like analyses and effect size estimation. The value is bounded between 0 and 1.

Parameters:

- `data` (`IFArray`, required): A tuple `(data_cat, data_num)` where `data_cat` is a 1D array of integer-encoded categorical values and `data_num` is a 1D array of corresponding numeric values.
- `weights` (`FArray`, required): A 1D array of non-negative weights, same length as `data_cat`.

Returns:

- `float`: Weighted eta-squared value in the range [0, 1], where higher values indicate stronger association between the categorical and numeric variable.

Raises:

- `ValueError`: If input arrays are not 1D or do not have matching shapes.

Example
data_cat = np.array([0, 0, 1, 1])
data_num = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.array([0.25, 0.25, 0.25, 0.25])
print(compute_weighted_eta_square_dependency((data_cat, data_num), weights))  # => 0.8
Notes
  • Internally, η² is computed as the ratio of weighted between-group variance to the total weighted variance.
  • The statistic is sensitive to group sizes and imbalance in weights.
  • When all group means equal the global mean, η² is 0.
  • When groups are perfectly separated by the numeric variable, η² is 1.
Source code in bbstat/statistics.py
def compute_weighted_eta_square_dependency(
    data: IFArray,
    weights: FArray,
) -> float:
    """
    Computes the weighted eta-squared (η²) statistic to assess dependency between
    a categorical and a numerical variable.

    Eta-squared measures the proportion of total variance in the numerical variable
    that is explained by the categorical grouping. It is commonly used in ANOVA-like
    analyses and effect size estimation. The value is bounded between 0 and 1.

    Args:
        data (IFArray): A tuple `(data_cat, data_num)` where:
            - `data_cat` is a 1D array of integer-encoded categorical values.
            - `data_num` is a 1D array of corresponding numeric values.
        weights (FArray): A 1D array of non-negative weights,
            same length as `data_cat`.

    Returns:
        float: Weighted eta-squared value in the range [0, 1], where higher values
        indicate stronger association between the categorical and numeric variable.

    Raises:
        ValueError: If input arrays are not 1D or do not have matching shapes.

    Example:
        ```python
        data_cat = np.array([0, 0, 1, 1])
        data_num = np.array([1.0, 2.0, 3.0, 4.0])
        weights = np.array([0.25, 0.25, 0.25, 0.25])
        print(compute_weighted_eta_square_dependency((data_cat, data_num), weights))  # => 0.8
        ```

    Notes:
        - Internally, η² is computed as the ratio of weighted between-group variance
          to the total weighted variance.
        - The statistic is sensitive to group sizes and imbalance in weights.
        - When all group means equal the global mean, η² is 0.
        - When groups are perfectly separated by the numeric variable, η² is 1.
    """
    data_cat, data_num = data
    mean_sample = compute_weighted_mean(data=data_num, weights=weights)
    group_variance_sum = 0.0
    for value in np.unique(data_cat):
        subset = data_cat == value
        group_weight = weights[subset].sum().item()
        mean_group = compute_weighted_aggregate(
            data=data_num[subset],
            weights=weights[subset],
            factor=1.0 / group_weight,
        )
        group_variance_sum += group_weight * (mean_group - mean_sample) ** 2
    return group_variance_sum / compute_weighted_variance(
        data=data_num,
        weights=weights,
        ddof=0,
    )
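Conversely to the example above, a hypothetical check of the first boundary case from the notes (equal group means give η² = 0), assuming the function is importable from `bbstat.statistics`:

```python
import numpy as np

from bbstat.statistics import compute_weighted_eta_square_dependency

# Both groups have the same weighted mean, so no variance is explained
data_cat = np.array([0, 0, 1, 1])
data_num = np.array([1.0, 2.0, 2.0, 1.0])
weights = np.array([0.25, 0.25, 0.25, 0.25])
print(compute_weighted_eta_square_dependency((data_cat, data_num), weights))  # => 0.0
```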

compute_weighted_log_odds(data, weights, state)

Computes a weighted log-odds of a state within 1D code data.

This function calculates the weighted probability of a state, p(state), by summing the weights which coincide with data == state. The log-odds is then computed as the logarithm of the odds p(state) / (1 - p(state)).

Parameters:

- `data` (`IArray`, required): A 1D array of numeric values representing the sample data in code format.
- `weights` (`FArray`, required): A 1D array of numeric weights corresponding to the data.
- `state` (`int`, required): The state for which we estimate the log-odds.

Returns:

- `float`: The weighted log-odds value.

Raises:

- `ValueError`: If `data` and `weights` have different shapes or are not 1D, or if `state` is not in `data`.

Example
data = np.array([1, 0, 0])
weights = np.array([0.4, 0.2, 0.4])
print(compute_weighted_log_odds(data, weights, state=0))  # => 0.405...
Source code in bbstat/statistics.py
def compute_weighted_log_odds(
    data: IArray,
    weights: FArray,
    state: int,
) -> float:
    """
    Computes a weighted log-odds of a state within 1D code data.

    This function calculates the weighted probability of a `state`, `p(state)`, by summing
    the `weights` which coincide with `data == state`. The log-odds is then computed
    as the logarithm of the odds `p(state) / (1 - p(state))`.

    Args:
        data (IArray): A 1D array of numeric values representing
            the sample data in code format.
        weights (FArray): A 1D array of numeric weights corresponding
            to the data.
        state (int): The state for which we estimate the log-odds.

    Returns:
        float: The weighted log-odds value.

    Raises:
        ValueError: If `data` and `weights` have different shapes or are not 1D,
            or if `state` is not in `data`.

    Example:
        ```python
        data = np.array([1, 0, 0])
        weights = np.array([0.4, 0.2, 0.4])
        print(compute_weighted_log_odds(data, weights, state=0))  # => 0.405...
        ```
    """
    probability = compute_weighted_probability(data=data, weights=weights, state=state)
    return math.log(probability / (1.0 - probability))
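For binary code data, the log-odds of a state equals the difference between the self-informations of the other state and the selected state; a small, hypothetical check (see `compute_weighted_self_information` below; assuming both functions are importable from `bbstat.statistics`):

```python
import numpy as np

from bbstat.statistics import (
    compute_weighted_log_odds,
    compute_weighted_self_information,
)

data = np.array([1, 0, 0])
weights = np.array([0.4, 0.2, 0.4])

# log(p0 / (1 - p0)) == -log(p1) - (-log(p0)) whenever p0 + p1 == 1
log_odds = compute_weighted_log_odds(data, weights, state=0)
info_diff = compute_weighted_self_information(
    data, weights, state=1
) - compute_weighted_self_information(data, weights, state=0)
print(round(log_odds, 6) == round(info_diff, 6))  # => True
```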

compute_weighted_mean(data, weights)

Computes a weighted mean of the input data.

This function calculates the weighted arithmetic mean of the input data and weights via the compute_weighted_aggregate function.

Parameters:

- `data` (`FArray`, required): A 1D array of numeric values representing the data to be averaged.
- `weights` (`FArray`, required): A 1D array of numeric values representing the weights for the data.

Returns:

- `float`: The computed weighted mean.

Raises:

- `ValueError`: If `data` or `weights` are not 1D arrays.
- `ValueError`: If the shapes of `data` and `weights` do not match.

Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_mean(data, weights))  # => 2.1
Source code in bbstat/statistics.py
def compute_weighted_mean(
    data: FArray,
    weights: FArray,
) -> float:
    """
    Computes a weighted mean of the input data.

    This function calculates the weighted arithmetic mean of the input `data`
    and `weights` via the `compute_weighted_aggregate` function.

    Args:
        data (FArray): A 1D array of numeric values representing
            the data to be averaged.
        weights (FArray): A 1D array of numeric values representing
            the weights for the data.

    Returns:
        float: The computed weighted mean.

    Raises:
        ValueError: If `data` or `weights` are not 1D arrays.
        ValueError: If the shapes of `data` and `weights` do not match.

    Example:
        ```python
        data = np.array([1.0, 2.0, 3.0])
        weights = np.array([0.2, 0.5, 0.3])
        print(compute_weighted_mean(data, weights))  # => 2.1
        ```
    """
    return compute_weighted_aggregate(data=data, weights=weights, factor=None)

compute_weighted_median(data, weights, *, sorter=None)

Computes a weighted median of 1D data using linear interpolation.

This function calculates the weighted median of the given data array based on the provided weights via compute_weighted_quantile with parameter quantile=0.5.

Parameters:

- `data` (`FArray`, required): A 1D array of numeric values representing the sample data.
- `weights` (`FArray`, required): A 1D array of numeric weights corresponding to the data.
- `sorter` (`Optional[NDArray[integer]]`, default `None`): Optional array of indices that sorts `data`.

Returns:

- `float`: The interpolated weighted median value.

Raises:

- `ValueError`: If `data` and `weights` have different shapes or are not 1D.

Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.4, 0.2, 0.4])
print(compute_weighted_median(data, weights))  # => 2.25
Source code in bbstat/statistics.py
def compute_weighted_median(
    data: FArray,
    weights: FArray,
    *,
    sorter: Optional[NDArray[np.integer]] = None,
) -> float:
    """
    Computes a weighted median of 1D data using linear interpolation.

    This function calculates the weighted median of the given `data` array
    based on the provided `weights` via `compute_weighted_quantile` with parameter
    `quantile=0.5`.

    Args:
        data (FArray): A 1D array of numeric values representing
            the sample data.
        weights (FArray): A 1D array of numeric weights corresponding
            to the data.
        sorter (Optional[NDArray[np.integer]]): Optional array of indices that
            sorts `data`.

    Returns:
        float: The interpolated weighted median value.

    Raises:
        ValueError: If `data` and `weights` have different shapes or are not 1D.

    Example:
        ```python
        data = np.array([1.0, 2.0, 3.0])
        weights = np.array([0.4, 0.2, 0.4])
        print(compute_weighted_median(data, weights))  # => 2.25
        ```
    """
    return compute_weighted_quantile(
        data=data,
        weights=weights,
        quantile=0.5,
        sorter=sorter,
    )

compute_weighted_mutual_information(data, weights, *, normalize=True)

Computes the weighted mutual information (dependency) between two 1D arrays.

This function calculates the mutual information dependency between two integer variables using the direct mutual information formula on weighted joint distribution estimates. The inputs data_1 and data_2 are expected to be 1D arrays of the same length, provided as a tuple data. Each data point is assigned a weight from the weights array.

Parameters:

- `data` (`IIArray`, required): A tuple of two 1D integer arrays `(data_1, data_2)` of equal length.
- `weights` (`FArray`, required): A 1D float array of weights, same length as each array in `data`.
- `normalize` (`bool`, default `True`): Whether or not to normalize the mutual information to the range [0, 1].

Returns:

- `float`: The weighted mutual information.

Raises:

- `ValueError`: If the input arrays are not 1D or have mismatched lengths.

Example
data_1 = np.array([0, 0, 1])
data_2 = np.array([0, 1, 1])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_mutual_information((data_1, data_2), weights))  # => 0.146...
Notes
  • The normalized result is bounded by [0, 1], where 0 indicates perfect independence and 1 indicates perfect dependence.
  • The unnormalized result is bounded between 0 (perfect independence) and min(H(data_1), H(data_2)), where H(data_1) and H(data_2) are the entropies of the two variables. If we reach the upper bound and H(data_1) == H(data_2), we have perfect dependence.
Source code in bbstat/statistics.py
def compute_weighted_mutual_information(
    data: IIArray,
    weights: FArray,
    *,
    normalize: bool = True,
) -> float:
    """
    Computes the weighted mutual information (dependency) between two 1D arrays.

    This function calculates the mutual information dependency between two integer variables
    using the direct mutual information formula on weighted joint distribution estimates.
    The inputs `data_1` and `data_2` are expected to be 1D arrays of the same length, provided
    as a tuple `data`. Each data point is assigned a weight from the `weights` array.

    Args:
        data (IIArray): A tuple of two 1D integer arrays `(data_1, data_2)`
            of equal length.
        weights (FArray): A 1D float array of weights, same length as
            each array in `data`.
        normalize (bool, optional): Whether or not to normalize the mutual information
            to range [0, 1]. Default is True

    Returns:
        float: The weighted mutual information.

    Raises:
        ValueError: If the input arrays are not 1D or have mismatched lengths.

    Example:
        ```python
        data_1 = np.array([0, 0, 1])
        data_2 = np.array([0, 1, 1])
        weights = np.array([0.2, 0.5, 0.3])
        print(compute_weighted_mutual_information((data_1, data_2), weights))  # => 0.146...
        ```

    Notes:
        - The normalized result is bounded by [0, 1], where 0 indicates perfect independence and
          1 indicates perfect dependence.
        - The unnormalized result is bounded between 0 (perfect independence) and
          `min(H(data_1), H(data_2))`, where `H(data_1)` and `H(data_2)` are the entropies of
          the two variables. If we reach the upper bound and `H(data_1) == H(data_2)`,
          we have perfect dependence.
    """
    validate_arrays(data=data, weights=weights)
    active_sets = (
        get_active_set(data[0]),
        get_active_set(data[1]),
    )
    joint_distribution = weighted_discrete_joint_distribution(
        data=active_sets,
        weights=weights,
    )
    distribution_0 = np.sum(joint_distribution, axis=1)
    distribution_1 = np.sum(joint_distribution, axis=0)
    divisor = np.outer(distribution_0, distribution_1).reshape(-1)
    joint_distribution = joint_distribution.reshape(-1)
    non_zero = joint_distribution > 0.0
    mutual_information = np.sum(
        joint_distribution[non_zero]
        * np.log(joint_distribution[non_zero] / divisor[non_zero])
    ).item()
    if mutual_information > 0 and normalize:
        mask = distribution_0 > 0.0
        entropy_0 = -np.dot(distribution_0[mask], np.log(distribution_0[mask])).item()
        mask = distribution_1 > 0.0
        entropy_1 = -np.dot(distribution_1[mask], np.log(distribution_1[mask])).item()
        mutual_information = 2.0 * mutual_information / (entropy_0 + entropy_1)
    return mutual_information
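Continuing the example above, a hypothetical check of the `normalize` flag (assuming the function is importable from `bbstat.statistics`): with `normalize=False` the raw mutual information in nats is returned.

```python
import numpy as np

from bbstat.statistics import compute_weighted_mutual_information

data_1 = np.array([0, 0, 1])
data_2 = np.array([0, 1, 1])
weights = np.array([0.2, 0.5, 0.3])

# Raw (unnormalized) mutual information in nats
print(compute_weighted_mutual_information((data_1, data_2), weights, normalize=False))  # => 0.0816...
```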

compute_weighted_pearson_dependency(data, weights, *, ddof=0)

Computes the weighted Pearson correlation coefficient (dependency) between two 1D arrays.

This function calculates the linear dependency between two variables using a weighted version of Pearson's correlation coefficient. The inputs data_1 and data_2 are expected to be 1D arrays of the same length, provided as a tuple data. Each data point is assigned a weight from the weights array.

The function normalizes both variables by subtracting their weighted means and dividing by their weighted standard deviations, then computes the weighted mean of the element-wise product of these normalized arrays.

Parameters:

- `data` (`FFArray`, required): A tuple of two 1D float arrays `(data_1, data_2)` of equal length.
- `weights` (`FArray`, required): A 1D float array of weights, same length as each array in `data`.
- `ddof` (`int`, default `0`): Delta degrees of freedom for the standard deviation. Defaults to 0 (population formula). Use 1 for sample-based correction.

Returns:

- `float`: The weighted Pearson correlation coefficient in the range [-1, 1].

Raises:

- `ValueError`: If the input arrays are not 1D or have mismatched lengths.

Example
data_1 = np.array([1.0, 2.0, 3.0])
data_2 = np.array([1.0, 2.0, 2.9])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_pearson_dependency((data_1, data_2), weights))  # => 0.998...
Notes
  • The function relies on compute_weighted_mean and compute_weighted_std.
  • The correlation is computed using the formula: corr = weighted_mean(z1 * z2) where z1 and z2 are the standardized variables.
  • The result is bounded between -1 (perfect negative linear relationship) and 1 (perfect positive linear relationship), with 0 indicating no linear dependency.
Source code in bbstat/statistics.py
def compute_weighted_pearson_dependency(
    data: FFArray,
    weights: FArray,
    *,
    ddof: int = 0,
) -> float:
    """
    Computes the weighted Pearson correlation coefficient (dependency) between two 1D arrays.

    This function calculates the linear dependency between two variables using a weighted
    version of Pearson's correlation coefficient. The inputs `data_1` and `data_2` are
    expected to be 1D arrays of the same length, provided as a tuple `data`. Each data point
    is assigned a weight from the `weights` array.

    The function normalizes both variables by subtracting their weighted means and dividing
    by their weighted standard deviations, then computes the weighted mean of the element-wise
    product of these normalized arrays.

    Args:
        data (FFArray): A tuple of two 1D float arrays `(data_1, data_2)`
            of equal length.
        weights (FArray): A 1D float array of weights, same length as
            each array in `data`.
        ddof (int, optional): Delta degrees of freedom for standard deviation.
            Defaults to 0 (population formula). Use 1 for sample-based correction.

    Returns:
        float: The weighted Pearson correlation coefficient in the range [-1, 1].

    Raises:
        ValueError: If the input arrays are not 1D or have mismatched lengths.

    Example:
        ```python
        data_1 = np.array([1.0, 2.0, 3.0])
        data_2 = np.array([1.0, 2.0, 2.9])
        weights = np.array([0.2, 0.5, 0.3])
        print(compute_weighted_pearson_dependency((data_1, data_2), weights))  # => 0.998...
        ```

    Notes:
        - The function relies on `compute_weighted_mean` and `compute_weighted_std`.
        - The correlation is computed using the formula:
            corr = weighted_mean(z1 * z2)
          where z1 and z2 are the standardized variables.
        - The result is bounded between -1 (perfect negative linear relationship)
          and 1 (perfect positive linear relationship), with 0 indicating no linear dependency.
    """
    data_1, data_2 = data
    weighted_mean_1 = compute_weighted_mean(data=data_1, weights=weights)
    weighted_mean_2 = compute_weighted_mean(data=data_2, weights=weights)
    weighted_std_1 = compute_weighted_std(
        data=data_1,
        weights=weights,
        weighted_mean=weighted_mean_1,
        ddof=ddof,
    )
    weighted_std_2 = compute_weighted_std(
        data=data_2,
        weights=weights,
        weighted_mean=weighted_mean_2,
        ddof=ddof,
    )
    array_1 = (data_1 - weighted_mean_1) / weighted_std_1
    array_2 = (data_2 - weighted_mean_2) / weighted_std_2
    return compute_weighted_mean(data=array_1 * array_2, weights=weights)
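As a hypothetical sanity check (assuming the function is importable from `bbstat.statistics`): a perfectly linear relationship yields a coefficient of 1, up to floating-point error.

```python
import numpy as np

from bbstat.statistics import compute_weighted_pearson_dependency

data_1 = np.array([1.0, 2.0, 3.0])
data_2 = 2.0 * data_1  # exact linear relationship
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_pearson_dependency((data_1, data_2), weights))  # => 1.0 (approximately)
```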

compute_weighted_percentile(data, weights, *, percentile, sorter=None)

Computes a weighted percentile of 1D data using linear interpolation.

This function calculates the weighted percentile of the given data array based on the provided weights via compute_weighted_quantile with parameter quantile=0.01 * percentile.

Parameters:

- `data` (`FArray`, required): A 1D array of numeric values representing the sample data.
- `weights` (`FArray`, required): A 1D array of numeric weights corresponding to the data.
- `percentile` (`float`, required): The desired percentile in the interval [0, 100].
- `sorter` (`Optional[NDArray[integer]]`, default `None`): Optional array of indices that sorts `data`.

Returns:

- `float`: The interpolated weighted percentile value.

Raises:

- `ValueError`: If `data` and `weights` have different shapes or are not 1D.

Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_percentile(data, weights, percentile=70))  # => 2.2
Source code in bbstat/statistics.py
def compute_weighted_percentile(
    data: FArray,
    weights: FArray,
    *,
    percentile: float,
    sorter: Optional[NDArray[np.integer]] = None,
) -> float:
    """
    Computes a weighted percentile of 1D data using linear interpolation.

    This function calculates the weighted percentile of the given `data` array
    based on the provided `weights` via `compute_weighted_quantile` with parameter
    `quantile=0.01 * percentile`.

    Args:
        data (FArray): A 1D array of numeric values representing
            the sample data.
        weights (FArray): A 1D array of numeric weights corresponding
            to the data.
        percentile (float): The desired percentile in the interval [0, 100].
        sorter (Optional[NDArray[np.integer]]): Optional array of indices that
            sorts `data`.

    Returns:
        float: The interpolated weighted percentile value.

    Raises:
        ValueError: If `data` and `weights` have different shapes or are not 1D.

    Example:
        ```python
        data = np.array([1.0, 2.0, 3.0])
        weights = np.array([0.2, 0.5, 0.3])
        print(compute_weighted_percentile(data, weights, percentile=70))  # => 2.2
        ```
    """
    return compute_weighted_quantile(
        data=data,
        weights=weights,
        quantile=0.01 * percentile,
        sorter=sorter,
    )

compute_weighted_probability(data, weights, state)

Computes a weighted probability of a state within 1D code data.

This function calculates the weighted probability of a state by summing the weights which coincide with data == state.

Parameters:

- `data` (`IArray`, required): A 1D array of numeric values representing the sample data in code format.
- `weights` (`FArray`, required): A 1D array of numeric weights corresponding to the data.
- `state` (`int`, required): The state for which we estimate the probability.

Returns:

- `float`: The weighted probability value.

Raises:

- `ValueError`: If `data` and `weights` have different shapes or are not 1D, or if `state` is not in `data`.

Example
data = np.array([1, 0, 0])
weights = np.array([0.4, 0.2, 0.4])
print(compute_weighted_probability(data, weights, state=0))  # => 0.6
Source code in bbstat/statistics.py
def compute_weighted_probability(
    data: IArray,
    weights: FArray,
    state: int,
) -> float:
    """
    Computes a weighted probability of a state within 1D code data.

    This function calculates the weighted probability of a `state` by summing
    the `weights` which coincide with `data == state`.

    Args:
        data (IArray): A 1D array of numeric values representing
            the sample data in code format.
        weights (FArray): A 1D array of numeric weights corresponding
            to the data.
        state (int): The state for which we estimate the probability.

    Returns:
        float: The weighted probability value.

    Raises:
        ValueError: If `data` and `weights` have different shapes or are not 1D,
            or if `state` is not in `data`.

    Example:
        ```python
        data = np.array([1, 0, 0])
        weights = np.array([0.4, 0.2, 0.4])
        print(compute_weighted_probability(data, weights, state=0))  # => 0.6
        ```
    """
    validate_array(data=data, weights=weights)
    if state not in data:
        raise ValueError(f"Incompatible parameter {state=:}: not included in data.")
    return np.sum(weights[data == state]).item()

compute_weighted_quantile(data, weights, *, quantile, sorter=None)

Computes a weighted quantile of 1D data using linear interpolation.

This function calculates the weighted quantile of the given data array based on the provided weights. It uses a normalized cumulative weight distribution to determine the interpolated quantile value. The computation assumes both data and weights are 1D arrays of equal length.

A precomputed sorter (array of indices that would sort data) can be optionally provided to avoid recomputing it internally.

Parameters:

- `data` (`FArray`, required): A 1D array of numeric values representing the sample data.
- `weights` (`FArray`, required): A 1D array of numeric weights corresponding to the data.
- `quantile` (`float`, required): The desired quantile in the interval [0, 1].
- `sorter` (`Optional[NDArray[integer]]`, default `None`): Optional array of indices that sorts `data`.

Returns:

- `float`: The interpolated weighted quantile value.

Raises:

- `ValueError`: If `data` and `weights` have different shapes or are not 1D.

Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_quantile(data, weights, quantile=0.7))  # => 2.2
Notes
  • If quantile is less than or equal to the minimum cumulative weight, the smallest data point is returned.
  • If quantile is greater than or equal to the maximum cumulative weight, the largest data point is returned.
  • Linear interpolation is used between the two closest surrounding data points.
  • Providing a precomputed sorter can optimize performance in repeated calls.
Source code in bbstat/statistics.py
def compute_weighted_quantile(
    data: FArray,
    weights: FArray,
    *,
    quantile: float,
    sorter: Optional[IArray] = None,
) -> float:
    """
    Computes a weighted quantile of 1D data using linear interpolation.

    This function calculates the weighted quantile of the given `data` array
    based on the provided `weights`. It uses a normalized cumulative weight
    distribution to determine the interpolated quantile value. The computation
    assumes both `data` and `weights` are 1D arrays of equal length.

    A precomputed `sorter` (array of indices that would sort `data`) can be
    optionally provided to avoid recomputing it internally.

    Args:
        data (FArray): A 1D array of numeric values representing
            the sample data.
        weights (FArray): A 1D array of numeric weights corresponding
            to the data.
        quantile (float): The desired quantile in the interval [0, 1].
        sorter (Optional[NDArray[np.integer]]): Optional array of indices that
            sorts `data`.

    Returns:
        float: The interpolated weighted quantile value.

    Raises:
        ValueError: If `data` and `weights` have different shapes or are not 1D.

    Example:
        ```python
        data = np.array([1.0, 2.0, 3.0])
        weights = np.array([0.2, 0.5, 0.3])
        print(compute_weighted_quantile(data, weights, quantile=0.7))  # => 2.2
        ```

    Notes:
        - If `quantile` is less than or equal to the minimum cumulative weight,
          the smallest data point is returned.
        - If `quantile` is greater than or equal to the maximum cumulative weight,
          the largest data point is returned.
        - Linear interpolation is used between the two closest surrounding data points.
        - Providing a precomputed `sorter` can optimize performance in repeated calls.
    """
    validate_array(data=data, weights=weights)
    if sorter is None:
        sorter = np.argsort(data)
    elif sorter.shape != data.shape:
        raise ValueError(f"Invalid parameter {sorter.shape}: must match {data.shape}.")
    cumulative_weights = np.cumsum(weights[sorter])
    cw_lo, cw_hi = cumulative_weights[[0, -1]]
    cumulative_weights -= cw_lo
    cumulative_weights /= cw_hi - cw_lo
    data = data[sorter]
    if quantile <= cumulative_weights[0]:
        return data[0]
    if quantile >= cumulative_weights[-1]:
        return data[-1]
    idx = np.searchsorted(cumulative_weights, quantile)
    w_lo, w_hi = cumulative_weights[[idx - 1, idx]]
    s_lo, s_hi = data[[idx - 1, idx]]
    w = (quantile - w_lo) / (w_hi - w_lo)
    return (s_lo + w * (s_hi - s_lo)).item()
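As the notes mention, a precomputed `sorter` can be reused across repeated calls on the same data; a minimal sketch (assuming the function is importable from `bbstat.statistics`):

```python
import numpy as np

from bbstat.statistics import compute_weighted_quantile

data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])

# Sort once, then reuse the sorter for several quantiles of the same data
sorter = np.argsort(data)
for q in (0.25, 0.5, 0.75):
    print(compute_weighted_quantile(data, weights, quantile=q, sorter=sorter))
```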

compute_weighted_self_information(data, weights, state)

Computes a weighted self-information of a state within 1D code data.

This function calculates the weighted probability of a state by summing the weights which coincide with data == state. The self-information is then computed as the negative logarithm of the weighted probability.

Parameters:

- `data` (`IArray`, required): A 1D array of numeric values representing the sample data in code format.
- `weights` (`FArray`, required): A 1D array of numeric weights corresponding to the data.
- `state` (`int`, required): The state for which we estimate the self-information.

Returns:

- `float`: The weighted self-information value.

Raises:

- `ValueError`: If `data` and `weights` have different shapes or are not 1D, or if `state` is not in `data`.

Example
data = np.array([1, 0, 0])
weights = np.array([0.4, 0.2, 0.4])
print(compute_weighted_self_information(data, weights, state=0))  # => 0.510...
Source code in bbstat/statistics.py
def compute_weighted_self_information(
    data: IArray,
    weights: FArray,
    state: int,
) -> float:
    """
    Computes a weighted self-information of a state within 1D code data.

    This function calculates the weighted probability of a `state` by summing
    the `weights` which coincide with `data == state`. The self-information is then
    computed as the negative logarithm of the weighted probability.

    Args:
        data (IArray): A 1D array of numeric values representing
            the sample data in code format.
        weights (FArray): A 1D array of numeric weights corresponding
            to the data.
        state (int): The state for which we estimate the self-information.

    Returns:
        float: The weighted self-information value.

    Raises:
        ValueError: If `data` and `weights` have different shapes or are not 1D,
            or if `state` is not in `data`.

    Example:
        ```python
        data = np.array([1, 0, 0])
        weights = np.array([0.4, 0.2, 0.4])
        print(compute_weighted_self_information(data, weights, state=0))  # => 0.510...
        ```
    """
    probability = compute_weighted_probability(data=data, weights=weights, state=state)
    return -math.log(probability)

compute_weighted_spearman_dependency(data, weights, *, ddof=0)

Computes the weighted Spearman rank correlation coefficient between two 1D arrays.

This function measures the monotonic relationship between two variables by computing the weighted Pearson correlation between their ranked values (i.e., their order statistics). It is particularly useful for assessing non-linear relationships.

Parameters:

- `data` (`FFArray`, required): A tuple of two 1D float arrays `(data_1, data_2)` of equal length.
- `weights` (`FArray`, required): A 1D float array of weights, same length as each array in `data`.
- `ddof` (`int`, default `0`): Delta degrees of freedom for the standard deviation. Defaults to 0 (population formula). Use 1 for sample-based correction.

Returns:

- `float`: The weighted Spearman rank correlation coefficient in the range [-1, 1].

Raises:

- `ValueError`: If the input arrays are not 1D or have mismatched lengths.

Example
data_1 = np.array([1.0, 2.0, 3.0])
data_2 = np.array([0.3, 0.2, 0.1])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_spearman_dependency((data_1, data_2), weights))  # => -0.9999...
Notes
  • Internally, ranks are computed using scipy.stats.rankdata, which handles ties by assigning average ranks.
  • The Spearman coefficient is equivalent to the Pearson correlation between rank-transformed data.
  • Output is bounded between -1 (perfect inverse monotonic relationship) and 1 (perfect direct monotonic relationship), with 0 indicating no monotonic correlation.
  • Weights are applied after ranking.
Source code in bbstat/statistics.py
def compute_weighted_spearman_dependency(
    data: FFArray,
    weights: FArray,
    *,
    ddof: int = 0,
) -> float:
    """
    Computes the weighted Spearman rank correlation coefficient between two 1D arrays.

    This function measures the monotonic relationship between two variables by computing
    the weighted Pearson correlation between their ranked values (i.e., their order statistics).
    It is particularly useful for assessing non-linear relationships.

    Args:
        data (FFArray): A tuple of two 1D float arrays `(data_1, data_2)`
            of equal length.
        weights (FArray): A 1D float array of weights, same length as
            each array in `data`.
        ddof (int, optional): Delta degrees of freedom for standard deviation.
            Defaults to 0 (population formula). Use 1 for sample-based correction.

    Returns:
        float: The weighted Spearman rank correlation coefficient in the range [-1, 1].

    Raises:
        ValueError: If the input arrays are not 1D or have mismatched lengths.

    Example:
        ```python
        data_1 = np.array([1.0, 2.0, 3.0])
        data_2 = np.array([0.3, 0.2, 0.1])
        weights = np.array([0.2, 0.5, 0.3])
        print(compute_weighted_spearman_dependency((data_1, data_2), weights))  # => -0.9999...
        ```

    Notes:
        - Internally, ranks are computed using `scipy.stats.rankdata`, which handles ties
          by assigning average ranks.
        - The Spearman coefficient is equivalent to the Pearson correlation between
          rank-transformed data.
        - Output is bounded between -1 (perfect inverse monotonic relationship)
          and 1 (perfect direct monotonic relationship), with 0 indicating no
          monotonic correlation.
        - Weights are applied after ranking.
    """
    data_1, data_2 = data
    ranks_1 = cast(FArray, rankdata(data_1))
    ranks_2 = cast(FArray, rankdata(data_2))
    return compute_weighted_pearson_dependency(
        data=(ranks_1, ranks_2),
        weights=weights,
        ddof=ddof,
    )
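To illustrate the contrast with the Pearson coefficient, a hypothetical example (assuming both functions are importable from `bbstat.statistics`): a monotonic but non-linear relationship gives a Spearman coefficient of 1, while the Pearson coefficient stays below 1.

```python
import numpy as np

from bbstat.statistics import (
    compute_weighted_pearson_dependency,
    compute_weighted_spearman_dependency,
)

data_1 = np.array([1.0, 2.0, 3.0])
data_2 = data_1 ** 3  # monotonic, but not linear
weights = np.array([0.2, 0.5, 0.3])

print(compute_weighted_spearman_dependency((data_1, data_2), weights))  # => 1.0 (approximately)
print(compute_weighted_pearson_dependency((data_1, data_2), weights))   # => 0.95...
```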

compute_weighted_std(data, weights, *, weighted_mean=None, ddof=0)

Computes a weighted standard deviation of the input data.

This function calculates the weighted standard deviation of the input data and weights via the square root of the compute_weighted_variance function.

Parameters:

- `data` (`FArray`, required): A 1D array of numeric values representing the data for which we want the standard deviation.
- `weights` (`FArray`, required): A 1D array of numeric values representing the weights for the data.
- `weighted_mean` (`float`, default `None`): The weighted mean of the data. If missing, it is computed via `compute_weighted_mean` inside `compute_weighted_variance`.
- `ddof` (`int`, default `0`): Delta degrees of freedom. Defaults to 0 (population formula). Use 1 for sample-based correction.

Returns:

- `float`: The computed weighted standard deviation.

Raises:

- `ValueError`: If `data` or `weights` are not 1D arrays.
- `ValueError`: If the shapes of `data` and `weights` do not match.

Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_std(data, weights))  # => 0.7
Source code in bbstat/statistics.py
def compute_weighted_std(
    data: FArray,
    weights: FArray,
    *,
    weighted_mean: Optional[float] = None,
    ddof: int = 0,
) -> float:
    """
    Computes a weighted standard deviation of the input data.

    This function calculates the weighted standard deviation of the
    input `data` and `weights` via the square root of the
    `compute_weighted_variance` function.

    Args:
        data (FArray): A 1D array of numeric values representing
            the data for which we want the standard deviation.
        weights (FArray): A 1D array of numeric values representing
            the weights for the data.
        weighted_mean (float, optional): The weighted mean of the data (default is
            None). If missing, this value is computed via `compute_weighted_mean`
            inside `compute_weighted_variance`.
        ddof (int, optional): Delta degrees of freedom.
            Defaults to 0 (population formula). Use 1 for sample-based correction.

    Returns:
        float: The computed weighted standard deviation.

    Raises:
        ValueError: If `data` or `weights` are not 1D arrays.
        ValueError: If the shapes of `data` and `weights` do not match.

    Example:
        ```python
        data = np.array([1.0, 2.0, 3.0])
        weights = np.array([0.2, 0.5, 0.3])
        print(compute_weighted_std(data, weights))  # => 0.7
        ```
    """
    weighted_variance = compute_weighted_variance(
        data=data,
        weights=weights,
        weighted_mean=weighted_mean,
        ddof=ddof,
    )
    return math.sqrt(weighted_variance)

compute_weighted_sum(data, weights)

Computes a weighted sum of the input data.

This function calculates the weighted sum of the input data and weights via the compute_weighted_aggregate function with factor=len(data).

Parameters:

- `data` (`FArray`, required): A 1D array of numeric values representing the data to be summed.
- `weights` (`FArray`, required): A 1D array of numeric values representing the weights for the data.

Returns:

- `float`: The computed weighted sum.

Raises:

- `ValueError`: If `data` or `weights` are not 1D arrays.
- `ValueError`: If the shapes of `data` and `weights` do not match.

Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_sum(data, weights))  # => 6.3
Source code in bbstat/statistics.py
def compute_weighted_sum(
    data: FArray,
    weights: FArray,
) -> float:
    """
    Computes a weighted sum of the input data.

    This function calculates the weighted sum of the input `data`
    and `weights` via the `compute_weighted_aggregate` function with
    `factor=len(data)`.

    Args:
        data (FArray): A 1D array of numeric values representing
            the data to be summed.
        weights (FArray): A 1D array of numeric values representing
            the weights for the data.

    Returns:
        float: The computed weighted sum.

    Raises:
        ValueError: If `data` or `weights` are not 1D arrays.
        ValueError: If the shapes of `data` and `weights` do not match.

    Example:
        ```python
        data = np.array([1.0, 2.0, 3.0])
        weights = np.array([0.2, 0.5, 0.3])
        print(compute_weighted_sum(data, weights))  # => 6.3
        ```
    """
    return compute_weighted_aggregate(data=data, weights=weights, factor=len(data))

compute_weighted_variance(data, weights, *, weighted_mean=None, ddof=0)

Computes a weighted variance of the input data.

This function calculates the weighted variance by applying the compute_weighted_aggregate function to the squared deviations of the data from the weighted mean, with factor=len(data) / (len(data) - ddof), where ddof specifies the delta degrees of freedom.

Parameters:

- `data` (`FArray`, required): A 1D array of numeric values representing the data for which we want the variance.
- `weights` (`FArray`, required): A 1D array of numeric values representing the weights for the data.
- `weighted_mean` (`float`, default `None`): The weighted mean of the data. If missing, it is computed via `compute_weighted_mean`.
- `ddof` (`int`, default `0`): Delta degrees of freedom. Defaults to 0 (population formula). Use 1 for sample-based correction.

Returns:

- `float`: The computed weighted variance.

Raises:

- `ValueError`: If `data` or `weights` are not 1D arrays.
- `ValueError`: If the shapes of `data` and `weights` do not match.

Example
data = np.array([1.0, 2.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
print(compute_weighted_variance(data, weights))  # => 0.49
print(compute_weighted_variance(data, weights, ddof=1))  # => 0.735
Source code in bbstat/statistics.py
def compute_weighted_variance(
    data: FArray,
    weights: FArray,
    *,
    weighted_mean: Optional[float] = None,
    ddof: int = 0,
) -> float:
    """
    Computes a weighted variance of the input data.

    This function calculates the weighted variance by applying the
    `compute_weighted_aggregate` function to the squared deviations of `data`
    from the weighted mean, with `factor=len(data) / (len(data) - ddof)`,
    where `ddof` specifies the delta degrees of freedom.

    Args:
        data (FArray): A 1D array of numeric values representing
            the data for which we want the variance.
        weights (FArray): A 1D array of numeric values representing
            the weights for the data.
        weighted_mean (float, optional): The weighted mean of the data (default is
            None). If missing, this value is computed via `compute_weighted_mean`.
        ddof (int, optional): Delta degrees of freedom.
            Defaults to 0 (population formula). Use 1 for sample-based correction.

    Returns:
        float: The computed weighted variance.

    Raises:
        ValueError: If `data` or `weights` are not 1D arrays.
        ValueError: If the shapes of `data` and `weights` do not match.

    Example:
        ```python
        data = np.array([1.0, 2.0, 3.0])
        weights = np.array([0.2, 0.5, 0.3])
        print(compute_weighted_variance(data, weights))  # => 0.49
        print(compute_weighted_variance(data, weights, ddof=1))  # => 0.735
        ```
    """
    if weighted_mean is None:
        weighted_mean = compute_weighted_mean(data=data, weights=weights)
    return compute_weighted_aggregate(
        data=np.power(data - weighted_mean, 2.0),
        weights=weights,
        factor=len(data) / (len(data) - ddof),
    )

utils Module

Utilities for bootstrap-related tasks.

This module provides functions that aid in interpreting and summarizing the output of Bayesian bootstrap resampling procedures. It includes tools to compute credible intervals for statistical estimates and to gauge the appropriate precision for rounding mean and credible-interval values from the width of the latter; a short usage sketch follows the notes below.

Main Features
  • compute_credible_interval: Computes a credible interval from a set of estimates.
  • get_precision_for_rounding: Gauges the precision for rounding from the width of the credible interval.
Notes
  • The credible interval is calculated using quantiles of the empirical distribution of bootstrap estimates.
  • This module is designed to be used alongside the evaluate module to provide complete statistical summaries of resampled data.
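A minimal usage sketch combining the two utilities on hypothetical bootstrap estimates (assuming both functions are importable from `bbstat.utils`):

```python
import numpy as np

from bbstat.utils import compute_credible_interval, get_precision_for_rounding

# Hypothetical bootstrap estimates
estimates = np.random.default_rng(seed=0).normal(loc=2.0, scale=0.3, size=1000)

low, high = compute_credible_interval(estimates, level=0.87)
precision = get_precision_for_rounding(high - low)
print(round(estimates.mean().item(), precision), (round(low, precision), round(high, precision)))
```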

compute_credible_interval(estimates, level=0.87)

Compute the credible interval for a set of estimates.

This function calculates the credible interval of the given estimates array, which is a range of values that contains a specified proportion of the data, determined by the level parameter.

The credible interval is calculated by determining the quantiles at (1 - level) / 2 and 1 - (1 - level) / 2 of the sorted estimates data.

Parameters:

- `estimates` (`FArray`, required): A 1D array of floating-point numbers representing the estimates from which the credible interval will be calculated.
- `level` (`float`, default `0.87`): The proportion of data to be included in the credible interval. Must be between 0 and 1 (exclusive).

Returns:

- `Tuple[float, float]`: A tuple containing the lower and upper bounds of the credible interval, with the lower bound corresponding to the `(1 - level) / 2` quantile, and the upper bound corresponding to the `1 - (1 - level) / 2` quantile.

Raises:

- `ValueError`: If `estimates` is not a 1D array or if `level` is not between 0 and 1 (exclusive).

Example
import numpy as np
estimates = np.array([1.1, 2.3, 3.5, 2.9, 4.0])
compute_credible_interval(estimates, 0.6)  # => (2.06, 3.6)
Source code in bbstat/utils.py
def compute_credible_interval(
    estimates: FArray,
    level: float = 0.87,
) -> Tuple[float, float]:
    """
    Compute the credible interval for a set of estimates.

    This function calculates the credible interval of the given `estimates` array,
    which is a range of values that contains a specified proportion of the data,
    determined by the `level` parameter.

    The credible interval is calculated by determining the quantiles at
    `(1 - level) / 2` and `1 - (1 - level) / 2` of the sorted `estimates` data.

    Args:
        estimates (FArray): A 1D array of floating-point numbers representing
            the estimates from which the credible interval will be calculated.
        level (float, optional): The proportion of data to be included in the credible
            interval. Must be between 0 and 1 (exclusive). Default is 0.87.

    Returns:
        Tuple[float, float]: A tuple containing the lower and upper bounds of the credible
            interval, with the lower bound corresponding to the `(1 - level) / 2` quantile,
            and the upper bound corresponding to the `1 - (1 - level) / 2` quantile.

    Raises:
        ValueError: If `estimates` is not a 1D array or if `level` is not between 0 and 1
            (exclusive).

    Example:
        ```python
        import numpy as np
        estimates = np.array([1.1, 2.3, 3.5, 2.9, 4.0])
        compute_credible_interval(estimates, 0.6)  # => (2.06, 3.6)
        ```
    """
    if estimates.ndim != 1:
        raise ValueError(f"Invalid parameter {estimates.ndim=:}: must be 1D array.")
    if level <= 0 or level >= 1:
        raise ValueError(f"Invalid parameter {level=:}: must be within (0, 1).")
    edge = (1.0 - level) / 2.0
    return tuple(np.quantile(estimates, [edge, 1.0 - edge]).tolist())

get_precision_for_rounding(ci_width)

Returns number of digits for rounding.

This function computes the precision (number of digits) for rounding mean and credible interval values for better readability. If the credible interval has width zero, we round to zero digits. Otherwise, we take one minus the floored order of magnitude of the width.

Parameters:

- `ci_width` (`float`, required): The width of the credible interval.

Returns:

- `int`: The number of digits for rounding.

Raises:

- `ValueError`: If `ci_width` is negative or NaN.

Source code in bbstat/utils.py
def get_precision_for_rounding(ci_width: float) -> int:
    """
    Returns number of digits for rounding.

    This function computes the precision (number of digits) for rounding mean and
    credible interval values for better readability. If the credible interval
    has width zero, we round to zero digits. Otherwise, we take one minus the floored
    order of magnitude of the width.

    Args:
        ci_width (float): The width of the credible interval.

    Returns:
        int: The number of digits for rounding.

    Raises:
        ValueError: If `ci_width` is negative or NaN.
    """
    if np.isnan(ci_width) or ci_width < 0.0:
        raise ValueError(f"Invalid parameter {ci_width=:}: must be non-negative.")
    if ci_width == 0:
        return 0
    return int(1 - math.floor(math.log10(ci_width)))
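Since no example is given above, a small, hypothetical illustration of the rounding rule (assuming the function is importable from `bbstat.utils`):

```python
from bbstat.utils import get_precision_for_rounding

print(get_precision_for_rounding(0.123))  # => 2 (width of order 1e-1: keep two decimal digits)
print(get_precision_for_rounding(1.0))    # => 1
print(get_precision_for_rounding(12.3))   # => 0
print(get_precision_for_rounding(0.0))    # => 0 (zero width: round to whole numbers)
```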