Gaussian Mixture

Fit and sample from a uni-, bi-, or multivariate Gaussian mixture model with diagonal covariance matrices. For the multivariate case, each component density is given by

\[G(X | \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d \left|\Sigma\right|}} \exp\left(-\frac{1}{2} (X-\mu)^T\Sigma^{-1}(X-\mu)\right)\]

where \(d\) is the dimensionality of \(X\).

The mixture model is then composed of a linear combination of an arbitrary number of components \(K\):

\[p(X) = \sum_{k=1}^K \pi_k G(X|\mu_k, \Sigma_k).\]

Here \(\pi_k\) is the mixing coefficient of the \(k\)-th component. \(\mu_k\), \(\Sigma_k\) and \(\pi_k\) are estimated by maximum likelihood for each \(k\). The number of kernels can be specified to define the modality of the distribution, as can the dimensionality of both \(x\) and \(y\). The component means are initialized randomly according to the given standard deviation; the weights are also initialized randomly.
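The mixture density above can be evaluated directly when every covariance matrix is diagonal, because the determinant is the product of the diagonal entries and the quadratic form is a sum over dimensions. The following is a minimal pure-Python sketch of that computation, not the library's implementation; the function names and the example parameters are hypothetical.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Gaussian density with diagonal covariance; var holds the diagonal entries."""
    d = len(x)
    # log of 1 / sqrt((2*pi)^d * |Sigma|), with |Sigma| = product of the variances
    log_norm = -0.5 * (d * math.log(2.0 * math.pi) + sum(math.log(v) for v in var))
    # (x - mu)^T Sigma^{-1} (x - mu) reduces to a per-dimension sum
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return math.exp(log_norm - 0.5 * quad)

def mixture_pdf(x, weights, means, variances):
    """p(x) = sum_k pi_k * G(x | mu_k, Sigma_k) with diagonal Sigma_k."""
    return sum(w * diag_gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Hypothetical two-component mixture in two dimensions
weights = [0.3, 0.7]
means = [[0.0, 0.0], [2.0, -1.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]
density = mixture_pdf([0.0, 0.0], weights, means, variances)
```

The mixing coefficients are assumed to be non-negative and to sum to one, so that the result is a valid density.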

class cde.density_simulation.GaussianMixture(n_kernels=5, ndim_x=1, ndim_y=1, means_std=1.5, random_seed=None)[source]

This model fits and samples from a uni-, bi-, or multivariate Gaussian mixture model with diagonal covariance matrices. The mixture model is composed of a linear combination of an arbitrary number of components n_kernels. Means, covariances and weights are estimated by maximum likelihood for each component. The number of kernels can be specified to define the modality of the distribution, as can the dimensionality of both x and y. The component means are initialized randomly according to the given standard deviation; the weights are also initialized randomly.

Parameters
  • n_kernels – number of mixture components

  • ndim_x – dimensionality of X / number of random variables in X

  • ndim_y – dimensionality of Y / number of random variables in Y

  • means_std – std. dev. when sampling the kernel means

  • random_seed – seed for the random number generator

can_sample = None

set parameters, calculate weights, means and covariances

cdf(X, Y)[source]

Conditional cumulative distribution function P(Y<y|X=x).

See “Conditional Gaussian Mixture Models for Environmental Risk Mapping” [Gilardi, Bengio] for the math.

Parameters
  • X – the position/conditional variable for the distribution P(Y<y|X=x), array_like, shape:(n_samples, ndim_x)

  • Y – the variable Y conditioned on X, array_like, shape:(n_samples, ndim_y)

Returns

the conditional cumulative distribution of Y given X, for the given realizations of X - array of shape (n_samples,)
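As a hedged illustration of what a conditional CDF of a Gaussian mixture looks like, the sketch below handles one-dimensional x and y. It assumes that x and y are uncorrelated within each component (which is what a diagonal covariance implies), so conditioning on x only reweights the components. All function and parameter names are hypothetical and not part of the library.

```python
import math

def normal_pdf(v, mean, std):
    z = (v - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def normal_cdf(v, mean, std):
    return 0.5 * (1.0 + math.erf((v - mean) / (std * math.sqrt(2.0))))

def conditional_weights(x, weights, means_x, stds_x):
    """w_k(x) = pi_k N(x | mu_k^x, sigma_k^x) / sum_j pi_j N(x | mu_j^x, sigma_j^x)."""
    unnorm = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means_x, stds_x)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def conditional_cdf(x, y, weights, means_x, stds_x, means_y, stds_y):
    """P(Y < y | X = x) as a reweighted sum of the component CDFs in y."""
    w_x = conditional_weights(x, weights, means_x, stds_x)
    return sum(w * normal_cdf(y, m, s) for w, m, s in zip(w_x, means_y, stds_y))

# Hypothetical symmetric mixture: the conditional CDF at y = 0 is one half
p = conditional_cdf(0.0, 0.0, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, 1.0])
```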

conditional_value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Conditional Value-at-Risk (CVaR) / Expected Shortfall of the fitted distribution. Only supported if ndim_y = 1.

Parameters
  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

  • alpha – quantile percentage of the distribution

  • n_samples – number of samples for the Monte Carlo estimate

Returns

CVaR values for each x to condition on - numpy array of shape (n_values)

covariance(x_cond, n_samples=None)[source]

Covariance of the distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Covariances Cov[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)
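For a mixture, the conditional covariance follows from the law of total covariance: the weighted within-component covariances plus the spread of the component means around the overall conditional mean. The sketch below shows this for diagonal component covariances. It takes the conditional component weights as given, and every name in it is hypothetical rather than taken from the library.

```python
def conditional_mean_and_cov(cond_weights, means_y, vars_y):
    """Mean and covariance of sum_k w_k N(mu_k, diag(vars_k)).

    cond_weights: weights w_k(x) of the components given x (summing to one)
    means_y:      one mean vector per component
    vars_y:       one vector of diagonal variances per component
    """
    d = len(means_y[0])
    mean = [sum(w * m[i] for w, m in zip(cond_weights, means_y)) for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for w, m, v in zip(cond_weights, means_y, vars_y):
        for i in range(d):
            for j in range(d):
                within = v[i] if i == j else 0.0  # diagonal component covariance
                cov[i][j] += w * (within + m[i] * m[j])
    for i in range(d):
        for j in range(d):
            cov[i][j] -= mean[i] * mean[j]  # subtract the outer product of the mean
    return mean, cov

# Two equally weighted components separated along the first y dimension
mean, cov = conditional_mean_and_cov(
    [0.5, 0.5], [[-1.0, 0.0], [1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]
)
```

In this example the separation of the means inflates the variance of the first dimension to 2.0, while the second dimension keeps the component variance of 1.0.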

covariances_y = None

some eigenvalues of the sampled covariance matrices can be exactly zero -> map to positive semi-definite subspace

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

joint_pdf(X, Y)[source]

Joint probability density function P(X, Y)

Parameters
  • X – variable X for the distribution P(X, Y), array_like, shape:(n_samples, ndim_x)

  • Y – variable Y for the distribution P(X, Y), array_like, shape:(n_samples, ndim_y)

Returns

the joint distribution of X and Y - array of shape (n_samples,)

kurtosis(x_cond, n_samples=1000000)

Kurtosis of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Kurtosis Kurt[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

log_pdf(X, Y)

Conditional log-probability log p(y|x). Requires the model to be fitted.

Parameters
  • X – numpy array to be conditioned on - shape: (n_samples, ndim_x)

  • Y – numpy array of y targets - shape: (n_samples, ndim_y)

Returns

conditional log-probability log p(y|x) - numpy array of shape (n_samples,)
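Computing log p(y|x) by taking the logarithm of the density can underflow far out in the tails. A common remedy is to work in log space with the log-sum-exp trick. The sketch below does this for one-dimensional x and y, under the assumption that x and y are uncorrelated within each component; the names are hypothetical, and this is not necessarily how the library computes the value.

```python
import math

def normal_log_pdf(v, mean, std):
    z = (v - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def log_sum_exp(values):
    """log(sum(exp(values))) computed without overflow or underflow."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def conditional_log_pdf(x, y, weights, means_x, stds_x, means_y, stds_y):
    """log p(y|x) = log p(x, y) - log p(x) for a mixture with independent x and y per component."""
    log_marginal = [math.log(w) + normal_log_pdf(x, m, s)
                    for w, m, s in zip(weights, means_x, stds_x)]
    log_joint = [lm + normal_log_pdf(y, m, s)
                 for lm, m, s in zip(log_marginal, means_y, stds_y)]
    return log_sum_exp(log_joint) - log_sum_exp(log_marginal)

# Far in the tail the plain density underflows to zero, the log density stays finite
tail_value = conditional_log_pdf(0.0, 60.0, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0])
```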

mean_(x_cond, n_samples=None)[source]

Conditional mean of the distribution

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Means E[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y)

means = None

Sample covariance matrices and ensure that each covariance matrix is positive definite

pdf(X, Y)[source]

Conditional probability density function P(Y|X)

See “Conditional Gaussian Mixture Models for Environmental Risk Mapping” [Gilardi, Bengio] for the math.

Parameters
  • X – the position/conditional variable for the distribution P(Y|X), array_like, shape:(n_samples, ndim_x)

  • Y – the variable Y conditioned on X, array_like, shape:(n_samples, ndim_y)

Returns

the conditional distribution of Y given X, for the given realizations of X - array of shape (n_samples,)

plot(xlim=(-5, 5), ylim=(-5, 5), resolution=100, mode='pdf', show=False, numpyfig=False)

Plots the distribution specified in mode if x and y are 1-dimensional each

Parameters
  • xlim – 2-tuple specifying the x axis limits

  • ylim – 2-tuple specifying the y axis limits

  • resolution – integer specifying the resolution of plot

  • mode – specify which distribution to plot [“pdf”, “cdf”, “joint_pdf”]

plot2d(x_cond=[0, 1, 2], ylim=(-8, 8), resolution=100, mode='pdf', show=True, prefix='', numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each

Parameters
  • x_cond – list of x values to condition on

  • ylim – 2-tuple specifying the y axis limits

  • resolution – integer specifying the resolution of plot

plot3d(xlim=(-5, 5), ylim=(-8, 8), resolution=100, show=False, numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each

Parameters
  • xlim – 2-tuple specifying the x axis limits

  • ylim – 2-tuple specifying the y axis limits

  • resolution – integer specifying the resolution of plot

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self

simulate(n_samples=1000)[source]

Draws random samples from the joint distribution p(x,y)

Parameters

n_samples – (int) number of samples to be drawn from the joint distribution

Returns

(X,Y) - random samples drawn from p(x,y) - numpy arrays of shape (n_samples, ndim_x) and (n_samples, ndim_y)
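Sampling from a mixture is usually done by ancestral sampling: first draw a component index according to the mixing weights, then draw from that component's Gaussian. The sketch below illustrates this for one-dimensional x and y, assuming x and y are uncorrelated within a component. The function and its parameters are hypothetical and only loosely mirror the return convention described above.

```python
import random

def simulate_joint(n_samples, weights, means_x, stds_x, means_y, stds_y, seed=None):
    """Draw (x, y) pairs from a mixture whose components have independent x and y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_samples):
        # Step 1: pick a component k with probability pi_k
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Step 2: draw x and y from the chosen component
        xs.append(rng.gauss(means_x[k], stds_x[k]))
        ys.append(rng.gauss(means_y[k], stds_y[k]))
    return xs, ys

# Hypothetical two-component mixture
xs, ys = simulate_joint(20000, [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0], [0.0, 4.0], [1.0, 1.0], seed=0)
```

Passing a seed makes the draw reproducible, which parallels the role of random_seed in the class constructor.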

simulate_conditional(X)[source]

Draws random samples from the conditional distribution

Parameters

X – x to be conditioned on when drawing a sample from y ~ p(y|x) - numpy array of shape (n_samples, ndim_x)

Returns

Conditional random samples y drawn from p(y|x) - numpy array of shape (n_samples, ndim_y)
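Conditional sampling differs from joint sampling only in how the component is chosen: the mixing weights are replaced by the weights of the components given x. The following sketch draws one y per conditioning value for one-dimensional x and y, again assuming x and y are uncorrelated within a component. All names are hypothetical.

```python
import math
import random

def normal_pdf(v, mean, std):
    z = (v - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def sample_conditional(xs, weights, means_x, stds_x, means_y, stds_y, seed=None):
    """Draw one y ~ p(y|x) for every x in xs."""
    rng = random.Random(seed)
    ys = []
    for x in xs:
        # Unnormalized conditional weights pi_k * N(x | mu_k^x, sigma_k^x);
        # random.choices normalizes them, but raises if they all underflow to zero
        cond = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means_x, stds_x)]
        k = rng.choices(range(len(cond)), weights=cond)[0]
        ys.append(rng.gauss(means_y[k], stds_y[k]))
    return ys

# Conditioning on x = 3 makes the second component (mean_y = 5) dominate
ys = sample_conditional([3.0] * 2000, [0.5, 0.5], [-3.0, 3.0], [1.0, 1.0], [-5.0, 5.0], [1.0, 1.0], seed=0)
```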

skewness(x_cond, n_samples=1000000)

Skewness of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Skewness Skew[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

std_(x_cond, n_samples=1000000)

Standard deviation of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Standard deviations sqrt(Var[y|x]) corresponding to x_cond - numpy array of shape (n_values, ndim_y)

tail_risk_measures(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR)

Parameters
  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

  • alpha – quantile percentage of the distribution

  • n_samples – number of samples for the Monte Carlo estimate

Returns

  • VaR values for each x to condition on - numpy array of shape (n_values)

  • CVaR values for each x to condition on - numpy array of shape (n_values)
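Given samples of y drawn conditionally on one x value, both risk measures can be estimated empirically. The sketch below takes VaR as the empirical alpha-quantile and CVaR as the mean of the samples at or below it. This lower-tail convention is an assumption made for illustration, as is the function name; the library may use a different estimator or sign convention.

```python
import math
import random

def tail_risk_from_samples(samples, alpha=0.01):
    """Empirical VaR (alpha-quantile) and CVaR (mean of the lower alpha tail)."""
    ordered = sorted(samples)
    # Number of samples in the lower tail, at least one
    cutoff = max(1, int(math.floor(alpha * len(ordered))))
    var = ordered[cutoff - 1]
    cvar = sum(ordered[:cutoff]) / cutoff
    return var, cvar

# Stand-in for conditional samples: standard normal draws
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100000)]
var, cvar = tail_risk_from_samples(samples, alpha=0.01)
```

Because CVaR averages over the tail beyond the quantile, it is never larger than VaR under this convention.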

value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) of the fitted distribution. Only supported if ndim_y = 1.

Parameters
  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

  • alpha – quantile percentage of the distribution

  • n_samples – number of samples for the Monte Carlo estimate

Returns

VaR values for each x to condition on - numpy array of shape (n_values)