# Gaussian Mixture

Fit and sample from a uni-, bi- or multivariate Gaussian mixture model with diagonal covariance matrices. For the multivariate case with $$d$$-dimensional $$X$$, each component is a Gaussian with density

$G(X | \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^d \left|\Sigma\right|}} \exp\left(-\frac{1}{2} (X-\mu)^T\Sigma^{-1}(X-\mu)\right)$

The mixture model is then a linear combination of an arbitrary number $$K$$ of components:

$p(X) = \sum_{k=1}^K \pi_k G(X|\mu_k, \Sigma_k).$

Here $$\pi_k$$ is the mixing coefficient of the $$k$$-th component. $$\mu$$, $$\Sigma$$ and $$\pi$$ are estimated by maximum likelihood for each $$k$$. The number of kernels, which defines the modality of the distribution, and the dimensionality of both $$x$$ and $$y$$ can be specified. The component means are initialized randomly according to a given standard deviation, and the weights are also initialized randomly.
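The mixture density above can be evaluated directly when every covariance matrix is diagonal. The following is a minimal, dependency-free sketch, not the library's implementation: the function names and the two example components are hypothetical, and each covariance is passed as the list of its diagonal entries.

```python
import math


def diag_gaussian_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance; `var` holds the diagonal."""
    d = len(x)
    # log of the normalisation constant 1 / sqrt((2*pi)^d * |Sigma|)
    log_norm = -0.5 * (d * math.log(2.0 * math.pi) + sum(math.log(v) for v in var))
    # quadratic form (x - mu)^T Sigma^{-1} (x - mu) for a diagonal Sigma
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return math.exp(log_norm - 0.5 * quad)


def mixture_pdf(x, weights, means, variances):
    """p(x) = sum_k pi_k * G(x | mu_k, Sigma_k)."""
    return sum(w * diag_gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Two hypothetical 2-D components; the mixing weights must sum to one.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [2.0, -1.0]]
variances = [[1.0, 0.5], [0.25, 2.0]]

density = mixture_pdf([0.5, 0.0], weights, means, variances)
```

Working with the log of the normalisation constant keeps the computation stable when the variances are very small or the dimensionality is high.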

class cde.density_simulation.GaussianMixture(n_kernels=5, ndim_x=1, ndim_y=1, means_std=1.5, random_seed=None)

This model fits and samples from a uni-, bi- or multivariate Gaussian mixture model with diagonal covariance matrices. The mixture model is a linear combination of an arbitrary number of components, n_kernels. Means, covariances and weights are estimated by maximum likelihood for each component. The number of kernels, which defines the modality of the distribution, and the dimensionality of both x and y can be specified. The component means are initialized randomly according to a given standard deviation, and the weights are also initialized randomly.

Parameters
• n_kernels – number of mixture components

• ndim_x – dimensionality of X / number of random variables in X

• ndim_y – dimensionality of Y / number of random variables in Y

• means_std – std. dev. when sampling the kernel means

• random_seed – seed for the random number generator

can_sample = None

set parameters, calculate weights, means and covariances

cdf(X, Y)
Conditional cumulative distribution function P(Y<y|X=x).

See “Conditional Gaussian Mixture Models for Environmental Risk Mapping” [Gilardi, Bengio] for the math.

Parameters
• X – the position/conditional variable for the distribution P(Y<y|X=x), array_like, shape:(n_samples, ndim_x)

• Y – the variable Y conditioned on X, array_like, shape:(n_samples, ndim_y)

Returns

the conditional cumulative distribution of Y given X, for the given realizations of X, with shape (n_samples,)
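As an illustration of what such a conditional CDF looks like for ndim_x = ndim_y = 1, the sketch below assumes that x and y are independent within each component (which is what a diagonal covariance implies). The conditional CDF is then a weighted sum of normal CDFs, with weights proportional to the mixing coefficient times the component's density at x. All function names and parameter values are hypothetical and not taken from the library.

```python
import math


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def normal_cdf(y, mean, std):
    return 0.5 * (1.0 + math.erf((y - mean) / (std * math.sqrt(2.0))))


def conditional_weights(x, weights, means_x, stds_x):
    """Mixing weights after observing x: w_k(x) ~ pi_k * N(x | mu_x_k, sigma_x_k)."""
    unnorm = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means_x, stds_x)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def conditional_cdf(x, y, weights, means_x, stds_x, means_y, stds_y):
    """P(Y < y | X = x) as a weighted sum of the per-component normal CDFs."""
    w_x = conditional_weights(x, weights, means_x, stds_x)
    return sum(w * normal_cdf(y, m, s) for w, m, s in zip(w_x, means_y, stds_y))


# Hypothetical two-component model.
weights = [0.5, 0.5]
means_x, stds_x = [-2.0, 2.0], [1.0, 1.0]
means_y, stds_y = [-1.0, 1.0], [0.5, 0.5]

cdf_value = conditional_cdf(2.0, 1.0, weights, means_x, stds_x, means_y, stds_y)
```

At x = 2 the second component dominates, so the result is close to the CDF of that component evaluated at its own mean.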

conditional_value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Conditional Value-at-Risk (CVaR) / Expected Shortfall of the fitted distribution. Only available if ndim_y = 1.

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile level of the distribution (e.g. 0.01 for the 1% quantile)

• n_samples – number of samples for the Monte Carlo estimate

Returns

CVaR values for each x to condition on - numpy array of shape (n_values,)

covariance(x_cond, n_samples=None)

Covariance of the distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Covariances Cov[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)
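For a mixture, a covariance of this kind can be written in closed form with the law of total covariance: the weighted within-component covariances plus the weighted spread of the component means around the overall mean. The sketch below illustrates that identity for one conditioning value. It assumes the weights passed in are already the x-conditioned mixing weights and that each component covariance is diagonal. Whether the library computes it this way is not stated here, and all names are hypothetical.

```python
def mixture_mean(weights, means):
    """E[y] = sum_k w_k * mu_k."""
    dim = len(means[0])
    return [sum(w * m[i] for w, m in zip(weights, means)) for i in range(dim)]


def mixture_covariance(weights, means, variances):
    """Cov[y] = sum_k w_k * (Sigma_k + (mu_k - m)(mu_k - m)^T), Sigma_k diagonal."""
    dim = len(means[0])
    mean = mixture_mean(weights, means)
    cov = [[0.0] * dim for _ in range(dim)]
    for w, m, v in zip(weights, means, variances):
        for i in range(dim):
            for j in range(dim):
                within = v[i] if i == j else 0.0            # diagonal Sigma_k
                between = (m[i] - mean[i]) * (m[j] - mean[j])
                cov[i][j] += w * (within + between)
    return cov


# Hypothetical conditional weights and component parameters for ndim_y = 2.
weights = [0.5, 0.5]
means = [[-1.0, 0.0], [1.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

cov = mixture_covariance(weights, means, variances)
```

The result has shape (ndim_y, ndim_y); stacking one such matrix per conditioning value gives the (n_values, ndim_y, ndim_y) layout described above.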

covariances_y = None

some eigenvalues of the sampled covariance matrices can be exactly zero -> map to positive semi-definite subspace

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

joint_pdf(X, Y)

Joint probability density function P(X, Y)

Parameters
• X – variable X for the distribution P(X, Y), array_like, shape:(n_samples, ndim_x)

• Y – variable Y for the distribution P(X, Y), array_like, shape:(n_samples, ndim_y)

Returns

the joint distribution of X and Y with shape (n_samples,)

kurtosis(x_cond, n_samples=1000000)

Kurtosis of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Kurtosis Kurt[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)
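Since the method takes an n_samples argument, the kurtosis is presumably estimated from samples of y given x. A minimal sketch of such a sample-based estimate for one-dimensional y follows. The function name is hypothetical, and whether the library reports excess kurtosis (with 3 subtracted) or plain kurtosis is an assumption exposed through the `excess` flag.

```python
import random


def sample_kurtosis(samples, excess=True):
    """Fourth standardised moment of the samples; subtracts 3 if `excess` is True."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n   # second central moment
    m4 = sum((s - mean) ** 4 for s in samples) / n   # fourth central moment
    kurt = m4 / (m2 ** 2)
    return kurt - 3.0 if excess else kurt


# Stand-in for draws from p(y|x): standard normal samples from a seeded generator.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
estimate = sample_kurtosis(samples)
```

For Gaussian samples the excess kurtosis estimate should be close to zero, and the estimation error shrinks as the number of samples grows.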

log_pdf(X, Y)

Conditional log-probability log p(y|x). Requires the model to be fitted.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, ndim_x)

• Y – numpy array of y targets - shape: (n_samples, ndim_y)

Returns

conditional log-probability log p(y|x) - numpy array of shape (n_samples,)

mean_(x_cond, n_samples=None)

Conditional mean of the distribution

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Means E[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y)

means = None

Sample covariance matrices and ensure that each covariance matrix is positive definite

pdf(X, Y)
Conditional probability density function P(Y|X)

See “Conditional Gaussian Mixture Models for Environmental Risk Mapping” [Gilardi, Bengio] for the math.

Parameters
• X – the position/conditional variable for the distribution P(Y|X), array_like, shape:(n_samples, ndim_x)

• Y – the variable Y conditioned on X, array_like, shape:(n_samples, ndim_y)

Returns

the conditional distribution of Y given X, for the given realizations of X, with shape (n_samples,)
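For orientation, the conditional density of a mixture takes a simple form under the assumption that each component has a diagonal covariance, so that $$x$$ and $$y$$ are independent within a component. The symbols $$\mu_{x,k}$$, $$\Sigma_{x,k}$$, $$\mu_{y,k}$$, $$\Sigma_{y,k}$$ and $$w_k(x)$$ are notation introduced for this sketch, not names from the library:

$p(y|x) = \frac{p(x,y)}{p(x)} = \frac{\sum_{k=1}^K \pi_k G(x|\mu_{x,k}, \Sigma_{x,k}) G(y|\mu_{y,k}, \Sigma_{y,k})}{\sum_{j=1}^K \pi_j G(x|\mu_{x,j}, \Sigma_{x,j})} = \sum_{k=1}^K w_k(x) G(y|\mu_{y,k}, \Sigma_{y,k})$

$w_k(x) = \frac{\pi_k G(x|\mu_{x,k}, \Sigma_{x,k})}{\sum_{j=1}^K \pi_j G(x|\mu_{x,j}, \Sigma_{x,j})}$

The conditional distribution is therefore again a Gaussian mixture over $$y$$ whose mixing weights $$w_k(x)$$ depend on the conditioning value and sum to one.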

plot(xlim=(-5, 5), ylim=(-5, 5), resolution=100, mode='pdf', show=False, numpyfig=False)

Plots the distribution specified in mode if x and y are 1-dimensional each

Parameters
• xlim – 2-tuple specifying the x axis limits

• ylim – 2-tuple specifying the y axis limits

• resolution – integer specifying the resolution of plot

• mode – specifies which distribution to plot: “pdf”, “cdf” or “joint_pdf”

plot2d(x_cond=[0, 1, 2], ylim=(-8, 8), resolution=100, mode='pdf', show=True, prefix='', numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each

Parameters
• x_cond – x values to condition on

• ylim – 2-tuple specifying the y axis limits

• resolution – integer specifying the resolution of plot

plot3d(xlim=(-5, 5), ylim=(-8, 8), resolution=100, show=False, numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each

Parameters
• xlim – 2-tuple specifying the x axis limits

• ylim – 2-tuple specifying the y axis limits

• resolution – integer specifying the resolution of plot

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
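The `<component>__<parameter>` convention can be illustrated with a small stand-in class. This is only a sketch of the routing idea: `Estimator` is a hypothetical toy class, not the scikit-learn base class or anything from this library.

```python
class Estimator:
    """Hypothetical stand-in for an estimator with scikit-learn style parameters."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self, deep=True):
        out = dict(self._params)
        if deep:
            # Expose parameters of nested estimators as <component>__<parameter>.
            for name, value in self._params.items():
                if hasattr(value, "get_params"):
                    for sub, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub}"] = sub_value
        return out

    def set_params(self, **params):
        for key, value in params.items():
            # Split only at the first "__"; the rest is handed to the nested object.
            name, sep, sub_key = key.partition("__")
            if name not in self._params:
                raise ValueError(f"Invalid parameter {name!r}")
            if sep:
                self._params[name].set_params(**{sub_key: value})
            else:
                self._params[name] = value
        return self


inner = Estimator(n_kernels=5)
outer = Estimator(model=inner, random_seed=None)
outer.set_params(model__n_kernels=3, random_seed=42)
```

Returning `self` from `set_params` is what allows calls to be chained, for example `outer.set_params(random_seed=1).get_params()`.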

simulate(n_samples=1000)

Draws random samples from the joint distribution p(x,y)

Parameters

n_samples – (int) number of samples to be drawn from the joint distribution

Returns

(X,Y) - random samples drawn from p(x,y) - numpy arrays of shape (n_samples, ndim_x) and (n_samples, ndim_y)
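Drawing from a Gaussian mixture is commonly done by ancestral sampling: first pick a component according to the mixing weights, then draw from that component's Gaussian. The sketch below shows this for diagonal covariances with ndim_x = ndim_y = 1. It illustrates the general technique rather than the library's code, and the parameter values and function name are hypothetical.

```python
import random


def sample_mixture(n_samples, weights, means, stds, rng):
    """Pick a component per sample by weight, then draw each dimension independently."""
    components = rng.choices(range(len(weights)), weights=weights, k=n_samples)
    return [[rng.gauss(m, s) for m, s in zip(means[k], stds[k])] for k in components]


rng = random.Random(0)
weights = [0.3, 0.7]
means = [[-2.0, 0.0], [2.0, 1.0]]   # per component: (x mean, y mean)
stds = [[0.5, 0.5], [0.5, 0.5]]     # per component: (x std, y std)

samples = sample_mixture(1000, weights, means, stds, rng)
X = [[s[0]] for s in samples]       # nested lists laid out as (n_samples, ndim_x)
Y = [[s[1]] for s in samples]       # nested lists laid out as (n_samples, ndim_y)
```

Because the covariance is diagonal, each dimension can be drawn independently once the component is fixed.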

simulate_conditional(X)

Draws random samples from the conditional distribution

Parameters

X – x to be conditioned on when drawing a sample from y ~ p(y|x) - numpy array of shape (n_samples, ndim_x)

Returns

Conditional random samples y drawn from p(y|x) - numpy array of shape (n_samples, ndim_y)
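A conditional draw can be sketched in two steps: reweight the components by how well they explain the given x, then sample y from the chosen component. The example below assumes diagonal covariances (so y is independent of x within a component) and computes the weights in log space so that far-away x values do not underflow. All names and values are hypothetical, not the library's internals.

```python
import math
import random


def log_diag_gaussian(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance; `var` holds the diagonal."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v
                      for xi, mi, v in zip(x, mean, var))


def conditional_weights(x, weights, means_x, vars_x):
    """w_k(x) ~ pi_k * G(x | mu_x_k, Sigma_x_k), normalised with a max-shift."""
    log_w = [math.log(w) + log_diag_gaussian(x, m, v)
             for w, m, v in zip(weights, means_x, vars_x)]
    shift = max(log_w)                       # keeps the largest term at exp(0) = 1
    unnorm = [math.exp(lw - shift) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample_conditional(x, weights, means_x, vars_x, means_y, vars_y, rng):
    """Draw one y ~ p(y|x): choose a component by w_k(x), then sample its y-Gaussian."""
    w_x = conditional_weights(x, weights, means_x, vars_x)
    k = rng.choices(range(len(w_x)), weights=w_x, k=1)[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(means_y[k], vars_y[k])]


rng = random.Random(0)
weights = [0.5, 0.5]
means_x, vars_x = [[-3.0], [3.0]], [[1.0], [1.0]]
means_y, vars_y = [[-1.0], [1.0]], [[0.01], [0.01]]

y = sample_conditional([3.0], weights, means_x, vars_x, means_y, vars_y, rng)
```

Applying `sample_conditional` to every row of X yields one y per row, matching the (n_samples, ndim_y) layout described above.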

skewness(x_cond, n_samples=1000000)

Skewness of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Skewness Skew[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

std_(x_cond, n_samples=1000000)

Standard deviation of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Standard deviations sqrt(Var[y|x]) corresponding to x_cond - numpy array of shape (n_values, ndim_y)

tail_risk_measures(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR)

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile level of the distribution (e.g. 0.01 for the 1% quantile)

• n_samples – number of samples for the Monte Carlo estimate

Returns

• VaR values for each x to condition on - numpy array of shape (n_values,)

• CVaR values for each x to condition on - numpy array of shape (n_values,)
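A Monte Carlo estimate of both measures can be sketched from samples of y given x. The example assumes that VaR is the alpha-quantile of y itself (no sign flip) and that CVaR is the mean of the samples at or below that quantile. These conventions, like the function name, are assumptions rather than something confirmed for the library.

```python
import random


def tail_risk_from_samples(samples, alpha=0.01):
    """Empirical VaR (alpha-quantile) and CVaR (mean of the samples at or below it)."""
    ordered = sorted(samples)
    # Number of samples in the lower tail; keep at least one.
    n_tail = max(1, int(alpha * len(ordered)))
    var = ordered[n_tail - 1]
    cvar = sum(ordered[:n_tail]) / n_tail
    return var, cvar


# Stand-in for draws from p(y|x): standard normal samples from a seeded generator.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
var, cvar = tail_risk_from_samples(samples, alpha=0.01)
```

With this convention CVaR is never larger than VaR, since it averages only values in the lower tail. Repeating the estimate for each conditioning value gives one VaR and one CVaR per row of x_cond.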

value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) of the fitted distribution. Only available if ndim_y = 1.

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile level of the distribution (e.g. 0.01 for the 1% quantile)

• n_samples – number of samples for the Monte Carlo estimate

Returns

VaR values for each x to condition on - numpy array of shape (n_values,)