Gaussian Mixture

Fit and sample from a uni-, bi-, or multivariate Gaussian mixture model with diagonal covariance matrices. In the multivariate case, the density of a single Gaussian component is given by

\[G(X | \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^{d}\left|\Sigma\right|}} \exp\left(-\frac{1}{2} (X-\mu)^T\Sigma^{-1}(X-\mu)\right)\]

where \(d\) is the dimension of \(X\).
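With a diagonal \(\Sigma\), the inverse and the determinant reduce to elementwise operations on the variances. The following minimal NumPy sketch evaluates \(G(X | \mu, \Sigma)\) under that assumption. `diag_gaussian_pdf` is a hypothetical helper for illustration, not a function of `cde`, and \(\Sigma\) is passed as its vector of diagonal variances.

```python
import numpy as np

def diag_gaussian_pdf(X, mu, var):
    """Density of N(mu, diag(var)) at each row of X.

    X: array of shape (n_samples, d); mu, var: arrays of shape (d,).
    """
    X = np.atleast_2d(X)
    d = X.shape[1]
    # quadratic form (X - mu)^T Sigma^{-1} (X - mu); Sigma^{-1} is elementwise 1/var
    quad = np.sum((X - mu) ** 2 / var, axis=1)
    # normalizing constant sqrt((2*pi)^d * |Sigma|), where |Sigma| = prod(var)
    norm = np.sqrt((2 * np.pi) ** d * np.prod(var))
    return np.exp(-0.5 * quad) / norm

print(diag_gaussian_pdf(np.zeros((1, 1)), np.zeros(1), np.ones(1)))  # [0.39894228] = 1/sqrt(2*pi)
```

Because the covariance is diagonal, the density factorizes into a product of one-dimensional normal densities, one per coordinate.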

The mixture model is then a linear combination of an arbitrary number \(K\) of components:

\[p(X) = \sum_{k=1}^K \pi_k G(X|\mu_k, \Sigma_k).\]

Here \(\pi_k\) is the mixing coefficient of the \(k\)-th component, with \(\pi_k \geq 0\) and \(\sum_{k=1}^K \pi_k = 1\). The parameters \(\mu_k\), \(\Sigma_k\) and \(\pi_k\) are estimated by maximum likelihood for each \(k\). The number of kernels determines the modality of the distribution, and the dimensionalities of both \(x\) and \(y\) can be specified. The component means are initialized randomly with a given standard deviation, and the weights are initialized randomly as well.
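The mixture density can be evaluated by broadcasting every sample against every component and taking the weighted sum over \(k\). This is a sketch assuming diagonal \(\Sigma_k\) stored as rows of variances; `mixture_pdf` is a hypothetical helper, not part of `cde`. The usage lines check numerically that a one-dimensional mixture integrates to one.

```python
import numpy as np

def mixture_pdf(X, weights, means, variances):
    """p(X) = sum_k pi_k * G(X | mu_k, Sigma_k) with diagonal Sigma_k.

    X: (n, d); weights: (K,), summing to one; means, variances: (K, d).
    """
    X = np.atleast_2d(X)
    d = X.shape[1]
    # broadcast to (n, K, d): every sample against every component
    diff = X[:, None, :] - means[None, :, :]
    quad = np.sum(diff ** 2 / variances[None, :, :], axis=2)        # (n, K)
    norm = np.sqrt((2 * np.pi) ** d * np.prod(variances, axis=1))   # (K,)
    comp = np.exp(-0.5 * quad) / norm                               # G(X | mu_k, Sigma_k)
    return comp @ weights                                           # weighted sum over k

# two 1-d components with mixing coefficients 0.3 and 0.7
w = np.array([0.3, 0.7])
mu = np.array([[-2.0], [1.0]])
var = np.array([[0.5], [1.0]])
grid = np.linspace(-10.0, 10.0, 20001)[:, None]
dx = grid[1, 0] - grid[0, 0]
total = np.sum(mixture_pdf(grid, w, mu, var)) * dx
print(total)  # close to 1.0, since the weights sum to one
```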

class cde.density_simulation.GaussianMixture(n_kernels=5, ndim_x=1, ndim_y=1, means_std=1.5, random_seed=None)

This model can fit and sample from a uni-, bi-, or multivariate Gaussian mixture model with diagonal covariance matrices. The mixture model is a linear combination of an arbitrary number of components, n_kernels. Means, covariances and weights are estimated by maximum likelihood for each component. The number of kernels determines the modality of the distribution, and the dimensionalities of both x and y can be specified. The component means are initialized randomly with the given standard deviation (means_std), and the weights are initialized randomly as well.

Parameters

n_kernels – number of mixture components
ndim_x – dimensionality of X / number of random variables in X
ndim_y – dimensionality of Y / number of random variables in Y
means_std – std. dev. when sampling the kernel means
random_seed – seed for the random number generator
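To illustrate how these parameters might drive a random initialization, here is a hedged sketch. It assumes the means are drawn from a zero-mean normal distribution with standard deviation means_std over the joint (x, y) space, the weights are uniform random numbers normalized to sum to one, and the diagonal variances come from an arbitrary positive distribution. `sample_gmm_params` and these sampling schemes are illustrative assumptions, not the library's code.

```python
import numpy as np

def sample_gmm_params(n_kernels=5, ndim_x=1, ndim_y=1, means_std=1.5, random_seed=None):
    """Hypothetical initializer mirroring the constructor parameters above."""
    rng = np.random.default_rng(random_seed)
    ndim = ndim_x + ndim_y
    # component means over the joint (x, y) space, std. dev. means_std (assumed normal)
    means = rng.normal(loc=0.0, scale=means_std, size=(n_kernels, ndim))
    # random mixing weights, normalized to sum to one (assumed scheme)
    weights = rng.uniform(0.0, 1.0, size=n_kernels)
    weights /= weights.sum()
    # diagonal variances, kept strictly positive (assumed scheme)
    variances = np.abs(rng.normal(loc=1.0, scale=0.5, size=(n_kernels, ndim))) + 1e-3
    return weights, means, variances

w, mu, var = sample_gmm_params(n_kernels=3, ndim_x=2, ndim_y=1, random_seed=22)
print(w.shape, mu.shape, var.shape)  # (3,) (3, 3) (3, 3)
```

Fixing random_seed makes the sampled model reproducible, which is the usual purpose of such a parameter.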

can_sample: flag indicating whether samples can be drawn from the model. The constructor sets the parameters and calculates the component weights, means and covariances.

cdf(X, Y)

Conditional cumulative distribution function P(Y<y|X=x). See “Conditional Gaussian Mixture Models for Environmental Risk Mapping” [Gilardi, Bengio] for the math.

Parameters

X – the position/conditional variable for the distribution P(Y<y|X=x), array_like, shape:(n_samples, ndim_x)
Y – the variable Y conditioned on X, array_like, shape:(n_samples, ndim_y)

Returns

the conditional cumulative distribution of Y given X, for the given realizations of X, with shape (n_samples,)
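Because every component has a diagonal covariance, x and y are independent within a component, so conditioning on x only reweights the components: \(w_k(x) \propto \pi_k G(x | \mu_{x,k}, \Sigma_{x,k})\), and the conditional CDF is \(\sum_k w_k(x)\,\Phi\big((y - \mu_{y,k})/\sigma_{y,k}\big)\). A sketch for one-dimensional x and y under that assumption follows; `cond_weights` and `cond_cdf` are hypothetical names, and this is not necessarily how `cde` computes `cdf`.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # elementwise error function

def cond_weights(x, weights, mu_x, var_x):
    """Posterior component weights w_k(x), proportional to pi_k N(x | mu_xk, var_xk); shape (n, K)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    dens = np.exp(-0.5 * (x - mu_x) ** 2 / var_x) / np.sqrt(2 * np.pi * var_x)
    unnorm = weights * dens
    return unnorm / unnorm.sum(axis=1, keepdims=True)

def cond_cdf(x, y, weights, mu_x, var_x, mu_y, var_y):
    """P(Y < y | X = x) for 1-d x and y, one value per (x, y) pair."""
    w = cond_weights(x, weights, mu_x, var_x)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    # normal CDF of each component's y-marginal, Phi((y - mu) / sigma)
    phi = 0.5 * (1.0 + _erf((y - mu_y) / np.sqrt(2.0 * var_y)))
    return np.sum(w * phi, axis=1)

weights = np.array([0.5, 0.5])
mu_x, var_x = np.array([-1.0, 1.0]), np.array([1.0, 1.0])
mu_y, var_y = np.array([0.0, 3.0]), np.array([1.0, 1.0])
print(cond_cdf([0.0], [1.5], weights, mu_x, var_x, mu_y, var_y))  # 0.5 by symmetry
```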

conditional_value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Conditional Value-at-Risk (CVaR) / Expected Shortfall of the fitted distribution. Only supported if ndim_y = 1.

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
alpha – quantile percentage of the distribution
n_samples – number of samples for the Monte Carlo approximation

Returns

CVaR values for each x to condition on - numpy array of shape (n_values,)
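A Monte Carlo estimate of CVaR takes the empirical alpha-quantile of samples from p(y|x) as the VaR and averages the samples at or below it. The sketch below assumes the lower-tail convention (small y are the losses) and uses standard normal draws as a stand-in for samples from the fitted model; `mc_var_cvar` is a hypothetical helper, not the library's implementation.

```python
import numpy as np

def mc_var_cvar(samples, alpha=0.01):
    """Monte Carlo VaR (alpha-quantile) and CVaR (mean of the alpha-tail) of 1-d samples."""
    samples = np.asarray(samples, dtype=float)
    var = np.quantile(samples, alpha)          # Value-at-Risk
    cvar = samples[samples <= var].mean()      # Expected Shortfall: mean beyond the VaR
    return var, cvar

rng = np.random.default_rng(0)
y = rng.standard_normal(1_000_000)             # stand-in for draws from p(y|x) at one x
print(mc_var_cvar(y, alpha=0.01))              # roughly (-2.33, -2.67) for N(0, 1)
```

Only about alpha * n_samples draws fall in the tail, which is why a large n_samples is needed for small alpha.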

covariance(x_cond, n_samples=None)

Covariance of the distribution conditioned on x_cond

Parameters: x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
Returns: Covariances Cov[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)
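For a mixture, the conditional covariance follows from the law of total covariance: \(\mathrm{Cov}[y|x] = \sum_k w_k(x)\,(\Sigma_{y,k} + \mu_{y,k}\mu_{y,k}^T) - \mathrm{E}[y|x]\,\mathrm{E}[y|x]^T\) with \(\mathrm{E}[y|x] = \sum_k w_k(x)\,\mu_{y,k}\). The sketch assumes diagonal component covariances and computes the weights \(w_k(x)\) in log space to avoid underflow; `cond_weights` and `cond_covariance` are hypothetical names.

```python
import numpy as np

def cond_weights(X, weights, mu_x, var_x):
    """w_k(x) proportional to pi_k N(x | mu_xk, diag(var_xk)); X: (n, dx), mu_x, var_x: (K, dx)."""
    diff = X[:, None, :] - mu_x[None, :, :]
    log_w = np.log(weights) - 0.5 * np.sum(diff ** 2 / var_x + np.log(2 * np.pi * var_x), axis=2)
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))   # shift by the max before exp
    return w / w.sum(axis=1, keepdims=True)                 # (n, K)

def cond_covariance(X, weights, mu_x, var_x, mu_y, var_y):
    """Cov[y|x] via the law of total covariance; mu_y, var_y: (K, dy) -> (n, dy, dy)."""
    w = cond_weights(X, weights, mu_x, var_x)
    mean = w @ mu_y                                          # E[y|x], (n, dy)
    # per-component second moments Sigma_yk + mu_yk mu_yk^T, shape (K, dy, dy)
    second = np.stack([np.diag(v) for v in var_y]) + mu_y[:, :, None] * mu_y[:, None, :]
    return np.einsum("nk,kij->nij", w, second) - mean[:, :, None] * mean[:, None, :]

weights = np.array([0.5, 0.5])
mu_x, var_x = np.array([[-1.0], [1.0]]), np.ones((2, 1))
mu_y, var_y = np.array([[0.0], [2.0]]), np.ones((2, 1))
print(cond_covariance(np.array([[0.0]]), weights, mu_x, var_x, mu_y, var_y))  # [[[2.]]]
```

In the example the result splits into within-component variance 1 plus between-component variance 1.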

covariances_y: covariance matrices of Y for each component. Some eigenvalues of the randomly sampled covariance matrices can be exactly zero, so the matrices are mapped to the positive semi-definite subspace.

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any

joint_pdf(X, Y)

joint probability density function P(X, Y)

Parameters

X – variable X for the distribution P(X, Y), array_like, shape:(n_samples, ndim_x)
Y – variable Y for the distribution P(X, Y), array_like, shape:(n_samples, ndim_y)

Returns

the joint distribution of X and Y, with shape (n_samples,)

kurtosis(x_cond, n_samples=1000000)

Kurtosis of the fitted distribution conditioned on x_cond

Parameters: x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
Returns: Kurtosis Kurt[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)
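Higher moments such as kurtosis are convenient to estimate from samples, which the n_samples argument suggests. The sketch below computes a sample-based excess kurtosis (standardized fourth moment minus 3) on draws from a symmetric bimodal mixture, for which the exact value is 43/25 - 3 = -1.28. Whether `cde` reports excess or plain kurtosis is not stated in this section, and `mc_excess_kurtosis` is a hypothetical helper.

```python
import numpy as np

def mc_excess_kurtosis(samples):
    """Excess kurtosis E[((y - m) / s)^4] - 3 estimated from 1-d samples."""
    samples = np.asarray(samples, dtype=float)
    z = (samples - samples.mean()) / samples.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(1)
# equal-weight mixture of N(-2, 1) and N(2, 1): bimodal, hence negative excess kurtosis
comp = rng.integers(0, 2, size=1_000_000)
y = rng.normal(loc=np.where(comp == 0, -2.0, 2.0), scale=1.0)
print(mc_excess_kurtosis(y))  # about -1.28
```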

log_pdf(X, Y)

Conditional log-probability log p(y|x). Requires the model to be fitted.

Parameters

X – numpy array to be conditioned on - shape: (n_samples, ndim_x)
Y – numpy array of y targets - shape: (n_samples, ndim_y)

Returns

conditional log-probability log p(y|x) - numpy array of shape (n_samples,)
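Far from all component means, each term of the mixture sum underflows to zero in floating point, and a naive log then returns -inf. Evaluating the log-density with the log-sum-exp trick avoids this. Below is a sketch for one-dimensional y with given log component weights; `log_mixture_pdf` is a hypothetical helper, not the library's implementation.

```python
import numpy as np

def log_mixture_pdf(y, log_weights, mu, var):
    """log sum_k exp(log w_k + log N(y | mu_k, var_k)) for 1-d y, computed stably."""
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    log_comp = log_weights - 0.5 * (np.log(2 * np.pi * var) + (y - mu) ** 2 / var)  # (n, K)
    m = log_comp.max(axis=1, keepdims=True)
    # log-sum-exp: factor out the largest term so exp() cannot underflow to all zeros
    return (m + np.log(np.sum(np.exp(log_comp - m), axis=1, keepdims=True))).ravel()

log_w = np.log(np.array([0.5, 0.5]))
mu, var = np.array([0.0, 1.0]), np.array([1.0, 1.0])
y = np.array([0.0, 60.0])
print(log_mixture_pdf(y, log_w, mu, var))  # both entries finite

# naive evaluation at y = 60: both exp() terms underflow to 0.0
with np.errstate(divide="ignore"):
    naive = np.log(np.sum(np.exp(log_w - 0.5 * (np.log(2 * np.pi * var) + (60.0 - mu) ** 2 / var))))
print(naive)  # -inf
```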

mean_(x_cond, n_samples=None)

Conditional mean of the distribution

Parameters: x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns: Means E[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y)
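The conditional mean of a mixture is the reweighted average of the component means of y, \(\mathrm{E}[y|x] = \sum_k w_k(x)\,\mu_{y,k}\). This is a sketch assuming diagonal component covariances; `cond_mean` is a hypothetical function, not the documented `mean_`.

```python
import numpy as np

def cond_mean(X, weights, mu_x, var_x, mu_y):
    """E[y|x] = sum_k w_k(x) mu_yk, with w_k(x) proportional to pi_k N(x | mu_xk, diag(var_xk)).

    X: (n, dx); weights: (K,); mu_x, var_x: (K, dx); mu_y: (K, dy) -> (n, dy)
    """
    diff = X[:, None, :] - mu_x[None, :, :]
    log_w = np.log(weights) - 0.5 * np.sum(diff ** 2 / var_x + np.log(2 * np.pi * var_x), axis=2)
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))   # stable exponentiation
    w /= w.sum(axis=1, keepdims=True)
    return w @ mu_y

weights = np.array([0.5, 0.5])
mu_x, var_x = np.array([[-1.0], [1.0]]), np.ones((2, 1))
mu_y = np.array([[0.0], [4.0]])
print(cond_mean(np.array([[0.0], [10.0]]), weights, mu_x, var_x, mu_y))  # approximately [[2.], [4.]]
```

At x = 0 both components are equally likely; at x = 10 almost all weight moves to the second component.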

means: component means, drawn randomly with standard deviation means_std. The covariance matrices are also sampled randomly and constrained to be positive definite.

pdf(X, Y)

Conditional probability density function P(Y|X). See “Conditional Gaussian Mixture Models for Environmental Risk Mapping” [Gilardi, Bengio] for the math.

Parameters

X – the position/conditional variable for the distribution P(Y|X), array_like, shape:(n_samples, ndim_x)
Y – the variable Y conditioned on X, array_like, shape:(n_samples, ndim_y)

Returns

the conditional distribution of Y given X, for the given realizations of X, with shape (n_samples,)
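The conditional density can also be written as a ratio of joint and marginal densities, \(p(y|x) = p(x, y) / p(x)\). With diagonal components, each joint component density factors into an x part and a y part, which the following sketch for one-dimensional x and y exploits; `normal_pdf` and `cond_pdf` are hypothetical helpers. For x far from every component this direct ratio can underflow, in which case a log-space evaluation is safer.

```python
import numpy as np

def normal_pdf(z, mu, var):
    """Univariate normal density, broadcast over z, mu and var."""
    return np.exp(-0.5 * (z - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def cond_pdf(x, y, weights, mu_x, var_x, mu_y, var_y):
    """p(y|x) = p(x, y) / p(x) for 1-d x and y, one value per (x, y) pair."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    px_k = weights * normal_pdf(x, mu_x, var_x)                 # pi_k N(x | mu_xk, var_xk), (n, K)
    joint = np.sum(px_k * normal_pdf(y, mu_y, var_y), axis=1)   # p(x, y): components factor in x and y
    marginal = np.sum(px_k, axis=1)                             # p(x)
    return joint / marginal

weights = np.array([0.5, 0.5])
mu_x, var_x = np.array([-1.0, 1.0]), np.array([1.0, 1.0])
mu_y, var_y = np.array([0.0, 3.0]), np.array([1.0, 1.0])
ys = np.linspace(-15.0, 20.0, 35001)
dens = cond_pdf(np.full_like(ys, 0.5), ys, weights, mu_x, var_x, mu_y, var_y)
print(np.sum(dens) * (ys[1] - ys[0]))  # close to 1.0: p(y|x) integrates to one over y
```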

plot(xlim=(-5, 5), ylim=(-5, 5), resolution=100, mode='pdf', show=False, numpyfig=False)

Plots the distribution specified by mode, if x and y are each one-dimensional

Parameters

xlim – 2-tuple specifying the x axis limits
ylim – 2-tuple specifying the y axis limits
resolution – integer specifying the resolution of plot
mode – specifies which distribution to plot: “pdf”, “cdf” or “joint_pdf”

plot2d(x_cond=[0, 1, 2], ylim=(-8, 8), resolution=100, mode='pdf', show=True, prefix='', numpyfig=False)

Plots the fitted conditional distribution as a function of y for each value in x_cond, if x and y are each one-dimensional

Parameters

x_cond – list of x values to condition on
ylim – 2-tuple specifying the y axis limits
resolution – integer specifying the resolution of plot

plot3d(xlim=(-5, 5), ylim=(-8, 8), resolution=100, show=False, numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution, if x and y are each one-dimensional

Parameters

xlim – 2-tuple specifying the x axis limits
ylim – 2-tuple specifying the y axis limits
resolution – integer specifying the resolution of plot

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self

simulate(n_samples=1000)

Draws random samples from the unconditional distribution p(x,y)

Parameters: n_samples – (int) number of samples to be drawn from the joint distribution
Returns: (X,Y) - random samples drawn from p(x,y) - numpy arrays of shape (n_samples, ndim_x) and (n_samples, ndim_y)
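Sampling from the joint mixture p(x, y) is typically done by ancestral sampling: first choose a component according to the mixing weights, then draw from that Gaussian. This is a sketch under the diagonal-covariance assumption, with means and variances stored over the concatenated (x, y) space; `simulate_joint` is a hypothetical function, not the documented `simulate`.

```python
import numpy as np

def simulate_joint(n_samples, weights, means, variances, ndim_x, seed=None):
    """Draw (X, Y) from a diagonal-covariance mixture over the joint (x, y) space.

    weights: (K,); means, variances: (K, ndim_x + ndim_y).
    """
    rng = np.random.default_rng(seed)
    # 1) choose a component for every sample according to the mixing weights
    k = rng.choice(len(weights), size=n_samples, p=weights)
    # 2) draw from the chosen component: mean + std * standard normal noise
    noise = rng.standard_normal((n_samples, means.shape[1]))
    Z = means[k] + np.sqrt(variances[k]) * noise
    # 3) split each joint draw into its x and y blocks
    return Z[:, :ndim_x], Z[:, ndim_x:]

X, Y = simulate_joint(200_000, np.array([0.2, 0.8]),
                      np.array([[0.0, 0.0], [5.0, 5.0]]), np.ones((2, 2)), ndim_x=1, seed=3)
print(X.shape, Y.shape)  # (200000, 1) (200000, 1)
```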

simulate_conditional(X)

Draws random samples from the conditional distribution

Parameters: X – x to be conditioned on when drawing a sample from y ~ p(y|x) - numpy array of shape (n_samples, ndim_x)
Returns: Conditional random samples y drawn from p(y|x) - numpy array of shape (n_samples, ndim_y)
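Conditional sampling follows the same two steps, except that the component is drawn from the x-dependent weights \(w_k(x)\) and only the y part of the component is sampled. This is a sketch for a diagonal-covariance mixture; `sample_y_given_x` is a hypothetical name, and the per-row component choice uses an inverse-CDF lookup on the cumulative weights.

```python
import numpy as np

def sample_y_given_x(X, weights, mu_x, var_x, mu_y, var_y, seed=None):
    """Draw one y ~ p(y|x) per row of X for a diagonal-covariance mixture.

    X: (n, dx); weights: (K,); mu_x, var_x: (K, dx); mu_y, var_y: (K, dy) -> (n, dy)
    """
    rng = np.random.default_rng(seed)
    # posterior component weights w_k(x), computed in log space
    diff = X[:, None, :] - mu_x[None, :, :]
    log_w = np.log(weights) - 0.5 * np.sum(diff ** 2 / var_x + np.log(2 * np.pi * var_x), axis=2)
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                       # (n, K)
    # pick component k per row: first index whose cumulative weight reaches u
    u = rng.random((X.shape[0], 1))
    k = (np.cumsum(w, axis=1) < u).sum(axis=1)
    k = np.minimum(k, len(weights) - 1)                     # guard against rounding in cumsum
    # sample only the y part of the chosen component
    return mu_y[k] + np.sqrt(var_y[k]) * rng.standard_normal(mu_y[k].shape)

weights = np.array([0.5, 0.5])
mu_x, var_x = np.array([[-1.0], [1.0]]), np.ones((2, 1))
mu_y, var_y = np.array([[0.0], [4.0]]), np.ones((2, 1))
ys = sample_y_given_x(np.zeros((100_000, 1)), weights, mu_x, var_x, mu_y, var_y, seed=5)
print(ys.mean())  # near 2.0: at x = 0 both components are equally likely
```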

skewness(x_cond, n_samples=1000000)

Skewness of the fitted distribution conditioned on x_cond

Parameters: x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
Returns: Skewness Skew[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

std_(x_cond, n_samples=1000000)

Standard deviation of the fitted distribution conditioned on x_cond

Parameters: x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
Returns: Standard deviations sqrt(Var[y|x]) corresponding to x_cond - numpy array of shape (n_values, ndim_y)

tail_risk_measures(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR)

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
alpha – quantile percentage of the distribution
n_samples – number of samples for the Monte Carlo approximation

Returns

VaR values for each x to condition on - numpy array of shape (n_values,)
CVaR values for each x to condition on - numpy array of shape (n_values,)

value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) of the fitted distribution. Only supported if ndim_y = 1.

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
alpha – quantile percentage of the distribution
n_samples – number of samples for the Monte Carlo approximation

Returns

VaR values for each x to condition on - numpy array of shape (n_values,)
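As an alternative to a Monte Carlo estimate, the VaR of a one-dimensional mixture can be obtained by inverting its CDF numerically, because that CDF is a weighted sum of normal CDFs and is monotone. The sketch below swaps in bisection for the inversion; pass the conditional weights w_k(x) for the x of interest as `w`. `mixture_cdf` and `value_at_risk_bisect` are hypothetical helpers, not part of `cde`.

```python
import math

def mixture_cdf(y, w, mu, sd):
    """CDF of the 1-d mixture sum_k w_k N(mu_k, sd_k^2) at scalar y."""
    return sum(wk * 0.5 * (1.0 + math.erf((y - mk) / (sk * math.sqrt(2.0))))
               for wk, mk, sk in zip(w, mu, sd))

def value_at_risk_bisect(w, mu, sd, alpha=0.01, tol=1e-10):
    """alpha-quantile of the mixture, found by bisection on its monotone CDF."""
    lo = min(m - 20.0 * s for m, s in zip(mu, sd))   # CDF(lo) is essentially 0
    hi = max(m + 20.0 * s for m, s in zip(mu, sd))   # CDF(hi) is essentially 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, w, mu, sd) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(value_at_risk_bisect([1.0], [0.0], [1.0], alpha=0.01))  # about -2.3263, the 1% quantile of N(0, 1)
```

Bisection needs only about 40 CDF evaluations here and has no sampling noise, at the cost of being limited to one-dimensional y.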