# Kernel Mixture Network

Implementation of Kernel Mixture Network introduced in [AMB2017] with some extra features.

The approach combines unconditional kernel density estimation with a (softmax) neural network, obtaining a conditional kernel density estimator. As in unconditional kernel density estimation, kernels are placed at each of the training samples or at a subset of them. A neural network receives x (the value to condition on) as input and predicts the weights of the kernels. Overall, the conditional probability density function is modeled as follows:

$f(y|x) = \frac{1}{\sum_{p,j} w_{pj}(x; W)} \sum_{p,j} w_{pj}(x; W) \mathcal{K}_j(y,y^{(p)})$

This implementation uses Gaussian Kernels:

$\mathcal{K}(y,y';\sigma)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{\left\Vert y-y'\right\Vert^2}{2\sigma^2}}$
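The two formulas above can be sketched in NumPy for one-dimensional y. This is only an illustration under stated assumptions: the weights $w_{pj}(x; W)$ are taken as given for one fixed x (in the estimator they come from the neural network), and the names `gaussian_kernel` and `kmn_density` are hypothetical, not part of the library.

```python
import numpy as np

def gaussian_kernel(y, center, sigma):
    """One-dimensional Gaussian kernel K(y, y'; sigma)."""
    return np.exp(-(y - center) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def kmn_density(y, weights, centers, sigmas):
    """Evaluate f(y|x) for a scalar y.

    weights: non-negative array of shape (n_centers, n_scales), i.e. w_pj(x; W)
             for one fixed x (assumed to be given, e.g. by a network)
    centers: array of shape (n_centers,), the kernel centers y^(p)
    sigmas:  array of shape (n_scales,), one bandwidth per kernel type j
    """
    weights = np.asarray(weights, dtype=float)
    centers = np.asarray(centers, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # kernel values K_j(y, y^(p)) for every (center, scale) pair
    k = gaussian_kernel(y, centers[:, None], sigmas[None, :])
    # weighted sum of kernels, divided by the sum of the weights
    return float(np.sum(weights * k) / np.sum(weights))

weights = np.array([[1.0, 0.5], [2.0, 0.5]])
centers = np.array([-1.0, 1.0])
sigmas = np.array([0.5, 1.0])
grid = np.linspace(-10.0, 10.0, 4001)
density = np.array([kmn_density(y, weights, centers, sigmas) for y in grid])
# Riemann sum over the grid; a valid density integrates to roughly 1
area = float(np.sum(density) * (grid[1] - grid[0]))
```

Because every kernel integrates to one and the weights are normalized, the result is a valid density regardless of the weight values.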

In addition to the approach described in the paper, the implementation has the following extensions:

• Trainable scales/bandwidths: The scales of the Gaussian kernels can either be fixed or jointly trained with the neural network weights. This property is controlled by the boolean train_scales in the constructor.

• Center Sampling Methods:
• all: use all data points in the train set as kernel centers

• random: randomly selects k points as kernel centers

• k_means: uses k-means clustering to determine k kernel centers

• agglomerative: uses agglomerative clustering to determine k kernel centers
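The two simplest center sampling methods can be sketched as follows. This is a hypothetical helper (`sample_centers` is not a library function) for one-dimensional targets; it covers only all and random, and its handling of keep_edges (always retaining the smallest and largest y) is an assumption about what that constructor option means.

```python
import numpy as np

def sample_centers(y, method="random", k=3, keep_edges=False, seed=0):
    """Pick kernel centers from 1-D training targets y.

    Only 'all' and 'random' are sketched here; 'k_means' and 'agglomerative'
    would need a clustering routine and are left out.
    """
    y = np.asarray(y, dtype=float)
    if method == "all":
        return np.sort(y)
    if method != "random":
        raise ValueError("this sketch only covers 'all' and 'random'")
    rng = np.random.default_rng(seed)
    if keep_edges:
        # assumption: the smallest and largest y are always kept as centers
        order = np.argsort(y)
        edges, inner = order[[0, -1]], order[1:-1]
        chosen = rng.choice(inner, size=k - 2, replace=False)
        idx = np.concatenate([edges, chosen])
    else:
        idx = rng.choice(len(y), size=k, replace=False)
    return np.sort(y[idx])

y_train = np.array([0.3, -2.0, 1.5, 0.9, 4.2, -0.7])
centers = sample_centers(y_train, method="random", k=4, keep_edges=True)
all_centers = sample_centers(y_train, method="all")
```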

class cde.density_estimator.KernelMixtureNetwork(name, ndim_x, ndim_y, center_sampling_method='k_means', n_centers=50, keep_edges=True, init_scales='default', hidden_sizes=(16, 16), hidden_nonlinearity=<function tanh>, train_scales=True, n_training_epochs=1000, x_noise_std=None, y_noise_std=None, entropy_reg_coef=0.0, weight_decay=0.0, weight_normalization=True, data_normalization=True, dropout=0.0, random_seed=None)

Kernel Mixture Network Estimator

https://arxiv.org/abs/1705.07111

Parameters
• name – (str) name space of the estimator (should be unique in code, otherwise tensorflow namespace collisions may arise)

• ndim_x – (int) dimensionality of x variable

• ndim_y – (int) dimensionality of y variable

• center_sampling_method – String that describes the method to use for finding kernel centers. Allowed values [all, random, distance, k_means, agglomerative]

• n_centers – Number of kernels to use in the output

• keep_edges – Keep the extreme y values as center to keep expressiveness

• init_scales – List or scalar that describes (initial) values of bandwidth parameter

• train_scales – Boolean that describes whether or not to make the scales trainable

• x_noise_std – (optional) standard deviation of Gaussian noise over the training data X -> regularization through noise. Adding noise is automatically deactivated outside of training.

• y_noise_std – (optional) standard deviation of Gaussian noise over the training data Y -> regularization through noise

• entropy_reg_coef – (optional) scalar float coefficient for Shannon entropy penalty on the mixture component weight distribution

• weight_decay – (float) the amount of decoupled (http://arxiv.org/abs/1711.05101) weight decay to apply

• weight_normalization – boolean specifying whether weight normalization shall be used

• data_normalization – (boolean) whether to normalize the data (X and Y) to exhibit zero mean and unit standard deviation

• dropout – (float) the probability of switching off nodes during training

• random_seed – (optional) seed (int) of the random number generators used
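The effect of data_normalization and of the noise parameters x_noise_std / y_noise_std can be illustrated with a small NumPy sketch. It assumes that normalization means zero mean and unit standard deviation per dimension and that noise is applied only while training; the helpers `normalize` and `add_noise` are hypothetical and do not mirror the library's internals.

```python
import numpy as np

def normalize(data):
    """Shift and scale each column to zero mean and unit standard deviation."""
    mean, std = data.mean(axis=0), data.std(axis=0)
    return (data - mean) / std, mean, std

def add_noise(data, noise_std, rng, training=True):
    """Add zero-mean Gaussian noise while training; return data unchanged otherwise."""
    if not training or noise_std is None:
        return data
    return data + rng.normal(0.0, noise_std, size=data.shape)

rng = np.random.default_rng(0)
X = rng.normal(5.0, 3.0, size=(1000, 2))
X_norm, x_mean, x_std = normalize(X)
# noisy copy used for a training step (assumption: fresh noise per step)
X_noisy = add_noise(X_norm, noise_std=0.1, rng=rng)
# outside of training the data passes through untouched
X_eval = add_noise(X_norm, noise_std=0.1, rng=rng, training=False)
```

Perturbing the inputs this way smooths the fitted density, which is why the parameter list describes it as regularization through noise.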

cdf(X, Y)

Predicts the conditional cumulative probability p(Y<=y|X=x). Requires the model to be fitted.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

Returns

conditional cumulative probability p(Y<=y|X=x) - numpy array of shape (n_query_samples, )
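For one-dimensional y and Gaussian kernels, the conditional CDF is a weighted sum of normal CDFs. The sketch below shows that relationship with the standard library only; `mixture_cdf` and its arguments are hypothetical names, and the weights for a given x are assumed to be already known.

```python
import math

def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mixture_cdf(y, weights, centers, sigmas):
    """P(Y <= y) for a 1-D Gaussian mixture; weights are normalized here."""
    total = sum(weights)
    return sum(
        w * normal_cdf((y - c) / s)
        for w, c, s in zip(weights, centers, sigmas)
    ) / total

weights = [0.2, 0.5, 0.3]
centers = [-1.0, 0.0, 2.0]
sigmas = [0.5, 1.0, 0.7]
p_low = mixture_cdf(-50.0, weights, centers, sigmas)
p_mid = mixture_cdf(0.5, weights, centers, sigmas)
p_high = mixture_cdf(50.0, weights, centers, sigmas)
```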

conditional_value_at_risk(x_cond, alpha=0.01, n_samples=10000000)

Computes the Conditional Value-at-Risk (CVaR) / Expected Shortfall of a GMM. Only supported if ndim_y = 1.

Based on formulas from section 2.3.2 in “Expected shortfall for distributions in finance”, Simon A. Broda, Marc S. Paolella, 2011

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile percentage of the distribution

Returns

CVaR values for each x to condition on - numpy array of shape (n_values)
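For a one-dimensional Gaussian mixture the lower-tail expectation has a closed form, since $E[Y \cdot 1\{Y \le q\}] = \mu\Phi(z) - \sigma\varphi(z)$ with $z = (q-\mu)/\sigma$ for a single normal component. The sketch below applies this per component; it is an illustration of that standard identity, not a copy of the library code, and both the function names and the sign convention (CVaR as the mean of the lower alpha-tail) are assumptions.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mixture_var(alpha, weights, means, sigmas, lo=-1e3, hi=1e3):
    """alpha-quantile of the mixture, found by bisection on its CDF."""
    def cdf(q):
        return sum(w * Phi((q - m) / s) for w, m, s in zip(weights, means, sigmas))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mixture_cvar(alpha, weights, means, sigmas):
    """Mean of the mixture below its alpha-quantile (weights must sum to 1)."""
    q = mixture_var(alpha, weights, means, sigmas)
    total = 0.0
    for w, m, s in zip(weights, means, sigmas):
        z = (q - m) / s
        total += w * (m * Phi(z) - s * phi(z))
    return total / alpha

# sanity check against a single standard normal component
var_95 = mixture_var(0.05, [1.0], [0.0], [1.0])
cvar_95 = mixture_cvar(0.05, [1.0], [0.0], [1.0])
```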

covariance(x_cond, n_samples=None)

Covariance of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Covariances Cov[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

eval_by_cv(X, Y, n_splits=5, verbose=True)

Fits the conditional density model with cross-validation by using the score function of the BaseDensityEstimator for scoring the various splits.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

• n_splits – number of cross-validation folds (positive integer)

• verbose – the verbosity level

fit(X, Y, eval_set=None, verbose=True)

Fits the conditional density model with provided data

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

• eval_set – (tuple) eval/test set - tuple (X_test, Y_test)

• verbose – (boolean) controls the verbosity (console output)

fit_by_cv(X, Y, n_folds=3, param_grid=None, random_state=None, verbose=True, n_jobs=-1)

Fits the conditional density model with hyperparameter search and cross-validation.

• Determines the best hyperparameter configuration from a pre-defined set using cross-validation. The conditional log-likelihood is used as the evaluation score.

• Fits the model with the previously selected hyperparameter configuration

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

• n_folds – number of cross-validation folds (positive integer)

• param_grid – (optional) a dictionary with the hyperparameters of the model as keys and a list of respective parametrizations as values. The hyperparameter search is performed over the Cartesian product of the provided lists. Example:

{"n_centers": [20, 50, 100, 200],
"center_sampling_method": ["agglomerative", "k_means", "random"],
"keep_edges": [True, False]
}


• random_state – (int) seed used by the random number generator for shuffling the data
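To get a feeling for how many configurations a param_grid produces, the Cartesian product can be enumerated with the standard library. The helper `expand_grid` is a hypothetical illustration and not how the library itself iterates over the grid.

```python
from itertools import product

param_grid = {
    "n_centers": [20, 50, 100, 200],
    "center_sampling_method": ["agglomerative", "k_means", "random"],
    "keep_edges": [True, False],
}

def expand_grid(grid):
    """Return every combination of the listed values as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

configs = expand_grid(param_grid)
n_configs = len(configs)  # 4 * 3 * 2 combinations
```

Each configuration is fitted once per fold, so the total number of model fits is the number of configurations times n_folds.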

get_configuration(deep=True)

Get parameter configuration for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (mapping of string to any) – Parameter names mapped to their values.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

get_params_internal(**tags)

Internal method to be implemented which does not perform caching

kurtosis(x_cond, n_samples=1000000)

Kurtosis of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Kurtosis Kurt[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

log_pdf(X, Y)

Predicts the conditional log-probability log p(y|x). Requires the model to be fitted.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

Returns

conditional log-probability log p(y|x) - numpy array of shape (n_query_samples, )
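Far away from all kernel centers the density underflows to zero in floating point, so log-densities are usually computed in log space with the log-sum-exp trick. The following sketch shows the idea for a one-dimensional Gaussian mixture; `mixture_log_pdf` is a hypothetical name and the library may compute its log-density differently.

```python
import math

def mixture_log_pdf(y, weights, centers, sigmas):
    """log p(y) of a 1-D Gaussian mixture with strictly positive weights."""
    log_total = math.log(sum(weights))
    # log of each weighted kernel: log w_k - log sum(w) + log N(y; c_k, s_k)
    terms = [
        math.log(w) - log_total
        - 0.5 * math.log(2.0 * math.pi) - math.log(s)
        - (y - c) ** 2 / (2.0 * s ** 2)
        for w, c, s in zip(weights, centers, sigmas)
    ]
    m = max(terms)
    # log-sum-exp: subtract the maximum before exponentiating
    return m + math.log(sum(math.exp(t - m) for t in terms))

# a point where the plain density would underflow to 0.0
far_out = mixture_log_pdf(100.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
at_mode = mixture_log_pdf(0.0, [1.0], [0.0], [1.0])
```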

mean_(x_cond, n_samples=None)

Mean of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Means E[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y)

mean_std(x_cond, n_samples=None)

Computes Mean and Covariance of the fitted distribution conditioned on x_cond.

Computationally more efficient than calling the mean and covariance computations separately

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Means E[y|x] and Covariances Cov[y|x]
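For a mixture of isotropic Gaussian kernels, mean and covariance follow from the law of total covariance: the covariance is the weighted spread of the centers plus the weighted average kernel variance. The NumPy sketch below illustrates this for given weights; the function `mixture_mean_cov` is hypothetical and the isotropic-kernel assumption follows the kernel formula at the top of this page, not an inspection of the library code.

```python
import numpy as np

def mixture_mean_cov(weights, centers, sigmas):
    """Mean and covariance of an isotropic Gaussian mixture.

    weights: (n_components,) non-negative
    centers: (n_components, ndim_y)
    sigmas:  (n_components,) one bandwidth per component
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    c = np.asarray(centers, dtype=float)
    s = np.asarray(sigmas, dtype=float)
    mean = w @ c
    diff = c - mean
    # spread of the centers around the mixture mean
    between = (w[:, None] * diff).T @ diff
    # average covariance contributed by the kernels themselves
    within = np.sum(w * s ** 2) * np.eye(c.shape[1])
    return mean, between + within

mean, cov = mixture_mean_cov([1.0, 1.0], [[-1.0, 0.0], [1.0, 0.0]], [0.5, 0.5])
```

Both quantities share the normalized weights and the mixture mean, which is one reason a joint computation can be cheaper than two separate calls.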

pdf(X, Y)

Predicts the conditional probability p(y|x). Requires the model to be fitted.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

Returns

conditional probability p(y|x) - numpy array of shape (n_query_samples, )

plot2d(x_cond=[0, 1, 2], ylim=(-8, 8), resolution=100, mode='pdf', show=True, prefix='', numpyfig=False)

Generates a 2d plot of the fitted conditional distribution for each value in x_cond, if x and y are 1-dimensional each

Parameters
• x_cond – x values to condition on

• ylim – 2-tuple specifying the y axis limits

• resolution – integer specifying the resolution of plot

plot3d(xlim=(-5, 5), ylim=(-8, 8), resolution=100, show=False, numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each

Parameters
• xlim – 2-tuple specifying the x axis limits

• ylim – 2-tuple specifying the y axis limits

• resolution – integer specifying the resolution of plot

predict_density(X, Y=None, resolution=100)

Computes conditional density p(y|x) over a predefined grid of y target values

Parameters
• X – values/vectors to be conditioned on - shape: (n_instances, n_dim_x)

• Y – (optional) y values to be evaluated from p(y|x) - if not set, Y will be a grid with the specified resolution

• resolution – integer specifying the resolution of the evaluation grid

Returns: tuple (P, Y)
• P - density p(y|x) - shape (n_instances, resolution**n_dim_y)

• Y - grid with the specified resolution - shape (resolution**n_dim_y, n_dim_y) or a copy of Y in case it was provided as argument
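The shape (resolution**n_dim_y, n_dim_y) arises when a regular grid with resolution points per dimension is flattened into a list of points. A minimal NumPy sketch, assuming the grid bounds are passed in explicitly (`make_y_grid` and its arguments are hypothetical; how the library chooses the bounds is not shown here):

```python
import numpy as np

def make_y_grid(y_min, y_max, resolution):
    """Regular grid over a box in y-space.

    y_min, y_max: per-dimension bounds of length n_dim_y (assumed inputs)
    returns an array of shape (resolution**n_dim_y, n_dim_y)
    """
    axes = [np.linspace(lo, hi, resolution) for lo, hi in zip(y_min, y_max)]
    mesh = np.meshgrid(*axes, indexing="ij")
    # one row per grid point, one column per y dimension
    return np.stack([m.ravel() for m in mesh], axis=1)

Y_grid = make_y_grid([-8.0, -8.0], [8.0, 8.0], resolution=10)
Y_line = make_y_grid([0.0], [1.0], resolution=5)
```

The number of grid points grows exponentially with n_dim_y, so a resolution of 100 is only practical for low-dimensional y.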

reset_fit()

Resets all tensorflow objects.

sample(X)

Samples from the conditional mixture distributions. Requires the model to be fitted.

Parameters

X – values to be conditioned on when sampling - numpy array of shape (n_instances, n_dim_x)

Returns: tuple (X, Y)
• X - the values conditioned on that were provided as argument - numpy array of shape (n_samples, ndim_x)

• Y - conditional samples from the model p(y|x) - numpy array of shape (n_samples, ndim_y)
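Sampling from a kernel mixture is commonly done in two steps: first pick a kernel according to the weights, then draw from that kernel. The sketch below shows this for a one-dimensional Gaussian mixture with given weights; `sample_mixture` is a hypothetical helper and is not claimed to match the library's sampling routine.

```python
import numpy as np

def sample_mixture(weights, centers, sigmas, n_samples, seed=0):
    """Draw n_samples values from a 1-D Gaussian mixture."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    centers = np.asarray(centers, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    # step 1: choose a component for every sample according to the weights
    comp = rng.choice(len(w), size=n_samples, p=w)
    # step 2: draw from the chosen Gaussian kernel
    return rng.normal(centers[comp], sigmas[comp])

samples = sample_mixture([0.25, 0.75], [-2.0, 2.0], [0.5, 0.5], n_samples=20000)
sample_mean = float(samples.mean())  # should be near 0.25 * -2 + 0.75 * 2 = 1.0
```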

score(X, Y)

Computes the mean conditional log-likelihood of the provided data (X, Y)

Parameters
• X – numpy array to be conditioned on - shape: (n_query_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_query_samples, n_dim_y)

Returns

average log likelihood of data

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self

skewness(x_cond, n_samples=1000000)

Skewness of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Skewness Skew[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

std_(x_cond, n_samples=1000000)

Standard deviation of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Standard deviations sqrt(Var[y|x]) corresponding to x_cond - numpy array of shape (n_values, ndim_y)

tail_risk_measures(x_cond, alpha=0.01, n_samples=10000000)

Computes the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR)

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile percentage of the distribution

• n_samples – number of samples for Monte Carlo estimation

Returns

• VaR values for each x to condition on - numpy array of shape (n_values)

• CVaR values for each x to condition on - numpy array of shape (n_values)
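When tail risk measures are estimated by Monte Carlo, both quantities can be read off a set of samples from p(y|x). The sketch below uses standard normal draws as a stand-in for model samples; the helper name and the convention (VaR as the alpha-quantile, CVaR as the mean of the samples at or below it) are assumptions for illustration.

```python
import numpy as np

def tail_risk_from_samples(samples, alpha=0.01):
    """Empirical VaR (alpha-quantile) and CVaR (mean of the tail at or below it)."""
    samples = np.asarray(samples, dtype=float)
    var = np.quantile(samples, alpha)
    cvar = samples[samples <= var].mean()
    return float(var), float(cvar)

rng = np.random.default_rng(0)
# stand-in for samples drawn from the fitted conditional distribution
draws = rng.normal(0.0, 1.0, size=200_000)
var, cvar = tail_risk_from_samples(draws, alpha=0.05)
```

Small values of alpha leave only a few samples in the tail, which explains the large default for n_samples.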

value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) of the fitted distribution. Only supported if ndim_y = 1.

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile percentage of the distribution

Returns

VaR values for each x to condition on - numpy array of shape (n_values)

The core of the Kernel Mixture Network implementation was originally written by [VEG2017]. In addition to the original implementation of Jan van der Vegt and Alexander Backus, we added support for multivariate distributions p(y|x) as well as automated hyperparameter search via cross-validation.

AMB2017

Luca Ambrogioni, Umut Güçlü, Marcel A. J. van Gerven, Eric Maris (2017). The Kernel Mixture Network: A Nonparametric Method for Conditional Density Estimation of Continuous Random Variables (https://arxiv.org/abs/1705.07111)

VEG2017

https://github.com/janvdvegt/KernelMixtureNetwork