Kernel Mixture Network

Implementation of the Kernel Mixture Network introduced in [AMB2017], with some extra features.

The approach combines unconditional kernel density estimation with a (softmax) neural network, obtaining a conditional kernel density estimator. As in unconditional kernel density estimation, kernels are placed at each of the training samples or at a subset of them. A neural network predicts the weights of the kernels from x (the value to condition on), which it receives as input. Overall, the conditional probability density function is modeled as follows:

\[f(y|x) = \frac{1}{\sum_{p,j} w_{pj}(x; W)} \sum_{p,j} w_{pj}(x; W) \mathcal{K}_j(y,y^{(p)})\]

This implementation uses Gaussian Kernels:

\[\mathcal{K}(y,y';\sigma)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{\left\Vert y-y'\right\Vert^2}{2\sigma^2}}\]
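The two formulas above can be sketched in plain Python for one-dimensional y. This is a minimal illustration under stated assumptions, not the library's code: the names `gaussian_kernel` and `kmn_density` are hypothetical, and the weights are passed in directly instead of being predicted by a neural network from x.

```python
import math

def gaussian_kernel(y, center, sigma):
    """Gaussian kernel K(y, y'; sigma) for scalar y."""
    norm = math.sqrt(2.0 * math.pi) * sigma
    return math.exp(-((y - center) ** 2) / (2.0 * sigma ** 2)) / norm

def kmn_density(y, weights, centers, sigmas):
    """Evaluate f(y|x) for one fixed x.

    weights[p][j] is the non-negative weight of center p with scale j.
    In a real model these weights would be the network output for x.
    """
    total = sum(sum(row) for row in weights)
    acc = 0.0
    for p, center in enumerate(centers):
        for j, sigma in enumerate(sigmas):
            acc += weights[p][j] * gaussian_kernel(y, center, sigma)
    return acc / total  # dividing by the weight sum makes f integrate to 1

# Hypothetical example values: 3 centers, 2 scales per center
centers = [-1.0, 0.0, 2.0]
sigmas = [0.5, 1.0]
weights = [[0.2, 0.1], [0.3, 0.1], [0.2, 0.1]]
density_at_zero = kmn_density(0.0, weights, centers, sigmas)
```

Because every kernel integrates to one and the weights are normalized by their sum, the resulting function is a valid density regardless of the scale of the raw weights.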

In addition to the approach described in the paper, the implementation has the following extensions:

  • Trainable scales/bandwidths: The scales of the Gaussian kernels can either be fixed or trained jointly with the neural network weights. This property is controlled by the boolean train_scales in the constructor.

  • Center Sampling Methods:
    • all: use all data points in the train set as kernel centers

    • random: randomly selects k points as kernel centers

    • k_means: uses k-means clustering to determine k kernel centers

    • agglomerative: uses agglomerative clustering to determine k kernel centers
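As a rough illustration of the `random` strategy, the following dependency-free sketch picks k centers from one-dimensional training targets. The function name `sample_centers_random` is hypothetical, and the way the extreme values are retained (always including the minimum and maximum, then drawing the remaining k - 2 centers at random) is an assumption about what keeping the edges could look like, not a description of the library's internals.

```python
import random

def sample_centers_random(y_train, k, keep_edges=True, seed=0):
    """Pick k kernel centers from 1-D training targets.

    Assumption: with keep_edges=True the smallest and largest target are
    always kept, and the other k - 2 centers are drawn without replacement.
    """
    rng = random.Random(seed)
    values = sorted(y_train)
    if not keep_edges:
        return sorted(rng.sample(values, k))
    edges = [values[0], values[-1]]
    interior = values[1:-1]
    return sorted(edges + rng.sample(interior, k - 2))

# Hypothetical training targets on a regular grid
y_train = [0.1 * i for i in range(100)]
centers = sample_centers_random(y_train, k=10)
```

Keeping the extremes ensures that the mixture can still place probability mass near the boundaries of the observed target range.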

class cde.density_estimator.KernelMixtureNetwork(name, ndim_x, ndim_y, center_sampling_method='k_means', n_centers=50, keep_edges=True, init_scales='default', hidden_sizes=(16, 16), hidden_nonlinearity=<function tanh>, train_scales=True, n_training_epochs=1000, x_noise_std=None, y_noise_std=None, entropy_reg_coef=0.0, weight_decay=0.0, weight_normalization=True, data_normalization=True, dropout=0.0, random_seed=None)[source]

Kernel Mixture Network Estimator

  • name – (str) name space of the model (should be unique in code, otherwise tensorflow namespace collisions may arise)

  • ndim_x – (int) dimensionality of x variable

  • ndim_y – (int) dimensionality of y variable

  • center_sampling_method – String that describes the method to use for finding kernel centers. Allowed values [all, random, distance, k_means, agglomerative]

  • n_centers – Number of kernels to use in the output

  • keep_edges – Keep the extreme y values as center to keep expressiveness

  • init_scales – List or scalar that describes (initial) values of bandwidth parameter

  • train_scales – Boolean that describes whether or not to make the scales trainable

  • x_noise_std – (optional) standard deviation of Gaussian noise over the training data X -> regularization through noise. Adding noise is automatically deactivated during …

  • y_noise_std – (optional) standard deviation of Gaussian noise over the training data Y -> regularization through noise

  • entropy_reg_coef – (optional) scalar float coefficient for the Shannon entropy penalty on the mixture component weight distribution

  • weight_decay – (float) the amount of decoupled weight decay to apply

  • weight_normalization – boolean specifying whether weight normalization shall be used

  • data_normalization – (boolean) whether to normalize the data (X and Y) to exhibit zero mean and unit standard deviation

  • dropout – (float) the probability of switching off nodes during training

  • random_seed – (optional) seed (int) of the random number generators used
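To make the quantity behind entropy_reg_coef concrete, the sketch below computes the Shannon entropy of a softmax weight distribution. The helper names are hypothetical, and how exactly the entropy term enters the training loss is not stated here, so the example only shows the entropy itself.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def shannon_entropy(weights):
    """Shannon entropy (in nats) of a discrete weight distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)

uniform = softmax([0.0, 0.0, 0.0, 0.0])   # all kernels equally weighted
peaked = softmax([10.0, 0.0, 0.0, 0.0])   # almost all mass on one kernel

entropy_uniform = shannon_entropy(uniform)
entropy_peaked = shannon_entropy(peaked)
```

A uniform weight distribution over four kernels attains the maximum entropy log(4), while a distribution concentrated on a single kernel has entropy close to zero.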

cdf(X, Y)

Predicts the conditional cumulative probability p(Y<=y|X=x). Requires the model to be fitted.

  • X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_samples, n_dim_y)


Returns: conditional cumulative probability p(Y<=y|X=x) - numpy array of shape (n_query_samples, )
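For one-dimensional y and Gaussian kernels, the conditional CDF of a mixture is the weighted sum of the component CDFs. The sketch below illustrates that identity; the function names are hypothetical, the weights are assumed to be already normalized, and it is not claimed that the library computes the CDF this way.

```python
import math

def gaussian_cdf(y, mu, sigma):
    """CDF of a univariate Gaussian, expressed through the error function."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def mixture_cdf(y, weights, centers, sigmas):
    """P(Y <= y) for a Gaussian mixture; weights must sum to 1."""
    return sum(w * gaussian_cdf(y, c, s)
               for w, c, s in zip(weights, centers, sigmas))

# Symmetric two-component mixture, so the median is at 0
weights = [0.5, 0.5]
centers = [-1.0, 1.0]
sigmas = [1.0, 1.0]
cdf_at_zero = mixture_cdf(0.0, weights, centers, sigmas)
```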

conditional_value_at_risk(x_cond, alpha=0.01, n_samples=10000000)

Computes the Conditional Value-at-Risk (CVaR) / Expected Shortfall of a GMM. Only supported if ndim_y = 1.

Based on formulas from section 2.3.2 in “Expected shortfall for distributions in finance”, Simon A. Broda, Marc S. Paolella, 2011

  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

  • alpha – quantile percentage of the distribution


Returns: CVaR values for each x to condition on - numpy array of shape (n_values)

covariance(x_cond, n_samples=None)

Covariance of the fitted distribution conditioned on x_cond


  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns: Covariances Cov[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

eval_by_cv(X, Y, n_splits=5, verbose=True)

Fits the conditional density model with cross-validation by using the score function of the BaseDensityEstimator for scoring the various splits.

  • X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_samples, n_dim_y)

  • n_splits – number of cross-validation folds (positive integer)

  • verbose – the verbosity level
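The splitting step behind a k-fold evaluation can be sketched without any dependencies. The function `k_fold_indices` is a hypothetical helper; whether the library shuffles the data before splitting is not stated here, so this version keeps the original order.

```python
def k_fold_indices(n_samples, n_splits):
    """Return a list of (train_indices, test_indices) pairs.

    Folds are contiguous and differ in size by at most one sample.
    """
    base, extra = divmod(n_samples, n_splits)
    folds, start = [], 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds

folds = k_fold_indices(n_samples=10, n_splits=3)
```

Each sample appears in exactly one test fold, so averaging the per-fold scores uses every observation once for evaluation.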

fit(X, Y, eval_set=None, verbose=True)[source]

Fits the conditional density model with provided data

  • X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_samples, n_dim_y)

  • eval_set – (tuple) eval/test set - tuple (X_test, Y_test)

  • verbose – (boolean) controls the verbosity (console output)

fit_by_cv(X, Y, n_folds=3, param_grid=None, random_state=None, verbose=True, n_jobs=-1)

Fits the conditional density model with hyperparameter search and cross-validation.

  • Determines the best hyperparameter configuration from a pre-defined set using cross-validation. The conditional log-likelihood is used for evaluation.

  • Fits the model with the previously selected hyperparameter configuration

  • X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_samples, n_dim_y)

  • n_folds – number of cross-validation folds (positive integer)

  • param_grid

    (optional) a dictionary with the hyperparameters of the model as keys and a list of respective parametrizations as values. The hyperparameter search is performed over the cartesian product of the provided lists. Example:

    {"n_centers": [20, 50, 100, 200],
     "center_sampling_method": ["agglomerative", "k_means", "random"],
     "keep_edges": [True, False]}

  • random_state – (int) seed used by the random number generator for shuffling the data
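The cartesian product over a parameter grid can be enumerated with the standard library. The sketch below uses the example grid from the param_grid description; `expand_grid` is a hypothetical helper and only illustrates which configurations such a search would visit.

```python
import itertools

param_grid = {
    "n_centers": [20, 50, 100, 200],
    "center_sampling_method": ["agglomerative", "k_means", "random"],
    "keep_edges": [True, False],
}

def expand_grid(grid):
    """Return one dict per combination of the listed parameter values."""
    keys = sorted(grid)
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]

configs = expand_grid(param_grid)
n_configs = len(configs)  # 4 * 3 * 2 combinations
```

Since every configuration is fitted once per fold, the total number of model fits grows as the number of configurations times n_folds.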


Get parameter configuration for this estimator.


deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.


params (mapping of string to any) – Parameter names mapped to their values.


Get parameters for this estimator.


deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.


params – Parameter names mapped to their values.

Return type

mapping of string to any


Internal method to be implemented which does not perform caching

kurtosis(x_cond, n_samples=1000000)

Kurtosis of the fitted distribution conditioned on x_cond


  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns: Kurtosis Kurt[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

log_pdf(X, Y)

Predicts the conditional log-probability log p(y|x). Requires the model to be fitted.

  • X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_samples, n_dim_y)


Returns: conditional log-probability log p(y|x) - numpy array of shape (n_query_samples, )
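Working in log space avoids underflow far from the kernel centers. The following sketch evaluates the log-density of a one-dimensional Gaussian mixture with the log-sum-exp trick; `mixture_log_pdf` is a hypothetical name, the weights are assumed to be strictly positive and normalized, and the library's own computation may differ.

```python
import math

def mixture_log_pdf(y, weights, centers, sigmas):
    """log p(y) of a 1-D Gaussian mixture, computed stably.

    Each term is log(w_i) + log N(y; c_i, s_i); the terms are combined
    with log-sum-exp so that very small densities do not underflow to 0.
    """
    terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi) - math.log(s)
        - (y - c) ** 2 / (2.0 * s ** 2)
        for w, c, s in zip(weights, centers, sigmas)
    ]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# A single standard normal component: log p(0) = -0.5 * log(2 * pi)
log_p_single = mixture_log_pdf(0.0, [1.0], [0.0], [1.0])
# Far in the tail the plain density underflows, the log-density stays finite
log_p_tail = mixture_log_pdf(100.0, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```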

mean_(x_cond, n_samples=None)

Mean of the fitted distribution conditioned on x_cond

  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns: Means E[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y)

mean_std(x_cond, n_samples=None)

Computes Mean and Covariance of the fitted distribution conditioned on x_cond.

Computationally more efficient than calling the mean and covariance computations separately.

  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns: Means E[y|x] and Covariances Cov[y|x]
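For a one-dimensional Gaussian mixture with normalized weights, the first two moments have a closed form, which is one reason both can be obtained in a single pass. The sketch below shows that closed form; `mixture_mean_std` is a hypothetical helper and it is an assumption that the mixture parameters for a given x are already available as plain lists.

```python
import math

def mixture_mean_std(weights, centers, sigmas):
    """Mean and standard deviation of a 1-D Gaussian mixture.

    mean = sum_i w_i * c_i
    var  = sum_i w_i * (s_i^2 + c_i^2) - mean^2
    """
    mean = sum(w * c for w, c in zip(weights, centers))
    second_moment = sum(w * (s ** 2 + c ** 2)
                        for w, c, s in zip(weights, centers, sigmas))
    return mean, math.sqrt(second_moment - mean ** 2)

# Symmetric mixture: mean 0, variance 1 (within) + 1 (between) = 2
mean, std = mixture_mean_std([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
```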

pdf(X, Y)

Predicts the conditional probability p(y|x). Requires the model to be fitted.

  • X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_samples, n_dim_y)


Returns: conditional probability p(y|x) - numpy array of shape (n_query_samples, )

plot2d(x_cond=[0, 1, 2], ylim=(-8, 8), resolution=100, mode='pdf', show=True, prefix='', numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each

  • xlim – 2-tuple specifying the x axis limits

  • ylim – 2-tuple specifying the y axis limits

  • resolution – integer specifying the resolution of plot

plot3d(xlim=(-5, 5), ylim=(-8, 8), resolution=100, show=False, numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each

  • xlim – 2-tuple specifying the x axis limits

  • ylim – 2-tuple specifying the y axis limits

  • resolution – integer specifying the resolution of plot

predict_density(X, Y=None, resolution=100)

Computes conditional density p(y|x) over a predefined grid of y target values

  • X – values/vectors to be conditioned on - shape: (n_instances, n_dim_x)

  • Y – (optional) y values to be evaluated from p(y|x) - if not set, Y will be a grid with the specified resolution

  • resolution – integer specifying the resolution of the evaluation grid

Returns: tuple (P, Y)
  • P - density p(y|x) - shape (n_instances, resolution**n_dim_y)

  • Y - grid with the specified resolution - shape (resolution**n_dim_y, n_dim_y) or a copy of Y in case it was provided as argument
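The shape (resolution**n_dim_y, n_dim_y) follows from taking the cartesian product of one axis per y dimension. The sketch below builds such a grid; the helper names and the axis limits are hypothetical, since how the library chooses the grid range is not stated here.

```python
import itertools

def linspace(low, high, n):
    """n evenly spaced values from low to high, both included (n >= 2)."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]

def make_y_grid(limits, resolution):
    """Build a grid with resolution**n_dim_y points.

    limits holds one (low, high) pair per y dimension.
    """
    axes = [linspace(low, high, resolution) for low, high in limits]
    return [list(point) for point in itertools.product(*axes)]

# Two y dimensions with 5 points each give 5**2 grid points
grid = make_y_grid([(-8.0, 8.0), (-8.0, 8.0)], resolution=5)
```

The number of grid points grows exponentially with n_dim_y, so a dense grid is only practical for low-dimensional targets.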


Resets all tensorflow objects.


Samples from the conditional mixture distributions. Requires the model to be fitted.

  • X – values to be conditioned on when sampling - numpy array of shape (n_instances, n_dim_x)

Returns: tuple (X, Y)
  • X - the values conditioned on that were provided as argument - numpy array of shape (n_samples, ndim_x)

  • Y - conditional samples from the model p(y|x) - numpy array of shape (n_samples, ndim_y)
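Drawing from a mixture is usually done in two stages: first pick a component according to the weights, then draw from that component. The sketch below shows this ancestral sampling scheme for a one-dimensional Gaussian mixture; `sample_mixture` is a hypothetical name and the mixture parameters are made-up values standing in for what a fitted model would produce for one x.

```python
import random

def sample_mixture(weights, centers, sigmas, n, seed=0):
    """Draw n samples from a 1-D Gaussian mixture."""
    rng = random.Random(seed)
    # Stage 1: choose a component index for every sample
    indices = rng.choices(range(len(weights)), weights=weights, k=n)
    # Stage 2: draw from the chosen Gaussian component
    return [rng.gauss(centers[i], sigmas[i]) for i in indices]

samples = sample_mixture([0.3, 0.7], [-2.0, 3.0], [0.5, 0.5], n=50000)
sample_mean = sum(samples) / len(samples)  # theoretical mean: 0.3*(-2) + 0.7*3 = 1.5
```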

score(X, Y)

Computes the mean conditional log-likelihood of the provided data (X, Y)

  • X – numpy array to be conditioned on - shape: (n_query_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_query_samples, n_dim_y)


Returns: average log-likelihood of the data


Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.




skewness(x_cond, n_samples=1000000)

Skewness of the fitted distribution conditioned on x_cond


  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns: Skewness Skew[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)
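Higher moments such as skewness and kurtosis can be estimated from samples, which is one plausible reading of the n_samples argument. The sketch below computes standardized sample moments; `sample_skewness_kurtosis` is a hypothetical helper, and it returns the non-excess kurtosis (3 for a Gaussian), which is an assumption since the convention used by the library is not stated here.

```python
import random

def sample_skewness_kurtosis(samples):
    """Standardized third and fourth central moments of a sample."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m3 = sum((s - mean) ** 3 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# Samples from a standard normal stand in for draws from p(y|x)
rng = random.Random(0)
draws = [rng.gauss(0.0, 1.0) for _ in range(100000)]
skew, kurt = sample_skewness_kurtosis(draws)
```

For a Gaussian the estimates should be close to 0 and 3 respectively; the Monte Carlo error shrinks with the square root of the number of samples.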

std_(x_cond, n_samples=1000000)

Standard deviation of the fitted distribution conditioned on x_cond


  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns: Standard deviations sqrt(Var[y|x]) corresponding to x_cond - numpy array of shape (n_values, ndim_y)

tail_risk_measures(x_cond, alpha=0.01, n_samples=10000000)

Computes the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR)

  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

  • alpha – quantile percentage of the distribution

  • n_samples – number of samples for Monte Carlo estimation

Returns:
  • VaR values for each x to condition on - numpy array of shape (n_values)

  • CVaR values for each x to condition on - numpy array of shape (n_values)
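A Monte Carlo estimate of both risk measures can be obtained from sorted samples. In the sketch below, VaR is taken as the empirical alpha-quantile and CVaR as the mean of all samples at or below it; this sign convention, the helper name `var_cvar_from_samples`, and the use of standard normal draws in place of samples from a fitted model are all assumptions made for illustration.

```python
import random

def var_cvar_from_samples(samples, alpha=0.01):
    """Empirical VaR (alpha-quantile) and CVaR (mean of the lower tail)."""
    ordered = sorted(samples)
    idx = max(int(alpha * len(ordered)) - 1, 0)
    var = ordered[idx]
    tail = ordered[: idx + 1]          # all samples at or below the VaR
    cvar = sum(tail) / len(tail)
    return var, cvar

# Standard normal draws stand in for samples from p(y|x)
rng = random.Random(0)
draws = [rng.gauss(0.0, 1.0) for _ in range(200000)]
var, cvar = var_cvar_from_samples(draws, alpha=0.01)
```

For a standard normal and alpha = 0.01 the theoretical values are roughly -2.33 for VaR and -2.67 for CVaR, so the estimates should land close to those numbers. CVaR is never larger than VaR under this convention because it averages only the tail below the quantile.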

value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) of the fitted distribution. Only supported if ndim_y = 1.

  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

  • alpha – quantile percentage of the distribution


Returns: VaR values for each x to condition on - numpy array of shape (n_values)

The core of the Kernel Mixture Network implementation was originally written by [VEG2017]. In addition to the original implementation by Jan van der Vegt and Alexander Backus, we added support for multivariate distributions p(y|x) as well as automated hyperparameter search via cross-validation.


[AMB2017] Luca Ambrogioni, Umut Güçlü, Marcel A. J. van Gerven, Eric Maris (2017). The Kernel Mixture Network: A Nonparametric Method for Conditional Density Estimation of Continuous Random Variables.
