Kernel Mixture Network
Implementation of Kernel Mixture Network introduced in [AMB2017] with some extra features.
The approach combines unconditional kernel density estimation with a (softmax) neural network, yielding a conditional kernel density estimator. As in unconditional kernel density estimation, kernels are placed at each training sample or at a subset of the samples. A neural network receives x (the value to condition on) as input and predicts the weights of the kernels. Overall, the conditional probability density function is modeled as a weighted sum of kernels, with the weights given by the softmax output of the network.
This implementation uses Gaussian kernels.
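Written out for a one-dimensional y, the model can be sketched as follows. The symbols are assumptions chosen for illustration and are not identifiers from the implementation: $y_k$ denotes a kernel center, $\sigma_k$ its scale (bandwidth), and $w_k(x; \theta)$ the softmax weight the network with parameters $\theta$ assigns to kernel $k$. If several scales are attached to each center, the sum would run over center-scale pairs instead.

```latex
p(y \mid x) = \sum_{k=1}^{K} w_k(x; \theta)\, \mathcal{K}(y; y_k, \sigma_k),
\qquad w_k(x; \theta) \ge 0, \quad \sum_{k=1}^{K} w_k(x; \theta) = 1

\mathcal{K}(y; y_k, \sigma_k) = \frac{1}{\sqrt{2\pi}\,\sigma_k}
\exp\!\left( -\frac{(y - y_k)^2}{2 \sigma_k^2} \right)
```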
In addition to the approach described in the paper, the implementation has the following extensions:
Trainable scales/bandwidths: The scales of the Gaussian kernels can either be fixed or jointly trained with the neural network weights. This property is controlled by the boolean train_scales in the constructor.
 Center Sampling Methods:
all: use all data points in the train set as kernel centers
random: randomly selects k points as kernel centers
k_means: uses kmeans clustering to determine k kernel centers
agglomerative: uses agglomerative clustering to determine k kernel centers
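The selection strategies listed above can be illustrated with a small, dependency-free sketch for one-dimensional targets. The function `sample_centers` and its arguments are hypothetical and are not part of the library; the k-means branch is a plain Lloyd iteration and may differ from the clustering routine the implementation actually uses.

```python
import random


def sample_centers(y_train, method="random", k=3, seed=0, n_iter=10):
    """Pick kernel centers from 1-D training targets (illustrative only)."""
    y_train = list(y_train)
    rng = random.Random(seed)
    if method == "all":
        # every training point becomes a kernel center
        return y_train
    if method == "random":
        # k distinct training points, chosen uniformly at random
        return rng.sample(y_train, k)
    if method == "k_means":
        # Lloyd's algorithm: start from k random points, then alternate
        # between assigning points to the nearest center and re-averaging
        centers = rng.sample(y_train, k)
        for _ in range(n_iter):
            clusters = [[] for _ in range(k)]
            for y in y_train:
                nearest = min(range(k), key=lambda j: abs(y - centers[j]))
                clusters[nearest].append(y)
            # an empty cluster keeps its previous center
            centers = [
                sum(c) / len(c) if c else centers[j]
                for j, c in enumerate(clusters)
            ]
        return centers
    raise ValueError(f"unsupported method: {method}")


y_train = [0.1, 0.4, 0.5, 1.2, 3.0, 3.1]
all_centers = sample_centers(y_train, method="all")
random_centers = sample_centers(y_train, method="random", k=3)
kmeans_centers = sample_centers(y_train, method="k_means", k=2)
```

Note that random selection returns actual training points, whereas the k-means sketch returns cluster means, which need not coincide with any training point.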

class cde.density_estimator.KernelMixtureNetwork
(name, ndim_x, ndim_y, center_sampling_method='k_means', n_centers=50, keep_edges=True, init_scales='default', hidden_sizes=(16, 16), hidden_nonlinearity=<function tanh>, train_scales=True, n_training_epochs=1000, x_noise_std=None, y_noise_std=None, entropy_reg_coef=0.0, weight_decay=0.0, weight_normalization=True, data_normalization=True, dropout=0.0, random_seed=None) Kernel Mixture Network Estimator
https://arxiv.org/abs/1705.07111
 Parameters
name – (str) name space of the estimator (should be unique in code, otherwise TensorFlow namespace collisions may arise)
ndim_x – (int) dimensionality of x variable
ndim_y – (int) dimensionality of y variable
center_sampling_method – String that describes the method to use for finding kernel centers. Allowed values [all, random, distance, k_means, agglomerative]
n_centers – Number of kernels to use in the output
keep_edges – Keep the extreme y values as center to keep expressiveness
init_scales – List or scalar that describes (initial) values of bandwidth parameter
train_scales – Boolean that describes whether or not to make the scales trainable
x_noise_std – (optional) standard deviation of Gaussian noise added to the training data X (regularization through noise). Adding noise is automatically deactivated during …
y_noise_std – (optional) standard deviation of Gaussian noise added to the training data Y (regularization through noise)
entropy_reg_coef – (optional) scalar float coefficient for the Shannon entropy penalty on the mixture component weight distribution
weight_decay – (float) the amount of decoupled (http://arxiv.org/abs/1711.05101) weight decay to apply
weight_normalization – boolean specifying whether weight normalization shall be used
data_normalization – (boolean) whether to normalize the data (X and Y) to exhibit zero mean and unit standard deviation
dropout – (float) the probability of switching off nodes during training
random_seed – (optional) seed (int) of the random number generators used
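A minimal usage sketch based only on the constructor, fit and pdf signatures documented on this page. It assumes that numpy and the cde package (with its TensorFlow dependency) are installed; the toy data, the estimator name "kmn_example", and the reduced number of epochs are arbitrary choices for illustration. The imports are placed inside the function so that the sketch can be loaded without those packages.

```python
def fit_and_query_kmn(n_samples=500, seed=0):
    """Fit a KernelMixtureNetwork on toy data and evaluate p(y|x)."""
    # Local imports: numpy and cde are assumed to be installed.
    import numpy as np
    from cde.density_estimator import KernelMixtureNetwork

    # Toy data: y = x + Gaussian noise
    rng = np.random.RandomState(seed)
    X = rng.normal(size=(n_samples, 1))                   # (n_samples, ndim_x)
    Y = X + rng.normal(scale=0.5, size=(n_samples, 1))    # (n_samples, ndim_y)

    model = KernelMixtureNetwork(
        "kmn_example",                    # should be unique in the process
        ndim_x=1,
        ndim_y=1,
        center_sampling_method="k_means",
        n_centers=20,
        train_scales=True,                # learn the kernel bandwidths
        n_training_epochs=200,
        random_seed=seed,
    )
    model.fit(X, Y, verbose=False)

    # Evaluate the conditional density at three (x, y) query pairs
    x_query = np.zeros((3, 1))
    y_query = np.array([[-1.0], [0.0], [1.0]])
    return model.pdf(x_query, y_query)
```

Calling `fit_and_query_kmn()` trains the estimator and returns one density value per query pair.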

cdf
(X, Y) Predicts the conditional cumulative probability p(Y<=y|X=x). Requires the model to be fitted.
 Parameters
X – numpy array to be conditioned on, shape: (n_samples, n_dim_x)
Y – numpy array of y targets, shape: (n_samples, n_dim_y)
 Returns
conditional cumulative probability p(Y<=y|X=x), numpy array of shape (n_query_samples, )
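For a one-dimensional mixture of Gaussian kernels, the cumulative distribution is the weighted sum of the component Gaussian CDFs. The sketch below illustrates this with fixed, made-up weights, centers and scales; in the estimator the weights would depend on x. The function names are hypothetical and this is not the library's internal code.

```python
import math


def gaussian_cdf(y, center, scale):
    """CDF of a 1-D Gaussian with the given center and scale."""
    return 0.5 * (1.0 + math.erf((y - center) / (scale * math.sqrt(2.0))))


def mixture_cdf(y, weights, centers, scales):
    """P(Y <= y) for a mixture of 1-D Gaussian kernels."""
    return sum(
        w * gaussian_cdf(y, c, s)
        for w, c, s in zip(weights, centers, scales)
    )


# made-up mixture parameters (weights sum to one)
weights = [0.2, 0.5, 0.3]
centers = [-1.0, 0.0, 2.0]
scales = [0.5, 1.0, 0.7]

cdf_low = mixture_cdf(-50.0, weights, centers, scales)
cdf_mid = mixture_cdf(0.0, weights, centers, scales)
cdf_high = mixture_cdf(50.0, weights, centers, scales)
```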

conditional_value_at_risk
(x_cond, alpha=0.01, n_samples=10000000) Computes the Conditional Value-at-Risk (CVaR) / Expected Shortfall of a GMM. Only supported if ndim_y = 1.
Based on formulas from section 2.3.2 in “Expected shortfall for distributions in finance”, Simon A. Broda, Marc S. Paolella, 2011
 Parameters
x_cond – different x values to condition on, numpy array of shape (n_values, ndim_x)
alpha – quantile percentage of the distribution
 Returns
CVaR values for each x to condition on, numpy array of shape (n_values)

covariance
(x_cond, n_samples=None) Covariance of the fitted distribution conditioned on x_cond
 Parameters
x_cond – different x values to condition on, numpy array of shape (n_values, ndim_x)
 Returns
Covariances Cov[y|x] corresponding to x_cond, numpy array of shape (n_values, ndim_y, ndim_y)

eval_by_cv
(X, Y, n_splits=5, verbose=True) Fits the conditional density model with cross-validation, using the score function of the BaseDensityEstimator to score the individual splits.
 Parameters
X – numpy array to be conditioned on, shape: (n_samples, n_dim_x)
Y – numpy array of y targets, shape: (n_samples, n_dim_y)
n_splits – number of cross-validation folds (positive integer)
verbose – the verbosity level

fit
(X, Y, eval_set=None, verbose=True) Fits the conditional density model with the provided data
 Parameters
X – numpy array to be conditioned on, shape: (n_samples, n_dim_x)
Y – numpy array of y targets, shape: (n_samples, n_dim_y)
eval_set – (tuple) eval/test set, tuple (X_test, Y_test)
verbose – (boolean) controls the verbosity (console output)

fit_by_cv
(X, Y, n_folds=3, param_grid=None, random_state=None, verbose=True, n_jobs=1) Fits the conditional density model with hyperparameter search and cross-validation.
Determines the best hyperparameter configuration from a predefined set using cross-validation. The conditional log-likelihood is used as the evaluation metric.
Fits the model with the previously selected hyperparameter configuration.
 Parameters
X – numpy array to be conditioned on, shape: (n_samples, n_dim_x)
Y – numpy array of y targets, shape: (n_samples, n_dim_y)
n_folds – number of cross-validation folds (positive integer)
param_grid – (optional) a dictionary with the hyperparameters of the model as keys and a list of respective parametrizations as values. The hyperparameter search is performed over the Cartesian product of the provided lists. Example:
{"n_centers": [20, 50, 100, 200], "center_sampling_method": ["agglomerative", "k_means", "random"], "keep_edges": [True, False] }
random_state – (int) seed used by the random number generator for shuffling the data
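To make the size of the search space concrete, the Cartesian product of the example grid can be enumerated with the standard library. This is only a sketch of what "search over the Cartesian product" means; how the library itself iterates over the grid is not shown here.

```python
from itertools import product

param_grid = {
    "n_centers": [20, 50, 100, 200],
    "center_sampling_method": ["agglomerative", "k_means", "random"],
    "keep_edges": [True, False],
}

# one dictionary per hyperparameter configuration
keys = list(param_grid)
configurations = [
    dict(zip(keys, values))
    for values in product(*(param_grid[key] for key in keys))
]
n_configurations = len(configurations)  # 4 * 3 * 2 combinations
```

With n_folds=3, each of these configurations would be fitted three times, so the grid size directly drives the cost of the search.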

get_configuration
(deep=True) Get parameter configuration for this estimator.
 Parameters
deep – (boolean, optional) If True, will return the parameters for this estimator and contained subobjects that are estimators.
 Returns
params – (mapping of string to any) Parameter names mapped to their values.

get_params
(deep=True) Get parameters for this estimator.
 Parameters
deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
 Returns
params – Parameter names mapped to their values.
 Return type
mapping of string to any

get_params_internal
(**tags) Internal method to be implemented which does not perform caching

kurtosis
(x_cond, n_samples=1000000) Kurtosis of the fitted distribution conditioned on x_cond
 Parameters
x_cond – different x values to condition on, numpy array of shape (n_values, ndim_x)
 Returns
Kurtosis Kurt[y|x] corresponding to x_cond, numpy array of shape (n_values, ndim_y, ndim_y)

log_pdf
(X, Y) Predicts the conditional log-probability log p(y|x). Requires the model to be fitted.
 Parameters
X – numpy array to be conditioned on, shape: (n_samples, n_dim_x)
Y – numpy array of y targets, shape: (n_samples, n_dim_y)
 Returns
conditional log-probability log p(y|x), numpy array of shape (n_query_samples, )
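For a kernel mixture, the log-density is usually evaluated with the log-sum-exp trick so that it stays finite far away from all kernel centers, where the plain density underflows to zero. The sketch below shows this for a one-dimensional Gaussian mixture with made-up parameters; the function names are hypothetical and the library's internal computation may differ.

```python
import math


def gaussian_log_pdf(y, center, scale):
    """Log-density of a 1-D Gaussian kernel."""
    z = (y - center) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def mixture_log_pdf(y, weights, centers, scales):
    """log p(y) = logsumexp_k(log w_k + log K_k(y)), computed stably."""
    terms = [
        math.log(w) + gaussian_log_pdf(y, c, s)
        for w, c, s in zip(weights, centers, scales)
        if w > 0.0  # components with zero weight contribute nothing
    ]
    largest = max(terms)
    # subtract the largest term before exponentiating to avoid underflow
    return largest + math.log(sum(math.exp(t - largest) for t in terms))


# made-up mixture parameters (weights sum to one)
weights = [0.2, 0.5, 0.3]
centers = [-1.0, 0.0, 2.0]
scales = [0.5, 1.0, 0.7]

log_p = mixture_log_pdf(0.3, weights, centers, scales)
log_p_far = mixture_log_pdf(100.0, weights, centers, scales)  # finite, very negative
```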

mean_
(x_cond, n_samples=None) Mean of the fitted distribution conditioned on x_cond
 Parameters
x_cond – different x values to condition on, numpy array of shape (n_values, ndim_x)
 Returns
Means E[y|x] corresponding to x_cond, numpy array of shape (n_values, ndim_y)

mean_std
(x_cond, n_samples=None) Computes the mean and covariance of the fitted distribution conditioned on x_cond.
Computationally more efficient than calling the mean and covariance computations separately.
 Parameters
x_cond – different x values to condition on, numpy array of shape (n_values, ndim_x)
 Returns
Means E[y|x] and Covariances Cov[y|x]

pdf
(X, Y) Predicts the conditional probability density p(y|x). Requires the model to be fitted.
 Parameters
X – numpy array to be conditioned on, shape: (n_samples, n_dim_x)
Y – numpy array of y targets, shape: (n_samples, n_dim_y)
 Returns
conditional probability density p(y|x), numpy array of shape (n_query_samples, )

plot2d
(x_cond=[0, 1, 2], ylim=(-8, 8), resolution=100, mode='pdf', show=True, prefix='', numpyfig=False) Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each
 Parameters
xlim – 2-tuple specifying the x axis limits
ylim – 2-tuple specifying the y axis limits
resolution – integer specifying the resolution of the plot

plot3d
(xlim=(-5, 5), ylim=(-8, 8), resolution=100, show=False, numpyfig=False) Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each
 Parameters
xlim – 2-tuple specifying the x axis limits
ylim – 2-tuple specifying the y axis limits
resolution – integer specifying the resolution of the plot

predict_density
(X, Y=None, resolution=100) Computes the conditional density p(y|x) over a predefined grid of y target values
 Parameters
X – values/vectors to be conditioned on, shape: (n_instances, n_dim_x)
Y – (optional) y values to be evaluated from p(y|x); if not set, Y will be a grid with the specified resolution
resolution – integer specifying the resolution of the evaluation grid
 Returns: tuple (P, Y)
P: density p(y|x), shape (n_instances, resolution**n_dim_y)
Y: grid with the specified resolution, shape (resolution**n_dim_y, n_dim_y), or a copy of Y in case it was provided as argument
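The grid shape (resolution**n_dim_y, n_dim_y) follows from taking `resolution` equally spaced values per y dimension and forming all combinations. The sketch below builds such a grid with the standard library; `make_y_grid` and its `y_min`/`y_max` arguments are hypothetical, since this page does not state how the library chooses the grid range.

```python
from itertools import product


def make_y_grid(y_min, y_max, resolution, n_dim_y):
    """Regular grid with `resolution` points per dimension (resolution >= 2)."""
    step = (y_max - y_min) / (resolution - 1)
    axis = [y_min + i * step for i in range(resolution)]
    # all combinations of axis values, one grid point per row
    return [list(point) for point in product(axis, repeat=n_dim_y)]


grid = make_y_grid(-1.0, 1.0, resolution=5, n_dim_y=2)
n_points = len(grid)  # resolution ** n_dim_y rows, n_dim_y columns each
```

The number of grid points grows exponentially with n_dim_y, so a high resolution quickly becomes expensive for multivariate y.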

reset_fit
() Resets all TensorFlow objects.

sample
(X) Samples from the conditional mixture distributions. Requires the model to be fitted.
 Parameters
X – values to be conditioned on when sampling, numpy array of shape (n_instances, n_dim_x)
 Returns: tuple (X, Y)
X: the values to be conditioned on that were provided as argument, numpy array of shape (n_samples, ndim_x)
Y: conditional samples from the model p(y|x), numpy array of shape (n_samples, ndim_y)

score
(X, Y) Computes the mean conditional log-likelihood of the provided data (X, Y)
 Parameters
X – numpy array to be conditioned on, shape: (n_query_samples, n_dim_x)
Y – numpy array of y targets, shape: (n_query_samples, n_dim_y)
 Returns
average log-likelihood of the data

set_params
(**params) Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
 Returns
self

skewness
(x_cond, n_samples=1000000) Skewness of the fitted distribution conditioned on x_cond
 Parameters
x_cond – different x values to condition on, numpy array of shape (n_values, ndim_x)
 Returns
Skewness Skew[y|x] corresponding to x_cond, numpy array of shape (n_values, ndim_y, ndim_y)

std_
(x_cond, n_samples=1000000) Standard deviation of the fitted distribution conditioned on x_cond
 Parameters
x_cond – different x values to condition on, numpy array of shape (n_values, ndim_x)
 Returns
Standard deviations sqrt(Var[y|x]) corresponding to x_cond, numpy array of shape (n_values, ndim_y)

tail_risk_measures
(x_cond, alpha=0.01, n_samples=10000000) Computes the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR)
 Parameters
x_cond – different x values to condition on, numpy array of shape (n_values, ndim_x)
alpha – quantile percentage of the distribution
n_samples – number of samples for the Monte Carlo estimate
 Returns
VaR values for each x to condition on, numpy array of shape (n_values)
CVaR values for each x to condition on, numpy array of shape (n_values)
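A sample-based estimate of both risk measures can be sketched as follows: draw samples of y for a fixed x, take the alpha-quantile as VaR and the mean of the samples at or below it as CVaR. The sketch uses standard normal samples in place of samples from a fitted model, and it assumes the convention that VaR is the lower alpha-quantile itself; the sign convention and estimation procedure used by the library are not stated on this page. `empirical_var_cvar` is a hypothetical helper.

```python
import random


def empirical_var_cvar(samples, alpha):
    """Monte Carlo estimate of the alpha-quantile (VaR) and tail mean (CVaR)."""
    ordered = sorted(samples)
    # number of samples that fall into the lower alpha tail (at least one)
    n_tail = max(1, int(alpha * len(ordered)))
    tail = ordered[:n_tail]
    var = tail[-1]                # largest value inside the tail
    cvar = sum(tail) / len(tail)  # average loss within the tail
    return var, cvar


# stand-in for samples of y drawn from p(y|x) at one fixed x
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
var_1pct, cvar_1pct = empirical_var_cvar(samples, alpha=0.01)
```

Because only about alpha * n_samples draws land in the tail, small alpha values need many samples for a stable estimate, which is consistent with the large n_samples defaults above.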

value_at_risk
(x_cond, alpha=0.01, n_samples=1000000) Computes the Value-at-Risk (VaR) of the fitted distribution. Only supported if ndim_y = 1.
 Parameters
x_cond – different x values to condition on, numpy array of shape (n_values, ndim_x)
alpha – quantile percentage of the distribution
 Returns
VaR values for each x to condition on, numpy array of shape (n_values)
The core of the Kernel Mixture Network implementation was originally written by [VEG2017]. In addition to the original implementation by Jan van der Vegt and Alexander Backus, we added support for multivariate distributions p(y|x) as well as automated hyperparameter search via cross-validation.
 AMB2017
Luca Ambrogioni, Umut Güçlü, Marcel A. J. van Gerven, Eric Maris (2017). The Kernel Mixture Network: A Nonparametric Method for Conditional Density Estimation of Continuous Random Variables (https://arxiv.org/abs/1705.07111)
 VEG2017