# Neighborhood Kernel Density Estimation

For estimating the conditional density $$p(y|x)$$, $$\epsilon$$-neighbor kernel density estimation ($$\epsilon$$-KDE) employs standard kernel density estimation in a local $$\epsilon$$-neighborhood around a query point $$(x,y)$$.

$$\epsilon$$-KDE is a lazy learner, meaning that it simply stores the training points $$\{(x_i,y_i)\}_{i=1}^n$$ and places a kernel function on each of them. To compute $$p(y|x)$$, the estimator only considers a local subset of the training samples $$\{(x_i, y_i)\}_{i \in \mathcal{I}_{x, \epsilon}}$$, where $$\mathcal{I}_{x, \epsilon}$$ is the set of sample indices such that $$||x_i - x|| \leq \epsilon$$.

In the case of Gaussian kernels, the estimated density can be expressed as

$$p(y|x) = \sum_{j \in \mathcal{I}_{x, \epsilon}} w_j ~ N(y~| y_j, \sigma^2 I)$$

where $$w_j$$ is the weight of the j-th kernel and $$N(y~|\mu,\Sigma)$$ is the probability density function of a multivariate Gaussian. This implementation currently supports two types of weighting:

• equal weights: $$w_j = \frac{1}{|\mathcal{I}_{x, \epsilon}|}$$

• weights $$w_j$$ proportional to $$||x_j - x||_2$$, the Euclidean distance of $$x_j$$ to the query point $$x$$
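The density formula and the two weighting schemes above can be sketched in plain NumPy. This is a minimal illustration, not the library's code: the function name `nkde_pdf` is hypothetical, the inputs are not normalized, an empty neighborhood is assumed to give zero density, and the weighted branch follows the text literally (weights proportional to distance, normalized to sum to one).

```python
import numpy as np

def nkde_pdf(x_query, y_query, X_train, Y_train, epsilon=0.4, bandwidth=0.6, weighted=False):
    """Evaluate p(y|x) at one query point (hypothetical helper, not part of cde)."""
    X_train = np.asarray(X_train, dtype=float)   # shape (n, ndim_x)
    Y_train = np.asarray(Y_train, dtype=float)   # shape (n, ndim_y)
    x_query = np.atleast_1d(x_query).astype(float)
    y_query = np.atleast_1d(y_query).astype(float)
    ndim_y = Y_train.shape[1]

    # Index set I_{x,eps}: training points with ||x_i - x|| <= epsilon
    dist = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.flatnonzero(dist <= epsilon)
    if idx.size == 0:
        return 0.0  # assumption: no neighbors means zero density

    if weighted and dist[idx].sum() > 0:
        # weights proportional to the distance, as stated in the text
        w = dist[idx] / dist[idx].sum()
    else:
        # equal weights 1 / |I_{x,eps}|
        w = np.full(idx.size, 1.0 / idx.size)

    # Isotropic Gaussian kernels N(y | y_j, sigma^2 I) with sigma = bandwidth
    sq_dist = np.sum((Y_train[idx] - y_query) ** 2, axis=1)
    norm_const = (2.0 * np.pi * bandwidth ** 2) ** (-ndim_y / 2.0)
    kernels = norm_const * np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return float(np.dot(w, kernels))
```

Because the weights sum to one and every kernel integrates to one, the result is a valid density in y whenever the neighborhood is non-empty.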

class cde.density_estimator.NeighborKernelDensityEstimation(name='NKDE', ndim_x=None, ndim_y=None, epsilon=0.4, bandwidth=0.6, param_selection='normal_reference', weighted=True, n_jobs=-1, random_seed=None)[source]

Epsilon-Neighbor Kernel Density Estimation (lazy learner) with Gaussian Kernels

Parameters
• name – (str) name / identifier of estimator

• ndim_x – (int) dimensionality of x variable

• ndim_y – (int) dimensionality of y variable

• epsilon – size of the (normalized) neighborhood region

• bandwidth – (float or array_like) initial bandwidth parameter

• param_selection – parameter selection method. Must be one of:

- None or False: use the provided epsilon and bandwidth

- normal_reference: bandwidths are chosen according to a normal reference distribution

- cv_ml: select bandwidth and epsilon via maximum likelihood leave-one-out cross-validation

• weighted – if True, the neighborhood Gaussians are weighted according to their distance to the query point; if False, all neighborhood Gaussians are weighted equally

• random_seed – (optional) seed (int) of the random number generators used
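As an illustration of what a normal reference bandwidth can look like, the sketch below uses the common rule of thumb $$h = 1.06 \, \hat{\sigma} \, n^{-1/(4+d)}$$ per output dimension. The function name is hypothetical, and whether this estimator uses exactly this constant and exponent is an assumption.

```python
import numpy as np

def normal_reference_bandwidth(Y):
    """Rule-of-thumb bandwidth per y-dimension (assumed form, not taken from cde)."""
    Y = np.asarray(Y, dtype=float)   # shape (n_samples, ndim_y)
    n, d = Y.shape
    # 1.06 * empirical standard deviation * n^(-1 / (4 + d))
    return 1.06 * Y.std(axis=0) * n ** (-1.0 / (4 + d))
```

The bandwidth shrinks as the number of samples grows, so the estimate becomes more local when more data is available.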

conditional_value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Conditional Value-at-Risk (CVaR) / Expected Shortfall of the fitted distribution. Only available if ndim_y = 1.

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile percentage of the distribution

Returns

CVaR values for each x to condition on - numpy array of shape (n_values)

covariance(x_cond, n_samples=1000000)

Covariance of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Covariances Cov[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

eval_by_cv(X, Y, n_splits=5, verbose=True)

Fits the conditional density model with cross-validation by using the score function of the BaseDensityEstimator for scoring the various splits.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

• n_splits – number of cross-validation folds (positive integer)

• verbose – the verbosity level

fit(X, Y, **kwargs)[source]

Since NKDE is a lazy learner, fit just stores the provided training data (X,Y)

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

fit_by_cv(X, Y, n_folds=3, param_grid=None, verbose=True, n_jobs=-1)

Fits the conditional density model with hyperparameter search and cross-validation.

• Determines the best hyperparameter configuration from a pre-defined set using cross-validation. The conditional log-likelihood is used for evaluation.

• Fits the model with the previously selected hyperparameter configuration

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

• n_folds – number of cross-validation folds (positive integer)

• param_grid – (optional) a dictionary with the hyperparameters of the model as keys and a list of respective parametrizations as values. The hyperparameter search is performed over the cartesian product of the provided lists. Example: {"n_centers": [20, 50, 100, 200], "center_sampling_method": ["agglomerative", "k_means", "random"], "keep_edges": [True, False]}
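The cartesian product over a parameter grid can be sketched with the standard library. The helper `expand_param_grid` is hypothetical, and the keys `epsilon` and `bandwidth` are chosen for illustration because they appear in the constructor signature; that fit_by_cv accepts exactly these keys is an assumption.

```python
from itertools import product

def expand_param_grid(param_grid):
    """Return one dict per combination of the listed hyperparameter values."""
    keys = sorted(param_grid)  # fixed key order makes the output deterministic
    value_lists = [param_grid[k] for k in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]

# illustrative grid: 2 x 3 = 6 candidate configurations
grid = {"epsilon": [0.2, 0.4], "bandwidth": [0.3, 0.6, 1.0]}
configs = expand_param_grid(grid)
```

Each resulting configuration would then be scored by cross-validation, and the best-scoring one used for the final fit.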

get_configuration(deep=True)

Get parameter configuration for this estimator.

Parameters

deep – (boolean, optional) If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – (mapping of string to any) Parameter names mapped to their values.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

kurtosis(x_cond, n_samples=1000000)

Kurtosis of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Kurtosis Kurt[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

log_pdf(X, Y)[source]

Predicts the conditional log-probability log p(y|x). Requires the model to be fitted.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

Returns

conditional log-probability log p(y|x) - numpy array of shape (n_query_samples, )
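Taking the logarithm of a very small density can underflow to log(0). A common remedy is to evaluate the Gaussian mixture in log space with the log-sum-exp trick, sketched below. The helper name is hypothetical, strictly positive weights are assumed, and whether the library computes log_pdf this way is not stated in this document.

```python
import numpy as np

def log_mixture_density(y_query, centers, weights, bandwidth):
    """log sum_j w_j N(y | y_j, bandwidth^2 I), computed stably (hypothetical helper)."""
    y_query = np.atleast_1d(y_query).astype(float)
    centers = np.asarray(centers, dtype=float)   # shape (k, ndim_y)
    weights = np.asarray(weights, dtype=float)   # shape (k,), assumed > 0
    d = centers.shape[1]

    # log of each Gaussian kernel value
    sq_dist = np.sum((centers - y_query) ** 2, axis=1)
    log_kernels = -0.5 * sq_dist / bandwidth ** 2 - 0.5 * d * np.log(2 * np.pi * bandwidth ** 2)

    # log-sum-exp: subtract the maximum before exponentiating
    a = np.log(weights) + log_kernels
    m = a.max()
    return float(m + np.log(np.exp(a - m).sum()))
```

Far from all kernel centers, the naive computation of log p(y|x) returns negative infinity, while this version still returns a finite value.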

loo_likelihood(bandwidth, epsilon)[source]

Calculates the negative leave-one-out log-likelihood of the training data

Parameters
• bandwidth – bandwidth parameter

• epsilon – size of the (normalized) neighborhood region
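A leave-one-out objective of this kind can be sketched as follows: each training point is scored under the density built from its neighbors, excluding itself. The function name is hypothetical, equal kernel weights are used, and the density floor as well as averaging (rather than summing) over samples are assumptions of this sketch.

```python
import numpy as np

def loo_neg_log_likelihood(X, Y, bandwidth, epsilon):
    """Negative mean leave-one-out log-likelihood (illustrative, equal weights)."""
    X = np.asarray(X, dtype=float)   # shape (n, ndim_x)
    Y = np.asarray(Y, dtype=float)   # shape (n, ndim_y)
    d = Y.shape[1]

    # pairwise distances in x and squared distances in y
    x_dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    y_sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    kernels = (2 * np.pi * bandwidth ** 2) ** (-d / 2.0) * np.exp(-0.5 * y_sq / bandwidth ** 2)

    neighbors = x_dist <= epsilon
    np.fill_diagonal(neighbors, False)   # leave the i-th sample out of its own estimate
    counts = neighbors.sum(axis=1)

    density = (kernels * neighbors).sum(axis=1) / np.maximum(counts, 1)
    density = np.maximum(density, 1e-300)   # assumption: floor to avoid log(0)
    return float(-np.mean(np.log(density)))
```

Minimizing this quantity over bandwidth and epsilon corresponds to the maximum likelihood leave-one-out selection described for cv_ml.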

mean_(x_cond, n_samples=1000000)

Mean of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Means E[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y)

mean_std(x_cond, n_samples=1000000)

Computes mean and covariance of the fitted distribution conditioned on x_cond.

Computationally more efficient than calling the mean and covariance computations separately

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Means E[y|x] and Covariances Cov[y|x]
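For a mixture of isotropic Gaussians, the mean and covariance have a closed form, which can serve as a sanity check for sample-based estimates. The sketch below is not the library's method; the helper name and its inputs (neighbor targets `centers`, their `weights`, and the `bandwidth`) are hypothetical.

```python
import numpy as np

def mixture_mean_cov(centers, weights, bandwidth):
    """Closed-form mean and covariance of sum_j w_j N(y | y_j, bandwidth^2 I)."""
    centers = np.asarray(centers, dtype=float)   # shape (k, ndim_y)
    weights = np.asarray(weights, dtype=float)   # shape (k,), sums to 1
    mean = weights @ centers

    # within-kernel spread plus spread of the kernel centers around the mean
    diff = centers - mean
    between = (weights[:, None] * diff).T @ diff
    within = bandwidth ** 2 * np.eye(centers.shape[1])
    return mean, within + between
```

The covariance is never smaller than bandwidth² on the diagonal, so a large bandwidth inflates the estimated conditional variance.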

pdf(X, Y)[source]

Predicts the conditional probability density p(y|x). Requires the model to be fitted.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

Returns

conditional probability p(y|x) - numpy array of shape (n_query_samples, )

plot2d(x_cond=[0, 1, 2], ylim=(-8, 8), resolution=100, mode='pdf', show=True, prefix='', numpyfig=False)

Generates a 2d plot of the fitted conditional distribution for each value in x_cond, if x and y are 1-dimensional each

Parameters
• x_cond – x values to condition on

• ylim – 2-tuple specifying the y axis limits

• resolution – integer specifying the resolution of plot

plot3d(xlim=(-5, 5), ylim=(-8, 8), resolution=100, show=False, numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each

Parameters
• xlim – 2-tuple specifying the x axis limits

• ylim – 2-tuple specifying the y axis limits

• resolution – integer specifying the resolution of plot

predict_density(X, Y=None, resolution=50)

Computes conditional density p(y|x) over a predefined grid of y target values

Parameters
• X – values/vectors to be conditioned on - shape: (n_instances, n_dim_x)

• Y – (optional) y values to be evaluated from p(y|x) - if not set, Y will be a grid with the specified resolution

• resolution – integer specifying the resolution of the evaluation grid

Returns: tuple (P, Y)
• P - density p(y|x) - shape (n_instances, resolution**n_dim_y)

• Y - grid with the specified resolution - shape (resolution**n_dim_y, n_dim_y) or a copy of Y in case it was provided as argument
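A grid with shape (resolution**n_dim_y, n_dim_y) can be built as sketched below. The helper name and the explicit `y_min` / `y_max` bounds are hypothetical; how the library chooses the grid limits is not stated here.

```python
import numpy as np

def make_y_grid(y_min, y_max, resolution):
    """Regular grid over y with `resolution` points per dimension (illustrative)."""
    y_min = np.atleast_1d(y_min).astype(float)
    y_max = np.atleast_1d(y_max).astype(float)
    # one axis per y-dimension
    axes = [np.linspace(lo, hi, resolution) for lo, hi in zip(y_min, y_max)]
    # all combinations of axis values, flattened to rows
    mesh = np.meshgrid(*axes, indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=1)
```

The number of grid points grows exponentially with n_dim_y, so a resolution of 50 already yields 2500 points for two-dimensional y.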

score(X, Y)

Computes the mean conditional log-likelihood of the provided data (X, Y)

Parameters
• X – numpy array to be conditioned on - shape: (n_query_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_query_samples, n_dim_y)

Returns

average log likelihood of data

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self – the estimator instance

skewness(x_cond, n_samples=1000000)

Skewness of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Skewness Skew[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

std_(x_cond, n_samples=1000000)

Standard deviation of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Standard deviations sqrt(Var[y|x]) corresponding to x_cond - numpy array of shape (n_values, ndim_y)

tail_risk_measures(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR)

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile percentage of the distribution

• n_samples – number of samples for the Monte Carlo estimate

Returns

• VaR values for each x to condition on - numpy array of shape (n_values)

• CVaR values for each x to condition on - numpy array of shape (n_values)
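A Monte Carlo estimate of both tail measures for a one-dimensional Gaussian mixture can be sketched as follows. The helper is hypothetical, and it assumes that VaR is the alpha-quantile of y and CVaR the mean of all samples at or below it; the library's sign convention and estimation method may differ.

```python
import numpy as np

def mc_tail_risk(centers, weights, bandwidth, alpha=0.01, n_samples=100_000, seed=0):
    """Monte Carlo VaR and CVaR of sum_j w_j N(y | y_j, bandwidth^2), ndim_y = 1."""
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float).ravel()

    # draw a mixture component per sample, then add Gaussian kernel noise
    comp = rng.choice(centers.size, size=n_samples, p=weights)
    samples = centers[comp] + bandwidth * rng.standard_normal(n_samples)

    var = np.quantile(samples, alpha)        # alpha-quantile of the samples
    cvar = samples[samples <= var].mean()    # mean of the tail at or below VaR
    return float(var), float(cvar)
```

Small values of alpha leave few samples in the tail, which is why a large n_samples is needed for a stable CVaR estimate.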

value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) of the fitted distribution. Only available if ndim_y = 1.

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile percentage of the distribution

Returns

VaR values for each x to condition on - numpy array of shape (n_values)