Least-Squares Density Ratio Estimation

Implementation of the Least-Squares Conditional Density Estimation (LS-CDE) method introduced in [SUG2010], which is based on least-squares density ratio estimation, with some extra features.

This approach estimates the conditional density of multi-dimensional inputs/outputs by expressing the conditional density in terms of the ratio of unconditional densities r(x,y):

\[p(y|x) = \frac{p(x,y)}{p(x)} = r(x,y)\]

Instead of estimating both unconditional densities separately, the density ratio function r(x,y) is directly estimated from samples. The density ratio function is modelled by the following linear model:

\[\widehat{r_{\alpha}}(x,y) := \alpha^T \phi(x,y)\]

where \(\alpha=(\alpha_1, \alpha_2,...,\alpha_b)^T\) are the parameters learned from samples and \(\phi(x,y) = (\phi_{1}(x, y),\phi_{2}(x,y),...,\phi_{b}(x,y))^T\) are kernel functions such that \(\phi_{l}(x,y) \geq 0\) for all \((x,y)\in D_{X} \times D_{Y}\) and \(l = 1, ..., b\).

The parameters \(\alpha\) are learned by minimizing the integrated squared error:

\[J(\alpha) = \int\int ( \widehat{r_{\alpha}}(x,y) - r(x,y))^2 p(x)dxdy.\]

After having obtained \(\widehat{\alpha} = argmin_{\alpha} ~ J(\alpha)\) through training, the conditional density can be computed as follows:

\[\widehat{p}(y|x=\tilde{x}) = \frac{\widehat{\alpha}^T\phi(\tilde{x},y)}{\int\widehat{\alpha}^T\phi(\tilde{x},y')\,dy'} \tag{1}\]

[SUG2010] propose to use a Gaussian kernel with width \(\sigma\) (bandwidth parameter), which is also the choice for this implementation:

\[\phi_{l}(x,y) = \exp \left( -\frac{\|x-u_{l}\|^2}{2 \sigma^2} \right) \exp \left( -\frac{\|y-v_{l}\|^2}{2 \sigma^2} \right)\]

where \(\{(u_{l},v_{l})\}_{l=1}^b\) are center points that are chosen from the training data set. By using Gaussian kernels, the optimization problem \(argmin_{\alpha} ~ J(\alpha)\) can be solved analytically. Also, the denominator in (1) is tractable and can be computed analytically. The key advantage of LS-CDE is that training requires no numerical optimization: the solution can be computed fully analytically.
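To make the analytic solution concrete: expanding the square shows that, up to a constant, \(J(\alpha) = \alpha^T H \alpha - 2 h^T \alpha\) with \(H = \int\int \phi(x,y)\phi(x,y)^T p(x)\,dx\,dy\) and \(h = \int\int \phi(x,y) p(x,y)\,dx\,dy\), because \(r(x,y)p(x) = p(x,y)\). Replacing both by sample averages and adding a ridge penalty \(\lambda \alpha^T \alpha\) gives \(\widehat{\alpha} = (\widehat{H} + \lambda I)^{-1} \widehat{h}\). The NumPy sketch below assumes that the `regularization` parameter plays the role of \(\lambda\), that the centers are already chosen, and that negative coefficients are clipped to zero so the estimate stays non-negative. The function names are hypothetical; this illustrates the technique and is not the library's code.

```python
import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between the rows of A (n, d) and B (m, d)."""
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)


def fit_lscde(X, Y, U, V, sigma=0.5, lam=1.0):
    """Closed-form LS-CDE weights for Gaussian kernels centred at (U, V).

    X: (n, dx), Y: (n, dy), U: (b, dx), V: (b, dy). Returns alpha with shape (b,).
    """
    n, dy = Y.shape
    Kx = np.exp(-sq_dists(X, U) / (2 * sigma**2))  # (n, b): x-factor of phi_l(x_i, y_i)
    Ky = np.exp(-sq_dists(Y, V) / (2 * sigma**2))  # (n, b): y-factor of phi_l(x_i, y_i)
    h = (Kx * Ky).mean(axis=0)                     # h_hat: sample mean of phi(x_i, y_i)

    # For Gaussian kernels the y-integral of phi_l * phi_l' is analytic:
    # (sqrt(pi) * sigma)^dy * exp(-||v_l - v_l'||^2 / (4 sigma^2))
    Vint = (np.sqrt(np.pi) * sigma) ** dy * np.exp(-sq_dists(V, V) / (4 * sigma**2))
    H = (Kx.T @ Kx) / n * Vint                     # H_hat[l, l'] = mean_i Kx[i,l] Kx[i,l'] Vint[l,l']

    alpha = np.linalg.solve(H + lam * np.eye(len(h)), h)  # ridge-regularised minimiser
    return np.maximum(alpha, 0.0)                  # clip negatives so p_hat(y|x) >= 0


def pdf_lscde(X, Y, U, V, alpha, sigma=0.5):
    """Evaluate equation (1) row-wise for the pairs (X[i], Y[i])."""
    dy = Y.shape[1]
    Kx = np.exp(-sq_dists(X, U) / (2 * sigma**2))
    Ky = np.exp(-sq_dists(Y, V) / (2 * sigma**2))
    numer = (Kx * Ky) @ alpha
    # The y-integral of exp(-||y - v_l||^2 / (2 sigma^2)) is (sqrt(2 pi) * sigma)^dy
    denom = (np.sqrt(2 * np.pi) * sigma) ** dy * (Kx @ alpha)
    return numer / denom


# Toy usage: y = x + noise, 50 centres drawn at random from the training pairs
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 1))
Y = X + 0.5 * rng.normal(size=(300, 1))
idx = rng.choice(len(X), size=50, replace=False)
U, V = X[idx], Y[idx]
alpha = fit_lscde(X, Y, U, V, sigma=0.5, lam=1.0)
```

Because the denominator is the exact y-integral of the numerator, the estimate integrates to one over y for any x where some clipped weight is active.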

While [SUG2010] propose to select center points for the kernel functions randomly from the training set, our implementation offers further center sampling methods:

  • all: use all data points in the train set as kernel centers

  • random: randomly selects k points as kernel centers

  • k_means: uses k-means clustering to determine k kernel centers

  • agglomerative: uses agglomerative clustering to determine k kernel centers
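A minimal NumPy sketch of three of the listed strategies, applied to the joint training points \((x_i, y_i)\). It assumes that centers are chosen in the joint (x, y) space and that the k-means centers are the cluster means; `sample_centers` is a hypothetical helper, agglomerative clustering is left out, and the library's actual implementation may differ.

```python
import numpy as np


def sample_centers(Z, method="k_means", k=10, n_iter=50, seed=None):
    """Choose kernel centres from the joint training points Z = [X, Y], shape (n, d).

    Sketch of 'all', 'random' and 'k_means'; 'agglomerative' is not covered here.
    """
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    if method == "all":
        return Z.copy()                                        # every training point is a centre
    if method == "random":
        return Z[rng.choice(n, size=min(k, n), replace=False)]  # k distinct training points
    if method == "k_means":
        # Lloyd's algorithm, initialised from k random training points
        centers = Z[rng.choice(n, size=min(k, n), replace=False)].astype(float)
        for _ in range(n_iter):
            d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            labels = d2.argmin(axis=1)                         # assign each point to nearest centre
            for j in range(len(centers)):
                members = Z[labels == j]
                if len(members) > 0:                           # empty cluster: keep old centre
                    centers[j] = members.mean(axis=0)
        return centers
    raise ValueError(f"unknown center sampling method: {method}")


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y = np.sin(X) + 0.1 * rng.normal(size=(200, 1))
Z = np.hstack([X, Y])                                          # joint (x, y) points
C = sample_centers(Z, method="k_means", k=20, seed=0)
U, V = C[:, :1], C[:, 1:]                                      # split back into u_l and v_l
```

With the same seed, the k-means branch starts from the same draw as the random branch, so Lloyd's iterations can only lower the within-cluster squared error relative to the purely random centers.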

class cde.density_estimator.LSConditionalDensityEstimation(name='LSCDE', ndim_x=None, ndim_y=None, center_sampling_method='k_means', bandwidth=0.5, n_centers=500, regularization=1.0, keep_edges=True, n_jobs=-1, random_seed=None)

Least-Squares Density Ratio Estimator

http://proceedings.mlr.press/v9/sugiyama10a.html

Parameters
  • name – (str) name / identifier of estimator

  • ndim_x – (int) dimensionality of x variable

  • ndim_y – (int) dimensionality of y variable

  • center_sampling_method – String that describes the method to use for finding kernel centers. Allowed values [all, random, distance, k_means, agglomerative]

  • bandwidth – scale / bandwidth of the Gaussian kernels

  • n_centers – Number of kernels (kernel centers) to use

  • regularization – regularization / damping parameter for solving the least-squares problem

  • keep_edges – if set to True, the extreme y values are kept as kernel centers (for expressiveness)

  • n_jobs – (int) number of jobs to launch for calls with large batch sizes

  • random_seed – (optional) seed (int) of the random number generators used

conditional_value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Conditional Value-at-Risk (CVaR) / Expected Shortfall of the fitted distribution. Only available if ndim_y = 1.

Parameters
  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

  • alpha – quantile percentage of the distribution

Returns

CVaR values for each x to condition on - numpy array of shape (n_values)

covariance(x_cond, n_samples=1000000)

Covariance of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Covariances Cov[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

eval_by_cv(X, Y, n_splits=5, verbose=True)

Fits the conditional density model with cross-validation by using the score function of the BaseDensityEstimator for scoring the various splits.

Parameters
  • X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_samples, n_dim_y)

  • n_splits – number of cross-validation folds (positive integer)

  • verbose – the verbosity level
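One plausible reading of this procedure is plain k-fold cross-validation: fit on all folds but one, score the held-out fold with the mean conditional log-likelihood, and repeat. The sketch below assumes a shuffled split; `kfold_scores` and the `GaussianBaseline` stand-in estimator are hypothetical and only mimic the fit/score interface documented on this page.

```python
import numpy as np


def kfold_scores(make_model, X, Y, n_splits=5, seed=None):
    """Fit a fresh model on each training split and score it on the held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_splits)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = make_model()
        model.fit(X[train_idx], Y[train_idx])
        scores.append(model.score(X[test_idx], Y[test_idx]))  # mean conditional log-likelihood
    return np.array(scores)


class GaussianBaseline:
    """Hypothetical stand-in estimator: ignores x and fits y ~ N(mu, s^2), ndim_y = 1."""

    def fit(self, X, Y):
        self.mu, self.s = Y.mean(), Y.std()
        return self

    def score(self, X, Y):
        z = (Y - self.mu) / self.s
        return float(np.mean(-0.5 * z**2 - np.log(self.s * np.sqrt(2 * np.pi))))


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
Y = rng.normal(size=(100, 1))
scores = kfold_scores(GaussianBaseline, X, Y, n_splits=5, seed=0)
```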

fit(X, Y, **kwargs)

Fits the conditional density model with provided data

Parameters
  • X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_samples, n_dim_y)

fit_by_cv(X, Y, n_folds=3, param_grid=None, verbose=True, n_jobs=-1)

Fits the conditional density model with hyperparameter search and cross-validation:

  • Determines the best hyperparameter configuration from a pre-defined set using cross-validation. Thereby, the conditional log-likelihood is used for evaluation.

  • Fits the model with the previously selected hyperparameter configuration.

Parameters
  • X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_samples, n_dim_y)

  • n_folds – number of cross-validation folds (positive integer)

  • param_grid – (optional) a dictionary with the hyperparameters of the model as keys and a list of respective parametrizations as values. The hyperparameter search is performed over the cartesian product of the provided lists. Example: {"n_centers": [20, 50, 100, 200], "center_sampling_method": ["agglomerative", "k_means", "random"], "keep_edges": [True, False]}
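The cartesian product over the example grid can be enumerated with the standard library; each resulting dictionary is one candidate configuration (which could, for instance, be applied with set_params(**config) before cross-validating). This is an illustrative sketch, not the library's search code.

```python
from itertools import product

param_grid = {
    "n_centers": [20, 50, 100, 200],
    "center_sampling_method": ["agglomerative", "k_means", "random"],
    "keep_edges": [True, False],
}

# One dict per combination: the search covers the cartesian product of the lists
keys = list(param_grid)
configs = [dict(zip(keys, values)) for values in product(*(param_grid[k] for k in keys))]

print(len(configs))  # 4 * 3 * 2 = 24
print(configs[0])    # {'n_centers': 20, 'center_sampling_method': 'agglomerative', 'keep_edges': True}
```

Since every configuration is scored on every fold, the cost grows multiplicatively: 24 configurations with n_folds=3 already mean 72 fits.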

get_configuration(deep=True)

Get parameter configuration for this estimator.

Parameters

deep – (boolean, optional) If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – mapping of string to any: parameter names mapped to their values.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

kurtosis(x_cond, n_samples=1000000)

Kurtosis of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Kurtosis Kurt[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

log_pdf(X, Y)

Predicts the conditional log-probability log p(y|x). Requires the model to be fitted.

Parameters
  • X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_samples, n_dim_y)

Returns

conditional log-probability density log p(y|x) - numpy array of shape (n_query_samples, )

mean_(x_cond, n_samples=1000000)

Mean of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Means E[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y)

mean_std(x_cond, n_samples=1000000)

Computes Mean and Covariance of the fitted distribution conditioned on x_cond.

Computationally more efficient than computing the mean and covariance separately.

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Means E[y|x] and Covariances Cov[y|x]
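Assuming, as the n_samples argument suggests, that these moments are Monte Carlo estimates from samples of \(\widehat{p}(y|x)\) for one conditioning value, the saving comes from reusing the same draws and the same mean for the covariance. A minimal sketch, with a known Gaussian standing in for the fitted model (`mean_and_covariance` is a hypothetical helper):

```python
import numpy as np


def mean_and_covariance(samples):
    """Monte Carlo estimates of E[y|x] and Cov[y|x] from samples of shape (n_samples, ndim_y)."""
    mean = samples.mean(axis=0)
    centered = samples - mean                      # reuse the same draws and the mean
    cov = centered.T @ centered / (len(samples) - 1)
    return mean, cov


rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.5], [0.5, 2.0]])
samples = rng.multivariate_normal([1.0, -1.0], true_cov, size=200_000)
mean, cov = mean_and_covariance(samples)
std = np.sqrt(np.diag(cov))                        # per-dimension standard deviations
```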

pdf(X, Y)

Predicts the conditional density p(y|x). Requires the model to be fitted.

Parameters
  • X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_samples, n_dim_y)

Returns

conditional probability density p(y|x) - numpy array of shape (n_query_samples, )

plot2d(x_cond=[0, 1, 2], ylim=(-8, 8), resolution=100, mode='pdf', show=True, prefix='', numpyfig=False)

Generates 2d plots of the fitted conditional distribution for the x values in x_cond, if x and y are 1-dimensional each

Parameters
  • x_cond – list of x values to condition on

  • ylim – 2-tuple specifying the y axis limits

  • resolution – integer specifying the resolution of plot

plot3d(xlim=(-5, 5), ylim=(-8, 8), resolution=100, show=False, numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each

Parameters
  • xlim – 2-tuple specifying the x axis limits

  • ylim – 2-tuple specifying the y axis limits

  • resolution – integer specifying the resolution of plot

predict_density(X, Y=None, resolution=50)

Computes conditional density p(y|x) over a predefined grid of y target values

Parameters
  • X – values/vectors to be conditioned on - shape: (n_instances, n_dim_x)

  • Y – (optional) y values to be evaluated from p(y|x); if not set, Y will be a grid with the specified resolution

  • resolution – integer specifying the resolution of the evaluation grid

Returns: tuple (P, Y)
  • P - density p(y|x) - shape (n_instances, resolution**n_dim_y)

  • Y - grid with the specified resolution - shape (resolution**n_dim_y, n_dim_y) or a copy of Y in case it was provided as argument
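A sketch of how such a grid and the matching P array can be laid out: the grid is a regular mesh with resolution**n_dim_y points, and every x is paired with every grid point. The ylim bounds and the helper names are hypothetical (the source does not say how the grid range is chosen), and toy_pdf stands in for a fitted model's pdf.

```python
import numpy as np


def make_y_grid(ylim, resolution, ndim_y):
    """Regular grid over [ylim[0], ylim[1]]^ndim_y with shape (resolution**ndim_y, ndim_y)."""
    axis = np.linspace(ylim[0], ylim[1], resolution)
    mesh = np.meshgrid(*([axis] * ndim_y), indexing="ij")
    return np.stack([m.ravel() for m in mesh], axis=-1)


def density_on_grid(pdf, X, Y_grid):
    """Evaluate pdf for every (x, y) pair; returns P with shape (n_instances, len(Y_grid))."""
    n, m = len(X), len(Y_grid)
    X_rep = np.repeat(X, m, axis=0)    # x_0 m times, then x_1 m times, ...
    Y_rep = np.tile(Y_grid, (n, 1))    # the whole grid once per instance
    return pdf(X_rep, Y_rep).reshape(n, m)


def toy_pdf(X, Y):
    """Hypothetical stand-in for a fitted model: p(y|x) = N(y; x, 1) with 1-D x and y."""
    return np.exp(-0.5 * ((Y - X) ** 2).sum(axis=1)) / np.sqrt(2 * np.pi)


X = np.array([[-1.0], [0.0], [2.0]])
Y_grid = make_y_grid((-8, 8), resolution=50, ndim_y=1)
P = density_on_grid(toy_pdf, X, Y_grid)   # shape (3, 50)
```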

sample(X)

Samples from the conditional mixture distributions. Requires the model to be fitted.

Parameters

X – values to be conditioned on when sampling - numpy array of shape (n_instances, n_dim_x)

Returns: tuple (X, Y)
  • X - the values to condition on that were provided as argument - numpy array of shape (n_samples, ndim_x)

  • Y - conditional samples from the model p(y|x) - numpy array of shape (n_samples, ndim_y)
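With the Gaussian kernels above and non-negative \(\widehat{\alpha}\), equation (1) is a Gaussian mixture in y: substituting the kernel gives weights \(w_l(x) \propto \widehat{\alpha}_l \exp(-\|x-u_l\|^2/(2\sigma^2))\) and components \(N(v_l, \sigma^2 I)\). Sampling therefore reduces to picking a component and adding Gaussian noise. The sketch assumes \(\widehat{\alpha} \geq 0\) and one conditioning value at a time; `sample_lscde` is a hypothetical helper, not the library's method.

```python
import numpy as np


def sample_lscde(x, U, V, alpha, sigma=0.5, n_samples=1, rng=None):
    """Draw y ~ p_hat(y|x) for one condition x of shape (ndim_x,).

    Mixture weights are proportional to alpha_l * exp(-||x - u_l||^2 / (2 sigma^2));
    mixture components are N(v_l, sigma^2 I).
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = -((x - U) ** 2).sum(axis=1) / (2 * sigma**2)
    w = alpha * np.exp(logits - logits.max())        # shift exponents for numerical stability
    w = w / w.sum()
    comp = rng.choice(len(w), size=n_samples, p=w)   # pick a mixture component per sample
    return V[comp] + sigma * rng.standard_normal((n_samples, V.shape[1]))


# Two kernels: centre (u=0, v=-2) and centre (u=5, v=2)
rng = np.random.default_rng(0)
U = np.array([[0.0], [5.0]])
V = np.array([[-2.0], [2.0]])
alpha = np.array([1.0, 1.0])
s = sample_lscde(np.array([0.0]), U, V, alpha, sigma=0.5, n_samples=20000, rng=rng)
s2 = sample_lscde(np.array([2.5]), U, V, alpha, sigma=0.5, n_samples=20000, rng=rng)
```

At x = 0 the first kernel dominates, so samples concentrate around -2; at x = 2.5 both kernels get equal weight and the samples split between the two modes.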

score(X, Y)

Computes the mean conditional log-likelihood of the provided data (X, Y)

Parameters
  • X – numpy array to be conditioned on - shape: (n_query_samples, n_dim_x)

  • Y – numpy array of y targets - shape: (n_query_samples, n_dim_y)

Returns

average log likelihood of data

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self

skewness(x_cond, n_samples=1000000)

Skewness of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Skewness Skew[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

std_(x_cond, n_samples=1000000)

Standard deviation of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Standard deviations sqrt(Var[y|x]) corresponding to x_cond - numpy array of shape (n_values, ndim_y)

tail_risk_measures(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR)

Parameters
  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

  • alpha – quantile percentage of the distribution

  • n_samples – number of samples for Monte Carlo estimation

Returns

  • VaR values for each x to condition on - numpy array of shape (n_values)

  • CVaR values for each x to condition on - numpy array of shape (n_values)
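A Monte Carlo sketch from 1-D samples of \(\widehat{p}(y|x)\). It assumes the lower-tail convention (VaR as the alpha-quantile of y, CVaR as the mean of the samples at or below it), which the source does not spell out; `var_cvar` is a hypothetical helper and a standard normal stands in for the fitted model.

```python
import numpy as np


def var_cvar(samples, alpha=0.01):
    """Monte Carlo VaR and CVaR from 1-D samples, lower-tail convention.

    VaR is the alpha-quantile; CVaR (expected shortfall) is the mean of samples at or below it.
    """
    var = np.quantile(samples, alpha)
    cvar = samples[samples <= var].mean()
    return var, cvar


rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)
var, cvar = var_cvar(samples, alpha=0.05)
```

For a standard normal and alpha = 0.05 the exact values are about -1.645 and -2.063, so the estimates can be checked against them; the Monte Carlo error shrinks as n_samples grows.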

value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) of the fitted distribution. Only available if ndim_y = 1.

Parameters
  • x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

  • alpha – quantile percentage of the distribution

Returns

VaR values for each x to condition on - numpy array of shape (n_values)

[SUG2010]

Sugiyama et al. (2010). Conditional Density Estimation via Least-Squares Density Ratio Estimation, in PMLR 9:781-788 (http://proceedings.mlr.press/v9/sugiyama10a.html)