Conditional Kernel Density Estimation
-
class
cde.density_estimator.
ConditionalKernelDensityEstimation
(name='CKDE', ndim_x=None, ndim_y=None, bandwidth='cv_ml', n_jobs=-1, random_seed=None)
ConditionalKernelDensityEstimation (CKDE): Nonparametric conditional density estimator that models the joint probability p(x, y) and marginal probability p(x) via kernel density estimation and computes the conditional density as p(y|x) = p(x, y) / p(x). This implementation wraps functionality of the statsmodels.nonparametric module.
- Parameters
name – (str) name / identifier of estimator
ndim_x – (int) dimensionality of x variable
ndim_y – (int) dimensionality of y variable
bandwidth –
(array_like or str) If an array, it is a fixed user-specified bandwidth. If a string, should be one of:
normal_reference: normal reference rule of thumb
cv_ml: cross validation maximum likelihood (default)
cv_ls: cross validation least squares
n_jobs – (int) number of jobs to launch for calls with large batch sizes
random_seed – (optional) seed (int) of the random number generators used
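As a rough illustration of the normal_reference option, the sketch below computes a rule-of-thumb bandwidth per dimension. It assumes the convention h_j = 1.06 * std_j * n^(-1/(4 + d)); the helper name `normal_reference_bandwidth` is hypothetical and the exact constants used by the wrapped statsmodels code may differ.

```python
import statistics


def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth for each column of `data` (a list of rows).

    Assumed convention: h_j = 1.06 * std_j * n ** (-1 / (4 + d)),
    with n rows, d columns and the population standard deviation.
    """
    n = len(data)
    d = len(data[0])
    factor = 1.06 * n ** (-1.0 / (4 + d))
    columns = list(zip(*data))
    return [factor * statistics.pstdev(col) for col in columns]


# Toy data: the second column is an affine function of the first.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
bw = normal_reference_bandwidth(data)
```

Because the second column has twice the spread of the first, its bandwidth comes out twice as large.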
References
Racine, J., Li, Q. Nonparametric econometrics: theory and practice. Princeton University Press. (2007)
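The ratio p(y|x) = p(x, y) / p(x) described above can be sketched in plain Python for scalar x and y. This is a minimal illustration with Gaussian kernels and fixed, hand-picked bandwidths, not the library's implementation; `ckde_pdf` and its arguments are hypothetical names.

```python
import math


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def ckde_pdf(x, y, xs, ys, h_x, h_y):
    """Estimate p(y|x) = p(x, y) / p(x) for scalar x and y."""
    n = len(xs)
    # Joint density: product kernel over (x, y).
    joint = sum(
        gaussian_kernel((x - xi) / h_x) * gaussian_kernel((y - yi) / h_y)
        for xi, yi in zip(xs, ys)
    ) / (n * h_x * h_y)
    # Marginal density of x alone.
    marginal = sum(gaussian_kernel((x - xi) / h_x) for xi in xs) / (n * h_x)
    return joint / marginal


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.1, 0.4, 1.1, 1.4, 2.1]
density = ckde_pdf(1.0, 1.0, xs, ys, h_x=0.5, h_y=0.5)
```

For a fixed x, the estimate integrates to one over y, which is a quick sanity check for any implementation of the ratio.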
-
cdf
(X, Y)
Predicts the conditional cumulative probability p(Y<=y|X=x). Requires the model to be fitted.
- Parameters
X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)
Y – numpy array of y targets - shape: (n_samples, n_dim_y)
- Returns
conditional cumulative probability p(Y<=y|X=x) - numpy array of shape (n_query_samples, )
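For Gaussian kernels, the conditional CDF has a closed form: a weighted sum of normal CDFs centred at the training targets. The sketch below assumes scalar x and y and fixed bandwidths; the library may compute the CDF differently (for example numerically), and `ckde_cdf` is a hypothetical name.

```python
import math


def normal_cdf(u):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def kernel_weights(x, xs, h_x):
    """Normalised Gaussian kernel weights of the training inputs around x."""
    raw = [math.exp(-0.5 * ((x - xi) / h_x) ** 2) for xi in xs]
    total = sum(raw)
    return [r / total for r in raw]


def ckde_cdf(x, y, xs, ys, h_x, h_y):
    """Estimate p(Y <= y | X = x) as a weighted sum of normal CDFs."""
    weights = kernel_weights(x, xs, h_x)
    return sum(w * normal_cdf((y - yi) / h_y) for w, yi in zip(weights, ys))


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
median_prob = ckde_cdf(1.0, 1.0, xs, ys, h_x=0.5, h_y=0.5)
```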
-
conditional_value_at_risk
(x_cond, alpha=0.01, n_samples=1000000)
Computes the Conditional Value-at-Risk (CVaR) / Expected Shortfall of the fitted distribution. Only supported if ndim_y = 1.
- Parameters
x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
alpha – quantile percentage of the distribution
- Returns
CVaR values for each x to condition on - numpy array of shape (n_values)
-
covariance
(x_cond, n_samples=1000000)
Covariance of the fitted distribution conditioned on x_cond
- Parameters
x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
- Returns
Covariances Cov[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)
-
eval_by_cv
(X, Y, n_splits=5, verbose=True)
Fits the conditional density model with cross-validation by using the score function of the BaseDensityEstimator for scoring the various splits.
- Parameters
X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)
Y – numpy array of y targets - shape: (n_samples, n_dim_y)
n_splits – number of cross-validation folds (positive integer)
verbose – the verbosity level
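The cross-validated scoring described above can be sketched generically: split the data into n_splits folds, fit on the remaining folds, score on the held-out fold, and average. The helpers `k_fold_indices` and `cross_val_score` are hypothetical, and whether the library shuffles before splitting is not stated here, so this sketch uses contiguous folds.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


def cross_val_score(fit, score, X, Y, n_splits=5):
    """Average score(model, X_test, Y_test) over all folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), n_splits):
        model = fit([X[i] for i in train_idx], [Y[i] for i in train_idx])
        scores.append(
            score(model, [X[i] for i in test_idx], [Y[i] for i in test_idx])
        )
    return sum(scores) / len(scores)
```

Here `fit` and `score` are caller-supplied callables; for a density estimator, `score` would return the mean conditional log-likelihood on the held-out fold.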
-
fit
(X, Y, **kwargs)
Since CKDE is a lazy learner, fit just stores the provided training data (X, Y).
- Parameters
X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)
Y – numpy array of y targets - shape: (n_samples, n_dim_y)
-
fit_by_cv
(X, Y, n_folds=3, param_grid=None, verbose=True, n_jobs=-1)
Fits the conditional density model with hyperparameter search and cross-validation.
- Determines the best hyperparameter configuration from a pre-defined set using cross-validation, with the conditional log-likelihood as the evaluation score.
- Fits the model with the previously selected hyperparameter configuration.
- Parameters
X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)
Y – numpy array of y targets - shape: (n_samples, n_dim_y)
n_folds – number of cross-validation folds (positive integer)
param_grid –
(optional) a dictionary with the hyperparameters of the model as keys and a list of respective parametrizations as values. The hyperparameter search is performed over the cartesian product of the provided lists. Example:
{"n_centers": [20, 50, 100, 200],
"center_sampling_method": ["agglomerative", "k_means", "random"],
"keep_edges": [True, False]
}
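To make the cartesian product concrete, the sketch below expands the example grid into individual configurations using `itertools.product`. The helper `expand_param_grid` is a hypothetical name used only for illustration; the library's own search routine is not shown here.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Return one dict per combination of the listed hyperparameter values."""
    keys = sorted(param_grid)
    value_lists = [param_grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


param_grid = {
    "n_centers": [20, 50, 100, 200],
    "center_sampling_method": ["agglomerative", "k_means", "random"],
    "keep_edges": [True, False],
}
configs = expand_param_grid(param_grid)
```

With 4, 3 and 2 candidate values, the grid contains 4 * 3 * 2 = 24 configurations, each of which would be scored by cross-validation.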
-
get_configuration
(deep=True)
Get parameter configuration for this estimator.
- Parameters
deep – (boolean, optional) If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – (mapping of string to any) Parameter names mapped to their values.
-
get_params
(deep=True)
Get parameters for this estimator.
- Parameters
deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
params – Parameter names mapped to their values.
- Return type
mapping of string to any
-
kurtosis
(x_cond, n_samples=1000000)
Kurtosis of the fitted distribution conditioned on x_cond
- Parameters
x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
- Returns
Kurtosis Kurt[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)
-
log_pdf
(X, Y)
Predicts the conditional log-probability log p(y|x). Requires the model to be fitted.
- Parameters
X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)
Y – numpy array of y targets - shape: (n_samples, n_dim_y)
- Returns
conditional log-probability log p(y|x) - numpy array of shape (n_query_samples, )
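Working in log space avoids underflow when a query point lies far from the training data. The sketch below evaluates log p(y|x) for a Gaussian-kernel estimate with the log-sum-exp trick; it assumes scalar x and y and fixed bandwidths, and `ckde_log_pdf` is a hypothetical name rather than the library's routine.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gaussian(u, h):
    """Log density of a zero-mean normal with standard deviation h at u."""
    return -0.5 * (u / h) ** 2 - math.log(h) - 0.5 * math.log(2.0 * math.pi)


def ckde_log_pdf(x, y, xs, ys, h_x, h_y):
    """log p(y|x) = log p(x, y) - log p(x); the 1/n factors cancel."""
    log_joint = log_sum_exp(
        [log_gaussian(x - xi, h_x) + log_gaussian(y - yi, h_y)
         for xi, yi in zip(xs, ys)]
    )
    log_marginal = log_sum_exp([log_gaussian(x - xi, h_x) for xi in xs])
    return log_joint - log_marginal


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
log_density = ckde_log_pdf(1.0, 1.0, xs, ys, h_x=0.5, h_y=0.5)
```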
-
mean_
(x_cond, n_samples=1000000)
Mean of the fitted distribution conditioned on x_cond
- Parameters
x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
- Returns
Means E[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y)
-
mean_std
(x_cond, n_samples=1000000)
Computes the mean and covariance of the fitted distribution conditioned on x_cond. Computationally more efficient than calling the mean and covariance computations separately.
- Parameters
x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
- Returns
Means E[y|x] and Covariances Cov[y|x]
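Computing both moments in one pass saves work because they share the same kernel weights. The sketch below uses the closed form for a Gaussian-kernel estimate with scalar y: the conditional mean is the weighted average of the targets, and the variance is the weighted spread plus the squared y-bandwidth. This is an illustrative alternative, not necessarily how the library (which takes an n_samples argument) obtains these values; `conditional_mean_std` is a hypothetical name.

```python
import math


def conditional_mean_std(x, xs, ys, h_x, h_y):
    """Return (E[y|x], sqrt(Var[y|x])) for a Gaussian-kernel estimate."""
    # Kernel weights are computed once and reused for both moments.
    raw = [math.exp(-0.5 * ((x - xi) / h_x) ** 2) for xi in xs]
    total = sum(raw)
    weights = [r / total for r in raw]
    mean = sum(w * yi for w, yi in zip(weights, ys))
    # Mixture variance: spread of the component means plus kernel variance.
    var = sum(w * (yi - mean) ** 2 for w, yi in zip(weights, ys)) + h_y ** 2
    return mean, math.sqrt(var)


xs = [0.0, 1.0, 2.0]
ys = [0.0, 1.0, 2.0]
mean, std = conditional_mean_std(1.0, xs, ys, h_x=0.5, h_y=0.5)
```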
-
pdf
(X, Y)
Predicts the conditional likelihood p(y|x). Requires the model to be fitted.
- Parameters
X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)
Y – numpy array of y targets - shape: (n_samples, n_dim_y)
- Returns
conditional likelihood p(y|x) - numpy array of shape (n_query_samples, )
-
plot2d
(x_cond=[0, 1, 2], ylim=(-8, 8), resolution=100, mode='pdf', show=True, prefix='', numpyfig=False)
Generates a 2d plot of the fitted conditional distribution for the given x_cond values if x and y are 1-dimensional each
- Parameters
x_cond – x values to condition on
ylim – 2-tuple specifying the y axis limits
resolution – integer specifying the resolution of plot
-
plot3d
(xlim=(-5, 5), ylim=(-8, 8), resolution=100, show=False, numpyfig=False)
Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each
- Parameters
xlim – 2-tuple specifying the x axis limits
ylim – 2-tuple specifying the y axis limits
resolution – integer specifying the resolution of plot
-
predict_density
(X, Y=None, resolution=50)
Computes conditional density p(y|x) over a predefined grid of y target values
- Parameters
X – values/vectors to be conditioned on - shape: (n_instances, n_dim_x)
Y – (optional) y values to be evaluated from p(y|x) - if not set, Y will be a grid with the specified resolution
resolution – integer specifying the resolution of the evaluation grid
- Returns: tuple (P, Y)
P - density p(y|x) - shape (n_instances, resolution**n_dim_y)
Y - grid with the specified resolution - shape (resolution**n_dim_y, n_dim_y) or a copy of Y in case it was provided as argument
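The grid shape (resolution**n_dim_y, n_dim_y) follows from taking the cartesian product of one evenly spaced axis per y dimension. The sketch below builds such a grid; the bounds `y_min` and `y_max` and the helper `make_y_grid` are assumptions for illustration, since how the library chooses the grid range is not stated here.

```python
from itertools import product


def linspace(lo, hi, num):
    """Return `num` evenly spaced values from lo to hi inclusive (num >= 2)."""
    step = (hi - lo) / (num - 1)
    return [lo + i * step for i in range(num)]


def make_y_grid(y_min, y_max, resolution):
    """Build a grid with resolution ** n_dim_y rows and n_dim_y columns."""
    axes = [linspace(lo, hi, resolution) for lo, hi in zip(y_min, y_max)]
    return [list(point) for point in product(*axes)]


# Two-dimensional y with 5 points per axis gives 5 ** 2 = 25 grid rows.
grid = make_y_grid(y_min=[-1.0, 0.0], y_max=[1.0, 2.0], resolution=5)
```

The exponential growth in n_dim_y is worth keeping in mind when choosing the resolution for higher-dimensional targets.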
-
score
(X, Y)
Computes the mean conditional log-likelihood of the provided data (X, Y)
- Parameters
X – numpy array to be conditioned on - shape: (n_query_samples, n_dim_x)
Y – numpy array of y targets - shape: (n_query_samples, n_dim_y)
- Returns
average log likelihood of data
-
set_params
(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it’s possible to update each component of a nested object.
- Returns
self
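The `<component>__<parameter>` convention can be illustrated with a small stand-in class that delegates prefixed names to a nested object. `Component` is a hypothetical class written for this sketch; the real method additionally validates parameter names.

```python
class Component:
    """Toy object whose attributes act as its parameters."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub_name = key.partition("__")
            if sep:
                # "<component>__<parameter>": forward to the nested object.
                getattr(self, name).set_params(**{sub_name: value})
            else:
                setattr(self, name, value)
        return self


pipeline = Component(estimator=Component(bandwidth="cv_ml"), verbose=True)
result = pipeline.set_params(estimator__bandwidth="normal_reference", verbose=False)
```

Returning self allows calls to be chained, for example setting parameters and then fitting in one expression.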
-
skewness
(x_cond, n_samples=1000000)
Skewness of the fitted distribution conditioned on x_cond
- Parameters
x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
- Returns
Skewness Skew[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)
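Skewness and kurtosis are the third and fourth standardized moments. Given samples drawn from p(y|x) for one conditioning value, they can be estimated as sketched below. This is a generic sample-based estimate, not the library's routine, and it reports plain (non-excess) kurtosis, which is an assumption about the convention.

```python
def standardized_moment(samples, order):
    """Central moment of the given order divided by std ** order."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    central = sum((s - mean) ** order for s in samples) / n
    return central / var ** (order / 2)


def sample_skewness(samples):
    return standardized_moment(samples, 3)


def sample_kurtosis(samples):
    # Non-excess kurtosis: a normal distribution would give roughly 3.
    return standardized_moment(samples, 4)


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
skew = sample_skewness(symmetric)
kurt = sample_kurtosis(symmetric)
```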
-
std_
(x_cond, n_samples=1000000)
Standard deviation of the fitted distribution conditioned on x_cond
- Parameters
x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
- Returns
Standard deviations sqrt(Var[y|x]) corresponding to x_cond - numpy array of shape (n_values, ndim_y)
-
tail_risk_measures
(x_cond, alpha=0.01, n_samples=1000000)
Computes the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR)
- Parameters
x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
alpha – quantile percentage of the distribution
n_samples – number of samples for the Monte Carlo estimation
- Returns
VaR values for each x to condition on - numpy array of shape (n_values)
CVaR values for each x to condition on - numpy array of shape (n_values)
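A Monte Carlo estimate of both quantities for a single conditioning value can be sketched as follows: sort the samples, take the empirical alpha-quantile as VaR, and average the samples at or below it for CVaR. The sketch assumes the lower tail and no sign flip, which may differ from the library's convention, and it uses standard normal samples as a stand-in for draws from p(y|x).

```python
import random


def empirical_tail_risk(samples, alpha=0.01):
    """Return (VaR, CVaR) of the lower alpha-tail of the samples."""
    ordered = sorted(samples)
    # Number of samples that fall in the alpha-tail (at least one).
    k = max(1, int(alpha * len(ordered)))
    var = ordered[k - 1]           # empirical alpha-quantile
    cvar = sum(ordered[:k]) / k    # mean of the tail at or below VaR
    return var, cvar


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
var, cvar = empirical_tail_risk(samples, alpha=0.01)
```

By construction CVaR is never larger than VaR here, since it averages only values at or below the quantile.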
-
value_at_risk
(x_cond, alpha=0.01, n_samples=1000000)
Computes the Value-at-Risk (VaR) of the fitted distribution. Only supported if ndim_y = 1.
- Parameters
x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)
alpha – quantile percentage of the distribution
- Returns
VaR values for each x to condition on - numpy array of shape (n_values)