# Normalizing Flow Estimator¶

The Normalizing Flow Estimator (NFE) combines a conventional neural network (in our implementation specified as $$estimator$$) with a multi-stage Normalizing Flow [REZENDE2015] for modeling conditional probability distributions $$p(y|x)$$. Given a network and a flow, the distribution $$y$$ can be specified by having the network output the parameters of the flow given an input $$x$$ [TRIPPE2018]. If the normalizing flow is expressive enough, arbitrary conditional distributions can be approximated.

The flows work by transforming a base distribution (in our case a normal distribution) into successively more complex distributions by applying bijectors.

Example of a normal distribution being transformed by two planar flows:

Using the change of variable formula, the resulting probability distribution $$p_1$$ for a single flow $$f$$ applied to the base distribution $$p_0$$ becomes:

\begin{align}\begin{aligned}p_0(\mathbf{z_0}) = \mathcal{N}(\mathbf{\mu}, \mathbf{\Sigma})(\mathbf{z_0})\\\mathbf{z_1} = f(\mathbf{z_0})\\p_1(\mathbf{z_1}) = p_0(f^{-1}(\mathbf{z_1})) \cdot |\det \dfrac{d f^{-1}(\mathbf{z_1})}{d \mathbf{z_1}}|\end{aligned}\end{align}

Using normalizing flows for density estimation requires that the inverse and the Jacobian determinant of the flow can be calculated quickly.

Given input $$x$$, the neural network outputs the parameters $$\theta$$ of the flows. The weights and biases $$w$$ of the neural network are learned by minimizing the negative logarithm of the likelihood (maximum likelihood) over $$N$$ data points for a normalizing flow consisting of $$K$$ flows.

\begin{align}\begin{aligned}E(w) = - \sum_{n=1}^N \bigg\{\log p_0(\mathbf{z_{0,n}}) + \sum_{k=1}^{K} \log|\det\dfrac{d f_k^{-1}(\mathbf{z_{k,n}}, \theta_k(\mathbf{w}, \mathbf{x_n}))}{d \mathbf{z_{k,n}}}|\bigg\}\\\mathbf{z_{0,n}} = f_1^{-1}(f_2^{-1}(\dots f_K^{-1}(\mathbf{z_{K,n}}))), \mathbf{z_{K,n}} = \mathbf{y_n}\end{aligned}\end{align}

Available flows:

class cde.density_estimator.NormalizingFlowEstimator(name, ndim_x, ndim_y, flows_type=('affine', 'radial', 'radial', 'radial'), hidden_sizes=(16, 16), hidden_nonlinearity=<function tanh>, n_training_epochs=1000, x_noise_std=None, y_noise_std=None, weight_decay=0.0, weight_normalization=True, data_normalization=True, dropout=0.0, random_seed=None)[source]

Normalizing Flow Estimator

Parameters
• name – (str) name space of the network (should be unique in code, otherwise tensorflow namespace collisions may arise)

• ndim_x – (int) dimensionality of x variable

• ndim_y – (int) dimensionality of y variable

• flows_type – (tuple of strings) The chain of individual flows that together make up the full flow. The individual flows can be any of: affine, planar, radial, identity. They will be applied in order going from the base distribution to the transformed distribution.

• hidden_sizes – (tuple of int) sizes of the hidden layers of the neural network

• hidden_nonlinearity – (tf function) nonlinearity of the hidden layers

• n_training_epochs – (int) Number of epochs for training

• x_noise_std – (optional) standard deviation of Gaussian noise over the the training data X -> regularization through noise

• y_noise_std – (optional) standard deviation of Gaussian noise over the the training data Y -> regularization through noise

• weight_decay – (float) the amount of decoupled (http://arxiv.org/abs/1711.05101) weight decay to apply

• weight_normalization – (boolean) whether weight normalization shall be used for the neural network

• data_normalization – (boolean) whether to normalize the data (X and Y) to exhibit zero-mean and uniform-std

• dropout – (float) the probability of switching off nodes during training

• random_seed – (optional) seed (int) of the random number generators used

cdf(X, Y)

Predicts the conditional cumulative probability p(Y<=y|X=x). Requires the model to be fitted.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

Returns

conditional cumulative probability p(Y<=y|X=x) - numpy array of shape (n_query_samples, )

conditional_value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Conditional Value-at-Risk (CVaR) / Expected Shortfall of the fitted distribution. Only if ndim_y = 1

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile percentage of the distribution

Returns

CVaR values for each x to condition on - numpy array of shape (n_values)

covariance(x_cond, n_samples=1000000)

Covariance of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Covariances Cov[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

eval_by_cv(X, Y, n_splits=5, verbose=True)

Fits the conditional density model with cross-validation by using the score function of the BaseDensityEstimator for scoring the various splits.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

• n_splits – number of cross-validation folds (positive integer)

• verbose – the verbosity level

fit(X, Y, random_seed=None, verbose=True, eval_set=None, **kwargs)[source]

Fit the model with to the provided data

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

• eval_set – (tuple) eval/test dataset - tuple (X_test, Y_test)

• verbose – (boolean) controls the verbosity of console output

fit_by_cv(X, Y, n_folds=3, param_grid=None, random_state=None, verbose=True, n_jobs=-1)

Fits the conditional density model with hyperparameter search and cross-validation.

• Determines the best hyperparameter configuration from a pre-defined set using cross-validation. Thereby, the conditional log-likelihood is used for simulation_eval.

• Fits the model with the previously selected hyperparameter configuration

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

• n_folds – number of cross-validation folds (positive integer)

• param_grid

(optional) a dictionary with the hyperparameters of the model as key and and a list of respective parametrizations as value. The hyperparameter search is performed over the cartesian product of the provided lists. Example:

{"n_centers": [20, 50, 100, 200],
"center_sampling_method": ["agglomerative", "k_means", "random"],
"keep_edges": [True, False]
}


• random_state – (int) seed used by the random number generator for shuffeling the data

get_configuration(deep=True)

Get parameter configuration for this estimator.

Parameters

deep – boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params - mapping of string to any Parameter names mapped to their values.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

get_params_internal(**tags)

Internal method to be implemented which does not perform caching

kurtosis(x_cond, n_samples=1000000)

Kurtosis of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Kurtosis Kurt[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

log_pdf(X, Y)

Predicts the conditional log-probability log p(y|x). Requires the model to be fitted.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

Returns

onditional log-probability log p(y|x) - numpy array of shape (n_query_samples, )

mean_(x_cond, n_samples=1000000)

Mean of the fitted distribution conditioned on x_cond :param x_cond: different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Means E[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y)

mean_std(x_cond, n_samples=1000000)
Computes Mean and Covariance of the fitted distribution conditioned on x_cond.

Computationally more efficient than calling mean and covariance computatio separately

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Means E[y|x] and Covariances Cov[y|x]

pdf(X, Y)

Predicts the conditional probability p(y|x). Requires the model to be fitted.

Parameters
• X – numpy array to be conditioned on - shape: (n_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_samples, n_dim_y)

Returns

conditional probability p(y|x) - numpy array of shape (n_query_samples, )

plot2d(x_cond=[0, 1, 2], ylim=(-8, 8), resolution=100, mode='pdf', show=True, prefix='', numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each

Parameters
• xlim – 2-tuple specifying the x axis limits

• ylim – 2-tuple specifying the y axis limits

• resolution – integer specifying the resolution of plot

plot3d(xlim=(-5, 5), ylim=(-8, 8), resolution=100, show=False, numpyfig=False)

Generates a 3d surface plot of the fitted conditional distribution if x and y are 1-dimensional each

Parameters
• xlim – 2-tuple specifying the x axis limits

• ylim – 2-tuple specifying the y axis limits

• resolution – integer specifying the resolution of plot

predict_density(X, Y=None, resolution=50)

Computes conditional density p(y|x) over a predefined grid of y target values

Parameters
• X – values/vectors to be conditioned on - shape: (n_instances, n_dim_x)

• Y – (optional) y values to be evaluated from p(y|x) - if not set, Y will be a grid with with specified resolution

• resulution – integer specifying the resolution of simulation_eval grid

Returns: tuple (P, Y)
• P - density p(y|x) - shape (n_instances, resolution**n_dim_y)

• Y - grid with with specified resolution - shape (resolution**n_dim_y, n_dim_y) or a copy of Y in case it was provided as argument

reset_fit()[source]

Resets all tensorflow objects and enables this model to be fitted anew

score(X, Y)

Computes the mean conditional log-likelihood of the provided data (X, Y)

Parameters
• X – numpy array to be conditioned on - shape: (n_query_samples, n_dim_x)

• Y – numpy array of y targets - shape: (n_query_samples, n_dim_y)

Returns

average log likelihood of data

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

Return type

self

skewness(x_cond, n_samples=1000000)

Skewness of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Skewness Skew[y|x] corresponding to x_cond - numpy array of shape (n_values, ndim_y, ndim_y)

std_(x_cond, n_samples=1000000)

Standard deviation of the fitted distribution conditioned on x_cond

Parameters

x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

Returns

Standard deviations sqrt(Var[y|x]) corresponding to x_cond - numpy array of shape (n_values, ndim_y)

tail_risk_measures(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR)

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile percentage of the distribution

• n_samples – number of samples for monte carlo model_fitting

Returns

• VaR values for each x to condition on - numpy array of shape (n_values)

• CVaR values for each x to condition on - numpy array of shape (n_values)

value_at_risk(x_cond, alpha=0.01, n_samples=1000000)

Computes the Value-at-Risk (VaR) of the fitted distribution. Only if ndim_y = 1

Parameters
• x_cond – different x values to condition on - numpy array of shape (n_values, ndim_x)

• alpha – quantile percentage of the distribution

Returns

VaR values for each x to condition on - numpy array of shape (n_values)

REZENDE2015

Rezende, Mohamed (2015). Variational Inference with Normalizing Flows (http://arxiv.org/abs/1505.05770)

TRIPPE2018

Trippe, Turner (2018). Conditional Density Estimation with Bayesian Normalising Flows (http://arxiv.org/abs/1802.04908)