Risk Analysis

kxy.asset_management.risk_analysis.information_adjusted_correlation(x, y=None, p=0)

Calculates the information-adjusted correlation matrix between two arrays.

Note

Pearson’s correlation coefficient quantifies only the linear association between two random variables: two random variables can be statistically dependent despite being uncorrelated. Such dependence typically materializes during tail events, the worst possible timing from a risk management perspective.
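As a minimal illustration of dependence without correlation, the sketch below draws a standard normal sample and squares it. The variable names and sample size are illustrative choices, not part of the kxy API.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = x ** 2  # y is a deterministic function of x, hence fully dependent on it

# Pearson's correlation misses the dependence because it is purely nonlinear.
pearson_xy = np.corrcoef(x, y)[0, 1]

# The same dependence becomes visible once x is transformed nonlinearly.
pearson_abs = np.corrcoef(np.abs(x), y)[0, 1]

print(f"Corr(x, y)   = {pearson_xy:.3f}")
print(f"Corr(|x|, y) = {pearson_abs:.3f}")
```

The first coefficient should be close to 0 even though y is entirely determined by x, while the second should be large.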

The mutual information rate between two scalar time series provides an alternative that fully captures linear and nonlinear, cross-sectional and temporal dependence:

\[I\left(\left\{x_t\right\}, \left\{y_t\right\}\right) = h\left( \left\{x_t\right\} \right) + h\left( \left\{y_t\right\} \right) - h\left(\left\{x_t, y_t\right\} \right)\]

where \(h\left( \left\{ x_t \right\} \right)\) is the entropy rate of the process \(\left\{ x_t \right\}\).

Specifically, the mutual information rate is 0 if and only if the two processes are statistically independent, and in particular exhibit no cross-sectional or temporal dependence, linear or nonlinear.

When \(\left\{ x_t, y_t \right\}\) is Gaussian, stationary and memoryless, for instance when \(\left(x_i, y_i \right)\) are assumed i.i.d. Gaussian, the mutual information rate reads

\[I\left(\left\{x_t\right\}, \left\{y_t\right\}\right) = -\frac{1}{2} \log \left(1- \text{Corr}\left(x_t, y_t\right)^2 \right).\]
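The Gaussian formula above can be evaluated directly. The following is a small sketch; `gaussian_mutual_information` is a hypothetical helper written for this illustration, not a kxy function, and the result is expressed in nats because the natural logarithm is used.

```python
import numpy as np

def gaussian_mutual_information(corr):
    """Mutual information rate (in nats) of i.i.d. jointly Gaussian pairs
    with Pearson correlation `corr` (hypothetical helper)."""
    corr = np.asarray(corr, dtype=float)
    # -1/2 * log(1 - corr^2), using log1p for accuracy near corr = 0.
    return -0.5 * np.log1p(-corr ** 2)

for rho in (0.0, 0.5, 0.9, 0.99):
    print(f"Corr = {rho:4.2f} -> I = {gaussian_mutual_information(rho):.4f} nats")
```

The mutual information rate is 0 at zero correlation and grows without bound as the absolute correlation approaches 1.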

We generalize this formula and define the information-adjusted correlation as the quantity \(\text{IACorr}\left( \left\{x_t\right\}, \left\{y_t\right\} \right)\) so that the mutual information rate always reads

\[I\left(\left\{x_t\right\}, \left\{y_t\right\}\right) = -\frac{1}{2} \log \left(1- \text{IACorr}\left( \left\{x_t\right\}, \left\{y_t\right\} \right)^2 \right),\]

whether or not the time series are jointly Gaussian and memoryless. Solving for the information-adjusted correlation gives

\[\text{IACorr}\left(\left\{x_t\right\}, \left\{y_t\right\}\right) := \text{sign}\left( \text{Corr}\left(x_., y_.\right) \right)\sqrt{1-e^{-2 I\left(\left\{x_t\right\}, \left\{y_t\right\}\right)}}\]

where \(\text{sign}(x)=1\) if \(x \geq 0\) and \(-1\) otherwise. Note that the information-adjusted correlation is 0 if and only if the two time series are statistically independent, and in particular exhibit no cross-sectional or temporal dependence.
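The definition above maps a mutual information rate and the sign of a Pearson correlation to an information-adjusted correlation. The sketch below applies that formula to values supplied by the caller; `ia_corr_from_mi` is a hypothetical name used only for illustration, and estimating the mutual information rate itself is outside its scope.

```python
import numpy as np

def ia_corr_from_mi(mi, pearson_corr):
    """Apply IACorr = sign(Corr) * sqrt(1 - exp(-2 I)) (hypothetical helper).

    mi           : mutual information rate in nats, assumed already estimated.
    pearson_corr : Pearson correlation, used only for its sign.
    """
    mi = np.asarray(mi, dtype=float)
    # sign(x) is 1 when x >= 0 and -1 otherwise, as in the definition.
    sign = np.where(np.asarray(pearson_corr) >= 0, 1.0, -1.0)
    # 1 - exp(-2 I) written with expm1 for accuracy when I is small.
    return sign * np.sqrt(-np.expm1(-2.0 * mi))

# Round trip in the i.i.d. Gaussian case: IACorr recovers the correlation.
rho = -0.6
mi_gaussian = -0.5 * np.log1p(-rho ** 2)
print(ia_corr_from_mi(mi_gaussian, rho))
```

In the i.i.d. Gaussian case the information-adjusted correlation coincides with Pearson's correlation, which is the consistency property the definition is built around.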

Parameters:
  • x ((n,) or (n, d) np.array) – n i.i.d. draws from a scalar or vector random variable.
  • y ((n,) or (n, q) np.array) – n i.i.d. draws from a scalar or vector random variable jointly sampled with x.
  • p (int) – The number of lags used to compute the Spearman rank auto-correlations that serve as empirical evidence in the maximum-entropy problem. The default value is 0, which corresponds to assuming rows are i.i.d. This is currently the only supported value.
Returns:

c – The information-adjusted correlation matrix between the two random variables.

Return type:

np.array

Raises:

AssertionError – If p is different from 0. Higher values will be supported later.

kxy.asset_management.risk_analysis.robust_pearson_corr(x, y=None, p=0, p_ic='hqic')

Computes a robust estimator of the Pearson correlation matrix between \(x\) and \(y\) (or \(x\) if \(y\) is None) as the Pearson correlation matrix that is equivalent to the sample Spearman correlation matrix, assuming \((x, y)\) is jointly Gaussian.

Parameters:
  • x ((n,) or (n, d) np.array) – n i.i.d. draws from a scalar or vector random variable.
  • y ((n,) or (n, q) np.array) – n i.i.d. draws from a scalar or vector random variable jointly sampled with x.
  • p (int or None) – The number of lags to use when generating Spearman rank auto-correlations. The default value is 0, which corresponds to assuming rows are i.i.d. When p is None, the number of lags is learned using p_ic.
  • p_ic (str) – The criterion used to learn the optimal value of p (by fitting a VAR(p) model) when p=None. Should be one of ‘hqic’ (Hannan-Quinn Information Criterion), ‘aic’ (Akaike Information Criterion), ‘bic’ (Bayesian Information Criterion) or ‘t-stat’ (based on last lag). Same as the ‘ic’ parameter of statsmodels.tsa.api.VAR.
Returns:

c – The robust Pearson correlation matrix between the two random variables.

Return type:

np.array
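For intuition on robust_pearson_corr: for a bivariate Gaussian, the Pearson correlation \(\rho\) and the Spearman rank correlation \(\rho_s\) are linked by the classical identity \(\rho = 2 \sin\left(\pi \rho_s / 6\right)\). The sketch below uses that identity in the i.i.d. case (p=0). It is an illustration under the assumption that this identity is the equivalence referred to above, not a reproduction of the kxy implementation, and the helper names are hypothetical.

```python
import numpy as np

def sample_spearman(x, y):
    """Spearman rank correlation of two 1-D samples without ties."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]

def spearman_to_pearson(spearman_corr):
    """Pearson correlation implied by a Spearman correlation under joint
    Gaussianity: rho = 2 * sin(pi * rho_s / 6)."""
    return 2.0 * np.sin(np.pi * np.asarray(spearman_corr, dtype=float) / 6.0)

rng = np.random.default_rng(0)
true_corr = 0.7
cov = [[1.0, true_corr], [true_corr, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=50_000)
x, y = z[:, 0].copy(), z[:, 1].copy()
y[:5] = 1e6  # a handful of gross outliers

naive = np.corrcoef(x, y)[0, 1]                       # distorted by the outliers
robust = spearman_to_pearson(sample_spearman(x, y))   # rank-based, barely affected

print(f"Sample Pearson:       {naive:.3f}")
print(f"Rank-based estimate:  {robust:.3f}")
```

Because ranks are insensitive to the magnitude of outliers, the rank-based estimate should stay near the true correlation of 0.7 while the sample Pearson coefficient is pulled toward 0.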