Misc¶

kxy.api.core.utils.
empirical_copula_uniform
(x)¶ Evaluate the empirical copula-uniform dual representation of x as rank(x)/n.
 Parameters
x ((n, d) np.array) – n i.i.d. draws from a d-dimensional distribution.
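The rank(x)/n transformation can be sketched with NumPy alone. This is only an illustration under stated assumptions, not the library's implementation: the name `empirical_copula_uniform_sketch` is hypothetical, ranks are taken column by column and start at 1, and ties are broken by order of appearance.

```python
import numpy as np


def empirical_copula_uniform_sketch(x):
    """Hypothetical stand-in: column-wise ranks divided by the sample size n."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # argsort applied twice gives 0-based ordinal ranks per column; add 1 for 1-based ranks.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / n


x = np.array([[0.3, 10.0],
              [0.1, 30.0],
              [0.2, 20.0]])
u = empirical_copula_uniform_sketch(x)  # every entry lies in (0, 1]
```

Each column of `u` depends only on the ordering of the corresponding column of `x`, so any increasing transformation of a column leaves the result unchanged.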

kxy.api.core.utils.
get_x_data
(x_c, x_d, n_outputs, categorical_encoding='two-split', non_monotonic_extension=True, space='dual')¶

kxy.api.core.utils.
get_y_data
(y_c, y_d, categorical_encoding='two-split')¶

kxy.api.core.utils.
hqi
(h, q)¶ Computes \(\bar{h}_q^{-1}(x)\) where
\[\bar{h}_q(a) = -a \log a - (1-a) \log \left(\frac{1-a}{q-1}\right), ~~~~ a \geq \frac{1}{q}.\]
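On \([1/q, 1]\) the function \(\bar{h}_q\) decreases from \(\log q\) at \(a = 1/q\) to \(0\) at \(a = 1\), so its inverse can be found by bisection. The sketch below assumes natural logarithms and \(q \geq 2\); the names `h_bar` and `h_bar_inverse` are hypothetical, and the library may invert the function differently.

```python
import math


def h_bar(a, q):
    """Evaluate h_bar_q(a) for 1/q <= a <= 1, with the convention 0 * log(0) = 0."""
    if a >= 1.0:
        return 0.0
    return -a * math.log(a) - (1.0 - a) * math.log((1.0 - a) / (q - 1.0))


def h_bar_inverse(h, q, tol=1e-12):
    """Bisection sketch of the inverse of h_bar_q on [1/q, 1] (hypothetical helper)."""
    if not 0.0 <= h <= math.log(q):
        raise ValueError("h must lie in [0, log(q)]")
    lo, hi = 1.0 / q, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # h_bar_q is decreasing here, so a value that is too large means a is too small.
        if h_bar(mid, q) > h:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a = h_bar_inverse(0.5, 4)
```

A quick check is the round trip: `h_bar(a, 4)` should recover the input `0.5` up to the tolerance.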

kxy.api.core.utils.
one_hot_encoding
(x)¶ Computes the one-hot encoding representation of the input array.
The representation used is the one where all distinct inputs are first converted to strings, then sorted using Python's built-in sorted function. The \(i\)-th element in this sort has its \(i\)-th bit from the right set to 1 and all others to 0.
 Parameters
x ((n,) or (n, d) np.array) – Array of n inputs (typically strings) to encode and that take q distinct values.
 Returns
Binary array representing the one-hot encoding representation of the inputs.
 Return type
(n, q) np.array
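The bit layout described above can be sketched as follows. This is an illustration only: `one_hot_sketch` is a hypothetical name, it handles one-dimensional input only, and it reads "the \(i\)-th bit from the right" as column \(q - i\) when columns are counted from 0 on the left.

```python
import numpy as np


def one_hot_sketch(x):
    """Hypothetical one-hot encoder following the described bit layout (1-D input only)."""
    labels = [str(v) for v in np.asarray(x).ravel()]
    categories = sorted(set(labels))  # distinct values, ordered by string representation
    q = len(categories)
    position = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(labels), q), dtype=int)
    for row, label in enumerate(labels):
        # The first category in sorted order sets the rightmost column, and so on.
        out[row, q - 1 - position[label]] = 1
    return out


codes = one_hot_sketch(np.array(["b", "a", "c", "a"]))
```

With three distinct categories the result has shape (4, 3), and each row contains exactly one 1.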

kxy.api.core.utils.
pearson_corr
(x)¶ Calculate the Pearson correlation matrix, ignoring nans.
 Parameters
x ((n, d) np.array) – Input data representing n i.i.d. draws from the d-dimensional random variable, whose Pearson correlation matrix this function calculates.
 Returns
corr – The Pearson correlation matrix.
 Return type
np.array

kxy.api.core.utils.
prepare_data_for_mutual_info_analysis
(x_c, x_d, y_c, y_d, non_monotonic_extension=True, categorical_encoding='two-split', space='dual')¶

kxy.api.core.utils.
prepare_test_data_for_prediction
(test_x_c, test_x_d, train_x_c, train_x_d, train_y_c, train_y_d, non_monotonic_extension=True, categorical_encoding='two-split', space='dual')¶

kxy.api.core.utils.
robust_log_det
(c)¶ Computes the logarithm of the determinant of a positive definite matrix in a fashion that is more robust to ill-conditioning than taking the logarithm of np.linalg.det.
Note
Specifically, we compute the SVD of c and return the sum of the logarithms of its singular values, which coincide with its eigenvalues when c is positive definite. np.linalg.det, on the other hand, relies on an LU factorization of c and forms the product of the diagonal elements of the factor, which can underflow when those elements are small.
 Parameters
c ((d, d) np.array) – Square input matrix for computing the log-determinant.
 Returns
d – Log-determinant of the input matrix.
 Return type
float
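The underflow concern can be illustrated with a small NumPy sketch. The helper name `log_det_via_svd` is hypothetical and only mirrors the approach described in the note; it is not the library's code.

```python
import numpy as np


def log_det_via_svd(c):
    """Sum of log singular values; equals log det(c) for a positive definite c."""
    singular_values = np.linalg.svd(c, compute_uv=False)
    return float(np.sum(np.log(singular_values)))


d = 400
c = 1e-3 * np.eye(d)  # positive definite, with determinant 1e-1200

naive = np.linalg.det(c)     # the determinant underflows to 0.0 in double precision
robust = log_det_via_svd(c)  # finite, close to d * log(1e-3)
```

Taking `np.log(naive)` here would give `-inf`, while the sum of logarithms stays finite. NumPy's `np.linalg.slogdet` is another way to avoid the same underflow.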

kxy.api.core.utils.
spearman_corr
(x)¶ Calculate the Spearman rank correlation matrix, ignoring nans.
 Parameters
x ((n, d) np.array) – Input data representing n i.i.d. draws from the d-dimensional random variable, whose Spearman rank correlation matrix this function calculates.
 Returns
corr – The Spearman rank correlation matrix.
 Return type
np.array

kxy.api.core.utils.
two_split_encoding
(x)¶ Also known as binary encoding, the two-split encoding method turns categorical data taking \(q\) distinct values into the ordinal data \(1, \dots, q\), and then generates the binary representation of the ordinal data.
This encoding is more economical than one-hot encoding. Unlike the one-hot encoding method, which requires as many columns as the number of distinct inputs, two-split encoding only requires \(\left\lceil \log_2 q \right\rceil\) columns.
Every bit in the encoding splits the set of all \(q\) possible categories into 2 subsets of equal size (when \(q\) is a power of 2), and the value of the bit determines which subset the category of interest belongs to. Each bit generates a different partitioning of the set of all distinct categories.
The ordinal value assigned to a categorical value is the order of its string representation among all \(q\) distinct categorical values in the array.
In other words, while a bit in one-hot encoding determines whether the category is equal to a specific value, a bit in the two-split encoding determines whether the category belongs to one of two subsets of equal size.
 Parameters
x ((n,) or (n, d) np.array) – Array of n inputs (typically strings) to encode and that take q distinct values.
 Returns
Binary array representing the two-split encoding representation of the inputs.
 Return type
(n, q) np.array
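A minimal sketch of the binary-encoding idea, under assumptions the text does not settle: `two_split_sketch` is a hypothetical name, it handles one-dimensional input only, ordinals are counted from 0 (so that \(\lceil \log_2 q \rceil\) bits suffice), and the most significant bit comes first.

```python
import numpy as np


def two_split_sketch(x):
    """Hypothetical binary encoder: string-sorted ordinal, then its bits (1-D input only)."""
    labels = [str(v) for v in np.asarray(x).ravel()]
    categories = sorted(set(labels))  # distinct values, ordered by string representation
    q = len(categories)
    # (q - 1).bit_length() equals ceil(log2(q)) for q >= 1; keep at least one column.
    n_bits = max(1, (q - 1).bit_length())
    position = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(labels), n_bits), dtype=int)
    for row, label in enumerate(labels):
        code = position[label]  # zero-based ordinal (an assumption of this sketch)
        for bit in range(n_bits):
            out[row, n_bits - 1 - bit] = (code >> bit) & 1
    return out


codes = two_split_sketch(np.array(["d", "a", "c", "b", "a"]))
```

With four distinct categories the sketch produces two columns, whereas a one-hot encoding of the same data needs four. Each column splits the categories into two groups of two.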