Post-Learning

kxy.regression.post_learning.regression_bias(f_x, z, linear_scale=True, space='dual', categorical_encoding='twosplit')
Quantifies the bias in a regression model as the mutual information between a category variable and model predictions.
Parameters:  f_x ((n,) np.array) – The model decisions.
 z ((n,) np.array) – The associated variable through which the bias could arise.
 categorical_encoding (str, 'onehot' | 'twosplit' (default)) – The encoding method to use to represent categorical variables. See kxy.api.core.utils.one_hot_encoding and kxy.api.core.utils.two_split_encoding.
Returns: b – The mutual information \(m\), or \(1 - e^{-2m}\) if linear_scale=True.
Return type: float
Theoretical Foundation
Section b) Quantifying Bias in Models.
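To make the linear-scale return value concrete: for a bivariate Gaussian pair with correlation \(\rho\), the mutual information is \(m = -\frac{1}{2}\ln(1 - \rho^2)\), so \(1 - e^{-2m}\) recovers \(\rho^2\). The sketch below only illustrates that mapping. The helper names to_linear_scale and gaussian_mutual_information are hypothetical and are not part of kxy, and the code does not reproduce how regression_bias estimates \(m\) from data.

```python
import numpy as np


def to_linear_scale(m):
    """Map a mutual information m (in nats) to the [0, 1) scale 1 - exp(-2m).

    Hypothetical helper for illustration only; not part of the kxy API.
    """
    return 1.0 - np.exp(-2.0 * np.asarray(m, dtype=float))


def gaussian_mutual_information(rho):
    """Mutual information (nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * np.log(1.0 - rho ** 2)


rho = 0.6
m = gaussian_mutual_information(rho)
b = to_linear_scale(m)  # equals rho ** 2 up to floating-point error
```

On this scale a value of 0 means the predictions carry no information about z, and values approaching 1 mean the predictions are almost fully determined by z.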

kxy.regression.post_learning.regression_model_explanation_analysis(x_c, f_x, x_d=None, space='dual', categorical_encoding='twosplit')
Runs the model explanation analysis on a trained regression model.
Parameters:  x_c ((n,d) np.array) – Continuous inputs.
 f_x ((n,) np.array) – Labels predicted by the model and corresponding to inputs x_c and x_d.
 x_d ((n, d) np.array or None (default), optional) – Discrete inputs.
 space (str, 'primal' | 'dual') – The space in which the maximum entropy problem is solved. When space='primal', the maximum entropy problem is solved in the original observation space, under Pearson covariance constraints, leading to the Gaussian copula. When space='dual', the maximum entropy problem is solved in the copula-uniform dual space, under Spearman rank correlation constraints.
 categorical_encoding (str, 'onehot' | 'twosplit' (default)) – The encoding method to use to represent categorical variables. See kxy.api.core.utils.one_hot_encoding and kxy.api.core.utils.two_split_encoding.
Returns:  a (pandas.DataFrame) – Dataframe with columns:
 'Variable': The variable index, starting from 0 at the leftmost column of x_c and ending at the rightmost column of x_d.
 'Selection Order': The order in which the associated variable was selected, starting at 1 for the most important variable.
 'Univariate Explained R^2': The \(R^2\) between predicted labels and this variable.
 'Running Explained R^2': The \(R^2\) between predicted labels and all variables selected so far, including this one.
 'Marginal Explained R^2': The increase in \(R^2\) between predicted labels and all variables selected so far that is due to adding this variable in the selection scheme.
Theoretical Foundation
Section a) Model Explanation.
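The relationship between the returned columns can be illustrated with a small greedy forward-selection sketch. This is a stand-in under stated assumptions: it scores variables with the \(R^2\) of an ordinary least-squares fit, not with the maximum-entropy estimator that kxy uses, and the function names linear_r2 and greedy_explanation_table are hypothetical. It is meant only to show how 'Selection Order', 'Running Explained R^2' and 'Marginal Explained R^2' relate to one another.

```python
import numpy as np
import pandas as pd


def linear_r2(x, y):
    """R^2 of an ordinary least-squares fit of y on the columns of x (with intercept)."""
    design = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return 1.0 - residual.var() / y.var()


def greedy_explanation_table(x, f_x):
    """Greedy forward selection producing the columns described above.

    Stand-in sketch: uses linear R^2, NOT kxy's maximum-entropy estimator.
    """
    n_vars = x.shape[1]
    selected, rows, running = [], [], 0.0
    while len(selected) < n_vars:
        remaining = [j for j in range(n_vars) if j not in selected]
        # Score each candidate by the R^2 reached when it joins the selected set.
        scores = {j: linear_r2(x[:, selected + [j]], f_x) for j in remaining}
        best = max(scores, key=scores.get)
        rows.append({
            'Variable': best,
            'Selection Order': len(selected) + 1,
            'Univariate Explained R^2': linear_r2(x[:, [best]], f_x),
            'Running Explained R^2': scores[best],
            'Marginal Explained R^2': scores[best] - running,
        })
        running = scores[best]
        selected.append(best)
    return pd.DataFrame(rows)


rng = np.random.default_rng(0)
x_c = rng.normal(size=(500, 3))
# Synthetic "model predictions": column 1 dominates, column 2 is irrelevant.
f_x = 0.5 * x_c[:, 0] + 2.0 * x_c[:, 1] + 0.1 * rng.normal(size=500)
table = greedy_explanation_table(x_c, f_x)
```

By construction, the cumulative sum of 'Marginal Explained R^2' down the table equals 'Running Explained R^2', which is the reading the column descriptions suggest for the kxy output as well.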

kxy.regression.post_learning.regression_model_improvability_analysis(x_c, y_p, y, x_d=None, space='dual', categorical_encoding='twosplit')
Runs the model improvability analysis on a trained regression model.
Parameters:  x_c ((n,d) np.array) – Continuous inputs.
 y_p ((n,) np.array) – Labels predicted by the model and corresponding to inputs x_c and x_d.
 y ((n,) np.array) – True labels.
 x_d ((n, d) np.array or None (default), optional) – Discrete inputs.
 space (str, 'primal' | 'dual') – The space in which the maximum entropy problem is solved. When space='primal', the maximum entropy problem is solved in the original observation space, under Pearson covariance constraints, leading to the Gaussian copula. When space='dual', the maximum entropy problem is solved in the copula-uniform dual space, under Spearman rank correlation constraints.
 categorical_encoding (str, 'onehot' | 'twosplit' (default)) – The encoding method to use to represent categorical variables. See kxy.api.core.utils.one_hot_encoding and kxy.api.core.utils.two_split_encoding.
Returns: a – Dataframe with columns:
 'Leftover R^2': The amount by which the trained model’s \(R^2\) can still be improved without resorting to additional inputs, simply through better modeling.
 'Leftover LogLikelihood Per Sample': The amount by which the trained model’s true log-likelihood per sample can still be increased without resorting to additional inputs, simply through better modeling.
Return type: pandas.DataFrame
Theoretical Foundation
Section 3 - Model Improvability.
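The space parameter described above distinguishes Pearson covariance constraints ('primal') from Spearman rank correlation constraints ('dual'). The sketch below shows the practical difference between the two statistics using NumPy only. It does not solve any maximum entropy problem and does not call kxy; the helper names pearson_corr, ranks and spearman_corr are hypothetical.

```python
import numpy as np


def pearson_corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


def ranks(a):
    """Ranks 0..n-1 of the entries of a (assumes no ties, as with continuous data)."""
    return np.argsort(np.argsort(a))


def spearman_corr(a, b):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson_corr(ranks(a), ranks(b))


rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + 0.5 * rng.normal(size=1000)

# A strictly increasing transform changes Pearson but leaves the ranks,
# and therefore Spearman, untouched.
y_warped = np.exp(3.0 * y)

pearson_before, pearson_after = pearson_corr(x, y), pearson_corr(x, y_warped)
spearman_before, spearman_after = spearman_corr(x, y), spearman_corr(x, y_warped)
```

Because rank correlations are unchanged by monotone rescaling of individual variables, constraints expressed in the dual space do not depend on the marginal distributions of the inputs, whereas Pearson-based constraints do.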