# Post-Learning

kxy.regression.post_learning.regression_bias(f_x, z, linear_scale=True, space='dual', categorical_encoding='two-split')

Quantifies the bias in a regression model as the mutual information between a categorical variable and the model's predictions.

Parameters:

- f_x ((n,) np.array) – The model decisions.
- z ((n,) np.array) – The associated variable through which the bias could arise.
- linear_scale (bool, default True) – Whether to return the bias on a linear scale, as $$1-e^{-2m}$$, rather than as the raw mutual information $$m$$.
- space (str, 'primal' | 'dual') – The space in which the maximum entropy problem is solved. When space='primal', the maximum entropy problem is solved in the original observation space, under Pearson covariance constraints, leading to the Gaussian copula. When space='dual', the maximum entropy problem is solved in the copula-uniform dual space, under Spearman rank correlation constraints.
- categorical_encoding (str, 'one-hot' | 'two-split' (default)) – The encoding method to use to represent categorical variables. See kxy.api.core.utils.one_hot_encoding and kxy.api.core.utils.two_split_encoding.

Returns:

b – The mutual information $$m$$, or $$1-e^{-2m}$$ if linear_scale=True.

Return type:

float

Theoretical Foundation

Section b) Quantifying Bias in Models.
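
A minimal usage sketch is shown below. The data is synthetic and purely illustrative; running it assumes the kxy package is installed and its backend (which may require an API key) is configured.

```python
import numpy as np

from kxy.regression.post_learning import regression_bias

# Synthetic illustration (hypothetical data): model predictions that
# partially leak a binary protected attribute z.
n = 1000
z = np.random.randint(0, 2, size=n)    # e.g. a gender flag
f_x = np.random.randn(n) + 0.5 * z     # predictions correlated with z

# Mutual information between z and the predictions, on a linear scale
# (i.e. 1 - exp(-2m) rather than m itself).
b = regression_bias(f_x, z, linear_scale=True)
print('Bias: %.4f' % b)
```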

kxy.regression.post_learning.regression_model_explanation_analysis(x_c, f_x, x_d=None, space='dual', categorical_encoding='two-split')

Runs the model explanation analysis on a trained regression model.

Parameters:

- x_c ((n,d) np.array) – Continuous inputs.
- f_x ((n,) np.array) – Labels predicted by the model, corresponding to inputs x_c and x_d.
- x_d ((n,d) np.array or None (default), optional) – Discrete inputs.
- space (str, 'primal' | 'dual') – The space in which the maximum entropy problem is solved. When space='primal', the maximum entropy problem is solved in the original observation space, under Pearson covariance constraints, leading to the Gaussian copula. When space='dual', the maximum entropy problem is solved in the copula-uniform dual space, under Spearman rank correlation constraints.
- categorical_encoding (str, 'one-hot' | 'two-split' (default)) – The encoding method to use to represent categorical variables. See kxy.api.core.utils.one_hot_encoding and kxy.api.core.utils.two_split_encoding.

Returns:

a – Dataframe with columns:

- 'Variable': The variable index, starting from 0 at the leftmost column of x_c and ending at the rightmost column of x_d.
- 'Selection Order': The order in which the associated variable was selected, starting at 1 for the most important variable.
- 'Univariate Explained R^2': The $$R^2$$ between predicted labels and this variable alone.
- 'Running Explained R^2': The $$R^2$$ between predicted labels and all variables selected so far, including this one.
- 'Marginal Explained R^2': The increase in $$R^2$$ between predicted labels and all variables selected so far that is due to adding this variable to the selection scheme.

Return type:

pandas.DataFrame

Theoretical Foundation

Section a) Model Explanation.

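A minimal usage sketch with synthetic inputs follows; the column names used below are those listed in the return description, and a configured kxy backend is assumed.

```python
import numpy as np

from kxy.regression.post_learning import regression_model_explanation_analysis

# Synthetic illustration: three continuous inputs, and predictions from a
# hypothetical trained model that ignores the third input.
n, d = 1000, 3
x_c = np.random.randn(n, d)
f_x = x_c @ np.array([1.0, 0.5, 0.0]) + 0.1 * np.random.randn(n)

# Rank the inputs by how much of the model's decisions they explain.
a = regression_model_explanation_analysis(x_c, f_x)
print(a[['Variable', 'Selection Order', 'Running Explained R^2']])
```
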
kxy.regression.post_learning.regression_model_improvability_analysis(x_c, y_p, y, x_d=None, space='dual', categorical_encoding='two-split')

Runs the model improvability analysis on a trained regression model.

Parameters:

- x_c ((n,d) np.array) – Continuous inputs.
- y_p ((n,) np.array) – Labels predicted by the model, corresponding to inputs x_c and x_d.
- y ((n,) np.array) – True labels.
- x_d ((n,d) np.array or None (default), optional) – Discrete inputs.
- space (str, 'primal' | 'dual') – The space in which the maximum entropy problem is solved. When space='primal', the maximum entropy problem is solved in the original observation space, under Pearson covariance constraints, leading to the Gaussian copula. When space='dual', the maximum entropy problem is solved in the copula-uniform dual space, under Spearman rank correlation constraints.
- categorical_encoding (str, 'one-hot' | 'two-split' (default)) – The encoding method to use to represent categorical variables. See kxy.api.core.utils.one_hot_encoding and kxy.api.core.utils.two_split_encoding.

Returns:

a – Dataframe with columns:

- 'Leftover R^2': The amount by which the trained model's $$R^2$$ can still be improved without resorting to additional inputs, simply through better modeling.
- 'Leftover Log-Likelihood Per Sample': The amount by which the trained model's true log-likelihood per sample can still be increased without resorting to additional inputs, simply through better modeling.

Return type:

pandas.DataFrame

Theoretical Foundation

Section 3 - Model Improvability.
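
A minimal usage sketch (synthetic data; a configured kxy backend is assumed): a deliberately misspecified linear model is fit to a nonlinear target, so some $$R^2$$ should be left on the table.

```python
import numpy as np

from kxy.regression.post_learning import regression_model_improvability_analysis

# Synthetic illustration: a nonlinear target, fit with a plain linear model.
n, d = 1000, 2
x_c = np.random.randn(n, d)
y = x_c[:, 0] ** 2 + x_c[:, 1] + 0.1 * np.random.randn(n)

# Ordinary least-squares predictions (no intercept, for brevity).
beta = np.linalg.lstsq(x_c, y, rcond=None)[0]
y_p = x_c @ beta

# How much R^2 could a better model squeeze out of the same inputs?
a = regression_model_improvability_analysis(x_c, y_p, y)
print(a['Leftover R^2'])
```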