Model Improvability
Estimation of the amount by which the performance of a trained supervised learning model can be improved, in either a model-driven or a data-driven fashion.
Model-Driven Improvability

kxy.post_learning.improvability.model_driven_improvability(data_df, target_column, prediction_column, problem_type)
Estimate the extent to which a trained supervised learner may be improved in a model-driven fashion (i.e. without resorting to additional explanatory variables).
 Parameters
data_df (pandas.DataFrame) – The pandas DataFrame containing the data.
target_column (str) – The name of the column containing true labels.
prediction_column (str) – The name of the column containing model predictions.
problem_type (None | 'classification' | 'regression') – The type of supervised learning problem. When None, it is inferred from whether or not target_column is categorical.
 Returns
result – The result is a pandas.DataFrame with columns (where applicable):
'Lost Accuracy': The amount of classification accuracy that was irreversibly lost when training the supervised learner.
'Lost RSquared': The amount of \(R^2\) that was irreversibly lost when training the supervised learner.
'Lost RMSE': The amount of Root Mean Square Error that was irreversibly lost when training the supervised learner.
'Lost LogLikelihood Per Sample': The amount of true log-likelihood per sample that was irreversibly lost when training the supervised learner.
'Residual RSquared': For regression problems, the highest \(R^2\) that may be achieved when using explanatory variables to predict regression residuals.
'Residual RMSE': For regression problems, the lowest Root Mean Square Error that may be achieved when using explanatory variables to predict regression residuals.
'Residual LogLikelihood Per Sample': For regression problems, the highest log-likelihood per sample that may be achieved when using explanatory variables to predict regression residuals.
 Return type
pandas.DataFrame
Theoretical Foundation
Section 3 - Model Improvability.
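The following is a minimal sketch of how the function above might be called on a regression problem, using the signature documented here. The toy data, the column names ('x1', 'x2', 'y', 'y_pred') and the helper names are hypothetical and chosen for illustration only. It also assumes that the columns of data_df other than the target and prediction columns are treated as explanatory variables, which this page does not state explicitly.

```python
import numpy as np
import pandas as pd


def make_regression_frame(n_rows=500, seed=0):
    """Build a toy DataFrame holding explanatory variables, true labels and predictions."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n_rows)
    x2 = rng.normal(size=n_rows)
    # True labels depend on both explanatory variables, plus a little noise.
    y = 2.0 * x1 - x2 + 0.1 * rng.normal(size=n_rows)
    # Stand-in for a trained model: a deliberately weak predictor that ignores x2.
    y_pred = 2.0 * x1
    return pd.DataFrame({'x1': x1, 'x2': x2, 'y': y, 'y_pred': y_pred})


def estimate_model_driven_improvability(data_df):
    """Hypothetical wrapper around the documented function."""
    # Imported lazily so that the data preparation above runs without kxy installed.
    from kxy.post_learning.improvability import model_driven_improvability
    # Arguments are passed positionally, in the order given in the signature above.
    return model_driven_improvability(data_df, 'y', 'y_pred', 'regression')


reg_df = make_regression_frame()
```

Calling `estimate_model_driven_improvability(reg_df)` requires the kxy package to be installed and configured. For a regression problem, the returned DataFrame would be expected to contain the regression-related columns listed under Returns.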
Data-Driven Improvability

kxy.post_learning.improvability.data_driven_improvability(data_df, target_column, new_variables, problem_type)
Estimate the potential performance boost that a set of new explanatory variables can bring about.
 Parameters
data_df (pandas.DataFrame) – The pandas DataFrame containing the data.
target_column (str) – The name of the column containing true labels.
new_variables (list) – The names of the columns to use as new explanatory variables.
problem_type (None | 'classification' | 'regression') – The type of supervised learning problem. When None, it is inferred from whether or not target_column is categorical.
 Returns
result – The result is a pandas.DataFrame with columns (where applicable):
'Accuracy Boost': The classification accuracy boost that the new explanatory variables can bring about.
'RSquared Boost': The \(R^2\) boost that the new explanatory variables can bring about.
'RMSE Reduction': The reduction in Root Mean Square Error that the new explanatory variables can bring about.
'LogLikelihood Per Sample Boost': The boost in log-likelihood per sample that the new explanatory variables can bring about.
 Return type
pandas.DataFrame
Theoretical Foundation
Section 3 - Model Improvability.
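As a rough illustration of the data-driven case, the sketch below prepares a toy classification DataFrame and wraps a call that follows the signature documented above. The column names ('x_old', 'x_new', 'label') and the helper names are hypothetical. The sketch also assumes that columns not listed in new_variables, other than the target, play the role of existing explanatory variables, which this page does not confirm.

```python
import numpy as np
import pandas as pd


def make_classification_frame(n_rows=500, seed=0):
    """Build a toy DataFrame with an existing variable, a candidate new variable and labels."""
    rng = np.random.default_rng(seed)
    x_old = rng.normal(size=n_rows)
    x_new = rng.normal(size=n_rows)
    # The label depends on both variables, so x_new carries extra information.
    labels = np.where(x_old + x_new > 0, 'positive', 'negative')
    return pd.DataFrame({'x_old': x_old, 'x_new': x_new, 'label': labels})


def estimate_data_driven_improvability(data_df):
    """Hypothetical wrapper around the documented function."""
    # Imported lazily so that the data preparation above runs without kxy installed.
    from kxy.post_learning.improvability import data_driven_improvability
    # new_variables is a list of column names, as described under Parameters.
    return data_driven_improvability(data_df, 'label', ['x_new'], 'classification')


clf_df = make_classification_frame()
```

Calling `estimate_data_driven_improvability(clf_df)` requires the kxy package to be installed and configured. For a classification problem, 'Accuracy Boost' would be among the columns expected in the result.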