Model Improvability¶
Estimation of the amount by which the performance of a trained supervised learning model can be increased, either in a model-driven fashion or in a data-driven fashion.
Model-Driven Improvability¶
- kxy.post_learning.improvability.model_driven_improvability(data_df, target_column, prediction_column, problem_type, snr='auto', file_name=None)¶
Estimate the extent to which a trained supervised learner may be improved in a model-driven fashion (i.e. without resorting to additional explanatory variables). See the usage sketch after this entry.
- Parameters
data_df (pandas.DataFrame) – The pandas DataFrame containing the data.
target_column (str) – The name of the column containing true labels.
prediction_column (str) – The name of the column containing model predictions.
problem_type (None | 'classification' | 'regression') – The type of supervised learning problem. When None, it is inferred from whether or not target_column is categorical.
file_name (None | str) – A unique identifier characterizing data_df in the form of a file name. Do not set this unless you know why.
- Returns
result – The result is a pandas.DataFrame with columns (where applicable):
'Lost Accuracy': The amount of classification accuracy that was irreversibly lost when training the supervised learner.
'Lost R-Squared': The amount of \(R^2\) that was irreversibly lost when training the supervised learner.
'Lost RMSE': The amount of Root Mean Square Error that was irreversibly lost when training the supervised learner.
'Lost Log-Likelihood Per Sample': The amount of true log-likelihood per sample that was irreversibly lost when training the supervised learner.
'Residual R-Squared': For regression problems, the highest \(R^2\) that may be achieved when using explanatory variables to predict regression residuals.
'Residual RMSE': For regression problems, the lowest Root Mean Square Error that may be achieved when using explanatory variables to predict regression residuals.
'Residual Log-Likelihood Per Sample': For regression problems, the highest log-likelihood per sample that may be achieved when using explanatory variables to predict regression residuals.
- Return type
pandas.DataFrame
Theoretical Foundation
Section 3 - Model Improvability.
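The following is a minimal usage sketch, not taken from the official documentation: the synthetic data, the scikit-learn linear model, and the column names 'y' and 'y_pred' are illustrative assumptions; only the call to model_driven_improvability follows the signature documented above. Depending on your setup, the kxy package may need to be configured (e.g. with an API key) before the call runs.

```python
# Usage sketch (illustrative). Assumes numpy, pandas, scikit-learn and kxy are
# installed, and that kxy is configured if your setup requires it.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from kxy.post_learning.improvability import model_driven_improvability

# Synthetic regression problem with a non-linear relationship.
rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 2))
y = x[:, 0] + 0.5 * x[:, 1] ** 2 + 0.1 * rng.normal(size=1000)

# Deliberately mis-specified (linear) model, so there is room for
# model-driven improvement.
model = LinearRegression().fit(x, y)

df = pd.DataFrame(x, columns=['x_1', 'x_2'])
df['y'] = y                      # true target values
df['y_pred'] = model.predict(x)  # trained model's predictions

# Estimate how much performance was irreversibly lost by this model and how
# much could be recovered without adding new explanatory variables.
result = model_driven_improvability(
    data_df=df,
    target_column='y',
    prediction_column='y_pred',
    problem_type='regression',
)
print(result)
```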
Data-Driven Improvability¶
- kxy.post_learning.improvability.data_driven_improvability(data_df, target_column, new_variables, problem_type, snr='auto', file_name=None)¶
Estimate the potential performance boost that a set of new explanatory variables can bring about. See the usage sketch after this entry.
- Parameters
data_df (pandas.DataFrame) – The pandas DataFrame containing the data.
target_column (str) – The name of the column containing true labels.
new_variables (list) – The names of the columns to use as new explanatory variables.
problem_type (None | 'classification' | 'regression') – The type of supervised learning problem. When None, it is inferred from whether or not target_column is categorical.
file_name (None | str) – A unique identifier characterizing data_df in the form of a file name. Do not set this unless you know why.
- Returns
result – The result is a pandas.DataFrame with columns (where applicable):
'Accuracy Boost': The classification accuracy boost that the new explanatory variables can bring about.
'R-Squared Boost': The \(R^2\) boost that the new explanatory variables can bring about.
'RMSE Reduction': The reduction in Root Mean Square Error that the new explanatory variables can bring about.
'Log-Likelihood Per Sample Boost': The boost in log-likelihood per sample that the new explanatory variables can bring about.
- Return type
pandas.DataFrame
Theoretical Foundation
Section 3 - Model Improvability.
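The following is a minimal usage sketch, not taken from the official documentation: the synthetic data and the column names 'x_old', 'x_new' and 'y' are illustrative assumptions; only the call to data_driven_improvability follows the signature documented above. As before, the kxy package may need to be configured (e.g. with an API key) before the call runs.

```python
# Usage sketch (illustrative). Assumes numpy, pandas and kxy are installed,
# and that kxy is configured if your setup requires it.
import numpy as np
import pandas as pd
from kxy.post_learning.improvability import data_driven_improvability

# Synthetic binary classification problem. 'x_new' is a candidate new
# explanatory variable whose incremental value we want to quantify.
rng = np.random.default_rng(0)
x_old = rng.normal(size=1000)
x_new = rng.normal(size=1000)
y = (x_old + x_new > 0).astype(int)

df = pd.DataFrame({'x_old': x_old, 'x_new': x_new, 'y': y})

# Estimate the performance boost achievable by adding 'x_new' on top of the
# explanatory variables already present in df.
result = data_driven_improvability(
    data_df=df,
    target_column='y',
    new_variables=['x_new'],
    problem_type='classification',
)
print(result)
```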