Cheatsheet

Imports

import pandas as pd
import kxy

From now on, df refers to a pandas DataFrame and y_column is the column of df to be used as the target. All columns in df except y_column are treated as explanatory variables. problem_type is a variable taking the value 'regression' for regression problems and 'classification' for classification problems.
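To make the convention above concrete, here is a minimal sketch of how the explanatory variables follow from the column names and the target. The helper explanatory_columns and the sample column names are hypothetical illustrations, not part of the kxy package; the helper works on a plain list of names such as list(df.columns).

```python
def explanatory_columns(columns, y_column):
    """Return every column name except the target, preserving order.

    Hypothetical helper for illustration only; not part of kxy.
    """
    if y_column not in columns:
        raise ValueError(f"target column {y_column!r} not found")
    return [c for c in columns if c != y_column]


# Made-up column names; with a real dataframe, use list(df.columns).
columns = ["age", "income", "defaulted"]
y_column = "defaulted"
problem_type = "classification"

features = explanatory_columns(columns, y_column)
print(features)
```

Under this convention, any column you do not want treated as an explanatory variable should be dropped from df before calling the methods below.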

Data Valuation

df.kxy.data_valuation(y_column, problem_type=problem_type)

By default, your data is transmitted to our backend in the clear. To anonymize your data before performing data valuation, set anonymize=True.

df.kxy.data_valuation(y_column, problem_type=problem_type, anonymize=True) # Data valuation using anonymized data.

Automatic (Model-Free) Feature Selection

df.kxy.variable_selection(y_column, problem_type=problem_type)

By default, your data is transmitted to our backend in the clear. To anonymize your data before performing automatic feature selection, set anonymize=True.

df.kxy.variable_selection(y_column, problem_type=problem_type, anonymize=True) # Variable selection using anonymized data.

Model-Driven Improvability

For the model-driven improvability analysis, predictions made by the production model should be contained in a column of df. The variable prediction_column refers to that column. All columns in df except y_column and prediction_column are considered to be the explanatory variables/features used to train the production model.

anonymize = False # Set to True to anonymize your data before model-driven improvability
df.kxy.model_driven_improvability(y_column, prediction_column, problem_type=problem_type, anonymize=anonymize)
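The column roles described for the model-driven analysis can be sketched as follows. The helper model_features and the sample column names are assumptions made for illustration and are not part of kxy; the sketch only shows which columns end up being treated as the production model's features.

```python
def model_features(columns, y_column, prediction_column):
    """Return the columns treated as features of the production model.

    Hypothetical helper for illustration only; not part of kxy.
    Everything except the target and the prediction column is a feature.
    """
    excluded = {y_column, prediction_column}
    missing = excluded - set(columns)
    if missing:
        raise ValueError(f"columns not found: {sorted(missing)}")
    return [c for c in columns if c not in excluded]


# Made-up column names; with a real dataframe, use list(df.columns).
columns = ["sqft", "rooms", "price", "predicted_price"]
print(model_features(columns, "price", "predicted_price"))
```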

Data-Driven Improvability

For the data-driven improvability analysis, the list of columns representing new features/explanatory variables to consider (new_variables) should be provided. All columns in df that are neither y_column nor contained in new_variables are assumed to be the explanatory variables/features used to train the production model.

anonymize = False # Set to True to anonymize your data before data-driven improvability
df.kxy.data_driven_improvability(y_column, new_variables, problem_type=problem_type, anonymize=anonymize)
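The split between existing and new variables in the data-driven analysis can be sketched as below. The helper split_variables and the sample column names are hypothetical and not part of kxy; the sketch only mirrors the rule that columns outside y_column and new_variables are assumed to be the features the production model was trained on.

```python
def split_variables(columns, y_column, new_variables):
    """Split column names into (existing features, new features).

    Hypothetical helper for illustration only; not part of kxy.
    """
    new = set(new_variables)
    if y_column in new:
        raise ValueError("y_column must not appear in new_variables")
    unknown = new - set(columns)
    if unknown:
        raise ValueError(f"new variables not found: {sorted(unknown)}")
    existing = [c for c in columns if c != y_column and c not in new]
    added = [c for c in columns if c in new]
    return existing, added


# Made-up column names; with a real dataframe, use list(df.columns).
columns = ["sqft", "rooms", "school_score", "price"]
existing, added = split_variables(columns, "price", ["school_score"])
print(existing, added)
```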