Your Data

How We Use Your Data

To run our analyses, the KXY backend needs your data. The methods below are the only ones through which your data is shared with us, and the kxy package only uploads your data if and when it is needed.

kxy.api.data_transfer.generate_upload_url(file_name)

Requests a pre-signed URL to upload a dataset.

Parameters

file_name (str) – A string that uniquely identifies the content of the file.

Returns

d – The dictionary containing the pre-signed URL.

Return type

dict or None
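generate_upload_url expects file_name to uniquely identify the content of the file. One way to build such an identifier is to hash the dataframe's content. The helper content_file_name below is hypothetical and not part of the kxy package, and the ".parquet" suffix is an assumption. This is a sketch of the idea, not necessarily how kxy names its uploads.

```python
import hashlib

import pandas as pd


def content_file_name(df: pd.DataFrame, suffix: str = ".parquet") -> str:
    """Hypothetical helper (not part of kxy): name a file after its content."""
    # One uint64 hash per row, covering the row's values and its index label.
    row_hashes = pd.util.hash_pandas_object(df, index=True).to_numpy()
    digest = hashlib.sha256(row_hashes.tobytes())
    # Column labels are not covered by the row hashes, so mix them in explicitly.
    digest.update("|".join(map(str, df.columns)).encode("utf-8"))
    return digest.hexdigest() + suffix


df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
name = content_file_name(df)
# Same content gives the same name; any change to the content gives a new name.
same = content_file_name(df.copy())
changed = content_file_name(df.assign(a=[1, 2, 4]))
```

Because the name depends only on content, re-submitting an identical dataframe yields the same identifier, which is what makes it usable as a unique file key.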

kxy.api.data_transfer.upload_data(df, file_name=None)

Uploads a dataframe to KXY servers.

Parameters

df (pd.DataFrame) – The dataframe to upload.

file_name (str (optional)) – A string that uniquely identifies the content of the file.

Returns

d – Whether the upload was successful.

Return type

bool

Anonymizing Your Data

Fortunately, our analyses are invariant under a range of transformations that can fully anonymize your data.

You may simply run df_anonymized = df.kxy.anonymize() on any dataframe df to anonymize it, and work with df_anonymized instead of df.

See the method below for details on how we anonymize your data.

BaseAccessor.anonymize(columns_to_exclude=[])

Anonymize the dataframe in a manner that leaves all pre-learning and post-learning analyses (including data valuation, variable selection, model-driven improvability, data-driven improvability and model explanation) invariant.

Any transformation of a continuous variable that preserves ranks (e.g. any strictly increasing function) leaves our pre-learning and post-learning analyses unchanged. The same holds for any one-to-one transformation of a categorical variable.
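The two invariance properties can be illustrated with a rank-based statistic. The sketch below uses Spearman's rho (computed as the Pearson correlation of ranks) as a stand-in; it is not one of KXY's estimators, only a demonstration of why rank-preserving and one-to-one transformations carry no information away. The data are synthetic.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = pd.Series(rng.normal(size=200))
y = x + pd.Series(rng.normal(scale=0.5, size=200))

# Continuous case: exp is strictly increasing, so it preserves ranks ...
x_transformed = np.exp(x)
assert (x.rank() == x_transformed.rank()).all()

# ... hence Spearman's rho (Pearson correlation of ranks) is unchanged.
rho_before = x.rank().corr(y.rank())
rho_after = x_transformed.rank().corr(y.rank())

# Categorical case: a one-to-one relabelling keeps the same grouping of rows.
c = pd.Series(["red", "blue", "red", "green"])
c_relabelled = c.map({"red": "0x0", "blue": "0x1", "green": "0x2"})
same_groups = bool((pd.factorize(c)[0] == pd.factorize(c_relabelled)[0]).all())
```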

This implementation replaces ordinal values (i.e. the values of any column that can be cast to float) with their within-column Gaussian score. For each non-ordinal column, we collect the set of distinct values, assign a unique integer index to each one, and replace every occurrence of a value in the dataframe with the hexadecimal code of its index.
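The scheme just described can be approximated in plain pandas. The function anonymize_sketch below is hypothetical and is not kxy's implementation. It assumes "Gaussian score" means the standard (z-)score, and it chooses to leave missing categorical values missing; both are assumptions, and a rank-based normal score would preserve ranks equally well.

```python
import numpy as np
import pandas as pd


def anonymize_sketch(df: pd.DataFrame, columns_to_exclude=()) -> pd.DataFrame:
    """Hypothetical re-implementation of the scheme described above, not kxy's code."""
    result = df.copy()
    for col in df.columns:
        if col in columns_to_exclude:
            continue
        try:
            # Ordinal column: anything that can be cast to float.
            x = df[col].astype(float)
        except (TypeError, ValueError):
            # Non-ordinal column: factorize assigns one integer per distinct value
            # (-1 for missing values, which we keep missing), then we emit hex codes.
            codes, _ = pd.factorize(df[col])
            result[col] = [hex(int(i)) if i >= 0 else np.nan for i in codes]
        else:
            # Assumption: "Gaussian score" read as the standard (z-)score.
            result[col] = (x - x.mean()) / x.std()
    return result


df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": ["Paris", "Lyon", "Paris", "Nice"],
    "y": [1.0, 2.5, 3.0, 4.5],
})
anon = anonymize_sketch(df, columns_to_exclude=["y"])
```

Here "city" becomes ["0x0", "0x1", "0x0", "0x2"], "age" keeps its ordering on a standardized scale, and the excluded column "y" is returned untouched.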

For regression problems, accurately estimating RMSE-related metrics requires that the target column (and, for post-learning analyses, the prediction column) not be anonymized.
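The reason for this exclusion is that RMSE is expressed in the target's units, so unlike the rank-based analyses discussed earlier it is not invariant under rank-preserving transformations. The sketch below, using made-up numbers and hypothetical helpers rmse and zscore, standardizes the target and the predictions column by column (as a per-column anonymizer would) and shows that the error changes even though every ranking is preserved.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root-mean-square error, in the units of the target."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def zscore(v) -> np.ndarray:
    """Standardize with the column's own mean and standard deviation (rank-preserving)."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std()


y_true = np.array([100.0, 150.0, 200.0, 260.0])
y_pred = np.array([110.0, 140.0, 210.0, 240.0])

rmse_raw = rmse(y_true, y_pred)                   # sqrt(175), about 13.23, in target units
rmse_anon = rmse(zscore(y_true), zscore(y_pred))  # a different value on a different scale
```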

Parameters

columns_to_exclude (list (optional)) – List of columns not to anonymize (e.g. target and prediction columns for regression problems).

Returns

result – The anonymized dataframe.

Return type

pandas.DataFrame