Bank Note (UCI, Classification, n=1372, d=4, 2 classes)

Loading The Data

In [1]:
from kxy_datasets.uci_classifications import BankNote # pip install kxy_datasets
In [2]:
dataset = BankNote()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Visualize a summary of the data

---------------
Column: Entropy
---------------
Type:   Continuous
Max:    2.4
p75:    0.4
Mean:   -1.2
Median: -0.6
p25:    -2.4
Min:    -8.5

---------------
Column: Is Fake
---------------
Type:   Continuous
Max:    1.0
p75:    1.0
Mean:   0.4
Median: 0.0
p25:    0.0
Min:    0.0

----------------
Column: Kurtosis
----------------
Type:   Continuous
Max:    17
p75:    3.2
Mean:   1.4
Median: 0.6
p25:    -1.6
Min:    -5.3

----------------
Column: Skewness
----------------
Type:   Continuous
Max:    12
p75:    6.8
Mean:   1.9
Median: 2.3
p25:    -1.7
Min:    -13.8

----------------
Column: Variance
----------------
Type:   Continuous
Max:    6.8
p75:    2.8
Mean:   0.4
Median: 0.5
p25:    -1.8
Min:    -7.0

Data Valuation

In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
   Achievable R-Squared  Achievable Log-Likelihood Per Sample  Achievable Accuracy
0                  0.75                                  0.00                 1.00
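
One possible reading of the 0.75 figure, offered as a sketch rather than the library's documented computation: assume the achievable R-Squared is 1 - exp(-2 I), where I is the mutual information (in nats) between the inputs and the label. That formula is an assumption here; this notebook does not state it. If the inputs determine the label exactly (achievable accuracy 1.00), I equals the entropy of the label. Taking the majority-class share to be 0.56, the accuracy this notebook's variable selection table reports with no variable, the arithmetic lands at roughly 0.75.

```python
import math


def binary_entropy_nats(p: float) -> float:
    """Entropy (in nats) of a binary label whose classes have shares p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def r_squared_from_mutual_information(mi_nats: float) -> float:
    """ASSUMED mapping from mutual information to achievable R-Squared."""
    return 1.0 - math.exp(-2.0 * mi_nats)


# Majority-class share, read off the "No Variable" accuracy in the selection table.
majority_share = 0.56

# With a perfectly predictable label, mutual information equals label entropy.
label_entropy = binary_entropy_nats(majority_share)
implied_r_squared = r_squared_from_mutual_information(label_entropy)

print(f"label entropy (nats): {label_entropy:.3f}")
print(f"implied achievable R-Squared: {implied_r_squared:.2f}")
```

Under this assumption, an achievable R-Squared of 0.75 is the ceiling for a nearly balanced binary label, which fits an achievable accuracy of 1.00 rather than contradicting it.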

Automatic (Model-Free) Variable Selection

In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
                    Variable  Running Achievable R-Squared  Running Achievable Accuracy
Selection Order
0                No Variable                          0.00                         0.56
1                   Variance                          0.51                         0.90
2                   Skewness                          0.58                         0.93
3                   Kurtosis                          0.75                         1.00
4                    Entropy                          0.75                         1.00
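
The selection table can also be read programmatically. The sketch below transcribes the reported rows by hand, computes each variable's marginal gain in running achievable accuracy, and finds the shortest prefix of the selection order that already reaches the final value. The list `selection` and both helper functions are illustrative names, not part of the kxy API.

```python
# Rows transcribed from the table: (variable, running R-Squared, running accuracy).
selection = [
    ("No Variable", 0.00, 0.56),
    ("Variance", 0.51, 0.90),
    ("Skewness", 0.58, 0.93),
    ("Kurtosis", 0.75, 1.00),
    ("Entropy", 0.75, 1.00),
]


def marginal_accuracy_gains(rows):
    """Accuracy gained by each variable over the row just before it."""
    gains = {}
    for (_, _, prev_acc), (name, _, acc) in zip(rows, rows[1:]):
        gains[name] = round(acc - prev_acc, 2)
    return gains


def smallest_sufficient_prefix(rows):
    """Variables, in selection order, up to the first one reaching the final accuracy."""
    final_acc = rows[-1][2]
    kept = []
    for name, _, acc in rows[1:]:  # skip the "No Variable" baseline row
        kept.append(name)
        if acc >= final_acc:
            break
    return kept


gains = marginal_accuracy_gains(selection)
kept = smallest_sufficient_prefix(selection)
print(gains)
print(kept)
```

On these figures, Variance accounts for most of the lift over the no-variable baseline, and Entropy adds nothing once Variance, Skewness and Kurtosis are included.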