Skin Segmentation (UCI, Classification, n=245057, d=3, 2 classes)

Loading The Data

In [1]:

import kxy  # registers the df.kxy pandas accessor used below (pip install kxy)
from kxy_datasets.uci_classifications import SkinSegmentation  # pip install kxy_datasets

In [2]:

dataset = SkinSegmentation()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'

In [3]:

df.kxy.describe()  # Print a per-column summary of the data


---------
Column: B
---------
Type:   Continuous
Max:    255
p75:    176
Mean:   125
Median: 139
p25:    68
Min:    0.0

---------
Column: G
---------
Type:   Continuous
Max:    255
p75:    177
Mean:   132
Median: 153
p25:    87
Min:    0.0

---------
Column: R
---------
Type:   Continuous
Max:    255
p75:    164
Mean:   123
Median: 128
p25:    70
Min:    0.0

---------
Column: y
---------
Type:   Continuous
Max:    2.0
p75:    2.0
Mean:   1.8
Median: 2.0
p25:    2.0
Min:    1.0
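`describe` labels `y` as Continuous, but here it is a class label that only takes the values 1 and 2. For such a label, mean(y) = 1 + P(y = 2), so the reported mean of 1.8 implies that roughly 80% of rows belong to class 2. This is in line with the 0.79 "No Variable" accuracy reported in the variable-selection step, since without inputs the best a classifier can do is always predict the majority class. The sketch below checks class shares directly with pandas; `class_shares` is a hypothetical helper and the toy labels are made up for illustration.

```python
import pandas as pd


def class_shares(y: pd.Series) -> pd.Series:
    """Fraction of rows in each class, largest class first (hypothetical helper)."""
    return y.value_counts(normalize=True)


# Hypothetical toy labels using the same 1/2 encoding as the 'y' column.
y_toy = pd.Series([1, 2, 2, 2, 2])
shares = class_shares(y_toy)

# For labels in {1, 2}, the mean equals 1 + P(y = 2).
print(shares)
print("mean(y) =", y_toy.mean(), " 1 + P(y=2) =", 1 + shares[2])
```

On the real data, `class_shares(df[y_column])` would give the actual split; its largest share is the accuracy of the constant majority-class predictor.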

Data Valuation

In [4]:

df.kxy.data_valuation(y_column, problem_type=problem_type)


Out[4]:

   Achievable R-Squared  Achievable Log-Likelihood Per Sample  Achievable Accuracy
0                  0.64                             -1.08e-04                 1.00
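The output does not define these three quantities. One reading that ties them together is an assumption here, not something this page states: the achievable log-likelihood per sample is -H(y|x) in nats, and the achievable R-squared is the information-based quantity 1 - exp(-2 I) with I(y; x) = H(y) - H(y|x). The sketch below tests whether the reported numbers are consistent under that reading. It takes the class prior from the 0.79 "No Variable" accuracy in the variable-selection step. The helper names are hypothetical.

```python
import math


def binary_entropy_nats(p: float) -> float:
    """Entropy (in nats) of a two-class label whose majority class has frequency p."""
    q = 1.0 - p
    return -(p * math.log(p) + q * math.log(q))


def r2_from_mutual_information(mi_nats: float) -> float:
    """Information-based R-squared, ASSUMED here to be 1 - exp(-2 I)."""
    return 1.0 - math.exp(-2.0 * mi_nats)


# Values read off the notebook outputs.
p_majority = 0.79               # 'No Variable' achievable accuracy, used as the class prior
loglik_per_sample = -1.08e-04   # achievable log-likelihood per sample (assumed to be in nats)

h_y = binary_entropy_nats(p_majority)  # best achievable -loglik per sample with no inputs
mi = h_y + loglik_per_sample           # I(y; x) = H(y) - H(y|x), ASSUMING loglik = -H(y|x)
r2 = r2_from_mutual_information(mi)

print(f"H(y) = {h_y:.3f} nats, I(y; x) = {mi:.3f} nats, R^2 = {r2:.2f}")
```

Under these assumptions the script prints an R-squared of 0.64, matching the table. With only about 0.0001 nats of residual uncertainty per sample, an achievable accuracy of 1.00 is what one would expect.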

Automatic (Model-Free) Variable Selection

In [5]:

df.kxy.variable_selection(y_column, problem_type=problem_type)


Out[5]:

                    Variable  Running Achievable R-Squared  Running Achievable Accuracy
Selection Order
0                No Variable                          0.00                         0.79
1                          R                          0.40                         0.93
2                          G                          0.63                         1.00
3                          B                          0.64                         1.00
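The "Running" columns are cumulative, so each variable's marginal contribution is the change from the previous row. The sketch below recovers those increments with pandas `diff()`. The values are copied from the output above, and the "Gain" column names are hypothetical.

```python
import pandas as pd

# Running values copied from the variable-selection output.
selection = pd.DataFrame(
    {
        "Variable": ["No Variable", "R", "G", "B"],
        "Running Achievable R-Squared": [0.00, 0.40, 0.63, 0.64],
        "Running Achievable Accuracy": [0.79, 0.93, 1.00, 1.00],
    }
)

# Difference each row from the previous one; drop the baseline row, which has no predecessor.
gains = (
    selection.set_index("Variable")
    .diff()
    .iloc[1:]
    .rename(
        columns={
            "Running Achievable R-Squared": "R-Squared Gain",
            "Running Achievable Accuracy": "Accuracy Gain",
        }
    )
)

print(gains.round(2))
```

R and G together account for 0.63 of the 0.64 achievable R-squared and already reach an achievable accuracy of 1.00. B adds about 0.01 R-squared and no accuracy, which suggests that a model on R and G alone could perform about as well as one using all three channels.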