Landsat (UCI, Classification, n=6435, d=36, 6 classes)

Loading The Data

In [1]:
from kxy_datasets.uci_classifications import Landsat # pip install kxy_datasets
In [2]:
dataset = Landsat()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Visualize a summary of the data

-----------
Column: x_0
-----------
Type:   Continuous
Max:    104
p75:    80
Mean:   69
Median: 68
p25:    60
Min:    39

-----------
Column: x_1
-----------
Type:   Continuous
Max:    137
p75:    103
Mean:   83
Median: 87
p25:    71
Min:    27

------------
Column: x_10
------------
Type:   Continuous
Max:    145
p75:    113
Mean:   98
Median: 100
p25:    85
Min:    50

------------
Column: x_11
------------
Type:   Continuous
Max:    157
p75:    92
Mean:   82
Median: 81
p25:    68
Min:    29

------------
Column: x_12
------------
Type:   Continuous
Max:    104
p75:    80
Mean:   69
Median: 68
p25:    60
Min:    39

------------
Column: x_13
------------
Type:   Continuous
Max:    137
p75:    103
Mean:   83
Median: 85
p25:    71
Min:    27

------------
Column: x_14
------------
Type:   Continuous
Max:    145
p75:    113
Mean:   99
Median: 101
p25:    85
Min:    50

------------
Column: x_15
------------
Type:   Continuous
Max:    154
p75:    92
Mean:   82
Median: 81
p25:    69
Min:    29

------------
Column: x_16
------------
Type:   Continuous
Max:    104
p75:    79
Mean:   69
Median: 68
p25:    60
Min:    40

------------
Column: x_17
------------
Type:   Continuous
Max:    130
p75:    103
Mean:   83
Median: 85
p25:    71
Min:    27

------------
Column: x_18
------------
Type:   Continuous
Max:    145
p75:    113
Mean:   99
Median: 100
p25:    85
Min:    50

------------
Column: x_19
------------
Type:   Continuous
Max:    157
p75:    92
Mean:   82
Median: 81
p25:    69
Min:    29

-----------
Column: x_2
-----------
Type:   Continuous
Max:    140
p75:    113
Mean:   99
Median: 101
p25:    85
Min:    53

------------
Column: x_20
------------
Type:   Continuous
Max:    104
p75:    79
Mean:   68
Median: 67
p25:    60
Min:    39

------------
Column: x_21
------------
Type:   Continuous
Max:    130
p75:    103
Mean:   82
Median: 84
p25:    71
Min:    27

------------
Column: x_22
------------
Type:   Continuous
Max:    145
p75:    113
Mean:   98
Median: 100
p25:    85
Min:    50

------------
Column: x_23
------------
Type:   Continuous
Max:    157
p75:    92
Mean:   82
Median: 81
p25:    68
Min:    29

------------
Column: x_24
------------
Type:   Continuous
Max:    104
p75:    79
Mean:   69
Median: 68
p25:    60
Min:    39

------------
Column: x_25
------------
Type:   Continuous
Max:    131
p75:    103
Mean:   83
Median: 85
p25:    71
Min:    27

------------
Column: x_26
------------
Type:   Continuous
Max:    140
p75:    113
Mean:   99
Median: 100
p25:    85
Min:    50

------------
Column: x_27
------------
Type:   Continuous
Max:    154
p75:    92
Mean:   82
Median: 81
p25:    69
Min:    29

------------
Column: x_28
------------
Type:   Continuous
Max:    104
p75:    79
Mean:   68
Median: 68
p25:    60
Min:    39

------------
Column: x_29
------------
Type:   Continuous
Max:    130
p75:    103
Mean:   83
Median: 85
p25:    71
Min:    27

-----------
Column: x_3
-----------
Type:   Continuous
Max:    154
p75:    92
Mean:   82
Median: 81
p25:    69
Min:    33

------------
Column: x_30
------------
Type:   Continuous
Max:    145
p75:    113
Mean:   99
Median: 100
p25:    85
Min:    50

------------
Column: x_31
------------
Type:   Continuous
Max:    157
p75:    92
Mean:   82
Median: 81
p25:    69
Min:    29

------------
Column: x_32
------------
Type:   Continuous
Max:    104
p75:    79
Mean:   68
Median: 67
p25:    60
Min:    39

------------
Column: x_33
------------
Type:   Continuous
Max:    130
p75:    103
Mean:   82
Median: 84
p25:    71
Min:    27

------------
Column: x_34
------------
Type:   Continuous
Max:    145
p75:    113
Mean:   98
Median: 100
p25:    85
Min:    50

------------
Column: x_35
------------
Type:   Continuous
Max:    157
p75:    92
Mean:   82
Median: 81
p25:    68
Min:    29

-----------
Column: x_4
-----------
Type:   Continuous
Max:    104
p75:    80
Mean:   69
Median: 68
p25:    60
Min:    39

-----------
Column: x_5
-----------
Type:   Continuous
Max:    137
p75:    103
Mean:   83
Median: 85
p25:    71
Min:    27

-----------
Column: x_6
-----------
Type:   Continuous
Max:    145
p75:    113
Mean:   99
Median: 101
p25:    85
Min:    50

-----------
Column: x_7
-----------
Type:   Continuous
Max:    157
p75:    92
Mean:   82
Median: 81
p25:    69
Min:    29

-----------
Column: x_8
-----------
Type:   Continuous
Max:    104
p75:    79
Mean:   68
Median: 67
p25:    60
Min:    40

-----------
Column: x_9
-----------
Type:   Continuous
Max:    130
p75:    102
Mean:   82
Median: 85
p25:    71
Min:    27

---------
Column: y
---------
Type:   Continuous
Max:    7.0
p75:    5.0
Mean:   3.7
Median: 3.0
p25:    2.0
Min:    1.0

Data Valuation

In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
   Achievable R-Squared  Achievable Log-Likelihood Per Sample  Achievable Accuracy
0                  0.97                              7.01e-02                 1.00

Automatic (Model-Free) Variable Selection

In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
                 Variable     Running Achievable R-Squared  Running Achievable Accuracy
Selection Order
0                No Variable                          0.00                         0.24
1                x_17                                 0.80                         1.00
2                x_20                                 0.93                         1.00
3                x_35                                 0.96                         1.00
4                x_2                                  0.96                         1.00
5                x_26                                 0.96                         1.00
6                x_9                                  0.96                         1.00
7                x_4                                  0.96                         1.00
8                x_0                                  0.97                         1.00
9                x_1                                  0.97                         1.00
10               x_5                                  0.97                         1.00
11               x_10                                 0.97                         1.00
12               x_3                                  0.97                         1.00
13               x_7                                  0.97                         1.00
14               x_6                                  0.97                         1.00
15               x_8                                  0.97                         1.00
16               x_11                                 0.97                         1.00
17               x_12                                 0.97                         1.00
18               x_13                                 0.97                         1.00
19               x_14                                 0.97                         1.00
20               x_15                                 0.97                         1.00
21               x_16                                 0.97                         1.00
22               x_18                                 0.97                         1.00
23               x_19                                 0.97                         1.00
24               x_21                                 0.97                         1.00
25               x_22                                 0.97                         1.00
26               x_23                                 0.97                         1.00
27               x_24                                 0.97                         1.00
28               x_25                                 0.97                         1.00
29               x_27                                 0.97                         1.00
30               x_28                                 0.97                         1.00
31               x_29                                 0.97                         1.00
32               x_30                                 0.97                         1.00
33               x_31                                 0.97                         1.00
34               x_32                                 0.97                         1.00
35               x_33                                 0.97                         1.00
36               x_34                                 0.97                         1.00
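
The running achievable R-squared stops improving (at the two-decimal precision shown) once the first eight variables have been selected, so a natural follow-up is to keep only the shortest leading run of variables that reaches a chosen score. The sketch below is one possible way to read that cutoff off the table. It is plain Python and does not call the kxy API: `shortest_prefix` and `SELECTION` are hypothetical names introduced for illustration, and the pairs are copied by hand from the first nine selected rows above.

```python
# (variable, running achievable R-squared) pairs in selection order,
# copied by hand from the first rows of the variable selection table.
SELECTION = [
    ("x_17", 0.80),
    ("x_20", 0.93),
    ("x_35", 0.96),
    ("x_2", 0.96),
    ("x_26", 0.96),
    ("x_9", 0.96),
    ("x_4", 0.96),
    ("x_0", 0.97),
    ("x_1", 0.97),
]


def shortest_prefix(selection, target):
    """Return the leading variables needed for the running score to reach target.

    Hypothetical helper, not part of the kxy API. If the target is never
    reached, every variable in `selection` is returned.
    """
    chosen = []
    for name, running_score in selection:
        chosen.append(name)
        if running_score >= target:
            break
    return chosen


if __name__ == "__main__":
    print(shortest_prefix(SELECTION, 0.96))  # ['x_17', 'x_20', 'x_35']
    print(len(shortest_prefix(SELECTION, 0.97)))  # 8
```

Because the table rounds scores to two decimals, a threshold that sits right at a rounded value (0.96 or 0.97 here) may land one variable earlier or later on the unrounded scores.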