Sensorless Drive (UCI, Classification, n=58509, d=48, 11 classes)

Loading The Data

In [1]:
from kxy_datasets.uci_classifications import SensorLessDrive # pip install kxy_datasets
In [2]:
dataset = SensorLessDrive()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Visualize a summary of the data

-----------
Column: x_0
-----------
Type:   Continuous
Max:    0.0
p75:    0.0
Mean:   -0.0
Median: -0.0
p25:    -0.0
Min:    -0.0

-----------
Column: x_1
-----------
Type:   Continuous
Max:    0.0
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    -0.0
Min:    -0.0

------------
Column: x_10
------------
Type:   Continuous
Max:    0.4
p75:    0.0
Mean:   -0.0
Median: -0.0
p25:    -0.0
Min:    -0.2

------------
Column: x_11
------------
Type:   Continuous
Max:    0.4
p75:    0.0
Mean:   -0.0
Median: -0.0
p25:    -0.0
Min:    -0.2

------------
Column: x_12
------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

------------
Column: x_13
------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

------------
Column: x_14
------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

------------
Column: x_15
------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

------------
Column: x_16
------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

------------
Column: x_17
------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

------------
Column: x_18
------------
Type:   Continuous
Max:    2.4
p75:    1.9
Mean:   1.6
Median: 1.6
p25:    1.3
Min:    0.8

------------
Column: x_19
------------
Type:   Continuous
Max:    2.4
p75:    1.9
Mean:   1.6
Median: 1.6
p25:    1.3
Min:    0.8

-----------
Column: x_2
-----------
Type:   Continuous
Max:    0.0
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    -0.0
Min:    -0.0

------------
Column: x_20
------------
Type:   Continuous
Max:    2.4
p75:    1.9
Mean:   1.6
Median: 1.6
p25:    1.3
Min:    0.8

------------
Column: x_21
------------
Type:   Continuous
Max:    2.4
p75:    1.9
Mean:   1.6
Median: 1.6
p25:    1.3
Min:    0.8

------------
Column: x_22
------------
Type:   Continuous
Max:    2.4
p75:    1.9
Mean:   1.6
Median: 1.6
p25:    1.3
Min:    0.8

------------
Column: x_23
------------
Type:   Continuous
Max:    2.4
p75:    1.9
Mean:   1.6
Median: 1.6
p25:    1.3
Min:    0.8

------------
Column: x_24
------------
Type:   Continuous
Max:    28
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    -0.0
Min:    -15.8

------------
Column: x_25
------------
Type:   Continuous
Max:    12
p75:    0.2
Mean:   0.0
Median: 0.0
p25:    -0.2
Min:    -12.4

------------
Column: x_26
------------
Type:   Continuous
Max:    9.6
p75:    0.4
Mean:   -0.0
Median: -0.0
p25:    -0.5
Min:    -8.0

------------
Column: x_27
------------
Type:   Continuous
Max:    18
p75:    0.0
Mean:   -0.0
Median: 0.0
p25:    -0.0
Min:    -11.9

------------
Column: x_28
------------
Type:   Continuous
Max:    10
p75:    0.2
Mean:   0.0
Median: 0.0
p25:    -0.2
Min:    -12.5

------------
Column: x_29
------------
Type:   Continuous
Max:    8.8
p75:    0.4
Mean:   -0.0
Median: -0.0
p25:    -0.4
Min:    -10.0

-----------
Column: x_3
-----------
Type:   Continuous
Max:    0.0
p75:    0.0
Mean:   -0.0
Median: -0.0
p25:    -0.0
Min:    -0.0

------------
Column: x_30
------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    -0.0
Min:    -0.1

------------
Column: x_31
------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    -0.0
Min:    -0.1

------------
Column: x_32
------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    -0.0
Min:    -0.1

------------
Column: x_33
------------
Type:   Continuous
Max:    0.2
p75:    0.0
Mean:   -0.0
Median: -0.0
p25:    -0.0
Min:    -0.3

------------
Column: x_34
------------
Type:   Continuous
Max:    0.2
p75:    0.0
Mean:   -0.0
Median: -0.0
p25:    -0.0
Min:    -0.3

------------
Column: x_35
------------
Type:   Continuous
Max:    0.2
p75:    0.0
Mean:   -0.0
Median: -0.0
p25:    -0.0
Min:    -0.3

------------
Column: x_36
------------
Type:   Continuous
Max:    4,015
p75:    -0.6
Mean:   -0.5
Median: -0.7
p25:    -0.7
Min:    -0.9

------------
Column: x_37
------------
Type:   Continuous
Max:    312
p75:    8.4
Mean:   7.4
Median: 3.3
p25:    1.5
Min:    -0.6

------------
Column: x_38
------------
Type:   Continuous
Max:    265
p75:    10.0
Mean:   8.4
Median: 6.6
p25:    4.5
Min:    0.5

------------
Column: x_39
------------
Type:   Continuous
Max:    3,670
p75:    -0.6
Mean:   -0.4
Median: -0.7
p25:    -0.7
Min:    -0.9

-----------
Column: x_4
-----------
Type:   Continuous
Max:    0.0
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    -0.0
Min:    -0.0

------------
Column: x_40
------------
Type:   Continuous
Max:    889
p75:    8.3
Mean:   7.3
Median: 3.3
p25:    1.5
Min:    -0.6

------------
Column: x_41
------------
Type:   Continuous
Max:    153
p75:    9.9
Mean:   8.3
Median: 6.5
p25:    4.4
Min:    0.3

------------
Column: x_42
------------
Type:   Continuous
Max:    -1.5
p75:    -1.5
Mean:   -1.5
Median: -1.5
p25:    -1.5
Min:    -1.5

------------
Column: x_43
------------
Type:   Continuous
Max:    -1.5
p75:    -1.5
Mean:   -1.5
Median: -1.5
p25:    -1.5
Min:    -1.5

------------
Column: x_44
------------
Type:   Continuous
Max:    -1.5
p75:    -1.5
Mean:   -1.5
Median: -1.5
p25:    -1.5
Min:    -1.5

------------
Column: x_45
------------
Type:   Continuous
Max:    -1.3
p75:    -1.5
Mean:   -1.5
Median: -1.5
p25:    -1.5
Min:    -1.5

------------
Column: x_46
------------
Type:   Continuous
Max:    -1.3
p75:    -1.5
Mean:   -1.5
Median: -1.5
p25:    -1.5
Min:    -1.5

------------
Column: x_47
------------
Type:   Continuous
Max:    -1.3
p75:    -1.5
Mean:   -1.5
Median: -1.5
p25:    -1.5
Min:    -1.5

-----------
Column: x_5
-----------
Type:   Continuous
Max:    0.0
p75:    0.0
Mean:   -0.0
Median: -0.0
p25:    -0.0
Min:    -0.0

-----------
Column: x_6
-----------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    -0.0
Min:    -0.1

-----------
Column: x_7
-----------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    -0.0
Min:    -0.1

-----------
Column: x_8
-----------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    -0.0
Min:    -0.1

-----------
Column: x_9
-----------
Type:   Continuous
Max:    0.4
p75:    0.0
Mean:   -0.0
Median: -0.0
p25:    -0.0
Min:    -0.2

---------
Column: y
---------
Type:   Continuous
Max:    11
p75:    9.0
Mean:   6.0
Median: 6.0
p25:    3.0
Min:    1.0
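
The summary above rounds to one decimal place, so columns on a small scale (x_0 to x_5, for example) print as 0.0 or -0.0 throughout. A minimal sketch of recomputing the same six statistics without rounding, using only the standard library, is given here. The helper name `summarize` and the sample values are hypothetical and not part of kxy; on a pandas dataframe, `df[column].describe()` reports similar quantities.

```python
import statistics


def summarize(values):
    """Return max, p75, mean, median, p25 and min without rounding."""
    data = sorted(values)
    # method="inclusive" treats the data as the whole population and
    # interpolates linearly between neighbouring order statistics.
    p25, median, p75 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "Max": data[-1],
        "p75": p75,
        "Mean": statistics.fmean(data),
        "Median": median,
        "p25": p25,
        "Min": data[0],
    }


# Hypothetical readings that would all display as 0.0 or -0.0 at one decimal
sample = [-0.004, -0.002, 0.0, 0.002, 0.004]
summary = summarize(sample)
for name, value in summary.items():
    label = name + ":"
    print(f"{label:<8}{value:.6g}")
```

Printing with six significant digits is an arbitrary choice; any precision finer than the feature's scale would make the spread visible.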

Data Valuation

In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
   Achievable R-Squared  Achievable Log-Likelihood Per Sample  Achievable Accuracy
0  0.99                  -8.55e-05                             1.00
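
One way to read the log-likelihood figure: exponentiating an average per-sample log-likelihood gives the geometric mean of the probability a model assigns to the true label. The sketch below applies that to the value reported in Out[4], assuming the figure is a natural-log average; the variable names are illustrative only.

```python
import math

# Value copied from the Out[4] table
achievable_ll_per_sample = -8.55e-05

# Assumption: this is a natural-log average over samples, so exp() recovers
# the geometric-mean probability assigned to the true class.
geo_mean_prob = math.exp(achievable_ll_per_sample)
print(f"{geo_mean_prob:.6f}")
```

A result this close to 1 is consistent with the achievable accuracy of 1.00 reported in the same row.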

Automatic (Model-Free) Variable Selection

In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
Selection Order  Variable     Running Achievable R-Squared  Running Achievable Accuracy
0                No Variable  0.00                          0.09
1                x_7          0.87                          1.00
2                x_9          0.99                          1.00
3                x_30         0.99                          1.00
4                x_42         0.99                          1.00
5                x_47         0.99                          1.00
6                x_45         0.99                          1.00
7                x_44         0.99                          1.00
8                x_43         0.99                          1.00
9                x_46         0.99                          1.00
10               x_34         0.99                          1.00
11               x_27         0.99                          1.00
12               x_15         0.99                          1.00
13               x_16         0.99                          1.00
14               x_40         0.99                          1.00
15               x_24         0.99                          1.00
16               x_12         0.99                          1.00
17               x_21         0.99                          1.00
18               x_36         0.99                          1.00
19               x_13         0.99                          1.00
20               x_3          0.99                          1.00
21               x_39         0.99                          1.00
22               x_32         0.99                          1.00
23               x_4          0.99                          1.00
24               x_0          0.99                          1.00
25               x_17         0.99                          1.00
26               x_37         0.99                          1.00
27               x_14         0.99                          1.00
28               x_1          0.99                          1.00
29               x_41         0.99                          1.00
30               x_38         0.99                          1.00
31               x_2          0.99                          1.00
32               x_26         0.99                          1.00
33               x_25         0.99                          1.00
34               x_5          0.99                          1.00
35               x_29         0.99                          1.00
36               x_28         0.99                          1.00
37               x_22         0.99                          1.00
38               x_20         0.99                          1.00
39               x_35         0.99                          1.00
40               x_10         0.99                          1.00
41               x_6          0.99                          1.00
42               x_31         0.99                          1.00
43               x_8          0.99                          1.00
44               x_11         0.99                          1.00
45               x_23         0.99                          1.00
46               x_33         0.99                          1.00
47               x_19         0.99                          1.00
48               x_18         0.99                          1.00
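
The running scores plateau quickly: x_7 alone reaches a running achievable R-squared of 0.87 and x_9 brings it to 0.99, after which no variable changes either column at the displayed precision. The "No Variable" accuracy of 0.09 is what always predicting a single class would give if the 11 classes are equally frequent (1/11 ≈ 0.09). A minimal sketch of cutting the list at the plateau follows; the helper `shortest_prefix` and its tolerance are hypothetical, and only the first rows of the table are transcribed.

```python
# (selection order, variable, running achievable R-squared), from Out[5]
rows = [
    (0, "No Variable", 0.00),
    (1, "x_7", 0.87),
    (2, "x_9", 0.99),
    (3, "x_30", 0.99),
    (4, "x_42", 0.99),
]


def shortest_prefix(rows, final_score, tol=0.005):
    """Return variables, in selection order, up to the first row whose
    running score is within `tol` of `final_score`."""
    selected = []
    for _, name, score in rows:
        if name != "No Variable":
            selected.append(name)
        if final_score - score <= tol:
            break
    return selected


selected = shortest_prefix(rows, final_score=0.99)
print(selected)  # ['x_7', 'x_9']

# Baseline accuracy when always predicting one of 11 equally frequent classes
baseline_accuracy = 1 / 11
print(round(baseline_accuracy, 2))  # 0.09
```

The resulting names could then be used to subset the dataframe, for example with `df[selected + [y_column]]`.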