Energy Efficiency (UCI, Regression, n=768, d=8)

Loading The Data

In [1]:
from kxy_datasets.uci_regressions import EnergyEfficiency # pip install kxy_datasets
In [2]:
dataset = EnergyEfficiency()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Visualize a summary of the data

----------
Column: X1
----------
Type:   Continuous
Max:    1.0
p75:    0.8
Mean:   0.8
Median: 0.8
p25:    0.7
Min:    0.6

----------
Column: X2
----------
Type:   Continuous
Max:    808
p75:    741
Mean:   671
Median: 673
p25:    606
Min:    514

----------
Column: X3
----------
Type:   Continuous
Max:    416
p75:    343
Mean:   318
Median: 318
p25:    294
Min:    245

----------
Column: X4
----------
Type:   Continuous
Max:    220
p75:    220
Mean:   176
Median: 183
p25:    140
Min:    110

----------
Column: X5
----------
Type:   Continuous
Max:    7.0
p75:    7.0
Mean:   5.2
Median: 5.2
p25:    3.5
Min:    3.5

----------
Column: X6
----------
Type:   Continuous
Max:    5.0
p75:    4.2
Mean:   3.5
Median: 3.5
p25:    2.8
Min:    2.0

----------
Column: X7
----------
Type:   Continuous
Max:    0.4
p75:    0.4
Mean:   0.2
Median: 0.2
p25:    0.1
Min:    0.0

----------
Column: X8
----------
Type:   Continuous
Max:    5.0
p75:    4.0
Mean:   2.8
Median: 3.0
p25:    1.8
Min:    0.0

----------
Column: Y1
----------
Type:   Continuous
Max:    43
p75:    31
Mean:   22
Median: 18
p25:    12
Min:    6.0

----------
Column: Y2
----------
Type:   Continuous
Max:    48
p75:    33
Mean:   24
Median: 22
p25:    15
Min:    10
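
The summary above reports six statistics per column: max, 75th percentile, mean, median, 25th percentile and min. As a rough cross-check that does not depend on the `kxy` accessor, the same figures can be sketched with the standard library. The `summarize` helper and the `sample` values are hypothetical, and the percentile interpolation used by `df.kxy.describe()` is assumed rather than confirmed, so small differences are possible.

```python
import statistics


def summarize(values):
    """Return the six summary statistics for a list of numbers."""
    # method="inclusive" treats the data as the full population and
    # interpolates linearly between order statistics.
    p25, median, p75 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Max": max(values),
        "p75": p75,
        "Mean": statistics.fmean(values),
        "Median": median,
        "p25": p25,
        "Min": min(values),
    }


# Hypothetical sample: a column that takes two values equally often.
sample = [3.5, 3.5, 3.5, 7.0, 7.0, 7.0]
summary = summarize(sample)

for name, value in summary.items():
    print(f"{name + ':':<8}{value:.1f}")
```

A column split evenly between two values has its mean and median at the midpoint, and its quartiles and extremes on the two values themselves.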

Data Valuation

In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
   Achievable R-Squared  Achievable Log-Likelihood Per Sample  Achievable RMSE
0                  0.98                                 -1.61             1.35
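
The achievable R-squared and achievable RMSE reported above can be sanity-checked against each other. The sketch below assumes the usual identity RMSE = std(y) * sqrt(1 - R²), and takes the baseline RMSE of 1.01e+01 from the "No Variable" row of the variable selection output as a stand-in for the standard deviation of the target. Both are assumptions about how the figures relate, not a description of how `kxy` computes them; the function names are hypothetical.

```python
import math


def rmse_from_r2(r_squared, target_std):
    """RMSE implied by an R-squared, assuming RMSE = std * sqrt(1 - R^2)."""
    return target_std * math.sqrt(1.0 - r_squared)


def r2_from_rmse(rmse, baseline_rmse):
    """R-squared implied by an RMSE relative to the no-variable baseline."""
    return 1.0 - (rmse / baseline_rmse) ** 2


baseline_rmse = 10.1    # "No Variable" row, used here as std of the target
achievable_rmse = 1.35  # value reported by data_valuation

implied_r2 = r2_from_rmse(achievable_rmse, baseline_rmse)
implied_rmse = rmse_from_r2(0.98, baseline_rmse)

print(f"R-squared implied by RMSE 1.35: {implied_r2:.3f}")
print(f"RMSE implied by R-squared 0.98: {implied_rmse:.2f}")
```

Under these assumptions the reported RMSE corresponds to an R-squared slightly above 0.98, which is consistent with the table showing R-squared rounded to two decimals.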

Automatic (Model-Free) Variable Selection

In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
                    Variable  Running Achievable R-Squared  Running Achievable RMSE
Selection Order
0                No Variable                          0.00                 1.01e+01
1                         Y2                          0.92                     2.83
2                         X3                          0.98                     1.38
3                         X4                          0.98                     1.38
4                         X7                          0.98                     1.35
5                         X2                          0.98                     1.35
6                         X8                          0.98                     1.35
7                         X6                          0.98                     1.35
8                         X5                          0.98                     1.35
9                         X1                          0.98                     1.35
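
One way to read the selection output is to look for the shortest prefix of variables whose running achievable RMSE is already close to the final value. The sketch below hard-codes the rows from the output above and applies a hypothetical helper, `smallest_sufficient_subset`, with a relative tolerance on RMSE. The stopping rule is an illustrative choice, not something `kxy` prescribes.

```python
# (variable, running achievable R-squared, running achievable RMSE),
# copied from the variable selection output above.
rows = [
    ("No Variable", 0.00, 10.1),
    ("Y2", 0.92, 2.83),
    ("X3", 0.98, 1.38),
    ("X4", 0.98, 1.38),
    ("X7", 0.98, 1.35),
    ("X2", 0.98, 1.35),
    ("X8", 0.98, 1.35),
    ("X6", 0.98, 1.35),
    ("X5", 0.98, 1.35),
    ("X1", 0.98, 1.35),
]


def smallest_sufficient_subset(rows, tolerance=0.0):
    """Shortest prefix whose RMSE is within `tolerance` of the final RMSE."""
    final_rmse = rows[-1][2]
    selected = []
    for name, _, rmse in rows[1:]:  # skip the "No Variable" baseline
        selected.append(name)
        if rmse <= final_rmse * (1.0 + tolerance):
            break
    return selected


exact = smallest_sufficient_subset(rows)                   # match final RMSE
within_5pct = smallest_sufficient_subset(rows, tolerance=0.05)

print("Matches final RMSE:", exact)
print("Within 5% of final RMSE:", within_5pct)
```

With these rows, the first four variables reach the final RMSE of 1.35, and the first two already come within 5% of it. Note that Y2 is ranked as an explanatory variable here; if Y2 is itself a target in the application at hand, it may make sense to drop that column before running the selection.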