Abalone (UCI, Regression, n=4177, d=8)

Loading The Data

In [1]:
from kxy_datasets.uci_regressions import Abalone # pip install kxy_datasets
In [2]:
dataset = Abalone()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Visualize a summary of the data

-----------
Column: Age
-----------
Type:   Continuous
Max:    30
p75:    12
Mean:   11
Median: 10
p25:    9.5
Min:    2.5

----------------
Column: Diameter
----------------
Type:   Continuous
Max:    0.7
p75:    0.5
Mean:   0.4
Median: 0.4
p25:    0.3
Min:    0.1

--------------
Column: Height
--------------
Type:   Continuous
Max:    1.1
p75:    0.2
Mean:   0.1
Median: 0.1
p25:    0.1
Min:    0.0

--------------
Column: Length
--------------
Type:   Continuous
Max:    0.8
p75:    0.6
Mean:   0.5
Median: 0.5
p25:    0.5
Min:    0.1

-----------
Column: Sex
-----------
Type:      Categorical
Frequency: 36.58%, Label: M
Frequency: 32.13%, Label: I
Frequency: 31.29%, Label: F

--------------------
Column: Shell weight
--------------------
Type:   Continuous
Max:    1.0
p75:    0.3
Mean:   0.2
Median: 0.2
p25:    0.1
Min:    0.0

----------------------
Column: Shucked weight
----------------------
Type:   Continuous
Max:    1.5
p75:    0.5
Mean:   0.4
Median: 0.3
p25:    0.2
Min:    0.0

----------------------
Column: Viscera weight
----------------------
Type:   Continuous
Max:    0.8
p75:    0.3
Mean:   0.2
Median: 0.2
p25:    0.1
Min:    0.0

--------------------
Column: Whole weight
--------------------
Type:   Continuous
Max:    2.8
p75:    1.2
Mean:   0.8
Median: 0.8
p25:    0.4
Min:    0.0

Data Valuation

In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
   Achievable R-Squared  Achievable Log-Likelihood Per Sample  Achievable RMSE
0                  1.00                                  2.46         2.50e-02
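
The reported R-squared of 1.00 and RMSE of 2.50e-02 can be cross-checked against each other. The sketch below assumes the usual definition R^2 = 1 - MSE / Var(y), which gives RMSE = std(y) * sqrt(1 - R^2). It also assumes that the standard deviation of Age is about 3.22, taken from the "No Variable" row of the variable selection output (the RMSE of a predictor that uses no inputs). The helper names are hypothetical and are not part of the kxy API.

```python
import math

# Assumption: standard deviation of the target (Age), read off the
# "No Variable" RMSE in the variable selection table.
Y_STD = 3.22


def rmse_from_r2(r2, y_std):
    """RMSE implied by an R-squared, assuming R^2 = 1 - MSE / Var(y)."""
    return y_std * math.sqrt(1.0 - r2)


def r2_from_rmse(rmse, y_std):
    """R-squared implied by an RMSE under the same assumption."""
    return 1.0 - (rmse / y_std) ** 2


# R-squared implied by the achievable RMSE reported above.
implied_r2 = r2_from_rmse(2.50e-02, Y_STD)
print(f"Implied R-squared: {implied_r2:.5f}")
```

Under these assumptions the implied R-squared is slightly below 1, which is consistent with the value being displayed as 1.00 after rounding to two decimals.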

Automatic (Model-Free) Variable Selection

In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
Selection Order  Variable        Running Achievable R-Squared  Running Achievable RMSE
0                No Variable     0.00                          3.22
1                Shell weight    0.58                          2.09
2                Shucked weight  0.64                          1.93
3                Whole weight    0.64                          1.93
4                Height          0.92                          0.92
5                Sex             0.92                          0.92
6                Viscera weight  0.92                          0.92
7                Diameter        0.92                          0.92
8                Length          1.00                          0.0250
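
One way to read this table is to look at how much each variable adds to the running achievable R-squared over the variables selected before it. The sketch below hard-codes the values from the table (it does not call kxy) and computes these increments. The numbers are rounded to two decimals in the output, so a gain of 0.00 only means the increment is below the displayed precision.

```python
# (variable, running achievable R-squared), copied from the table above.
selection = [
    ("No Variable", 0.00),
    ("Shell weight", 0.58),
    ("Shucked weight", 0.64),
    ("Whole weight", 0.64),
    ("Height", 0.92),
    ("Sex", 0.92),
    ("Viscera weight", 0.92),
    ("Diameter", 0.92),
    ("Length", 1.00),
]

# Incremental R-squared contributed by each variable, given the ones before it.
gains = {}
for (_, prev_r2), (name, r2) in zip(selection, selection[1:]):
    gains[name] = round(r2 - prev_r2, 2)

# Variables whose incremental gain is zero at the displayed precision.
no_visible_gain = [name for name, gain in gains.items() if gain == 0.0]

for name, gain in gains.items():
    print(f"{name:<15} +{gain:.2f}")
```

At this precision, Whole weight, Sex, Viscera weight and Diameter show no visible gain once the earlier variables are included, while Shell weight, Height and Length account for most of the increase.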