Abalone (UCI, Regression, n=4177, d=8)

Loading The Data

In [1]:

import kxy # pip install kxy; registers the df.kxy accessor used below
from kxy_datasets.uci_regressions import Abalone # pip install kxy_datasets

In [2]:

dataset = Abalone()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'

In [3]:

df.kxy.describe() # Print a per-column summary of the data


-----------
Column: Age
-----------
Type:   Continuous
Max:    30
p75:    12
Mean:   11
Median: 10
p25:    9.5
Min:    2.5

----------------
Column: Diameter
----------------
Type:   Continuous
Max:    0.7
p75:    0.5
Mean:   0.4
Median: 0.4
p25:    0.3
Min:    0.1

--------------
Column: Height
--------------
Type:   Continuous
Max:    1.1
p75:    0.2
Mean:   0.1
Median: 0.1
p25:    0.1
Min:    0.0

--------------
Column: Length
--------------
Type:   Continuous
Max:    0.8
p75:    0.6
Mean:   0.5
Median: 0.5
p25:    0.5
Min:    0.1

-----------
Column: Sex
-----------
Type:      Categorical
Frequency: 36.58%, Label: M
Frequency: 32.13%, Label: I
Frequency: 31.29%, Label: F

--------------------
Column: Shell weight
--------------------
Type:   Continuous
Max:    1.0
p75:    0.3
Mean:   0.2
Median: 0.2
p25:    0.1
Min:    0.0

----------------------
Column: Shucked weight
----------------------
Type:   Continuous
Max:    1.5
p75:    0.5
Mean:   0.4
Median: 0.3
p25:    0.2
Min:    0.0

----------------------
Column: Viscera weight
----------------------
Type:   Continuous
Max:    0.8
p75:    0.3
Mean:   0.2
Median: 0.2
p25:    0.1
Min:    0.0

--------------------
Column: Whole weight
--------------------
Type:   Continuous
Max:    2.8
p75:    1.2
Mean:   0.8
Median: 0.8
p25:    0.4
Min:    0.0

Data Valuation

In [4]:

df.kxy.data_valuation(y_column, problem_type=problem_type)

Out[4]:

   Achievable R-Squared  Achievable Log-Likelihood Per Sample  Achievable RMSE
0                  1.00                                  2.46         2.50e-02
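
The reported R-squared of 1.00 and RMSE of 2.50e-02 can be reconciled with a little arithmetic. The sketch below assumes the usual relation RMSE = sigma_y * sqrt(1 - R^2), and assumes sigma_y is about 3.22, the RMSE reported for the "No Variable" baseline in the variable-selection table. Neither assumption is confirmed by the kxy output itself, and the helper names are hypothetical.

```python
import math

# Assumption: standard deviation of the target (Age), taken from the
# "No Variable" RMSE in the variable-selection table.
SIGMA_Y = 3.22


def rmse_from_r2(r2, sigma_y=SIGMA_Y):
    """RMSE implied by an R-squared, assuming RMSE = sigma_y * sqrt(1 - R^2)."""
    return sigma_y * math.sqrt(1.0 - r2)


def r2_from_rmse(rmse, sigma_y=SIGMA_Y):
    """R-squared implied by an RMSE, inverting the relation above."""
    return 1.0 - (rmse / sigma_y) ** 2


# R-squared implied by the achievable RMSE reported in Out[4].
implied_r2 = r2_from_rmse(2.50e-02)
print(f"Implied R-squared: {implied_r2:.5f}")
print(f"Rounded to two decimals: {implied_r2:.2f}")
```

Under these assumptions the implied R-squared sits just below 1, so the displayed 1.00 is most plausibly a rounded value rather than a claim of a perfect fit.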

Automatic (Model-Free) Variable Selection

In [5]:

df.kxy.variable_selection(y_column, problem_type=problem_type)

Out[5]:

Selection Order  Variable        Running Achievable R-Squared  Running Achievable RMSE
0                No Variable     0.00                          3.22
1                Shell weight    0.58                          2.09
2                Shucked weight  0.64                          1.93
3                Whole weight    0.64                          1.93
4                Height          0.92                          0.92
5                Sex             0.92                          0.92
6                Viscera weight  0.92                          0.92
7                Diameter        0.92                          0.92
8                Length          1.00                          0.0250
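
One way to read this table is through each variable's marginal contribution, that is, how much the running achievable R-squared rises when the variable is added. The sketch below computes those differences from values hand-copied out of the table. `marginal_gains` is a hypothetical helper, not part of kxy. The inputs are already rounded to two decimals, so a gain of 0.00 only means the increase is below the displayed precision.

```python
# Running achievable R-squared, copied from the table above in selection
# order. The first entry is the "No Variable" baseline.
running_r2 = [
    ("No Variable", 0.00),
    ("Shell weight", 0.58),
    ("Shucked weight", 0.64),
    ("Whole weight", 0.64),
    ("Height", 0.92),
    ("Sex", 0.92),
    ("Viscera weight", 0.92),
    ("Diameter", 0.92),
    ("Length", 1.00),
]


def marginal_gains(rows):
    """Return (variable, gain) pairs: the rise in running R-squared per step."""
    gains = []
    for (_, previous), (name, current) in zip(rows, rows[1:]):
        # Round to the table's display precision to hide float noise.
        gains.append((name, round(current - previous, 2)))
    return gains


gains = marginal_gains(running_r2)
for name, gain in gains:
    print(f"{name:<15} +{gain:.2f}")
```

The gains sum to the final running R-squared, which makes it easy to see which selection steps account for most of the achievable performance at this precision.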