Yacht Hydrodynamics (UCI, Regression, n=308, d=6)

Loading The Data

In [1]:
from kxy_datasets.uci_regressions import YachtHydrodynamics # pip install kxy_datasets
In [2]:
dataset = YachtHydrodynamics()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Visualize a summary of the data

--------------------------
Column: Beam-Draught Ratio
--------------------------
Type:   Continuous
Max:    5.3
p75:    4.2
Mean:   3.9
Median: 4.0
p25:    3.8
Min:    2.8

---------------------
Column: Froude Number
---------------------
Type:   Continuous
Max:    0.5
p75:    0.4
Mean:   0.3
Median: 0.3
p25:    0.2
Min:    0.1

-------------------------
Column: Length-Beam Ratio
-------------------------
Type:   Continuous
Max:    3.6
p75:    3.5
Mean:   3.2
Median: 3.1
p25:    3.1
Min:    2.7

---------------------------
Column: Length-Displacement
---------------------------
Type:   Continuous
Max:    5.1
p75:    5.1
Mean:   4.8
Median: 4.8
p25:    4.8
Min:    4.3

-----------------------------
Column: Longitudinal Position
-----------------------------
Type:   Continuous
Max:    0.0
p75:    -2.3
Mean:   -2.4
Median: -2.3
p25:    -2.4
Min:    -5.0

------------------------------
Column: Prismatic Coeefficient
------------------------------
Type:   Continuous
Max:    0.6
p75:    0.6
Mean:   0.6
Median: 0.6
p25:    0.5
Min:    0.5

----------------------------
Column: Residuary Resistance
----------------------------
Type:   Continuous
Max:    62
p75:    12
Mean:   10
Median: 3.1
p25:    0.8
Min:    0.0
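
The summary above comes from the kxy dataframe accessor. For readers who want comparable statistics without it, the following is a minimal sketch using plain pandas. The helper name `summarize` and the toy data are hypothetical and only for illustration; this is not how `df.kxy.describe()` is implemented.

```python
import pandas as pd


def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Return Max, p75, Mean, Median, p25 and Min for each numeric column."""
    numeric = frame.select_dtypes(include="number")
    return pd.DataFrame(
        {
            "Max": numeric.max(),
            "p75": numeric.quantile(0.75),
            "Mean": numeric.mean(),
            "Median": numeric.median(),
            "p25": numeric.quantile(0.25),
            "Min": numeric.min(),
        }
    )


# Toy stand-in data (hypothetical); pass the real `df` loaded above instead.
toy = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0, 5.0], "b": [10.0, 20.0, 30.0, 40.0, 50.0]}
)
toy_summary = summarize(toy)
print(toy_summary)
```

Calling `summarize(df)` on the yacht dataframe should give one row per column, with the same six statistics shown in the printed summary.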

Data Valuation

In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
   Achievable R-Squared  Achievable Log-Likelihood Per Sample  Achievable RMSE
0                  0.99                                 -1.30             1.46
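
The achievable R-Squared and achievable RMSE are two views of the same quantity. A quick sanity check, sketched here under the assumption that R-Squared is defined in the usual way as one minus the ratio of squared RMSE to the target variance, is to recover R-Squared from the RMSE and the baseline RMSE with no explanatory variables. The baseline value 15.1 is taken from the "No Variable" row of the variable selection table in the next section; the function name `implied_r_squared` is hypothetical.

```python
def implied_r_squared(rmse: float, baseline_rmse: float) -> float:
    """R-Squared implied by an RMSE, assuming R^2 = 1 - (RMSE / baseline RMSE)^2."""
    return 1.0 - (rmse / baseline_rmse) ** 2


baseline_rmse = 15.1  # RMSE achievable with no variable (1.51e+01 in the table)

# RMSE achievable with all six variables, as reported by the data valuation.
r2_all = implied_r_squared(1.46, baseline_rmse)

# RMSE achievable with Froude Number alone, from the variable selection table.
r2_froude = implied_r_squared(1.90, baseline_rmse)

print(f"all variables: {r2_all:.2f}, Froude Number only: {r2_froude:.2f}")
```

Both implied values round to the R-Squared figures reported in the notebook, which suggests the two columns are consistent with this definition.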

Automatic (Model-Free) Variable Selection

In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
                 Variable                Running Achievable R-Squared  Running Achievable RMSE
Selection Order
0                No Variable                                     0.00                 1.51e+01
1                Froude Number                                   0.98                     1.90
2                Beam-Draught Ratio                              0.99                     1.46
3                Longitudinal Position                           0.99                     1.46
4                Length-Displacement                             0.99                     1.46
5                Prismatic Coeefficient                          0.99                     1.46
6                Length-Beam Ratio                               0.99                     1.46
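
One way to read this table is to look at how much each newly selected variable lowers the running achievable RMSE. The sketch below re-enters the values printed above by hand and keeps the variables whose marginal RMSE reduction exceeds a cutoff. The column name `RMSE Reduction` and the cutoff `min_reduction` are hypothetical choices for illustration, not part of the kxy API.

```python
import pandas as pd

# Values copied from the variable selection output above.
# "Prismatic Coeefficient" is spelled as it appears in the dataset.
selection = pd.DataFrame(
    {
        "Variable": [
            "No Variable",
            "Froude Number",
            "Beam-Draught Ratio",
            "Longitudinal Position",
            "Length-Displacement",
            "Prismatic Coeefficient",
            "Length-Beam Ratio",
        ],
        "Running Achievable R-Squared": [0.00, 0.98, 0.99, 0.99, 0.99, 0.99, 0.99],
        "Running Achievable RMSE": [15.1, 1.90, 1.46, 1.46, 1.46, 1.46, 1.46],
    }
)
selection.index.name = "Selection Order"

# Drop in achievable RMSE contributed by each newly added variable.
# The first row has no predecessor, so its reduction is NaN.
selection["RMSE Reduction"] = -selection["Running Achievable RMSE"].diff()

min_reduction = 0.01  # hypothetical cutoff, in the units of the target
kept = selection.loc[selection["RMSE Reduction"] > min_reduction, "Variable"].tolist()
print(kept)
```

With the values above, only Froude Number and Beam-Draught Ratio pass the cutoff: the remaining four variables leave the running achievable RMSE unchanged at the reported precision.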