Bike Sharing (UCI, Regression, n=17379, d=18)

Loading the Data

In [1]:
import kxy # Registers the df.kxy pandas accessor used below (pip install kxy)
from kxy_datasets.uci_regressions import BikeSharing # pip install kxy_datasets
In [2]:
dataset = BikeSharing()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
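The n and d in the title can be checked directly on the loaded frame. Below is a minimal sketch. It assumes d counts every column except the target: the summary below lists 19 columns, and cnt is one of them. `n_and_d` is a hypothetical helper and is not part of kxy or kxy_datasets.

```python
import pandas as pd


def n_and_d(df: pd.DataFrame, y_column: str) -> tuple[int, int]:
    """Number of rows, and number of explanatory columns (all but the target)."""
    n = len(df)
    d = df.drop(columns=[y_column]).shape[1]
    return n, d


# Hypothetical toy frame. On the real data, n_and_d(df, y_column) is
# expected to agree with the title (n=17379, d=18).
toy = pd.DataFrame({"hr": [0, 1, 2], "temp": [0.2, 0.3, 0.4], "cnt": [16, 40, 32]})
print(n_and_d(toy, "cnt"))  # (3, 2)
```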
In [3]:
df.kxy.describe() # Print a per-column summary of the data

-------------
Column: atemp
-------------
Type:   Continuous
Max:    1.0
p75:    0.6
Mean:   0.5
Median: 0.5
p25:    0.3
Min:    0.0

--------------
Column: casual
--------------
Type:   Continuous
Max:    367
p75:    48
Mean:   35
Median: 17
p25:    4.0
Min:    0.0

-----------
Column: cnt
-----------
Type:   Continuous
Max:    977
p75:    281
Mean:   189
Median: 142
p25:    40
Min:    1.0

--------------
Column: dteday
--------------
Type:   Continuous
Max:    31
p75:    23
Mean:   15
Median: 16
p25:    8.0
Min:    1.0

----------------
Column: dtemonth
----------------
Type:   Continuous
Max:    12
p75:    10
Mean:   6.5
Median: 7.0
p25:    4.0
Min:    1.0

---------------
Column: dteyear
---------------
Type:   Continuous
Max:    2,012
p75:    2,012
Mean:   2,011
Median: 2,012
p25:    2,011
Min:    2,011

---------------
Column: holiday
---------------
Type:   Continuous
Max:    1.0
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

----------
Column: hr
----------
Type:   Continuous
Max:    23
p75:    18
Mean:   11
Median: 12
p25:    6.0
Min:    0.0

-----------
Column: hum
-----------
Type:   Continuous
Max:    1.0
p75:    0.8
Mean:   0.6
Median: 0.6
p25:    0.5
Min:    0.0

---------------
Column: instant
---------------
Type:   Continuous
Max:    17,379
p75:    13,034
Mean:   8,690
Median: 8,690
p25:    4,345
Min:    1.0

------------
Column: mnth
------------
Type:   Continuous
Max:    12
p75:    10
Mean:   6.5
Median: 7.0
p25:    4.0
Min:    1.0

------------------
Column: registered
------------------
Type:   Continuous
Max:    886
p75:    220
Mean:   153
Median: 115
p25:    34
Min:    0.0

--------------
Column: season
--------------
Type:   Continuous
Max:    4.0
p75:    3.0
Mean:   2.5
Median: 3.0
p25:    2.0
Min:    1.0

------------
Column: temp
------------
Type:   Continuous
Max:    1.0
p75:    0.7
Mean:   0.5
Median: 0.5
p25:    0.3
Min:    0.0

------------------
Column: weathersit
------------------
Type:   Continuous
Max:    4.0
p75:    2.0
Mean:   1.4
Median: 1.0
p25:    1.0
Min:    1.0

---------------
Column: weekday
---------------
Type:   Continuous
Max:    6.0
p75:    5.0
Mean:   3.0
Median: 3.0
p25:    1.0
Min:    0.0

-----------------
Column: windspeed
-----------------
Type:   Continuous
Max:    0.9
p75:    0.3
Mean:   0.2
Median: 0.2
p25:    0.1
Min:    0.0

------------------
Column: workingday
------------------
Type:   Continuous
Max:    1.0
p75:    1.0
Mean:   0.7
Median: 1.0
p25:    0.0
Min:    0.0

----------
Column: yr
----------
Type:   Continuous
Max:    1.0
p75:    1.0
Mean:   0.5
Median: 1.0
p25:    0.0
Min:    0.0
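Each statistic in this summary is a standard quantile or moment, so you can write a rough plain-pandas analogue. The sketch below reflects an assumption about what is computed and is not kxy's implementation. `summarize` is a hypothetical name. The sketch uses pandas' default linear interpolation for the quartiles, which may differ from kxy's choice. Like the summary above, it treats integer-coded categories such as season and weathersit as ordinary numbers.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Max, p75, Mean, Median, p25 and Min for every numeric column,
    in the same order as the summary above (plain-pandas analogue)."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "Max": numeric.max(),
        "p75": numeric.quantile(0.75),  # linear interpolation by default
        "Mean": numeric.mean(),
        "Median": numeric.median(),
        "p25": numeric.quantile(0.25),
        "Min": numeric.min(),
    })


# Hypothetical toy rows reusing two of the column names above
toy = pd.DataFrame({"hr": [0, 6, 12, 18, 23], "holiday": [0, 0, 0, 0, 1]})
print(summarize(toy))
```

A table like this also makes redundant encodings easy to spot. In the summary above, mnth and dtemonth have identical statistics, and dteyear is simply yr expressed as a calendar year (0 and 1 correspond to 2011 and 2012).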

Data Valuation

In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
   Achievable R-Squared  Achievable Log-Likelihood Per Sample  Achievable RMSE
0  1.00                  -3.14                                 3.98
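The achievable R-squared of 1.00 is rounded for display. The figures here and in the selection table below are consistent with RMSE = std(cnt) * sqrt(1 - R^2). Here std(cnt) is about 181, which is the RMSE reported with no variable. The sketch below assumes that relation in order to convert between the two metrics. The helper names are hypothetical, and the sketch does not reproduce how kxy estimates the achievable R-squared itself.

```python
import math


def rmse_from_r2(r2: float, y_std: float) -> float:
    """RMSE implied by R^2, assuming RMSE^2 = (1 - R^2) * Var(y)."""
    return y_std * math.sqrt(1.0 - r2)


def r2_from_rmse(rmse: float, y_std: float) -> float:
    """Inverse of rmse_from_r2 under the same assumption."""
    return 1.0 - (rmse / y_std) ** 2


y_std = 181.0  # 'No Variable' RMSE, roughly the standard deviation of cnt
print(round(r2_from_rmse(3.98, y_std), 4))  # 0.9995, displayed as 1.00
print(round(r2_from_rmse(21.1, y_std), 4))  # 0.9864, displayed as 0.99
```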

Automatic (Model-Free) Variable Selection

In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
Selection Order  Variable     Running Achievable R-Squared  Running Achievable RMSE
0                No Variable  0.00                          1.81e+02
1                registered   0.99                          2.11e+01
2                casual       1.00                          3.98
3                hr           1.00                          3.98
4                weekday      1.00                          3.98
5                instant      1.00                          3.98
6                hum          1.00                          3.98
7                dteyear      1.00                          3.98
8                dtemonth     1.00                          3.98
9                dteday       1.00                          3.98
10               workingday   1.00                          3.98
11               season       1.00                          3.98
12               windspeed    1.00                          3.98
13               temp         1.00                          3.98
14               atemp        1.00                          3.98
15               weathersit   1.00                          3.98
16               holiday      1.00                          3.98
17               mnth         1.00                          3.98
18               yr           1.00                          3.98
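The selection order above appears to be built greedily, with each row adding the variable that most increases the running achievable R-squared. The jump to 1.00 after registered and casual is expected. In the UCI Bike Sharing data, cnt is the sum of casual and registered, so those two columns leak the target, and every later variable adds nothing. The sketch below reproduces this pattern on synthetic data. It assumes a greedy forward search and replaces kxy's model-free criterion with ordinary least-squares R-squared. It illustrates the idea and is not kxy's algorithm. All names in it are hypothetical.

```python
import numpy as np


def ols_r2(X: np.ndarray, y: np.ndarray) -> float:
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def forward_selection(features: dict[str, np.ndarray], y: np.ndarray) -> list[tuple[str, float]]:
    """Greedy forward search: at each step, add the feature giving the highest
    R^2 together with those already chosen, and record the running R^2."""
    y = np.asarray(y, dtype=float)
    chosen: list[str] = []
    remaining = list(features)
    path = []
    while remaining:
        def score(name: str) -> float:
            cols = [features[c] for c in chosen + [name]]
            return ols_r2(np.column_stack(cols).astype(float), y)
        best = max(remaining, key=score)
        path.append((best, score(best)))
        chosen.append(best)
        remaining.remove(best)
    return path


# Synthetic stand-in: cnt is built as casual + registered, mirroring the UCI
# definition; hr is unrelated noise here.
rng = np.random.default_rng(0)
n = 500
casual = rng.poisson(35, n)
registered = rng.poisson(150, n)
hr = rng.integers(0, 24, n)
cnt = casual + registered

for name, r2 in forward_selection({"casual": casual, "registered": registered, "hr": hr}, cnt):
    print(f"{name:<10} running R^2 = {r2:.4f}")
```

On the real data, the analogous step is to drop the two components before running the selection again. For example: `df.drop(columns=['casual', 'registered']).kxy.variable_selection(y_column, problem_type=problem_type)`. That run's output is not shown here.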