Parkinson (UCI, Regression, n=5875, d=20)

Loading The Data

In [1]:
from kxy_datasets.uci_regressions import Parkinson # pip install kxy_datasets
In [2]:
dataset = Parkinson()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Print summary statistics for each column

-----------
Column: DFA
-----------
Type:   Continuous
Max:    0.9
p75:    0.7
Mean:   0.7
Median: 0.6
p25:    0.6
Min:    0.5

-----------
Column: HNR
-----------
Type:   Continuous
Max:    37
p75:    24
Mean:   21
Median: 21
p25:    19
Min:    1.7

-----------------
Column: Jitter(%)
-----------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

-------------------
Column: Jitter(Abs)
-------------------
Type:   Continuous
Max:    0.0
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

------------------
Column: Jitter:DDP
------------------
Type:   Continuous
Max:    0.2
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

-------------------
Column: Jitter:PPQ5
-------------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

------------------
Column: Jitter:RAP
------------------
Type:   Continuous
Max:    0.1
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

-----------
Column: NHR
-----------
Type:   Continuous
Max:    0.7
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

-----------
Column: PPE
-----------
Type:   Continuous
Max:    0.7
p75:    0.3
Mean:   0.2
Median: 0.2
p25:    0.2
Min:    0.0

------------
Column: RPDE
------------
Type:   Continuous
Max:    1.0
p75:    0.6
Mean:   0.5
Median: 0.5
p25:    0.5
Min:    0.2

---------------
Column: Shimmer
---------------
Type:   Continuous
Max:    0.3
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

-------------------
Column: Shimmer(dB)
-------------------
Type:   Continuous
Max:    2.1
p75:    0.4
Mean:   0.3
Median: 0.3
p25:    0.2
Min:    0.0

---------------------
Column: Shimmer:APQ11
---------------------
Type:   Continuous
Max:    0.3
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

--------------------
Column: Shimmer:APQ3
--------------------
Type:   Continuous
Max:    0.2
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

--------------------
Column: Shimmer:APQ5
--------------------
Type:   Continuous
Max:    0.2
p75:    0.0
Mean:   0.0
Median: 0.0
p25:    0.0
Min:    0.0

-------------------
Column: Shimmer:DDA
-------------------
Type:   Continuous
Max:    0.5
p75:    0.1
Mean:   0.1
Median: 0.0
p25:    0.0
Min:    0.0

-----------
Column: age
-----------
Type:   Continuous
Max:    85
p75:    72
Mean:   64
Median: 65
p25:    58
Min:    36

-------------------
Column: motor_UPDRS
-------------------
Type:   Continuous
Max:    39
p75:    27
Mean:   21
Median: 20
p25:    15
Min:    5.0

-----------
Column: sex
-----------
Type:   Continuous
Max:    1.0
p75:    1.0
Mean:   0.3
Median: 0.0
p25:    0.0
Min:    0.0

----------------
Column: subject#
----------------
Type:   Continuous
Max:    42
p75:    33
Mean:   21
Median: 22
p25:    10
Min:    1.0

-----------------
Column: test_time
-----------------
Type:   Continuous
Max:    215
p75:    138
Mean:   92
Median: 91
p25:    46
Min:    -4.3

-------------------
Column: total_UPDRS
-------------------
Type:   Continuous
Max:    54
p75:    36
Mean:   29
Median: 27
p25:    21
Min:    7.0
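
Several of the jitter and shimmer columns above print as 0.0 for every statistic, which suggests their values are too small for the displayed precision. A minimal, dependency-free sketch of the same six statistics at a chosen number of significant digits follows. The helper name `summarize` and the sample values are hypothetical, made up for illustration and not taken from the dataset. With pandas, `df.describe()` reports the same quantities.

```python
import statistics

def summarize(values, sig=4):
    """Return max, quartiles, mean and min, formatted to `sig` significant digits."""
    # method="inclusive" interpolates linearly between the sample min and max.
    p25, median, p75 = statistics.quantiles(values, n=4, method="inclusive")
    stats = {
        "Max": max(values),
        "p75": p75,
        "Mean": statistics.fmean(values),
        "Median": median,
        "p25": p25,
        "Min": min(values),
    }
    return {name: f"{value:.{sig}g}" for name, value in stats.items()}

# Hypothetical small-magnitude values (NOT taken from the Parkinson dataset).
sample = [0.00002, 0.00003, 0.00004, 0.00006, 0.00010]
for name, value in summarize(sample).items():
    print(f"{name + ':':<8}{value}")
```

Formatting by significant digits rather than decimal places keeps small-magnitude columns distinguishable from columns that are genuinely zero.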

Data Valuation

In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
   Achievable R-Squared  Achievable Log-Likelihood Per Sample  Achievable RMSE
0                  1.00                                    25         7.60e-13

Automatic (Model-Free) Variable Selection

In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
Selection Order  Variable       Running Achievable R-Squared  Running Achievable RMSE
0                No Variable    0.00                          8.13
1                total_UPDRS    0.93                          2.19
2                DFA            0.94                          2.06
3                sex            0.95                          1.89
4                subject#       0.96                          1.72
5                Jitter(Abs)    0.96                          1.72
6                Jitter(%)      0.96                          1.72
7                HNR            0.96                          1.72
8                Jitter:DDP     0.96                          1.72
9                age            0.96                          1.72
10               Jitter:PPQ5    0.96                          1.72
11               NHR            0.96                          1.72
12               Shimmer:APQ11  0.96                          1.72
13               Shimmer:APQ5   0.96                          1.72
14               Shimmer        0.96                          1.61
15               Shimmer(dB)    0.97                          1.50
16               RPDE           0.97                          1.40
17               Shimmer:APQ3   0.97                          1.31
18               test_time      0.98                          1.22
19               PPE            0.98                          1.14
20               Jitter:RAP     1.00                          0.0000
21               Shimmer:DDA    1.00                          0.0000
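
The two running columns in the table above appear to be linked. With no variable selected the achievable RMSE is 8.13, and each later row is consistent with RMSE ≈ 8.13 × sqrt(1 − R²). This relationship is inferred from the printed figures rather than taken from the kxy documentation, so the sketch below is only a sanity check under that assumption. The helper name `implied_r2` is hypothetical; the numbers are copied from the table.

```python
BASELINE_RMSE = 8.13  # RMSE in the "No Variable" row

# (variable, running R-squared, running RMSE), one row per distinct RMSE value.
rows = [
    ("total_UPDRS", 0.93, 2.19),
    ("DFA", 0.94, 2.06),
    ("sex", 0.95, 1.89),
    ("subject#", 0.96, 1.72),
    ("Shimmer", 0.96, 1.61),
    ("Shimmer(dB)", 0.97, 1.50),
    ("RPDE", 0.97, 1.40),
    ("Shimmer:APQ3", 0.97, 1.31),
    ("test_time", 0.98, 1.22),
    ("PPE", 0.98, 1.14),
]

def implied_r2(rmse, baseline=BASELINE_RMSE):
    """R-squared implied by an RMSE, assuming RMSE = baseline * sqrt(1 - R^2)."""
    return 1.0 - (rmse / baseline) ** 2

for name, r2, rmse in rows:
    print(f"{name:<14} table R2={r2:.2f}  implied R2={implied_r2(rmse):.4f}")
```

Under this assumption every implied value rounds to the R-squared printed in the table, which would explain why rows with the same rounded R-squared (for example 0.96) can still show different RMSE values.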