Superconductivity (UCI, Regression, n=21263, d=81)

Loading The Data

In [1]:
from kxy_datasets.uci_regressions import Superconductivity # pip install kxy_datasets
In [2]:
dataset = Superconductivity()
df = dataset.df # Retrieve the dataset as a pandas dataframe
y_column = dataset.y_column # The name of the column corresponding to the target
problem_type = dataset.problem_type # 'regression' or 'classification'
In [3]:
df.kxy.describe() # Print summary statistics for every column

---------------------
Column: critical_temp
---------------------
Type:   Continuous
Max:    185
p75:    63
Mean:   34
Median: 20
p25:    5.4
Min:    0.0

-----------------------
Column: entropy_Density
-----------------------
Type:   Continuous
Max:    2.0
p75:    1.3
Mean:   1.1
Median: 1.1
p25:    0.9
Min:    0.0

--------------------------------
Column: entropy_ElectronAffinity
--------------------------------
Type:   Continuous
Max:    1.8
p75:    1.3
Mean:   1.1
Median: 1.1
p25:    0.9
Min:    0.0

--------------------------
Column: entropy_FusionHeat
--------------------------
Type:   Continuous
Max:    2.0
p75:    1.4
Mean:   1.1
Median: 1.1
p25:    0.8
Min:    0.0

-----------------------------------
Column: entropy_ThermalConductivity
-----------------------------------
Type:   Continuous
Max:    1.6
p75:    1.0
Mean:   0.7
Median: 0.7
p25:    0.5
Min:    0.0

-----------------------
Column: entropy_Valence
-----------------------
Type:   Continuous
Max:    2.1
p75:    1.6
Mean:   1.3
Median: 1.4
p25:    1.1
Min:    0.0

---------------------------
Column: entropy_atomic_mass
---------------------------
Type:   Continuous
Max:    2.0
p75:    1.4
Mean:   1.2
Median: 1.2
p25:    1.0
Min:    0.0

-----------------------------
Column: entropy_atomic_radius
-----------------------------
Type:   Continuous
Max:    2.1
p75:    1.5
Mean:   1.3
Median: 1.3
p25:    1.1
Min:    0.0

-------------------
Column: entropy_fie
-------------------
Type:   Continuous
Max:    2.2
p75:    1.6
Mean:   1.3
Median: 1.4
p25:    1.1
Min:    0.0

---------------------
Column: gmean_Density
---------------------
Type:   Continuous
Max:    22,590
p75:    5,794
Mean:   3,460
Median: 1,339
p25:    883
Min:    1.4

------------------------------
Column: gmean_ElectronAffinity
------------------------------
Type:   Continuous
Max:    326
p75:    67
Mean:   54
Median: 51
p25:    33
Min:    1.5

------------------------
Column: gmean_FusionHeat
------------------------
Type:   Continuous
Max:    105
p75:    13
Mean:   10
Median: 5.3
p25:    4.1
Min:    0.2

---------------------------------
Column: gmean_ThermalConductivity
---------------------------------
Type:   Continuous
Max:    317
p75:    42
Mean:   29
Median: 14
p25:    8.3
Min:    0.0

---------------------
Column: gmean_Valence
---------------------
Type:   Continuous
Max:    7.0
p75:    3.7
Mean:   3.1
Median: 2.6
p25:    2.3
Min:    1.0

-------------------------
Column: gmean_atomic_mass
-------------------------
Type:   Continuous
Max:    208
p75:    78
Mean:   71
Median: 66
p25:    58
Min:    5.3

---------------------------
Column: gmean_atomic_radius
---------------------------
Type:   Continuous
Max:    298
p75:    155
Mean:   144
Median: 142
p25:    133
Min:    48

-----------------
Column: gmean_fie
-----------------
Type:   Continuous
Max:    1,313
p75:    765
Mean:   737
Median: 727
p25:    692
Min:    375

--------------------
Column: mean_Density
--------------------
Type:   Continuous
Max:    22,590
p75:    6,728
Mean:   6,111
Median: 5,329
p25:    4,513
Min:    1.4

-----------------------------
Column: mean_ElectronAffinity
-----------------------------
Type:   Continuous
Max:    326
p75:    85
Mean:   76
Median: 73
p25:    62
Min:    1.5

-----------------------
Column: mean_FusionHeat
-----------------------
Type:   Continuous
Max:    105
p75:    17
Mean:   14
Median: 9.3
p25:    7.6
Min:    0.2

--------------------------------
Column: mean_ThermalConductivity
--------------------------------
Type:   Continuous
Max:    332
p75:    111
Mean:   89
Median: 96
p25:    61
Min:    0.0

--------------------
Column: mean_Valence
--------------------
Type:   Continuous
Max:    7.0
p75:    4.0
Mean:   3.2
Median: 2.8
p25:    2.3
Min:    1.0

------------------------
Column: mean_atomic_mass
------------------------
Type:   Continuous
Max:    208
p75:    100
Mean:   87
Median: 84
p25:    72
Min:    6.9

--------------------------
Column: mean_atomic_radius
--------------------------
Type:   Continuous
Max:    298
p75:    169
Mean:   157
Median: 160
p25:    149
Min:    48

----------------
Column: mean_fie
----------------
Type:   Continuous
Max:    1,313
p75:    796
Mean:   769
Median: 764
p25:    723
Min:    375

--------------------------
Column: number_of_elements
--------------------------
Type:   Continuous
Max:    9.0
p75:    5.0
Mean:   4.1
Median: 4.0
p25:    3.0
Min:    1.0

---------------------
Column: range_Density
---------------------
Type:   Continuous
Max:    22,588
p75:    9,778
Mean:   8,665
Median: 8,958
p25:    6,648
Min:    0.0

------------------------------
Column: range_ElectronAffinity
------------------------------
Type:   Continuous
Max:    349
p75:    138
Mean:   120
Median: 127
p25:    86
Min:    0.0

------------------------
Column: range_FusionHeat
------------------------
Type:   Continuous
Max:    104
p75:    23
Mean:   21
Median: 12
p25:    12
Min:    0.0

---------------------------------
Column: range_ThermalConductivity
---------------------------------
Type:   Continuous
Max:    429
p75:    399
Mean:   250
Median: 399
p25:    86
Min:    0.0

---------------------
Column: range_Valence
---------------------
Type:   Continuous
Max:    6.0
p75:    3.0
Mean:   2.0
Median: 2.0
p25:    1.0
Min:    0.0

-------------------------
Column: range_atomic_mass
-------------------------
Type:   Continuous
Max:    207
p75:    154
Mean:   115
Median: 122
p25:    78
Min:    0.0

---------------------------
Column: range_atomic_radius
---------------------------
Type:   Continuous
Max:    256
p75:    205
Mean:   139
Median: 171
p25:    80
Min:    0.0

-----------------
Column: range_fie
-----------------
Type:   Continuous
Max:    1,304
p75:    810
Mean:   572
Median: 764
p25:    262
Min:    0.0

-------------------
Column: std_Density
-------------------
Type:   Continuous
Max:    10,724
p75:    4,004
Mean:   3,416
Median: 3,301
p25:    2,819
Min:    0.0

----------------------------
Column: std_ElectronAffinity
----------------------------
Type:   Continuous
Max:    162
p75:    56
Mean:   48
Median: 51
p25:    38
Min:    0.0

----------------------
Column: std_FusionHeat
----------------------
Type:   Continuous
Max:    51
p75:    9.0
Mean:   8.3
Median: 4.9
p25:    4.3
Min:    0.0

-------------------------------
Column: std_ThermalConductivity
-------------------------------
Type:   Continuous
Max:    214
p75:    153
Mean:   98
Median: 135
p25:    37
Min:    0.0

-------------------
Column: std_Valence
-------------------
Type:   Continuous
Max:    3.0
p75:    1.2
Mean:   0.8
Median: 0.8
p25:    0.5
Min:    0.0

-----------------------
Column: std_atomic_mass
-----------------------
Type:   Continuous
Max:    101
p75:    59
Mean:   44
Median: 45
p25:    32
Min:    0.0

-------------------------
Column: std_atomic_radius
-------------------------
Type:   Continuous
Max:    115
p75:    69
Mean:   51
Median: 58
p25:    35
Min:    0.0

---------------
Column: std_fie
---------------
Type:   Continuous
Max:    499
p75:    297
Mean:   215
Median: 266
p25:    114
Min:    0.0

---------------------------
Column: wtd_entropy_Density
---------------------------
Type:   Continuous
Max:    1.7
p75:    1.1
Mean:   0.9
Median: 0.9
p25:    0.7
Min:    0.0

------------------------------------
Column: wtd_entropy_ElectronAffinity
------------------------------------
Type:   Continuous
Max:    1.7
p75:    0.9
Mean:   0.8
Median: 0.8
p25:    0.7
Min:    0.0

------------------------------
Column: wtd_entropy_FusionHeat
------------------------------
Type:   Continuous
Max:    1.7
p75:    1.2
Mean:   0.9
Median: 1.0
p25:    0.7
Min:    0.0

---------------------------------------
Column: wtd_entropy_ThermalConductivity
---------------------------------------
Type:   Continuous
Max:    1.6
p75:    0.8
Mean:   0.5
Median: 0.5
p25:    0.3
Min:    0.0

---------------------------
Column: wtd_entropy_Valence
---------------------------
Type:   Continuous
Max:    1.9
p75:    1.3
Mean:   1.1
Median: 1.2
p25:    0.8
Min:    0.0

-------------------------------
Column: wtd_entropy_atomic_mass
-------------------------------
Type:   Continuous
Max:    2.0
p75:    1.4
Mean:   1.1
Median: 1.1
p25:    0.8
Min:    0.0

---------------------------------
Column: wtd_entropy_atomic_radius
---------------------------------
Type:   Continuous
Max:    1.9
p75:    1.4
Mean:   1.1
Median: 1.2
p25:    0.9
Min:    0.0

-----------------------
Column: wtd_entropy_fie
-----------------------
Type:   Continuous
Max:    2.0
p75:    1.1
Mean:   0.9
Median: 0.9
p25:    0.8
Min:    0.0

-------------------------
Column: wtd_gmean_Density
-------------------------
Type:   Continuous
Max:    22,590
p75:    5,766
Mean:   3,117
Median: 1,515
p25:    66
Min:    0.7

----------------------------------
Column: wtd_gmean_ElectronAffinity
----------------------------------
Type:   Continuous
Max:    326
p75:    89
Mean:   72
Median: 73
p25:    50
Min:    1.5

----------------------------
Column: wtd_gmean_FusionHeat
----------------------------
Type:   Continuous
Max:    105
p75:    16
Mean:   10
Median: 4.9
p25:    1.3
Min:    0.2

-------------------------------------
Column: wtd_gmean_ThermalConductivity
-------------------------------------
Type:   Continuous
Max:    376
p75:    47
Mean:   27
Median: 6.1
p25:    1.1
Min:    0.0

-------------------------
Column: wtd_gmean_Valence
-------------------------
Type:   Continuous
Max:    7.0
p75:    3.9
Mean:   3.1
Median: 2.4
p25:    2.1
Min:    1.0

-----------------------------
Column: wtd_gmean_atomic_mass
-----------------------------
Type:   Continuous
Max:    208
p75:    73
Mean:   58
Median: 39
p25:    35
Min:    2.0

-------------------------------
Column: wtd_gmean_atomic_radius
-------------------------------
Type:   Continuous
Max:    298
p75:    150
Mean:   120
Median: 113
p25:    89
Min:    48

---------------------
Column: wtd_gmean_fie
---------------------
Type:   Continuous
Max:    1,327
p75:    937
Mean:   832
Median: 856
p25:    720
Min:    375

------------------------
Column: wtd_mean_Density
------------------------
Type:   Continuous
Max:    22,590
p75:    6,416
Mean:   5,267
Median: 4,303
p25:    2,999
Min:    1.4

---------------------------------
Column: wtd_mean_ElectronAffinity
---------------------------------
Type:   Continuous
Max:    326
p75:    110
Mean:   92
Median: 102
p25:    73
Min:    1.5

---------------------------
Column: wtd_mean_FusionHeat
---------------------------
Type:   Continuous
Max:    105
p75:    18
Mean:   13
Median: 8.3
p25:    5.0
Min:    0.2

------------------------------------
Column: wtd_mean_ThermalConductivity
------------------------------------
Type:   Continuous
Max:    406
p75:    99
Mean:   81
Median: 73
p25:    54
Min:    0.0

------------------------
Column: wtd_mean_Valence
------------------------
Type:   Continuous
Max:    7.0
p75:    4.0
Mean:   3.2
Median: 2.6
p25:    2.1
Min:    1.0

----------------------------
Column: wtd_mean_atomic_mass
----------------------------
Type:   Continuous
Max:    208
p75:    86
Mean:   72
Median: 60
p25:    52
Min:    6.4

------------------------------
Column: wtd_mean_atomic_radius
------------------------------
Type:   Continuous
Max:    298
p75:    158
Mean:   134
Median: 125
p25:    112
Min:    48

--------------------
Column: wtd_mean_fie
--------------------
Type:   Continuous
Max:    1,348
p75:    1,004
Mean:   870
Median: 889
p25:    738
Min:    375

-------------------------
Column: wtd_range_Density
-------------------------
Type:   Continuous
Max:    22,434
p75:    3,409
Mean:   2,902
Median: 2,082
p25:    1,656
Min:    0.0

----------------------------------
Column: wtd_range_ElectronAffinity
----------------------------------
Type:   Continuous
Max:    218
p75:    76
Mean:   59
Median: 71
p25:    34
Min:    0.0

----------------------------
Column: wtd_range_FusionHeat
----------------------------
Type:   Continuous
Max:    102
p75:    10
Mean:   8.2
Median: 3.4
p25:    2.3
Min:    0.0

-------------------------------------
Column: wtd_range_ThermalConductivity
-------------------------------------
Type:   Continuous
Max:    401
p75:    91
Mean:   62
Median: 56
p25:    29
Min:    0.0

-------------------------
Column: wtd_range_Valence
-------------------------
Type:   Continuous
Max:    7.0
p75:    1.9
Mean:   1.5
Median: 1.1
p25:    0.9
Min:    0.0

-----------------------------
Column: wtd_range_atomic_mass
-----------------------------
Type:   Continuous
Max:    205
p75:    38
Mean:   33
Median: 26
p25:    16
Min:    0.0

-------------------------------
Column: wtd_range_atomic_radius
-------------------------------
Type:   Continuous
Max:    240
p75:    60
Mean:   51
Median: 43
p25:    28
Min:    0.0

---------------------
Column: wtd_range_fie
---------------------
Type:   Continuous
Max:    1,251
p75:    690
Mean:   483
Median: 510
p25:    291
Min:    0.0

-----------------------
Column: wtd_std_Density
-----------------------
Type:   Continuous
Max:    10,410
p75:    3,959
Mean:   3,319
Median: 3,625
p25:    2,564
Min:    0.0

--------------------------------
Column: wtd_std_ElectronAffinity
--------------------------------
Type:   Continuous
Max:    169
p75:    53
Mean:   44
Median: 48
p25:    33
Min:    0.0

--------------------------
Column: wtd_std_FusionHeat
--------------------------
Type:   Continuous
Max:    51
p75:    8.0
Mean:   7.7
Median: 5.5
p25:    4.6
Min:    0.0

-----------------------------------
Column: wtd_std_ThermalConductivity
-----------------------------------
Type:   Continuous
Max:    213
p75:    162
Mean:   96
Median: 113
p25:    31
Min:    0.0

-----------------------
Column: wtd_std_Valence
-----------------------
Type:   Continuous
Max:    3.0
p75:    1.0
Mean:   0.7
Median: 0.5
p25:    0.3
Min:    0.0

---------------------------
Column: wtd_std_atomic_mass
---------------------------
Type:   Continuous
Max:    101
p75:    53
Mean:   41
Median: 44
p25:    28
Min:    0.0

-----------------------------
Column: wtd_std_atomic_radius
-----------------------------
Type:   Continuous
Max:    97
p75:    73
Mean:   52
Median: 59
p25:    32
Min:    0.0

-------------------
Column: wtd_std_fie
-------------------
Type:   Continuous
Max:    479
p75:    342
Mean:   224
Median: 258
p25:    92
Min:    0.0
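
The per-column figures printed above (Max, p75, Mean, Median, p25, Min) can also be reproduced with plain pandas. The sketch below is only an illustration under assumptions: the helper name summarize and the toy dataframe are hypothetical, and it is not how kxy implements describe().

```python
import pandas as pd


def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Return Max, p75, Mean, Median, p25 and Min for every numeric column."""
    numeric = frame.select_dtypes(include="number")
    return pd.DataFrame({
        "Max": numeric.max(),
        "p75": numeric.quantile(0.75),
        "Mean": numeric.mean(),
        "Median": numeric.median(),
        "p25": numeric.quantile(0.25),
        "Min": numeric.min(),
    })


# Hypothetical toy data standing in for the real dataframe.
toy = pd.DataFrame({
    "a": [0.0, 1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0],
})
summary = summarize(toy)  # one row per column, one column per statistic
print(summary)
```

Calling summarize(df) on the dataframe loaded in In [2] should give one row per column of the dataset, which is convenient for sorting or filtering columns by a statistic.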

Data Valuation

In [4]:
df.kxy.data_valuation(y_column, problem_type=problem_type)
Out[4]:
   Achievable R-Squared  Achievable Log-Likelihood Per Sample  Achievable RMSE
0  1.00                  9.43                                  2.72e-05

Automatic (Model-Free) Variable Selection

In [5]:
df.kxy.variable_selection(y_column, problem_type=problem_type)
Out[5]:
Selection Order  Variable                    Running Achievable R-Squared  Running Achievable RMSE
0                No Variable                 0.00                          3.43e+01
1                range_atomic_radius         0.61                          2.14e+01
2                wtd_entropy_atomic_mass     0.61                          2.14e+01
3                wtd_mean_Valence            0.66                          2.01e+01
4                wtd_gmean_ElectronAffinity  0.66                          2.01e+01
...              ...                         ...                           ...
77               wtd_gmean_Density           1.00                          0.0001
78               gmean_atomic_radius         1.00                          0.0001
79               wtd_std_atomic_radius       1.00                          0.0000
80               wtd_range_fie               1.00                          0.0000
81               wtd_gmean_Valence           1.00                          0.0000

82 rows × 3 columns
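
The Running Achievable R-Squared and Running Achievable RMSE columns above look consistent with the usual definition of R-squared, R^2 = 1 - MSE / Var(y), which implies RMSE = RMSE_0 * sqrt(1 - R^2), where RMSE_0 is the RMSE in the 'No Variable' row. The sketch below checks this against the first rows of the table. It is a consistency check under that assumption, not a description of how kxy computes these figures, and the helper name rmse_from_r2 is hypothetical.

```python
import math


def rmse_from_r2(baseline_rmse: float, r_squared: float) -> float:
    """RMSE implied by an R-squared, given the RMSE obtained with no variables."""
    return baseline_rmse * math.sqrt(1.0 - r_squared)


# 'No Variable' row of the table: 3.43e+01.
baseline = 34.3

# R-squared values reported after selecting 1 and 3 variables.
for r2 in (0.61, 0.66):
    print(f"R-Squared {r2:.2f} -> implied RMSE {rmse_from_r2(baseline, r2):.1f}")
```

The implied values land within rounding of the 2.14e+01 and 2.01e+01 reported in the table; small gaps are expected because the table shows R-squared to only two decimals.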