Learning

class kxy.classification.learning.MaxEntClassifier
Multiclass classifier based on the maximum-entropy principle.

achievable_performance
DataFrame containing the highest performance that can be achieved. Requires the model to be fitted first.

fit(x_c, y, x_d=None, space='dual', categorical_encoding='twosplit')
Solves the maximum-entropy problem on the API, in the background.
The resulting joint distribution of the copula-uniform representations \((u_y, u_x)\) is the cornerstone of inference.
Parameters:
 x_c ((n, d) np.array) – Continuous inputs.
 y ((n,) np.array) – Labels.
 x_d ((n, d) np.array or None (default), optional) – Discrete inputs.
 space (str, 'primal' | 'dual') – The space in which the maximum-entropy problem is solved. When space='primal', the maximum-entropy problem is solved in the original observation space, under Pearson covariance constraints, leading to the Gaussian copula. When space='dual', the maximum-entropy problem is solved in the copula-uniform dual space, under Spearman rank correlation constraints.
 categorical_encoding (str, 'onehot' | 'twosplit' (default)) – The encoding method used to represent categorical variables. See kxy.api.core.utils.one_hot_encoding and kxy.api.core.utils.two_split_encoding.
Returns: a – DataFrame with columns:
 'Achievable R^2': The highest \(R^2\) that can be achieved by a classification model using the provided inputs to predict the label.
 'Achievable LogLikelihood Per Sample': The highest true log-likelihood per sample that can be achieved by a classification model using the provided inputs to predict the label.
 'Achievable Accuracy': The highest classification accuracy that can be achieved by a classification model using the provided inputs to predict the label.
Return type: pandas.DataFrame
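
The difference between the Pearson constraints of space='primal' and the Spearman constraints of space='dual' can be illustrated with a minimal sketch in plain NumPy. It is not part of kxy and does not reproduce the library's solver. The helper names copula_uniform, pearson and spearman are hypothetical, and the copula-uniform representation is approximated here by empirical ranks scaled to (0, 1), assuming no ties. The sketch shows that a monotonic transformation of an input changes its Pearson correlation with the target but leaves the Spearman rank correlation unchanged.

```python
import numpy as np


def copula_uniform(x):
    """Empirical copula-uniform representation: per-column ranks scaled to (0, 1).

    Assumes no ties within a column (illustrative helper, not the kxy implementation).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x.reshape(-1, 1)
    ranks = x.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n in each column
    return ranks / (x.shape[0] + 1.0)


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def spearman(a, b):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    u = copula_uniform(np.column_stack([a, b]))
    return pearson(u[:, 0], u[:, 1])


rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + 0.5 * rng.normal(size=500)

# exp is strictly increasing, so it preserves ranks but distorts linear dependence.
pearson_raw, pearson_exp = pearson(x, y), pearson(np.exp(x), y)
spearman_raw, spearman_exp = spearman(x, y), spearman(np.exp(x), y)

print("Pearson :", pearson_raw, "->", pearson_exp)
print("Spearman:", spearman_raw, "->", spearman_exp)
```

Because rank-based constraints ignore the marginal distributions of the inputs, the dual formulation depends on the data only through its copula-uniform representation.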

predict(x_c, x_d=None)
Calculates the posterior mean and posterior standard deviation of the copula-uniform representation of the encoded output
\[E\left( u_y \vert x=* \right) \text{ and } \sqrt{Var \left( u_y \vert x=* \right)}\]
under the maximum-entropy distribution for the copula-uniform representations \((u_y, u_x)\), and infers predicted labels.
Missing inputs are allowed: the posterior distribution is conditioned only on the inputs that are provided.
Parameters:
 x_c ((n, d) np.array) – Test continuous inputs. Missing inputs, if any, should be represented as np.nan or None.
 x_d ((n, d) np.array or None (default), optional) – Test discrete inputs. Missing inputs, if any, should be represented as np.nan or None.
Returns: Dictionary with keys:
 posterior_mean_u_y: The ndarray of posterior means of the output encodings.
 posterior_std_u_y: The ndarray of posterior standard deviations of the output encodings.
 predicted_labels: The ndarray of predicted labels.
Return type: dict
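
As a small illustration of the missing-input convention described above, the sketch below builds an (n, d) array of test continuous inputs in which missing entries are written as None and stored as np.nan. The helper as_float_matrix is hypothetical and not part of kxy. The commented-out call at the end only mirrors the signature and dictionary keys documented here, and assumes that clf is an already fitted MaxEntClassifier.

```python
import numpy as np


def as_float_matrix(rows):
    """Convert a list of rows into an (n, d) float array, mapping None to np.nan.

    Illustrative helper (not part of kxy).
    """
    return np.array(
        [[np.nan if value is None else float(value) for value in row] for row in rows],
        dtype=float,
    )


# Three test points with two continuous inputs each; some inputs are missing.
x_c_test = as_float_matrix([
    [0.2, None],
    [1.5, 3.0],
    [None, -0.7],
])

# Number of missing inputs per test point.
missing_per_row = np.isnan(x_c_test).sum(axis=1)
print(x_c_test.shape, missing_per_row)

# Assumed usage, given a fitted classifier `clf` (not run here):
# out = clf.predict(x_c_test)
# labels = out['predicted_labels']
# mean_u_y, std_u_y = out['posterior_mean_u_y'], out['posterior_std_u_y']
```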
