Learning

class kxy.classification.learning.MaxEntClassifier

Multi-class classifier based on the maximum-entropy principle.

achievable_performance

DataFrame containing the highest performance that a classification model can achieve using the provided inputs. Requires the model to be fitted first.

fit(x_c, y, x_d=None, space='dual', categorical_encoding='two-split')

Solves the maximum-entropy problem on the API, in the background.

The resulting joint copula-uniform distribution \((u_y, u_x)\) is the cornerstone of inference.
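As a rough illustration of what a copula-uniform representation is, the sketch below maps each column of a sample to its normalized ranks in (0, 1). The empirical rank transform is an assumption made for illustration only; the text above does not say how the library computes \((u_y, u_x)\), and the helper name `to_copula_uniform` is hypothetical.

```python
import numpy as np

def to_copula_uniform(x):
    """Hypothetical helper: map each column of x to normalized ranks in (0, 1)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = x.shape[0]
    # argsort of argsort yields 0-based ranks per column
    # (ties, if any, are broken arbitrarily).
    ranks = x.argsort(axis=0).argsort(axis=0) + 1
    # Dividing by n + 1 keeps every value strictly inside (0, 1).
    return ranks / (n + 1.0)

# Synthetic, skewed continuous inputs with shape (n, d) = (500, 2).
rng = np.random.default_rng(0)
x_c = rng.lognormal(size=(500, 2))
u_x = to_copula_uniform(x_c)
```

Whatever the marginal distribution of each input column, its rank-transformed version is spread evenly over (0, 1), so only the dependence between columns remains.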

Parameters:
  • x_c ((n,d) np.array) – Continuous inputs.
  • y ((n,) np.array) – Labels.
  • x_d ((n, d) np.array or None (default), optional) – Discrete inputs.
  • space (str, 'primal' | 'dual') – The space in which the maximum entropy problem is solved. When space='primal', the maximum entropy problem is solved in the original observation space, under Pearson covariance constraints, leading to the Gaussian copula. When space='dual', the maximum entropy problem is solved in the copula-uniform dual space, under Spearman rank correlation constraints.
  • categorical_encoding (str, 'one-hot' | 'two-split' (default)) – The encoding method used to represent categorical variables. See kxy.api.core.utils.one_hot_encoding and kxy.api.core.utils.two_split_encoding.
Returns:

a – Dataframe with columns:

  • 'Achievable R^2': The highest \(R^2\) that can be achieved by a classification model using the provided inputs to predict the label.
  • 'Achievable Log-Likelihood Per Sample': The highest true log-likelihood per sample that can be achieved by a classification model using the provided inputs to predict the label.
  • 'Achievable Accuracy': The highest classification accuracy that can be achieved by a classification model using the provided inputs to predict the label.

Return type:

pandas.DataFrame
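The two values of the `space` parameter of `fit` differ in the dependence statistic used as a constraint: Pearson covariance in the observation space versus Spearman rank correlation in the copula-uniform space. The sketch below only illustrates that difference with NumPy on synthetic data; it does not solve any maximum-entropy problem, and the helpers `ranks` and `spearman_corr` are hypothetical names introduced here.

```python
import numpy as np

def ranks(v):
    """0-based ranks of a 1-D array (assumes no ties)."""
    return v.argsort().argsort().astype(float)

def spearman_corr(a, b):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    return np.corrcoef(ranks(a), ranks(b))[0, 1]

# Synthetic data: y depends on x plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + 0.5 * rng.normal(size=1000)

# Apply a strictly increasing transform to x and compare both statistics.
pearson_raw = np.corrcoef(x, y)[0, 1]
pearson_exp = np.corrcoef(np.exp(x), y)[0, 1]
spearman_raw = spearman_corr(x, y)
spearman_exp = spearman_corr(np.exp(x), y)

print(f"Pearson:  raw={pearson_raw:.3f}  after exp={pearson_exp:.3f}")
print(f"Spearman: raw={spearman_raw:.3f}  after exp={spearman_exp:.3f}")
```

The Spearman value is unchanged by the increasing transform because the ranks are unchanged, whereas the Pearson value shifts. This is one way to read the distinction between `space='dual'` and `space='primal'`.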

predict(x_c, x_d=None)

Calculates the posterior mean and posterior standard deviation of the copula-uniform representation of the encoded output

\[E\left(u_y \vert x=*\right) \text{ and } \sqrt{\mathrm{Var}\left(u_y \vert x=*\right)}\]

under the maximum-entropy distribution for the copula-uniform representations \((u_y, u_x)\), and infers predicted labels.

Missing inputs are handled gracefully, and the posterior distribution is based on provided inputs.

Parameters:
  • x_c ((n,d) np.array) – Test continuous inputs. Missing inputs, if any, should be represented as np.nan or None.
  • x_d ((n, d) np.array or None (default), optional) – Test discrete inputs. Missing inputs, if any, should be represented as np.nan or None.
Returns:

Dictionary with keys:

  • posterior_mean_u_y: The ndarray of posterior means of the output encodings.
  • posterior_std_u_y: The ndarray of posterior standard deviations of the output encodings.
  • predicted_labels: The ndarray of predicted labels.

Return type:

dict
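A minimal sketch of preparing test inputs with missing entries for `predict` and checking the shape of what comes back. The input values are invented for illustration, the helper `check_prediction_result` is hypothetical, and the zero-filled dictionary is a stand-in, not library output. The commented lines show how the calls might look given the signatures documented above, assuming the constructor takes no required arguments and that each returned array has one entry per test row along its first axis; neither is confirmed here.

```python
import numpy as np

# Test continuous inputs with shape (n, d) = (3, 2); np.nan marks a missing entry.
x_c_test = np.array([
    [0.2, 1.5],
    [np.nan, 0.7],
    [1.1, np.nan],
])
# Number of missing inputs in each test row.
missing_per_row = np.isnan(x_c_test).sum(axis=1)

# How the calls would look, per the signatures documented above
# (assumption: the constructor takes no required arguments):
#   from kxy.classification.learning import MaxEntClassifier
#   clf = MaxEntClassifier()
#   clf.fit(x_c_train, y_train)
#   result = clf.predict(x_c_test)

EXPECTED_KEYS = ('posterior_mean_u_y', 'posterior_std_u_y', 'predicted_labels')

def check_prediction_result(result, n):
    """Hypothetical helper: verify the documented keys exist with n entries each."""
    for key in EXPECTED_KEYS:
        if key not in result:
            raise KeyError(f"missing key: {key}")
        if len(result[key]) != n:
            raise ValueError(f"{key} has {len(result[key])} entries, expected {n}")
    return True

# Zero-filled stand-in for a real result, used only to exercise the check.
placeholder = {key: np.zeros(len(x_c_test)) for key in EXPECTED_KEYS}
ok = check_prediction_result(placeholder, n=len(x_c_test))
```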