quapy.method package

Submodules

quapy.method.aggregative module

class quapy.method.aggregative.ACC(classifier: BaseEstimator = None, val_split=5, solver: Literal['minimize', 'exact', 'exact-raise', 'exact-cc'] = 'minimize', method: Literal['inversion', 'invariant-ratio'] = 'inversion', norm: Literal['clip', 'mapsimplex', 'condsoftmax'] = 'clip', n_jobs=None)[source]

Bases: AggregativeCrispQuantifier

Adjusted Classify & Count, the “adjusted” variant of CC, that corrects the predictions of CC according to the misclassification rates.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
val_split – specifies the data used for generating classifier predictions. This specification can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to be extracted from the training set; or as an integer (default 5), indicating that the predictions are to be generated in a k-fold cross-validation manner (with this integer indicating the value for k); or as a collection defining the specific set of data to use for validation. Alternatively, this set can be specified at fit time by indicating the exact set of data on which the predictions are to be generated.
method (str) –
adjustment method to be used:
- ’inversion’: matrix inversion method based on the matrix equality $P(C)=P(C|Y)P(Y)$, which tries to invert $P(C|Y)$ matrix.
- ’invariant-ratio’: invariant ratio estimator of Vaz et al. 2018, which replaces the last equation with the normalization condition.
solver (str) –
indicates the method to use for solving the system of linear equations. Valid options are:
- ’exact-raise’: tries to solve the system using matrix inversion. Raises an error if the matrix has rank strictly less than n_classes.
- ’exact-cc’: if the matrix is not of full rank, returns p_c as the estimates, which corresponds to no adjustment (i.e., the classify and count method. See quapy.method.aggregative.CC)
- ’exact’: deprecated, defaults to ‘exact-cc’
- ’minimize’: minimizes the L2 norm of $|Ax-B|$. This option generally works better and is the default. For details, see Bunse, M. “On Multi-Class Extensions of Adjusted Classify and Count”, in Proceedings of the 2nd International Workshop on Learning to Quantify: Methods and Applications (LQ 2022), ECML/PKDD 2022, Grenoble (France).
norm (str) –
the method to use for normalization.
- clip, the values are clipped to the range [0,1] and then L1-normalized.
- mapsimplex projects vectors onto the probability simplex. This implementation relies on Mathieu Blondel’s projection_simplex_sort
- condsoftmax, applies a softmax normalization only to prevalence vectors that lie outside the simplex
n_jobs – number of parallel workers
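As a rough illustration of the ’inversion’ method followed by ’clip’ normalization, the sketch below solves $P(C)=P(C|Y)P(Y)$ for a binary problem in plain Python. The helper names (solve_inversion_2x2, clip_normalize) are hypothetical and not part of quapy. Returning the unadjusted estimate for a singular matrix mimics the behaviour described for ’exact-cc’, and the uniform fallback in the normalization is an assumption of this sketch.

```python
def solve_inversion_2x2(M, p_c):
    """Solve M @ p = p_c for p, where M[i][j] estimates P(C=i | Y=j)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        # Singular matrix: fall back to the unadjusted estimate (as in 'exact-cc').
        return list(p_c)
    p0 = (d * p_c[0] - b * p_c[1]) / det
    p1 = (a * p_c[1] - c * p_c[0]) / det
    return [p0, p1]


def clip_normalize(p):
    """Clip each value to [0, 1], then L1-normalize."""
    clipped = [min(1.0, max(0.0, v)) for v in p]
    total = sum(clipped)
    if total == 0:
        # Assumption of this sketch: return the uniform vector if everything was clipped to 0.
        return [1.0 / len(p)] * len(p)
    return [v / total for v in clipped]


# Columns of M are the class-conditional distributions of the predictions.
M = [[0.9, 0.2],
     [0.1, 0.8]]
p_c = [0.69, 0.31]  # prevalence of predicted labels, as classify-and-count would report
adjusted = clip_normalize(solve_inversion_2x2(M, p_c))
print(adjusted)  # close to [0.7, 0.3]
```

Here the observed vector p_c was generated from a true prevalence of [0.7, 0.3], so the adjustment recovers it up to floating-point error.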

METHODS = ['inversion', 'invariant-ratio']

NORMALIZATIONS = ['clip', 'mapsimplex', 'condsoftmax', None]
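To give an intuition for the ’mapsimplex’ option, the following is a plain-Python restatement of the well-known sort-based Euclidean projection onto the probability simplex. It is only a sketch: the function name project_onto_simplex is hypothetical, and the code is not taken from quapy or from the projection_simplex_sort routine it relies on.

```python
def project_onto_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        candidate = (cumulative - 1.0) / k
        # Keep the threshold of the last prefix whose smallest element stays positive.
        if u_k - candidate > 0:
            theta = candidate
    # Shift every component by theta and cut negatives to zero.
    return [max(x - theta, 0.0) for x in v]


raw = [0.8, 0.5, -0.1]  # an adjusted estimate that fell outside the simplex
print(project_onto_simplex(raw))  # close to [0.65, 0.35, 0.0]
```

Unlike clipping followed by L1-normalization, which would rescale this vector to roughly [0.615, 0.385, 0], the projection returns the closest point of the simplex in Euclidean distance.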

SOLVERS = ['exact', 'minimize', 'exact-raise', 'exact-cc']

aggregate(classif_predictions)[source]

Implements the aggregation of label predictions.

Parameters:: classif_predictions – np.ndarray of label predictions
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Estimates the misclassification rates.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the label predictions issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

classmethod getPteCondEstim(classes, y, y_)[source]

Estimates the matrix whose entry (i,j) is the estimate of $P(\hat{y}_i|y_j)$, that is, the probability that a document that belongs to $y_j$ ends up being classified as belonging to $y_i$.

Parameters:

classes – array-like with the class names
y – array-like with the true labels
y_ – array-like with the estimated labels

Returns:

np.ndarray
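The estimate described above can be sketched without any dependencies as a column-normalized confusion matrix. The function name estimate_cond_matrix is hypothetical, nested lists stand in for the np.ndarray, and the handling of classes with no true examples (setting the diagonal entry to 1) is an assumption of this sketch.

```python
def estimate_cond_matrix(classes, y, y_):
    """Return M with M[i][j] = fraction of items of true class j predicted as class i."""
    index = {c: k for k, c in enumerate(classes)}
    n = len(classes)
    counts = [[0.0] * n for _ in range(n)]
    for true_label, predicted_label in zip(y, y_):
        counts[index[predicted_label]][index[true_label]] += 1
    for j in range(n):
        column_total = sum(counts[i][j] for i in range(n))
        for i in range(n):
            if column_total == 0:
                # Assumption: with no examples of class j, pretend it is classified perfectly.
                counts[i][j] = 1.0 if i == j else 0.0
            else:
                counts[i][j] /= column_total
    return counts


true_labels = [0, 0, 0, 0, 1, 1]
predictions = [0, 0, 0, 1, 1, 0]
print(estimate_cond_matrix([0, 1], true_labels, predictions))
# [[0.75, 0.5], [0.25, 0.5]]
```

Each column sums to 1, since it is the distribution of predicted labels given one true class.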

classmethod newInvariantRatioEstimation(classifier: BaseEstimator, val_split=5, n_jobs=None)[source]

Constructs a quantifier that implements the Invariant Ratio Estimator of Vaz et al. 2018. This amounts to setting method to ‘invariant-ratio’ and norm to ‘mapsimplex’, i.e., projecting the estimate onto the probability simplex.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
val_split – specifies the data used for generating classifier predictions. This specification can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to be extracted from the training set; or as an integer (default 5), indicating that the predictions are to be generated in a k-fold cross-validation manner (with this integer indicating the value for k); or as a collection defining the specific set of data to use for validation. Alternatively, this set can be specified at fit time by indicating the exact set of data on which the predictions are to be generated.
n_jobs – number of parallel workers

Returns:

an instance of ACC configured so that it implements the Invariant Ratio Estimator

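set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → ACC

Request metadata passed to the fit method.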

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

quapy.method.aggregative.AdjustedClassifyAndCount: alias of ACC

class quapy.method.aggregative.AggregativeCrispQuantifier[source]

Bases: AggregativeQuantifier, ABC

Abstract class for quantification methods that base their estimations on the aggregation of crisp decisions as returned by a hard classifier. Aggregative crisp quantifiers thus extend Aggregative Quantifiers by implementing specifications about crisp predictions.

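set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → AggregativeCrispQuantifier

Request metadata passed to the fit method.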

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method.aggregative.AggregativeMedianEstimator(base_quantifier: AggregativeQuantifier, param_grid: dict, random_state=None, n_jobs=None)[source]

Bases: BinaryQuantifier

This method is a meta-quantifier that returns, as the estimated class prevalence values, the median of the estimates returned by differently (hyper)parameterized base quantifiers. The median of unit-vectors is only guaranteed to be a unit-vector for n=2 dimensions, i.e., in cases of binary quantification.

Parameters:

base_quantifier – the base, binary quantifier
random_state – a seed to be set before fitting any base quantifier (default None)
param_grid – the grid of parameters over which the median will be computed
n_jobs – number of parallel workers
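The restriction to binary quantification can be checked with a small, dependency-free sketch: a component-wise median of prevalence vectors still sums to 1 in two dimensions, but not necessarily in three. The function name componentwise_median is hypothetical and the vectors are made-up values used only for illustration.

```python
from statistics import median


def componentwise_median(estimates):
    """Median of each class prevalence across several estimated vectors."""
    n_classes = len(estimates[0])
    return [median(e[i] for e in estimates) for i in range(n_classes)]


# Three binary estimates, e.g. from three hyperparameter configurations.
binary = [[0.2, 0.8], [0.3, 0.7], [0.6, 0.4]]
print(componentwise_median(binary))  # [0.3, 0.7], still a valid prevalence vector

# Three 3-class estimates: every component has median 0.5, so the result sums to 1.5.
three_class = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
print(componentwise_median(three_class))  # [0.5, 0.5, 0.5]
```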

fit(training: LabelledCollection, **kwargs)[source]

Trains a quantifier.

Parameters:: training – a quapy.data.base.LabelledCollection consisting of the training data
Returns:: self

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:: params – Parameter names mapped to their values.
Return type:: dict

quantify(instances)[source]

Generate class prevalence estimates for the sample’s instances

Parameters:: instances – array-like
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.

set_fit_request(*, training: bool | None | str = '$UNCHANGED$') → AggregativeMedianEstimator

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:: training (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for training parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:: **params (dict) – Estimator parameters.
Returns:: self – Estimator instance.
Return type:: estimator instance

class quapy.method.aggregative.AggregativeQuantifier[source]

Bases: BaseQuantifier, ABC

Abstract class for quantification methods that base their estimations on the aggregation of classification results. Aggregative quantifiers implement a pipeline that consists of generating classification predictions and aggregating them. For this reason, the training phase is implemented by classifier_fit_predict() followed by aggregation_fit(), while the testing phase is implemented by classify() followed by aggregate(). Subclasses of this abstract class must provide implementations for these methods. Aggregative quantifiers also maintain a classifier attribute.

The method fit() comes with a default implementation based on classifier_fit_predict() and aggregation_fit().

The method quantify() comes with a default implementation based on classify() and aggregate().
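The two-stage structure described above can be pictured with a toy template-method class. This is a hypothetical stand-in written for illustration only: it works on plain lists of one-dimensional scores instead of LabelledCollection objects, and neither ToyAggregativeQuantifier nor ToyClassifyAndCount exists in quapy.

```python
from abc import ABC, abstractmethod


class ToyAggregativeQuantifier(ABC):
    """Hypothetical stand-in mirroring the two-stage structure; not quapy code."""

    def fit(self, X, y):
        # Stage 1: train the classifier and obtain predictions; stage 2: train the aggregation.
        predictions = self.classifier_fit_predict(X, y)
        self.aggregation_fit(predictions, y)
        return self

    def quantify(self, X):
        # Test time: classify first, then aggregate the predictions.
        return self.aggregate(self.classify(X))

    @abstractmethod
    def classifier_fit_predict(self, X, y): ...

    @abstractmethod
    def aggregation_fit(self, predictions, y): ...

    @abstractmethod
    def classify(self, X): ...

    @abstractmethod
    def aggregate(self, predictions): ...


class ToyClassifyAndCount(ToyAggregativeQuantifier):
    def classifier_fit_predict(self, X, y):
        # Toy binary classifier: threshold halfway between the two class means.
        neg = [x for x, label in zip(X, y) if label == 0]
        pos = [x for x, label in zip(X, y) if label == 1]
        self.threshold = (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2
        return self.classify(X)

    def aggregation_fit(self, predictions, y):
        pass  # plain counting needs no training

    def classify(self, X):
        return [1 if x > self.threshold else 0 for x in X]

    def aggregate(self, predictions):
        n = len(predictions)
        return [predictions.count(0) / n, predictions.count(1) / n]


quantifier = ToyClassifyAndCount().fit([0.0, 0.2, 0.8, 1.0], [0, 0, 1, 1])
print(quantifier.quantify([0.1, 0.9, 0.8, 0.2, 0.7]))  # close to [0.4, 0.6]
```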

abstract aggregate(classif_predictions: ndarray)[source]

Implements the aggregation of label predictions.

Parameters:: classif_predictions – np.ndarray of label predictions
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.

abstract aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Trains the aggregation function.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the predictions issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

property classes_

Class labels, in the same order in which class prevalence values are to be computed. This default implementation actually returns the class labels of the learner.

Returns:: array-like

property classifier

Gives access to the classifier

Returns:: the classifier (typically an sklearn’s Estimator)

classifier_fit_predict(data: LabelledCollection, fit_classifier=True, predict_on=None)[source]

Trains the classifier if requested (fit_classifier=True) and generates the necessary predictions to train the aggregation function.

Parameters:

data – a quapy.data.base.LabelledCollection consisting of the training data
fit_classifier – whether to train the learner (default is True). Set to False if the learner has been trained outside the quantifier.
predict_on – specifies the set on which predictions need to be issued. This parameter can be specified as None (default) to indicate no prediction is needed; a float in (0, 1) to indicate the proportion of instances to be used for predictions (the remainder is used for training); an integer >1 to indicate that the predictions must be generated via k-fold cross-validation, using this integer as k; or the data sample itself on which to generate the predictions.
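For the integer case of predict_on, the idea of k-fold predictions is that every training instance receives a prediction from a classifier that never saw it. The sketch below shows this with contiguous, non-stratified folds and a toy threshold classifier. All names (kfold_indices, out_of_fold_predictions, fit_threshold, predict_threshold) are hypothetical, and the fold layout is an assumption that may differ from what quapy does internally.

```python
def kfold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def out_of_fold_predictions(X, y, k, fit, predict):
    """Predict each instance with a model trained on the other folds."""
    predictions = [None] * len(X)
    for train, test in kfold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            predictions[i] = predict(model, X[i])
    return predictions


def fit_threshold(X, y):
    # Toy binary classifier: threshold halfway between the two class means.
    neg = [x for x, label in zip(X, y) if label == 0]
    pos = [x for x, label in zip(X, y) if label == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


X = [0.0, 0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 1.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(out_of_fold_predictions(X, y, 4, fit_threshold, predict_threshold))
```

The resulting predictions, paired with the true labels, are the kind of input an aggregation function is trained on.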

classify(instances)[source]

Provides the label predictions for the given instances. The predictions should respect the format expected by aggregate(), e.g., posterior probabilities for probabilistic quantifiers, or crisp predictions for non-probabilistic quantifiers. The default one is “decision_function”.

Parameters:: instances – array-like of shape (n_instances, n_features,)
Returns:: np.ndarray of shape (n_instances,) with label predictions

fit(data: LabelledCollection, fit_classifier=True, val_split=None)[source]

Trains the aggregative quantifier. This comes down to training a classifier and an aggregation function.

Parameters:

data – a quapy.data.base.LabelledCollection consisting of the training data
fit_classifier – whether to train the learner (default is True). Set to False if the learner has been trained outside the quantifier.
val_split – specifies the data used for generating classifier predictions. This specification can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to be extracted from the training set; or as an integer (default 5), indicating that the predictions are to be generated in a k-fold cross-validation manner (with this integer indicating the value for k); or as a collection defining the specific set of data to use for validation. Alternatively, this set can be specified at fit time by indicating the exact set of data on which the predictions are to be generated.

Returns:

self

quantify(instances)[source]

Generate class prevalence estimates for the sample’s instances by aggregating the label predictions generated by the classifier.

Parameters:: instances – array-like
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.

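set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → AggregativeQuantifier

Request metadata passed to the fit method.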

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

property val_split

val_split_ = None

class quapy.method.aggregative.AggregativeSoftQuantifier[source]

Bases: AggregativeQuantifier, ABC

Abstract class for quantification methods that base their estimations on the aggregation of posterior probabilities as returned by a probabilistic classifier. Aggregative soft quantifiers thus extend Aggregative Quantifiers by implementing specifications about soft predictions.

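set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → AggregativeSoftQuantifier

Request metadata passed to the fit method.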

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method.aggregative.BayesianCC(classifier: BaseEstimator = None, val_split: float = 0.75, num_warmup: int = 500, num_samples: int = 1000, mcmc_seed: int = 0)[source]

Bases: AggregativeCrispQuantifier

Bayesian quantification method, which is a variant of ACC that calculates the posterior probability distribution over the prevalence vectors, rather than providing a point estimate obtained by matrix inversion.

It can be used to diagnose degeneracy in the predictions, which becomes visible when the confusion matrix has a high condition number, or to quantify the uncertainty around the point estimate.

This method relies on extra dependencies, which have to be installed via: $ pip install quapy[bayes]
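The degeneracy mentioned above can be seen in the binary case, where the adjusted estimate of the positive prevalence is (p_c - fpr) / (tpr - fpr). The sketch below does not implement the Bayesian method; it only shows, with hypothetical helper names and made-up rates, how the same small error in the observed prevalence is amplified when the true and false positive rates are close, that is, when the confusion matrix is badly conditioned.

```python
def adjusted_positive_prevalence(p_c, tpr, fpr):
    """Binary adjusted count: invert p_c = tpr * p + fpr * (1 - p) for p."""
    return (p_c - fpr) / (tpr - fpr)


def shift_caused_by_noise(tpr, fpr, noise=0.01, p_c=0.5):
    """Change in the adjusted estimate when p_c is off by `noise`."""
    noisy = adjusted_positive_prevalence(p_c + noise, tpr, fpr)
    clean = adjusted_positive_prevalence(p_c, tpr, fpr)
    return noisy - clean


# A well-separated classifier barely amplifies the error ...
print(shift_caused_by_noise(tpr=0.9, fpr=0.1))
# ... while a nearly uninformative one amplifies it by a factor of 1 / (tpr - fpr) = 25.
print(shift_caused_by_noise(tpr=0.52, fpr=0.48))
```

In the second case the point estimate moves by about 0.25 for an error of 0.01, which is the kind of situation where a posterior distribution over prevalence vectors is more informative than a single value.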

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
val_split – a float in (0, 1) indicating the proportion of the training data to be used, as a stratified held-out validation set, for generating classifier predictions.
num_warmup – number of warmup iterations for the MCMC sampler (default 500)
num_samples – number of samples to draw from the posterior (default 1000)
mcmc_seed – random seed for the MCMC sampler (default 0)

aggregate(classif_predictions)[source]

Implements the aggregation of label predictions.

Parameters:: classif_predictions – np.ndarray of label predictions
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Estimates the misclassification rates.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the label predictions issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

get_conditional_probability_samples()[source]

get_prevalence_samples()[source]

sample_from_posterior(classif_predictions)[source]

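set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → BayesianCC

Request metadata passed to the fit method.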

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method.aggregative.BinaryAggregativeQuantifier[source]

Bases: AggregativeQuantifier, BinaryQuantifier

fit(data: LabelledCollection, fit_classifier=True, val_split=None)[source]

Trains the aggregative quantifier. This comes down to training a classifier and an aggregation function.

Parameters:

data – a quapy.data.base.LabelledCollection consisting of the training data
fit_classifier – whether to train the learner (default is True). Set to False if the learner has been trained outside the quantifier.
val_split – specifies the data used for generating classifier predictions. This specification can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to be extracted from the training set; or as an integer (default 5), indicating that the predictions are to be generated in a k-fold cross-validation manner (with this integer indicating the value for k); or as a collection defining the specific set of data to use for validation. Alternatively, this set can be specified at fit time by indicating the exact set of data on which the predictions are to be generated.

Returns:

self

property neg_label

property pos_label

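set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → BinaryAggregativeQuantifier

Request metadata passed to the fit method.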

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method.aggregative.CC(classifier: BaseEstimator = None)[source]

Bases: AggregativeCrispQuantifier

The most basic quantification method: it simply classifies all instances and counts how many have been attributed to each of the classes in order to compute class prevalence estimates.

Parameters:: classifier – a sklearn’s Estimator that generates a classifier

aggregate(classif_predictions: ndarray)[source]

Computes class prevalence estimates by counting the prevalence of each of the predicted labels.

Parameters:: classif_predictions – array-like with label predictions
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.
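The counting step amounts to a normalized histogram of the predicted labels. A minimal sketch in plain Python, where the function name classify_and_count is hypothetical and a list stands in for the np.ndarray:

```python
from collections import Counter


def classify_and_count(predicted_labels, classes):
    """Fraction of predictions assigned to each class, in the order given by `classes`."""
    counts = Counter(predicted_labels)
    n = len(predicted_labels)
    # Counter returns 0 for classes that were never predicted.
    return [counts[c] / n for c in classes]


print(classify_and_count([0, 1, 1, 2, 1], classes=[0, 1, 2]))  # [0.2, 0.6, 0.2]
```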

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Nothing to do here!

Parameters:

classif_predictions – not used
data – not used

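set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → CC

Request metadata passed to the fit method.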

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

quapy.method.aggregative.ClassifyAndCount: alias of CC

class quapy.method.aggregative.DMy(classifier: BaseEstimator = None, val_split=5, nbins=8, divergence: str | Callable = 'HD', cdf=False, search='optim_minimize', n_jobs=None)[source]

Bases: AggregativeSoftQuantifier

Generic Distribution Matching quantifier for binary or multiclass quantification based on the space of posterior probabilities. This implementation takes the number of bins, the divergence, and the possibility to work on CDF as hyperparameters.

Parameters:

classifier – a sklearn’s Estimator that generates a probabilistic classifier
val_split – indicates the proportion of data to be used as a stratified held-out validation set to model the validation distribution. This parameter can be indicated as a real value (between 0 and 1), representing a proportion of validation data, or as an integer, indicating that the validation distribution should be estimated via k-fold cross validation (this integer stands for the number of folds k, default 5), or as a quapy.data.base.LabelledCollection (the split itself).
nbins – number of bins used to discretize the distributions (default 8)
divergence – a string representing a divergence measure (currently, “HD” and “topsoe” are implemented) or a callable function taking two ndarrays of the same dimension as input (default “HD”, meaning Hellinger Distance)
cdf – whether to use CDF instead of PDF (default False)
n_jobs – number of parallel workers (default None)

aggregate(posteriors: ndarray)[source]

Searches for the mixture model parameter (the sought prevalence values) that yields a validation distribution (the mixture) that best matches the test distribution, in terms of the divergence measure of choice. In the multiclass case, with n the number of classes, the test and mixture distributions contain n channels (proper distributions of binned posterior probabilities), on which the divergence is computed independently. The matching is computed as an average of the divergence across all channels.

Parameters:: posteriors – posterior probabilities of the instances in the sample
Returns:: a vector of class prevalence estimates
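The matching step can be sketched for a binary problem with a single channel. This is a simplification under stated assumptions: the names hellinger, mixture and match_binary are hypothetical, a coarse grid search replaces the optimizer selected by the search parameter, and the Hellinger distance is written without the 1/sqrt(2) factor that some definitions include.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (unnormalized convention)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def mixture(prevalence, class_histograms):
    """Bin-wise mixture of class-conditional histograms, weighted by prevalence."""
    n_bins = len(class_histograms[0])
    return [sum(w * h[k] for w, h in zip(prevalence, class_histograms))
            for k in range(n_bins)]


def match_binary(test_hist, neg_hist, pos_hist, steps=100):
    """Grid-search the positive prevalence whose mixture is closest to the test histogram."""
    best = min(
        range(steps + 1),
        key=lambda s: hellinger(
            mixture([1 - s / steps, s / steps], [neg_hist, pos_hist]), test_hist
        ),
    )
    alpha = best / steps
    return [1 - alpha, alpha]


# Validation histograms of P(Y=1|x) for negatives and positives (made-up values).
neg_hist = [0.7, 0.2, 0.1, 0.0]
pos_hist = [0.0, 0.1, 0.3, 0.6]
# A test histogram generated with 25% positives.
test_hist = mixture([0.75, 0.25], [neg_hist, pos_hist])
print(match_binary(test_hist, neg_hist, pos_hist))  # [0.75, 0.25]
```

In the multiclass setting described above, the same divergence would be computed for each channel and averaged before choosing the prevalence vector.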

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Trains the aggregation function of a distribution matching method. This comes down to generating the validation distributions out of the training data. The validation distributions have shape (n, ch, nbins), with n the number of classes, ch the number of channels, and nbins the number of bins. In particular, let V be the validation distributions; then di=V[i] are the distributions obtained from training data labelled with class i; while dij = di[j] is the discrete distribution of posterior probabilities P(Y=j|X=x) for training data labelled with class i, and dij[k] is the fraction of instances with a value in the k-th bin.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the posterior probabilities issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data
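The layout of the validation distributions can be made concrete with a small sketch that builds V as nested lists of shape (n, ch, nbins), with ch equal to the number of classes. The helper names histogram and validation_distributions are hypothetical, equal-width bins over [0, 1] are assumed, and an all-zero histogram for a class without validation examples is a choice of this sketch.

```python
def histogram(values, nbins):
    """Normalized histogram of values in [0, 1] using nbins equal-width bins."""
    if not values:
        return [0.0] * nbins  # assumption: no data yields an all-zero histogram
    counts = [0] * nbins
    for v in values:
        k = min(int(v * nbins), nbins - 1)  # a value of exactly 1.0 goes to the last bin
        counts[k] += 1
    return [c / len(values) for c in counts]


def validation_distributions(posteriors, labels, n_classes, nbins):
    """V[i][j][k]: fraction of class-i items whose P(Y=j|x) falls in the k-th bin."""
    V = []
    for i in range(n_classes):
        rows = [p for p, label in zip(posteriors, labels) if label == i]
        V.append([histogram([r[j] for r in rows], nbins) for j in range(n_classes)])
    return V


posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
labels = [0, 0, 1, 1]
V = validation_distributions(posteriors, labels, n_classes=2, nbins=2)
print(V[0][1])  # distribution of P(Y=1|x) among items of class 0: [1.0, 0.0]
```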

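set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → DMy

Request metadata passed to the fit method.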

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

quapy.method.aggregative.DistributionMatchingY: alias of DMy

class quapy.method.aggregative.DyS(classifier: BaseEstimator = None, val_split=5, n_bins=8, divergence: str | Callable = 'HD', tol=1e-05, n_jobs=None)[source]

Bases: AggregativeSoftQuantifier, BinaryAggregativeQuantifier

DyS framework (DyS). DyS is a generalization of the HDy method that uses a ternary search to find the prevalence that minimizes the distance between distributions. The details of the ternary search are taken from <https://dl.acm.org/doi/pdf/10.1145/3219819.3220059>

Parameters:

classifier – a sklearn’s Estimator that generates a binary classifier
val_split – a float in range (0,1) indicating the proportion of data to be used as a stratified held-out validation distribution, or a quapy.data.base.LabelledCollection (the split itself), or an integer indicating the number of folds (default 5).
n_bins – an int with the number of bins to use to compute the histograms.
divergence – a str indicating the name of the divergence (currently supported ones are “HD” or “topsoe”), or a callable that computes the divergence between two distributions (two equally sized arrays).
tol – a float with the tolerance for the ternary search algorithm.
n_jobs – number of parallel workers.
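
The search described above can be sketched in plain Python as follows. This is an illustrative sketch, not QuaPy's code: the function names (hellinger, ternary_search, dys_prevalence) and the toy histograms are assumptions, and the Hellinger distance is written here without any normalizing constant.

```python
import math


def hellinger(p, q):
    """Hellinger-style distance between two discrete distributions of equal size."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def ternary_search(f, left=0.0, right=1.0, tol=1e-5):
    """Minimize a unimodal function f on [left, right] up to tolerance tol."""
    while right - left > tol:
        third = (right - left) / 3
        m1, m2 = left + third, right - third
        if f(m1) > f(m2):
            left = m1   # the minimum cannot lie in [left, m1]
        else:
            right = m2  # the minimum cannot lie in [m2, right]
    return (left + right) / 2


def dys_prevalence(pos_hist, neg_hist, test_hist, tol=1e-5):
    """Positive-class prevalence whose validation mixture is closest to test_hist."""
    def distance(alpha):
        mixture = [alpha * p + (1 - alpha) * n for p, n in zip(pos_hist, neg_hist)]
        return hellinger(mixture, test_hist)
    return ternary_search(distance, tol=tol)


# Toy histograms (made up): the test histogram is an exact 30%/70% mixture.
pos_hist = [0.1, 0.2, 0.7]
neg_hist = [0.6, 0.3, 0.1]
test_hist = [0.3 * p + 0.7 * n for p, n in zip(pos_hist, neg_hist)]
alpha = dys_prevalence(pos_hist, neg_hist, test_hist)
```

Here alpha comes out close to 0.3, the mixing proportion used to build the toy test histogram. A smaller tol narrows the final search interval at the cost of more distance evaluations.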

aggregate(classif_posteriors)[source]

Implements the aggregation of label predictions.

Parameters:

classif_posteriors – np.ndarray of posterior probabilities issued by the classifier

Returns:

np.ndarray of shape (n_classes,) with class prevalence estimates.

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Trains the aggregation function of DyS.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the posterior probabilities issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

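set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → DyS

Request metadata passed to the fit method.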

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method.aggregative.EMQ(classifier: BaseEstimator = None, val_split=None, exact_train_prev=True, recalib=None, n_jobs=None)[source]

Bases: AggregativeSoftQuantifier

Expectation Maximization for Quantification (EMQ), aka Saerens-Latinne-Decaestecker (SLD) algorithm. EMQ consists of using the well-known Expectation Maximization algorithm to iteratively update the posterior probabilities generated by a probabilistic classifier and the class prevalence estimates obtained via maximum-likelihood estimation, in a mutually recursive way, until convergence.

This implementation also gives access to the heuristics proposed in the Alexandari et al. paper. These heuristics consist of using, as the training prevalence, an estimate of it obtained via k-fold cross validation (instead of the true training prevalence), and of recalibrating the posterior probabilities of the classifier.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
val_split – specifies the data used for generating classifier predictions. This specification can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to be extracted from the training set; or as an integer, indicating that the predictions are to be generated in a k-fold cross-validation manner (with this integer indicating the value for k, default 5); or as a collection defining the specific set of data to use for validation. Alternatively, this set can be specified at fit time by indicating the exact set of data on which the predictions are to be generated. This hyperparameter is only meant to be used when the heuristics are to be applied, i.e., if a recalibration is required. The default value is None (meaning the recalibration is not required). In case this hyperparameter is set to a value other than None, but the recalibration is not required (recalib=None), a warning message will be raised.
exact_train_prev – set to True (default) to use the true training prevalence as the initial observation; set to False to use an estimate of it instead, computed as the expected value of the posterior probabilities of the training instances.
recalib – a string indicating the method of recalibration. Available choices include “nbvs” (No-Bias Vector Scaling), “bcts” (Bias-Corrected Temperature Scaling), “ts” (Temperature Scaling), and “vs” (Vector Scaling). Default is None (no recalibration).
n_jobs – number of parallel workers. Only used for recalibrating the classifier if val_split is set to an integer k (the number of folds).

classmethod EM(tr_prev, posterior_probabilities, epsilon=0.0001)[source]

Computes the Expectation Maximization routine.

Parameters:

tr_prev – array-like, the training prevalence
posterior_probabilities – np.ndarray of shape (n_instances, n_classes,) with the posterior probabilities
epsilon – float, the threshold on the difference between two consecutive iterations below which the loop stops

Returns:

a tuple with the estimated prevalence values (shape (n_classes,)) and the corrected posterior probabilities (shape (n_instances, n_classes,))
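
The mutually recursive update can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not QuaPy's implementation: the function name em_prevalence is hypothetical, and the stopping rule used here (mean absolute change of the prevalence vector below epsilon) is an assumption.

```python
def em_prevalence(tr_prev, posteriors, epsilon=1e-4, max_iter=1000):
    """Saerens-Latinne-Decaestecker style EM.
    Returns (estimated prevalence, corrected posteriors).
    Assumes every entry of tr_prev is strictly positive."""
    n_classes = len(tr_prev)
    prev = list(tr_prev)
    corrected = [list(p) for p in posteriors]
    for _ in range(max_iter):
        # E-step: reweight each posterior by (current prevalence / training prevalence)
        corrected = []
        for p in posteriors:
            scaled = [p[c] * prev[c] / tr_prev[c] for c in range(n_classes)]
            total = sum(scaled)
            corrected.append([s / total for s in scaled])
        # M-step: the new prevalence is the mean of the corrected posteriors
        new_prev = [sum(row[c] for row in corrected) / len(corrected)
                    for c in range(n_classes)]
        change = sum(abs(a - b) for a, b in zip(new_prev, prev)) / n_classes
        prev = new_prev
        if change < epsilon:
            break
    return prev, corrected


# Toy input (made up): balanced training prevalence, five test posteriors.
tr_prev = [0.5, 0.5]
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8], [0.7, 0.3]]
prev, corrected = em_prevalence(tr_prev, posteriors)
```

The first iteration reproduces the plain average of the posteriors; later iterations move the estimate away from the training prevalence while keeping every corrected posterior normalized.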

classmethod EMQ_BCTS(classifier: BaseEstimator, n_jobs=None)[source]

Constructs an instance of EMQ using the best configuration found in the Alexandari et al. paper, i.e., one that relies on Bias-Corrected Temperature Scaling (BCTS) as a recalibration function, and that uses an estimate of the training prevalence instead of the true training prevalence.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
n_jobs – number of parallel workers.

Returns:

An instance of EMQ with BCTS

EPSILON = 0.0001

MAX_ITER = 1000

aggregate(classif_posteriors, epsilon=0.0001)[source]

Implements the aggregation of label predictions.

Parameters:

classif_posteriors – np.ndarray of posterior probabilities issued by the classifier

Returns:

np.ndarray of shape (n_classes,) with class prevalence estimates.

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Trains the aggregation function of EMQ. This comes down to recalibrating the posterior probabilities if requested.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the posterior probabilities issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

classify(instances)[source]

Provides the posterior probabilities for the given instances. If the classifier was required to be recalibrated, then these posteriors are recalibrated accordingly.

Parameters:

instances – array-like of shape (n_instances, n_dimensions,)

Returns:

np.ndarray of shape (n_instances, n_classes,) with posterior probabilities

predict_proba(instances, epsilon=0.0001)[source]

Returns the posterior probabilities updated by the EM algorithm.

Parameters:

instances – np.ndarray of shape (n_instances, n_dimensions)
epsilon – error tolerance

Returns:

np.ndarray of shape (n_instances, n_classes)

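set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → EMQ

Request metadata passed to the fit method.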

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_proba_request(*, epsilon: bool | None | str = '$UNCHANGED$', instances: bool | None | str = '$UNCHANGED$') → EMQ

Request metadata passed to the predict_proba method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict_proba if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict_proba.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

epsilon (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for epsilon parameter in predict_proba.
instances (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for instances parameter in predict_proba.

Returns:

self – The updated object.

Return type:

object

quapy.method.aggregative.ExpectationMaximizationQuantifier: alias of EMQ

class quapy.method.aggregative.HDy(classifier: BaseEstimator = None, val_split=5)[source]

Bases: AggregativeSoftQuantifier, BinaryAggregativeQuantifier

Hellinger Distance y (HDy). HDy is a probabilistic method for training binary quantifiers, that models quantification as the problem of minimizing the divergence (in terms of the Hellinger Distance) between two distributions of posterior probabilities returned by the classifier. One of the distributions is generated from the unlabelled examples and the other is generated from a validation set. This latter distribution is defined as a mixture of the class-conditional distributions of the posterior probabilities returned for the positive and negative validation examples, respectively. The parameters of the mixture thus represent the estimates of the class prevalence values.

Parameters:

classifier – a sklearn’s Estimator that generates a binary classifier
val_split – a float in range (0,1) indicating the proportion of data to be used as a stratified held-out validation distribution, or a quapy.data.base.LabelledCollection (the split itself), or an integer indicating the number of folds (default 5).

aggregate(classif_posteriors)[source]

Implements the aggregation of label predictions.

Parameters:

classif_posteriors – np.ndarray of posterior probabilities issued by the classifier

Returns:

np.ndarray of shape (n_classes,) with class prevalence estimates.

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Trains the aggregation function of HDy.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the posterior probabilities issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

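set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → HDy

Request metadata passed to the fit method.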

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

quapy.method.aggregative.HellingerDistanceY: alias of HDy

class quapy.method.aggregative.OneVsAllAggregative(binary_quantifier, n_jobs=None, parallel_backend='multiprocessing')[source]

Bases: OneVsAllGeneric, AggregativeQuantifier

Allows any binary quantifier to perform quantification on single-label datasets. The method maintains one binary quantifier for each class, and then l1-normalizes the outputs so that the class prevalences sum up to 1. This variant was used, along with the EMQ quantifier, in Gao and Sebastiani, 2016.

Parameters:

binary_quantifier – a binary quantifier that will be applied to the multiclass problem in a one-vs-all manner
n_jobs – number of parallel workers
parallel_backend – the parallel backend for joblib (default “multiprocessing”); switching to another backend such as “loky” is helpful for some quantifiers (e.g., ELM-based ones) that cannot be run with multiprocessing, since the temp dir they create during fit is removed and no longer available at predict time.
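
The l1-normalization step mentioned in the class description can be sketched as follows. This is a plain-Python illustration with a hypothetical helper name (l1_normalize), not the class's actual code; it assumes at least one of the per-class estimates is nonzero.

```python
def l1_normalize(positive_prevalences):
    """Rescale the positive-class estimates of the per-class binary
    quantifiers so that they sum to 1."""
    total = sum(positive_prevalences)
    return [p / total for p in positive_prevalences]


# Made-up outputs of three independent binary quantifiers (one per class).
raw = [0.5, 0.25, 0.5]
estimates = l1_normalize(raw)
```

Because each binary quantifier is fit independently, the raw estimates need not sum to 1 (here they sum to 1.25); the rescaling keeps their relative proportions.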

aggregate(classif_predictions)[source]

Implements the aggregation of label predictions.

Parameters:

classif_predictions – np.ndarray of label predictions

Returns:

np.ndarray of shape (n_classes,) with class prevalence estimates.

classify(instances)[source]

If the base quantifier is not probabilistic, returns a matrix of shape (n,m,) with n the number of instances and m the number of classes. The entry (i,j) is a binary value indicating whether instance i belongs to class j. The binary classifications are independent of each other, meaning that an instance can end up being attributed to 0, 1, or more classes. If the base quantifier is probabilistic, returns a matrix of shape (n,m,2) with n the number of instances and m the number of classes. The entry (i,j,1) (resp. (i,j,0)) is a value in [0,1] indicating the posterior probability that instance i belongs (resp. does not belong) to class j. The posterior probabilities are independent of each other, meaning that, in general, they do not sum up to one.

Parameters:

instances – array-like

Returns:

np.ndarray

set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$') → OneVsAllAggregative

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method.aggregative.PACC(classifier: BaseEstimator = None, val_split=5, solver: Literal['minimize', 'exact', 'exact-raise', 'exact-cc'] = 'minimize', method: Literal['inversion', 'invariant-ratio'] = 'inversion', norm: Literal['clip', 'mapsimplex', 'condsoftmax'] = 'clip', n_jobs=None)[source]

Bases: AggregativeSoftQuantifier

Probabilistic Adjusted Classify & Count, the probabilistic variant of ACC that relies on the posterior probabilities returned by a probabilistic classifier.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
val_split – specifies the data used for generating classifier predictions. This specification can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to be extracted from the training set; or as an integer (default 5), indicating that the predictions are to be generated in a k-fold cross-validation manner (with this integer indicating the value for k). Alternatively, this set can be specified at fit time by indicating the exact set of data on which the predictions are to be generated.
method (str) –
adjustment method to be used:
- ’inversion’: matrix inversion method based on the matrix equality $P(C)=P(C|Y)P(Y)$, which tries to invert the $P(C|Y)$ matrix.
- ’invariant-ratio’: invariant ratio estimator of Vaz et al., which replaces the last equation with the normalization condition.
solver (str) –
the method to use for solving the system of linear equations. Valid options are:
- ’exact-raise’: tries to solve the system using matrix inversion. Raises an error if the matrix has rank strictly less than n_classes.
- ’exact-cc’: if the matrix is not of full rank, returns p_c as the estimates, which corresponds to no adjustment (i.e., the classify and count method. See quapy.method.aggregative.CC)
- ’exact’: deprecated, defaults to ‘exact-cc’
- ’minimize’: minimizes the L2 norm of $|Ax-B|$. This one generally works better, and is the default parameter. More details can be found in Bunse, M. “On Multi-Class Extensions of Adjusted Classify and Count”, in Proceedings of the 2nd International Workshop on Learning to Quantify: Methods and Applications (LQ 2022), ECML/PKDD 2022, Grenoble (France).
norm (str) –
the method to use for normalization.
- clip, the values are clipped to the range [0,1] and then L1-normalized.
- mapsimplex projects vectors onto the probability simplex. This implementation relies on Mathieu Blondel’s projection_simplex_sort
- condsoftmax, applies a softmax normalization only to prevalence vectors that lie outside the simplex
n_jobs – number of parallel workers
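
The ’inversion’ adjustment combined with the clip normalization can be sketched for two classes as follows. This is a plain-Python illustration under stated assumptions, not QuaPy's solver: the helper names (solve_2x2, clip_normalize) and the numbers are made up, and the column convention for the matrix (column j holds the mean posterior vector of validation instances of class j) is an assumption.

```python
def solve_2x2(M, b):
    """Solve M x = b for a 2x2 matrix using Cramer's rule.
    Raises ValueError when M is (numerically) singular."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-12:
        raise ValueError("matrix is not of full rank")
    x0 = (b[0] * M[1][1] - M[0][1] * b[1]) / det
    x1 = (M[0][0] * b[1] - M[1][0] * b[0]) / det
    return [x0, x1]


def clip_normalize(p):
    """Clip values to [0, 1] and then L1-normalize.
    Assumes at least one value is positive after clipping."""
    clipped = [min(max(v, 0.0), 1.0) for v in p]
    total = sum(clipped)
    return [v / total for v in clipped]


# Assumed P(C|Y): column j is the mean posterior over validation items of class j.
M = [[0.8, 0.3],
     [0.2, 0.7]]
true_prev = [0.2, 0.8]
# Mean posterior observed on the test set, here generated as P(C) = P(C|Y) P(Y).
p_c = [M[0][0] * true_prev[0] + M[0][1] * true_prev[1],
       M[1][0] * true_prev[0] + M[1][1] * true_prev[1]]

adjusted = clip_normalize(solve_2x2(M, p_c))
```

In this noise-free toy case the adjustment recovers the prevalence used to generate p_c. With estimated matrices the raw solution can fall outside [0, 1], which is where the normalization step matters.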

aggregate(classif_posteriors)[source]

Implements the aggregation of label predictions.

Parameters:

classif_posteriors – np.ndarray of posterior probabilities issued by the classifier

Returns:

np.ndarray of shape (n_classes,) with class prevalence estimates.

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Estimates the misclassification rates

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the posterior probabilities issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

classmethod getPteCondEstim(classes, y, y_)[source]

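set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → PACC

Request metadata passed to the fit method.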

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method.aggregative.PCC(classifier: BaseEstimator = None)[source]

Bases: AggregativeSoftQuantifier

Probabilistic Classify & Count, the probabilistic variant of CC that relies on the posterior probabilities returned by a probabilistic classifier.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
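
To make the difference from CC concrete, the following plain-Python sketch contrasts the two aggregation rules on the same posteriors. The function names (pcc, cc) are hypothetical and the data are made up; this is an illustration, not QuaPy's code.

```python
def pcc(posteriors):
    """Probabilistic Classify & Count: average the posteriors per class."""
    n_classes = len(posteriors[0])
    return [sum(row[c] for row in posteriors) / len(posteriors)
            for c in range(n_classes)]


def cc(posteriors):
    """Classify & Count: count the most probable class of each instance."""
    n_classes = len(posteriors[0])
    counts = [0] * n_classes
    for row in posteriors:
        counts[row.index(max(row))] += 1
    return [c / len(posteriors) for c in counts]


posteriors = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
soft = pcc(posteriors)  # averages the probabilities themselves
hard = cc(posteriors)   # uses only the argmax of each row
```

On this input the soft estimate for the first class is about 0.6, while the hard estimate is 2/3, since two of the three instances are assigned to that class.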

aggregate(classif_posteriors)[source]

Implements the aggregation of label predictions.

Parameters:

classif_posteriors – np.ndarray of posterior probabilities issued by the classifier

Returns:

np.ndarray of shape (n_classes,) with class prevalence estimates.

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Nothing to do here!

Parameters:

classif_predictions – not used
data – not used

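set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → PCC

Request metadata passed to the fit method.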

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

quapy.method.aggregative.ProbabilisticAdjustedClassifyAndCount: alias of PACC

quapy.method.aggregative.ProbabilisticClassifyAndCount: alias of PCC

quapy.method.aggregative.SLD: alias of EMQ

class quapy.method.aggregative.SMM(classifier: BaseEstimator = None, val_split=5)[source]

Bases: AggregativeSoftQuantifier, BinaryAggregativeQuantifier

SMM method (SMM). SMM is a simplification of matching distribution methods where the representation of the examples is created using the mean instead of a histogram (conceptually equivalent to PACC).

Parameters:

classifier – a sklearn’s Estimator that generates a binary classifier.
val_split – a float in range (0,1) indicating the proportion of data to be used as a stratified held-out validation distribution, or a quapy.data.base.LabelledCollection (the split itself), or an integer indicating the number of folds (default 5).
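
The mean-matching idea can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than QuaPy's implementation: the function name smm_prevalence is hypothetical, the scores are made-up positive-class posteriors, the result is clipped to [0, 1], and the output order [negative, positive] is an assumption.

```python
def mean(values):
    return sum(values) / len(values)


def smm_prevalence(pos_scores, neg_scores, test_scores):
    """Match the mean test score with a mixture of the class-conditional
    validation means: mu_test = alpha * mu_pos + (1 - alpha) * mu_neg.
    Assumes mu_pos differs from mu_neg."""
    mu_pos, mu_neg, mu_test = mean(pos_scores), mean(neg_scores), mean(test_scores)
    alpha = (mu_test - mu_neg) / (mu_pos - mu_neg)
    alpha = min(max(alpha, 0.0), 1.0)  # keep the estimate inside [0, 1]
    return [1 - alpha, alpha]


# Made-up posteriors for the positive class.
pos_scores = [0.8, 0.9, 0.7]   # validation instances labelled positive
neg_scores = [0.2, 0.1, 0.3]   # validation instances labelled negative
test_scores = [0.9, 0.1, 0.2, 0.8, 0.3, 0.1]
prevalence = smm_prevalence(pos_scores, neg_scores, test_scores)
```

Here the class-conditional means are 0.8 and 0.2 and the test mean is 0.4, so the positive prevalence estimate is (0.4 - 0.2) / (0.8 - 0.2), roughly one third.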

aggregate(classif_posteriors)[source]

Implements the aggregation of label predictions.

Parameters:

classif_posteriors – np.ndarray of posterior probabilities issued by the classifier

Returns:

np.ndarray of shape (n_classes,) with class prevalence estimates.

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Trains the aggregation function of SMM.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the posterior probabilities issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

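set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → SMM

Request metadata passed to the fit method.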

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

quapy.method.aggregative.newELM(svmperf_base=None, loss='01', C=1)[source]

Explicit Loss Minimization (ELM) quantifiers. Quantifiers based on ELM represent a family of methods based on structured output learning; these quantifiers rely on classifiers that have been optimized using a quantification-oriented loss measure. This implementation relies on Joachims’ SVM perf structured output learning algorithm, which has to be installed and patched for the purpose (see this script). This function is equivalent to:

>>> CC(SVMperf(svmperf_base, loss, C))

Parameters:

svmperf_base – path to the folder containing the binary files of SVM perf; if set to None (default) this path will be obtained from qp.environ['SVMPERF_HOME']
loss – the loss to optimize (see quapy.classification.svmperf.SVMperf.valid_losses)
C – trade-off between training error and margin (default 1)

Returns:

returns an instance of CC set to work with SVMperf (with loss and C set properly) as the underlying classifier

quapy.method.aggregative.newSVMAE(svmperf_base=None, C=1)[source]

SVM(AE) is an Explicit Loss Minimization (ELM) quantifier set to optimize for the Absolute Error as first used by Moreo and Sebastiani, 2021. Equivalent to:

>>> CC(SVMperf(svmperf_base, loss='mae', C=C))

Quantifiers based on ELM represent a family of methods based on structured output learning; these quantifiers rely on classifiers that have been optimized using a quantification-oriented loss measure. This implementation relies on Joachims’ SVM perf structured output learning algorithm, which has to be installed and patched for the purpose (see this script). This function is a wrapper around CC(SVMperf(svmperf_base, loss, C))

Parameters:

svmperf_base – path to the folder containing the binary files of SVM perf; if set to None (default) this path will be obtained from qp.environ['SVMPERF_HOME']
C – trade-off between training error and margin (default 1)

Returns:

returns an instance of CC set to work with SVMperf (with loss and C set properly) as the underlying classifier

quapy.method.aggregative.newSVMKLD(svmperf_base=None, C=1)[source]

SVM(KLD) is an Explicit Loss Minimization (ELM) quantifier set to optimize for the Kullback-Leibler Divergence normalized via the logistic function, as proposed by Esuli et al. 2015. Equivalent to:

>>> CC(SVMperf(svmperf_base, loss='nkld', C=C))

Quantifiers based on ELM represent a family of methods based on structured output learning; these quantifiers rely on classifiers that have been optimized using a quantification-oriented loss measure. This implementation relies on Joachims’ SVM perf structured output learning algorithm, which has to be installed and patched for the purpose (see this script). This function is a wrapper around CC(SVMperf(svmperf_base, loss, C))

Parameters:

svmperf_base – path to the folder containing the binary files of SVM perf; if set to None (default) this path will be obtained from qp.environ['SVMPERF_HOME']
C – trade-off between training error and margin (default 1)

Returns:

returns an instance of CC set to work with SVMperf (with loss and C set properly) as the underlying classifier

quapy.method.aggregative.newSVMQ(svmperf_base=None, C=1)[source]

SVM(Q) is an Explicit Loss Minimization (ELM) quantifier set to optimize for the Q loss combining a classification-oriented loss and a quantification-oriented loss, as proposed by Barranquero et al. 2015. Equivalent to:

>>> CC(SVMperf(svmperf_base, loss='q', C=C))

Quantifiers based on ELM represent a family of methods based on structured output learning; these quantifiers rely on classifiers that have been optimized using a quantification-oriented loss measure. This implementation relies on Joachims’ SVM perf structured output learning algorithm, which has to be installed and patched for the purpose (see this script). This function is a wrapper around CC(SVMperf(svmperf_base, loss, C))

Parameters:

svmperf_base – path to the folder containing the binary files of SVM perf; if set to None (default) this path will be obtained from qp.environ['SVMPERF_HOME']
C – trade-off between training error and margin (default 1)

Returns:

returns an instance of CC set to work with SVMperf (with loss and C set properly) as the underlying classifier

quapy.method.aggregative.newSVMRAE(svmperf_base=None, C=1)[source]

SVM(RAE) is an Explicit Loss Minimization (ELM) quantifier set to optimize for the Relative Absolute Error as first used by Moreo and Sebastiani, 2021. Equivalent to:

>>> CC(SVMperf(svmperf_base, loss='mrae', C=C))

Quantifiers based on ELM represent a family of methods based on structured output learning; these quantifiers rely on classifiers that have been optimized using a quantification-oriented loss measure. This implementation relies on Joachims’ SVM perf structured output learning algorithm, which has to be installed and patched for the purpose (see this script). This function is a wrapper around CC(SVMperf(svmperf_base, loss, C))

Parameters:

svmperf_base – path to the folder containing the binary files of SVM perf; if set to None (default) this path will be obtained from qp.environ['SVMPERF_HOME']
C – trade-off between training error and margin (default 1)

Returns:

returns an instance of CC set to work with SVMperf (with loss and C set properly) as the underlying classifier
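For intuition about the loss that SVM(RAE) targets, the following sketch computes a relative absolute error between a true and an estimated prevalence vector. The function name and the smoothing constant `eps` are assumptions made for this example (smoothing is needed because the true prevalence appears in the denominator); the loss implemented inside SVM perf may differ in its details.

```python
def relative_absolute_error(true_prev, pred_prev, eps=1e-8):
    """Mean over classes of |estimate - truth| / truth, with additive smoothing."""
    n = len(true_prev)

    def smooth(p):
        # Keeps every entry strictly positive while preserving the sum of 1.
        return [(v + eps) / (1.0 + eps * n) for v in p]

    p, q = smooth(true_prev), smooth(pred_prev)
    return sum(abs(qi - pi) / pi for pi, qi in zip(p, q)) / n

rae_small = relative_absolute_error([0.1, 0.9], [0.15, 0.85])
rae_large = relative_absolute_error([0.1, 0.9], [0.40, 0.60])
```

The same absolute deviation weighs more on the rare class, which is why this measure penalizes errors on low-prevalence classes more heavily than the plain absolute error does.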

class quapy.method._kdey.KDEBase[source]

Bases: object

Common ancestor for KDE-based methods. Implements some common routines.

BANDWIDTH_METHOD = ['scott', 'silverman']

get_kde_function(X, bandwidth)[source]

Wraps the KDE function from scikit-learn.

Parameters:

X – data for which the density function is to be estimated
bandwidth – the bandwidth of the kernel

Returns:

a scikit-learn’s KernelDensity object

get_mixture_components(X, y, classes, bandwidth)[source]

Returns an array containing the mixture components, i.e., the KDE functions for each class.

Parameters:

X – the data containing the covariates
y – the class labels
classes – the classes for which the mixture components are to be fitted
bandwidth – float, the bandwidth of the kernel

Returns:

a list of KernelDensity objects, each fitted with the corresponding class-specific covariates
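The idea behind the mixture components can be sketched without scikit-learn. The example below groups the data by class and fits one density estimator per class. `GaussianKDE1D` and `mixture_components` are hypothetical names, and the estimator is a minimal one-dimensional Gaussian KDE used as a stand-in for scikit-learn's KernelDensity.

```python
import math

class GaussianKDE1D:
    """Minimal 1-D Gaussian KDE (illustrative stand-in for KernelDensity)."""

    def __init__(self, bandwidth):
        self.bandwidth = bandwidth
        self.centers = []

    def fit(self, X):
        # The training points act as the kernel centers.
        self.centers = list(X)
        return self

    def density(self, x):
        h = self.bandwidth
        norm = 1.0 / (len(self.centers) * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - c) / h) ** 2) for c in self.centers)

def mixture_components(X, y, classes, bandwidth):
    """Fit one KDE per class, in the order given by `classes`."""
    return [
        GaussianKDE1D(bandwidth).fit([x for x, label in zip(X, y) if label == c])
        for c in classes
    ]

X = [0.1, 0.2, 0.15, 0.8, 0.9]
y = [0, 0, 0, 1, 1]
components = mixture_components(X, y, classes=[0, 1], bandwidth=0.1)
```

Each component only sees the covariates of its own class, so evaluating a point under every component shows which class-conditional density explains it best.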

pdf(kde, X)[source]

Wraps the density evaluation of scikit-learn’s KDE. Scikit-learn returns log-scores (s), so this function returns $e^{s}$.

Parameters:

kde – a previously fit KDE function
X – the data for which the density is to be estimated

Returns:

np.ndarray with the densities
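The conversion from log-scores to densities can be sketched as follows. The example evaluates the log-density of a one-dimensional Gaussian KDE with the log-sum-exp trick and then exponentiates it. The function names are hypothetical and the code does not call scikit-learn; it only mirrors the $e^{s}$ step described above.

```python
import math

def log_density(x, centers, h):
    """Log of a 1-D Gaussian KDE at x, computed with the log-sum-exp trick."""
    logs = [-0.5 * ((x - c) / h) ** 2 for c in centers]
    m = max(logs)
    lse = m + math.log(sum(math.exp(v - m) for v in logs))
    return lse - math.log(len(centers) * h * math.sqrt(2.0 * math.pi))

def pdf(xs, centers, h):
    """Densities obtained by exponentiating the log-scores."""
    return [math.exp(log_density(x, centers, h)) for x in xs]

centers = [0.0, 1.0, 2.0]
densities = pdf([1.0, 50.0], centers, h=0.5)
```

Working in log-space keeps the evaluation stable for points far from every kernel center; only the final exponentiation can underflow to zero.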

class quapy.method._kdey.KDEyCS(classifier: BaseEstimator = None, val_split=5, bandwidth=0.1)[source]

Bases: AggregativeSoftQuantifier

Kernel Density Estimation model for quantification (KDEy) relying on the Cauchy-Schwarz divergence (CS) as the divergence measure to be minimized. This method was first proposed in the paper Kernel Density Estimation for Multiclass Quantification, in which the authors proposed a Monte Carlo approach for minimizing the divergence.

The distribution matching optimization problem comes down to solving:

$\hat{\alpha} = \arg\min_{\alpha\in\Delta^{n-1}} \mathcal{D}(\boldsymbol{p}_{\alpha}||q_{\widetilde{U}})$

where $p_{\alpha}$ is the mixture of class-specific KDEs with mixture parameter (hence class prevalence) $\alpha$ defined by

$\boldsymbol{p}_{\alpha}(\widetilde{x}) = \sum_{i=1}^n \alpha_i p_{\widetilde{L}_i}(\widetilde{x})$

where $p_X(\boldsymbol{x}) = \frac{1}{|X|} \sum_{x_i\in X} K\left(\frac{x-x_i}{h}\right)$ is the KDE function that uses the datapoints in X as the kernel centers.

In KDEy-CS, the divergence is taken to be the Cauchy-Schwarz divergence given by:

$\mathcal{D}_{\mathrm{CS}}(p||q)=-\log\left(\frac{\int p(x)q(x)dx}{\sqrt{\int p(x)^2dx \int q(x)^2dx}}\right)$

The authors showed that this distribution matching admits a closed-form solution.
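To see why a closed form is available, recall that the integral of the product of two Gaussian kernels with bandwidth $h$ centered at $a$ and $b$ is itself a Gaussian in $a-b$ with variance $2h^2$. The sketch below uses this fact to evaluate the Cauchy-Schwarz divergence between two one-dimensional KDEs exactly. It is a simplified illustration with hypothetical function names: it compares two plain KDEs, not the class mixture $p_{\alpha}$, and it is not the library's implementation.

```python
import math

def gauss(d, var):
    """Zero-mean Gaussian density with variance `var`, evaluated at d."""
    return math.exp(-0.5 * d * d / var) / math.sqrt(2.0 * math.pi * var)

def cross_term(A, B, h):
    """Closed form of the integral of the product of two Gaussian KDEs."""
    var = 2.0 * h * h
    return sum(gauss(a - b, var) for a in A for b in B) / (len(A) * len(B))

def cs_divergence(A, B, h):
    """Cauchy-Schwarz divergence between the KDEs centered on A and on B."""
    pq = cross_term(A, B, h)
    pp = cross_term(A, A, h)
    qq = cross_term(B, B, h)
    return -math.log(pq / math.sqrt(pp * qq))

same = cs_divergence([0.0, 1.0], [0.0, 1.0], h=0.5)
apart = cs_divergence([0.0, 1.0], [3.0, 4.0], h=0.5)
```

All three integrals reduce to sums over pairs of kernel centers, so no sampling or numerical integration is required.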

Parameters:

classifier – a sklearn’s Estimator that generates a binary classifier.
val_split – specifies the data used for generating classifier predictions. This specification can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to be extracted from the training set; or as an integer (default 5), indicating that the predictions are to be generated in a k-fold cross-validation manner (with this integer indicating the value for k); or as a collection defining the specific set of data to use for validation. Alternatively, this set can be specified at fit time by indicating the exact set of data on which the predictions are to be generated.
bandwidth – float, the bandwidth of the Kernel

aggregate(posteriors: ndarray)[source]

Implements the aggregation of label predictions.

Parameters:

posteriors – np.ndarray with the posterior probabilities returned by the classifier for the instances in the sample

Returns:

np.ndarray of shape (n_classes,) with class prevalence estimates.

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Trains the aggregation function.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the predictions issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

gram_matrix_mix_sum(X, Y=None)[source]

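set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → KDEyCS

Request metadata passed to the fit method.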

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method._kdey.KDEyHD(classifier: BaseEstimator = None, val_split=5, divergence: str = 'HD', bandwidth=0.1, random_state=None, montecarlo_trials=10000)[source]

Bases: AggregativeSoftQuantifier, KDEBase

Kernel Density Estimation model for quantification (KDEy) relying on the squared Hellinger Distance (HD) as the divergence measure to be minimized. This method was first proposed in the paper Kernel Density Estimation for Multiclass Quantification, in which the authors proposed a Monte Carlo approach for minimizing the divergence.

The distribution matching optimization problem comes down to solving:

$\hat{\alpha} = \arg\min_{\alpha\in\Delta^{n-1}} \mathcal{D}(\boldsymbol{p}_{\alpha}||q_{\widetilde{U}})$

where $p_{\alpha}$ is the mixture of class-specific KDEs with mixture parameter (hence class prevalence) $\alpha$ defined by

$\boldsymbol{p}_{\alpha}(\widetilde{x}) = \sum_{i=1}^n \alpha_i p_{\widetilde{L}_i}(\widetilde{x})$

where $p_X(\boldsymbol{x}) = \frac{1}{|X|} \sum_{x_i\in X} K\left(\frac{x-x_i}{h}\right)$ is the KDE function that uses the datapoints in X as the kernel centers.

In KDEy-HD, the divergence is taken to be the squared Hellinger Distance, an f-divergence with corresponding f-generator function given by:

$f(u)=(\sqrt{u}-1)^2$

The authors proposed a Monte Carlo solution that relies on importance sampling:

$\hat{D}_f(p||q)= \frac{1}{t} \sum_{i=1}^t f\left(\frac{p(x_i)}{q(x_i)}\right) \frac{q(x_i)}{r(x_i)}$

where the datapoints (trials) $x_1,\ldots,x_t\sim_{\mathrm{iid}} r$ with $r$ the uniform distribution.
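The importance-sampling estimator above can be checked on a toy case where the integral is known. In the sketch below, `p` and `q` are hypothetical densities on [0, 1] chosen for this example, and `r` is the uniform distribution on [0, 1], so $r(x)=1$. The function names are illustrative and the code is not the library's implementation.

```python
import math
import random

def f_hellinger(u):
    """Generator f(u) = (sqrt(u) - 1)^2."""
    return (math.sqrt(u) - 1.0) ** 2

def mc_f_divergence(p, q, f, trials, rng):
    """Importance-sampling estimate with r uniform on [0, 1], so r(x) = 1."""
    total = 0.0
    for _ in range(trials):
        x = rng.random()              # x_i drawn i.i.d. from r
        qx = q(x)
        total += f(p(x) / qx) * qx    # divided by r(x) = 1
    return total / trials

def p(x):
    return 2.0 * x                    # hypothetical density on [0, 1]

def q(x):
    return 1.0                        # uniform density on [0, 1]

estimate = mc_f_divergence(p, q, f_hellinger, trials=20000, rng=random.Random(0))
# Exact value of the integral of (sqrt(2x) - 1)^2 over [0, 1].
exact = 2.0 - 4.0 * math.sqrt(2.0) / 3.0
```

Increasing the number of trials reduces the variance of the estimate, which is the role played by the montecarlo_trials parameter.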

Parameters:

classifier – a sklearn’s Estimator that generates a binary classifier.
val_split – specifies the data used for generating classifier predictions. This specification can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to be extracted from the training set; or as an integer (default 5), indicating that the predictions are to be generated in a k-fold cross-validation manner (with this integer indicating the value for k); or as a collection defining the specific set of data to use for validation. Alternatively, this set can be specified at fit time by indicating the exact set of data on which the predictions are to be generated.
bandwidth – float, the bandwidth of the Kernel
random_state – a seed to be set before fitting any base quantifier (default None)
montecarlo_trials – number of Monte Carlo trials (default 10000)

aggregate(posteriors: ndarray)[source]

Implements the aggregation of label predictions.

Parameters:

posteriors – np.ndarray with the posterior probabilities returned by the classifier for the instances in the sample

Returns:

np.ndarray of shape (n_classes,) with class prevalence estimates.

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Trains the aggregation function.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the predictions issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

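set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → KDEyHD

Request metadata passed to the fit method.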

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method._kdey.KDEyML(classifier: BaseEstimator = None, val_split=5, bandwidth=0.1, random_state=None)[source]

Bases: AggregativeSoftQuantifier, KDEBase

Kernel Density Estimation model for quantification (KDEy) relying on the Kullback-Leibler divergence (KLD) as the divergence measure to be minimized. This method was first proposed in the paper Kernel Density Estimation for Multiclass Quantification, in which the authors show that minimizing the distribution matching criterion for KLD is akin to performing maximum likelihood (ML).

The distribution matching optimization problem comes down to solving:

$\hat{\alpha} = \arg\min_{\alpha\in\Delta^{n-1}} \mathcal{D}(\boldsymbol{p}_{\alpha}||q_{\widetilde{U}})$

where $p_{\alpha}$ is the mixture of class-specific KDEs with mixture parameter (hence class prevalence) $\alpha$ defined by

$\boldsymbol{p}_{\alpha}(\widetilde{x}) = \sum_{i=1}^n \alpha_i p_{\widetilde{L}_i}(\widetilde{x})$

where $p_X(\boldsymbol{x}) = \frac{1}{|X|} \sum_{x_i\in X} K\left(\frac{x-x_i}{h}\right)$ is the KDE function that uses the datapoints in X as the kernel centers.

In KDEy-ML, the divergence is taken to be the Kullback-Leibler Divergence. This is equivalent to solving:

$\hat{\alpha} = \arg\min_{\alpha\in\Delta^{n-1}} - \mathbb{E}_{q_{\widetilde{U}}} \left[ \log \boldsymbol{p}_{\alpha}(\widetilde{x}) \right]$

which corresponds to the maximum likelihood estimate.
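The maximum likelihood view can be illustrated on a toy mixture. In the sketch below the class-conditional densities are known Gaussians rather than KDEs, and the mixture weights are found with EM-style fixed-point updates. Both choices are assumptions made to keep the example short and dependency-free; the optimizer used by the library may be different.

```python
import math
import random

def normal_pdf(x, mu, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def neg_log_likelihood(alpha, dens):
    """Average negative log-likelihood of the sample under the mixture."""
    return -sum(math.log(sum(a * d for a, d in zip(alpha, row))) for row in dens) / len(dens)

def ml_prevalence(dens, n_iter=100):
    """EM-style updates of the mixture weights only (components stay fixed)."""
    n = len(dens[0])
    alpha = [1.0 / n] * n
    for _ in range(n_iter):
        resp_sum = [0.0] * n
        for row in dens:
            weighted = [a * d for a, d in zip(alpha, row)]
            z = sum(weighted)
            for i, w in enumerate(weighted):
                resp_sum[i] += w / z      # responsibility of class i for this point
        alpha = [s / len(dens) for s in resp_sum]
    return alpha

rng = random.Random(0)
true_alpha = [0.3, 0.7]
means = [0.0, 3.0]
sample = [rng.gauss(means[0] if rng.random() < true_alpha[0] else means[1], 1.0)
          for _ in range(2000)]
# dens[k][i] is the density of sample point k under class i.
dens = [[normal_pdf(x, mu) for mu in means] for x in sample]
alpha_hat = ml_prevalence(dens)
```

The estimated weights stay on the probability simplex at every iteration, and each update cannot increase the negative log-likelihood.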

Parameters:

classifier – a sklearn’s Estimator that generates a binary classifier.
val_split – specifies the data used for generating classifier predictions. This specification can be made as float in (0, 1) indicating the proportion of stratified held-out validation set to be extracted from the training set; or as an integer (default 5), indicating that the predictions are to be generated in a k-fold cross-validation manner (with this integer indicating the value for k); or as a collection defining the specific set of data to use for validation. Alternatively, this set can be specified at fit time by indicating the exact set of data on which the predictions are to be generated.
bandwidth – float, the bandwidth of the Kernel
random_state – a seed to be set before fitting any base quantifier (default None)

aggregate(posteriors: ndarray)[source]

Searches for the mixture model parameter (the sought prevalence values) that maximizes the likelihood of the data (i.e., that minimizes the negative log-likelihood)

Parameters:

posteriors – instances in the sample converted into posterior probabilities

Returns:

a vector of class prevalence estimates

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Trains the aggregation function.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the predictions issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

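set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → KDEyML

Request metadata passed to the fit method.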

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method._neural.QuaNetModule(doc_embedding_size, n_classes, stats_size, lstm_hidden_size=64, lstm_nlayers=1, ff_layers=[1024, 512], bidirectional=True, qdrop_p=0.5, order_by=0)[source]

Bases: Module

Implements the QuaNet forward pass. See QuaNetTrainer for training QuaNet.

Parameters:

doc_embedding_size – integer, the dimensionality of the document embeddings
n_classes – integer, number of classes
stats_size – integer, number of statistics estimated by simple quantification methods
lstm_hidden_size – integer, hidden dimensionality of the LSTM cell
lstm_nlayers – integer, number of LSTM layers
ff_layers – list of integers, dimensions of the densely-connected FF layers on top of the quantification embedding
bidirectional – boolean, whether or not to use bidirectional LSTM
qdrop_p – float, dropout probability
order_by – integer, class for which the document embeddings are to be sorted

property device

forward(doc_embeddings, doc_posteriors, statistics)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class quapy.method._neural.QuaNetTrainer(classifier, sample_size=None, n_epochs=100, tr_iter_per_poch=500, va_iter_per_poch=100, lr=0.001, lstm_hidden_size=64, lstm_nlayers=1, ff_layers=[1024, 512], bidirectional=True, qdrop_p=0.5, patience=10, checkpointdir='../checkpoint', checkpointname=None, device='cuda')[source]

Bases: BaseQuantifier

Implementation of QuaNet, a neural network for quantification. This implementation uses PyTorch and can take advantage of a GPU to speed up the training phase.

Example:

>>> import quapy as qp
>>> from quapy.method.meta import QuaNet
>>> from quapy.classification.neural import NeuralClassifierTrainer, CNNnet
>>>
>>> # use samples of 100 elements
>>> qp.environ['SAMPLE_SIZE'] = 100
>>>
>>> # load the kindle dataset as text, and convert words to numerical indexes
>>> dataset = qp.datasets.fetch_reviews('kindle', pickle=True)
>>> qp.data.preprocessing.index(dataset, min_df=5, inplace=True)
>>>
>>> # the text classifier is a CNN trained by NeuralClassifierTrainer
>>> cnn = CNNnet(dataset.vocabulary_size, dataset.n_classes)
>>> classifier = NeuralClassifierTrainer(cnn, device='cuda')
>>>
>>> # train QuaNet (QuaNet is an alias to QuaNetTrainer)
>>> model = QuaNet(classifier, qp.environ['SAMPLE_SIZE'], device='cuda')
>>> model.fit(dataset.training)
>>> estim_prevalence = model.quantify(dataset.test.instances)

Parameters:

classifier – an object implementing fit (i.e., that can be trained on labelled data), predict_proba (i.e., that can generate posterior probabilities of unlabelled examples) and transform (i.e., that can generate embedded representations of the unlabelled instances).
sample_size – integer, the sample size; default is None, meaning that the sample size should be taken from qp.environ["SAMPLE_SIZE"]
n_epochs – integer, maximum number of training epochs
tr_iter_per_poch – integer, number of training iterations before considering an epoch complete
va_iter_per_poch – integer, number of validation iterations to perform after each epoch
lr – float, the learning rate
lstm_hidden_size – integer, hidden dimensionality of the LSTM cells
lstm_nlayers – integer, number of LSTM layers
ff_layers – list of integers, dimensions of the densely-connected FF layers on top of the quantification embedding
bidirectional – boolean, indicates whether the LSTM is bidirectional or not
qdrop_p – float, dropout probability
patience – integer, number of epochs showing no improvement in the validation set before stopping the training phase (early stopping)
checkpointdir – string, a path where to store models’ checkpoints
checkpointname – string (optional), the name of the model’s checkpoint
device – string, the device on which to run the model, either “cpu” or “cuda”

property classes_

clean_checkpoint()[source]: Removes the checkpoint

clean_checkpoint_dir()[source]: Removes anything contained in the checkpoint directory

fit(data: LabelledCollection, fit_classifier=True)[source]

Trains QuaNet.

Parameters:

data – the training data on which to train QuaNet. If fit_classifier=True, the data will be split in 40/40/20 for training the classifier, training QuaNet, and validating QuaNet, respectively. If fit_classifier=False, the data will be split in 66/34 for training QuaNet and validating it, respectively.
fit_classifier – if True, trains the classifier on a split containing 40% of the data

Returns:

self
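The partition sizes described for fit can be worked out with simple arithmetic. The helper below is hypothetical and only computes the approximate size of each partition; it does not reproduce how the library draws the stratified splits.

```python
def quanet_split_sizes(n, fit_classifier=True):
    """Approximate partition sizes for n training instances (hypothetical helper)."""
    if fit_classifier:
        n_clf = round(0.40 * n)        # used to train the classifier
        n_train = round(0.40 * n)      # used to train QuaNet
        n_val = n - n_clf - n_train    # remaining ~20% validates QuaNet
        return {"classifier": n_clf, "quanet_train": n_train, "quanet_val": n_val}
    n_train = round(0.66 * n)          # 66% trains QuaNet
    return {"quanet_train": n_train, "quanet_val": n - n_train}

with_classifier = quanet_split_sizes(1000, fit_classifier=True)
without_classifier = quanet_split_sizes(1000, fit_classifier=False)
```

With fit_classifier=False the classifier is assumed to be already trained, so all of the data goes to training and validating QuaNet.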

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

quantify(instances)[source]

Generate class prevalence estimates for the sample’s instances

Parameters:

instances – array-like

Returns:

np.ndarray of shape (n_classes,) with class prevalence estimates.

set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$') → QuaNetTrainer

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_params(**parameters)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

quapy.method._neural.mae_loss(output, target)[source]

Torch-like wrapper for the Mean Absolute Error

Parameters:

output – predictions
target – ground truth values

Returns:

mean absolute error loss

class quapy.method._threshold_optim.MAX(classifier: BaseEstimator = None, val_split=5)[source]

Bases: ThresholdOptimization

Threshold Optimization variant for ACC as proposed by Forman 2006 and Forman 2008 that looks for the threshold that maximizes tpr-fpr. The goal is to bring improved stability to the denominator of the adjustment.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
val_split – indicates the proportion of data to be used as a stratified held-out validation set in which the misclassification rates are to be estimated. This parameter can be indicated as a real value (between 0 and 1), representing a proportion of validation data, or as an integer, indicating that the misclassification rates should be estimated via k-fold cross validation (this integer stands for the number of folds k, defaults 5), or as a quapy.data.base.LabelledCollection (the split itself).

condition(tpr, fpr) → float[source]

Implements the criterion according to which the threshold should be selected. This function should return the (float) score to be minimized.

Parameters:

tpr – float, true positive rate
fpr – float, false positive rate

Returns:

float, a score for the given tpr and fpr
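Because the returned score is minimized, maximizing tpr-fpr amounts to minimizing fpr-tpr. The sketch below applies that criterion to a handful of candidate thresholds. The data, the candidate thresholds, and the helper names are hypothetical; the library's own threshold search may enumerate candidates differently.

```python
def rates(scores, labels, threshold):
    """tpr and fpr when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    pos = sum(labels)
    neg = len(labels) - pos
    return tp / pos, fp / neg

def max_condition(tpr, fpr):
    # Score to be minimized: the smaller fpr - tpr, the larger tpr - fpr.
    return fpr - tpr

# Hypothetical validation scores for the positive class and their true labels.
scores = [0.1, 0.2, 0.3, 0.55, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
candidates = [0.25, 0.35, 0.5, 0.65]

best = min(candidates, key=lambda t: max_condition(*rates(scores, labels, t)))
```

The selected threshold is the one whose tpr and fpr are furthest apart, which keeps the denominator of the adjustment away from zero.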

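set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → MAX

Request metadata passed to the fit method.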

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method._threshold_optim.MS(classifier: BaseEstimator = None, val_split=5)[source]

Bases: ThresholdOptimization

Median Sweep. Threshold Optimization variant for ACC as proposed by Forman 2006 and Forman 2008 that generates class prevalence estimates for all decision thresholds and returns the median of them all. The goal is to bring improved stability to the denominator of the adjustment.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
val_split – indicates the proportion of data to be used as a stratified held-out validation set in which the misclassification rates are to be estimated. This parameter can be indicated as a real value (between 0 and 1), representing a proportion of validation data, or as an integer, indicating that the misclassification rates should be estimated via k-fold cross validation (this integer stands for the number of folds k, defaults 5), or as a quapy.data.base.LabelledCollection (the split itself).

aggregate(classif_predictions: ndarray)[source]

Implements the aggregation of label predictions.

Parameters:

classif_predictions – np.ndarray of label predictions

Returns:

np.ndarray of shape (n_classes,) with class prevalence estimates.
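The median sweep idea can be sketched for the binary case as follows. For every threshold, the fraction of test items classified as positive is corrected with the usual adjustment (cc - fpr) / (tpr - fpr), and the median of the corrected values is returned. The rates, scores, and function names below are hypothetical and serve only as an illustration.

```python
from statistics import median

def adjusted_prevalence(cc, tpr, fpr):
    """Binary ACC-style adjustment, clipped to [0, 1]."""
    if tpr == fpr:
        return cc                      # no adjustment is possible
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))

def median_sweep(test_scores, threshold_rates):
    estimates = []
    for threshold, (tpr, fpr) in threshold_rates.items():
        # Classify-and-count estimate at this threshold.
        cc = sum(1 for s in test_scores if s >= threshold) / len(test_scores)
        estimates.append(adjusted_prevalence(cc, tpr, fpr))
    return median(estimates)

# Hypothetical (tpr, fpr) pairs estimated on validation data, per threshold.
threshold_rates = {0.3: (0.95, 0.40), 0.5: (0.80, 0.20), 0.7: (0.60, 0.05)}
test_scores = [0.1, 0.2, 0.35, 0.4, 0.55, 0.6, 0.75, 0.8, 0.9, 0.95]
positive_prevalence = median_sweep(test_scores, threshold_rates)
```

Taking the median makes the final estimate robust to individual thresholds whose tpr and fpr were poorly estimated.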

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Trains the aggregation function.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the predictions issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

condition(tpr, fpr) → float[source]

Implements the criterion according to which the threshold should be selected. This function should return the (float) score to be minimized.

Parameters:

tpr – float, true positive rate
fpr – float, false positive rate

Returns:

float, a score for the given tpr and fpr

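set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → MS

Request metadata passed to the fit method.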

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method._threshold_optim.MS2(classifier: BaseEstimator = None, val_split=5)[source]

Bases: MS

Median Sweep 2. Threshold Optimization variant for ACC as proposed by Forman 2006 and Forman 2008 that generates class prevalence estimates for all decision thresholds and returns the median of those estimates for which tpr-fpr>0.25. The goal is to bring improved stability to the denominator of the adjustment.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
val_split – indicates the proportion of data to be used as a stratified held-out validation set in which the misclassification rates are to be estimated. This parameter can be indicated as a real value (between 0 and 1), representing a proportion of validation data, or as an integer, indicating that the misclassification rates should be estimated via k-fold cross validation (this integer stands for the number of folds k, defaults 5), or as a quapy.data.base.LabelledCollection (the split itself).

discard(tpr, fpr) → bool[source]

Indicates whether a combination of tpr and fpr should be discarded

Parameters:

tpr – float, true positive rate
fpr – float, false positive rate

Returns:

true if the combination is to be discarded, false otherwise
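The filtering step can be sketched as follows. Estimates obtained at thresholds whose tpr-fpr gap does not exceed 0.25 are dropped before the median is taken. The per-threshold values and the `min_gap` parameter name are hypothetical and used only for illustration.

```python
from statistics import median

def discard(tpr, fpr, min_gap=0.25):
    """True when the denominator tpr - fpr is too small to be trusted."""
    return (tpr - fpr) <= min_gap

# Hypothetical per-threshold tuples: (tpr, fpr, adjusted prevalence estimate).
sweep = [
    (0.99, 0.90, 0.10),
    (0.90, 0.40, 0.62),
    (0.75, 0.20, 0.66),
    (0.50, 0.05, 0.70),
    (0.10, 0.01, 0.98),
]

kept = [estimate for tpr, fpr, estimate in sweep if not discard(tpr, fpr)]
prevalence = median(kept)
```

In this toy example the two extreme thresholds, where tpr and fpr are nearly equal, produce the least plausible estimates and are the ones removed.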

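set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → MS2

Request metadata passed to the fit method.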

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method._threshold_optim.T50(classifier: BaseEstimator = None, val_split=5)[source]

Bases: ThresholdOptimization

Threshold Optimization variant for ACC as proposed by Forman 2006 and Forman 2008 that looks for the threshold that makes tpr closest to 0.5. The goal is to bring improved stability to the denominator of the adjustment.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
val_split – specifies the validation data on which the misclassification rates are estimated. It can be given as a real value between 0 and 1, the proportion of data to hold out as a stratified validation set; as an integer k (default 5), indicating that the rates should be estimated via k-fold cross-validation; or as a quapy.data.base.LabelledCollection (the split itself).

condition(tpr, fpr) → float[source]

Implements the criterion according to which the threshold should be selected. This function should return the (float) score to be minimized.

Parameters:

tpr – float, true positive rate
fpr – float, false positive rate

Returns:

float, a score for the given tpr and fpr
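As a rough illustration of the criterion described above, the following sketch scores candidate thresholds by how far their tpr lies from 0.5 and keeps the one with the lowest score. The names `t50_score` and `candidates`, and the numbers themselves, are hypothetical and not part of QuaPy; in practice the rates would be estimated on validation data.

```python
def t50_score(tpr: float, fpr: float) -> float:
    """Score to be minimized: distance of tpr from 0.5 (fpr is not used)."""
    return abs(tpr - 0.5)


# Hypothetical (threshold, tpr, fpr) triples, as if measured on validation data.
candidates = [
    (0.2, 0.95, 0.40),
    (0.5, 0.70, 0.15),
    (0.7, 0.52, 0.05),
    (0.9, 0.20, 0.01),
]

# Keep the candidate whose tpr is closest to 0.5.
best_threshold, best_tpr, best_fpr = min(
    candidates, key=lambda c: t50_score(c[1], c[2])
)
print(best_threshold, best_tpr, best_fpr)
```

Because only tpr enters the score, two thresholds with the same tpr are indistinguishable under this criterion regardless of their fpr.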

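set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → T50

Request metadata passed to the fit method.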

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method._threshold_optim.ThresholdOptimization(classifier: BaseEstimator = None, val_split=None, n_jobs=None)[source]

Bases: BinaryAggregativeQuantifier

Abstract class of Threshold Optimization variants for ACC as proposed by Forman 2006 and Forman 2008. The goal is to bring improved stability to the denominator of the adjustment. The variants differ in the heuristic used to choose a decision threshold that allows for more true positives and many more false positives, on the grounds that this would deliver larger denominators.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
val_split – specifies the validation data on which the misclassification rates are estimated. It can be given as a real value between 0 and 1, the proportion of data to hold out as a stratified validation set; as an integer k, indicating that the rates should be estimated via k-fold cross-validation; or as a quapy.data.base.LabelledCollection (the split itself).

aggregate(classif_predictions: ndarray)[source]

Implements the aggregation of label predictions.

Parameters:: classif_predictions – np.ndarray of label predictions
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.

aggregate_with_threshold(classif_predictions, tprs, fprs, thresholds)[source]

aggregation_fit(classif_predictions: LabelledCollection, data: LabelledCollection)[source]

Trains the aggregation function.

Parameters:

classif_predictions – a quapy.data.base.LabelledCollection containing, as instances, the predictions issued by the classifier and, as labels, the true labels
data – a quapy.data.base.LabelledCollection consisting of the training data

abstract condition(tpr, fpr) → float[source]

Implements the criterion according to which the threshold should be selected. This function should return the (float) score to be minimized.

Parameters:

tpr – float, true positive rate
fpr – float, false positive rate

Returns:

float, a score for the given tpr and fpr

discard(tpr, fpr) → bool[source]

Indicates whether a combination of tpr and fpr should be discarded

Parameters:

tpr – float, true positive rate
fpr – float, false positive rate

Returns:

True if the combination is to be discarded, False otherwise
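The interplay between condition(), discard() and the final adjustment can be sketched in plain Python as follows. This is an illustrative reconstruction under stated assumptions, not QuaPy's implementation: the helper names (`rates`, `select_threshold`, `adjusted_prevalence`) are hypothetical, the discard rule (dropping thresholds for which tpr equals fpr) is an assumption, and the adjustment is the usual binary ACC formula (cc - fpr) / (tpr - fpr), clipped to [0, 1].

```python
from typing import Callable, List, Sequence, Tuple


def rates(scores: Sequence[float], labels: Sequence[int],
          threshold: float) -> Tuple[float, float]:
    """Return (tpr, fpr) when predicting positive for scores >= threshold."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    tpr = sum(s >= threshold for s in pos) / len(pos)
    fpr = sum(s >= threshold for s in neg) / len(neg)
    return tpr, fpr


def select_threshold(scores: Sequence[float], labels: Sequence[int],
                     condition: Callable[[float, float], float]
                     ) -> Tuple[float, float, float]:
    """Pick the threshold minimizing `condition` among non-discarded ones."""
    best = None
    for t in sorted(set(scores)):
        tpr, fpr = rates(scores, labels, t)
        if tpr - fpr == 0:  # assumed discard rule: zero denominator
            continue
        score = condition(tpr, fpr)
        if best is None or score < best[0]:
            best = (score, t, tpr, fpr)
    if best is None:
        raise ValueError("every candidate threshold was discarded")
    return best[1], best[2], best[3]


def adjusted_prevalence(test_scores: Sequence[float], threshold: float,
                        tpr: float, fpr: float) -> List[float]:
    """Classify-and-count at `threshold`, then apply the ACC correction."""
    cc = sum(s >= threshold for s in test_scores) / len(test_scores)
    positive = (cc - fpr) / (tpr - fpr)
    positive = min(1.0, max(0.0, positive))  # clip to a valid prevalence
    return [1.0 - positive, positive]


# Hypothetical validation scores (posterior of the positive class) and labels.
val_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
val_labels = [0, 0, 0, 1, 0, 1, 1, 1]
test_scores = [0.05, 0.85, 0.95, 0.3, 0.5, 0.9, 0.2, 0.1, 0.15, 0.6]

# Use a T50-like criterion: tpr as close as possible to 0.5.
threshold, tpr, fpr = select_threshold(
    val_scores, val_labels, lambda tpr, fpr: abs(tpr - 0.5)
)
prevalence = adjusted_prevalence(test_scores, threshold, tpr, fpr)
print(threshold, tpr, fpr, prevalence)
```

Swapping the lambda for a different criterion is all that distinguishes one variant from another in this sketch, which mirrors the role of the abstract condition() method.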

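set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → ThresholdOptimization

Request metadata passed to the fit method.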

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

class quapy.method._threshold_optim.X(classifier: BaseEstimator = None, val_split=5)[source]

Bases: ThresholdOptimization

Threshold Optimization variant for ACC as proposed by Forman 2006 and Forman 2008 that looks for the threshold that yields tpr=1-fpr. The goal is to bring improved stability to the denominator of the adjustment.

Parameters:

classifier – a sklearn’s Estimator that generates a classifier
val_split – specifies the validation data on which the misclassification rates are estimated. It can be given as a real value between 0 and 1, the proportion of data to hold out as a stratified validation set; as an integer k (default 5), indicating that the rates should be estimated via k-fold cross-validation; or as a quapy.data.base.LabelledCollection (the split itself).

condition(tpr, fpr) → float[source]

Implements the criterion according to which the threshold should be selected. This function should return the (float) score to be minimized.

Parameters:

tpr – float, true positive rate
fpr – float, false positive rate

Returns:

float, a score for the given tpr and fpr
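A minimal sketch of the tpr = 1 - fpr criterion follows. The score measures how far a (tpr, fpr) pair is from satisfying that equality. The function name `x_score` and the candidate values are hypothetical and only serve to illustrate the idea.

```python
def x_score(tpr: float, fpr: float) -> float:
    """Score to be minimized: deviation from the equality tpr = 1 - fpr."""
    return abs(1.0 - (tpr + fpr))


# Hypothetical (threshold, tpr, fpr) triples, as if measured on validation data.
candidates = [
    (0.2, 0.95, 0.40),
    (0.5, 0.80, 0.15),
    (0.7, 0.52, 0.05),
    (0.9, 0.20, 0.01),
]

# Keep the candidate for which tpr + fpr is closest to 1.
best = min(candidates, key=lambda c: x_score(c[1], c[2]))
print(best)
```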

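set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → X

Request metadata passed to the fit method.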

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

quapy.method.base module

class quapy.method.base.BaseQuantifier[source]

Bases: BaseEstimator

Abstract Quantifier. A quantifier is an object of a class that implements fit(), which trains on a quapy.data.base.LabelledCollection; quantify(), which returns class prevalence estimates; and set_params() and get_params(), which are used for model selection (see quapy.model_selection.GridSearchQ()).

abstract fit(data: LabelledCollection)[source]

Trains a quantifier.

Parameters:: data – a quapy.data.base.LabelledCollection consisting of the training data
Returns:: self

abstract quantify(instances)[source]

Generate class prevalence estimates for the sample’s instances

Parameters:: instances – array-like
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.
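To make the fit()/quantify() contract concrete, here is a toy quantifier written without QuaPy. It is only a sketch of the interface shape: `ToyCollection` is a hypothetical stand-in for a labelled collection, and `TrainingPrevalenceQuantifier` is an illustrative class that ignores the test instances and always returns the class prevalence observed during training.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ToyCollection:
    """Hypothetical stand-in for a labelled collection (not QuaPy's class)."""
    instances: Sequence
    labels: Sequence[int]


class TrainingPrevalenceQuantifier:
    """Toy quantifier: predicts the class prevalence seen at training time."""

    def fit(self, data: ToyCollection) -> "TrainingPrevalenceQuantifier":
        counts = Counter(data.labels)
        self.classes_ = sorted(counts)
        total = len(data.labels)
        self.prevalence_ = [counts[c] / total for c in self.classes_]
        return self  # fit returns self, as in the interface above

    def quantify(self, instances: Sequence) -> List[float]:
        # One prevalence value per class, summing to 1.
        return list(self.prevalence_)


train = ToyCollection(instances=["a", "b", "c", "d"], labels=[0, 0, 0, 1])
quantifier = TrainingPrevalenceQuantifier().fit(train)
estimate = quantifier.quantify(["x", "y"])
print(estimate)
```

A real quantifier would use the test instances; this baseline is useful mainly as a sanity check against which other methods can be compared.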

set_fit_request(*, data: bool | None | str = '$UNCHANGED$') → BaseQuantifier

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:: data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
Returns:: self – The updated object.
Return type:: object

class quapy.method.base.BinaryQuantifier[source]

Bases: BaseQuantifier

Abstract class of binary quantifiers, i.e., quantifiers estimating class prevalence values for only two classes (typically, to be interpreted as one class and its complement).

set_fit_request(*, data: bool | None | str = '$UNCHANGED$') → BinaryQuantifier

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:: data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
Returns:: self – The updated object.
Return type:: object

class quapy.method.base.OneVsAll[source]

Bases: object

class quapy.method.base.OneVsAllGeneric(binary_quantifier: BaseQuantifier, n_jobs=None)[source]

Bases: OneVsAll, BaseQuantifier

Allows any binary quantifier to perform quantification on single-label datasets. The method maintains one binary quantifier for each class, and then L1-normalizes the outputs so that the class prevalence values sum up to 1.
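The normalization step can be illustrated with a short sketch. The per-class values below are hypothetical positive-class estimates, one from each binary quantifier; the helper `l1_normalize` is not part of QuaPy, and its fallback to a uniform vector when all estimates are zero is an assumption made for this example.

```python
from typing import List, Sequence


def l1_normalize(values: Sequence[float]) -> List[float]:
    """Rescale non-negative values so that they sum to 1."""
    total = sum(values)
    if total == 0:
        # Assumed fallback: no evidence for any class, return uniform.
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


# Hypothetical "class vs. rest" prevalence estimates for three classes.
# Each binary quantifier works independently, so they need not sum to 1.
positive_estimates = [0.5, 0.3, 0.45]

prevalence = l1_normalize(positive_estimates)
print(prevalence)
```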

property classes_

fit(data: LabelledCollection, fit_classifier=True)[source]

Trains a quantifier.

Parameters:: data – a quapy.data.base.LabelledCollection consisting of the training data
Returns:: self

quantify(instances)[source]

Generate class prevalence estimates for the sample’s instances

Parameters:: instances – array-like
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.

set_fit_request(*, data: bool | None | str = '$UNCHANGED$', fit_classifier: bool | None | str = '$UNCHANGED$') → OneVsAllGeneric

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
fit_classifier (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fit_classifier parameter in fit.

Returns:

self – The updated object.

Return type:

object

quapy.method.base.newOneVsAll(binary_quantifier: BaseQuantifier, n_jobs=None)[source]

quapy.method.meta module

quapy.method.meta.EACC(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs)[source]

Implements an ensemble of quapy.method.aggregative.ACC quantifiers, as used by Pérez-Gállego et al., 2019.

Equivalent to:

>>> ensembleFactory(classifier, ACC, param_grid, optim, param_mod_sel, **kwargs)

See ensembleFactory() for further details.

Parameters:

classifier – sklearn’s Estimator that generates a classifier
param_grid – a dictionary with the grid of parameters to optimize for
optim – a valid quantification or classification error, or a string name of it
param_mod_sel – a dictionary containing any keyword arguments to pass to quapy.model_selection.GridSearchQ
kwargs – kwargs for the class Ensemble

Returns:

an instance of Ensemble

quapy.method.meta.ECC(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs)[source]

Implements an ensemble of quapy.method.aggregative.CC quantifiers, as used by Pérez-Gállego et al., 2019.

Equivalent to:

>>> ensembleFactory(classifier, CC, param_grid, optim, param_mod_sel, **kwargs)

See ensembleFactory() for further details.

Parameters:

classifier – sklearn’s Estimator that generates a classifier
param_grid – a dictionary with the grid of parameters to optimize for
optim – a valid quantification or classification error, or a string name of it
param_mod_sel – a dictionary containing any keyword arguments to pass to quapy.model_selection.GridSearchQ
kwargs – kwargs for the class Ensemble

Returns:

an instance of Ensemble

quapy.method.meta.EEMQ(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs)[source]

Implements an ensemble of quapy.method.aggregative.EMQ quantifiers.

Equivalent to:

>>> ensembleFactory(classifier, EMQ, param_grid, optim, param_mod_sel, **kwargs)

See ensembleFactory() for further details.

Parameters:

classifier – sklearn’s Estimator that generates a classifier
param_grid – a dictionary with the grid of parameters to optimize for
optim – a valid quantification or classification error, or a string name of it
param_mod_sel – a dictionary containing any keyword arguments to pass to quapy.model_selection.GridSearchQ
kwargs – kwargs for the class Ensemble

Returns:

an instance of Ensemble

quapy.method.meta.EHDy(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs)[source]

Implements an ensemble of quapy.method.aggregative.HDy quantifiers, as used by Pérez-Gállego et al., 2019.

Equivalent to:

>>> ensembleFactory(classifier, HDy, param_grid, optim, param_mod_sel, **kwargs)

See ensembleFactory() for further details.

Parameters:

classifier – sklearn’s Estimator that generates a classifier
param_grid – a dictionary with the grid of parameters to optimize for
optim – a valid quantification or classification error, or a string name of it
param_mod_sel – a dictionary containing any keyword arguments to pass to quapy.model_selection.GridSearchQ
kwargs – kwargs for the class Ensemble

Returns:

an instance of Ensemble

quapy.method.meta.EPACC(classifier, param_grid=None, optim=None, param_mod_sel=None, **kwargs)[source]

Implements an ensemble of quapy.method.aggregative.PACC quantifiers.

Equivalent to:

>>> ensembleFactory(classifier, PACC, param_grid, optim, param_mod_sel, **kwargs)

See ensembleFactory() for further details.

Parameters:

classifier – sklearn’s Estimator that generates a classifier
param_grid – a dictionary with the grid of parameters to optimize for
optim – a valid quantification or classification error, or a string name of it
param_mod_sel – a dictionary containing any keyword arguments to pass to quapy.model_selection.GridSearchQ
kwargs – kwargs for the class Ensemble

Returns:

an instance of Ensemble

class quapy.method.meta.Ensemble(quantifier: BaseQuantifier, size=50, red_size=25, min_pos=5, policy='ave', max_sample_size=None, val_split: LabelledCollection | float = None, n_jobs=None, verbose=False)[source]

Bases: BaseQuantifier

VALID_POLICIES = {'ave', 'ds', 'mae', 'mkld', 'mnae', 'mnkld', 'mnrae', 'mrae', 'mse', 'ptr'}

Implementation of the Ensemble methods for quantification described by Pérez-Gállego et al., 2017 and Pérez-Gállego et al., 2019. The policies implemented include:

Average (policy=’ave’): computes class prevalence estimates as the average of the estimates returned by the base quantifiers.
Training Prevalence (policy=’ptr’): applies a dynamic selection to the ensemble’s members by retaining only those members such that the class prevalence values in the samples they use as training set are closest to preliminary class prevalence estimates computed as the average of the estimates of all the members. The final estimate is recomputed by considering only the selected members.
Distribution Similarity (policy=’ds’): performs a dynamic selection of base members by retaining the members trained on samples whose distribution of posterior probabilities is closest, in terms of the Hellinger Distance, to the distribution of posterior probabilities in the test sample.
Accuracy (policy=’<valid error name>’): performs a static selection of the ensemble members by retaining those that minimize a quantification error measure, which is passed as an argument.

Example:

>>> model = Ensemble(quantifier=ACC(LogisticRegression()), size=30, policy='ave', n_jobs=-1)
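The ’ave’ and ’ptr’ policies described above can be sketched in plain Python. This is an illustration under stated assumptions rather than QuaPy's code: the function names are hypothetical, the member estimates and training prevalences are made-up numbers, and the squared Euclidean distance used to rank members in the ’ptr’ policy is an assumed choice of closeness measure.

```python
from typing import List, Sequence


def average(estimates: Sequence[Sequence[float]]) -> List[float]:
    """'ave' policy: component-wise mean of the members' estimates."""
    n = len(estimates)
    return [sum(column) / n for column in zip(*estimates)]


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed closeness measure between two prevalence vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ptr_policy(estimates: Sequence[Sequence[float]],
               training_prevalences: Sequence[Sequence[float]],
               red_size: int) -> List[float]:
    """'ptr' policy: keep the red_size members whose training prevalence
    is closest to the preliminary (average) estimate, then re-average."""
    preliminary = average(estimates)
    ranked = sorted(
        range(len(estimates)),
        key=lambda i: squared_distance(training_prevalences[i], preliminary),
    )
    selected = ranked[:red_size]
    return average([estimates[i] for i in selected])


# Hypothetical estimates from four members, and the class prevalence of
# the sample each member was trained on.
estimates = [[0.2, 0.8], [0.4, 0.6], [0.6, 0.4], [0.3, 0.7]]
training_prevalences = [[0.1, 0.9], [0.5, 0.5], [0.9, 0.1], [0.35, 0.65]]

ave_estimate = average(estimates)
ptr_estimate = ptr_policy(estimates, training_prevalences, red_size=2)
print(ave_estimate, ptr_estimate)
```

In this toy run the preliminary average is [0.375, 0.625], so the members trained at prevalences [0.35, 0.65] and [0.5, 0.5] are retained and the final estimate is the mean of their two outputs.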

Parameters:

quantifier – base quantification member of the ensemble
size – number of members
red_size – number of members to retain after selection (depending on the policy)
min_pos – minimum number of positive instances to consider a sample as valid
policy – the selection policy; available policies include: ave (default), ptr, ds, and accuracy (which is instantiated via a valid error name, e.g., mae)
max_sample_size – maximum number of instances to consider in the samples (set to None to indicate no limit, default)
val_split – a float in range (0,1) indicating the proportion of data to be used as a stratified held-out validation split, or a quapy.data.base.LabelledCollection (the split itself).
n_jobs – number of parallel workers (default 1)
verbose – set to True (default is False) to get some information in standard output

property aggregative

Indicates that the quantifier is not aggregative.

Returns:: False

fit(data: LabelledCollection, val_split: LabelledCollection | float = None)[source]

Trains a quantifier.

Parameters:: data – a quapy.data.base.LabelledCollection consisting of the training data
Returns:: self

get_params(deep=True)[source]

This function should not be used within quapy.model_selection.GridSearchQ; it exists only for compatibility with the abstract class. Instead, use Ensemble(GridSearchQ(q),…), with q a Quantifier (recommended), or Ensemble(Q(GridSearchCV(l))), with Q a quantifier class that has a classifier l optimized for classification (not recommended).

Parameters:: deep – for compatibility with scikit-learn
Returns:: raises an Exception

property probabilistic

Indicates that the quantifier is not probabilistic.

Returns:: False

quantify(instances)[source]

Generate class prevalence estimates for the sample’s instances

Parameters:: instances – array-like
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.

set_fit_request(*, data: bool | None | str = '$UNCHANGED$', val_split: bool | None | str = '$UNCHANGED$') → Ensemble

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
val_split (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for val_split parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_params(**parameters)[source]

This function should not be used within quapy.model_selection.GridSearchQ; it exists only for compatibility with the abstract class. Instead, use Ensemble(GridSearchQ(q),…), with q a Quantifier (recommended), or Ensemble(Q(GridSearchCV(l))), with Q a quantifier class that has a classifier l optimized for classification (not recommended).

Parameters:: parameters – dictionary
Returns:: raises an Exception

class quapy.method.meta.MedianEstimator(base_quantifier: BinaryQuantifier, param_grid: dict, random_state=None, n_jobs=None)[source]

Bases: BinaryQuantifier

This method is a meta-quantifier that returns, as the estimated class prevalence values, the median of the estimates returned by differently (hyper)parameterized base quantifiers. The median of unit-vectors is only guaranteed to be a unit-vector for n=2 dimensions, i.e., in cases of binary quantification.
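The restriction to the binary case can be checked numerically. In the sketch below (plain Python, hypothetical numbers), the component-wise median of binary prevalence vectors still sums to 1, because the second component is always one minus the first and the median commutes with that transformation. With three classes the component-wise median may no longer sum to 1.

```python
from statistics import median
from typing import List, Sequence


def componentwise_median(estimates: Sequence[Sequence[float]]) -> List[float]:
    """Median of each class's prevalence across several estimates."""
    return [median(column) for column in zip(*estimates)]


# Hypothetical estimates from three differently configured quantifiers.
binary_estimates = [[0.2, 0.8], [0.3, 0.7], [0.6, 0.4]]
binary_median = componentwise_median(binary_estimates)

# Three-class counterexample: every input sums to 1, the median does not.
multiclass_estimates = [[0.6, 0.3, 0.1], [0.1, 0.6, 0.3], [0.3, 0.1, 0.6]]
multiclass_median = componentwise_median(multiclass_estimates)

print(binary_median, sum(binary_median))
print(multiclass_median, sum(multiclass_median))
```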

Parameters:

base_quantifier – the base, binary quantifier
random_state – a seed to be set before fitting any base quantifier (default None)
param_grid – the grid of parameters over which the median will be computed
n_jobs – number of parallel workers

fit(training: LabelledCollection)[source]

Trains a quantifier.

Parameters:: training – a quapy.data.base.LabelledCollection consisting of the training data
Returns:: self

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:: params – Parameter names mapped to their values.
Return type:: dict

quantify(instances)[source]

Generate class prevalence estimates for the sample’s instances

Parameters:: instances – array-like
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.

set_fit_request(*, training: bool | None | str = '$UNCHANGED$') → MedianEstimator

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:: training (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for training parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:: **params (dict) – Estimator parameters.
Returns:: self – Estimator instance.
Return type:: estimator instance

class quapy.method.meta.MedianEstimator2(base_quantifier: BinaryQuantifier, param_grid: dict, random_state=None, n_jobs=None)[source]

Bases: BinaryQuantifier

This method is a meta-quantifier that returns, as the estimated class prevalence values, the median of the estimates returned by differently (hyper)parameterized base quantifiers. The median of unit-vectors is only guaranteed to be a unit-vector for n=2 dimensions, i.e., in cases of binary quantification.

Parameters:

base_quantifier – the base, binary quantifier
random_state – a seed to be set before fitting any base quantifier (default None)
param_grid – the grid of parameters over which the median will be computed
n_jobs – number of parallel workers

fit(training: LabelledCollection)[source]

Trains a quantifier.

Parameters:: training – a quapy.data.base.LabelledCollection consisting of the training data
Returns:: self

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters:: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:: params – Parameter names mapped to their values.
Return type:: dict

quantify(instances)[source]

Generate class prevalence estimates for the sample’s instances

Parameters:: instances – array-like
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.

set_fit_request(*, training: bool | None | str = '$UNCHANGED$') → MedianEstimator2

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:: training (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for training parameter in fit.
Returns:: self – The updated object.
Return type:: object

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:: **params (dict) – Estimator parameters.
Returns:: self – Estimator instance.
Return type:: estimator instance

quapy.method.meta.ensembleFactory(classifier, base_quantifier_class, param_grid=None, optim=None, param_model_sel: dict = None, **kwargs)[source]

Ensemble factory. Provides a unified interface for instantiating ensembles that can be optimized (via model selection for quantification) for a given evaluation metric using quapy.model_selection.GridSearchQ. If the evaluation metric is classification-oriented (instead of quantification-oriented), then the optimization will be carried out via sklearn’s GridSearchCV.

Example to instantiate an Ensemble based on quapy.method.aggregative.PACC in which the base members are optimized for quapy.error.mae() via quapy.model_selection.GridSearchQ. The ensemble follows the policy Accuracy based on quapy.error.mae() (the same measure being optimized), meaning that a static selection of members of the ensemble is made based on their performance in terms of this error.

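>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> from quapy.method.aggregative import PACC
>>> from quapy.method.meta import ensembleFactory
>>>
>>> # hyperparameters of the classifier to explore
>>> param_grid = {
...     'C': np.logspace(-3,3,7),
...     'class_weight': ['balanced', None]
... }
>>> # keyword arguments forwarded to GridSearchQ
>>> param_mod_sel = {
...     'sample_size': 500,
...     'protocol': 'app'
... }
>>> # the key must match the param_model_sel argument of ensembleFactory
>>> common={
...     'max_sample_size': 1000,
...     'n_jobs': -1,
...     'param_grid': param_grid,
...     'param_model_sel': param_mod_sel,
... }
>>>
>>> ensembleFactory(LogisticRegression(), PACC, optim='mae', policy='mae', **common)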

Parameters:

classifier – sklearn’s Estimator that generates a classifier
base_quantifier_class – a class of quantifiers
param_grid – a dictionary with the grid of parameters to optimize for
optim – a valid quantification or classification error, or a string name of it
param_model_sel – a dictionary containing any keyworded argument to pass to quapy.model_selection.GridSearchQ
kwargs – kwargs for the class Ensemble

Returns:

an instance of Ensemble

quapy.method.meta.get_probability_distribution(posterior_probabilities, bins=8)[source]

Gets a histogram out of the posterior probabilities (only for the binary case).

Parameters:

posterior_probabilities – array-like of shape (n_instances, 2,)
bins – integer

Returns:

np.ndarray with the relative frequencies for each bin (for the positive class only)
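As an illustration of the description above, the following is a minimal, dependency-free sketch of such a histogram. It is not the library's implementation: the helper name `positive_class_histogram` is hypothetical, and equal-width bins over [0, 1] are an assumption.

```python
def positive_class_histogram(posteriors, bins=8):
    """Relative frequency of the positive-class posterior in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for row in posteriors:
        if len(row) != 2:
            raise ValueError("expected binary posteriors of shape (n_instances, 2)")
        p_pos = row[1]  # second column: posterior of the positive class
        # A posterior of exactly 1.0 is assigned to the last bin.
        k = min(int(p_pos * bins), bins - 1)
        counts[k] += 1
    n = len(posteriors)
    return [c / n for c in counts]


posteriors = [[0.9, 0.1], [0.6, 0.4], [0.55, 0.45], [0.0, 1.0]]
hist = positive_class_histogram(posteriors, bins=4)
print(hist)  # [0.25, 0.5, 0.0, 0.25]
```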

quapy.method.non_aggregative module

class quapy.method.non_aggregative.DMx(nbins=8, divergence: str | Callable = 'HD', cdf=False, search='optim_minimize', n_jobs=None)[source]

Bases: BaseQuantifier

Generic Distribution Matching quantifier for binary or multiclass quantification based on the space of covariates. This implementation takes the number of bins, the divergence, and the possibility to work on CDF as hyperparameters.

Parameters:

nbins – number of bins used to discretize the distributions (default 8)
divergence – a string representing a divergence measure (currently, “HD” and “topsoe” are implemented) or a callable function taking two ndarrays of the same dimension as input (default “HD”, meaning Hellinger Distance)
cdf – whether to use CDF instead of PDF (default False)
n_jobs – number of parallel workers (default None)

classmethod HDx(n_jobs=None)[source]

Hellinger Distance x (HDx). HDx is a method for training binary quantifiers that models quantification as the problem of minimizing the average divergence (in terms of the Hellinger Distance) across the feature-specific normalized histograms of two representations: one for the unlabelled examples, and another generated from the training examples as a mixture model of the class-specific representations. The parameters of the mixture thus represent the estimates of the class prevalence values.

The method computes all matchings for nbins in [10, 20, …, 110] and reports the mean of the median. The best prevalence is searched via linear search, from 0 to 1 stepping by 0.01.

Parameters:: n_jobs – number of parallel workers
Returns:: an instance of this class set up to mimic the performance of HDx as originally proposed by González-Castro, Alaiz-Rodríguez, and Alegre (2013)

fit(data: LabelledCollection)[source]

Generates the validation distributions out of the training data (covariates). The validation distributions have shape (n, nfeats, nbins), with n the number of classes, nfeats the number of features, and nbins the number of bins. In particular, let V be the validation distributions; then di=V[i] are the distributions obtained from training data labelled with class i; while dij = di[j] is the discrete distribution for feature j in training data labelled with class i, and dij[k] is the fraction of instances with a value in the k-th bin.

Parameters:: data – the training set
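The indexing scheme V[i][j][k] described above can be sketched as follows. This is an illustration only, not the code of DMx: the helper name `class_conditional_histograms` is hypothetical, and it assumes that all feature values lie in a known range [lo, hi], that bins are equally wide, and that every class has at least one training instance.

```python
def class_conditional_histograms(X, y, n_classes, nbins, lo=0.0, hi=1.0):
    """Return V with V[i][j][k] = fraction of class-i instances whose feature j falls in bin k."""
    n_feats = len(X[0])
    V = [[[0.0] * nbins for _ in range(n_feats)] for _ in range(n_classes)]
    class_sizes = [0] * n_classes
    for x, label in zip(X, y):
        class_sizes[label] += 1
        for j, value in enumerate(x):
            # Equal-width binning; the upper edge goes into the last bin.
            k = min(int((value - lo) / (hi - lo) * nbins), nbins - 1)
            V[label][j][k] += 1.0
    # Turn counts into per-class relative frequencies.
    for i in range(n_classes):
        for j in range(n_feats):
            V[i][j] = [c / class_sizes[i] for c in V[i][j]]
    return V


X = [[0.1, 0.9], [0.2, 0.7], [0.8, 0.3], [0.6, 0.1]]
y = [0, 0, 1, 1]
V = class_conditional_histograms(X, y, n_classes=2, nbins=2)
print(V[0][0])  # distribution of feature 0 among class-0 instances
```

The result has shape (n, nfeats, nbins), and each V[i][j] sums to one.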

quantify(instances)[source]

Searches for the mixture model parameter (the sought prevalence values) that yields a validation distribution (the mixture) that best matches the test distribution, in terms of the divergence measure of choice. The matching is computed as the average dissimilarity (in terms of the dissimilarity measure of choice) between all feature-specific discrete distributions.

Parameters:: instances – instances in the sample
Returns:: a vector of class prevalence estimates
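To make the matching step concrete, here is a small sketch for the binary case. It performs a linear search over candidate prevalence values and scores each candidate by the average Hellinger distance across features. The function names are hypothetical, the grid step of 0.01 is an assumption, and the actual class may rely on a different search strategy (see its search argument).

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


def match_prevalence(V, test_hists, n_steps=100):
    """V[i][j]: histogram of feature j in class i; test_hists[j]: test histogram of feature j."""
    n_feats = len(test_hists)
    best_alpha, best_score = 0.0, float("inf")
    for s in range(n_steps + 1):
        alpha = s / n_steps  # candidate prevalence of class 1
        score = 0.0
        for j in range(n_feats):
            mixture = [(1 - alpha) * a + alpha * b for a, b in zip(V[0][j], V[1][j])]
            score += hellinger(mixture, test_hists[j])
        score /= n_feats  # average dissimilarity over features
        if score < best_score:
            best_alpha, best_score = alpha, score
    return [1 - best_alpha, best_alpha]


V = [
    [[0.8, 0.2], [0.6, 0.4]],  # class 0: histograms of features 0 and 1
    [[0.1, 0.9], [0.3, 0.7]],  # class 1
]
# Build test histograms whose true prevalence of class 1 is 0.3.
test_hists = [[0.7 * a + 0.3 * b for a, b in zip(V[0][j], V[1][j])] for j in range(2)]
estimate = match_prevalence(V, test_hists)
print(estimate)
```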

set_fit_request(*, data: bool | None | str = '$UNCHANGED$') → DMx

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:: data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
Returns:: self – The updated object.
Return type:: object

quapy.method.non_aggregative.DistributionMatchingX: alias of DMx

class quapy.method.non_aggregative.MaximumLikelihoodPrevalenceEstimation[source]

Bases: BaseQuantifier

The Maximum Likelihood Prevalence Estimation (MLPE) method is a lazy method that assumes there is no prior probability shift between training and test instances (put another way, that the i.i.d. assumption holds). The estimation of class prevalence values for any test sample is always (i.e., irrespective of the test sample itself) the class prevalence seen during training. This method is considered to be a lower-bound quantifier that any quantification method should beat.

fit(data: LabelledCollection)[source]

Computes the training prevalence and stores it.

Parameters:: data – the training sample
Returns:: self

quantify(instances)[source]

Ignores the input instances and returns, as the class prevalence estimates, the training prevalence.

Parameters:: instances – array-like (ignored)
Returns:: the class prevalence seen during training
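The behaviour described above is simple enough to sketch in a few lines. The class below is a stand-alone illustration with a hypothetical name (`TrainingPrevalenceBaseline`); it operates on plain label lists rather than on a LabelledCollection.

```python
from collections import Counter


class TrainingPrevalenceBaseline:
    """Always predicts the class prevalence observed at training time."""

    def fit(self, labels, classes):
        counts = Counter(labels)
        n = len(labels)
        self.prevalence_ = [counts[c] / n for c in classes]
        return self

    def quantify(self, instances):
        # The test instances are deliberately ignored.
        return list(self.prevalence_)


baseline = TrainingPrevalenceBaseline().fit([0, 0, 0, 1], classes=[0, 1])
print(baseline.quantify(["any", "test", "sample"]))  # [0.75, 0.25]
```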

set_fit_request(*, data: bool | None | str = '$UNCHANGED$') → MaximumLikelihoodPrevalenceEstimation

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:: data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
Returns:: self – The updated object.
Return type:: object

class quapy.method.non_aggregative.ReadMe(bootstrap_trials=100, bootstrap_range=100, bagging_trials=100, bagging_range=25, **vectorizer_kwargs)[source]

Bases: BaseQuantifier

fit(data: LabelledCollection)[source]

Trains a quantifier.

Parameters:: data – a quapy.data.base.LabelledCollection consisting of the training data
Returns:: self

quantify(instances)[source]

Generates class prevalence estimates for the sample’s instances.

Parameters:: instances – array-like
Returns:: np.ndarray of shape (n_classes,) with class prevalence estimates.

set_fit_request(*, data: bool | None | str = '$UNCHANGED$') → ReadMe

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:: data (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for data parameter in fit.
Returns:: self – The updated object.
Return type:: object

std_constrained_linear_ls(X, class_cond_X: dict)[source]

quapy.method.composable module

This module allows the composition of quantification methods from loss functions and feature transformations. This functionality is realized through an integration of the qunfold package: https://github.com/mirkobunse/qunfold.

class quapy.method.composable.BlobelLoss[source]

Bases: FunctionLoss

The loss function of RUN (Blobel, 1985).

This loss function models a likelihood function under the assumption of independent Poisson-distributed elements of q with Poisson rates M*p.
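A rough sketch of a Poisson likelihood of this kind is given below. It is an assumption-laden illustration, not the library's code: the function name `poisson_nll` is hypothetical, q and p are taken to be proportions, n is an assumed sample size that turns them into counts and rates, and terms that do not depend on p are dropped.

```python
import math


def poisson_nll(p, q, M, n):
    """Negative Poisson log-likelihood of counts n*q under rates n*(M @ p), up to a constant."""
    loss = 0.0
    for row, q_i in zip(M, q):
        rate = n * sum(m_ij * p_j for m_ij, p_j in zip(row, p))
        loss += rate - n * q_i * math.log(rate)
    return loss


M = [[0.9, 0.2], [0.1, 0.8]]   # illustrative transfer matrix
p_true = [0.3, 0.7]
q = [0.41, 0.59]               # equals M @ p_true
loss_at_truth = poisson_nll(p_true, q, M, n=100)
loss_elsewhere = poisson_nll([0.5, 0.5], q, M, n=100)
print(loss_at_truth < loss_elsewhere)  # True
```

The loss is smallest when the rates M*p reproduce the observed q, which is what the comparison in the last line checks.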

class quapy.method.composable.CVClassifier(estimator, n_estimators=5, random_state=None)[source]

Bases: BaseEstimator, ClassifierMixin

An ensemble of classifiers that are trained from cross-validation folds.

All objects of this type have a fixed attribute oob_score = True and, when trained, a fitted attribute self.oob_decision_function_, just like scikit-learn bagging classifiers.

Parameters:

estimator – A classifier that implements the API of scikit-learn.
n_estimators (optional) – The number of stratified cross-validation folds. Defaults to 5.
random_state (optional) – The random state for stratification. Defaults to None.

Examples

Here, we create an instance of ACC that trains a logistic regression classifier with 10 cross-validation folds.

>>> ACC(CVClassifier(LogisticRegression(), 10))

fit(X, y)[source]

predict(X)[source]

predict_proba(X)[source]

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → CVClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

class quapy.method.composable.ClassTransformer(classifier, is_probabilistic=False, fit_classifier=True)[source]

Bases: AbstractTransformer

A classification-based feature transformation.

This transformation can either be probabilistic (using the posterior predictions of a classifier) or crisp (using the class predictions of a classifier). It is used in ACC, PACC, CC, PCC, and SLD.

Parameters:

classifier – A classifier that implements the API of scikit-learn.
is_probabilistic (optional) – Whether probabilistic or crisp predictions of the classifier are used to transform the data. Defaults to False.
fit_classifier (optional) – Whether to fit the classifier when this quantifier is fitted. Defaults to True.

fit_transform(X, y, average=True, n_classes=None)[source]

This abstract method has to fit the transformer and to return the transformation of the input data.

Note

Implementations of this abstract method should check the sanity of labels by calling _check_y(y, n_classes) and they must set the property self.p_trn = class_prevalences(y, n_classes).

Parameters:

X – The feature matrix to which this transformer will be fitted.
y – The labels to which this transformer will be fitted.
average (optional) – Whether to return a transfer matrix M or a transformation (f(X), y). Defaults to True.
n_classes (optional) – The number of expected classes. Defaults to None.

Returns:

A transfer matrix M if average==True or a transformation (f(X), y) if average==False.

transform(X, average=True)[source]

This abstract method has to transform the data X.

Parameters:

X – The feature matrix that will be transformed.
average (optional) – Whether to return a vector q or a transformation f(X). Defaults to True.

Returns:

A vector q = f(X).mean(axis=0) if average==True or a transformation f(X) if average==False.
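The quantities named above (the transformation f(X), the vector q, and the transfer matrix M) can be illustrated with crisp predictions as follows. This is a sketch with hypothetical helper names. The convention that column i of M is the average of f(x) over training items of class i is an assumption chosen so that q = M*p holds; in the probabilistic mode, the rows of f(X) would be posterior vectors instead of indicator vectors.

```python
def one_hot(label, n_classes):
    return [1.0 if c == label else 0.0 for c in range(n_classes)]


def column_mean(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def crisp_transform(predicted_labels, n_classes):
    """f(X): one indicator vector per instance, built from crisp class predictions."""
    return [one_hot(label, n_classes) for label in predicted_labels]


def transfer_matrix(fX, y, n_classes):
    """M[k][i]: mean of component k of f(x) over the training instances of class i."""
    columns = [
        column_mean([f for f, label in zip(fX, y) if label == i])
        for i in range(n_classes)
    ]
    return [list(row) for row in zip(*columns)]  # transpose: columns index true classes


y_train = [0, 0, 0, 0, 1, 1]
train_predictions = [0, 0, 0, 1, 1, 0]  # e.g., held-out predictions of some classifier
M = transfer_matrix(crisp_transform(train_predictions, 2), y_train, 2)
q = column_mean(crisp_transform([0, 1, 1, 1], 2))  # average over the test sample
print(M)  # [[0.75, 0.5], [0.25, 0.5]]
print(q)  # [0.25, 0.75]
```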

class quapy.method.composable.CombinedLoss(*losses, weights=None)[source]

Bases: AbstractLoss

The weighted sum of multiple losses.

Parameters:

*losses – An arbitrary number of losses to be added together.
weights (optional) – An array of weights by which the losses are scaled.
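Conceptually, a weighted sum of losses behaves like the following sketch. Here the losses are plain Python callables of a single argument p, which is a simplification; the function name `combined_loss` and the default of unit weights are assumptions made for illustration.

```python
def combined_loss(losses, weights=None):
    """Return a function that evaluates the weighted sum of the given loss functions."""
    if weights is None:
        weights = [1.0] * len(losses)  # assumed default: all losses count equally
    if len(weights) != len(losses):
        raise ValueError("need exactly one weight per loss")

    def total(p):
        return sum(w * loss(p) for w, loss in zip(weights, losses))

    return total


def loss_a(p):
    return p[0] ** 2


def loss_b(p):
    return p[1]


total = combined_loss([loss_a, loss_b], weights=[1.0, 0.5])
print(total([3.0, 4.0]))  # 1.0 * 9.0 + 0.5 * 4.0 = 11.0
```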

quapy.method.composable.ComposableQuantifier(loss, transformer, **kwargs)[source]

A generic quantification / unfolding method that solves a linear system of equations.

This class represents any quantifier that can be described in terms of a loss function, a feature transformation, and a regularization term. In this implementation, the loss is minimized through unconstrained second-order minimization. Valid probability estimates are ensured through a soft-max trick by Bunse (2022).

Parameters:

loss – An instance of a loss class from quapy.method.composable.
transformer – An instance of a transformer class from quapy.method.composable.
solver (optional) – The method argument in scipy.optimize.minimize. Defaults to “trust-ncg”.
solver_options (optional) – The options argument in scipy.optimize.minimize. Defaults to {“gtol”: 1e-8, “maxiter”: 1000}.
seed (optional) – A random number generator seed from which a numpy RandomState is created. Defaults to None.
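The idea of ensuring valid probabilities through a soft-max can be sketched as follows: the optimizer works on an unconstrained latent vector, and the prevalence estimate is the soft-max of that vector, so it is non-negative and sums to one by construction. The sketch below minimizes a least-squares loss with plain gradient descent, which is swapped in here only to keep the example dependency-free; it is not the second-order solver mentioned above. All function names, the step size, and the iteration count are assumptions.

```python
import math


def softmax(latent):
    m = max(latent)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in latent]
    total = sum(exps)
    return [e / total for e in exps]


def solve_least_squares(q, M, n_iter=2000, lr=0.5):
    """Minimize sum((M @ softmax(l) - q)^2) over the unconstrained latent vector l."""
    n_classes = len(M[0])
    latent = [0.0] * n_classes  # the soft-max of zeros is the uniform distribution
    for _ in range(n_iter):
        p = softmax(latent)
        residual = [sum(m * pj for m, pj in zip(row, p)) - qi for row, qi in zip(M, q)]
        # Gradient of the loss with respect to p.
        g = [2 * sum(M[i][j] * residual[i] for i in range(len(M))) for j in range(n_classes)]
        # Chain rule through the soft-max Jacobian diag(p) - p p^T.
        g_dot_p = sum(gj * pj for gj, pj in zip(g, p))
        latent = [l - lr * pj * (gj - g_dot_p) for l, pj, gj in zip(latent, p, g)]
    return softmax(latent)


M = [[0.9, 0.2], [0.1, 0.8]]  # illustrative transfer matrix
q = [0.41, 0.59]              # equals M @ [0.3, 0.7]
p_hat = solve_least_squares(q, M)
print([round(v, 4) for v in p_hat])  # [0.3, 0.7]
```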

Examples

Here, we create the ordinal variant of ACC (Bunse et al., 2023). This variant consists of the original feature transformation of ACC and of the original loss of ACC, the latter of which is regularized towards smooth solutions.

>>> from quapy.method.composable import (
>>>     ComposableQuantifier,
>>>     TikhonovRegularized,
>>>     LeastSquaresLoss,
>>>     ClassTransformer,
>>> )
>>> from sklearn.ensemble import RandomForestClassifier
>>> o_acc = ComposableQuantifier(
>>>     TikhonovRegularized(LeastSquaresLoss(), 0.01),
>>>     ClassTransformer(RandomForestClassifier(oob_score=True))
>>> )

Here, we perform hyper-parameter optimization with the ordinal ACC.

>>> quapy.model_selection.GridSearchQ(
>>>     model = o_acc,
>>>     param_grid = { # try both splitting criteria
>>>         "transformer__classifier__estimator__criterion": ["gini", "entropy"],
>>>     },
>>>     # ...
>>> )

To use a classifier that does not provide the oob_score argument, such as logistic regression, you have to configure a cross validation of this classifier. Here, we employ 10 cross validation folds. 5 folds are the default.

>>> from quapy.method.composable import CVClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> acc_lr = ComposableQuantifier(
>>>     LeastSquaresLoss(),
>>>     ClassTransformer(CVClassifier(LogisticRegression(), 10))
>>> )

class quapy.method.composable.DistanceTransformer(metric='euclidean', preprocessor=None)[source]

Bases: AbstractTransformer

A distance-based feature transformation, as it is used in EDx and EDy.

Parameters:

metric (optional) – The metric with which the distance between data items is measured. Can take any value that is accepted by scipy.spatial.distance.cdist. Defaults to “euclidean”.
preprocessor (optional) – Another AbstractTransformer that is called before this transformer. Defaults to None.

fit_transform(X, y, average=True, n_classes=None)[source]

This abstract method has to fit the transformer and to return the transformation of the input data.

Note

Implementations of this abstract method should check the sanity of labels by calling _check_y(y, n_classes) and they must set the property self.p_trn = class_prevalences(y, n_classes).

Parameters:

X – The feature matrix to which this transformer will be fitted.
y – The labels to which this transformer will be fitted.
average (optional) – Whether to return a transfer matrix M or a transformation (f(X), y). Defaults to True.
n_classes (optional) – The number of expected classes. Defaults to None.

Returns:

A transfer matrix M if average==True or a transformation (f(X), y) if average==False.

transform(X, average=True)[source]

This abstract method has to transform the data X.

Parameters:

X – The feature matrix that will be transformed.
average (optional) – Whether to return a vector q or a transformation f(X). Defaults to True.

Returns:

A vector q = f(X).mean(axis=0) if average==True or a transformation f(X) if average==False.

class quapy.method.composable.EnergyKernelTransformer(preprocessor=None)[source]

Bases: AbstractTransformer

A kernel-based feature transformation, as it is used in KMM, that uses the energy kernel:

k(x_1, x_2) = ||x_1|| + ||x_2|| - ||x_1 - x_2||

Note

The methods of this transformer do not support setting average=False.

Parameters:: preprocessor (optional) – Another AbstractTransformer that is called before this transformer. Defaults to None.
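For reference, the energy kernel formula above can be evaluated directly for a pair of points. The helper names below are hypothetical, and the sketch covers only the pairwise kernel, not how the transformer aggregates it over data sets.

```python
import math


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def energy_kernel(x1, x2):
    """k(x_1, x_2) = ||x_1|| + ||x_2|| - ||x_1 - x_2||"""
    diff = [a - b for a, b in zip(x1, x2)]
    return norm(x1) + norm(x2) - norm(diff)


print(energy_kernel([3.0, 0.0], [0.0, 4.0]))  # 3 + 4 - 5 = 2.0
print(energy_kernel([3.0, 4.0], [0.0, 0.0]))  # 5 + 0 - 5 = 0.0
```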

fit_transform(X, y, average=True, n_classes=None)[source]

This abstract method has to fit the transformer and to return the transformation of the input data.

Note

Implementations of this abstract method should check the sanity of labels by calling _check_y(y, n_classes) and they must set the property self.p_trn = class_prevalences(y, n_classes).

Parameters:

X – The feature matrix to which this transformer will be fitted.
y – The labels to which this transformer will be fitted.
average (optional) – Whether to return a transfer matrix M or a transformation (f(X), y). Defaults to True.
n_classes (optional) – The number of expected classes. Defaults to None.

Returns:

A transfer matrix M if average==True or a transformation (f(X), y) if average==False.

transform(X, average=True)[source]

This abstract method has to transform the data X.

Parameters:

X – The feature matrix that will be transformed.
average (optional) – Whether to return a vector q or a transformation f(X). Defaults to True.

Returns:

A vector q = f(X).mean(axis=0) if average==True or a transformation f(X) if average==False.

class quapy.method.composable.EnergyLoss[source]

Bases: FunctionLoss

The loss function of EDx (Kawakubo et al., 2016) and EDy (Castaño et al., 2022).

This loss function represents the Energy Distance between two samples.
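As a reminder of what the Energy Distance measures, the sketch below computes the usual sample statistic 2·E|A−B| − E|A−A'| − E|B−B'| from pairwise Euclidean distances. It works on raw samples, whereas the loss class works on transformed representations, so it should be read as background rather than as the class's implementation. The function names are hypothetical.

```python
import math


def mean_pairwise_distance(A, B):
    """Average Euclidean distance over all pairs (a, b) with a in A and b in B."""
    return sum(math.dist(a, b) for a in A for b in B) / (len(A) * len(B))


def energy_distance(A, B):
    return (
        2 * mean_pairwise_distance(A, B)
        - mean_pairwise_distance(A, A)
        - mean_pairwise_distance(B, B)
    )


A = [[0.0], [1.0]]
C = [[2.0], [3.0]]
print(energy_distance(A, A))  # 0.0, identical samples
print(energy_distance(A, C))  # 2 * 2.0 - 0.5 - 0.5 = 3.0
```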

class quapy.method.composable.GaussianKernelTransformer(sigma=1, preprocessor=None)[source]

Bases: AbstractTransformer

A kernel-based feature transformation, as it is used in KMM, that uses the Gaussian kernel:

k(x, y) = exp(-||x - y||^2 / (2σ^2))

Parameters:

sigma (optional) – A smoothing parameter of the kernel function. Defaults to 1.
preprocessor (optional) – Another AbstractTransformer that is called before this transformer. Defaults to None.

fit_transform(X, y, average=True, n_classes=None)[source]

This abstract method has to fit the transformer and to return the transformation of the input data.

Note

Implementations of this abstract method should check the sanity of labels by calling _check_y(y, n_classes) and they must set the property self.p_trn = class_prevalences(y, n_classes).

Parameters:

X – The feature matrix to which this transformer will be fitted.
y – The labels to which this transformer will be fitted.
average (optional) – Whether to return a transfer matrix M or a transformation (f(X), y). Defaults to True.
n_classes (optional) – The number of expected classes. Defaults to None.

Returns:

A transfer matrix M if average==True or a transformation (f(X), y) if average==False.

transform(X, average=True)[source]

This abstract method has to transform the data X.

Parameters:

X – The feature matrix that will be transformed.
average (optional) – Whether to return a vector q or a transformation f(X). Defaults to True.

Returns:

A vector q = f(X).mean(axis=0) if average==True or a transformation f(X) if average==False.

class quapy.method.composable.GaussianRFFKernelTransformer(sigma=1, n_rff=1000, preprocessor=None, seed=None)[source]

Bases: AbstractTransformer

An efficient approximation of the GaussianKernelTransformer, as it is used in KMM, using random Fourier features.

Parameters:

sigma (optional) – A smoothing parameter of the kernel function. Defaults to 1.
n_rff (optional) – The number of random Fourier features. Defaults to 1000.
preprocessor (optional) – Another AbstractTransformer that is called before this transformer. Defaults to None.
seed (optional) – Controls the randomness of the random Fourier features. Defaults to None.
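The following sketch shows the general random Fourier feature construction for the Gaussian kernel: frequencies are drawn from a normal distribution with standard deviation 1/sigma, phases are drawn uniformly from [0, 2π], and the inner product of two feature vectors approximates the kernel value. The helper names are hypothetical and the details of the library's implementation may differ.

```python
import math
import random


def make_rff(n_features_in, n_rff=1000, sigma=1.0, seed=None):
    """Return a function mapping a point to n_rff random Fourier features."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(n_features_in)] for _ in range(n_rff)]
    b = [rng.uniform(0.0, 2 * math.pi) for _ in range(n_rff)]
    scale = math.sqrt(2.0 / n_rff)

    def features(x):
        return [
            scale * math.cos(sum(w * v for w, v in zip(row, x)) + offset)
            for row, offset in zip(W, b)
        ]

    return features


def gaussian_kernel(x, y, sigma=1.0):
    """Exact kernel, for comparison: exp(-||x - y||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2 * sigma ** 2))


phi = make_rff(n_features_in=2, n_rff=2000, sigma=1.0, seed=0)
x, y = [0.2, -0.1], [0.5, 0.4]
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = gaussian_kernel(x, y)
print(approx, exact)  # the two values should be close
```

More features give a better approximation at a higher cost; the error shrinks roughly with the inverse square root of n_rff.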

fit_transform(X, y, average=True, n_classes=None)[source]

This abstract method has to fit the transformer and to return the transformation of the input data.

Note

Implementations of this abstract method should check the sanity of labels by calling _check_y(y, n_classes) and they must set the property self.p_trn = class_prevalences(y, n_classes).

Parameters:

X – The feature matrix to which this transformer will be fitted.
y – The labels to which this transformer will be fitted.
average (optional) – Whether to return a transfer matrix M or a transformation (f(X), y). Defaults to True.
n_classes (optional) – The number of expected classes. Defaults to None.

Returns:

A transfer matrix M if average==True or a transformation (f(X), y) if average==False.

transform(X, average=True)[source]

This abstract method has to transform the data X.

Parameters:

X – The feature matrix that will be transformed.
average (optional) – Whether to return a vector q or a transformation f(X). Defaults to True.

Returns:

A vector q = f(X).mean(axis=0) if average==True or a transformation f(X) if average==False.

class quapy.method.composable.HellingerSurrogateLoss[source]

Bases: FunctionLoss

The loss function of HDx and HDy (González-Castro et al., 2013).

This loss function computes the average of the squared Hellinger distances between feature-wise (or class-wise) histograms. Note that the original HDx and HDy by González-Castro et al. (2013) do not use the squared but the regular Hellinger distance. Their approach is problematic because the regular distance is not always twice differentiable and, hence, complicates numerical optimization.
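A minimal sketch of an averaged squared Hellinger distance is shown below. The function names are hypothetical, and the normalization (a factor of 1/2 per histogram, followed by a plain mean over histograms) is an assumption; conventions differ between implementations.

```python
import math


def squared_hellinger(p, q):
    """Squared Hellinger distance between two discrete distributions."""
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2


def hellinger_surrogate(observed_hists, expected_hists):
    """Average of the squared distances over all histogram pairs."""
    pairs = list(zip(observed_hists, expected_hists))
    return sum(squared_hellinger(p, q) for p, q in pairs) / len(pairs)


observed = [[1.0, 0.0], [0.5, 0.5]]
expected = [[0.0, 1.0], [0.5, 0.5]]
print(hellinger_surrogate(observed, expected))  # (1.0 + 0.0) / 2 = 0.5
```

Skipping the outer square root avoids the non-differentiable point of the regular distance where two histograms coincide.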

class quapy.method.composable.HistogramTransformer(n_bins, preprocessor=None, unit_scale=True)[source]

Bases: AbstractTransformer

A histogram-based feature transformation, as it is used in HDx and HDy.

Parameters:

n_bins – The number of bins in each feature.
preprocessor (optional) – Another AbstractTransformer that is called before this transformer. Defaults to None.
unit_scale (optional) – Whether or not to scale each output to a sum of one. A value of False indicates that the sum of each output is the number of features. Defaults to True.
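The effect of unit_scale can be illustrated with a simplified transformation that maps each item to a concatenation of per-feature bin indicators. The helper name `histogram_features` is hypothetical, and fixed equal-width bins over an assumed range [lo, hi] stand in for whatever bin edges the transformer derives from the training data.

```python
def histogram_features(x, n_bins, lo=0.0, hi=1.0, unit_scale=True):
    """Concatenate, over all features, a one-hot indicator of the bin each value falls into."""
    out = []
    for value in x:
        k = min(int((value - lo) / (hi - lo) * n_bins), n_bins - 1)
        out.extend(1.0 if b == k else 0.0 for b in range(n_bins))
    if unit_scale:
        out = [v / len(x) for v in out]  # divide by the number of features
    return out


scaled = histogram_features([0.1, 0.9], n_bins=2)
unscaled = histogram_features([0.1, 0.9], n_bins=2, unit_scale=False)
print(scaled, sum(scaled))      # [0.5, 0.0, 0.0, 0.5] 1.0
print(unscaled, sum(unscaled))  # [1.0, 0.0, 0.0, 1.0] 2.0
```

With unit_scale=False each output sums to the number of features (two in this example); with unit_scale=True it sums to one.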

fit_transform(X, y, average=True, n_classes=None)[source]

This abstract method has to fit the transformer and to return the transformation of the input data.

Note

Implementations of this abstract method should check the sanity of labels by calling _check_y(y, n_classes) and they must set the property self.p_trn = class_prevalences(y, n_classes).

Parameters:

X – The feature matrix to which this transformer will be fitted.
y – The labels to which this transformer will be fitted.
average (optional) – Whether to return a transfer matrix M or a transformation (f(X), y). Defaults to True.
n_classes (optional) – The number of expected classes. Defaults to None.

Returns:

A transfer matrix M if average==True or a transformation (f(X), y) if average==False.

transform(X, average=True)[source]

This abstract method has to transform the data X.

Parameters:

X – The feature matrix that will be transformed.
average (optional) – Whether to return a vector q or a transformation f(X). Defaults to True.

Returns:

A vector q = f(X).mean(axis=0) if average==True or a transformation f(X) if average==False.

class quapy.method.composable.KernelTransformer(kernel)[source]

Bases: AbstractTransformer

A general kernel-based feature transformation, as it is used in KMM. If you intend to use a Gaussian kernel or energy kernel, prefer their dedicated and more efficient implementations over this class.

Note

The methods of this transformer do not support setting average=False.

Parameters:: kernel – A callable that will be used as the kernel. Must follow the signature (X[y==i], X[y==j]) -> scalar.
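A callable with the signature described above receives two groups of items and returns a single number. One way to build such a callable from a point-wise kernel is to average it over all pairs, as in the sketch below. The helper names are hypothetical, and how the transformer combines the resulting scalars internally is not shown here; the matrix at the end is only an illustration.

```python
def mean_kernel(pointwise_kernel):
    """Lift a kernel on pairs of points to a kernel on pairs of groups (mean over all pairs)."""
    def group_kernel(A, B):
        return sum(pointwise_kernel(a, b) for a in A for b in B) / (len(A) * len(B))
    return group_kernel


def linear(a, b):
    return sum(u * v for u, v in zip(a, b))


def class_kernel_matrix(X, y, n_classes, kernel):
    """Evaluate kernel(X[y==i], X[y==j]) for every pair of classes."""
    groups = [[x for x, label in zip(X, y) if label == i] for i in range(n_classes)]
    return [[kernel(groups[i], groups[j]) for j in range(n_classes)] for i in range(n_classes)]


X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
y = [0, 0, 1]
K = class_kernel_matrix(X, y, 2, mean_kernel(linear))
print(K)  # [[1.0, 0.0], [0.0, 1.0]]
```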

fit_transform(X, y, average=True, n_classes=None)[source]

This abstract method has to fit the transformer and to return the transformation of the input data.

Note

Implementations of this abstract method should check the sanity of labels by calling _check_y(y, n_classes) and they must set the property self.p_trn = class_prevalences(y, n_classes).

Parameters:

X – The feature matrix to which this transformer will be fitted.
y – The labels to which this transformer will be fitted.
average (optional) – Whether to return a transfer matrix M or a transformation (f(X), y). Defaults to True.
n_classes (optional) – The number of expected classes. Defaults to None.

Returns:

A transfer matrix M if average==True or a transformation (f(X), y) if average==False.

transform(X, average=True)[source]

This abstract method has to transform the data X.

Parameters:

X – The feature matrix that will be transformed.
average (optional) – Whether to return a vector q or a transformation f(X). Defaults to True.

Returns:

A vector q = f(X).mean(axis=0) if average==True or a transformation f(X) if average==False.

class quapy.method.composable.LaplacianKernelTransformer(sigma=1)[source]

Bases: KernelTransformer

A kernel-based feature transformation, as it is used in KMM, that uses the Laplacian kernel.

Parameters:: sigma (optional) – A smoothing parameter of the kernel function. Defaults to 1.

property kernel

class quapy.method.composable.LeastSquaresLoss[source]

Bases: FunctionLoss

The loss function of ACC (Forman, 2008), PACC (Bella et al., 2019), and ReadMe (Hopkins & King, 2010).

This loss function computes the sum of squares of element-wise errors between q and M*p.
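The sum of squared errors between q and M*p can be written out as below. For a square, invertible M the loss reaches zero at the solution of the linear system q = M*p, which the sketch verifies for a 2x2 case using Cramer's rule. Function names and the example numbers are illustrative assumptions.

```python
def least_squares_loss(p, q, M):
    """Sum of squared element-wise errors between q and M @ p."""
    return sum(
        (q_i - sum(m_ij * p_j for m_ij, p_j in zip(row, p))) ** 2
        for row, q_i in zip(M, q)
    )


def solve_2x2(M, q):
    """Solve M @ p = q for a 2x2 matrix by Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(d * q[0] - b * q[1]) / det, (a * q[1] - c * q[0]) / det]


M = [[0.9, 0.2], [0.1, 0.8]]
q = [0.41, 0.59]
p_exact = solve_2x2(M, q)
print([round(v, 6) for v in p_exact])      # [0.3, 0.7]
print(least_squares_loss([0.5, 0.5], q, M))  # about 0.0392
```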

class quapy.method.composable.TikhonovRegularization[source]

Bases: AbstractLoss

Tikhonov regularization, as proposed by Blobel (1985).

This regularization promotes smooth solutions. This behavior is often required in ordinal quantification and in unfolding problems.
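A common way to express such a smoothness penalty is through squared second differences of neighbouring entries, sketched below. The exact discretization and scaling used by the class are not stated in this documentation, so the function name `tikhonov_penalty` and the formula are assumptions.

```python
def tikhonov_penalty(p):
    """Half the sum of squared second differences; zero for linearly changing sequences."""
    return 0.5 * sum(
        (p[i - 1] - 2 * p[i] + p[i + 1]) ** 2 for i in range(1, len(p) - 1)
    )


smooth = tikhonov_penalty([0.1, 0.2, 0.3, 0.4])
jagged = tikhonov_penalty([0.4, 0.1, 0.4, 0.1])
print(smooth < jagged)  # True: the smooth solution is penalized less
```

In ordinal problems, where neighbouring classes are related, a penalty of this kind discourages estimates that oscillate between adjacent classes.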

quapy.method.composable.TikhonovRegularized(loss, tau=0.0)[source]

Add TikhonovRegularization (Blobel, 1985) to any loss.

Calling this function is equivalent to calling

>>> CombinedLoss(loss, TikhonovRegularization(), weights=[1, tau])

Parameters:

loss – An instance from qunfold.losses.
tau (optional) – The regularization strength. Defaults to 0.

Returns:

An instance of CombinedLoss.

Examples

The regularized loss of RUN (Blobel, 1985) is:

>>> TikhonovRegularized(BlobelLoss(), tau)
