Q-means API

Module for the quantum k-means algorithm, with a class that exposes scikit-learn style methods resembling those of the k-means algorithm.

This module contains the QuantumKMeans class for clustering according to Euclidean distances calculated by running quantum circuits.

Typical usage example:

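import numpy as np
import pandas as pd
from qiskit_aer import AerSimulator  # explicit import; assumes the qiskit-aer package is installed
from qmeans.qmeans import *

# Simulator backend used to run the distance circuits.
backend = AerSimulator()
# Six two-dimensional points forming two well-separated groups.
X = pd.DataFrame(np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]))
q_means = QuantumKMeans(backend, n_clusters=2, verbose=True)
q_means.fit(X)
# Cluster index assigned to each row of X.
print(q_means.labels_)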
class qmeans.QuantumKMeans(backend: Backend = AerSimulator('aer_simulator'), n_clusters: int = 2, init: str = 'random', tol: float = 0.0001, max_iter: int = 300, verbose: bool = False, map_type: str = 'probability', shots: int = 1024, norm_relevance: bool = False, initial_center: str = 'random', noise_model: NoiseModel = None)

Quantum k-means clustering algorithm. This k-means alternative implements quantum machine learning to calculate distances between data points and centroids using quantum circuits.

Parameters:
  • n_clusters – The number of clusters to form, which is also the number of centroids generated.

  • init – {‘q-means++’, ‘random’}, callable or array-like of shape (n_clusters, n_features). Method for initialization. ‘q-means++’: selects initial cluster centers for q-means clustering in a smart way to speed up convergence. ‘random’: chooses n_clusters observations (rows) at random from the data as the initial centroids. If an array is passed, it should be of shape (n_clusters, n_features) and gives the initial centers. If a callable is passed, it should take arguments X, n_clusters and a random state and return an initialization.

  • tol – Relative tolerance with regard to the Frobenius norm of the difference between the cluster centers of two consecutive iterations, used to declare convergence.

  • verbose – Enables verbose output for deeper insight into the class processes.

  • max_iter – Maximum number of iterations of the quantum k-means algorithm for a single run.

  • backend – IBM quantum device to run the quantum k-means algorithm on.

  • map_type – {‘angle’, ‘probability’} Specifies the type of data encoding. ‘angle’: uses U3 gates whose theta angle is the phase angle of the complex data point. ‘probability’: normalizes the data during preprocessing so that each data point has a norm of 1.

  • shots – Number of repetitions of each circuit, for sampling.

  • norm_relevance – If true, maps two-dimensional data onto 2 angles, one for the angle between both data points and another for the magnitude of the data points.

  • initial_center – {‘random’, ‘far’} Specifies the strategy for setting the initial cluster center. ‘random’: assigns a random initial center. ‘far’: uses the furthest point as the initial center.

  • noise_model – Noise model to use when running circuits on a simulator.
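
The tol parameter above compares the cluster centers of two consecutive iterations through the Frobenius norm of their difference. The following is a minimal sketch of such a stopping check in plain Python. The helper names frobenius_shift and has_converged are hypothetical, and the exact rule used by QuantumKMeans (for example, whether the norm is squared or tol is rescaled by the data) is an assumption not confirmed by this page.

```python
import math

def frobenius_shift(old_centers, new_centers):
    """Frobenius norm of the element-wise difference between two sets of centers."""
    return math.sqrt(sum(
        (a - b) ** 2
        for old_row, new_row in zip(old_centers, new_centers)
        for a, b in zip(old_row, new_row)
    ))

def has_converged(old_centers, new_centers, tol=1e-4):
    # Hypothetical stopping rule: stop once the centers move by at most tol.
    return frobenius_shift(old_centers, new_centers) <= tol

old = [[1.0, 2.0], [10.0, 2.0]]
new = [[1.0, 2.0], [10.0, 2.00005]]
print(has_converged(old, new))
```

With a rule of this kind, max_iter acts as a safety cap for runs in which the centers never settle within tol.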

cluster_centers_

Coordinates of cluster centers.

labels_

Centroid labels for each data point.

n_iter_

Number of iterations run before convergence.

fit(X: ndarray, y: ndarray = None, batch: bool = False)

Computes quantum k-means clustering.

Parameters:
  • X – Training instances to cluster.

  • batch – Option for using batches to calculate distances.

Returns:

Fitted estimator.

Return type:

self

get_params(deep: bool = True)

Get parameters for this estimator.

Parameters:

deep – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

Parameter names mapped to their values.

Return type:

params

predict(X: ndarray, sample_weight: ndarray = None, batch: bool = False)

Predict the closest cluster each sample in X belongs to.

Parameters:
  • X – New data points to predict.

  • sample_weight – The weights for each observation in X. If None, all observations are assigned equal weight.

  • batch – Option for using batches to calculate distances.

Returns:

Centroid labels for each data point.

Return type:

labels
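
Predicting the closest cluster amounts to picking, for every sample, the centroid with the smallest distance. The sketch below shows that step on a precomputed distance matrix in plain Python. The helper assign_labels is hypothetical, and the distances here are made-up numbers standing in for the values the quantum circuits would return.

```python
def assign_labels(distances):
    """Return, for each row, the index of the column holding the smallest distance."""
    return [min(range(len(row)), key=row.__getitem__) for row in distances]

# Rows are samples, columns are centroids (illustrative values only).
distances = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.3]]
labels = assign_labels(distances)
print(labels)
```

Ties are resolved in favor of the lower centroid index in this sketch; the library may break ties differently.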

set_fit_request(*, batch: bool | None | str = '$UNCHANGED$') → QuantumKMeans

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

batch (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for batch parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_params(**params)

Set the parameters of this estimator.

Parameters:

**params – Estimator parameters.

Returns:

Estimator instance.

Return type:

self

set_predict_request(*, batch: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$') → QuantumKMeans

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • batch (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for batch parameter in predict.

  • sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in predict.

Returns:

self – The updated object.

Return type:

object

qmeans.batch_collect(batch_d: ndarray, desired_shape: Tuple[int, int])

Collects batches of distances.

Retrieves batches of distances and transforms the shape of the data to a desired shape.

Parameters:
  • batch_d – Batches of distances.

  • desired_shape – The shape of the collected distances.

Returns:

Transformed distances.

Return type:

final_batch_d
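
As a rough illustration of collecting batched distances into a desired shape, the sketch below flattens a list of batches and regroups the values row by row. The helper collect is hypothetical, and the row-major ordering (all centroids of the first sample, then the second, and so on) is an assumption about how the batches are laid out.

```python
def collect(batches, desired_shape):
    """Flatten batches of distances and regroup them into rows of the desired shape."""
    rows, cols = desired_shape
    flat = [d for batch in batches for d in batch]
    if len(flat) != rows * cols:
        raise ValueError("number of distances does not match the desired shape")
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

# Two batches of three distances each, regrouped as 3 samples x 2 centroids.
collected = collect([[1, 2, 3], [4, 5, 6]], (3, 2))
print(collected)
```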

qmeans.batch_distance(B: Tuple[ndarray, ndarray], backend: Backend, norm_B: ndarray, map_type: str = 'angle', shots: int = 1024)

Finds the distance between pairs of data points and cluster centers inside a batch by mapping the data points onto qubits using amplitude or angle encoding and then using a swap test.

The algorithm performs angle encoding if the type is ‘angle’ and amplitude encoding if the type is ‘probability’.

Parameters:
  • B – The batch of X data points and y cluster centers.

  • backend – IBM quantum device to calculate the distance with.

  • map_type – {‘angle’, ‘probability’} Specifies the type of data encoding. ‘angle’: uses U3 gates whose theta angle is the phase angle of the complex data point. ‘probability’: normalizes the data during preprocessing so that each data point has a norm of 1.

  • shots – Number of repetitions of each circuit, for sampling.

Returns:

Distance between the data points and cluster centers of the batch.

Return type:

distance

qmeans.batch_distances(X: ndarray, cluster_centers: ndarray, backend: Backend, map_type: str, shots: int, verbose: bool, norms: ndarray, cluster_norms: ndarray)

Batches data and calculates and collects distances.

The data is separated into batches and sent to the quantum device to calculate distances, which are then collected from the results.

Parameters:
  • X – Training instances to cluster.

  • cluster_centers – Coordinates of cluster centers.

  • backend – IBM quantum device to run the quantum k-means algorithm on.

  • map_type – {‘angle’, ‘probability’} Specifies the type of data encoding. ‘angle’: uses U3 gates whose theta angle is the phase angle of the complex data point. ‘probability’: normalizes the data during preprocessing so that each data point has a norm of 1.

  • shots – Number of repetitions of each circuit, for sampling.

  • verbose – Enables verbose output for deeper insight into the class processes.

Returns:

Distance between the data points and cluster centers.

Return type:

distance

qmeans.batch_separate(X: ndarray, clusters: ndarray, max_experiments: int, norms: ndarray, cluster_norms: ndarray)

Creates batches of pairs of vectors.

Separates data points X and cluster centers into a number of batches of elements for distance calculations in a single job. Each batch contains a set of data points and cluster centers, corresponding to the data for distance measurements in each batch.

Parameters:
  • X – Training instances to cluster.

  • clusters – Cluster centers.

  • max_experiments – The number of distance measurements in each batch.

Returns:

Batches with pairs of data points and cluster centers.

Return type:

B
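
One way to picture the batching described above is to enumerate every (data point, cluster center) pair and cut the resulting list into chunks of at most max_experiments pairs. The sketch below does that in plain Python. The helper separate is hypothetical, and the pair ordering and batch layout are assumptions; the norms and cluster_norms arguments are left out for brevity.

```python
from itertools import product

def separate(points, centers, max_experiments):
    """Split all (point, center) pairs into batches of at most max_experiments pairs."""
    pairs = list(product(points, centers))
    return [pairs[i:i + max_experiments]
            for i in range(0, len(pairs), max_experiments)]

points = [[1, 2], [1, 4], [10, 2]]
centers = [[1, 3], [10, 3]]
batches = separate(points, centers, max_experiments=4)
print([len(batch) for batch in batches])
```

Each batch would then correspond to one job holding up to max_experiments distance circuits.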

qmeans.distance(x: ndarray, y: ndarray, backend: Backend, map_type: str = 'probability', shots: int = 1024, norms: ndarray = array([1, 1]), norm_relevance: bool = False, noise_model: NoiseModel = None)

Finds the distance between two data points by mapping the data points onto qubits using amplitude or angle encoding and then using a swap test.

The algorithm performs angle encoding if the type is ‘angle’ and amplitude encoding if the type is ‘probability’.

Parameters:
  • x – The first data point.

  • y – The second data point.

  • backend – IBM quantum device to calculate the distance with.

  • map_type – {‘angle’, ‘probability’} Specifies the type of data encoding. ‘angle’: uses U3 gates whose theta angle is the phase angle of the complex data point. ‘probability’: normalizes the data during preprocessing so that each data point has a norm of 1.

  • shots – Number of repetitions of each circuit, for sampling.

  • norm_relevance – If true, maps two-dimensional data onto 2 angles, one for the angle between both data points and another for the magnitude of the data points.

  • noise_model – Noise model to use when running circuits on a simulator.

Returns:

Distance between the two data points.

Return type:

distance
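
In a swap test, the ancilla qubit is measured in state 0 with probability 1/2 + 1/2·|⟨x|y⟩|², so the squared overlap of the two encoded states can be estimated from the measurement counts. The sketch below reproduces that arithmetic classically. All helper names are hypothetical, and the final conversion to a distance (valid for unit-norm real vectors with a non-negative inner product) is one common choice, not necessarily the formula this function uses.

```python
import math

def swap_test_p0(x, y):
    """Ideal probability of measuring the ancilla as 0 for two real unit vectors."""
    overlap = sum(a * b for a, b in zip(x, y))
    return 0.5 + 0.5 * overlap ** 2

def overlap_sq_from_counts(counts):
    """Estimate |<x|y>|^2 from ancilla counts such as {'0': 768, '1': 256}."""
    shots = sum(counts.values())
    p0 = counts.get("0", 0) / shots
    # Sampling noise can push the estimate slightly below zero, so clip it.
    return max(0.0, 2.0 * p0 - 1.0)

def distance_from_overlap_sq(overlap_sq):
    # Assumes unit vectors with non-negative overlap: d^2 = 2 - 2<x|y>.
    return math.sqrt(max(0.0, 2.0 - 2.0 * math.sqrt(overlap_sq)))

estimate = overlap_sq_from_counts({"0": 768, "1": 256})
print(estimate, distance_from_overlap_sq(estimate))
```

Because the estimate comes from sampling, a larger shots value reduces the statistical spread of the resulting distance.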

qmeans.preprocess(points: ndarray, map_type: str = 'angle', norm_relevance: bool = False)

Preprocesses data points according to the chosen encoding type.

The algorithm scales the data points if the type is ‘angle’ and normalizes the data points if the type is ‘probability’.

Parameters:
  • points – The input data points.

  • map_type – {‘angle’, ‘probability’} Specifies the type of data encoding. ‘angle’: uses U3 gates whose theta angle is the phase angle of the complex data point. ‘probability’: normalizes the data during preprocessing so that each data point has a norm of 1.

  • norm_relevance – If true, maps two-dimensional data onto 2 angles, one for the angle between both data points and another for the magnitude of the data points.

Returns:

Preprocessed points.

Return type:

p_points
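
For the ‘probability’ case, normalizing means dividing each data point by its Euclidean norm so that the result has a norm of 1. The sketch below shows only that step in plain Python. The helper normalize_rows is hypothetical, returning the original norms alongside the unit vectors is an assumption, and the scaling applied for ‘angle’ is not reproduced here because this page does not specify it.

```python
import math

def normalize_rows(points):
    """Scale every point to unit Euclidean norm and also return the original norms."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in points]
    # Leave all-zero points untouched to avoid dividing by zero.
    unit = [[v / n if n else 0.0 for v in row] for row, n in zip(points, norms)]
    return unit, norms

unit, norms = normalize_rows([[3.0, 4.0], [0.0, 0.0]])
print(unit, norms)
```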

qmeans.qmeans_plusplus(X: ndarray, n_clusters: int, backend: Backend, map_type: str, verbose: bool, initial_center: str, shots: int = 1024, norms: ndarray = array([1, 1]), batch: bool = True, x_squared_norms: ndarray = None, n_local_trials: int = None, random_state: int = None, noise_model: NoiseModel = None)

Initializes n_clusters seeds according to q-means++.

Selects initial cluster centers for q-means clustering in a smart way to speed up convergence.

Parameters:
  • X – The data to pick seeds from.

  • n_clusters – The number of centroids to initialize.

  • backend – IBM quantum device to run the quantum k-means algorithm on.

  • map_type – {‘angle’, ‘probability’} Specifies the type of data encoding. ‘angle’: uses U3 gates whose theta angle is the phase angle of the complex data point. ‘probability’: normalizes the data during preprocessing so that each data point has a norm of 1.

  • verbose – Enables verbose output for deeper insight into the class processes.

  • initial_center – {‘random’, ‘far’} Specifies the strategy for setting the initial cluster center. ‘random’: assigns a random initial center. ‘far’: uses the furthest point as the initial center.

  • x_squared_norms – Squared Euclidean norm of each data point.

  • n_local_trials – The number of seeding trials for each center (except the first), of which the one reducing inertia the most is greedily chosen. Set to None to make the number of trials depend logarithmically on the number of seeds (2+log(k)).

  • random_state – Determines random number generation for centroid initialization. Pass an int for reproducible output across multiple function calls.

  • noise_model – Noise model to use when running circuits on a simulator.

Returns:

The initial centers for q-means.

indices: The index location of the chosen centers in the data array X. For a given index and center, X[index] = center.

Return type:

centers
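
The seeding idea can be sketched classically: pick a first center, then repeatedly draw the next center with probability proportional to each point's squared distance to its nearest already chosen center. The code below is a simplified illustration in plain Python and not the library's implementation. The helper plusplus_seed is hypothetical, a classical squared Euclidean distance stands in for the quantum distance, the first center is drawn at random, and the greedy n_local_trials step is omitted.

```python
import random

def sq_dist(a, b):
    """Classical squared Euclidean distance (stand-in for the quantum estimate)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def plusplus_seed(X, n_clusters, seed=None):
    rng = random.Random(seed)
    # First center: a uniformly random data point.
    indices = [rng.randrange(len(X))]
    # Squared distance from every point to its nearest chosen center.
    closest = [sq_dist(x, X[indices[0]]) for x in X]
    while len(indices) < n_clusters:
        # Far-away points are proportionally more likely to be drawn.
        idx = rng.choices(range(len(X)), weights=closest, k=1)[0]
        indices.append(idx)
        closest = [min(c, sq_dist(x, X[idx])) for c, x in zip(closest, X)]
    return [X[i] for i in indices], indices

X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
centers, indices = plusplus_seed(X, n_clusters=2, seed=0)
print(centers, indices)
```

As in the return description above, each returned index satisfies X[index] = center. Already chosen points carry zero weight, so they are not drawn again as long as the data contains enough distinct points.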