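Consider a graph-based clustering algorithm such as spectral clustering or Minimum Spanning Tree (MST) clustering. Treat each item as a node and each pair as an edge. Once you choose how to measure closeness between items (a similarity for spectral clustering, a distance for MST clustering), the algorithm groups the paired items into clusters.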
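As a minimal sketch of the MST-based option, assume each pair also carries a distance. The pairs in this problem have no weights, so the distances below are hypothetical, and `mst_clusters` is an illustrative helper name. The sketch builds the graph with SciPy, computes its minimum spanning tree, cuts the `n_clusters - 1` heaviest tree edges, and treats the remaining connected components as clusters.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def mst_clusters(weighted_pairs, n_clusters):
    """Cluster items linked by (a, b, distance) pairs by cutting the MST.

    Assumes positive distances and at most one entry per pair (duplicate
    entries would be summed by csr_matrix). If the graph is already
    disconnected, the result can contain more than n_clusters clusters.
    """
    a = np.array([p[0] for p in weighted_pairs])
    b = np.array([p[1] for p in weighted_pairs])
    w = np.array([p[2] for p in weighted_pairs], dtype=float)

    # Map item IDs to contiguous node indices 0..n-1
    nodes = np.unique(np.concatenate([a, b]))
    ia, ib = np.searchsorted(nodes, a), np.searchsorted(nodes, b)
    n = len(nodes)

    # Edge weights are the distances; the MST treats the graph as undirected
    graph = csr_matrix((w, (ia, ib)), shape=(n, n))
    mst = minimum_spanning_tree(graph).tocoo()

    # Keep every MST edge except the n_clusters - 1 heaviest ones
    n_keep = max(len(mst.data) - (n_clusters - 1), 0)
    keep = np.argsort(mst.data)[:n_keep]
    pruned = csr_matrix((mst.data[keep], (mst.row[keep], mst.col[keep])),
                        shape=(n, n))

    # Each remaining connected component is one cluster
    _, labels = connected_components(pruned, directed=False)
    return nodes, labels

# Hypothetical distances: (3, 5) is the long edge that links the two groups
weighted_pairs = [(1, 2, 0.5), (2, 3, 0.4), (1, 4, 0.6), (5, 6, 0.3), (3, 5, 2.0)]
nodes, labels = mst_clusters(weighted_pairs, n_clusters=2)
for c in np.unique(labels):
    print("Cluster", c, ":", nodes[labels == c])
```

Cutting the heaviest MST edges is the single-linkage view of clustering: items end up together whenever a chain of short links connects them.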
For example, the following Python code applies spectral clustering to the problem above:
from sklearn.cluster import SpectralClustering
import numpy as np

# Generate some sample data: each row is a pair of linked item IDs
pairs = np.array([[1, 2], [2, 3], [1, 4], [5, 6]])

# Build a symmetric affinity matrix: entry (i, j) is 1 if items i and j
# appear together in a pair, 0 otherwise (a binary co-occurrence affinity,
# not a cosine similarity)
def compute_similarities(pairs):
    # Map item IDs to contiguous indices so that unused IDs (such as 0)
    # do not become isolated nodes in the graph
    nodes = np.unique(pairs)
    index = {node: i for i, node in enumerate(nodes)}
    similarities = np.zeros((len(nodes), len(nodes)))
    for a, b in pairs:
        similarities[index[a], index[b]] = 1
        similarities[index[b], index[a]] = 1
    return nodes, similarities

nodes, similarities = compute_similarities(pairs)

# Compute the spectral clustering
n_clusters = 2  # number of clusters to create
model = SpectralClustering(n_clusters=n_clusters, affinity='precomputed',
                           assign_labels='kmeans', random_state=0)
labels = model.fit_predict(similarities)

# Print the item IDs that belong to each cluster
for i in range(n_clusters):
    print("Cluster", i, ":", nodes[labels == i])
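This code builds a binary affinity matrix from the pairs (two items have affinity 1 if they appear together in a pair and 0 otherwise) and passes it to spectral clustering as a precomputed affinity (`affinity='precomputed'`). Spectral clustering then splits the items into two clusters, and the code prints the members of each cluster.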
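If cosine similarity is the measure you actually want, one option, sketched here under assumptions of our own, is to compare each item's neighbor vector (its row in the adjacency matrix) and pass the resulting cosine matrix to `SpectralClustering` as a precomputed affinity. A self-loop is added to every item so that two directly linked items share at least themselves as common neighbors; without it, items 1 and 2 would have cosine similarity 0. Both the self-loops and the row-vector representation are illustrative choices, not part of the original example.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

pairs = np.array([[1, 2], [2, 3], [1, 4], [5, 6]])

# Adjacency matrix over the distinct item IDs, with self-loops on the diagonal
nodes = np.unique(pairs)
idx = np.searchsorted(nodes, pairs)
adjacency = np.eye(len(nodes))
adjacency[idx[:, 0], idx[:, 1]] = 1
adjacency[idx[:, 1], idx[:, 0]] = 1

# Cosine similarity between the items' neighbor vectors (rows of adjacency)
affinity = cosine_similarity(adjacency)

model = SpectralClustering(n_clusters=2, affinity='precomputed',
                           assign_labels='kmeans', random_state=0)
labels = model.fit_predict(affinity)
for i in range(2):
    print("Cluster", i, ":", nodes[labels == i])
```

With this representation, items 1 and 2 have cosine similarity 2/3 (they share neighbors 1 and 2 out of three each), while items in different groups have similarity 0.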