[1] LIN Li, XUE Fang. A Weighted K-means Clustering Algorithm Based on Logistic Regression Functions[J]. Journal of Jimei University (Natural Science), 2021, 26(2): 139-145.
A Weighted K-means Clustering Algorithm Based on Logistic Regression Functions
Journal of Jimei University (Natural Science) [ISSN: 1007-7405 / CN: 35-1186/N]
- Volume: 26
- Issue: No. 2, 2021
- Pages: 139-145
- Section: Mathematical Sciences and Information Engineering
- Publication date: 2021-03-28
Article Info
- Title: A Weighted K-means Clustering Algorithm Based on Logistic Regression Functions
- Author(s): LIN Li1; XUE Fang2
  (1. College of Computer Engineering, Jimei University, Xiamen 361021, China; 2. Informatization Center, Jimei University, Xiamen 361021, China)
- Keywords: Euclidean distance; feature-weighted K-means algorithm; logistic regression function; initial clustering center
- Abstract: Traditional K-means clustering algorithms measure sample similarity by Euclidean distance, treating all attributes of the data equally and ignoring the different contribution of each attribute, which lowers the accuracy of the similarity calculation. To address this deficiency, a feature-weighted K-means algorithm is proposed. First, Softmax and Sigmoid logistic regression functions are used to calculate feature weights, so that the weighted Euclidean distance represents sample similarity more accurately. Second, the strategy for selecting initial clustering centers is optimized: K samples that are far apart from one another are selected as the initial centers, which effectively avoids incorrect clustering of samples and empty clusters. Experimental results on UCI standard datasets show that the weighted K-means clustering algorithm effectively reduces the number of iterations and improves clustering accuracy, precision, and recall compared with traditional K-means clustering.
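The two ideas in the abstract (a feature-weighted Euclidean distance and initial centers chosen from mutually distant samples) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the feature scores are obtained or how the Softmax and Sigmoid functions are combined, so the per-feature scores passed to `softmax` are assumed inputs, and `farthest_first_centers` is an assumed greedy strategy standing in for the paper's selection rule. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: outputs are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_distance(a: Point, b: Point, weights: Sequence[float]) -> float:
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def farthest_first_centers(points: Sequence[Point], k: int,
                           weights: Sequence[float]) -> List[List[float]]:
    """Assumed strategy: greedily pick k samples that are far apart."""
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    # Start from the sample farthest from the overall mean.
    chosen = [max(range(len(points)),
                  key=lambda i: weighted_distance(points[i], mean, weights))]
    while len(chosen) < k:
        # Next center: the sample whose nearest chosen center is farthest away.
        nxt = max((i for i in range(len(points)) if i not in chosen),
                  key=lambda i: min(weighted_distance(points[i], points[c], weights)
                                    for c in chosen))
        chosen.append(nxt)
    return [list(points[i]) for i in chosen]


def weighted_kmeans(points: Sequence[Point], k: int, weights: Sequence[float],
                    max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style K-means that uses the weighted distance for assignment."""
    centers = farthest_first_centers(points, k, weights)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k),
                          key=lambda c: weighted_distance(p, centers[c], weights))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy data: feature 0 separates the groups, feature 1 is mostly noise.
# The scores [4.0, 0.0] are made up for illustration only.
weights = softmax([4.0, 0.0])
data = [(0.0, 5.0), (0.2, -5.0), (10.0, 4.0), (10.2, -4.0)]
labels, centers = weighted_kmeans(data, 2, weights)
print(weights, labels, centers)
```

Because the weights sum to 1, a feature with a high score dominates the distance, so samples that differ mainly in a low-weight feature are still grouped together.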
Last Update: 2021-05-17