
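PySpark: AttributeError: 'PipelineModel' object has no attribute 'clusterCenters'

I built a k-means model with PySpark and now want to extract the cluster centers. How do I do that when the model is part of a pipeline? This is the code I have so far, but it raises the error "AttributeError: 'PipelineModel' object has no attribute 'clusterCenters'". How can I fix it?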


#### model K-Means ###

from pyspark.ml import Pipeline
from pyspark.ml.clustering import KMeans, KMeansModel

kmeans = KMeans() \
          .setK(3) \
          .setFeaturesCol("scaledFeatures") \
          .setPredictionCol("cluster")

# Wrap the KMeans estimator in a Pipeline (it is the only stage)
pipeline = Pipeline(stages=[kmeans])

model = pipeline.fit(matrix_normalized)

cluster = model.transform(matrix_normalized)

# get cluster centers (this is the line that raises the AttributeError)
centers = model.clusterCenters()


湖上湖

aluckdog

Dummy data:

from pyspark.ml.linalg import Vectors
from pyspark.ml.clustering import KMeans, KMeansModel
from pyspark.ml.pipeline import Pipeline

data = [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),
        (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)]
matrix_normalized = spark.createDataFrame(data, ["scaledFeatures"])

Your code:

kmeans = KMeans() \
          .setK(3) \
          .setFeaturesCol("scaledFeatures") \
          .setPredictionCol("cluster")

# Wrap the KMeans estimator in a Pipeline (it is the only stage)
pipeline = Pipeline(stages=[kmeans])
model = pipeline.fit(matrix_normalized)
cluster = model.transform(matrix_normalized)

pipeline.fit() returns a PipelineModel, not a KMeansModel. clusterCenters() is a method of the fitted KMeansModel, which is stored in model.stages. So just change the last line:

model.stages[0].clusterCenters()

Output:

[array([0.5, 0.5]), array([8., 9.]), array([9., 8.])]
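Indexing with stages[0] works here because KMeans is the only stage. If the pipeline later gains earlier stages (a scaler, for example), the position changes. One possible way around that is to look the stage up by type instead. The helper below is only a sketch: first_stage_of_type is a hypothetical name, not part of PySpark, and it assumes nothing beyond the object exposing a stages list, as PipelineModel does.

```python
def first_stage_of_type(pipeline_model, stage_type):
    """Return the first fitted stage that is an instance of stage_type.

    Hypothetical helper (not a PySpark API). It only relies on the
    `stages` list that a fitted PipelineModel exposes.
    """
    for stage in pipeline_model.stages:
        if isinstance(stage, stage_type):
            return stage
    raise ValueError(f"no stage of type {stage_type.__name__} in pipeline")


# Usage with the objects from the answer above (needs a Spark session):
#   from pyspark.ml.clustering import KMeansModel
#   centers = first_stage_of_type(model, KMeansModel).clusterCenters()
```

With the single-stage pipeline from the question this returns the same object as model.stages[0]; the difference only matters once more stages are added.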