继续浏览精彩内容
慕课网APP
程序员的梦工厂
打开
继续
感谢您的支持,我会继续努力的
赞赏金额会直接到老师账户
将二维码发送给自己后长按识别
微信支付
支付宝支付

Cluster Analysis with Iris Dataset

乌然娅措
关注TA
已关注
手记 64
粉丝 21
获赞 12

Data Science Day 19:

In Supervised Learning, we specify the possible categorical values and train the models for pattern recognition.  However, *what if we don’t have the existing classified data model to learn from? *

[caption id=“attachment_1074” align=“alignnone” width=“750”]image

Radfotosonn / Pixabay[/caption]

The case we model the data in order to discover the way it clusters, based on certain attributes is Unsupervised Learning.

Clustering Analysis in one of the Unsupervised Techniques, it rather than learning by example, learn by observation.

There are 3 types of clustering methods in general, Partitioning, Hierarchical, and Density-based clustering.

1.Partitioning: n objects is grouped into k ≤ n disjoint clusters.
   Partitioning methods are based on a distance measure, it applies iterative relocation until some distance-based error metric is minimized.

2.Hierarchical: either combining(agglomerative) or splitting(divisive) cluster based on some measure (distance, density or continuity), in a stepwise fashion.

Agglomerative starts with each point in its own cluster and combine them in steps, and divisive starts with the data in one cluster and divide it up

3. The density-based method is based on its density; it measures the cluster “goodness”.

Example with Iris Dataset

  1. Partitioning: K-Means=3
    image
#Iris dataset
iris=datasets.load_iris()
x=iris.data
y=iris.target

#Plotting
fig = plt.figure(1, figsize=(7,7))
ax = Axes3D(fig, rect=[0, 0, 0.95, 1], elev=48, azim=134)
ax.scatter(x[:, 3], x[:, 0], x[:, 2],
          c=labels.astype(np.float), edgecolor="k", s=50)
ax.set_xlabel("Petal width")
ax.set_ylabel("Sepal length")
ax.set_zlabel("Petal length")
plt.title("Iris Clustering K Means=3", fontsize=14)
plt.show()

2.   **Hierarchical **

image

#Hierachy Clustering 
hier=linkage(x,"ward")
max_d=7.08
plt.figure(figsize=(25,10))
plt.title('Iris Hierarchical Clustering Dendrogram')
plt.xlabel('Species')
plt.ylabel('distance')
dendrogram(
    hier,
    truncate_mode='lastp',  
    p=50,                  
    leaf_rotation=90.,      
    leaf_font_size=8.,     
)
plt.axhline(y=max_d, c='k')
plt.show()

3. Density-based method DBSCAN

image

dbscan=DBSCAN()
dbscan.fit(x)
pca=PCA(n_components=2).fit(x)
pca_2d=pca.transform(x)

for i in range(0, pca_2d.shape[0]):
    if dbscan.labels_[i] == 0:
        c1 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='r', marker='+')
    elif dbscan.labels_[i] == 1:
        c2 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='g', marker='o')
    elif dbscan.labels_[i] == -1:
        c3 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='b', marker='*')

plt.legend([c1, c2, c3], ['Cluster 1', 'Cluster 2', 'Noise'])
plt.title('DBSCAN finds 2 clusters and Noise')
plt.show()

Thanks very much to Dr.Rumbaugh’s clustering analysis notes!

Happy studying! 😊

打开App,阅读手记
0人推荐
发表评论
随时随地看视频慕课网APP