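Recently a WeChat official account recommended some articles about using Python to crawl your own WeChat friend list and run some analysis on it. I'd had the same idea for a while but never got around to it. Today happens to be New Year's Day, so I went back to the office and wrote this small project.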
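Getting your WeChat friends is actually simple, because there is a ready-made module for it: itchat, whose official site is https://itchat.readthedocs.io/zh/latest/ . First, install it with pip3: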
pip3 install itchat
Then import the itchat module and call get_friends() to fetch all of your WeChat friends:
import itchat
# With no arguments, auto_login() shows a QR code to scan; with True (the hotReload argument), you only confirm the login on your phone
itchat.auto_login(True)
friends = itchat.get_friends()
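Each friend record is indexed by keys such as UserName, NickName and Sex. As a minimal sketch, a hypothetical helper `friend_to_row` (my own name, not part of itchat) could flatten one record into a tuple, assuming the record supports `.get()` like a regular dict and falling back to defaults when a key is missing:

```python
# Hypothetical helper, not part of itchat: flatten one friend record
# into a tuple whose order matches the columns we want to store.
FIELDS = ("UserName", "NickName", "RemarkName", "Sex",
          "HeadImgUrl", "Province", "City", "Signature")


def friend_to_row(friend):
    """Return the selected fields of a friend dict as a tuple."""
    # Sex is numeric (1 = male, 2 = female, anything else = unknown);
    # every other field defaults to an empty string when absent.
    return tuple(friend.get(key, 0 if key == "Sex" else "") for key in FIELDS)


# Made-up record for illustration; real ones come from itchat.get_friends()
sample = {"UserName": "@abc", "NickName": "Tom", "Sex": 1, "Province": "Anhui"}
row = friend_to_row(sample)
```

Having one tuple per friend keeps the later database insert independent of how the records were fetched.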
To make the later analysis easier, I stored the friend data in a database. First, create the table:
create table t_friends
(
id int auto_increment primary key,
user_name varchar(255) null,
nick_name varchar(20) null,
remark_name varchar(20) null,
sex int null,
head_img_url varchar(255) null,
province varchar(20) null,
city varchar(20) null,
signature varchar(255) null
);
Then insert the fetched friends into the database:
import pymysql
connect = pymysql.connect(host='localhost',
user='root',
password='root1234',
db='itchat_db',
charset='utf8mb4')
cursor = connect.cursor()
# The statement is the same for every row, so build it once
sql = "INSERT INTO t_friends (`user_name`, `nick_name`, `remark_name`, `sex`, `head_img_url`, `province`, `city`, `signature`) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
for friend in friends:
    cursor.execute(sql, (friend['UserName'], friend['NickName'], friend['RemarkName'], friend['Sex'], friend['HeadImgUrl'], friend['Province'], friend['City'], friend['Signature']))
# Commit once after all rows are inserted, then release the connection
connect.commit()
connect.close()
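The row-by-row insert above could also be done in a single batch with `executemany`, with a rollback if anything fails. The sketch below is only an illustration: it uses the standard-library `sqlite3` with an in-memory database as a stand-in for MySQL so that it runs without a server, and the two rows are made up. With pymysql the placeholders would be `%s` instead of `?`.

```python
import sqlite3

# Made-up rows standing in for the real friend data
rows = [
    ("@u1", "Tom", "", 1, "", "Anhui", "Hefei", "hello"),
    ("@u2", "Amy", "", 2, "", "Henan", "Zhengzhou", ""),
]

# In-memory SQLite database as a stand-in for the MySQL server
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t_friends ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, user_name TEXT, nick_name TEXT, "
    "remark_name TEXT, sex INTEGER, head_img_url TEXT, province TEXT, "
    "city TEXT, signature TEXT)"
)

sql = (
    "INSERT INTO t_friends (user_name, nick_name, remark_name, sex, "
    "head_img_url, province, city, signature) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)"  # pymysql would use %s here
)
try:
    conn.executemany(sql, rows)  # one call for all rows
    conn.commit()                # commit once, after every row succeeded
except sqlite3.Error:
    conn.rollback()              # leave the table untouched on failure
    raise

inserted = conn.execute("SELECT COUNT(*) FROM t_friends").fetchone()[0]
conn.close()
```

Committing once at the end means a failure partway through does not leave half of the friend list in the table.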
With the data in place, the analysis can begin. I used matplotlib, Python's plotting library, through its pyplot interface. Again, install it with pip3:
pip3 install matplotlib
First, the gender ratio of my friends:
import pymysql
import matplotlib.pyplot as plt
connect = pymysql.connect(host='localhost',
user='root',
password='root1234',
db='itchat_db',
charset='utf8mb4')
cursor = connect.cursor()
sql = "select case when sex = 1 then 'Male' when sex = 2 then 'Female' else 'Other' end as 'gender', count(sex) from t_friends group by sex;"
cursor.execute(sql)
results = cursor.fetchall()
fig, ax = plt.subplots(figsize=(15, 8), subplot_kw=dict(aspect="equal"))
data = [val[1] for val in results]
sex = [key[0] for key in results]
def func(pct, allvals):
    # Convert the percentage of a slice back into a head count
    absolute = int(pct/100.*sum(allvals))
    return "{:.1f}%\n({:d} people)".format(pct, absolute)
wedges, texts, autotexts = ax.pie(data, autopct=lambda pct: func(pct, data), textprops=dict(color="w"))
ax.legend(wedges, sex, title="Gender", loc="center left", bbox_to_anchor=(1, 0, 0.5, 1))
plt.setp(autotexts, size=8, weight="bold")
ax.set_title("Gender distribution of WeChat friends")
plt.show()
The result:
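One detail in the label function is worth a second look. The percentage arrives as a float, so multiplying it back into a head count can land just below a whole number, and `int()` would then truncate it to one less. A small sketch of a variant that rounds instead (the name `pie_label` is my own, not a matplotlib function):

```python
def pie_label(pct, values):
    """Format a pie-slice label as the percentage plus the head count."""
    # round() instead of int(): the product can land just below a whole
    # number (e.g. 2.9999999), and int() would truncate that to 2.
    absolute = round(pct / 100.0 * sum(values))
    return "{:.1f}%\n({:d} people)".format(pct, absolute)


# A percentage that is a hair under 30% of 10 people
label = pie_label(29.999999, [3, 7])
```

It plugs into the chart the same way, as `autopct=lambda pct: pie_label(pct, data)`.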
Next, let's see which provinces and cities my WeChat friends are in:
import pymysql
import matplotlib.pyplot as plt
connect = pymysql.connect(host='localhost',
user='root',
password='root1234',
db='itchat_db',
charset='utf8mb4')
cursor = connect.cursor()
# SQL query: number of friends per province
sql = "select province, count(1) counts from t_friends where province != '' group by province order by counts desc limit 20;"
cursor.execute(sql)
results = cursor.fetchall()
cities = [city[0] for city in results]
counts = [count[1] for count in results]
fig, axs = plt.subplots(1, 1, figsize=(15, 8), sharey=True)
axs.bar(cities, counts)
# Print the count above each bar
for x, y in zip(cities, counts):
    plt.text(x, y+0.05, '%.0f' % y, ha='center', va='bottom', fontsize=11)
axs.set_title('Top 20 provinces of WeChat friends')
plt.show()
The result:
import pymysql
import matplotlib.pyplot as plt
connect = pymysql.connect(host='localhost',
user='root',
password='root1234',
db='itchat_db',
charset='utf8mb4')
cursor = connect.cursor()
# SQL query: number of friends per city
sql1 = "select city, count(1) counts from t_friends where city != '' group by province, city order by counts desc limit 25;"
cursor.execute(sql1)
results1 = cursor.fetchall()
cities1 = [city[0] for city in results1]
counts1 = [count[1] for count in results1]
fig, axs = plt.subplots(1, 1, figsize=(15, 8), sharey=True)
axs.bar(cities1, counts1)
# Print the count above each bar
for x, y in zip(cities1, counts1):
    plt.text(x, y+0.05, '%.0f' % y, ha='center', va='bottom', fontsize=11)
axs.set_title('Top 25 cities of WeChat friends')
plt.show()
The result:
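The two GROUP BY queries can also be approximated in plain Python, which is handy for a quick look before anything is written to the database. This is a sketch under assumptions: `top_counts` is a hypothetical helper and the sample records are made up. Note one difference: the city query groups by province and city, while this version counts by a single key, so same-named cities in different provinces would be merged.

```python
from collections import Counter


def top_counts(friends, key, n):
    """Count friends per value of `key`, skipping blanks, and return the top n."""
    counter = Counter(f[key] for f in friends if f.get(key))
    return counter.most_common(n)


# Made-up records; real ones come from itchat.get_friends()
sample = [
    {"Province": "Anhui", "City": "Hefei"},
    {"Province": "Anhui", "City": "Wuhu"},
    {"Province": "Henan", "City": "Zhengzhou"},
    {"Province": "", "City": ""},
]
top_provinces = top_counts(sample, "Province", 20)
top_cities = top_counts(sample, "City", 25)
```

The returned list of (name, count) pairs has the same shape as the rows from `cursor.fetchall()`, so the plotting code can consume it unchanged.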
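Judging from the pie chart and bar charts above, my WeChat friends are mostly male, and some are of unknown gender, haha (evil 😈). Since I'm from Anhui, it's no surprise that Anhui comes first: most of them are classmates from primary school through university, plus friends and family. Henan takes second place, which also makes sense, since I spent a year in Zhengzhou for work after graduating. Sigh, I do miss my friends in Zhengzhou. The rest, such as Jiangsu, Zhejiang and Shanghai, are places where many people dream of building a career. As for the others, some are friends I met on Facebook and Twitter, which I won't go into.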
Life is short, so keep working toward your dreams!
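itchat is an open-source API project for personal WeChat accounts. It supports both python2 and python3, and it makes it easy to extend your personal WeChat account and simplify everyday tasks. If you're interested, go explore the official site.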