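Examples:
1. In a random sequence, find the 3 elements that occur most often. How many times does each occur?
2. Count the word frequencies in an English article and find the 10 most frequent words. How many times does each occur?
Step 1: Build the random sequence with a list comprehension.
Step 2: The result should be a dictionary, so create a dictionary whose values are all 0.
Step 3: Iterate over the sequence and increment the count stored in the dictionary for each element.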
In [1]: from random import randint
In [2]: data = [randint(0, 20) for _ in range(30)]
In [3]: data
Out[3]:
[0,
0,
17,
5,
5,
10,
3,
17,
20,
13,
14,
17,
16,
17,
13,
8,
6,
14,
1,
18,
2,
5,
6,
10,
20,
12,
7,
7,
5,
10]
In [4]: c = dict.fromkeys(data,0)
In [5]: c
Out[5]:
{0: 0,
1: 0,
2: 0,
3: 0,
5: 0,
6: 0,
7: 0,
8: 0,
10: 0,
12: 0,
13: 0,
14: 0,
16: 0,
17: 0,
18: 0,
20: 0}
In [6]: for x in data:
   ...:     c[x] += 1
   ...:
In [7]: c
Out[7]:
{0: 2,
1: 1,
2: 1,
3: 1,
5: 4,
6: 2,
7: 2,
8: 1,
10: 3,
12: 1,
13: 2,
14: 2,
16: 1,
17: 4,
18: 1,
20: 2}
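The dictionary above holds the counts but does not yet answer the original question of which 3 elements occur most often. One way to finish the manual approach is to sort the (element, count) pairs by count. The following is a minimal sketch, not part of the original session; it reuses the sequence printed in Out[3], and the names counts and top3 are chosen here for illustration only.

```python
# The sequence printed in Out[3] of the session above.
data = [0, 0, 17, 5, 5, 10, 3, 17, 20, 13, 14, 17, 16, 17, 13,
        8, 6, 14, 1, 18, 2, 5, 6, 10, 20, 12, 7, 7, 5, 10]

# Steps 2 and 3: one key per distinct element, every count starting at 0,
# then one increment per occurrence.
counts = dict.fromkeys(data, 0)
for x in data:
    counts[x] += 1

# Sort the (element, count) pairs by count, largest first, and keep three.
# Elements with equal counts (here 5 and 17) may come out in either order.
top3 = sorted(counts.items(), key=lambda item: item[1], reverse=True)[:3]
print(top3)
```

Sorting the whole dictionary is fine for small inputs; when only a few top items are needed from a large dictionary, a partial selection avoids the full sort.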
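Solution:
Use a collections.Counter object: pass the sequence to the Counter constructor, and the resulting Counter is a dictionary that maps each element to its frequency.
Word-frequency count for an English article:
Use a regular expression to split the text on runs of non-word characters (anything other than letters, digits, and underscore):
re.split(r'\W+', txt)
The Counter.most_common(n) method returns a list of the n most frequent elements together with their counts.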
In [8]: from collections import Counter
In [9]: c2 = Counter(data)
In [10]: c
Out[10]:
{0: 2,
1: 1,
2: 1,
3: 1,
5: 4,
6: 2,
7: 2,
8: 1,
10: 3,
12: 1,
13: 2,
14: 2,
16: 1,
17: 4,
18: 1,
20: 2}
In [11]: c2
Out[11]: Counter({5: 4, 17: 4, 10: 3, 0: 2, 6: 2, 7: 2, 13: 2, 14: 2, 20: 2, 1: 1, 2: 1, 3: 1, 8: 1, 12: 1, 16: 1, 18: 1})
In [12]: c2.most_common(3)
Out[12]: [(5, 4), (17, 4), (10, 3)]
In [13]: c2.most_common(10)
Out[13]:
[(5, 4),
(17, 4),
(10, 3),
(0, 2),
(6, 2),
(7, 2),
(13, 2),
(14, 2),
(20, 2),
(1, 1)]
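For intuition about what most_common(n) does, the same top-n selection can be written with heapq.nlargest, which picks the n largest items without sorting everything. This is a sketch for comparison only, again using the sequence from Out[3]; the name top3 is illustrative.

```python
import heapq
from collections import Counter

# The sequence printed in Out[3] of the session above.
data = [0, 0, 17, 5, 5, 10, 3, 17, 20, 13, 14, 17, 16, 17, 13,
        8, 6, 14, 1, 18, 2, 5, 6, 10, 20, 12, 7, 7, 5, 10]

c2 = Counter(data)

# Select the 3 (element, count) pairs with the largest counts.
top3 = heapq.nlargest(3, c2.items(), key=lambda item: item[1])

# Compare as sets because the order of tied elements is not guaranteed.
print(set(top3) == set(c2.most_common(3)))
```

In everyday code most_common(n) is the simpler choice; the heapq form is mainly useful when the counts live in an ordinary dict rather than a Counter.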
In [14]: import re
In [15]: txt = open('test.txt').read()
In [16]: c3 = Counter(re.split(r'\W+', txt))
In [17]: c3
In [18]: c3.most_common(10)
Out[18]:
[('00', 1023),
('0', 764),
('p', 563),
('fd', 513),
('so', 434),
('00000000', 418),
('usr', 387),
('lib64', 382),
('r', 297),
('1', 284)]
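The result above contains tokens such as '00' and '1' because \W+ splits on non-word characters only, so digits still count as part of a word. re.split can also return an empty string when the text starts or ends with a separator. If only alphabetic words are wanted, one option is to extract them with re.findall instead. The sketch below assumes a short made-up sample string in place of test.txt, and the letters-only pattern and lowercasing are choices made here, not part of the original session.

```python
import re
from collections import Counter

# Hypothetical sample text; in the session above the text is read from test.txt.
txt = "The quick brown fox. The lazy dog! the end, the fox?"

# Lowercase first so "The" and "the" are counted together, then pull out
# runs of letters. findall never yields empty strings, unlike re.split.
words = re.findall(r'[a-z]+', txt.lower())

c3 = Counter(words)
print(c3.most_common(2))
```

For a real file, reading it inside a with open(...) block closes the file automatically once the text has been read.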