Tips for Using Python Data Structures

孤独的小猪
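## I. Filtering data in lists, dictionaries, and sets by condition

The data in the experiments below is generated randomly with the random module, so the results differ on every run.

### 1. Filter the negative numbers out of a list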

# -*- coding:utf-8 -*-
from random import randint

data = [randint(-10,10) for _ in xrange(10)]

# Method 1: use the filter function
print filter(lambda x:x>=0, data)
# Method 2: use a list comprehension
print [x for x in data if x>=0]

Output:
[8, 5, 0, 3]
[8, 5, 0, 3]

### 2. Select the dictionary items whose value is above 90

# -*- coding:utf-8 -*-
from random import randint

d = {x: randint(60,100) for x in xrange(1,21)}

print { k:v for k,v in d.iteritems() if v >90}

Output:

{10: 97}

### 3. Select the set elements that are divisible by 3

# -*- coding:utf-8 -*-

from random import randint

data = [randint(-10,10) for _ in xrange(10)]

s = set(data)

print {x for x in s if x % 3 ==0}

Output:
set([0, 9])
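
The snippets in this section target Python 2. As a rough sketch of the same three filters under Python 3 (an assumption, since the original only shows Python 2), `xrange` becomes `range`, `dict.iteritems()` becomes `dict.items()`, and `filter` returns a lazy iterator that has to be wrapped in `list()`. The fixed sample values are made up for illustration so that the results are reproducible.

```python
# Python 3 sketch; the sample values are illustrative, not taken from the original runs.
data = [8, -3, 5, 0, -7, 3, -1, 9, -10, -2]

# filter() returns an iterator in Python 3, so wrap it in list() to see the values
non_negative_filter = list(filter(lambda x: x >= 0, data))
# the list comprehension works unchanged
non_negative_comp = [x for x in data if x >= 0]

# dict.items() replaces dict.iteritems()
scores = {1: 72, 2: 95, 3: 60, 4: 91, 5: 90}
high_scores = {k: v for k, v in scores.items() if v > 90}

# the set comprehension is unchanged apart from print() being a function
divisible_by_3 = {x for x in set(data) if x % 3 == 0}

print(non_negative_filter)
print(non_negative_comp)
print(high_scores)
print(divisible_by_3)
```
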
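## II. Naming, counting, and dictionaries

### 1. How to name each element of a tuple to make a program more readable

Option 1:

Define a series of numeric constants, similar to the enumeration types of other languages.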

# -*- coding:utf-8 -*-
NAME = 0
AGE = 1
SEX = 2
EMAIL = 3

student = ('jim', 16, 'male', 'jim@gmail.com')

if student[AGE] > 18:
  pass

if student[SEX] == 'male':
  pass
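
A compact variation on the same idea (a sketch, not taken from the original) assigns all the index constants in one statement by unpacking `range(4)`. The names `is_adult` and `is_male` are introduced here only for illustration.

```python
# Assign consecutive index constants in a single statement
NAME, AGE, SEX, EMAIL = range(4)

student = ('jim', 16, 'male', 'jim@gmail.com')

# the constants read like field names when indexing the tuple
is_adult = student[AGE] > 18
is_male = student[SEX] == 'male'
print(is_adult, is_male)
```
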

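Option 2:

Use collections.namedtuple from the standard library in place of the built-in tuple.

namedtuple acts as a class factory. The resulting s is a tuple, so its fields can be read both by index and by attribute name.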


# -*- coding:utf-8 -*-
from collections import namedtuple

student = namedtuple('Student',['name', 'age', 'sex', 'email'])

s = student('jim', 16, 'male', 'jim@gmail.com')

print s
print s.name

Output:

Student(name='jim', age=16, sex='male', email='jim@gmail.com')
jim
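
To illustrate the claim that a namedtuple instance is still a tuple, here is a small Python 3 sketch. The field name `email` and the variable names are assumptions of this example.

```python
from collections import namedtuple

# 'email' is the assumed name of the fourth field
Student = namedtuple('Student', ['name', 'age', 'sex', 'email'])
s = Student('jim', 16, 'male', 'jim@gmail.com')

# a namedtuple instance is still a tuple
is_tuple = isinstance(s, tuple)
# index access and attribute access return the same value
same_value = s[1] == s.age
# tuple unpacking works as usual
name, age, sex, email = s
# instances are immutable; _replace returns a modified copy
older = s._replace(age=17)

print(is_tuple, same_value, name, older.age, s.age)
```

Because `s` remains a tuple, code that already expects a plain tuple keeps working after the switch.
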

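### 2. How to count how often each element occurs in a sequence
Example 1: in a random sequence, find the 3 elements that occur most often and how many times each one occurs.

Method 1:
# -*- coding:utf-8 -*-

from random import randint

data = [randint(0,20) for _ in xrange(30)]

# start every distinct element with a count of 0
c = dict.fromkeys(data, 0)
for x in data:
    c[x] = c[x] + 1
print c.items()
# sort by count in ascending order and keep the last 3 items
print sorted(c.items(), key=lambda d:d[1])[-3:]

Output:
[(0, 1), (1, 2), (2, 3), (3, 2), (4, 1), (5, 2), (6, 1), (7, 2), (8, 3), (9, 3), (11, 2), (12, 2), (15, 4), (19, 1), (20, 1)]
[(8, 3), (9, 3), (15, 4)]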


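Method 2: use a collections.Counter object

Pass the sequence to the Counter constructor to get a Counter object, which is a dictionary that maps each element to its frequency. The Counter.most_common(n) method returns a list of the n most frequent elements.
# -*- coding:utf-8 -*-

from random import randint
from collections import Counter

data = [randint(0,20) for _ in xrange(30)]
c2 = Counter(data)

print c2.most_common(3)

Output: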
[(18, 4), (5, 3), (14, 3)]
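
To make the Counter behavior easier to reproduce, the following Python 3 sketch uses a fixed list instead of random numbers. The data and the variable names `top3`, `missing`, and `new_top` are made up for illustration.

```python
from collections import Counter

# illustrative fixed data instead of random numbers
data = [5, 18, 14, 5, 18, 14, 18, 5, 18, 14, 2, 7]
c2 = Counter(data)

# the three most frequent elements as (element, count) pairs
top3 = c2.most_common(3)
# a Counter returns 0 for a missing element instead of raising KeyError
missing = c2[100]
# update() adds further observations to the existing counts
c2.update([2, 2, 2, 2])
new_top = c2.most_common(1)

print(top3, missing, new_top)
```

Since a Counter is a dict subclass, everything that works on the hand-built dictionary of Method 1 also works on it.
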


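Example 2: count the word frequencies in an English article and find the most frequent words together with their counts. The task calls for the top 10; the code prints the top 3.
> Split the file contents on runs of characters that are not word characters (letters, digits, underscore)
# -*- coding:utf-8 -*-

from collections import Counter
import re

txt = open('test.txt').read()
# split on runs of non-word characters
c3 = Counter(re.split(r'\W+', txt))
print c3.most_common(3)

Output: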
[('openhpc', 26), ('resource', 17), ('queue', 16)]
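
One detail worth knowing: `re.split(r'\W+', ...)` yields an empty string when the text starts or ends with a non-word character, and that empty string gets counted too. A possible alternative, sketched here in Python 3 with a made-up sentence standing in for test.txt, is to collect the words with `re.findall(r'\w+', ...)`. Lowercasing the text first is an extra assumption of this sketch.

```python
import re
from collections import Counter

# illustrative text standing in for the contents of test.txt
txt = "The queue holds the job. The job waits in the queue; the scheduler runs the job."

# splitting leaves a trailing empty string because the text ends with '.'
has_empty = '' in re.split(r'\W+', txt)

# findall(r'\w+') returns only the words; lower() makes the count case-insensitive
words = re.findall(r'\w+', txt.lower())
c3 = Counter(words)
top2 = c3.most_common(2)

print(has_empty, top2)
```
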


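### 3. Sort the items of a dictionary by value

Solutions:

1. Use zip to convert the dictionary into (value, key) tuples

2. Pass the key parameter to the sorted function
# -*- coding:utf-8 -*-

from random import randint

d = {x:randint(60,100) for x in 'xyzabc' }
print sorted(zip(d.itervalues(),d.iterkeys()))

Output: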
[(80, 'y'), (89, 'x'), (91, 'b'), (94, 'a'), (94, 'z'), (99, 'c')]

# -*- coding:utf-8 -*-

from random import randint

d = {x:randint(60,100) for x in 'xyzabc' }
print sorted(d.items(), key=lambda x: x[1])

Output:

[('x', 67), ('y', 71), ('c', 72), ('a', 75), ('z', 88), ('b', 89)]
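
The two solutions can be compared side by side on fixed data. This is a Python 3 sketch in which `d.values()` and `d.keys()` replace `itervalues()` and `iterkeys()`. The scores and the `top3` variable are illustrative.

```python
# illustrative fixed scores
d = {'x': 67, 'y': 71, 'c': 72, 'a': 75, 'z': 88, 'b': 89}

# solution 1: build (value, key) tuples, which compare by value first
by_value_zip = sorted(zip(d.values(), d.keys()))

# solution 2: keep the (key, value) items and tell sorted() to compare by value
by_value_key = sorted(d.items(), key=lambda item: item[1])

# reverse=True puts the highest values first
top3 = sorted(d.items(), key=lambda item: item[1], reverse=True)[:3]

print(by_value_zip)
print(by_value_key)
print(top3)
```

With the zip approach, equal values are ordered by key, because tuples fall back to comparing their second element. With the key function, equal values keep the order in which they were iterated.
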

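## III. Common keys

### 1. How to quickly find the keys common to several dictionaries
# -*- coding:utf-8 -*-

from random import randint, sample

s1 = {x: randint(1,4) for x in sample('abcdefg', randint(3,6))}
s2 = {x: randint(1,4) for x in sample('abcdefg', randint(3,6))}
s3 = {x: randint(1,4) for x in sample('abcdefg', randint(3,6))}

# With only a few dictionaries, intersect their key views directly
print s1.viewkeys() & s2.viewkeys() & s3.viewkeys()

# step 1: the viewkeys() method of a dictionary returns a set-like view of its keys;
# step 2: map applies it to every dictionary to get all the key views;
# step 3: reduce takes the intersection of all the key views.
# With many dictionaries, use this approach instead
print reduce(lambda a,b:a&b, map(dict.viewkeys, [s1,s2,s3]))

Output: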

set(['c', 'd'])
set(['c', 'd'])
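
For readers on Python 3, a hedged equivalent: `viewkeys()` no longer exists because `dict.keys()` itself returns a set-like view, and `reduce` has to be imported from `functools`. The three dictionaries below are fixed, made-up examples.

```python
from functools import reduce

# illustrative fixed dictionaries
s1 = {'a': 1, 'c': 2, 'd': 4, 'f': 3}
s2 = {'c': 1, 'd': 1, 'e': 2}
s3 = {'b': 3, 'c': 4, 'd': 2, 'g': 1}

# dict.keys() returns a set-like view in Python 3, so & works directly
common_direct = s1.keys() & s2.keys() & s3.keys()

# map + reduce handles any number of dictionaries
common_reduce = reduce(lambda a, b: a & b, map(dict.keys, [s1, s2, s3]))

print(common_direct, common_reduce)
```
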

## IV. How to keep a dictionary ordered
### 1. Use collections.OrderedDict

from time import time
from random import randint
from collections import OrderedDict

d = OrderedDict()
players = list('ABCDEFGH')
start = time()

for i in xrange(8):
    # wait for Enter, then pick a random remaining player as the next finisher
    raw_input()
    p = players.pop(randint(0,7-i))
    end = time()
    print i+1,p, end - start
    d[p] = (i+1, end - start)

print '*'*20
for k in d:
    print k, d[k]

Output:
The dictionary traversed by the final for loop is ordered by the order in which the elements were inserted.

1 C 0.934000015259

2 D 1.40899991989

3 F 1.67999982834

4 A 1.95599985123

5 E 2.16599988937

6 H 2.37599992752

7 B 2.60699987411

8 G 2.99799990654
********************
C (1, 0.9340000152587891)
D (2, 1.4089999198913574)
F (3, 1.679999828338623)
A (4, 1.9559998512268066)
E (5, 2.1659998893737793)
H (6, 2.375999927520752)
B (7, 2.6069998741149902)
G (8, 2.997999906539917)
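
The example above needs keyboard input. A non-interactive Python 3 sketch of the same ordering guarantee follows. The finishing order and times are made up, and `move_to_end` is only available on Python 3.2 and later.

```python
from collections import OrderedDict

# illustrative finishing order; the times are made up
results = [('C', 0.93), ('D', 1.41), ('F', 1.68), ('A', 1.96)]

d = OrderedDict()
for rank, (player, seconds) in enumerate(results, start=1):
    d[player] = (rank, seconds)

# keys come back in insertion order
order = list(d)

# move_to_end() repositions an existing key at the end
d.move_to_end('C')
order_after_move = list(d)

# popitem(last=False) removes and returns the oldest remaining entry
first_key, first_value = d.popitem(last=False)

print(order, order_after_move, first_key, first_value)
```

Since Python 3.7 the built-in dict also preserves insertion order, so OrderedDict is mainly useful there for its reordering methods.
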

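## V. History records
### 1. Implement a user history feature (at most n entries)

Use a queue with capacity n to store the history.

Use deque from the standard library collections module. It is a double-ended queue, and when it is created with a maximum length it drops the oldest item once it is full. Before the program exits, the queue object can be saved to a file with pickle and loaded again the next time the program runs.

from random import randint
from collections import deque
N = randint(0, 100)
# keep only the 5 most recent guesses
history = deque([], 5)

def guess(k):
    if k == N:
        print 'right'
        return True

    if k < N:
        print '%s is less than N' % k
    else:
        print '%s is greater than N' % k
    return False

while True:
    line = raw_input("please input a number: ")
    if line.isdigit():
        k = int(line)
        history.append(k)
        if guess(k):
            break
    elif line == 'history' or line =='h?':
        print list(history)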
In [1]: import pickle

In [2]: from collections import deque

In [3]: q = deque([],5)

In [4]: q.append(1)

In [5]: q.append(2)

In [6]: q.append(3)

In [7]: q.append(4)

In [8]: q.append(5)

In [9]: q.append(6)

In [10]: q
Out[10]: deque([2, 3, 4, 5, 6])

In [11]: pickle.dump(q,open('history','w'))

In [12]: pickle.load(open('history'))
Out[12]: deque([2, 3, 4, 5, 6])
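
The session above uses Python 2, where a text-mode file works with pickle. In Python 3 the file has to be opened in binary mode. A minimal sketch follows, assuming hypothetical helper names `save_history` and `load_history` and a temporary directory as the file location.

```python
import os
import pickle
import tempfile
from collections import deque

def save_history(history, path):
    # pickle needs a binary file in Python 3
    with open(path, 'wb') as f:
        pickle.dump(history, f)

def load_history(path, maxlen=5):
    # start with an empty history if nothing has been saved yet
    if not os.path.exists(path):
        return deque([], maxlen)
    with open(path, 'rb') as f:
        return pickle.load(f)

# hypothetical file location inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'history')

q = load_history(path)
for k in range(1, 7):
    q.append(k)  # the sixth append pushes the 1 out of the 5-slot deque
save_history(q, path)

restored = load_history(path)
print(restored, restored.maxlen)
```

The maximum length is stored along with the items, so the restored deque keeps discarding old entries in the same way.
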

PS: One last complaint. I wrote this post in Markdown and the preview looked fine, so why did the formatting change completely once it was published? @慕女神
