No "http" in the src attribute - 慕课网 (imooc)

Source: 4-2 Python regular expression exercise

慕勒4252497

2017-12-11 17:09

Why is there no "http" in the src attributes I get back?

3 answers

华灯初上丶

2018-01-27 22:18:42

import re
import urllib.request

req = urllib.request.urlopen('http://www.imooc.com/course/list')
# Add decode() here; otherwise the downloaded data is raw bytes and looks garbled.
buf = req.read().decode("utf-8")

# The URLs on the page have changed since the lecture was recorded,
# so the regex just needs a small adjustment.
# listurl = re.findall(r'src=.+\.jpg', buf)
# Switching to a non-greedy match is enough.
listurl = re.findall(r'//img.+?\.jpg', buf)

# The matches no longer start with "http:", so add it by hand.
for index, app_id in enumerate(listurl):
    listurl[index] = 'http:' + app_id
    print(index, listurl[index])
print(listurl)

for i, url in enumerate(listurl):
    req = urllib.request.urlopen(url)
    buf = req.read()
    # Open the file in binary mode ("wb+"); text mode cannot write bytes (tested).
    with open(str(i) + ".jpg", "wb+") as f:
        f.write(buf)
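
To see why the non-greedy quantifier matters for this pattern, here is a minimal sketch run on a made-up HTML fragment. The sample string and the host `img.example.com` are assumptions for illustration, not content taken from the imooc page.

```python
import re

# Hypothetical HTML fragment; host and file names are made up.
sample = '<img src="//img.example.com/a.jpg"><img src="//img.example.com/b.jpg">'

# Greedy: ".+" runs to the LAST ".jpg" on the line, so both tags end up in one match.
greedy = re.findall(r'//img.+\.jpg', sample)

# Non-greedy: ".+?" stops at the FIRST ".jpg", giving one match per image.
lazy = re.findall(r'//img.+?\.jpg', sample)

print(len(greedy), greedy)
print(len(lazy), lazy)
```

On a real page the greedy version can swallow everything between the first and last image on a line, which is why the download loop would then fail on the resulting "URL".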

慕斯卡凌

Thank you very much!

2018-10-21 16:39:28

地大新手

2017-12-20 12:45:42

Same here. So my regular expression can't match the URLs I need. How awkward.
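
One way around this, sketched under assumptions: make the scheme optional in the pattern so that both `http://...` and scheme-less `//...` values match. The sample HTML and the helper name `find_jpg_srcs` are hypothetical, and real pages may quote or order attributes differently.

```python
import re

# Matches src="..." values ending in .jpg, with or without a leading "http:" / "https:".
SRC_JPG = re.compile(r'src="((?:https?:)?//[^"]+?\.jpg)"')

def find_jpg_srcs(html):
    """Return the .jpg URLs found in src attributes (hypothetical helper)."""
    return SRC_JPG.findall(html)

# Made-up fragment mixing both URL styles.
sample = ('<img src="//img.example.com/a.jpg"> '
          '<img src="http://img.example.com/b.jpg">')

urls = find_jpg_srcs(sample)
print(urls)
```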

qq_慕工程...

It's 2021 and I have the same problem. Did you ever solve it? Please show me how.

2021-03-12 19:59:14

慕粉4012628

2017-12-12 04:37:04

Sometimes the URL is a relative address, which means you have to prepend the server's address to it.
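
A minimal sketch of that idea using the standard library's `urllib.parse.urljoin`, which resolves a relative or scheme-less URL against the address of the page it came from. The candidate URLs below are made up for illustration; only the base URL comes from this thread.

```python
from urllib.parse import urljoin

# The page the links were scraped from.
base = 'http://www.imooc.com/course/list'

# Hypothetical src values of the kinds a page may contain.
candidates = [
    '//img.example.com/a.jpg',         # scheme-relative: inherits "http:" from base
    '/static/b.jpg',                   # root-relative: inherits scheme and host
    'c.jpg',                           # relative to the base page's directory
    'https://cdn.example.com/d.jpg',   # already absolute: left unchanged
]

absolute = [urljoin(base, u) for u in candidates]
for u in absolute:
    print(u)
```

Compared with replacing `//` by hand, this handles every form of relative address with one call and leaves absolute URLs alone.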
