Python Meets Data Collection: Technical Notes

慕九州1469150 2021-11-17

2

0 likes · 0 saves
慕九州1469150 2021-11-17

wiki

0 likes · 0 saves
慕容7012403 2019-03-10

from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# Fetch the page source
resp = urlopen("https://en.wikipedia.org/wiki/Main_Page").read().decode("utf-8")
# Parse the page
soup = BeautifulSoup(resp, "html.parser")
# Find all links whose href starts with /wiki/
urls = soup.find_all("a", href=re.compile("^/wiki/"))
# Print the text and href of each link, skipping links that end in .jpg/.JPG
for url in urls:
    if not re.search(r"\.(jpg|JPG)$", url["href"]):
        print(url.get_text(), url["href"])

0 likes · 0 saves
慕前端6197812 2018-10-25

Print every href on a page.
How professionals work with code: code is not produced by copying, it is produced by revising.

0 likes · 1 save
盛世荒唐丶 2018-03-26

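Import the modules, then:
1. Read the web page.
2. Parse the content that was read into a structured form.
3. Run a second extraction pass over the parsed data.
4. Print the results.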
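
The four steps above can be sketched roughly as follows. This is a minimal sketch, not the course's exact code: the function name `extract_wiki_links` and the inline sample HTML are assumptions made for illustration, and step 1 (reading the page) is replaced by a hard-coded string so the example runs without network access.

```python
import re
from bs4 import BeautifulSoup


def extract_wiki_links(html):
    """Return (text, href) pairs for links whose href starts with /wiki/."""
    # Step 2: parse the raw HTML into a navigable tree
    soup = BeautifulSoup(html, "html.parser")
    # Step 3: second pass, keep only <a> tags whose href starts with /wiki/
    links = soup.find_all("a", href=re.compile(r"^/wiki/"))
    return [(a.get_text(), a["href"]) for a in links]


# Step 1 (stand-in): a real crawler would obtain this via urlopen(...).read()
SAMPLE_HTML = """
<html><body>
  <a href="/wiki/Python">Python</a>
  <a href="https://example.com/">External</a>
  <a href="/wiki/Web_scraping">Web scraping</a>
</body></html>
"""

# Step 4: print the results
for text, href in extract_wiki_links(SAMPLE_HTML):
    print(text, href)
```

Keeping the parsing in a function that takes a string makes it possible to try the extraction logic on saved or hand-written HTML before pointing it at a live page.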

1 like · 0 saves
慕粉3824845 2018-03-15

Fetching Wikipedia entries

0 likes · 0 saves
qq_小飞蛇_0 2017-12-30

Using Python to fetch Wikipedia entry information

0 likes · 0 saves
清凉sama 2017-04-23

To filter out content you don't need: if not re.search("pattern for the content you don't want", string)
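
As a small illustration of this filter, the sketch below drops every string that matches an "unwanted" pattern. The pattern (links ending in .jpg or .JPG), the helper name `keep_wanted`, and the sample hrefs are assumptions chosen for the example.

```python
import re

# Hypothetical "unwanted" pattern: anything ending in .jpg or .JPG
UNWANTED = re.compile(r"\.(jpg|JPG)$")


def keep_wanted(strings):
    """Return only the strings that do NOT match the unwanted pattern."""
    return [s for s in strings if not UNWANTED.search(s)]


hrefs = ["/wiki/Python", "/wiki/File:Logo.jpg", "/wiki/Main_Page", "/wiki/Photo.JPG"]
print(keep_wanted(hrefs))
```

Note the backslash before the dot: an unescaped `.` matches any character, so the pattern would be looser than intended.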

1 like · 2 saves
秦__川 2017-04-09

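Questions to answer when crawling with Python:
1. How to simulate logging in to a web page.
2. How to download page content.
3. How to find what you want in the downloaded content.
4. How to store it.
5. Details such as: what you download is a document, not the images in it. To get an image, locate it in the page, extract its address, and then download it.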
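
Points 3 to 5 can be sketched as below: locate the images in a downloaded document, extract their addresses, and store them. This is only a sketch under stated assumptions. It uses the standard library's `html.parser` instead of BeautifulSoup to stay dependency-free, and the class and function names, the sample HTML, and the base URL are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_urls(html, base_url):
    """Return absolute image URLs found in html, resolved against base_url."""
    collector = ImageSrcCollector()
    collector.feed(html)
    return [urljoin(base_url, src) for src in collector.sources]


def to_csv(urls):
    """Serialize the URLs as CSV text with a one-column header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["image_url"])
    for url in urls:
        writer.writerow([url])
    return buffer.getvalue()


SAMPLE_HTML = '<p>Text</p><img src="/static/a.png"><img src="b.jpg" alt="b">'
BASE_URL = "https://example.com/docs/"  # hypothetical page address

urls = image_urls(SAMPLE_HTML, BASE_URL)
print(to_csv(urls))
```

To actually download a file, each URL could then be passed to `urllib.request.urlretrieve`. That step is left out here because it needs network access.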

1 like · 1 save
慕工程3881054 2017-03-27

Course code

0 likes · 0 saves
_jinyi 2016-10-19

Code 1111

0 likes · 0 saves
sdgcfu 2016-10-07

Three modules are imported when developing a crawler in Python.

0 likes · 0 saves
慕粉3878587 2016-09-01

from urllib.request import urlopen
from bs4 import BeautifulSoup as bs
import re

# Fetch and decode the page source
resp = urlopen("https://en.wikipedia.org/wiki/Main_Page").read().decode("utf-8")
soup = bs(resp, "html.parser")
# Collect every <a> tag whose href starts with /wiki/
listUrls = soup.find_all("a", href=re.compile("^/wiki/"))
for url in listUrls:
    print(url["href"])

1 like · 3 saves
顾小北 2016-08-27

Get the <a> tags on the wiki home page whose href starts with /wiki/ and that do not point to images (using regular expressions).
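
One possible way to express both conditions in a single regular expression is a negative lookahead. The sketch below is an assumption about how that could look, not the course's approach; the pattern name and the sample hrefs are made up for illustration.

```python
import re

# Starts with /wiki/ and does not end in .jpg/.JPG (negative lookahead)
WIKI_NON_IMAGE = re.compile(r"^/wiki/(?!.*\.(?:jpg|JPG)$)")

candidates = [
    "/wiki/Python",
    "/wiki/File:Cat.jpg",
    "/w/index.php",
    "/wiki/File:Dog.JPG",
    "/wiki/Regular_expression",
]
matches = [href for href in candidates if WIKI_NON_IMAGE.search(href)]
print(matches)
```

A compiled pattern like this can also be passed as the `href` argument to BeautifulSoup's `find_all`, which would fold the image check into the tag search itself.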

0 likes · 0 saves
