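I am trying to scrape the URL on the second line of this file: https://www.cwb.gov.tw/V7/js/HDRadar_1000_n_val.js
I am crawling it with Python and BS4, but I am not sure whether I should use BeautifulSoup or a regular expression.
I don't know how to capture only the URL on the second line. My attempt captures the entire file.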
import requests
import re
from bs4 import BeautifulSoup
res = requests.get('https://www.cwb.gov.tw/V7/js/HDRadar_1000_n_val.js')
soup = BeautifulSoup(res.text,'html.parser')
print(soup)
Expected:
/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271140.png
Actual:
var HDRadar_1000_n_val=new Array( new
Array("2019/03/27 11:40","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271140.png"), new Array("2019/03/27 11:30","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271130.png"), new Array("2019/03/27 11:20","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271120.png"), new Array("2019/03/27 11:10","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271110.png"), new Array("2019/03/27 11:00","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271100.png"), new Array("2019/03/27 10:50","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271050.png"), new Array("2019/03/27 10:40","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271040.png"), new Array("2019/03/27 10:30","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271030.png"), new Array("2019/03/27 10:20","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271020.png"), new Array("2019/03/27 10:10","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271010.png"), new Array("2019/03/27 10:00","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271000.png"), new Array("2019/03/27
...
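The file is JavaScript rather than HTML, so BeautifulSoup finds no tags in it and simply returns the text unchanged. One possible approach is a regular expression applied to the raw response text. The sketch below is only an illustration: it runs on a short sample copied from the output above, and the names sample_js and extract_png_paths are hypothetical, not part of any library.

```python
import re

# Short sample of the response body, copied from the output shown above.
sample_js = (
    'var HDRadar_1000_n_val=new Array( '
    'new Array("2019/03/27 11:40","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271140.png"), '
    'new Array("2019/03/27 11:30","/V7/observe/radar/Data/HD_Radar/CV1_1000_201903271130.png"))'
)


def extract_png_paths(js_text):
    """Return every double-quoted string ending in .png, in order of appearance."""
    return re.findall(r'"([^"]+\.png)"', js_text)


paths = extract_png_paths(sample_js)
# The most recent image is the first entry in the array.
first_path = paths[0] if paths else None
print(first_path)
```

With the live response, res.text from the attempt above could be passed in place of sample_js, assuming the file keeps this layout. The result is a site-relative path; urllib.parse.urljoin('https://www.cwb.gov.tw', first_path) would turn it into a full URL.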