Output of a GET request differs from View Source

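I'm trying to extract match data from whoscored.com. When I view the page source in Firefox, I find a large JSON string on line 816 that contains the data I want for the match ID. My goal is to end up with this JSON.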

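To do this, I tried downloading the page https://www.whoscored.com/Matches/ID/Live for every ID, where ID is the match ID. I wrote a small Go program that sends a GET request for each ID up to a certain limit: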

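package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"path/filepath"
	"sync"
)

// http://www.whoscored.com/Matches/614052/Live is the match for
// Everton vs Manchester
const match_address = "http://www.whoscored.com/Matches/"

// the max id we fetch
const max_id = 300

// number of concurrent download goroutines
const num_workers = 10

// match_fetch downloads the Live page for one match id and saves the
// response body to match_data/<matchid> under the current directory.
func match_fetch(matchid int) {
	url := fmt.Sprintf("%s%d/Live", match_address, matchid)
	resp, err := http.Get(url)
	if err != nil {
		fmt.Println(err)
		return
	}
	// we got a response; make sure the body is closed when we return
	defer resp.Body.Close()

	// read the whole body into memory
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println(err)
		return
	}

	// write the body to disk
	path := filepath.Join("match_data", fmt.Sprint(matchid))
	if err := os.WriteFile(path, body, 0644); err != nil {
		fmt.Println(err)
	}
}

// data type sent to the workers:
// last means this job is the last one for the worker that receives it;
// matchid is the match id to be fetched, and -1 means don't fetch a match
type job struct {
	last    bool
	matchid int
}

// create_worker processes jobs until it receives one marked last.
func create_worker(jobs chan job) {
	for {
		next_job := <-jobs
		if next_job.matchid != -1 {
			match_fetch(next_job.matchid)
		}
		if next_job.last {
			return
		}
	}
}

// main was not included in the question; this driver is an assumed
// reconstruction that feeds ids 1..max_id to num_workers workers.
func main() {
	if err := os.MkdirAll("match_data", 0755); err != nil {
		fmt.Println(err)
		return
	}
	jobs := make(chan job)
	var wg sync.WaitGroup
	for i := 0; i < num_workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			create_worker(jobs)
		}()
	}
	for id := 1; id <= max_id; id++ {
		jobs <- job{matchid: id}
	}
	// send one "last" job per worker so that every worker returns
	for i := 0; i < num_workers; i++ {
		jobs <- job{last: true, matchid: -1}
	}
	wg.Wait()
}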
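The sentinel protocol in the program (a `last` flag plus a `matchid` of -1) works, but a more common Go idiom is to close the jobs channel and let each worker `range` over it, with a `sync.WaitGroup` to wait for them all. A minimal sketch, assuming a fetch function like `match_fetch`; `runPool` is a hypothetical name, and the fetch function is passed in so it can be swapped out:

```go
package main

import (
	"fmt"
	"sync"
)

// runPool is a hypothetical helper: it starts `workers` goroutines that
// call fetch for every id sent on the jobs channel. Closing the channel
// replaces the "last"/-1 sentinel jobs: each range loop exits once the
// channel is closed and drained.
func runPool(ids []int, workers int, fetch func(int)) {
	jobs := make(chan int)
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				fetch(id)
			}
		}()
	}

	for _, id := range ids {
		jobs <- id
	}
	close(jobs) // tell every worker at once that there is no more work
	wg.Wait()   // block until all in-flight fetches have finished
}

func main() {
	// In the real program fetch would be match_fetch; here it only prints.
	runPool([]int{1, 2, 3, 4, 5}, 2, func(id int) {
		fmt.Println("would fetch match", id)
	})
}
```

Closing the channel means the producer no longer needs to know how many workers exist, which removes the need for the -1 placeholder id.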

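The code seems to work, in that it fills a directory called match_data with HTML. The problem is that this HTML is completely different from what I get in the browser! Here is the part I think is responsible (from the body of the GET request to http://www.whoscored.com/Matches/614052/Live):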

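I think the reason is that JavaScript in the page fetches data and edits the DOM into what I see in View Source. How can I get Go to run JavaScript? Is there a library that can do this? Better yet, can I get the JSON directly from the server?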
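Before reaching for a JavaScript engine, it may be worth ruling out that the server is simply answering the Go client differently. `http.Get` sends Go's default `User-Agent` (`Go-http-client/1.1` over HTTP/1.1) and none of the other headers a browser sends, and some sites vary or block responses based on such headers. A minimal sketch of a more browser-like request; the helper name `newBrowserLikeRequest` and the header values are illustrative assumptions, and whether they change what whoscored.com returns is not established here:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// newBrowserLikeRequest is a hypothetical helper that builds a GET request
// carrying headers similar to a desktop browser's. The values are
// assumptions; copy the real ones from the browser's network tab if needed.
// Accept-Encoding is deliberately not set, so Go keeps decompressing
// gzip responses transparently.
func newBrowserLikeRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0")
	req.Header.Set("Accept", "text/html,application/xhtml+xml")
	req.Header.Set("Accept-Language", "en-US,en;q=0.5")
	return req, nil
}

func main() {
	req, err := newBrowserLikeRequest("http://www.whoscored.com/Matches/614052/Live")
	if err != nil {
		fmt.Println(err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Compare status and size with what the plain http.Get version saved.
	fmt.Println(resp.Status, len(body), "bytes")
}
```

If the response still differs with browser-like headers, the difference likely comes from something else (cookies, redirects, or script-generated content).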

HUX布斯


2 Answers

慕妹3242003

In general, it is better to use a web API than to scrape. For example, whoscored itself uses OPTA, which you should be able to access directly. http://www.jokecamp.com/blog/guide-to-football-and-soccer-data-and-apis/#opta
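Consuming a JSON API from Go is usually simpler than scraping: decode the response body into structs with `encoding/json`. A minimal sketch; the endpoint URL and the `Match` fields below are hypothetical placeholders, not OPTA's actual schema, which would come from the provider's documentation:

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// Match is a hypothetical shape for one match record; real field names
// depend on the data provider.
type Match struct {
	ID       int    `json:"id"`
	HomeTeam string `json:"homeTeam"`
	AwayTeam string `json:"awayTeam"`
}

// parseMatches decodes a JSON array of match records.
func parseMatches(data []byte) ([]Match, error) {
	var matches []Match
	if err := json.Unmarshal(data, &matches); err != nil {
		return nil, err
	}
	return matches, nil
}

// fetchMatches GETs url and decodes the body, treating non-2xx as an error.
func fetchMatches(url string) ([]Match, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return parseMatches(data)
}

func main() {
	// Hypothetical endpoint; substitute the provider's documented URL and auth.
	matches, err := fetchMatches("https://api.example.com/matches")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, m := range matches {
		fmt.Printf("%d: %s vs %s\n", m.ID, m.HomeTeam, m.AwayTeam)
	}
}
```

Keeping the decoding in `parseMatches` separate from the HTTP call makes it easy to test against a saved response.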

富国沪深

This can be done with https://godoc.org/github.com/sourcegraph/webloop#View.EvaluateJavaScript. Read their main example at https://github.com/sourcegraph/webloop. In general, what you need is a "headless browser".
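Whichever way the HTML is obtained, the JSON blob the question mentions still has to be cut out of the page text. A minimal sketch that finds a marker string (for example the name of the JavaScript variable the JSON is assigned to; the marker used below is hypothetical, so check the real page source) and returns the brace-balanced object after it, validated with `encoding/json`. It assumes the blob is a strict JSON object; a JavaScript literal with unquoted keys or single quotes would fail validation:

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// extractJSONObject returns the first {...} object that follows marker in
// page, matching braces while skipping over the contents of string literals.
func extractJSONObject(page, marker string) (string, error) {
	i := strings.Index(page, marker)
	if i < 0 {
		return "", errors.New("marker not found")
	}
	start := strings.IndexByte(page[i+len(marker):], '{')
	if start < 0 {
		return "", errors.New("no object after marker")
	}
	start += i + len(marker)

	depth := 0
	inString, escaped := false, false
	for j := start; j < len(page); j++ {
		c := page[j]
		switch {
		case escaped: // character after a backslash inside a string
			escaped = false
		case inString && c == '\\':
			escaped = true
		case c == '"':
			inString = !inString
		case inString:
			// braces inside strings do not count
		case c == '{':
			depth++
		case c == '}':
			depth--
			if depth == 0 {
				obj := page[start : j+1]
				if !json.Valid([]byte(obj)) {
					return "", errors.New("extracted text is not valid JSON")
				}
				return obj, nil
			}
		}
	}
	return "", errors.New("unbalanced braces")
}

func main() {
	// "var matchData =" is a hypothetical marker for illustration only.
	page := `<script>var matchData = {"home":{"name":"Everton"},"note":"a } in a string"};</script>`
	obj, err := extractJSONObject(page, "var matchData =")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(obj) // {"home":{"name":"Everton"},"note":"a } in a string"}
}
```

The returned string can then be passed to `json.Unmarshal` into a struct or a `map[string]any`.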
