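I've started a fun project to practice my data-scraping skills: I'm pulling data from the NHL API and trying to record the location coordinates of every shot and goal (the API exposes any NHL game, with coordinates and player information for every event that occurred in that game). However, I'm having trouble indexing into the data, and I'm really not sure how to approach it. Here is my code...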
import requests as rq
import csv

# Game ID indicates which game I want to look at: the first 4 digits are the year,
# the next 2 are the point in the season (01 Pre, 02 Reg, 03 Playoffs, 04 All Star)
GAME_ID = "2017021121"

# URL to access the coordinates of every event in the given game (comes in nested dictionary form)
url = f"https://statsapi.web.nhl.com/api/v1/game/{GAME_ID}/feed/live"
game = rq.get(url)

# Get the response body as text
contents = game.text

# Split the text into a list so we can work with it
contents_list = list(csv.reader(contents.splitlines()))

def main():
    # Open the output file for this game in append mode
    file = open(f'coordinates.{GAME_ID}.txt', 'a')
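What I want to do now is loop over the dataset with a for loop, check each event type, and, if it equals "Shot" or "Goal", add its x, y coordinates to a dictionary that gets written to a new file. I've tried working out the indexing on my own, but I'm not very experienced with data scraping, so I didn't get far. For reference, here is what the dataset looks like (or at least a snippet of it).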
} ],
"result" : {
  "event" : "Penalty",
  "eventCode" : "COL162",
  "eventTypeId" : "PENALTY",
  "description" : "Blake Coleman Tripping against Erik Johnson",
  "secondaryType" : "Tripping",
  "penaltySeverity" : "Minor",
  "penaltyMinutes" : 2
},
"about" : {
  "eventIdx" : 30,
  "eventId" : 162,
  "period" : 1,
  "periodType" : "REGULAR",
  "ordinalNum" : "1st",
  "periodTime" : "04:47",
  "periodTimeRemaining" : "15:13",
  "dateTime" : "2019-03-17T19:15:33Z",
  "goals" : {
    "away" : 0,
    "home" : 0
  }
},
"coordinates" : {
  "x" : -58.0,
  "y" : -37.0
},
To me it looks like a bunch of nested dictionaries, but I'm not too sure.
Any help would be greatly appreciated!! Thank you!!
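The snippet above is JSON, which Python parses into nested dictionaries and lists, so the guess about nested dictionaries looks right, and a JSON parser is likely a better fit than `csv.reader`. Below is a minimal sketch of the filtering step, not a definitive implementation. It assumes each event is a dictionary with `result`, `about`, and `coordinates` keys as in the snippet, and that the events sit in a list under `liveData` -> `plays` -> `allPlays`; that outer path is not shown in the excerpt, so treat it as an assumption to check against the real response. The names `SAMPLE_FEED`, `extract_coordinates`, and `save_coordinates` are hypothetical, and the sample values are made up for illustration.

```python
import json

# Hypothetical sample shaped like the snippet above. The outer
# "liveData" -> "plays" -> "allPlays" path is an assumption.
SAMPLE_FEED = {
    "liveData": {
        "plays": {
            "allPlays": [
                {
                    "result": {"event": "Penalty", "eventTypeId": "PENALTY"},
                    "about": {"eventIdx": 30, "period": 1},
                    "coordinates": {"x": -58.0, "y": -37.0},
                },
                {
                    "result": {"event": "Shot"},
                    "about": {"eventIdx": 31, "period": 1},
                    "coordinates": {"x": 61.0, "y": 12.0},
                },
                {
                    "result": {"event": "Goal"},
                    "about": {"eventIdx": 32, "period": 1},
                    "coordinates": {"x": 80.0, "y": -3.0},
                },
                {
                    # An event without coordinates, to show it being skipped.
                    "result": {"event": "Shot"},
                    "about": {"eventIdx": 33, "period": 1},
                    "coordinates": {},
                },
            ]
        }
    }
}

# Event names to keep; assumed to match the "event" field by analogy
# with "Penalty" in the snippet.
WANTED_EVENTS = {"Shot", "Goal"}


def extract_coordinates(feed, wanted=WANTED_EVENTS):
    """Return a list of {"event", "x", "y"} dicts for the wanted events."""
    # .get() with defaults avoids a KeyError if a level is missing.
    plays = feed.get("liveData", {}).get("plays", {}).get("allPlays", [])
    rows = []
    for play in plays:
        event = play.get("result", {}).get("event")
        coords = play.get("coordinates", {})
        # Keep only wanted events that actually carry both coordinates.
        if event in wanted and "x" in coords and "y" in coords:
            rows.append({"event": event, "x": coords["x"], "y": coords["y"]})
    return rows


def save_coordinates(rows, path):
    """Write the extracted rows to a file as JSON."""
    with open(path, "w") as fh:
        json.dump(rows, fh, indent=2)


rows = extract_coordinates(SAMPLE_FEED)
print(rows)
```

With a real response, something like `feed = rq.get(url).json()` would replace `SAMPLE_FEED`, and `save_coordinates(rows, f"coordinates.{GAME_ID}.txt")` would write the result to a file. Collecting the rows in a list rather than a single dictionary keeps one entry per event, since a dictionary keyed by event name would overwrite earlier shots.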