
Creating a pandas DataFrame from XML data

I am working with an XML data file that contains player tracking data recorded during a football match. Here is a snippet from the top of the XML file:


<?xml version="1.0" encoding="utf-8"?>
<Tracking update="2017-01-23T14:41:26">
  <Match id="2019285" dateMatch="2016-09-13T18:45:00" matchNumber="13">
    <Competition id="20159" name="UEFA Champions League 2016/2017" />
    <Stadium id="85265" name="Estádio do SL Benfica" pitchLength="10500" pitchWidth="6800" />
    <Phases>
      <Phase start="2016-09-13T18:45:35.245" end="2016-09-13T19:31:49.09" leftTeamID="50157" />
      <Phase start="2016-09-13T19:47:39.336" end="2016-09-13T20:37:10.591" leftTeamID="50147" />
    </Phases>
    <Frames>
      <Frame utc="2016-09-13T18:45:35.272" isBallInPlay="0">
        <Objs>
          <Obj type="7" id="0" x="-46" y="-2562" z="0" sampling="0" />
          <Obj type="0" id="105823" x="939" y="113" sampling="0" />
          <Obj type="0" id="250086090" x="1194" y="1425" sampling="0" />
          <Obj type="0" id="250080473" x="37" y="2875" sampling="0" />
          <Obj type="0" id="250054760" x="329" y="833" sampling="0" />
          <Obj type="1" id="98593" x="-978" y="654" sampling="0" />
          <Obj type="0" id="250075765" x="1724" y="392" sampling="0" />
          <Obj type="1" id="53733" x="-4702" y="45" sampling="0" />
          <Obj type="0" id="250101112" x="54" y="1436" sampling="0" />
          <Obj type="1" id="250017920" x="-46" y="-2562" sampling="0" />
          <Obj type="1" id="105588" x="-1449" y="209" sampling="0" />
          <Obj type="1" id="250003757" x="-2395" y="-308" sampling="0" />
          <Obj type="1" id="101473" x="-690" y="-644" sampling="0" />
          <Obj type="0" id="250075775" x="2069" y="-895" sampling="0" />
          <Obj type="1" id="103695" x="-1654" y="-2022" sampling="0" />
        </Objs>
      </Frame>
    </Frames>
  </Match>
</Tracking>


牛魔王的故事
1 Answer

呼唤远方

I used the xml.etree.ElementTree module to walk the XML and extract the relevant data. The comments in the code below explain each step; read through it, then experiment with the code. Hopefully it fits your use case.

import xml.etree.ElementTree as ET
from collections import defaultdict

import pandas as pd

d = defaultdict(list)

# When reading from a file, build the root with:
# root = ET.parse('filename.xml').getroot()
# Here the XML is held in a string named data, hence:
root = ET.fromstring(data)

# The required data is in the Frame elements
for ent in root.findall('./Match//Frame'):
    # this gets us the timestamp
    Frame = ent.attrib['utc']
    for entry in ent.findall('Objs/Obj'):
        # append the object's attributes under the relevant timestamp
        d[Frame].append(entry.attrib)

df = (pd.concat((pd.DataFrame(value)  # create a dataframe from the values
                 .assign(Frame=key)  # add the key as a Frame column
                 .filter(['id', 'Frame', 'x', 'y', 'z'])  # keep only the required columns
                 for key, value in d.items()),
                axis=1)  # concatenate along the columns axis
     )

df.head()

Because of axis=1, the columns of each frame are placed side by side, so the result has one block of id, Frame, x, y, z columns per timestamp:

          id                    Frame     x      y    z         id                    Frame     x      y    z
0          0  2016-09-13T18:45:35.272   -46  -2562    0          0  2016-09-13T18:45:35.319   -46  -2558    0
1     105823  2016-09-13T18:45:35.272   939    113  NaN     105823  2016-09-13T18:45:35.319   938    113  NaN
2  250086090  2016-09-13T18:45:35.272  1194   1425  NaN  250086090  2016-09-13T18:45:35.319  1198   1426  NaN
3  250080473  2016-09-13T18:45:35.272    37   2875  NaN  250080473  2016-09-13T18:45:35.319    36   2874  NaN
4  250054760  2016-09-13T18:45:35.272   329    833  NaN  250054760  2016-09-13T18:45:35.319   330    833  NaN
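The answer concatenates with axis=1, which gives a wide table with one block of columns per frame. If a long layout (one row per object per frame) is more convenient, something like the following sketch might work instead. It is not part of the original answer: the helper name frames_to_long_df and the shortened SAMPLE string are made up for illustration, and converting the attribute strings to numbers and timestamps is an assumption about what is wanted.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Shortened, hypothetical sample in the same shape as the question's file.
SAMPLE = """<Tracking update="2017-01-23T14:41:26">
  <Match id="2019285">
    <Frames>
      <Frame utc="2016-09-13T18:45:35.272" isBallInPlay="0">
        <Objs>
          <Obj type="7" id="0" x="-46" y="-2562" z="0" sampling="0" />
          <Obj type="0" id="105823" x="939" y="113" sampling="0" />
        </Objs>
      </Frame>
    </Frames>
  </Match>
</Tracking>"""


def frames_to_long_df(root):
    """Build one row per Obj element, tagged with its Frame timestamp."""
    rows = []
    for frame in root.findall('./Match//Frame'):
        for obj in frame.findall('Objs/Obj'):
            # Merge the frame timestamp with the object's own attributes.
            rows.append({'Frame': frame.attrib['utc'], **obj.attrib})

    # Selecting columns here also creates z as NaN where it is absent.
    df = pd.DataFrame(rows, columns=['Frame', 'id', 'type', 'x', 'y', 'z'])

    # XML attributes are strings; convert them (assumed to be desirable).
    df['Frame'] = pd.to_datetime(df['Frame'])
    for col in ['x', 'y', 'z']:
        df[col] = pd.to_numeric(df[col])
    return df


# For a file on disk, the root would come from ET.parse('filename.xml').getroot()
long_df = frames_to_long_df(ET.fromstring(SAMPLE))
print(long_df)
```

In this layout, filtering by timestamp or by object id is a plain boolean selection, and the table grows by rows rather than by columns as more frames are read.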