Extracting a subset of data from multi-level XML when variables share the same name

I have a large amount of XML data that looks like this (only a small part of the data is shown):


```xml
<weatherdata xmlns:xsi="http://www.website.com" xsi:noNamespaceSchemaLocation="www.website.com" created="2020-07-06T14:53:48Z">
  <meta>
    <model name="xxxxxx" termin="2020-07-06T06:00:00Z" runended="2020-07-06T09:48:31Z" nextrun="2020-07-06T16:00:00Z" from="2020-07-06T15:00:00Z" to="2020-07-08T12:00:00Z"/>
    <model name="xxxxxx" termin="2020-07-06T00:00:00Z" runended="2020-07-06T09:48:31Z" nextrun="2020-07-06T18:00:00Z" from="2020-07-08T13:00:00Z" to="2020-07-09T18:00:00Z"/>
    <model name="xxxxxx" termin="2020-07-06T00:00:00Z" runended="2020-07-06T09:48:31Z" nextrun="2020-07-06T18:00:00Z" from="2020-07-09T21:00:00Z" to="2020-07-12T00:00:00Z"/>
    <model name="xxxxxx" termin="2020-07-06T00:00:00Z" runended="2020-07-06T09:48:31Z" nextrun="2020-07-06T18:00:00Z" from="2020-07-12T06:00:00Z" to="2020-07-16T00:00:00Z"/>
  </meta>
  <product class="pointData">
    <time datatype="forecast" from="2020-07-06T15:00:00Z" to="2020-07-06T15:00:00Z">
     <location altitude="10" latitude="123" longitude="123">
      <temperature id="TTT" unit="celsius" value="18.8"/>
      <windDirection id="dd" deg="296.5" name="NW"/>
      <windSpeed id="ff" mps="5.8" beaufort="4" name="Laber bris"/>
      <globalRadiation value="524.2" unit="W/m^2"/>
      <humidity value="59.0" unit="percent"/>
      <pressure id="pr" unit="hPa" value="1022.9"/>
      <cloudiness id="NN" percent="22.7"/>
      <lowClouds id="LOW" percent="22.7"/>
      <mediumClouds id="MEDIUM" percent="0.0"/>
      <highClouds id="HIGH" percent="0.0"/>
      <dewpointTemperature id="TD" unit="celsius" value="10.6"/>
     </location>
    </time>
    <time datatype="forecast" from="2020-07-06T14:00:00Z" to="2020-07-06T15:00:00Z">
     <location altitude="10" latitude="123" longitude="123">
      <precipitation unit="mm" value="0.0" minvalue="0.0" maxvalue="0.0" probability="2.0"/>
      <symbol id="LightCloud" number="2"/>
     </location>
```



Asked by 扬帆大鱼

Answer by 蛊毒传说

Consider building the `temperature` data frames and the `precipitation` data frames separately. Then `concat` each set and `merge` the two results on the values they share in the `time` and `location` nodes. Also consider using list/dictionary comprehensions to bind all the attribute values together.

```python
import xml.etree.ElementTree as et
import pandas as pd

tree = et.parse('Input.xml')     # load in the data
root = tree.getroot()            # get the element tree root

temp_list = []
precip_list = []

for x in root.iter('time'):
    # GET LIST OF DICTIONARIES OF ALL ATTRIBUTES (keys prefixed with the tag name)
    x_list = [{i.tag + '_' + k: v for k, v in i.attrib.items()} for i in x.iter('*')]

    # COMBINE INTO SINGLE DICTIONARY
    x_dict = {k: v for d in x_list for k, v in d.items()}

    # BUILD ONE-ROW DATA FRAME
    df = pd.DataFrame(x_dict, index=[0])

    # SEPARATELY SAVE TO LIST OF DATA FRAMES
    if 'temperature_unit' in df.columns:
        temp_list.append(df)
    if 'precipitation_unit' in df.columns:
        precip_list.append(df)

# MERGE CONCATENATED SETS BY COMMON VARS
df = pd.merge(pd.concat(temp_list),
              pd.concat(precip_list),
              on=['time_to', 'time_datatype',
                  'location_altitude', 'location_latitude',
                  'location_longitude'],
              suffixes=('_t', '_p'))
```
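The approach can be tried end to end without an input file. Below is a minimal sketch that assumes a hypothetical, trimmed copy of the question's two `<time>` blocks held in a string. `SAMPLE`, `KEYS`, `flatten` and `build_table` are illustrative names and are not from the answer. The sketch keeps the answer's `tag_attribute` column naming, its split on `temperature_unit` / `precipitation_unit`, and its merge keys. It differs in one respect: it builds each DataFrame directly from a list of dicts instead of concatenating one-row frames.

```python
import xml.etree.ElementTree as et
import pandas as pd

# Hypothetical trimmed sample: the question's two <time> blocks, cut down
# to a few children and closed properly so that it parses.
SAMPLE = """<weatherdata>
  <product class="pointData">
    <time datatype="forecast" from="2020-07-06T15:00:00Z" to="2020-07-06T15:00:00Z">
      <location altitude="10" latitude="123" longitude="123">
        <temperature id="TTT" unit="celsius" value="18.8"/>
        <humidity value="59.0" unit="percent"/>
      </location>
    </time>
    <time datatype="forecast" from="2020-07-06T14:00:00Z" to="2020-07-06T15:00:00Z">
      <location altitude="10" latitude="123" longitude="123">
        <precipitation unit="mm" value="0.0" minvalue="0.0" maxvalue="0.0" probability="2.0"/>
        <symbol id="LightCloud" number="2"/>
      </location>
    </time>
  </product>
</weatherdata>"""

# Merge keys taken from the answer.
KEYS = ["time_to", "time_datatype",
        "location_altitude", "location_latitude", "location_longitude"]


def flatten(time_elem):
    """Flatten a <time> element and all its descendants into one dict.

    Keys are '<tag>_<attribute>', so identically named attributes such as
    'value' or 'unit' on different elements stay distinct.
    """
    return {f"{e.tag}_{k}": v
            for e in time_elem.iter()          # iter() includes time_elem itself
            for k, v in e.attrib.items()}


def build_table(root):
    temp_rows, precip_rows = [], []
    for t in root.iter("time"):
        row = flatten(t)
        if "temperature_unit" in row:
            temp_rows.append(row)
        if "precipitation_unit" in row:
            precip_rows.append(row)
    # One DataFrame per record type, one row per <time> block.
    temp = pd.DataFrame(temp_rows)
    precip = pd.DataFrame(precip_rows)
    # Shared non-key columns (here only time_from) get the _t / _p suffixes.
    return pd.merge(temp, precip, on=KEYS, suffixes=("_t", "_p"))


df = build_table(et.fromstring(SAMPLE))
print(df[["time_from_t", "time_from_p", "time_to",
          "temperature_value", "precipitation_value"]])
```

The temperature block is a single instant (`from` = `to` = 15:00), while the precipitation block spans 14:00 to 15:00. The merge therefore keys on `time_to`, and the two differing `from` values survive as `time_from_t` and `time_from_p`. All attribute values come out as strings. Use `pd.to_numeric` on the columns that should be numbers.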