Extract information from my XML files and assign a vector to it

3 Answers

德玛西亚99

Not everything in your question is clear... Here is the data extraction part:

    import xml.etree.ElementTree as ET

    xml = '''<?xml version='1.0' encoding='UTF-8'?>
    <arggraph id="micro_b002" topic_id="higher_dog_poo_fines" stance="pro">
      <edu id="e1"><![CDATA[One can hardly move in Friedrichshain or Neukölln these days without permanently scanning the ground for dog dirt.]]></edu>
      <edu id="e2"><![CDATA[And when bad luck does strike and you step into one of the many 'land mines' you have to painstakingly scrape the remains off your soles.]]></edu>
      <edu id="e3"><![CDATA[Higher fines are therefore the right measure against negligent, lazy or simply thoughtless dog owners.]]></edu>
      <edu id="e4"><![CDATA[Of course, first they'd actually need to be caught in the act by public order officers,]]></edu>
      <edu id="e5"><![CDATA[but once they have to dig into their pockets, their laziness will sure vanish!]]></edu>
      <adu id="a1" type="pro"/>
      <adu id="a2" type="pro"/>
      <adu id="a3" type="pro"/>
      <adu id="a4" type="opp"/>
      <adu id="a5" type="pro"/>
      <edge id="c6" src="e1" trg="a1" type="seg"/>
      <edge id="c7" src="e2" trg="a2" type="seg"/>
      <edge id="c8" src="e3" trg="a3" type="seg"/>
      <edge id="c9" src="e4" trg="a4" type="seg"/>
      <edge id="c10" src="e5" trg="a5" type="seg"/>
      <edge id="c1" src="a1" trg="a3" type="sup"/>
      <edge id="c2" src="a2" trg="a3" type="sup"/>
      <edge id="c4" src="a4" trg="a3" type="reb"/>
      <edge id="c5" src="a5" trg="c4" type="und"/>
    </arggraph>'''

    root = ET.fromstring(xml)
    # Keep the source of every edge that is not a plain segmentation ('seg') edge.
    interesting_edges_src = [e.attrib['src'] for e in root.findall('.//edge') if e.attrib['type'] != 'seg']
    print(interesting_edges_src)

Output:

    ['a1', 'a2', 'a4', 'a5']
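To see why a3 is absent from that output, it may help to print the target and type of each non-`seg` edge as well. The following is only a sketch on a trimmed copy of the same graph (the `edu` elements and most `seg` edges are left out for brevity); the names `xml_text` and `relations` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the graph above: ADUs, one 'seg' edge, and the argumentative edges.
xml_text = '''<arggraph id="micro_b002">
  <adu id="a1" type="pro"/>
  <adu id="a2" type="pro"/>
  <adu id="a3" type="pro"/>
  <adu id="a4" type="opp"/>
  <adu id="a5" type="pro"/>
  <edge id="c6" src="e1" trg="a1" type="seg"/>
  <edge id="c1" src="a1" trg="a3" type="sup"/>
  <edge id="c2" src="a2" trg="a3" type="sup"/>
  <edge id="c4" src="a4" trg="a3" type="reb"/>
  <edge id="c5" src="a5" trg="c4" type="und"/>
</arggraph>'''

root = ET.fromstring(xml_text)

# (source, target, type) for every edge that is not a segmentation edge.
relations = [
    (e.get('src'), e.get('trg'), e.get('type'))
    for e in root.iter('edge')
    if e.get('type') != 'seg'
]
for relation in relations:
    print(relation)
```

In this graph a3 only ever appears as a target (of two `sup` edges and one `reb` edge), never as a source, which is why it drops out of the list of sources.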


手掌心

This can be considered something close to a final answer:

    import os
    import xml.etree.ElementTree as ET

    myList = []
    myEdgesList = []
    # Read every XML file under `path` (the data directory, defined elsewhere).
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.xml'):
                with open(os.path.join(root, file), encoding="UTF-8") as content:
                    tree = ET.parse(content)
                    myList.append(tree)
    # For each parsed file, collect the sources of all non-'seg' edges.
    for k in myList:
        Edge = [e.attrib['src'] for e in k.findall('.//edge') if e.attrib['type'] != 'seg']
        myEdgesList.append(Edge)

This gives ['a1', 'a2', 'a4', 'a5'] for the example above, and the following list for all the other examples:

    [['a1', 'a2', 'a3', 'a4'],
     ['a1', 'a2', 'a4', 'a5'], ['a1', 'a2', 'a4', 'a5'],
     ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'],
     ['a1', 'a2', 'a4', 'a5'], ['a1', 'a2', 'a4', 'a5'], ['a1', 'a2', 'a4', 'a5'], ['a1', 'a2', 'a4', 'a5'],
     ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'],
     ['a1', 'a2', 'a4', 'a5'], ['a1', 'a2', 'a4', 'a5'], ['a1', 'a2', 'a4', 'a5'], ['a1', 'a2', 'a4', 'a5'], ['a1', 'a2', 'a4', 'a5'], ['a1', 'a2', 'a4', 'a5'], ['a1', 'a2', 'a4', 'a5'],
     ['a1', 'a2', 'a3', 'a4'], ['a1', 'a2', 'a3', 'a4'], ['a1', 'a2', 'a3', 'a4'], ['a1', 'a2', 'a3', 'a4'], ['a1', 'a2', 'a3', 'a4'], ['a1', 'a2', 'a3', 'a4'], ['a1', 'a2', 'a3', 'a4'], ['a1', 'a2', 'a3', 'a4'],
     ['a2', 'a3', 'a4'], ['a2', 'a3', 'a4'], ['a2', 'a3', 'a4'], ['a2', 'a3', 'a4'], ['a2', 'a3', 'a4'], ['a2', 'a3', 'a4'], ['a2', 'a3', 'a4'], ['a2', 'a3', 'a4'], ['a2', 'a3', 'a4'],
     ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'], ['a2', 'a3', 'a4', 'a5'],
     ['a2', 'a3'], ['a2', 'a3'], ['a2', 'a3'], ['a2', 'a3'], ['a2', 'a3'], ['a2', 'a3'], ['a2', 'a3'], ['a2', 'a3'], ['a2', 'a3'], ['a2', 'a3'], ['a2', 'a3'],
     ['a1', 'a2', 'a3'], ['a1', 'a2', 'a3'], ['a1', 'a2', 'a3'], ['a1', 'a2', 'a3'], ['a1', 'a2', 'a3'], ['a1', 'a2', 'a3'], ['a1', 'a2', 'a3'], ['a1', 'a2', 'a3'], ['a1', 'a2', 'a3'], ['a1', 'a2', 'a3'], ['a1', 'a2', 'a3'], ['a1', 'a2', 'a3'],
     ['a2', 'a3', 'a4', 'a5'],...

All that remains is to convert this list into:

    (0,0,0,0,1) <----- ['a1', 'a2', 'a3', 'a4']  # as a5 is missing
    (0,0,1,0,0) <------ ['a1', 'a2', 'a4', 'a5']  # as a3 is missing
    ...
    (0,0,1) <------- ['a2', 'a3']  # as a1 is missing

and so on. If you have any ideas, please let me know; I am working on it too.
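One possible way to get such a vector, sketched here under the assumption that each slot corresponds to one `adu` element in document order and that a 1 marks an ADU that is never the source of a non-`seg` edge. The function name `missing_source_vector` and the small three-ADU graph are hypothetical, made up for this illustration.

```python
import xml.etree.ElementTree as ET

def missing_source_vector(root):
    """Return one flag per <adu>: 1 if it is never a non-'seg' edge source, else 0."""
    adu_ids = [adu.get('id') for adu in root.iter('adu')]
    sources = {e.get('src') for e in root.iter('edge') if e.get('type') != 'seg'}
    return tuple(0 if adu_id in sources else 1 for adu_id in adu_ids)

# Same argumentative edges as the micro_b002 example (text segments omitted).
five = ET.fromstring('''<arggraph>
  <adu id="a1"/><adu id="a2"/><adu id="a3"/><adu id="a4"/><adu id="a5"/>
  <edge id="c1" src="a1" trg="a3" type="sup"/>
  <edge id="c2" src="a2" trg="a3" type="sup"/>
  <edge id="c4" src="a4" trg="a3" type="reb"/>
  <edge id="c5" src="a5" trg="c4" type="und"/>
</arggraph>''')

# Hypothetical three-ADU graph whose sources are ['a2', 'a3'].
three = ET.fromstring('''<arggraph>
  <adu id="a1"/><adu id="a2"/><adu id="a3"/>
  <edge id="c1" src="a2" trg="a1" type="sup"/>
  <edge id="c2" src="a3" trg="a1" type="sup"/>
</arggraph>''')

print(missing_source_vector(five))
print(missing_source_vector(three))
```

Under this positional reading the second graph yields (1, 0, 0), with the 1 in a1's slot, so the (0,0,1) written in the post for ['a2', 'a3'] would need a different ordering convention.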


牧羊人nacy

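For the next step:

    # Map each id string ('a1'..'a6') to its integer index.
    myEdgtlistmap = []
    for lst in myEdgesList:
        tp = []
        for el in lst:
            if el == "a1":
                tp.append(1)
            if el == "a2":
                tp.append(2)
            if el == "a3":
                tp.append(3)
            if el == "a4":
                tp.append(4)
            if el == "a5":
                tp.append(5)
            if el == "a6":
                tp.append(6)
        myEdgtlistmap.append(tp)

    # Build one vector per list: 1 at the position of the missing id, 0 elsewhere.
    # This assumes exactly one id is missing, so the vector has len(le) + 1 slots.
    label = []
    for le in myEdgtlistmap:
        b = [1] * (len(le) + 1)
        for v in le:
            b[v - 1] = 0
        label.append(b)

    # Flatten all vectors into a single list.
    y = [l for lab in label for l in lab]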
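The loop above sizes each vector as `len(le) + 1`, so a list such as ['a1', 'a4', 'a5'] (two ids missing) would raise an IndexError. A hedged variant that takes the node count explicitly is sketched below; `label_vector` and its `n_nodes` parameter are hypothetical names, and where the real node count comes from (for instance the number of `adu` elements per file) is an assumption.

```python
from itertools import chain

def label_vector(sources, n_nodes):
    """Return n_nodes flags: 1 where 'a<i>' is absent from sources, else 0."""
    present = {int(s[1:]) for s in sources}  # 'a3' -> 3
    return [0 if i in present else 1 for i in range(1, n_nodes + 1)]

# Two sample lists; the second has two ids missing out of five.
edge_lists = [['a1', 'a2', 'a3', 'a4'], ['a1', 'a4', 'a5']]
labels = [label_vector(src, n_nodes=5) for src in edge_lists]

# Flatten into a single list, as the answer does for y.
y = list(chain.from_iterable(labels))
print(labels)
print(y)
```

Parsing the index with `int(s[1:])` also avoids the chain of `if` comparisons, at the cost of assuming every id has the form `a` followed by digits.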
