Comparing two network edge lists

3 Answers

波斯汪

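If I understand your requirements correctly, all you need is:

    $ awk '
        { edge=($1>$2 ? $1 FS $2 : $2 FS $1) }
        NR==FNR{ file1[edge]; next }
        !(edge in file1)
    ' child.txt master.txt
    D    F

If you want to find the edges in the child that are not in the master, just flip the order of the input files:

    $ awk '
        { edge=($1>$2 ? $1 FS $2 : $2 FS $1) }
        NR==FNR{ file1[edge]; next }
        !(edge in file1)
    ' master.txt child.txt
    E    F

The above is very fast because each line only costs a hash lookup.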
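For readers more comfortable with Python, the same idea can be sketched as follows: put the two endpoints of every edge into a fixed order, store the edges of the first input in a hash set, then report the lines of the second input whose edge is not in that set. The function names and the sample edge lists are hypothetical, chosen only to match the outputs shown above; the question's actual files are not reproduced here.

```python
def canonical_edge(a, b):
    """Return the endpoints in a fixed order so (a, b) and (b, a) compare equal."""
    return (a, b) if a > b else (b, a)


def edges_missing_from(reference_lines, candidate_lines):
    """Yield the lines of candidate_lines whose edge does not occur in reference_lines."""
    seen = set()
    for line in reference_lines:
        fields = line.split()
        if len(fields) >= 2:
            seen.add(canonical_edge(fields[0], fields[1]))
    for line in candidate_lines:
        fields = line.split()
        if len(fields) >= 2 and canonical_edge(fields[0], fields[1]) not in seen:
            yield line.rstrip("\n")


# Hypothetical sample data standing in for child.txt and master.txt.
child = ["B A", "C B", "E F"]
master = ["A B", "B C", "D F"]

print(list(edges_missing_from(child, master)))  # edges in master but not in child
print(list(edges_missing_from(master, child)))  # edges in child but not in master
```

As in the awk version, swapping the two arguments swaps the direction of the comparison.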

慕神8447489

You may want to use a Python dict for fast lookups:

    child = {}
    with open('child.txt', 'r') as c:
        for line in c:
            p1, p2 = line.strip().split()
            # keep a set of neighbours per node, so a node that appears
            # in several edges does not overwrite its earlier entries
            child.setdefault(p1, set()).add(p2)
            child.setdefault(p2, set()).add(p1)
    with open('master.txt', 'r') as m:
        for line in m:
            p1, p2 = line.strip().split()
            if p2 in child.get(p1, ()):
                continue
            print(line, end='')  # line already ends with a newline

As for your code, you are reassigning loc_names to the pair ['E', 'F'], so on the next iteration of the outer loop the inner loop iterates over that pair and sets j to 'E':

    file1 = open("master.txt", "r")
    file2 = open("child.txt", "r")
    probe_id = file1.readlines()
    loc_names = file2.readlines()
    #flag=0
    for i in probe_id:
        i=i.rstrip()
        probe_info=i.split("\t")
        probe_info[0]=probe_info[0].strip()
        probe_info[1]=probe_info[1].strip()
        flag=0
        for j in loc_names: # j will be 'E' after second iteration of outer loop
            j=j.strip()
            loc_names=j.split("\t")
            loc_names[0]=loc_names[0].strip()
            loc_names[1]=loc_names[1].strip()  # loc_names is ['E', 'F']
            if (probe_info[0]==loc_names[0] and probe_info[1]==loc_names[1]) or (probe_info[0]==loc_names[1] and probe_info[1]==loc_names[0]):
                flag=1
            if flag==0:
                print i
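One possible repair of that nested loop, written for Python 3, is sketched below. It is not the asker's or the answerer's code: the function name, the variable names, and the choice to take lists of lines instead of file objects are assumptions made for illustration. It keeps the list of child lines under its own name, and it also defers the "not found" decision until the inner loop has finished, which the quoted code does not appear to do.

```python
def edges_only_in_master(master_lines, child_lines):
    """Return master edges with no match in child_lines, ignoring endpoint order."""
    missing = []
    for master_line in master_lines:
        probe_info = master_line.split()
        if len(probe_info) < 2:
            continue  # skip blank or malformed lines
        found = False
        for child_line in child_lines:
            # A separate name for the split pair leaves child_lines untouched.
            loc_pair = child_line.split()
            if len(loc_pair) < 2:
                continue
            same = probe_info[0] == loc_pair[0] and probe_info[1] == loc_pair[1]
            flipped = probe_info[0] == loc_pair[1] and probe_info[1] == loc_pair[0]
            if same or flipped:
                found = True
                break
        # Decide only after every child line has been checked.
        if not found:
            missing.append(master_line.strip())
    return missing


# Hypothetical tab-separated sample lines, as readlines() would return them.
master_lines = ["A\tB\n", "B\tC\n", "D\tF\n"]
child_lines = ["B\tA\n", "C\tB\n", "E\tF\n"]
print(edges_only_in_master(master_lines, child_lines))
```

This stays quadratic in the number of edges, so for large files the hash-based approaches in the other answers are the better fit.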

守着一只汪

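You can turn the items on each line into a frozenset and collect those into one set per file, so that set.difference efficiently gives you the edges in master.txt that are missing from child.txt:

    master = {frozenset(l.split()) for l in open("master.txt")}
    child = {frozenset(l.split()) for l in open("child.txt")}
    # join each edge on its own: str.join cannot take frozensets directly
    for edge in master - child:
        print(' '.join(sorted(edge)))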