How to find overlapping tuples in a list and return them

I currently have a list of tuples.


overlap_list = [(10001656, 10001717), (700, 60000), (10001657, 10001718), (10001657, 10001716), (10031548, 10031643), (10031556, 10031656)]

I want the following output:


new_list=[(10001656, 10001717),(10001657, 10001718),(10001657, 10001716),(10031548, 10031643), (10031556, 10031656)]

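The numbers inside each tuple are a start and an end boundary. I want to find every tuple whose range overlaps the range of at least one other tuple.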


I have tried this code, which I found in a similar question:


import itertools as ittools

def pairwise(iterable):
    # Yield consecutive pairs: (s0, s1), (s1, s2), (s2, s3), ...
    a, b = ittools.tee(iterable)
    next(b, None)
    return zip(a, b)

overlap_list = [(10001656, 10001717), (700, 60000), (10001657, 10001718), (10001657, 10001716), (10031548, 10031643), (10031556, 10031656)]

# Group consecutive pairs by whether the second tuple starts strictly inside the first
print([list(p) for k, p in ittools.groupby(pairwise(overlap_list), lambda x: x[0][0] < x[1][0] < x[0][1]) if k])

But this gives:


[[((10031548, 10031643), (10031556, 10031656))]]

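I have looked at different solutions, but the problem I keep running into is that indexing by the previous position does not seem to work.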


How can I get the desired output? Any help would be much appreciated.


慕盖茨4494581
2 answers

ibeautiful

To be honest, I don't really understand your code or the idea behind it, so I can't tell you why the result contains only a subset of the tuples you want. However, I have a different approach that you might find interesting.

The main idea is to have a function that tests whether two tuples overlap. This function is applied to every pair of tuples in overlap_list. If two tuples overlap, both are added to a result list. That list will then contain duplicates, so list(set(result)) is applied at the end. You could also drop the cast to a list, since a set would work just as well...

The idea of the test function is simply to sort the 4 values of the two tuples being tested and look at the sort order (see numpy.argsort). If the first two indices are 0/1 or 2/3, the two tuples do not overlap. In other words: when the first two indices are tested for being > 1, the two results must be unequal, i.e. they must not both be True or both be False:

import numpy as np

def overlap_test(tpl1, tpl2):
    # tpl1 + tpl2 concatenates the tuples into 4 values (indices 0/1 and 2/3)
    a, b = np.argsort(tpl1 + tpl2)[:2] > 1
    return a != b

Here is the loop that uses the function:

import itertools as it

result = []
for test_tpl, sec_tpl in list(it.combinations(overlap_list, 2)):
    if overlap_test(test_tpl, sec_tpl):
        result.extend([test_tpl, sec_tpl])
result = list(set(result))
# [(10001657, 10001718),
#  (10031556, 10031656),
#  (10031548, 10031643),
#  (10001657, 10001716),
#  (10001656, 10001717)]

I still wonder whether the loop could be more efficient, and whether the need for the set could be optimized away. Well, maybe you will find a better loop.

Edit: So far I haven't really found anything different, but here is a small improvement. It is the same approach, but it uses a set from the start:

def find_overlap_tuples_0(tpl_list):
    result = set()
    for test_tpl, sec_tpl in list(it.combinations(tpl_list, 2)):
        if overlap_test(test_tpl, sec_tpl):
            result.add(test_tpl)
            result.add(sec_tpl)
    return list(result)

# %timeit find_overlap_tuples_0(overlap_list)
# 178 µs ± 4.87 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

A slightly different one, based only on permutations and groupby (it seems to be slightly faster):

def find_overlap_tuples_1(tpl_list):
    result = set()
    no_ovl = set()
    for a, grp in it.groupby(it.permutations(tpl_list, 2), lambda x: x[0]):
        for b in grp:
            if (a not in result) and (b[1] not in no_ovl):
                if overlap_test(*b):
                    result.add(b[0])
                    result.add(b[1])
                    break
                no_ovl.add(b[0])
    return list(result)

# %timeit find_overlap_tuples_1(overlap_list)
# 139 µs ± 1.59 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Note that the no_ovl shortcut records a tuple as non-overlapping after a single failed test, so on other inputs this version can miss a tuple. For example, with [(0, 10), (100, 110), (1, 2), (8, 9)] it does not report (8, 9), while find_overlap_tuples_0 does.
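As a cross-check for the functions above, here is a minimal sketch of the same all-pairs idea without NumPy. It assumes each tuple is a closed (start, end) interval with start <= end and that touching endpoints count as an overlap. The names `intervals_overlap` and `find_overlaps_bruteforce` are illustrative and not part of the original answer.

```python
from itertools import combinations

def intervals_overlap(a, b):
    """True if closed intervals a and b share at least one point (assumes start <= end)."""
    return a[0] <= b[1] and b[0] <= a[1]

def find_overlaps_bruteforce(intervals):
    """Return every tuple that overlaps at least one other tuple, in input order."""
    hits = set()
    for a, b in combinations(intervals, 2):
        if intervals_overlap(a, b):
            hits.update((a, b))
    # Filter the input instead of returning the set, so the original order is kept
    return [t for t in intervals if t in hits]

overlap_list = [(10001656, 10001717), (700, 60000), (10001657, 10001718),
                (10001657, 10001716), (10031548, 10031643), (10031556, 10031656)]
print(find_overlaps_bruteforce(overlap_list))
# [(10001656, 10001717), (10001657, 10001718), (10001657, 10001716),
#  (10031548, 10031643), (10031556, 10031656)]
```

Like the combinations-based loop, this tests every pair, so the work grows quadratically with the list length. It may still be useful as a reference result when trying faster variants.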

汪汪一只猫

It seems you can sort the list so that any overlapping starts and stops end up adjacent, and then compare only neighbours to decide whether a tuple should be filtered out because it does not overlap. (The sort at the end of the code is not needed; it just makes the overlapping neighbours easier to see in the printout.)

l = [(10001656, 10001717), (700, 60000), (10001657, 10001718), (10001657, 10001716), (10031548, 10031643), (10031556, 10031656)]
l.sort()
overlap = set()
for a, b in zip(l, l[1:]):
    # After sorting, b starts no earlier than a, so the two overlap
    # exactly when b starts at or before the end of a
    if b[0] <= a[1]:
        overlap.add(a)
        overlap.add(b)
overlap = sorted(overlap)
print(overlap)
# [(10001656, 10001717), (10001657, 10001716), (10001657, 10001718), (10031548, 10031643), (10031556, 10031656)]
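Comparing only adjacent neighbours after sorting can miss a tuple when one long interval spans several later ones, for example (0, 100), (1, 2), (50, 60), where (50, 60) overlaps (0, 100) but not its neighbour (1, 2). One possible variation, sketched here under the assumption that every tuple has start <= end and that touching endpoints count as an overlap, is to remember the interval that reaches furthest to the right. The name `find_overlaps_sorted` is made up for this illustration.

```python
def find_overlaps_sorted(intervals):
    """Return tuples that overlap at least one other tuple, sorted by start."""
    hits = set()
    reach = None  # the interval with the largest end seen so far
    for cur in sorted(intervals):
        # cur starts no earlier than anything before it, so it overlaps an
        # earlier interval exactly when it starts at or before the furthest end
        if reach is not None and cur[0] <= reach[1]:
            hits.add(reach)
            hits.add(cur)
        if reach is None or cur[1] > reach[1]:
            reach = cur
    return sorted(hits)

l = [(10001656, 10001717), (700, 60000), (10001657, 10001718),
     (10001657, 10001716), (10031548, 10031643), (10031556, 10031656)]
print(find_overlaps_sorted(l))
# [(10001656, 10001717), (10001657, 10001716), (10001657, 10001718),
#  (10031548, 10031643), (10031556, 10031656)]
```

After the sort this makes a single pass over the list, so the sort dominates the cost. Note that it returns the tuples in sorted order rather than in the original input order.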