Removing substrings from superstrings

I have a dictionary whose keys are tuples of strings and whose values are their frequencies, for example


 {('this','is'):2,('some','word'):3....}

I need to eliminate some keys that are sub-keys of other keys, for example:


d={('large','blue'):4,('cute','blue'):3,('large','blue','dog'):2,
   ('cute','blue','dog'):2,('cute','blue','elephant'):1}

I need to eliminate ('large','blue') because it only appears in 'large blue dog', but I cannot delete ('cute','blue') because it appears in both 'cute blue dog' and 'cute blue elephant'.
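A minimal sketch of the rule as stated, assuming "contained in" means the shorter tuple appears as a contiguous, in-order run of words inside the longer one (so ('large','blue') is inside ('large','blue','dog'), but ('blue','large') is not). The helper names `is_sub_key` and `remove_single_parent_keys` are hypothetical, chosen for illustration:

```python
def is_sub_key(short, longer):
    """Return True if tuple `short` occurs as a contiguous run inside tuple `longer`."""
    n = len(short)
    # slide a window of len(short) words over `longer`; empty range if `longer` is shorter
    return any(longer[i:i + n] == short for i in range(len(longer) - n + 1))


def remove_single_parent_keys(d):
    """Drop every key that is a proper sub-key of exactly one other key."""
    result = {}
    for key, count in d.items():
        # hypothetical rule: count the other keys that contain `key`
        parents = [other for other in d if other != key and is_sub_key(key, other)]
        if len(parents) != 1:
            result[key] = count
    return result


d = {('large', 'blue'): 4, ('cute', 'blue'): 3, ('large', 'blue', 'dog'): 2,
     ('cute', 'blue', 'dog'): 2, ('cute', 'blue', 'elephant'): 1}

print(remove_single_parent_keys(d))
# {('cute', 'blue'): 3, ('large', 'blue', 'dog'): 2, ('cute', 'blue', 'dog'): 2, ('cute', 'blue', 'elephant'): 1}
```

This compares every pair of keys, which is simple but quadratic in the number of keys.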


d={('large','blue'):4,('cute','blue'):3,('large','blue','dog'):2,
   ('cute','blue','dog'):2,('cute','blue','elephant'):1}

final_list=[]
for k,v in d.items():
    final_list.append(' '.join(f for f in k))

final_list=sorted(final_list, key=len,reverse=True)

completed=set()
for f in final_list:
    if not completed:
        completed.add(f)
    else:
        if sum(f in s for s in completed)==1:
            continue

print(final_list)
print(completed)

But this only gives me ['cute blue elephant'], and what I need is


[large blue dog]: 2
[cute blue dog]: 2
[cute blue elephant]: 1
[cute blue]: 3
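Once the filtered dict exists, the `[words]: count` listing above is just a join over each tuple. A small sketch, using a hypothetical `format_counts` helper and the desired filtered result hardcoded for illustration:

```python
def format_counts(counts):
    """Render {('cute', 'blue'): 3, ...} as lines like '[cute blue]: 3'."""
    return [f"[{' '.join(key)}]: {count}" for key, count in counts.items()]


# hardcoded example input: the entries that should survive the filtering
filtered = {('large', 'blue', 'dog'): 2, ('cute', 'blue', 'dog'): 2,
            ('cute', 'blue', 'elephant'): 1, ('cute', 'blue'): 3}

for line in format_counts(filtered):
    print(line)
```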


慕田峪4524236

慕容3067478

Update: if you also want the counts, I would rather rewrite most of the code as:

d={('large','blue'):4,('cute','blue'):3,('large','blue','dog'):2,
   ('cute','blue','dog'):2,('cute','blue','elephant'):1}

completed = {}
for k,v in d.items():
    # keep k unless exactly one other key contains all of k's words
    if len([k1 for k1,v1 in d.items() if k != k1 and set(k).issubset(set(k1))]) != 1:
        completed[k] = v

print(completed)

Result:

{('cute', 'blue'): 3, ('large', 'blue', 'dog'): 2, ('cute', 'blue', 'dog'): 2, ('cute', 'blue', 'elephant'): 1}

I haven't checked the performance; I'll leave that to you.

--

How about replacing

for f in final_list:
    if not completed:
        completed.add(f)
    else:
        if sum(f in s for s in completed)==1:
            continue

with

for f in final_list:
    # add f unless it is a substring of exactly one other entry
    if len([x for x in final_list if f != x and f in x]) != 1:
        completed.add(f)

Is this what you want?
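One caveat about the `set(k).issubset(set(k1))` test: it ignores word order and adjacency, so ('blue', 'cute') or ('cute', 'dog') would also count as part of ('cute', 'blue', 'dog'). If the keys are phrases where order matters, a stricter check is to join the words and pad with spaces before the substring test, which keeps matches on word boundaries. A small comparison, assuming individual words never contain spaces; `words_in_order` is a hypothetical name:

```python
def words_in_order(short, longer):
    """In-order, word-boundary check: pad with spaces so 'blue' does not match 'bluebird'.

    Assumes individual words never contain spaces.
    """
    return f" {' '.join(short)} " in f" {' '.join(longer)} "


key = ('cute', 'blue', 'dog')
for candidate in [('cute', 'blue'), ('blue', 'cute'), ('cute', 'dog')]:
    # set-based test from the answer vs. the order-preserving check
    print(candidate, set(candidate).issubset(set(key)), words_in_order(candidate, key))
# ('cute', 'blue') True True
# ('blue', 'cute') True False
# ('cute', 'dog') True False
```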

暮色呼如

This should work:

previous = " "
previousCount = 0
# Sorting places each string directly before all strings that start with it;
# the trailing " " sentinel flushes the last group.
for words in sorted([" ".join(key) for key in d]) + [" "]:
    if words.startswith(previous):
        previousCount += 1
    else:
        print(previous, previousCount)  # debug output
        # delete a key only if exactly one other key extends it
        if previousCount == 1 and previous != " ":
            del d[tuple(previous.split(" "))]
        previous = words
        previousCount = 0
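Two details of this approach are worth checking before relying on it. `str.startswith` compares characters, not words, so 'cute bluebird' would count as an extension of 'cute blue'; and because `previous` only advances on a mismatch, a key that itself extends `previous` (for example 'cute blue dog' here) is never tested as a deletion candidate. A minimal prefix-only sketch that compares whole words on tuples and tests every key; `drop_single_prefix_keys` is a hypothetical name:

```python
def drop_single_prefix_keys(d):
    """Remove each key that is a word-level prefix of exactly one other key.

    Prefix-only, like the answer above: ('blue', 'dog') is not treated as
    part of ('large', 'blue', 'dog').
    """
    keys = sorted(d)  # tuples sort word by word, so a key's extensions follow it directly
    result = dict(d)
    for i, key in enumerate(keys):
        j = i + 1
        # walk the contiguous run of keys that start with all of key's words
        while j < len(keys) and keys[j][:len(key)] == key:
            j += 1
        if j - i - 1 == 1:
            del result[key]
    return result


d = {('large', 'blue'): 4, ('cute', 'blue'): 3, ('large', 'blue', 'dog'): 2,
     ('cute', 'blue', 'dog'): 2, ('cute', 'blue', 'elephant'): 1}
print(drop_single_prefix_keys(d))
# {('cute', 'blue'): 3, ('large', 'blue', 'dog'): 2, ('cute', 'blue', 'dog'): 2, ('cute', 'blue', 'elephant'): 1}

# whole-word comparison: ('cute', 'bluebird') does not extend ('cute', 'blue')
print(drop_single_prefix_keys({('cute', 'blue'): 1, ('cute', 'bluebird'): 1}))
# {('cute', 'blue'): 1, ('cute', 'bluebird'): 1}
```

Sorting tuples groups every extension of a key immediately after it, so each key only scans its own group instead of the whole list.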

江户川乱折腾

There must be a more efficient (non-O(n^2)) way to do this, but this seems to be what you want:

from pprint import pprint

d = {
    ('large','blue'): 4,
    ('cute','blue'): 3,
    ('large','blue','dog'): 2,
    ('cute','blue','dog'): 2,
    ('cute','blue','elephant'): 1,
}

keys = set(' '.join(k) for k in d)
# keys (back as tuples) that occur inside exactly one other key
filtered = {
    tuple(f.split())
    for f in keys
    if sum(f != k and f in k for k in keys) == 1
}
result = {k: v for k, v in d.items() if k not in filtered}

pprint(sorted(result.items()))

Result:

[(('cute', 'blue'), 3),
 (('cute', 'blue', 'dog'), 2),
 (('cute', 'blue', 'elephant'), 1),
 (('large', 'blue', 'dog'), 2)]

Following your requirement, the idea is to identify the keys that occur exactly once as part of another key.
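On the efficiency remark: one way to avoid comparing every pair of keys is to enumerate each key's contiguous proper sub-tuples once and count, with `collections.Counter`, how many keys contain each one; a key is then dropped when its count is exactly 1. This is a sketch under the assumption that "part of" means a contiguous run of whole words (so it also catches middle runs such as ('blue', 'dog')); `drop_single_parent_keys` is a hypothetical name. The work grows with the number of keys times the number of sub-tuples per key (quadratic in key length), so it helps most when there are many short keys.

```python
from collections import Counter


def drop_single_parent_keys(d):
    """Remove keys that occur as a proper contiguous sub-key of exactly one other key."""
    parents = Counter()
    for key in d:
        # distinct proper sub-tuples, so one key counts at most once per sub-key
        subs = {key[i:j]
                for i in range(len(key))
                for j in range(i + 1, len(key) + 1)
                if j - i < len(key)}
        parents.update(subs)
    # parents[k] is the number of other keys that contain k
    return {k: v for k, v in d.items() if parents[k] != 1}


d = {('large', 'blue'): 4, ('cute', 'blue'): 3, ('large', 'blue', 'dog'): 2,
     ('cute', 'blue', 'dog'): 2, ('cute', 'blue', 'elephant'): 1}
print(drop_single_parent_keys(d))
# {('cute', 'blue'): 3, ('large', 'blue', 'dog'): 2, ('cute', 'blue', 'dog'): 2, ('cute', 'blue', 'elephant'): 1}
```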