HUH函数
For some reason, people often ask how to do this without defaultdict:

>>> text = "I say what I mean. I mean what I say. i do."
>>> sentences = text.lower().split('.')
>>> dic = {}
>>> for i, sen in enumerate(sentences):
...     for word in sen.split():
...         if word not in dic:    # you just need these
...             dic[word] = set()  # two extra lines
...         dic[word].add(i)
...
>>> dic
{'i': set([0, 1, 2]), 'do': set([2]), 'say': set([0, 1]), 'what': set([0, 1]), 'mean': set([0, 1])}

If you really want lists, this modification does it:

>>> text = "I say what I mean. I mean what I say. i do."
>>> sentences = text.lower().split('.')
>>> dic = {}
>>> for i, sen in enumerate(sentences):
...     for word in sen.split():
...         if word not in dic:
...             dic[word] = [i]
...         elif dic[word][-1] != i:  # this prevents duplicate entries
...             dic[word].append(i)
...
>>> dic
{'i': [0, 1, 2], 'do': [2], 'say': [0, 1], 'what': [0, 1], 'mean': [0, 1]}

If you are not even allowed to use enumerate:

>>> text = "I say what I mean. I mean what I say. i do."
>>> sentences = text.lower().split('.')
>>> dic = {}
>>> i = -1
>>> for sen in sentences:
...     i += 1
...     for word in sen.split():
...         if word not in dic:
...             dic[word] = [i]
...         elif dic[word][-1] != i:  # this prevents duplicate entries
...             dic[word].append(i)
...
>>> dic
{'i': [0, 1, 2], 'do': [2], 'say': [0, 1], 'what': [0, 1], 'mean': [0, 1]}
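The list variant only compares against the last stored index. That is enough because sentence indices are visited in increasing order, so a repeat within the same sentence can only match the most recent entry. Here is a minimal Python 3 sketch of the same loop wrapped in a function; the name `index_words` is made up for illustration and is not part of the answer.

```python
def index_words(text):
    """Map each lower-cased word to the list of sentence indices it occurs in."""
    index = {}
    for i, sentence in enumerate(text.lower().split('.')):
        for word in sentence.split():
            if word not in index:
                index[word] = [i]
            elif index[word][-1] != i:
                # Indices only grow, so a duplicate can only be the last entry.
                index[word].append(i)
    return index


result = index_words("I say what I mean. I mean what I say. i do.")
print(result['i'])     # [0, 1, 2]
print(result['mean'])  # [0, 1]
```

The trailing empty string produced by `split('.')` is harmless here, since calling `split()` on it yields no words.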
暮色呼如
You can use collections.defaultdict here:

>>> from collections import defaultdict
>>> text = "I say what I mean. I mean what I say. i do."
>>> # convert the text to lower-case and split at '.' to get the sentences
>>> sentences = text.lower().split('.')
>>> dic = defaultdict(set)  # sets contain only unique items
>>> for i, sen in enumerate(sentences):  # use enumerate to get the sentence as well as its index
...     for word in sen.split():  # split the sentence at white-space to get words
...         dic[word].add(i)
...
>>> dic
defaultdict(<type 'set'>, {'i': set([0, 1, 2]), 'do': set([2]), 'say': set([0, 1]), 'what': set([0, 1]), 'mean': set([0, 1])})

Using a plain dict:

>>> dic = {}
>>> for i, sen in enumerate(sentences):
...     for word in sen.split():
...         dic.setdefault(word, set()).add(i)
...
>>> dic
{'i': set([0, 1, 2]), 'do': set([2]), 'say': set([0, 1]), 'what': set([0, 1]), 'mean': set([0, 1])}

Without enumerate:

>>> dic = {}
>>> index = 0
>>> for sen in sentences:
...     for word in sen.split():
...         dic.setdefault(word, set()).add(index)
...     index += 1
...
>>> dic
{'i': set([0, 1, 2]), 'do': set([2]), 'say': set([0, 1]), 'what': set([0, 1]), 'mean': set([0, 1])}
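For comparison, here is a Python 3 sketch that runs the `defaultdict` and `setdefault` approaches side by side and checks that they build the same mapping. The helper names `with_defaultdict` and `with_setdefault` are invented for this example. One difference worth knowing: `setdefault(word, set())` builds a throwaway `set()` on every call, while `defaultdict` only calls its factory for missing keys.

```python
from collections import defaultdict

TEXT = "I say what I mean. I mean what I say. i do."


def with_defaultdict(text):
    index = defaultdict(set)
    for i, sentence in enumerate(text.lower().split('.')):
        for word in sentence.split():
            index[word].add(i)  # missing keys get an empty set automatically
    return dict(index)  # convert to a plain dict for comparison


def with_setdefault(text):
    index = {}
    for i, sentence in enumerate(text.lower().split('.')):
        for word in sentence.split():
            index.setdefault(word, set()).add(i)
    return index


a = with_defaultdict(TEXT)
b = with_setdefault(TEXT)
print(a == b)          # True
print(sorted(a['i']))  # [0, 1, 2]
```

In Python 3 the sets print as `{0, 1, 2}` rather than `set([0, 1, 2])`, and `defaultdict` shows `<class 'set'>` instead of `<type 'set'>`.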