Generate unique IDs for a list containing words

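You have two errors. First, you have a simple typo, here:

    for word1, word2 in labels:
        ids.append([word_to_id[word1], word_to_id[word1]])

You are adding the id for word1 twice there. Correct the second word1 to look up word2 instead.

Next, you are not testing whether you have seen a word before, so for 'Kleiber' you first assign it the id 4, then overwrite that entry with 6 on the next iteration. You need to number the unique words, not every word:

    counter = 0
    for word in vocabulary:
        # only assign an id the first time a word is seen
        if word not in word_to_id:
            word_to_id[word] = counter
            counter += 1

Alternatively, you could simply not add a word to vocabulary if you already have it listed.

You don't really need a separate vocabulary list here, by the way. A separate loop doesn't buy you anything, so the following works too:

    word_to_id = {}
    counter = 0
    for words in labels:
        for word in words:
            # the same first-seen check is still needed here
            if word not in word_to_id:
                word_to_id[word] = counter
                counter += 1

You can simplify your code quite a bit by using a defaultdict object with itertools.count() supplying the default values:

    from collections import defaultdict
    from itertools import count

    def words_to_ids(labels):
        word_ids = defaultdict(count().__next__)
        return [[word_ids[w1], word_ids[w2]] for w1, w2 in labels]

The count() object gives you the next integer in the series each time its __next__ method is called, and defaultdict() calls that method each time you try to access a key that doesn't yet exist in the dictionary. Together, they ensure a unique id for each unique word.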
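To see that the explicit loop and the defaultdict version agree, here is a minimal sketch that runs both on the same input. The `labels` data and the name `words_to_ids_loop` are made up for illustration; they are not taken from the question.

```python
from collections import defaultdict
from itertools import count


def words_to_ids_loop(labels):
    """Explicit version: give a word the next free id the first time it is seen."""
    word_to_id = {}
    counter = 0
    for words in labels:
        for word in words:
            if word not in word_to_id:
                word_to_id[word] = counter
                counter += 1
    return [[word_to_id[w1], word_to_id[w2]] for w1, w2 in labels]


def words_to_ids(labels):
    """defaultdict version: a missing key triggers count().__next__ for its id."""
    word_ids = defaultdict(count().__next__)
    return [[word_ids[w1], word_ids[w2]] for w1, w2 in labels]


# Hypothetical sample data: each label is a pair of words, with repeats.
labels = [("Carlos", "Kleiber"), ("Kleiber", "Erich"), ("Erich", "Carlos")]

print(words_to_ids_loop(labels))  # [[0, 1], [1, 2], [2, 0]]
print(words_to_ids(labels))       # [[0, 1], [1, 2], [2, 0]]
```

Both versions number words in the order they first appear, so a repeated word such as 'Kleiber' keeps the id it got the first time.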
