I have a pandas DataFrame (test3) like this:
   manufacturer  cylinders     description
0  toyota        5 cylinders   toyota, gmc 10 years old.
1  NaN           NaN           gmc, Motor runs and drives good.
2  NaN           NaN           Motor old, in pieces. 4 cylinders
3  NaN           12 cylinders  2 owner 0 rust. Cadillac.
And these sets of keywords:
manufacturer = ['gmc', 'toyota', 'cadillac']
cylinders = ['12 cylinders', '4 cylinders', '5 cylinders']
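I want to write a program that reads each description and fills in the matching column whenever one of the keywords appears. Ideally, the result would look like this: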
   manufacturer  cylinders     description
0  toyota        5 cylinders   toyota, gmc 10 years old.
1  gmc           NaN           gmc, Motor runs and drives good.
2  NaN           4 cylinders   Motor old, in pieces. 4 cylinders
3  cadillac      12 cylinders  2 owner 0 rust. Cadillac.
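I have tried everything I can think of, but nothing seems to work. Below is my attempt at filling a single column. I need to extend it to multiple columns, and it also overwrites values that are not NaN (e.g. it changes "toyota" to "gmc" in row 0), which I don't want.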
import re

keyword = ['gmc', 'toyota', 'cadillac']
bag_of_words = []
for i, description in enumerate(test3['description']):
    bag_of_words = re.findall(r"""[A-Za-z\-]+""", test3["description"][i])
    for word in bag_of_words:
        if word.lower() in keyword:
            test3.loc[i, 'manufacturer'] = word.lower()
Any idea how to solve this? Thanks.
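One possible approach, as a minimal sketch rather than a definitive answer: build one regular expression per column from its keyword list, pull the first match out of each description with `Series.str.extract`, and use `fillna` so that only the NaN cells are filled. The `keywords` dictionary and the way `test3` is rebuilt here are assumptions made to keep the example self-contained; they mirror the data shown in the question.

```python
import re

import numpy as np
import pandas as pd

# Sample frame rebuilt from the question (assumed to match the real test3).
test3 = pd.DataFrame({
    "manufacturer": ["toyota", np.nan, np.nan, np.nan],
    "cylinders": ["5 cylinders", np.nan, np.nan, "12 cylinders"],
    "description": [
        "toyota, gmc 10 years old.",
        "gmc, Motor runs and drives good.",
        "Motor old, in pieces. 4 cylinders",
        "2 owner 0 rust. Cadillac.",
    ],
})

# Hypothetical mapping: target column -> keywords to look for.
keywords = {
    "manufacturer": ["gmc", "toyota", "cadillac"],
    "cylinders": ["12 cylinders", "4 cylinders", "5 cylinders"],
}

for column, words in keywords.items():
    # Word boundaries keep "4 cylinders" from matching inside "14 cylinders".
    pattern = r"\b(" + "|".join(re.escape(w) for w in words) + r")\b"
    # First keyword found in each description, lower-cased; NaN if none.
    found = (
        test3["description"]
        .str.extract(pattern, flags=re.IGNORECASE, expand=False)
        .str.lower()
    )
    # fillna only touches missing cells, so existing values are kept.
    test3[column] = test3[column].fillna(found)

print(test3)
```

Because `fillna` leaves existing values alone, row 0 keeps "toyota" even though "gmc" also appears in its description. When a row has NaN and several keywords match, this sketch takes the one that occurs first in the text; a different tie-breaking rule would need extra logic.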