慕工程0101907
Using Grzegorz Skibinski's setup:

    import pandas as pd

    df = pd.DataFrame({
        "review_trimmed": [
            "dog and cat",
            "Cat chases mouse",
            "horrible thing",
            "noodle soup",
            "chilli",
            "pizza is Good",
        ]
    })
    searchfor = "yes cat Dog soup good bad horrible".split()

    df

         review_trimmed
    0       dog and cat
    1  Cat chases mouse
    2    horrible thing
    3       noodle soup
    4            chilli
    5     pizza is Good

_______________________________________________________

Solution (pandas.Series.str.findall)

Use '|'.join to combine all the search terms into a single regular-expression string that matches any one of them. The second argument is flags; the value 2 is re.IGNORECASE, so matching is case-insensitive.

    # build "yes|cat|Dog|soup|good|bad|horrible" and collect every match per row
    df.review_trimmed.str.findall('|'.join(searchfor), 2)

    0    [dog, cat]
    1         [Cat]
    2    [horrible]
    3        [soup]
    4            []
    5        [Good]
    Name: review_trimmed, dtype: object

We can then join each list with ';':

    df.review_trimmed.str.findall('|'.join(searchfor), 2).str.join(';')

    0     dog;cat
    1         Cat
    2    horrible
    3        soup
    4
    5        Good
    Name: review_trimmed, dtype: object
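pandas documents str.findall as equivalent to applying re.findall to every element, so the behaviour of the joined pattern can be checked with the standard library alone. The sketch below is only an illustration, not part of the original answer: the names `loose` and `strict` and the extra test string "category" are hypothetical. It shows that a plain alternation also matches inside longer words, and one possible way to avoid that by escaping each term and adding word boundaries.

```python
import re

reviews = [
    "dog and cat",
    "Cat chases mouse",
    "horrible thing",
    "noodle soup",
    "chilli",
    "pizza is Good",
]
searchfor = "yes cat Dog soup good bad horrible".split()

# Plain alternation, as in the answer above: matches a term anywhere,
# including inside a longer word.
loose = re.compile("|".join(searchfor), flags=re.IGNORECASE)

# Assumed stricter variant (hypothetical): escape regex metacharacters in
# each term and require word boundaries on both sides.
strict = re.compile(
    r"\b(?:" + "|".join(map(re.escape, searchfor)) + r")\b",
    flags=re.IGNORECASE,
)

loose_hits = [loose.findall(text) for text in reviews]
strict_hits = [strict.findall(text) for text in reviews]

# "category" is an illustrative string that is not in the original data.
loose_in_longer_word = loose.findall("category")
strict_in_longer_word = strict.findall("category")
```

If the stricter behaviour is wanted in pandas, `strict.pattern` can be passed to `df.review_trimmed.str.findall` together with `flags=re.IGNORECASE`.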
ITMISS
Using numpy:

    # lower-case the search terms and turn them into a set
    searchfor = [wrd.lower() for wrd in searchfor]
    searchfor = set(searchfor)

    # split each review into a set of lower-cased words, then intersect with searchfor
    df["found"] = np.bitwise_and(df["review_trimmed"].str.lower().str.split(r"[^\w+]").map(set), searchfor)

To show the output, I used dummy data:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({"review_trimmed": ["dog and cat", "Cat chases mouse", "horrible thing", "noodle soup", "chilli", "pizza is Good"]})
    searchfor = "yes cat Dog soup good bad horrible".split(" ")

    searchfor = [wrd.lower() for wrd in searchfor]
    searchfor = set(searchfor)

    df["found"] = np.bitwise_and(df["review_trimmed"].str.lower().str.split(r"[^\w+]").map(set), searchfor)

    print(searchfor)
    print(df)

Output:

    #searchfor:
    {'cat', 'good', 'yes', 'dog', 'bad', 'horrible', 'soup'}
    #df:
         review_trimmed       found
    0       dog and cat  {cat, dog}
    1  Cat chases mouse       {cat}
    2    horrible thing  {horrible}
    3       noodle soup      {soup}
    4            chilli          {}
    5     pizza is Good      {good}

Edit

IIUC, just add .str.join(";"):

    searchfor = [wrd.lower() for wrd in searchfor]
    searchfor = set(searchfor)

    df["found"] = np.bitwise_and(df["review_trimmed"].str.lower().str.split(r"[^\w+]").map(set), searchfor).str.join(";")

    print(searchfor)
    print(df)

Output:

    {'dog', 'soup', 'cat', 'bad', 'good', 'yes', 'horrible'}
         review_trimmed     found
    0       dog and cat   dog;cat
    1  Cat chases mouse       cat
    2    horrible thing  horrible
    3       noodle soup      soup
    4            chilli
    5     pizza is Good      good
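The set-intersection idea in this answer can also be written in plain Python, which makes each step easier to inspect. This is a minimal sketch under stated assumptions rather than the answer's own code: the helper name `found_terms` is hypothetical, words are split on runs of non-word characters with `\W+`, and the matches are sorted because sets have no guaranteed order, so joining them directly can give a different term order from run to run.

```python
import re

reviews = [
    "dog and cat",
    "Cat chases mouse",
    "horrible thing",
    "noodle soup",
    "chilli",
    "pizza is Good",
]
# Lower-case the search terms once and keep them as a set.
searchfor = {w.lower() for w in "yes cat Dog soup good bad horrible".split()}


def found_terms(text, terms):
    """Return the terms that occur as whole words in text, sorted alphabetically."""
    # Split on runs of non-word characters (an assumption of this sketch).
    words = set(re.split(r"\W+", text.lower()))
    # Set intersection keeps only the words that are also search terms.
    return sorted(words & terms)


# One ';'-joined string per review; an empty string means no term was found.
found = [";".join(found_terms(text, searchfor)) for text in reviews]
```

With a DataFrame, the same helper could be applied per row, for example with `df["review_trimmed"].map(lambda s: ";".join(found_terms(s, searchfor)))`.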