Python: search multiple columns and identify rows containing any element from a list

3 Answers

温温酱

First, the correct syntax is meds_df[['readcode_1', 'readcode_2', 'generic_name']], with a list of column names inside the indexing brackets. That is why you are getting a KeyError.

To answer your question, here is one way to do it:

    # Updated to use a tuple, per David's suggestion
    idx = pd.concat(
        (meds_df[col].astype(str).str.startswith(tuple(list_to_extract))
         for col in ['readcode_1', 'readcode_2', 'generic_name']),
        axis=1,
    ).any(axis=1)
    meds_df.loc[idx]

Result:

          ID readcode_1    readcode_2 generic_name
    1   1001       bxd1  1.146785e+09  Simvastatin
    3   1003        NaN           NaN  Pravastatin
    5   1005       bxd4  4.543234e+07          NaN
    10  1010       bxde           NaN          NaN
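To illustrate the KeyError mentioned above, here is a minimal sketch. The sample rows are hypothetical (only the column names come from the question), and the frame is simply called df here.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "ID": [1000, 1001],
    "readcode_1": ["abc1", "bxd1"],
    "generic_name": ["Aspirin", "Simvastatin"],
})

# Comma-separated names inside single brackets form one tuple key,
# which pandas looks up as a single column label.
try:
    df["readcode_1", "generic_name"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# A list of names selects several columns and returns a DataFrame.
subset = df[["readcode_1", "generic_name"]]

print(raised_key_error)
print(list(subset.columns))
```

The outer brackets are the indexing operator and the inner brackets build the list of labels, which is why the doubled brackets are needed.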


繁花如伊

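You can use apply like this:

    list_to_extract = ["bxd", "Simvastatin", "1146785342", "45432344", "Pravastatin"]

    # For each row, flag the cells that start with any of the prefixes;
    # na=False treats missing or non-string cells as "no match"
    bool_df = df[['readcode_1', 'readcode_2', 'generic_name']].apply(
        lambda x: x.str.startswith(tuple(list_to_extract), na=False), axis=1)

    # Keep the rows where at least one cell matched
    df.loc[bool_df[bool_df.any(axis=1)].index]

Output:

          ID readcode_1    readcode_2 generic_name
    1   1001       bxd1  1.146785e+09  Simvastatin
    3   1003        NaN           NaN  Pravastatin
    5   1005       bxd4  4.543234e+07          NaN
    10  1010       bxde           NaN          NaN

Thanks to r.ook for spotting a small bug.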
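One detail worth checking is how non-string cells are handled. The printed output suggests readcode_2 is stored as floats, so the sketch below (with a hypothetical mixed column) compares calling .str.startswith directly against converting with astype(str) first. It assumes a pandas version in which startswith accepts a tuple of prefixes.

```python
import pandas as pd

prefixes = ("1146785342", "bxd")

# Hypothetical column mixing strings and a float
col = pd.Series(["bxd1", 1146785342.0, "xyz9"], dtype=object)

# .str methods yield a missing value for non-string elements;
# na=False turns those into False, so the float never matches.
via_str = col.str.startswith(prefixes, na=False)

# Converting first turns the float into "1146785342.0",
# which does start with the prefix "1146785342".
via_cast = col.astype(str).str.startswith(prefixes)

print(via_str.tolist())
print(via_cast.tolist())
```

If numeric codes need to be matched by prefix, the astype(str) variant is the one that sees them. If only genuine strings should match, na=False without the conversion is enough.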


叮当猫咪

Another solution, where the string processing happens in vanilla Python before the DataFrame is recreated:

    list_to_extract = ["bxd", "Simvastatin", "1146785342", "45432344", "Pravastatin"]
    cols_to_search = ['readcode_1', 'readcode_2', 'generic_name']

    # Keep (ID, *values) for every row in which any searched value,
    # converted to a string, starts with one of the prefixes
    output = [(ID, *searchbox)
              for ID, searchbox in zip(df.ID, df.filter(cols_to_search).to_numpy())
              if any([str(box).startswith(tuple(list_to_extract)) for box in searchbox])]

    pd.DataFrame(output, columns=df.columns)

Output:

         ID readcode_1    readcode_2 generic_name
    0  1001       bxd1  1.146785e+09  Simvastatin
    1  1003        NaN           NaN  Pravastatin
    2  1005       bxd4  4.543234e+07          NaN
    3  1010       bxde           NaN          NaN
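A variation on the same idea, as a rough sketch: build a plain Python list of booleans and pass it to df.loc, so the original index labels are kept and the frame does not need to be rebuilt. The helper name rows_starting_with and the sample rows are hypothetical, not part of the answer above.

```python
import pandas as pd

def rows_starting_with(df, cols, prefixes):
    """Return rows where any value in `cols` starts with one of `prefixes`."""
    prefixes = tuple(prefixes)
    # One boolean per row, computed in plain Python on stringified values
    keep = [
        any(str(value).startswith(prefixes) for value in row)
        for row in df[cols].to_numpy()
    ]
    return df.loc[keep]

# Hypothetical sample data using the question's column names
df = pd.DataFrame({
    "ID": [1000, 1001, 1003],
    "readcode_1": ["abc1", "bxd1", None],
    "readcode_2": [None, 1146785342.0, None],
    "generic_name": ["Aspirin", "Simvastatin", "Pravastatin"],
})

result = rows_starting_with(
    df,
    ["readcode_1", "readcode_2", "generic_name"],
    ["bxd", "Simvastatin", "1146785342", "Pravastatin"],
)
print(result["ID"].tolist())
```

Because the mask is applied to the original frame, this version does not depend on df having exactly the ID column plus the searched columns, which the columns=df.columns call in the answer implicitly assumes.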
