Using the query function in Pandas to return rows at the intersection of two lists

I have this df:

import pandas as pd

df = pd.DataFrame([[1, "type_1"], [2, "type_2"], [2, "type_1; type_2"], [2, "type_1; type_3"], [2, "type_3"], [2, "type_1; type_2, type_3"]],
                  columns=["a", "b"])

   a                       b
0  1                  type_1
1  2                  type_2
2  2          type_1; type_2
3  2          type_1; type_3
4  2                  type_3
5  2  type_1; type_2, type_3

I need to run a large number of query strings that come from a config file, like this:

my_list = ["type_1", "type_2"]

df.query("a == 2 and b in @my_list")

Current output:

   a       b
1  2  type_2

But I want the output to look like this, because at least one value in b is in my_list:

   a                       b
0  2                  type_2
1  2          type_1; type_2
2  2          type_1; type_3
3  2  type_1; type_2, type_3

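As you can see, the problem is that some of my columns are really lists. Right now they are strings separated by ";", but I can convert them to lists. However, I'm not sure how that would help me keep only the rows where column b has at least one value inside my_list while using nothing but .query() (otherwise I would have to parse the query strings myself, and that gets messy).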

This would be the equivalent code with lists:

df = pd.DataFrame([[1, ["type_1"]], [2, ["type_2"]], [2, ["type_1", "type_2"]], [2, ["type_1", "type_3"]], [2, ["type_3"]], [2, ["type_1", "type_2", "type_3"]]],
                  columns=["a", "b"])

吃鸡游戏


2 Answers

梦里花落0921

Actually, I was wrong. It looks like this is supported by the "python" engine.

df.query("a == 2 and b.str.contains('|'.join(@my_list))", engine='python')

   a                       b
1  2                  type_2
2  2          type_1; type_2
3  2          type_1; type_3
5  2  type_1; type_2, type_3

(Old answer) Your query can be split into two parts: the part that needs a substring check, and everything else. You can compute the two masks separately. I suggest using str.contains and DataFrame.eval. You can then AND the masks and filter df.

m1 = df.eval("a == 2")
m2 = df['b'].str.contains('|'.join(my_list))
df[m1 & m2]

   a                       b
1  2                  type_2
2  2          type_1; type_2
3  2          type_1; type_3
5  2  type_1; type_2, type_3
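One caveat with joining the list directly: str.contains treats the joined string as a regular expression and matches substrings, so "type_1" would also match a value such as "type_11". Below is a minimal sketch of one way to tighten this, assuming the items are plain tokens made of word characters. The names pattern and result, and the extra "type_11" row, are illustrative and not part of the original answer.

```python
import re

import pandas as pd

df = pd.DataFrame(
    [[1, "type_1"], [2, "type_2"], [2, "type_1; type_2"],
     [2, "type_1; type_3"], [2, "type_3"], [2, "type_1; type_2, type_3"],
     [2, "type_11"]],  # extra row, added only to show the substring pitfall
    columns=["a", "b"],
)
my_list = ["type_1", "type_2"]

# Escape each item so regex metacharacters are taken literally, and wrap the
# alternation in word boundaries so "type_1" does not match inside "type_11".
pattern = r"\b(?:" + "|".join(re.escape(item) for item in my_list) + r")\b"

# pattern is an ordinary local variable, so a query string loaded from a
# config file can refer to it as @pattern.
result = df.query("a == 2 and b.str.contains(@pattern)", engine="python")
print(result)
```

Building the pattern once outside the query string keeps the configured strings short, and the same pattern can be reused with the two-mask approach via df['b'].str.contains(pattern).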


鸿蒙传说

You can use str.split to turn the column into list-like values first, and then use isin and any. Note that isin is an exact match, which means that if you have type_11, isin will return False for it.

df[(pd.DataFrame(df.b.str.split(';').tolist()).isin(my_list).any(axis=1))&(df.a==2)]
Out[88]:
   a                       b
1  2                  type_2
2  2          type_1; type_2
3  2          type_1; type_3
5  2  type_1; type_2, type_3
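Splitting on ";" alone leaves a leading space on every item after the first (for example " type_2"), so with the sample data isin only ever matches the first item of each cell. Here is a sketch of a whitespace-tolerant variant, assuming that both ";" and "," act as separators (the sample data mixes them). The names items, has_match and result are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, "type_1"], [2, "type_2"], [2, "type_1; type_2"],
     [2, "type_1; type_3"], [2, "type_3"], [2, "type_1; type_2, type_3"]],
    columns=["a", "b"],
)
my_list = ["type_1", "type_2"]

# Split each cell on ";" or "," together with any surrounding whitespace.
# A multi-character pattern is treated as a regular expression by default.
# explode() then puts one item on each row and repeats the original index.
items = df["b"].str.split(r"\s*[;,]\s*").explode()

# Exact membership test per item, collapsed back to one flag per original row.
has_match = items.isin(my_list).groupby(level=0).any()

result = df[has_match & (df["a"] == 2)]
print(result)
```

Because the comparison is still exact, a value such as "type_11" would not match "type_1" here either.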

