Filtering a Pandas DataFrame with a function

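This question is related to one I posted yesterday, which can be found here.

I went ahead and applied the solution Jan provided to the entire dataset. The solution is as follows: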

import re

# Compile the pattern once instead of on every row.
regular_expression = re.compile(r'[-a-zA-Z0-9_ ]')

def is_probably_english(row, threshold=0.90):
    # True when at least `threshold` of the characters in row['App'] match the pattern.
    name = row['App']
    if not name:
        # An empty name would otherwise cause a division by zero.
        return False
    matching = [character for character in name if regular_expression.search(character)]
    return len(matching) / len(name) >= threshold

google_play_store_is_probably_english = google_play_store_no_duplicates.apply(is_probably_english, axis=1)

google_play_store_english = google_play_store_no_duplicates[google_play_store_is_probably_english]

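As I understand it, we apply the is_probably_english function to every row of the google_play_store_no_duplicates DataFrame and store the results (one boolean per row) in google_play_store_is_probably_english, which is a boolean Series rather than a DataFrame. That Series is then used as a mask to filter the non-English apps out of google_play_store_no_duplicates, and the final result is stored in a new DataFrame, google_play_store_english.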
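To check that understanding, here is a minimal sketch of the two steps on a toy DataFrame. The `apps` frame and its four app names are made up for illustration and stand in for google_play_store_no_duplicates; the function is a condensed version of the one above, without the empty-name guard.

```python
import re

import pandas as pd

# Hypothetical toy data standing in for google_play_store_no_duplicates.
apps = pd.DataFrame(
    {"App": ["Instagram", "爱奇艺PPS", "Docs To Go", "中国語 AQリスニング"]}
)

PATTERN = re.compile(r"[-a-zA-Z0-9_ ]")


def is_probably_english(row, threshold=0.90):
    # Share of characters in the app name that match the pattern.
    name = row["App"]
    matching = [ch for ch in name if PATTERN.search(ch)]
    return len(matching) / len(name) >= threshold


# Step 1: apply(..., axis=1) calls the function once per row
# and collects the returned booleans into a Series.
mask = apps.apply(is_probably_english, axis=1)

# Step 2: indexing with the boolean Series keeps only the rows
# where the mask is True.
english_apps = apps[mask]

print(type(mask).__name__)
print(mask.tolist())
print(english_apps["App"].tolist())
```

The mask shares its index with the original frame, which is what lets pandas line the two up when filtering.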

Does this make sense, and does it look like a good way to solve the problem? Is there a better way?
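One alternative I could imagine is skipping the row-wise apply and using the pandas string methods on the column directly. This is only a sketch: the helper name `probably_english_mask` and the toy `apps` frame are my own, and it assumes every value in the App column is a string.

```python
import pandas as pd

# Hypothetical toy data; the last name is empty on purpose.
apps = pd.DataFrame({"App": ["Instagram", "爱奇艺PPS", "Docs To Go", ""]})


def probably_english_mask(names, threshold=0.90):
    # Count the characters matching the same pattern, for the whole column at once.
    allowed = names.str.count(r"[-a-zA-Z0-9_ ]")
    total = names.str.len()
    # An empty name gives 0 / 0 = NaN, and NaN >= threshold is False,
    # so empty names are filtered out rather than raising an error.
    return (allowed / total) >= threshold


vector_mask = probably_english_mask(apps["App"])
english_apps_vectorized = apps[vector_mask]

print(english_apps_vectorized["App"].tolist())
```

Whether this is noticeably faster than apply on the real dataset is something I have not measured.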

侃侃无极
