Iterating over rows to count specific words


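I'm having trouble iterating over the rows of a pandas data frame. For each row (which contains a string), I need to determine the following:

the count of each punctuation mark in the string;
the number of uppercase letters.

To answer the first point, I tried the following on a single string, to see whether the approach would also work on a data frame: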

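# Requires the NLTK 'stopwords' and 'punkt' data (see nltk.download)
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

t = "Have a non-programming question?"
t_low = t.lower()  # was search.lower(), but no variable named search is defined
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(t_low)

# Keep only the tokens that are not stop words
m = [w for w in word_tokens if w not in stop_words]

# Equivalent explicit loop
m = []
for w in word_tokens:
    if w not in stop_words:
        m.append(w)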

Then, after tokenizing, I count them:

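import string
from collections import Counter

c = Counter(word_tokens)

# Print the count of every punctuation mark (0 if it does not occur as a token)
for x in string.punctuation:
    print(x, c[x])  # was print(p, c[x]), but p is undefined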
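The counting above works on tokens, so it depends on how the tokenizer splits or rewrites punctuation. A minimal sketch of an alternative, assuming it is enough to count punctuation characters directly in the raw string, is shown here. The helper name `punctuation_counts` is hypothetical and not part of the original question.

```python
import string
from collections import Counter

def punctuation_counts(text):
    """Count each punctuation character that occurs in text (hypothetical helper)."""
    return Counter(ch for ch in text if ch in string.punctuation)

t = "Have a non-programming question?"
counts = punctuation_counts(t)

# Print only the marks that actually occur in the string
for mark, n in counts.items():
    print(mark, n)
```

Because this never tokenizes, the hyphen inside "non-programming" is counted as well, which may or may not be what is wanted.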

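For the second point, I applied the following to the sentence:

sum(1 for c in t if c.isupper())

However, this only works on a single string, whereas what I have is a pandas data frame like the one below: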

Text

"Have a non-programming question?"

1 Answer

米琪卡哇伊

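You can do this on the DataFrame with apply and a lambda function:

import string

def Capitals(strng):
    # Number of uppercase letters in the string
    return sum(1 for c in strng if c.isupper())

def Punctuation(strng):
    # Number of punctuation characters in the string
    return sum([1 for c in strng if c in string.punctuation])

df['Caps'] = df['name'].apply(lambda x: Capitals(x))
df['Punc'] = df['name'].apply(lambda x: Punctuation(x))

Caps is a new column containing the number of uppercase letters. Punc is a new column containing the number of punctuation marks. name is the column holding the strings being tested.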
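The answer gives one total per row, while the question also asks for the count of each punctuation mark. A rough sketch of how that could be done is shown here, assuming the column is called Text as in the question. The sample rows, the helper `punctuation_counts`, and the names `per_mark` and `result` are made up for illustration.

```python
import string
from collections import Counter

import pandas as pd

# Hypothetical sample data; the column name 'Text' follows the question
df = pd.DataFrame({"Text": ["Have a non-programming question?",
                            "Yes!! Ask HERE, please."]})

def punctuation_counts(text):
    """Count each punctuation character that occurs in text (hypothetical helper)."""
    return Counter(ch for ch in text if ch in string.punctuation)

# Row-wise totals, as in the answer
df["Caps"] = df["Text"].apply(lambda s: sum(1 for ch in s if ch.isupper()))
df["Punc"] = df["Text"].apply(lambda s: sum(punctuation_counts(s).values()))

# One column per punctuation mark seen in any row; marks absent from a row become 0
per_mark = pd.DataFrame(
    [dict(punctuation_counts(s)) for s in df["Text"]],
    index=df.index,
).fillna(0).astype(int)

result = df.join(per_mark)
print(result)
```

If only ASCII uppercase letters matter, `df["Text"].str.count(r"[A-Z]")` is a possible vectorized alternative for the Caps column.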
