AttributeError: 'list' object has no attribute 'isdigit'

Suppose I have a list of sentences (from a large corpus), each stored as a list of tokenized words. A sample of tokenized_raw_data looks like this:

    [['arxiv', ':', 'astro-ph/9505066', '.'],
     ['seds', 'page', 'on', '``', 'globular', 'star', 'clusters', "''",
      'douglas', 'scott', '``', 'independent', 'age', 'estimates', "''",
      'krysstal', '``', 'the', 'scale', 'of', 'the', 'universe', "''",
      'space', 'and', 'time', 'scaled', 'for', 'the', 'beginner', '.'],
     ['icosmos', ':', 'cosmology', 'calculator', '(', 'with', 'graph',
      'generation', ')', 'the', 'expanding', 'universe', '(', 'american',
      'institute', 'of', 'physics', ')']]

I want to apply pos_tag to this data.

Here is what I have tried so far:

    import os, nltk, re
    from nltk.corpus import stopwords
    from unidecode import unidecode
    from nltk.tokenize import word_tokenize, sent_tokenize, wordpunct_tokenize
    from nltk.tag import pos_tag

    def read_data():
        global tokenized_raw_data
        with open("path//merge_text_results_pu.txt", 'r', encoding='utf-8', errors = 'replace') as f:
            raw_data = f.read()
        tokenized_raw_data = '\n'.join(nltk.line_tokenize(raw_data))

    read_data()

    def function1():
        tokens_sentences = sent_tokenize(tokenized_raw_data.lower())
        # one list of word tokens per sentence, i.e. a list of lists
        unfiltered_tokens = [[word for word in word_tokenize(word)] for word in tokens_sentences]
        tagged_tokens = nltk.pos_tag(unfiltered_tokens)  # the AttributeError is raised here
        nouns = [word.encode('utf-8') for word, pos in tagged_tokens
                 if (pos == 'NN' or pos == 'NNP' or pos == 'NNS' or pos == 'NNPS')]
        joined_nouns_text = (' '.join(map(bytes.decode, nouns))).strip()
        noun_tokens = [t for t in wordpunct_tokenize(joined_nouns_text)]
        stop_words = set(stopwords.words("english"))

    function1()

I get the following error:

> AttributeError: 'list' object has no attribute 'isdigit'

How can I fix this error efficiently? Where am I going wrong?

Note: I am using Python 3.7 on Windows 10.
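For reference, a minimal sketch of where this message comes from. In the NLTK versions I am aware of, nltk.pos_tag hands the token list to the default perceptron tagger, which normalizes every item with string methods such as isdigit(). The normalize and normalize_tokens functions below are simplified, hypothetical stand-ins for that step (not NLTK's actual code), so the example runs without NLTK installed:

```python
def normalize(word):
    # Simplified, hypothetical stand-in for the per-token normalization in
    # NLTK's perceptron tagger; it assumes `word` is a str.
    if "-" in word and word[0] != "-":
        return "!HYPHEN"
    if word.isdigit() and len(word) == 4:  # fails if `word` is a list
        return "!YEAR"
    return word.lower()


def normalize_tokens(tokens):
    # Like nltk.pos_tag, this expects a flat list of strings.
    return [normalize(token) for token in tokens]


sentences = [["arxiv", ":", "astro-ph/9505066", "."],
             ["icosmos", ":", "cosmology", "calculator"]]

# Passing the whole list of sentences: each "token" is now a list.
try:
    normalize_tokens(sentences)
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'isdigit'

# Normalizing one sentence at a time works.
normalized = [normalize_tokens(sentence) for sentence in sentences]
print(normalized)
# [['arxiv', ':', '!HYPHEN', '.'], ['icosmos', ':', 'cosmology', 'calculator']]
```

Passing a list of sentences where a flat list of strings is expected turns each sentence into a single "token", so the first string method called on it fails.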

qq_笑_17


1 Answer

慕田峪7331174

Try this:

    word_list = []
    for sentence in unfiltered_tokens:
        # keep only purely alphabetic tokens (drops punctuation such as ':' and '.')
        word_list.append([word for word in sentence if word.isalpha()])

Then do:

    tagged_tokens = []
    for tokens in word_list:
        # nltk.pos_tag expects a flat list of strings, so tag one sentence at a time
        tagged_tokens.append(nltk.pos_tag(tokens))

and you will get the result you want! Hope this helps.
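A slightly more complete sketch of the noun-extraction goal from the question, assuming NLTK's pos_tag_sents (which tags a list of token lists in one call, one sentence at a time) and the four Penn Treebank noun tags the question already filters on. The extract_nouns name and its tag_sents parameter are hypothetical, added so the function can be exercised with a stub tagger:

```python
from typing import Callable, List, Optional, Sequence, Tuple

TaggedSentence = List[Tuple[str, str]]
# Penn Treebank noun tags, the same four the question filters on.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_nouns(
    sentences: Sequence[Sequence[str]],
    tag_sents: Optional[Callable[[List[List[str]]], List[TaggedSentence]]] = None,
) -> List[List[str]]:
    """Return the alphabetic nouns of each sentence, one list per sentence.

    tag_sents (hypothetical parameter) defaults to nltk.pos_tag_sents, which
    takes a list of token lists and returns one list of (word, tag) pairs
    per sentence.
    """
    if tag_sents is None:
        # Imported lazily so a stub tagger can be used without NLTK installed.
        from nltk import pos_tag_sents
        tag_sents = pos_tag_sents
    # Each sentence is passed as its own flat list of strings.
    tagged = tag_sents([list(sentence) for sentence in sentences])
    # Keep noun tokens made only of letters (drops ':' or 'astro-ph/9505066').
    return [
        [word for word, pos in sentence if pos in NOUN_TAGS and word.isalpha()]
        for sentence in tagged
    ]
```

With NLTK installed and the perceptron tagger model downloaded (the resource name passed to nltk.download depends on the NLTK version), extract_nouns(unfiltered_tokens) should return one list of nouns per sentence. This sketch tags whole sentences first and filters afterwards, so punctuation still gives the tagger context; the answer above filters before tagging, which also avoids the error but changes what the tagger sees.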
