extractText() function in PyPDF2 throws an error

I am trying to extract text from a PDF so that I can analyze it, but when I try to extract the text from a page, I get the following error.

Traceback (most recent call last):
  File "C:\Program Files (x86)\eclipse\plugins\org.python.pydev_2.7.4.2013051601\pysrc\pydevd_comm.py", line 765, in doIt
    result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
  File "C:\Program Files (x86)\eclipse\plugins\org.python.pydev_2.7.4.2013051601\pysrc\pydevd_vars.py", line 376, in evaluateExpression
    result = eval(compiled, updated_globals, frame.f_locals)
  File "<string>", line 1, in <module>
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\pdf.py", line 1701, in extractText
    content = ContentStream(content, self.pdf)
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\pdf.py", line 1783, in __init__
    stream = StringIO(stream.getData())
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\generic.py", line 801, in getData
    decoded._data = filters.decodeStreamData(self)
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\filters.py", line 228, in decodeStreamData
    data = ASCII85Decode.decode(data)
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\filters.py", line 170, in decode
    data = [y for y in data if not (y in ' \n\r\t')]
  File "C:\Python33\lib\site-packages\pypdf2-1.9.0-py3.3.egg\PyPDF2\filters.py", line 170, in <listcomp>
    data = [y for y in data if not (y in ' \n\r\t')]
TypeError: 'in <string>' requires string as left operand, not int
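
The last frame can be reproduced without PyPDF2. The sketch below assumes the stream data reaches the decoder as a bytes object under Python 3, which the traceback suggests but does not show. Iterating over bytes yields integers, so testing each element against a str raises the same TypeError. The function names are hypothetical, and strip_ws_bytes only illustrates a bytes-safe filter; it is not the library's actual fix.

```python
WHITESPACE = ' \n\r\t'


def strip_ws_str_pattern(data):
    # Same shape as the failing line in the traceback: each element y
    # is tested against a str. This works for str input only.
    return [y for y in data if not (y in WHITESPACE)]


def strip_ws_bytes(data):
    # Bytes-safe variant (illustrative): iterating bytes yields ints,
    # and an int can be tested for membership in a bytes object.
    return bytes(y for y in data if y not in b' \n\r\t')


if __name__ == "__main__":
    print(strip_ws_str_pattern("a b\n"))
    try:
        strip_ws_str_pattern(b"a b\n")
    except TypeError as exc:
        print("TypeError:", exc)
    print(strip_ws_bytes(b"a b\n"))
```

If the assumption holds, the problem lies in how the installed PyPDF2 version handles bytes on Python 3.3 rather than in the calling code.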

2 Answers

吃鸡游戏

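You are doing two things in one line. Try breaking up what you are doing to narrow the problem down further. Change:

    page_Content = Pdf_File.getPage(pg_idx).extractText()

into:

    page = Pdf_File.getPage(pg_idx)
    page_Content = page.extractText()

to see where the error occurs. Also run the program from the command line rather than from within Eclipse, just to make sure it is the same error. You say it happens in extractText(), but the line of your code that makes that call does not show up in the traceback.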
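
The two-step split can also be wrapped in a small helper that reports which step failed. This is only a sketch: extract_page_text is a hypothetical name, and it assumes nothing about the reader beyond the getPage() and extractText() methods already used in the question.

```python
import traceback


def extract_page_text(reader, pg_idx):
    """Run getPage() and extractText() separately.

    Returns (failed_step, text): failed_step is None on success,
    otherwise the name of the call that raised.
    """
    try:
        page = reader.getPage(pg_idx)  # step 1: look up the page
    except Exception:
        traceback.print_exc()
        return "getPage", None
    try:
        text = page.extractText()  # step 2: decode the content stream
    except Exception:
        traceback.print_exc()
        return "extractText", None
    return None, text
```

With the names from the question, the call would be failed_step, page_Content = extract_page_text(Pdf_File, pg_idx). Running it from the command line prints the full traceback for whichever step raises.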

