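Q&A details
From: 5-5 Reading PDF documents with Python (Part 2)

Why does the output look wrong when I use urlopen to read a PDF from an online URL at the end of the lesson?

The output is as follows: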

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2096

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 3237

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 884

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1528

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 703

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 3344

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 4177

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1492

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 990

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2082

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 686

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 801

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 703

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2096

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 3237

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 5196

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 933

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 884

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1528

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1492

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 990

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2082

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 686

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 801

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 4033

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 841

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 686

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1107

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 1625

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 683

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2201

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 3647

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 660

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2059

WARNING:pdfminer.converter:undefined: <PDFCIDFont: basefont='VIKMFH+MSungHK-Light', cidcoding='Adobe-CNS1'>, 2986

...

...

Asked by: 原来我叫小土慕课网给我改了名字 2016-11-12 22:51

Answers

  • jimcurry4297201
    2016-11-16 17:02:56
    Accepted

    WARNING:pdfminer.converter:undefined:

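    I tried this, and it works.

    import logging
    # As posted; meant to stop duplicate entries reaching the root logger.
    logging.Logger.propagate = False
    # Raise the root logger's threshold so WARNING messages are dropped.
    logging.getLogger().setLevel(logging.ERROR)

    However, I don't know why!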

    -------------------------------------------------------------------------------------------------------------------------------------------

    It sets the root logger's level to ERROR. This stops PDFMiner's warning output, since it logs to the root logger, but does not affect your own logging.

    I needed to set propagation to False because, after using PDFMiner, I had duplicate log entries. This was caused by the root logger.

    from: http://stackoverflow.com/questions/29762706/warnings-on-pdfminer
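
A narrower variant of the snippet above, offered as a minimal sketch rather than the answerer's method: instead of changing the root logger, it raises the level only on the library's own logger. It assumes the library logs under the "pdfminer" namespace, which the "WARNING:pdfminer.converter:" prefix in the question suggests. The "myapp" logger is a hypothetical name used only for illustration.

```python
import logging

# Default setup; produces the "LEVEL:name:message" format seen in the question.
logging.basicConfig(level=logging.WARNING)

# Assumption: the library logs under the "pdfminer" namespace, as the
# "pdfminer.converter" prefix in the posted warnings suggests.
logging.getLogger("pdfminer").setLevel(logging.ERROR)

# A child logger with no level of its own inherits the level of its parent.
converter_log = logging.getLogger("pdfminer.converter")
app_log = logging.getLogger("myapp")  # hypothetical application logger

converter_log.warning("undefined: ...")  # dropped: below ERROR
app_log.warning("still visible")         # emitted: root level is untouched
```

This only hides the messages. It does not change what text is extracted, so any characters the library could not decode are still missing from the output.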

  • 慕粉5528709
    2018-12-03 10:23:46

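    Hmm, right. Removing the warnings isn't the goal; the goal is to get the Chinese text to display. With the warnings gone, the Chinese still doesn't show up, so what's the point?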

  • jimcurry4297201
    2016-11-17 16:01:39

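    Reply to 原来我叫小土慕课网给我改了名字:

    I kept working on this and found that PDFs come in two kinds:

    1. Text converted to PDF => process with pdfminer3k to convert back to txt

    2. Images converted to PDF => process with Tesseract (an OCR library) to convert back to txt

    So if the approach above still produces no text,

    you can give Tesseract (an OCR library) a try.

    In the end I used the following libraries to handle PDFs that are really images:

    tesseract (OCR library; the command runs outside Python)

    pyocr     (Python interface to tesseract)

    pillow    (for Python 3; forked from the Python Imaging Library, PIL)

    imagemagick

    wand      (Python interface to imagemagick)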
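
The image-PDF route described above can be sketched roughly as follows. This is not the answerer's code: it swaps pyocr and wand for direct calls to the `tesseract` and ImageMagick `convert` command-line tools through `subprocess`, to keep the example free of third-party Python packages. The function names, the `page-%03d.png` naming pattern, the 300 dpi setting, and the `chi_tra` (Traditional Chinese) language code are all assumptions for illustration. Both tools and the matching Tesseract language data must be installed separately, and on some systems `convert` may be named `magick` or clash with an unrelated system command.

```python
import shutil
import subprocess
from pathlib import Path


def pdf_to_images_cmd(pdf_path, out_dir, dpi=300):
    """Build an ImageMagick command that renders each PDF page to a PNG."""
    pattern = str(Path(out_dir) / "page-%03d.png")
    return ["convert", "-density", str(dpi), str(pdf_path), pattern]


def ocr_cmd(image_path, out_base, lang="chi_tra"):
    """Build a tesseract command; the text is written to <out_base>.txt."""
    return ["tesseract", str(image_path), str(out_base), "-l", lang]


def ocr_pdf(pdf_path, out_dir, lang="chi_tra"):
    """Render the PDF to images, OCR each page, and return the joined text."""
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdf_to_images_cmd(pdf_path, out_dir), check=True)
    pages = []
    for image in sorted(out_dir.glob("page-*.png")):
        base = image.with_suffix("")  # e.g. out/page-000
        subprocess.run(ocr_cmd(image, base, lang), check=True)
        pages.append(base.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

A call such as `ocr_pdf("scan.pdf", "pages")` (hypothetical file names) would write the page images and per-page text files under `pages/` and return the combined text. OCR accuracy depends heavily on scan quality and resolution.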