How do I call tabula (JAR) from Java?

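Tabula looks like a good tool for extracting tabular data from PDFs. There are plenty of examples of calling it from the command line or using it from Python, but there does not seem to be any documentation for using it from Java. Does anyone have a working example?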

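Note that tabula does provide its source code, but it seems inconsistent between versions. For example, the sample on GitHub references a TableExtractor class that does not appear to exist in the JAR.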

https://github.com/tabulapdf/tabula-java


收到一只叮咚

30秒到达战场

You can call tabula from Java with the following code; I hope it helps. It needs tabula-java and PDFBox on the classpath.

    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    import org.apache.pdfbox.pdmodel.PDDocument;

    import technology.tabula.ObjectExtractor;
    import technology.tabula.Page;
    import technology.tabula.RectangularTextContainer;
    import technology.tabula.Table;
    import technology.tabula.extractors.SpreadsheetExtractionAlgorithm;

    // The class name is arbitrary.
    public class TabulaExample {
        public static void main(String[] args) throws IOException {
            final String FILENAME = "../test.pdf";

            // PDDocument.load(File) is the PDFBox 2.x API; try-with-resources closes the document.
            try (PDDocument pd = PDDocument.load(new File(FILENAME))) {
                int totalPages = pd.getNumberOfPages();
                System.out.println("Total Pages in Document: " + totalPages);

                ObjectExtractor oe = new ObjectExtractor(pd);
                SpreadsheetExtractionAlgorithm sea = new SpreadsheetExtractionAlgorithm();
                Page page = oe.extract(1); // page numbers start at 1

                // Detect the tables on the page, then print the text of each cell
                List<Table> tables = sea.extract(page);
                for (Table table : tables) {
                    List<List<RectangularTextContainer>> rows = table.getRows();
                    for (List<RectangularTextContainer> cells : rows) {
                        for (RectangularTextContainer cell : cells) {
                            System.out.print(cell.getText() + "|");
                        }
                        System.out.println(); // end the line after each table row
                    }
                }
            }
        }
    }
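The loop above prints each cell followed by `|`, which becomes ambiguous as soon as a cell itself contains that character. One possible alternative is to copy the cell text into plain strings first and write CSV with quoting. The following is only a sketch of that idea: `CsvRows` and its methods are hypothetical helpers, not part of tabula, and the sample rows stand in for values that would come from `getText()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of tabula): formats extracted cell text as CSV. */
class CsvRows {

    /** Quotes a field when it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        // Embedded double quotes are doubled, as is conventional for CSV.
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    /** Joins the cell text of one table row into a single CSV line. */
    static String toCsvLine(List<String> cells) {
        List<String> out = new ArrayList<>();
        for (String cell : cells) {
            out.add(escape(cell));
        }
        return String.join(",", out);
    }

    /** Joins all rows, one CSV line per table row. */
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(toCsvLine(row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in data; with tabula each string would come from a cell's getText().
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("Item", "Price"),
                Arrays.asList("Widget, large", "9.50"));
        System.out.print(toCsv(rows));
    }
}
```

Keeping the formatting separate from the extraction means it can be checked without a PDF at hand, and the same `List<List<String>>` can be reused for other output formats.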

心有法竹

    // ****** Extract text from the table after detecting & TRANSFER TO XLSX ******
    // Continues from the code above: `sea` and `page` are defined there.
    // Needs Apache POI: org.apache.poi.xssf.usermodel.XSSFWorkbook and
    // org.apache.poi.ss.usermodel.{Sheet, Row, Cell}, plus java.io.FileOutputStream.
    XSSFWorkbook wb = new XSSFWorkbook();
    Sheet sheet = wb.createSheet("Barang Baik");
    List<Table> tables = sea.extract(page);
    for (Table t : tables) {
        // Start below any rows already written for earlier tables
        int rowNumber = sheet.getPhysicalNumberOfRows();
        List<List<RectangularTextContainer>> rows = t.getRows();
        for (int i = 0; i < rows.size(); i++) {
            List<RectangularTextContainer> cells = rows.get(i);
            Row row = sheet.createRow(i + rowNumber);
            for (int j = 0; j < cells.size(); j++) {
                Cell cell = row.createCell(j);
                String cellValue = cells.get(j).getText();
                cell.setCellValue(cellValue);
            }
        }
    }
    // Write the workbook once, after all tables have been added
    try (FileOutputStream fos = new FileOutputStream("C:\\your\\file.xlsx")) {
        wb.write(fos);
    }
    wb.close();