
Replacing text in a PDF file using iText

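I am using the iText (5.5.13) library to read a PDF and replace a pattern in the file. The problem is that the pattern is never found, because the text the library reads from the PDF contains some strange extra characters.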

For example, the sentence:

"This is a test in order to see if the"

becomes this when I read it:

[(This is a )9(te)-3(st)9( in o)-4(rd)15(er )-2(t)9(o)-5( s)8(ee)7( if t)-3(h)3(e )]

So if I try to find and replace "test", the word "test" is not found anywhere in the PDF and nothing is replaced.
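For context, the bracketed line looks like the operand of a PDF TJ operator, in which parenthesized strings alternate with numeric position adjustments. The following is a minimal sketch in plain Java, not iText code. The names TjArrayDemo and joinStrings are made up for illustration, and it assumes single-byte text whose bytes happen to equal their characters, which real PDFs do not guarantee. It shows why a search for "test" in the raw stream fails although the joined strings do contain the word:

```java
/** Illustrative only: pulls the literal strings out of a TJ array operand. */
final class TjArrayDemo {

    /**
     * Concatenates the contents of all "(...)" literal strings in a TJ array
     * and skips the numbers between them. Simplified: a backslash keeps the
     * next character as it is (enough for escaped parentheses), while escapes
     * such as \n or octal codes are not decoded.
     */
    static String joinStrings(String tjArray) {
        StringBuilder text = new StringBuilder();
        int depth = 0; // nesting depth of parentheses; 0 means outside any string
        for (int i = 0; i < tjArray.length(); i++) {
            char c = tjArray.charAt(i);
            if (depth > 0 && c == '\\' && i + 1 < tjArray.length()) {
                text.append(tjArray.charAt(++i)); // escaped character
            } else if (c == '(') {
                if (depth++ > 0) text.append(c);  // nested parenthesis is content
            } else if (c == ')' && depth > 0) {
                if (--depth > 0) text.append(c);
            } else if (depth > 0) {
                text.append(c);                   // ordinary string content
            }
            // everything at depth 0 (brackets, numbers) is ignored
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String operand = "[(This is a )9(te)-3(st)9( in o)-4(rd)15(er )-2(t)9(o)-5( s)8(ee)7( if t)-3(h)3(e )]";
        System.out.println(operand.contains("test"));              // raw search
        System.out.println(joinStrings(operand).contains("test")); // search after joining
    }
}
```

The word is split into (te) and (st) with a position adjustment in between, so a plain string replacement on the stream bytes cannot see it.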

This is the code I am using:

public void processPDF(String src, String dest) {
    try {
        PdfReader reader = new PdfReader(src);
        PdfArray refs = null;
        PRIndirectReference reference = null;

        int nPages = reader.getNumberOfPages();

        for (int i = 1; i <= nPages; i++) {
            PdfDictionary dict = reader.getPageN(i);
            PdfObject object = dict.getDirectObject(PdfName.CONTENTS);
            if (object.isArray()) {
                refs = dict.getAsArray(PdfName.CONTENTS);
                ArrayList<PdfObject> references = refs.getArrayList();

                for (PdfObject r : references) {
                    reference = (PRIndirectReference) r;
                    PRStream stream = (PRStream) PdfReader.getPdfObject(reference);
                    byte[] data = PdfReader.getStreamBytes(stream);
                    String dd = new String(data, "UTF-8");

                    dd = dd.replaceAll("@pattern_1234", "trueValue");
                    dd = dd.replaceAll("test", "tested");

                    stream.setData(dd.getBytes());
                }
            }
            if (object instanceof PRStream) {
                PRStream stream = (PRStream) object;

                byte[] data = PdfReader.getStreamBytes(stream);
                String dd = new String(data, "UTF-8");
                System.out.println("content---->" + dd);
                dd = dd.replaceAll("@pattern_1234", "trueValue");
                dd = dd.replaceAll("This", "FIRST");

                stream.setData(dd.getBytes(StandardCharsets.UTF_8));
            }
        }
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        stamper.close();
        reader.close();
    }
    catch (Exception e) {
    }
}


开心每一天1111

烙印99

As has already been mentioned in comments and answers, PDF is not a format meant for text editing. It is a final format, and information on the flow of text, its layout, and even its mapping to Unicode is optional.

Thus, even assuming the optional information on mapping glyphs to Unicode is present, the approach to this task with iText may look a bit unsatisfying: first determine the position of the text in question using a custom text extraction strategy, then remove the current contents of everything at that position using the PdfCleanUpProcessor, and finally draw the replacement text into the gap.

In this answer I present a helper class that combines the first two steps, finding and removing the existing text. Its advantage is that only the text is actually removed, not any background graphics etc. as is the case with PdfCleanUpProcessor redaction. The helper also returns the positions of the removed text, so the replacement can be stamped there.

The helper class is based on the PdfContentStreamEditor presented in an earlier answer. Please use the version of that class on GitHub, though, as the original class has been enhanced a bit since its conception.

The SimpleTextRemover helper class illustrates what is necessary to properly remove text from a PDF. It is limited in a few respects:

- It only replaces text in the actual page content streams. To also replace text in embedded XObjects, one has to recursively iterate through the XObject resources of the page in question and apply the editor to them as well.
- It is "simple" in the same way the SimpleTextExtractionStrategy is: it assumes that the text showing instructions appear in the content in reading order. To also handle content streams in which the order differs and the instructions must be sorted, all incoming instructions and the relevant render information must be cached until the end of the page, not only a few instructions at a time. Then the render information can be sorted, the sections to remove can be identified in the sorted render information, the associated instructions can be manipulated, and the instructions can finally be stored.
- It does not try to identify gaps between glyphs that visually represent white space although there is no glyph there at all. To identify such gaps, the code must be extended to check whether two consecutive glyphs follow one another exactly or whether there is a gap or a line jump.
- When calculating the gap to leave where a glyph is removed, it does not yet take character and word spacing into account. To improve this, the glyph width calculation must be improved.

Considering the example excerpt from your content stream, though, these restrictions will probably not hinder you.

// iText 5 imports; PdfContentStreamEditor is not part of iText and has to be
// taken from the GitHub project mentioned above.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

import com.itextpdf.text.ExceptionConverter;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfLiteral;
import com.itextpdf.text.pdf.PdfNumber;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfString;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.LineSegment;
import com.itextpdf.text.pdf.parser.PdfContentStreamProcessor;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

public class SimpleTextRemover extends PdfContentStreamEditor {
    public SimpleTextRemover() {
        super (new SimpleTextRemoverListener());
        ((SimpleTextRemoverListener)getRenderListener()).simpleTextRemover = this;
    }

    /**
     * <p>Removes the string to remove from the given page of the
     * document in the PDF reader the given PDF stamper works on.</p>
     * <p>The result is a list of glyph lists each of which represents
     * a match and can be queried for position information.</p>
     */
    public List<List<Glyph>> remove(PdfStamper pdfStamper, int pageNum, String toRemove) throws IOException {
        if (toRemove.length() == 0)
            return Collections.emptyList();

        this.toRemove = toRemove;
        cachedOperations.clear();
        elementNumber = -1;
        // Restart the written-element counter per page, like elementNumber;
        // otherwise the two counters disagree from the second page on.
        processedElements = 0;
        pendingMatch.clear();
        matches.clear();
        allMatches.clear();
        editPage(pdfStamper, pageNum);
        return allMatches;
    }

    /**
     * Adds the given operation to the cached operations and checks
     * whether some cached operations can meanwhile be processed and
     * written to the result content stream.
     */
    @Override
    protected void write(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException {
        cachedOperations.add(new ArrayList<>(operands));

        while (process(processor)) {
            cachedOperations.remove(0);
        }
    }

    /**
     * Removes any started match and sends all remaining cached
     * operations for processing.
     */
    @Override
    public void finalizeContent() {
        pendingMatch.clear();
        try {
            while (!cachedOperations.isEmpty()) {
                if (!process(this)) {
                    // TODO: Should not happen, so warn
                    System.err.printf("Failure flushing operation %s; dropping.\n", cachedOperations.get(0));
                }
                cachedOperations.remove(0);
            }
        } catch (IOException e) {
            throw new ExceptionConverter(e);
        }
    }

    /**
     * Tries to process the first cached operation. Returns whether
     * it could be processed.
     */
    boolean process(PdfContentStreamProcessor processor) throws IOException {
        if (cachedOperations.isEmpty())
            return false;

        // The operator is stored as the last entry of the operand list
        List<PdfObject> operands = cachedOperations.get(0);
        PdfLiteral operator = (PdfLiteral) operands.get(operands.size() - 1);
        String operatorString = operator.toString();

        if (TEXT_SHOWING_OPERATORS.contains(operatorString))
            return processTextShowingOp(processor, operator, operands);

        super.write(processor, operator, operands);
        return true;
    }

    /**
     * Tries to process a text showing operation. Unless a match
     * is pending and starts before the end of the argument of this
     * instruction, it can be processed. If the instruction contains
     * a part of a match, it is transformed to a TJ operation and
     * the glyphs in question are replaced by text position adjustments.
     * If the original operation had a side effect (jump to next line
     * or spacing adjustment), this side effect is explicitly added.
     */
    boolean processTextShowingOp(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException {
        PdfObject object = operands.get(operands.size() - 2);
        boolean isArray = object instanceof PdfArray;
        PdfArray array = isArray ? (PdfArray) object : new PdfArray(object);
        int elementCount = countStrings(object);

        // Currently pending glyph intersects parameter of this operation -> cannot yet process
        if (!pendingMatch.isEmpty() && pendingMatch.get(0).elementNumber < processedElements + elementCount)
            return false;

        // The parameter of this operation is not subject to a match -> copy as is
        if (matches.size() == 0 || processedElements + elementCount <= matches.get(0).get(0).elementNumber || elementCount == 0) {
            super.write(processor, operator, operands);
            processedElements += elementCount;
            return true;
        }

        // The parameter of this operation contains glyphs of a match -> manipulate
        PdfArray newArray = new PdfArray();
        for (int arrayIndex = 0; arrayIndex < array.size(); arrayIndex++) {
            PdfObject entry = array.getPdfObject(arrayIndex);
            if (!(entry instanceof PdfString)) {
                // Numeric position adjustments are kept unchanged
                newArray.add(entry);
            } else {
                PdfString entryString = (PdfString) entry;
                byte[] entryBytes = entryString.getBytes();
                for (int index = 0; index < entryBytes.length; ) {
                    List<Glyph> match = matches.size() == 0 ? null : matches.get(0);
                    Glyph glyph = match == null ? null : match.get(0);
                    if (glyph == null || processedElements < glyph.elementNumber) {
                        // No (further) match in this string -> copy the rest
                        newArray.add(new PdfString(Arrays.copyOfRange(entryBytes, index, entryBytes.length)));
                        break;
                    }
                    if (index < glyph.index) {
                        // Copy the bytes in front of the next matched glyph
                        newArray.add(new PdfString(Arrays.copyOfRange(entryBytes, index, glyph.index)));
                        index = glyph.index;
                        continue;
                    }
                    // Replace the matched glyph by a position adjustment of its width
                    newArray.add(new PdfNumber(-glyph.width));
                    index++;
                    match.remove(0);
                    if (match.isEmpty())
                        matches.remove(0);
                }
                processedElements++;
            }
        }
        writeSideEffect(processor, operator, operands);
        writeTJ(processor, newArray);

        return true;
    }

    /**
     * Counts the strings in the given argument, itself a string or
     * an array containing strings and non-strings.
     */
    int countStrings(PdfObject textArgument) {
        if (textArgument instanceof PdfArray) {
            int result = 0;
            for (PdfObject object : (PdfArray)textArgument) {
                if (object instanceof PdfString)
                    result++;
            }
            return result;
        } else
            return textArgument instanceof PdfString ? 1 : 0;
    }

    /**
     * Writes side effects of a text showing operation which is going to be
     * replaced by a TJ operation. Side effects are line jumps and changes
     * of character or word spacing.
     */
    void writeSideEffect(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException {
        switch (operator.toString()) {
        case "\"":
            super.write(processor, OPERATOR_Tw, Arrays.asList(operands.get(0), OPERATOR_Tw));
            super.write(processor, OPERATOR_Tc, Arrays.asList(operands.get(1), OPERATOR_Tc));
            // intentional fall through: the " operator also moves to the next line
        case "'":
            super.write(processor, OPERATOR_Tasterisk, Collections.singletonList(OPERATOR_Tasterisk));
        }
    }

    /**
     * Writes a TJ operation with the given array unless array is empty.
     */
    void writeTJ(PdfContentStreamProcessor processor, PdfArray array) throws IOException {
        if (!array.isEmpty()) {
            List<PdfObject> operands = Arrays.asList(array, OPERATOR_TJ);
            super.write(processor, OPERATOR_TJ, operands);
        }
    }

    /**
     * Analyzes the given text render info whether it starts a new match or
     * finishes / continues / breaks a pending match. This method is called
     * by the {@link SimpleTextRemoverListener} registered as render listener
     * of the underlying content stream processor.
     */
    void renderText(TextRenderInfo renderInfo) {
        elementNumber++;
        int index = 0;
        for (TextRenderInfo info : renderInfo.getCharacterRenderInfos()) {
            int matchPosition = pendingMatch.size();
            pendingMatch.add(new Glyph(info, elementNumber, index));
            if (!toRemove.substring(matchPosition, matchPosition + info.getText().length()).equals(info.getText())) {
                reduceToPartialMatch();
            }
            if (pendingMatch.size() == toRemove.length()) {
                matches.add(new ArrayList<>(pendingMatch));
                allMatches.add(new ArrayList<>(pendingMatch));
                pendingMatch.clear();
            }
            index++;
        }
    }

    /**
     * Reduces the current pending match to an actual (partial) match
     * after the addition of the next glyph has invalidated it as a
     * whole match.
     */
    void reduceToPartialMatch() {
        outer:
        while (!pendingMatch.isEmpty()) {
            pendingMatch.remove(0);
            int index = 0;
            for (Glyph glyph : pendingMatch) {
                if (!toRemove.substring(index, index + glyph.text.length()).equals(glyph.text)) {
                    continue outer;
                }
                index++;
            }
            break;
        }
    }

    String toRemove = null;
    final List<List<PdfObject>> cachedOperations = new LinkedList<>();

    int elementNumber = -1;
    int processedElements = 0;
    final List<Glyph> pendingMatch = new ArrayList<>();
    final List<List<Glyph>> matches = new ArrayList<>();
    final List<List<Glyph>> allMatches = new ArrayList<>();

    /**
     * Render listener class used by {@link SimpleTextRemover} as listener
     * of its content stream processor ancestor. Essentially it forwards
     * {@link TextRenderInfo} events and ignores all else.
     */
    static class SimpleTextRemoverListener implements RenderListener {
        @Override
        public void beginTextBlock() { }

        @Override
        public void renderText(TextRenderInfo renderInfo) {
            simpleTextRemover.renderText(renderInfo);
        }

        @Override
        public void endTextBlock() { }

        @Override
        public void renderImage(ImageRenderInfo renderInfo) { }

        SimpleTextRemover simpleTextRemover = null;
    }

    /**
     * Value class representing a glyph with information on
     * the displayed text and its position, the overall number
     * of the string argument of a text showing instruction
     * it is in and the index at which it can be found therein,
     * and the width to use as text position adjustment when
     * replacing it. Beware, the width does not yet consider
     * character and word spacing!
     */
    public static class Glyph {
        public Glyph(TextRenderInfo info, int elementNumber, int index) {
            text = info.getText();
            ascent = info.getAscentLine();
            base = info.getBaseline();
            descent = info.getDescentLine();
            this.elementNumber = elementNumber;
            this.index = index;
            this.width = info.getFont().getWidth(text);
        }

        public final String text;
        public final LineSegment ascent;
        public final LineSegment base;
        public final LineSegment descent;
        final int elementNumber;
        final int index;
        final float width;
    }

    final PdfLiteral OPERATOR_Tasterisk = new PdfLiteral("T*");
    final PdfLiteral OPERATOR_Tc = new PdfLiteral("Tc");
    final PdfLiteral OPERATOR_Tw = new PdfLiteral("Tw");
    final PdfLiteral OPERATOR_Tj = new PdfLiteral("Tj");
    final PdfLiteral OPERATOR_TJ = new PdfLiteral("TJ");
    final static List<String> TEXT_SHOWING_OPERATORS = Arrays.asList("Tj", "'", "\"", "TJ");
    final static Glyph[] EMPTY_GLYPH_ARRAY = new Glyph[0];
}

(the SimpleTextRemover helper class)

You can use it like this:

PdfReader pdfReader = new PdfReader(SOURCE);
PdfStamper pdfStamper = new PdfStamper(pdfReader, RESULT_STREAM);
SimpleTextRemover remover = new SimpleTextRemover();

System.out.printf("\ntest.pdf - Test\n");
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++)
{
    System.out.printf("Page %d:\n", i);
    List<List<Glyph>> matches = remover.remove(pdfStamper, i, "Test");
    for (List<Glyph> match : matches) {
        // Baseline start of the first glyph and baseline end of the last one
        Glyph first = match.get(0);
        Vector baseStart = first.base.getStartPoint();
        Glyph last = match.get(match.size()-1);
        Vector baseEnd = last.base.getEndPoint();
        // I1 and I2 are the index constants Vector.I1 and Vector.I2 (static import)
        System.out.printf("  Match from (%3.1f %3.1f) to (%3.1f %3.1f)\n", baseStart.get(I1), baseStart.get(I2), baseEnd.get(I1), baseEnd.get(I2));
    }
}

pdfStamper.close();

(the remove-page-text-content test testRemoveTestFromTest)

For my test file this produces the following console output:

test.pdf - Test
Page 1:
  Match from (134,8 666,9) to (177,8 666,9)
  Match from (134,8 642,0) to (153,4 642,0)
  Match from (172,8 642,0) to (191,4 642,0)

and the occurrences of "Test" are missing at those positions in the output PDF. Instead of printing the match coordinates, you can use them to draw replacement text at the positions in question.
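One limitation named in this answer is that the gap left for a removed glyph ignores character and word spacing. As a sketch of what an improved calculation might look like: the PDF specification gives the horizontal displacement of a glyph as tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, where Tw only applies to the single-byte character code 32. Solving for the TJ number that advances the text position exactly as far as the removed glyph would have gives the helper below. The class and method names are hypothetical and belong neither to iText nor to the helper class above, and glyphWidth is assumed to be in thousandths of a text space unit, as the helper's use of -glyph.width suggests.

```java
/** Illustrative only: TJ adjustment that stands in for one removed glyph. */
final class TjAdjustmentDemo {

    /**
     * @param glyphWidth        glyph width in thousandths of a text space unit
     * @param fontSize          current font size (Tfs), must not be zero
     * @param charSpacing       current character spacing (Tc)
     * @param wordSpacing       current word spacing (Tw)
     * @param isSingleByteSpace whether the glyph is the single-byte code 32
     * @return the number to put into the TJ array in place of the glyph
     */
    static double replacementAdjustment(double glyphWidth, double fontSize,
            double charSpacing, double wordSpacing, boolean isSingleByteSpace) {
        // Word spacing is only added for the single-byte space character
        double spacing = charSpacing + (isSingleByteSpace ? wordSpacing : 0);
        // TJ numbers are in thousandths and scaled by the font size, so the
        // spacing is converted accordingly; horizontal scaling (Th) cancels
        return -(glyphWidth + 1000.0 * spacing / fontSize);
    }

    public static void main(String[] args) {
        // Made-up sample values: a 500 unit wide glyph at font size 10, Tc = 0.5
        System.out.println(replacementAdjustment(500, 10, 0.5, 1, false));
        // A 250 unit wide space with the same settings also gets Tw = 1
        System.out.println(replacementAdjustment(250, 10, 0.5, 1, true));
    }
}
```

With no spacing set (Tc = Tw = 0) the result is simply -glyphWidth, which is what the helper class currently writes.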

扬帆大鱼

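PDF files are not word processing files. What you are seeing are explicitly placed characters that are kerned close together, and/or many other things. The kind of text "replacing" you dream of is not possible this way, or better said, if not impossible then highly improbable.

PDFs are binary files with byte offsets. They have many parts, along the lines of: read this at this byte offset, then go to that byte offset and read that. You cannot just replace "foo" with "foobar" and expect it to work. It will break all the byte offsets and completely destroy the file.

Try it yourself before asking. In your example above, open the file in some editor and change the string you posted:

This is a

to this:

WOW Let me change this data around for the content "This is a"

Save that file and try to open it. Even this change, which is a piece of content that does not cross the boundaries you identified, will not work. That is because it is not a word processing file. It is not a text file. It is a binary file that you cannot manipulate the way you think you can.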
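The point about byte offsets can be made concrete with a toy example. The sketch below is plain Java without any PDF library. The two-object "file" is a made-up fragment, not a valid PDF, and the names OffsetShiftDemo, offsetOf and replace are illustrative. It records where the second object starts, replaces foo with foobar in the first object the way a text editor would, and shows that the recorded offset no longer points at the second object:

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: an in-place text edit shifts every later byte offset. */
final class OffsetShiftDemo {

    /** Byte offset of the first occurrence of marker, or -1 if absent. */
    static int offsetOf(byte[] file, String marker) {
        // ISO-8859-1 maps each byte to exactly one char, so char index == byte offset
        return new String(file, StandardCharsets.ISO_8859_1).indexOf(marker);
    }

    /** Naive replacement on the raw bytes, as a text editor would do it. */
    static byte[] replace(byte[] file, String from, String to) {
        String text = new String(file, StandardCharsets.ISO_8859_1);
        return text.replace(from, to).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] original = "1 0 obj\n(foo)\nendobj\n2 0 obj\n(bar)\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        int recorded = offsetOf(original, "2 0 obj"); // what an offset table would store
        byte[] edited = replace(original, "foo", "foobar");
        System.out.println("recorded offset:   " + recorded);
        System.out.println("offset after edit: " + offsetOf(edited, "2 0 obj"));
    }
}
```

A real PDF stores such offsets in its cross-reference table, so a tool that changes content has to write that table anew rather than patch bytes in place.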