How to convert a JDF file to PDF (removing text from a multi-encoded document)

I am trying to convert a JDF file to a PDF file using C#.


After looking at the JDF format, I can see that the file is simply XML placed on top of a PDF document.


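I have tried using the StreamWriter / StreamReader functionality in C#, but because the PDF document also contains binary data and variable line endings (\r\n and \n), the resulting file cannot be opened: some of the binary data in the PDF gets destroyed. Here is some of the code I tried without success.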


using (StreamReader reader = new StreamReader(_jdf.FullName, Encoding.Default))
{
    using (StreamWriter writer = new StreamWriter(_pdf.FullName, false, Encoding.Default))
    {
        writer.NewLine = "\n"; //Tried without this and with \r\n

        bool IsStartOfPDF = false;
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();

            if (line.IndexOf("%PDF-") != -1)
            {
                IsStartOfPDF = true;
            }

            if (!IsStartOfPDF)
            {
                continue;
            }

            writer.WriteLine(line);
        }
    }
}
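
To make the failure concrete, here is a minimal sketch of what a ReadLine/WriteLine round trip does to bytes that are not valid text. The class name `TextRoundTripDemo`, the helper `RoundTrip`, and the sample bytes are made up for illustration, and UTF-8 is assumed as the encoding (what `Encoding.Default` resolves to depends on the runtime and platform).

```csharp
using System;
using System.IO;
using System.Text;

public static class TextRoundTripDemo
{
    // Mimics the ReadLine/WriteLine loop above on an in-memory buffer
    // and returns the bytes that end up in the output.
    public static byte[] RoundTrip(byte[] input, Encoding encoding, string newLine)
    {
        using (var output = new MemoryStream())
        {
            using (var reader = new StreamReader(new MemoryStream(input), encoding))
            using (var writer = new StreamWriter(output, encoding))
            {
                writer.NewLine = newLine;
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // ReadLine drops the original terminator (\r\n, \n or \r);
                    // WriteLine always appends writer.NewLine instead.
                    writer.WriteLine(line);
                }
            }
            // ToArray is still valid after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // 'A', CR LF, an invalid UTF-8 byte (0xFF), LF, 'B' (hypothetical sample data)
        byte[] original = { 0x41, 0x0D, 0x0A, 0xFF, 0x0A, 0x42 };
        byte[] copy = RoundTrip(original, new UTF8Encoding(false), "\n");

        Console.WriteLine(BitConverter.ToString(original));
        Console.WriteLine(BitConverter.ToString(copy));
        Console.WriteLine(original.Length == copy.Length ? "same length" : "length changed");
    }
}
```

Two things change in the copy: the \r\n is normalized to \n, and the undecodable byte is replaced by the Unicode replacement character, which encodes to three bytes. Either change shifts the byte offsets that a PDF's cross-reference table relies on, which would explain why the output does not open.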


呼如林
1 answer

大话西游666

I am answering my own question because this is probably a fairly common problem and the solution may be useful to others.

Because the document contains both binary data and text, we can't simply use StreamWriter to write the binary back to another file. Even if you just read the file with StreamReader and write everything to another file with StreamWriter, you will notice differences between the two files.

Instead, you can use BinaryReader and BinaryWriter to search the multi-part document and write each byte to the other document exactly as you found it.

//Using a Binary Reader/Writer as the PDF is multitype
using (var reader = new BinaryReader(File.Open(_file.FullName, FileMode.Open)))
{
    using (var writer = new BinaryWriter(File.Open(tempFileName.FullName, FileMode.CreateNew)))
    {
        //We are searching for the start of the PDF
        bool searchingForstartOfPDF = true;
        var startOfPDF = "%PDF-".ToCharArray();

        //While we haven't reached the end of the stream
        while (reader.BaseStream.Position != reader.BaseStream.Length)
        {
            //If we are still searching for the start of the PDF
            if (searchingForstartOfPDF)
            {
                //Read the current char
                var str = reader.ReadChar();

                //If it matches the start of the PDF signature
                if (str.Equals(startOfPDF[0]))
                {
                    //Check the next few characters to see if they match,
                    //keeping track of our current position in the stream in case something goes wrong
                    var currBasePos = reader.BaseStream.Position;
                    for (var i = 1; i < startOfPDF.Length; i++)
                    {
                        //If we found a char that isn't in the PDF signature, then resume the while loop
                        //to start searching again from the next position
                        if (!reader.ReadChar().Equals(startOfPDF[i]))
                        {
                            reader.BaseStream.Position = currBasePos;
                            break;
                        }

                        //If we've reached the end of the PDF signature then we've found a match
                        if (i == startOfPDF.Length - 1)
                        {
                            //Success
                            //Set the position to the start of the PDF signature
                            searchingForstartOfPDF = false;
                            reader.BaseStream.Position -= startOfPDF.Length;
                            //We are no longer searching for the PDF signature, so
                            //the remaining bytes in the file will be written directly
                            //using the binary writer
                        }
                    }
                }
            }
            else
            {
                //We are writing the binary now
                writer.Write(reader.ReadByte());
            }
        }
    }
}

This code sample uses BinaryReader to read the file one character at a time. When it finds a match for the string %PDF- (the PDF start signature), it moves the reader position back to the % and then writes the rest of the document using writer.Write(reader.ReadByte()).
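
The same idea can also be sketched without decoding any characters at all: scan the raw bytes for the ASCII signature, then copy everything from that offset onward. This is only a sketch under assumptions, not a definitive implementation. `PdfExtractor`, `FindSignature`, and `ExtractPdf` are hypothetical names; it assumes the first occurrence of %PDF- marks the start of the embedded PDF, that everything from there to the end of the file belongs to the PDF, and that the input stream is seekable.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfExtractor
{
    // "%PDF-" as raw ASCII bytes, so no text decoding is involved.
    private static readonly byte[] Signature = Encoding.ASCII.GetBytes("%PDF-");

    // Returns the offset of the first signature, measured from the stream's
    // position when the scan starts, or -1 if the signature is not found.
    public static long FindSignature(Stream input)
    {
        int matched = 0;     // number of signature bytes matched so far
        long bytesRead = 0;  // bytes consumed from the stream
        int b;
        while ((b = input.ReadByte()) != -1)
        {
            bytesRead++;
            if (b == Signature[matched])
            {
                matched++;
                if (matched == Signature.Length)
                {
                    return bytesRead - Signature.Length;
                }
            }
            else
            {
                // '%' occurs only at the start of the signature, so after a
                // mismatch the only possible partial match is a fresh '%'.
                matched = (b == Signature[0]) ? 1 : 0;
            }
        }
        return -1;
    }

    // Copies everything from the signature to the end of input into output.
    // Assumes input is seekable and positioned at the start of the data.
    public static bool ExtractPdf(Stream input, Stream output)
    {
        long start = FindSignature(input);
        if (start < 0)
        {
            return false;
        }
        input.Position = start;
        input.CopyTo(output); // byte-for-byte copy, line endings untouched
        return true;
    }

    public static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("usage: PdfExtractor <input.jdf> <output.pdf>");
            return;
        }
        using (var input = File.OpenRead(args[0]))
        using (var output = File.Create(args[1]))
        {
            Console.WriteLine(ExtractPdf(input, output) ? "PDF written" : "No %PDF- signature found");
        }
    }
}
```

Comparing bytes rather than chars avoids any dependence on the reader's encoding, and Stream.CopyTo moves the remainder in buffered chunks instead of one byte per call.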