Converting a Word document to HTML without losing the original document


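I am currently developing a program that needs to display a Word document as HTML while keeping track of how positions in the HTML correspond to positions in the original file.

To do this, an ID is generated for every element in the document when the Word document is first loaded.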

    foreach (Table t in document.Tables)
    {
        t.ID = GUID();
        Range range = t.Range;
        foreach (Cell c in range.Cells)
        {
            c.ID = t.ID + TableIDSeparator + GUID();
        }
    }
    foreach (Paragraph p in document.Paragraphs)
    {
        p.ID = GUID();
    }
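The ID scheme above can be sketched without Word at all. This is only an illustration: the value of `TableIDSeparator` is not shown in the question, so `"|"` is an assumption, and `NewId` is a hypothetical stand-in for the `GUID()` helper. The point is that a cell ID built this way can be split back into its owning table's ID, provided the separator can never occur inside a generated ID.

```csharp
using System;

static class ElementIds
{
    // Assumed value; the question does not show the real TableIDSeparator.
    public const string TableIDSeparator = "|";

    // Hypothetical stand-in for the question's GUID() helper.
    // Format "N" yields 32 hex digits, so the separator never appears in it.
    public static string NewId() => Guid.NewGuid().ToString("N");

    // Builds a cell ID of the form <tableId><separator><new id>.
    public static string NewCellId(string tableId) =>
        tableId + TableIDSeparator + NewId();

    // Extracts the owning table's ID from a cell ID.
    // Returns false (and an empty string) when the ID has no separator.
    public static bool TryGetTableId(string cellId, out string tableId)
    {
        int i = cellId.IndexOf(TableIDSeparator, StringComparison.Ordinal);
        tableId = i < 0 ? string.Empty : cellId.Substring(0, i);
        return i >= 0;
    }
}
```

For example, `ElementIds.TryGetTableId(cell.ID, out var tableId)` would let the HTML side map a cell back to its table without a separate lookup.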

I can then save the document as HTML like this:

document.SaveAs2(tempFileName, WdSaveFormat.wdFormatFilteredHTML);

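But then the document object becomes the HTML document instead of the original Word document (just as when you use Save As from the Word menu, the current window shows the newly saved document rather than the original).

So I tried saving the document as HTML this way: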

    Document temp = new Document();
    string x = document.Range().XML;
    temp.Range().InsertXML(x);
    temp.SaveAs2(fn, WdSaveFormat.wdFormatFilteredHTML);
    temp.Close(false);

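But now the new temp document is missing all the IDs I created in the original document, so I can't locate positions in the HTML file based on the original document.

Am I missing something important, or is there a way to use Save As on a Word document without losing the reference to the original file?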

翻翻过去那场雪


2 Answers

RISEBY

Since the resulting documents are identical, I used the following approach to copy the IDs into the new document. Note that the Paragraphs/Tables/etc. collections are indexed from 1, not 0.

    string fn = Path.GetTempPath() + TmpPrefix + GUID() + ".html";
    Document temp = new Document();
    // Copy the whole old document into the new document
    temp.Range().InsertXML(doc.Range().XML);
    // Copy IDs, assuming the documents are identical and have the same number of elements
    for (int i = 1; i <= temp.Tables.Count; i++)
    {
        temp.Tables[i].ID = doc.Tables[i].ID;
        Range sRange = doc.Tables[i].Range;
        Range tRange = temp.Tables[i].Range;
        for (int j = 1; j <= tRange.Cells.Count; j++)
        {
            tRange.Cells[j].ID = sRange.Cells[j].ID;
        }
    }
    for (int i = 1; i <= temp.Paragraphs.Count; i++)
    {
        temp.Paragraphs[i].ID = doc.Paragraphs[i].ID;
    }
    // Save the new temp document as HTML
    temp.SaveAs2(fn, WdSaveFormat.wdFormatFilteredHTML);
    temp.Close();
    return fn;

Since I don't need the IDs in the output DOCX file (I only use them to map between the DOCX file loaded in memory and the HTML representation shown in my application), this works well for my case.
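The positional copy above relies on both documents having the same number of tables, cells, and paragraphs. A minimal sketch of a guard that could run before the copy loops is shown below. The class and method names are hypothetical and not part of the answer; only the `Tables.Count`, `Paragraphs.Count`, and `Range.Cells.Count` properties of the Word interop object model are used, and the code needs a reference to `Microsoft.Office.Interop.Word`.

```csharp
using System;
using Microsoft.Office.Interop.Word;

static class IdCopyGuard
{
    // Hypothetical helper: throws if the two documents do not have the same
    // element counts, so a positional ID copy would pair the wrong elements.
    public static void EnsureSameShape(Document source, Document copy)
    {
        if (source.Tables.Count != copy.Tables.Count)
            throw new InvalidOperationException("Table counts differ.");

        if (source.Paragraphs.Count != copy.Paragraphs.Count)
            throw new InvalidOperationException("Paragraph counts differ.");

        // Word collections are 1-based, as noted in the answer.
        for (int i = 1; i <= source.Tables.Count; i++)
        {
            int sourceCells = source.Tables[i].Range.Cells.Count;
            int copyCells = copy.Tables[i].Range.Cells.Count;
            if (sourceCells != copyCells)
                throw new InvalidOperationException(
                    "Cell counts differ in table " + i + ".");
        }
    }
}
```

Calling `IdCopyGuard.EnsureSameShape(doc, temp)` right after `InsertXML` would turn a silent mismatch into an explicit error, at the cost of one extra pass over the tables.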


互换的青春

However, the approach above is very slow on large documents, so I had to do it differently:

    public static string RenderHTMLFile(Document doc)
    {
        string fn = Path.GetTempPath() + TmpPrefix + GUID() + ".html";
        var vba = doc.VBProject;
        var module = vba.VBComponents.Add(Microsoft.Vbe.Interop.vbext_ComponentType.vbext_ct_StdModule);
        var code = Properties.Resources.HTMLRenderer;
        module.CodeModule.AddFromString(code);
        var dataMacro = Word.Run("renderHTMLCopy", fn);
        return fn;
    }

Here Properties.Resources.HTMLRenderer is a txt file containing the following VB code:

    Sub renderHTMLCopy(ByVal path As String)
    '
    ' renderHTMLCopy Macro
    '
    '
    Selection.WholeStory
    Selection.Copy
    Documents.Add
    Selection.PasteAndFormat wdPasteDefault
    ActiveDocument.SaveAs2 path, WdSaveFormat.wdFormatFilteredHTML
    ActiveDocument.Close False
    End Sub

The previous version took about 1500 ms to process a small document, while this version renders the same document in about 400 ms!
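The answer does not say how the 1500 ms and 400 ms figures were measured. One plausible way, shown here purely as an assumption, is to wrap each rendering call in a `System.Diagnostics.Stopwatch`. The `Timing` class and `MeasureMs` method are hypothetical names introduced for this sketch.

```csharp
using System;
using System.Diagnostics;

static class Timing
{
    // Runs the action once and returns the elapsed wall-clock time
    // in whole milliseconds.
    public static long MeasureMs(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

A call such as `long ms = Timing.MeasureMs(() => RenderHTMLFile(doc));` would time the macro-based version. A single run includes one-time costs such as COM start-up, so repeating the measurement a few times on the same document gives a fairer comparison.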
