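When loading a specific Excel (XLSX) file via Apache POI (3.17) in Java (7), I get an encoding-related exception (org.apache.xerces.impl.io.MalformedByteSequenceException: Invalid byte 2 of 4-byte UTF-8 sequence). It seems to be thrown while the sharedStrings.xml file is being read (note that this file is encoded in UTF-8).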
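However, if I load the file via an InputStream instead of a File, it loads correctly. In neither case do I (or can I) specify an encoding. I know that loading from an InputStream is not optimal, and I would really like to avoid it.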
I wrote a small example to demonstrate the problem, but unfortunately I cannot share the offending file:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.text.MessageFormat;

import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class POIEncodingIssue
{
    public static void main(final String[] args) throws Exception
    {
        final File file = new File("path\\to\\my\\file.xlsx"); //$NON-NLS-1$
        Workbook workbook = null;

        // This works
        System.out.println("Trying Stream based approach..."); //$NON-NLS-1$
        try (InputStream stream = new FileInputStream(file))
        {
            workbook = WorkbookFactory.create(stream);
            System.out.println(MessageFormat.format("Value was \"{0}\"", workbook.getSheetAt(0).getRow(0).getCell(0))); //$NON-NLS-1$
        }
        catch (final Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            if (workbook != null)
            {
                workbook.close();
            }
        }

        // Reset so that a failure below does not close the first workbook a second time
        workbook = null;

        // This doesn't
        System.out.println("Trying File based approach..."); //$NON-NLS-1$
        try
        {
            workbook = WorkbookFactory.create(file);
            System.out.println(MessageFormat.format("Value was \"{0}\"", workbook.getSheetAt(0).getRow(0).getCell(0))); //$NON-NLS-1$
        }
        catch (final Exception e)
        {
            e.printStackTrace();
        }
        finally
        {
            if (workbook != null)
            {
                workbook.close();
            }
        }
    }
}
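
To narrow this down, the shared strings part could be checked for strict UTF-8 validity on its own, without going through POI or Xerces. The following is only a rough sketch using the JDK: the class and method names are made up for illustration, and it assumes the part is stored at the conventional entry name xl/sharedStrings.xml, which may not hold for every workbook. A result of true would only mean that the JDK's decoder accepts the bytes; it says nothing definitive about how Xerces reads them.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of POI: checks one ZIP entry of an XLSX file for valid UTF-8.
public class SharedStringsUtf8Check
{
    // Returns true if the bytes decode as UTF-8 with no malformed sequence.
    public static boolean isStrictUtf8(final byte[] bytes)
    {
        try
        {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        }
        catch (final CharacterCodingException e)
        {
            return false;
        }
    }

    // Reads a single entry of the XLSX (ZIP) container fully into memory.
    public static byte[] readEntry(final File xlsx, final String entryName) throws IOException
    {
        try (ZipFile zip = new ZipFile(xlsx))
        {
            final ZipEntry entry = zip.getEntry(entryName);
            if (entry == null)
            {
                throw new IOException("No such entry: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry))
            {
                final ByteArrayOutputStream out = new ByteArrayOutputStream();
                final byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1)
                {
                    out.write(buffer, 0, read);
                }
                return out.toByteArray();
            }
        }
    }

    public static void main(final String[] args) throws Exception
    {
        if (args.length == 0)
        {
            System.out.println("Usage: SharedStringsUtf8Check <file.xlsx>");
            return;
        }
        // Assumption: the shared strings part lives at the conventional location.
        final byte[] bytes = readEntry(new File(args[0]), "xl/sharedStrings.xml");
        System.out.println("Strict UTF-8: " + isStrictUtf8(bytes));
    }
}
```

Running it with the path to the problematic file as the only argument would show whether the raw bytes of that entry pass a strict decode.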