
Code point mismatch between Java and C

So, I'm having some trouble with the following character in my Kotlin port of imgui: –


After spending a whole day digging into charsets and encodings, I finally settled on my only hope: relying on Unicode code points.


On the JVM, that character


"–"[0].toInt() // same as codePointAt()


returns the code point U+2013.
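A minimal Kotlin sketch of what the JVM side sees. Kotlin strings are UTF-16, and U+2013 lies in the Basic Multilingual Plane, so the single Char already equals the code point. The same character encodes to three bytes in UTF-8 and to the single byte 0x96 in Windows-1252 (Cp1252). The helper names are illustrative, and "\u2013" is used instead of a literal "–" so the result does not depend on the source file's encoding.

```kotlin
import java.nio.charset.Charset

// Kotlin/JVM strings are UTF-16; for a BMP character the code point equals the Char value.
fun firstCodePoint(s: String): Int = s.codePointAt(0)

// Bytes produced by encoding `s` with the named charset, as unsigned values 0..255.
fun encodedBytes(s: String, charsetName: String): List<Int> =
    s.toByteArray(Charset.forName(charsetName)).map { it.toInt() and 0xFF }
```

For example, `firstCodePoint("\u2013")` is 0x2013, `encodedBytes("\u2013", "UTF-8")` is [0xE2, 0x80, 0x93], and `encodedBytes("\u2013", "windows-1252")` is [0x96].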


On the C side I'm not sure, but since this is what is being done:


const ImFontGlyph* ImFont::FindGlyph(ImWchar c) const
{
    if (c >= IndexLookup.Size)
        return FallbackGlyph;
    const ImWchar i = IndexLookup.Data[c];
    if (i == (ImWchar)-1)
        return FallbackGlyph;
    return &Glyphs.Data[i];
}

where

typedef unsigned short ImWchar;
ImVector<ImWchar> IndexLookup; // Sparse. Index glyphs by Unicode code-point.
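To make the lookup concrete, here is a hypothetical Kotlin transcription of `FindGlyph`. The `Glyph`, `Font` and `buildFont` names are made up for illustration, and `-1` in an `IntArray` stands in for `(ImWchar)-1`. Because `IndexLookup` is indexed directly by code point, the key decides which slot is read: 0x2013 and 0x96 are different slots, so a font that only has the en dash at 0x2013 returns the fallback glyph for 0x96.

```kotlin
// Hypothetical stand-in for ImFontGlyph: just remembers its code point.
data class Glyph(val codepoint: Int)

class Font(
    private val indexLookup: IntArray, // code point -> index into glyphs, -1 = no glyph
    private val glyphs: List<Glyph>,
    private val fallbackGlyph: Glyph,
) {
    // Mirrors ImFont::FindGlyph: out-of-range or empty slots give the fallback glyph.
    fun findGlyph(c: Int): Glyph {
        if (c < 0 || c >= indexLookup.size) return fallbackGlyph
        val i = indexLookup[c]
        if (i == -1) return fallbackGlyph
        return glyphs[i]
    }
}

// Builds the sparse table: one slot per code point up to the largest one present.
fun buildFont(glyphs: List<Glyph>, fallback: Glyph): Font {
    val maxCp = glyphs.maxOfOrNull { it.codepoint } ?: -1
    val lookup = IntArray(maxCp + 1) { -1 }
    glyphs.forEachIndexed { idx, g -> lookup[g.codepoint] = idx }
    return Font(lookup, glyphs, fallback)
}
```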


So, doing this


const char* a = "–";
int b = (unsigned char)a[0]; // cast so a signed char is not sign-extended

returns the code point U+0096.


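From what I've read, it looks like we are in "extended ASCII" territory (values above 127, 0x7F), which is bad, because it seems to come in different versions/interpretations.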


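For example, this encoding table doesn't match my code point, but the Cp1252 encoding does, so I'm inclined to think that is the encoding actually used on the C side.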


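In the table at the bottom of the link just mentioned, you can actually see that 150 (decimal, counting from the right column starting at the given number) does indeed correspond to 2013 (hex, which I find a bit inconsistent, but anyway).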
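Decoding the byte the other way round makes that table entry concrete. Decimal 150 is 0x96; Windows-1252 maps it to U+2013, while ISO-8859-1 maps every byte to the code point with the same numeric value, so it gives U+0096 (a C1 control character). A small sketch, with `decodeByte` as a hypothetical helper:

```kotlin
import java.nio.charset.Charset

// Decodes one byte with the named charset and returns the code point it maps to.
fun decodeByte(b: Int, charsetName: String): Int =
    String(byteArrayOf(b.toByte()), Charset.forName(charsetName)).codePointAt(0)
```

So `decodeByte(150, "windows-1252")` gives 0x2013, whereas `decodeByte(150, "ISO-8859-1")` gives 0x96.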


To work around this, I tried converting my Kotlin Strings to the same encoding (ignoring for now that this is, of course, platform-specific), so for each c: Char I do


"$c".toByteArray(Charset.forName("Cp1252"))[0].toInt() and 0xFF // unsigned byte value


This works, but it breaks the rendering of foreign fonts (Chinese, Japanese, etc.).
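The breakage has a simple cause, shown here with an equivalent of the conversion above wrapped in a helper whose name (`cp1252Index`) is made up. Windows-1252 is a single-byte charset with only 256 slots, and `String.toByteArray(Charset)` replaces any character it cannot represent with the charset's replacement byte, `?` (0x3F). Every Chinese or Japanese character therefore collapses onto the same index.

```kotlin
import java.nio.charset.Charset

// Encode one Char with Cp1252 and take the first byte as an unsigned value.
// Unmappable characters are replaced by '?' (0x3F) during encoding.
fun cp1252Index(c: Char): Int =
    "$c".toByteArray(Charset.forName("Cp1252"))[0].toInt() and 0xFF
```

For example, `cp1252Index('\u2013')` is 0x96, but both `cp1252Index('\u4E2D')` (中) and `cp1252Index('\u3042')` (あ) are 0x3F.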


So, my question is: why is there a difference between U+2013 on the JVM and U+0096 on C?


What is the correct way to handle this?
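One way to see where the two numbers come from is a hedged Kotlin sketch of UTF-8 decoding. This is not imgui's code (Dear ImGui does its own decoding in C++), and `decodeUtf8` is a hypothetical helper. If the C++ side receives the dash as UTF-8, which is what Dear ImGui's text API expects, the bytes E2 80 93 decode to 0x2013, the same value the JVM reports. The lone byte 0x96 that a Cp1252-encoded source file produces is not a valid UTF-8 sequence at all.

```kotlin
// Decodes UTF-8 bytes into Unicode code points, one Int per character.
// Sketch only: it checks lead and continuation bytes but does not reject
// overlong encodings or encoded surrogates. Bad input yields U+FFFD.
fun decodeUtf8(bytes: ByteArray): List<Int> {
    val out = mutableListOf<Int>()
    var i = 0
    while (i < bytes.size) {
        val b0 = bytes[i].toInt() and 0xFF
        // How many continuation bytes follow, and the payload bits of the lead byte.
        val (extra, initial) = when {
            b0 < 0x80 -> 0 to b0                        // 0xxxxxxx: ASCII
            (b0 and 0xE0) == 0xC0 -> 1 to (b0 and 0x1F) // 110xxxxx
            (b0 and 0xF0) == 0xE0 -> 2 to (b0 and 0x0F) // 1110xxxx
            (b0 and 0xF8) == 0xF0 -> 3 to (b0 and 0x07) // 11110xxx
            else -> -1 to 0                             // stray continuation or invalid byte
        }
        if (extra < 0 || i + extra >= bytes.size) {
            out += 0xFFFD // invalid lead byte or truncated sequence
            i += 1
            continue
        }
        var cp = initial
        var ok = true
        for (k in 1..extra) {
            val b = bytes[i + k].toInt() and 0xFF
            if ((b and 0xC0) != 0x80) { ok = false; break } // must be 10xxxxxx
            cp = (cp shl 6) or (b and 0x3F)
        }
        if (ok) {
            out += cp
            i += extra + 1
        } else {
            out += 0xFFFD
            i += 1
        }
    }
    return out
}
```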


慕少森
1 Answer

慕后森

For now I solved it on Windows by inserting this function before retrieving the character's code point. It basically remaps all the characters that differ from ISO-8859-1. You can see them in this table: they are all the ones with a light gray border.

internal fun Char.remapCodepointIfProblematic(): Int {
    val i = toInt()
    return when (Platform.get()) {
        /*  https://en.wikipedia.org/wiki/Windows-1252#Character_set
         *  manually remap the difference from ISO-8859-1 */
        Platform.WINDOWS -> when (i) {
            // 8_128
            0x20AC -> 128 // €
            0x201A -> 130 // ‚
            0x0192 -> 131 // ƒ
            0x201E -> 132 // „
            0x2026 -> 133 // …
            0x2020 -> 134 // †
            0x2021 -> 135 // ‡
            0x02C6 -> 136 // ˆ
            0x2030 -> 137 // ‰
            0x0160 -> 138 // Š
            0x2039 -> 139 // ‹
            0x0152 -> 140 // Œ
            0x017D -> 142 // Ž
            // 9_144
            0x2018 -> 145 // ‘
            0x2019 -> 146 // ’
            0x201C -> 147 // “
            0x201D -> 148 // ”
            0x2022 -> 149 // •
            0x2013 -> 150 // –
            0x2014 -> 151 // —
            0x02DC -> 152 // ˜
            0x2122 -> 153 // ™
            0x0161 -> 154 // š
            0x203A -> 155 // ›
            0x0153 -> 156 // œ
            0x017E -> 158 // ž
            0x0178 -> 159 // Ÿ
            else -> i
        }
        else -> i // TODO
    }
}
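The same 27-entry table could also be derived from the JDK's own Windows-1252 charset instead of being typed by hand. This is an alternative sketch, not the answer's code: `cp1252Remap` and `remapCodepoint` are hypothetical names, it drops the `Platform.get()` check, and it assumes Kotlin 1.5+ for `Char.code`.

```kotlin
import java.nio.charset.Charset

// Maps each Unicode character that Windows-1252 places in 0x80..0x9F back to its byte.
// This block is the only range where Windows-1252 and ISO-8859-1 differ.
val cp1252Remap: Map<Int, Int> = run {
    val cp1252 = Charset.forName("windows-1252")
    (0x80..0x9F).mapNotNull { b ->
        val cp = String(byteArrayOf(b.toByte()), cp1252).codePointAt(0)
        // The five undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) decode to U+FFFD;
        // the `cp == b` check is a defensive guard in case a decoder maps them to C1 controls.
        if (cp == 0xFFFD || cp == b) null else cp to b
    }.toMap()
}

// Characters from that block map to their Cp1252 byte; everything else keeps its code point.
fun remapCodepoint(c: Char): Int = cp1252Remap[c.code] ?: c.code
```

Because only the 0x80..0x9F block is touched, characters outside Windows-1252 (Chinese, Japanese, etc.) keep their Unicode code points, unlike the full Cp1252 conversion in the question.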