
java.text.Collator treats "v" and "w" as the same letter for the Swedish language/locale

The following test passes on Java 8.

Comparator<String> stringComparator = Collator.getInstance(new Locale("sv", "SE"));

Assert.assertTrue(stringComparator.compare("aaaa", "bbbb") < 0);
Assert.assertTrue(stringComparator.compare("waaa", "vbbb") < 0);
Assert.assertTrue(stringComparator.compare("vaaa", "wbbb") < 0);

This orders waaa before vbbb and vaaa before wbbb. Apparently it treats v and w as the same letter.


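In fact, according to Wikipedia, in Swedish:

By 2006, the use of "W" had grown because of new loanwords, so "W" officially became a letter, and the "V" = "W" sorting rule was deprecated. Pre-2006 books and software generally use the rule. After the rule was deprecated, some books and software continued to apply it.

Does anyone have a general workaround for this, so that v and w are treated as separate letters in the Swedish locale?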


BIG阳
3 Answers

小唯快跑啊

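Create your own RuleBasedCollator. Inspect the string returned by ((RuleBasedCollator) Collator.getInstance(new Locale("sv", "SE"))).getRules() and modify it to suit your needs, then create a new collator from your modified rules. You should probably also file a JDK bug report, for good measure.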
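One way this could look is sketched below. Rather than editing the rule string by hand, it appends a tailoring to the stock rules, relying on RuleBasedCollator letting a later rule reposition a character that appeared earlier. The class name SwedishVwRules, the method withSeparateVw and the exact tailoring "& V < w , W" are assumptions made for illustration, not something taken from the JDK or from the answer, so check the result against your own test data.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper; the class name and the appended rule are illustrative assumptions.
final class SwedishVwRules {

    // Tailoring appended to the stock rules: reset at 'V', then place 'w' after it
    // with a primary ('<') difference and 'W' as the tertiary (',') variant of 'w'.
    static final String SEPARATE_VW = "& V < w , W";

    static Collator withSeparateVw() {
        Collator stock = Collator.getInstance(Locale.forLanguageTag("sv-SE"));
        if (!(stock instanceof RuleBasedCollator)) {
            // getRules() is only available on RuleBasedCollator.
            throw new IllegalStateException("stock collator exposes no rules");
        }
        String rules = ((RuleBasedCollator) stock).getRules();
        try {
            // Later rules override the earlier placement of the same characters.
            return new RuleBasedCollator(rules + " " + SEPARATE_VW);
        } catch (ParseException e) {
            throw new IllegalStateException("could not parse tailored rules", e);
        }
    }

    public static void main(String[] args) {
        Collator collator = withSeparateVw();
        System.out.println(collator.compare("wa", "vb") > 0);
        System.out.println(collator.compare("va", "wb") < 0);
    }
}
```

Note that the new collator starts with the default strength and decomposition settings; if the stock collator was configured differently, copy those settings over with setStrength and setDecomposition.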

料青山看我应如是

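"This orders waaa before vbbb and vaaa before wbbb. Apparently it treats v and w as the same letter."

The JDK does not actually treat 'w' and 'v' as the same character, even in the Swedish locale. The letter 'v' sorts before 'w':

Assert.assertEquals(1, stringComparator.compare("w", "v")); // TRUE

However, following the Swedish collation rules, the JDK sorts 'wa' before 'vb':

Assert.assertEquals(1, stringComparator.compare("wa", "vb")); // FALSE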
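The two results above are what a secondary-level difference looks like: the letters only break ties once everything else in the strings compares equal at the primary level. The sketch below reproduces that pattern with a tiny hand-written rule string. The toy alphabet and the names SecondaryDifferenceDemo and toyCollator are hypothetical, and the assumption that the JDK's Swedish data relates v and w in the same way is a guess based on the observed behaviour, not something read from the JDK sources.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical demo class; the rules are a toy alphabet, not the JDK's Swedish data.
final class SecondaryDifferenceDemo {

    // '<' is a primary difference, ';' a secondary one, ',' a tertiary one.
    // Here 'w' differs from 'v' only at the secondary level.
    static final String TOY_RULES = "< a , A < b , B < v , V ; w , W < z , Z";

    static Collator toyCollator() {
        try {
            return new RuleBasedCollator(TOY_RULES);
        } catch (ParseException e) {
            throw new IllegalStateException("toy rules should parse", e);
        }
    }

    public static void main(String[] args) {
        Collator toy = toyCollator();
        // On their own, 'w' sorts after 'v' because of the secondary difference.
        System.out.println(toy.compare("w", "v") > 0);   // true
        // Followed by other letters, the primary difference a < b decides first.
        System.out.println(toy.compare("wa", "vb") < 0); // true

        // At PRIMARY strength the secondary difference is ignored entirely.
        toy.setStrength(Collator.PRIMARY);
        System.out.println(toy.compare("w", "v") == 0);  // true
    }
}
```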

12345678_0001

You can create a custom comparator that wraps the collator and handles v and w manually, the way you want. I have made two implementations of this.

The first one is short and elegant. It uses Guava's lexicographical comparator along with a tricky regular expression provided by Holger in the comments.

// Needs java.util.Arrays, java.util.Comparator, java.util.regex.Pattern
// and Guava's com.google.common.collect.Comparators.
private static final Pattern VW_BOUNDARY = Pattern.compile("(?=[vw])|(?<=[vw])", Pattern.CASE_INSENSITIVE);

public static Comparator<String> smallCorrectVwWrapper(Comparator<Object> original) {
    return Comparator.comparing(
        s -> Arrays.asList(VW_BOUNDARY.split((String) s)),
        Comparators.lexicographical(original));
}

The second implementation is a big and complex thing that does the same, but implemented by hand, without libraries and regular expressions.

public static Comparator<String> correctVwWrapper(Comparator<Object> original) {
    return (s1, s2) -> compareSplittedVw(original, s1, s2);
}

/**
 * Compares the two string by first splitting them into segments separated by W
 * and V, then comparing the segments one by one.
 */
private static int compareSplittedVw(Comparator<Object> original, String s1, String s2) {
    List<String> l1 = splitVw(s1);
    List<String> l2 = splitVw(s2);

    int minSize = Math.min(l1.size(), l2.size());

    for (int ix = 0; ix < minSize; ix++) {
        int comp = original.compare(l1.get(ix), l2.get(ix));
        if (comp != 0) {
            return comp;
        }
    }

    return Integer.compare(l1.size(), l2.size());
}

private static boolean isVw(int ch) {
    return ch == 'V' || ch == 'v' || ch == 'W' || ch == 'w';
}

/**
 * Splits the string into segments separated by V and W.
 */
public static List<String> splitVw(String s) {
    var b = new StringBuilder();
    var result = new ArrayList<String>();
    for (int offset = 0; offset < s.length();) {
        int ch = s.codePointAt(offset);
        if (isVw(ch)) {
            if (b.length() > 0) {
                result.add(b.toString());
                b.setLength(0);
            }
            result.add(Character.toString((char) ch));
        } else {
            b.appendCodePoint(ch);
        }
        offset += Character.charCount(ch);
    }
    if (b.length() > 0) {
        result.add(b.toString());
    }
    return result;
}

Usage:

public static void main(String[] args) throws Exception {
    Comparator<String> stringComparator = correctVwWrapper(Collator.getInstance(new Locale("sv", "SE")));

    System.out.println(stringComparator.compare("a", "z") < 0);     // true
    System.out.println(stringComparator.compare("wa", "vz") < 0);   // false
    System.out.println(stringComparator.compare("wwa", "vvz") < 0); // false
    System.out.println(stringComparator.compare("va", "wz") < 0);   // true
    System.out.println(stringComparator.compare("v", "w") < 0);     // true
}

Implementing a wrapping Collator takes somewhat more work, but it should not be too complicated.
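A wrapping Collator along those lines might look like the sketch below. It is only an outline under stated assumptions: the names VwSeparatingCollator, segments and toyDelegate are hypothetical, the delegate used for the demonstration is a toy RuleBasedCollator rather than the real Swedish one, and getCollationKey is deliberately left unsupported because a correct key would need its own segment-aware encoding.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: compares segment by segment, with every v/w as its own segment.
final class VwSeparatingCollator extends Collator {
    private final Collator delegate;

    VwSeparatingCollator(Collator delegate) {
        this.delegate = delegate;
    }

    // Splits s into runs without v/w, keeping each v or w as a one-letter segment.
    static List<String> segments(String s) {
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("vVwW".indexOf(c) >= 0) {
                if (run.length() > 0) {
                    out.add(run.toString());
                    run.setLength(0);
                }
                out.add(String.valueOf(c));
            } else {
                run.append(c);
            }
        }
        if (run.length() > 0) {
            out.add(run.toString());
        }
        return out;
    }

    @Override
    public int compare(String a, String b) {
        List<String> l1 = segments(a);
        List<String> l2 = segments(b);
        int n = Math.min(l1.size(), l2.size());
        for (int i = 0; i < n; i++) {
            int c = delegate.compare(l1.get(i), l2.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(l1.size(), l2.size());
    }

    @Override
    public CollationKey getCollationKey(String source) {
        // Not covered by this sketch; the delegate's keys would ignore the segmentation.
        throw new UnsupportedOperationException("collation keys are not supported");
    }

    @Override
    public int hashCode() {
        return delegate.hashCode();
    }

    // Toy delegate for the demo: 'w' is only a secondary variant of 'v'.
    static Collator toyDelegate() {
        try {
            return new RuleBasedCollator("< a , A < b , B < v , V ; w , W < z , Z");
        } catch (ParseException e) {
            throw new IllegalStateException("toy rules should parse", e);
        }
    }

    public static void main(String[] args) {
        Collator wrapped = new VwSeparatingCollator(toyDelegate());
        System.out.println(wrapped.compare("wa", "vb") > 0);
        System.out.println(wrapped.compare("va", "wb") < 0);
    }
}
```

Because the class extends Collator, it can be passed anywhere a Collator or a Comparator<Object> is expected, as long as the caller never asks for collation keys.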