猿问

RegEx to split camelCase or TitleCase (advanced)

I found a brilliant RegEx to extract the parts of a camelCase or TitleCase expression.


 (?<!^)(?=[A-Z])

It works as expected:


value -> value

camelValue -> camel / Value

TitleValue -> Title / Value

For example, with Java:


String s = "loremIpsum";
String[] words = s.split("(?<!^)(?=[A-Z])");
// words equals new String[]{"lorem", "Ipsum"}

My problem is that it does not work in some cases:


Case 1: VALUE -> V / A / L / U / E

Case 2: eclipseRCPExt -> eclipse / R / C / P / Ext
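To reproduce the two failing cases, here is a minimal harness that runs the pattern from the question over all five inputs. The class `OriginalSplit` and its `split` helper are hypothetical names; only the regex comes from the question.

```java
// Hypothetical harness: applies the question's pattern to each sample input.
final class OriginalSplit {
    // Split before every uppercase letter, except at the very start of the string.
    static final String PATTERN = "(?<!^)(?=[A-Z])";

    static String[] split(String s) {
        return s.split(PATTERN);
    }

    public static void main(String[] args) {
        for (String s : new String[]{"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"}) {
            // Print in the same "input -> part / part" format as the question.
            System.out.println(s + " -> " + String.join(" / ", split(s)));
        }
    }
}
```

Running it should print `VALUE -> V / A / L / U / E` and `eclipseRCPExt -> eclipse / R / C / P / Ext`, i.e. the pattern splits before every capital in a run of capitals.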

In my mind, the result should be:


Case 1: VALUE

Case 2: eclipse / RCP / Ext

In other words, given n uppercase chars:


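if the n chars are followed by lowercase chars, the groups should be: (n-1 chars) / (n-th char + lowercase chars)

if the n chars are at the end, the group should be: (n chars).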
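To pin down the two rules above independently of any regex, here is a sketch of a plain character-by-character splitter. This is a swapped-in, regex-free technique, not something from the question; `RuleSplit` and `split` are hypothetical names, and it assumes ASCII letters. It can serve as a reference to compare candidate patterns against.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical regex-free splitter implementing the two rules from the question.
// Assumes ASCII input; Character.isUpperCase/isLowerCase also accept non-ASCII letters.
final class RuleSplit {

    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        int start = 0; // index where the current group begins
        for (int i = 1; i < s.length(); i++) {
            char prev = s.charAt(i - 1);
            char cur = s.charAt(i);
            boolean nextIsLower = i + 1 < s.length() && Character.isLowerCase(s.charAt(i + 1));
            // A lowercase letter followed by a capital starts a new group ("camel|Value").
            boolean lowerThenUpper = Character.isLowerCase(prev) && Character.isUpperCase(cur);
            // In a run of n capitals followed by lowercase letters, the n-th capital starts
            // a new group: (n-1 chars) / (n-th char + lowercase chars), e.g. "RCP|Ext".
            boolean lastCapitalOfRun = Character.isUpperCase(prev) && Character.isUpperCase(cur) && nextIsLower;
            if (lowerThenUpper || lastCapitalOfRun) {
                parts.add(s.substring(start, i));
                start = i;
            }
        }
        // The remainder is the last group; a trailing run of capitals stays whole ("VALUE").
        parts.add(s.substring(start));
        return parts;
    }

    public static void main(String[] args) {
        for (String s : new String[]{"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"}) {
            System.out.println(s + " -> " + String.join(" / ", split(s)));
        }
    }
}
```

On the question's inputs this gives `VALUE -> VALUE` and `eclipseRCPExt -> eclipse / RCP / Ext`, which is the desired output.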

Any ideas on how to improve this regex?


繁星淼淼
3 Answers

拉风的咖菲猫

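The following regex works for all of the above examples:

public static void main(String[] args) {
    for (String w : "camelValue".split("(?<!(^|[A-Z]))(?=[A-Z])|(?<!^)(?=[A-Z][a-z])")) {
        System.out.println(w);
    }
}

It works by forcing the negative lookbehind to ignore matches not only at the start of the string, but also wherever a capital letter is preceded by another capital letter. This handles cases like "VALUE".

On its own, the first part of the regex fails on "eclipseRCPExt" because it does not split between "RCP" and "Ext". That is the purpose of the second clause: (?<!^)(?=[A-Z][a-z]). This clause allows a split before every capital letter that is followed by a lowercase letter, except at the start of the string.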
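A small harness that runs this answer's full pattern, and its first clause on its own, over the question's inputs. Both patterns are taken from the answer; `AnswerOneSplit` and its helpers are hypothetical names.

```java
// Hypothetical harness comparing the full pattern from the answer with its first clause.
final class AnswerOneSplit {
    // First clause: split before a capital that is neither at the start nor preceded by a capital.
    static final String FIRST_CLAUSE = "(?<!(^|[A-Z]))(?=[A-Z])";
    // Full pattern: the second clause also splits before a capital followed by a lowercase letter.
    static final String FULL = FIRST_CLAUSE + "|(?<!^)(?=[A-Z][a-z])";

    static String[] split(String s, String regex) {
        return s.split(regex);
    }

    public static void main(String[] args) {
        for (String s : new String[]{"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"}) {
            System.out.println(s + " -> " + String.join(" / ", split(s, FULL))
                    + "   (first clause only: " + String.join(" / ", split(s, FIRST_CLAUSE)) + ")");
        }
    }
}
```

With the full pattern, `eclipseRCPExt` should come out as `eclipse / RCP / Ext` and `VALUE` stays whole; the first clause alone yields `eclipse / RCPExt`, which is the failure the second clause addresses.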

狐的传说

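It seems you are making this more complicated than it needs to be. For camelCase, the split location is simply anywhere an uppercase letter immediately follows a lowercase letter:

(?<=[a-z])(?=[A-Z])

Here is how this regex splits your example data:

value -> value
camelValue -> camel / Value
TitleValue -> Title / Value
VALUE -> VALUE
eclipseRCPExt -> eclipse / RCPExt

The only difference from your desired output is with eclipseRCPExt, which I would argue is correctly split here.

Addendum - Improved version

Note: This answer recently received an upvote, and I realized that there is a better way...

By adding a second alternative to the above regex, all of the OP's test cases are correctly split:

(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])

Here is how the improved regex splits the example data:

value -> value
camelValue -> camel / Value
TitleValue -> Title / Value
VALUE -> VALUE
eclipseRCPExt -> eclipse / RCP / Ext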
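For comparison, a sketch that runs both the original and the improved pattern from this answer over the same inputs. The patterns come from the answer; `AnswerTwoSplit` and `split` are hypothetical names.

```java
// Hypothetical harness comparing the simple and the improved pattern from the answer.
final class AnswerTwoSplit {
    // Simple: split wherever an uppercase letter directly follows a lowercase letter.
    static final String SIMPLE = "(?<=[a-z])(?=[A-Z])";
    // Improved: also split inside a run of capitals, before the capital that starts a lowercase word.
    static final String IMPROVED = SIMPLE + "|(?<=[A-Z])(?=[A-Z][a-z])";

    static String[] split(String s, String regex) {
        return s.split(regex);
    }

    public static void main(String[] args) {
        for (String s : new String[]{"value", "camelValue", "TitleValue", "VALUE", "eclipseRCPExt"}) {
            System.out.println(s + " -> " + String.join(" / ", split(s, SIMPLE))
                    + "   (improved: " + String.join(" / ", split(s, IMPROVED)) + ")");
        }
    }
}
```

The only input where the two patterns differ is `eclipseRCPExt`: `eclipse / RCPExt` with the simple one, `eclipse / RCP / Ext` with the improved one.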

斯蒂芬大帝

I couldn't get aix's solution to work (it doesn't work on RegExr either), so I came up with my own, tested approach, which seems to do exactly what you're looking for:

((^[a-z]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($))))

Here is an example of using it (AutoHotkey):

; Regex Breakdown:  This will match against each word in Camel and Pascal case strings, while properly handling acronyms.
;   (^[a-z]+)                          Match against any lower-case letters at the start of the string.
;   ([A-Z]{1}[a-z]+)                   Match against Title case words (one upper case followed by lower case letters).
;   ([A-Z]+(?=([A-Z][a-z])|($)))       Match against multiple consecutive upper-case letters, leaving the last upper case letter out of the match if it is followed by lower case letters, and including it if it's followed by the end of the string.
newString := RegExReplace(oldCamelOrPascalString, "((^[a-z]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($))))", "$1 ")
newString := Trim(newString)

Here I separate each word with a space, so here are some examples of how strings are transformed:

ThisIsATitleCASEString => This Is A Title CASE String
andThisOneIsCamelCASE => and This One Is Camel CASE

The solution above does what the original post asks for, but I also needed a regex to find camel and Pascal case strings that include numbers, so I came up with this variant that handles digits too:

((^[a-z]+)|([0-9]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($)|([0-9]))))

And an example of using it:

; Regex Breakdown:  This will match against each word in Camel and Pascal case strings, while properly handling acronyms and including numbers.
;   (^[a-z]+)                          Match against any lower-case letters at the start of the string.
;   ([0-9]+)                           Match against one or more consecutive digits (anywhere in the string, including at the start).
;   ([A-Z]{1}[a-z]+)                   Match against Title case words (one upper case followed by lower case letters).
;   ([A-Z]+(?=([A-Z][a-z])|($)|([0-9])))   Match against multiple consecutive upper-case letters, leaving the last upper case letter out of the match if it is followed by lower case letters, and including it if it's followed by the end of the string or a digit.
newString := RegExReplace(oldCamelOrPascalString, "((^[a-z]+)|([0-9]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($)|([0-9]))))", "$1 ")
newString := Trim(newString)

Here are some examples of how strings containing numbers are transformed with this regex:

myVariable123 => my Variable 123
my2Variables => my 2 Variables
3rdVariableIsHere => 3 rdVariable Is Here
12345NumsAtTheStartIncludedToo => 12345 Nums At The Start Included Too
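The code in this answer is AutoHotkey. For readers on Java, here is a rough sketch of the same "replace every match with `$1 `, then trim" approach using `String.replaceAll` and `String.trim`. Both patterns are copied from the answer; `AnswerThreeSplit` and `spaced` are hypothetical names. Unlike `split`, this matches the words themselves and appends a space after each one.

```java
// Hypothetical Java port of the answer's RegExReplace(..., "$1 ") + Trim(...) calls.
final class AnswerThreeSplit {
    // Letters-only pattern from the answer.
    static final String WORDS =
            "((^[a-z]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($))))";
    // Variant from the answer that also treats runs of digits as words.
    static final String WORDS_AND_NUMBERS =
            "((^[a-z]+)|([0-9]+)|([A-Z]{1}[a-z]+)|([A-Z]+(?=([A-Z][a-z])|($)|([0-9]))))";

    // Append a space after every matched word (group 1 wraps the whole match),
    // then drop the trailing space. Unmatched characters are copied through unchanged.
    static String spaced(String s, String regex) {
        return s.replaceAll(regex, "$1 ").trim();
    }

    public static void main(String[] args) {
        System.out.println(spaced("ThisIsATitleCASEString", WORDS));
        System.out.println(spaced("andThisOneIsCamelCASE", WORDS));
        System.out.println(spaced("my2Variables", WORDS_AND_NUMBERS));
        System.out.println(spaced("3rdVariableIsHere", WORDS_AND_NUMBERS));
    }
}
```

Because `^[a-z]+` only applies at the very start of the string, the lowercase `rd` in `3rdVariableIsHere` is never matched on its own and is simply copied through, which is why that example comes out as `3 rdVariable Is Here`.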