Regex to extract hashtags made of two parts separated by a dot

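I am trying to write a regular expression to extract some text from a string. I want to extract the text from URLs or plain text messages, for example: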

endpoint/?userId=#someuser.id

or

Hi #someuser.name, how are you?

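I want to extract exactly #someuser.name from the message and #someuser.id from the URL. There may be many such strings to extract from both URLs and messages.

My regex currently looks like this: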

(#[^\.]+?\.)([^\W]\w+\b)

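It works fine except for one case that I don't know how to handle. For example, these strings should not match:

# .id
#.id

There must be at least one character between # and the dot, and one or more spaces alone between those two characters should not count as a match.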

How can I do this with my current regex?
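To make the failing case concrete, here is a minimal Java sketch that runs the regex from the question against the sample strings. The class name CurrentRegexDemo and the helper findAll are made up for illustration and are not part of the original post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CurrentRegexDemo {
    // The pattern from the question, escaped for a Java string literal.
    static final Pattern CURRENT = Pattern.compile("(#[^\\.]+?\\.)([^\\W]\\w+\\b)");

    // Hypothetical helper: collects every full match found in the input.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = CURRENT.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("endpoint/?userId=#someuser.id"));
        System.out.println(findAll("Hi #someuser.name, how are you?"));
        System.out.println(findAll("# .id")); // the unwanted match
        System.out.println(findAll("#.id"));  // already rejected
    }
}
```

With these inputs the two real tags are found, "# .id" is matched as well (the case to exclude), and "#.id" is already rejected because [^\.]+? needs at least one character before the dot.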


噜噜哒
4 Answers

Cats萌萌

You can use

    String regex = "#[^.#]*[^.#\\s][^#.]*\\.\\w+";

Details:

    #         - a # symbol
    [^.#]*    - zero or more characters other than . and #
    [^.#\s]   - any character except ., # and whitespace
    [^#.]*    - zero or more characters other than . and #
    \.        - a dot
    \w+       - one or more word characters (letters, digits or _)

Java demo:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "# #.id\nendpoint/?userId=#someuser.id\nHi #someuser.name, how are you?";
    String regex = "#[^.#]*[^.#\\s][^#.]*\\.\\w+";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(s);
    while (matcher.find()) {
        // group(0) is the whole match
        System.out.println(matcher.group(0));
    }

Output:

    #someuser.id
    #someuser.name
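As a small sketch of what the middle [^.#\s] requirement means in practice, the following checks a few edge cases against the pattern from this answer. The class name WhitespaceCheckDemo, the helper findAll and the extra inputs "#some user.id" and "# a.id" are assumptions added for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceCheckDemo {
    // Pattern from the answer above: at least one character between '#'
    // and '.' must be something other than '.', '#' or whitespace.
    static final Pattern TAG = Pattern.compile("#[^.#]*[^.#\\s][^#.]*\\.\\w+");

    // Hypothetical helper: collects every full match found in the input.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findAll("# .id"));         // only whitespace before the dot
        System.out.println(findAll("#.id"));          // nothing before the dot
        System.out.println(findAll("#some user.id")); // whitespace plus other characters
        System.out.println(findAll("# a.id"));        // leading space, then a letter
    }
}
```

The first two inputs produce no match. The last two are matched in full, so this pattern still allows whitespace inside the first part as long as at least one non-whitespace character is present. If whitespace should never appear there, the character classes would need to exclude \s as well.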

白衣非少年

The requirements, restated:

    Find patterns of the form #A.B
    A can be anything except whitespace only, and it cannot contain # or .
    B can only be regular ASCII letters or digits

Translating these requirements into a (possible) regex:

    #[^.#]+((?<!#\\s+)\\.)[A-Za-z0-9]+

Explanation:

    #[^.#]+((?<!#\\s+)\\.)[A-Za-z0-9]+  # The entire capture for the Java Matcher:
    #                                   #  A literal '#' character
     [^.#]+                             #  Followed by 1 or more characters which are NOT '.' nor '#'
           (          \\.)              #  Followed by a '.' character
            (?<!     )                  #  Which is NOT preceded by (negative lookbehind):
                #                       #   A literal '#'
                 \\s+                   #   With 1 or more whitespaces
                          [A-Za-z0-9]+  #  Followed by 1 or more alphanumeric characters
                                        #  (PS: \\w+ could be used here if '_' is allowed as well)

Test code:

    String input = "endpoint/?userId=#someuser.id Hi #someuser.name, how are you? "
        + "# .id #.id %^*#@*(.H(@EH Ok, # some spaces here .but none here #$p€©ï@l.$p€©ï@l that should do it..";
    System.out.println("Input: \"" + input + '"');
    System.out.println("Outputs: ");
    java.util.regex.Matcher matcher = java.util.regex.Pattern.compile("#[^.#]+((?<!#\\s+)\\.)[A-Za-z0-9]+")
                                                             .matcher(input);
    while (matcher.find())
      System.out.println('"' + matcher.group() + '"');

Which outputs:

    Input: "endpoint/?userId=#someuser.id Hi #someuser.name, how are you? # .id #.id %^*#@*(.H(@EH Ok, # some spaces here .but none here #$p€©ï@l.$p€©ï@l that should do it.."
    Outputs: 
    "#someuser.id"
    "#someuser.name"
    "#@*(.H"
    "# some spaces here .but"

慕侠2389804

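You can try the following regex:

    #(\w+)\.(\w+)

Notes:

    If you don't want to capture any groups, remove the parentheses.
    In a Java regex string you need to escape every \, which gives #(\\w+)\\.(\\w+)
    If the id consists only of digits, you can replace the second \w with [0-9].
    If the username contains characters other than letters, digits and underscores, you must replace \w with a character class that explicitly lists all the allowed characters.

Code sample:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String input = "endpoint/?userId=#someuser.id Hi #someuser.name, how are you? # .id, #.id.";
    Matcher m = Pattern.compile("#(\\w+)\\.(\\w+)").matcher(input);
    while (m.find()) {
        // group() with no argument returns the whole match
        System.out.println(m.group());
    }

Output:

    #someuser.id
    #someuser.name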
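The digits-only note can be sketched as follows, assuming the second part really is a numeric id. The class name NumericIdDemo, the helper findAll and the sample input are hypothetical and only illustrate the change from \w to [0-9].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericIdDemo {
    // Variant of #(\w+)\.(\w+) where the second part must be digits only.
    static final Pattern NUMERIC_TAG = Pattern.compile("#(\\w+)\\.([0-9]+)");

    // Hypothetical helper: collects every full match found in the input.
    static List<String> findAll(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = NUMERIC_TAG.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Only the tag whose second part is numeric is reported.
        System.out.println(findAll("#user1.42 #user2.name"));
    }
}
```

Here only #user1.42 is returned. A tag such as #user2.name is skipped because its second part does not start with a digit.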

至尊宝的传说

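    #(\w+)[.](\w+)

This yields two capturing groups. For example, endpoint/?userId=#someuser.id gives someuser as the first group and id as the second. In Java these are matcher.group(1) and matcher.group(2), since group(0) is the whole match.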
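A minimal sketch of reading the two parts separately in Java, where capturing groups are numbered from 1 and group 0 is the whole match. The class name TagGroupsDemo and the helper parse are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagGroupsDemo {
    static final Pattern TAG = Pattern.compile("#(\\w+)[.](\\w+)");

    // Hypothetical helper: returns {firstPart, secondPart} for each tag found.
    static List<String[]> parse(String input) {
        List<String[]> parts = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            // group(1) is the text before the dot, group(2) the text after it
            parts.add(new String[] { m.group(1), m.group(2) });
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String[] p : parse("endpoint/?userId=#someuser.id")) {
            System.out.println(p[0] + " / " + p[1]); // prints "someuser / id"
        }
    }
}
```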
