Currently, I am using this regular expression to detect Japanese and English hashtags:
\B([##][·・ー_0-90-9a-zA-Za-zA-Zぁ-んァ-ン一-龠]{1,24})(?=\W|$)
The rules are:
A hashtag must start with the # character.
A hashtag is delimited by a space character or by other special characters (!, @, &, *, %, $).
Example 1: Hello#guys. This is a #test. -> Valid hashtag: #test.
Example 2: Hello#guys. This is a #test!#message. -> Valid hashtags: #test and #message
Example 3: Hello#guys. This is a #test #message. -> Valid hashtags: #test and #message
Example 4: Hello#guys. This is a #test#message. -> Valid hashtag: #test
Example 5: #asdasdasdasdasdasdasdasdasd -> Valid hashtag: none
Example 6: # -> Valid hashtag: none
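It worked well until I ran into these two specific cases with Japanese characters :(
#Japan#asd => the valid hashtag should be #Japan
# Japanese Japanese Japanese Japanese Japanese Japanese Japanese Japanese Japanese => not a valid hashtag
The regex above does not handle these two cases correctly. I have tried many approaches but have not found a solution so far.
Currently, I am testing on this site: https://regexr.com/
Please help, and thanks in advance.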
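The two failing cases can be reproduced with a short sketch, assuming the pattern runs on a JavaScript engine. There, `\B` and `\W` are defined in terms of ASCII `\w`, so Japanese characters count as non-word characters: `\B` still matches between a kanji and `#`, and `(?=\W|$)` succeeds when the 25th character is Japanese. The Japanese sample strings and the helper name `findWithOriginal` are my own illustrations, not taken from the post.

```javascript
// The pattern from the question, with the g flag added so every match is returned.
const originalPattern = /\B([##][·・ー_0-90-9a-zA-Za-zA-Zぁ-んァ-ン一-龠]{1,24})(?=\W|$)/g;

// Hypothetical helper: collect capture group 1 of every match.
function findWithOriginal(text) {
  return [...text.matchAll(originalPattern)].map((m) => m[1]);
}

// ASCII input behaves as the rules describe: only "#test" is found.
console.log(findWithOriginal("Hello#guys. This is a #test#message."));

// Illustrative Japanese input: "#asd" is also returned, although only
// the first tag should be valid.
console.log(findWithOriginal("#日本語#asd"));

// Illustrative over-long tag (30 characters): the first 24 characters are
// returned instead of no match at all.
console.log(findWithOriginal("#" + "あ".repeat(30)).map((tag) => tag.length));
```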
Thanks to @Ryszard Czech.
The final solution, which works like Twitter hashtags, is:
/(?<![\p{L}0-9ー_])([##][一_0-90-9a-zA-Za-zA-Zァ-ン゙゚一-龠ぁ-ゔァ-ヴ]{1,24})(?![\p{L}0-9ー_])/gu
Test: https://regex101.com/r/Goaqqs/1
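As a rough check, the final pattern can be run against the examples from the rules plus two Japanese inputs. This is a minimal sketch, assuming a JavaScript engine that supports lookbehind and the `u` flag. The helper name `findHashtags` and the Japanese sample strings are illustrative and not part of the original post.

```javascript
// The final pattern from the post, copied verbatim. The u flag enables \p{L}.
const finalPattern = /(?<![\p{L}0-9ー_])([##][一_0-90-9a-zA-Za-zA-Zァ-ン゙゚一-龠ぁ-ゔァ-ヴ]{1,24})(?![\p{L}0-9ー_])/gu;

// Hypothetical helper: collect capture group 1 of every match.
function findHashtags(text) {
  return [...text.matchAll(finalPattern)].map((m) => m[1]);
}

const samples = [
  "Hello#guys. This is a #test.",          // Example 1
  "Hello#guys. This is a #test!#message.", // Example 2
  "Hello#guys. This is a #test #message.", // Example 3
  "Hello#guys. This is a #test#message.",  // Example 4
  "#asdasdasdasdasdasdasdasdasd",          // Example 5: longer than 24 characters
  "#",                                     // Example 6: no tag body
  "#日本語#asd",                            // illustrative: second tag touches a letter
  "#" + "あ".repeat(30),                   // illustrative: longer than 24 characters
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), "->", findHashtags(sample));
}
```

The lookarounds replace `\B` and `(?=\W|$)` with explicit checks that no letter, digit, `ー`, or `_` touches either side of the tag. That appears to be why a tag glued to Japanese text, or one running past 24 characters, is no longer reported.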