
Go regex: matching content that contains balanced parentheses

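My use case is as follows: I am parsing an SQL query and trying to get a function's name and the arguments passed to it. That requires my regex to find the name, the opening parenthesis, the content, and the closing parenthesis. Unfortunately, testing showed that it is sometimes too greedy and grabs an extra parenthesis, while at other times it misses the closing parenthesis.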


Here is my test code from the playground:


package main

import (
    "fmt"
    "regexp"
)

// getRegex builds the pattern for the macro $__<name> with optional arguments.
func getRegex(name string) string {
    return fmt.Sprintf("\\$__%s\\b(?:\\((.*?\\)?)\\))?", name)
}

func main() {
    var rawSQL = "(select min(time) from table where $__timeFilter(time))"
    rgx, err := regexp.Compile(getRegex("timeFilter"))
    if err != nil {
        fmt.Println(err)
        return
    }
    var match = rgx.FindAllStringSubmatch(rawSQL, -1)

    fmt.Println(match)
}

Example: https://go.dev/play/p/4FpZblia7Ks


The four cases I tested are:


(select min(time) from table where $__timeFilter(time) ) OK

(select min(time) from table where $__timeFilter(time)) NOK

select * from foo where $__timeFilter(cast(sth as timestamp)) OK

select * from foo where $__timeFilter(cast(sth as timestamp) ) NOK

Here is a live version of the regex: https://regexr.com/700oh


I come from the JavaScript world, so I have never used recursive regexes. Could this be a case for one?


子衿沉夜
2 answers

九州编程

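There seem to be two main problems with your regex, one of which is easier to deal with than the other.

First, regular expressions are inherently bad at recursive matching, such as pairing opening and closing parentheses, because they have no memory. In your case, I think you tried to work around this by limiting yourself to a few specific cases, but the regex's greedy matching works against you here.

Second, you do not match the case where there may be whitespace before the closing parenthesis.

Together, these two problems cause your regex to fail in those two cases, and they are also why your first case happens to match. To fix this, you have to preprocess the string before passing it to the regex:

if strings.HasPrefix(rawSQL, "(") && strings.HasSuffix(rawSQL, ")") {
    rawSQL = rawSQL[1 : len(rawSQL)-1]
}

This strips the outer parentheses, which the regex cannot ignore without memory or additional clauses.

Next, you need to modify the regex to handle the case where there may be whitespace between the inner function call and the closing parenthesis of the $__timeFilter call:

func getRegex(name string) string {
    return fmt.Sprintf("\\$__%s\\b(\\((.*?\\)?)\\s*\\))?", name)
}

After that, your regex should work. You can find a complete example at this playground link.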
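The two steps from this answer can be tried against the four test cases from the question with a short program. This is only a sketch: `stripOuterParens` and `macroCall` are hypothetical helper names introduced here, and the stripping step assumes that a query starting with `(` and ending with `)` is wrapped in a single outer pair.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stripOuterParens is a hypothetical helper. It assumes that a query which
// starts with "(" and ends with ")" is wrapped in one outer pair and drops it.
func stripOuterParens(sql string) string {
	if strings.HasPrefix(sql, "(") && strings.HasSuffix(sql, ")") {
		return sql[1 : len(sql)-1]
	}
	return sql
}

// getRegex is the adjusted pattern from the answer: it allows optional
// whitespace before the macro's closing parenthesis.
func getRegex(name string) string {
	return fmt.Sprintf(`\$__%s\b(\((.*?\)?)\s*\))?`, name)
}

// macroCall returns the first full macro match in sql, or "" if there is none.
func macroCall(sql, name string) string {
	rgx := regexp.MustCompile(getRegex(name))
	m := rgx.FindStringSubmatch(stripOuterParens(sql))
	if m == nil {
		return ""
	}
	return m[0]
}

func main() {
	cases := []string{
		"(select min(time) from table where $__timeFilter(time) )",
		"(select min(time) from table where $__timeFilter(time))",
		"select * from foo where $__timeFilter(cast(sth as timestamp))",
		"select * from foo where $__timeFilter(cast(sth as timestamp) )",
	}
	for _, c := range cases {
		fmt.Printf("%q\n", macroCall(c, "timeFilter"))
	}
}
```

This still covers only one level of nesting inside the macro arguments; deeper nesting needs an approach that counts parentheses.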

FFIVE

Although I ended up having to go a different route, I accepted Woody's answer as the correct one. The test cases I attached did not cover some scenarios, and it turned out I also needed to extract the arguments inside the parentheses. So here is my final solution, in which I parse the text manually, find the bounding parentheses, and extract whatever lies between them:

import (
    "fmt"
    "regexp"
)

// getMacroMatches extracts macro strings with their respective arguments from the sql input given
// It manually parses the string to find the closing parenthesis of the macro (because regex has no memory)
func getMacroMatches(input string, name string) ([][]string, error) {
    macroName := fmt.Sprintf("\\$__%s\\b", name)
    matchedMacros := [][]string{}
    rgx, err := regexp.Compile(macroName)

    if err != nil {
        return nil, err
    }

    // get all matching macro instances
    matched := rgx.FindAllStringIndex(input, -1)

    if matched == nil {
        return nil, nil
    }

    for matchedIndex := 0; matchedIndex < len(matched); matchedIndex++ {
        var macroEnd = 0
        var argStart = 0
        macroStart := matched[matchedIndex][0]
        inputCopy := input[macroStart:]
        // cache is a stack of the currently open parentheses
        cache := make([]rune, 0)

        // find the opening and closing arguments brackets
        for idx, r := range inputCopy {
            // stop once the stack is empty again after a closing parenthesis
            if len(cache) == 0 && macroEnd > 0 {
                break
            }
            switch r {
            case '(':
                cache = append(cache, r)
                if argStart == 0 {
                    argStart = idx + 1
                }
            case ')':
                l := len(cache)
                if l == 0 {
                    // this break leaves only the switch; the loop keeps scanning
                    break
                }
                cache = cache[:l-1]
                macroEnd = idx + 1
            default:
                continue
            }
        }

        // macroEnd equals to 0 means there are no parentheses, so just set it
        // to the end of the regex match
        if macroEnd == 0 {
            macroEnd = matched[matchedIndex][1] - macroStart
        }
        macroString := inputCopy[0:macroEnd]
        macroMatch := []string{macroString}

        args := ""
        // if opening parenthesis was found, extract contents as arguments
        if argStart > 0 {
            args = inputCopy[argStart : macroEnd-1]
        }
        macroMatch = append(macroMatch, args)
        matchedMacros = append(matchedMacros, macroMatch)
    }
    return matchedMacros, nil
}

Playground link: https://go.dev/play/p/-odWKMBLCBv
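The core idea of the manual parser, stripped of the macro handling, is to walk the string while tracking how many parentheses are currently open. Go's regexp package uses RE2 syntax, which has no recursive patterns, so this counting has to happen in code. The following is a minimal sketch of that technique, not the code from the answer: `closingParen` is a hypothetical helper, and it assumes that parentheses never appear inside string literals or comments in the SQL.

```go
package main

import (
	"fmt"
	"strings"
)

// closingParen returns the byte index of the ')' that balances the '(' at
// index open. It returns -1 if s[open] is not '(' or if no balancing ')'
// exists. Hypothetical helper: quotes and comments are not handled.
func closingParen(s string, open int) int {
	if open < 0 || open >= len(s) || s[open] != '(' {
		return -1
	}
	depth := 0
	// Scanning bytes is safe here because '(' and ')' are ASCII and never
	// occur inside a multi-byte UTF-8 sequence.
	for i := open; i < len(s); i++ {
		switch s[i] {
		case '(':
			depth++
		case ')':
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

func main() {
	const macro = "$__timeFilter"
	sql := "(select min(time) from table where $__timeFilter(cast(sth as timestamp)))"

	// The character right after the macro name should be its opening parenthesis.
	open := strings.Index(sql, macro) + len(macro)
	end := closingParen(sql, open)
	if end < 0 {
		fmt.Println("no balanced argument list found")
		return
	}
	fmt.Println(sql[open+1 : end]) // the arguments between the balanced parentheses
}
```

A plain counter is enough here because only one kind of bracket is tracked; a stack like `cache` in the answer becomes useful once several bracket types have to be matched against each other.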