Searching for multiple words in a text using a regular expression (Java)

I have a method that searches a text for a word; both the text and the word are passed in as parameters.


public Integer findTheWord(String stringToCheck, String regexString) throws IOException {
    int count = 0;
    Pattern regexp = Pattern.compile("\\b" + regexString + "\\b");
    Matcher matcher = regexp.matcher(stringToCheck);

    while (matcher.find()) {
        count++;
        String matchString = matcher.group();
        System.out.println(matchString);
    }

    System.out.println(count);
    return count;
}

How can I pass in multiple words and get back the number of occurrences of each word?


慕丝7291255
2 answers

泛舟湖上清波郎朗

The first and simplest option is to keep your existing findTheWord() method and add a new method that calls it once per word:

import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public Map<String, Integer> findTheWords(String stringToCheck, List<String> words) {
    // distinct() avoids duplicate keys, which would make toMap throw
    return words.stream().distinct()
            .collect(Collectors.toMap(Function.identity(), word -> findTheWord(stringToCheck, word)));
}

public Integer findTheWord(String stringToCheck, String regexString) {
    Pattern regexp = Pattern.compile("\\b" + regexString + "\\b");
    Matcher matcher = regexp.matcher(stringToCheck);
    int count = 0;
    while (matcher.find()) {
        count++;
    }
    return count;
}

The drawback shows up when you search a large text for many words, because the given string is scanned once for every word. An alternative is to build a single regular expression covering all the words and, for each match found, increment that word's entry in the result map:

import java.util.HashMap;

public Map<String, Integer> findTheWords(String stringToCheck, List<String> words) {
    Pattern regexp = Pattern.compile(words.stream().distinct().map(word -> "\\b" + word + "\\b").collect(Collectors.joining("|")));
    // creates a pattern like this: "\ba\b|\bb\b|\bc\b|\bd\b|\be\b"
    Matcher matcher = regexp.matcher(stringToCheck);
    Map<String, Integer> result = new HashMap<>();
    while (matcher.find()) {
        String word = matcher.group();
        result.put(word, result.getOrDefault(word, 0) + 1);
    }
    return result;
}

Note that this version only contains entries for words that occur at least once, whereas the first version maps words that never occur to 0.

Beyond that, you might consider taking the words as a Set instead of a List: its values are already unique, so there is no need to call .distinct() on the stream.
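As a minimal sketch of the single-pass idea above, the following variant treats each word as literal text rather than as a regex (via Pattern.quote) and pre-fills the map so that words with no matches are reported as 0. The class name WordCounter and the method name countWords are hypothetical, chosen only for illustration. It assumes Java 9+ (for List.of) and that the words begin and end with word characters, since \b would otherwise not behave as a word delimiter.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class WordCounter {

    // Hypothetical helper: counts whole-word occurrences of each literal word in one pass.
    public static Map<String, Integer> countWords(String text, List<String> words) {
        // Pre-fill with 0 so words that never occur still appear in the result;
        // the map keys also de-duplicate the input while keeping its order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            counts.put(word, 0);
        }
        if (counts.isEmpty()) {
            return counts; // an empty alternation would match the empty string
        }

        // Quote each word so regex metacharacters such as '.' are matched literally.
        String alternation = counts.keySet().stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");

        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            // The matched text equals one of the literal words, so the key already exists.
            counts.merge(matcher.group(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords("the cat and the hat", List.of("the", "cat", "dog"));
        System.out.println(counts); // prints {the=2, cat=1, dog=0}
    }
}
```

If the words are meant to be regular expressions, as the parameter name regexString in the question suggests, quoting is the wrong choice, and the matched text can no longer be used directly as the map key.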

森栏

Take a HashMap as the parameter, with the input string as the key and the regular expression as the value. Iterate over all entries, run your method on each, and return a HashMap with the matched word as the key and its number of occurrences as the value.

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public HashMap<String, Integer> findTheWordsAndOccurences(HashMap<String, String> stringsAndRegex) {
    // maps matched text -> occurrence count
    HashMap<String, Integer> result = new HashMap<>();
    for (Map.Entry<String, String> entry : stringsAndRegex.entrySet()) {
        String stringToCheck = entry.getKey();
        String regexString = entry.getValue();
        String matchString = "";
        int count = 0;
        Pattern regexp = Pattern.compile("\\b" + regexString + "\\b");
        Matcher matcher = regexp.matcher(stringToCheck);
        while (matcher.find()) {
            count++;
            matchString = matcher.group();
            System.out.println(matchString);
            // overwrite with the running count for this entry
            result.put(matchString, count);
        }
    }
    return result;
}