How to find the 20 most frequent words in an ArrayList

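I have an assignment in which I am given a file containing text. The text is part of a book. The first part of the task was to load that file into an ArrayList or a HashMap (one of the two), which I have done. The second part is to find the 20 most frequent words in the file and sort them in descending order.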


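So far I have inserted all the words from the file into a HashMap and into an ArrayList (code below), each in a separate method. The HashMap method returns only the numbers, while the ArrayList method returns only the single most frequent word together with its number of repetitions.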


The first part of the code is the HashMap version:


public void findWords() throws Exception {
    // here I look for the 20 words that repeat most often in the text
    String line;
    Integer counter = 0;
    FileReader fr = new FileReader("src/Fajl/blab");
    BufferedReader br = new BufferedReader(fr);

    while ((line = br.readLine()) != null) {
        String string[] = line.toLowerCase().split("([,.\\s]+)");
        for (String s : string) {
            if (hashmap.containsKey(s)) {
                // continue from this word's own stored count
                counter = hashmap.get(s) + 1;
            } else {
                counter = 1;
            }
            hashmap.put(s, counter);
        }
    }

The next part sorts by value and prints the repetition counts of the top 20 words, from most to least:


    Collection<Integer> values = mapaKnjiga.values();
    ArrayList<Integer> list = new ArrayList<Integer>(values);
    Collections.sort(list, Collections.reverseOrder());

    // stop early if the text has fewer than 20 distinct words
    for (int i = 0; i < Math.min(20, list.size()); i++)
        System.out.println(list.get(i));
}


青春有我
Views: 213, Answers: 5
5 Answers

紫衣仙女

Treat the words as a HashMap, with the word as the key and its count as the value.

import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// copy the entries into a LinkedHashMap, highest count first
LinkedHashMap<String, Integer> reverseSortedMap = new LinkedHashMap<>();
words.entrySet()
        .stream()
        .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
        .forEachOrdered(x -> reverseSortedMap.put(x.getKey(), x.getValue()));

// keep only the first 20 words
List<String> finalList = reverseSortedMap.entrySet()
        .stream()
        .map(entry -> entry.getKey())
        .limit(20)
        .collect(Collectors.toList());
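The answer above ends with a list of words only. As a rough sketch of the same idea that keeps the counts next to the words, the sort and the limit can be done in one pass. This assumes `words` is a `Map<String, Integer>`; the class name `TopWordsSketch`, the method `topN`, the sample data, and the alphabetical tie-break are all made up for illustration.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class; names and sample data are illustrative only.
final class TopWordsSketch {

    // Returns the n most frequent words with their counts, highest count first.
    // Ties are broken alphabetically so the result is deterministic.
    static LinkedHashMap<String, Integer> topN(Map<String, Integer> words, int n) {
        return words.entrySet()
                .stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(n)
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,            // keys are unique, so this merge is never used
                        LinkedHashMap::new));   // keeps the sorted order
    }

    public static void main(String[] args) {
        Map<String, Integer> words = new HashMap<>();
        words.put("the", 5);
        words.put("book", 2);
        words.put("a", 3);
        System.out.println(topN(words, 2)); // prints {the=5, a=3}
    }
}
```

Calling `topN(words, 20)` would give the 20 entries the question asks for, or fewer if the map is smaller.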

猛跑小猪

You can create a class TextCounter and add instances of it to a list, built from the data collected in the map:

class TextCounter {
    String text;
    int count;
}

Then sort the list by the count value.
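One possible way to fill in the steps this answer leaves out is sketched below. The field names `text` and `count` come from the answer; the constructor, the `toString`, the helper class `TextCounterSketch`, and its method `mostFrequent` are assumptions added for illustration, as is the sample data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Field names follow the answer above; everything else is a hypothetical sketch.
final class TextCounter {
    final String text;
    final int count;

    TextCounter(String text, int count) {
        this.text = text;
        this.count = count;
    }

    @Override
    public String toString() {
        return text + " = " + count;
    }
}

final class TextCounterSketch {

    // Copies the map into a list of TextCounter objects and returns the
    // `limit` entries with the highest counts, in descending order.
    static List<TextCounter> mostFrequent(Map<String, Integer> counts, int limit) {
        List<TextCounter> list = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            list.add(new TextCounter(e.getKey(), e.getValue()));
        }
        list.sort(Comparator.comparingInt((TextCounter tc) -> tc.count).reversed());
        // subList is a view, so copy it; Math.min guards against short inputs.
        return new ArrayList<>(list.subList(0, Math.min(limit, list.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 5);
        counts.put("book", 2);
        counts.put("a", 3);
        System.out.println(mostFrequent(counts, 2)); // prints [the = 5, a = 3]
    }
}
```

Because each list element carries both the word and its count, printing the sorted list shows the words as well as the numbers.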

慕姐8265434

Assuming you want the top 20 words and their frequencies in a map, with the words read from a file, a Java 8 solution would be:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

LinkedHashMap<String, Long> top20WordsByFrequency = null;
// convert the file into a stream of lines; try-with-resources closes it afterwards
try (Stream<String> lines = Files.lines(Paths.get("src/Fajl/blab"))) {
    top20WordsByFrequency = lines
            // convert lines into words
            .flatMap(line -> Arrays.stream(line.toLowerCase().split("([,.\\s]+)")))
            // build a map with the word as key and its number of occurrences as value
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
            .entrySet().stream()
            // sort by value (frequency) in descending order and keep the first 20
            .sorted(Entry.comparingByValue(Comparator.reverseOrder()))
            .limit(20)
            // after limiting, sort by key in descending order
            .sorted(Map.Entry.<String, Long>comparingByKey().reversed())
            // preserve the order in a LinkedHashMap
            .collect(Collectors.toMap(Entry::getKey, Entry::getValue, (u, v) -> u, LinkedHashMap::new));
} catch (IOException e) {
    e.printStackTrace();
}
System.out.println(top20WordsByFrequency);

蝴蝶不菲

How about using the Stream API:

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

String[] words = {"one", "two", "two", "three", "three", "three"};
// count how many times each word occurs
Map<String, Long> result =
        Arrays.stream(words)
              .collect(Collectors.groupingBy(Function.identity(),
                       Collectors.counting()));

Second part:

// the 20 highest counts, in descending order (counts only, without the words)
List<Long> collect = result.entrySet().stream()
        .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
        .limit(20)
        .map(Map.Entry::getValue)
        .collect(Collectors.toList());
System.out.println(collect);
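The second part above collects only the counts, which is the same limitation the question describes. A minimal sketch of a variant that keeps each word with its count, and starts from lines of text rather than a prepared array, might look like this. The class name `WordCountStreamSketch`, the method `topWords`, the sample lines, and the filtering of empty tokens are assumptions for illustration; the split pattern is the one from the question.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch; names and sample input are illustrative only.
final class WordCountStreamSketch {

    // Counts the words in the given lines and returns the n most frequent
    // entries (word plus count), highest count first, ties in alphabetical order.
    static List<Map.Entry<String, Long>> topWords(List<String> lines, int n) {
        Map<String, Long> counts = lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("[,.\\s]+")))
                // split can yield an empty first token when a line starts with a separator
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("One two, two.", "three three three.");
        for (Map.Entry<String, Long> e : topWords(lines, 2)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Returning the entries rather than mapping to `getValue` is the only structural change from the answer; the caller can then print the word and the count together.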

喵喵时光机

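Assuming your findWords() function works correctly and you have all the words with their counts, you can do the following. Since you need to print the count for each specific word, first define an Item class with a word and a count, and give it a default ordering, as shown below:

class Item implements Comparable<Item> {
    String word;
    int count;

    public Item(String word, int count) {
        this.count = count;
        this.word = word;
    }

    @Override
    public int compareTo(Item other) {
        // sort in descending order of count; Integer.compare avoids the
        // overflow that subtracting two ints can cause
        return Integer.compare(other.count, this.count);
    }

    @Override
    public String toString() {
        return "Word: " + word + ", Count: " + count;
    }
}

Define an ArrayList to hold the Item objects:

ArrayList<Item> al = new ArrayList<>();

You can iterate over the whole HashMap and insert each pair as:

Item item = new Item(word, count);
al.add(item);

Finally, sort the list and take the first 20 items:

Collections.sort(al);
List<Item> top20 = al.subList(0, Math.min(20, al.size()));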
