计算具有重复项的数组列表中每个不同的数组出现

计算具有重复项的数组列表中每个不同的数组出现

问题

我有一个数组列表，我想计算重复的出现次数。

例如，如果我有这个：

{{1,2,3},

{1,0,3},

{1,2,3},

{5,2,6},

{5,2,6},

{5,2,6}}

我想要一张像这样的地图（或任何相关的集合）：

{ {1,2,3} -> 2,

{1,0,3} -> 1,

{5,2,6} -> 3 }

我什至可以丢失数组值，我只对基数感兴趣（例如这里的 2、1 和 3）。

我的解决方案

我使用以下算法：

首先对数组进行散列，并检查每个散列是否在 an 中HashMap<Integer, ArrayList<int[]>>，我们将其命名为distinctHash，其中键是散列，值是一个 ArrayList，让我们将其命名为rowList，其中包含此散列的不同数组（以避免冲突）。
如果散列不在distinctHash 中，请将其与值 1 放在另一个HashMap<int[], Long>计算每次出现的次数中，我们称其为distinctElements。
然后，如果哈希在distinctHash 中，则检查相应的数组是否包含在rowList 中。如果是，则增加与在rowList 中找到的相同数组关联的distinctElements中的值。（如果您使用新数组作为键，您将创建另一个键，因为它们的引用不同）。

这是代码，返回的布尔值告诉是否找到了新的不同数组，我在所有数组上依次应用此函数：

HashMap<int[], Long> distinctElements;

HashMap<Integer, ArrayList<int[]>> distinctHash;

private boolean addRow(int[] row) {

if (distinctHash.containsKey(hash)) {

int[] indexRow = distinctHash.get(hash).get(0);

for (int[] previousRow: distinctHash.get(hash)) {

if (Arrays.equals(previousRow, row)) {

distinctElements.put(

indexRow,

distinctElements.get(indexRow) + 1

);

return false;

}

}

distinctElements.put(row, 1L);

ArrayList<int[]> rowList = distinctHash.get(hash);

rowList.add(row);

distinctHash.put(hash, rowList);

return true;

} else {

distinctElements.put(row, 1L);

ArrayList<int[]> newValue = new ArrayList<>();

newValue.add(row);

distinctHash.put(hash, newValue);

return true;

}

}

题

问题是我的算法对于我的需求来说太慢了（5,000,000 个数组需要 40 秒，20,000,000 个数组需要 2h-3h）。使用 NetBeans 分析告诉我散列占用了 70% 的运行时间（使用 Google Guava murmur3_128 散列函数）。

还有另一种算法可以更快吗？正如我所说，我对数组值不感兴趣，只对它们出现的次数感兴趣。我准备为了速度牺牲精度，所以概率算法很好。

精慕HU

浏览 170回答 3

3回答

拉丁的传说

将包装int[]在实现equalsand的类中hashCode，然后Map将包装类构建为实例计数。class IntArray {    private int[] array;    public IntArray(int[] array) {        this.array = array;    }    @Override    public int hashCode() {        return Arrays.hashCode(this.array);    }    @Override    public boolean equals(Object obj) {        return (obj instanceof IntArray && Arrays.equals(this.array, ((IntArray) obj).array));    }    @Override    public String toString() {        return Arrays.toString(this.array);    }}测试int[][] input = {{1,2,3},                 {1,0,3},                 {1,2,3},                 {5,2,6},                 {5,2,6},                 {5,2,6}};Map<IntArray, Long> map = Arrays.stream(input).map(IntArray::new)        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));map.entrySet().forEach(System.out::println);输出[1, 2, 3]=2[1, 0, 3]=1[5, 2, 6]=3注意：上述解决方案比Ravindra Ranwala 的解决方案更快并且使用更少的内存，但它确实需要创建一个额外的类，因此哪个更好是有争议的。对于较小的阵列，请使用下面由 Ravindra Ranwala 提供的更简单的解决方案。对于较大的阵列，上述解决方案可能更好。 Map<List<Integer>, Long> map = Stream.of(input)         .map(a -> Arrays.stream(a).boxed().collect(Collectors.toList()))         .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

0

0

哔哔one

你可以这样做，Map<List<Integer>, Long> result = Stream.of(source)        .map(a -> Arrays.stream(a).boxed().collect(Collectors.toList()))        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));这是输出，{[1, 2, 3]=2, [1, 0, 3]=1, [5, 2, 6]=3}

0

0

翻翻过去那场雪

如果该数组的所有重复的元素序列彼此相似并且每个数组的长度不多，则可以将每个数组映射到一个int数字并使用方法的最后一部分。虽然这种方法减少了散列时间，但这里有一些假设可能不适用于您的情况。

0

0

随时随地看视频慕课网APP

相关分类

Java