猿问

哈希表的性能,为什么C++最慢?

对 hash 进行了一个简单的性能测试,看起来 C++ 版本比 perl 版本和 golang 版本都慢。

  • perl 版本用了大约 200 毫秒,

  • C++ 版本用了 280 毫秒。

  • golang 版本用了 56 毫秒。

在我的带有 Core(TM) i7-2670QM CPU @ 2.20GHz、Ubuntu 14.04.3LTS 的 PC 上,

有任何想法吗?

perl 版本

use Time::HiRes qw( usleep ualarm gettimeofday tv_interval nanosleep

                      clock_gettime clock_getres clock_nanosleep clock

                      stat );

sub getTS {

    my ($seconds, $microseconds) = gettimeofday;

    return $seconds + (0.0+ $microseconds)/1000000.0;

}

my %mymap;

$mymap{"U.S."} = "Washington";

$mymap{"U.K."} = "London";

$mymap{"France"} = "Paris";

$mymap{"Russia"} = "Moscow";

$mymap{"China"} = "Beijing";

$mymap{"Germany"} = "Berlin";

$mymap{"Japan"} = "Tokyo";

$mymap{"China"} = "Beijing";

$mymap{"Italy"} = "Rome";

$mymap{"Spain"} = "Madrad";

$x = "";

$start = getTS();

for ($i=0; $i<1000000; $i++) {

    $x = $mymap{"China"};

}

printf "took %f sec\n", getTS() - $start;

C++版本


#include <iostream>

#include <string>

#include <unordered_map>

#include <sys/time.h>


double getTS() {

    struct timeval tv;

    gettimeofday(&tv, NULL);

    return tv.tv_sec + tv.tv_usec/1000000.0;

}

using namespace std;

int main () {

  std::unordered_map<std::string,std::string> mymap;


  // populating container:

    mymap["U.S."] = "Washington";

    mymap["U.K."] = "London";

    mymap["France"] = "Paris";

    mymap["Russia"] = "Moscow";

    mymap["China"] = "Beijing";

    mymap["Germany"] = "Berlin";

    mymap["Japan"] = "Tokyo";

    mymap["China"] = "Beijing";

    mymap["Italy"] = "Rome";

    mymap["Spain"] = "Madrad";  


  double start = getTS();

  string x;

  for (int i=0; i<1000000; i++) {

      mymap["China"];

  }

  printf("took %f sec\n", getTS() - start);

  return 0;

}


函数式编程
浏览 277回答 3
3回答

达令说

从您的 Perl 代码(在您尝试修复它之前):@mymap = ();$mymap["U.S."] = "Washington";$mymap["U.K."] = "London";这不是地图而是数组。哈希映射的语法是:&nbsp; %mymap;&nbsp; $mymap{"U.S."} = ....因此,您有效地做的是创建一个数组而不是哈希映射并始终访问元素 0。请使用use strict;和use warnings;所有用Perl的时候,甚至有警告简单的语法检查会显示你的问题:perl -cw x.plArgument "U.S." isn't numeric in array element at x.pl line 9.Argument "U.K." isn't numeric in array element at x.pl line 10.除此之外,基准测试的主要部分实际上没有任何用处(分配一个变量并且从不使用它)并且一些编译器可以检测到它并简单地将其优化掉。如果您检查 Perl 程序生成的代码,您会看到:$ perl -MO=Deparse x.pl@mymap = ();$mymap[0] = 'Washington';$mymap[0] = 'London';...for ($i = 0; $i < 1000000; ++$i) {&nbsp; &nbsp; $x = $mymap[0];}也就是说,它在编译时检测到问题,并用对数组索引 0 的访问来替换它。因此,无论何时进行基准测试,您都需要:检查您的程序是否确实执行了它应该执行的操作。检查编译器是否不优化,也不在编译时执行其他语言将在运行时执行的内容。任何没有结果或可预测结果的语句都易于进行这种优化。验证您实际测量的是您打算测量的内容,而不是其他内容。即使对程序进行很小的更改也会影响运行时间,因为需要分配内存,而这些更改可能与您打算测量的内容无关。在您的基准测试中,您一次又一次地测量对同一哈希元素的访问,而没有对其他元素的访问。这是一项可以很好地与处理器缓存配合使用的活动,但可能无法反映现实世界的使用情况。而且,使用简单的计时器并不是一个现实的基准。系统上还有其他进程,有调度程序,有缓存垃圾......而对于今天的 CPU,它高度依赖于系统上的负载,因为也许 CPU 会以比其他基准测试更低的功耗模式运行一个基准测试,即使用不同的 CPU 时钟。例如,同一“基准”的多次运行在我的系统上的测量时间在 100 毫秒到 150 毫秒之间变化。基准是谎言,而像您这样的微观基准则是双重的。

缥缈止盈

我稍微修改了您的示例以获取有关哈希表结构的一些详细信息:#include <iostream>#include <string>#include <unordered_map>#include <sys/time.h>#include <chrono>using namespace std;int main(){&nbsp; &nbsp; std::unordered_map<std::string, std::string> mymap;&nbsp; &nbsp; // populating container:&nbsp; &nbsp; mymap["U.S."] = "Washington";&nbsp; &nbsp; mymap["U.K."] = "London";&nbsp; &nbsp; mymap["France"] = "Paris";&nbsp; &nbsp; mymap["Russia"] = "Moscow";&nbsp; &nbsp; mymap["China"] = "Beijing";&nbsp; &nbsp; mymap["Germany"] = "Berlin";&nbsp; &nbsp; mymap["Japan"] = "Tokyo";&nbsp; &nbsp; mymap["China"] = "Beijing";&nbsp; &nbsp; mymap["Italy"] = "Rome";&nbsp; &nbsp; mymap["Spain"] = "Madrad";&nbsp; &nbsp; std::hash<std::string> h;&nbsp; &nbsp; for ( auto const& i : mymap )&nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; printf( "hash(%s) = %ud\n", i.first.c_str(), h( i.first ) );&nbsp; &nbsp; }&nbsp; &nbsp; for ( int i = 0; i != mymap.bucket_count(); ++i )&nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; auto const bucketsize = mymap.bucket_size( i );&nbsp; &nbsp; &nbsp; &nbsp; if ( 0 != bucketsize )&nbsp; &nbsp; &nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; printf( "size of bucket %d = %d\n", i, bucketsize );&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; }&nbsp; &nbsp; auto const start = std::chrono::steady_clock::now();&nbsp; &nbsp; string const x = "China";&nbsp; &nbsp; std::string res;&nbsp; &nbsp; for ( int i = 0; i < 1000000; i++ )&nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; mymap.find( x );&nbsp; &nbsp; }&nbsp; &nbsp; auto const elapsed = std::chrono::steady_clock::now() - start;&nbsp; &nbsp; printf( "%s\n", res );&nbsp; &nbsp; printf( "took %d ms\n",&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; std::chrono::duration_cast<std::chrono::milliseconds>( elapsed ).count() );&nbsp; &nbsp; return 0;}在我的系统上运行它,我得到了 ~68ms 的运行时间,输出如下:hash(Japan) = 3611029618dhash(Spain) = 749986602dhash(China) = 3154384700dhash(U.S.) = 2546447179dhash(Italy) = 2246786301dhash(Germany) = 2319993784dhash(U.K.) = 2699630607dhash(France) = 3266727934dhash(Russia) = 3992029278dsize of bucket 0 = 0size of bucket 1 = 0size of bucket 2 = 1size of bucket 3 = 1size of bucket 4 = 1size of bucket 5 = 0size of bucket 6 = 1size of bucket 7 = 0size of bucket 8 = 0size of bucket 9 = 2size of bucket 10 = 3事实证明,哈希表没有得到很好的优化,并且包含一些冲突。进一步打印bucket中的元素显示西班牙和中国在bucket 9。bucket可能是一个链表,节点分布在内存中,解释了性能下降。如果您选择另一个哈希表大小,这样就不会发生冲突,您可以获得加速。我通过添加对其进行了测试,mymap.rehash(1001)并在 44-52 毫秒之间将速度提高了 20-30%。现在,另一点是计算“中国”的哈希值。该函数在每次迭代中执行。当我们切换到常量纯 C 字符串时,我们可以消除这种情况:#include <iostream>#include <string>#include <unordered_map>#include <sys/time.h>#include <chrono>static auto constexpr us = "U.S.";static auto constexpr uk = "U.K.";static auto constexpr fr = "France";static auto constexpr ru = "Russia";static auto constexpr cn = "China";static auto constexpr ge = "Germany";static auto constexpr jp = "Japan";static auto constexpr it = "Italy";static auto constexpr sp = "Spain";using namespace std;int main(){&nbsp; &nbsp; std::unordered_map<const char*, std::string> mymap;&nbsp; &nbsp; // populating container:&nbsp; &nbsp; mymap[us] = "Washington";&nbsp; &nbsp; mymap[uk] = "London";&nbsp; &nbsp; mymap[fr] = "Paris";&nbsp; &nbsp; mymap[ru] = "Moscow";&nbsp; &nbsp; mymap[cn] = "Beijing";&nbsp; &nbsp; mymap[ge] = "Berlin";&nbsp; &nbsp; mymap[jp] = "Tokyo";&nbsp; &nbsp; mymap[it] = "Rome";&nbsp; &nbsp; mymap[sp] = "Madrad";&nbsp; &nbsp; string const x = "China";&nbsp; &nbsp; char const* res = nullptr;&nbsp; &nbsp; auto const start = std::chrono::steady_clock::now();&nbsp; &nbsp; for ( int i = 0; i < 1000000; i++ )&nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; res = mymap[cn].c_str();&nbsp; &nbsp; }&nbsp; &nbsp; auto const elapsed = std::chrono::steady_clock::now() - start;&nbsp; &nbsp; printf( "%s\n", res );&nbsp; &nbsp; printf( "took %d ms\n",&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; std::chrono::duration_cast<std::chrono::milliseconds>( elapsed ).count() );&nbsp; &nbsp; return 0;}在我的机器上,这将运行时间减少了 50% 到大约 20 毫秒。不同之处在于,它不是从字符串内容计算散列值,而是将地址转换为散列值,这要快得多,因为它只是将地址值作为 size_t 返回。我们也不需要重新散列,因为与cn.

潇湘沐

这只是表明对于这个特定用例,Go 哈希映射实现优化得非常好。mymap["China"]调用专门针对字符串键优化的mapaccess1_faststr。特别是对于小型单桶映射,甚至不为短(小于 32 字节)字符串计算哈希码。
随时随地看视频慕课网APP

相关分类

Go
我要回答