
How do the likely/unlikely macros in the Linux kernel work, and what is their benefit?

I've been digging through some parts of the Linux kernel and found calls like this:

if (unlikely(fd < 0))
{
    /* Do something */
}

if (likely(!err))
{
    /* Do something */
}

I found their definitions:

#define likely(x)       __builtin_expect((x),1)
#define unlikely(x)     __builtin_expect((x),0)

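I know that they are meant for optimization, but how do they work? How much of a performance gain (or code-size change) can be expected from using them? And is it worth the hassle (and probably losing portability), at least in bottleneck code (in userspace, of course)?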


莫回无

呼唤远方

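They are hints to the compiler to emit instructions, and arrange the code, so that branch prediction favours the "likely" side of a jump instruction (for example, by putting the likely code on the fall-through path). This can be a big win: if the prediction is correct, the jump instruction is essentially free. On the other hand, if the prediction is wrong, the processor pipeline has to be flushed, which can cost several cycles. As long as the prediction is correct most of the time, this will tend to be good for performance.

Like all such performance optimisations, you should only do this after extensive profiling, to make sure the code really is a bottleneck and, given how micro this optimisation is, that it runs in a tight loop. Linux developers are generally very experienced, so I would imagine they have done that. They don't care much about portability because they only target GCC, and they have a very clear idea of the assembly they want it to generate.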

GCT1015

Let's decompile to see what GCC 4.8 does with it.

Without __builtin_expect

#include <stdio.h>
#include <time.h>

int main() {
    /* Use time to prevent it from being optimized away. */
    int i = !time(NULL);
    if (i)
        printf("%d\n", i);
    puts("a");
    return 0;
}

Compile and decompile with GCC 4.8.2, x86_64 Linux:

gcc -c -O3 -std=gnu11 main.c
objdump -dr main.o

Output:

0000000000000000 <main>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   31 ff                   xor    %edi,%edi
   6:   e8 00 00 00 00          callq  b <main+0xb>
                        7: R_X86_64_PC32        time-0x4
   b:   48 85 c0                test   %rax,%rax
   e:   75 14                   jne    24 <main+0x24>
  10:   ba 01 00 00 00          mov    $0x1,%edx
  15:   be 00 00 00 00          mov    $0x0,%esi
                        16: R_X86_64_32 .rodata.str1.1
  1a:   bf 01 00 00 00          mov    $0x1,%edi
  1f:   e8 00 00 00 00          callq  24 <main+0x24>
                        20: R_X86_64_PC32       __printf_chk-0x4
  24:   bf 00 00 00 00          mov    $0x0,%edi
                        25: R_X86_64_32 .rodata.str1.1+0x4
  29:   e8 00 00 00 00          callq  2e <main+0x2e>
                        2a: R_X86_64_PC32       puts-0x4
  2e:   31 c0                   xor    %eax,%eax
  30:   48 83 c4 08             add    $0x8,%rsp
  34:   c3                      retq

The instruction order in memory is unchanged: first the printf, then puts, then retq returns.

With __builtin_expect

Now replace if (i) with:

if (__builtin_expect(i, 0))

and we get:

0000000000000000 <main>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   31 ff                   xor    %edi,%edi
   6:   e8 00 00 00 00          callq  b <main+0xb>
                        7: R_X86_64_PC32        time-0x4
   b:   48 85 c0                test   %rax,%rax
   e:   74 11                   je     21 <main+0x21>
  10:   bf 00 00 00 00          mov    $0x0,%edi
                        11: R_X86_64_32 .rodata.str1.1+0x4
  15:   e8 00 00 00 00          callq  1a <main+0x1a>
                        16: R_X86_64_PC32       puts-0x4
  1a:   31 c0                   xor    %eax,%eax
  1c:   48 83 c4 08             add    $0x8,%rsp
  20:   c3                      retq
  21:   ba 01 00 00 00          mov    $0x1,%edx
  26:   be 00 00 00 00          mov    $0x0,%esi
                        27: R_X86_64_32 .rodata.str1.1
  2b:   bf 01 00 00 00          mov    $0x1,%edi
  30:   e8 00 00 00 00          callq  35 <main+0x35>
                        31: R_X86_64_PC32       __printf_chk-0x4
  35:   eb d9                   jmp    10 <main+0x10>

The printf (compiled to __printf_chk) was moved to the very end of the function, after puts and the return, to improve branch prediction as mentioned in other answers.

So it is basically the same as:

#include <stdio.h>
#include <time.h>

int main() {
    int i = !time(NULL);
    if (i)
        goto printf;   /* labels live in their own namespace */
puts:
    puts("a");
    return 0;
printf:
    printf("%d\n", i);
    goto puts;
}

This optimization is not done at -O0.

But good luck writing an example that runs faster with __builtin_expect than without; CPUs are really smart these days. My naive attempts are here.