猿问

Swift Beta性能:对数组进行排序

我在Swift Beta中实现一种算法,发现性能非常差。深入研究后,我意识到瓶颈之一就是对数组进行排序一样简单。相关部分在这里:


let n = 1000000

var x =  [Int](repeating: 0, count: n)

for i in 0..<n {

    x[i] = random()

}

// start clock here

let y = sort(x)

// stop clock here

在C ++中,类似的操作在我的计算机上花费0.06s。


在Python中,它花费0.6秒(绝招,仅y =整数列表的sorted(x))。


在Swift中,如果使用以下命令进行编译,则需要6s:


xcrun swift -O3 -sdk `xcrun --show-sdk-path --sdk macosx`

如果使用以下命令进行编译,则最多需要88s:


xcrun swift -O0 -sdk `xcrun --show-sdk-path --sdk macosx`

Xcode中具有“发布”与“调试”构建的时序是相似的。


怎么了 与C ++相比,我可以理解一些性能损失,但与纯Python相比,却不能降低10倍。


编辑:天气注意到更改-O3为-Ofast使此代码运行几乎与C ++版本一样快!但是,-Ofast它极大地改变了语言的语义-在我的测试中,它禁用了对整数溢出和数组索引溢出的检查。例如,使用-Ofast以下Swift代码,无提示地运行而不会崩溃(并打印出一些垃圾):


let n = 10000000

print(n*n*n*n*n)

let x =  [Int](repeating: 10, count: n)

print(x[n])

因此,-Ofast这不是我们想要的。Swift的全部要点是我们拥有安全网。当然,安全网会对性能产生一些影响,但是它们不应使程序慢100倍。请记住,Java已经检查了数组的边界,在典型情况下,速度下降的幅度远小于2。在Clang和GCC中,我们必须-ftrapv检查(有符号的)整数溢出,但也没有那么慢。


因此产生了一个问题:如何在Swift中获得合理的性能而又不损失安全网?


编辑2:我进行了一些基准测试,其中的循环非常简单


for i in 0..<n {

    x[i] = x[i] ^ 12345678

}

(这里有xor操作,以便我可以更容易地在汇编代码中找到相关的循环。我试图选择一个易于发现但又“无害”的操作,因为它不需要任何相关的检查。到整数溢出。)


同样,-O3和之间的性能存在巨大差异-Ofast。所以我看了一下汇编代码:


有了-Ofast我,我得到了我所期望的。相关部分是一个包含5条机器语言指令的循环。


随着-O3我得到的东西,出乎我的想象。内部循环跨越88行汇编代码。我没有尝试理解所有内容,但是最可疑的部分是“ callq _swift_retain”的13个调用和“ callq _swift_release”的另外13个调用。也就是说,内部循环中有26个子例程调用!


编辑3:在评论中,费鲁奇奥要求提供不违反内置函数(例如排序)的公平基准。我认为以下程序是一个很好的例子:


let n = 10000

var x = [Int](repeating: 1, count: n)

for i in 0..<n {

    for j in 0..<n {

        x[i] = x[j]

    }

}

没有算术运算,因此我们不必担心整数溢出。我们唯一要做的就是大量数组引用。结果就在这里-与-Ofast相比,Swift -O3损失了将近500倍:


C ++ -O3:0.05 s

C ++ -O0:0.4秒

Java:0.2秒

带有PyPy的Python:0.5秒

Python:12秒

迅捷-Ofast:0.05 s

Swift -O3:23秒

迅捷-O0:443秒

(如果担心编译器可能会完全优化无意义的循环,则可以将其更改为eg x[i] ^= x[j],并添加一个输出输出的print语句x[0]。这不会改变任何内容;时序将非常相似。)


是的,这里的Python实现是一个愚蠢的纯Python实现,带有一个整数列表和嵌套的for循环。这应该是很多比未优化雨燕慢。Swift和数组索引似乎严重破坏了某些东西。


喵喔喔
浏览 800回答 3
3回答

江户川乱折腾

使用默认的发行优化级别[-O],在此基准测试中,Swift 1.0现在与C一样快。这是Swift Beta中的就地快速排序:func quicksort_swift(inout a:CInt[], start:Int, end:Int) {&nbsp; &nbsp; if (end - start < 2){&nbsp; &nbsp; &nbsp; &nbsp; return&nbsp; &nbsp; }&nbsp; &nbsp; var p = a[start + (end - start)/2]&nbsp; &nbsp; var l = start&nbsp; &nbsp; var r = end - 1&nbsp; &nbsp; while (l <= r){&nbsp; &nbsp; &nbsp; &nbsp; if (a[l] < p){&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; l += 1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; continue&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; if (a[r] > p){&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; r -= 1&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; continue&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; var t = a[l]&nbsp; &nbsp; &nbsp; &nbsp; a[l] = a[r]&nbsp; &nbsp; &nbsp; &nbsp; a[r] = t&nbsp; &nbsp; &nbsp; &nbsp; l += 1&nbsp; &nbsp; &nbsp; &nbsp; r -= 1&nbsp; &nbsp; }&nbsp; &nbsp; quicksort_swift(&a, start, r + 1)&nbsp; &nbsp; quicksort_swift(&a, r + 1, end)}和在C中相同:void quicksort_c(int *a, int n) {&nbsp; &nbsp; if (n < 2)&nbsp; &nbsp; &nbsp; &nbsp; return;&nbsp; &nbsp; int p = a[n / 2];&nbsp; &nbsp; int *l = a;&nbsp; &nbsp; int *r = a + n - 1;&nbsp; &nbsp; while (l <= r) {&nbsp; &nbsp; &nbsp; &nbsp; if (*l < p) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; l++;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; continue;&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; if (*r > p) {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; r--;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; continue;&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; int t = *l;&nbsp; &nbsp; &nbsp; &nbsp; *l++ = *r;&nbsp; &nbsp; &nbsp; &nbsp; *r-- = t;&nbsp; &nbsp; }&nbsp; &nbsp; quicksort_c(a, r - a + 1);&nbsp; &nbsp; quicksort_c(l, a + n - l);}两种工作:var a_swift:CInt[] = [0,5,2,8,1234,-1,2]var a_c:CInt[] = [0,5,2,8,1234,-1,2]quicksort_swift(&a_swift, 0, a_swift.count)quicksort_c(&a_c, CInt(a_c.count))// [-1, 0, 2, 2, 5, 8, 1234]// [-1, 0, 2, 2, 5, 8, 1234]两者都在与编写的相同的程序中调用。var x_swift = CInt[](count: n, repeatedValue: 0)var x_c = CInt[](count: n, repeatedValue: 0)for var i = 0; i < n; ++i {&nbsp; &nbsp; x_swift[i] = CInt(random())&nbsp; &nbsp; x_c[i] = CInt(random())}let swift_start:UInt64 = mach_absolute_time();quicksort_swift(&x_swift, 0, x_swift.count)let swift_stop:UInt64 = mach_absolute_time();let c_start:UInt64 = mach_absolute_time();quicksort_c(&x_c, CInt(x_c.count))let c_stop:UInt64 = mach_absolute_time();这会将绝对时间转换为秒:static const uint64_t NANOS_PER_USEC = 1000ULL;static const uint64_t NANOS_PER_MSEC = 1000ULL * NANOS_PER_USEC;static const uint64_t NANOS_PER_SEC = 1000ULL * NANOS_PER_MSEC;mach_timebase_info_data_t timebase_info;uint64_t abs_to_nanos(uint64_t abs) {&nbsp; &nbsp; if ( timebase_info.denom == 0 ) {&nbsp; &nbsp; &nbsp; &nbsp; (void)mach_timebase_info(&timebase_info);&nbsp; &nbsp; }&nbsp; &nbsp; return abs * timebase_info.numer&nbsp; / timebase_info.denom;}double abs_to_seconds(uint64_t abs) {&nbsp; &nbsp; return abs_to_nanos(abs) / (double)NANOS_PER_SEC;}以下是编译器优化级别的摘要:[-Onone] no optimizations, the default for debug.[-O]&nbsp; &nbsp; &nbsp;perform optimizations, the default for release.[-Ofast] perform optimizations and disable runtime overflow checks and runtime type checks.时间与秒[-Onone]为N = 10_000:Swift:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.895296452C:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.001223848这是Swift的内置sort(),其n = 10_000:Swift_builtin:&nbsp; &nbsp; 0.77865783这是[-O],n = 10_000:Swift:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.045478346C:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.000784666Swift_builtin:&nbsp; &nbsp; 0.032513488如您所见,Swift的性能提高了20倍。根据mweathers的回答,设置[-Ofast]会带来真正的不同,导致这些时间为n = 10_000:Swift:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.000706745C:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.000742374Swift_builtin:&nbsp; &nbsp; 0.000603576对于n = 1_000_000:Swift:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.107111846C:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.114957179Swift_sort:&nbsp; &nbsp; &nbsp; &nbsp;0.092688548为了比较,这是[-Onone]的n = 1_000_000:Swift:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 142.659763258C:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.162065333Swift_sort:&nbsp; &nbsp; &nbsp; &nbsp;114.095478272因此,在开发的这个阶段,没有优化的Swift几乎比该基准测试中的C慢1000倍。另一方面,将两个编译器都设置为[-Ofast],即使实际上不比C稍好,Swift的实际效果也至少一样好。已经指出,[-Ofast]更改了语言的语义,使其潜在地不安全。这就是Apple在Xcode 5.0发行说明中指出的内容:LLVM中提供了新的优化级别-Ofast,可以进行积极的优化。-Ofast放宽了一些保守的限制,大多数情况下对于浮点运算来说是安全的,这对大多数代码来说都是安全的。它可以从编译器中获得明显的高性能优势。他们全都主张。无论是不是明智,我都不能说,但是据我所知,如果您不进行高精度浮点运算并且您确信没有整数或整数,那么在发行版中使用[-Ofast]似乎足够合理。程序中可能发生数组溢出。如果您确实需要高性能和溢出检查/精确算术,请立即选择另一种语言。测试版3更新:n = 10_000,带有[-O]:Swift:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.019697268C:&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 0.000718064Swift_sort:&nbsp; &nbsp; &nbsp; &nbsp;0.002094721Swift通常要快一些,看起来Swift的内置排序已经发生了很大变化。最后更新:[-Onone]:Swift:&nbsp; &nbsp;0.678056695C:&nbsp; &nbsp; &nbsp; &nbsp;0.000973914[-O]:Swift:&nbsp; &nbsp;0.001158492C:&nbsp; &nbsp; &nbsp; &nbsp;0.001192406[-Ounchecked]:Swift:&nbsp; &nbsp;0.000827764C:&nbsp; &nbsp; &nbsp; &nbsp;0.001078914

慕标5832272

我决定看看这个很有趣,下面是我得到的时间安排:Swift 4.0.2&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;:&nbsp; &nbsp;0.83s (0.74s with `-Ounchecked`)C++ (Apple LLVM 8.0.0):&nbsp; &nbsp;0.74s迅速// Swift 4.0 codeimport Foundationfunc doTest() -> Void {&nbsp; &nbsp; let arraySize = 10000000&nbsp; &nbsp; var randomNumbers = [UInt32]()&nbsp; &nbsp; for _ in 0..<arraySize {&nbsp; &nbsp; &nbsp; &nbsp; randomNumbers.append(arc4random_uniform(UInt32(arraySize)))&nbsp; &nbsp; }&nbsp; &nbsp; let start = Date()&nbsp; &nbsp; randomNumbers.sort()&nbsp; &nbsp; let end = Date()&nbsp; &nbsp; print(randomNumbers[0])&nbsp; &nbsp; print("Elapsed time: \(end.timeIntervalSince(start))")}doTest()结果:斯威夫特1.1xcrun swiftc --versionSwift version 1.1 (swift-600.0.54.20)Target: x86_64-apple-darwin14.0.0xcrun swiftc -O SwiftSort.swift./SwiftSort&nbsp; &nbsp; &nbsp;Elapsed time: 1.02204304933548斯威夫特1.2xcrun swiftc --versionApple Swift version 1.2 (swiftlang-602.0.49.6 clang-602.0.49)Target: x86_64-apple-darwin14.3.0xcrun -sdk macosx swiftc -O SwiftSort.swift./SwiftSort&nbsp; &nbsp; &nbsp;Elapsed time: 0.738763988018036雨燕2.0xcrun swiftc --versionApple Swift version 2.0 (swiftlang-700.0.59 clang-700.0.72)Target: x86_64-apple-darwin15.0.0xcrun -sdk macosx swiftc -O SwiftSort.swift./SwiftSort&nbsp; &nbsp; &nbsp;Elapsed time: 0.767306983470917如果我用编译,性能似乎相同-Ounchecked。斯威夫特3.0xcrun swiftc --versionApple Swift version 3.0 (swiftlang-800.0.46.2 clang-800.0.38)Target: x86_64-apple-macosx10.9xcrun -sdk macosx swiftc -O SwiftSort.swift./SwiftSort&nbsp; &nbsp; &nbsp;Elapsed time: 0.939633965492249xcrun -sdk macosx swiftc -Ounchecked SwiftSort.swift./SwiftSort&nbsp; &nbsp; &nbsp;Elapsed time: 0.866258025169373似乎是一个性能回归从雨燕2.0至3.0雨燕,而且我也看到之间的差异-O,并-Ounchecked首次。迅捷4.0xcrun swiftc --versionApple Swift version 4.0.2 (swiftlang-900.0.69.2 clang-900.0.38)Target: x86_64-apple-macosx10.9xcrun -sdk macosx swiftc -O SwiftSort.swift./SwiftSort&nbsp; &nbsp; &nbsp;Elapsed time: 0.834299981594086xcrun -sdk macosx swiftc -Ounchecked SwiftSort.swift./SwiftSort&nbsp; &nbsp; &nbsp;Elapsed time: 0.742045998573303Swift 4再次提高了性能,同时保持-O和之间的差距-Ounchecked。-O -whole-module-optimization似乎没有什么不同。C ++#include <chrono>#include <iostream>#include <vector>#include <cstdint>#include <stdlib.h>using namespace std;using namespace std::chrono;int main(int argc, const char * argv[]) {&nbsp; &nbsp; const auto arraySize = 10000000;&nbsp; &nbsp; vector<uint32_t> randomNumbers;&nbsp; &nbsp; for (int i = 0; i < arraySize; ++i) {&nbsp; &nbsp; &nbsp; &nbsp; randomNumbers.emplace_back(arc4random_uniform(arraySize));&nbsp; &nbsp; }&nbsp; &nbsp; const auto start = high_resolution_clock::now();&nbsp; &nbsp; sort(begin(randomNumbers), end(randomNumbers));&nbsp; &nbsp; const auto end = high_resolution_clock::now();&nbsp; &nbsp; cout << randomNumbers[0] << "\n";&nbsp; &nbsp; cout << "Elapsed time: " << duration_cast<duration<double>>(end - start).count() << "\n";&nbsp; &nbsp; return 0;}结果:苹果C6.0clang++ --versionApple LLVM version 6.0 (clang-600.0.54) (based on LLVM 3.5svn)Target: x86_64-apple-darwin14.0.0Thread model: posixclang++ -O3 -std=c++11 CppSort.cpp -o CppSort./CppSort&nbsp; &nbsp; &nbsp;Elapsed time: 0.688969苹果C 6.1.0clang++ --versionApple LLVM version 6.1.0 (clang-602.0.49) (based on LLVM 3.6.0svn)Target: x86_64-apple-darwin14.3.0Thread model: posixclang++ -O3 -std=c++11 CppSort.cpp -o CppSort./CppSort&nbsp; &nbsp; &nbsp;Elapsed time: 0.670652苹果Clang 7.0.0clang++ --versionApple LLVM version 7.0.0 (clang-700.0.72)Target: x86_64-apple-darwin15.0.0Thread model: posixclang++ -O3 -std=c++11 CppSort.cpp -o CppSort./CppSort&nbsp; &nbsp; &nbsp;Elapsed time: 0.690152苹果铛8.0.0clang++ --versionApple LLVM version 8.0.0 (clang-800.0.38)Target: x86_64-apple-darwin15.6.0Thread model: posixclang++ -O3 -std=c++11 CppSort.cpp -o CppSort./CppSort&nbsp; &nbsp; &nbsp;Elapsed time: 0.68253苹果Clang 9.0.0clang++ --versionApple LLVM version 9.0.0 (clang-900.0.38)Target: x86_64-apple-darwin16.7.0Thread model: posixclang++ -O3 -std=c++11 CppSort.cpp -o CppSort./CppSort&nbsp; &nbsp; &nbsp;Elapsed time: 0.736784判决在撰写本文时,Swift的排序速度很快,但-O与使用上述编译器和库进行编译时,C ++的排序还不及C ++的排序。使用-Ounchecked,它似乎与Swift 4.0.2和Apple LLVM 9.0.0中的C ++一样快。
随时随地看视频慕课网APP
我要回答