I seem to be hitting a CUDA limit, but which limit is it?
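I have a CUDA program that seems to be hitting a limit on some resource, but I can't figure out which resource it is. Here is the kernel function: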
__global__ void DoCheck(float2* points, int* segmentToPolylineIndexMap,
                        int segmentCount, int* output)
{
    int segmentIndex = threadIdx.x + blockIdx.x * blockDim.x;
    int pointCount = segmentCount + 1;

    // Threads past the last segment have nothing to do.
    if(segmentIndex >= segmentCount)
        return;

    int polylineIndex = segmentToPolylineIndexMap[segmentIndex];
    int result = 0;
    if(polylineIndex >= 0)
    {
        float2 p1 = points[segmentIndex];
        float2 p2 = points[segmentIndex+1];
        float2 A = p2;
        float2 a;
        a.x = p2.x - p1.x;
        a.y = p2.y - p1.y;

        for(int i = segmentIndex+2; i < segmentCount; i++)
        {
            int currentPolylineIndex = segmentToPolylineIndexMap[i];

            // if not a different segment within our polyline and
            // not a fake segment
            bool isLegit = (currentPolylineIndex != polylineIndex &&
                            currentPolylineIndex >= 0);

            float2 p3 = points[i];
            float2 p4 = points[i+1];
            float2 B = p4;
            float2 b;
            b.x = p4.x - p3.x;
            b.y = p4.y - p3.y;

            float2 c;
            c.x = B.x - A.x;
            c.y = B.y - A.y;

            float2 b_perp;
            b_perp.x = -b.y;
            b_perp.y = b.x;

            float numerator = dot(b_perp, c);
            float denominator = dot(b_perp, a);
            bool isParallel = (denominator == 0.0);
            // ...
        }
    }

    output[segmentIndex] = result;
}
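For readers following the geometry, the visible part of the loop computes a perp-dot ratio for a pair of segments. The sketch below reproduces that arithmetic on the host in plain C++ (it should also compile as CUDA host code). `Vec2`, `dot2`, `perpDotRatio`, and `pointOnFirstLine` are hypothetical names standing in for `float2` and `dot`, and the step that turns the ratio into a point on the first segment's line is an assumption about how the value would be used, not something shown in the kernel above.

```cpp
#include <cassert>

// Hypothetical host-side stand-in for CUDA's float2.
struct Vec2 { float x; float y; };

// 2D dot product, mirroring the dot() call in the kernel.
inline float dot2(Vec2 u, Vec2 v) { return u.x * v.x + u.y * v.y; }

// Computes numerator / denominator the same way the kernel loop does.
// p1->p2 is the first segment, p3->p4 the second.
// Returns false (and leaves *ratio untouched) when the segments are
// parallel, i.e. when the denominator is exactly zero.
inline bool perpDotRatio(Vec2 p1, Vec2 p2, Vec2 p3, Vec2 p4, float* ratio) {
    assert(ratio != nullptr);
    Vec2 A = p2;
    Vec2 a = {p2.x - p1.x, p2.y - p1.y};
    Vec2 B = p4;
    Vec2 b = {p4.x - p3.x, p4.y - p3.y};
    Vec2 c = {B.x - A.x, B.y - A.y};
    Vec2 bPerp = {-b.y, b.x};
    float numerator = dot2(bPerp, c);
    float denominator = dot2(bPerp, a);
    if (denominator == 0.0f) return false;
    *ratio = numerator / denominator;
    return true;
}

// Assumption: the ratio s locates the crossing point as A + s * a,
// where A = p2 and a = p2 - p1.
inline Vec2 pointOnFirstLine(Vec2 p1, Vec2 p2, float s) {
    Vec2 q = {p2.x + s * (p2.x - p1.x), p2.y + s * (p2.y - p1.y)};
    return q;
}
```

For example, with the first segment running from (0, 0) to (2, 0) and the second from (1, -1) to (1, 1), the ratio is -0.5 and the point on the first line is (1, 0), which is where the two segments cross.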
The sizes of the parameters are as follows:
devicePoints = 22,464 float2s = 179,712 bytes
deviceSegmentsToPolylineIndexMap = 22,463 ints = 89,852 bytes
numSegments = 1 int = 4 bytes
deviceOutput = 22,463 ints = 89,852 bytes
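As a sanity check on these figures, the byte counts follow directly from the element counts and element sizes. A minimal host-side sketch, assuming `float2` is two 4-byte floats and `int` is 4 bytes (`Float2Like` and `bufferBytes` are hypothetical names, not part of the program):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CUDA's float2 (two 32-bit floats).
struct Float2Like { float x; float y; };

// Bytes occupied by `count` elements of type T.
template <typename T>
std::size_t bufferBytes(std::size_t count) {
    assert(count > 0);
    return count * sizeof(T);
}
```

With 22,464 points and 22,463 segments this gives 179,712 bytes for the points and 89,852 bytes for each of the two int arrays, or 359,416 bytes across the three device buffers.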
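When I execute this kernel, it crashes the video card. It looks like I am hitting some sort of limit, because if I launch the kernel with DoCheck<<<300, 32>>>(...); it works. To be clear, the parameters are the same; only the number of blocks is different.
Any idea why one launch crashes the video driver and the other doesn't? The one that fails still seems to be within the card's limit on the number of blocks.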
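To make the launch arithmetic concrete, here is a small host-side sketch of the grid sizes involved. It assumes one thread per segment, as in the kernel's index calculation, and uses the 65,535 per-dimension grid limit that applies to compute capability 1.x devices. `kMaxGridDimX`, `blocksNeeded`, `threadsLaunched`, and `withinGridLimit` are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Maximum grid size per dimension on compute capability 1.x hardware.
const int kMaxGridDimX = 65535;

// Number of blocks needed so that blocks * threadsPerBlock >= itemCount.
inline int blocksNeeded(int itemCount, int threadsPerBlock) {
    assert(itemCount >= 0 && threadsPerBlock > 0);
    return (itemCount + threadsPerBlock - 1) / threadsPerBlock;
}

// Total number of threads started by a <<<blocks, threadsPerBlock>>> launch.
inline int threadsLaunched(int blocks, int threadsPerBlock) {
    return blocks * threadsPerBlock;
}

// True when a one-dimensional grid of `blocks` blocks fits the limit.
inline bool withinGridLimit(int blocks) {
    return blocks > 0 && blocks <= kMaxGridDimX;
}
```

For 22,463 segments and 32 threads per block, `blocksNeeded` returns 702, which is far below 65,535. The `<<<300, 32>>>` launch starts only 9,600 threads, so it processes segments 0 through 9,599 and leaves the rest of `output` unwritten; the two launches therefore differ in the amount of work done, not only in block count.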
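Update: more information about my system configuration:
Video card: nVidia 8800GT
CUDA version: 1.1
OS: Windows Server 2008 R2
I also tried it on a laptop with the following configuration, but got the same results:
Video card: nVidia Quadro FX 880M
CUDA version: 1.2
OS: Windows 7 64-bit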