linux - CUDA C vs. Thrust, am I missing something?
I just started learning CUDA programming. I was trundling through some simple CUDA C examples and everything was going swimmingly. Then! Suddenly! Thrust! I consider myself versed in C++ functors and was taken aback at the difference between CUDA C and Thrust. I find it hard to believe that
#include <stdio.h>
#include <stdlib.h>

__global__ void square(float *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)
    {
        a[idx] = a[idx] * a[idx];
    }
}

int main(int argc, char** argv)
{
    float *aHost, *aDevice;
    const int N = 10;
    size_t size = N * sizeof(float);

    aHost = (float*)malloc(size);
    cudaMalloc((void**)&aDevice, size);

    for (int i = 0; i < N; i++) {
        aHost[i] = (float)i;
    }

    cudaMemcpy(aDevice, aHost, size, cudaMemcpyHostToDevice);

    int block = 4;
    int nBlock = N / block + (N % block == 0 ? 0 : 1);
    square<<<nBlock, block>>>(aDevice, N);

    cudaMemcpy(aHost, aDevice, size, cudaMemcpyDeviceToHost);

    for (int i = 0; i < N; i++) {
        printf("%d, %f\n", i, aHost[i]);
    }

    free(aHost);
    cudaFree(aDevice);
}
is equivalent to
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>

template <typename T>
struct square
{
    __host__ __device__
    T operator()(const T& x) const
    {
        return x * x;
    }
};

int main(int argc, char** argv)
{
    const int N = 10;
    thrust::device_vector<float> dVec(N);
    thrust::sequence(dVec.begin(), dVec.end());
    thrust::transform(dVec.begin(), dVec.end(), dVec.begin(), square<float>());
    thrust::copy(dVec.begin(), dVec.end(), std::ostream_iterator<float>(std::cout, "\n"));
}
Am I missing something? Is the above code being run on the GPU? Thrust is a great tool, but I'm skeptical that it takes care of all the heavy C-style memory management.
- Is the Thrust code being executed on the GPU? How can I tell?
- How did Thrust eliminate the bizarre syntax of invoking a kernel?
- Is Thrust invoking a kernel?
- Does Thrust automatically handle the thread index computation?
Thanks for your time. Sorry if these are silly questions, but I find it incredible that the examples I've seen transition instantly from what can be described as a Model T to an M3.
Roughly: yes, of course. Thrust is a library, and like all of them, it was born to make things easier. Its great selling point is avoiding explicit CUDA code, which looks strange to the rest of your programmers, by providing a friendly C++-like interface.
Thrust uses the GPU, but not just the GPU. It performs the same operations you would perform if you wrote your own code, i.e., C/C++ code for allocating memory, copying, setting grid and block sizes..., and then it invokes the GPU to execute the kernel. One way to convince yourself that the data really lives on the device is sketched below.
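A minimal sketch, not Thrust's actual internals: the hand-written square_raw kernel and the grid/block choice below are illustrative. It shows that a thrust::device_vector's storage really is GPU memory, because thrust::raw_pointer_cast exposes the same kind of raw device pointer an explicit cudaMalloc would have produced, and a hand-written kernel can operate on it directly.

#include <thrust/device_vector.h>
#include <thrust/device_ptr.h>
#include <thrust/sequence.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>

// Hand-written kernel, identical in spirit to the question's square().
__global__ void square_raw(float* a, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) a[idx] = a[idx] * a[idx];
}

int main()
{
    const int N = 10;
    thrust::device_vector<float> dVec(N);       // device allocation happens here
    thrust::sequence(dVec.begin(), dVec.end());

    // The same raw device pointer an explicit cudaMalloc would give us.
    float* raw = thrust::raw_pointer_cast(dVec.data());
    square_raw<<<(N + 3) / 4, 4>>>(raw, N);     // grid/block sizes chosen by hand

    // Printing via an ostream_iterator implicitly copies device -> host.
    thrust::copy(dVec.begin(), dVec.end(),
                 std::ostream_iterator<float>(std::cout, "\n"));
}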
It is a good choice if you don't want to get inside the low-level CUDA stuff but still want to take advantage of GPU parallelism in simple (but frequent) problems, such as vector operations. A classic example of such an operation is sketched below.
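For instance, a small sketch of the kind of simple-but-frequent vector operation meant here: SAXPY (y = a*x + y) via thrust::transform and a functor. The functor name and the constant 2.0f are illustrative choices, not anything from the original post.

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/sequence.h>
#include <thrust/fill.h>
#include <thrust/copy.h>
#include <iostream>
#include <iterator>

struct saxpy
{
    const float a;
    saxpy(float a_) : a(a_) {}

    __host__ __device__
    float operator()(const float& x, const float& y) const
    {
        return a * x + y;   // runs on the GPU when given device iterators
    }
};

int main()
{
    const int N = 10;
    thrust::device_vector<float> x(N), y(N);
    thrust::sequence(x.begin(), x.end());   // x = 0, 1, 2, ...
    thrust::fill(y.begin(), y.end(), 1.0f); // y = 1, 1, 1, ...

    // One call replaces the allocate/copy/launch boilerplate of raw CUDA C.
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), saxpy(2.0f));

    // Prints 1, 3, 5, ..., 19.
    thrust::copy(y.begin(), y.end(),
                 std::ostream_iterator<float>(std::cout, "\n"));
}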