I have just started learning CUDA programming. I was trundling through some simple CUDA C examples and everything was going swimmingly. Then, suddenly: Thrust! I consider myself well versed in C++ functors, and I was taken aback by the difference between CUDA C and Thrust.

I find it hard to believe that

    #include <stdio.h>
    #include <stdlib.h>

    // Each thread squares one element; the guard skips threads past the end.
    __global__ void square(float *a, int n) {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx < n) {
            a[idx] = a[idx] * a[idx];
        }
    }

    int main(int argc, char** argv) {
        float *ahost, *adevice;

        const int n = 10;
        size_t size = n * sizeof(float);

        // Allocate the array on the host and on the device.
        ahost = (float*)malloc(size);
        cudaMalloc((void**)&adevice, size);

        for (int i = 0; i < n; i++) {
            ahost[i] = (float)i;
        }

        cudaMemcpy(adevice, ahost, size, cudaMemcpyHostToDevice);

        // Round the block count up so that every element gets a thread.
        int block = 4;
        int nblock = n/block + (n % block == 0 ? 0:1);

        square<<<nblock, block>>>(adevice, n);

        cudaMemcpy(ahost, adevice, size, cudaMemcpyDeviceToHost);

        for (int i = 0; i < n; i++) {
            printf("%d, %f\n", i, ahost[i]);
        }

        free(ahost);
        cudaFree(adevice);
    }

is equivalent to

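    #include <thrust/device_vector.h>
    #include <thrust/sequence.h>
    #include <thrust/transform.h>
    #include <thrust/copy.h>
    #include <iostream>
    #include <iterator>

    // Functor callable from both host and device code.
    template <typename T>
    struct square {
        __host__ __device__ T operator()(const T& x) const {
            return x * x;
        }
    };

    int main(int argc, char** argv) {
        const int n = 10;
        thrust::device_vector<float> dvec(n);
        // Fill with 0, 1, 2, ..., n-1.
        thrust::sequence(dvec.begin(), dvec.end());
        // Square every element in place.
        thrust::transform(dvec.begin(), dvec.end(), dvec.begin(), square<float>());
        // Copy the results back and print them, one per line.
        thrust::copy(dvec.begin(), dvec.end(), std::ostream_iterator<float>(std::cout, "\n"));
    }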

Am I missing something? Is the above code being run on the GPU? Thrust is a great tool, but I'm skeptical that it takes care of all the heavy C-style memory management.

  • Is the Thrust code being executed on the GPU? How can I tell?
  • How did Thrust eliminate the bizarre syntax of invoking a kernel?
  • Is Thrust actually invoking a kernel?
  • Does Thrust automatically handle the thread index computation?

Thanks for your time. Sorry if these are silly questions, but I find it incredible that the examples I've seen transition instantly from what can be described as a Model T to an M3.

Roughly: yes, of course. Thrust is a library, and like all libraries it was born to make things easier. Its great point is that it avoids all the explicit CUDA code, which looks strange to the rest of programmers, by providing a friendly C++-like interface.

Thrust uses the GPU, but not only the GPU. It performs the same operations you would perform if you wrote your own code, i.e., host-side C/C++ code for allocating memory, copying data, and setting the grid and block sizes, and then it invokes the GPU to execute the kernel.
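To make that concrete, here is a rough sketch of the kind of work a call like thrust::transform has to do on your behalf. This is not Thrust's actual implementation; the names transform_kernel, transform_on_device and Square are hypothetical, and the block size of 256 is just an assumed default.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical functor, mirroring the one in the question.
struct Square {
    __host__ __device__ float operator()(float x) const { return x * x; }
};

// Generic element-wise kernel: one thread per element, with a bounds guard.
template <typename Op>
__global__ void transform_kernel(const float* in, float* out, int n, Op op) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        out[idx] = op(in[idx]);
    }
}

// Hypothetical host wrapper: allocate, copy in, launch, copy out, free.
// Returns false if any CUDA call reports an error.
template <typename Op>
bool transform_on_device(std::vector<float>& data, Op op) {
    const int n = static_cast<int>(data.size());
    if (n == 0) return true;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* d = nullptr;
    if (cudaMalloc((void**)&d, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d, data.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int block = 256;                     // assumed block size
        const int grid = (n + block - 1) / block;  // round up to cover all n
        transform_kernel<<<grid, block>>>(d, d, n, op);  // in-place transform
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(data.data(), d, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d);
    return ok;
}
```

Calling transform_on_device(v, Square()) on a std::vector<float> squares its elements on the device, with the caller never touching the launch syntax or the index arithmetic. The functor is passed to the kernel by value, which is why its operator() needs the __device__ qualifier.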

It is a good choice for those who don't want to get inside the low-level CUDA stuff but still want to take advantage of GPU parallelism in a simple (but frequent) problem, like vector operations.


