Array indexing doesn't work there. I have yet to set up my CUDA computer; in the meantime, this proved to work (tested by Daniel Salazar). If I find other ways of doing it, I'll get back to this.
Reviewed and approved by Brecht. My first OpenCL code \o/