If cuda-gdb throws Program received signal CUDA_EXCEPTION_4, Warp Illegal Instruction. for the following code line:

[Switching focus to CUDA kernel 295, grid 148, block (0,2,3), thread (0,0,0), device 0, sm 0, warp 26, lane 0]
0x00000000010b39f0 in cos ()
(cuda-gdb) disass
Dump of assembler code for function cos:
...
   0x00000000010b39e8 <+376>:	LDC.64 R32, c[0x3][R12]
(note debugger always points to the next address after problematic instruction, i.e. 0xe8 + 0x8 = 0xf0 in this case)

then this means used register index is outside of the legal bounds set by kernel’s register count.