\begin{table}[htb]
\caption{\label{table:alg_cmd} CUDA/OpenCL functions}
\begin{tabularx}{\columnwidth} { lX }
\hline
Function & Description \\
\hline
$sync$     & Denotes a synchronization point. Execution is blocked until all threads of the block have reached this point. It is implemented with the $\_\_syncthreads()$ function in CUDA and with $barrier()$ using the $CLK\_LOCAL\_MEM\_FENCE$ flag in OpenCL. \\
$fence$    & Enforces ordering of loads and stores. Equivalent to $\_\_threadfence\_block()$ in CUDA and $mem\_fence()$ in OpenCL \\
$tex2d$    & 2D fetch from the texture mapped to the sinogram. It is implemented with the $tex2D()$ function in CUDA and $read\_imagef()$ in OpenCL. \\
$shfl*$    & A group of CUDA functions (\emph{\_\_shfl}, \emph{\_\_shfl\_up}, \emph{\_\_shfl\_down}, \emph{\_\_shfl\_xor}) used to exchange data between the threads of a warp~\cite{nvidia2017cudapg}. These functions do not support vector types. When \emph{shfl} is applied to vector data, it is implemented as a sequence of calls to the corresponding function, one call per vector component. There is no AMD counterpart of these functions. \\
$floor$    & Rounding towards negative infinity. \\
\hline
\end{tabularx}
\end{table}
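
The per-component implementation of \emph{shfl} for vector data can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the helper name \verb|shfl_down_float4| is hypothetical, and the sketch assumes the \verb|__shfl_down_sync| variant available since CUDA 9 with all 32 lanes of the warp participating. On older toolkits the legacy \verb|__shfl_down| would take its place.

\begin{verbatim}
// Hypothetical helper (not part of CUDA): shuffles a float4 down the warp
// by issuing one scalar shuffle per vector component.
__device__ inline float4 shfl_down_float4(float4 v, unsigned int delta)
{
    // Assumption: all 32 lanes of the warp execute this call together.
    const unsigned int mask = 0xffffffffu;

    float4 r;
    r.x = __shfl_down_sync(mask, v.x, delta);
    r.y = __shfl_down_sync(mask, v.y, delta);
    r.z = __shfl_down_sync(mask, v.z, delta);
    r.w = __shfl_down_sync(mask, v.w, delta);
    return r;
}
\end{verbatim}

A single vector shuffle written this way costs four scalar shuffle instructions, which is worth keeping in mind when estimating the shuffle throughput of a kernel.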