In parallel tomography, exactly the same operations are performed for every reconstructed slice. It is therefore possible to reconstruct several slices at once by applying the back projection operator to a compound sinogram that encodes bins from multiple simple sinograms as vector data. In particular, such a sinogram can be constructed with the \emph{float2} vector type by interleaving the values of one sinogram as the $x$ components and the values of another as the $y$ components, see \figurename~\ref{fig:interleave}. With a \emph{float2}-typed texture mapped onto this interleaved sinogram, it is possible to fully utilize the bandwidth of the texture engine and reconstruct two slices in parallel. The interleaving is performed as an additional data preparation step after filtering, but before the back projection kernel is executed. The back projection kernel then only needs to be adjusted to use the \emph{float2} type and to write the $x$ component of the result into the first output slice and the $y$ component into the second. This yields a considerable speed-up on all NVIDIA architectures, as can be seen in \figurename~\ref{fig:texeff}.
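
The two steps can be illustrated with the following minimal CUDA sketch. It is not the actual implementation: the kernel names \texttt{interleave\_sinograms} and \texttt{backproject\_pair}, the precomputed table \texttt{trig} of $(\cos\theta, \sin\theta)$ pairs, the row-major sinogram layout, and the rotation axis \texttt{axis} given in pixel units are all assumptions made for illustration. The texture object is assumed to be created over the packed sinogram with linear filtering and unnormalized coordinates.

\begin{verbatim}
#include <cuda_runtime.h>

// Pack two filtered sinograms (n_proj rows of n_bins floats each) into
// one float2 array: x holds the first sinogram, y holds the second.
__global__ void interleave_sinograms(const float *sino_a,
                                     const float *sino_b,
                                     float2 *packed,
                                     int n_bins, int n_proj)
{
    int bin  = blockIdx.x * blockDim.x + threadIdx.x;
    int proj = blockIdx.y * blockDim.y + threadIdx.y;
    if (bin >= n_bins || proj >= n_proj) return;

    int i = proj * n_bins + bin;
    packed[i] = make_float2(sino_a[i], sino_b[i]);
}

// Back project two slices with a single float2 fetch per projection.
// packed_tex: texture over the packed sinogram (assumed: linear
//             filtering, unnormalized coordinates).
// trig[p]:    (cos, sin) of the p-th projection angle (assumed table).
__global__ void backproject_pair(cudaTextureObject_t packed_tex,
                                 const float2 *trig,
                                 float *slice_a, float *slice_b,
                                 int n_proj, int size, float axis)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= size || iy >= size) return;

    // Pixel position relative to the rotation axis.
    float x = ix - axis;
    float y = iy - axis;

    float2 sum = make_float2(0.0f, 0.0f);
    for (int p = 0; p < n_proj; ++p) {
        // Detector bin hit by this pixel; +0.5f addresses texel
        // centers so that the hardware interpolates between bins.
        float bin = x * trig[p].x + y * trig[p].y + axis + 0.5f;
        float2 v = tex2D<float2>(packed_tex, bin, p + 0.5f);
        sum.x += v.x;   // contribution to the first slice
        sum.y += v.y;   // contribution to the second slice
    }

    slice_a[iy * size + ix] = sum.x;
    slice_b[iy * size + ix] = sum.y;
}
\end{verbatim}

Compared with a single-slice kernel, only the fetched type and the two final stores change. The coordinate computation and the loop over projections are shared by both slices, so each texture fetch serves two output pixels.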
\includegraphics[width=0.45\textwidth]{img/interleave.pdf}
\caption{\label{fig:interleave} Illustration of how two sinograms are interleaved to utilize the full 8-byte filtering bandwidth of post-Fermi NVIDIA GPUs.}