\subsection{Benchmarking Strategy}
In this article we do not aim to precisely characterize the performance of the analyzed graphics cards, but rather to validate that the proposed optimizations result in a significant performance improvement. For this reason we take a relatively relaxed approach to the performance measurements. Unless specified otherwise, to measure performance we reconstruct 512 similar slices and take the median value. In most tests, we use a typical data set recorded by a 4~MPix camera used at the ANKA synchrotron. It consists of 2000 projections with dimensions of 1776 by 1707 pixels each. To show that these parameters have a negligible effect on the performance, we show how the reconstruction performance depends on the data size in Section~\ref{section:perf_size} and examine the stability of the performance in Section~\ref{section:perf_gpuboost}.
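
The measurement protocol described above can be sketched as follows. This is a minimal illustration and not the benchmarking code used for this article: the function \texttt{median\_time} and the callable \texttt{reconstruct\_slice} are hypothetical names, and host-side wall-clock timing is assumed to be adequate because the callable is expected to return only after the GPU work has finished.

```python
import statistics
import time


def median_time(reconstruct_slice, repeats=512):
    """Return the median wall-clock time (in seconds) of `repeats` calls.

    `reconstruct_slice` is a hypothetical callable that reconstructs one
    slice and returns only after the work is complete.
    """
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        reconstruct_slice()
        samples.append(time.perf_counter() - start)
    # The median is insensitive to occasional outliers such as a single slow run.
    return statistics.median(samples)
```

Reporting the median rather than the mean keeps a few disturbed runs from shifting the reported value.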

Starting with the Kepler architecture, NVIDIA cards use variable clocks. The actual clock is adjusted based on the current load and the processor temperature~\cite{ryan2016gpuboost}. To avoid significant performance discrepancies, we run a heat-up procedure until the performance stabilizes. Furthermore, we verify that the actual hardware clock measured before the start of the measurements (but after the heat-up procedure) does not differ significantly from the clock measured after the measurements. Otherwise, we re-run the test. Finally, we exclude all I/O while benchmarking. The reconstructions are executed using dummy data and the results are discarded without being transferred back to the system memory. Otherwise, delays may be introduced between consecutive executions of GPU functions. During these pauses the processor cools down and may reach a higher average clock than in a more efficient reconstruction pipeline that executes consecutive reconstructions without any latency in between.
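
The heat-up and clock verification steps can be sketched as below. This is an illustration under stated assumptions rather than the procedure's actual implementation: the stabilization criterion (a sliding window of recent timings whose spread stays within a relative tolerance), the threshold values, and the callables \texttt{time\_once}, \texttt{read\_clock} and \texttt{measure} are all hypothetical. How the hardware clock is read is left to the caller.

```python
import statistics


def heat_up(time_once, window=16, tolerance=0.02, max_runs=10000):
    """Repeat runs until the last `window` timings agree within `tolerance`.

    `time_once` is a hypothetical callable that executes one reconstruction
    and returns its duration. The default values are illustrative only.
    Returns the number of runs executed.
    """
    recent = []
    for count in range(1, max_runs + 1):
        recent.append(time_once())
        recent = recent[-window:]  # keep only the most recent timings
        if len(recent) == window:
            spread = max(recent) - min(recent)
            if spread <= tolerance * statistics.median(recent):
                return count
    raise RuntimeError("performance did not stabilize")


def clocks_consistent(clock_before, clock_after, tolerance=0.02):
    """True if the clock changed by at most `tolerance` (relative)."""
    return abs(clock_after - clock_before) <= tolerance * clock_before


def stable_benchmark(time_once, read_clock, measure, max_attempts=5):
    """Heat up, measure, and re-run the test if the clock drifted."""
    for _ in range(max_attempts):
        heat_up(time_once)
        clock_before = read_clock()  # after heat-up, before measurements
        result = measure()
        if clocks_consistent(clock_before, read_clock()):
            return result
    raise RuntimeError("clock kept drifting during the measurements")
```

Passing the clock reader as a callable keeps the sketch independent of any particular vendor tool or driver interface.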
\subsection{Quality Evaluation}
Some of the suggested optimizations alter the resulting reconstruction. To assess the effect on quality, in all such cases we compare the obtained results with the standard reconstruction. The standard Shepp-Logan head phantom with a resolution of $1024 \times 1024$ pixels is used for the evaluation~\cite{shepp1974}. As the changes are typically small and hardly visible in the 2D image, we show a profile along the line crossing the largest number of features of the phantom, see \figurename~\ref{fig:phantom}.
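
A profile comparison of this kind can be sketched as follows. The helper names are hypothetical, the images are assumed to be plain 2D lists of pixel values, and the profile is assumed to run along a horizontal line. The choice of the row, which should cross as many phantom features as possible, is left to the caller.

```python
def row_profile(image, row):
    """Return a copy of one horizontal line of a 2D image (list of rows)."""
    return list(image[row])


def max_profile_deviation(reference, candidate, row):
    """Largest absolute difference between two images along `row`."""
    ref = row_profile(reference, row)
    cand = row_profile(candidate, row)
    if len(ref) != len(cand):
        raise ValueError("images must have the same width")
    return max(abs(r - c) for r, c in zip(ref, cand))
```

Plotting both profiles on the same axes then shows where the optimized reconstruction departs from the standard one.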
\includegraphics[width=0.45\textwidth]{img/phantom.pdf}