/articles/toma : revision 34

To get this branch, use:

bzr branch http://darksoft.org/webbzr/articles/toma

Viewing changes to section_2x_setup.tex

Committer: Suren A. Chilingaryan
Date: 2017-12-23 08:49:35 UTC
Revision ID: csa@suren.me-20171223084935-yg4j912ehufjz6d0

Fix cross-references and some latex complaints

section_2x_setup.tex

\section{Setup, Methodology, and Conventions}

\subsection{Hardware Platform}

To evaluate the performance of the proposed methods, we selected 9 graphics adapters from AMD and NVIDIA with varying micro-architectures. Table~\ref{table:gpus} summarizes the considered GPUs.

\begin{table}[htb]

\caption{\label{table:gpus} List of targeted GPU architectures}

\centering

\begin{tabular} { ccccc }

These GPUs were installed in 3 GPU servers. The newer NVIDIA cards with Maxwell and Pascal architectures were installed in the Supermicro 7047GT based server specified in Table~\ref{table:sys1}. The older NVIDIA cards and all AMD cards were installed in two identical systems based on the Supermicro 7046GT platform. Their full specification is given in Table~\ref{table:sys2}. Additionally, we tested how the developed code performs on the Intel Xeon Phi 5110P accelerator. The accelerator was installed in the first platform along with the newer NVIDIA cards.

\begin{table}[htb]

\caption{\label{table:sys1} Server running the newer NVIDIA cards}

\begin{tabular} { l || p{5.5cm} }

\hline

\end{tabular}

\end{table}

\begin{table}[htb]

\caption{\label{table:sys2} Servers running AMD and older NVIDIA cards}

\begin{tabular} { l || p{5.5cm} }

\hline

\subsection{Software Setup}

All described systems were running the OpenSuSE 13.1 operating system. The code for NVIDIA cards was developed using the CUDA framework. As newer versions of the framework have dropped support for older GPUs, we used CUDA 6.5 with the NVIDIA GeForce GTX295 card and CUDA 8.0 for the other NVIDIA GPUs. The AMD version is based on OpenCL and was compiled using AMD APP SDK 3.0. Additionally, we tested the performance of Xeon processors and the Xeon Phi accelerator using the Intel SDK for OpenCL. Since the latest version of the Intel OpenCL SDK does not support Xeon Phi processors, we again used two different versions: the newer one to evaluate the performance of the Xeon processors and the older one to run the developed methods on the Xeon Phi accelerator. All installed software components are summarized in Table~\ref{table:soft}.

\begin{table}[htb]

\caption{\label{table:soft} Software components}

\begin{tabular} {l || l}

\hline

\end{table}

\subsection{Benchmarking Strategy}

In this article we do not aim to precisely characterize the performance of the analyzed graphics cards, but rather to validate that the proposed optimizations result in a significant performance improvement. For this reason, we take a relatively relaxed approach to the performance measurements. Unless specified otherwise, we measure the performance by reconstructing 512 similar slices and taking the median value. In most tests, we use a typical data set recorded by a 4~MPix camera utilized at the ANKA synchrotron. It consists of 2000 projections with dimensions of 1776 by 1707 pixels each. To demonstrate that these parameters have a negligible effect on the performance, we show how the reconstruction performance depends on the data size in Section~\ref{section:perf_size} and analyze the stability of the performance in Section~\ref{section:perf_gpuboost}.
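
The scale of this data set and the reported timing can be written out explicitly. The expressions below are only an illustrative sketch: the symbols $N_p$, $w$, $h$, $t_i$, and $\tilde{t}$ are introduced here and do not appear elsewhere in the text, and we assume that one timing sample $t_i$ is taken per reconstructed slice.
\[
N_p \cdot w \cdot h = 2000 \cdot 1776 \cdot 1707 = 6\,063\,264\,000 \approx 6.1 \cdot 10^{9} \ \mathrm{pixel\ values}
\]
\[
\tilde{t} = \mathrm{median} \{ t_1, t_2, \dots, t_{512} \}
\]
Under this assumption, the median is preferable to the mean because a few unusually slow slices, for example those delayed by unrelated system activity, do not shift the reported value.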

Starting with the Kepler architecture, NVIDIA cards use variable clocks. The actual clock is adjusted based on the current load and the processor temperature~\cite{ryan2016gpuboost}. To avoid significant performance discrepancies, we run a heat-up procedure until the performance stabilizes. Furthermore, we verify that the hardware clock measured before the start of the measurements (but after the heat-up procedure) does not differ significantly from the clock measured after the measurements. Otherwise, we re-run the test. Finally, we exclude all I/O while benchmarking. The reconstructions were executed using dummy data, and the results were dropped without transferring them back to the system memory. Otherwise, delays may be introduced between consecutive executions of GPU functions. During these pauses the processor cools down and may reach a higher average clock than it would in a more efficient reconstruction pipeline that executes consecutive reconstructions without any latency in between.
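
One possible way to formalize the clock check is a relative tolerance test. This is a sketch under assumptions: the symbols $f_{\mathrm{before}}$, $f_{\mathrm{after}}$, and $\varepsilon$ are introduced here for illustration, and no specific threshold is stated in the text.
\[
\frac{\left| f_{\mathrm{after}} - f_{\mathrm{before}} \right|}{f_{\mathrm{before}}} \le \varepsilon
\]
Here $f_{\mathrm{before}}$ is the hardware clock measured after the heat-up procedure but before the measurements, and $f_{\mathrm{after}}$ is the clock measured once the measurements have finished. A run would be accepted only if the inequality holds and repeated otherwise.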

\subsection{Quality Evaluation}

Some of the suggested optimizations alter the resulting reconstruction. To assess the effect on quality, in all such cases we compared the obtained results with the standard reconstruction. The standard Shepp-Logan head phantom with a resolution of $1024 \times 1024$ pixels is used for the evaluation~\cite{shepp1974}. As the changes are typically small and hardly visible in the 2D image, we show a profile along a line crossing the maximum number of features of the phantom, see \figurename~\ref{fig:phantom}.

\begin{figure}[htb]

\centering

\includegraphics[width=0.45\textwidth]{img/phantom.pdf}
