%\documentclass[journal]{IEEEtran}
\bibliographystyle{IEEEtran}

\documentclass[preprint,12pt]{elsarticle}

%\RequirePackage{fix-cm}
%\documentclass[twocolumn,final,numbook]{svjour3}                % Replace draft with referee or final
%\bibliographystyle{spmpsci}
\def \imgsizedefault {0.45}
\def \imgsizedefault {0.95}

%\journalname{Journal of Real-Time Image Processing}
%\journalname{International Journal of Parallel Programming}


\input{config.tex}

%For svjour3 (to prevent shifting of tables and algorithms to the end)
%\SetAlFnt{\scriptsize}                                  % algorithms

\usepackage{floatrow}
%\DeclareFloatFont{tiny}{\tiny}% "scriptsize" is defined by floatrow, "tiny" not
\floatsetup[table]{font=scriptsize}
%\let\footnotesize\scriptsize
\setlength\tabcolsep{4pt}
\floatstyle{plaintop}
\restylefloat{table}


%\SetAlCapSty{xAlCapSty}
%\newcommand{\xAlCapSty}[1]{#1}
%\SetAlCapSty{xAlCapSty}
%\SetAlCapSkip{0.5em}
%\SetAlCapFnt{\footnotesize}
%\SetAlCapNameFnt{\footnotesize}

% This handles the overfull-box problem that occurs when LaTeX is unable to break lines nicely.
% The default approach is to make longer lines and complain (you can see it with CCS concepts).
% Alternatively, one of the following can be enabled:
\sloppy
%\setlength{\emergencystretch}{10pt}
% This, however, increases the size drastically. I have an optimal solution on an Ubuntu 14.04 laptop,
% which I can't reproduce on Gentoo. For now, I will let it go and build the final PDFs on
% Ubuntu to save time.




\newboolean{draft}
\setboolean{draft}{true}

\DeclareUnicodeCharacter{00A0}{~}

\begin{document}

\begin{frontmatter}

\title{Reviewing GPU architectures to build efficient back projection for parallel geometries}

\author[kit]{Suren Chilingaryan}
\ead{chilingaryan@kit.edu}
\author[kul]{Evelina Ametova}
\ead{evelina.ametova@kuleuven.be}
\author[kit]{Andreas Kopmann}
\ead{kopmann@kit.edu}
\author[esrf]{Alessandro Mirone}
\ead{mirone@esrf.fr}

\address[kit]{Karlsruhe Institute of Technology, Germany}
\address[kul]{KU Leuven, Belgium}
\address[esrf]{ESRF, France}


%\institute{
%    Suren Chilingaryan \at Karlsruhe Institute of Technology, Germany,
%    \email{chilingaryan@kit.edu}
%    \and
%    Evelina Ametova \at KU Leuven, Belgium,
%    \email{evelina.ametova@kuleuven.be}
%    \and
%    Andreas Kopmann \at Karlsruhe Institute of Technology, Germany,
%    \email{kopmann@kit.edu}
%    \and
%    Alessandro Mirone, ESRF, France,
%    \email{mirone@esrf.fr}
%}

%\date{Received: xx.07.2018 / Accepted: xx.xx.2018}


%\maketitle
%\thispagestyle{empty}
\begin{abstract}
 Back-Projection is the main algorithm in Computed Tomography for reconstructing images from a set of recorded projections. It is used in both fast analytical methods and high-quality iterative techniques. X-ray imaging facilities rely on Back-Projection to reconstruct the internal structures of material samples and living organisms with high spatial and temporal resolution. Fast image reconstruction is also essential for tracking and controlling the processes under study in real time. In this article, we present efficient implementations of the Back-Projection algorithm for parallel hardware. We survey a range of parallel architectures introduced by the major hardware vendors during the last 10 years. We analyze the similarities and differences between these architectures and highlight how specific features can be used to enhance reconstruction performance. In particular, we build a performance model to find hardware hotspots and propose several optimizations that balance the load between the texture engine, the computational and special function units, and the different types of memory, maximizing the parallel utilization of all GPU subsystems. We further show that targeting architecture-specific features boosts performance by a factor of 2 to 7 compared to the current state-of-the-art algorithms used in standard reconstruction codes.

%\keywords{parallel algorithms \and hardware architecture \and GPU computing \and  synchrotron tomography \and back-projection \and CUDA \and OpenCL}
\end{abstract}

\begin{keyword}
parallel algorithms \sep hardware architecture \sep GPU computing \sep  synchrotron tomography \sep back-projection \sep CUDA \sep OpenCL
\end{keyword}
\end{frontmatter}


%\noindent\rule[0.5ex]{\linewidth}{1pt}
%\textbf{Questions}:
%\begin{itemize}
%\end{itemize}


%\ifdraft
%\setpagewiselinenumbers
%\linenumbers
%\fi


\input{section_1x_intro.tex}

\input{section_2x_setup.tex}

\input{section_3x_arch.tex}

\input{section_4x_tomo.tex}

\input{section_5x_texrec.tex}
\input{section_61_alurec.tex}
\input{section_62_cpu.tex}

\input{section_7x_hybrid.tex}
\input{section_8x_summary.tex}


\section{Acknowledgments}
This work was partially supported by the German-Russian BMBF funding program, grant numbers 05K10CKB and 05K10VKE. 
The authors would like to thank the EXTREMA COST Action MP1207 for providing networking support.

%\bibliography{bib/csa}{}

\bibliography{bib/collab,bib/csa,bib/refs,bib/wp}

\end{document}