%\documentclass[journal]{IEEEtran}
\bibliographystyle{IEEEtran}

\documentclass[preprint,12pt]{elsarticle}

%\RequirePackage{fix-cm}
%\documentclass[twocolumn,final,numbook]{svjour3}                % Replace draft with referee or final
%\bibliographystyle{spmpsci}
\def \imgsizedefault {0.45}
\def \imgsizedefault {0.95}

%\journalname{Journal of Real-Time Image Processing}
%\journalname{International Journal of Parallel Programming}


\input{config.tex}

%For svjour3 (to prevent shifting of tables and algorithms to the end)
%\SetAlFnt{\scriptsize}                                  % algorithms

\usepackage{floatrow}
%\DeclareFloatFont{tiny}{\tiny}% "scriptsize" is defined by floatrow, "tiny" not
\floatsetup[table]{font=scriptsize}
%\let\footnotesize\scriptsizeg
\setlength\tabcolsep{4pt}
\floatstyle{plaintop}
\restylefloat{table}


%\SetAlCapSty{xAlCapSty}
%\newcommand{\xAlCapSty}[1]{#1}
%\SetAlCapSty{xAlCapSty}
%\SetAlCapSkip{0.5em}
%\SetAlCapFnt{\footnotesize}
%\SetAlCapNameFnt{\footnotesize}

% This handles the overfull-box problem that arises when LaTeX is unable to break lines nicely.
% The default approach is to make longer lines and complain (you can see it with CCS concepts).
% Alternatively, one of the following can be enabled:
\sloppy
%\setlength{\emergencystretch}{10pt}
% This, however, increases the size drastically. I have an optimal solution on an Ubuntu 14.04 laptop,
% which I can't reproduce on Gentoo. Currently, I will let it go and will build the final PDFs on
% Ubuntu to save time.


\newboolean{draft}
\setboolean{draft}{true}

\DeclareUnicodeCharacter{00A0}{~}

\begin{document}

\begin{frontmatter}

\title{Reviewing GPU architectures to build efficient back projection for parallel geometries}

\author[kit]{Suren Chilingaryan}
\ead{chilingaryan@kit.edu}
\author[kul]{Evelina Ametova}
\ead{evelina.ametova@kuleuven.be}
\author[kit]{Andreas Kopmann}
\ead{kopmann@kit.edu}
\author[esrf]{Alessandro Mirone}
\ead{mirone@esrf.fr}

\address[kit]{Karlsruhe Institute of Technology, Germany}
\address[kul]{KU Leuven, Belgium}
\address[esrf]{ESRF, France}


%\institute{
%    Suren Chilingaryan \at Karlsruhe Institute of Technology, Germany,
%    \email{chilingaryan@kit.edu}
%    \and
%    Evelina Ametova \at KU Leuven, Belgium,
%    \email{evelina.ametova@kuleuven.be}
%    \and
%    Andreas Kopmann \at Karlsruhe Institute of Technology, Germany,
%    \email{kopmann@kit.edu}
%    \and
%    Alessandro Mirone, ESRF, France,
%    \email{mirone@esrf.fr}
%}

%\date{Received: xx.07.2018 / Accepted: xx.xx.2018}


%\maketitle
%\thispagestyle{empty}
\begin{abstract}
Back-projection is the major algorithm in Computed Tomography to reconstruct images from a set of recorded projections. It is used in both fast analytical methods and high-quality iterative techniques. X-ray imaging facilities rely on back-projection to reconstruct internal structures in material samples and living organisms with high spatial and temporal resolution. Fast image reconstruction is also essential to track and control the processes under study in real time. In this article, we present efficient implementations of the back-projection algorithm for parallel hardware. We survey a range of parallel architectures presented by the major hardware vendors during the last 10 years. We analyze the similarities and differences between these architectures and highlight how specific features can be used to enhance the reconstruction performance. In particular, we build a performance model to find hardware hotspots and propose several optimizations that balance the load between the texture engine, the computational and special function units, and the different types of memory, maximizing the utilization of all GPU subsystems in parallel. We further show that targeting architecture-specific features boosts the performance by a factor of 2 to 7 compared to the current state-of-the-art algorithms used in standard reconstruction codes.

%\keywords{parallel algorithms \and hardware architecture \and GPU computing \and  synchrotron tomography \and back-projection \and CUDA \and OpenCL}
\end{abstract}

\begin{keyword}
parallel algorithms \sep hardware architecture \sep GPU computing \sep synchrotron tomography \sep back-projection \sep CUDA \sep OpenCL
\end{keyword}
\end{frontmatter}


%\noindent\rule[0.5ex]{\linewidth}{1pt}
%\textbf{Questions}:
%\begin{itemize}
%\end{itemize}


%\ifdraft
%\setpagewiselinenumbers
%\linenumbers
%\fi


\input{section_1x_intro.tex}

\input{section_2x_setup.tex}

\input{section_3x_arch.tex}

\input{section_4x_tomo.tex}

\input{section_5x_texrec.tex}
\input{section_61_alurec.tex}
\input{section_62_cpu.tex}

\input{section_7x_hybrid.tex}
\input{section_8x_summary.tex}


\section{Acknowledgments}
This work was partially supported by the German-Russian BMBF funding program, grant numbers 05K10CKB and 05K10VKE.
The authors would like to thank the EXTREMA COST Action MP1207 for providing networking support.

%\bibliography{bib/csa}{}

\bibliography{bib/collab,bib/csa,bib/refs,bib/wp}

\end{document}