\begin{table}[htb] %[htbp]
\begin{threeparttable}
\caption{\label{tbl:overs} Performance and optimal configuration of the ALU-based back-projection kernel performing oversampling-based interpolation}
\centering
\noindent
%\resizebox{\columnwidth}{!}{\begin{tabular}{} ... \end{\tabular}}
\begin{tabularx}{\columnwidth}{ | X  c | r | l l l l l l | }
\hline
%& & & \multicolumn{5}{c|}{Configuration} \\
%\mhd{|c}{GPU} & \mhd{c|}{Slices} & \mhd{c|}{Perf.} & \mhd{c}{Area} & \mhd{c}{Blocks} & \mhd{c}{L1/SM} & \mhd{c}{CC} & \mhd{c|}{PaO} \\
& &  \mhd{c|}{Perf} & \multicolumn{6}{c|}{Configuration} \\
\mhd{|c}{GPU} & \mhd{c|}{$n_v$} & GU/s  & \mhd{c}{$n_q$} & \mhd{c}{C} & \mhd{c}{$s_t/s_d$} & \mhd{c}{U} & \mhd{c}{R} & \mhd{c|}{O} \\
%&           & Perf.          & Px.          & Caches   &  Pr.                  & U        & Rnd.              & Occ.        \\
\hline                                                                                         
\multirow{3}{*}{GTX580}                                                                  
& 1          & 80             & 4            & 1        & 32 / 8                & -        & SFU                & 75\%       \\
& 2          & 116            & 4            & 2        & 32 / 8                & -        & SFU                & 50\%       \\
& 4          & 142            & 4            & 4        & 64 / 4                & 2        & SFU                & 50\%       \\
\hline                                                                                   
                                                
% In NN, we need 1/2 cache. The                                                                                                           
\multirow{3}{*}{GTX680}                                                                  
& 1          & 123            & 16           & 1        & 32 / 4\tnote{1}       & 4        & ALU\tnote{2}       & 50\%               \\
& 2          & 160            & 8            & 1        & 32 / 4                & 2        & ALU                & 50\%               \\
& 4          & 165            & 4            & 2        & 64 / 4                & 2        & SFU                & 50\%               \\
\hline                                                                                   
                                                                                         
\multirow{3}{*}{Titan}                                                                   
& 1          & 195            & 16           & 1        & 32 / 4\tnote{1}       & 4        & ALU\tnote{2}       & 50\%               \\
& 2          & 237            & 8            & 1        & 32 / 4                & 2        & ALU                & 43\%               \\
& 4          & 279            & 4            & 2        & 64 / 4                & 2        & SFU                & 37\%               \\
\hline                                                                                   
                                                                                         
\multirow{3}{*}{GTX980}                                                                  
& 1          & 218            & 16           & 1        & 32 / 8                & -        & SFU                & 50\%               \\
& 2          & 269            & 16           & 2        & 64 / 4                & -        & SFU                & 50\%               \\
& 4          & 292            & 4            & 4        & 64 / 4                & 2        & SFU                & 50\%               \\
\hline                                                                                   
                                                                                         
\multirow{3}{*}{Titan X}                                                                 
& 1          & 606            & 16           & 1        & 32 / 8                & -        & SFU                & 50\%               \\
& 2          & 693            & 16           & 2        & 64 / 4                & -        & SFU                & 50\%               \\
& 4          & 743            & 4            & 4        & 64 / 4                & 2        & SFU                & 50\%               \\
\hline                                                                                   
                                                                                         
\multirow{3}{*}{HD5970}                                                                  
& 1          & 63             & 16           & 1        & 32 / 8\tnote{1}       & -        & -                  & -                  \\
& 2          & 71             & 8            & 1        & 32 / 4                & -        & -                  & -                  \\
& 4          & 73             & 8            & 2        & 32 / 4                & 2        & -                  & -                  \\
\hline                                                                                   
                                                                                         
\multirow{3}{*}{HD7970}                                                                  
& 1          & 178            & 16           & 1        & 32 / 8\tnote{1}       & -        & -                  & -                  \\
& 2          & 222            & 4            & 1        & 32 / 8                & -        & -                  & -                  \\
& 4          & 233            & 4            & 2        & 64 / 4                & 2        & -                  & -                  \\
\hline                                                                                   
                                                                                         
\multirow{3}{*}{R9-290}                                                                  
& 1          & 219            & 16           & 1        & 32 / 8                & -        & -                  & -                  \\
& 2          & 298            & 4            & 2        & 32 / 8                & -        & -                  & -                  \\
& 4          & 384            & 4            & 4        & 64 / 4                & 2        & -                  & -                  \\
\hline
       
\end{tabularx}
\begin{tablenotes}
\item The table summarizes the performance and the optimal configuration of the ALU-based back-projection kernel when oversampling and nearest-neighbor interpolation are used to update the values of the reconstructed pixels. The configuration specifies: \tblcol{$n_q$} - the number of pixels per thread, \tblcol{C} - the number of separate arrays used to cache the sinogram (either a dedicated array stores each component of the sinogram vector, or two components are stored together to allow 64-bit writes), \tblcol{$s_t/s_d$} - the number of threads used to cache a projection row and the number of cached projections, \tblcol{U} - the unrolling hint for the inner projection loop, \tblcol{R} - the units used to perform rounding and type conversions (the index is always computed using SFU), \tblcol{O} - the desired occupancy. The caches are configured as specified in \tablename~\ref{tbl:cacheconf}.
\item[1] Each GPU thread caches 2 values at once to enable 64-bit writes.
\item[2] The use of SFU is also avoided while resolving array addresses, see \sectionname~\ref{section:alu_fancy}.
\end{tablenotes}
\end{threeparttable}
\end{table}