\begin{table}[htb] %[htbp]
\begin{threeparttable}
\caption{\label{tbl:alurec} Performance and configuration of ALU-based back-projection kernel}
\centering
\noindent
%\resizebox{\columnwidth}{!}{\begin{tabular}{} ... \end{\tabular}}
\begin{tabularx}{\columnwidth}{ | X  c | r r | l l l l l | }
\hline
%& & & \multicolumn{5}{c|}{Configuration} \\
%\mhd{|c}{GPU} & \mhd{c|}{Slices} & \mhd{c|}{Perf.} & \mhd{c}{Area} & \mhd{c}{Blocks} & \mhd{c}{L1/SM} & \mhd{c}{CC} & \mhd{c|}{PaO} \\
& & \multicolumn{2}{c|}{Perf. (GU/s)} & \multicolumn{5}{c|}{Configuration} \\
\mhd{|c}{GPU} & \mhd{c|}{$n_v$} & \mhd{c}{Lin} & \mhd{c|}{NN} & \mhd{c}{$n_q$} & \mhd{c}{$s_d$} & \mhd{c}{U} & \mhd{c}{R} & \mhd{c|}{O} \\
%&             & Lin.          & NN            & Px.          &  Pr.           & U        & Rnd.    & Occ.        \\
\hline                                                                                               
\multirow{3}{*}{GTX580}                                                                        
& 1          & 80             & 120            & 4            & 16            & -        & SFU     & 75\%       \\
& 2          & 113            & 188            & 4            & 16            & -        & SFU     & 50\%       \\
& 4          & 142            & 247            & 4            & 8~\tnote{2}   & -        & SFU     & 50\%       \\
\hline                                                                                         
                                                      
% In NN, we need 1/2 cache. Therefore, 4x4 area.                                                                                                
\multirow{3}{*}{GTX680}                                                                        
& 1          & 123            & 195            & 8\tnote{3}   & 8\tnote{4}    & 4        & ALU     & 50\%               \\
& 2          & 160            & 290            & 8            & 8             & 2        & ALU     & 50\%               \\
& 4          & 165            & 306            & 4            & 8             & 2        & SFU     & 50\%               \\
\hline                                                                                         
                                                                                               
\multirow{3}{*}{Titan}                                                                         
& 1          & 195            & 268            & 8\tnote{3}   & 8\tnote{4}    & 4        & ALU     & 50\%               \\
& 2          & 237            & 429            & 8            & 8             & 2        & ALU     & 50\%               \\
& 4          & 278            & 471            & 4            & 8             & 2        & SFU     & 50\%               \\
\hline                                                                                         
                                                                                               
\multirow{3}{*}{GTX980}                                                                        
& 1          & 218            & 452            & 16           & 8             & -        & SFU     &100\%\tnote{5}      \\
& 2          & 269            & 510            & 16           & 8             & -        & SFU     & 50\%               \\
& 4\tnote{1} & 292            & 567            & 4            & 16            & -        & ALU     & 50\%               \\
\hline                                                                                         
                                                                                               
\multirow{3}{*}{Titan X}                                                                       
& 1          & 606            & 1161           & 16           & 8             & -        & SFU     &100\%\tnote{5}      \\
& 2          & 692            & 1328           & 16           & 8             & -        & SFU     & 50\%               \\
& 4\tnote{1} & 743            & 1405           & 4            & 16            & -        & ALU     & 50\%               \\
\hline                                                                                         
                                                                                               
\multirow{3}{*}{HD5970}                                                                        
& 1          & 63             & 116            & 16           & 8\tnote{3}    & -        & -       & -                  \\
& 2          & 71             & 146            & 8            & 16            & -        & -       & -                  \\
& 4          & 73             & 160            & 8            & 8             & -        & -       & -                  \\
\hline                                                                                         
                                                                                               
\multirow{3}{*}{HD7970}                                                                        
& 1          & 178            & 290            & 16            & 8\tnote{3}  & -        & -        & -                  \\
& 2          & 221            & 430            & 4\tnote{3}    & 16\tnote{6} & -        & -        & -                  \\
& 4          & 233            & 450            & 4             & 8           & -        & -        & -                  \\
\hline                                                                                         
                                                                                               
\multirow{3}{*}{R9-290}                                                                        
& 1          & 219            & 341            & 16            & 8           & -        & -        & -                  \\
& 2          & 298            & 582            & 4\tnote{3}    & 16\tnote{6} & -        & -        & -                  \\
& 4          & 383            & 635            & 4             & 16          & -        & -        & -                  \\
\hline
       
\end{tabularx}
\begin{tablenotes}
\item The table summarizes the performance and the optimal configuration of the ALU-based back-projection kernel. The performance is reported for the linear (Lin) and nearest-neighbor (NN) interpolation modes. The configuration specifies: \tblcol{$n_q$} - the number of pixels per thread, \tblcol{$s_d$} - the number of cached projections, \tblcol{U} - the unrolling hint for the inner projection loop, \tblcol{R} - the units used to perform rounding and type conversions (the index is always computed using SFU), \tblcol{O} - the targeted occupancy. The caches are configured as specified in \tablename~\ref{tbl:cacheconf}. The number of threads used to cache a projection row is determined according to the guidelines in \tablename~\ref{tbl:shmemconf}.
\item[1] The configuration and performance are specified for the half-float data representation. The half-float values are also cached in the shared memory.
\item[2] Because of the reduced shared memory requirements, 16 projections are cached in the nearest-neighbor interpolation mode.
\item[3] A larger $64 \times 64$ area is reconstructed if nearest-neighbor interpolation is performed. In this case, 16 pixels are assigned to each GPU thread.
\item[4] Each GPU thread caches 2 values at once to enable 64-bit writes if nearest-neighbor interpolation is used. Consequently, only 16 threads are used per projection row and 16 projections are cached to utilize all threads.
\item[5] A 50\% occupancy is targeted in the nearest-neighbor interpolation mode.
\item[6] Since a $64 \times 64$ area is assigned to each thread block in the nearest-neighbor interpolation mode, 32 threads are used per projection row and only 8 projections are cached.
\end{tablenotes}
\end{threeparttable}
\end{table}