bzr branch
http://darksoft.org/webbzr/tomo/pyhst
Requirements
------------
 * POSIX compliant operating system, standard set of Linux system apps (find,
   grep, sed, xargs, cut, ...), CMake build system
 * GNU C Compiler 4.x; the current version of CUDA does not work with GCC 4.4
   and newer.
 * Python with PIL, Logging, NumPy modules support, optionally VimpsCC python
   module
 * Glib2 Library
 * FFTW or Intel MKL Library is required for multithreaded mode
 * NVIDIA CUDA Toolkit 2.x or 3.x is required for GPU based processing
 * x86-64 compatible CPU supporting SSSE3 instruction set

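Two of the requirements above (SSSE3 support and the Python modules) can be
checked from a script. The following is only a minimal sketch and not part of
PyHST: the helper names are hypothetical, it assumes a Linux system exposing
/proc/cpuinfo and a Python 3 interpreter, and it assumes the modules are
importable as "PIL", "logging" and "numpy".

```python
import importlib.util


def has_cpu_flag(cpuinfo_text, flag):
    """Return True if `flag` is listed on a 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags" and flag in value.split():
            return True
    return False


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            cpuinfo = f.read()
    except OSError:
        cpuinfo = ""  # not a Linux system, or /proc is unavailable
    print("SSSE3 supported:", has_cpu_flag(cpuinfo, "ssse3"))
    # Assumed import names for the modules listed in the requirements.
    print("Missing Python modules:", missing_modules(["PIL", "logging", "numpy"]))
```

The flag is compared as a whole word, so "sse3" does not count as "ssse3".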
Compiler
--------
 * -O2 must be used; it gives an approximately 10-fold performance
   difference. -O3 sometimes causes problems in GPU mode.
 * -msse2 -msse3 also slightly improve performance in CPU mode
 * The Intel compiler is only slightly faster than gcc-4.2.4 (there is little
   sense in using it; it is much better to vectorise the code manually)

FFT libraries
-------------
 The CPU FFT transformations can be performed in several ways:
 a) The included 'Vhst_fourier.c' has a very small initialization time, but
    doesn't include SSE optimizations and, therefore, performs rather poorly
    on modern Intel/AMD platforms.
 b) The FFTW3 library is about 10-15% faster, but needs more time for
    initialization, which is negligible for large amounts of data.
 c) The Intel FFTW3 library has processing performance comparable to FFTW3,
    but a very small initialization time.

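The trade-off between initialization time and per-transform speed can be
made concrete with a small calculation. This is only an illustrative sketch:
the function names are hypothetical and the timings in the example are
made-up placeholders, not measurements of any of the libraries above.

```python
def total_time(init_time, per_transform_time, n_transforms):
    """Total cost: one-off initialization plus n transforms."""
    return init_time + n_transforms * per_transform_time


def break_even(init_a, per_a, init_b, per_b):
    """Number of transforms above which library B beats library A.

    B is assumed to have the faster transform; returns None if it is not
    faster per transform, since it then never catches up through speed alone.
    """
    if per_b >= per_a:
        return None
    return max(0.0, (init_b - init_a) / (per_a - per_b))


if __name__ == "__main__":
    # Placeholder numbers in arbitrary time units, for illustration only.
    n = break_even(init_a=0.0, per_a=1.0, init_b=2.0, per_b=0.5)
    print("B becomes faster after", n, "transforms")
```

Beyond the break-even count the longer initialization is amortized, which is
why it stops mattering for large amounts of data.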
Building
--------
 a) Run 'cmake .' to generate Makefiles; it will report the system libraries
    it was able to find and which features will be used.
 b) The 'ccmake .' or 'cmake-gui' commands may be used to configure options:
    set compilation flags, enable/disable CUDA support, etc.
 c) Run 'make' to compile the GPU-enabled version and 'make cpu' to build the
    CPU-only version (you should execute 'make clean' before switching
    between the gpu and cpu targets).

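The build steps above, including the 'make clean' rule when switching
targets, can be scripted. The sketch below is not part of PyHST; the function
names and the "gpu"/"cpu" labels are hypothetical, and only the commands
themselves ('cmake .', 'make', 'make cpu', 'make clean') come from this file.

```python
import subprocess


def build_commands(target, previous_target=None):
    """Return the command sequence for building `target` ('gpu' or 'cpu').

    'make clean' is added when the previous build used the other target.
    """
    if target not in ("gpu", "cpu"):
        raise ValueError("target must be 'gpu' or 'cpu'")
    commands = [["cmake", "."]]
    if previous_target is not None and previous_target != target:
        commands.append(["make", "clean"])
    commands.append(["make"] if target == "gpu" else ["make", "cpu"])
    return commands


def run_build(target, previous_target=None, source_dir="."):
    """Run the commands in `source_dir`, stopping at the first failure."""
    for command in build_commands(target, previous_target):
        subprocess.run(command, cwd=source_dir, check=True)
```

For example, run_build("cpu", previous_target="gpu") would run 'cmake .',
'make clean' and 'make cpu' in that order.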
Problems
--------
 a) CUDA 2.3 is buggy and produces incorrect results; CUDA 2.2, 3.0, and 3.2
    work correctly.
 b) CUDA 3.1 is buggy and produces incorrect results if GTX2xx cards are used.
    It looks like the filtering step is not performed when it is computed on
    a GTX280. Another possibility is a problem with computing on a
    non-primary card. The actual configuration was GTX480 + GTX280.
 c) After a new installation, the results are sometimes completely and
    consistently black. It looks like the driver is left in some kind of
    wrong state. This can be fixed by running some of the SDK examples.
 d) The NVIDIA OpenCL library does not load the AMD platform. The AMD library
    loads both the NVIDIA and AMD platforms.

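The CUDA versions mentioned in items a) and b) above can be kept in a small
lookup table. This is a hedged sketch, not code from PyHST: the names are
hypothetical, and the "release X.Y" pattern is an assumption about the text
printed by 'nvcc --version'.

```python
import re

# Versions reported as working correctly in the Problems section.
KNOWN_GOOD = {(2, 2), (3, 0), (3, 2)}
# Versions reported as buggy, with the reported symptom.
KNOWN_BAD = {
    (2, 3): "produces incorrect results",
    (3, 1): "produces incorrect results if GTX2xx cards are used",
}


def parse_release(text):
    """Extract (major, minor) from text containing 'release X.Y', else None."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_status(version):
    """Classify a (major, minor) CUDA version as 'ok', 'buggy: ...' or 'untested'."""
    if version in KNOWN_BAD:
        return "buggy: " + KNOWN_BAD[version]
    if version in KNOWN_GOOD:
        return "ok"
    return "untested"
```

Versions not listed in this file are reported as "untested" rather than
guessed at.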
Executing
---------
    python PyHST.py <configuration_file>

 - To have debug output switched on, call the script
    DEBUG=1 python ....

 - To force synchronous CUDA operations
    CUDA_LAUNCH_BLOCKING=1 python ....
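
The same environment variables can be set from a wrapper script instead of
the shell. This is a minimal sketch with hypothetical helper names; only the
DEBUG and CUDA_LAUNCH_BLOCKING variables and the 'python PyHST.py
<configuration_file>' invocation are taken from this file.

```python
import os
import subprocess


def pyhst_environment(debug=False, synchronous_cuda=False, base_env=None):
    """Return a copy of the environment with the requested switches set."""
    env = dict(os.environ if base_env is None else base_env)
    if debug:
        env["DEBUG"] = "1"
    if synchronous_cuda:
        env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_pyhst(config_file, debug=False, synchronous_cuda=False):
    """Run PyHST.py from the current directory and return its exit code."""
    env = pyhst_environment(debug=debug, synchronous_cuda=synchronous_cuda)
    return subprocess.call(["python", "PyHST.py", config_file], env=env)
```

The caller's own environment is copied rather than modified, so the switches
apply only to the launched process.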