Requirements
------------
* POSIX-compliant operating system with the standard set of Linux system
utilities (find, grep, sed, xargs, cut, ...) and the CMake build system
* GNU C Compiler 4.x; the current version of CUDA does not work with gcc 4.4
and newer
* Python with PIL, Logging and NumPy module support; optionally the VimpsCC
Python module
* Glib2 library
* FFTW or Intel MKL library is required for multithreaded mode
* NVIDIA CUDA Toolkit 2.x or 3.x is required for GPU-based processing
* x86-64 compatible CPU supporting the SSSE3 instruction set
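The requirements above can be checked before building. The following is only a minimal sketch: the tool list is copied from the requirements, 'gcc' is assumed to be the compiler binary name, and the helper names (missing_tools, cpu_has_flag) are hypothetical and not part of this package. The CPU check assumes a Linux system that exposes /proc/cpuinfo.

```python
import shutil

# Tool names taken from the requirements list; extend as needed.
REQUIRED_TOOLS = ["find", "grep", "sed", "xargs", "cut", "cmake", "gcc"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def cpu_has_flag(flag, cpuinfo_text):
    """Check a /proc/cpuinfo dump for a CPU feature flag such as 'ssse3'."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # The line looks like "flags : fpu sse2 ssse3 ..."
            return flag in line.split(":", 1)[1].split()
    return False


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    try:
        with open("/proc/cpuinfo") as handle:
            print("SSSE3 supported:", cpu_has_flag("ssse3", handle.read()))
    except OSError:
        print("SSSE3 check skipped: /proc/cpuinfo is not available")
```

The check only confirms that the programs exist; it does not verify the gcc or CUDA versions.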

Compiler
--------
* -O2 should always be used; it gives approximately a 10-fold performance
difference. -O3 sometimes causes problems in GPU mode.
* -msse2 -msse3 also slightly improve performance in CPU mode
* The Intel compiler is only slightly faster than gcc-4.2.4 (there is certainly
no sense in using it; it is much better to vectorise the code manually)
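One possible way to pass these flags to the build is sketched here. CMAKE_C_FLAGS is a standard CMake variable, but whether this project's CMake configuration honours or overrides it is an assumption. Restricting the SSE flags to CPU mode is likewise an assumption, and the function names are hypothetical.

```python
def c_flags(gpu_mode):
    """Return the C flags suggested by the compiler notes for the chosen mode."""
    flags = ["-O2"]  # -O2 in both modes; -O3 is avoided because of GPU issues
    if not gpu_mode:
        flags += ["-msse2", "-msse3"]  # small gain reported for CPU mode
    return " ".join(flags)


def cmake_command(gpu_mode, source_dir="."):
    """Build the cmake argument list that sets CMAKE_C_FLAGS."""
    return ["cmake", "-DCMAKE_C_FLAGS=" + c_flags(gpu_mode), source_dir]


if __name__ == "__main__":
    print(cmake_command(gpu_mode=False))
```

The returned list can be handed to subprocess.run, or the same flags can be entered interactively through 'ccmake .'.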

FFT libraries
-------------
The CPU FFT transformations can be performed in several ways:
a) The included 'Vhst_fourier.c' has a very small initialization time, but
does not include SSE optimizations and therefore performs rather poorly on
modern Intel/AMD platforms.
b) The FFTW3 library is about 10-15% faster, but needs more time for
initialization, which is negligible for large amounts of data.
c) The Intel FFTW3 library has processing performance comparable to FFTW3,
but a very small initialization time.
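The trade-off between initialization time and per-transform speed can be illustrated with a small calculation. This is a sketch only: the timings in the example are hypothetical, unitless numbers chosen to be roughly in line with the 10-15% figure above. They are not measurements of any of the three options.

```python
import math


def total_time(init_time, per_transform_time, n_transforms):
    """Total cost of one FFT backend: one-off setup plus per-transform work."""
    return init_time + per_transform_time * n_transforms


def break_even(init_slow_start, per_fast, init_quick_start, per_slow):
    """Smallest transform count at which the slow-starting backend catches up.

    The slow-starting backend has the larger initialization time but the
    smaller per-transform time; the quick-starting one is the opposite.
    """
    if per_fast >= per_slow:
        raise ValueError("the slow-starting backend must be faster per transform")
    extra_init = init_slow_start - init_quick_start
    if extra_init <= 0:
        return 0
    return math.ceil(extra_init / (per_slow - per_fast))


if __name__ == "__main__":
    # Hypothetical: 50 units of extra setup, 0.875 instead of 1.0 per transform.
    print(break_even(50.0, 0.875, 0.0, 1.0))
```

With these made-up numbers the slower-starting library pays off once roughly 400 transforms are performed, which is why its initialization cost stops mattering for large data sets.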

Building
--------
a) Run 'cmake .' to generate the Makefiles; it will report which system
libraries it was able to find and which features will be used.
b) The 'ccmake .' or 'cmake-gui' commands may be used to configure options:
set compilation flags, enable/disable CUDA support, etc.
c) Run 'make' to compile the GPU-enabled version or 'make cpu' to build the
CPU-only version (you should execute 'make clean' before switching between
the GPU and CPU targets).
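The build steps above could be scripted roughly as follows. This is a sketch assuming the commands are run from the source directory; the function names and the 'previous_target' argument are hypothetical conveniences for deciding when 'make clean' is needed.

```python
import subprocess

# 'make' builds the GPU-enabled version, 'make cpu' the CPU-only version.
MAKE_TARGETS = {"gpu": ["make"], "cpu": ["make", "cpu"]}


def build_commands(target, previous_target=None):
    """List the commands needed to build `target` ('gpu' or 'cpu')."""
    if target not in MAKE_TARGETS:
        raise ValueError("target must be 'gpu' or 'cpu'")
    commands = [["cmake", "."]]
    if previous_target is not None and previous_target != target:
        # Switching between the GPU and CPU targets needs a clean tree.
        commands.append(["make", "clean"])
    commands.append(MAKE_TARGETS[target])
    return commands


def run_build(target, previous_target=None):
    """Run the build commands in order, stopping at the first failure."""
    for command in build_commands(target, previous_target):
        subprocess.run(command, check=True)
```

For example, run_build("cpu", previous_target="gpu") would run 'cmake .', 'make clean' and 'make cpu' in that order.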

Problems
--------
a) CUDA 2.3 is buggy and produces incorrect results; CUDA 2.2, 3.0 and 3.2
work correctly.
b) CUDA 3.1 is buggy and produces incorrect results if GTX2xx cards are used.
It looks like the filtering step is not performed when computing on a GTX280.
Another possibility is a problem with computing on a non-primary card. The
actual configuration was GTX480 + GTX280.
c) After a new installation, the results are sometimes completely and
consistently black. It looks like the driver is left in some kind of wrong
state. This can be fixed by running some of the SDK examples.
d) The NVIDIA OpenCL library does not load the AMD platform. The AMD library
loads both the NVIDIA and AMD platforms.
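The CUDA version notes in (a) and (b) can be turned into a simple lookup. The sketch below encodes only what is stated above, and anything not listed is reported as "unknown". It assumes the output of 'nvcc --version' contains a "release X.Y" string; the function names are hypothetical.

```python
import re

KNOWN_GOOD = {(2, 2), (3, 0), (3, 2)}


def parse_release(nvcc_output):
    """Extract (major, minor) from text containing 'release X.Y', else None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def cuda_status(version, gpu_names=()):
    """Classify a CUDA (major, minor) version as 'ok', 'broken' or 'unknown'."""
    if version == (2, 3):
        return "broken"
    has_gtx2xx = any(re.search(r"GTX\s*2\d\d\b", name) for name in gpu_names)
    if version == (3, 1) and has_gtx2xx:
        return "broken"
    if version in KNOWN_GOOD:
        return "ok"
    return "unknown"  # not covered by the notes
```

Note that CUDA 3.1 without a GTX2xx card is reported as "unknown", since the notes do not say whether that combination works.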

Executing
---------
python PyHST.py <configuration_file>
- To switch on debug output, call the script as
DEBUG=1 python ....
- To force synchronous CUDA operations, use
CUDA_LAUNCH_BLOCKING=1 python ....
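The same invocations can be made from a wrapper script. This is a minimal sketch: DEBUG and CUDA_LAUNCH_BLOCKING are the variables named above, while the helper names and the 'base' argument are hypothetical. It assumes 'python' and PyHST.py are reachable from the current directory.

```python
import os
import subprocess


def pyhst_environment(debug=False, blocking=False, base=None):
    """Copy the environment and add the debugging variables on request."""
    env = dict(os.environ if base is None else base)
    if debug:
        env["DEBUG"] = "1"  # switch on debug output
    if blocking:
        env["CUDA_LAUNCH_BLOCKING"] = "1"  # force synchronous CUDA operations
    return env


def run_pyhst(config_file, debug=False, blocking=False):
    """Run PyHST.py on a configuration file and return its exit code."""
    command = ["python", "PyHST.py", config_file]
    env = pyhst_environment(debug, blocking)
    return subprocess.run(command, env=env).returncode
```

Both variables can be combined, which matches running 'DEBUG=1 CUDA_LAUNCH_BLOCKING=1 python PyHST.py <configuration_file>' in a shell.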