To get this branch, use:
bzr branch http://darksoft.org/webbzr/tomo/pyhst
Requirements
------------
 * POSIX-compliant operating system, standard set of Linux system apps (find,
   grep, sed, xargs, cut, ...), CMake build system
 * GNU C Compiler 4.x; the current version of CUDA does not work with 4.4 and
   newer.
 * Python with PIL, Logging, NumPy modules support, optionally VimpsCC python
   module
 * Glib2 Library
 * FFTW or Intel MKL Library is required for multithreaded mode
 * NVIDIA CUDA Toolkit 2.x or 3.x is required for GPU based processing
 * x86-64 compatible CPU supporting SSSE3 instruction set

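 Some of the requirements above can be checked before building. The following
 is a minimal sketch, not part of PyHST: the helper names are hypothetical, and
 it assumes that PIL and NumPy are importable as "PIL" and "numpy" and that CPU
 features are listed on a "flags" line as in /proc/cpuinfo on Linux.

```python
import importlib.util
import re

# Assumed import names for the Python modules listed in the requirements.
REQUIRED_MODULES = ("PIL", "logging", "numpy")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cpu_has_flag(cpuinfo_text, flag="ssse3"):
    """Check a /proc/cpuinfo-style dump for a CPU feature flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and flag in value.split():
            return True
    return False


def gcc_works_with_cuda(version):
    """True for gcc 4.0 to 4.3, the range the requirements describe as usable."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return major == 4 and minor < 4
```

 The text for cpu_has_flag would typically be read from /proc/cpuinfo, and the
 version string for gcc_works_with_cuda from the output of 'gcc -dumpversion'.
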
Compiler
--------
 * -O2 should always be used; it gives approximately a 10-fold performance
   difference. -O3 sometimes causes problems in GPU mode.
 * -msse2 -msse3 also slightly improve performance in CPU mode
 * Intel compiler is only slightly faster than gcc-4.2.4 (there is certainly
   no sense in using it; it is much better to vectorise the code manually)

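 The flag recommendations above could be turned into a configure command as
 sketched below. The helper name is hypothetical, and passing the flags through
 the standard CMAKE_C_FLAGS variable is an assumption about how this project's
 CMake setup picks up compiler options.

```python
# Flags taken from the compiler notes; -O3 is rejected because it is
# reported to cause problems in GPU mode.
RECOMMENDED_FLAGS = ("-O2", "-msse2", "-msse3")


def configure_command(flags=RECOMMENDED_FLAGS, source_dir="."):
    """Build the argument list for a 'cmake' run with the given C flags."""
    if "-O3" in flags:
        raise ValueError("-O3 is reported to cause problems in GPU mode")
    return ["cmake", "-DCMAKE_C_FLAGS=" + " ".join(flags), source_dir]
```

 The returned list can be handed to subprocess.check_call() from the source
 directory.
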
FFT libraries
-------------
 The CPU FFT transformations can be performed in several ways:
 a) The included 'Vhst_fourier.c' has a very small initialization time, but
    doesn't include SSE optimizations and, therefore, performs rather poorly on
    modern Intel/AMD platforms
 b) FFTW3 library is about 10-15% faster, but uses more time for initialization,
    which is negligible in case of large amounts of data.
 c) Intel FFTW3 library has performance comparable to FFTW3, but a very small
    initialization time.

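 The trade-off between initialization time and per-transform speed can be made
 concrete with a break-even estimate. This is an illustrative sketch only: the
 function name is hypothetical, the timings must be measured on the target
 system, and "10-15% faster" is read here as 10-15% less time per transform,
 which is an assumption.

```python
import math


def break_even_transforms(extra_init, time_per_fft, speedup=0.10):
    """Number of transforms after which the faster library wins overall.

    extra_init   -- additional initialization time of the faster library (s)
    time_per_fft -- time of one transform with the slower library (s)
    speedup      -- fraction of time_per_fft saved per transform
    """
    saved = time_per_fft * speedup
    if saved <= 0:
        raise ValueError("the faster library must save time per transform")
    return math.ceil(extra_init / saved)
```

 For data sets with many more transforms than this number, the extra
 initialization time no longer matters.
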
Building
--------
 a) Run 'cmake .' to generate Makefiles; it will report the system libraries
    it was able to find and which features will be used.
 b) 'ccmake .' or 'cmake-gui' commands may be used to configure options: set
    compilation flags, enable/disable CUDA support, etc.
 c) Run 'make' to compile the GPU-enabled version and 'make cpu' to build the
    CPU-only version (you should execute 'make clean' before switching between
    gpu and cpu targets)

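 The build steps above can be scripted. The sketch below is not part of PyHST;
 the function names and the previous_target bookkeeping value are hypothetical.
 It only encodes the order of commands described above, including the
 'make clean' needed when switching between gpu and cpu targets.

```python
import subprocess


def build_steps(target="gpu", previous_target=None):
    """Return the commands for one build, as argument lists.

    target is "gpu" (plain 'make') or "cpu" ('make cpu'). previous_target
    records what the tree was last built for, if anything.
    """
    if target not in ("gpu", "cpu"):
        raise ValueError("target must be 'gpu' or 'cpu'")
    steps = [["cmake", "."]]
    if previous_target is not None and previous_target != target:
        # Switching between gpu and cpu targets needs a clean tree.
        steps.append(["make", "clean"])
    steps.append(["make"] if target == "gpu" else ["make", "cpu"])
    return steps


def run_build(target="gpu", previous_target=None):
    """Execute the steps in order, stopping at the first failure."""
    for command in build_steps(target, previous_target):
        subprocess.check_call(command)
```

 For example, run_build("cpu", previous_target="gpu") would run 'cmake .',
 'make clean' and 'make cpu' in the current directory.
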
Problems
--------
 a) CUDA 2.3 is buggy and produces incorrect results; CUDA 2.2, 3.0, 3.2 are
    working correctly.
 b) CUDA 3.1 is buggy and produces incorrect results if GTX2xx cards are used.
    It looks like the filtering step is not performed if computed using GTX280.
    Another possibility is a problem computing on a non-primary card. The
    actual configuration was GTX480 + GTX280.
 c) After a new installation, the results are sometimes completely and
    consistently black. It looks like the driver is in some kind of wrong
    state. This can be fixed by running some of the SDK examples.
 d) The NVIDIA OpenCL library does not load the AMD platform. The AMD library
    loads both NVIDIA and AMD platforms.

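 Items a) and b) can be kept as a small lookup so that a launcher script can
 warn about a known-bad toolkit. This is a sketch with hypothetical names. It
 contains only the versions mentioned above, and matching cards by the
 substring "GTX2" in the device name is an assumption.

```python
# Status table transcribed from items a) and b); versions not listed there
# are reported as "unknown" rather than guessed.
KNOWN_GOOD = {"2.2", "3.0", "3.2"}
KNOWN_BAD = {"2.3"}


def cuda_status(version, gpu_names=()):
    """Classify a CUDA toolkit version as 'ok', 'buggy' or 'unknown'."""
    if version in KNOWN_BAD:
        return "buggy"
    if version == "3.1":
        # Reported as producing incorrect results when GTX2xx cards are used.
        for name in gpu_names:
            if "GTX2" in name.upper().replace(" ", ""):
                return "buggy"
        return "unknown"
    if version in KNOWN_GOOD:
        return "ok"
    return "unknown"
```

 With the configuration from item b), cuda_status("3.1", ["GTX480", "GTX280"])
 reports the toolkit as buggy.
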
Executing
---------
  python PyHST.py <configuration_file>

  - To switch on debug output, run the script as
    DEBUG=1 python ....

  - To force synchronous CUDA operations, run it as
    CUDA_LAUNCH_BLOCKING=1 python ....
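
 The same invocations can be made from a wrapper script. The sketch below uses
 hypothetical helper names and assumes that PyHST.py is in the current
 directory and that 'python' on the PATH is the interpreter with the required
 modules.

```python
import os
import subprocess


def pyhst_environment(debug=False, blocking=False, base=None):
    """Copy an environment and add the switches described above."""
    env = dict(os.environ if base is None else base)
    if debug:
        env["DEBUG"] = "1"  # switches on debug output
    if blocking:
        env["CUDA_LAUNCH_BLOCKING"] = "1"  # forces synchronous CUDA operations
    return env


def run_pyhst(config_file, debug=False, blocking=False):
    """Run 'python PyHST.py <configuration_file>' and return its exit code."""
    command = ["python", "PyHST.py", config_file]
    return subprocess.call(command, env=pyhst_environment(debug, blocking))
```

 The environment is copied rather than modified in place, so the switches only
 affect the launched process.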