/tomo/pyhst : contents of README at revision 276

To get this branch, use:

bzr branch http://darksoft.org/webbzr/tomo/pyhst

Requirements
------------
 * POSIX-compliant operating system, standard set of Linux system apps (find,
 grep, sed, xargs, cut, ...), CMake build system
 * GNU C Compiler 4.x; the current version of CUDA does not work with GCC 4.4
 and newer.
 * Python with the PIL, logging, and NumPy modules; optionally the VimpsCC
 Python module
 * Glib2 Library
 * FFTW or Intel MKL Library is required for multithreaded mode
 * NVIDIA CUDA Toolkit 2.x or 3.x is required for GPU-based processing
 * x86-64 compatible CPU supporting SSSE3 instruction set
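
 Some of the requirements above can be checked before building. The following
 Python sketch is only an illustration and is not part of PyHST: the helper
 names (missing_modules, has_ssse3) are hypothetical, the import names 'PIL'
 and 'numpy' are the usual ones for those packages, and the SSSE3 check
 assumes a Linux system that exposes /proc/cpuinfo.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def has_ssse3(cpuinfo_text):
    """Return True if a 'flags' line of /proc/cpuinfo text lists ssse3."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "ssse3" in value.split():
            return True
    return False


if __name__ == "__main__":
    # Import names assumed here: PIL, logging, numpy.
    print("missing modules:", missing_modules(["PIL", "logging", "numpy"]))
    try:
        with open("/proc/cpuinfo") as f:
            print("SSSE3 supported:", has_ssse3(f.read()))
    except OSError:
        print("SSSE3 supported: unknown (no /proc/cpuinfo)")
```

 Note that the check looks for the exact flag 'ssse3', so a CPU that only
 reports 'sse3' is correctly treated as not meeting the requirement.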

Compiler
--------
 * -O2 should always be used; it gives approximately a 10-fold performance
 difference. -O3 sometimes causes problems in GPU mode.
 * -msse2 -msse3 also slightly improve performance in CPU mode
 * The Intel compiler is only slightly faster than gcc-4.2.4 (there is
 certainly no point in using it; it is much better to vectorise code manually)

FFT libraries
-------------
 The CPU FFT transformations can be performed in several ways:
 a) The included 'Vhst_fourier.c' has a very small initialization time, but
 does not include SSE optimizations and therefore performs rather poorly on
 modern Intel/AMD platforms.
 b) The FFTW3 library is about 10-15% faster, but takes more time to
 initialize, which is negligible for large amounts of data.
 c) The Intel FFTW3 library has processing performance comparable to FFTW3
 but a very small initialization time.

Building
--------
 a) Run 'cmake .' to generate Makefiles; it will report which system libraries
 it was able to find and which features will be used.
 b) The 'ccmake .' or 'cmake-gui' commands may be used to configure options:
 set compilation flags, enable/disable CUDA support, etc.
 c) Run 'make' to compile the GPU-enabled version or 'make cpu' to build the
 CPU-only version (you should execute 'make clean' before switching between
 the gpu and cpu targets).
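
 The build steps above can be scripted. The Python sketch below is a
 hypothetical wrapper, not something shipped with PyHST: the function names
 are made up for illustration, and passing compiler flags through the
 standard CMAKE_C_FLAGS variable assumes that the project's CMakeLists.txt
 honors it.

```python
import subprocess


def build_commands(target="gpu", switching=False, c_flags=None):
    """Return the list of commands for one build, as argument lists."""
    if target not in ("gpu", "cpu"):
        raise ValueError("target must be 'gpu' or 'cpu'")
    cmake = ["cmake"]
    if c_flags:
        # CMAKE_C_FLAGS is a standard CMake variable; whether this project's
        # CMakeLists.txt honors it is an assumption.
        cmake.append("-DCMAKE_C_FLAGS=" + c_flags)
    cmake.append(".")
    commands = [cmake]
    if switching:
        # Clean first when the previous build used the other target.
        commands.append(["make", "clean"])
    commands.append(["make"] if target == "gpu" else ["make", "cpu"])
    return commands


def build(target="gpu", switching=False, c_flags=None):
    """Run the build commands in order, stopping at the first failure."""
    for cmd in build_commands(target, switching, c_flags):
        subprocess.run(cmd, check=True)
```

 For example, build("cpu", switching=True, c_flags="-O2 -msse2 -msse3") would
 run cmake with those flags, then 'make clean', then 'make cpu'.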

Problems
--------
 a) CUDA 2.3 is buggy and produces incorrect results; CUDA 2.2, 3.0, and 3.2
 work correctly.
 b) CUDA 3.1 is buggy and produces incorrect results if GTX2xx cards are used.
 It looks like the filtering step is not performed when computing on a GTX280.
 Another possibility is a problem with computing on a non-primary card. The
 actual configuration was GTX480 + GTX280.
 c) After a new installation, results are sometimes completely and consistently
 black. It looks like the driver is in some kind of wrong state. This can be
 fixed by running some of the SDK examples.
 d) The NVIDIA OpenCL library does not load the AMD platform. The AMD library
 loads both the NVIDIA and AMD platforms.
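
 The CUDA version notes in items a) and b) can be kept as a small lookup
 table, for instance to warn before a run. This Python sketch is only an
 illustration: the names CUDA_STATUS and cuda_status are hypothetical, and
 versions not mentioned above are reported as "untested" rather than guessed.

```python
# Status table transcribed from the notes above; versions not listed there
# are reported as "untested" rather than guessed.
CUDA_STATUS = {
    "2.2": "ok",
    "2.3": "buggy",
    "3.0": "ok",
    "3.1": "buggy on GTX2xx",
    "3.2": "ok",
}


def cuda_status(version):
    """Return the known status for a CUDA 'major.minor' version string."""
    major_minor = ".".join(version.split(".")[:2])
    return CUDA_STATUS.get(major_minor, "untested")
```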


Executing
---------
  python PyHST.py <configuration_file>
  
  - To switch on debug output, call the script as
    DEBUG=1 python ....
    
  - To force synchronous CUDA operations
    CUDA_LAUNCH_BLOCKING=1 python ....
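
  The two environment switches above can also be set from a small launcher.
  The Python sketch below is hypothetical and not part of PyHST: the function
  names are made up, and it assumes PyHST.py is in the current directory and
  should run under the same interpreter as the launcher.

```python
import os
import subprocess
import sys


def pyhst_env(debug=False, blocking=False, base=None):
    """Return a copy of the environment with the requested switches set."""
    env = dict(os.environ if base is None else base)
    if debug:
        env["DEBUG"] = "1"
    if blocking:
        env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_pyhst(config_file, debug=False, blocking=False):
    """Run 'python PyHST.py <config_file>' and return its exit code."""
    cmd = [sys.executable, "PyHST.py", config_file]
    return subprocess.run(cmd, env=pyhst_env(debug, blocking)).returncode
```

  Building the environment as a copy keeps the caller's own environment
  unchanged, so the switches apply only to the launched process.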