/tomo/pyhst : contents of README at revision 276

To get this branch, use:

bzr branch http://darksoft.org/webbzr/tomo/pyhst

Requirements
------------
 * POSIX-compliant operating system, standard set of Linux system apps (find,
 grep, sed, xargs, cut, ...), CMake build system
 * GNU C Compiler 4.x; the current version of CUDA does not work with gcc 4.4
 and newer
 * Python with the PIL, Logging and NumPy modules, optionally the VimpsCC
 Python module
 * Glib2 Library
 * FFTW or Intel MKL library is required for multithreaded mode
 * NVIDIA CUDA Toolkit 2.x or 3.x is required for GPU-based processing
 * x86-64 compatible CPU supporting SSSE3 instruction set
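
 Two of the requirements above (the SSSE3 instruction set and the Python
 modules) can be checked from a script. The following is only a minimal
 sketch, not part of PyHST: the function names are hypothetical, it assumes
 a Python 3 interpreter and a Linux system that exposes /proc/cpuinfo, and
 it leaves out the optional VimpsCC module.

```python
import importlib.util


def cpu_has_ssse3(cpuinfo_text):
    """Return True if a 'flags' line of /proc/cpuinfo-style text lists ssse3."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Compare whole tokens so that 'sse3' is not mistaken for 'ssse3'.
        if key.strip() == "flags" and "ssse3" in value.split():
            return True
    return False


def missing_modules(names=("PIL", "logging", "numpy")):
    """Return the module names from 'names' that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:  # Linux-specific path (assumption)
            print("SSSE3 supported:", cpu_has_ssse3(handle.read()))
    except OSError:
        print("SSSE3 support: could not read /proc/cpuinfo")
    print("Missing Python modules:", missing_modules() or "none")
```

 The script only reports what it finds; it does not check the compiler,
 Glib2, the FFT libraries or the CUDA Toolkit.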

Compiler
--------
 * -O2 should always be used; it gives an approximately 10-fold performance
 improvement. -O3 sometimes causes problems in GPU mode.
 * -msse2 -msse3 also slightly improve performance in CPU mode
 * The Intel compiler is only slightly faster than gcc-4.2.4 (there is
 certainly no point in using it; it is much better to vectorise the code
 manually)

FFT libraries
-------------
 The CPU FFT transformations can be performed in several ways:
 a) The included 'Vhst_fourier.c' has a very small initialization time, but
 does not include SSE optimizations and therefore performs rather poorly on
 modern Intel/AMD platforms.
 b) The FFTW3 library is about 10-15% faster, but needs more time for
 initialization, which is negligible for large amounts of data.
 c) The Intel FFTW3 library has processing performance comparable to FFTW3,
 but a very small initialization time.

Building
--------
 a) Run 'cmake .' to generate the Makefiles; it will report which system
 libraries it was able to find and which features will be used.
 b) The 'ccmake .' or 'cmake-gui' commands may be used to configure options:
 set compilation flags, enable/disable CUDA support, etc.
 c) Run 'make' to compile the GPU-enabled version and 'make cpu' to build the
 CPU-only version (you should execute 'make clean' before switching between
 the gpu and cpu targets).
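
 The build steps above, including the 'make clean' needed when switching
 targets, could be driven from a small script. This is a sketch only: the
 helper names and the "gpu"/"cpu" labels are hypothetical, and only the
 commands quoted in this section are used.

```python
import subprocess


def build_commands(target="gpu", previous_target=None):
    """List the commands for one build, in the order they should run.

    'target' and 'previous_target' are hypothetical labels ("gpu" or "cpu")
    used only by this sketch.
    """
    if target not in ("gpu", "cpu"):
        raise ValueError("target must be 'gpu' or 'cpu'")
    commands = [["cmake", "."]]
    # 'make clean' is needed only when switching between gpu and cpu targets.
    if previous_target is not None and previous_target != target:
        commands.append(["make", "clean"])
    commands.append(["make"] if target == "gpu" else ["make", "cpu"])
    return commands


def run_build(target="gpu", previous_target=None, dry_run=True):
    """Print each command and execute it unless dry_run is set."""
    for command in build_commands(target, previous_target):
        print(" ".join(command))
        if not dry_run:
            subprocess.check_call(command)
```

 With the default dry_run=True, run_build("cpu", previous_target="gpu")
 only prints the three commands, so the sequence can be reviewed before
 anything is executed.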

Problems
--------
 a) CUDA 2.3 is buggy and produces incorrect results; CUDA 2.2, 3.0 and 3.2
 work correctly.
 b) CUDA 3.1 is buggy and produces incorrect results if GTX2xx cards are used.
 It looks like the filtering step is not performed when computing on a GTX280.
 Another possibility is a problem with computing on a non-primary card. The
 actual configuration was GTX480 + GTX280.
 c) After a new installation, the results are sometimes completely and
 consistently black. This appears to happen because the driver is in some
 kind of wrong state. It can be fixed by running some of the SDK examples.
 d) The NVIDIA OpenCL library does not load the AMD platform. The AMD library
 loads both the NVIDIA and AMD platforms.
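
 The CUDA version notes in a) and b) can be encoded as a simple lookup.
 This is a sketch only: the function name and the status strings are
 hypothetical, and any version not mentioned in this README is reported as
 "unverified" rather than guessed.

```python
# Version sets taken from items a) and b) of the Problems section.
KNOWN_GOOD = {"2.2", "3.0", "3.2"}
KNOWN_BAD = {"2.3"}
BAD_WITH_GTX2XX = {"3.1"}


def cuda_status(version, has_gtx2xx=False):
    """Classify a CUDA Toolkit version string such as "3.1".

    Returns "ok", "buggy" or "unverified" (hypothetical labels).
    """
    if version in KNOWN_BAD:
        return "buggy"
    if version in BAD_WITH_GTX2XX:
        # Only the GTX2xx case is documented; other cards are not covered.
        return "buggy" if has_gtx2xx else "unverified"
    if version in KNOWN_GOOD:
        return "ok"
    return "unverified"
```

 The version string and the presence of a GTX2xx card have to be supplied
 by the caller; the sketch does not query the driver or the toolkit.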


Executing
---------
  python PyHST.py <configuration_file>
  
  - To switch on debug output, call the script as
    DEBUG=1 python ....
    
  - To force synchronous CUDA operations, call it as
    CUDA_LAUNCH_BLOCKING=1 python ....
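
  The same switches can be set from a wrapper script. This is a sketch
  only: the helper names are hypothetical, it assumes PyHST.py is in the
  current directory, and it runs the script with the interpreter that
  executes the wrapper.

```python
import os
import subprocess
import sys


def pyhst_environment(debug=False, blocking=False, base=None):
    """Return a copy of the environment with the README's switches applied."""
    env = dict(os.environ if base is None else base)
    if debug:
        env["DEBUG"] = "1"  # switches on debug output
    if blocking:
        env["CUDA_LAUNCH_BLOCKING"] = "1"  # forces synchronous CUDA operations
    return env


def run_pyhst(config_file, debug=False, blocking=False):
    """Run PyHST.py on one configuration file and return its exit code."""
    command = [sys.executable, "PyHST.py", config_file]
    return subprocess.call(command, env=pyhst_environment(debug, blocking))
```

  Because pyhst_environment() copies the environment, the variables affect
  only the launched process and not the calling shell or script.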
    
    
