Revision 26, 3.2 KB (checked in by sascha, 3 years ago)

Makefile simplified
documentation added
work_consumer/distribution shows a graphical representation
of the value distribution

Dear alpha tester,

To install, edit the Makefile and adjust the paths to the include files
and libraries.

With the release of CUDA 2.3 it is no longer necessary to patch and
compile open64, so you can skip the open64 paragraphs below.

Quick install guide for the CUDA 2.2 open64 compiler:

Download ftp://download.nvidia.com/CUDAOpen64/nvopencc-2.2-src.tar.gz
mkdir ~/open64; cd ~/open64
Extract the tarball here.
patch -p1 < /path/to/nvopencc-2.2.patch (the patch is included)
cd src/targia3264_nvisa
make ROOT_DIR=~/open64/

When the build process stops with an error while trying to build 'bec',
you will find the following binaries:
inliner/inline
backend/be
gccfe/gfec

Back up the original files in /usr/local/cuda/open64/lib, then copy these
binaries there. And yes, this works with the 'bec' from the original
distribution.

With CUDA 2.1 the program will probably not compile and the patch does not
apply, but all the patch does is add some includes and remove -Werror.

Tested with gcc-4.3.

stxxl-1.2.1:

Download it from stxxl.sf.net and build it. There is no need to install it;
just symlink the stxxl source root directory to stxxl-svn in this directory
(e.g. ln -s /path/to/stxxl stxxl-svn). stxxl from SVN is probably not
really needed.

To build the software:

Run the "autoconfig", i.e. edit the configuration by hand:
vi Makefile; vi obj/gcc (gcc 4.1 works)
cd obj; make

The number of blocks in the test below (--device cuda:blocks=...) should
match the number of "Multiprocessors" on your GPU, which is the number of
stream processors divided by 8.
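As a worked example of this rule: a GPU with 240 stream processors has 240 / 8 = 30 multiprocessors, so blocks=30 would match it. The hypothetical helper below only encodes that arithmetic and is not part of this software. The factor of 8 applies to GPUs of compute capability 1.x; with the CUDA runtime, cudaGetDeviceProperties() reports the count directly in cudaDeviceProp::multiProcessorCount.

```cpp
#include <cassert>

// Hypothetical helper: suggested value for --device cuda:blocks=...
// derived from the stream processor count, assuming 8 stream processors
// per multiprocessor as stated in this file (compute capability 1.x GPUs).
int suggested_blocks(int stream_processors) {
    const int sps_per_multiprocessor = 8;
    assert(stream_processors > 0 &&
           stream_processors % sps_per_multiprocessor == 0);
    return stream_processors / sps_per_multiprocessor;
}
```

For example, suggested_blocks(240) returns 30 and suggested_blocks(128) returns 16.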

Running a test (all on one line):
time ./c --condition rounds:rounds=256
         --implementation sharedmem --algorithm A51
         --roundfunc xor:condition=distinguished_point::bits=8:generator=increment
         --device cuda:threads=256:blocks=4:sleep=5000:operations=32768
         --logger verbose
         --work random:prefix=10,0
         --consume print:results=16
         generate --chainlength 65536 --chains 1024

The same command on a single line, for copy and paste:
time ./c --condition rounds:rounds=256 --implementation sharedmem --algorithm A51 --roundfunc xor:condition=distinguished_point::bits=8:generator=increment --device cuda:threads=256:blocks=4:sleep=5000:operations=32768 --logger verbose --work random:prefix=10,0 --consume print:results=16 generate --chainlength 65536 --chains 1024

This produces 1024 chains (--chains 1024), each made of 256 column spans
(--condition rounds:rounds=256). Each span ends at a distinguished point,
i.e. a chain value in which at least 8 bits are zero
(--roundfunc xor:condition=distinguished_point::bits=8). The values for the
xor round function are generated by incrementing a uint32_t, starting at
zero (generator=increment).

Two colons are used for nested option grouping: in the --roundfunc
argument, bits=8 belongs to condition=distinguished_point.
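The chain structure described above can be sketched as follows. This is an illustration under several assumptions, all hypothetical: the step function is a stand-in 64-bit mixer (the SplitMix64 finalizer) rather than the A5/1 code selected with --algorithm A51, "at least 8 bits are zero" is read as "the lowest 8 bits are zero", the round value is XORed into the step output, and the walk stops once a column budget is used up. The names step(), is_distinguished() and walk_chain() do not come from the source.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in step function (SplitMix64 finalizer). ASSUMPTION: the real
// software uses the A5/1 code selected with --algorithm A51 instead.
uint64_t step(uint64_t x) {
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// ASSUMPTION: "at least `bits` bits are zero" is read as "the lowest
// `bits` bits are zero".
bool is_distinguished(uint64_t value, unsigned bits) {
    assert(bits > 0 && bits < 64);
    const uint64_t mask = (uint64_t{1} << bits) - 1;
    return (value & mask) == 0;
}

struct ChainResult {
    uint64_t end;          // last chain value computed
    uint64_t columns;      // number of round function applications
    uint32_t rounds_done;  // spans that ended at a distinguished point
};

// Walks one chain of up to `rounds` spans. Within a span the round function
// is x -> step(x) ^ round_value; round_value comes from an incrementing
// uint32_t that starts at zero (generator=increment) and advances whenever a
// distinguished point ends the current span. The walk gives up once
// `max_columns` columns have been computed.
ChainResult walk_chain(uint64_t start, uint32_t rounds, unsigned bits,
                       uint64_t max_columns) {
    ChainResult r{start, 0, 0};
    uint32_t round_value = 0;
    while (r.rounds_done < rounds && r.columns < max_columns) {
        r.end = step(r.end) ^ round_value;
        ++r.columns;
        if (is_distinguished(r.end, bits)) {
            ++r.rounds_done;
            ++round_value;
        }
    }
    return r;
}
```

In this sketch a chain counts as complete when rounds_done reaches rounds; walk_chain(start, 256, 8, 65536) mirrors the settings of the test command for a single chain.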

The generator for the chain start values (--work random:prefix=10,0) uses
srand() and sets the first 10 bits to the constant zero.
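A minimal sketch of such a start value generator, assuming 64-bit chain values, that "first" means the most significant bits, and that prefix=10,0 means "10 bits holding the value 0". It swaps the srand()/rand() pair used by the original for std::mt19937_64, which delivers full 64-bit values; the class name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical start value generator modelled on --work random:prefix=N,V.
// ASSUMPTIONS: chain values are 64 bits, the prefix occupies the N most
// significant bits, and V is the constant stored there. std::mt19937_64
// stands in for the srand()/rand() pair used by the original code.
class PrefixedRandomStart {
public:
    PrefixedRandomStart(unsigned prefix_bits, uint64_t prefix_value, uint64_t seed)
        : bits_(prefix_bits), value_(prefix_value), rng_(seed) {
        assert(prefix_bits < 64);
        assert(prefix_bits == 0 ? prefix_value == 0
                                : prefix_value < (uint64_t{1} << prefix_bits));
    }

    uint64_t next() {
        const uint64_t drawn = rng_();
        if (bits_ == 0) {
            return drawn;  // no prefix requested
        }
        const unsigned free_bits = 64 - bits_;  // between 1 and 63 here
        const uint64_t low_mask = (uint64_t{1} << free_bits) - 1;
        return (value_ << free_bits) | (drawn & low_mask);
    }

private:
    unsigned bits_;
    uint64_t value_;
    std::mt19937_64 rng_;
};
```

A fixed seed makes the sequence reproducible, which plays the same role as calling srand() with a constant.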

The work consumer (--consume print:results=16) prints some of the values.
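
4 blocks of 256 threads each (threads=256:blocks=4) are scheduled at once,
so 4*256 = 1024 chains are computed in parallel. Each kernel call computes
32768 columns for these chains (operations=32768), and there is a 5 ms
sleep between kernel calls (sleep=5000).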
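The arithmetic behind these numbers, as a hedged sketch: it assumes one thread per chain and that every kernel call advances each active chain by device:operations columns. With the test command's values this gives 1024 chains per launch, a single batch for --chains 1024, and 65536 / 32768 = 2 kernel calls per batch. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of how the --device and generate options combine.
// ASSUMPTIONS: one thread computes one chain; each kernel call advances every
// active chain by `operations` columns; operations divides chainlength.
struct LaunchPlan {
    uint64_t chains_per_launch;  // threads * blocks chains run concurrently
    uint64_t batches;            // groups of chains needed to cover --chains
    uint64_t calls_per_batch;    // kernel calls needed to reach --chainlength
    uint64_t steps_per_call;     // step function evaluations in one call
};

LaunchPlan make_plan(uint64_t threads, uint64_t blocks, uint64_t operations,
                     uint64_t chains, uint64_t chainlength) {
    assert(threads > 0 && blocks > 0 && operations > 0);
    LaunchPlan p{};
    p.chains_per_launch = threads * blocks;
    p.batches = (chains + p.chains_per_launch - 1) / p.chains_per_launch;  // round up
    p.calls_per_batch = chainlength / operations;
    p.steps_per_call = p.chains_per_launch * operations;
    return p;
}
```

With the test command's values each kernel call performs 1024 * 32768 = 33554432 step evaluations.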

device:operations must divide --chainlength (65536 here); there is no
error checking for this yet.
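The missing check could look like the following sketch. It is a hypothetical helper, not code from this software: it returns an empty string when device:operations is usable for the given --chainlength and an error message otherwise.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical validation for the unchecked requirement described above.
// Returns "" if `operations` is non-zero and divides `chainlength`,
// otherwise a human-readable error message.
std::string check_operations(uint64_t operations, uint64_t chainlength) {
    if (operations == 0) {
        return "device:operations must be greater than zero";
    }
    if (chainlength % operations != 0) {
        return "device:operations=" + std::to_string(operations) +
               " does not divide --chainlength " + std::to_string(chainlength);
    }
    return "";
}
```

A caller would print a non-empty message and stop before launching any kernels.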

The call should generate a set of chains of which about 50% are complete
(see the output of --consume print).
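One way to see why roughly half the chains should come out complete, under the assumption (not stated in the source) that --chainlength is a per-chain column budget: with bits=8 a span ends at each column with probability 1/256, so the expected span length is 256 columns, and 256 spans need 256 * 256 = 65536 columns on average, exactly the budget. The simulation below draws geometric span lengths instead of running the real step function and counts the chains that fit all their spans into the budget; it models the statistics only, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Fraction of simulated chains that reach `rounds` distinguished points within
// `chainlength` columns. ASSUMPTION: each column is a distinguished point with
// independent probability 2^-bits, so span lengths are geometric.
double complete_fraction(unsigned rounds, unsigned bits, uint64_t chainlength,
                         unsigned trials, uint64_t seed) {
    assert(bits < 32 && trials > 0);
    std::mt19937_64 rng(seed);
    const double p = 1.0 / static_cast<double>(uint64_t{1} << bits);
    // geometric_distribution counts failures before the first success, so a
    // span that ends on its k-th column yields k - 1.
    std::geometric_distribution<uint64_t> failures(p);
    unsigned complete = 0;
    for (unsigned t = 0; t < trials; ++t) {
        uint64_t columns = 0;
        for (unsigned r = 0; r < rounds && columns <= chainlength; ++r) {
            columns += failures(rng) + 1;
        }
        if (columns <= chainlength) {
            ++complete;
        }
    }
    return static_cast<double>(complete) / trials;
}
```

With the test command's values, complete_fraction(256, 8, 65536, 2000, 1) comes out close to one half.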

For most options, --$switch help (e.g. ./c --condition help) prints some
information about them.

This is an alpha release. To find out more about the options, look at the
source.

For example, --work maps to ./work_generator/*.hpp,
--consume to ./work_consumer/*.hpp,
...

If you are curious, look for the static member function "optdesc()" that
populates the option parser.
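The option strings in the test command use ':' to separate a module's options and '::' for nested grouping. Below is a standalone sketch of one plausible reading of that syntax, in which each '::' attaches the following key=value pair to the top-level option right before it. It is not the project's parser (which is populated through the optdesc() functions); the types and the function name are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical parsed form of one option argument such as
// "xor:condition=distinguished_point::bits=8:generator=increment".
struct OptionValue {
    std::string value;                           // e.g. "distinguished_point"
    std::map<std::string, std::string> nested;   // e.g. {"bits", "8"}
};

struct ModuleSpec {
    std::string name;                            // e.g. "xor"
    std::map<std::string, OptionValue> options;  // top-level key=value pairs
};

// Splits on single ':' characters; the empty field produced by "::" marks
// the next field as nested under the previous top-level option.
ModuleSpec parse_module_spec(const std::string& spec) {
    assert(!spec.empty());
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = spec.find(':', start);
        fields.push_back(spec.substr(start, pos - start));  // npos: rest of string
        if (pos == std::string::npos) break;
        start = pos + 1;
    }

    ModuleSpec result;
    result.name = fields.at(0);
    std::string last_key;    // most recent top-level option
    bool nest_next = false;  // set by the empty field between "::"
    for (std::size_t i = 1; i < fields.size(); ++i) {
        const std::string& field = fields[i];
        if (field.empty()) {
            nest_next = true;
            continue;
        }
        const std::string::size_type eq = field.find('=');
        const std::string key = field.substr(0, eq);
        const std::string value = (eq == std::string::npos) ? "" : field.substr(eq + 1);
        if (nest_next && !last_key.empty()) {
            result.options[last_key].nested[key] = value;
        } else {
            result.options[key].value = value;
            last_key = key;
        }
        nest_next = false;
    }
    return result;
}
```

Under this reading, parsing the --roundfunc argument yields the module "xor" with condition=distinguished_point (carrying the nested bits=8) and generator=increment at the top level.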