Dear alpha tester,

to install, edit the Makefile and adjust the paths to includes
and libraries.

With the release of CUDA 2.3 it is no longer necessary to patch and
compile open64, so you can skip the next few paragraphs.

quick install guide for the cuda 2.2 open64 compiler

download ftp://download.nvidia.com/CUDAOpen64/nvopencc-2.2-src.tar.gz
mkdir ~/open64; cd ~/open64
extract the tarball here
patch -p1 < /path/to/nvopencc-2.2.patch (included)
cd src/targia3264_nvisa
make ROOT_DIR=~/open64/

the build process stops with an error when it tries to build 'bec'.
at that point you will find these files already built:
inliner/inline
backend/be
gccfe/gfec

copy those to /usr/local/cuda/open64/lib after backing up the original files.
and yes, it works with bec from the original distribution.
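
the backup-and-copy step above could be scripted roughly like this. this is only a sketch: install_open64_binaries is a made-up helper name, the .orig backup suffix is my choice, and it assumes the files in the destination directory are called inline, be and gfec (same base names as the built ones). the build directory is a parameter because the text does not say where exactly the three files end up.

```shell
# install_open64_binaries BUILD_DIR DEST_DIR
# backs up inline, be and gfec in DEST_DIR (as NAME.orig) and then copies
# the freshly built ones over them.
# BUILD_DIR is the directory that contains inliner/, backend/ and gccfe/.
install_open64_binaries() {
    build_dir=$1
    dest_dir=$2
    for f in inliner/inline backend/be gccfe/gfec; do
        name=$(basename "$f")
        # keep the original file, but never overwrite an existing backup
        if [ -e "$dest_dir/$name" ] && [ ! -e "$dest_dir/$name.orig" ]; then
            cp -p "$dest_dir/$name" "$dest_dir/$name.orig" || return 1
        fi
        cp -p "$build_dir/$f" "$dest_dir/$name" || return 1
    done
}

# example (may need root; BUILD_DIR is a placeholder):
# install_open64_binaries /path/to/build/dir /usr/local/cuda/open64/lib
```

to go back to the original compiler, copy the .orig files back over the replaced ones.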

cuda 2.1 will probably not compile the program and the patch does not apply,
but all the patch does is add some includes and remove -Werror

tested with gcc-4.3

stxxl-1.2.1:

download from stxxl.sf.net and build. no need to install, just create a
symlink named stxxl-svn in this directory that points to the stxxl source
root dir
(stxxl from SVN is probably not really needed)
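
a small sketch of the symlink step. link_stxxl is a made-up helper and the stxxl path in the example is a placeholder; run it from this directory so that stxxl-svn ends up next to the Makefile.

```shell
# link_stxxl STXXL_ROOT
# creates the symlink ./stxxl-svn pointing at an already built stxxl tree.
link_stxxl() {
    ln -s "$1" stxxl-svn
}

# example (placeholder path):
# link_stxxl /path/to/stxxl-1.2.1
```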

to build the software:

run the autoconfig: vi Makefile; vi obj/gcc (gcc 4.1 works)
cd obj; make

the number of blocks (blocks= in the --device option below) should match
the number of "Multiprocessors" on your GPU, which is the number of
stream processors / 8
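
the rule of thumb above as shell arithmetic. blocks_for_gpu is a made-up helper, and the divisor 8 is simply the ratio given in the text; check the "Multiprocessors" value your GPU actually reports if in doubt.

```shell
# blocks_for_gpu STREAM_PROCESSORS
# prints the number of multiprocessors, assuming 8 stream processors each
# (the ratio given in the text above).
blocks_for_gpu() {
    echo $(( $1 / 8 ))
}

blocks_for_gpu 32    # a GPU with 32 stream processors -> use blocks=4
```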

running a test (all on one line):
time ./c --condition rounds:rounds=256
--implementation sharedmem --algorithm A51
--roundfunc xor:condition=distinguished_point::bits=8:generator=increment
--device cuda:threads=256:blocks=4:sleep=5000:operations=32768
--logger verbose
--work random:prefix=10,0
--consume print:results=16
generate --chainlength 65536 --chains 1024

(copy+paste)
time ./c --condition rounds:rounds=256 --implementation sharedmem --algorithm A51 --roundfunc xor:condition=distinguished_point::bits=8:generator=increment --device cuda:threads=256:blocks=4:sleep=5000:operations=32768 --logger verbose --work random:prefix=10,0 --consume print:results=16 generate --chainlength 65536 --chains 1024

this produces 1024 chains of 256 (--condition) column spans which are
divided by a distinguished point, where at least 8 bits of the chain value
are zero (--roundfunc condition=). it uses increment on a uint32_t, starting
at zero, to generate values for the xor round function (generator=increment).
two colons are used for nested option grouping (see --roundfunc).
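
to illustrate what a distinguished point test looks like, here is a sketch in shell arithmetic. it assumes the zero bits are the lowest bits of the value, which the text does not state; the real check in the source may look at other bit positions. is_distinguished is a made-up helper, not part of ./c.

```shell
# is_distinguished VALUE BITS
# succeeds (exit status 0) when the lowest BITS bits of VALUE are all zero.
# assumption: the real implementation may test different bit positions.
is_distinguished() {
    dp_value=$1
    dp_bits=$2
    dp_mask=$(( (1 << dp_bits) - 1 ))
    [ $(( dp_value & dp_mask )) -eq 0 ]
}

is_distinguished 256 8 && echo "256 has its lowest 8 bits clear"
is_distinguished 255 8 || echo "255 does not"
```

with bits=8 and uniformly distributed values, roughly one value in 256 passes such a test.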

the generator for the chain start values (--work random) uses srand()
and sets the first 10 bits to constant zero

the work consumer prints some of the values (--consume)

256 threads in 4 blocks get scheduled at once, with a 5ms sleep between
kernel calls, and 32768 columns are computed for those 4*256 chains per call.
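
the numbers from the test command, worked through. this assumes that each kernel call advances every scheduled chain by operations columns, as the paragraph above suggests; the variable names are mine.

```shell
threads=256          # --device cuda:threads=
blocks=4             # --device cuda:blocks=
operations=32768     # --device cuda:operations=
chainlength=65536    # --chainlength

chains_per_call=$(( threads * blocks ))
calls_per_chain=$(( chainlength / operations ))

echo "chains advanced per kernel call: $chains_per_call"
echo "kernel calls to finish those chains: $calls_per_chain"
```

so with --chains 1024 all chains fit into one batch, and that batch needs two kernel calls.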

the operations= value of --device (32768 here) should obviously divide
--chainlength (65536 here) evenly
(that is, until i implement error checking)
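
until that error checking exists, a wrapper script could do the check itself before calling ./c. check_operations is a made-up helper, not something the program provides.

```shell
# check_operations CHAINLENGTH OPERATIONS
# fails (non-zero exit status) unless OPERATIONS divides CHAINLENGTH evenly.
check_operations() {
    if [ "$2" -le 0 ] || [ $(( $1 % $2 )) -ne 0 ]; then
        echo "operations=$2 does not divide --chainlength $1" >&2
        return 1
    fi
}

check_operations 65536 32768 && echo "ok"
```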

the call should generate a set of chains where about 50% of the chains
are complete (see output of --consume print)

--$switch help (e.g. ./c --condition help) gives some
info for most options

this is an alpha release. to find out more about the options,
look at the source.

e.g. --work    maps to ./work_generator/*.hpp
     --consume maps to ./work_consumer/*.hpp
     ...

and look for a static member function "optdesc()" that populates the
option parser if you are curious
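
a quick way to find those functions is plain grep. list_optdesc is a made-up helper; the directory names in the example are the ones mentioned above.

```shell
# list_optdesc DIR
# prints every line mentioning optdesc in the *.hpp files of DIR,
# with file name and line number (/dev/null forces the file name
# to be printed even if DIR holds only one header).
list_optdesc() {
    grep -n 'optdesc' "$1"/*.hpp /dev/null
}

# example, run from the source root:
# list_optdesc ./work_generator
# list_optdesc ./work_consumer
```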