The following devices are currently supported:
- cuda
- shortcircuit
CUDA configuration
The cuda backend provides access to NVIDIA graphics adapters. Every option has a default value chosen so that the user interface stays usable while the device is running. The number of blocks can be autodetected, and threads defaults to 256.
These options can be set:
- implementation=name
  - the name of the implementation to use
- blocks=n
  - the number of CUDA blocks to use (has a sensible default)
- threads=n
  - the number of CUDA threads per block (has a sensible default)
- operations=n
  - the number of applications of the algorithm in a single kernel invocation
- sleep=n
  - the number of microseconds to sleep() between kernel calls
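The option list above can be read as a small key=value schema. The sketch below is hypothetical: it assumes options arrive as one comma-separated string (the real separator is not documented here), and the names parse_cuda_options and run_kernel_loop are illustrative, not part of the tool. Only the threads=256 default comes from this document; every other default is left as None, meaning "let the backend decide" (for blocks, autodetection). The loop shows what sleep=n means in practice: a pause of n microseconds between consecutive kernel launches.

```python
# Hypothetical sketch, not the tool's real parser: the comma separator,
# function names and dict representation are all assumptions.
import time
from typing import Callable, Dict, Optional

KNOWN_IMPLEMENTATIONS = {
    "bitslice", "bitslice_multi", "bitslice_extra",
    "bitslice_multi_extra", "sharedmem", "sharedmem_extra",
}

# None means "use the backend's own default" (e.g. an autodetected block count).
# Only threads=256 is documented; the other defaults are deliberately unspecified.
DEFAULTS: Dict[str, Optional[object]] = {
    "implementation": None,
    "blocks": None,
    "threads": 256,
    "operations": None,
    "sleep": None,
}


def parse_cuda_options(spec: str) -> Dict[str, Optional[object]]:
    """Parse 'key=value,key=value' into a dict, starting from DEFAULTS."""
    options = dict(DEFAULTS)
    for item in filter(None, (part.strip() for part in spec.split(","))):
        key, sep, value = item.partition("=")
        if not sep or key not in DEFAULTS:
            raise ValueError(f"unknown or malformed option: {item!r}")
        if key == "implementation":
            if value not in KNOWN_IMPLEMENTATIONS:
                raise ValueError(f"unknown implementation: {value!r}")
            options[key] = value
        else:
            # blocks, threads, operations and sleep are non-negative integers.
            number = int(value)
            if number < 0:
                raise ValueError(f"{key} must be non-negative")
            options[key] = number
    return options


def run_kernel_loop(kernel: Callable[[], None], launches: int,
                    sleep_us: Optional[int]) -> None:
    """Call `kernel` `launches` times, pausing sleep_us microseconds between calls."""
    for i in range(launches):
        kernel()
        # Presumably the pause lets the GPU service other work, such as the display.
        if sleep_us and i + 1 < launches:
            time.sleep(sleep_us / 1_000_000)
```

For example, parse_cuda_options("implementation=sharedmem,threads=128,sleep=500") yields threads=128 and sleep=500 while blocks stays None, so the block count would be left to autodetection.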
The following implementations are available:
- bitslice
  - Bitslice code with a single round function (used during table generation with the sort intermediate)
- bitslice_multi
  - Bitslice code with multiple round functions (used during lookup)
- bitslice_extra and bitslice_multi_extra
  - Support extra clockings
- sharedmem
  - Low-throughput, low-latency version used during lookup
- sharedmem_extra
  - Same as sharedmem, with support for extra clockings
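The implementation list can be summarised as a decision on phase, extra clockings and latency. The helper below is a hypothetical illustration, not a function of the tool. The phase assigned to the *_extra variants is inferred by analogy with their base versions, and treating bitslice_multi as the higher-throughput lookup choice is an assumption drawn from sharedmem being described as low-throughput.

```python
# Hypothetical helper: maps a use case to one of the documented implementation names.
# The phases of the *_extra variants are inferred from their base versions.
def choose_implementation(phase: str, extra_clockings: bool = False,
                          low_latency: bool = False) -> str:
    if phase == "generation":
        # bitslice is documented for generation with the sort intermediate.
        base = "bitslice"
    elif phase == "lookup":
        # sharedmem trades throughput for latency; bitslice_multi is the other lookup option.
        base = "sharedmem" if low_latency else "bitslice_multi"
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    return base + "_extra" if extra_clockings else base
```

For instance, choose_implementation("lookup", extra_clockings=True, low_latency=True) returns "sharedmem_extra".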
Shortcircuit configuration
The shortcircuit device does not modify any data. It can be used to run data processing operations after table generation (e.g. sorting, dumping). It has no options.
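Conceptually, shortcircuit is a pass-through device: whatever data it receives comes back untouched, so only the surrounding pipeline steps do any work. The sketch below models that idea with hypothetical names (Device, ShortCircuitDevice, run_pipeline) and uses sorting as a stand-in for the post-generation processing mentioned above. It is an illustration of the concept, not the tool's actual device interface.

```python
# Hypothetical model of a pass-through device; all names here are illustrative.
from typing import List, Protocol


class Device(Protocol):
    def process(self, chains: List[int]) -> List[int]: ...


class ShortCircuitDevice:
    """Mimics the shortcircuit device: returns its input unchanged."""

    def process(self, chains: List[int]) -> List[int]:
        return chains


def run_pipeline(device: Device, chains: List[int]) -> List[int]:
    processed = device.process(chains)
    # Post-processing (here, sorting) still runs even though the device did nothing.
    return sorted(processed)
```

With ShortCircuitDevice, run_pipeline(ShortCircuitDevice(), [3, 1, 2]) simply sorts the existing data, which matches the use of shortcircuit to trigger sorting or dumping without recomputing anything.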
