The currently supported devices are:

  • cuda
  • shortcircuit

CUDA configuration

The cuda backend provides access to NVIDIA graphics adapters. All options have default values that keep the user interface usable. The number of blocks can be autodetected, and threads defaults to 256.

These options can be defined:

implementation=name
the name of the implementation to use
blocks=n
the number of CUDA blocks to use (has a sensible default and can be autodetected)
threads=n
the number of CUDA threads per block (default: 256)
operations=n
the number of applications of the algorithm in a single kernel invocation
sleep=n
the number of microseconds to sleep() between kernel calls
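
To illustrate how the options above fit together, here is a minimal Python sketch that parses a list of key=value settings and fills in defaults. It is not the tool's own parser. The function name parse_cuda_options, the list-of-strings input format, and the use of None to mean "leave the choice to the backend" are assumptions made for this example; only the threads default of 256 is taken from the text above.

```python
# Hypothetical helper, not part of the tool: parses "key=value" strings.
KNOWN_OPTIONS = {"implementation", "blocks", "threads", "operations", "sleep"}
INTEGER_OPTIONS = KNOWN_OPTIONS - {"implementation"}


def parse_cuda_options(settings):
    """Turn ["threads=128", ...] into a dict of option values.

    Only the threads default (256) comes from the documentation. None stands
    for "let the backend pick its own default", an assumption of this sketch.
    """
    options = {name: None for name in KNOWN_OPTIONS}
    options["threads"] = 256
    for setting in settings:
        key, sep, value = setting.partition("=")
        if not sep or key not in KNOWN_OPTIONS:
            raise ValueError(f"unknown or malformed option: {setting!r}")
        if key in INTEGER_OPTIONS:
            number = int(value)  # raises ValueError on non-numeric input
            if number < 0:
                raise ValueError(f"{key} must not be negative")
            options[key] = number
        else:
            options[key] = value
    return options


if __name__ == "__main__":
    print(parse_cuda_options(["implementation=bitslice", "operations=4"]))
```

Rejecting unknown keys up front is a design choice of the sketch: a misspelled option such as thread=128 fails loudly instead of silently falling back to a default.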

The following implementations are available:

bitslice
Single round function bitslice code (used during generation with the sort intermediate)
bitslice_multi
Multi round function bitslice code (used during lookup)
bitslice_extra and bitslice_multi_extra
Variants of bitslice and bitslice_multi that support extra clockings
sharedmem
Low-throughput, low-latency version that is used during lookup
sharedmem_extra
Same as sharedmem, with support for extra clockings
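
The list above can be read as a small decision table: one choice for the phase, one for latency, and one for extra clockings. The following Python sketch encodes that reading. The function pick_implementation and its parameters are hypothetical, and the pairing of bitslice_extra with generation is an inference from the names rather than something the text states.

```python
# Hypothetical helper: maps a use case to one of the implementation names
# listed above. It does not exist in the tool itself.
def pick_implementation(phase, extra_clockings=False, low_latency=False):
    """Return an implementation name for "generation" or "lookup"."""
    if phase == "generation":
        if low_latency:
            # sharedmem is only described as a lookup implementation.
            raise ValueError("no low-latency implementation listed for generation")
        name = "bitslice"
    elif phase == "lookup":
        name = "sharedmem" if low_latency else "bitslice_multi"
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    # Every listed implementation has an "_extra" variant for extra clockings.
    return name + "_extra" if extra_clockings else name


if __name__ == "__main__":
    print(pick_implementation("lookup", extra_clockings=True))
```

The returned string is what would be passed as implementation=name.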

Shortcircuit configuration

The shortcircuit device does not modify any data. It can be used to invoke data processing operations after table generation (e.g. sorting or dumping). It has no options.
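
Conceptually, a device of this kind is a pass-through stage: the data goes in and comes out unchanged, so only the follow-up steps have any effect. The Python sketch below shows the idea in isolation. Both function names and the pipeline shape are assumptions made for illustration and say nothing about how the tool is structured internally.

```python
# Conceptual sketch only: an identity "device" followed by post-processing.
def shortcircuit(data):
    """Hand the data back untouched, as a pass-through device would."""
    return data


def run_pipeline(data, device, post_steps):
    """Run the device stage, then each post-processing step in order."""
    result = device(data)
    for step in post_steps:
        result = step(result)
    return result


if __name__ == "__main__":
    # With the pass-through device, only the sort step changes anything.
    print(run_pipeline([3, 1, 2], shortcircuit, [sorted]))
```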