An intermediate combines a work consumer and a work generator. It is used to process chains that are not yet complete. Incomplete chains are first fed into the intermediate, which tests the chain end condition. If the condition is true, the chain is passed to the work consumer. If it is false, the working set management object retrieves the chain from the intermediate (using it as a work generator) and continues processing it.
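The flow described above can be sketched roughly as follows. This is only an illustration, not the program's actual code: the class name `Intermediate`, the methods `put` and `get`, the function `run`, and the representation of a chain as a `(start, value, steps)` tuple are all assumptions made for the example.

```python
from collections import deque


class Intermediate:
    """Toy intermediate: accepts chains, forwards finished ones, hands back the rest."""

    def __init__(self, is_finished, consumer):
        self.is_finished = is_finished  # chain end condition (hypothetical callable)
        self.consumer = consumer        # work consumer: any callable taking a chain
        self.pending = deque()          # incomplete chains waiting to be picked up

    def put(self, chain):
        # Consumer side: test the chain end condition on arrival.
        if self.is_finished(chain):
            self.consumer(chain)
        else:
            self.pending.append(chain)

    def get(self):
        # Generator side: return an incomplete chain, or None if none is waiting.
        return self.pending.popleft() if self.pending else None


def run(chains, step, intermediate, runlength):
    """Stand-in for the working set management: step, feed, retrieve, repeat."""
    work = deque(chains)
    while work:
        chain = work.popleft()
        for _ in range(runlength):
            chain = step(chain)
        intermediate.put(chain)
        unfinished = intermediate.get()
        if unfinished is not None:
            work.append(unfinished)
```

Here `run` plays the role of the working set management object: it only ever sees the intermediate through `put` (consumer side) and `get` (generator side).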
The following intermediates are implemented:
- filter
- sort
- lookup
- stxxl (deprecated)
filter
The filter intermediate checks for the chain end condition and gives complete chains to the work consumer. It has the following options:
- runlength=n
- the number of steps to compute between passing the chains through the intermediate
- unbuffered
- write chains that meet the intermediate condition to the consumer as soon as they arrive. Otherwise the chains are buffered, using an implementation-dependent sensible default for the buffer size.
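The difference between buffered and unbuffered forwarding might look like the sketch below. The class `BufferedForwarder` and its `buffer_size` parameter are hypothetical; the real default buffer size is implementation dependent and is not documented here.

```python
class BufferedForwarder:
    """Forwards chains to a consumer either immediately or in batches."""

    def __init__(self, consumer, unbuffered=False, buffer_size=4):
        self.consumer = consumer        # callable taking a list of chains
        self.unbuffered = unbuffered
        self.buffer_size = buffer_size  # assumed value, not the program's default
        self.buffer = []

    def add(self, chain):
        if self.unbuffered:
            # Unbuffered: hand the chain over as soon as it arrives.
            self.consumer([chain])
            return
        self.buffer.append(chain)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Write out whatever is buffered; a fresh list is started afterwards.
        if self.buffer:
            self.consumer(self.buffer)
            self.buffer = []
```

With `unbuffered=True` the consumer is called once per chain; otherwise it is called once per full buffer, plus a final `flush()` for the remainder.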
sort
Each chain matching the intermediate condition is appended to the corresponding bucket. A bucket is an unsorted collection of chains and is sorted only when it is read back from disk. Each bucket corresponds to a work consumer, which is currently hardcoded as the "file" consumer with appropriate options. If you use N buckets, 3 * N temporary files are used to store the chains.
After all chains have been processed, the program switches to the next round and creates the work generators that read from the temporary files. The work generator is currently hardcoded as the "sort" generator with appropriate options. Only one bucket is sorted at a time: the one that will be used after the currently active bucket, which is kept in RAM while it is read. One thread is launched to sort that bucket concurrently with the rest of the program. The exception is the first bucket, which cannot be sorted asynchronously, so the program blocks until it is sorted.
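The "sort the next bucket while the current one is being read" scheme could be sketched like this. It is a simplified in-memory illustration under the assumption that buckets are plain lists; the function name `read_buckets_in_order` is made up for the example, and the real program reads the buckets from temporary files instead.

```python
import threading


def read_buckets_in_order(buckets, key):
    """Yield sorted buckets one by one, sorting the next one in a background thread."""
    if not buckets:
        return
    # First bucket: there is nothing to overlap with, so sort it synchronously.
    current = sorted(buckets[0], key=key)
    for nxt in range(1, len(buckets) + 1):
        result = {}
        worker = None
        if nxt < len(buckets):
            # Start sorting the following bucket before handing out the current one.
            def sort_next(i=nxt, out=result):
                out["sorted"] = sorted(buckets[i], key=key)

            worker = threading.Thread(target=sort_next)
            worker.start()
        yield current  # the caller reads the active bucket in the meantime
        if worker is not None:
            worker.join()
            current = result["sorted"]
```

At most two buckets are held at once in this sketch: the active one and the one being sorted.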
- prefix=filename
- use the given filename as the prefix for temporary files. _R_B.[([start|end].tbl)|table] will be appended. R is the round from which the written chains originate, e.g. 0 - RoundMax. B is the bucket number, usually between 1 and 1000.
- final
- use exactly 2 rounds, ignoring the rounds parameter of the --condition. The first round creates the buckets and the second round reads from the buckets, outputting to the work consumer.
- ram
- the maximum amount of memory to use while sorting a single bucket. This option is passed to the "sort" generator.
- parts
- the number of buckets to use
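As a rough illustration of the temporary file naming and the 3 * N file count, the following sketch expands the suffix pattern `_R_B.[([start|end].tbl)|table]` into three names per bucket. The exact expansion (`.start.tbl`, `.end.tbl`, `.table`), the numbering of buckets from 1, and the function names are assumptions based on one reading of the pattern, not confirmed behavior.

```python
def bucket_files(prefix, round_no, bucket):
    """Return the three assumed temporary file names for one bucket."""
    base = f"{prefix}_{round_no}_{bucket}"
    return [f"{base}.start.tbl", f"{base}.end.tbl", f"{base}.table"]


def all_temp_files(prefix, round_no, parts):
    """List the temporary files of one round for `parts` buckets (numbered from 1)."""
    return [
        name
        for bucket in range(1, parts + 1)
        for name in bucket_files(prefix, round_no, bucket)
    ]
```

Under these assumptions, `all_temp_files("tmp", 0, 4)` lists 12 names, which matches the 3 * N rule for N = 4 buckets.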
lookup
The lookup intermediate is fed the chains that meet the intermediate condition as usual. If a chain has not yet reached its end, it is reinserted into the chain computation to be completed. The end value of each completed chain is looked up in the table given as an argument. If a match is found, the start value extracted from the table file is reinserted into the chain computation starting at round 0 to check for a false positive.
- prefix=filename
- the name of the table to use
- startbits=integer
- start bits of the table
- endbits=integer
- end bits of the table
- truncstart=lsb|msb
- where to truncate the start value
- truncend=lsb|msb
- where to truncate the end value
- indexbits=integer
- number of bits in the index
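The decision the lookup intermediate makes for each chain can be sketched as below. The table is modeled as a plain dictionary from end value to start value, and the chain as a `(start, value, round)` tuple; both are simplifications for illustration, as are the function name `handle_chain` and its return strings. Truncation of start and end values (startbits, endbits, truncstart, truncend) is deliberately left out.

```python
def handle_chain(chain, is_finished, table, reinsert):
    """Process one chain; returns 'reinserted', 'candidate' or 'miss'."""
    start, value, round_no = chain
    if not is_finished(chain):
        # Not at its end yet: give it back to the chain computation.
        reinsert(chain)
        return "reinserted"
    stored_start = table.get(value)  # look up the end value in the table
    if stored_start is None:
        return "miss"
    # Match found: recompute from the stored start value, beginning at round 0,
    # so that a false positive can be detected.
    reinsert((stored_start, stored_start, 0))
    return "candidate"
```

A "candidate" result only means the end value matched; whether it is a real hit or a false positive is decided by the recomputation that follows.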
stxxl (currently deprecated)
The stxxl intermediate also checks the chain end condition, but in addition sorts the chains by end value so that collisions can be detected early. Because it is meant to compute some steps of all chains, then sort and filter, then compute the next steps, and so on, it supports writing checkpoint files of the incomplete chains. Checkpoints are not needed when computing one set of chains to the end, then the next set, and so on, because completed chains are stored on disk by the work consumer. With the vertical approach, however, all chains would be lost on a fatal error without checkpoints.
It takes the following options:
- runlength=n
- the number of steps to compute between passing the chain through the intermediate
- checkpoint=n
- the number of steps to compute between writing checkpoint files. This value must be a multiple of the runlength.
- prefix=name
- the file prefix to use for the checkpoint files
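The relationship between runlength and checkpoint can be illustrated with a small schedule calculation. The function `checkpoint_passes` and its `total_steps` parameter are hypothetical and exist only for this example; the sketch simply shows why the checkpoint interval has to be a multiple of the runlength, namely so that every checkpoint coincides with a pass through the intermediate.

```python
def checkpoint_passes(runlength, checkpoint, total_steps):
    """Return (steps_done, write_checkpoint) for every pass through the intermediate."""
    if runlength <= 0 or checkpoint <= 0:
        raise ValueError("runlength and checkpoint must be positive")
    if checkpoint % runlength != 0:
        raise ValueError("checkpoint must be a multiple of runlength")
    schedule = []
    # Chains pass through the intermediate every `runlength` steps.
    for steps_done in range(runlength, total_steps + 1, runlength):
        schedule.append((steps_done, steps_done % checkpoint == 0))
    return schedule
```

For example, with `runlength=10` and `checkpoint=30`, every third pass through the intermediate would also write a checkpoint.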
