ruffus.Task
Decorators
Basic Task decorators are:
Task decorators include:
More advanced users may require:
Pipeline functions
pipeline_run
- ruffus.task.pipeline_run(target_tasks, forcedtorun_tasks=[], multiprocess=1, logger=stderr_logger, gnu_make_maximal_rebuild_mode=True)[source]
Run pipelines.
- Parameters:
target_tasks – target task functions, which will be run if they are out-of-date
forcedtorun_tasks – task functions which will be run whether or not they are out-of-date
multiprocess – The number of concurrent jobs running on different processes.
multithread – The number of concurrent jobs running as different threads. If > 1, ruffus will use multithreading instead of multiprocessing (and ignore the multiprocess parameter). Multithreading is particularly useful for managing high performance clusters, which are otherwise prone to “processor storms” when a large number of cores finish jobs at the same time.
logger (logging objects) – Where progress will be logged. Defaults to stderr output.
verbose –
level 0 : nothing
level 1 : All Task names
level 2 : All Task names, including any task function docstrings
level 3 : Out-of-date Jobs in Out-of-date Tasks, no explanation
level 4 : Out-of-date Jobs in Out-of-date Tasks, with explanations and warnings
level 5 : All Jobs in Out-of-date Tasks, (include only list of up-to-date tasks)
level 6 : All jobs in All Tasks whether out of date or not
level 7 : Show file modification times for All jobs in All Tasks
level 10: logs messages useful only for debugging ruffus pipeline code
touch_files_only – Create or update input/output files only to simulate running the pipeline. Do not run jobs. If set to CHECKSUM_REGENERATE, will regenerate the checksum history file to reflect the existing i/o files on disk.
exceptions_terminate_immediately – Exceptions cause immediate termination rather than waiting for N jobs to finish where N = multiprocess
log_exceptions – Print exceptions to logger as soon as they occur.
checksum_level –
Several options are available for checking whether jobs are up to date. Default is level 1.
level 0 : Use only file timestamps
level 1 : above, plus timestamp of successful job completion
level 2 : above, plus a checksum of the pipeline function body
level 3 : above, plus a checksum of the pipeline function default arguments and the additional arguments passed in by task decorators
one_second_per_job – Works around poor file timestamp resolution on some file systems. Defaults to True if checksum_level is 0, forcing Tasks to take a minimum of 1 second to complete.
runtime_data – Experimental feature: pass data to tasks at run time
gnu_make_maximal_rebuild_mode – Defaults to True, which re-runs all out-of-date tasks. Runs the minimal set needed to build targets if set to False. Use with caution.
history_file – Database file storing checksums and file timestamps for input/output files.
verbose_abbreviated_path –
Whether input and output paths are abbreviated.
level 0: The full (expanded, abspath) input or output path
level > 1: The number of subdirectories to include. Abbreviated paths are prefixed with [,,,]/
level < 0: Input / Output parameters are truncated to MMM letters where verbose_abbreviated_path == -MMM. Subdirectories are first removed to see if this allows the paths to fit in the specified limit. Otherwise abbreviated paths are prefixed by <???>
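The three verbose_abbreviated_path modes can be illustrated with a small stand-alone sketch. This is not ruffus's own code: the helper name abbreviate_path is hypothetical, POSIX-style paths are assumed, any positive level is treated as a subdirectory count, and the exact truncation rule for negative levels is an assumption based on the description above.

```python
import os


def abbreviate_path(path, level):
    """Illustrative reading of the verbose_abbreviated_path levels (hypothetical helper)."""
    full = os.path.abspath(path)
    if level == 0:
        # Level 0: the full, expanded absolute path.
        return full

    parts = full.split(os.sep)
    if level > 0:
        # Positive level: keep the file name plus `level` parent directories.
        kept = parts[-(level + 1):]
        if len(kept) == len(parts):
            return full  # nothing was dropped, so no prefix is needed
        return "[,,,]/" + "/".join(kept)

    # Negative level: truncate to at most MMM characters, where level == -MMM.
    limit = -level
    if len(full) <= limit:
        return full
    name = os.path.basename(full)
    if len(name) <= limit:
        # Removing the subdirectories was enough to fit in the limit.
        return name
    # Assumption: keep the last MMM characters and flag the truncation.
    return "<???>" + full[-limit:]


example = "/a/b/c/d.txt"
one_dir = abbreviate_path(example, 1)
name_only = abbreviate_path(example, -5)
truncated = abbreviate_path(example, -3)
```

Here one_dir keeps a single parent directory, name_only fits once the directories are removed, and truncated falls back to the flagged form.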
pipeline_printout
- ruffus.task.pipeline_printout(output_stream=None, target_tasks=[], forcedtorun_tasks=[], verbose=None, indent=4, gnu_make_maximal_rebuild_mode=True, wrap_width=100, runtime_data=None, checksum_level=None, history_file=None, verbose_abbreviated_path=None, pipeline=None)[source]
Prints out the parts of the pipeline which will be run.
Because the parameters of some jobs depend on the results of previous tasks, this function produces only the current snapshot of task jobs. In particular, tasks which generate a variable number of inputs into following tasks will not produce the full range of jobs.
verbose = 0 : Nothing
verbose = 1 : All Task names
verbose = 2 : All Tasks (including any task function docstrings)
verbose = 3 : Out-of-date Jobs in Out-of-date Tasks, no explanation
verbose = 4 : Out-of-date Jobs in Out-of-date Tasks, with explanations and warnings
verbose = 5 : All Jobs in Out-of-date Tasks, (include only list of up-to-date tasks)
verbose = 6 : All jobs in All Tasks whether out of date or not
- Parameters:
output_stream (file-like object with write() function) – where to print to
target_tasks – target task functions, which will be run if they are out-of-date
forcedtorun_tasks – task functions which will be run whether or not they are out-of-date
verbose –
level 0 : nothing
level 1 : Out-of-date Task names
level 2 : All Tasks (including any task function docstrings)
level 3 : Out-of-date Jobs in Out-of-date Tasks, no explanation
level 4 : Out-of-date Jobs in Out-of-date Tasks, with explanations and warnings
level 5 : All Jobs in Out-of-date Tasks, (include only list of up-to-date tasks)
level 6 : All jobs in All Tasks whether out of date or not
level 7 : Show file modification times for All jobs in All Tasks
level 10: logs messages useful only for debugging ruffus pipeline code
indent – How much indentation for pretty format.
gnu_make_maximal_rebuild_mode – Defaults to True, which re-runs all out-of-date tasks. Runs the minimal set needed to build targets if set to False. Use with caution.
wrap_width – The maximum length of each line
runtime_data – Experimental feature: pass data to tasks at run time
checksum_level –
Several options for checking up-to-dateness are available: Default is level 1.
level 0 : Use only file timestamps
level 1 : above, plus timestamp of successful job completion
level 2 : above, plus a checksum of the pipeline function body
level 3 : above, plus a checksum of the pipeline function default arguments and the additional arguments passed in by task decorators
history_file – Database file storing checksums and file timestamps for input/output files.
verbose_abbreviated_path –
Whether input and output paths are abbreviated.
level 0: The full (expanded, abspath) input or output path
level > 1: The number of subdirectories to include. Abbreviated paths are prefixed with [,,,]/
level < 0: Input / Output parameters are truncated to MMM letters where verbose_abbreviated_path == -MMM. Subdirectories are first removed to see if this allows the paths to fit in the specified limit. Otherwise abbreviated paths are prefixed by <???>
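The timestamp-only rule of checksum_level 0 can be sketched with the standard library alone. This is an illustration of the idea, not ruffus's implementation: the function name needs_update is hypothetical, and treating a job with no outputs as out of date is an assumption.

```python
import os
import tempfile


def needs_update(inputs, outputs):
    """Return True if any output is missing or older than the newest input."""
    # Assumption: a job with no outputs to check is treated as out of date.
    if not outputs or any(not os.path.exists(p) for p in outputs):
        return True
    if not inputs:
        return False  # outputs exist and there is nothing to compare against
    newest_input = max(os.path.getmtime(p) for p in inputs)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return oldest_output < newest_input


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.input")
    dst = os.path.join(tmp, "a.output")
    for path in (src, dst):
        with open(path, "w") as handle:
            handle.write("data\n")

    # Set modification times explicitly so the result does not depend on
    # the timestamp resolution of the file system.
    os.utime(src, (200, 200))
    os.utime(dst, (100, 100))
    stale = needs_update([src], [dst])  # output is older than its input

    os.utime(dst, (300, 300))
    fresh = needs_update([src], [dst])  # output is newer than its input
```

Comparing raw modification times is exactly what breaks down on file systems with coarse timestamps, which is the situation the one_second_per_job option is described as working around.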
pipeline_printout_graph
- ruffus.task.pipeline_printout_graph(stream, output_format=None, target_tasks=[], forcedtorun_tasks=[], draw_vertically=True, ignore_upstream_of_target=False, skip_uptodate_tasks=False, gnu_make_maximal_rebuild_mode=True, test_all_task_for_update=True, no_key_legend=False, minimal_key_legend=True, user_colour_scheme=None, pipeline_name='Pipeline:', size=(11, 8), dpi=120, runtime_data=None, checksum_level=None, history_file=None, pipeline=None)[source]
Prints out pipeline dependencies in various formats.
- Parameters:
stream (file-like object with write() function) – where to print to
output_format – [“dot”, “jpg”, “svg”, “ps”, “png”]. All but the first depend on the dot program.
target_tasks – target task functions, which will be run if they are out-of-date.
forcedtorun_tasks – task functions which will be run whether or not they are out-of-date.
draw_vertically – Top to bottom instead of left to right.
ignore_upstream_of_target – Don’t draw upstream tasks of targets.
skip_uptodate_tasks – Don’t draw up-to-date tasks if possible.
gnu_make_maximal_rebuild_mode – Defaults to True, which re-runs all out-of-date tasks. Runs the minimal set needed to build targets if set to False. Use with caution.
test_all_task_for_update – Ask all task functions if they are up-to-date.
no_key_legend – Don’t draw key/legend for graph.
minimal_key_legend – Only draw legend entries for task types that are used
user_colour_scheme – Dictionary specifying flowchart colour scheme
pipeline_name – Pipeline Title
size – tuple of x and y dimensions
dpi – print resolution
runtime_data – Experimental feature: pass data to tasks at run time
history_file – Database file storing checksums and file timestamps for input/output files.
checksum_level –
Several options for checking up-to-dateness are available: Default is level 1.
level 0 : Use only file timestamps
level 1 : above, plus timestamp of successful job completion
level 2 : above, plus a checksum of the pipeline function body
level 3 : above, plus a checksum of the pipeline function default arguments and the additional arguments passed in by task decorators
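Checksum levels 2 and 3 add the task function itself to the up-to-date test. The sketch below shows one way such checksums could be computed; it is an assumption for illustration only, not how ruffus stores its history file, and the names body_checksum and args_checksum are hypothetical.

```python
import hashlib


def body_checksum(func):
    """Checksum of a function body (the level 2 idea): bytecode plus constants."""
    code = func.__code__
    # Constants are hashed too, because a literal can live outside the bytecode.
    # Caveat: nested functions or lambdas appear in co_consts with memory
    # addresses in their repr, so this sketch only suits simple functions.
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def args_checksum(func, extra_args=()):
    """Checksum of default arguments and decorator extras (the level 3 idea)."""
    payload = repr((func.__defaults__, tuple(extra_args))).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def add_one(x):
    return x + 1


def add_two(x):
    return x + 2


def scale_a(x, factor=2):
    return x * factor


def scale_b(x, factor=3):
    return x * factor


body_changed = body_checksum(add_one) != body_checksum(add_two)
same_body = body_checksum(scale_a) == body_checksum(scale_b)
defaults_changed = args_checksum(scale_a) != args_checksum(scale_b)
```

In this sketch an edit to the function body is caught by the body checksum, while a change that touches only a default value is caught by the argument checksum alone.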
Logging
Implementation:
Parameter factories:
- ruffus.task.merge_param_factory(input_files_task_globs, output_param, *extra_params)[source]
Factory for task_merge
- ruffus.task.collate_param_factory(input_files_task_globs, file_names_transform, extra_input_files_task_globs, replace_inputs, output_pattern, *extra_specs)[source]
Factory for task_collate
Looks exactly like @transform, except that all [input] which lead to the same [output / extra] are combined together.
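The grouping behaviour described above can be sketched without ruffus: each input name is mapped to an output name by a regular expression substitution, and inputs that map to the same output are collected together. The function name collate_inputs, the pattern and the file names are all hypothetical.

```python
import re
from collections import defaultdict


def collate_inputs(inputs, pattern, output_template):
    """Group input names by the output name each one maps to."""
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for name in inputs:
        if regex.search(name) is None:
            continue  # inputs that do not match the pattern produce no job
        output = regex.sub(output_template, name)
        groups[output].append(name)
    return dict(groups)


jobs = collate_inputs(
    ["cat_1.txt", "cat_2.txt", "dog_1.txt", "notes.md"],
    r"^(.+)_\d+\.txt$",
    r"\1.summary",
)
```

Each key of jobs would correspond to one job whose inputs are the grouped file names, in contrast to a one-to-one transform, which would produce one job per input.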
- ruffus.task.transform_param_factory(input_files_task_globs, file_names_transform, extra_input_files_task_globs, replace_inputs, output_pattern, *extra_specs)[source]
Factory for task_transform
- ruffus.task.files_param_factory(input_files_task_globs, do_not_expand_single_job_tasks, output_extras)[source]
Factory for functions which yield tuples of inputs, outputs / extras
Note:
1. Each job requires input/output file names
2. Input/output file names can be a string or an arbitrarily nested sequence
3. Non-string types are ignored
4. Either the input or output file name must contain at least one string
- ruffus.task.args_param_factory(orig_args)[source]
Factory for functions which yield tuples of inputs, outputs / extras
Note:
1. Each job requires input/output file names
2. Input/output file names can be a string or an arbitrarily nested sequence
3. Non-string types are ignored
4. Either the input or output file name must contain at least one string
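The rules in the note can be illustrated with a short sketch that walks an arbitrarily nested parameter and keeps only the strings. The helper names iter_file_names and has_file_names are hypothetical, and limiting the walk to lists and tuples is an assumption made to keep the order deterministic.

```python
def iter_file_names(params):
    """Yield every string found in a string or an arbitrarily nested sequence."""
    if isinstance(params, str):
        yield params
    elif isinstance(params, (list, tuple)):
        for item in params:
            yield from iter_file_names(item)
    # Any other type (numbers, None, ...) is ignored.


def has_file_names(inputs, outputs):
    """Return True if the inputs or the outputs contain at least one string."""
    return any(True for _ in iter_file_names(inputs)) or any(
        True for _ in iter_file_names(outputs)
    )


names = list(iter_file_names(["a.txt", ("b.txt", 3, [None, "c.txt"])]))
valid_job = has_file_names(["a.txt", 42], [])
invalid_job = has_file_names(1, [2, (3,)])
```

A job whose parameters contain no string at all, like the last call, would fail the final rule in the note.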
Wrappers around jobs:
- ruffus.task.job_wrapper_generic(params, user_defined_work_func, register_cleanup, touch_files_only)[source]
run func