Getting Started

The following example defines a pipeline made of two steps and runs it:

import time
from devpipe import step, pipeline

@step
def add_one(x):
    time.sleep(1)
    return x + 1

@step
def times_two(x):
    time.sleep(1)
    return x * 2

@pipeline
def example_pipeline(x):
    y = add_one(x)
    z = times_two(y)
    return z

result = example_pipeline(3)  # result is 8, i.e. (3 + 1) * 2

The first execution of the pipeline takes about 2 seconds, because each of the two steps sleeps for 1 second. Subsequent executions with the same input return the cached result without running the pipeline or its steps again.

Caching

By default, the results of pipelines and steps are stored on disk, and executions are tracked in a database. This allows fast re-execution of long-running pipelines and steps that are called with the same inputs.
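
To make the idea concrete, here is a minimal sketch of disk caching keyed by a function's name and inputs. It is not devpipe's implementation: the disk_cached decorator, the CACHE_DIR location, and the pickle-based key are all assumptions chosen for illustration, and the sketch leaves out the database that tracks executions.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location for this sketch only.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="cache_sketch_"))

def disk_cached(func):
    """Store each result on disk, keyed by the function name and its inputs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Hash the function name together with the pickled arguments.
        payload = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
        path = CACHE_DIR / (hashlib.sha256(payload).hexdigest() + ".pkl")
        if path.exists():
            # Same inputs seen before: load the stored result, skip the call.
            return pickle.loads(path.read_bytes())
        result = func(*args, **kwargs)
        path.write_bytes(pickle.dumps(result))
        return result
    return wrapper

calls = []  # records every real execution of the function body

@disk_cached
def slow_square(x):
    calls.append(x)
    return x * x

first = slow_square(4)   # runs the body and writes the result to disk
second = slow_square(4)  # same input: read back from disk, body not run
```

After both calls, calls contains a single entry, which shows that the second call was served from disk. A different input produces a different key and triggers a fresh execution.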

You can disable caching for both pipelines and steps by setting the cache parameter to False:

@step(cache=False)
def example_step(x):
    ...

@pipeline(cache=False)
def example_pipeline(x):
    ...

Re-run

You can force the execution of a pipeline or step by setting the rerun parameter to True:

@step(rerun=True)
def example_step(x):
    ...

@pipeline(rerun=True)
def example_pipeline(x):
    ...

This ignores any cached result, runs the pipeline or step again, and stores the new result in the cache.
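
The difference between a normal cached call and a forced re-run can be sketched with a toy in-memory cache. The cached decorator and its store attribute below are hypothetical stand-ins written for this illustration, not devpipe's API; only the meaning of the rerun flag mirrors the description above.

```python
import functools

def cached(rerun=False):
    """Toy in-memory cache; rerun=True skips the lookup but still stores."""
    def decorator(func):
        store = {}

        @functools.wraps(func)
        def wrapper(*args):
            if not rerun and args in store:
                return store[args]  # cache hit: the body is not executed
            result = func(*args)
            store[args] = result    # overwrite any previous entry
            return result

        wrapper.store = store       # exposed so the sketch can inspect it
        return wrapper
    return decorator

runs = {"normal": 0, "forced": 0}

@cached()
def normal(x):
    runs["normal"] += 1
    return x + 1

@cached(rerun=True)
def forced(x):
    runs["forced"] += 1
    return x + 1

normal(1)
normal(1)  # served from the cache
forced(1)
forced(1)  # executed again despite the stored result
```

Here normal runs once and forced runs twice, yet both leave an up-to-date entry in their cache.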

Naming

Internally, pipelines are identified by their name. By default, the name of the pipeline is the name of the function. If you have multiple pipelines with the same function name, you can specify a custom name using the name parameter:

@pipeline(name='my_pipeline')
def example_pipeline(x):
    ...

Similarly, if a pipeline has multiple steps with the same function name, you can specify a custom name for the step using the name parameter:

@step(name='my_step')
def example_step(x):
    ...

Steps are linked to the pipeline they belong to, so the name of the step only needs to be unique within the pipeline.
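
One way to picture this scoping is a cache key that includes the pipeline name alongside the step name. The cache_key helper and the example names below are hypothetical; the sketch only illustrates why two steps with the same name do not collide when they belong to different pipelines, and says nothing about how devpipe builds its keys internally.

```python
def cache_key(pipeline_name, step_name, args):
    """Hypothetical key: the step name is scoped by the pipeline name."""
    return (pipeline_name, step_name, args)

cache = {}

# Two pipelines each have a step called "load_data": the keys differ.
cache[cache_key("training", "load_data", (1,))] = "training rows"
cache[cache_key("reporting", "load_data", (1,))] = "reporting rows"

# Two steps sharing a name inside ONE pipeline map to the same key,
# so the second write replaces the first. A custom step name avoids this.
cache[cache_key("training", "clean", (1,))] = "first clean step"
cache[cache_key("training", "clean", (1,))] = "second clean step"
```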