Pipelines#

As of version 4.4.0, Janus supports configuring and running pipelines. A pipeline is a series of steps that run in sequence. Pipelines are configured in JSON files and run with the janus pipeline command.

Configuring a Pipeline#

Every pipeline is configured with a JSON file that contains a list of JSON objects. Each object represents one step in the pipeline and names a component, such as a Converter or Refiner, through its type field. The available components are listed below in the Available Components section.
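As a rough illustration of this file shape, the following Python sketch parses a pipeline definition and returns the type of each top-level step. The helper name check_pipeline_shape is hypothetical and not part of Janus. It only checks the structure described on this page (a non-empty list of objects with a type key), not whether Janus would accept the file.

```python
import json


def check_pipeline_shape(text):
    """Parse pipeline JSON and return the component type of each top-level step.

    Hypothetical helper for illustration only; it is not a Janus API.
    """
    steps = json.loads(text)
    # A pipeline is a list of step objects and needs at least one component.
    if not isinstance(steps, list) or not steps:
        raise ValueError("a pipeline must be a non-empty JSON list")
    types = []
    for index, step in enumerate(steps):
        if not isinstance(step, dict) or "type" not in step:
            raise ValueError(f"step {index} must be an object with a 'type' key")
        types.append(step["type"])
    return types


example = (
    '[{"type": "Translator", '
    '"kwargs": {"source_language": "python", "target_language": "javascript"}}]'
)
print(check_pipeline_shape(example))  # prints ['Translator']
```

A check like this can catch malformed files before a pipeline run is started, but it says nothing about whether the named components or their keyword arguments are valid.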

Available Components#

Converters#

  • Aggregator: Aggregates multiple products into a single output product.

  • Partitioner: Partitions source code in different ways (with an LLM, etc.).

  • Translator: Translates source code from one programming language to another.

Evaluators#

Documenters#

Example Single Stage Pipeline#

Every pipeline must have at least one component. The following is an example of a pipeline that uses the Translator component to translate from Python to JavaScript. Keyword arguments for the component are passed through the kwargs field.

translate.json:

[
    {
        "type": "Translator",
        "kwargs": {"source_language": "python", "target_language": "javascript"}
    }
]

This can then be run with the following command:

janus pipeline --input janus/cli/ --output janus-translation --pipeline pipelines/translate.json --llm my-gpt -l python

This is equivalent to running the following command:

janus translate --input janus/cli/ --output janus-translation --llm my-gpt --source-language python --target-language javascript
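If pipeline files are generated by a script rather than written by hand, the single-stage example could be produced along these lines. This is a minimal sketch using only the Python standard library. The write_pipeline helper and the temporary output directory are assumptions made for illustration and are not part of Janus.

```python
import json
import tempfile
from pathlib import Path

# The same single-stage pipeline as translate.json, expressed as Python data.
pipeline = [
    {
        "type": "Translator",
        "kwargs": {"source_language": "python", "target_language": "javascript"},
    }
]


def write_pipeline(steps, path):
    """Serialize a list of step dicts to a JSON file and return its path.

    Hypothetical helper for illustration only; it is not a Janus API.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(steps, indent=4) + "\n", encoding="utf-8")
    return path


# A temporary directory stands in for the project's pipelines/ folder.
out_dir = Path(tempfile.mkdtemp())
written = write_pipeline(pipeline, out_dir / "pipelines" / "translate.json")
print(written.read_text(encoding="utf-8"))
```

The resulting file can then be passed to the --pipeline option in the same way as a hand-written one.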

Example Multi Stage Pipeline#

comment_eval.json:

[
    {
        "type": "ClozeDocumenter",
        "kwargs": {"comments_per_request": 5}
    },
    {
        "type": "InlineCommentEvaluator",
        "kwargs": {}
    }
]

This can then be run with the following command:

janus pipeline --input janus/cli/ --output janus-translation --pipeline pipelines/comment_eval.json --llm my-gpt -l python

This is equivalent to running the following commands:

janus document --doc-mode cloze --input janus/cli/ --output janus-translation --llm my-gpt -l python
janus llm-self-eval --input janus-translation --output janus-evals --llm my-gpt -l python -e comments

Using ConverterPools#

A ConverterPool runs multiple Converters in parallel.

Basic ConverterPool#

The following is an example of a pipeline that uses a ConverterPool to run multiple Documenters in parallel.

[
    {
        "type": "ConverterPool",
        "args": [
            {
                "type": "Documenter",
                "kwargs": {}
            },
            {
                "type": "ClozeDocumenter",
                "kwargs": {"comments_per_request": 5}
            }
        ]
    }
]

This runs the Documenter and ClozeDocumenter in parallel, producing two outputs in the output JSON.

ConverterPool with Evaluation#

The following example runs two ClozeDocumenters in parallel and then runs an InlineCommentEvaluator on the output of the ClozeDocumenters.

[
    {
        "type": "ConverterPool",
        "args": [
            {
                "type": "ClozeDocumenter",
                "kwargs": {}
            },
            {
                "type": "ClozeDocumenter",
                "kwargs": {"comments_per_request": 5}
            }
        ]
    },
    {
        "type": "InlineCommentEvaluator",
        "kwargs": {"eval_items_per_request": 5}
    }
]
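To make the stage structure of such a file easier to see, the sketch below flattens a pipeline definition into a list of stages, where each stage lists the components that run side by side. The stage_layout helper is hypothetical and only reads the JSON structure shown on this page. It does not reflect how Janus schedules work internally.

```python
import json


def stage_layout(steps):
    """Return one list of component types per pipeline stage.

    A ConverterPool stage expands to the types listed in its "args";
    any other step forms a stage of its own. Illustration only, not a Janus API.
    """
    layout = []
    for step in steps:
        if step["type"] == "ConverterPool":
            layout.append([member["type"] for member in step.get("args", [])])
        else:
            layout.append([step["type"]])
    return layout


config = json.loads("""
[
    {
        "type": "ConverterPool",
        "args": [
            {"type": "ClozeDocumenter", "kwargs": {}},
            {"type": "ClozeDocumenter", "kwargs": {"comments_per_request": 5}}
        ]
    },
    {
        "type": "InlineCommentEvaluator",
        "kwargs": {"eval_items_per_request": 5}
    }
]
""")

print(stage_layout(config))
# prints [['ClozeDocumenter', 'ClozeDocumenter'], ['InlineCommentEvaluator']]
```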

ConverterPool with ConverterPassthrough#

The ConverterPassthrough component passes the output of one Converter to the next stage of the pipeline without modification.

[
    {
        "type": "ClozeDocumenter",
        "kwargs": {"comments_per_request": 5}
    },
    {
        "type": "ConverterPool",
        "args": [
            {
                "type": "InlineCommentEvaluator"
            },
            {
                "type": "ConverterPassthrough"
            }
        ]
    }
]

In this example, the output of the ClozeDocumenter is passed to the InlineCommentEvaluator and the ConverterPassthrough. This produces two outputs in the output JSON: the output of the InlineCommentEvaluator and the output of the ClozeDocumenter.

ConverterPool with Input and Output Labels#

Every Converter accepts input and output labels, set through the input_labels and output_label keyword arguments as in the following example.

[
    {
        "type": "ConverterPool",
        "args": [
            {
                "type": "ClozeDocumenter",
                "kwargs": {"output_label": "dtest"}
            },
            {
                "type": "ClozeDocumenter",
                "kwargs": {}
            },
            {
                "type": "ClozeDocumenter",
                "kwargs": {"output_label": "dtest"}
            }
        ]
    },
    {
        "type": "ConverterPool",
        "args": [
            {
                "type": "InlineCommentEvaluator",
                "kwargs": {"input_labels": "dtest"}
            }
        ]
    }
]

The labels specify which outputs are passed to which inputs. In this example, the outputs of the first and third ClozeDocumenters are passed to the InlineCommentEvaluator, and the output of the second ClozeDocumenter is kept in the intermediate outputs of the resulting JSON file.
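The routing described here can be pictured with a small, self-contained model. The sketch below is not Janus code: route_by_label and the producer names cloze_1 to cloze_3 are hypothetical, and representing a missing label as None is an assumption, since the label Janus assigns to an unlabeled Converter is not stated on this page.

```python
def route_by_label(producers, consumer_labels):
    """Split producer outputs into those a consumer receives and those left over.

    producers is a list of (name, output_label) pairs.
    Illustrative model only; it does not reproduce Janus internals.
    """
    wanted = set(consumer_labels)
    received = [name for name, label in producers if label in wanted]
    leftover = [name for name, label in producers if label not in wanted]
    return received, leftover


# Three ClozeDocumenters from the example: the first and third are labeled
# "dtest"; the second has no explicit label (modeled here as None).
producers = [("cloze_1", "dtest"), ("cloze_2", None), ("cloze_3", "dtest")]

received, leftover = route_by_label(producers, ["dtest"])
print(received)  # prints ['cloze_1', 'cloze_3']
print(leftover)  # prints ['cloze_2']
```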

ConverterPool with Input Types#

Every Converter has an associated output_type that tells other Converters what type of output it produces. The input_types argument specifies the types of input that a Converter accepts.

[
    {
        "type": "ConverterPool",
        "args": [
            {
                "type": "Documenter",
                "kwargs": {}
            },
            {
                "type": "ClozeDocumenter",
                "kwargs": {"comments_per_request": 5}
            }
        ]
    },
    {
        "type": "ConverterPool",
        "args": [
            {
                "type": "Translator",
                "kwargs": {"input_types": "documentation"}
            },
            {
                "type": "InlineCommentEvaluator"
            }
        ]
    }
]

This example runs a Documenter and a ClozeDocumenter in parallel, producing two outputs in the output JSON. The outputs are then passed to a ConverterPool that runs a Translator and an InlineCommentEvaluator in parallel. The Translator is configured to accept only documentation as input, so it accepts the output of the Documenter but not that of the ClozeDocumenter.

Available Output Types#

The available output types for each converter are listed here:

  • DiagramGenerator: diagram

  • Documenter: documentation

  • MultiDocumenter: multidocumentation

  • ClozeDocumenter: cloze_comments

  • RequirementsDocumenter: requirements

  • Partitioner: partition

  • RequirementEvaluator: requirements_eval

  • InlineCommentEvaluator: cloze_comments_eval
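The list above can be combined with the input_types example to work out which upstream outputs a Converter would accept. The following sketch copies the output types into a dictionary and filters producers by type. The OUTPUT_TYPES name and the accepted_producers helper are hypothetical, and treating input_types as either a single string or a list of strings is an assumption based on the example configuration.

```python
# Output types copied from the list above.
OUTPUT_TYPES = {
    "DiagramGenerator": "diagram",
    "Documenter": "documentation",
    "MultiDocumenter": "multidocumentation",
    "ClozeDocumenter": "cloze_comments",
    "RequirementsDocumenter": "requirements",
    "Partitioner": "partition",
    "RequirementEvaluator": "requirements_eval",
    "InlineCommentEvaluator": "cloze_comments_eval",
}


def accepted_producers(producers, input_types):
    """Return the producers whose output type matches one of input_types.

    Illustrative model only; it does not reproduce Janus internals.
    """
    # Accept a single type name or a collection of type names (assumption).
    allowed = {input_types} if isinstance(input_types, str) else set(input_types)
    return [name for name in producers if OUTPUT_TYPES.get(name) in allowed]


# A Translator restricted to "documentation" only takes the Documenter output.
print(accepted_producers(["Documenter", "ClozeDocumenter"], "documentation"))
# prints ['Documenter']
```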