janus.converter.document

Attributes

log

Classes

`Documenter`	Parent class that converts code into something else.
`MultiDocumenter`	Parent class that converts code into something else.
`ClozeDocumenter`	Parent class that converts code into something else.

Module Contents

janus.converter.document.log

class janus.converter.document.Documenter(source_language='fortran', drop_comments=True, output_type='documentation', **kwargs)

Bases: janus.converter.converter.Converter

Parent class that converts code into something else.

Children determine what the code gets converted into: a translation into another language, pseudocode, requirements, documentation, etc., or embeddings.

Initialize a Converter instance.

Parameters:

source_language (str) – The source programming language.
parser_type – The type of parser to use for parsing the LLM output. Valid values are “code”, “text”, “eval”, and None (default). If None, the Converter assumes you won’t be parsing an output (e.g., when adding to an embedding DB).
max_prompts – The maximum number of prompts to try before giving up.
max_tokens – The maximum number of tokens to use in the LLM. If None, the converter will use half the model’s token limit.
prompt_templates – The name of the prompt templates to use.
db_path – The path to the database to use for vectorization.
db_config – The configuration for the database.
protected_node_types – A set of node types that aren’t to be merged.
prune_node_types – A set of node types which should be pruned.
splitter_type – The type of splitter to use. Valid values are “file”, “tag”, “chunk”, “ast-strict”, and “ast-flex”.
refiner_type – The type of refiner to use. Valid values are “parser”, “reflection”, and None.
retriever_type – The type of retriever to use. Valid values are “active_usings”, “language_docs”, and None.
combine_output – Whether to combine the output into a single file or not.
use_janus_inputs – Whether to use janus inputs or not.
target_language – The target programming language.
target_version – The target programming language version.
input_types – The types of input to accept.
input_labels – The labels of input to accept.
output_type (str) – The type of output to produce.
output_label – The label of output to produce.
drop_comments (bool) –
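
Several of the parameters above accept only a fixed set of values. As a minimal sketch, the snippet below checks a keyword dictionary against the values listed in this section before it is handed to a converter. The helper `check_converter_kwargs` and the `VALID_VALUES` table are hypothetical and not part of janus; only the value lists and the `Documenter` defaults come from this page.

```python
# Valid values copied from the parameter list above.
# The table and the helper are illustrative assumptions, not janus APIs.
VALID_VALUES = {
    "parser_type": {"code", "text", "eval", None},
    "splitter_type": {"file", "tag", "chunk", "ast-strict", "ast-flex"},
    "refiner_type": {"parser", "reflection", None},
    "retriever_type": {"active_usings", "language_docs", None},
}


def check_converter_kwargs(kwargs):
    """Return one error string per value that the docs do not list as valid."""
    errors = []
    for name, allowed in VALID_VALUES.items():
        # Parameters that are absent fall back to their defaults, so skip them.
        if name in kwargs and kwargs[name] not in allowed:
            errors.append(f"{name}={kwargs[name]!r} is not a documented value")
    return errors


# Keyword arguments mirroring the Documenter signature and its defaults.
documenter_kwargs = {
    "source_language": "fortran",
    "drop_comments": True,
    "output_type": "documentation",
    "splitter_type": "ast-flex",
    "refiner_type": "reflection",
}

problems = check_converter_kwargs(documenter_kwargs)
print(problems or "all constrained parameters use documented values")
```

If the check passes, the dictionary could presumably be unpacked as `Documenter(**documenter_kwargs)`, assuming janus is installed and a model is configured; that step is left out here because it depends on the local setup.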

class janus.converter.document.MultiDocumenter(output_type='multidocumentation', **kwargs)

Bases: Documenter

Parent class that converts code into something else.

Children determine what the code gets converted into: a translation into another language, pseudocode, requirements, documentation, etc., or embeddings.

Initialize a Converter instance.

Parameters:

source_language – The source programming language.
parser_type – The type of parser to use for parsing the LLM output. Valid values are “code”, “text”, “eval”, and None (default). If None, the Converter assumes you won’t be parsing an output (e.g., when adding to an embedding DB).
max_prompts – The maximum number of prompts to try before giving up.
max_tokens – The maximum number of tokens to use in the LLM. If None, the converter will use half the model’s token limit.
prompt_templates – The name of the prompt templates to use.
db_path – The path to the database to use for vectorization.
db_config – The configuration for the database.
protected_node_types – A set of node types that aren’t to be merged.
prune_node_types – A set of node types which should be pruned.
splitter_type – The type of splitter to use. Valid values are “file”, “tag”, “chunk”, “ast-strict”, and “ast-flex”.
refiner_type – The type of refiner to use. Valid values are “parser”, “reflection”, and None.
retriever_type – The type of retriever to use. Valid values are “active_usings”, “language_docs”, and None.
combine_output – Whether to combine the output into a single file or not.
use_janus_inputs – Whether to use janus inputs or not.
target_language – The target programming language.
target_version – The target programming language version.
input_types – The types of input to accept.
input_labels – The labels of input to accept.
output_type (str) – The type of output to produce.
output_label – The label of output to produce.
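
Judging from the signatures on this page, `MultiDocumenter` changes only the default `output_type` and forwards everything else through `**kwargs` to `Documenter`. The sketch below illustrates that pattern with stand-in classes. None of these classes are the janus implementations, and the attribute names are assumptions made for illustration.

```python
class ConverterStandIn:
    """Hypothetical stand-in for the base converter; stores what it receives."""

    def __init__(self, source_language="fortran", output_type=None, **kwargs):
        self.source_language = source_language
        self.output_type = output_type
        self.extra = kwargs  # any remaining keyword arguments


class DocumenterStandIn(ConverterStandIn):
    """Mirrors the Documenter signature shown on this page."""

    def __init__(self, source_language="fortran", drop_comments=True,
                 output_type="documentation", **kwargs):
        super().__init__(source_language=source_language,
                         output_type=output_type, **kwargs)
        self.drop_comments = drop_comments


class MultiDocumenterStandIn(DocumenterStandIn):
    """Mirrors the MultiDocumenter signature: only the default changes."""

    def __init__(self, output_type="multidocumentation", **kwargs):
        super().__init__(output_type=output_type, **kwargs)


multi = MultiDocumenterStandIn(source_language="python", max_prompts=5)
print(multi.output_type, multi.source_language, multi.drop_comments, multi.extra)
```

The point of the pattern is that a caller can still pass any parent parameter, such as `source_language` or `drop_comments`, to the subclass without the subclass listing it explicitly.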

class janus.converter.document.ClozeDocumenter(comments_per_request=None, output_type='cloze_comments', **kwargs)

Bases: Documenter

Parent class that converts code into something else.

Children determine what the code gets converted into: a translation into another language, pseudocode, requirements, documentation, etc., or embeddings.

Initialize a Converter instance.

Parameters:

source_language – The source programming language.
parser_type – The type of parser to use for parsing the LLM output. Valid values are “code”, “text”, “eval”, and None (default). If None, the Converter assumes you won’t be parsing an output (e.g., when adding to an embedding DB).
max_prompts – The maximum number of prompts to try before giving up.
max_tokens – The maximum number of tokens to use in the LLM. If None, the converter will use half the model’s token limit.
prompt_templates – The name of the prompt templates to use.
db_path – The path to the database to use for vectorization.
db_config – The configuration for the database.
protected_node_types – A set of node types that aren’t to be merged.
prune_node_types – A set of node types which should be pruned.
splitter_type – The type of splitter to use. Valid values are “file”, “tag”, “chunk”, “ast-strict”, and “ast-flex”.
refiner_type – The type of refiner to use. Valid values are “parser”, “reflection”, and None.
retriever_type – The type of retriever to use. Valid values are “active_usings”, “language_docs”, and None.
combine_output – Whether to combine the output into a single file or not.
use_janus_inputs – Whether to use janus inputs or not.
target_language – The target programming language.
target_version – The target programming language version.
input_types – The types of input to accept.
input_labels – The labels of input to accept.
output_type (str) – The type of output to produce.
output_label – The label of output to produce.
comments_per_request (int | None) –
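
The `max_tokens` rule in the parameter list above (use half the model’s token limit when the value is None) can be sketched as a small function. The name `resolve_max_tokens`, the integer rounding, and the 4096-token limit in the usage lines are assumptions for illustration; the page does not say how janus rounds or where it reads the model limit from.

```python
def resolve_max_tokens(max_tokens, model_token_limit):
    """Return the token budget: the explicit value, or half the model limit.

    Hypothetical helper; floor division is an assumption about rounding.
    """
    if max_tokens is None:
        return model_token_limit // 2
    return max_tokens


# Assumed model limit of 4096 tokens, chosen only for the example.
default_budget = resolve_max_tokens(None, 4096)
explicit_budget = resolve_max_tokens(1000, 4096)
print(default_budget, explicit_budget)
```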