janus.converter.document#
Attributes#
Classes#
Documenter | Parent class that converts code into something else.
MultiDocumenter | Parent class that converts code into something else.
ClozeDocumenter | Parent class that converts code into something else.
Module Contents#
- janus.converter.document.log#
- class janus.converter.document.Documenter(source_language='fortran', drop_comments=True, **kwargs)#
Bases: janus.converter.converter.Converter
Parent class that converts code into something else.
Child classes determine what the code is converted into: another language, pseudocode, requirements, documentation, etc., or embeddings.
Initialize a Converter instance.
- Parameters:
source_language (str) – The source programming language.
parser_type – The type of parser to use for parsing the LLM output. Valid values are “code”, “text”, “eval”, and None (default). If None, the Converter assumes you won’t be parsing an output (i.e., adding to an embedding DB).
max_prompts – The maximum number of prompts to try before giving up.
max_tokens – The maximum number of tokens to use in the LLM. If None, the converter will use half the model’s token limit.
prompt_template – The name of the prompt template to use.
db_path – The path to the database to use for vectorization.
db_config – The configuration for the database.
protected_node_types – A set of node types that aren’t to be merged.
prune_node_types – A set of node types which should be pruned.
splitter_type – The type of splitter to use. Valid values are “file”, “tag”, “chunk”, “ast-strict”, and “ast-flex”.
refiner_type – The type of refiner to use. Valid values are “parser”, “reflection”, and None.
retriever_type – The type of retriever to use. Valid values are “active_usings”, “language_docs”, and None.
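drop_comments (bool) – Whether to drop comments (presumably from the source code before it is documented). Defaults to True.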
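The signature above suggests that Documenter sets two options of its own (source_language and drop_comments) and forwards every other keyword argument to its parent through **kwargs. The following is a minimal sketch of that forwarding pattern. SketchConverter and SketchDocumenter are hypothetical stand-ins written for illustration, not the janus classes, and every default other than source_language='fortran' and drop_comments=True is an assumption.

```python
class SketchConverter:
    """Hypothetical stand-in for the parent class; keeps a few documented options."""

    def __init__(self, source_language="fortran", max_prompts=10,
                 max_tokens=None, splitter_type="file"):
        # max_prompts=10 and splitter_type="file" are assumed defaults,
        # chosen only so the sketch runs on its own.
        self.source_language = source_language
        self.max_prompts = max_prompts
        self.max_tokens = max_tokens
        self.splitter_type = splitter_type


class SketchDocumenter(SketchConverter):
    """Hypothetical stand-in mirroring the documented Documenter signature."""

    def __init__(self, source_language="fortran", drop_comments=True, **kwargs):
        # Anything not named here travels to the parent unchanged.
        super().__init__(source_language=source_language, **kwargs)
        self.drop_comments = drop_comments


doc = SketchDocumenter(
    source_language="python",
    drop_comments=False,
    splitter_type="ast-flex",  # forwarded to the parent via **kwargs
)
```

With this layout, a subclass only has to name the options it adds, which would explain why the parameter list of the parent is repeated under each class on this page.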
- class janus.converter.document.MultiDocumenter(**kwargs)#
Bases: Documenter
Parent class that converts code into something else.
Child classes determine what the code is converted into: another language, pseudocode, requirements, documentation, etc., or embeddings.
Initialize a Converter instance.
- Parameters:
source_language – The source programming language.
parser_type – The type of parser to use for parsing the LLM output. Valid values are “code”, “text”, “eval”, and None (default). If None, the Converter assumes you won’t be parsing an output (i.e., adding to an embedding DB).
max_prompts – The maximum number of prompts to try before giving up.
max_tokens – The maximum number of tokens to use in the LLM. If None, the converter will use half the model’s token limit.
prompt_template – The name of the prompt template to use.
db_path – The path to the database to use for vectorization.
db_config – The configuration for the database.
protected_node_types – A set of node types that aren’t to be merged.
prune_node_types – A set of node types which should be pruned.
splitter_type – The type of splitter to use. Valid values are “file”, “tag”, “chunk”, “ast-strict”, and “ast-flex”.
refiner_type – The type of refiner to use. Valid values are “parser”, “reflection”, and None.
retriever_type – The type of retriever to use. Valid values are “active_usings”, “language_docs”, and None.
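The parameter descriptions state two rules that are easy to check in code: max_tokens falls back to half the model's token limit when it is None, and splitter_type, refiner_type, and retriever_type each accept a fixed set of values. The sketch below encodes those rules in plain Python. The helper names resolve_max_tokens and check_option and the 4096-token limit are hypothetical, and integer division is an assumption about how "half" is rounded.

```python
# Valid values as listed in the parameter descriptions.
SPLITTER_TYPES = {"file", "tag", "chunk", "ast-strict", "ast-flex"}
REFINER_TYPES = {"parser", "reflection", None}
RETRIEVER_TYPES = {"active_usings", "language_docs", None}


def resolve_max_tokens(max_tokens, model_token_limit):
    """Return max_tokens, or half the model limit when max_tokens is None."""
    if max_tokens is None:
        return model_token_limit // 2  # assumed rounding: floor
    return max_tokens


def check_option(name, value, valid):
    """Return value if it is one of the valid choices, else raise ValueError."""
    if value not in valid:
        choices = sorted(valid, key=str)
        raise ValueError(f"{name}={value!r} is not one of {choices}")
    return value


budget = resolve_max_tokens(None, 4096)  # hypothetical model limit
splitter = check_option("splitter_type", "ast-flex", SPLITTER_TYPES)
```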
- class janus.converter.document.ClozeDocumenter(comments_per_request=None, **kwargs)#
Bases: Documenter
Parent class that converts code into something else.
Child classes determine what the code is converted into: another language, pseudocode, requirements, documentation, etc., or embeddings.
Initialize a Converter instance.
- Parameters:
source_language – The source programming language.
parser_type – The type of parser to use for parsing the LLM output. Valid values are “code”, “text”, “eval”, and None (default). If None, the Converter assumes you won’t be parsing an output (i.e., adding to an embedding DB).
max_prompts – The maximum number of prompts to try before giving up.
max_tokens – The maximum number of tokens to use in the LLM. If None, the converter will use half the model’s token limit.
prompt_template – The name of the prompt template to use.
db_path – The path to the database to use for vectorization.
db_config – The configuration for the database.
protected_node_types – A set of node types that aren’t to be merged.
prune_node_types – A set of node types which should be pruned.
splitter_type – The type of splitter to use. Valid values are “file”, “tag”, “chunk”, “ast-strict”, and “ast-flex”.
refiner_type – The type of refiner to use. Valid values are “parser”, “reflection”, and None.
retriever_type – The type of retriever to use. Valid values are “active_usings”, “language_docs”, and None.
comments_per_request (int | None) – Defaults to None.
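The page gives no description for comments_per_request. One plausible reading, which is an assumption and not confirmed by the source, is that it caps how many comment placeholders are sent in a single LLM request, with None meaning no cap. The sketch below illustrates that reading only. batch_comments and the comment identifiers are hypothetical and are not part of janus.

```python
def batch_comments(comment_ids, comments_per_request=None):
    """Split comment identifiers into per-request batches.

    Assumed semantics: None puts every comment in one request;
    a positive integer caps the size of each batch.
    """
    ids = list(comment_ids)
    if not ids:
        return []
    if comments_per_request is None:
        return [ids]
    if comments_per_request < 1:
        raise ValueError("comments_per_request must be a positive integer or None")
    return [
        ids[start:start + comments_per_request]
        for start in range(0, len(ids), comments_per_request)
    ]


batches = batch_comments(["c1", "c2", "c3", "c4", "c5"], comments_per_request=2)
```

Under this reading, a smaller value would trade more requests for shorter prompts, which may matter when a file contains many comments.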