janus.converter.document

Attributes

log

Classes

Documenter

Parent class that converts code into something else.

MultiDocumenter

Parent class that converts code into something else.

ClozeDocumenter

Parent class that converts code into something else.

Module Contents

janus.converter.document.log

class janus.converter.document.Documenter(source_language='fortran', drop_comments=True, **kwargs)

Bases: janus.converter.converter.Converter

Parent class that converts code into something else.

Children determine what the code is converted into: a translation into another language, pseudocode, requirements, documentation, and so on, or embeddings.

Initialize a Converter instance.

Parameters:
  • source_language (str) – The source programming language.

  • parser_type – The type of parser to use for parsing the LLM output. Valid values are “code”, “text”, “eval”, and None (default). If None, the Converter assumes you won’t be parsing an output (e.g., when adding to an embedding DB).

  • max_prompts – The maximum number of prompts to try before giving up.

  • max_tokens – The maximum number of tokens to use in the LLM. If None, the converter will use half the model’s token limit.

  • prompt_template – The name of the prompt template to use.

  • db_path – The path to the database to use for vectorization.

  • db_config – The configuration for the database.

  • protected_node_types – A set of node types that aren’t to be merged.

  • prune_node_types – A set of node types which should be pruned.

  • splitter_type – The type of splitter to use. Valid values are “file”, “tag”, “chunk”, “ast-strict”, and “ast-flex”.

  • refiner_type – The type of refiner to use. Valid values are “parser”, “reflection”, and None.

  • retriever_type – The type of retriever to use. Valid values are “active_usings”, “language_docs”, and None.

  • drop_comments (bool) –
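
Several of the options above accept only the listed string values, so a typo such as "ast_flex" is easy to make. The sketch below checks keyword arguments against the values documented on this page before they are passed on. `VALID_OPTIONS` and `check_converter_kwargs` are hypothetical names introduced for illustration. They are not part of janus, and the library may perform its own validation.

```python
# Hypothetical helper, not part of janus: checks option values against
# the valid values listed in the parameter documentation above.
VALID_OPTIONS = {
    "parser_type": {"code", "text", "eval", None},
    "splitter_type": {"file", "tag", "chunk", "ast-strict", "ast-flex"},
    "refiner_type": {"parser", "reflection", None},
    "retriever_type": {"active_usings", "language_docs", None},
}


def check_converter_kwargs(**kwargs):
    """Return kwargs unchanged, or raise ValueError on an undocumented value."""
    for name, allowed in VALID_OPTIONS.items():
        if name in kwargs and kwargs[name] not in allowed:
            shown = sorted(allowed, key=str)
            raise ValueError(f"{name}={kwargs[name]!r} is not one of {shown}")
    return kwargs


# Options that are not in VALID_OPTIONS (such as source_language) pass through.
kwargs = check_converter_kwargs(
    source_language="fortran",
    splitter_type="ast-flex",
    refiner_type=None,
    drop_comments=True,
)
```

The checked dictionary could then be unpacked into a constructor, for example `Documenter(**kwargs)`, assuming janus is installed and configured.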

class janus.converter.document.MultiDocumenter(**kwargs)

Bases: Documenter

Parent class that converts code into something else.

Children determine what the code is converted into: a translation into another language, pseudocode, requirements, documentation, and so on, or embeddings.

Initialize a Converter instance.

Parameters:
  • source_language – The source programming language.

  • parser_type – The type of parser to use for parsing the LLM output. Valid values are “code”, “text”, “eval”, and None (default). If None, the Converter assumes you won’t be parsing an output (e.g., when adding to an embedding DB).

  • max_prompts – The maximum number of prompts to try before giving up.

  • max_tokens – The maximum number of tokens to use in the LLM. If None, the converter will use half the model’s token limit.

  • prompt_template – The name of the prompt template to use.

  • db_path – The path to the database to use for vectorization.

  • db_config – The configuration for the database.

  • protected_node_types – A set of node types that aren’t to be merged.

  • prune_node_types – A set of node types which should be pruned.

  • splitter_type – The type of splitter to use. Valid values are “file”, “tag”, “chunk”, “ast-strict”, and “ast-flex”.

  • refiner_type – The type of refiner to use. Valid values are “parser”, “reflection”, and None.

  • retriever_type – The type of retriever to use. Valid values are “active_usings”, “language_docs”, and None.
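
The `max_tokens` rule described above (fall back to half the model's token limit when the value is None) can be sketched as follows. `resolve_max_tokens` and its `model_token_limit` argument are hypothetical. How janus obtains the model's limit is not shown on this page, and rounding down with integer division is an assumption made for this sketch.

```python
from typing import Optional


def resolve_max_tokens(max_tokens: Optional[int], model_token_limit: int) -> int:
    """Hypothetical helper: apply the documented default for max_tokens."""
    if max_tokens is None:
        # Documented default: half the model's token limit.
        # Rounding down is an assumption made for this sketch.
        return model_token_limit // 2
    return max_tokens


print(resolve_max_tokens(None, 8192))  # 4096
print(resolve_max_tokens(1000, 8192))  # 1000
```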

class janus.converter.document.ClozeDocumenter(comments_per_request=None, **kwargs)

Bases: Documenter

Parent class that converts code into something else.

Children determine what the code is converted into: a translation into another language, pseudocode, requirements, documentation, and so on, or embeddings.

Initialize a Converter instance.

Parameters:
  • source_language – The source programming language.

  • parser_type – The type of parser to use for parsing the LLM output. Valid values are “code”, “text”, “eval”, and None (default). If None, the Converter assumes you won’t be parsing an output (e.g., when adding to an embedding DB).

  • max_prompts – The maximum number of prompts to try before giving up.

  • max_tokens – The maximum number of tokens to use in the LLM. If None, the converter will use half the model’s token limit.

  • prompt_template – The name of the prompt template to use.

  • db_path – The path to the database to use for vectorization.

  • db_config – The configuration for the database.

  • protected_node_types – A set of node types that aren’t to be merged.

  • prune_node_types – A set of node types which should be pruned.

  • splitter_type – The type of splitter to use. Valid values are “file”, “tag”, “chunk”, “ast-strict”, and “ast-flex”.

  • refiner_type – The type of refiner to use. Valid values are “parser”, “reflection”, and None.

  • retriever_type – The type of retriever to use. Valid values are “active_usings”, “language_docs”, and None.

  • comments_per_request (int | None) –