janus.language.splitter

Attributes

log

Exceptions

TokenLimitError

An exception raised when the token limit is exceeded and the code cannot be split into smaller blocks.

EmptyTreeError

An exception raised when the tree is empty or does not exist (can happen when there are no nodes of interest in the tree).

FileSizeError

An exception raised when the file size is too large for the splitter.

Classes

Splitter

A class for splitting code into functional blocks to prompt with for transcoding.

Module Contents

janus.language.splitter.log

exception janus.language.splitter.TokenLimitError

Bases: Exception

An exception raised when the token limit is exceeded and the code cannot be split into smaller blocks.

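The condition behind this error can be illustrated with a toy recursion. This is a minimal sketch, not the Splitter's implementation: the `Node` class, its precomputed `tokens` counts, `fit_to_budget`, and the `TokenLimitExceeded` stand-in exception are all hypothetical. An oversized node is replaced by its children; an oversized node with no children has nothing left to split into, which is the situation the exception describes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class TokenLimitExceeded(Exception):
    """Hypothetical stand-in for janus.language.splitter.TokenLimitError."""


@dataclass
class Node:
    # Hypothetical toy node with a precomputed token count.
    name: str
    tokens: int
    children: list[Node] = field(default_factory=list)


def fit_to_budget(node: Node, max_tokens: int) -> list[Node]:
    """Return the largest nodes that each fit within max_tokens.

    An oversized node is replaced by its children; an oversized node
    with no children cannot be split further, so an error is raised.
    """
    if node.tokens <= max_tokens:
        return [node]
    if not node.children:
        raise TokenLimitExceeded(
            f"{node.name!r} has {node.tokens} tokens (limit {max_tokens}) "
            "and cannot be split into smaller blocks"
        )
    fitted: list[Node] = []
    for child in node.children:
        fitted.extend(fit_to_budget(child, max_tokens))
    return fitted
```

For example, a 500-token module with children of 200 and 250 tokens fits a 300-token budget as its two children, while a childless 900-token node raises the stand-in exception.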

exception janus.language.splitter.EmptyTreeError

Bases: Exception

An exception raised when the tree is empty or does not exist (can happen when there are no nodes of interest in the tree).


exception janus.language.splitter.FileSizeError

Bases: Exception

An exception raised when the file size is too large for the splitter.


class janus.language.splitter.Splitter(language, model=None, max_tokens=4096, skip_merge=False, protected_node_types=(), prune_node_types=(), prune_unprotected=False)

Bases: janus.language.file.FileManager

A class for splitting code into functional blocks to prompt with for transcoding.

Parameters:
  • language (str) – The name of the language to split.

  • model (None | langchain.schema.language_model.BaseLanguageModel) – The model to use for counting tokens. If None, tiktoken’s default tokenizer is used to count tokens.

  • max_tokens (int) – The maximum number of tokens allowed in each functional block.

  • skip_merge (bool) – Whether to skip merging child nodes up to the max_tokens length. Skipping the merge may be useful for situations like documentation, where function-level documentation is preferred. TODO: Maybe instead support something like a list of node types that shouldn’t be merged (e.g. functions, classes)?

  • prune_unprotected (bool) – Whether to prune unprotected nodes from the tree.

  • protected_node_types (tuple[str, ...]) –

  • prune_node_types (tuple[str, ...]) –
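One plausible reading of max_tokens and skip_merge is a greedy pass over sibling blocks. The sketch below works under stated assumptions: blocks are plain strings, `count_tokens` is a hypothetical whitespace-based stand-in for the model or tiktoken tokenizer, and `merge_children` is not part of the Janus API.

```python
from __future__ import annotations


def count_tokens(text: str) -> int:
    # Hypothetical stand-in tokenizer: counts whitespace-separated words.
    # The real Splitter counts tokens with the model or tiktoken instead.
    return len(text.split())


def merge_children(
    children: list[str], max_tokens: int, skip_merge: bool = False
) -> list[str]:
    """Greedily merge adjacent child blocks while they fit in max_tokens.

    A single child larger than max_tokens is kept as its own block; this
    sketch does not try to split it further.
    """
    if skip_merge:
        # Keep every child as its own block (e.g. one block per function).
        return list(children)
    merged: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for child in children:
        n = count_tokens(child)
        # Flush the running group if adding this child would exceed the budget.
        if current and current_tokens + n > max_tokens:
            merged.append("\n".join(current))
            current, current_tokens = [], 0
        current.append(child)
        current_tokens += n
    if current:
        merged.append("\n".join(current))
    return merged
```

For example, with max_tokens=5 the children "a b c", "d e", "f g h i" and "j" (3, 2, 4 and 1 tokens) merge into two blocks of 5 tokens each; with skip_merge=True all four stay separate.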

split(file)

Split the given file into functional code blocks.

Parameters:

file (pathlib.Path | str) – The file to split into functional blocks.

Returns:

A CodeBlock made up of nested `CodeBlock`s.

Return type:

janus.language.block.CodeBlock
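The returned block is a tree. The sketch below uses a hypothetical `Block` dataclass (not `janus.language.block.CodeBlock`, whose attributes are not documented here) to show how such a nested structure can be walked depth-first.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class Block:
    # Hypothetical toy stand-in for a nested code block.
    name: str
    children: list[Block] = field(default_factory=list)


def walk(block: Block, depth: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (depth, name) pairs in depth-first, pre-order."""
    yield depth, block.name
    for child in block.children:
        yield from walk(child, depth + 1)
```

Printing "  " * depth + name for each pair gives an indented outline of the file, its classes, and their methods.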

split_string(code, name)

Split the given code into functional code blocks.

Parameters:
  • code (str) – The code as a string to split into functional blocks.

  • name (str) – The filename of the code block.

Returns:

A CodeBlock made up of nested `CodeBlock`s.

Return type:

janus.language.block.CodeBlock
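To make "functional blocks" concrete, the following sketch splits a Python string into its top-level functions and classes. It swaps in Python's standard ast module, so it only handles Python and is not how Splitter parses other languages; `split_top_level` and its (label, source) return format are hypothetical. It requires Python 3.8+ for `end_lineno`.

```python
from __future__ import annotations

import ast


def split_top_level(code: str, name: str) -> list[tuple[str, str]]:
    """Split Python source into (label, source) pairs, one per top-level
    function or class. Other top-level statements are ignored.
    """
    tree = ast.parse(code, filename=name)
    lines = code.splitlines(keepends=True)
    blocks: list[tuple[str, str]] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno/end_lineno are 1-based and inclusive; decorators sit
            # above lineno, so this simple slice leaves them out.
            start, end = node.lineno, node.end_lineno
            blocks.append((f"{name}:{node.name}", "".join(lines[start - 1:end])))
    return blocks
```

Nested definitions (such as methods) stay inside their enclosing class block here; a real splitter would recurse into them to build the nested structure described above.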

merge_nodes(nodes)

Merge a list of nodes into a single node. The first and last nodes’ respective prefix and suffix become this node’s affixes.

Parameters:

nodes (List[janus.language.block.CodeBlock]) – The nodes to merge.

Return type:

janus.language.block.CodeBlock
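The affix rule can be sketched with a hypothetical `Piece` type that holds a node's text plus its surrounding prefix and suffix. The outer affixes follow the rule stated above; keeping the inner affixes between the joined texts is an assumption of this sketch, not documented behavior.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Piece:
    # Hypothetical toy node: the text plus the whitespace around it.
    text: str
    prefix: str = ""
    suffix: str = ""


def merge_pieces(nodes: list[Piece]) -> Piece:
    """Merge nodes into one; outer affixes come from the first and last nodes."""
    if not nodes:
        raise ValueError("cannot merge an empty list of nodes")
    # Assumption: inner affixes stay inside the merged text so no separators are lost.
    body = nodes[0].text
    for prev, node in zip(nodes, nodes[1:]):
        body += prev.suffix + node.prefix + node.text
    return Piece(text=body, prefix=nodes[0].prefix, suffix=nodes[-1].suffix)
```

Merging a single node returns an equal copy of it, since it is both the first and the last node.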