janus.language.naive.tag_splitter

Classes

TagSplitter

Splits code by tags inserted into code

Module Contents

class janus.language.naive.tag_splitter.TagSplitter(tag, *args, **kwargs)

Bases: janus.language.splitter.Splitter

Splits code by tags inserted into code

Parameters:
  • language – The name of the language to split.

  • model – The name of the model to use for counting tokens. If model is None, tiktoken’s default tokenizer is used to count tokens.

  • max_tokens – The maximum number of tokens to use for each functional block.

  • skip_merge – Whether to skip merging child nodes up to the max_tokens length. May be used for situations like documentation, where function-level documentation is preferred. TODO: Maybe instead support something like a list of node types that shouldn't be merged (e.g. functions, classes)?

  • prune_unprotected – Whether to prune unprotected nodes from the tree.

  • tag (str) – The tag inserted into the code that marks where to split.
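
Example (illustrative sketch only): the snippet below shows the general idea of splitting source text at an inserted tag. It does not use janus and is not TagSplitter's implementation; the helper name split_by_tag and the tag string "<SPLIT>" are hypothetical, chosen only for this illustration. Token counting, max_tokens, and merging are not modeled.

```python
def split_by_tag(code: str, tag: str) -> list[str]:
    """Split `code` at every occurrence of `tag`, dropping blank pieces.

    Hypothetical helper for illustration; not part of janus.
    """
    if not tag:
        # str.split("") raises ValueError anyway; fail with a clearer message.
        raise ValueError("tag must be a non-empty string")
    # Keep only pieces that contain something other than whitespace.
    return [piece for piece in code.split(tag) if piece.strip()]


# Source text with a hypothetical tag inserted between two functions.
source = "def a():\n    return 1\n<SPLIT>\ndef b():\n    return 2\n"
blocks = split_by_tag(source, "<SPLIT>")

for index, block in enumerate(blocks):
    print(f"--- block {index} ---")
    print(block.strip())
```

In this sketch the tag itself is discarded and each remaining piece becomes one block. Whether TagSplitter keeps or removes the tag, and how it applies max_tokens to the resulting blocks, should be checked against the janus source.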