janus.metrics.llm_metrics#

Classes#

LLMMetricOutput

The output of an LLM evaluation metric.

Functions#

load_prompt(path, language, parser)

Load a default prompt from a file.

evaluate(target, language, model, prompt_path[, reference])

Calculate the LLM self-evaluation score.

llm_evaluate_option(target[, metric, prompt, num_eval])

CLI option to calculate the LLM self-evaluation score.

llm_evaluate_ref_option(target, reference[, metric, ...])

CLI option to calculate the LLM self-evaluation score, for evaluations which require a reference file (e.g. faithfulness).

Module Contents#

class janus.metrics.llm_metrics.LLMMetricOutput#

Bases: langchain_core.pydantic_v1.BaseModel

The output of an LLM evaluation metric.

thought: str#
value: str | float | int#
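
The example below constructs an LLMMetricOutput directly to illustrate the two documented fields; the values shown are hypothetical.

```python
from janus.metrics.llm_metrics import LLMMetricOutput

# Hypothetical values illustrating the documented fields: `thought` holds the
# model's free-text reasoning, `value` holds the resulting score or label.
output = LLMMetricOutput(
    thought="The translated code preserves the original control flow.",
    value=4,
)
print(output.thought)
print(output.value)
```
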
janus.metrics.llm_metrics.load_prompt(path, language, parser)#

Load a default prompt from a file.

Parameters:
  • path (pathlib.Path) – The path to the prompt file.

  • language (str) – The language of the prompt.

  • parser (langchain_core.output_parsers.BaseOutputParser) – The parser (e.g. one built from a Pydantic model) used to parse the model output.

Returns:

The prompt text.

Return type:

langchain_core.prompts.PromptTemplate
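
A minimal usage sketch follows. The prompt path is a placeholder, and pairing load_prompt with a PydanticOutputParser built from LLMMetricOutput is an assumption, not something this page documents.

```python
from pathlib import Path

from langchain_core.output_parsers import PydanticOutputParser

from janus.metrics.llm_metrics import LLMMetricOutput, load_prompt

# Hypothetical prompt file; substitute a real prompt shipped with janus.
prompt_path = Path("prompts/eval/quality.txt")

# Assumption: the LLM response is parsed into the LLMMetricOutput schema.
parser = PydanticOutputParser(pydantic_object=LLMMetricOutput)

prompt = load_prompt(path=prompt_path, language="python", parser=parser)
print(prompt.input_variables)
```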

janus.metrics.llm_metrics.evaluate(target, language, model, prompt_path, reference=None)#

Calculate the LLM self-evaluation score.

Parameters:
  • target (str) – The target text.

  • language (str) – The language that the target code is written in.

  • model (str) – The name of the model to use for the evaluation.

  • prompt_path (pathlib.Path) – The filepath of the prompt text.

  • reference (str | None) – The reference text.

Returns:

The LLM Evaluation score.
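
A sketch of a direct call is shown below; the model name and prompt path are placeholders, and the exact shape of the returned score is not specified on this page.

```python
from pathlib import Path

from janus.metrics.llm_metrics import evaluate

# Placeholder inputs: swap in a real prompt file and an available model name.
score = evaluate(
    target="def add(a, b):\n    return a + b",
    language="python",
    model="gpt-4o",
    prompt_path=Path("prompts/eval/quality.txt"),
)
print(score)
```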

janus.metrics.llm_metrics.llm_evaluate_option(target, metric='quality', prompt=None, num_eval=1, **kwargs)#

CLI option to calculate the LLM self-evaluation score.

Parameters:
  • target (str) – The target text.

  • reference – The reference text.

  • metric (typing_extensions.Annotated[str, typer.Option('--metric', '-m', help='The pre-defined metric to use for evaluation.', click_type=click.Choice(['quality', 'clarity', 'faithfulness', 'completeness', 'hallucination', 'readability', 'usefulness']))]) – The pre-defined metric to use for evaluation.

  • prompt (typing_extensions.Annotated[str, None, typer.Option('--prompt', '-P', help='A custom prompt in a .txt file to use for evaluation.')]) – The prompt text.

  • num_eval (typing_extensions.Annotated[int, typer.Option('-n', '--num-eval', help='Number of times to run the evaluation')]) – The number of times to run the evaluation.

Returns:

The LLM Evaluation score.

Return type:

Any
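
This function backs a Typer CLI option, but a direct-call sketch is shown below using only the documented parameters; any additional settings (e.g. language or model) would be forwarded through **kwargs and are omitted because they are not documented here.

```python
from janus.metrics.llm_metrics import llm_evaluate_option

# Documented parameters only; extra configuration, if required, is forwarded
# through **kwargs and is intentionally left out of this sketch.
score = llm_evaluate_option(
    target="def add(a, b):\n    return a + b",
    metric="readability",
    num_eval=3,
)
print(score)
```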

janus.metrics.llm_metrics.llm_evaluate_ref_option(target, reference, metric='faithfulness', prompt=None, num_eval=1, **kwargs)#

CLI option to calculate the LLM self-evaluation score, for evaluations which require a reference file (e.g. faithfulness).

Parameters:
  • target (str) – The target text.

  • reference (str) – The reference text.

  • metric (typing_extensions.Annotated[str, typer.Option('--metric', '-m', help='The pre-defined metric to use for evaluation.', click_type=click.Choice(['faithfulness']))]) – The pre-defined metric to use for evaluation.

  • prompt (typing_extensions.Annotated[str, None, typer.Option('--prompt', '-P', help='A custom prompt in a .txt file to use for evaluation.')]) – The prompt text.

  • num_eval (typing_extensions.Annotated[int, typer.Option('-n', '--num-eval', help='Number of times to run evaluation for pair')]) – The number of times to run the evaluation for each target/reference pair.

Returns:

The LLM Evaluation score.

Return type:

Any
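
As with the option above, a hedged direct-call sketch is shown; the target and reference strings are placeholders, and any extra settings forwarded through **kwargs are omitted.

```python
from janus.metrics.llm_metrics import llm_evaluate_ref_option

# Placeholder target/reference pair; 'faithfulness' is the only documented
# choice for this command's --metric option.
score = llm_evaluate_ref_option(
    target="translated code goes here",
    reference="original source code goes here",
    metric="faithfulness",
    num_eval=2,
)
print(score)
```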