janus.metrics.llm_metrics#

Classes#

LLMMetricOutput

The output of an LLM evaluation metric.

Functions#

load_prompt(path, language, parser)

Load a default prompt from a file.

evaluate(target, language, model, prompt_path[, reference])

Calculate the LLM self-evaluation score.

llm_evaluate_option(target[, metric, prompt, num_eval])

CLI option to calculate the LLM self-evaluation score.

llm_evaluate_ref_option(target, reference[, metric, ...])

CLI option to calculate the LLM self-evaluation score, for evaluations which require a reference file (e.g. faithfulness).

Module Contents#

class janus.metrics.llm_metrics.LLMMetricOutput#

Bases: langchain_core.pydantic_v1.BaseModel

The output of an LLM evaluation metric.

thought: str#
value: str | float | int#
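
The example below constructs an LLMMetricOutput directly to illustrate the two documented fields; the values shown are hypothetical.

```python
from janus.metrics.llm_metrics import LLMMetricOutput

# Hypothetical values illustrating the documented fields: `thought` holds the
# model's free-text reasoning, `value` holds the resulting score or label.
output = LLMMetricOutput(
    thought="The translated code preserves the original control flow.",
    value=4,
)
print(output.thought)
print(output.value)
```
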
janus.metrics.llm_metrics.load_prompt(path, language, parser)#

Load a default prompt from a file.

Parameters:
  • path (pathlib.Path) – The path to the prompt file.

  • language (str) – The language of the prompt.

  • parser (langchain_core.output_parsers.BaseOutputParser) – The parser (e.g. one built from a Pydantic model) used to parse the model output.

Returns:

The prompt text.

Return type:

langchain_core.prompts.PromptTemplate
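
A minimal usage sketch follows. The prompt path is a placeholder, and pairing load_prompt with a PydanticOutputParser built from LLMMetricOutput is an assumption, not something this page documents.

```python
from pathlib import Path

from langchain_core.output_parsers import PydanticOutputParser

from janus.metrics.llm_metrics import LLMMetricOutput, load_prompt

# Hypothetical prompt file; substitute a real prompt shipped with janus.
prompt_path = Path("prompts/eval/quality.txt")

# Assumption: the LLM response is parsed into the LLMMetricOutput schema.
parser = PydanticOutputParser(pydantic_object=LLMMetricOutput)

prompt = load_prompt(path=prompt_path, language="python", parser=parser)
print(prompt.input_variables)
```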

janus.metrics.llm_metrics.evaluate(target, language, model, prompt_path, reference=None)#

Calculate the LLM self-evaluation score.

Parameters:
  • target (str) – The target text.

  • language (str) – The language that the target code is written in.

  • model (str) – The name of the model to use for the evaluation.

  • prompt_path (pathlib.Path) – The filepath of the prompt text.

  • reference (str | None) – The reference text.

Returns:

The LLM Evaluation score.
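
A sketch of a direct call is shown below; the model name and prompt path are placeholders, and the exact shape of the returned score is not specified on this page.

```python
from pathlib import Path

from janus.metrics.llm_metrics import evaluate

# Placeholder inputs: swap in a real prompt file and an available model name.
score = evaluate(
    target="def add(a, b):\n    return a + b",
    language="python",
    model="gpt-4o",
    prompt_path=Path("prompts/eval/quality.txt"),
)
print(score)
```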

janus.metrics.llm_metrics.llm_evaluate_option(target, metric='quality', prompt=None, num_eval=1, **kwargs)#

CLI option to calculate the LLM self-evaluation score.

Parameters:
  • target (str) – The target text.

  • reference – The reference text.

  • metric (typing_extensions.Annotated[str, typer.Option('--metric', '-m', help='The pre-defined metric to use for evaluation.', click_type=click.Choice(['quality', 'clarity', 'faithfulness', 'completeness', 'hallucination', 'readability', 'usefulness']))]) – The pre-defined metric to use for evaluation.

  • prompt (typing_extensions.Annotated[str, None, typer.Option('--prompt', '-P', help='A custom prompt in a .txt file to use for evaluation.')]) – The prompt text.

  • num_eval (typing_extensions.Annotated[int, typer.Option('-n', '--num-eval', help='Number of times to run the evaluation')]) – The number of times to run the evaluation.

Returns:

The LLM Evaluation score.

Return type:

Any
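
This function backs a Typer CLI option, but a direct-call sketch is shown below using only the documented parameters; any additional settings (e.g. language or model) would be forwarded through **kwargs and are omitted because they are not documented here.

```python
from janus.metrics.llm_metrics import llm_evaluate_option

# Documented parameters only; extra configuration, if required, is forwarded
# through **kwargs and is intentionally left out of this sketch.
score = llm_evaluate_option(
    target="def add(a, b):\n    return a + b",
    metric="readability",
    num_eval=3,
)
print(score)
```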

janus.metrics.llm_metrics.llm_evaluate_ref_option(target, reference, metric='faithfulness', prompt=None, num_eval=1, **kwargs)#

CLI option to calculate the LLM self-evaluation score, for evaluations which require a reference file (e.g. faithfulness).

Parameters:
  • target (str) – The target text.

  • reference (str) – The reference text.

  • metric (typing_extensions.Annotated[str, typer.Option('--metric', '-m', help='The pre-defined metric to use for evaluation.', click_type=click.Choice(['faithfulness']))]) – The pre-defined metric to use for evaluation.

  • prompt (typing_extensions.Annotated[str, None, typer.Option('--prompt', '-P', help='A custom prompt in a .txt file to use for evaluation.')]) – The prompt text.

  • num_eval (typing_extensions.Annotated[int, typer.Option('-n', '--num-eval', help='Number of times to run evaluation for pair')]) – The number of times to run the evaluation for each target/reference pair.

Returns:

The LLM Evaluation score.

Return type:

Any
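
As with the option above, a hedged direct-call sketch is shown; the target and reference strings are placeholders, and any extra settings forwarded through **kwargs are omitted.

```python
from janus.metrics.llm_metrics import llm_evaluate_ref_option

# Placeholder target/reference pair; 'faithfulness' is the only documented
# choice for this command's --metric option.
score = llm_evaluate_ref_option(
    target="translated code goes here",
    reference="original source code goes here",
    metric="faithfulness",
    num_eval=2,
)
print(score)
```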