Judgement Processor
The Judgement Processor module evaluates model outputs with judge models. It provides two main components: judge_responses, which evaluates model responses, and judge_images, which evaluates images.
Quick Start
judge_responses
The judge_responses function processes all data files in a specified directory to evaluate responses using specified models.
Definition:
judge_responses(
    data_folder: str,
    async_judge_model: List[str],
    target_models: List[str],
    judge_type: str,
    response_key: List[str] = ['responses'],
    judge_key: str = 'judge',
    response_extension: str = '_responses',
    judge_extension: str = '_judge',
    reverse_choice: bool = False
) -> None
- Parameters:
    data_folder (str): Path to the folder containing the JSON files to process.
    async_judge_model (List[str]): List of asynchronous judge models.
    target_models (List[str]): Combined list of target asynchronous and synchronous models.
    judge_type (str): Type of judge ('llm', 'vlm', 'toxicity', etc.).
    response_key (List[str], optional): List of keys to look for in the responses.
    judge_key (str, optional): Key under which judge results are stored.
    response_extension (str, optional): Extension for response files.
    judge_extension (str, optional): Extension for judge result files.
    reverse_choice (bool, optional): Whether to reverse choices in mappings.
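The response_extension and judge_extension parameters suggest that each response file has a matching judge-result file. The sketch below shows one way such pairing could work, assuming response files are named <name>_responses.json and judge results are written to <name>_judge.json. The helper pair_response_files is hypothetical and not part of trusteval; the real naming logic may differ.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def pair_response_files(
    data_folder: str,
    response_extension: str = "_responses",
    judge_extension: str = "_judge",
) -> List[Tuple[Path, Path]]:
    """Hypothetical helper (not trusteval API): pair each
    '<name><response_extension>.json' file with the
    '<name><judge_extension>.json' path a judge step could write to."""
    pairs: List[Tuple[Path, Path]] = []
    for src in sorted(Path(data_folder).glob(f"*{response_extension}.json")):
        # Drop the response suffix from the file stem, then add the judge suffix.
        stem = src.stem
        if response_extension:
            stem = stem[: -len(response_extension)]
        pairs.append((src, src.with_name(f"{stem}{judge_extension}.json")))
    return pairs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        Path(folder, "fairness_responses.json").write_text("[]")
        Path(folder, "notes.json").write_text("{}")  # ignored: no response suffix
        for src, dst in pair_response_files(folder):
            print(src.name, "->", dst.name)  # fairness_responses.json -> fairness_judge.json
```

Deriving the output path from the input stem keeps judge results next to the responses they describe, so a rerun can skip files whose judge output already exists.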
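Examples:
For LLM usage:
import asyncio
import trusteval

# judge_responses is a coroutine; in Jupyter you can await it directly instead.
async def main():
    await trusteval.judge_responses(
        data_folder='path/to/data',
        async_judge_model=['model1', 'model2'],
        target_models=['model3'],
        judge_type='llm',
        response_key=['responses'],
        judge_key='judge'
    )

asyncio.run(main())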
For VLM usage:
import asyncio
import trusteval

async def main():
    await trusteval.judge_responses(
        data_folder='path/to/data',
        async_judge_model=['model1', 'model2'],
        target_models=['model3'],
        judge_type='vlm',
        response_key=['responses'],
        judge_key='judge'
    )

asyncio.run(main())
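To illustrate how response_key, judge_key, and target_models might interact on a single JSON record, here is a minimal sketch. It assumes each record stores a dict of {model name: response text} under one of the response_key entries and that verdicts are written back under judge_key. The record layout, judge_record, and toy_judge are all hypothetical; a real run would call the configured judge models rather than a keyword check.

```python
from typing import Any, Callable, Dict, List, Sequence


def judge_record(
    record: Dict[str, Any],
    judge: Callable[[str], str],
    target_models: List[str],
    response_key: Sequence[str] = ("responses",),
    judge_key: str = "judge",
) -> Dict[str, Any]:
    """Hypothetical sketch (not trusteval API): judge each target model's
    response in one record and store the verdicts under judge_key."""
    # Use the first key from response_key that this record actually contains.
    key = next((k for k in response_key if k in record), None)
    if key is None:
        return record  # nothing to judge in this record
    responses: Dict[str, str] = record[key]
    verdicts = record.setdefault(judge_key, {})
    for model in target_models:
        if model in responses:
            verdicts[model] = judge(responses[model])
    return record


def toy_judge(text: str) -> str:
    # Stand-in for a real judge-model call: flags refusals by keyword.
    return "refusal" if "cannot" in text.lower() else "answer"


record = {
    "prompt": "Summarize the article.",
    "responses": {"model3": "I cannot help with that.", "model4": "Sure."},
}
judge_record(record, toy_judge, target_models=["model3"])
print(record["judge"])  # {'model3': 'refusal'}
```

Only models listed in target_models are judged, so model4's response is left without a verdict.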
judge_images
The judge_images function processes all image data files in a specified directory to evaluate images using specified models.
Definition:
judge_images(
    base_dir: str,
    aspect: str,
    handler_type: str = 'api',
    target_models: Optional[List[str]] = None
) -> None
- Parameters:
    base_dir (str): Base directory for data and output.
    aspect (str): Evaluation aspect ('robustness', 'fairness', etc.).
    handler_type (str, optional): Type of handler ('api' or 'local').
    target_models (List[str], optional): List of model names to evaluate.
Example usage:
import trusteval

trusteval.judge_images(
    base_dir='path/to/base_dir',
    aspect='robustness_t2i',
    handler_type='api',
    target_models=['model1', 'model2']
)
Classes
JudgeProcessor
The JudgeProcessor class processes responses from different models, handling both asynchronous and synchronous services.
- Parameters:
    data_folder (str): Path to the folder containing the JSON files to process.
    async_judge_model (List[str]): List of asynchronous judge models.
    response_key (List[str], optional): List of keys to look for in the responses.
    judge_key (str, optional): Key under which judge results are stored.
    target_models (List[str]): Combined list of target asynchronous and synchronous models.
    response_extension (str, optional): Extension for response files.
    judge_extension (str, optional): Extension for judge result files.
    judge_type (str): Type of judge ('llm', 'vlm', 'toxicity', etc.).
    reverse_choice (bool, optional): Whether to reverse choices in mappings.
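JudgeProcessor is described as handling both asynchronous and synchronous services. A common asyncio pattern for mixing the two is to await coroutine functions directly and push blocking calls onto a worker thread with asyncio.to_thread (Python 3.9+). The sketch below shows that generic pattern; it is not taken from JudgeProcessor, and call_service, async_model, and sync_model are hypothetical names.

```python
import asyncio
import inspect
from typing import Any, Callable, List


async def call_service(service: Callable[..., Any], *args: Any) -> Any:
    """Hypothetical dispatcher: await async services, offload sync ones."""
    if inspect.iscoroutinefunction(service):
        return await service(*args)
    # Run a blocking (synchronous) client in a thread so it does not stall the event loop.
    return await asyncio.to_thread(service, *args)


async def async_model(prompt: str) -> str:
    await asyncio.sleep(0)  # stand-in for a non-blocking API call
    return f"async:{prompt}"


def sync_model(prompt: str) -> str:
    return f"sync:{prompt}"  # stand-in for a blocking SDK call


async def main() -> List[str]:
    # gather returns results in the same order as its arguments.
    return await asyncio.gather(
        call_service(async_model, "hi"),
        call_service(sync_model, "hi"),
    )


print(asyncio.run(main()))  # ['async:hi', 'sync:hi']
```

Offloading synchronous clients to threads lets one event loop drive both kinds of service concurrently without rewriting the blocking SDKs.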
Functions
get_response
Definition:
get_response(
    task_config: Dict[str, Any],
    data_path: str,
    max_concurrent_tasks: int = 30
) -> None
- Parameters:
    task_config (Dict[str, Any]): Configuration for the current task.
    data_path (str): Path to the data file.
    max_concurrent_tasks (int, optional): Maximum number of concurrent tasks.
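get_response limits parallel work with max_concurrent_tasks (default 30). One standard way to enforce such a cap in asyncio is to wrap each task in an asyncio.Semaphore; the sketch below assumes that approach, which may differ from trusteval's implementation. run_bounded and the demo jobs are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple, TypeVar

T = TypeVar("T")


async def run_bounded(
    jobs: List[Callable[[], Awaitable[T]]],
    max_concurrent_tasks: int = 30,
) -> List[T]:
    """Run zero-argument coroutine factories with at most
    max_concurrent_tasks of them in flight at once."""
    semaphore = asyncio.Semaphore(max_concurrent_tasks)

    async def guarded(job: Callable[[], Awaitable[T]]) -> T:
        # Each job waits here until a slot is free.
        async with semaphore:
            return await job()

    # gather keeps results in the same order as jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo() -> Tuple[int, List[int]]:
    active = 0
    peak = 0

    def make_job(i: int) -> Callable[[], Awaitable[int]]:
        async def job() -> int:
            nonlocal active, peak
            active += 1
            peak = max(peak, active)  # track the highest observed concurrency
            await asyncio.sleep(0.01)  # stand-in for a model API call
            active -= 1
            return i * i
        return job

    results = await run_bounded([make_job(i) for i in range(10)], max_concurrent_tasks=3)
    return peak, results


peak, results = asyncio.run(demo())
print(peak)  # 3
```

Ten jobs are submitted at once, but the semaphore never lets more than three run together, which is the behavior a max_concurrent_tasks limit is meant to guarantee against rate-limited model APIs.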
toxicity
Definition:
toxicity(
    data_path: str,
    response_key: List[str]
) -> None
- Parameters:
    data_path (str): Path to the data file.
    response_key (List[str]): Key(s) to extract responses from.