Judgement Processor
===================

The Judgement Processor module evaluates model outputs with configurable judge models. It exposes two main entry points: ``judge_responses`` for evaluating responses and ``judge_images`` for evaluating images.

Quick Start
-----------

judge_responses
~~~~~~~~~~~~~~~

The ``judge_responses`` function processes every data file in the given directory and evaluates the responses with the given judge models.

**Definition:**

.. code-block:: python

    judge_responses(
        data_folder: str,
        async_judge_model: List[str],
        target_models: List[str],
        judge_type: str,
        response_key: List[str] = ['responses'],
        judge_key: str = 'judge',
        response_extension: str = '_responses',
        judge_extension: str = '_judge',
        reverse_choice: bool = False
    ) -> None

:Parameters:

- **data_folder** (:type:`str`) Path to the folder containing JSON files to process
- **async_judge_model** (:type:`List[str]`) List of asynchronous judge models
- **target_models** (:type:`List[str]`) Combined list of target asynchronous and synchronous models
- **judge_type** (:type:`str`) Type of judge ('llm', 'vlm', 'toxicity', etc.)
- **response_key** (:type:`List[str]`, optional) List of keys to look for in the responses
- **judge_key** (:type:`str`, optional) Key to store judge results
- **response_extension** (:type:`str`, optional) Extension for response files
- **judge_extension** (:type:`str`, optional) Extension for judge result files
- **reverse_choice** (:type:`bool`, optional) Whether to reverse choices in mappings
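
As a rough illustration of how ``response_extension`` and ``judge_extension`` might relate input files to output files, the sketch below pairs each ``*_responses.json`` file with a ``*_judge.json`` path. The helper name ``pair_judge_paths`` and the suffix-replacement naming rule are assumptions made for illustration only; they are not confirmed behavior of ``judge_responses``.

```python
from pathlib import Path
from typing import Dict


def pair_judge_paths(
    data_folder: str,
    response_extension: str = "_responses",
    judge_extension: str = "_judge",
) -> Dict[str, str]:
    """Map each response file to a hypothetical judge output path.

    Assumption: a file named <stem><response_extension>.json is written
    back as <stem><judge_extension>.json in the same folder.
    """
    pairs: Dict[str, str] = {}
    for path in sorted(Path(data_folder).glob(f"*{response_extension}.json")):
        # Drop the response suffix from the file stem, then append the judge suffix.
        stem = path.stem[: len(path.stem) - len(response_extension)]
        pairs[str(path)] = str(path.with_name(f"{stem}{judge_extension}.json"))
    return pairs
```

Under this assumed rule, files that do not end with the response suffix are skipped rather than judged.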

**Examples:**

For LLM usage:

.. code-block:: python

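    import asyncio
    import trusteval

    async def main() -> None:
        # judge_responses is awaited, so it must run inside a coroutine.
        # In a notebook with a running event loop, use `await main()` instead.
        await trusteval.judge_responses(
            data_folder='path/to/data',
            async_judge_model=['model1', 'model2'],
            target_models=['model3'],
            judge_type='llm',
            response_key=['responses'],
            judge_key='judge',
        )

    asyncio.run(main())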

For VLM usage:

.. code-block:: python

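    import asyncio
    import trusteval

    async def main() -> None:
        # Same call as the LLM example; only judge_type changes.
        # In a notebook with a running event loop, use `await main()` instead.
        await trusteval.judge_responses(
            data_folder='path/to/data',
            async_judge_model=['model1', 'model2'],
            target_models=['model3'],
            judge_type='vlm',
            response_key=['responses'],
            judge_key='judge',
        )

    asyncio.run(main())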

judge_images
~~~~~~~~~~~~

The ``judge_images`` function processes every image data file in the given directory and evaluates the images with the given models.

**Definition:**

.. code-block:: python

    judge_images(
        base_dir: str,
        aspect: str,
        handler_type: str = 'api',
        target_models: Optional[List[str]] = None
    ) -> None

:Parameters:

- **base_dir** (:type:`str`) Base directory for data and output
- **aspect** (:type:`str`) Evaluation aspect ('robustness', 'fairness', etc.)
- **handler_type** (:type:`str`, optional) Type of handler ('api' or 'local')
- **target_models** (:type:`List[str]`, optional) List of model names to evaluate

**Example Usage:**

.. code-block:: python

    import trusteval

    trusteval.judge_images(
        base_dir='path/to/base_dir',
        aspect='robustness_t2i',
        handler_type='api',
        target_models=['model1', 'model2']
    )

Classes
-------

JudgeProcessor
~~~~~~~~~~~~~~

The ``JudgeProcessor`` class processes responses from different models, handling both asynchronous and synchronous services.

:Parameters:

- **data_folder** (:type:`str`) Path to the folder containing JSON files to process
- **async_judge_model** (:type:`List[str]`) List of asynchronous judge models
- **response_key** (:type:`List[str]`, optional) List of keys to look for in the responses
- **judge_key** (:type:`str`, optional) Key to store judge results
- **target_models** (:type:`List[str]`) Combined list of target asynchronous and synchronous models
- **response_extension** (:type:`str`, optional) Extension for response files
- **judge_extension** (:type:`str`, optional) Extension for judge result files
- **judge_type** (:type:`str`) Type of judge ('llm', 'vlm', 'toxicity', etc.)
- **reverse_choice** (:type:`bool`, optional) Whether to reverse choices in mappings

Functions
---------

get_response
~~~~~~~~~~~~

**Definition:**

.. code-block:: python

    get_response(
        task_config: Dict[str, Any],
        data_path: str,
        max_concurrent_tasks: int = 30
    ) -> None

:Parameters:

- **task_config** (:type:`Dict[str, Any]`) Configuration for the current task
- **data_path** (:type:`str`) Path to the data file
- **max_concurrent_tasks** (:type:`int`, optional) Maximum number of concurrent tasks
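
A cap such as ``max_concurrent_tasks`` is commonly enforced with an ``asyncio.Semaphore``. The sketch below shows that general pattern; ``run_bounded`` and ``fake_request`` are hypothetical names introduced here, and nothing in this section confirms that ``get_response`` is implemented this way.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_bounded(
    jobs: List[Callable[[], Awaitable[str]]],
    max_concurrent_tasks: int = 30,
) -> List[str]:
    """Run coroutine factories with at most max_concurrent_tasks in flight."""
    semaphore = asyncio.Semaphore(max_concurrent_tasks)

    async def guarded(job: Callable[[], Awaitable[str]]) -> str:
        # Each job waits here until one of the semaphore slots is free.
        async with semaphore:
            return await job()

    # gather preserves the order of the input jobs in its result list.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_request(i: int) -> str:
    # Stand-in for a model call (hypothetical); sleeps briefly, then answers.
    await asyncio.sleep(0.01)
    return f"response-{i}"


results = asyncio.run(
    run_bounded([lambda i=i: fake_request(i) for i in range(5)], max_concurrent_tasks=2)
)
```

Because ``asyncio.gather`` preserves input order, ``results`` lines up with the submitted jobs even though only two requests run at a time.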

toxicity
~~~~~~~~

**Definition:**

.. code-block:: python

    toxicity(
        data_path: str,
        response_key: List[str]
    ) -> None

:Parameters:

- **data_path** (:type:`str`) Path to the data file
- **response_key** (:type:`List[str]`) Key(s) to extract responses from
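
To make the role of ``response_key`` concrete, the sketch below picks the listed keys out of a single record. The record layout and the helper ``collect_responses`` are assumptions for illustration; the actual data format used by ``toxicity`` is not specified in this section.

```python
from typing import Any, Dict, List


def collect_responses(record: Dict[str, Any], response_key: List[str]) -> Dict[str, Any]:
    """Return the parts of a record stored under the requested keys.

    Keys that are absent from the record are skipped instead of raising.
    """
    return {key: record[key] for key in response_key if key in record}


# Hypothetical record shape: one prompt with per-model responses.
record = {"prompt": "Say hi", "responses": {"model3": "Hi!"}}
selected = collect_responses(record, ["responses", "missing_key"])
```

Here ``selected`` holds only the ``responses`` entry, since ``missing_key`` does not occur in the record.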