# NeMo-Inspector **Repository Path**: mirrors_NVIDIA/NeMo-Inspector ## Basic Information - **Project Name**: NeMo-Inspector - **Description**: A tool for an analysis of LLM generations. - **Primary Language**: Unknown - **License**: Apache-2.0 - **Default Branch**: main - **Homepage**: None - **GVP Project**: No ## Statistics - **Stars**: 0 - **Forks**: 0 - **Created**: 2024-12-06 - **Last Updated**: 2026-01-17 ## Categories & Tags **Categories**: Uncategorized **Tags**: None ## README # NeMo Inspector NeMo Inspector is a tool designed to help you analyze Large Language Model (LLM) generations. It lets you explore and manipulate existing generations, apply filters, sorting criteria, and compute statistics. ## Prerequisites 1. **Clone and Install the Tool:** ```shell git clone git@github.com:NVIDIA/NeMo-Inspector.git cd NeMo-Inspector pip install . ``` 2. **Launch the Tool:** ```shell nemo_inspector ``` This will start a local server that you can access through your browser. ## Analyze Page The Analyze page helps you work with pre-generated outputs. To use it, provide paths to the generation files using command-line arguments. For example: ```shell nemo_inspector --model_prediction \ generation1='/path/to/generation1/output-greedy.jsonl' \ generation2='/path/to/generation2/output-rs*.jsonl' ``` Once loaded, the Analyze page lets you: - **Sort and Filter Results:** Apply custom filtering and sorting functions to refine the displayed data. - **Compare Generations:** View outputs from multiple generation runs side-by-side. - **Modify and Label Data:** Update or annotate samples and save the changes for future reference. - **Compute Statistics:** Generate both custom and general statistics to summarize your data. ### Filtering The tool supports two filtering modes: **Filter Files** mode and **Filter Questions** mode. You can define custom filtering functions in Python and run them directly in the UI. #### Filter Files Mode - In this mode, the filtering function will be run on each sample across different files simultaneously. - The input to the filtering function is a dictionary where keys represent generation names and values are JSON objects for that sample. - The custom function should return a Boolean value (`True` to keep the sample, `False` to filter it out). Example of a custom filtering function: ```python def custom_filtering_function(error_message: str) -> bool: # Implement your logic here return 'timeout' not in error_message # This line will be used for the filtering: custom_filtering_function(data['generation1']['error_message']) ``` **Note:** The last line of the custom filtering function is used for filtering. All preceding lines are just for computation. To apply multiple conditions to multiple generations, use the `&&` separator. For instance: ```python data['generation1']['is_correct'] && not data['generation2']['is_correct'] ``` **Important:** In Filter Files mode, do not write multi-generation conditions without using `&&`. Each condition should be separated by `&&`. #### Filter Questions Mode - In this mode, the function filters each question across multiple files without filtering out entire files. - The input is a dictionary of generation names mapping to **lists** of JSON data for that question. In this mode, you write conditions without the `&&` operator. For example: ```python data['generation1'][0]['is_correct'] and not data['generation2'][0]['is_correct'] ``` This example filters out questions where the first generation is correct and the second generation is incorrect. It can also compare fields directly: ```python data['generation1'][0]['is_correct'] != data['generation2'][0]['is_correct'] ``` **Note:** These examples cannot be used in Filter Files mode. ### Sorting Sorting functions are similar to filtering functions, but there are key differences: 1. **Scope:** Sorting functions operate on individual data entries (not dictionaries with multiple generations). 2. **Cross-Generations:** Sorting cannot be applied across multiple generations at once. You must sort one generation at a time. A correct sorting function might look like this: ```python def custom_sorting_function(generation: str): # Sort by the length of the generation text return len(generation) # This line will be used for the sorting: custom_sorting_function(data['generation']) ``` ### Statistics NeMo Inspector supports two types of statistics: 1. **Custom Statistics:** Applied to the samples of a single question (for each generation). Default custom statistics include: - `correct_responses` - `wrong_responses` - `no_responses` 2. **General Custom Statistics:** Applied across all questions and all generations. Default general custom statistics include: - `dataset size` - `overall number of samples` - `generations per sample` You can change the existing or define your own Custom and General Custom Statistics functions. **Custom Statistics Example:** ```python def unique_error_counter(datas): # `datas` is a list of JSONs (one per file) for a single question unique_errors = set() for data in datas: unique_errors.add(data.get('error_message')) return len(unique_errors) def number_of_runs(datas): return len(datas) # Map function names to functions {'unique_errors': unique_error_counter, 'number_of_runs': number_of_runs} ``` **General Custom Statistics Example:** ```python def overall_unique_error_counter(datas): # `datas` is a list of lists of dictionaries, # where datas[question_index][file_index] is a JSON record unique_errors = set() for question_data in datas: for file_data in question_data: unique_errors.add(file_data.get('error_message')) return len(unique_errors) # Map function names to functions {'unique_errors': overall_unique_error_counter} ``` **Note:** The final line in both the Custom and General Custom Statistics code blocks should be a dictionary mapping function names to their corresponding functions. ### Modifications You can update each sample in the dataset programmatically. At the end of the code block, return the updated sample dictionary: ```python # For example, strip leading and trailing whitespace from the "generation" field {**data, 'generation': data['generation'].strip()} ```