sensein/sails-vlm

VLM Baseline Evaluation — Documentation

Overview

The vlm_baseline folder provides a baseline framework for automatically annotating videos with Video-Language Models (VLMs). The primary goal is to automate the annotation of SAILS videos, which is currently done manually.

Key Concepts

  • Automatic Annotation: Videos that are currently manually annotated will be processed automatically using VLMs.
  • Annotation Types:
    • Classifications: Categorical labels (e.g., gesture types)
    • Descriptions: Free-text descriptions of video content (e.g., activity)
  • Evaluation: Each annotation type is evaluated with its own metrics to measure VLM performance.
  • Inference Process: Run VLM inference on all available videos and compare predictions against ground truth annotations.
  • Output Format: Input videos are read from the BIDS folder, and evaluation results are saved to the locations specified in the configuration file.

Key Architecture Principle:

  • models/ handles model interaction. To evaluate a new VLM, implement its wrapper here
  • postprocessing/ converts raw VLM output into task-specific prediction format
  • evaluation/ computes metrics comparing predictions vs. ground truth
  • runners/ orchestrates the entire pipeline (config loading, data iteration, output saving, evaluation)

How to Run

Start an srun session with a GPU, then run the following from the repo root:

poetry run python vlm_baseline/runners/run_prediction.py vlm_baseline/configs/ovis2/response_to_name.yaml

Configuration File (YAML)

A config defines one complete experiment (one model + one task + one dataset + one prompt + one output directory). To try a VLM on a particular annotation task, create a new configuration file with the same structure as the existing ones.

Handling YAML booleans

YAML 1.1 parsers (such as PyYAML) treat the following unquoted values as booleans, in addition to true and false:

  • no
  • yes
  • on
  • off

If any of these are used as ground-truth labels, enclose them in single or double quotes. Otherwise they are converted to their boolean counterparts, which produces unexpected behavior.
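One way to catch this early is to check the labels once the config has been parsed. The sketch below assumes the parsed labels are available as a Python list. `find_non_string_labels` is a hypothetical helper, not something the repository provides.

```python
def find_non_string_labels(labels):
    """Return the labels that are not strings (for example YAML booleans)."""
    return [label for label in labels if not isinstance(label, str)]


# What a YAML 1.1 parser such as PyYAML yields for `[yes, no, "maybe"]`:
# the unquoted yes/no become True/False, the quoted value stays a string.
parsed_labels = [True, False, "maybe"]

bad = find_non_string_labels(parsed_labels)
if bad:
    print(f"Quote these labels in the config: {bad}")
```

Failing fast like this is cheaper than discovering after a full inference run that no prediction ever matched a label.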

Handling missing / NaN labels in the ground-truth CSV

Some label columns in the annotation CSV contain missing values (pandas NaN).

  • For classification tasks, the runner by default converts missing values to the literal string "NaN" internally.
  • For description tasks, the runner converts missing values to the empty string "".

If you want to exclude unlabeled rows entirely, set the following in your config:

data:
  drop_missing_labels: true

When enabled, rows with missing ground-truth labels are removed before any VLM inference (those videos are not processed and do not appear in predictions/eval).
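The behavior described above can be sketched in plain Python. This is an illustration under assumptions, not the runner's actual code: the row structure, the `prepare_labels` and `is_missing` helpers, and the task type strings `"classification"` and `"description"` are all hypothetical.

```python
import math


def is_missing(value):
    """True for None or a float NaN (how pandas represents a missing cell)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prepare_labels(rows, label_col, task_type, drop_missing_labels=False):
    """Drop or fill rows whose ground-truth label is missing.

    rows: list of dicts, one per video (hypothetical structure).
    task_type: "classification" or "description" (assumed values).
    """
    fill = "NaN" if task_type == "classification" else ""
    prepared = []
    for row in rows:
        if is_missing(row[label_col]):
            if drop_missing_labels:
                continue  # the video is never sent to the VLM
            row = {**row, label_col: fill}  # copy, so the input is not mutated
        prepared.append(row)
    return prepared


rows = [
    {"video": "a.mp4", "gesture": "wave"},
    {"video": "b.mp4", "gesture": float("nan")},
]
print(prepare_labels(rows, "gesture", "classification"))
print(prepare_labels(rows, "gesture", "classification", drop_missing_labels=True))
```

Filtering before inference, rather than at evaluation time, also saves the GPU time that would otherwise be spent on videos that cannot be scored.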

Models (models/)

This folder contains thin wrappers around VLM backends (Ovis2, Qwen2.5, …). Each wrapper loads the model, runs inference on a video and a prompt, and returns the raw generated text.

Postprocessing (postprocessing/)

Postprocessing converts raw model output into the prediction type expected by the task, then validates the result.
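For a classification task, the convert-then-validate step might look like the sketch below. The function name `postprocess_classification` and the normalization rules (trimming whitespace, quotes, and trailing periods, then matching case-insensitively) are assumptions for illustration. The repository's actual rules may differ.

```python
def postprocess_classification(raw_text, allowed_labels):
    """Map raw generated text to one allowed label, or raise ValueError."""
    # Trim whitespace first, then surrounding periods and quotes.
    cleaned = raw_text.strip().strip(".\"'").lower()
    by_lower = {label.lower(): label for label in allowed_labels}
    if cleaned in by_lower:
        return by_lower[cleaned]
    # Validation: anything outside the label set is rejected, not guessed.
    raise ValueError(f"Unexpected model output: {raw_text!r}")


print(postprocess_classification(" Yes.\n", ["yes", "no"]))
```

Rejecting unexpected outputs explicitly keeps malformed generations from being silently counted as one of the valid classes.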

Evaluation (evaluation/)

Evaluation metrics depend on task.type. No metrics are implemented yet for free-text tasks.

Classification Evaluation

Common metrics include:

  • Accuracy (can be misleading on imbalanced datasets)
  • Macro-F1 / Weighted-F1
  • Per-class precision/recall/F1
  • Confusion matrix

Inputs: ground-truth labels from the CSV and the postprocessed predictions.
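To make the metrics concrete, here is a minimal pure-Python sketch that computes accuracy, per-class precision/recall/F1, macro-F1, and a confusion matrix from two label lists. It is not the repository's evaluation code. The `classification_report` function below is hypothetical and assumes non-empty inputs with string labels.

```python
from collections import Counter


def classification_report(y_true, y_pred):
    """Compute accuracy, per-class P/R/F1, macro-F1 and a confusion matrix."""
    labels = sorted(set(y_true) | set(y_pred))
    confusion = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    per_class = {}
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(confusion[(label, label)] for label in labels) / len(y_true)
    # Macro-F1 is the unweighted mean of per-class F1, so rare classes
    # count as much as frequent ones.
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    return {"accuracy": accuracy, "macro_f1": macro_f1,
            "per_class": per_class, "confusion": confusion}


report = classification_report(["yes", "yes", "yes", "no"],
                               ["yes", "yes", "no", "no"])
print(report["accuracy"], report["macro_f1"])
```

Because macro-F1 averages over classes rather than samples, it drops when a minority class is predicted poorly even if accuracy stays high.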

How to Add a New Model

To integrate a new VLM into the baseline framework, follow these steps:

1. Create Model Wrapper

Create a new file models/<new_model>.py with a class that inherits from BaseVLM:

class NewModelVLM(BaseVLM):
    def load(self):
        # Load weights/processor, set device, eval mode
        pass

    def generate(self, video_path, prompt, video_cfg=None, gen_cfg=None):
        # Implement inference logic
        # Return VLMRawOutput
        pass

    # Usually no need to override predict()

2. Register the Model

Update models/__init__.py:

  • Import your new class
  • Add a case in the load_model() function for your model's config["name"]
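The registration step could look roughly like the sketch below. It is only an illustration: the real load_model() may use an if/elif chain instead of a dictionary, and the `MODEL_REGISTRY` name, the stand-in `BaseVLM`, and the constructor taking the config are assumptions made so the example runs on its own.

```python
class BaseVLM:
    """Stand-in for the repository's BaseVLM so the sketch is self-contained."""

    def __init__(self, config):
        self.config = config


class NewModelVLM(BaseVLM):
    """Placeholder for the wrapper written in step 1."""


# Hypothetical registry: config["name"] -> wrapper class.
MODEL_REGISTRY = {"your_model_name": NewModelVLM}


def load_model(config):
    """Return a wrapper instance matching config["name"]."""
    name = config["name"]
    try:
        model_cls = MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown model name: {name!r}") from None
    return model_cls(config)


model = load_model({"name": "your_model_name", "device": "cuda"})
print(type(model).__name__)
```

Whatever form the dispatch takes, the string it matches on must be identical to the `name` field in the model section of the config.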

3. Create Configuration

Add a config YAML file under configs/<new_model>/...yaml. It needs at least the annotation description, the prompt, and the other task settings, plus a model section:

model:
  name: "your_model_name"
  model_path: "HF_repo_id"  # or local path
  device: "cuda"
  precision: "bf16"

4. Test the Integration

Run the existing runner with the new config:

poetry run python vlm_baseline/runners/run_prediction.py vlm_baseline/configs/<new_model>/your_config.yaml

Note: Downstream postprocessing automatically determines whether it's a classification or free-text task based on the configuration.
