<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jetson NanoLLM Live LLaVA setup Archives - OpenZeka EN Blog</title>
	<atom:link href="https://blog-en.openzeka.com/tag/jetson-nanollm-live-llava-setup/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog-en.openzeka.com/tag/jetson-nanollm-live-llava-setup/</link>
	<description>NVIDIA Jetson Developer Kits &#38; Edge Devices</description>
	<lastBuildDate>Fri, 27 Mar 2026 13:18:30 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Jetson Generative AI – Live LLaVA</title>
		<link>https://blog-en.openzeka.com/jetson-generative-ai-live-llava/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Thu, 31 Jul 2025 05:50:37 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Jetson edge vision‑language agent]]></category>
		<category><![CDATA[Jetson NanoLLM Live LLaVA setup]]></category>
		<category><![CDATA[Live LLaVA on Jetson]]></category>
		<category><![CDATA[Multimodal stream inference Jetson]]></category>
		<category><![CDATA[Real‑time VLM camera pipeline]]></category>
		<guid isPermaLink="false">https://blog-en.openzeka.com/?p=773</guid>

					<description><![CDATA[<p>Vision-Language Models reach new heights when applied  ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-live-llava/">Jetson Generative AI – Live LLaVA</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-1"><div>
<div>Vision-Language Models reach new heights when applied to <strong>live video streams</strong>—<strong>Live LLaVA</strong> demonstrates real-time multimodal AI that can see, understand, and describe what&#8217;s happening in your camera feed <strong>continuously</strong> on your Jetson device.</div>
<div></div>
<div>In this article you&#8217;ll learn how to run Live LLaVA with optimized vision-language models like LLaVA and VILA, featuring hardware-accelerated video processing and real-time inference capabilities.</div>
</div>
<p>&nbsp;</p>
</div><div class="fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-three" style="--awb-margin-bottom:-10px;"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements</h3></div>
<div class="table-1">
<p>&nbsp;</p>
<table width="100%">
<thead>
<tr>
<th align="left">
<div>
<div>Hardware / Software</div>
</div>
</th>
<th align="left">
<div>
<div>Notes</div>
</div>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<div>
<div><strong>Jetson AGX Orin (64GB)</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Recommended for best performance</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Jetson AGX Orin (32GB)</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Good performance for most use cases</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Jetson Orin NX (16GB)</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Solid performance</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Jetson Orin Nano (8GB)</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Minimum requirement &#8211; use smaller models</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>JetPack 6 (L4T r36.x)</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Required for latest optimizations</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>USB camera or CSI camera</strong></div>
</div>
</td>
<td align="left">
<div>
<div>For live video input</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>NVMe SSD highly recommended</strong></div>
</div>
</td>
<td align="left">
<div>
<div>For storage speed and space</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>22GB for nano_llm container</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Container image storage</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>&gt;10GB for models</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Vision-language model storage</div>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="fusion-text fusion-text-2" style="--awb-margin-top:30px;"><div>
<div><em><strong>Note:</strong> Follow the NanoVLM tutorial first to familiarize yourself with vision/language models, and see Agent Studio for an interactive pipeline editor.</em></div>
</div>
</div><div class="fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<h4>Supported Models</h4>
</div></h1></div><div class="fusion-text fusion-text-3"><p>The following vision-language models are optimized for Live LLaVA:</p>
<div>
<div>
<div><strong>LLaVA Models:</strong></div>
<blockquote>
<div>`liuhaotian/llava-v1.5-7b`</div>
<div>`liuhaotian/llava-v1.5-13b`</div>
<div>`liuhaotian/llava-v1.6-vicuna-7b`</div>
<div>`liuhaotian/llava-v1.6-vicuna-13b`</div>
</blockquote>
<div><strong>VILA Models:</strong></div>
<blockquote>
<div>`Efficient-Large-Model/VILA-2.7b`</div>
<div>`Efficient-Large-Model/VILA-7b`</div>
<div>`Efficient-Large-Model/VILA-13b`</div>
<div>`Efficient-Large-Model/VILA1.5-3b`</div>
<div>`Efficient-Large-Model/Llama-3-VILA1.5-8B`</div>
<div>`Efficient-Large-Model/VILA1.5-13b`</div>
</blockquote>
<div><strong>Jetson Orin Nano Compatible Models:</strong></div>
<blockquote>
<div>VILA-2.7b</div>
<div>VILA1.5-3b</div>
<div>VILA-7b</div>
<div>Llava-7b</div>
<div>Obsidian-3B</div>
</blockquote>
</div>
</div>
</div><div class="fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<h3>Step-by-Step Setup</h3>
</div></h3></div><div class="fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<h4>1. Verify Camera Connection</h4>
</div></h1></div><div class="fusion-text fusion-text-4 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><p>Check that your camera is properly connected and detected:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-1 > .CodeMirror, .fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-1 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_1" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_1" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_1" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># List available video devices
ls /dev/video*

# Test camera with GStreamer (optional)
gst-launch-1.0 v4l2src device=/dev/video0 ! autovideosink</textarea></div><div class="fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>2. Clone and setup jetson-containers</h4></h4></div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-2 > .CodeMirror, .fusion-syntax-highlighter-2 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-2 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_2" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_2" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_2" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh</textarea></div><div class="fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<p>&nbsp;</p>
<h4>3. Launch Live LLaVA</h4>
</div></h1></div><div class="fusion-text fusion-text-5 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><p>Start the VideoQuery agent with your camera:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-3 > .CodeMirror, .fusion-syntax-highlighter-3 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-3 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_3" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_3" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_3" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output</textarea></div><div class="fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>4. Access the Web Interface</h4></h4></div><div class="fusion-text fusion-text-6"><div>
<div>Navigate your browser to:</div>
<div>
<div>
<blockquote>
<div>https://&lt;jetson-ip&gt;:8050</div>
</blockquote>
<div>
<div>
<div><strong>⚠️ Chrome Recommended:</strong> For best WebRTC performance, use Chrome with `chrome://flags#enable-webrtc-hide-local-ips-with-mdns` disabled.</div>
</div>
</div>
</div>
</div>
</div>
</div><div class="fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>5. Configure Prompts</h4></h4></div><div class="fusion-text fusion-text-7 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><p>In the web interface, you can:</p>
<div>&#8211; <strong>Set custom prompts</strong> for continuous analysis</div>
<div>&#8211; <strong>Adjust inference frequency</strong> for real-time performance</div>
<div>&#8211; <strong>Monitor live video feed</strong> with AI descriptions</div>
</div><div class="fusion-image-element awb-imageframe-style awb-imageframe-style-below awb-imageframe-style-1" style="--awb-margin-top:10px;--awb-margin-bottom:20px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-1 hover-type-none"><img fetchpriority="high" decoding="async" width="1024" height="632" src="https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-1024x632.png" alt class="img-responsive wp-image-781" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-200x123.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-400x247.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-600x370.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-800x494.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-1200x741.png 1200w" sizes="(max-width: 640px) 100vw, 1024px" /></span><div class="awb-imageframe-caption-container" style="text-align:center;"><div class="awb-imageframe-caption"><p class="awb-imageframe-caption-text">Live LLaVA Face Detection</p></div></div></div><div class="fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-one" style="--awb-margin-top:-30px;"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Real-time Object Detection</h4></h1></div><div class="fusion-text fusion-text-8"><div>
<div>Live LLaVA can continuously analyze your video feed, detecting and describing objects, people, and activities in real-time:</div>
</div>
</div><div class="fusion-image-element awb-imageframe-style awb-imageframe-style-below awb-imageframe-style-2" style="text-align:center;--awb-aspect-ratio:16 / 9;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-2 hover-type-none has-aspect-ratio"><img decoding="async" width="500" height="282" src="https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_object_detection.gif" class="img-responsive wp-image-782 img-with-aspect-ratio" data-parent-fit="cover" data-parent-container=".fusion-image-element" alt /></span><div class="awb-imageframe-caption-container" style="text-align:center;"><div class="awb-imageframe-caption"><p class="awb-imageframe-caption-text">Live LLaVA Object Detection</p></div></div></div><div class="fusion-title title fusion-title-10 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<h4>Custom Prompting</h4>
</div></h1></div><div class="fusion-text fusion-text-9 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><p>You can customize the analysis with specific prompts:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-4 > .CodeMirror, .fusion-syntax-highlighter-4 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-4 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_4" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_4" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_4" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Example prompts
"Describe what you see in detail"
"What objects are on the desk?"
"Count the number of people in the scene"
"What is the person doing?"
"Describe the lighting and environment"</textarea></div><div class="fusion-title title fusion-title-11 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Pre-recorded Video Analysis</h4></h3></div><div class="fusion-text fusion-text-10"><div>
<div>Process existing video files instead of live camera feeds:</div>
</div>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-5 > .CodeMirror, .fusion-syntax-highlighter-5 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-5 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_5" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_5" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_5" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">jetson-containers run \
  -v /path/to/your/videos:/mount \
  $(autotag nano_llm) \
    python3 -m nano_llm.agents.video_query --api=mlc \
      --model Efficient-Large-Model/VILA1.5-3b \
      --max-context-len 256 \
      --max-new-tokens 32 \
      --video-input /mount/my_video.mp4 \
      --video-output /mount/output.mp4 \
      --prompt "What does the weather look like?"</textarea></div><div class="fusion-title title fusion-title-12 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Supported Formats</h4></h1></div><div class="fusion-text fusion-text-11"><div>
<div><strong>Input Formats:</strong></div>
<blockquote>
<div>MP4, MKV, AVI, FLV (with H.264/H.265 encoding)</div>
<div>Live network streams (RTP, RTSP, WebRTC) &#8211; see the example below</div>
<div>USB/CSI cameras</div>
</blockquote>
<div><strong>Output Formats:</strong></div>
<blockquote>
<div>Video files (MP4, AVI, etc.)</div>
<div>Network streams (WebRTC, RTSP)</div>
<div>Display output</div>
</blockquote>
</div>
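<p><em>As a quick sketch of the network-stream inputs listed above, the same VideoQuery launch command from step 3 can take an RTSP URL in place of <code>/dev/video0</code>, assuming the stream is H.264/H.265-encoded; the address below is a placeholder for your own camera or server:</em></p>
<pre># Sketch: analyzing a live RTSP stream instead of a local camera
# (replace the rtsp:// URL with your own stream address)
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input rtsp://&lt;camera-ip&gt;:554/stream \
    --video-output webrtc://@:8554/output</pre>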
</div><div class="fusion-title title fusion-title-13 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>NanoDB Integration</h4></h1></div><div class="fusion-text fusion-text-12 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><p>Enable reverse-image search and database tagging by integrating with NanoDB:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-6 > .CodeMirror, .fusion-syntax-highlighter-6 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-6 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_6" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_6" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_6" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output \
    --nanodb /data/nanodb/coco/2017</textarea></div><div class="fusion-text fusion-text-13" style="--awb-margin-top:10px;"><p>This enables:</p>
<div>&#8211; <strong>Reverse-image search</strong> against your database</div>
<div>&#8211; <strong>One-shot recognition</strong> tasks via web UI</div>
<div>&#8211; <strong>Automatic tagging</strong> of incoming images</div>
</div><div class="fusion-title title fusion-title-14 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<h4>Video VILA &#8211; Multi-frame Analysis</h4>
</div></h1></div><div class="fusion-text fusion-text-14 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><div>
<div>VILA-1.5 models can analyze multiple frames simultaneously for temporal understanding:</div>
</div>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-7 > .CodeMirror, .fusion-syntax-highlighter-7 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-7 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_7" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_7" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_7" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.vision.video \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-images 8 \
    --max-new-tokens 48 \
    --video-input /data/my_video.mp4 \
    --video-output /data/my_output.mp4 \
    --prompt 'What changes occurred in the video?'</textarea></div><div class="fusion-title title fusion-title-15 fusion-sep-none fusion-title-text fusion-title-size-three" style="--awb-margin-bottom:-20px;"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><h3>Troubleshooting</h3></h3></div><div class="fusion-title title fusion-title-16 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>How to fix freezing issues while loading the model?</h4></h1></div><div class="fusion-text fusion-text-15"><p>The documentation uses the old <code>awq4</code> quantization; instead, use the <code><span style="color: #38c92e;">--quantization q4f16_1</span></code> parameter.<br />The 13B model eventually freezes on the Jetson AGX Orin 32GB because it runs out of tokens; if speed is needed, we recommend using <b>VILA-7B</b> or <b>VILA-2.7B</b> instead.</p>
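<p><em>A minimal sketch of where the flag fits, based on the launch command from step 3 (the exact flag placement here is illustrative):</em></p>
<pre># Sketch: launching VideoQuery with q4f16_1 quantization instead of the older awq4
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --quantization q4f16_1 \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output</pre>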
</div><div class="fusion-title title fusion-title-17 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>How to fix the issue of the camera not being detected</h4></h1></div><div class="fusion-text fusion-text-16"><p>To make a USB camera accessible inside the container, add the parameter <code><span style="color: #38c92e;">--device /dev/video0</span></code> when running the container. This maps the host&#8217;s camera device into the container, allowing applications inside to access the video stream as if it were running natively on the host system.</p>
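<p><em>A sketch of passing the device mapping through the launcher; this assumes <code>jetson-containers run</code> forwards extra flags to <code>docker run</code> and that the camera enumerates as <code>/dev/video0</code>:</em></p>
<pre># Sketch: explicitly mapping the host camera into the container
jetson-containers run --device /dev/video0 $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output</pre>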
</div><div class="fusion-title title fusion-title-18 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>How to Avoid Color Distortion Issues with the Logitech C505e Using the MJPEG Codec?</h4></h1></div><div class="fusion-text fusion-text-17"><p>To prevent color distortion problems on the Logitech C505e camera, we recommend using the <code><span style="color: #38c92e;">--video-input-codec mjpeg</span></code> parameter. This forces the camera to use the MJPEG codec, which is better supported and helps maintain accurate color reproduction.</p>
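<p><em>A sketch of the camera launch command from step 3 with the codec flag appended:</em></p>
<pre># Sketch: forcing MJPEG capture on a USB camera such as the Logitech C505e
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --video-input /dev/video0 \
    --video-input-codec mjpeg \
    --video-output webrtc://@:8554/output</pre>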
</div><div class="fusion-title title fusion-title-19 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Resolution Limitation</h4></h1></div><div class="fusion-text fusion-text-18 fusion-text-no-margin" style="--awb-margin-bottom:-10px;"><p>For stable FPS performance, use the parameters <code><span style="color: #38c92e;">--video-input-width 1280</span></code> and <code><span style="color: #38c92e;">--video-input-height 720</span></code>. These settings limit the video resolution to 1280&#215;720, helping maintain smoother and more consistent frame rates.</p>
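<p><em>A sketch of the camera launch command from step 3 with the input capped at 720p:</em></p>
<pre># Sketch: limiting camera input to 1280x720 for a steadier frame rate
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --video-input /dev/video0 \
    --video-input-width 1280 \
    --video-input-height 720 \
    --video-output webrtc://@:8554/output</pre>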
</div>
<div class="table-1">
<p>&nbsp;</p>
<table width="100%">
<thead>
<tr>
<th align="left">
<div>
<div>Issue</div>
</div>
</th>
<th align="left">
<div>
<div>Fix</div>
</div>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<div>
<div><strong>Camera not detected</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Check USB connection, verify with `ls /dev/video*`</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>WebRTC not working</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Use Chrome, disable WebRTC local IP hiding flag</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Out of memory errors</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Use smaller model (VILA1.5-3b), reduce context length</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Low frame rate</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Reduce max-new-tokens, use smaller model, check camera resolution</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Video codec errors</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Verify input format is H.264/H.265, check jetson_utils installation</div>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="fusion-text fusion-text-19" style="--awb-margin-top:20px;"><p><em><strong>For more information about Live LLaVA and advanced configurations, visit the <a style="color: #38c92e;" href="https://github.com/dusty-nv/NanoLLM"><span style="color: #38c92e;"><span style="color: #38c92e;">NanoLLM <span style="color: #38c92e;">G</span></span></span><span style="color: #38c92e;"><span style="color: #38c92e;">itHub</span> repository.</span></a></strong></em></p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-live-llava/">Jetson Generative AI – Live LLaVA</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
