<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Jetson NanoLLM Live LLaVA setup Archives - OpenZeka EN Blog</title>
	<atom:link href="https://blog-en.openzeka.com/tag/jetson-nanollm-live-llava-setup/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog-en.openzeka.com/tag/jetson-nanollm-live-llava-setup/</link>
	<description>NVIDIA Jetson Developer Kits &#38; Edge Devices</description>
	<lastBuildDate>Fri, 27 Mar 2026 13:18:30 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Jetson Generative AI – Live LLaVA</title>
		<link>https://blog-en.openzeka.com/jetson-generative-ai-live-llava/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Thu, 31 Jul 2025 05:50:37 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Jetson edge vision‑language agent]]></category>
		<category><![CDATA[Jetson NanoLLM Live LLaVA setup]]></category>
		<category><![CDATA[Live LLaVA on Jetson]]></category>
		<category><![CDATA[Multimodal stream inference Jetson]]></category>
		<category><![CDATA[Real‑time VLM camera pipeline]]></category>
		<guid isPermaLink="false">https://blog-en.openzeka.com/?p=773</guid>

					<description><![CDATA[<p>Vision-Language Models reach new heights when applied  ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-live-llava/">Jetson Generative AI – Live LLaVA</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-1"><div>
<div>Vision-Language Models reach new heights when applied to <strong>live video streams</strong>—<strong>Live LLaVA</strong> demonstrates real-time multimodal AI that can see, understand, and describe what&#8217;s happening in your camera feed <strong>continuously</strong> on your Jetson device.</div>
<div></div>
<div>In this article you&#8217;ll learn how to run Live LLaVA with optimized vision-language models like LLaVA and VILA, featuring hardware-accelerated video processing and real-time inference capabilities.</div>
</div>
<p>&nbsp;</p>
</div><div class="fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-three" style="--awb-margin-bottom:-10px;"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements</h3></div>
<div class="table-1">
<p>&nbsp;</p>
<table width="100%">
<thead>
<tr>
<th align="left">
<div>
<div>Hardware / Software</div>
</div>
</th>
<th align="left">
<div>
<div>Notes</div>
</div>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<div>
<div><strong>Jetson AGX Orin (64GB)</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Recommended for best performance</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Jetson AGX Orin (32GB)</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Good performance for most use cases</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Jetson Orin NX (16GB)</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Solid performance</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Jetson Orin Nano (8GB)</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Minimum requirement &#8211; use smaller models</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>JetPack 6 (L4T r36.x)</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Required for latest optimizations</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>USB camera or CSI camera</strong></div>
</div>
</td>
<td align="left">
<div>
<div>For live video input</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>NVMe SSD highly recommended</strong></div>
</div>
</td>
<td align="left">
<div>
<div>For storage speed and space</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>22GB for nano_llm container</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Container image storage</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>&gt;10GB for models</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Vision-language model storage</div>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="fusion-text fusion-text-2" style="--awb-margin-top:30px;"><div>
<div><em><strong>Note:</strong> Follow the NanoVLM tutorial first to familiarize yourself with vision/language models, and see Agent Studio for an interactive pipeline editor.</em></div>
</div>
</div><div class="fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<h4>Supported Models</h4>
</div></h1></div><div class="fusion-text fusion-text-3"><p>The following vision-language models are optimized for Live LLaVA:</p>
<div>
<div>
<div><strong>LLaVA Models:</strong></div>
<blockquote>
<div>`liuhaotian/llava-v1.5-7b`</div>
<div>`liuhaotian/llava-v1.5-13b`</div>
<div>`liuhaotian/llava-v1.6-vicuna-7b`</div>
<div>`liuhaotian/llava-v1.6-vicuna-13b`</div>
</blockquote>
<div><strong>VILA Models:</strong></div>
<blockquote>
<div>`Efficient-Large-Model/VILA-2.7b`</div>
<div>`Efficient-Large-Model/VILA-7b`</div>
<div>`Efficient-Large-Model/VILA-13b`</div>
<div>`Efficient-Large-Model/VILA1.5-3b`</div>
<div>`Efficient-Large-Model/Llama-3-VILA1.5-8B`</div>
<div>`Efficient-Large-Model/VILA1.5-13b`</div>
</blockquote>
<div><strong>Jetson Orin Nano Compatible Models:</strong></div>
<blockquote>
<div>VILA-2.7b</div>
<div>VILA1.5-3b</div>
<div>VILA-7b</div>
<div>Llava-7b</div>
<div>Obsidian-3B</div>
</blockquote>
</div>
</div>
</div><div class="fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<h3>Step-by-Step Setup</h3>
</div></h3></div><div class="fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<h4>1. Verify Camera Connection</h4>
</div></h1></div><div class="fusion-text fusion-text-4 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><p>Check that your camera is properly connected and detected:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-1 > .CodeMirror, .fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-1 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_1" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_1" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_1" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># List available video devices
ls /dev/video*

# Test camera with GStreamer (optional)
gst-launch-1.0 v4l2src device=/dev/video0 ! autovideosink</textarea></div><div class="fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>2. Clone and setup jetson-containers</h4></h4></div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-2 > .CodeMirror, .fusion-syntax-highlighter-2 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-2 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_2" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_2" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_2" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh</textarea></div><div class="fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<p>&nbsp;</p>
<h4>3. Launch Live LLaVA</h4>
</div></h1></div><div class="fusion-text fusion-text-5 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><p>Start the VideoQuery agent with your camera:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-3 > .CodeMirror, .fusion-syntax-highlighter-3 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-3 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_3" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_3" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_3" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output</textarea></div><div class="fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>4. Access the Web Interface</h4></h4></div><div class="fusion-text fusion-text-6"><div>
<div>Navigate your browser to:</div>
<div>
<div>
<blockquote>
<div>https://&lt;jetson-ip&gt;:8050</div>
</blockquote>
<div>
<div>
<div><strong>⚠️ Chrome Recommended:</strong> For best WebRTC performance, use Chrome with `chrome://flags#enable-webrtc-hide-local-ips-with-mdns` disabled.</div>
</div>
</div>
</div>
</div>
</div>
</div><div class="fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>5. Configure Prompts</h4></h4></div><div class="fusion-text fusion-text-7 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><p>In the web interface, you can:</p>
<div>&#8211; <strong>Set custom prompts</strong> for continuous analysis</div>
<div>&#8211; <strong>Adjust inference frequency</strong> for real-time performance</div>
<div>&#8211; <strong>Monitor live video feed</strong> with AI descriptions</div>
</div><div class="fusion-image-element awb-imageframe-style awb-imageframe-style-below awb-imageframe-style-1" style="--awb-margin-top:10px;--awb-margin-bottom:20px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-1 hover-type-none"><img fetchpriority="high" decoding="async" width="1024" height="632" src="https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-1024x632.png" alt class="img-responsive wp-image-781" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-200x123.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-400x247.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-600x370.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-800x494.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_face_detection-1200x741.png 1200w" sizes="(max-width: 640px) 100vw, 1024px" /></span><div class="awb-imageframe-caption-container" style="text-align:center;"><div class="awb-imageframe-caption"><p class="awb-imageframe-caption-text">Live LLaVA Face Detection</p></div></div></div><div class="fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-one" style="--awb-margin-top:-30px;"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Real-time Object Detection</h4></h1></div><div class="fusion-text fusion-text-8"><div>
<div>Live LLaVA can continuously analyze your video feed, detecting and describing objects, people, and activities in real-time:</div>
</div>
</div><div class="fusion-image-element awb-imageframe-style awb-imageframe-style-below awb-imageframe-style-2" style="text-align:center;--awb-aspect-ratio:16 / 9;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-2 hover-type-none has-aspect-ratio"><img decoding="async" width="500" height="282" src="https://blog-en.openzeka.com/wp-content/uploads/2025/07/live_llava_object_detection.gif" class="img-responsive wp-image-782 img-with-aspect-ratio" data-parent-fit="cover" data-parent-container=".fusion-image-element" alt /></span><div class="awb-imageframe-caption-container" style="text-align:center;"><div class="awb-imageframe-caption"><p class="awb-imageframe-caption-text">Live LLaVA Object Detection</p></div></div></div><div class="fusion-title title fusion-title-10 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<h4>Custom Prompting</h4>
</div></h1></div><div class="fusion-text fusion-text-9 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><p>You can customize the analysis with specific prompts:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-4 > .CodeMirror, .fusion-syntax-highlighter-4 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-4 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_4" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_4" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_4" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Example prompts
"Describe what you see in detail"
"What objects are on the desk?"
"Count the number of people in the scene"
"What is the person doing?"
"Describe the lighting and environment"</textarea></div><div class="fusion-title title fusion-title-11 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Pre-recorded Video Analysis</h4></h3></div><div class="fusion-text fusion-text-10"><div>
<div>Process existing video files instead of live camera feeds:</div>
</div>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-5 > .CodeMirror, .fusion-syntax-highlighter-5 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-5 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_5" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_5" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_5" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">jetson-containers run \
  -v /path/to/your/videos:/mount \
  $(autotag nano_llm) \
    python3 -m nano_llm.agents.video_query --api=mlc \
      --model Efficient-Large-Model/VILA1.5-3b \
      --max-context-len 256 \
      --max-new-tokens 32 \
      --video-input /mount/my_video.mp4 \
      --video-output /mount/output.mp4 \
      --prompt "What does the weather look like?"</textarea></div><div class="fusion-title title fusion-title-12 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Supported Formats</h4></h1></div><div class="fusion-text fusion-text-11"><div>
<div><strong>Input Formats:</strong></div>
<blockquote>
<div>MP4, MKV, AVI, FLV (with H.264/H.265 encoding)</div>
<div>Live network streams (RTP, RTSP, WebRTC) &#8211; see the example below</div>
<div>USB/CSI cameras</div>
</blockquote>
<div><strong>Output Formats:</strong></div>
<blockquote>
<div>Video files (MP4, AVI, etc.)</div>
<div>Network streams (WebRTC, RTSP)</div>
<div>Display output</div>
</blockquote>
</div>
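<p><em>As a quick sketch of the network-stream inputs listed above, the same VideoQuery launch command from step 3 can take an RTSP URL in place of <code>/dev/video0</code>, assuming the stream is H.264/H.265-encoded; the address below is a placeholder for your own camera or server:</em></p>
<pre># Sketch: analyzing a live RTSP stream instead of a local camera
# (replace the rtsp:// URL with your own stream address)
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input rtsp://&lt;camera-ip&gt;:554/stream \
    --video-output webrtc://@:8554/output</pre>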
</div><div class="fusion-title title fusion-title-13 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>NanoDB Integration</h4></h1></div><div class="fusion-text fusion-text-12 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><p>Enable reverse-image search and database tagging by integrating with NanoDB:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-6 > .CodeMirror, .fusion-syntax-highlighter-6 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-6 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_6" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_6" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_6" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output \
    --nanodb /data/nanodb/coco/2017</textarea></div><div class="fusion-text fusion-text-13" style="--awb-margin-top:10px;"><p>This enables:</p>
<div>&#8211; <strong>Reverse-image search</strong> against your database</div>
<div>&#8211; <strong>One-shot recognition</strong> tasks via web UI</div>
<div>&#8211; <strong>Automatic tagging</strong> of incoming images</div>
</div><div class="fusion-title title fusion-title-14 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><div>
<h4>Video VILA &#8211; Multi-frame Analysis</h4>
</div></h1></div><div class="fusion-text fusion-text-14 fusion-text-no-margin" style="--awb-margin-bottom:10px;"><div>
<div>VILA-1.5 models can analyze multiple frames simultaneously for temporal understanding:</div>
</div>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-7 > .CodeMirror, .fusion-syntax-highlighter-7 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-7 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_7" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_7" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_7" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.vision.video \
    --model Efficient-Large-Model/VILA1.5-3b \
    --max-images 8 \
    --max-new-tokens 48 \
    --video-input /data/my_video.mp4 \
    --video-output /data/my_output.mp4 \
    --prompt 'What changes occurred in the video?'</textarea></div><div class="fusion-title title fusion-title-15 fusion-sep-none fusion-title-text fusion-title-size-three" style="--awb-margin-bottom:-20px;"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><h3>Troubleshooting</h3></h3></div><div class="fusion-title title fusion-title-16 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>How to fix freezing issues while loading the model?</h4></h1></div><div class="fusion-text fusion-text-15"><p>The documentation uses the old <code>awq4</code> quantization; instead, use the <code><span style="color: #38c92e;">--quantization q4f16_1</span></code> parameter.<br />The 13B model eventually freezes on the Jetson AGX Orin 32GB because it runs out of tokens; if speed is needed, we recommend using <b>VILA-7B</b> or <b>VILA-2.7B</b> instead.</p>
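<p><em>A minimal sketch of where the flag fits, based on the launch command from step 3 (the exact flag placement here is illustrative):</em></p>
<pre># Sketch: launching VideoQuery with q4f16_1 quantization instead of the older awq4
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --quantization q4f16_1 \
    --max-context-len 256 \
    --max-new-tokens 32 \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output</pre>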
</div><div class="fusion-title title fusion-title-17 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>How to fix the issue of the camera not being detected</h4></h1></div><div class="fusion-text fusion-text-16"><p>To make a USB camera accessible inside the container, add the parameter <code><span style="color: #38c92e;">--device /dev/video0</span></code> when running the container. This maps the host&#8217;s camera device into the container, allowing applications inside to access the video stream as if it were running natively on the host system.</p>
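<p><em>A sketch of passing the device mapping through the launcher; this assumes <code>jetson-containers run</code> forwards extra flags to <code>docker run</code> and that the camera enumerates as <code>/dev/video0</code>:</em></p>
<pre># Sketch: explicitly mapping the host camera into the container
jetson-containers run --device /dev/video0 $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --video-input /dev/video0 \
    --video-output webrtc://@:8554/output</pre>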
</div><div class="fusion-title title fusion-title-18 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>How to Avoid Color Distortion Issues with the Logitech C505e Using the MJPEG Codec?</h4></h1></div><div class="fusion-text fusion-text-17"><p>To prevent color distortion problems on the Logitech C505e camera, we recommend using the <code><span style="color: #38c92e;">--video-input-codec mjpeg</span></code> parameter. This forces the camera to use the MJPEG codec, which is better supported and helps maintain accurate color reproduction.</p>
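<p><em>A sketch of the camera launch command from step 3 with the codec flag appended:</em></p>
<pre># Sketch: forcing MJPEG capture on a USB camera such as the Logitech C505e
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --video-input /dev/video0 \
    --video-input-codec mjpeg \
    --video-output webrtc://@:8554/output</pre>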
</div><div class="fusion-title title fusion-title-19 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Resolution Limitation</h4></h1></div><div class="fusion-text fusion-text-18 fusion-text-no-margin" style="--awb-margin-bottom:-10px;"><p>For stable FPS performance, use the parameters <code><span style="color: #38c92e;">--video-input-width 1280</span></code> and <code><span style="color: #38c92e;">--video-input-height 720</span></code>. These settings limit the video resolution to 1280&#215;720, helping maintain smoother and more consistent frame rates.</p>
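<p><em>A sketch of the camera launch command from step 3 with the input capped at 720p:</em></p>
<pre># Sketch: limiting camera input to 1280x720 for a steadier frame rate
jetson-containers run $(autotag nano_llm) \
  python3 -m nano_llm.agents.video_query --api=mlc \
    --model Efficient-Large-Model/VILA1.5-3b \
    --video-input /dev/video0 \
    --video-input-width 1280 \
    --video-input-height 720 \
    --video-output webrtc://@:8554/output</pre>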
</div>
<div class="table-1">
<p>&nbsp;</p>
<table width="100%">
<thead>
<tr>
<th align="left">
<div>
<div>Issue</div>
</div>
</th>
<th align="left">
<div>
<div>Fix</div>
</div>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<div>
<div><strong>Camera not detected</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Check USB connection, verify with `ls /dev/video*`</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>WebRTC not working</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Use Chrome, disable WebRTC local IP hiding flag</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Out of memory errors</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Use smaller model (VILA1.5-3b), reduce context length</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Low frame rate</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Reduce max-new-tokens, use smaller model, check camera resolution</div>
</div>
</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Video codec errors</strong></div>
</div>
</td>
<td align="left">
<div>
<div>Verify input format is H.264/H.265, check jetson_utils installation</div>
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="fusion-text fusion-text-19" style="--awb-margin-top:20px;"><p><em><strong>For more information about Live LLaVA and advanced configurations, visit the <a style="color: #38c92e;" href="https://github.com/dusty-nv/NanoLLM"><span style="color: #38c92e;"><span style="color: #38c92e;">NanoLLM <span style="color: #38c92e;">G</span></span></span><span style="color: #38c92e;"><span style="color: #38c92e;">itHub</span> repository.</span></a></strong></em></p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-live-llava/">Jetson Generative AI – Live LLaVA</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
