<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>OpenZeka EN Blog</title>
	<atom:link href="https://blog-en.openzeka.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog-en.openzeka.com/</link>
	<description>NVIDIA Jetson Developer Kits &#38; Edge Devices</description>
	<lastBuildDate>Fri, 27 Mar 2026 13:44:56 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>NVIDIA DGX Spark vs NVIDIA Jetson Thor</title>
		<link>https://blog-en.openzeka.com/nvidia-dgx-spark-vs-nvidia-jetson-thor/</link>
		
		<dc:creator><![CDATA[Betül Kaya]]></dc:creator>
		<pubDate>Tue, 23 Dec 2025 13:58:02 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Performance]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1504</guid>

					<description><![CDATA[<p>One of the most common mistakes made when developing a ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-dgx-spark-vs-nvidia-jetson-thor/">NVIDIA DGX Spark vs NVIDIA Jetson Thor</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-1"><p>One of the most common mistakes made when developing artificial intelligence systems is evaluating hardware designed for different purposes as if they were meant to solve the same problem. Although NVIDIA DGX Spark and NVIDIA Jetson Thor—two of NVIDIA’s recently prominent products—are often compared due to their similar names and emphasis on high performance, they are in fact two entirely different platforms designed to solve completely different problems.</p>
</div><div class="fusion-text fusion-text-2"><p>The purpose of this article is to clearly highlight the differences between DGX Spark and Jetson Thor and to make the following distinction explicit at the end:</p>
<p><strong>DGX Spark</strong> is designed for developing, training, and testing artificial intelligence models.<br />
<strong>Jetson Thor</strong>, on the other hand, is designed to run these models in the real world, on robots and physical systems.</p>
</div><div class="fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">What is NVIDIA DGX Spark?</h2></div><div class="fusion-text fusion-text-3"><p>NVIDIA DGX Spark is a compact AI supercomputer positioned in a desktop form factor, designed to enable the development and execution of artificial intelligence models entirely in a local environment. At the heart of the system is the Grace Blackwell GB10 Superchip, which combines NVIDIA’s Grace CPU and Blackwell GPU architectures into a single chip. Thanks to this architectural integration, DGX Spark delivers up to 1 petaflop of AI computing performance along with 128 GB of high-bandwidth unified HBM3e memory. This makes it an extremely powerful local development platform for large language models and generative AI workloads.</p>
<p>Each DGX Spark can operate independently as a fully capable AI workstation. When two Spark devices are connected together, the system reaches a unified memory capacity of 256 GB, transforming into an expanded AI node capable of handling models with up to 405 billion parameters. While pairing a maximum of two units is currently supported, NVIDIA states that this limit may be increased in the future through software updates.</p>
<p>DGX Spark aims to reduce reliance on the cloud or data centers by enabling the following workloads to be performed entirely in a local environment.</p>
</div><div class="fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">DGX Spark Use Cases</h2></div><div class="fusion-text fusion-text-4"><p><strong>Fine-Tuning</strong><br />
DGX Spark provides a powerful fine-tuning platform, especially for organizations working with enterprise, sensitive, or regulated data. In sectors such as finance, healthcare, defense, or law, large language models, image recognition systems, or task-specific AI models can be fine-tuned entirely locally without data leaving the organization. This approach ensures compliance with GDPR regulations and eliminates intellectual property risks.</p>
<p><strong>Inference and Local AI Services</strong><br />
DGX Spark enables low-latency, high-efficiency inference of trained models in desktop or local server environments. Chatbots, document analysis systems, visual inspection applications, or decision support systems can run in real time without relying on the cloud. As a result, performance improves while network dependency and data transfer risks are eliminated.</p>
<p><strong>Data Science and Analytics Workloads</strong><br />
For data scientists working with large datasets, DGX Spark consolidates data cleaning, model training, and evaluation steps into a single powerful platform. Thanks to GPU-accelerated computing, complex statistical analyses, simulations, and machine learning pipelines can be completed much faster. This provides a significant speed advantage, especially for Proof of Concept (PoC) and pilot projects.</p>
<p><strong>Transition from Cloud to Desktop and Desktop to Cloud</strong><br />
DGX Spark is designed to be fully compatible with the NVIDIA ecosystem. After developing and testing a model on DGX Spark, you can move it to DGX Cloud or other accelerated cloud infrastructures using the same codebase and software stack with little to no modification. This approach offers great flexibility for organizations adopting hybrid AI strategies.</p>
<p><strong>Working with Secure and Sensitive Data</strong><br />
DGX Spark is an ideal solution for scenarios where data must remain within the organization. Sensitive customer data, internal company documents, or confidential R&amp;D outputs can be processed and modeled locally without being uploaded to the cloud. This reduces cybersecurity risks and simplifies regulatory compliance.</p>
<p><strong>Education, Academic, and Enterprise AI Laboratories</strong><br />
For universities, research centers, and corporate AI teams, DGX Spark functions as a compact yet extremely powerful “AI laboratory.” Students and engineers can gain hands-on experience working with large-scale models on real hardware and develop scenarios that are much closer to production environments.</p>
</div><div class="fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">What is NVIDIA Jetson Thor?</h2></div><div class="fusion-text fusion-text-5"><p>NVIDIA Jetson Thor is a high-performance edge AI platform developed for Physical AI, robotics, and autonomous systems. The core objective of Jetson Thor is to run large language models (LLMs), vision-language models (VLMs), and vision-language-action (VLA) models in real time with low latency and high energy efficiency. In this respect, Thor is positioned as the central “brain” of a robot or autonomous system, responsible for decision-making and action execution.<br />
Thanks to its Blackwell-based architecture, Jetson Thor delivers up to 2,070 TFLOPS (FP4 – sparsity-enabled) of AI computing performance, making it possible to deploy advanced models developed at data-center scale directly in edge environments. The Jetson Thor module family is optimized for Physical AI and robotics applications, combining high performance with a flexible power profile: configurable power consumption between 40 W and 130 W, along with up to 128 GB of memory.</p>
<p>This powerful hardware foundation allows LLM, VLM, and VLA models to run concurrently in a deterministic, low-latency manner. Its high energy efficiency makes Jetson Thor an ideal solution for 24/7 autonomous systems, robotic platforms, and mission-critical edge AI applications.</p>
</div><div class="fusion-text fusion-text-6"><p>The platform is optimized to process multiple data streams simultaneously from cameras, LiDAR, radar, and other sensors, enabling the entire perception–decision–action loop to be closed fully at the edge. Jetson Thor’s architecture targets continuously operating, time-sensitive systems that interact with the real world, rather than desktop- or data-center-oriented development environments.</p>
<p>In short, Jetson Thor is not a platform for developing AI models; it is an edge AI solution designed to run already developed models in the field, in the physical world, and in real time. Especially in robotics, autonomous vehicles, and Physical AI scenarios, it serves as a foundational building block for modern autonomous systems by unifying high computational power, low latency, sensor integration, and energy efficiency in a single platform.</p>
</div><div class="fusion-text fusion-text-7"><p>Jetson Thor’s high computational performance and extensive I/O capabilities make it an ideal solution across a wide range of industries. Below are some of the potential application areas of Jetson Thor:</p>
<ul>
<li><strong>Autonomous Systems (Vehicles and Robots)</strong><br />
By processing LiDAR, camera, and radar data simultaneously, Jetson Thor enables autonomous vehicles to perceive their environment and make safe decisions. Humanoid robots and unmanned aerial vehicles (UAVs) can also perform tasks such as real-time localization, mapping (SLAM), and obstacle detection more efficiently with Jetson Thor.</li>
<li><strong>Smart Cities and Public Safety</strong><br />
Jetson Thor can analyze 24/7 video streams from city surveillance cameras locally, without relying on the cloud. This enables instant traffic management, crowd monitoring, and detection of security threats. Thanks to its high memory capacity, Jetson Thor can analyze 4K/8K video streams in real time for smart city applications.</li>
<li><strong>Industrial Automation</strong><br />
When integrated into robotic arms or camera systems on production lines, Jetson Thor enables AI-driven tasks such as defect detection, quality control, and predictive maintenance to be performed in real time. Its rugged design and long-lifecycle industrial variants ensure reliable operation in harsh industrial environments.</li>
<li><strong>Healthcare Technologies</strong><br />
Medical devices and innovative healthcare systems can also benefit from Jetson Thor’s capabilities. For example, a portable MRI or ultrasound device can process images locally using AI to deliver instant diagnostic insights. When equipped with Jetson Thor, surgical robots can perform real-time image processing and precise control during operations. In addition, patient monitoring systems can process data locally while preserving privacy.</li>
<li><strong>Security and Surveillance</strong><br />
Smart security cameras can perform deep learning–based tasks such as facial recognition or threat detection in real time using Jetson Thor. This enhances security while reducing network traffic in environments such as banks, airports, and critical infrastructure. The system can detect suspicious situations on-site and send immediate alerts to security personnel.</li>
</ul>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Feature</th>
<th align="left">NVIDIA DGX Spark</th>
<th align="left">NVIDIA Jetson Thor</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Primary Purpose</td>
<td align="left">AI development, training, testing</td>
<td align="left">Robotics and Physical AI Inference</td>
</tr>
<tr>
<td align="left">Deployment Environment</td>
<td align="left">Desktop / Office / Lab</td>
<td align="left">Edge / Robot / Autonomous systems</td>
</tr>
<tr>
<td align="left">LLM Prefill Performance</td>
<td align="left">Very high (compute-bound)</td>
<td align="left">Optimized for edge</td>
</tr>
<tr>
<td align="left">Power Consumption</td>
<td align="left">High</td>
<td align="left">Low and energy-efficient</td>
</tr>
<tr>
<td align="left">Real-Time Operation</td>
<td align="left">Not a priority</td>
<td align="left">Critical requirement</td>
</tr>
<tr>
<td align="left">Sensor Integration</td>
<td align="left">None</td>
<td align="left">Camera, LIDAR, radar etc.</td>
</tr>
<tr>
<td align="left">Target User</td>
<td align="left">AI developers, data scientists</td>
<td align="left">Robotics and embedded systems developers</td>
</tr>
</tbody>
</table>
</div>
<div class="fusion-text fusion-text-8" style="--awb-margin-top:15px;"><p>If your goal is Physical AI, robotics, autonomous driving, and edge inference:</p>
<ul>
<li>Jetson Thor is specifically designed for this purpose and is the right choice.<br />
If you need AI model development, training, testing, fine-tuning, and high-performance local computation.</li>
<li>DGX Spark is purpose-built exactly for these needs.</li>
</ul>
<p>For large-scale organizations, these two products are not competitors but complementary: You develop the model on DGX Spark and deploy it into the real world on Jetson Thor.</p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-dgx-spark-vs-nvidia-jetson-thor/">NVIDIA DGX Spark vs NVIDIA Jetson Thor</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>NVIDIA DGX Spark: Bringing Data Center Power to Your Desk</title>
		<link>https://blog-en.openzeka.com/nvidia-dgx-spark/</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Mon, 27 Oct 2025 07:02:26 +0000</pubDate>
				<category><![CDATA[Getting Started]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1490</guid>

					<description><![CDATA[<p>Artificial intelligence is entering a new era—one wher ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-dgx-spark/">NVIDIA DGX Spark: Bringing Data Center Power to Your Desk</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><div class="fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-9"><p>Artificial intelligence is entering a new era—one where supercomputing performance is no longer confined to massive data centers. The NVIDIA DGX Spark, unveiled by NVIDIA, embodies this transformation. It’s a compact, AI-focused workstation that lets developers, researchers, and innovators harness data center-grade power right from their desks.</p>
</div><div class="fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">What Is NVIDIA DGX Spark?</h2></div><div class="fusion-text fusion-text-10"><p>The NVIDIA DGX Spark is a compact, single-user AI development and inference system powered by the Grace Blackwell GB10 Superchip—a seamless fusion of NVIDIA’s Grace CPU and Blackwell GPU architectures. This powerful combination delivers up to one petaflop of AI compute and 128 GB of unified high-bandwidth memory (HBM3e) per unit.</p>
<p>Each Spark functions as a self-contained AI powerhouse, but it gets even more impressive when two units are linked together, effectively operating as a single expanded AI node with 256 GB of unified memory and the ability to handle up to 405 billion model parameters. At present, the configuration supports only two systems, though NVIDIA has indicated that broader scalability may be possible in future software updates.</p>
<p>Despite its small form factor, DGX Spark comes fully equipped with NVIDIA’s comprehensive AI software stack, including CUDA, CUDA-X AI, AI Workbench, and integrated support for NVIDIA toolkits such as Isaac Sim, Metropolis, and NeMo. In essence, it’s a mini data center on your desk—delivering enterprise-level AI performance in a workstation-sized footprint.</p>
</div><div class="fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">Why It Matters</h2></div><div class="fusion-text fusion-text-11"><p>Developers often struggle with limited GPU memory and costly cloud resources. DGX Spark eliminates those constraints by offering local access to large GPU memory and NVIDIA’s entire AI ecosystem—without the complexity or expense of managing data center infrastructure.</p>
<p>This accessibility empowers AI researchers, developers, students, and data scientists to prototype, fine-tune, and test massive models directly on their desktops. Tasks like data science, model inference, computer vision, and robotics become faster, cheaper, and more secure.</p>
</div><div class="fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">Who Should Use It</h2></div><div class="fusion-text fusion-text-12"><p>DGX Spark is built for AI developers and innovators who need high performance and flexibility but don’t have access to large-scale compute clusters. It’s ideal for:</p>
<ul>
<li>Developers building or fine-tuning large language models (LLMs)</li>
<li>Researchers experimenting with edge and robotics applications</li>
<li>Students learning with real-world AI tools</li>
<li>Organizations looking to augment existing cloud or workstation setups</li>
</ul>
<p>Essentially, if your local GPU can’t handle the memory demands of your model—or cloud costs are slowing you down—DGX Spark fills that gap.</p>
</div><div class="fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">DGX Spark vs. RTX Pro 6000 and RTX 5090</h2></div><div class="fusion-text fusion-text-13"><p>While the RTX 5090 and RTX Pro 6000 Blackwell offer higher raw compute power (up to four petaflops vs. Spark’s one), they are limited by GPU memory. The RTX Pro 6000, for instance, has 96 GB VRAM, compared to Spark’s 128 GB unified memory. This means that for smaller, compute-heavy workloads, an RTX Pro or 5090 is ideal—but for large models that exceed GPU memory, Spark performs better, as it can handle models that would otherwise crash or slow dramatically on traditional GPUs.</p>
<p>In short:</p>
<ul>
<li>RTX 5090 / 6000 Pro → More compute, less memory</li>
<li>DGX Spark → Slightly less compute, much larger memory + full NVIDIA AI stack integration</li>
</ul>
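<p>A rough rule of thumb makes this trade-off concrete: resident weight memory is roughly the parameter count times the bytes per parameter, before KV cache and activation overhead. The figures below are illustrative estimates under that assumption, not measured values:</p>
<pre>
weights ≈ parameters × bytes/parameter      (KV cache and activations come on top)

 70B params @ 4-bit (~0.5 B/param) ≈  35 GB  → fits a 96 GB RTX Pro 6000
200B params @ 4-bit                ≈ 100 GB  → exceeds 96 GB VRAM, fits Spark’s 128 GB
405B params @ 4-bit                ≈ 203 GB  → needs two linked Sparks (256 GB unified)
</pre>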
</div><div class="fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">The Future of Local AI Development</h2></div><div class="fusion-text fusion-text-14"><p><span style="font-weight: 400;">NVIDIA DGX Spark represents the democratization of AI supercomputing. For the first time, researchers, developers, and creators can access petaflop-level performance from a system that fits under a desk.</span></p>
<p><span style="font-weight: 400;">As the AI landscape grows increasingly complex, DGX Spark provides the missing middle ground—more power than a desktop GPU, more freedom than the cloud. Whether you’re building LLMs, robotics solutions, or next-gen visual AI applications, Spark lets you do it faster, locally, and securely.</span></p>
</div></div></div><div class="fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-column-inner-bg-wrapper" style="--awb-padding-top-small:12px;--awb-padding-bottom-small:12px;--awb-overflow:hidden;--awb-inner-bg-size:cover;--awb-box-shadow:0px 2px 10px 0px rgba(0,0,0,0.06);;--awb-border-color:#e5e7eb;--awb-border-top:1px;--awb-border-right:1px;--awb-border-bottom:1px;--awb-border-left:1px;--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-inner-bg-border-radius:4px 4px 4px 4px;--awb-inner-bg-overflow:hidden;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:28px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><span class="fusion-column-inner-bg hover-type-none"><a class="fusion-column-anchor" href="https://aetherix.com/product/nvidia-dgx-spark/"><span class="fusion-column-inner-bg-image"></span></a></span><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"></div></div></div></div><div class="fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-blend:overlay;--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-15"></div></div></div></div></div></p>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-dgx-spark/">NVIDIA DGX Spark: Bringing Data Center Power to Your Desk</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>HammerBench: AGX Thor’s Power Meets Ollama</title>
		<link>https://blog-en.openzeka.com/hammerbench-agx-thors-power-meets-ollama/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Wed, 17 Sep 2025 13:28:53 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1372</guid>

					<description><![CDATA[<p>What is an LLM benchmark and why is it important?  LLM  ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/hammerbench-agx-thors-power-meets-ollama/">HammerBench : AGX Thor’s Power Meets Ollama</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is an LLM benchmark and why is it important?</h3></div><div class="fusion-text fusion-text-16"><p><strong>LLM benchmarks</strong> are standardized tests designed to measure how fast, efficient, and accurate large language models (LLMs) perform across different hardware and environments. These tests evaluate metrics such as latency, throughput, and sometimes accuracy to provide an objective view of performance.</p>
<p>As LLMs continue to grow larger and more complex, choosing the right hardware to run them on becomes a critical decision. Benchmark results are essential to understand which device or infrastructure delivers better performance, to balance cost and efficiency, and to identify the most suitable solution for real-world use cases. In short, LLM benchmarks give both researchers and developers a clear roadmap of how models perform in practice.</p>
<p>To showcase the performance of <strong>Jetson AGX Thor</strong>, we are sharing our results and performance charts with you. At the same time, you can also run benchmarks across<strong> different GPU types</strong> to compare and validate performance for your own workloads. If you want to measure the performance metrics of your own devices and test your models under real-world conditions, get in touch with us. With our solution, your measurements turn into more than just numbers — they become actionable insights that drive strategic decisions.</p>
</div><div class="fusion-title title fusion-title-10 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">How to use HammerBench ?</h3></div><div class="fusion-text fusion-text-17"><p><strong>🖥️ What the App Does</strong></p>
<p>This is a Streamlit-based LLM Benchmark Tool interface designed to evaluate large language models (LLMs) on NVIDIA Jetson AGX Thor hardware using Ollama as the backend.</p>
<p>⚙️ <strong>Configuration (Left Sidebar)</strong></p>
<ul>
<li>GPU Information:
<ul>
<li>Detects if the device is a Jetson (in this case, a Jetson AGX Thor Developer Kit).</li>
<li>Shows details about the GPU (NVIDIA Jetson AGX Thor) and available memory (125,772 MB ≈ 122.8 GB).</li>
</ul>
</li>
</ul>
<p><strong>Use Only GPU:</strong></p>
<p>A checkbox option that allows restricting benchmarks to GPU-only execution.</p>
<p><strong>📊 Main Panel</strong></p>
<p><strong>Title:</strong> <em>LLM Benchmark Tool</em>, with the description: <em>Benchmark LLM models using Ollama with real-time progress tracking.</em></p>
<p><strong>Models Compatible with GPU memory (VRAM) requirements:</strong></p>
<ul>
<li>Displays a table of available models (llama3.2:1b, gemma3:4b, qwen3:14b, gpt-oss:20b, etc.)</li>
<li>Shows how much memory (VRAM in GB) each model requires.</li>
<li>Marks them with ✅ if they are runnable on the detected GPU.</li>
</ul>
<p><strong>Select Models to Benchmark:</strong></p>
<ul>
<li>Lists the same models with checkboxes so the user can pick which ones to run benchmarks on.</li>
<li>Each option shows the memory requirement for clarity (e.g., gemma3:27b (17 GB), gpt-oss:120b (65 GB)).</li>
</ul>
<p><strong>🚀 Purpose</strong></p>
<p>The tool helps developers and researchers:</p>
<ul>
<li>See which LLMs are compatible with their GPU memory.</li>
<li>Select multiple models and run benchmarks to measure performance (latency, throughput, GPU utilization); a minimal measurement sketch follows after this list.</li>
<li>Use the results to compare models and make better deployment or scaling decisions.</li>
</ul>
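<p>For a feel of what such a benchmark actually measures, here is a minimal sketch that probes a local Ollama server directly over its REST API. This is an illustrative example rather than HammerBench itself, and it assumes Ollama is listening on its default port 11434 with the llama3.2:1b model already pulled:</p>
<pre>
# Single-request probe: with "stream": false, Ollama returns timing fields
# (eval_count/eval_duration for decode, prompt_eval_* for prefill).
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2:1b", "prompt": "Why is the sky blue?", "stream": false}' \
  | python3 -c "import json,sys; r=json.load(sys.stdin); print(round(r['eval_count']/(r['eval_duration']/1e9), 1), 'tokens/s decode')"
</pre>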
</div><div class="fusion-video fusion-selfhosted-video" style="max-width:100%;"><div class="video-wrapper"><video playsinline="true" width="100%" style="object-fit: cover;" autoplay="true" muted="true" loop="true" preload="auto" controls="1"><source src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/animation.webm" type="video/webm">Sorry, your browser doesn&#039;t support embedded videos.</video></div></div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/hammerbench-agx-thors-power-meets-ollama/">HammerBench: AGX Thor’s Power Meets Ollama</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		<enclosure url="https://blog-en.openzeka.com/wp-content/uploads/2025/09/animation.webm" length="194501" type="video/webm" />

			</item>
		<item>
		<title>How to Run Llama.cpp Server on Jetson AGX Thor?</title>
		<link>https://blog-en.openzeka.com/how-to-run-llama-cpp-server-on-jetson-agx-thor/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Fri, 12 Sep 2025 10:44:53 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1410</guid>

					<description><![CDATA[<p>Llama.cpp Server on Jetson AGX Thor: Unlocking Edge AI  ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-llama-cpp-server-on-jetson-agx-thor/">How to Run Llama.cpp Server on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-5 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-blend:overlay;--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-11 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Llama.cpp Server on Jetson AGX Thor: Unlocking Edge AI with Large Language Models</h4></div><div class="fusion-text fusion-text-18"><p><strong>Llama.cpp Server</strong> is a lightweight, high-performance runtime for large language models (LLMs), designed to run efficiently on both CPU and GPU. Built in C++, it eliminates unnecessary overhead and delivers deep hardware-level optimizations. By supporting the <strong>GGUF model format,</strong> it allows for quantization, drastically reducing memory requirements while maintaining accuracy. Through its<strong> REST API,</strong> Llama.cpp Server can be seamlessly integrated into applications, enabling developers to bring advanced LLM capabilities directly to devices—without relying on the cloud.</p>
<p>When deployed on <strong>NVIDIA Jetson AGX Thor</strong>, the advantages become even more compelling:</p>
<ul>
<li>GPU acceleration with<strong> CUDA</strong> ensures that the Thor’s compute power is fully utilized, bringing real-time inference to the edge.</li>
<li>Optimized for edge AI use cases such as robotics, autonomous systems, and industrial automation, it provides ultra-low latency decision-making.</li>
<li>Resource efficiency via quantization makes it possible to run models from 7B up to 13B parameters within the limited memory budgets typical of embedded devices.</li>
</ul>
<p>By combining <strong>Llama.cpp Server</strong> with Jetson<strong> AGX Thor</strong>, organizations gain a powerful platform for on-device AI that is private, fast, and cost-effective. No data needs to leave the device, latency is minimized, and the system remains fully adaptable to both prototyping and production scenarios. Supported by an open-source ecosystem, this pairing represents a breakthrough for deploying large language models securely and efficiently at the edge.</p>
</div></div></div><div class="fusion-layout-column fusion_builder_column fusion-builder-column-6 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-12 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements</h3></div><div class="fusion-text fusion-text-19"><ul>
<li>JetPack 7 (<span style="color: #76b900;"><a style="color: #76b900;" href="https://blog-en.openzeka.com/what-is-nvidia-jetpack-beginner-friendly-guide/">Learn more about JetPack</a></span>)</li>
<li>CUDA 13</li>
<li>At least 10 GB of free disk space<strong> (Only for the Llama Server image, not for the models.)</strong></li>
<li>A stable and fast internet connection (a few quick verification commands follow below)</li>
</ul>
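<p>Before pulling the image, it is worth confirming the device actually matches these requirements; a few standard checks on a stock JetPack install:</p>
<pre>
sudo apt-cache show nvidia-jetpack | grep Version   # installed JetPack release
nvcc --version                                      # CUDA toolkit (expect 13.x)
df -h /                                             # free disk space
</pre>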
</div><div class="fusion-title title fusion-title-13 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">How to use Llama.cpp Server ?</h4></div><div class="fusion-text fusion-text-20"><p>Firstly download the image ;</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-1 > .CodeMirror, .fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-1 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_1" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_1" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_1" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --gpus all -it --rm \
  -p 8080:8080 \
  -v /workspace/models:/models \
  ghcr.io/nvidia-ai-iot/llama_cpp:r38.2.arm64-sbsa-cu130-24.04 \
  /bin/bash</textarea></div><div class="fusion-text fusion-text-21" style="--awb-margin-top:20px;"><p>Then, download the model from Hugging Face. If the model requires access, log in with your token by running:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-2 > .CodeMirror, .fusion-syntax-highlighter-2 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-2 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_2" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_2" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_2" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># huggingface-cli login
hf download Qwen/Qwen3-4B-Instruct-2507</textarea></div><div class="fusion-text fusion-text-22" style="--awb-margin-top:20px;"><p>Then, install the required Python dependencies with the following command:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-3 > .CodeMirror, .fusion-syntax-highlighter-3 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-3 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_3" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_3" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_3" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">pip install transformers torch mistral_common sentencepiece</textarea></div><div class="fusion-text fusion-text-23" style="--awb-margin-top:20px;"><p>This command set downloads the <strong>NVIDIA NVPL local repository package</strong>, installs it, adds the signing key to the system, and then installs the NVPL library via apt-get.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-4 > .CodeMirror, .fusion-syntax-highlighter-4 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-4 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_4" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_4" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_4" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">export NVPL_VERSION=25.5
export DISTRO=ubuntu2404

wget https://developer.download.nvidia.com/compute/nvpl/${NVPL_VERSION}/local_installers/nvpl-local-repo-${DISTRO}-${NVPL_VERSION}_1.0-1_arm64.deb

dpkg -i nvpl-local-repo-ubuntu2404-25.5_1.0-1_arm64.deb

cp /var/nvpl-local-repo-ubuntu2404-25.5/nvpl-local-52E38D21-keyring.gpg /usr/share/keyrings/

apt-get update && apt-get install -y nvpl</textarea></div><div class="fusion-text fusion-text-24" style="--awb-margin-top:20px;"><p>This command takes the Qwen3-4B-Instruct-2507 model downloaded from Hugging Face (inside the snapshot folder identified by its hash), and uses the convert_hf_to_gguf.py tool to convert the Hugging Face weights (safetensors/PyTorch) into GGUF format, saving the output as /data/models/Qwen3-4B-Instruct-2507-f16.gguf.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-5 > .CodeMirror, .fusion-syntax-highlighter-5 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-5 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_5" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_5" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_5" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">python3 /opt/llama_cpp_python/vendor/llama.cpp/convert_hf_to_gguf.py \
  /data/models/huggingface/models--Qwen--Qwen3-4B-Instruct-2507/snapshots/<hash> \
  --outfile /data/models/Qwen3-4B-Instruct-2507-f16.gguf</textarea></div><div class="fusion-text fusion-text-25" style="--awb-margin-top:20px;"><p>This command takes the full-precision GGUF model (Qwen3-4B-Instruct-2507-f16.gguf) and runs it through llama-quantize to produce a quantized version <strong>(Qwen3-4B-Instruct-2507-q4_k_m.gguf)</strong> using the<strong> q4_k_m quantization method.</strong></p>
<ul>
<li><strong>Input file:</strong> /data/models/Qwen3-4B-Instruct-2507-f16.gguf (the FP16 model converted from Hugging Face).</li>
<li><strong>Output file:</strong> /data/models/Qwen3-4B-Instruct-2507-q4_k_m.gguf (smaller, quantized model).</li>
<li><strong>Quantization type:</strong> q4_k_m → a 4-bit quantization scheme optimized for speed and memory efficiency.</li>
</ul>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-6 > .CodeMirror, .fusion-syntax-highlighter-6 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-6 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_6" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_6" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_6" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">llama-quantize /data/models/Qwen3-4B-Instruct-2507-f16.gguf \
  /data/models/Qwen3-4B-Instruct-2507-q4_k_m.gguf q4_k_m</textarea></div><div class="fusion-text fusion-text-26" style="--awb-margin-top:20px;"><p>This command launches the llama.cpp server so the quantized model can be served via an<strong> HTTP API.</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-7 > .CodeMirror, .fusion-syntax-highlighter-7 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-7 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_7" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_7" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_7" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">llama-server \
  -m /data/models/Qwen3-4B-Instruct-2507-q4_k_m.gguf \
  --host 0.0.0.0 --port 8080 \
  -c 8192 \
  --n-gpu-layers 35</textarea></div><div class="fusion-text fusion-text-27" style="--awb-margin-top:20px;"><p>And that’s it! You can start chatting. A quick API smoke test follows below.</p>
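<p>From a second terminal, llama-server exposes an OpenAI-compatible HTTP API, so a plain curl request is enough to confirm the endpoint is alive (the "model" field can be omitted here, since the server answers with whichever model it loaded):</p>
<pre>
# Minimal chat-completion request against the local llama-server instance.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello from Jetson AGX Thor!"}]}'
</pre>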
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-1 hover-type-zoomin"><img fetchpriority="high" decoding="async" width="1024" height="568" title="Screenshot from 2025-09-12 13-31-53" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-1024x568.png" alt class="img-responsive wp-image-1422" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-200x111.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-400x222.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-600x333.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-800x444.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-1200x666.png 1200w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-llama-cpp-server-on-jetson-agx-thor/">How to Run Llama.cpp Server on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Run MLC LLM on Jetson AGX Thor?</title>
		<link>https://blog-en.openzeka.com/how-to-run-mlc-llm-on-jetson-agx-thor/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Tue, 09 Sep 2025 10:21:01 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1322</guid>

					<description><![CDATA[<p>What is MLC LLM ? MLC LLM (Machine Learning Compilation ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-mlc-llm-on-jetson-agx-thor/">How to Run MLC LLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-6 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-7 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-14 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is MLC LLM ?</h3></div><div class="fusion-text fusion-text-28"><p><strong>MLC LLM (Machine Learning Compilation for Large Language Models)</strong> is an open-source project designed to make large language models (LLMs) run efficiently across different hardware platforms. Its main goal is to optimize performance and reduce energy consumption, enabling AI applications to run not only in the cloud but also on edge devices.</p>
<p>NVIDIA’s next-generation <strong>Jetson AGX Thor platform</strong> delivers powerful computing capabilities for robotics, autonomous systems, and AI-driven applications. By leveraging <strong>MLC LLM</strong> on <strong>Jetson AGX Thor</strong>, large language models can be optimized to run in real time, supporting tasks such as natural language processing, decision-making, and human-like interaction with higher efficiency.</p>
</div></div></div><div class="fusion-layout-column fusion_builder_column fusion-builder-column-8 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-29"><p>In short, <strong>MLC LLM</strong> on <strong>Jetson AGX Thor</strong> acts as a bridge that brings high-performance large language model capabilities to edge devices.</p>
</div><div class="fusion-title title fusion-title-15 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements</h3></div><div class="fusion-text fusion-text-30"><ul>
<li>JetPack 7 (<span style="color: #76b900;"><a style="color: #76b900;" href="https://blog-en.openzeka.com/what-is-nvidia-jetpack-beginner-friendly-guide/">Learn more about JetPack</a></span>)</li>
<li>CUDA 13</li>
<li>At least 25 GB of free disk space<strong> (Only for the MLC LLM image, not for the models.)</strong></li>
<li>A stable and fast internet connection</li>
</ul>
</div><div class="fusion-title title fusion-title-16 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">How to use <i>MLC</i> LLM ?</h3></div><div class="fusion-text fusion-text-31"><p>First, install the <i>Docker</i> image on your computer:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-8 > .CodeMirror, .fusion-syntax-highlighter-8 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-8 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_8" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_8" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_8" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">sudo docker run -it --rm \
  --runtime nvidia \
  --gpus all \
  -v /workspace:/workspace \
  -p 6678:6678 \
  -p 6677:6677 \
  ghcr.io/nvidia-ai-iot/mlc:r38.2.arm64-sbsa-cu130-24.04 </textarea></div><div class="fusion-text fusion-text-32" style="--awb-margin-top:20px;"><p>If you’d like to explore the available images or replace them with newer ones, you can visit the <strong><a style="color: #14ce00;" href="http://ghcr.io/nvidia-ai-iot/mlc">GitHub Container Registry.</a></strong></p>
</div><div class="fusion-text fusion-text-33"><p>Once inside the container, find the model you want to download from Hugging Face.<br />
Use the hf download command inside the container to download the model.</p>
<p><strong>For example:</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-9 > .CodeMirror, .fusion-syntax-highlighter-9 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-9 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_9" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_9" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_9" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">hf download Qwen/Qwen3-30B-A3B-Instruct-2507</textarea></div><div class="fusion-text fusion-text-34" style="--awb-margin-top:20px;"><p>In the next step, provide the folder where you downloaded the model and run the command below.<br />
This command converts the model’s original Hugging Face weights (in safetensor format) into the optimized<strong> MLC LLM format.</strong> During conversion, the weights are quantized (e.g., to <strong>q4bf16_1</strong>), which reduces memory usage and improves runtime efficiency on GPU without heavily sacrificing accuracy.</p>
<p>In short,<strong> mlc_llm convert_weight</strong> takes the raw model checkpoint and transforms it into a format that can be directly executed by the MLC runtime on your target device (e.g., Jetson AGX Thor with CUDA).</p>
<p><em><strong>⚠️ Warning:</strong> In the command, replace <strong>&lt;hash&gt;</strong> in <strong>snapshots/&lt;hash&gt;/</strong> with the actual folder name you see inside the snapshots directory<strong> (e.g., aeb13307a71acd8fe81861d94ad54ab689df&#8230;)</strong>. This folder contains the real model files such as config.json, tokenizer.json, and model.safetensors, which are required for the <strong>mlc_llm convert_weight</strong> command to work.</em></p>
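<p>To see the exact folder name, you can list the snapshots directory first (the path below is the same one used in the convert command):</p>
<pre><code># The hash-named directory listed here is the snapshot folder to use below
ls /data/models/huggingface/models--Qwen--Qwen3-30B-A3B-Instruct-2507/snapshots/</code></pre>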
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-10 > .CodeMirror, .fusion-syntax-highlighter-10 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-10 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_10" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_10" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_10" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm convert_weight /data/models/huggingface/models--Qwen--Qwen3-30B-A3B-Instruct-2507/snapshots/<hash>/ \
    --quantization q4bf16_1 \
    --model-type qwen3 \
    --device cuda \
    --source-format huggingface-safetensor \
    -o /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1</textarea></div><div class="fusion-text fusion-text-35" style="--awb-margin-top:20px;"><p>In the next step, <strong>gen_config</strong> generates the configuration files needed to run the converted model in <strong>MLC</strong>. It defines the conversation template (<strong>e.g., Qwen format</strong>), context length, batch size, and other runtime parameters. In short, it makes the weight-converted model fully executable in the MLC runtime.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-11 > .CodeMirror, .fusion-syntax-highlighter-11 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-11 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_11" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_11" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_11" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm gen_config \
    /data/models/huggingface/models--Qwen--Qwen3-30B-A3B-Instruct-2507/snapshots/<hash>/config.json \
    --quantization q4bf16_1 \
    --conv-template qwen2 \
    --context-window-size 32768 \
    --prefill-chunk-size 4096 \
    --max-batch-size 3 \
    --output /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1
</textarea></div><div class="fusion-text fusion-text-36" style="--awb-margin-top:20px;"><p><em><strong>⚠️ Note:</strong> The “Not found” messages for files like tokenizer.model or added_tokens.json are not errors. These files are optional and not required by all models. As long as t<strong>okenizer.json, vocab.json, and merges.txt</strong> are found and copied, the model configuration is complete and ready to run.</em></p>
<p>Now that the configuration is ready, we can move on to the compilation step. In this stage, the model is compiled into a CUDA-optimized shared library (.so file), which enables fast execution on the GPU.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-12 > .CodeMirror, .fusion-syntax-highlighter-12 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-12 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_12" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_12" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_12" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm compile \
    /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1/mlc-chat-config.json \
    --device cuda \
    -o /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1/Qwen3-30B-A3B-Instruct-2507-q4bf16_1-cuda.so \
    --quantization q4bf16_1 \
    --model-type qwen3 \
    --opt="cublas_gemm=1;cudagraph=1"</textarea></div><div class="fusion-text fusion-text-37" style="--awb-margin-top:20px;"><p>With the compilation complete, the final step is to serve the model so it can handle inference requests. The<strong> mlc_llm serve</strong> command launches an HTTP server that exposes the model as an API endpoint, making it accessible for testing or integration into applications.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-13 > .CodeMirror, .fusion-syntax-highlighter-13 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-13 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_13" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_13" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_13" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm serve /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1 \
  --port 6678 \
  --host 0.0.0.0 \
  --device cuda \
  --mode interactive \
  --model-lib /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1/Qwen3-30B-A3B-Instruct-2507-q4bf16_1-cuda.so \
  --overrides "max_num_sequence=1;max_total_seq_length=32768;context_window_size=32768;gpu_memory_utilization=0.3"</textarea></div><div class="fusion-text fusion-text-38" style="--awb-margin-top:20px;"><p><em><strong>If you see this output, it means the model has been successfully compiled and serving .</strong></em></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-2 hover-type-none"><img decoding="async" width="453" height="69" title="Screenshot from 2025-09-08 14-56-47" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47.png" alt class="img-responsive wp-image-1333" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47-200x30.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47-400x61.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47.png 453w" sizes="(max-width: 640px) 100vw, 453px" /></span></div><div class="fusion-text fusion-text-39" style="--awb-margin-top:20px;"><p>You can test it with this curl request ;</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-14 > .CodeMirror, .fusion-syntax-highlighter-14 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-14 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_14" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_14" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_14" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">curl -X POST http://localhost:6678/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-name>",
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant."},
      {"role": "user", "content": "Hello !"}
    ],
    "temperature": 0.7,
    "max_tokens": 512,
    "stream": false
  }'</textarea></div><div class="fusion-title title fusion-title-17 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Which Jetson should I choose for my LLM model?</h4></div><div class="fusion-text fusion-text-40"><p>Below, you can find the RAM requirements of the most popular LLM models along with Jetson recommendations that meet the minimum specifications to run them. You can choose the one that best fits your needs.</p>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Parameters</th>
<th align="left">Quantization</th>
<th align="left">Required RAM (GB)</th>
<th align="left">Recommended Minimum Jetson</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Base</td>
<td align="left">684B</td>
<td align="left">Dynamic-1.58-bit</td>
<td align="left">162.11</td>
<td align="left">Not supported (≥128 GB and above)</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-1.5B</td>
<td align="left">1.5B</td>
<td align="left">Q4_K_M</td>
<td align="left">0.90</td>
<td align="left">Jetson Orin Nano 4 GB, Jetson Nano 4 GB</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-7B</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">mistralai Mixtral 8x22B-Instruct-v0.1</td>
<td align="left">22B</td>
<td align="left">Q4_K_M</td>
<td align="left">13.20</td>
<td align="left">Jetson Orin NX 16 GB, Jetson AGX Orin 32 GB, Jetson AGX Xavier 32 GB</td>
</tr>
<tr>
<td align="left">mistralai Mathstral 7B-v0.1</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">google gemma-3 12b-it</td>
<td align="left">12B</td>
<td align="left">Q4_K_M</td>
<td align="left">7.20</td>
<td align="left">Jetson Orin NX 8 GB, Jetson Orin Nano 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">meta-llama Llama-3.1 70B-Instruct</td>
<td align="left">70B</td>
<td align="left">Q5_K_M</td>
<td align="left">52.50</td>
<td align="left">Jetson AGX Orin 64 GB, Jetson AGX Xavier 64 GB, Jetson AGX Thor (T5000) 128 GB</td>
</tr>
</tbody>
</table>
</div>
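<div class="fusion-text"><p>As a rough rule of thumb (our own approximation, not an official sizing formula): weight memory is roughly the parameter count times the bits per weight divided by 8, plus runtime overhead for the KV cache and buffers. For example, for a 7B model at ~5-bit quantization:</p>
<pre><code># Approximate weight memory for 7B parameters at ~5 bits per weight
python3 -c "print(7e9 * 5 / 8 / 1e9, 'GB of weights, before runtime overhead')"</code></pre></div>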
</div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-mlc-llm-on-jetson-agx-thor/">How to Run MLC LLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Run vLLM on Jetson AGX Thor?</title>
		<link>https://blog-en.openzeka.com/how-to-run-vllm-on-jetson-agx-thor/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Tue, 09 Sep 2025 10:16:57 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1338</guid>

					<description><![CDATA[<p>What is vLLM and Why Does It Matter on Jetson AGX Thor? ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-vllm-on-jetson-agx-thor/">How to Run vLLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-7 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-9 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-18 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is vLLM and Why Does It Matter on Jetson AGX Thor?</h3></div><div class="fusion-text fusion-text-41"><p><strong>vLLM</strong> is an open-source inference engine designed to run large language models (LLMs) with exceptional efficiency. Thanks to its innovative PagedAttention architecture, vLLM delivers both high throughput and low latency making it possible to deploy advanced AI models in real-time applications.</p>
<p>NVIDIA Jetson AGX Thor, on the other hand, is a next-generation edge AI platform built for robotics, autonomous machines, and industrial systems. With its immense compute power and AI acceleration, Thor is the perfect hardware to unlock the full potential of LLMs at the edge.</p>
<p>When combined, vLLM on Jetson AGX Thor enables:</p>
<ul>
<li><strong>Real-time LLM services (chatbots, assistants, summarization, translation)</strong></li>
<li><strong>Vision + Language use cases (explaining camera input instantly)</strong></li>
<li><strong>On-device inference with ultra-low latency and stronger data privacy</strong></li>
<li><strong>Reduced reliance on cloud resources, with better energy efficiency</strong></li>
</ul>
<p>In short, vLLM provides the software intelligence and Thor provides the hardware muscle; together they make cutting-edge LLM experiences possible directly on the device.</p>
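<p>As a side note (a minimal sketch, not part of the original setup): once you have a container with vLLM installed (see the installation steps below), you can also exercise vLLM without launching the HTTP server, via its offline Python API. The model name below is only an illustrative small model for a quick smoke test.</p>
<pre><code># Quick offline smoke test of vLLM's Python API (illustrative model name)
python3 - <<'EOF'
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                 # small model, fast to load
params = SamplingParams(max_tokens=32, temperature=0.7)
outputs = llm.generate(["Hello from Jetson AGX Thor!"], params)
print(outputs[0].outputs[0].text)                    # generated continuation
EOF</code></pre>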
</div><div class="fusion-title title fusion-title-19 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Installing Process</h3></div><div class="fusion-text fusion-text-42"><p>First, download the following Triton Inference Server container image.<br />
This image comes with vLLM version 0.9.2 pre-installed. The tag 25.08 refers to August 2025.</p>
<p>If you’d like to update to a newer version in the future, you can always visit the <strong><a style="color: #00dd37;" href="https://catalog.ngc.nvidia.com/?filters=&amp;orderBy=weightPopularDESC&amp;query=&amp;page=&amp;pageSize=">NVIDIA NGC Catalog</a></strong> to find the latest container releases.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-15 > .CodeMirror, .fusion-syntax-highlighter-15 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-15 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_15" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_15" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_15" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --name vllm_container -it \
  --gpus all \
  -p 8000:8000 \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  nvcr.io/nvidia/tritonserver:25.08-vllm-python-py3 bash</textarea></div><div class="fusion-text fusion-text-43" style="--awb-margin-top:20px;"><p>You can verify the installed vLLM version directly with Python.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-16 > .CodeMirror, .fusion-syntax-highlighter-16 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-16 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_16" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_16" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_16" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">python3 -c "import vllm; print(vllm.__version__)"</textarea></div><div class="fusion-text fusion-text-44" style="--awb-margin-top:20px;"><p>Next, you’ll need to create an account on Hugging Face , generate an access token, and log in with it.</p>
<p>This token will allow the container to securely download and run models directly from <a href="https://huggingface.co/"><strong style="color: #00e200;">Hugging Face.</strong></a></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-17 > .CodeMirror, .fusion-syntax-highlighter-17 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-17 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_17" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_17" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_17" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">huggingface-cli login</textarea></div><div class="fusion-text fusion-text-45" style="--awb-margin-top:20px;"><p>To download model run ;</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-18 > .CodeMirror, .fusion-syntax-highlighter-18 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-18 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_18" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_18" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_18" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">hf download <model></textarea></div><div class="fusion-text fusion-text-46" style="--awb-margin-top:20px;"><p>Once your environment is ready, you can launch the vLLM API server using the following command:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-19 > .CodeMirror, .fusion-syntax-highlighter-19 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-19 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_19" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_19" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_19" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">python3 -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192 \
  --dtype float16</textarea></div><div class="fusion-text fusion-text-47" style="--awb-margin-top:20px;"><p>Here’s what each parameter does:</p>
<ul>
<li><strong><em>--model</em> →</strong> specifies which model to load (in this case, Llama-3.1-8B-Instruct from Hugging Face).</li>
<li><em><strong>--tensor-parallel-size 1</strong> </em>→ runs the model on a single GPU. If you have multiple GPUs, you can increase this value.</li>
<li><em><strong>--gpu-memory-utilization 0.90</strong></em> → tells vLLM to use up to 90% of available GPU memory. Adjust this if you run into memory errors.</li>
<li><em><strong>--max-model-len 8192 →</strong></em> sets the maximum context length (in tokens) for the model.</li>
<li><em><strong>--dtype float16 →</strong> </em>runs the model in FP16 precision, which is more efficient on Jetson AGX Thor.</li>
</ul>
</div><div class="fusion-text fusion-text-48"><p><em><strong>⚠️ Heads-up: If you encounter ;</strong></em></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-20 > .CodeMirror, .fusion-syntax-highlighter-20 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-20 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_20" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_20" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_20" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}</textarea></div><div class="fusion-text fusion-text-49" style="--awb-margin-top:20px;"><p><em><strong>It usually means the engine couldn’t reserve enough GPU memory. Try lowering the GPU memory utilization. For example try with &#8211;gpu-memory-utilization 0.75 .</strong></em></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-3 hover-type-none"><img decoding="async" width="1024" height="617" title="Screenshot from 2025-09-09 09-33-20" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-1024x617.png" alt class="img-responsive wp-image-1345" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-200x120.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-400x241.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-600x361.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-800x482.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-1200x723.png 1200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20.png 1393w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-text fusion-text-50" style="--awb-margin-top:20px;"><p>If you see a message like:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-21 > .CodeMirror, .fusion-syntax-highlighter-21 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-21 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_21" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_21" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_21" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">Starting vLLM API server 0 on http://0.0.0.0:8000</textarea></div><div class="fusion-text fusion-text-51" style="--awb-margin-top:20px;"><p>it means that vLLM is now serving on port 8000 and ready to accept requests.<br />
At this point, you can start testing it with a simple curl command. For example:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-22 > .CodeMirror, .fusion-syntax-highlighter-22 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-22 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_22" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_22" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_22" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello Jetson AGX Thor!"}],
    "max_tokens": 64
  }'</textarea></div><div class="fusion-title title fusion-title-20 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Which Jetson should I choose for my LLM model?</h4></div><div class="fusion-text fusion-text-52"><p>Below, you can find the RAM requirements of the most popular LLM models along with Jetson recommendations that meet the minimum specifications to run them. You can choose the one that best fits your needs.</p>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Parameters</th>
<th align="left">Quantization</th>
<th align="left">Required RAM (GB)</th>
<th align="left">Recommended Minimum Jetson</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Base</td>
<td align="left">684B</td>
<td align="left">Dynamic-1.58-bit</td>
<td align="left">162.11</td>
<td align="left">Not supported (≥128 GB and above)</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-1.5B</td>
<td align="left">1.5B</td>
<td align="left">Q4_K_M</td>
<td align="left">0.90</td>
<td align="left">Jetson Orin Nano 4 GB, Jetson Nano 4 GB</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-7B</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">mistralai Mixtral 8x22B-Instruct-v0.1</td>
<td align="left">22B</td>
<td align="left">Q4_K_M</td>
<td align="left">13.20</td>
<td align="left">Jetson Orin NX 16 GB, Jetson AGX Orin 32 GB, Jetson AGX Xavier 32 GB</td>
</tr>
<tr>
<td align="left">mistralai Mathstral 7B-v0.1</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">google gemma-3 12b-it</td>
<td align="left">12B</td>
<td align="left">Q4_K_M</td>
<td align="left">7.20</td>
<td align="left">Jetson Orin NX 8 GB, Jetson Orin Nano 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">meta-llama Llama-3.1 70B-Instruct</td>
<td align="left">70B</td>
<td align="left">Q5_K_M</td>
<td align="left">52.50</td>
<td align="left">Jetson AGX Orin 64 GB, Jetson AGX Xavier 64 GB, Jetson AGX Thor (T5000) 128 GB</td>
</tr>
</tbody>
</table>
</div>
</div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-vllm-on-jetson-agx-thor/">How to Run vLLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Run Ollama on Jetson AGX Thor with OpenwebUI?</title>
		<link>https://blog-en.openzeka.com/how-to-run-ollama-on-jetson-agx-thor-with-openwebui/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Tue, 09 Sep 2025 10:11:28 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1301</guid>

					<description><![CDATA[<p>What is Ollama?  Ollama is a lightweight and flexible p ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-ollama-on-jetson-agx-thor-with-openwebui/">How to Run Ollama on Jetson AGX Thor with OpenwebUI?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-8 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-10 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-21 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is Ollama?</h3></div><div class="fusion-text fusion-text-53"><p>Ollama is a lightweight and flexible platform that allows you to run large language models (LLMs) directly on your own device. When running on powerful AI hardware such as the <strong>NVIDIA Jetson AGX Thor</strong>, it provides a local, fast, and secure experience without the need for cloud-based solutions.</p>
<p>Thanks to the high processing power of Jetson AGX Thor, Ollama:</p>
<ul>
<li><strong>Runs LLMs locally</strong> → Can be used even without an internet connection.</li>
<li><strong>Utilizes hardware acceleration</strong> → Leverages GPU power to generate faster responses.</li>
<li><strong>Ensures data privacy</strong> → All processing happens on-device, so sensitive data never leaves the system.</li>
<li><strong>Offers flexibility</strong> → Different models can be downloaded, customized, and tested.</li>
</ul>
<p>In short, Ollama leverages the hardware advantages of Jetson AGX Thor to make AI applications more accessible, portable, and secure.</p>
</div><div class="fusion-title title fusion-title-22 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements for AGX Thor</h3></div><div class="fusion-text fusion-text-54"><ol>
<li>JetPack 7 must be installed</li>
<li>Stable high-speed internet connection</li>
<li>At least 15 GB of free disk space, not counting storage for the models themselves (a quick check is shown after this list)</li>
</ol>
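<p>A quick way to confirm the free-space requirement before pulling anything (a generic check, not Ollama-specific):</p>
<pre><code># Free space on the root filesystem; "Avail" should be at least 15 GB
df -h /</code></pre>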
</div><div class="fusion-title title fusion-title-23 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Installation Process</h3></div><div class="fusion-text fusion-text-55"><p>First, we create a folder to mount into the container.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-23 > .CodeMirror, .fusion-syntax-highlighter-23 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-23 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_23" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_23" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_23" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">mkdir ~/ollama-data/</textarea></div><div class="fusion-text fusion-text-56" style="--awb-margin-top:20px;"><p>Next, we download the image from the <strong>GitHub Container Registry.</strong><br />
The <strong>ghcr.io</strong> prefix indicates that the image is hosted on the GitHub Container Registry.</p>
<p>To access other images or check for the latest updates, you can visit the following <strong><a style="color: #2a9e00;" href="https://github.com/orgs/NVIDIA-AI-IOT/packages/container/package/ollama">link.</a></strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-24 > .CodeMirror, .fusion-syntax-highlighter-24 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-24 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_24" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_24" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_24" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --rm -it -v ${HOME}/ollama-data:/data ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04</textarea></div><div class="fusion-text fusion-text-57" style="--awb-margin-top:20px;"><p>It will take some time to pull (download) the container image.</p>
<p>Once in the container, you will see something like this.</p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-4 hover-type-none"><img decoding="async" width="848" height="817" title="Screenshot from 2025-09-08 11-44-34" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34.png" alt class="img-responsive wp-image-1306" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-200x193.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-400x385.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-600x578.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-800x771.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34.png 848w" sizes="(max-width: 640px) 100vw, 848px" /></span></div><div class="fusion-text fusion-text-58" style="--awb-margin-top:20px;"><p>Try running a GPT OSS (20b parameter) model by issuing a command below.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-25 > .CodeMirror, .fusion-syntax-highlighter-25 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-25 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_25" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_25" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_25" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">ollama run --verbose gpt-oss:20b</textarea></div><div class="fusion-text fusion-text-59" style="--awb-margin-top:20px;"><p>Once ready, it will show something like this:</p>
</div><div class="fusion-image-element " style="text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-5 hover-type-none"><img decoding="async" width="697" height="522" title="Screenshot from 2025-09-08 11-50-28" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28.png" alt class="img-responsive wp-image-1310" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28-200x150.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28-400x300.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28-600x449.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28.png 697w" sizes="(max-width: 640px) 100vw, 697px" /></span></div><div class="fusion-title title fusion-title-24 fusion-sep-none fusion-title-text fusion-title-size-three" style="--awb-margin-top:20px;"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Troubleshooting</h3></div><div class="fusion-text fusion-text-60"><p><strong>CUDA out of memory</strong></p>
<p>If you encounter CUDA out of memory errors, try running a <strong>smaller model.</strong><br />
You can also use quantization to reduce memory usage and run models more efficiently on your device.</p>
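<p>For example, a smaller quantized model can be pulled and run the same way (llama3.2:3b is just one illustrative tag; check ollama.com for currently available tags):</p>
<pre><code># Run a smaller model if the larger one does not fit in memory (illustrative tag)
ollama run --verbose llama3.2:3b</code></pre>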
<p>Different model sizes and quantized versions can be found <strong><a style="color: #1bcc00;" href="https://ollama.com">here</a><span style="color: #1bcc00;">.</span> </strong></p>
</div><div class="fusion-title title fusion-title-25 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Installing OpenwebUI</h3></div><div class="fusion-text fusion-text-61"><p>Firsty run this command on terminal ;</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-26 > .CodeMirror, .fusion-syntax-highlighter-26 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-26 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_26" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_26" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_26" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run -it --rm --network=host --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main</textarea></div><div class="fusion-text fusion-text-62" style="--awb-margin-top:20px;"><p><em>If you see the <strong>&#8220;application startup&#8221;</strong> message on the screen, you can proceed to the next step.</em><br />
<em>If it says <strong>&#8220;retrying&#8221;</strong> and you don’t see any progress in the download section, stop the process with <strong>Control + C</strong> and try again, or simply wait; the startup should eventually complete.</em></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-6 hover-type-none"><img decoding="async" width="960" height="589" title="Screenshot from 2025-09-08 13-31-30" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30.png" alt class="img-responsive wp-image-1314" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-200x123.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-400x245.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-600x368.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-800x491.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30.png 960w" sizes="(max-width: 640px) 100vw, 960px" /></span></div><div class="fusion-text fusion-text-63" style="--awb-margin-top:20px;"><p>You can then navigate your browser to <em><strong>http://JETSON_IP:8080</strong></em> , and create a fake account to log in (these credentials are only local). Instead of <strong>JETSON_IP</strong>, you can also use localhost.</p>
<p>Create an account .</p>
</div><div class="fusion-image-element " style="text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-7 hover-type-none"><img decoding="async" width="613" height="482" title="Screenshot from 2025-09-08 13-36-48" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48.png" alt class="img-responsive wp-image-1315" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48-200x157.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48-400x315.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48-600x472.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48.png 613w" sizes="(max-width: 640px) 100vw, 613px" /></span></div><div class="fusion-text fusion-text-64"><p><em><strong>⚠️ Be careful !</strong> When OpenWebUI is launched, <strong>no model</strong> will appear in the <strong>Load Models</strong> section at the top left. To connect models to <strong>OpenWebUI</strong>, we need to assign a port. Restart the Ollama container with the following command:</em></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-27 > .CodeMirror, .fusion-syntax-highlighter-27 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-27 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_27" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_27" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_27" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --rm -it \
  -p 11434:11434 \
  -v ${HOME}/ollama-data:/data \
  ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04</textarea></div><div class="fusion-text fusion-text-65" style="--awb-margin-top:20px;"><p>You can check it by sending a <strong>curl request:</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-28 > .CodeMirror, .fusion-syntax-highlighter-28 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-28 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_28" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_28" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_28" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">curl http://localhost:11434</textarea></div><div class="fusion-text fusion-text-66" style="--awb-margin-top:20px;"><p>If you see “<strong>Ollama is running</strong>”, you can continue using it.</p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-8 hover-type-none"><img decoding="async" width="1024" height="261" title="Screenshot from 2025-09-08 14-08-00" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-1024x261.png" alt class="img-responsive wp-image-1320" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-200x51.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-400x102.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-600x153.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-800x204.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00.png 1058w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-title title fusion-title-26 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Which Jetson should I choose for my LLM model?</h4></div><div class="fusion-text fusion-text-67 fusion-text-no-margin" style="--awb-margin-bottom:-20px;"><p>Below, you can find the RAM requirements of the most popular LLM models along with Jetson recommendations that meet the minimum specifications to run them. You can choose the one that best fits your needs.</p>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Parameters</th>
<th align="left">Quantization</th>
<th align="left">Required RAM (GB)</th>
<th align="left">Recommended Minimum Jetson</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">DeepSeek-R1</td>
<td align="left">671B</td>
<td align="left">Dynamic-1.58-bit (MoE 1.5-bit + other layers 4–6-bit)</td>
<td align="left">159.03</td>
<td align="left">Not supported (≥128 GB and above)</td>
</tr>
<tr>
<td align="left">DeepSeek-R1 Distill-Qwen-1.5B</td>
<td align="left">1.5B</td>
<td align="left">Q4_K_M</td>
<td align="left">0.90</td>
<td align="left">Jetson Orin Nano 4 GB, Jetson Nano 4 GB</td>
</tr>
<tr>
<td align="left">DeepSeek-R1 Distill-Qwen-7B</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">Qwen 2.5</td>
<td align="left">14B</td>
<td align="left">FP16</td>
<td align="left">33.60</td>
<td align="left">Jetson AGX Orin 64 GB, Jetson AGX Xavier 64 GB</td>
</tr>
<tr>
<td align="left">CodeLlama</td>
<td align="left">34B</td>
<td align="left">Q4_K_M</td>
<td align="left">20.40</td>
<td align="left">Jetson AGX Orin 32 GB, Jetson AGX Xavier 32 GB</td>
</tr>
<tr>
<td align="left">Llama 3.2 Vision</td>
<td align="left">90B</td>
<td align="left">Q5_K_M</td>
<td align="left">67.50</td>
<td align="left">Jetson AGX Thor (T5000) 128 GB</td>
</tr>
<tr>
<td align="left">Phi-3</td>
<td align="left">3.8B</td>
<td align="left">FP16</td>
<td align="left">9.12</td>
<td align="left">Jetson Orin NX 16 GB</td>
</tr>
</tbody>
</table>
</div>
</div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-ollama-on-jetson-agx-thor-with-openwebui/">How to Run Ollama on Jetson AGX Thor with OpenwebUI?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>NVIDIA JetPack 7.0: Powering the Next Generation of AI and Robotics at the Edge</title>
		<link>https://blog-en.openzeka.com/nvidia-jetpack-7-0-next-gen-ai-robotics-edge/</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Wed, 27 Aug 2025 07:01:43 +0000</pubDate>
				<category><![CDATA[Getting Started]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1274</guid>

					<description><![CDATA[<p>NVIDIA has announced the release of JetPack™ 7.0, the  ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-jetpack-7-0-next-gen-ai-robotics-edge/">NVIDIA JetPack 7.0: Powering the Next Generation of AI and Robotics at the Edge</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-9 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-11 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-68"><p>NVIDIA has announced the release of JetPack™ 7.0, the latest and most advanced software stack for the <span style="color: #198fd9;"><a style="color: #198fd9;" href="https://blog-en.openzeka.com/nvidia-jetson-beginners-guide/">Jetson™</a></span> platform. Built to enable cutting-edge robotics and generative AI applications at the edge, JetPack 7 delivers an unprecedented foundation for developers building machines that interact with and understand the physical world.</p>
</div><div class="fusion-title title fusion-title-27 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">A New Era for AI at the Edge</h2></div><div class="fusion-text fusion-text-69"><p>JetPack 7.0 redefines what’s possible with Jetson by providing ultra-low latency, deterministic performance, and scalable deployment. From humanoid robots to AI systems tackling the most demanding generative workloads, JetPack 7 ensures developers have the right tools and libraries to bring ideas to life.</p>
<p>Key to this release is full support for the NVIDIA Jetson Thor™ platform, featuring groundbreaking performance and next-generation AI capabilities. JetPack 7.0 also introduces:</p>
<ul>
<li>A preemptible real-time kernel for predictable system responsiveness (a quick verification check is sketched after this list).</li>
<li>Multi-Instance GPU (MIG) support, maximizing GPU utilization across workloads.</li>
<li>An integrated Holoscan Sensor Bridge, enabling seamless sensor-to-AI pipelines.</li>
</ul>
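<p>A quick way to confirm you are actually running the preemptible real-time kernel, rather than the standard build, is to look for the conventional PREEMPT_RT marker in the kernel identification string. This is a generic Linux check, not a JetPack-specific API:</p>
<pre>
# Sanity check: is the running kernel the preemptible real-time build?
# Generic Linux approach; "PREEMPT_RT" is the conventional marker used
# by RT-patched kernels in their version string.
import platform

version = platform.version()  # e.g. "#1 SMP PREEMPT_RT ..."
if "PREEMPT_RT" in version:
    print("Real-time kernel detected:", version)
else:
    print("Standard kernel:", version)
</pre>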
</div><div class="fusion-title title fusion-title-28 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Built for the Future: Modern OS and Cloud-Native Design</h2></div><div class="fusion-text fusion-text-70"><p>At its core, JetPack 7 is built on Linux Kernel 6.8 and Ubuntu 24.04 LTS, ensuring long-term stability and compatibility. Its modular, cloud-native architecture integrates the latest NVIDIA AI compute stack, making it easier than ever to align Jetson development with NVIDIA’s broader AI workflows.</p>
<p>For developers, this means seamless interoperability, whether building robotics systems in the lab or deploying generative AI at the edge.</p>
</div><div class="fusion-title title fusion-title-29 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Aligning with Industry Standards: SBSA Architecture</h2></div><div class="fusion-text fusion-text-71"><p>JetPack 7 also marks a major milestone in aligning Jetson with industry standards through the Server Base System Architecture (SBSA). By adopting SBSA:</p>
<ul>
<li>Jetson Thor is now positioned alongside ARM server-class systems.</li>
<li>Developers benefit from stronger OS support, simplified software portability, and smoother enterprise integration.</li>
<li>CUDA 13.0 is now unified across all ARM targets, streamlining development and reducing fragmentation.</li>
</ul>
<p>This alignment ensures consistency from server-class systems to Jetson Thor, bridging the gap between edge and enterprise AI.</p>
</div><div class="fusion-title title fusion-title-30 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">What’s New in Jetson Linux 38.2</h2></div><div class="fusion-text fusion-text-72"><p>JetPack 7.0 is powered by Jetson Linux 38.2, which brings a host of enhancements:</p>
<ul>
<li>Based on Ubuntu 24.04 LTS and Linux Kernel v6.8 LTS.</li>
<li>Support for the Jetson AGX Thor Developer Kit and Jetson T5000 module.</li>
<li>OpenRM-based stack architecture.</li>
<li>Updated AI compute libraries: CUDA 13, cuDNN 9.12, and TensorRT 10.13 (a version-check sketch follows this list).</li>
<li>CoE (Camera Over Ethernet) support via the Holoscan Sensor Bridge, enabling plug-and-play with the Eagle Camera Sensor Module LI-VB1940.</li>
<li>Optimized support for CSI/GMSL via Argus and CoE via SIPL Camera API.</li>
<li>NVIDIA-optimized preemptible real-time kernel for deterministic performance.</li>
</ul>
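<p>After flashing, it is worth confirming that the expected library versions are actually in place. The sketch below assumes <code>nvcc</code> is on your PATH and that the TensorRT Python bindings shipped with JetPack are installed; treat it as a quick starting point rather than an official verification tool:</p>
<pre>
# Sketch: report installed CUDA and TensorRT versions on a freshly
# flashed JetPack 7.0 system. Assumes nvcc is on PATH and the TensorRT
# Python bindings are installed (both come with a standard JetPack setup).
import subprocess

# nvcc prints the CUDA toolkit release, e.g. "Cuda compilation tools, release 13.0"
print(subprocess.run(["nvcc", "--version"],
                     capture_output=True, text=True).stdout)

try:
    import tensorrt
    print("TensorRT:", tensorrt.__version__)  # expect 10.13 on JetPack 7.0
except ImportError:
    print("TensorRT Python bindings not found")
</pre>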
</div><div class="fusion-title title fusion-title-31 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Supported Hardware</h2></div><div class="fusion-text fusion-text-73"><p>NVIDIA JetPack 7.0 launches with support for the latest Jetson platforms:</p>
<ul>
<li>Jetson AGX Thor Developer Kit</li>
<li>Jetson T5000</li>
</ul>
<p>Developers using these devices can immediately take advantage of JetPack 7’s new capabilities for robotics, AI, and sensor-driven applications.</p>
</div><div class="fusion-title title fusion-title-32 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Important Notes for Developers</h2></div><div class="fusion-text fusion-text-74"><ul>
<li>Manual flashing instructions have been updated due to SBSA architecture adoption—developers should carefully follow the updated guide.</li>
<li>For reinstallation using an ISO, refer to the Getting Started Guide to avoid issues.</li>
</ul>
</div><div class="fusion-title title fusion-title-33 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Conclusion</h2></div><div class="fusion-text fusion-text-75"><p>With JetPack 7.0, NVIDIA is setting a new standard for AI-powered edge computing. By combining next-generation hardware support, industry-standard architectures, and an updated AI stack, JetPack 7 delivers everything developers need to push the boundaries of robotics and generative AI.</p>
<p>For full technical details, developers should review the Jetson Linux 38.2 <span style="color: #198fd9;"><a style="color: #198fd9;" href="https://docs.nvidia.com/jetson/archives/r38.2/ReleaseNotes/Jetson_Linux_Release_Notes_r38.2.pdf" target="_blank" rel="noopener">release notes</a></span>.</p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-jetpack-7-0-next-gen-ai-robotics-edge/">NVIDIA JetPack 7.0: Powering the Next Generation of AI and Robotics at the Edge</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>NVIDIA Jetson Thor: A Next-Generation Platform for Edge AI</title>
		<link>https://blog-en.openzeka.com/nvidia-jetson-thor-a-next-generation-platform-for-edge-ai/</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Mon, 25 Aug 2025 15:35:08 +0000</pubDate>
				<category><![CDATA[Getting Started]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1263</guid>

					<description><![CDATA[<p>The NVIDIA Jetson platform is a family of compact, hig ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-jetson-thor-a-next-generation-platform-for-edge-ai/">NVIDIA Jetson Thor: A Next-Generation Platform for Edge AI</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-10 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-12 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-76"><p>The <span style="color: #198fd9;"><a style="color: #198fd9;" href="https://blog-en.openzeka.com/nvidia-jetson-beginners-guide/">NVIDIA Jetson</a></span> platform is a family of compact, high-performance computer modules built to bring AI to the edge, enabling everything from robotics and autonomous vehicles to industrial automation. Each Jetson module integrates powerful GPUs with ARM-based processors, allowing autonomous machines—such as robots, unmanned vehicles, and intelligent sensors—to operate with speed and precision directly where the data is generated.</p>
</div><div class="fusion-title title fusion-title-34 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Why Edge AI and Robotics Matter</h2></div><div class="fusion-text fusion-text-77"><p>In robotics and other autonomous systems, milliseconds can determine success or failure—whether it’s avoiding an obstacle, making a precision movement, or responding to a critical safety event. Edge AI addresses this by processing data locally, reducing latency for real-time decision-making and ensuring systems keep running even without internet connectivity. This local processing also protects privacy by keeping sensitive information on-device and reduces network load, saving both bandwidth and operating costs.</p>
<p>As robots and intelligent machines take on more complex tasks, from multi-sensor fusion to running large AI models, their need for computing power grows rapidly. Next-generation edge platforms must not only deliver ultra-low latency and high throughput but also support advanced AI workloads like generative AI, vision-language understanding, and autonomous navigation—all in compact, energy-efficient form factors.</p>
</div><div class="fusion-title title fusion-title-35 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Why Jetson Thor?</h2></div><div class="fusion-text fusion-text-78"><p>The latest member of the Jetson family, Jetson Thor, was developed to meet the growing demand for greater computing power to support next-generation humanoid robots, autonomous systems, and large AI models running directly on-device.</p>
<p>Built for demanding workloads such as generative AI and multi-sensor fusion, Jetson Thor delivers up to 7.5× higher AI performance and 3.5× better energy efficiency than AGX Orin. While AGX Orin offered ~275 TOPS, Jetson Thor surpasses it dramatically, reaching 2,070 TFLOPS (FP4). This leap in performance allows developers to run larger deep learning models, process more sensors in parallel, and achieve faster real-time control—making Jetson Thor the stronger choice for compute-heavy edge AI.</p>
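<p>The headline figure is easy to sanity-check, with one caveat: the two numbers are quoted in different precisions (FP4 for Thor, INT8 for AGX Orin), so the ratio mirrors NVIDIA's own comparison rather than a like-for-like benchmark:</p>
<pre>
# Sanity check on the quoted speedup. Note the precisions differ:
# Thor's figure is FP4 TFLOPS while Orin's is INT8 TOPS, so this is
# NVIDIA's marketing comparison, not an apples-to-apples benchmark.
thor_fp4_tflops = 2070
orin_int8_tops = 275
print(f"{thor_fp4_tflops / orin_int8_tops:.1f}x")  # prints "7.5x"
</pre>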
</div><div class="fusion-title title fusion-title-36 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">A New Class of Robotic Computing</h2></div><div class="fusion-text fusion-text-79"><p>Jetson AGX Thor redefines robotic intelligence, delivering the power and efficiency needed to bring next-generation humanoid robots to life. It supports a wide range of generative AI models—from Cosmos Reason, DeepSeek, Llama, Gemini, and Qwen to domain-specific robotics models like NVIDIA Isaac™ GR00T N1.5—enabling any developer to easily experiment and run inference locally, whether with Vision Language Action (VLA) models, popular LLMs, or VLMs.</p>
<p>To deliver a seamless cloud-to-edge experience, Jetson AGX Thor runs the NVIDIA AI software stack for physical AI applications, including:</p>
<ul>
<li>NVIDIA Isaac for robotics,</li>
<li>NVIDIA Metropolis for visual agentic AI, and</li>
<li>NVIDIA Holoscan for sensor processing.</li>
</ul>
<p>It also enables the creation of AI agents directly at the edge using NVIDIA agentic AI workflows such as Video Search and Summarization (VSS).</p>
</div><div class="fusion-title title fusion-title-37 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Video Search and Summarization: A New Edge AI Milestone</h2></div><div class="fusion-text fusion-text-80"><p>Video has become one of the most valuable sources of information in today’s connected world, powering everything from security monitoring to industrial inspection, retail analytics, and healthcare diagnostics. However, the sheer volume of video data is overwhelming—hours of footage must often be reviewed just to find a few seconds of relevant events. Without automation, this process is slow, expensive, and prone to human error.</p>
<p>This is where Video Search and Summarization (VSS) comes in. VSS is a powerful generative AI application designed to streamline the development of intelligent video analytics agents. Built on the NVIDIA AI Blueprint for video search and summarization, it combines vision-language models (VLMs), large language models (LLMs), and advanced computer vision to:</p>
<ul>
<li>Search across multiple live streams and recorded files instantly.</li>
<li>Summarize video content with contextualized insights.</li>
<li>Provide interactive Q&amp;A about video footage.</li>
<li>Deliver real-time alerts and notifications for critical events.</li>
<li>Support audio-based cues for more comprehensive situational understanding.</li>
<li>Offer REST APIs for easy integration into existing systems (illustrated in the sketch below).</li>
</ul>
<p>Previously, workloads of this scale were only practical in cloud environments. Now, thanks to Jetson Thor’s AI compute capacity, VSS can run at the edge. This means organizations can perform real-time, privacy-preserving video analytics without relying on internet connectivity—ideal for mission-critical applications in security, manufacturing, smart cities, and more.</p>
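<p>Because VSS exposes REST APIs (noted in the list above), integration can be as simple as an HTTP call from an existing system. The host, route, and payload fields below are purely illustrative placeholders, not the blueprint's documented API; consult the NVIDIA AI Blueprint documentation for the actual endpoints:</p>
<pre>
# Illustrative only: calling a VSS-style summarization service over REST.
# The address, route ("/summarize"), and payload fields are hypothetical
# placeholders; the real API is defined in the NVIDIA AI Blueprint docs.
import requests

VSS_HOST = "http://jetson-thor.local:8000"  # placeholder address

resp = requests.post(
    f"{VSS_HOST}/summarize",  # hypothetical route
    json={"stream_id": "dock-camera-01", "window_seconds": 300},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
</pre>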
</div><div class="fusion-title title fusion-title-38 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Jetson Thor Technical Specifications</h2></div><div class="fusion-text fusion-text-81"><p>Jetson Thor sits at the top of the Jetson family in terms of hardware capabilities.</p>
<ul>
<li><strong>CPU:</strong> 14-core Arm Neoverse-V3AE processor for high-performance multitasking and real-time operations.</li>
<li><strong>GPU:</strong> NVIDIA’s next-generation Blackwell architecture GPU with 2,560 CUDA cores and 96 Tensor Cores, delivering massive AI compute performance.</li>
<li><strong>MIG (Multi-Instance GPU):</strong> Enables secure, isolated execution of multiple AI workloads by partitioning GPU resources in hardware.</li>
<li><strong>Specialized Accelerators:</strong> Includes a 3rd-gen Programmable Vision Accelerator (PVA), dual video encoders/decoders, and an optical flow accelerator to offload processing from CPU/GPU.</li>
<li><strong>Memory:</strong> 128 GB LPDDR5X RAM with a 256-bit interface and ~273 GB/s bandwidth (the arithmetic is sketched after this list)—ideal for large AI models and high-resolution sensor data.</li>
<li><strong>Power:</strong> Configurable from 40 W to 130 W, supporting both low-power embedded use cases and full-performance operation.</li>
<li><strong>Connectivity:</strong>
<ul>
<li>QSFP slot with 4× 25 GbE high-bandwidth Ethernet for streaming data from multiple cameras or lidars.</li>
<li>Additional Multi-Gigabit Ethernet (RJ-45), multiple USB 3.2 ports, DisplayPort/HDMI outputs, and industrial interfaces like CAN/UART.</li>
</ul>
</li>
</ul>
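<p>The quoted memory bandwidth follows directly from the bus width and the LPDDR5X transfer rate. Assuming an 8,533 MT/s speed grade (a standard rate for this class of memory; the spec list above does not state the exact figure), the arithmetic works out as follows:</p>
<pre>
# Back-of-envelope check on the quoted ~273 GB/s memory bandwidth.
# Assumes an 8533 MT/s LPDDR5X speed grade, a common rate for this
# memory class (the spec list above does not give the transfer rate).
bus_width_bits = 256
transfers_per_second = 8533e6            # 8533 MT/s
bytes_per_transfer = bus_width_bits / 8  # 32 bytes moved per transfer

bandwidth_gb_s = transfers_per_second * bytes_per_transfer / 1e9
print(f"{bandwidth_gb_s:.0f} GB/s")  # prints "273 GB/s"
</pre>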
<p>Jetson Thor runs on Ubuntu 24.04 LTS with the new JetPack 7.0 SDK, ensuring compatibility with NVIDIA’s latest AI software stack, including CUDA and TensorRT.</p>
</div><div class="fusion-title title fusion-title-39 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Application Areas</h2></div><div class="fusion-text fusion-text-82"><p><strong>1. Autonomous Systems (Vehicles &amp; Robots)</strong><br />
Processes LIDAR, camera, and radar data simultaneously to enable precise perception and safe decision-making. Humanoid robots and drones can perform real-time localization, SLAM, and obstacle detection with greater speed and accuracy.</p>
<p><strong>2. Smart Cities &amp; Security</strong><br />
Analyzes 24/7 video streams from city surveillance systems locally—without sending data to the cloud—for instant traffic management, crowd control, and threat detection. Supports real-time analysis of 4K/8K video feeds.</p>
<p><strong>3. Industrial Automation</strong><br />
Enhances factory robotics, production line cameras, and inspection systems for defect detection, quality control, and predictive maintenance. Built for reliability in demanding industrial environments.</p>
<p><strong>4. Healthcare Technologies</strong><br />
Powers AI-driven medical devices such as portable MRI and ultrasound systems to process images directly on-device for instant diagnostics. Surgical robots can benefit from real-time imaging and enhanced precision, while patient monitoring systems can safeguard privacy by keeping all data processing local.</p>
<p>Beyond these, Jetson Thor can power AI research labs, smart retail systems, agricultural automation, and much more—accelerating the shift toward smarter, faster, and more autonomous edge systems.</p>
<p><strong>In summary:</strong> Jetson Thor brings supercomputer-class AI performance to the edge, making it possible to run previously cloud-only applications like Video Search and Summarization locally, in real time, and with full data privacy. This opens the door to faster, smarter, and more autonomous machines across every industry.</p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-jetson-thor-a-next-generation-platform-for-edge-ai/">NVIDIA Jetson Thor: A Next-Generation Platform for Edge AI</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
