<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>OpenZeka EN Blog</title>
	<atom:link href="https://blog-en.openzeka.com/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog-en.openzeka.com/</link>
	<description>NVIDIA Jetson Developer Kits &#38; Edge Devices</description>
	<lastBuildDate>Fri, 27 Mar 2026 13:44:56 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>NVIDIA DGX Spark vs NVIDIA Jetson Thor</title>
		<link>https://blog-en.openzeka.com/nvidia-dgx-spark-vs-nvidia-jetson-thor/</link>
		
		<dc:creator><![CDATA[Betül Kaya]]></dc:creator>
		<pubDate>Tue, 23 Dec 2025 13:58:02 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Performance]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1504</guid>

					<description><![CDATA[<p>One of the most common mistakes made when developing a ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-dgx-spark-vs-nvidia-jetson-thor/">NVIDIA DGX Spark vs NVIDIA Jetson Thor</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-1"><p>One of the most common mistakes made when developing artificial intelligence systems is evaluating hardware designed for different purposes as if they were meant to solve the same problem. Although NVIDIA DGX Spark and NVIDIA Jetson Thor—two of NVIDIA’s recently prominent products—are often compared due to their similar names and emphasis on high performance, they are in fact two entirely different platforms designed to solve completely different problems.</p>
</div><div class="fusion-text fusion-text-2"><p>The purpose of this article is to clearly highlight the differences between DGX Spark and Jetson Thor and to make the following distinction explicit at the end:</p>
<p><strong>DGX Spark</strong> is designed for developing, training, and testing artificial intelligence models.<br />
<strong>Jetson Thor</strong>, on the other hand, is designed to run these models in the real world, on robots and physical systems.</p>
</div><div class="fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">What is NVIDIA DGX Spark?</h2></div><div class="fusion-text fusion-text-3"><p>NVIDIA DGX Spark is a compact AI supercomputer positioned in a desktop form factor, designed to enable the development and execution of artificial intelligence models entirely in a local environment. At the heart of the system is the Grace Blackwell GB10 Superchip, which combines NVIDIA’s Grace CPU and Blackwell GPU architectures into a single chip. Thanks to this architectural integration, DGX Spark delivers up to 1 petaflop of AI computing performance along with 128 GB of high-bandwidth unified HBM3e memory. This makes it an extremely powerful local development platform for large language models and generative AI workloads.</p>
<p>Each DGX Spark can operate independently as a fully capable AI workstation. When two Spark devices are connected together, the system reaches a unified memory capacity of 256 GB, transforming into an expanded AI node capable of handling models with up to 405 billion parameters. While pairing a maximum of two units is currently supported, NVIDIA states that this limit may be increased in the future through software updates.</p>
<p>DGX Spark aims to reduce reliance on the cloud or data centers by enabling the following workloads to be performed entirely in a local environment.</p>
</div><div class="fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">DGX Spark Use Cases</h2></div><div class="fusion-text fusion-text-4"><p><strong>Fine-Tuning</strong><br />
DGX Spark provides a powerful fine-tuning platform, especially for organizations working with enterprise, sensitive, or regulated data. In sectors such as finance, healthcare, defense, or law, large language models, image recognition systems, or task-specific AI models can be fine-tuned entirely locally without data leaving the organization. This approach ensures compliance with GDPR regulations and eliminates intellectual property risks.</p>
<p><strong>Inference and Local AI Services</strong><br />
DGX Spark enables low-latency, high-efficiency inference of trained models in desktop or local server environments. Chatbots, document analysis systems, visual inspection applications, or decision support systems can run in real time without relying on the cloud. As a result, performance improves while network dependency and data transfer risks are eliminated.</p>
<p><strong>Data Science and Analytics Workloads</strong><br />
For data scientists working with large datasets, DGX Spark consolidates data cleaning, model training, and evaluation steps into a single powerful platform. Thanks to GPU-accelerated computing, complex statistical analyses, simulations, and machine learning pipelines can be completed much faster. This provides a significant speed advantage, especially for Proof of Concept (PoC) and pilot projects.</p>
<p><strong>Transition from Cloud to Desktop and Desktop to Cloud</strong><br />
DGX Spark is designed to be fully compatible with the NVIDIA ecosystem. After developing and testing a model on DGX Spark, you can move it to DGX Cloud or other accelerated cloud infrastructures using the same codebase and software stack with little to no modification. This approach offers great flexibility for organizations adopting hybrid AI strategies.</p>
<p><strong>Working with Secure and Sensitive Data</strong><br />
DGX Spark is an ideal solution for scenarios where data must remain within the organization. Sensitive customer data, internal company documents, or confidential R&amp;D outputs can be processed and modeled locally without being uploaded to the cloud. This reduces cybersecurity risks and simplifies regulatory compliance.</p>
<p><strong>Education, Academic, and Enterprise AI Laboratories</strong><br />
For universities, research centers, and corporate AI teams, DGX Spark functions as a compact yet extremely powerful “AI laboratory.” Students and engineers can gain hands-on experience working with large-scale models on real hardware and develop scenarios that are much closer to production environments.</p>
</div><div class="fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">What is NVIDIA Jetson Thor?</h2></div><div class="fusion-text fusion-text-5"><p>NVIDIA Jetson Thor is a high-performance edge AI platform developed for Physical AI, robotics, and autonomous systems. The core objective of Jetson Thor is to run large language models (LLMs), vision-language models (VLMs), and vision-language-action (VLA) models in real time with low latency and high energy efficiency. In this respect, Thor is positioned as the central “brain” of a robot or autonomous system, responsible for decision-making and action execution.<br />
Thanks to its Blackwell-based architecture, Jetson Thor delivers up to 2,070 TFLOPS (FP4 – sparsity-enabled) of AI computing performance, making it possible to deploy advanced models developed at data-center scale directly in edge environments. The Jetson Thor module family is optimized for Physical AI and robotics applications, combining high performance with a flexible power profile: configurable power consumption between 40 W and 130 W, along with up to 128 GB of memory.</p>
<p>This powerful hardware foundation allows LLM, VLM, and VLA models to run concurrently in a deterministic, low-latency manner. Its high energy efficiency makes Jetson Thor an ideal solution for 24/7 autonomous systems, robotic platforms, and mission-critical edge AI applications.</p>
</div><div class="fusion-text fusion-text-6"><p>The platform is optimized to process multiple data streams simultaneously from cameras, LiDAR, radar, and other sensors, enabling the entire perception–decision–action loop to be closed fully at the edge. Jetson Thor’s architecture targets continuously operating, time-sensitive systems that interact with the real world, rather than desktop- or data-center-oriented development environments.</p>
<p>In short, Jetson Thor is not a platform for developing AI models; it is an edge AI solution designed to run already developed models in the field, in the physical world, and in real time. Especially in robotics, autonomous vehicles, and Physical AI scenarios, it serves as a foundational building block for modern autonomous systems by unifying high computational power, low latency, sensor integration, and energy efficiency in a single platform.</p>
</div><div class="fusion-text fusion-text-7"><p>Jetson Thor’s high computational performance and extensive I/O capabilities make it an ideal solution across a wide range of industries. Below are some of the potential application areas of Jetson Thor:</p>
<ul>
<li><strong>Autonomous Systems (Vehicles and Robots)</strong><br />
By processing LiDAR, camera, and radar data simultaneously, Jetson Thor enables autonomous vehicles to perceive their environment and make safe decisions. Humanoid robots and unmanned aerial vehicles (UAVs) can also perform tasks such as real-time localization, mapping (SLAM), and obstacle detection more efficiently with Jetson Thor.</li>
<li><strong>Smart Cities and Public Safety</strong><br />
Jetson Thor can analyze 24/7 video streams from city surveillance cameras locally, without relying on the cloud. This enables instant traffic management, crowd monitoring, and detection of security threats. Thanks to its high memory capacity, Jetson Thor can analyze 4K/8K video streams in real time for smart city applications.</li>
<li><strong>Industrial Automation</strong><br />
When integrated into robotic arms or camera systems on production lines, Jetson Thor enables AI-driven tasks such as defect detection, quality control, and predictive maintenance to be performed in real time. Its rugged design and long-lifecycle industrial variants ensure reliable operation in harsh industrial environments.</li>
<li><strong>Healthcare Technologies</strong><br />
Medical devices and innovative healthcare systems can also benefit from Jetson Thor’s capabilities. For example, a portable MRI or ultrasound device can process images locally using AI to deliver instant diagnostic insights. When equipped with Jetson Thor, surgical robots can perform real-time image processing and precise control during operations. In addition, patient monitoring systems can process data locally while preserving privacy.</li>
<li><strong>Security and Surveillance</strong><br />
Smart security cameras can perform deep learning–based tasks such as facial recognition or threat detection in real time using Jetson Thor. This enhances security while reducing network traffic in environments such as banks, airports, and critical infrastructure. The system can detect suspicious situations on-site and send immediate alerts to security personnel.</li>
</ul>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Feature</th>
<th align="left">NVIDIA DGX Spark</th>
<th align="left">NVIDIA Jetson Thor</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Primary Purpose</td>
<td align="left">AI development, training, testing</td>
<td align="left">Robotics and Physical AI Inference</td>
</tr>
<tr>
<td align="left">Deployment Environment</td>
<td align="left">Desktop / Office / Lab</td>
<td align="left">Edge / Robot / Autonomous systems</td>
</tr>
<tr>
<td align="left">LLM Prefill Performance</td>
<td align="left">Very high (compute-bound)</td>
<td align="left">Optimized for edge</td>
</tr>
<tr>
<td align="left">Power Consumption</td>
<td align="left">High</td>
<td align="left">Low and energy-efficient</td>
</tr>
<tr>
<td align="left">Real-Time Operation</td>
<td align="left">Not a priority</td>
<td align="left">Critical requirement</td>
</tr>
<tr>
<td align="left">Sensor Integration</td>
<td align="left">None</td>
<td align="left">Camera, LIDAR, radar etc.</td>
</tr>
<tr>
<td align="left">Target User</td>
<td align="left">AI developers, data scientists</td>
<td align="left">Robotics and embedded systems developers</td>
</tr>
</tbody>
</table>
</div>
<div class="fusion-text fusion-text-8" style="--awb-margin-top:15px;"><p>If your goal is Physical AI, robotics, autonomous driving, and edge inference:</p>
<ul>
<li>Jetson Thor is specifically designed for this purpose and is the right choice.<br />
If you need AI model development, training, testing, fine-tuning, and high-performance local computation.</li>
<li>DGX Spark is purpose-built exactly for these needs.</li>
</ul>
<p>For large-scale organizations, these two products are not competitors but complementary: You develop the model on DGX Spark and deploy it into the real world on Jetson Thor.</p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-dgx-spark-vs-nvidia-jetson-thor/">NVIDIA DGX Spark vs NVIDIA Jetson Thor</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>NVIDIA DGX Spark: Bringing Data Center Power to Your Desk</title>
		<link>https://blog-en.openzeka.com/nvidia-dgx-spark/</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Mon, 27 Oct 2025 07:02:26 +0000</pubDate>
				<category><![CDATA[Getting Started]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1490</guid>

					<description><![CDATA[<p>Artificial intelligence is entering a new era—one wher ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-dgx-spark/">NVIDIA DGX Spark: Bringing Data Center Power to Your Desk</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><div class="fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-9"><p>Artificial intelligence is entering a new era—one where supercomputing performance is no longer confined to massive data centers. The NVIDIA DGX Spark, unveiled by NVIDIA, embodies this transformation. It’s a compact, AI-focused workstation that lets developers, researchers, and innovators harness data center-grade power right from their desks.</p>
</div><div class="fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">What Is NVIDIA DGX Spark?</h2></div><div class="fusion-text fusion-text-10"><p>The NVIDIA DGX Spark is a compact, single-user AI development and inference system powered by the Grace Blackwell GB10 Superchip—a seamless fusion of NVIDIA’s Grace CPU and Blackwell GPU architectures. This powerful combination delivers up to one petaflop of AI compute and 128 GB of unified high-bandwidth memory (HBM3e) per unit.</p>
<p>Each Spark functions as a self-contained AI powerhouse, but it gets even more impressive when two units are linked together, effectively operating as a single expanded AI node with 256 GB of unified memory and the ability to handle up to 405 billion model parameters. At present, the configuration supports only two systems, though NVIDIA has indicated that broader scalability may be possible in future software updates.</p>
<p>Despite its small form factor, DGX Spark comes fully equipped with NVIDIA’s comprehensive AI software stack, including CUDA, CUDA-X AI, AI Workbench, and integrated support for NVIDIA toolkits such as Isaac Sim, Metropolis, and NeMo. In essence, it’s a mini data center on your desk—delivering enterprise-level AI performance in a workstation-sized footprint.</p>
</div><div class="fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">Why It Matters</h2></div><div class="fusion-text fusion-text-11"><p>Developers often struggle with limited GPU memory and costly cloud resources. DGX Spark eliminates those constraints by offering local access to large GPU memory and NVIDIA’s entire AI ecosystem—without the complexity or expense of managing data center infrastructure.</p>
<p>This accessibility empowers AI researchers, developers, students, and data scientists to prototype, fine-tune, and test massive models directly on their desktops. Tasks like data science, model inference, computer vision, and robotics become faster, cheaper, and more secure.</p>
</div><div class="fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">Who Should Use It</h2></div><div class="fusion-text fusion-text-12"><p>DGX Spark is built for AI developers and innovators who need high performance and flexibility but don’t have access to large-scale compute clusters. It’s ideal for:</p>
<ul>
<li>Developers building or fine-tuning large language models (LLMs)</li>
<li>Researchers experimenting with edge and robotics applications</li>
<li>Students learning with real-world AI tools</li>
<li>Organizations looking to augment existing cloud or workstation setups</li>
</ul>
<p>Essentially, if your local GPU can’t handle the memory demands of your model—or cloud costs are slowing you down—DGX Spark fills that gap.</p>
</div><div class="fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">DGX Spark vs. RTX Pro 6000 and RTX 5090</h2></div><div class="fusion-text fusion-text-13"><p>While the RTX 5090 and RTX Pro 6000 Blackwell offer higher raw compute power (up to four petaflops vs. Spark’s one), they are limited by GPU memory. The RTX Pro 6000, for instance, has 96 GB VRAM, compared to Spark’s 128 GB unified memory. This means that for smaller, compute-heavy workloads, an RTX Pro or 5090 is ideal—but for large models that exceed GPU memory, Spark performs better, as it can handle models that would otherwise crash or slow dramatically on traditional GPUs.</p>
<p>In short:</p>
<ul>
<li>RTX 5090 / 6000 Pro → More compute, less memory</li>
<li>DGX Spark → Slightly less compute, much larger memory + full NVIDIA AI stack integration</li>
</ul>
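<p>A rough rule of thumb makes this trade-off concrete: resident weight memory is roughly the parameter count times the bytes per parameter, before KV cache and activation overhead. The figures below are illustrative estimates under that assumption, not measured values:</p>
<pre>
weights ≈ parameters × bytes/parameter      (KV cache and activations come on top)

 70B params @ 4-bit (~0.5 B/param) ≈  35 GB  → fits a 96 GB RTX Pro 6000
200B params @ 4-bit                ≈ 100 GB  → exceeds 96 GB VRAM, fits Spark’s 128 GB
405B params @ 4-bit                ≈ 203 GB  → needs two linked Sparks (256 GB unified)
</pre>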
</div><div class="fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">The Future of Local AI Development</h2></div><div class="fusion-text fusion-text-14"><p><span style="font-weight: 400;">NVIDIA DGX Spark represents the democratization of AI supercomputing. For the first time, researchers, developers, and creators can access petaflop-level performance from a system that fits under a desk.</span></p>
<p><span style="font-weight: 400;">As the AI landscape grows increasingly complex, DGX Spark provides the missing middle ground—more power than a desktop GPU, more freedom than the cloud. Whether you’re building LLMs, robotics solutions, or next-gen visual AI applications, Spark lets you do it faster, locally, and securely.</span></p>
</div></div></div><div class="fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column fusion-column-inner-bg-wrapper" style="--awb-padding-top-small:12px;--awb-padding-bottom-small:12px;--awb-overflow:hidden;--awb-inner-bg-size:cover;--awb-box-shadow:0px 2px 10px 0px rgba(0,0,0,0.06);;--awb-border-color:#e5e7eb;--awb-border-top:1px;--awb-border-right:1px;--awb-border-bottom:1px;--awb-border-left:1px;--awb-border-style:solid;--awb-border-radius:4px 4px 4px 4px;--awb-inner-bg-border-radius:4px 4px 4px 4px;--awb-inner-bg-overflow:hidden;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:28px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><span class="fusion-column-inner-bg hover-type-none"><a class="fusion-column-anchor" href="https://aetherix.com/product/nvidia-dgx-spark/"><span class="fusion-column-inner-bg-image"></span></a></span><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"></div></div></div></div><div class="fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-blend:overlay;--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-15"></div></div></div></div></div></p>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-dgx-spark/">NVIDIA DGX Spark: Bringing Data Center Power to Your Desk</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>HammerBench: AGX Thor’s Power Meets Ollama</title>
		<link>https://blog-en.openzeka.com/hammerbench-agx-thors-power-meets-ollama/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Wed, 17 Sep 2025 13:28:53 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1372</guid>

					<description><![CDATA[<p>What is an LLM benchmark and why is it important?  LLM  ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/hammerbench-agx-thors-power-meets-ollama/">HammerBench : AGX Thor’s Power Meets Ollama</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is an LLM benchmark and why is it important?</h3></div><div class="fusion-text fusion-text-16"><p><strong>LLM benchmarks</strong> are standardized tests designed to measure how fast, efficient, and accurate large language models (LLMs) perform across different hardware and environments. These tests evaluate metrics such as latency, throughput, and sometimes accuracy to provide an objective view of performance.</p>
<p>As LLMs continue to grow larger and more complex, choosing the right hardware to run them on becomes a critical decision. Benchmark results are essential to understand which device or infrastructure delivers better performance, to balance cost and efficiency, and to identify the most suitable solution for real-world use cases. In short, LLM benchmarks give both researchers and developers a clear roadmap of how models perform in practice.</p>
<p>To showcase the performance of <strong>Jetson AGX Thor</strong>, we are sharing our results and performance charts with you. At the same time, you can also run benchmarks across<strong> different GPU types</strong> to compare and validate performance for your own workloads. If you want to measure the performance metrics of your own devices and test your models under real-world conditions, get in touch with us. With our solution, your measurements turn into more than just numbers — they become actionable insights that drive strategic decisions.</p>
</div><div class="fusion-title title fusion-title-10 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">How to use HammerBench ?</h3></div><div class="fusion-text fusion-text-17"><p><strong>🖥️ What the App Does</strong></p>
<p>This is a Streamlit-based LLM Benchmark Tool interface designed to evaluate large language models (LLMs) on NVIDIA Jetson AGX Thor hardware using Ollama as the backend.</p>
<p>⚙️ <strong>Configuration (Left Sidebar)</strong></p>
<ul>
<li>GPU Information:
<ul>
<li>Detects if the device is a Jetson (in this case, a Jetson AGX Thor Developer Kit).</li>
<li>Shows details about the GPU (NVIDIA Jetson AGX Thor) and available memory (125,772 MB ≈ 122.8 GB).</li>
</ul>
</li>
</ul>
<p><strong>Use Only GPU:</strong></p>
<p>A checkbox option that allows restricting benchmarks to GPU-only execution.</p>
<p><strong>📊 Main Panel</strong></p>
<p><strong>Title:</strong> <em>LLM Benchmark Tool</em>, with the description: <em>Benchmark LLM models using Ollama with real-time progress tracking.</em></p>
<p><strong>Models Compatible with GPU memory (VRAM) requirements:</strong></p>
<ul>
<li>Displays a table of available models (llama3.2:1b, gemma3:4b, qwen3:14b, gpt-oss:20b, etc.)</li>
<li>Shows how much memory (VRAM in GB) each model requires.</li>
<li>Marks them with ✅ if they are runnable on the detected GPU.</li>
</ul>
<p><strong>Select Models to Benchmark:</strong></p>
<ul>
<li>Lists the same models with checkboxes so the user can pick which ones to run benchmarks on.</li>
<li>Each option shows the memory requirement for clarity (e.g., gemma3:27b (17 GB), gpt-oss:120b (65 GB)).</li>
</ul>
<p><strong>🚀 Purpose</strong></p>
<p>The tool helps developers and researchers:</p>
<ul>
<li>See which LLMs are compatible with their GPU memory.</li>
<li>Select multiple models and run benchmarks to measure performance (latency, throughput, GPU utilization); a minimal measurement sketch follows after this list.</li>
<li>Use the results to compare models and make better deployment or scaling decisions.</li>
</ul>
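<p>For a feel of what such a benchmark actually measures, here is a minimal sketch that probes a local Ollama server directly over its REST API. This is an illustrative example rather than HammerBench itself, and it assumes Ollama is listening on its default port 11434 with the llama3.2:1b model already pulled:</p>
<pre>
# Single-request probe: with "stream": false, Ollama returns timing fields
# (eval_count/eval_duration for decode, prompt_eval_* for prefill).
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.2:1b", "prompt": "Why is the sky blue?", "stream": false}' \
  | python3 -c "import json,sys; r=json.load(sys.stdin); print(round(r['eval_count']/(r['eval_duration']/1e9), 1), 'tokens/s decode')"
</pre>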
</div><div class="fusion-video fusion-selfhosted-video" style="max-width:100%;"><div class="video-wrapper"><video playsinline="true" width="100%" style="object-fit: cover;" autoplay="true" muted="true" loop="true" preload="auto" controls="1"><source src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/animation.webm" type="video/webm">Sorry, your browser doesn&#039;t support embedded videos.</video></div></div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/hammerbench-agx-thors-power-meets-ollama/">HammerBench: AGX Thor’s Power Meets Ollama</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		<enclosure url="https://blog-en.openzeka.com/wp-content/uploads/2025/09/animation.webm" length="194501" type="video/webm" />

			</item>
		<item>
		<title>How to Run Llama.cpp Server on Jetson AGX Thor?</title>
		<link>https://blog-en.openzeka.com/how-to-run-llama-cpp-server-on-jetson-agx-thor/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Fri, 12 Sep 2025 10:44:53 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1410</guid>

					<description><![CDATA[<p>Llama.cpp Server on Jetson AGX Thor: Unlocking Edge AI  ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-llama-cpp-server-on-jetson-agx-thor/">How to Run Llama.cpp Server on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-5 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-blend:overlay;--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-11 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Llama.cpp Server on Jetson AGX Thor: Unlocking Edge AI with Large Language Models</h4></div><div class="fusion-text fusion-text-18"><p><strong>Llama.cpp Server</strong> is a lightweight, high-performance runtime for large language models (LLMs), designed to run efficiently on both CPU and GPU. Built in C++, it eliminates unnecessary overhead and delivers deep hardware-level optimizations. By supporting the <strong>GGUF model format,</strong> it allows for quantization, drastically reducing memory requirements while maintaining accuracy. Through its<strong> REST API,</strong> Llama.cpp Server can be seamlessly integrated into applications, enabling developers to bring advanced LLM capabilities directly to devices—without relying on the cloud.</p>
<p>When deployed on <strong>NVIDIA Jetson AGX Thor</strong>, the advantages become even more compelling:</p>
<ul>
<li>GPU acceleration with<strong> CUDA</strong> ensures that the Thor’s compute power is fully utilized, bringing real-time inference to the edge.</li>
<li>Optimized for edge AI use cases such as robotics, autonomous systems, and industrial automation, it provides ultra-low latency decision-making.</li>
<li>Resource efficiency via quantization makes it possible to run models from 7B up to 13B parameters within the limited memory budgets typical of embedded devices.</li>
</ul>
<p>By combining <strong>Llama.cpp Server</strong> with Jetson<strong> AGX Thor</strong>, organizations gain a powerful platform for on-device AI that is private, fast, and cost-effective. No data needs to leave the device, latency is minimized, and the system remains fully adaptable to both prototyping and production scenarios. Supported by an open-source ecosystem, this pairing represents a breakthrough for deploying large language models securely and efficiently at the edge.</p>
</div></div></div><div class="fusion-layout-column fusion_builder_column fusion-builder-column-6 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-12 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements</h3></div><div class="fusion-text fusion-text-19"><ul>
<li>JetPack 7 (<span style="color: #76b900;"><a style="color: #76b900;" href="https://blog-en.openzeka.com/what-is-nvidia-jetpack-beginner-friendly-guide/">Learn more about JetPack</a></span>)</li>
<li>CUDA 13</li>
<li>At least 10 GB of free disk space<strong> (Only for the Llama Server image, not for the models.)</strong></li>
<li>A stable and fast internet connection (a few quick verification commands follow below)</li>
</ul>
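<p>Before pulling the image, it is worth confirming the device actually matches these requirements; a few standard checks on a stock JetPack install:</p>
<pre>
sudo apt-cache show nvidia-jetpack | grep Version   # installed JetPack release
nvcc --version                                      # CUDA toolkit (expect 13.x)
df -h /                                             # free disk space
</pre>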
</div><div class="fusion-title title fusion-title-13 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">How to use Llama.cpp Server ?</h4></div><div class="fusion-text fusion-text-20"><p>Firstly download the image ;</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-1 > .CodeMirror, .fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-1 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_1" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_1" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_1" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --gpus all -it --rm \
  -p 8080:8080 \
  -v /workspace/models:/models \
  ghcr.io/nvidia-ai-iot/llama_cpp:r38.2.arm64-sbsa-cu130-24.04 \
  /bin/bash</textarea></div><div class="fusion-text fusion-text-21" style="--awb-margin-top:20px;"><p>Then, download the model from Hugging Face. If the model requires access, log in with your token by running:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-2 > .CodeMirror, .fusion-syntax-highlighter-2 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-2 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_2" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_2" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_2" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># huggingface-cli login
hf download Qwen/Qwen3-4B-Instruct-2507</textarea></div><div class="fusion-text fusion-text-22" style="--awb-margin-top:20px;"><p>Then, install the required Python dependencies with the following command:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-3 > .CodeMirror, .fusion-syntax-highlighter-3 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-3 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_3" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_3" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_3" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">pip install transformers torch mistral_common sentencepiece</textarea></div><div class="fusion-text fusion-text-23" style="--awb-margin-top:20px;"><p>This command set downloads the <strong>NVIDIA NVPL local repository package</strong>, installs it, adds the signing key to the system, and then installs the NVPL library via apt-get.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-4 > .CodeMirror, .fusion-syntax-highlighter-4 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-4 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_4" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_4" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_4" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">export NVPL_VERSION=25.5
export DISTRO=ubuntu2404

wget https://developer.download.nvidia.com/compute/nvpl/${NVPL_VERSION}/local_installers/nvpl-local-repo-${DISTRO}-${NVPL_VERSION}_1.0-1_arm64.deb

dpkg -i nvpl-local-repo-ubuntu2404-25.5_1.0-1_arm64.deb

cp /var/nvpl-local-repo-ubuntu2404-25.5/nvpl-local-52E38D21-keyring.gpg /usr/share/keyrings/

apt-get update && apt-get install -y nvpl</textarea></div><div class="fusion-text fusion-text-24" style="--awb-margin-top:20px;"><p>This command takes the Qwen3-4B-Instruct-2507 model downloaded from Hugging Face (inside the snapshot folder identified by its hash), and uses the convert_hf_to_gguf.py tool to convert the Hugging Face weights (safetensors/PyTorch) into GGUF format, saving the output as /data/models/Qwen3-4B-Instruct-2507-f16.gguf.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-5 > .CodeMirror, .fusion-syntax-highlighter-5 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-5 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_5" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_5" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_5" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">python3 /opt/llama_cpp_python/vendor/llama.cpp/convert_hf_to_gguf.py \
  /data/models/huggingface/models--Qwen--Qwen3-4B-Instruct-2507/snapshots/<hash> \
  --outfile /data/models/Qwen3-4B-Instruct-2507-f16.gguf</textarea></div><div class="fusion-text fusion-text-25" style="--awb-margin-top:20px;"><p>This command takes the full-precision GGUF model (Qwen3-4B-Instruct-2507-f16.gguf) and runs it through llama-quantize to produce a quantized version <strong>(Qwen3-4B-Instruct-2507-q4_k_m.gguf)</strong> using the<strong> q4_k_m quantization method.</strong></p>
<ul>
<li><strong>Input file:</strong> /data/models/Qwen3-4B-Instruct-2507-f16.gguf (the FP16 model converted from Hugging Face).</li>
<li><strong>Output file:</strong> /data/models/Qwen3-4B-Instruct-2507-q4_k_m.gguf (smaller, quantized model).</li>
<li><strong>Quantization type:</strong> q4_k_m → a 4-bit quantization scheme optimized for speed and memory efficiency.</li>
</ul>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-6 > .CodeMirror, .fusion-syntax-highlighter-6 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-6 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_6" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_6" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_6" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">llama-quantize /data/models/Qwen3-4B-Instruct-2507-f16.gguf \
  /data/models/Qwen3-4B-Instruct-2507-q4_k_m.gguf q4_k_m</textarea></div><div class="fusion-text fusion-text-26" style="--awb-margin-top:20px;"><p>This command launches the llama.cpp server so the quantized model can be served via an<strong> HTTP API.</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-7 > .CodeMirror, .fusion-syntax-highlighter-7 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-7 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_7" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_7" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_7" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">llama-server \
  -m /data/models/Qwen3-4B-Instruct-2507-q4_k_m.gguf \
  --host 0.0.0.0 --port 8080 \
  -c 8192 \
  --n-gpu-layers 35</textarea></div><div class="fusion-text fusion-text-27" style="--awb-margin-top:20px;"><p>And that’s it! You can start chatting. A quick API smoke test follows below.</p>
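<p>From a second terminal, llama-server exposes an OpenAI-compatible HTTP API, so a plain curl request is enough to confirm the endpoint is alive (the "model" field can be omitted here, since the server answers with whichever model it loaded):</p>
<pre>
# Minimal chat-completion request against the local llama-server instance.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello from Jetson AGX Thor!"}]}'
</pre>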
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-1 hover-type-zoomin"><img fetchpriority="high" decoding="async" width="1024" height="568" title="Screenshot from 2025-09-12 13-31-53" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-1024x568.png" alt class="img-responsive wp-image-1422" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-200x111.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-400x222.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-600x333.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-800x444.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-1200x666.png 1200w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-llama-cpp-server-on-jetson-agx-thor/">How to Run Llama.cpp Server on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Run MLC LLM on Jetson AGX Thor?</title>
		<link>https://blog-en.openzeka.com/how-to-run-mlc-llm-on-jetson-agx-thor/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Tue, 09 Sep 2025 10:21:01 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1322</guid>

					<description><![CDATA[<p>What is MLC LLM ? MLC LLM (Machine Learning Compilation ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-mlc-llm-on-jetson-agx-thor/">How to Run MLC LLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-6 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-7 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-14 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is MLC LLM ?</h3></div><div class="fusion-text fusion-text-28"><p><strong>MLC LLM (Machine Learning Compilation for Large Language Models)</strong> is an open-source project designed to make large language models (LLMs) run efficiently across different hardware platforms. Its main goal is to optimize performance and reduce energy consumption, enabling AI applications to run not only in the cloud but also on edge devices.</p>
<p>NVIDIA’s next-generation <strong>Jetson AGX Thor platform</strong> delivers powerful computing capabilities for robotics, autonomous systems, and AI-driven applications. By leveraging <strong>MLC LLM</strong> on <strong>Jetson AGX Thor</strong>, large language models can be optimized to run in real time, supporting tasks such as natural language processing, decision-making, and human-like interaction with higher efficiency.</p>
</div></div></div><div class="fusion-layout-column fusion_builder_column fusion-builder-column-8 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-29"><p>In short, <strong>MLC LLM</strong> on <strong>Jetson AGX Thor</strong> acts as a bridge that brings high-performance large language model capabilities to edge devices.</p>
</div><div class="fusion-title title fusion-title-15 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements</h3></div><div class="fusion-text fusion-text-30"><ul>
<li>JetPack 7 (<span style="color: #76b900;"><a style="color: #76b900;" href="https://blog-en.openzeka.com/what-is-nvidia-jetpack-beginner-friendly-guide/">Learn more about JetPack</a></span>)</li>
<li>CUDA 13</li>
<li>At least 25 GB of free disk space<strong> (Only for the MLC LLM image, not for the models.)</strong></li>
<li>A stable and fast internet connection</li>
</ul>
</div><div class="fusion-title title fusion-title-16 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">How to use <i>MLC</i> LLM ?</h3></div><div class="fusion-text fusion-text-31"><p>First, install the <i>Docker</i> image on your computer:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-8 > .CodeMirror, .fusion-syntax-highlighter-8 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-8 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_8" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_8" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_8" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">sudo docker run -it --rm \
  --runtime nvidia \
  --gpus all \
  -v /workspace:/workspace \
  -p 6678:6678 \
  -p 6677:6677 \
  ghcr.io/nvidia-ai-iot/mlc:r38.2.arm64-sbsa-cu130-24.04 </textarea></div><div class="fusion-text fusion-text-32" style="--awb-margin-top:20px;"><p>If you’d like to explore the available images or replace them with newer ones, you can visit the <strong><a style="color: #14ce00;" href="http://ghcr.io/nvidia-ai-iot/mlc">GitHub Container Registry.</a></strong></p>
</div><div class="fusion-text fusion-text-33"><p>Once inside the container, find the model you want to download from Hugging Face.<br />
Use the hf download command inside the container to download the model.</p>
<p><strong>For example:</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-9 > .CodeMirror, .fusion-syntax-highlighter-9 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-9 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_9" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_9" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_9" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">hf download Qwen/Qwen3-30B-A3B-Instruct-2507</textarea></div><div class="fusion-text fusion-text-34" style="--awb-margin-top:20px;"><p>In the next step, provide the folder where you downloaded the model and run the command below.<br />
This command converts the model’s original Hugging Face weights (in safetensor format) into the optimized<strong> MLC LLM format.</strong> During conversion, the weights are quantized (e.g., to <strong>q4bf16_1</strong>), which reduces memory usage and improves runtime efficiency on GPU without heavily sacrificing accuracy.</p>
<p>In short,<strong> mlc_llm convert_weight</strong> takes the raw model checkpoint and transforms it into a format that can be directly executed by the MLC runtime on your target device (e.g., Jetson AGX Thor with CUDA).</p>
<p><em><strong>⚠️ Warning:</strong> In the command, replace <strong>&lt;hash&gt;</strong> in <strong>snapshots/&lt;hash&gt;/</strong> with the actual folder name you see inside the snapshots directory<strong> (e.g., aeb13307a71acd8fe81861d94ad54ab689df&#8230;)</strong>. This folder contains the real model files such as config.json, tokenizer.json, and model.safetensors, which are required for the <strong>mlc_llm convert_weight</strong> command to work.</em></p>
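<p>To see the exact folder name, you can list the snapshots directory first (the path below is the same one used in the convert command):</p>
<pre><code># The hash-named directory listed here is the snapshot folder to use below
ls /data/models/huggingface/models--Qwen--Qwen3-30B-A3B-Instruct-2507/snapshots/</code></pre>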
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-10 > .CodeMirror, .fusion-syntax-highlighter-10 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-10 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_10" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_10" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_10" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm convert_weight /data/models/huggingface/models--Qwen--Qwen3-30B-A3B-Instruct-2507/snapshots/<hash>/ \
    --quantization q4bf16_1 \
    --model-type qwen3 \
    --device cuda \
    --source-format huggingface-safetensor \
    -o /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1</textarea></div><div class="fusion-text fusion-text-35" style="--awb-margin-top:20px;"><p>In the next step, <strong>gen_config</strong> generates the configuration files needed to run the converted model in <strong>MLC</strong>. It defines the conversation template (<strong>e.g., Qwen format</strong>), context length, batch size, and other runtime parameters. In short, it makes the weight-converted model fully executable in the MLC runtime.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-11 > .CodeMirror, .fusion-syntax-highlighter-11 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-11 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_11" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_11" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_11" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm gen_config \
    /data/models/huggingface/models--Qwen--Qwen3-30B-A3B-Instruct-2507/snapshots/<hash>/config.json \
    --quantization q4bf16_1 \
    --conv-template qwen2 \
    --context-window-size 32768 \
    --prefill-chunk-size 4096 \
    --max-batch-size 3 \
    --output /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1
</textarea></div><div class="fusion-text fusion-text-36" style="--awb-margin-top:20px;"><p><em><strong>⚠️ Note:</strong> The “Not found” messages for files like tokenizer.model or added_tokens.json are not errors. These files are optional and not required by all models. As long as t<strong>okenizer.json, vocab.json, and merges.txt</strong> are found and copied, the model configuration is complete and ready to run.</em></p>
<p>Now that the configuration is ready, we can move on to the compilation step. In this stage, the model is compiled into a CUDA-optimized shared library (.so file), which enables fast execution on the GPU.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-12 > .CodeMirror, .fusion-syntax-highlighter-12 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-12 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_12" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_12" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_12" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm compile \
    /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1/mlc-chat-config.json \
    --device cuda \
    -o /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1/Qwen3-30B-A3B-Instruct-2507-q4bf16_1-cuda.so \
    --quantization q4bf16_1 \
    --model-type qwen3 \
    --opt="cublas_gemm=1;cudagraph=1"</textarea></div><div class="fusion-text fusion-text-37" style="--awb-margin-top:20px;"><p>With the compilation complete, the final step is to serve the model so it can handle inference requests. The<strong> mlc_llm serve</strong> command launches an HTTP server that exposes the model as an API endpoint, making it accessible for testing or integration into applications.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-13 > .CodeMirror, .fusion-syntax-highlighter-13 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-13 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_13" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_13" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_13" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm serve /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1 \
  --port 6678 \
  --host 0.0.0.0 \
  --device cuda \
  --mode interactive \
  --model-lib /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1/Qwen3-30B-A3B-Instruct-2507-q4bf16_1-cuda.so \
  --overrides "max_num_sequence=1;max_total_seq_length=32768;context_window_size=32768;gpu_memory_utilization=0.3"</textarea></div><div class="fusion-text fusion-text-38" style="--awb-margin-top:20px;"><p><em><strong>If you see this output, it means the model has been successfully compiled and serving .</strong></em></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-2 hover-type-none"><img decoding="async" width="453" height="69" title="Screenshot from 2025-09-08 14-56-47" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47.png" alt class="img-responsive wp-image-1333" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47-200x30.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47-400x61.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47.png 453w" sizes="(max-width: 640px) 100vw, 453px" /></span></div><div class="fusion-text fusion-text-39" style="--awb-margin-top:20px;"><p>You can test it with this curl request ;</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-14 > .CodeMirror, .fusion-syntax-highlighter-14 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-14 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_14" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_14" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_14" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">curl -X POST http://localhost:6678/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-name>",
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant."},
      {"role": "user", "content": "Hello !"}
    ],
    "temperature": 0.7,
    "max_tokens": 512,
    "stream": false
  }'</textarea></div><div class="fusion-title title fusion-title-17 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Which Jetson should I choose for my LLM model?</h4></div><div class="fusion-text fusion-text-40"><p>Below, you can find the RAM requirements of the most popular LLM models along with Jetson recommendations that meet the minimum specifications to run them. You can choose the one that best fits your needs.</p>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Parameters</th>
<th align="left">Quantization</th>
<th align="left">Required RAM (GB)</th>
<th align="left">Recommended Minimum Jetson</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Base</td>
<td align="left">684B</td>
<td align="left">Dynamic-1.58-bit</td>
<td align="left">162.11</td>
<td align="left">Not supported (≥128 GB and above)</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-1.5B</td>
<td align="left">1.5B</td>
<td align="left">Q4_K_M</td>
<td align="left">0.90</td>
<td align="left">Jetson Orin Nano 4 GB, Jetson Nano 4 GB</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-7B</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">mistralai Mixtral 8x22B-Instruct-v0.1</td>
<td align="left">22B</td>
<td align="left">Q4_K_M</td>
<td align="left">13.20</td>
<td align="left">Jetson Orin NX 16 GB, Jetson AGX Orin 32 GB, Jetson AGX Xavier 32 GB</td>
</tr>
<tr>
<td align="left">mistralai Mathstral 7B-v0.1</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">google gemma-3 12b-it</td>
<td align="left">12B</td>
<td align="left">Q4_K_M</td>
<td align="left">7.20</td>
<td align="left">Jetson Orin NX 8 GB, Jetson Orin Nano 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">meta-llama Llama-3.1 70B-Instruct</td>
<td align="left">70B</td>
<td align="left">Q5_K_M</td>
<td align="left">52.50</td>
<td align="left">Jetson AGX Orin 64 GB, Jetson AGX Xavier 64 GB, Jetson AGX Thor (T5000) 128 GB</td>
</tr>
</tbody>
</table>
</div>
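<div class="fusion-text"><p>As a rough rule of thumb (our own approximation, not an official sizing formula): weight memory is roughly the parameter count times the bits per weight divided by 8, plus runtime overhead for the KV cache and buffers. For example, for a 7B model at ~5-bit quantization:</p>
<pre><code># Approximate weight memory for 7B parameters at ~5 bits per weight
python3 -c "print(7e9 * 5 / 8 / 1e9, 'GB of weights, before runtime overhead')"</code></pre></div>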
</div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-mlc-llm-on-jetson-agx-thor/">How to Run MLC LLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Run vLLM on Jetson AGX Thor?</title>
		<link>https://blog-en.openzeka.com/how-to-run-vllm-on-jetson-agx-thor/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Tue, 09 Sep 2025 10:16:57 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1338</guid>

					<description><![CDATA[<p>What is vLLM and Why Does It Matter on Jetson AGX Thor? ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-vllm-on-jetson-agx-thor/">How to Run vLLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-7 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-9 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-18 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is vLLM and Why Does It Matter on Jetson AGX Thor?</h3></div><div class="fusion-text fusion-text-41"><p><strong>vLLM</strong> is an open-source inference engine designed to run large language models (LLMs) with exceptional efficiency. Thanks to its innovative PagedAttention architecture, vLLM delivers both high throughput and low latency making it possible to deploy advanced AI models in real-time applications.</p>
<p>NVIDIA Jetson AGX Thor, on the other hand, is a next-generation edge AI platform built for robotics, autonomous machines, and industrial systems. With its immense compute power and AI acceleration, Thor is the perfect hardware to unlock the full potential of LLMs at the edge.</p>
<p>When combined, vLLM on Jetson AGX Thor enables:</p>
<ul>
<li><strong>Real-time LLM services (chatbots, assistants, summarization, translation)</strong></li>
<li><strong>Vision + Language use cases (explaining camera input instantly)</strong></li>
<li><strong>On-device inference with ultra-low latency and stronger data privacy</strong></li>
<li><strong>Reduced reliance on cloud resources, with better energy efficiency</strong></li>
</ul>
<p>In short, vLLM provides the software intelligence and Thor provides the hardware muscle; together they make cutting-edge LLM experiences possible directly on the device.</p>
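<p>As a side note (a minimal sketch, not part of the original setup): once you have a container with vLLM installed (see the installation steps below), you can also exercise vLLM without launching the HTTP server, via its offline Python API. The model name below is only an illustrative small model for a quick smoke test.</p>
<pre><code># Quick offline smoke test of vLLM's Python API (illustrative model name)
python3 - <<'EOF'
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")                 # small model, fast to load
params = SamplingParams(max_tokens=32, temperature=0.7)
outputs = llm.generate(["Hello from Jetson AGX Thor!"], params)
print(outputs[0].outputs[0].text)                    # generated continuation
EOF</code></pre>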
</div><div class="fusion-title title fusion-title-19 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Installing Process</h3></div><div class="fusion-text fusion-text-42"><p>First, download the following Triton Inference Server container image.<br />
This image comes with vLLM version 0.9.2 pre-installed. The tag 25.08 refers to August 2025.</p>
<p>If you’d like to update to a newer version in the future, you can always visit the <strong><a style="color: #00dd37;" href="https://catalog.ngc.nvidia.com/?filters=&amp;orderBy=weightPopularDESC&amp;query=&amp;page=&amp;pageSize=">NVIDIA NGC Catalog</a></strong> to find the latest container releases.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-15 > .CodeMirror, .fusion-syntax-highlighter-15 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-15 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_15" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_15" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_15" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --name vllm_container -it \
  --gpus all \
  -p 8000:8000 \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  nvcr.io/nvidia/tritonserver:25.08-vllm-python-py3 bash</textarea></div><div class="fusion-text fusion-text-43" style="--awb-margin-top:20px;"><p>You can verify the installed vLLM version directly with Python.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-16 > .CodeMirror, .fusion-syntax-highlighter-16 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-16 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_16" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_16" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_16" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">python3 -c "import vllm; print(vllm.__version__)"</textarea></div><div class="fusion-text fusion-text-44" style="--awb-margin-top:20px;"><p>Next, you’ll need to create an account on Hugging Face , generate an access token, and log in with it.</p>
<p>This token will allow the container to securely download and run models directly from <a href="https://huggingface.co/"><strong style="color: #00e200;">Hugging Face.</strong></a></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-17 > .CodeMirror, .fusion-syntax-highlighter-17 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-17 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_17" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_17" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_17" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">huggingface-cli login</textarea></div><div class="fusion-text fusion-text-45" style="--awb-margin-top:20px;"><p>To download model run ;</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-18 > .CodeMirror, .fusion-syntax-highlighter-18 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-18 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_18" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_18" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_18" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">hf download <model></textarea></div><div class="fusion-text fusion-text-46" style="--awb-margin-top:20px;"><p>Once your environment is ready, you can launch the vLLM API server using the following command:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-19 > .CodeMirror, .fusion-syntax-highlighter-19 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-19 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_19" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_19" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_19" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">python3 -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192 \
  --dtype float16</textarea></div><div class="fusion-text fusion-text-47" style="--awb-margin-top:20px;"><p>Here’s what each parameter does:</p>
<ul>
<li><strong><em>--model</em> →</strong> specifies which model to load (in this case, Llama-3.1-8B-Instruct from Hugging Face).</li>
<li><em><strong>--tensor-parallel-size 1</strong> </em>→ runs the model on a single GPU. If you have multiple GPUs, you can increase this value.</li>
<li><em><strong>--gpu-memory-utilization 0.90</strong></em> → tells vLLM to use up to 90% of available GPU memory. Adjust this if you run into memory errors.</li>
<li><em><strong>--max-model-len 8192 →</strong></em> sets the maximum context length (in tokens) for the model.</li>
<li><em><strong>--dtype float16 →</strong> </em>runs the model in FP16 precision, which is more efficient on Jetson AGX Thor.</li>
</ul>
</div><div class="fusion-text fusion-text-48"><p><em><strong>⚠️ Heads-up: If you encounter ;</strong></em></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-20 > .CodeMirror, .fusion-syntax-highlighter-20 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-20 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_20" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_20" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_20" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}</textarea></div><div class="fusion-text fusion-text-49" style="--awb-margin-top:20px;"><p><em><strong>It usually means the engine couldn’t reserve enough GPU memory. Try lowering the GPU memory utilization. For example try with &#8211;gpu-memory-utilization 0.75 .</strong></em></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-3 hover-type-none"><img decoding="async" width="1024" height="617" title="Screenshot from 2025-09-09 09-33-20" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-1024x617.png" alt class="img-responsive wp-image-1345" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-200x120.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-400x241.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-600x361.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-800x482.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-1200x723.png 1200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20.png 1393w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-text fusion-text-50" style="--awb-margin-top:20px;"><p>If you see a message like:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-21 > .CodeMirror, .fusion-syntax-highlighter-21 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-21 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_21" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_21" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_21" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">Starting vLLM API server 0 on http://0.0.0.0:8000</textarea></div><div class="fusion-text fusion-text-51" style="--awb-margin-top:20px;"><p>it means that vLLM is now serving on port 8000 and ready to accept requests.<br />
At this point, you can start testing it with a simple curl command. For example:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-22 > .CodeMirror, .fusion-syntax-highlighter-22 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-22 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_22" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_22" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_22" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello Jetson AGX Thor!"}],
    "max_tokens": 64
  }'</textarea></div><div class="fusion-title title fusion-title-20 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Which Jetson should I choose for my LLM model?</h4></div><div class="fusion-text fusion-text-52"><p>Below, you can find the RAM requirements of the most popular LLM models along with Jetson recommendations that meet the minimum specifications to run them. You can choose the one that best fits your needs.</p>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Parameters</th>
<th align="left">Quantization</th>
<th align="left">Required RAM (GB)</th>
<th align="left">Recommended Minimum Jetson</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Base</td>
<td align="left">684B</td>
<td align="left">Dynamic-1.58-bit</td>
<td align="left">162.11</td>
<td align="left">Not supported (≥128 GB and above)</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-1.5B</td>
<td align="left">1.5B</td>
<td align="left">Q4_K_M</td>
<td align="left">0.90</td>
<td align="left">Jetson Orin Nano 4 GB, Jetson Nano 4 GB</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-7B</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">mistralai Mixtral 8x22B-Instruct-v0.1</td>
<td align="left">22B</td>
<td align="left">Q4_K_M</td>
<td align="left">13.20</td>
<td align="left">Jetson Orin NX 16 GB, Jetson AGX Orin 32 GB, Jetson AGX Xavier 32 GB</td>
</tr>
<tr>
<td align="left">mistralai Mathstral 7B-v0.1</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">google gemma-3 12b-it</td>
<td align="left">12B</td>
<td align="left">Q4_K_M</td>
<td align="left">7.20</td>
<td align="left">Jetson Orin NX 8 GB, Jetson Orin Nano 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">meta-llama Llama-3.1 70B-Instruct</td>
<td align="left">70B</td>
<td align="left">Q5_K_M</td>
<td align="left">52.50</td>
<td align="left">Jetson AGX Orin 64 GB, Jetson AGX Xavier 64 GB, Jetson AGX Thor (T5000) 128 GB</td>
</tr>
</tbody>
</table>
</div>
</div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-vllm-on-jetson-agx-thor/">How to Run vLLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Run Ollama on Jetson AGX Thor with OpenwebUI?</title>
		<link>https://blog-en.openzeka.com/how-to-run-ollama-on-jetson-agx-thor-with-openwebui/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Tue, 09 Sep 2025 10:11:28 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1301</guid>

					<description><![CDATA[<p>What is Ollama?  Ollama is a lightweight and flexible p ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-ollama-on-jetson-agx-thor-with-openwebui/">How to Run Ollama on Jetson AGX Thor with OpenwebUI?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-8 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-10 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-21 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is Ollama?</h3></div><div class="fusion-text fusion-text-53"><p>Ollama is a lightweight and flexible platform that allows you to run large language models (LLMs) directly on your own device. When running on powerful AI hardware such as the <strong>NVIDIA Jetson AGX Thor</strong>, it provides a local, fast, and secure experience without the need for cloud-based solutions.</p>
<p>Thanks to the high processing power of Jetson AGX Thor, Ollama:</p>
<ul>
<li><strong>Runs LLMs locally</strong> → Can be used even without an internet connection.</li>
<li><strong>Utilizes hardware acceleration</strong> → Leverages GPU power to generate faster responses.</li>
<li><strong>Ensures data privacy</strong> → All processing happens on-device, so sensitive data never leaves the system.</li>
<li><strong>Offers flexibility</strong> → Different models can be downloaded, customized, and tested.</li>
</ul>
<p>In short, Ollama leverages the hardware advantages of Jetson AGX Thor to make AI applications more accessible, portable, and secure.</p>
</div><div class="fusion-title title fusion-title-22 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements for AGX Thor</h3></div><div class="fusion-text fusion-text-54"><ol>
<li>JetPack 7 must be installed</li>
<li>Stable high-speed internet connection</li>
<li>At least 15 GB of free disk space, not counting storage for the models themselves (a quick check is shown after this list)</li>
</ol>
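<p>A quick way to confirm the free-space requirement before pulling anything (a generic check, not Ollama-specific):</p>
<pre><code># Free space on the root filesystem; "Avail" should be at least 15 GB
df -h /</code></pre>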
</div><div class="fusion-title title fusion-title-23 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Installation Process</h3></div><div class="fusion-text fusion-text-55"><p>First, we create a folder to mount into the container.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-23 > .CodeMirror, .fusion-syntax-highlighter-23 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-23 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_23" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_23" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_23" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">mkdir ~/ollama-data/</textarea></div><div class="fusion-text fusion-text-56" style="--awb-margin-top:20px;"><p>Next, we download the image from the <strong>GitHub Container Registry.</strong><br />
The <strong>ghcr.io</strong> prefix indicates that the image is hosted on the GitHub Container Registry.</p>
<p>To access other images or check for the latest updates, you can visit the following <strong><a style="color: #2a9e00;" href="https://github.com/orgs/NVIDIA-AI-IOT/packages/container/package/ollama">link.</a></strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-24 > .CodeMirror, .fusion-syntax-highlighter-24 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-24 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_24" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_24" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_24" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --rm -it -v ${HOME}/ollama-data:/data ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04</textarea></div><div class="fusion-text fusion-text-57" style="--awb-margin-top:20px;"><p>It will take some time to pull (download) the container image.</p>
<p>Once in the container, you will see something like this.</p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-4 hover-type-none"><img decoding="async" width="848" height="817" title="Screenshot from 2025-09-08 11-44-34" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34.png" alt class="img-responsive wp-image-1306" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-200x193.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-400x385.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-600x578.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-800x771.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34.png 848w" sizes="(max-width: 640px) 100vw, 848px" /></span></div><div class="fusion-text fusion-text-58" style="--awb-margin-top:20px;"><p>Try running a GPT OSS (20b parameter) model by issuing a command below.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-25 > .CodeMirror, .fusion-syntax-highlighter-25 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-25 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_25" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_25" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_25" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">ollama run --verbose gpt-oss:20b</textarea></div><div class="fusion-text fusion-text-59" style="--awb-margin-top:20px;"><p>Once ready, it will show something like this:</p>
</div><div class="fusion-image-element " style="text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-5 hover-type-none"><img decoding="async" width="697" height="522" title="Screenshot from 2025-09-08 11-50-28" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28.png" alt class="img-responsive wp-image-1310" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28-200x150.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28-400x300.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28-600x449.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28.png 697w" sizes="(max-width: 640px) 100vw, 697px" /></span></div><div class="fusion-title title fusion-title-24 fusion-sep-none fusion-title-text fusion-title-size-three" style="--awb-margin-top:20px;"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Troubleshooting</h3></div><div class="fusion-text fusion-text-60"><p><strong>CUDA out of memory</strong></p>
<p>If you encounter CUDA out of memory errors, try running a <strong>smaller model.</strong><br />
You can also use quantization to reduce memory usage and run models more efficiently on your device.</p>
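<p>For example, a smaller quantized model can be pulled and run the same way (llama3.2:3b is just one illustrative tag; check ollama.com for currently available tags):</p>
<pre><code># Run a smaller model if the larger one does not fit in memory (illustrative tag)
ollama run --verbose llama3.2:3b</code></pre>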
<p>Different model sizes and quantized versions can be found <strong><a style="color: #1bcc00;" href="https://ollama.com">here</a><span style="color: #1bcc00;">.</span> </strong></p>
</div><div class="fusion-title title fusion-title-25 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Installing OpenwebUI</h3></div><div class="fusion-text fusion-text-61"><p>Firsty run this command on terminal ;</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-26 > .CodeMirror, .fusion-syntax-highlighter-26 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-26 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_26" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_26" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_26" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run -it --rm --network=host --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main</textarea></div><div class="fusion-text fusion-text-62" style="--awb-margin-top:20px;"><p><em>If you see the <strong>&#8220;application startup&#8221;</strong> message on the screen, you can proceed to the next step.</em><br />
<em>If it says <strong>&#8220;retrying&#8221;</strong> and you don’t see any progress in the download section, stop the process with <strong>Control + C</strong> and try again, or simply wait; the startup should eventually complete.</em></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-6 hover-type-none"><img decoding="async" width="960" height="589" title="Screenshot from 2025-09-08 13-31-30" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30.png" alt class="img-responsive wp-image-1314" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-200x123.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-400x245.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-600x368.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-800x491.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30.png 960w" sizes="(max-width: 640px) 100vw, 960px" /></span></div><div class="fusion-text fusion-text-63" style="--awb-margin-top:20px;"><p>You can then navigate your browser to <em><strong>http://JETSON_IP:8080</strong></em> , and create a fake account to log in (these credentials are only local). Instead of <strong>JETSON_IP</strong>, you can also use localhost.</p>
<p>Create an account .</p>
</div><div class="fusion-image-element " style="text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-7 hover-type-none"><img decoding="async" width="613" height="482" title="Screenshot from 2025-09-08 13-36-48" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48.png" alt class="img-responsive wp-image-1315" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48-200x157.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48-400x315.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48-600x472.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48.png 613w" sizes="(max-width: 640px) 100vw, 613px" /></span></div><div class="fusion-text fusion-text-64"><p><em><strong>⚠️ Be careful !</strong> When OpenWebUI is launched, <strong>no model</strong> will appear in the <strong>Load Models</strong> section at the top left. To connect models to <strong>OpenWebUI</strong>, we need to assign a port. Restart the Ollama container with the following command:</em></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-27 > .CodeMirror, .fusion-syntax-highlighter-27 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-27 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_27" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_27" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_27" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --rm -it \
  -p 11434:11434 \
  -v ${HOME}/ollama-data:/data \
  ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04</textarea></div><div class="fusion-text fusion-text-65" style="--awb-margin-top:20px;"><p>You can check it by sending a <strong>curl request:</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-28 > .CodeMirror, .fusion-syntax-highlighter-28 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-28 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_28" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_28" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_28" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">curl http://localhost:11434</textarea></div><div class="fusion-text fusion-text-66" style="--awb-margin-top:20px;"><p>If you see “<strong>Ollama is running</strong>”, you can continue using it.</p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-8 hover-type-none"><img decoding="async" width="1024" height="261" title="Screenshot from 2025-09-08 14-08-00" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-1024x261.png" alt class="img-responsive wp-image-1320" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-200x51.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-400x102.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-600x153.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-800x204.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00.png 1058w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-title title fusion-title-26 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Which Jetson should I choose for my LLM model?</h4></div><div class="fusion-text fusion-text-67 fusion-text-no-margin" style="--awb-margin-bottom:-20px;"><p>Below, you can find the RAM requirements of the most popular LLM models along with Jetson recommendations that meet the minimum specifications to run them. You can choose the one that best fits your needs.</p>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Parameters</th>
<th align="left">Quantization</th>
<th align="left">Required RAM (GB)</th>
<th align="left">Recommended Minimum Jetson</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">DeepSeek-R1</td>
<td align="left">671B</td>
<td align="left">Dynamic-1.58-bit (MoE 1.5-bit + other layers 4–6-bit)</td>
<td align="left">159.03</td>
<td align="left">Not supported (≥128 GB and above)</td>
</tr>
<tr>
<td align="left">DeepSeek-R1 Distill-Qwen-1.5B</td>
<td align="left">1.5B</td>
<td align="left">Q4_K_M</td>
<td align="left">0.90</td>
<td align="left">Jetson Orin Nano 4 GB, Jetson Nano 4 GB</td>
</tr>
<tr>
<td align="left">DeepSeek-R1 Distill-Qwen-7B</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">Qwen 2.5</td>
<td align="left">14B</td>
<td align="left">FP16</td>
<td align="left">33.60</td>
<td align="left">Jetson AGX Orin 64 GB, Jetson AGX Xavier 64 GB</td>
</tr>
<tr>
<td align="left">CodeLlama</td>
<td align="left">34B</td>
<td align="left">Q4_K_M</td>
<td align="left">20.40</td>
<td align="left">Jetson AGX Orin 32 GB, Jetson AGX Xavier 32 GB</td>
</tr>
<tr>
<td align="left">Llama 3.2 Vision</td>
<td align="left">90B</td>
<td align="left">Q5_K_M</td>
<td align="left">67.50</td>
<td align="left">Jetson AGX Thor (T5000) 128 GB</td>
</tr>
<tr>
<td align="left">Phi-3</td>
<td align="left">3.8B</td>
<td align="left">FP16</td>
<td align="left">9.12</td>
<td align="left">Jetson Orin NX 16 GB</td>
</tr>
</tbody>
</table>
</div>
</div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-ollama-on-jetson-agx-thor-with-openwebui/">How to Run Ollama on Jetson AGX Thor with OpenwebUI?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>NVIDIA JetPack 7.0: Powering the Next Generation of AI and Robotics at the Edge</title>
		<link>https://blog-en.openzeka.com/nvidia-jetpack-7-0-next-gen-ai-robotics-edge/</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Wed, 27 Aug 2025 07:01:43 +0000</pubDate>
				<category><![CDATA[Getting Started]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1274</guid>

					<description><![CDATA[<p>NVIDIA has announced the release of JetPack™ 7.0, the  ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-jetpack-7-0-next-gen-ai-robotics-edge/">NVIDIA JetPack 7.0: Powering the Next Generation of AI and Robotics at the Edge</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-9 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-11 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-68"><p>NVIDIA has announced the release of JetPack™ 7.0, the latest and most advanced software stack for the <span style="color: #198fd9;"><a style="color: #198fd9;" href="https://blog-en.openzeka.com/nvidia-jetson-beginners-guide/">Jetson™</a></span> platform. Built to enable cutting-edge robotics and generative AI applications at the edge, JetPack 7 delivers an unprecedented foundation for developers building machines that interact with and understand the physical world.</p>
</div><div class="fusion-title title fusion-title-27 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">A New Era for AI at the Edge</h2></div><div class="fusion-text fusion-text-69"><p>JetPack 7.0 redefines what’s possible with Jetson by providing ultra-low latency, deterministic performance, and scalable deployment. From humanoid robots to AI systems tackling the most demanding generative workloads, JetPack 7 ensures developers have the right tools and libraries to bring ideas to life.</p>
<p>Key to this release is full support for the NVIDIA Jetson Thor™ platform, featuring groundbreaking performance and next-generation AI capabilities. JetPack 7.0 also introduces:</p>
<ul>
<li>A preemptible real-time kernel for predictable system responsiveness (a quick verification check is sketched after this list).</li>
<li>Multi-Instance GPU (MIG) support, maximizing GPU utilization across workloads.</li>
<li>An integrated Holoscan Sensor Bridge, enabling seamless sensor-to-AI pipelines.</li>
</ul>
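<p>A quick way to confirm you are actually running the preemptible real-time kernel, rather than the standard build, is to look for the conventional PREEMPT_RT marker in the kernel identification string. This is a generic Linux check, not a JetPack-specific API:</p>
<pre>
# Sanity check: is the running kernel the preemptible real-time build?
# Generic Linux approach; "PREEMPT_RT" is the conventional marker used
# by RT-patched kernels in their version string.
import platform

version = platform.version()  # e.g. "#1 SMP PREEMPT_RT ..."
if "PREEMPT_RT" in version:
    print("Real-time kernel detected:", version)
else:
    print("Standard kernel:", version)
</pre>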
</div><div class="fusion-title title fusion-title-28 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Built for the Future: Modern OS and Cloud-Native Design</h2></div><div class="fusion-text fusion-text-70"><p>At its core, JetPack 7 is built on Linux Kernel 6.8 and Ubuntu 24.04 LTS, ensuring long-term stability and compatibility. Its modular, cloud-native architecture integrates the latest NVIDIA AI compute stack, making it easier than ever to align Jetson development with NVIDIA’s broader AI workflows.</p>
<p>For developers, this means seamless interoperability, whether building robotics systems in the lab or deploying generative AI at the edge.</p>
</div><div class="fusion-title title fusion-title-29 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Aligning with Industry Standards: SBSA Architecture</h2></div><div class="fusion-text fusion-text-71"><p>JetPack 7 also marks a major milestone in aligning Jetson with industry standards through the Server Base System Architecture (SBSA). By adopting SBSA:</p>
<ul>
<li>Jetson Thor is now positioned alongside ARM server-class systems.</li>
<li>Developers benefit from stronger OS support, simplified software portability, and smoother enterprise integration.</li>
<li>CUDA 13.0 is now unified across all ARM targets, streamlining development and reducing fragmentation.</li>
</ul>
<p>This alignment ensures consistency from server-class systems to Jetson Thor, bridging the gap between edge and enterprise AI.</p>
</div><div class="fusion-title title fusion-title-30 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">What’s New in Jetson Linux 38.2</h2></div><div class="fusion-text fusion-text-72"><p>JetPack 7.0 is powered by Jetson Linux 38.2, which brings a host of enhancements:</p>
<ul>
<li>Based on Ubuntu 24.04 LTS and Linux Kernel v6.8 LTS.</li>
<li>Support for the Jetson AGX Thor Developer Kit and Jetson T5000 module.</li>
<li>OpenRM-based stack architecture.</li>
<li>Updated AI compute libraries: CUDA 13, cuDNN 9.12, and TensorRT 10.13 (a version-check sketch follows this list).</li>
<li>CoE (Camera Over Ethernet) support via the Holoscan Sensor Bridge, enabling plug-and-play with the Eagle Camera Sensor Module LI-VB1940.</li>
<li>Optimized support for CSI/GMSL via Argus and CoE via SIPL Camera API.</li>
<li>NVIDIA-optimized preemptible real-time kernel for deterministic performance.</li>
</ul>
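<p>After flashing, it is worth confirming that the expected library versions are actually in place. The sketch below assumes <code>nvcc</code> is on your PATH and that the TensorRT Python bindings shipped with JetPack are installed; treat it as a quick starting point rather than an official verification tool:</p>
<pre>
# Sketch: report installed CUDA and TensorRT versions on a freshly
# flashed JetPack 7.0 system. Assumes nvcc is on PATH and the TensorRT
# Python bindings are installed (both come with a standard JetPack setup).
import subprocess

# nvcc prints the CUDA toolkit release, e.g. "Cuda compilation tools, release 13.0"
print(subprocess.run(["nvcc", "--version"],
                     capture_output=True, text=True).stdout)

try:
    import tensorrt
    print("TensorRT:", tensorrt.__version__)  # expect 10.13 on JetPack 7.0
except ImportError:
    print("TensorRT Python bindings not found")
</pre>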
</div><div class="fusion-title title fusion-title-31 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Supported Hardware</h2></div><div class="fusion-text fusion-text-73"><p>NVIDIA JetPack 7.0 launches with support for the latest Jetson platforms:</p>
<ul>
<li>Jetson AGX Thor Developer Kit</li>
<li>Jetson T5000</li>
</ul>
<p>Developers using these devices can immediately take advantage of JetPack 7’s new capabilities for robotics, AI, and sensor-driven applications.</p>
</div><div class="fusion-title title fusion-title-32 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Important Notes for Developers</h2></div><div class="fusion-text fusion-text-74"><ul>
<li>Manual flashing instructions have been updated due to SBSA architecture adoption—developers should carefully follow the updated guide.</li>
<li>For reinstallation using an ISO, refer to the Getting Started Guide to avoid issues.</li>
</ul>
</div><div class="fusion-title title fusion-title-33 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Conclusion</h2></div><div class="fusion-text fusion-text-75"><p>With JetPack 7.0, NVIDIA is setting a new standard for AI-powered edge computing. By combining next-generation hardware support, industry-standard architectures, and an updated AI stack, JetPack 7 delivers everything developers need to push the boundaries of robotics and generative AI.</p>
<p>For full technical details, developers should review the Jetson Linux 38.2 <span style="color: #198fd9;"><a style="color: #198fd9;" href="https://docs.nvidia.com/jetson/archives/r38.2/ReleaseNotes/Jetson_Linux_Release_Notes_r38.2.pdf" target="_blank" rel="noopener">release notes</a></span>.</p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-jetpack-7-0-next-gen-ai-robotics-edge/">NVIDIA JetPack 7.0: Powering the Next Generation of AI and Robotics at the Edge</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>NVIDIA Jetson Thor: A Next-Generation Platform for Edge AI</title>
		<link>https://blog-en.openzeka.com/nvidia-jetson-thor-a-next-generation-platform-for-edge-ai/</link>
		
		<dc:creator><![CDATA[admin]]></dc:creator>
		<pubDate>Mon, 25 Aug 2025 15:35:08 +0000</pubDate>
				<category><![CDATA[Getting Started]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1263</guid>

					<description><![CDATA[<p>The NVIDIA Jetson platform is a family of compact, hig ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-jetson-thor-a-next-generation-platform-for-edge-ai/">NVIDIA Jetson Thor: A Next-Generation Platform for Edge AI</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-10 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-12 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-76"><p>The <span style="color: #198fd9;"><a style="color: #198fd9;" href="https://blog-en.openzeka.com/nvidia-jetson-beginners-guide/">NVIDIA Jetson</a></span> platform is a family of compact, high-performance computer modules built to bring AI to the edge, enabling everything from robotics and autonomous vehicles to industrial automation. Each Jetson module integrates powerful GPUs with ARM-based processors, allowing autonomous machines—such as robots, unmanned vehicles, and intelligent sensors—to operate with speed and precision directly where the data is generated.</p>
</div><div class="fusion-title title fusion-title-34 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Why Edge AI and Robotics Matter</h2></div><div class="fusion-text fusion-text-77"><p>In robotics and other autonomous systems, milliseconds can determine success or failure—whether it’s avoiding an obstacle, making a precision movement, or responding to a critical safety event. Edge AI addresses this by processing data locally, reducing latency for real-time decision-making and ensuring systems keep running even without internet connectivity. This local processing also protects privacy by keeping sensitive information on-device and reduces network load, saving both bandwidth and operating costs.</p>
<p>As robots and intelligent machines take on more complex tasks, from multi-sensor fusion to running large AI models, their need for computing power grows rapidly. Next-generation edge platforms must not only deliver ultra-low latency and high throughput but also support advanced AI workloads like generative AI, vision-language understanding, and autonomous navigation—all in compact, energy-efficient form factors.</p>
</div><div class="fusion-title title fusion-title-35 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Why Jetson Thor?</h2></div><div class="fusion-text fusion-text-78"><p>The latest member of the Jetson family, Jetson Thor, was developed to meet the growing demand for greater computing power to support next-generation humanoid robots, autonomous systems, and large AI models running directly on-device.</p>
<p>Built for demanding workloads such as generative AI and multi-sensor fusion, Jetson Thor delivers up to 7.5× higher AI performance and 3.5× better energy efficiency than AGX Orin. While AGX Orin offered ~275 TOPS, Jetson Thor surpasses it dramatically, reaching 2,070 TFLOPS (FP4). This leap in performance allows developers to run larger deep learning models, process more sensors in parallel, and achieve faster real-time control—making Jetson Thor the stronger choice for compute-heavy edge AI.</p>
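<p>The headline figure is easy to sanity-check, with one caveat: the two numbers are quoted in different precisions (FP4 for Thor, INT8 for AGX Orin), so the ratio mirrors NVIDIA's own comparison rather than a like-for-like benchmark:</p>
<pre>
# Sanity check on the quoted speedup. Note the precisions differ:
# Thor's figure is FP4 TFLOPS while Orin's is INT8 TOPS, so this is
# NVIDIA's marketing comparison, not an apples-to-apples benchmark.
thor_fp4_tflops = 2070
orin_int8_tops = 275
print(f"{thor_fp4_tflops / orin_int8_tops:.1f}x")  # prints "7.5x"
</pre>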
</div><div class="fusion-title title fusion-title-36 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">A New Class of Robotic Computing</h2></div><div class="fusion-text fusion-text-79"><p>Jetson AGX Thor redefines robotic intelligence, delivering the power and efficiency needed to bring next-generation humanoid robots to life. It supports a wide range of generative AI models—from Cosmos Reason, DeepSeek, Llama, Gemini, and Qwen to domain-specific robotics models like NVIDIA Isaac™ GR00T N1.5—enabling any developer to easily experiment and run inference locally, whether with Vision Language Action (VLA) models, popular LLMs, or VLMs.</p>
<p>To deliver a seamless cloud-to-edge experience, Jetson AGX Thor runs the NVIDIA AI software stack for physical AI applications, including:</p>
<ul>
<li>NVIDIA Isaac for robotics,</li>
<li>NVIDIA Metropolis for visual agentic AI, and</li>
<li>NVIDIA Holoscan for sensor processing.</li>
</ul>
<p>It also enables the creation of AI agents directly at the edge using NVIDIA agentic AI workflows such as Video Search and Summarization (VSS).</p>
</div><div class="fusion-title title fusion-title-37 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Video Search and Summarization: A New Edge AI Milestone</h2></div><div class="fusion-text fusion-text-80"><p>Video has become one of the most valuable sources of information in today’s connected world, powering everything from security monitoring to industrial inspection, retail analytics, and healthcare diagnostics. However, the sheer volume of video data is overwhelming—hours of footage must often be reviewed just to find a few seconds of relevant events. Without automation, this process is slow, expensive, and prone to human error.</p>
<p>This is where Video Search and Summarization (VSS) comes in. VSS is a powerful generative AI application designed to streamline the development of intelligent video analytics agents. Built on the NVIDIA AI Blueprint for video search and summarization, it combines vision-language models (VLMs), large language models (LLMs), and advanced computer vision to:</p>
<ul>
<li>Search across multiple live streams and recorded files instantly.</li>
<li>Summarize video content with contextualized insights.</li>
<li>Provide interactive Q&amp;A about video footage.</li>
<li>Deliver real-time alerts and notifications for critical events.</li>
<li>Support audio-based cues for more comprehensive situational understanding.</li>
<li>Offer REST APIs for easy integration into existing systems (illustrated in the sketch below).</li>
</ul>
<p>Previously, workloads of this scale were only practical in cloud environments. Now, thanks to Jetson Thor’s AI compute capacity, VSS can run at the edge. This means organizations can perform real-time, privacy-preserving video analytics without relying on internet connectivity—ideal for mission-critical applications in security, manufacturing, smart cities, and more.</p>
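<p>Because VSS exposes REST APIs (noted in the list above), integration can be as simple as an HTTP call from an existing system. The host, route, and payload fields below are purely illustrative placeholders, not the blueprint's documented API; consult the NVIDIA AI Blueprint documentation for the actual endpoints:</p>
<pre>
# Illustrative only: calling a VSS-style summarization service over REST.
# The address, route ("/summarize"), and payload fields are hypothetical
# placeholders; the real API is defined in the NVIDIA AI Blueprint docs.
import requests

VSS_HOST = "http://jetson-thor.local:8000"  # placeholder address

resp = requests.post(
    f"{VSS_HOST}/summarize",  # hypothetical route
    json={"stream_id": "dock-camera-01", "window_seconds": 300},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
</pre>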
</div><div class="fusion-title title fusion-title-38 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Jetson Thor Technical Specifications</h2></div><div class="fusion-text fusion-text-81"><p>Jetson Thor sits at the top of the Jetson family in terms of hardware capabilities.</p>
<ul>
<li><strong>CPU:</strong> 14-core Arm Neoverse-V3AE processor for high-performance multitasking and real-time operations.</li>
<li><strong>GPU:</strong> NVIDIA’s next-generation Blackwell architecture GPU with 2,560 CUDA cores and 96 Tensor Cores, delivering massive AI compute performance.</li>
<li><strong>MIG (Multi-Instance GPU):</strong> Enables secure, isolated execution of multiple AI workloads by partitioning GPU resources in hardware.</li>
<li><strong>Specialized Accelerators:</strong> Includes a 3rd-gen Programmable Vision Accelerator (PVA), dual video encoders/decoders, and an optical flow accelerator to offload processing from CPU/GPU.</li>
<li><strong>Memory:</strong> 128 GB LPDDR5X RAM with a 256-bit interface and ~273 GB/s bandwidth (the arithmetic is sketched after this list)—ideal for large AI models and high-resolution sensor data.</li>
<li><strong>Power:</strong> Configurable from 40 W to 130 W, supporting both low-power embedded use cases and full-performance operation.</li>
<li><strong>Connectivity:</strong>
<ul>
<li>QSFP slot with 4× 25 GbE high-bandwidth Ethernet for streaming data from multiple cameras or lidars.</li>
<li>Additional Multi-Gigabit Ethernet (RJ-45), multiple USB 3.2 ports, DisplayPort/HDMI outputs, and industrial interfaces like CAN/UART.</li>
</ul>
</li>
</ul>
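<p>The quoted memory bandwidth follows directly from the bus width and the LPDDR5X transfer rate. Assuming an 8,533 MT/s speed grade (a standard rate for this class of memory; the spec list above does not state the exact figure), the arithmetic works out as follows:</p>
<pre>
# Back-of-envelope check on the quoted ~273 GB/s memory bandwidth.
# Assumes an 8533 MT/s LPDDR5X speed grade, a common rate for this
# memory class (the spec list above does not give the transfer rate).
bus_width_bits = 256
transfers_per_second = 8533e6            # 8533 MT/s
bytes_per_transfer = bus_width_bits / 8  # 32 bytes moved per transfer

bandwidth_gb_s = transfers_per_second * bytes_per_transfer / 1e9
print(f"{bandwidth_gb_s:.0f} GB/s")  # prints "273 GB/s"
</pre>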
<p>Jetson Thor runs on Ubuntu 24.04 LTS with the new JetPack 7.0 SDK, ensuring compatibility with NVIDIA’s latest AI software stack, including CUDA and TensorRT.</p>
</div><div class="fusion-title title fusion-title-39 fusion-sep-none fusion-title-text fusion-title-size-two" style="--awb-font-size:20px;"><h2 class="fusion-title-heading title-heading-left" style="margin:0;font-size:1em;">Application Areas</h2></div><div class="fusion-text fusion-text-82"><p><strong>1. Autonomous Systems (Vehicles &amp; Robots)</strong><br />
Processes LIDAR, camera, and radar data simultaneously to enable precise perception and safe decision-making. Humanoid robots and drones can perform real-time localization, SLAM, and obstacle detection with greater speed and accuracy.</p>
<p><strong>2. Smart Cities &amp; Security</strong><br />
Analyzes 24/7 video streams from city surveillance systems locally—without sending data to the cloud—for instant traffic management, crowd control, and threat detection. Supports real-time analysis of 4K/8K video feeds.</p>
<p><strong>3. Industrial Automation</strong><br />
Enhances factory robotics, production line cameras, and inspection systems for defect detection, quality control, and predictive maintenance. Built for reliability in demanding industrial environments.</p>
<p><strong>4. Healthcare Technologies</strong><br />
Powers AI-driven medical devices such as portable MRI and ultrasound systems to process images directly on-device for instant diagnostics. Surgical robots can benefit from real-time imaging and enhanced precision, while patient monitoring systems can safeguard privacy by keeping all data processing local.</p>
<p>Beyond these, Jetson Thor can power AI research labs, smart retail systems, agricultural automation, and much more—accelerating the shift toward smarter, faster, and more autonomous edge systems.</p>
<p><strong>In summary:</strong> Jetson Thor brings supercomputer-class AI performance to the edge, making it possible to run previously cloud-only applications like Video Search and Summarization locally, in real time, and with full data privacy. This opens the door to faster, smarter, and more autonomous machines across every industry.</p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-jetson-thor-a-next-generation-platform-for-edge-ai/">NVIDIA Jetson Thor: A Next-Generation Platform for Edge AI</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
