<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Generative AI Archives - OpenZeka EN Blog</title>
	<atom:link href="https://blog-en.openzeka.com/category/generative-ai/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog-en.openzeka.com/category/generative-ai/</link>
	<description>NVIDIA Jetson Developer Kits &#38; Edge Devices</description>
	<lastBuildDate>Fri, 27 Mar 2026 13:44:56 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>NVIDIA DGX Spark vs NVIDIA Jetson Thor</title>
		<link>https://blog-en.openzeka.com/nvidia-dgx-spark-vs-nvidia-jetson-thor/</link>
		
		<dc:creator><![CDATA[Betül Kaya]]></dc:creator>
		<pubDate>Tue, 23 Dec 2025 13:58:02 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Performance]]></category>
		<guid isPermaLink="false">https://blog-en.openzeka.com/?p=1504</guid>

					<description><![CDATA[<p>One of the most common mistakes made when developing a ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-dgx-spark-vs-nvidia-jetson-thor/">NVIDIA DGX Spark vs NVIDIA Jetson Thor</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-1"><p>One of the most common mistakes made when developing artificial intelligence systems is evaluating hardware designed for different purposes as if they were meant to solve the same problem. Although NVIDIA DGX Spark and NVIDIA Jetson Thor—two of NVIDIA’s recently prominent products—are often compared due to their similar names and emphasis on high performance, they are in fact two entirely different platforms designed to solve completely different problems.</p>
</div><div class="fusion-text fusion-text-2"><p>The purpose of this article is to clearly highlight the differences between DGX Spark and Jetson Thor and to make the following distinction explicit at the end:</p>
<p><strong>DGX Spark</strong> is designed for developing, training, and testing artificial intelligence models.<br />
<strong>Jetson Thor</strong>, on the other hand, is designed to run these models in the real world, on robots and physical systems.</p>
</div><div class="fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">What is NVIDIA DGX Spark?</h2></div><div class="fusion-text fusion-text-3"><p>NVIDIA DGX Spark is a compact AI supercomputer positioned in a desktop form factor, designed to enable the development and execution of artificial intelligence models entirely in a local environment. At the heart of the system is the Grace Blackwell GB10 Superchip, which combines NVIDIA’s Grace CPU and Blackwell GPU architectures into a single chip. Thanks to this architectural integration, DGX Spark delivers up to 1 petaflop of AI computing performance along with 128 GB of coherent unified LPDDR5x memory. This makes it an extremely powerful local development platform for large language models and generative AI workloads.</p>
<p>Each DGX Spark can operate independently as a fully capable AI workstation. When two Spark devices are connected together, the system reaches a unified memory capacity of 256 GB, transforming into an expanded AI node capable of handling models with up to 405 billion parameters. While pairing a maximum of two units is currently supported, NVIDIA states that this limit may be increased in the future through software updates.</p>
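<p>A back-of-the-envelope estimate makes these capacity figures concrete. The sketch below (weights only, ignoring KV cache and runtime overhead; the function name is our own) shows why a 405-billion-parameter model is quoted for the paired 256 GB configuration rather than a single unit:</p>

```python
def model_weight_gb(params_billion, bits_per_weight):
    """Approximate weight memory in GB: parameters x bits / 8.
    Weights only -- KV cache, activations, and runtime overhead are ignored."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 405B-parameter model at common precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_weight_gb(405, bits):.0f} GB")
```

<p>Even at 4-bit precision the weights alone come to roughly 202 GB, beyond a single unit's 128 GB but comfortably inside the paired 256 GB.</p>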
<p>DGX Spark aims to reduce reliance on the cloud or data centers by enabling the following workloads to be performed entirely in a local environment.</p>
</div><div class="fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">DGX Spark Use Cases</h2></div><div class="fusion-text fusion-text-4"><p><strong>Fine-Tuning</strong><br />
DGX Spark provides a powerful fine-tuning platform, especially for organizations working with enterprise, sensitive, or regulated data. In sectors such as finance, healthcare, defense, or law, large language models, image recognition systems, or task-specific AI models can be fine-tuned entirely locally without data leaving the organization. This approach supports compliance with regulations such as GDPR and reduces intellectual-property risk.</p>
<p><strong>Inference and Local AI Services</strong><br />
DGX Spark enables low-latency, high-efficiency inference of trained models in desktop or local server environments. Chatbots, document analysis systems, visual inspection applications, or decision support systems can run in real time without relying on the cloud. As a result, performance improves while network dependency and data transfer risks are eliminated.</p>
<p><strong>Data Science and Analytics Workloads</strong><br />
For data scientists working with large datasets, DGX Spark consolidates data cleaning, model training, and evaluation steps into a single powerful platform. Thanks to GPU-accelerated computing, complex statistical analyses, simulations, and machine learning pipelines can be completed much faster. This provides a significant speed advantage, especially for Proof of Concept (PoC) and pilot projects.</p>
<p><strong>Transition from Cloud to Desktop and Desktop to Cloud</strong><br />
DGX Spark is designed to be fully compatible with the NVIDIA ecosystem. After developing and testing a model on DGX Spark, you can move it to DGX Cloud or other accelerated cloud infrastructures using the same codebase and software stack with little to no modification. This approach offers great flexibility for organizations adopting hybrid AI strategies.</p>
<p><strong>Working with Secure and Sensitive Data</strong><br />
DGX Spark is an ideal solution for scenarios where data must remain within the organization. Sensitive customer data, internal company documents, or confidential R&amp;D outputs can be processed and modeled locally without being uploaded to the cloud. This reduces cybersecurity risks and simplifies regulatory compliance.</p>
<p><strong>Education, Academic, and Enterprise AI Laboratories</strong><br />
For universities, research centers, and corporate AI teams, DGX Spark functions as a compact yet extremely powerful “AI laboratory.” Students and engineers can gain hands-on experience working with large-scale models on real hardware and develop scenarios that are much closer to production environments.</p>
</div><div class="fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-two"><h2 class="fusion-title-heading title-heading-left" style="margin:0;">What is NVIDIA Jetson Thor?</h2></div><div class="fusion-text fusion-text-5"><p>NVIDIA Jetson Thor is a high-performance edge AI platform developed for Physical AI, robotics, and autonomous systems. The core objective of Jetson Thor is to run large language models (LLMs), vision-language models (VLMs), and vision-language-action (VLA) models in real time with low latency and high energy efficiency. In this respect, Thor is positioned as the central “brain” of a robot or autonomous system, responsible for decision-making and action execution.<br />
Thanks to its Blackwell-based architecture, Jetson Thor delivers up to 2,070 TFLOPS (FP4 – sparsity-enabled) of AI computing performance, making it possible to deploy advanced models developed at data-center scale directly in edge environments. The Jetson Thor module family is optimized for Physical AI and robotics applications, combining high performance with a flexible power profile: configurable power consumption between 40 W and 130 W, along with up to 128 GB of memory.</p>
<p>This powerful hardware foundation allows LLM, VLM, and VLA models to run concurrently in a deterministic, low-latency manner. Its high energy efficiency makes Jetson Thor an ideal solution for 24/7 autonomous systems, robotic platforms, and mission-critical edge AI applications.</p>
</div><div class="fusion-text fusion-text-6"><p>The platform is optimized to process multiple data streams simultaneously from cameras, LiDAR, radar, and other sensors, enabling the entire perception–decision–action loop to be closed fully at the edge. Jetson Thor’s architecture targets continuously operating, time-sensitive systems that interact with the real world, rather than desktop- or data-center-oriented development environments.</p>
<p>In short, Jetson Thor is not a platform for developing AI models; it is an edge AI solution designed to run already developed models in the field, in the physical world, and in real time. Especially in robotics, autonomous vehicles, and Physical AI scenarios, it serves as a foundational building block for modern autonomous systems by unifying high computational power, low latency, sensor integration, and energy efficiency in a single platform.</p>
</div><div class="fusion-text fusion-text-7"><p>Jetson Thor’s high computational performance and extensive I/O capabilities make it an ideal solution across a wide range of industries. Below are some of the potential application areas of Jetson Thor:</p>
<ul>
<li><strong>Autonomous Systems (Vehicles and Robots)</strong><br />
By processing LiDAR, camera, and radar data simultaneously, Jetson Thor enables autonomous vehicles to perceive their environment and make safe decisions. Humanoid robots and unmanned aerial vehicles (UAVs) can also perform tasks such as real-time localization, mapping (SLAM), and obstacle detection more efficiently with Jetson Thor.</li>
<li><strong>Smart Cities and Public Safety</strong><br />
Jetson Thor can analyze 24/7 video streams from city surveillance cameras locally, without relying on the cloud. This enables instant traffic management, crowd monitoring, and detection of security threats. Thanks to its high memory capacity, Jetson Thor can analyze 4K/8K video streams in real time for smart city applications.</li>
<li><strong>Industrial Automation</strong><br />
When integrated into robotic arms or camera systems on production lines, Jetson Thor enables AI-driven tasks such as defect detection, quality control, and predictive maintenance to be performed in real time. Its rugged design and long-lifecycle industrial variants ensure reliable operation in harsh industrial environments.</li>
<li><strong>Healthcare Technologies</strong><br />
Medical devices and innovative healthcare systems can also benefit from Jetson Thor’s capabilities. For example, a portable MRI or ultrasound device can process images locally using AI to deliver instant diagnostic insights. When equipped with Jetson Thor, surgical robots can perform real-time image processing and precise control during operations. In addition, patient monitoring systems can process data locally while preserving privacy.</li>
<li><strong>Security and Surveillance</strong><br />
Smart security cameras can perform deep learning–based tasks such as facial recognition or threat detection in real time using Jetson Thor. This enhances security while reducing network traffic in environments such as banks, airports, and critical infrastructure. The system can detect suspicious situations on-site and send immediate alerts to security personnel.</li>
</ul>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Feature</th>
<th align="left">NVIDIA DGX Spark</th>
<th align="left">NVIDIA Jetson Thor</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Primary Purpose</td>
<td align="left">AI development, training, testing</td>
<td align="left">Robotics and Physical AI Inference</td>
</tr>
<tr>
<td align="left">Deployment Environment</td>
<td align="left">Desktop / Office / Lab</td>
<td align="left">Edge / Robot / Autonomous systems</td>
</tr>
<tr>
<td align="left">LLM Prefill Performance</td>
<td align="left">Very high (compute-bound)</td>
<td align="left">Optimized for edge</td>
</tr>
<tr>
<td align="left">Power Consumption</td>
<td align="left">High</td>
<td align="left">Low and energy-efficient</td>
</tr>
<tr>
<td align="left">Real-Time Operation</td>
<td align="left">Not a priority</td>
<td align="left">Critical requirement</td>
</tr>
<tr>
<td align="left">Sensor Integration</td>
<td align="left">None</td>
<td align="left">Camera, LIDAR, radar etc.</td>
</tr>
<tr>
<td align="left">Target User</td>
<td align="left">AI developers, data scientists</td>
<td align="left">Robotics and embedded systems developers</td>
</tr>
</tbody>
</table>
</div>
<div class="fusion-text fusion-text-8" style="--awb-margin-top:15px;"><p>If your goal is Physical AI, robotics, autonomous driving, and edge inference:</p>
<ul>
<li>Jetson Thor is specifically designed for this purpose and is the right choice.<br />
If you need AI model development, training, testing, fine-tuning, and high-performance local computation.</li>
<li>DGX Spark is purpose-built exactly for these needs.</li>
</ul>
<p>For large-scale organizations, these two products are not competitors but complementary: You develop the model on DGX Spark and deploy it into the real world on Jetson Thor.</p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/nvidia-dgx-spark-vs-nvidia-jetson-thor/">NVIDIA DGX Spark vs NVIDIA Jetson Thor</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>HammerBench: AGX Thor’s Power Meets Ollama</title>
		<link>https://blog-en.openzeka.com/hammerbench-agx-thors-power-meets-ollama/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Wed, 17 Sep 2025 13:28:53 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog-en.openzeka.com/?p=1372</guid>

					<description><![CDATA[<p>What is an LLM benchmark and why is it important?  LLM  ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/hammerbench-agx-thors-power-meets-ollama/">HammerBench: AGX Thor’s Power Meets Ollama</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-2 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-1 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is an LLM benchmark and why is it important?</h3></div><div class="fusion-text fusion-text-9"><p><strong>LLM benchmarks</strong> are standardized tests designed to measure how fast, efficient, and accurate large language models (LLMs) perform across different hardware and environments. These tests evaluate metrics such as latency, throughput, and sometimes accuracy to provide an objective view of performance.</p>
<p>As LLMs continue to grow larger and more complex, choosing the right hardware to run them on becomes a critical decision. Benchmark results are essential to understand which device or infrastructure delivers better performance, to balance cost and efficiency, and to identify the most suitable solution for real-world use cases. In short, LLM benchmarks give both researchers and developers a clear roadmap of how models perform in practice.</p>
<p>To showcase the performance of <strong>Jetson AGX Thor</strong>, we are sharing our results and performance charts with you. At the same time, you can also run benchmarks across<strong> different GPU types</strong> to compare and validate performance for your own workloads. If you want to measure the performance metrics of your own devices and test your models under real-world conditions, get in touch with us. With our solution, your measurements turn into more than just numbers — they become actionable insights that drive strategic decisions.</p>
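<p>To make the metrics concrete, here is a minimal sketch of how latency and throughput are typically derived from a single generation call. The function name is our own and the model is a stand-in; this is not HammerBench's actual implementation.</p>

```python
import time

def benchmark_generation(generate_fn, prompt):
    """Time one generation call and derive the two core metrics:
    latency (seconds to finish) and throughput (tokens per second).
    `generate_fn` is any callable that returns a list of tokens."""
    start = time.perf_counter()
    tokens = generate_fn(prompt)
    latency = time.perf_counter() - start
    return {
        "latency_s": latency,
        "tokens": len(tokens),
        "tokens_per_s": len(tokens) / latency if latency > 0 else 0.0,
    }

# Stand-in "model" that pretends to emit 64 tokens:
print(benchmark_generation(lambda prompt: ["tok"] * 64, "Hello"))
```

<p>Real benchmark runs repeat this over many prompts and report aggregates (mean, p95, and so on), but the core arithmetic is the same.</p>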
</div><div class="fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">How to use HammerBench?</h3></div><div class="fusion-text fusion-text-10"><p><strong>🖥️ What the App Does</strong></p>
<p>This is a Streamlit-based LLM Benchmark Tool interface designed to evaluate large language models (LLMs) on NVIDIA Jetson AGX Thor hardware using Ollama as the backend.</p>
<p>⚙️ <strong>Configuration (Left Sidebar)</strong></p>
<ul>
<li>GPU Information:
<ul>
<li>Detects if the device is a Jetson (in this case, a Jetson AGX Thor Developer Kit).</li>
<li>Shows details about the GPU (NVIDIA Jetson AGX Thor) and available memory (125,772 MB ≈ 122.8 GB).</li>
</ul>
</li>
</ul>
<p><strong>Use Only GPU:</strong></p>
<p>A checkbox option that allows restricting benchmarks to GPU-only execution.</p>
<p><strong>📊 Main Panel</strong></p>
<p><strong>Title:</strong> <em>LLM Benchmark Tool</em>, with the description: <em>Benchmark LLM models using Ollama with real-time progress tracking.</em></p>
<p><strong>Models Compatible with GPU memory (VRAM) requirements:</strong></p>
<ul>
<li>Displays a table of available models (llama3.2:1b, gemma3:4b, qwen3:14b, gpt-oss:20b, etc.)</li>
<li>Shows how much memory (VRAM in GB) each model requires.</li>
<li>Marks them with ✅ if they are runnable on the detected GPU.</li>
</ul>
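<p>The compatibility check behind those ✅ marks boils down to comparing each model's memory requirement with the detected free memory. A minimal sketch of that logic (the 27B and 120B figures appear in this post; the other per-model numbers are illustrative placeholders, not HammerBench's actual table):</p>

```python
AVAILABLE_GB = 122.8  # free memory reported above for the Jetson AGX Thor

# VRAM needed per model, in GB. The 17 and 65 GB figures come from this
# post; the rest are illustrative placeholders.
MODEL_VRAM_GB = {
    "llama3.2:1b": 1.3,
    "gemma3:4b": 3.3,
    "qwen3:14b": 9.3,
    "gemma3:27b": 17.0,
    "gpt-oss:120b": 65.0,
}

def runnable(model, available_gb=AVAILABLE_GB):
    """True if the model's weights fit in the available GPU memory."""
    return MODEL_VRAM_GB[model] <= available_gb

for name, need in MODEL_VRAM_GB.items():
    print(f"{'✅' if runnable(name) else '❌'} {name} ({need} GB)")
```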
<p><strong>Select Models to Benchmark:</strong></p>
<ul>
<li>Lists the same models with checkboxes so the user can pick which ones to run benchmarks on.</li>
<li>Each option shows the memory requirement for clarity (e.g., gemma3:27b (17 GB), gpt-oss:120b (65 GB)).</li>
</ul>
<p><strong>🚀 Purpose</strong></p>
<p>The tool helps developers and researchers:</p>
<ul>
<li>See which LLMs are compatible with their GPU memory.</li>
<li>Select multiple models and run benchmarks to measure performance (latency, throughput, GPU utilization).</li>
<li>Use the results to compare models and make better deployment or scaling decisions.</li>
</ul>
</div><div class="fusion-video fusion-selfhosted-video" style="max-width:100%;"><div class="video-wrapper"><video playsinline="true" width="100%" style="object-fit: cover;" autoplay="true" muted="true" loop="true" preload="auto" controls="1"><source src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/animation.webm" type="video/webm">Sorry, your browser doesn&#039;t support embedded videos.</video></div></div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/hammerbench-agx-thors-power-meets-ollama/">HammerBench: AGX Thor’s Power Meets Ollama</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		<enclosure url="https://blog-en.openzeka.com/wp-content/uploads/2025/09/animation.webm" length="194501" type="video/webm" />

			</item>
		<item>
		<title>How to Run Llama.cpp Server on Jetson AGX Thor?</title>
		<link>https://blog-en.openzeka.com/how-to-run-llama-cpp-server-on-jetson-agx-thor/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Fri, 12 Sep 2025 10:44:53 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog-en.openzeka.com/?p=1410</guid>

					<description><![CDATA[<p>Llama.cpp Server on Jetson AGX Thor: Unlocking Edge AI  ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-llama-cpp-server-on-jetson-agx-thor/">How to Run Llama.cpp Server on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-3 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-2 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-blend:overlay;--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:0px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Llama.cpp Server on Jetson AGX Thor: Unlocking Edge AI with Large Language Models</h4></div><div class="fusion-text fusion-text-11"><p><strong>Llama.cpp Server</strong> is a lightweight, high-performance runtime for large language models (LLMs), designed to run efficiently on both CPU and GPU. Built in C++, it eliminates unnecessary overhead and delivers deep hardware-level optimizations. By supporting the <strong>GGUF model format,</strong> it allows for quantization, drastically reducing memory requirements while maintaining accuracy. 
Through its<strong> REST API,</strong> Llama.cpp Server can be seamlessly integrated into applications, enabling developers to bring advanced LLM capabilities directly to devices—without relying on the cloud.</p>
<p>When deployed on <strong>NVIDIA Jetson AGX Thor</strong>, the advantages become even more compelling:</p>
<ul>
<li>GPU acceleration with<strong> CUDA</strong> ensures that the Thor’s compute power is fully utilized, bringing real-time inference to the edge.</li>
<li>Optimized for edge AI use cases such as robotics, autonomous systems, and industrial automation, it provides ultra-low latency decision-making.</li>
<li>Resource efficiency via quantization makes it possible to run models from 7B up to 13B parameters within the limited memory budgets typical of embedded devices.</li>
</ul>
<p>By combining <strong>Llama.cpp Server</strong> with Jetson<strong> AGX Thor</strong>, organizations gain a powerful platform for on-device AI that is private, fast, and cost-effective. No data needs to leave the device, latency is minimized, and the system remains fully adaptable to both prototyping and production scenarios. Supported by an open-source ecosystem, this pairing represents a breakthrough for deploying large language models securely and efficiently at the edge.</p>
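<p>As a concrete illustration of the REST integration described above, the sketch below posts a prompt to a running server from Python using only the standard library. It assumes the server is listening on localhost port 8080 and uses llama.cpp server's /completion endpoint; the helper names are our own.</p>

```python
import json
import urllib.request

def build_completion_request(prompt, n_predict=128):
    """JSON body for llama.cpp server's /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict}

def ask_llama(prompt, host="http://localhost:8080", n_predict=128):
    """POST a prompt to a running llama.cpp server and return the
    generated text from the JSON response's `content` field."""
    data = json.dumps(build_completion_request(prompt, n_predict)).encode()
    req = urllib.request.Request(
        f"{host}/completion",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

# Requires a running server:
# print(ask_llama("Explain edge AI in one sentence."))
```

<p>Because the interface is plain HTTP plus JSON, the same call works from any language or service on the network, which is what makes on-device integration straightforward.</p>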
</div></div></div><div class="fusion-layout-column fusion_builder_column fusion-builder-column-3 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements</h3></div><div class="fusion-text fusion-text-12"><ul>
<li>JetPack 7 (<span style="color: #76b900;"><a style="color: #76b900;" href="https://blog-en.openzeka.com/what-is-nvidia-jetpack-beginner-friendly-guide/">Learn more about JetPack</a></span>)</li>
<li>CUDA 13</li>
<li>At least 10 GB of free disk space<strong> (Only for the Llama Server image, not for the models.)</strong></li>
<li>A stable and fast internet connection</li>
</ul>
</div><div class="fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">How to use Llama.cpp Server?</h4></div><div class="fusion-text fusion-text-13"><p>First, download the image:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-1 > .CodeMirror, .fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-1 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_1" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_1" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_1" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --gpus all -it --rm \
  -p 8080:8080 \
  -v /workspace/models:/models \
  ghcr.io/nvidia-ai-iot/llama_cpp:r38.2.arm64-sbsa-cu130-24.04 \
  /bin/bash</textarea></div><div class="fusion-text fusion-text-14" style="--awb-margin-top:20px;"><p>Then, download the model from Hugging Face. If the model requires access, log in with your token by running:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-2 > .CodeMirror, .fusion-syntax-highlighter-2 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-2 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_2" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_2" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_2" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># huggingface-cli login
hf download Qwen/Qwen3-4B-Instruct-2507</textarea></div><div class="fusion-text fusion-text-15" style="--awb-margin-top:20px;"><p>Then, install the required Python dependencies with the following command:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-3 > .CodeMirror, .fusion-syntax-highlighter-3 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-3 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_3" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_3" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_3" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">pip install transformers torch mistral_common sentencepiece</textarea></div><div class="fusion-text fusion-text-16" style="--awb-margin-top:20px;"><p>This command set downloads the <strong>NVIDIA NVPL local repository package</strong>, installs it, adds the signing key to the system, and then installs the NVPL library via apt-get.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-4 > .CodeMirror, .fusion-syntax-highlighter-4 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-4 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_4" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_4" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_4" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">export NVPL_VERSION=25.5
export DISTRO=ubuntu2404

wget https://developer.download.nvidia.com/compute/nvpl/${NVPL_VERSION}/local_installers/nvpl-local-repo-${DISTRO}-${NVPL_VERSION}_1.0-1_arm64.deb

dpkg -i nvpl-local-repo-ubuntu2404-25.5_1.0-1_arm64.deb

cp /var/nvpl-local-repo-ubuntu2404-25.5/nvpl-local-52E38D21-keyring.gpg /usr/share/keyrings/

apt-get update && apt-get install -y nvpl</textarea></div><div class="fusion-text fusion-text-17" style="--awb-margin-top:20px;"><p>This command takes the Qwen2.5-VL-3B-Instruct model downloaded from Hugging Face (inside the snapshot folder identified by its hash), and uses the convert_hf_to_gguf.py tool to convert the Hugging Face weights (safetensors/PyTorch) into GGUF format, saving the output as /data/models/Qwen3-4B-Instruct-2507-f16.gguf.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-5 > .CodeMirror, .fusion-syntax-highlighter-5 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-5 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_5" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_5" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_5" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">python3 /opt/llama_cpp_python/vendor/llama.cpp/convert_hf_to_gguf.py \
  /data/models/huggingface/models--Qwen--Qwen2.5-VL-3B-Instruct/snapshots/<hash> \
  --outfile /data/models/Qwen3-4B-Instruct-2507-f16.gguf</textarea></div><div class="fusion-text fusion-text-18" style="--awb-margin-top:20px;"><p>This command takes the full-precision GGUF model (Qwen3-4B-Instruct-2507-f16.gguf) and runs it through llama-quantize to produce a quantized version <strong>(Qwen3-4B-Instruct-2507-q4_k_m.gguf)</strong> using the<strong> q4_k_m quantization method.</strong></p>
<ul>
<li><strong>Input file:</strong> /data/models/Qwen3-4B-Instruct-2507-f16.gguf (the FP16 model converted from Hugging Face).</li>
<li><strong>Output file:</strong> /data/models/Qwen3-4B-Instruct-2507-q4_k_m.gguf (smaller, quantized model).</li>
<li><strong>Quantization type:</strong> q4_k_m → a 4-bit quantization scheme optimized for speed and memory efficiency.</li>
</ul>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-6 > .CodeMirror, .fusion-syntax-highlighter-6 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-6 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_6" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_6" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_6" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">llama-quantize /data/models/Qwen3-4B-Instruct-2507-f16.gguf \
  /data/models/Qwen3-4B-Instruct-2507-q4_k_m.gguf q4_k_m</textarea></div><div class="fusion-text fusion-text-19" style="--awb-margin-top:20px;"><p>This command launches the llama.cpp server so the quantized model can be served via an<strong> HTTP API.</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-7 > .CodeMirror, .fusion-syntax-highlighter-7 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-7 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_7" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_7" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_7" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">llama-server \
  -m /data/models/Qwen3-4B-Instruct-2507-q4_k_m.gguf \
  --host 0.0.0.0 --port 8080 \
  -c 8192 \
  --n-gpu-layers 35</textarea></div><div class="fusion-text fusion-text-20" style="--awb-margin-top:20px;"><p>And that’s it! You can start chatting.</p>
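Once the server is up, you can also talk to it from Python. The sketch below assumes the server started by the command above is reachable at http://localhost:8080 and uses llama.cpp&#8217;s OpenAI-compatible /v1/chat/completions route; the helper names (build_chat_request, ask) and the model string are illustrative, not part of llama.cpp:

```python
import json
import urllib.request

def build_chat_request(prompt, model="Qwen3-4B-Instruct-2507-q4_k_m",
                       max_tokens=256, temperature=0.7):
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful AI assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
    }

def ask(prompt, url="http://localhost:8080/v1/chat/completions"):
    """POST the request to the running llama-server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage (requires llama-server to be running):
#   reply = ask("Hello!")
```

Only the Python standard library is needed; call ask("Hello!") while llama-server is running.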
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-1 hover-type-zoomin"><img fetchpriority="high" decoding="async" width="1024" height="568" title="Screenshot from 2025-09-12 13-31-53" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-1024x568.png" alt class="img-responsive wp-image-1422" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-200x111.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-400x222.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-600x333.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-800x444.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-12-13-31-53-1200x666.png 1200w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-llama-cpp-server-on-jetson-agx-thor/">How to Run Llama.cpp Server on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Run MLC LLM on Jetson AGX Thor?</title>
		<link>https://blog-en.openzeka.com/how-to-run-mlc-llm-on-jetson-agx-thor/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Tue, 09 Sep 2025 10:21:01 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1322</guid>

					<description><![CDATA[<p>What is MLC LLM ? MLC LLM (Machine Learning Compilation ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-mlc-llm-on-jetson-agx-thor/">How to Run MLC LLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-4 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-padding-right:0px;--awb-padding-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-4 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is MLC LLM ?</h3></div><div class="fusion-text fusion-text-21"><p><strong>MLC LLM (Machine Learning Compilation for Large Language Models)</strong> is an open-source project designed to make large language models (LLMs) run efficiently across different hardware platforms. Its main goal is to optimize performance and reduce energy consumption, enabling AI applications to run not only in the cloud but also on edge devices.</p>
<p>NVIDIA’s next-generation <strong>Jetson AGX Thor platform</strong> delivers powerful computing capabilities for robotics, autonomous systems, and AI-driven applications. By leveraging <strong>MLC LLM</strong> on <strong>Jetson AGX Thor</strong>, large language models can be optimized to run in real time, supporting tasks such as natural language processing, decision-making, and human-like interaction with higher efficiency.</p>
</div></div></div><div class="fusion-layout-column fusion_builder_column fusion-builder-column-5 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-22"><p>In short, <strong>MLC LLM</strong> on <strong>Jetson AGX Thor</strong> acts as a bridge that brings high-performance large language model capabilities to edge devices.</p>
</div><div class="fusion-title title fusion-title-10 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements</h3></div><div class="fusion-text fusion-text-23"><ul>
<li>JetPack 7 (<span style="color: #76b900;"><a style="color: #76b900;" href="https://blog-en.openzeka.com/what-is-nvidia-jetpack-beginner-friendly-guide/">Learn more about JetPack</a></span>)</li>
<li>CUDA 13</li>
<li>At least 25 GB of free disk space<strong> (Only for the MLC LLM image, not for the models.)</strong></li>
<li>A stable and fast internet connection</li>
</ul>
</div><div class="fusion-title title fusion-title-11 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">How to use <i>MLC</i> LLM?</h3></div><div class="fusion-text fusion-text-24"><p>First, pull and run the <i>Docker</i> image on your Jetson:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-8 > .CodeMirror, .fusion-syntax-highlighter-8 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-8 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_8" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_8" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_8" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">sudo docker run -it --rm \
  --runtime nvidia \
  --gpus all \
  -v /workspace:/workspace \
  -p 6678:6678 \
  -p 6677:6677 \
  ghcr.io/nvidia-ai-iot/mlc:r38.2.arm64-sbsa-cu130-24.04 </textarea></div><div class="fusion-text fusion-text-25" style="--awb-margin-top:20px;"><p>If you’d like to explore the available images or replace them with newer ones, you can visit the <strong><a style="color: #14ce00;" href="http://ghcr.io/nvidia-ai-iot/mlc">GitHub Container Registry.</a></strong></p>
</div><div class="fusion-text fusion-text-26"><p>Once inside the container, find the model you want to download from Hugging Face.<br />
Then use the hf download command to download the model.</p>
<p><strong>For example:</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-9 > .CodeMirror, .fusion-syntax-highlighter-9 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-9 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_9" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_9" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_9" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">hf download Qwen/Qwen3-30B-A3B-Instruct-2507</textarea></div><div class="fusion-text fusion-text-27" style="--awb-margin-top:20px;"><p>In the next step, provide the folder where you downloaded the model and run the command below.<br />
This command converts the model’s original Hugging Face weights (in safetensors format) into the optimized<strong> MLC LLM format.</strong> During conversion, the weights are quantized (e.g., to <strong>q4bf16_1</strong>), which reduces memory usage and improves runtime efficiency on the GPU without heavily sacrificing accuracy.</p>
<p>In short,<strong> mlc_llm convert_weight</strong> takes the raw model checkpoint and transforms it into a format that can be directly executed by the MLC runtime on your target device (e.g., Jetson AGX Thor with CUDA).</p>
<p><em><strong>⚠️ Warning:</strong> In the command, replace &lt;hash&gt; in <strong>snapshots/&lt;hash&gt;/</strong> with the actual folder name you see inside the snapshots directory<strong> (e.g., aeb13307a71acd8fe81861d94ad54ab689df&#8230;)</strong>. This folder contains the real model files such as config.json, tokenizer.json, and model.safetensors, which are required for the <strong>mlc_llm convert_weight</strong> command to work.</em></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-10 > .CodeMirror, .fusion-syntax-highlighter-10 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-10 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_10" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_10" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_10" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm convert_weight /data/models/huggingface/models--Qwen--Qwen3-30B-A3B-Instruct-2507/snapshots/<hash>/ \
    --quantization q4bf16_1 \
    --model-type qwen3 \
    --device cuda \
    --source-format huggingface-safetensor \
    -o /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1</textarea></div><div class="fusion-text fusion-text-28" style="--awb-margin-top:20px;"><p>In the next step, <strong>gen_config</strong> generates the configuration files needed to run the converted model in <strong>MLC</strong>. It defines the conversation template (<strong>e.g., Qwen format</strong>), context length, batch size, and other runtime parameters. In short, it makes the weight-converted model fully executable in the MLC runtime.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-11 > .CodeMirror, .fusion-syntax-highlighter-11 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-11 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_11" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_11" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_11" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm gen_config \
    /data/models/huggingface/models--Qwen--Qwen3-30B-A3B-Instruct-2507/snapshots/<hash>/config.json \
    --quantization q4bf16_1 \
    --conv-template qwen2 \
    --context-window-size 32768 \
    --prefill-chunk-size 4096 \
    --max-batch-size 3 \
    --output /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1
</textarea></div><div class="fusion-text fusion-text-29" style="--awb-margin-top:20px;"><p><em><strong>⚠️ Note:</strong> The “Not found” messages for files like tokenizer.model or added_tokens.json are not errors. These files are optional and not required by all models. As long as <strong>tokenizer.json, vocab.json, and merges.txt</strong> are found and copied, the model configuration is complete and ready to run.</em></p>
<p>Now that the configuration is ready, we can move on to the compilation step. In this stage, the model is compiled into a CUDA-optimized shared library (.so file), which enables fast execution on the GPU.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-12 > .CodeMirror, .fusion-syntax-highlighter-12 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-12 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_12" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_12" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_12" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm compile \
    /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1/mlc-chat-config.json \
    --device cuda \
    -o /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1/Qwen3-30B-A3B-Instruct-2507-q4bf16_1-cuda.so \
    --quantization q4bf16_1 \
    --model-type qwen3 \
    --opt="cublas_gemm=1;cudagraph=1"</textarea></div><div class="fusion-text fusion-text-30" style="--awb-margin-top:20px;"><p>With the compilation complete, the final step is to serve the model so it can handle inference requests. The<strong> mlc_llm serve</strong> command launches an HTTP server that exposes the model as an API endpoint, making it accessible for testing or integration into applications.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-13 > .CodeMirror, .fusion-syntax-highlighter-13 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-13 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_13" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_13" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_13" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">mlc_llm serve /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1 \
  --port 6678 \
  --host 0.0.0.0 \
  --device cuda \
  --mode interactive \
  --model-lib /workspace/models/mlc/Qwen3-30B-A3B-Instruct-2507-q4bf16_1/Qwen3-30B-A3B-Instruct-2507-q4bf16_1-cuda.so \
  --overrides "max_num_sequence=1;max_total_seq_length=32768;context_window_size=32768;gpu_memory_utilization=0.3"</textarea></div><div class="fusion-text fusion-text-31" style="--awb-margin-top:20px;"><p><em><strong>If you see this output, it means the model has been successfully compiled and is now serving requests.</strong></em></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-2 hover-type-none"><img decoding="async" width="453" height="69" title="Screenshot from 2025-09-08 14-56-47" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47.png" alt class="img-responsive wp-image-1333" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47-200x30.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47-400x61.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-56-47.png 453w" sizes="(max-width: 640px) 100vw, 453px" /></span></div><div class="fusion-text fusion-text-32" style="--awb-margin-top:20px;"><p>You can test it with this curl request:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-14 > .CodeMirror, .fusion-syntax-highlighter-14 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-14 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_14" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_14" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_14" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="hopscotch" data-mode="text/x-sh">curl -X POST http://localhost:6678/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-name>",
    "messages": [
      {"role": "system", "content": "You are a helpful AI assistant."},
      {"role": "user", "content": "Hello !"}
    ],
    "temperature": 0.7,
    "max_tokens": 512,
    "stream": false
  }'</textarea></div><div class="fusion-title title fusion-title-12 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Which Jetson should I choose for my LLM model?</h4></div><div class="fusion-text fusion-text-33"><p>Below, you can find the RAM requirements of the most popular LLM models along with Jetson recommendations that meet the minimum specifications to run them. You can choose the one that best fits your needs.</p>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Parameters</th>
<th align="left">Quantization</th>
<th align="left">Required RAM (GB)</th>
<th align="left">Recommended Minimum Jetson</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Base</td>
<td align="left">684B</td>
<td align="left">Dynamic-1.58-bit</td>
<td align="left">162.11</td>
<td align="left">Not supported (exceeds 128 GB)</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-1.5B</td>
<td align="left">1.5B</td>
<td align="left">Q4_K_M</td>
<td align="left">0.90</td>
<td align="left">Jetson Orin Nano 4 GB, Jetson Nano 4 GB</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-7B</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">mistralai Mixtral 8x22B-Instruct-v0.1</td>
<td align="left">22B</td>
<td align="left">Q4_K_M</td>
<td align="left">13.20</td>
<td align="left">Jetson Orin NX 16 GB, Jetson AGX Orin 32 GB, Jetson AGX Xavier 32 GB</td>
</tr>
<tr>
<td align="left">mistralai Mathstral 7B-v0.1</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">google gemma-3 12b-it</td>
<td align="left">12B</td>
<td align="left">Q4_K_M</td>
<td align="left">7.20</td>
<td align="left">Jetson Orin NX 8 GB, Jetson Orin Nano 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">meta-llama Llama-3.1 70B-Instruct</td>
<td align="left">70B</td>
<td align="left">Q5_K_M</td>
<td align="left">52.50</td>
<td align="left">Jetson AGX Orin 64 GB, Jetson AGX Xavier 64 GB, Jetson AGX Thor (T5000) 128 GB</td>
</tr>
</tbody>
</table>
</div>
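As a rough cross-check of the table above, the required RAM can be estimated from the parameter count and the quantization’s bits per weight. The bits-per-weight values below are approximations inferred from the table rows (not exact GGUF file sizes) and exclude KV cache and runtime overhead:

```python
# Rule of thumb: weight memory in GB ≈ (parameters in billions) * (bits per weight) / 8.
# The bits-per-weight figures are approximations inferred from the table above,
# not exact GGUF sizes, and runtime overhead comes on top of this.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,  # ~0.60 GB per billion parameters
    "Q5_K_M": 6.0,  # ~0.75 GB per billion parameters
}

def estimate_ram_gb(params_billions: float, quant: str) -> float:
    """Approximate RAM needed just for the quantized weights, in GB."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8
```

For example, estimate_ram_gb(70, "Q5_K_M") gives 52.5 GB, matching the Llama-3.1 70B row above.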
</div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-mlc-llm-on-jetson-agx-thor/">How to Run MLC LLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Run vLLM on Jetson AGX Thor?</title>
		<link>https://blog-en.openzeka.com/how-to-run-vllm-on-jetson-agx-thor/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Tue, 09 Sep 2025 10:16:57 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1338</guid>

					<description><![CDATA[<p>What is vLLM and Why Does It Matter on Jetson AGX Thor? ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-vllm-on-jetson-agx-thor/">How to Run vLLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-5 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-6 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-13 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is vLLM and Why Does It Matter on Jetson AGX Thor?</h3></div><div class="fusion-text fusion-text-34"><p><strong>vLLM</strong> is an open-source inference engine designed to run large language models (LLMs) with exceptional efficiency. Thanks to its innovative PagedAttention architecture, vLLM delivers both high throughput and low latency making it possible to deploy advanced AI models in real-time applications.</p>
<p>On the other side, NVIDIA Jetson AGX Thor is a next-generation edge AI platform built for robotics, autonomous machines, and industrial systems. With its immense compute power and AI acceleration, Thor is the perfect hardware to unlock the full potential of LLMs at the edge.</p>
<p>When combined, vLLM on Jetson AGX Thor enables:</p>
<ul>
<li><strong>Real-time LLM services (chatbots, assistants, summarization, translation)</strong></li>
<li><strong>Vision + Language use cases (explaining camera input instantly)</strong></li>
<li><strong>On-device inference with ultra-low latency and stronger data privacy</strong></li>
<li><strong>Reduced reliance on cloud resources, with better energy efficiency</strong></li>
</ul>
<p>In short, vLLM provides the software intelligence and Thor provides the hardware muscle; together they make cutting-edge LLM experiences possible directly on the device.</p>
</div><div class="fusion-title title fusion-title-14 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Installing Process</h3></div><div class="fusion-text fusion-text-35"><p>First, download the following Triton Inference Server container image.<br />
This image comes with vLLM version 0.9.2 pre-installed. The tag 25.08 refers to August 2025.</p>
<p>If you’d like to update to a newer version in the future, you can always visit the <strong><a style="color: #00dd37;" href="https://catalog.ngc.nvidia.com/?filters=&amp;orderBy=weightPopularDESC&amp;query=&amp;page=&amp;pageSize=">NVIDIA NGC Catalog</a></strong> to find the latest container releases.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-15 > .CodeMirror, .fusion-syntax-highlighter-15 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-15 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_15" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_15" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_15" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --name vllm_container -it \
  --gpus all \
  -p 8000:8000 \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  nvcr.io/nvidia/tritonserver:25.08-vllm-python-py3 bash</textarea></div><div class="fusion-text fusion-text-36" style="--awb-margin-top:20px;"><p>You can verify the installed vLLM version directly with Python.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-16 > .CodeMirror, .fusion-syntax-highlighter-16 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-16 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_16" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_16" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_16" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">python3 -c "import vllm; print(vllm.__version__)"</textarea></div><div class="fusion-text fusion-text-37" style="--awb-margin-top:20px;"><p>Next, you’ll need to create an account on Hugging Face, generate an access token, and log in with it.</p>
<p>This token will allow the container to securely download and run models directly from <a href="https://huggingface.co/"><strong style="color: #00e200;">Hugging Face.</strong></a></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-17 > .CodeMirror, .fusion-syntax-highlighter-17 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-17 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_17" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_17" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_17" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">huggingface-cli login</textarea></div><div class="fusion-text fusion-text-38" style="--awb-margin-top:20px;"><p>To download a model, run:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-18 > .CodeMirror, .fusion-syntax-highlighter-18 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-18 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_18" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_18" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_18" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">hf download <model></textarea></div><div class="fusion-text fusion-text-39" style="--awb-margin-top:20px;"><p>Once your environment is ready, you can launch the vLLM API server using the following command:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-19 > .CodeMirror, .fusion-syntax-highlighter-19 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-19 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_19" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_19" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_19" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">python3 -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.90 \
  --max-model-len 8192 \
  --dtype float16</textarea></div><div class="fusion-text fusion-text-40" style="--awb-margin-top:20px;"><p>Here’s what each parameter does:</p>
<ul>
<li><strong><em>--model</em></strong> → specifies which model to load (in this case, Llama-3.1-8B-Instruct from Hugging Face).</li>
<li><strong><em>--tensor-parallel-size 1</em></strong> → runs the model on a single GPU. If you have multiple GPUs, you can increase this value.</li>
<li><strong><em>--gpu-memory-utilization 0.90</em></strong> → tells vLLM to use up to 90% of available GPU memory. Adjust this if you run into memory errors.</li>
<li><strong><em>--max-model-len 8192</em></strong> → sets the maximum context length (in tokens) for the model.</li>
<li><strong><em>--dtype float16</em></strong> → runs the model in FP16 precision, which is more efficient on Jetson AGX Thor.</li>
</ul>
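<p>If you prefer Python over curl, the same OpenAI-compatible endpoint can be queried with nothing but the standard library. This is a minimal sketch, assuming the server was launched with the command above (same port and model name; no third-party packages):</p>

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "meta-llama/Llama-3.1-8B-Instruct",
                       max_tokens: int = 64) -> dict:
    # OpenAI-style chat payload, mirroring the server launched above
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    # POST the payload to vLLM's OpenAI-compatible chat endpoint
    data = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The assistant's reply is in the first choice
    return body["choices"][0]["message"]["content"]

# chat("Hello Jetson AGX Thor!")  # requires the server to be running
```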
</div><div class="fusion-text fusion-text-41"><p><em><strong>⚠️ Heads-up: If you encounter the following error:</strong></em></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-20 > .CodeMirror, .fusion-syntax-highlighter-20 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-20 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_20" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_20" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_20" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}</textarea></div><div class="fusion-text fusion-text-42" style="--awb-margin-top:20px;"><p><em><strong>It usually means the engine couldn’t reserve enough GPU memory. Try lowering the GPU memory utilization, for example with --gpu-memory-utilization 0.75.</strong></em></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-3 hover-type-none"><img decoding="async" width="1024" height="617" title="Screenshot from 2025-09-09 09-33-20" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-1024x617.png" alt class="img-responsive wp-image-1345" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-200x120.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-400x241.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-600x361.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-800x482.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20-1200x723.png 1200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-09-09-33-20.png 1393w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-text fusion-text-43" style="--awb-margin-top:20px;"><p>If you see a message like:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-21 > .CodeMirror, .fusion-syntax-highlighter-21 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-21 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_21" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_21" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_21" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">Starting vLLM API server 0 on http://0.0.0.0:8000</textarea></div><div class="fusion-text fusion-text-44" style="--awb-margin-top:20px;"><p>it means that vLLM is now serving on port 8000 and ready to accept requests.<br />
At this point, you can start testing it with a simple curl command. For example:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-22 > .CodeMirror, .fusion-syntax-highlighter-22 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-22 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_22" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_22" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_22" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello Jetson AGX Thor!"}],
    "max_tokens": 64
  }'</textarea></div><div class="fusion-title title fusion-title-15 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Which Jetson should I choose for my LLM model?</h4></div><div class="fusion-text fusion-text-45"><p>Below, you can find the RAM requirements of the most popular LLM models along with Jetson recommendations that meet the minimum specifications to run them. You can choose the one that best fits your needs.</p>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Parameters</th>
<th align="left">Quantization</th>
<th align="left">Required RAM (GB)</th>
<th align="left">Recommended Minimum Jetson</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Base</td>
<td align="left">684B</td>
<td align="left">Dynamic-1.58-bit</td>
<td align="left">162.11</td>
<td align="left">Not supported (requires more than 128 GB)</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-1.5B</td>
<td align="left">1.5B</td>
<td align="left">Q4_K_M</td>
<td align="left">0.90</td>
<td align="left">Jetson Orin Nano 4 GB, Jetson Nano 4 GB</td>
</tr>
<tr>
<td align="left">deepseek-ai Deepseek-R1 Distill-Qwen-7B</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">mistralai Mixtral 8x22B-Instruct-v0.1</td>
<td align="left">22B</td>
<td align="left">Q4_K_M</td>
<td align="left">13.20</td>
<td align="left">Jetson Orin NX 16 GB, Jetson AGX Orin 32 GB, Jetson AGX Xavier 32 GB</td>
</tr>
<tr>
<td align="left">mistralai Mathstral 7B-v0.1</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">google gemma-3 12b-it</td>
<td align="left">12B</td>
<td align="left">Q4_K_M</td>
<td align="left">7.20</td>
<td align="left">Jetson Orin NX 8 GB, Jetson Orin Nano 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">meta-llama Llama-3.1 70B-Instruct</td>
<td align="left">70B</td>
<td align="left">Q5_K_M</td>
<td align="left">52.50</td>
<td align="left">Jetson AGX Orin 64 GB, Jetson AGX Xavier 64 GB, Jetson AGX Thor (T5000) 128 GB</td>
</tr>
</tbody>
</table>
</div>
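<p>The "Required RAM (GB)" figures above follow a simple rule of thumb: parameter count multiplied by the effective bits per weight of the quantization format, divided by 8 to convert to bytes. A quick sketch that reproduces the table's numbers (the bit-widths below are inferred from the table itself, roughly 4.8 bits for Q4_K_M and 6.0 bits for Q5_K_M, and the estimate covers weights only, not KV cache or runtime overhead):</p>

```python
# Rough weight-memory estimate: params (in billions) x bits per weight / 8
# gives GB. Effective bit-widths are inferred from the table above
# (Q4_K_M ~4.8 bits, Q5_K_M ~6.0 bits), not an official specification.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 6.0}

def required_ram_gb(params_billion: float, quant: str) -> float:
    return round(params_billion * BITS_PER_WEIGHT[quant] / 8, 2)

# Reproduces the table rows, e.g.:
# required_ram_gb(7, "Q5_K_M")  -> 5.25
# required_ram_gb(12, "Q4_K_M") -> 7.2
```

<p>Add a margin for context (KV cache) and the OS itself when picking a board: a model whose weights barely fit will still fail to load at longer context lengths.</p>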
</div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-vllm-on-jetson-agx-thor/">How to Run vLLM on Jetson AGX Thor?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>How to Run Ollama on Jetson AGX Thor with OpenwebUI?</title>
		<link>https://blog-en.openzeka.com/how-to-run-ollama-on-jetson-agx-thor-with-openwebui/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Tue, 09 Sep 2025 10:11:28 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1301</guid>

					<description><![CDATA[<p>What is Ollama?  Ollama is a lightweight and flexible p ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-ollama-on-jetson-agx-thor-with-openwebui/">How to Run Ollama on Jetson AGX Thor with OpenwebUI?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-6 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-7 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-title title fusion-title-16 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">What is Ollama?</h3></div><div class="fusion-text fusion-text-46"><p>Ollama is a lightweight and flexible platform that allows you to run large language models (LLMs) directly on your own device. When running on powerful AI hardware such as the <strong>NVIDIA Jetson AGX Thor</strong>, it provides a local, fast, and secure experience without the need for cloud-based solutions.</p>
<p>Thanks to the high processing power of Jetson AGX Thor, Ollama:</p>
<ul>
<li><strong>Runs LLMs locally</strong> → Can be used even without an internet connection.</li>
<li><strong>Utilizes hardware acceleration</strong> → Leverages GPU power to generate faster responses.</li>
<li><strong>Ensures data privacy</strong> → All processing happens on-device, so sensitive data never leaves the system.</li>
<li><strong>Offers flexibility</strong> → Different models can be downloaded, customized, and tested.</li>
</ul>
<p>In short, Ollama leverages the hardware advantages of Jetson AGX Thor to make AI applications more accessible, portable, and secure.</p>
</div><div class="fusion-title title fusion-title-17 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements for AGX Thor</h3></div><div class="fusion-text fusion-text-47"><ol>
<li>JetPack 7 must be installed</li>
<li>Stable high-speed internet connection</li>
<li>At least 15 GB of free disk space for Ollama itself (models you download require additional space)</li>
</ol>
</div><div class="fusion-title title fusion-title-18 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Installation Process</h3></div><div class="fusion-text fusion-text-48"><p>First, we create a folder to mount into the container.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-23 > .CodeMirror, .fusion-syntax-highlighter-23 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-23 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_23" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_23" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_23" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">mkdir ~/ollama-data/</textarea></div><div class="fusion-text fusion-text-49" style="--awb-margin-top:20px;"><p>Next, we download the image from the <strong>GitHub Container Registry.</strong><br />
The <strong>ghcr.io</strong> prefix indicates that the image is hosted on the GitHub Container Registry.</p>
<p>To access other images or check for the latest updates, you can visit the following <strong><a style="color: #2a9e00;" href="https://github.com/orgs/NVIDIA-AI-IOT/packages/container/package/ollama">link.</a></strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-24 > .CodeMirror, .fusion-syntax-highlighter-24 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-24 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_24" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_24" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_24" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --rm -it -v ${HOME}/ollama-data:/data ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04</textarea></div><div class="fusion-text fusion-text-50" style="--awb-margin-top:20px;"><p>It will take some time to pull (download) the container image.</p>
<p>Once in the container, you will see something like this.</p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-4 hover-type-none"><img decoding="async" width="848" height="817" title="Screenshot from 2025-09-08 11-44-34" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34.png" alt class="img-responsive wp-image-1306" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-200x193.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-400x385.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-600x578.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34-800x771.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-44-34.png 848w" sizes="(max-width: 640px) 100vw, 848px" /></span></div><div class="fusion-text fusion-text-51" style="--awb-margin-top:20px;"><p>Try running a GPT OSS (20B parameters) model by issuing the command below.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-25 > .CodeMirror, .fusion-syntax-highlighter-25 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-25 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_25" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_25" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_25" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">ollama run --verbose gpt-oss:20b</textarea></div><div class="fusion-text fusion-text-52" style="--awb-margin-top:20px;"><p>Once ready, it will show something like this:</p>
</div><div class="fusion-image-element " style="text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-5 hover-type-none"><img decoding="async" width="697" height="522" title="Screenshot from 2025-09-08 11-50-28" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28.png" alt class="img-responsive wp-image-1310" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28-200x150.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28-400x300.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28-600x449.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-11-50-28.png 697w" sizes="(max-width: 640px) 100vw, 697px" /></span></div><div class="fusion-title title fusion-title-19 fusion-sep-none fusion-title-text fusion-title-size-three" style="--awb-margin-top:20px;"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Troubleshooting</h3></div><div class="fusion-text fusion-text-53"><p><strong>CUDA out of memory</strong></p>
<p>If you encounter CUDA out of memory errors, try running a <strong>smaller model.</strong><br />
You can also use quantization to reduce memory usage and run models more efficiently on your device.</p>
<p>Different model sizes and quantized versions can be found <strong><a style="color: #1bcc00;" href="https://ollama.com">here</a><span style="color: #1bcc00;">.</span> </strong></p>
</div><div class="fusion-title title fusion-title-20 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;">Installing OpenwebUI</h3></div><div class="fusion-text fusion-text-54"><p>First, run this command in the terminal:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-26 > .CodeMirror, .fusion-syntax-highlighter-26 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-26 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_26" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_26" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_26" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run -it --rm --network=host --add-host=host.docker.internal:host-gateway ghcr.io/open-webui/open-webui:main</textarea></div><div class="fusion-text fusion-text-55" style="--awb-margin-top:20px;"><p><em>If you see the <strong>&#8220;application startup&#8221;</strong> message on the screen, you can proceed to the next step.</em><br />
<em>If it says <strong>&#8220;retrying&#8221;</strong> and you don’t see any progress in the download section, stop the process with <strong>Control + C</strong> and try again, or simply wait; it usually recovers on its own.</em></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-6 hover-type-none"><img decoding="async" width="960" height="589" title="Screenshot from 2025-09-08 13-31-30" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30.png" alt class="img-responsive wp-image-1314" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-200x123.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-400x245.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-600x368.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30-800x491.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-31-30.png 960w" sizes="(max-width: 640px) 100vw, 960px" /></span></div><div class="fusion-text fusion-text-56" style="--awb-margin-top:20px;"><p>You can then navigate your browser to <em><strong>http://JETSON_IP:8080</strong></em>, and create an account to log in (these credentials are stored only locally). Instead of <strong>JETSON_IP</strong>, you can also use localhost.</p>
<p>Create an account.</p>
</div><div class="fusion-image-element " style="text-align:center;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-7 hover-type-none"><img decoding="async" width="613" height="482" title="Screenshot from 2025-09-08 13-36-48" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48.png" alt class="img-responsive wp-image-1315" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48-200x157.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48-400x315.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48-600x472.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-13-36-48.png 613w" sizes="(max-width: 640px) 100vw, 613px" /></span></div><div class="fusion-text fusion-text-57"><p><em><strong>⚠️ Be careful!</strong> When OpenWebUI is launched, <strong>no model</strong> will appear in the <strong>Load Models</strong> section at the top left. To connect models to <strong>OpenWebUI</strong>, we need to assign a port. Restart the Ollama container with the following command:</em></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-27 > .CodeMirror, .fusion-syntax-highlighter-27 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-27 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_27" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_27" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_27" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run --rm -it \
  -p 11434:11434 \
  -v ${HOME}/ollama-data:/data \
  ghcr.io/nvidia-ai-iot/ollama:r38.2.arm64-sbsa-cu130-24.04</textarea></div><div class="fusion-text fusion-text-58" style="--awb-margin-top:20px;"><p>You can check it by sending a <strong>curl request:</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-28 > .CodeMirror, .fusion-syntax-highlighter-28 > .CodeMirror .CodeMirror-gutters {background-color:#000000;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-28 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_28" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_28" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_28" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">curl http://localhost:11434</textarea></div><div class="fusion-text fusion-text-59" style="--awb-margin-top:20px;"><p>If you see “<strong>Ollama is running</strong>”, you can continue using it.</p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-8 hover-type-none"><img decoding="async" width="1024" height="261" title="Screenshot from 2025-09-08 14-08-00" src="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-1024x261.png" alt class="img-responsive wp-image-1320" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-200x51.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-400x102.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-600x153.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00-800x204.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/09/Screenshot-from-2025-09-08-14-08-00.png 1058w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-title title fusion-title-21 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Which Jetson should I choose for my LLM model?</h4></div><div class="fusion-text fusion-text-60 fusion-text-no-margin" style="--awb-margin-bottom:-20px;"><p>Below, you can find the RAM requirements of the most popular LLM models along with Jetson recommendations that meet the minimum specifications to run them. You can choose the one that best fits your needs.</p>
</div>
<div class="table-1">
<p>&nbsp;</p>
<table width="100%">
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Parameters</th>
<th align="left">Quantization</th>
<th align="left">Required RAM (GB)</th>
<th align="left">Recommended Minimum Jetson</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">DeepSeek-R1</td>
<td align="left">671B</td>
<td align="left">Dynamic-1.58-bit (MoE 1.5-bit + other layers 4–6-bit)</td>
<td align="left">159.03</td>
<td align="left">Not supported (requires more than 128 GB)</td>
</tr>
<tr>
<td align="left">DeepSeek-R1 Distill-Qwen-1.5B</td>
<td align="left">1.5B</td>
<td align="left">Q4_K_M</td>
<td align="left">0.90</td>
<td align="left">Jetson Orin Nano 4 GB, Jetson Nano 4 GB</td>
</tr>
<tr>
<td align="left">DeepSeek-R1 Distill-Qwen-7B</td>
<td align="left">7B</td>
<td align="left">Q5_K_M</td>
<td align="left">5.25</td>
<td align="left">Jetson Orin Nano 8 GB, Jetson Orin NX 8 GB, Jetson Xavier NX 8 GB</td>
</tr>
<tr>
<td align="left">Qwen 2.5</td>
<td align="left">14B</td>
<td align="left">FP16</td>
<td align="left">33.60</td>
<td align="left">Jetson AGX Orin 64 GB, Jetson AGX Xavier 64 GB</td>
</tr>
<tr>
<td align="left">CodeLlama</td>
<td align="left">34B</td>
<td align="left">Q4_K_M</td>
<td align="left">20.40</td>
<td align="left">Jetson AGX Orin 32 GB, Jetson AGX Xavier 32 GB</td>
</tr>
<tr>
<td align="left">Llama 3.2 Vision</td>
<td align="left">90B</td>
<td align="left">Q5_K_M</td>
<td align="left">67.50</td>
<td align="left">Jetson AGX Thor (T5000) 128 GB</td>
</tr>
<tr>
<td align="left">Phi-3</td>
<td align="left">3.8B</td>
<td align="left">FP16</td>
<td align="left">9.12</td>
<td align="left">Jetson Orin NX 16 GB</td>
</tr>
</tbody>
</table>
</div>
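The RAM figures above can be approximated from first principles: weight memory is roughly parameters × bits-per-weight ÷ 8, plus a margin for the KV cache and activations. A rough sketch (the 20% overhead factor is an assumption for illustration; real usage varies with context length and runtime):

```python
def estimate_llm_ram_gb(params_billion: float, bits_per_weight: float,
                        overhead: float = 1.2) -> float:
    """Approximate RAM needed to run a model: weight bytes plus a runtime margin.

    overhead=1.2 is an assumed ~20% allowance for KV cache and activations.
    """
    weight_gb = params_billion * bits_per_weight / 8  # 1B params @ 8-bit = 1 GB
    return round(weight_gb * overhead, 2)

# Qwen 2.5 14B at FP16 (16 bits per weight):
print(estimate_llm_ram_gb(14, 16))  # 33.6
```

The estimate should land in the same ballpark as the table; when in doubt, pick the Jetson with comfortable headroom above the computed figure.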
</div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/how-to-run-ollama-on-jetson-agx-thor-with-openwebui/">How to Run Ollama on Jetson AGX Thor with OpenwebUI?</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Jetson Generative AI – JupyterLab Development Environment</title>
		<link>https://blog-en.openzeka.com/jetson-generative-ai-jupyterlab-development-environment/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Thu, 07 Aug 2025 13:11:04 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1130</guid>

					<description><![CDATA[<p>JupyterLab turns your Jetson into a powerful interacti ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-jupyterlab-development-environment/">Jetson Generative AI – JupyterLab Development Environment</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-7 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-8 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-61"><p><strong>JupyterLab</strong> turns your Jetson into a powerful interactive development environment for AI, data science, and ML. It provides the familiar building blocks of classic Jupyter Notebook (notebooks, terminal, text editor, file browser, rich outputs) in a flexible UI. With GPU‑accelerated containers, you can train models and prototype solutions directly on your Jetson.</p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-9 hover-type-none"><img decoding="async" width="1024" height="497" title="jupyterlab_interface" src="https://blog-en.openzeka.com/wp-content/uploads/2025/07/jupyterlab_interface-1024x497.png" alt class="img-responsive wp-image-1135" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/07/jupyterlab_interface-200x97.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/jupyterlab_interface-400x194.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/jupyterlab_interface-600x291.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/jupyterlab_interface-800x388.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/jupyterlab_interface-1200x582.png 1200w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-title title fusion-title-22 fusion-sep-none fusion-title-text fusion-title-size-four" style="--awb-margin-bottom:-30px;"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4> Requirements</h4></h4></div>
<div class="table-1">
<p>&nbsp;</p>
<table width="100%">
<thead>
<tr>
<th align="left">
<div>
<div>Hardware / Software</div>
</div>
</th>
<th align="left">
<div>
<div>Notes</div>
</div>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>Jetson (Nano / Orin series)</strong> ≥ <strong>4 GB RAM</strong></td>
<td align="left"><strong>8 GB+</strong> recommended for larger notebooks and models</td>
</tr>
<tr>
<td align="left"><strong>NVMe SSD</strong></td>
<td align="left"><strong>Highly recommended</strong> for faster I/O and model storage (microSD works but slower)</td>
</tr>
<tr>
<td align="left"><strong>JetPack 5.0 or newer</strong></td>
<td align="left">Latest versions recommended for best container support</td>
</tr>
<tr>
<td align="left"><strong>NVIDIA Container Toolkit</strong></td>
<td align="left">Installed with JetPack; if it is missing, you can install it by following the <a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html">install guide</a>.</td>
</tr>
<tr>
<td align="left"><strong>Docker</strong></td>
<td align="left">May require manual installation on JetPack 6</td>
</tr>
</tbody>
</table>
</div>
<div class="fusion-title title fusion-title-23 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4 id="stepbystep-setup" data-source-line="23">Step‑by‑Step Setup</h4></h4></div><div class="fusion-title title fusion-title-24 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4 id="1-verify-jetpack-installation" data-source-line="25">1. Verify JetPack Installation</h4></h4></div><div class="fusion-text fusion-text-62"><p>First, check your <strong>JetPack version</strong> and ensure <strong>Docker</strong> is working:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-29 > .CodeMirror, .fusion-syntax-highlighter-29 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-29 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_29" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_29" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_29" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Check JetPack version
cat /etc/nv_tegra_release

# Verify Docker and NVIDIA runtime
docker --version
docker run --rm --runtime nvidia hello-world</textarea></div><div class="fusion-text fusion-text-63"><p> </p>
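If you want to read the version programmatically (for example, to pick the matching container tag), the first line of /etc/nv_tegra_release encodes the L4T release and revision. A sketch, assuming the usual <strong># R36 (release), REVISION: 4.3, …</strong> line format (the sample values below are illustrative):

```python
import re

def l4t_version(release_line: str) -> str:
    """Parse 'R<major> ... REVISION: <minor>' into an r<major>.<minor> tag."""
    m = re.search(r"R(\d+).*REVISION:\s*([\d.]+)", release_line)
    if not m:
        raise ValueError("unrecognized nv_tegra_release format")
    return f"r{m.group(1)}.{m.group(2)}"

# Illustrative first line of /etc/nv_tegra_release (fields abridged):
line = "# R36 (release), REVISION: 4.3, GCID: 38968081, BOARD: generic"
print(l4t_version(line))  # r36.4.3
```

The resulting string has the same shape as the tags used by the containers later in this guide (e.g. r36.x for JetPack 6).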
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-10 hover-type-none"><img decoding="async" width="1700" height="1064" title="version" src="https://blog-en.openzeka.com/wp-content/uploads/2025/07/version.png" alt class="img-responsive wp-image-1132" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/07/version-300x188.png 300w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/version.png 1700w" sizes="(max-width: 1700px) 100vw, 1700px" /></span></div><div class="fusion-title title fusion-title-25 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>2. Create a persistent workspace</h4></h4></div><div class="fusion-text fusion-text-64"><p>Use a dedicated workspace so <strong>notebooks/files</strong> persist outside the container:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-30 > .CodeMirror, .fusion-syntax-highlighter-30 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-30 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_30" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_30" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_30" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Create project directories
mkdir -p ~/projects
mkdir -p ~/datasets

# Create cache directory (adjust path based on your SSD mount)
# For NVMe SSD mounted at /mnt/nvme:
sudo mkdir -p /mnt/nvme/cache
sudo chown -R $USER:$USER /mnt/nvme/cache

mkdir -p ~/.jupyter</textarea></div><div class="fusion-title title fusion-title-26 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>3. Launch JupyterLab with GPU</h4></h4></div><div class="fusion-text fusion-text-65"><p>The dustynv/jupyterlab image provides JupyterLab on port <strong>8888</strong>. Note that newer JetPack versions use different container tags:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-31 > .CodeMirror, .fusion-syntax-highlighter-31 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-31 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_31" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_31" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_31" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># For JetPack 6.0+ (r36.x):
docker run -it --rm \
  --name=jupyterlab \
  --network=host \
  -e JUPYTER_PORT=8888 \
  -e JUPYTER_LOGS=/root/.cache/jupyter/jupyter.log \
  -v /mnt/nvme/cache/jupyter:/root/.cache/jupyter \
  -v /mnt/nvme/cache/jupyter/ipynb_checkpoints:/root/.ipynb_checkpoints \
  -v /mnt/nvme/cache/jupyter/ipython:/root/.ipython \
  -v /mnt/nvme/cache/jupyter/jupyter:/root/.jupyter \
  --runtime=nvidia \
  -e DOCKER_PULL=always --pull always \
  -e HF_HUB_CACHE=/root/.cache/huggingface \
  -v /mnt/nvme/cache:/root/.cache \
  dustynv/jupyterlab:r36.4.0

# For JetPack 5.x (r35.x):
# dustynv/jupyterlab:r35.4.0</textarea></div><div class="fusion-title title fusion-title-27 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>4. Set your own password</h4></h4></div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-11 hover-type-none"><img decoding="async" width="1024" height="919" title="token" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/token-1024x919.png" alt class="img-responsive wp-image-1150" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/08/token-200x179.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/token-400x359.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/token-600x538.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/token-800x718.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/token-1200x1077.png 1200w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/token.png 1518w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-text fusion-text-66" style="--awb-margin-top:20px;"><p>Your password is <strong>stored at</strong>:</p>
<blockquote>
<p>/root/.jupyter/jupyter_server_config.json.</p>
</blockquote>
</div><div class="fusion-title title fusion-title-28 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>5. Access the Web UI</h4></h4></div><div class="fusion-text fusion-text-67"><p>Open your browser and navigate to:</p>
<blockquote>
<p>http://localhost:8888/lab (on the Jetson itself)<br />
http://&lt;jetson-ip&gt;:8888/lab (from another device on your LAN)</p>
</blockquote>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-12 hover-type-none"><img decoding="async" width="822" height="384" title="login" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/login.png" alt class="img-responsive wp-image-1153" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/08/login-200x93.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/login-400x187.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/login-600x280.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/login-800x374.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/login.png 822w" sizes="(max-width: 640px) 100vw, 822px" /></span></div><div class="fusion-text fusion-text-68"><p><strong>You can change your password like this:</strong></p>
<p>In JupyterLab → Terminal (or from host via docker exec):</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-32 > .CodeMirror, .fusion-syntax-highlighter-32 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-32 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_32" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_32" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_32" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Inside the running container
jupyter lab password</textarea></div><div class="fusion-title title fusion-title-29 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>6. Verify GPU access</h4></h4></div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-33 > .CodeMirror, .fusion-syntax-highlighter-33 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-33 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_33" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_33" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_33" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">import torch
print("CUDA available:", torch.cuda.is_available())
print("CUDA device count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("VRAM (GB):", round(torch.cuda.get_device_properties(0).total_memory / 1e9, 1))</textarea></div><div class="fusion-text fusion-text-69" style="--awb-margin-top:20px;"><p>Outside the container run:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-34 > .CodeMirror, .fusion-syntax-highlighter-34 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-34 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_34" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_34" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_34" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">jtop</textarea></div><div class="fusion-text fusion-text-70" style="--awb-margin-top:20px;"><p>to verify GPU usage.</p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-13 hover-type-none"><img decoding="async" width="1024" height="519" title="jtop" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/jtop-1024x519.png" alt class="img-responsive wp-image-1152" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/08/jtop-200x101.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/jtop-400x203.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/jtop-600x304.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/jtop-800x405.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/jtop-1200x608.png 1200w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-title title fusion-title-30 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>7. Advanced: Mount additional data sources</h4></h4></div><div class="fusion-text fusion-text-71"><p>For complex projects, mount additional host directories:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-35 > .CodeMirror, .fusion-syntax-highlighter-35 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-35 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_35" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_35" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_35" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run -it --rm \
  --name jupyterlab \
  --network=host \
  --runtime=nvidia \
  -v ~/projects:/workspace \
  -v ~/datasets:/workspace/datasets \
  -v /mnt/nvme/cache:/root/.cache \
  -v ~/.jupyter:/root/.jupyter \
  -v /data/experiments:/workspace/experiments \
  -v /media/usb:/workspace/usb_data \
  dustynv/jupyterlab:r36.4.0</textarea></div><div class="fusion-title title fusion-title-31 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><h3>Sample AI Workflows</h3></h3></div><div class="fusion-title title fusion-title-32 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Computer Vision Pipeline (GPU-accelerated):</h4></h4></div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-36 > .CodeMirror, .fusion-syntax-highlighter-36 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-36 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_36" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_36" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_36" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">import torch
import torchvision.transforms as T
from PIL import Image
import requests
import time

# Verify GPU setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load and preprocess image on GPU
url = "https://example.com/sample_image.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")  # force 3 channels for Normalize

# Preprocess on the CPU, then move the tensor to the GPU
transform = T.Compose([
    T.Resize((224, 224)), 
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

start_time = time.time()
image_tensor = transform(image).unsqueeze(0).to(device)
print(f"Preprocessing completed in {time.time() - start_time:.3f}s on {device}")</textarea></div><div class="fusion-title title fusion-title-33 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Natural Language Processing:</h4></h4></div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-37 > .CodeMirror, .fusion-syntax-highlighter-37 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-37 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_37" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_37" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_37" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">import torch
from transformers import pipeline

# Use GPU if available
device = 0 if torch.cuda.is_available() else -1
print(f"Using device: {'GPU' if device == 0 else 'CPU'}")

# Initialize pipeline
classifier = pipeline(
    "sentiment-analysis", 
    device=device,
    model="distilbert-base-uncased-finetuned-sst-2-english"
)

# Test with sample text
results = classifier([
    "I love building AI workflows in JupyterLab on Jetson!",
    "This GPU acceleration makes training so much faster."
])

for text, result in zip(["Text 1", "Text 2"], results):
    print(f"{text}: {result['label']} (confidence: {result['score']:.3f})")</textarea></div><div class="fusion-title title fusion-title-34 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><h3>Troubleshooting</h3></h3></div><div class="fusion-title title fusion-title-35 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Common Issues &amp; Solutions:</h4></h4></div><div class="fusion-text fusion-text-72"><p><strong>JupyterLab won&#8217;t start:</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-38 > .CodeMirror, .fusion-syntax-highlighter-38 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-38 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_38" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_38" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_38" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Check if port is in use
sudo netstat -tulpn | grep 8888

# Try different port
docker run ... -p 8889:8888 ...

# Check container logs
docker logs jupyterlab</textarea></div><div class="fusion-text fusion-text-73" style="--awb-margin-top:20px;"><p><strong>Out of memory errors:</strong></p>
<ul>
<li>Reduce batch sizes</li>
<li>Use gradient checkpointing</li>
<li>Enable mixed precision training</li>
<li>Monitor with tegrastats</li>
</ul>
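Monitoring with tegrastats can also be scripted. A minimal Python sketch that pulls the RAM figures out of a single tegrastats status line (the field layout shown is typical, but it can vary between JetPack releases):

```python
import re

# Parse the RAM field from a tegrastats status line, e.g.
# "RAM 3162/7844MB (lfb 2x1MB) SWAP 0/3922MB (cached 0MB) CPU [2%@1420,...]"
RAM_PATTERN = re.compile(r"RAM (\d+)/(\d+)MB")

def ram_usage(line: str):
    """Return (used_mb, total_mb), or None if the line has no RAM field."""
    match = RAM_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

if __name__ == "__main__":
    sample = "RAM 3162/7844MB (lfb 2x1MB) SWAP 0/3922MB (cached 0MB)"
    used, total = ram_usage(sample)
    print(f"RAM: {used}/{total} MB ({100 * used / total:.1f}%)")
```

Piping `tegrastats` into a loop over this function gives a lightweight out-of-memory early-warning without a full monitoring stack.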
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-14 hover-type-none"><img decoding="async" width="1024" height="159" title="tegrastats" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/tegrastats-1024x159.png" alt class="img-responsive wp-image-1151" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/08/tegrastats-200x31.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/tegrastats-400x62.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/tegrastats-600x93.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/tegrastats-800x124.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/tegrastats.png 1122w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-text fusion-text-74" style="--awb-margin-top:20px;"><p><strong>Package installation failures:</strong></p>
<ul>
<li>Check Python version compatibility</li>
<li>For OpenCV, prefer JetPack&#8217;s optimized version</li>
<li>Use <code>pip install --no-cache-dir</code> for memory-constrained installs</li>
</ul>
</div><div class="fusion-title title fusion-title-36 fusion-sep-none fusion-title-text fusion-title-size-four" style="--awb-margin-bottom:-20px;"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Additional Resources</h4></h4></div><div class="fusion-text fusion-text-75"><ul>
<li><a href="https://www.jetson-ai-lab.com/"><strong style="color: #38c92e;">NVIDIA Jetson AI Lab</strong></a></li>
<li><a href="https://github.com/dusty-nv/jetson-containers"><strong style="color: #38c92e;">Jetson Containers Repository</strong></a></li>
<li><a href="https://jupyterlab.readthedocs.io/"><strong style="color: #38c92e;">JupyterLab Documentation</strong></a></li>
<li><a href="https://catalog.ngc.nvidia.com/"><strong style="color: #38c92e;">NVIDIA NGC Catalog</strong></a></li>
</ul>
<p><em>For the latest updates and community discussions, visit the NVIDIA Developer Forums.</em></p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-jupyterlab-development-environment/">Jetson Generative AI – JupyterLab Development Environment</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Jetson Generative AI – n8n Local Agents</title>
		<link>https://blog-en.openzeka.com/jetson-generative-ai-n8n-local-agents/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Thu, 07 Aug 2025 13:09:07 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1164</guid>

					<description><![CDATA[<p>n8n transforms your Jetson into an intelligent agent f ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-n8n-local-agents/">Jetson Generative AI – n8n Local Agents</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-8 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-9 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-76"><p><strong>n8n</strong> transforms your Jetson into an intelligent agent factory with its visual workflow automation platform. This fair-code licensed tool, with over 123k GitHub stars (as of July), lets you create autonomous agents that think, decide, and act locally using local LLMs via Ollama, or by calling external services. You can build RAG pipelines and sophisticated AI agents that monitor systems, process data, make decisions, and execute actions using only n8n workflows.</p>
</div><div class="fusion-image-element " style="--awb-aspect-ratio:4 / 3;--awb-margin-bottom:-140px;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-15 hover-type-none has-aspect-ratio"><img decoding="async" width="2830" height="1396" title="ollamarag" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/ollamarag.png" class="img-responsive wp-image-1169 img-with-aspect-ratio" data-parent-fit="cover" data-parent-container=".fusion-image-element" alt /></span></div><div class="fusion-title title fusion-title-37 fusion-sep-none fusion-title-text fusion-title-size-four" style="--awb-margin-bottom:-30px;"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Requirements</h4></h4></div>
<div class="table-1">
<p>&nbsp;</p>
<table width="100%">
<thead>
<tr>
<th align="left">Hardware / Software</th>
<th align="left">Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>Any Jetson (Nano/Orin)</strong> ≥ <strong>4 GB RAM</strong></td>
<td align="left">16GB+ recommended for complex workflows</td>
</tr>
<tr>
<td align="left"><strong>NVMe SSD</strong></td>
<td align="left">Recommended for workflow data storage</td>
</tr>
</tbody>
</table>
</div>
<div class="fusion-title title fusion-title-38 fusion-sep-none fusion-title-text fusion-title-size-three" style="--awb-margin-bottom:-30px;"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><h3>Step-by-Step Setup</h3></h3></div><div class="fusion-title title fusion-title-39 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>1. Create Necessary Directories</h4></h4></div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-39 > .CodeMirror, .fusion-syntax-highlighter-39 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-39 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_39" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_39" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_39" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">sudo mkdir -p /mnt/nvme/cache/n8n
sudo chown -R $USER:$USER /mnt/nvme/cache/n8n</textarea></div><div class="fusion-title title fusion-title-40 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>2. Launch n8n</h4></h4></div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-40 > .CodeMirror, .fusion-syntax-highlighter-40 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-40 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_40" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_40" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_40" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run -it --rm --name=n8n \
  --network=host \
  -e N8N_LISTEN_ADDRESS=0.0.0.0 \
  -e N8N_PORT=5678 \
  -e N8N_SECURE_COOKIE=false \
  -v /mnt/nvme/cache/n8n:/home/node/.n8n \
  --pull always \
  n8nio/n8n:stable</textarea></div><div class="fusion-title title fusion-title-41 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>3. Access the Web Interface</h4></h4></div><div class="fusion-text fusion-text-77"><p>Once the container starts, you&#8217;ll see:</p>
<blockquote>
<p>Editor is now accessible via:<br />
http://localhost:5678</p>
</blockquote>
<ul>
<li>Local access: Open <strong>http://localhost:5678</strong> in your browser</li>
<li>Remote access: Use<strong> http://&lt;jetson-ip&gt;:5678</strong></li>
</ul>
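Once the editor is reachable, you can script a liveness check against n8n's <code>/healthz</code> endpoint (the path matches n8n's documented health check; treat it as an assumption if your version differs). A minimal Python sketch using only the standard library:

```python
import json
import urllib.request

def n8n_health_url(host: str = "localhost", port: int = 5678) -> str:
    """Build the URL for n8n's /healthz endpoint."""
    return f"http://{host}:{port}/healthz"

def check_n8n(host: str = "localhost", port: int = 5678) -> bool:
    """Return True if the n8n instance reports healthy, False otherwise."""
    try:
        with urllib.request.urlopen(n8n_health_url(host, port), timeout=5) as resp:
            return json.load(resp).get("status") == "ok"
    except OSError:
        # Connection refused, DNS failure, timeout, etc.
        return False

if __name__ == "__main__":
    print("n8n up:", check_n8n())
```

This is handy in a startup script that waits for the container before opening the browser.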
</div><div class="fusion-title title fusion-title-42 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>4. Set up Ollama (Separate Container)</h4></h4></div><div class="fusion-text fusion-text-78"><p>In a new terminal, start Ollama:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-41 > .CodeMirror, .fusion-syntax-highlighter-41 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-41 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_41" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_41" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_41" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Create Ollama data directory
sudo mkdir -p /mnt/nvme/cache/ollama
sudo chown -R $USER:$USER /mnt/nvme/cache/ollama

# Run Ollama container (--runtime nvidia enables GPU acceleration on Jetson)
docker run -d --name=ollama \
  --runtime nvidia \
  -p 11434:11434 \
  -v /mnt/nvme/cache/ollama:/root/.ollama \
  ollama/ollama</textarea></div><div class="fusion-title title fusion-title-43 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>5. Set up Ollama Models</h4></h4></div><div class="fusion-text fusion-text-79"><p>Pull AI models into your separate Ollama container:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-42 > .CodeMirror, .fusion-syntax-highlighter-42 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-42 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_42" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_42" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_42" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Pull popular models (choose based on your Jetson's RAM)
docker exec -it ollama ollama pull llama3.2:3b    # Smaller model (3B parameters)
docker exec -it ollama ollama pull llama3.1:8b    # Larger model (8B parameters) 
docker exec -it ollama ollama pull mistral:7b     # Alternative model

# Verify models are downloaded
docker exec -it ollama ollama list

# Test Ollama is responding
curl http://localhost:11434/api/tags</textarea></div><div class="fusion-text fusion-text-80" style="--awb-margin-top:20px;"><p><strong>Model Size Guide for Jetson:</strong></p>
<ul>
<li><strong>4-8GB RAM:</strong> Use 3B models (llama3.2:3b)</li>
<li><strong>16GB+ RAM:</strong> Can handle 7B-8B models</li>
<li><strong>32GB+ RAM:</strong> Can run multiple large models simultaneously</li>
</ul>
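The same endpoint n8n talks to can be exercised directly. A minimal Python sketch that posts a one-shot prompt to Ollama's <code>/api/generate</code> REST endpoint (the model name assumes you pulled <code>llama3.2:3b</code> in the step above):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint.

    stream=False asks for one complete JSON response
    instead of a stream of chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a one-shot prompt to the local Ollama server and return its reply."""
    body = json.dumps(build_generate_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    print(generate("llama3.2:3b", "In one sentence, what is a Jetson?"))
```

If this works from the host, any connection problem you later hit in n8n is configuration rather than the model itself.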
</div><div class="fusion-title title fusion-title-44 fusion-sep-none fusion-title-text fusion-title-size-four" style="--awb-margin-bottom:-20px;"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>6. Configure Ollama Connection in n8n</h4></h4></div><div class="fusion-text fusion-text-81"><ol>
<li><strong>In n8n workflows</strong>:
<ul>
<li>Add &#8220;Ollama Chat Model&#8221; node to any workflow</li>
<li>Set base URL to <code>http://localhost:11434</code></li>
<li>Select your pulled model from dropdown (e.g., llama3.2:3b)</li>
<li>Test the connection</li>
</ul>
</li>
<li><strong>Create credentials</strong> (if needed):
<ul>
<li>Go to Settings &gt; Credentials</li>
<li>Add &#8220;Ollama&#8221; credential</li>
<li>Base URL: <code>http://localhost:11434</code></li>
</ul>
</li>
</ol>
<p><strong>Note</strong>: Since the n8n container uses <code>--network=host</code> and the Ollama container publishes port 11434 on the host, the two can communicate via <code>localhost:11434</code>.</p>
</div><div class="fusion-title title fusion-title-45 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>7. Use external services</h4></h4></div><div class="fusion-text fusion-text-82"><p>If you&#8217;re going to use an external service such as OpenAI, add its node and then add its credentials.</p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-16 hover-type-none"><img decoding="async" width="1024" height="537" title="n8n_credentials_setup" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/n8n_credentials_setup-1024x537.png" alt class="img-responsive wp-image-1170" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/08/n8n_credentials_setup-200x105.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/n8n_credentials_setup-400x210.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/n8n_credentials_setup-600x314.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/n8n_credentials_setup-800x419.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/n8n_credentials_setup-1200x629.png 1200w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-title title fusion-title-46 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>8. Explore Community Workflows and Templates</h4></h4></div><div class="fusion-text fusion-text-83"><p><strong>n8n</strong> has a rich ecosystem of community-contributed workflows that you can use as starting points:</p>
<ul>
<li><strong>Official Template Gallery</strong>: Visit <a href="https://n8n.io/workflows">n8n.io/workflows</a> to browse 800+ workflow templates</li>
<li><strong>GitHub Community</strong>: Search GitHub for &#8220;n8n-workflow&#8221; to find community contributions</li>
<li><strong>Template Categories</strong>:
<ul>
<li>AI Agent Chat workflows</li>
<li>Content creation automation</li>
<li>Social media management</li>
<li>Email processing with AI</li>
<li>Data transformation pipelines</li>
<li>Slack/Discord bots</li>
</ul>
</li>
</ul>
<p><strong>How to use templates:</strong></p>
<ol>
<li>Browse templates at<b> <a style="color: #38c92e;" href="http://n8n.io/workflows">n8n.io/workflows</a></b></li>
<li>Click &#8220;Use for free&#8221; on any template</li>
<li>You can copy the JSON file and paste it into the workflow or save it as a JSON file and upload it</li>
<li>Customize nodes and credentials to match your setup</li>
</ol>
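For step 3, an exported n8n workflow is plain JSON with <code>nodes</code> and <code>connections</code> at the top level. A hedged sketch of that shape plus a quick sanity check before pasting (the node <code>type</code> string here is illustrative, not an exact n8n identifier; real templates carry extra fields such as <code>name</code> and <code>settings</code>):

```python
import json

# Skeleton of an exported n8n workflow: "nodes" and "connections"
# are the parts the editor needs to rebuild the canvas.
MINIMAL_WORKFLOW = """
{
  "nodes": [
    {
      "name": "Chat Trigger",
      "type": "@n8n/n8n-nodes-langchain.chatTrigger",
      "position": [0, 0],
      "parameters": {}
    }
  ],
  "connections": {}
}
"""

def looks_like_workflow(text: str) -> bool:
    """Rough sanity check on template JSON before pasting it into the editor."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data.get("nodes"), list) and "connections" in data

if __name__ == "__main__":
    print(looks_like_workflow(MINIMAL_WORKFLOW))
```

A check like this catches truncated copy-pastes before the editor rejects them with a less helpful error.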
<p><strong>Popular AI templates to try:</strong></p>
<ul>
<li><strong><a style="color: #38c92e;" href="https://n8n.io/workflows/2861-ai-powered-email-processing-autoresponder-and-response-approval-yesno/">AI-powered email autoresponder</a></strong></li>
<li><strong><a style="color: #38c92e;" href="https://n8n.io/workflows/3135-automated-social-media-content-publishing-factory-system-prompt-composition/">Social media content generation</a></strong></li>
<li><strong><a style="color: #38c92e;" href="https://n8n.io/workflows/3442-fully-automated-ai-video-generation-and-multi-platform-publishing/">Automated video creation and publishing</a></strong></li>
<li><strong><a style="color: #38c92e;" href="https://n8n.io/workflows/3804-automated-pr-code-reviews-with-github-gpt-4-and-google-sheets-best-practices/">Automated code reviews with GPT-4</a></strong></li>
<li><strong><a style="color: #38c92e;" href="https://n8n.io/workflows/3123-automatic-reminders-for-follow-ups-with-ai-and-human-in-the-loop-gmail/">Follow-up reminders with AI</a></strong></li>
</ul>
</div><div class="fusion-title title fusion-title-47 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>9. Advanced agent features</h4></h4></div><div class="fusion-text fusion-text-84"><p data-source-line="147">You can create advanced workflows with n8n using:</p>
<ul data-source-line="149">
<li data-source-line="149"><strong>Interactive Chat Agents</strong>: Build conversational AI with Chat Trigger nodes for real-time user interactions</li>
<li data-source-line="150"><strong>File Processing Intelligence</strong>: Load schemas, extract data from files, and combine with AI for document analysis</li>
<li data-source-line="151"><strong>Memory Management</strong>: Use Window Buffer Memory to maintain conversation context across multiple interactions</li>
<li data-source-line="152"><strong>Multi-step reasoning</strong>: Chain multiple LLM operations for complex decisions and data processing</li>
<li data-source-line="153"><strong>Dynamic Data Combination</strong>: Merge schema data with chat inputs for context-aware responses</li>
<li data-source-line="154"><strong>Conditional logic</strong>: Route agent workflows based on AI-generated decisions and user inputs</li>
<li data-source-line="155"><strong>Error handling</strong>: Build robust failure recovery into agent behavior</li>
<li data-source-line="156"><strong>Local AI integration</strong>: Combine multiple local models for specialized tasks like SQL generation, data analysis</li>
</ul>
<p><em><strong>E.g., a complex agent workflow that generates SQL queries from a schema using local LLM reasoning:</strong></em></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-17 hover-type-none"><img decoding="async" width="2864" height="1372" title="n8n_advanced_workflow" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/n8n_advanced_workflow.png" alt class="img-responsive wp-image-1171"/></span></div><div class="fusion-title title fusion-title-48 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><h3>Troubleshooting &amp; Common Issues</h3></h3></div><div class="fusion-title title fusion-title-49 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>&#8220;Model not found&#8221; in Ollama node</h4></h4></div><div class="fusion-text fusion-text-85"><p>This error occurs when you try to use a model in n8n that hasn&#8217;t been downloaded to your Ollama container yet.</p>
<p><strong>Solution: Download the model first by running:</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-43 > .CodeMirror, .fusion-syntax-highlighter-43 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-43 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_43" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_43" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_43" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker exec -it ollama ollama pull [model-name]</textarea></div><div class="fusion-text fusion-text-86"><p>For example, to pull Llama 3.2 3B model:</p>
<p>&nbsp;</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-44 > .CodeMirror, .fusion-syntax-highlighter-44 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-44 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_44" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_44" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_44" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker exec -it ollama ollama pull llama3.2:3b</textarea></div><div class="fusion-title title fusion-title-50 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Connection refused</h4></h4></div><div class="fusion-text fusion-text-87"><p>This happens when n8n cannot connect to the Ollama service, usually due to network configuration issues.</p>
<p><strong>Solutions:</strong></p>
<ul>
<li>Ensure both containers are running with <code>--network=host</code> flag</li>
<li>Verify Ollama is accessible by testing: <code>curl http://localhost:11434/api/tags</code></li>
<li>For Mac users, use <code>host.docker.internal:11434</code> instead of <code>localhost:11434</code></li>
<li>Check if Ollama container is running: <code>docker ps</code></li>
</ul>
</div><div class="fusion-title title fusion-title-51 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>&#8220;Pull model manifest: 412&#8221; error</h4></h4></div><div class="fusion-text fusion-text-88"><p>This error typically occurs when using outdated Docker images or when there are authentication issues with model repositories.</p>
<p><strong>Solutions:</strong></p>
<ul>
<li>Update to the latest Docker images:</li>
</ul>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-45 > .CodeMirror, .fusion-syntax-highlighter-45 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-45 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_45" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_45" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_45" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker pull n8nio/n8n:stable
docker pull ollama/ollama:latest</textarea></div><div class="fusion-text fusion-text-89"><ul>
<li>Clear Docker cache and restart containers</li>
<li>Check your internet connection and firewall settings</li>
<li>Verify the model name is correct and still available in the Ollama library</li>
</ul>
<p><em>For comprehensive guides and documentation, visit <strong><a href="https://docs.n8n.io"><span style="color: #38c92e;">docs.n8n.io</span>.</a></strong> The platform offers extensive customization options and community support for automation workflows.</em></p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-n8n-local-agents/">Jetson Generative AI – n8n Local Agents</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Jetson Generative AI &#8211; Flowise</title>
		<link>https://blog-en.openzeka.com/jetson-generative-ai-flowise/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Thu, 07 Aug 2025 13:08:45 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1187</guid>

					<description><![CDATA[<p>Flowise is an open-source, low-code tool for building  ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-flowise/">Jetson Generative AI &#8211; Flowise</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-9 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-10 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-90"><p><strong>Flowise</strong> is an open-source, low-code tool for building customized LLM applications and AI agents. Flowise is designed to let anyone build powerful AI-driven solutions without writing a single line of code!</p>
</div><div class="fusion-title title fusion-title-52 fusion-sep-none fusion-title-text fusion-title-size-four" style="--awb-margin-top:-10px;--awb-margin-bottom:-10px;"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Why Flowise on Jetson?</h4></h4></div><div class="fusion-title title fusion-title-53 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Flowise vs n8n: Choosing the Right Tool</h4></h4></div><div class="fusion-text fusion-text-91"><p>At first glance, Flowise and n8n might seem pretty similar &#8211; both offer visual workflow builders and can handle AI tasks. But when you dig deeper, they each have their own advantages and disadvantages that make them better suited for different types of projects.</p>
</div><div class="fusion-text fusion-text-92"><p><strong>Core Philosophy Difference</strong><br />
<strong>Flowise:</strong> AI-first platform specializing in LLM workflows<br />
<strong>n8n:</strong> General-purpose automation platform that can do AI</p>
</div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Capability</th>
<th align="left">Flowise</th>
<th align="left">n8n</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>AI Focus</strong></td>
<td align="left">LLM-optimized with built-in prompt engineering</td>
<td align="left">300+ integrations, AI as add-on</td>
</tr>
<tr>
<td align="left"><strong>Learning Curve</strong></td>
<td align="left">Easier for AI newcomers</td>
<td align="left">Better for developers</td>
</tr>
<tr>
<td align="left"><strong>Code Execution</strong></td>
<td align="left">Limited, component-focused</td>
<td align="left">Full JavaScript/Python support</td>
</tr>
<tr>
<td align="left"><strong>Performance</strong></td>
<td align="left">Good for moderate AI workloads</td>
<td align="left">Enterprise-grade with scaling</td>
</tr>
<tr>
<td align="left"><strong>Templates</strong></td>
<td align="left">AI-focused (RAG, Agents, Research)</td>
<td align="left">800+ general automation templates</td>
</tr>
</tbody>
</table>
</div>
<div class="fusion-title title fusion-title-54 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4 id="choose-flowise-when-you" data-source-line="27">Choose Flowise When You:</h4></h1></div><div class="fusion-text fusion-text-93"><p><strong>Want rapid AI prototyping</strong> &#8211; Get chatbots running in minutes<br />
<strong>Focus on conversational AI</strong> &#8211; Built-in conversation management<br />
<strong>Need AI-specific tools</strong> &#8211; Native LangChain/LlamaIndex integration<br />
<strong>Prefer simplicity</strong> &#8211; Lower learning curve for LLM projects</p>
</div><div class="fusion-title title fusion-title-55 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Choose n8n When You:</h4></h1></div><div class="fusion-text fusion-text-94"><p><strong>Need enterprise integrations</strong> &#8211; Connect AI with existing business systems<br />
<strong>Want code flexibility</strong> &#8211; Custom JavaScript/Python execution<br />
<strong>Require complex workflows</strong> &#8211; Advanced scraping, data processing<br />
<strong>Build beyond AI</strong> &#8211; General automation across multiple systems</p>
</div><div class="fusion-title title fusion-title-56 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>For the Jetson:</h4></h1></div><div class="fusion-text fusion-text-95"><p>For <strong>edge AI applications</strong>, Flowise&#8217;s specialization often wins because:</p>
<ul>
<li><strong>Local LLM integration</strong> is seamless (Ollama support)</li>
<li><strong>Rapid iteration</strong> matters more than complex integrations</li>
<li><strong>AI-first design</strong> reduces development complexity</li>
<li><strong>Template marketplace</strong> accelerates deployment</li>
</ul>
<p>If your Jetson project is primarily about AI/LLM applications, Flowise gets you there faster. If you need AI as part of broader system automation, n8n provides more flexibility.</p>
</div><div class="fusion-title title fusion-title-57 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4 id="installation-methods" data-source-line="70">Installation Methods</h4></h1></div><div class="fusion-text fusion-text-96"><p>You have a few different ways to get Flowise running on your Jetson. Docker installation is probably your best bet since it&#8217;s the most straightforward and handles all the dependencies for you. If you prefer more control, you can do a local Node.js installation. There&#8217;s also the option of cloud deployment if you need external access, though that defeats some of the edge computing benefits.</p>
</div><div class="fusion-title title fusion-title-58 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Docker Installation</h4></h4></div><div class="fusion-text fusion-text-97"><p><em>Note: Ensure your Jetson has Docker installed. For more information, visit <strong>https://docs.docker.com/engine/install/</strong>.</em></p>
</div><div class="fusion-text fusion-text-98"><p><strong>Web UI</strong>: http://localhost:3000</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-46 > .CodeMirror, .fusion-syntax-highlighter-46 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-46 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_46" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_46" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_46" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Pull and run Flowise
docker run -it --rm \
  --name=flowise \
  --network=host \
  -e PORT=3000 \
  -e FLOWISE_USERNAME=jetson \
  -e FLOWISE_PASSWORD=jetson \
  -v /mnt/nvme/cache/flowise:/root/.flowise \
  -e DATABASE_PATH=/root/.flowise \
  -e APIKEY_PATH=/root/.flowise \
  -e SECRETKEY_PATH=/root/.flowise \
  -e LOG_PATH=/root/.flowise/logs \
  -e BLOB_STORAGE_PATH=/root/.flowise/storage \
  --runtime nvidia \
  flowiseai/flowise:latest</textarea></div><div class="fusion-title title fusion-title-59 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Local Installation</h4></h4></div><div class="fusion-text fusion-text-99"><p><em>Note: Ensure Node.js is installed on your Jetson (Node v18.15.0 or v20 is supported). </em></p>
<p><strong>1. Install Node.js on Jetson:</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-47 > .CodeMirror, .fusion-syntax-highlighter-47 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-47 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_47" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_47" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_47" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Install Node.js 20 LTS (recommended)
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs

# Verify installation (should be v18.15.0+ or v20+)
node --version
npm --version</textarea></div><div class="fusion-title title fusion-title-60 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Quick Start</h4></h1></div><div class="fusion-text fusion-text-100"><p><strong>2. Install Flowise:</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-48 > .CodeMirror, .fusion-syntax-highlighter-48 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-48 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_48" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_48" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_48" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Install Flowise globally
npm install -g flowise

# Or install a specific version
npm install -g flowise@x.x.x</textarea></div><div class="fusion-text fusion-text-101" style="--awb-margin-top:20px;"><p><strong>3. Start Flowise:<br />
</strong></p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-49 > .CodeMirror, .fusion-syntax-highlighter-49 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-49 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_49" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_49" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_49" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">npx flowise start</textarea></div><div class="fusion-text fusion-text-102" style="--awb-margin-top:20px;"><p>Open: <strong>http://localhost:3000</strong></p>
</div><div class="fusion-title title fusion-title-61 fusion-sep-none fusion-title-text fusion-title-size-four" style="--awb-margin-top:-20px;--awb-margin-bottom:-20px;"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Getting Started</h4></h4></div><div class="fusion-title title fusion-title-62 fusion-sep-none fusion-title-text fusion-title-size-four" style="--awb-margin-bottom:-20px;"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>First Setup</h4></h4></div><div class="fusion-text fusion-text-103"><ol>
<li><strong>Access Flowise</strong>:
<ul>
<li>Open browser to <code>http://your-jetson-ip:3000</code></li>
<li>Login with configured credentials</li>
</ul>
</li>
<li><strong>Configure API Keys</strong>:
<ul>
<li>Go to &#8220;Credentials&#8221; section</li>
<li>Add your LLM provider API keys (OpenAI, Anthropic, etc.)</li>
</ul>
</li>
</ol>
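<p>Before opening the browser, you can run a quick sanity check from another machine on your network (a sketch, assuming the default port 3000; <code>your-jetson-ip</code> is a placeholder for your device&#8217;s address):</p>

```shell
# Reachability check for the Flowise server
# "your-jetson-ip" is a placeholder -- use localhost if checking from the Jetson itself
curl -s -o /dev/null -w "%{http_code}\n" http://your-jetson-ip:3000
# Any HTTP status (e.g. 200, or a redirect to the login page) means Flowise is listening
```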
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-18 hover-type-none"><img decoding="async" width="1" height="1" title="credentials" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/credentials.png" alt class="img-responsive wp-image-1195"/></span></div><div class="fusion-title title fusion-title-63 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Flowise Management Features</h4></h4></div><div class="fusion-text fusion-text-104"><p>Once you&#8217;re inside Flowise, you&#8217;ll find all the management tools neatly organized in the sidebar. The interface is pretty intuitive &#8211; you&#8217;ve got your Chatflows for basic conversational AI, Agentflows for more complex multi-agent setups, and a Marketplace full of templates from the community.</p>
<p>There are separate sections for Tools and utility functions, Assistants for managing your deployed AI instances, and Executions where you can see how all your workflows are performing. The Document Stores section is where you upload files for your knowledge base, while API Keys and Credentials handle all your authentication securely.</p>
<p>What&#8217;s really nice is how everything ties together. You can track execution performance across all your workflows, manage variables globally so you don&#8217;t have to configure the same things over and over, and organize your documents with proper version control. Plus, all your API keys and secrets are encrypted and stored securely.</p>
</div><div class="fusion-title title fusion-title-64 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Building Advanced Agent Workflows</h4></h4></div><div class="fusion-text fusion-text-105"><p>For more complex applications, Flowise also supports advanced agent workflows that can handle multi-step reasoning and autonomous decision-making:</p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-19 hover-type-none"><img decoding="async" width="1024" height="494" title="cred" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/cred-1024x494.webp" alt class="img-responsive wp-image-1240" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/08/cred-200x96.webp 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/cred-300x145.webp 300w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/cred-400x193.webp 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/cred-600x289.webp 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/cred-768x370.webp 768w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/cred-800x386.webp 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/cred-1024x494.webp 1024w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/cred.webp 1049w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-text fusion-text-106" style="--awb-margin-top:20px;"><p><em>Main Flowise dashboard showing sidebar navigation, management options, and credentials setup.</em></p>
</div><div class="fusion-title title fusion-title-65 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Local LLM Setup (Recommended for Jetson)</h4></h4></div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-50 > .CodeMirror, .fusion-syntax-highlighter-50 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-50 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_50" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_50" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_50" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Create Ollama data directory
sudo mkdir -p /mnt/nvme/cache/ollama
sudo chown -R $USER:$USER /mnt/nvme/cache/ollama

# Run Ollama container (pass the NVIDIA runtime so models run on the Jetson GPU)
docker run -d --name=ollama \
  --runtime nvidia \
  -p 11434:11434 \
  -v /mnt/nvme/cache/ollama:/root/.ollama \
  ollama/ollama</textarea></div><div class="fusion-text fusion-text-107" style="--awb-margin-top:20px;"><p>Pull AI models into your separate Ollama container:</p>
<blockquote>
<p>Note: Choose models based on your Jetson&#8217;s RAM &#8211; use 3B models for 4-8GB RAM, 7B-8B models for 16GB+ RAM.</p>
</blockquote>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-51 > .CodeMirror, .fusion-syntax-highlighter-51 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-51 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_51" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_51" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_51" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh"># Pull popular models
docker exec -it ollama ollama pull llama3.2:3b    # Smaller model (3B parameters)
docker exec -it ollama ollama pull llama3.1:8b    # Larger model (8B parameters) 
docker exec -it ollama ollama pull mistral:7b     # Alternative model

# Verify models are downloaded
docker exec -it ollama ollama list

# Test Ollama is responding
curl http://localhost:11434/api/tags</textarea></div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-20 hover-type-none"><img decoding="async" width="806" height="901" title="add ollama" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/add-ollama.png" alt class="img-responsive wp-image-1201" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/08/add-ollama-200x224.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/add-ollama-400x447.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/add-ollama-600x671.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/add-ollama-800x894.png 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/add-ollama.png 806w" sizes="(max-width: 640px) 100vw, 806px" /></span></div><div class="fusion-title title fusion-title-66 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Template Marketplace</h4></h4></div><div class="fusion-title title fusion-title-67 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Pre-Built Templates for Quick Start</h4></h4></div><div class="fusion-image-element " 
style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-21 hover-type-none"><img decoding="async" width="1" height="1" title="marketplace" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/marketplace.png" alt class="img-responsive wp-image-1204"/></span></div><div class="fusion-text fusion-text-108" style="--awb-margin-top:20px;"><p><strong>Access Marketplace:</strong> Click &#8220;Marketplaces&#8221; in the sidebar and use filters to find relevant templates. Click on the template you want to use and select &#8220;Use Template&#8221;. Modify nodes and connections for your specific needs.</p>
</div><div class="fusion-title title fusion-title-68 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><h3>Building Your First Chatbot</h3></h3></div><div class="fusion-text fusion-text-109"><p id="you-can-start-from-template-recommended"><strong>You can start from Template (Recommended)</strong></p>
<ol>
<li><strong>Navigate to Marketplace</strong>: Go to &#8220;Marketplaces&#8221; → &#8220;Community Templates&#8221;</li>
<li><strong>Select Template</strong>: Choose &#8220;Basic&#8221; or &#8220;Customer Support&#8221; template</li>
<li><strong>Import and Customize</strong>: Click template → modify for your needs</li>
<li><strong>Configure Ollama</strong>: Update Chat Model nodes to use <code>http://localhost:11434</code></li>
</ol>
<p id="or-you-can-build-from-scratch"><strong>Or you can build from scratch</strong></p>
<h4 id="step-1-create-new-chatflow">Step 1: Create New Chatflow</h4>
<ol>
<li>Go to &#8220;Chatflows&#8221; tab</li>
<li>Click &#8220;Add New&#8221;</li>
<li>Name your chatflow</li>
</ol>
<p><strong>And you can add nodes from Categories</strong></p>
<p>When you&#8217;re building workflows, you&#8217;ll see different node categories organized logically. LangChain nodes give you the core components and chains, while LlamaIndex nodes are specifically for RAG applications. There are Utility nodes for helper functions, Agent nodes for autonomous components, and Chat Model nodes that connect to different LLMs like Ollama or OpenAI. You&#8217;ll also find Document Loaders for processing files and Embedding nodes for vector operations.</p>
</div><div class="fusion-image-element " style="--awb-aspect-ratio:4 / 3;--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-22 hover-type-none has-aspect-ratio"><img decoding="async" width="755" height="1024" title="addnodesinchatflow" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/addnodesinchatflow-755x1024.png" class="img-responsive wp-image-1202 img-with-aspect-ratio" data-parent-fit="cover" data-parent-container=".fusion-image-element" alt srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/08/addnodesinchatflow-200x271.png 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/addnodesinchatflow-400x543.png 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/addnodesinchatflow-600x814.png 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/08/addnodesinchatflow.png 762w" sizes="(max-width: 640px) 100vw, 755px" /></span></div><div class="fusion-title title fusion-title-69 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Configure Components</h4></h4></div><div class="fusion-text fusion-text-110"><p>When configuring your Chat Ollama node, set the model name to something like llama3.2:3b, keep the temperature around 0.7 for good creativity, and point the Base URL to <strong><em>http://localhost:11434</em></strong> where your Ollama container is running.</p>
<p>For the Conversation Chain, you&#8217;ll connect your Chat Model input and add a system prompt to give your AI some context about what it should do. Here&#8217;s a good starting prompt:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-52 > .CodeMirror, .fusion-syntax-highlighter-52 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-52 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_52" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_52" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_52" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next">You are a helpful AI assistant running on an NVIDIA Jetson device. 
Help with AI, robotics, and edge computing questions.
Keep responses concise and practical.</textarea></div><div class="fusion-title title fusion-title-70 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Test and Deploy</h4></h4></div><div class="fusion-text fusion-text-111"><p>Once you&#8217;ve got everything configured, hit the save button and use the chat panel to test your bot. Make sure it&#8217;s responding the way you want, then you can use Flowise&#8217;s deployment options to get it ready for production use.</p>
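<p>You can also exercise a saved chatflow from the command line through Flowise&#8217;s Prediction API (a sketch, assuming the default port 3000; the chatflow ID below is a placeholder you copy from your own flow):</p>

```shell
# Query a chatflow over HTTP instead of the web chat panel
# <chatflow-id> is a placeholder -- copy the real ID from your chatflow's URL or API dialog
curl -X POST http://localhost:3000/api/v1/prediction/<chatflow-id> \
  -H "Content-Type: application/json" \
  -d '{"question": "What is edge computing?"}'
```

<p>The response comes back as JSON containing the model&#8217;s answer, which makes this handy for wiring the bot into scripts or other services.</p>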
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-23 hover-type-none"><img decoding="async" title="olllama_wokrflow" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/olllama_wokrflow.avif" alt class="img-responsive wp-image-1192"/></span></div><div class="fusion-title title fusion-title-71 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Deployment</h4></h4></div><div class="fusion-text fusion-text-112"><p>There are 5 options for deployment (<strong>&lt;/&gt; button</strong>). Flowise makes it really easy to embed your chatbots into websites. You can choose from a popup widget that floats on your page, a fullpage dedicated chat interface, or if you&#8217;re using React, there are specific React components for both popup and fullpage implementations.</p>
<p>The embedding code is pretty straightforward &#8211; just import the Flowise embed script and initialize it with your chatflow ID:</p>
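<p>The snippet the <strong>&lt;/&gt;</strong> dialog generates looks roughly like this (the chatflow ID and host below are placeholders); here it is written into a minimal test page so you can try it locally:</p>

```shell
# Write a minimal HTML page that embeds the Flowise popup chatbot
# "your-chatflow-id" and "your-jetson-ip" are placeholders -- substitute your own values
cat > chatbot-test.html <<'EOF'
<script type="module">
  import Chatbot from "https://cdn.jsdelivr.net/npm/flowise-embed/dist/web.js"
  Chatbot.init({
    chatflowid: "your-chatflow-id",
    apiHost: "http://your-jetson-ip:3000",
  })
</script>
EOF
```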
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-24 hover-type-none"><img decoding="async" width="1" height="1" title="embedchatfloworagentflow" src="https://blog-en.openzeka.com/wp-content/uploads/2025/08/embedchatfloworagentflow.png" alt class="img-responsive wp-image-1221"/></span></div><div class="fusion-text fusion-text-113" style="--awb-margin-top:20px;"><p>You also get some nice advanced features like direct public links for sharing, custom authentication if you need access controls, theme customization to match your brand, and custom JavaScript event handling for more complex integrations.</p>
</div><div class="fusion-title title fusion-title-72 fusion-sep-none fusion-title-text fusion-title-size-three"><h3 class="fusion-title-heading title-heading-left" style="margin:0;"><h3>Troubleshooting</h3></h3></div><div class="fusion-text fusion-text-114"><h3 id="common-issues-you-might-run-into">Common Issues You Might Run Into</h3>
<p><strong>Node.js Version Problems:</strong> If you&#8217;re getting errors about Node.js being too old, you&#8217;ll need to update it. Run these commands to get a newer version:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-53 > .CodeMirror, .fusion-syntax-highlighter-53 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-53 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_53" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_53" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_53" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs</textarea></div><div class="fusion-text fusion-text-115" style="--awb-margin-top:10px;"><p><strong>Memory Issues:</strong> Running out of memory with &#8220;JavaScript heap out of memory&#8221; errors? Increase the memory limit before starting Flowise:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-54 > .CodeMirror, .fusion-syntax-highlighter-54 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-54 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_54" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_54" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_54" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">export NODE_OPTIONS="--max-old-space-size=4096"
flowise start</textarea></div><div class="fusion-text fusion-text-116" style="--awb-margin-top:10px;"><p><strong>GPU Not Working:</strong> If your GPU isn&#8217;t being detected, check GPU activity with <code>tegrastats</code>, verify that the NVIDIA runtime is configured in <code>/etc/docker/daemon.json</code>, and restart the Docker service with <code>sudo systemctl restart docker</code>.</p>
<p><strong>Port Conflicts:</strong> If port 3000 is already in use, find what&#8217;s using it with <code>sudo lsof -i :3000</code>, kill that process with <code>sudo kill -9 &lt;PID&gt;</code>, or just start Flowise on a different port: <code>npx flowise start --PORT=3001</code></p>
<p><strong>Permission Problems:</strong> File permission issues? Fix them with:</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-55 > .CodeMirror, .fusion-syntax-highlighter-55 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-55 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_55" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_55" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_55" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">sudo chown -R $USER:$USER ~/.flowise
chmod -R 755 ~/.flowise</textarea></div><div class="fusion-title title fusion-title-73 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Docker Issues</h4></h4></div><div class="fusion-text fusion-text-117"><p>If you&#8217;re running Flowise in Docker and having problems, here are some quick fixes:</p>
<p>Check what&#8217;s happening with <code>docker logs flowise</code>, restart the container with <code>docker restart flowise</code>, or see all container statuses with <code>docker ps -a</code>. If things are really broken, you can remove the container completely with <code>docker rm -f flowise</code> and recreate it with your original docker run command.</p>
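<p>Put together, a minimal recovery sequence might look like this (the image name and port here are assumptions based on the standard <code>flowiseai/flowise</code> image; substitute the flags from your original <code>docker run</code> command):</p>

```shell
# Inspect, remove, and recreate the Flowise container
docker logs flowise                  # check recent output for errors
docker rm -f flowise                 # force-remove the broken container
docker run -d --name flowise \
  -p 3000:3000 flowiseai/flowise     # recreate with your original options
```
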
</div><div class="fusion-title title fusion-title-74 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4 id="performance-problems">Performance Problems</h4></h4></div><div class="fusion-text fusion-text-118"><p><strong>Slow responses?</strong> Try using local models instead of API calls, reduce the context window size, or optimize your RAG chunk sizes if you&#8217;re doing document processing.</p>
<p><strong>Memory usage too high?</strong> Reduce the buffer window memory size, switch to smaller models, or clear your browser cache, which sometimes helps.</p>
</div><div class="fusion-title title fusion-title-75 fusion-sep-none fusion-title-text fusion-title-size-four" style="--awb-margin-top:-10px;"><h4 class="fusion-title-heading title-heading-left" style="margin:0;"><h4>Getting Help</h4></h4></div><div class="fusion-text fusion-text-119"><p>If you&#8217;re stuck, the Flowise community is pretty active. Check out the <a style="color: #38c92e;" href="https://github.com/FlowiseAI/Flowise/issues">GitHub Issues</a> for bug reports and feature requests. The <a style="color: #38c92e;" href="https://docs.flowiseai.com/">official documentation</a> is also quite comprehensive. For Jetson-specific issues, the <a style="color: #38c92e;" href="https://forums.developer.nvidia.com/c/agx-autonomous-machines/jetson-embedded-systems/70">NVIDIA Developer Forums</a> are your best bet.</p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-flowise/">Jetson Generative AI &#8211; Flowise</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
