<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Run LLaMA-Factory on edge devices Archives - OpenZeka EN Blog</title>
	<atom:link href="https://blog-en.openzeka.com/tag/run-llama-factory-on-edge-devices/feed/" rel="self" type="application/rss+xml" />
	<link>https://blog-en.openzeka.com/tag/run-llama-factory-on-edge-devices/</link>
	<description>NVIDIA Jetson Developer Kits &#38; Edge Devices</description>
	<lastBuildDate>Fri, 27 Mar 2026 13:27:00 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>Jetson Generative AI – LLaMA Factory</title>
		<link>https://blog-en.openzeka.com/jetson-generative-ai-llama-factory/</link>
		
		<dc:creator><![CDATA[Enhar]]></dc:creator>
		<pubDate>Thu, 31 Jul 2025 05:09:53 +0000</pubDate>
				<category><![CDATA[Generative AI]]></category>
		<category><![CDATA[Fine-tune LLaMA models on Jetson]]></category>
		<category><![CDATA[Jetson LLaMA-Factory tutorial]]></category>
		<category><![CDATA[LLaMA-Factory Jetson setup]]></category>
		<category><![CDATA[Local LLM fine-tuning Jetson]]></category>
		<category><![CDATA[Run LLaMA-Factory on edge devices]]></category>
		<guid isPermaLink="false">https://blog.aetherix.com/?p=1055</guid>

					<description><![CDATA[<p>LLaMA Factory provides a unified framework for fine-tu ... Continue Reading→</p>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-llama-factory/">Jetson Generative AI – LLaMA Factory</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div class="fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container has-pattern-background has-mask-background nonhundred-percent-fullwidth non-hundred-percent-height-scrolling" style="--awb-border-radius-top-left:0px;--awb-border-radius-top-right:0px;--awb-border-radius-bottom-right:0px;--awb-border-radius-bottom-left:0px;--awb-flex-wrap:wrap;" ><div class="fusion-builder-row fusion-row fusion-flex-align-items-flex-start fusion-flex-content-wrap" style="max-width:1331.2px;margin-left: calc(-4% / 2 );margin-right: calc(-4% / 2 );"><div class="fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column" style="--awb-bg-size:cover;--awb-width-large:100%;--awb-margin-top-large:0px;--awb-spacing-right-large:1.92%;--awb-margin-bottom-large:20px;--awb-spacing-left-large:1.92%;--awb-width-medium:100%;--awb-order-medium:0;--awb-spacing-right-medium:1.92%;--awb-spacing-left-medium:1.92%;--awb-width-small:100%;--awb-order-small:0;--awb-spacing-right-small:1.92%;--awb-spacing-left-small:1.92%;"><div class="fusion-column-wrapper fusion-column-has-shadow fusion-flex-justify-content-flex-start fusion-content-layout-column"><div class="fusion-text fusion-text-1"><p>LLaMA Factory provides a unified framework for fine-tuning large language models with an intuitive web interface. This powerful tool brings professional model training capabilities to Jetson devices, enabling you to customize LLMs for your specific use cases with optimized performance for edge deployment.</p>
<p>In this article, you&#8217;ll learn how to run LLaMA Factory on Jetson Orin for <strong>efficient LLM fine-tuning and deployment.</strong></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-1 hover-type-none"><img fetchpriority="high" decoding="async" width="1024" height="675" title="llama_factory_interface" src="https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_interface-1024x675.webp" alt class="img-responsive wp-image-1059" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_interface-200x132.webp 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_interface-300x198.webp 300w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_interface-400x264.webp 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_interface-600x395.webp 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_interface-768x506.webp 768w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_interface-800x527.webp 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_interface-1024x675.webp 1024w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_interface-1200x791.webp 1200w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_interface-1536x1012.webp 1536w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_interface.webp 1554w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-title title fusion-title-1 fusion-sep-none fusion-title-text fusion-title-size-four" 
style="--awb-margin-bottom:-20px;"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Features</h4></div><div class="fusion-text fusion-text-2"><ul>
<li>Support for multiple LLM architectures including LLaMA, Qwen, ChatGLM, and more</li>
<li>Multiple training stages: Supervised Fine-Tuning, Reward Modeling, PPO, DPO, KTO, Pre-Training</li>
<li>Three fine-tuning methods: full, freeze, and lora</li>
<li>Gradio-based web UI with Train, Evaluate &amp; Predict, Chat, and Export tabs</li>
<li>Built-in dataset support with preview functionality</li>
<li>Integrated chat interface for testing models</li>
<li>Real-time training loss visualization</li>
<li>Advanced configurations for quantization, LoRA, RLHF, and more</li>
<li>Model evaluation with customizable generation parameters</li>
</ul>
</div><div class="fusion-title title fusion-title-2 fusion-sep-none fusion-title-text fusion-title-size-four" style="--awb-margin-bottom:-20px;"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Requirements</h4></div>
<div class="table-1">
<p>&nbsp;</p>
<table width="100%">
<thead>
<tr>
<th align="left">
<div>
<div>Hardware / Software</div>
</div>
</th>
<th align="left">
<div>
<div>Notes</div>
</div>
</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">
<div>
<div><strong>Jetson AGX Orin with ≥ 32 GB RAM</strong></div>
</div>
</td>
<td align="left">64GB recommended for larger models</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>JetPack 6.0+</strong></div>
</div>
</td>
<td align="left">For CUDA 12.x support</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>NVMe SSD</strong></div>
</div>
</td>
<td align="left">Essential for model storage and caching</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>Hugging Face token</strong></div>
</div>
</td>
<td align="left">Required for accessing gated models</td>
</tr>
<tr>
<td align="left">
<div>
<div><strong>~50 GB free storage</strong></div>
</div>
</td>
<td align="left">For models and training checkpoints</td>
</tr>
</tbody>
</table>
</div>
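The ~50 GB figure in the table can be verified before you begin. A minimal pre-flight sketch, assuming the NVMe drive is mounted at /mnt/nvme as in the steps below:

```shell
# Pre-flight storage check (a sketch; assumes the NVMe is mounted at /mnt/nvme).
MOUNT_POINT=/mnt/nvme
# GNU df: print only the available column in gigabytes, strip the unit suffix.
avail_gb=$(df -BG --output=avail "$MOUNT_POINT" 2>/dev/null | tail -n 1 | tr -dc '0-9')
if [ "${avail_gb:-0}" -ge 50 ]; then
  echo "OK: ${avail_gb} GB free on $MOUNT_POINT"
else
  echo "Warning: less than 50 GB free on $MOUNT_POINT (or not mounted)"
fi
```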
<div class="fusion-title title fusion-title-3 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Step-by-Step Setup</h4></div><div class="fusion-title title fusion-title-4 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">1. Create necessary directories</h4></div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-1 > .CodeMirror, .fusion-syntax-highlighter-1 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-1 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_1" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_1" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_1" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">sudo mkdir -p /mnt/nvme/cache/llama-factory/{cache,config,data,saves}
sudo chown -R $USER:$USER /mnt/nvme/cache/llama-factory</textarea></div><div class="fusion-title title fusion-title-5 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">2. Set your Hugging Face token</h4></div><div class="fusion-text fusion-text-3"><p>Replace <code>your_hf_token_here</code> with your actual token from <strong style="color: #0fbc00;">https://huggingface.co/settings/tokens</strong>.</p>
</div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-2 > .CodeMirror, .fusion-syntax-highlighter-2 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-2 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_2" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_2" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_2" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">export HF_TOKEN=your_hf_token_here</textarea></div><div class="fusion-title title fusion-title-6 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">3. Launch LLaMA Factory</h4></div><style type="text/css" scopped="scopped">.fusion-syntax-highlighter-3 > .CodeMirror, .fusion-syntax-highlighter-3 > .CodeMirror .CodeMirror-gutters {background-color:#2d3748;}</style><div class="fusion-syntax-highlighter-container fusion-syntax-highlighter-3 fusion-syntax-highlighter-theme-dark" style="opacity:0;margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0px;font-size:14px;border-width:1px;border-style:solid;border-color:rgba(242,243,245,0);"><div class="syntax-highlighter-copy-code"><span class="syntax-highlighter-copy-code-title" data-id="fusion_syntax_highlighter_3" style="font-size:14px;">Copy to Clipboard</span></div><label for="fusion_syntax_highlighter_3" class="screen-reader-text">Syntax Highlighter</label><textarea class="fusion-syntax-highlighter-textarea" id="fusion_syntax_highlighter_3" data-readOnly="nocursor" data-lineNumbers="" data-lineWrapping="" data-theme="oceanic-next" data-mode="text/x-sh">docker run -it --rm \
  --name=llama-factory \
  -v /mnt/nvme/cache/llama-factory/cache:/data/llama-factory/cache \
  -v /mnt/nvme/cache/llama-factory/config:/data/llama-factory/config \
  -v /mnt/nvme/cache/llama-factory/data:/data/llama-factory/data \
  -v /mnt/nvme/cache/llama-factory/saves:/data/llama-factory/saves \
  -e GRADIO_SERVER_PORT=7860 \
  -e API_PORT=9000 \
  -p 7860:7860 \
  -p 9000:9000 \
  --runtime=nvidia \
  -e DOCKER_PULL=always --pull always \
  -e HF_TOKEN=$HF_TOKEN \
  -e HF_HUB_CACHE=/root/.cache/huggingface \
  -v /mnt/nvme/cache:/root/.cache \
dustynv/llama-factory:r36.3.0</textarea></div><div class="fusion-title title fusion-title-7 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;">4. Access the Web UI</h1></div><div class="fusion-text fusion-text-4"><p>Once the container starts, you&#8217;ll see:</p>
<blockquote>
<p>Running on local URL: http://0.0.0.0:7860</p>
</blockquote>
<p><strong>Local access:</strong> Open <strong>http://localhost:7860</strong> in your browser<br />
<strong>Remote access:</strong> Use<strong> http://&lt;jetson-ip&gt;:7860</strong></p>
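Both mapped ports can be checked from the command line. A sketch: the 7860 check applies as-is, while the <code>API_PORT=9000</code> mapping suggests an OpenAI-style endpoint, so the request body and model name below are placeholder assumptions that depend on what you load in the UI.

```shell
# One-shot reachability check for the web UI (swap localhost for your
# Jetson's IP when checking remotely).
if curl -sf -o /dev/null "http://localhost:7860"; then
  echo "Web UI reachable on port 7860"
else
  echo "Web UI not reachable yet - check 'docker ps' and the container logs"
fi

# Hypothetical API probe: the API_PORT=9000 mapping suggests an
# OpenAI-compatible endpoint; "default" is a placeholder model name.
resp=$(curl -s "http://localhost:9000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "default", "messages": [{"role": "user", "content": "Hello"}]}' \
  || true)
echo "${resp:-API not reachable yet}"
```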
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-2 hover-type-none"><img decoding="async" width="1024" height="769" title="llama_factory_training" src="https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_training-1024x769.webp" alt class="img-responsive wp-image-1060" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_training-200x150.webp 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_training-300x225.webp 300w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_training-400x300.webp 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_training-600x451.webp 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_training-768x577.webp 768w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_training-800x601.webp 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_training-1024x769.webp 1024w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_training-1200x901.webp 1200w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_training-1536x1153.webp 1536w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_training.webp 1550w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-title title fusion-title-8 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" 
style="margin:0;">5. Configure your fine-tuning</h4></div><div class="fusion-text fusion-text-5"><p>In the <strong>Train</strong> tab:</p>
<p><strong>1. Select Training Stage:</strong></p>
<ul>
<li>Supervised Fine-Tuning (most common)</li>
<li>Reward Modeling</li>
<li>PPO (Proximal Policy Optimization)</li>
<li>DPO (Direct Preference Optimization)</li>
<li>KTO</li>
<li>Pre-Training</li>
</ul>
<p><strong>2. Choose Finetuning Method:</strong></p>
<ul>
<li>lora &#8211; Low-rank adaptation, best for memory efficiency</li>
<li>freeze &#8211; Freezes base model layers</li>
<li>full &#8211; Full parameter fine-tuning</li>
</ul>
<p><strong>3. Configure Data:</strong></p>
<ul>
<li>Data directory: /opt/LLaMA-Factory/data</li>
<li>Select dataset from dropdown</li>
<li>Use &#8220;Preview dataset&#8221; to verify data format</li>
</ul>
<p><strong>4. Set Training Parameters:</strong></p>
<ul>
<li>Cutoff length: 1024 (max tokens in input sequence)</li>
<li>Max samples: 100000</li>
<li>Batch size: 2</li>
<li>Learning rate: 5e-5 (in Advanced configurations)</li>
<li>Epochs: 3.0 (in Advanced configurations)</li>
</ul>
<p><strong>5. Advanced Configurations (expandable sections):</strong></p>
<ul>
<li>Quantization bit (none/bitsandbytes)</li>
<li>Extra configurations</li>
<li>Freeze tuning configurations</li>
<li>LoRA configurations</li>
<li>RLHF configurations</li>
<li>GaLore configurations</li>
<li>BAdam configurations</li>
</ul>
<p><strong>6. Start Training:</strong></p>
<ul>
<li>Click &#8220;Preview command&#8221; to verify settings</li>
<li>Click &#8220;Start&#8221; to begin training</li>
<li>Monitor real-time loss graph</li>
</ul>
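The &#8220;Preview command&#8221; button shows the CLI equivalent of these settings; they can also be written as a config file for <code>llamafactory-cli train</code>. A sketch using key names from the LLaMA-Factory example configs — note the real file also needs model, dataset, template, and output fields, which are your own choices:

```shell
# Sketch: the Train-tab settings above as a LLaMA-Factory CLI config.
# Key names follow the project's example YAML files; model, dataset, and
# output settings are omitted here and must be added for a real run.
cat > sft_lora.yaml <<'EOF'
stage: sft
finetuning_type: lora
cutoff_len: 1024
max_samples: 100000
per_device_train_batch_size: 2
learning_rate: 5.0e-5
num_train_epochs: 3.0
EOF
echo "Would run: llamafactory-cli train sft_lora.yaml"
```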
</div><div class="fusion-title title fusion-title-9 fusion-sep-none fusion-title-text fusion-title-size-one"><h1 class="fusion-title-heading title-heading-left" style="margin:0;">6. Evaluate your model</h1></div><div class="fusion-text fusion-text-6"><p>Switch to the <strong>Evaluate &amp; Predict</strong> tab to assess model performance:</p>
<p><strong>1. Configure evaluation settings:</strong></p>
<ul>
<li>Data directory and dataset (same as training)</li>
<li>Cutoff length: 1024</li>
<li>Max samples: 100000</li>
<li>Batch size: 2</li>
</ul>
<p><strong>2. Set generation parameters:</strong></p>
<ul>
<li>Maximum new tokens: 512</li>
<li>Top-p: 0.7</li>
<li>Temperature: 0.95</li>
</ul>
<p><strong>3. Run evaluation:</strong></p>
<ul>
<li>Enable &#8220;Save predictions&#8221; to store results</li>
<li>Click &#8220;Start&#8221; to begin evaluation</li>
<li>Results saved to timestamped output directory</li>
</ul>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-3 hover-type-none"><img decoding="async" width="1024" height="358" title="llama_factory_evaluate" src="https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_evaluate-1024x358.webp" alt class="img-responsive wp-image-1058" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_evaluate-200x70.webp 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_evaluate-300x105.webp 300w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_evaluate-400x140.webp 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_evaluate-600x210.webp 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_evaluate-768x269.webp 768w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_evaluate-800x280.webp 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_evaluate-1024x358.webp 1024w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_evaluate-1200x420.webp 1200w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/llama_factory_evaluate.webp 1532w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-title title fusion-title-10 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">7. Test your model interactively</h4></div><div class="fusion-text fusion-text-7"><p>Navigate to the <strong>Chat</strong> tab:</p>
<p><strong>1. Load your model:</strong></p>
<ul>
<li>Click &#8220;Load model&#8221; button</li>
<li>Select inference engine: huggingface</li>
<li>Choose inference data type: auto</li>
</ul>
<p><strong>2. Chat with your model:</strong></p>
<ul>
<li>Type messages in the chat interface</li>
<li>Model responds in real-time</li>
<li>Test both base and fine-tuned versions</li>
</ul>
<p><strong>3. Unload the model when switching between models</strong></p>
</div><div class="fusion-image-element " style="--awb-caption-title-font-family:var(--h2_typography-font-family);--awb-caption-title-font-weight:var(--h2_typography-font-weight);--awb-caption-title-font-style:var(--h2_typography-font-style);--awb-caption-title-size:var(--h2_typography-font-size);--awb-caption-title-transform:var(--h2_typography-text-transform);--awb-caption-title-line-height:var(--h2_typography-line-height);--awb-caption-title-letter-spacing:var(--h2_typography-letter-spacing);"><span class=" fusion-imageframe imageframe-none imageframe-4 hover-type-none"><img decoding="async" width="1024" height="539" title="chatinterface" src="https://blog-en.openzeka.com/wp-content/uploads/2025/07/chatinterface-1024x539.webp" alt class="img-responsive wp-image-1057" srcset="https://blog-en.openzeka.com/wp-content/uploads/2025/07/chatinterface-200x105.webp 200w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/chatinterface-300x158.webp 300w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/chatinterface-400x210.webp 400w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/chatinterface-600x316.webp 600w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/chatinterface-768x404.webp 768w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/chatinterface-800x421.webp 800w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/chatinterface-1024x539.webp 1024w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/chatinterface-1200x631.webp 1200w, https://blog-en.openzeka.com/wp-content/uploads/2025/07/chatinterface-1536x808.webp 1536w" sizes="(max-width: 640px) 100vw, 1024px" /></span></div><div class="fusion-title title fusion-title-11 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">8. Export your model</h4></div><div class="fusion-text fusion-text-8"><p>Use the <strong>Export</strong> tab to save your fine-tuned model in various formats for deployment. This allows you to use your model outside of <strong>LLaMA Factory</strong> in production environments.</p>
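The export can also be scripted with <code>llamafactory-cli export</code> and a merge config. A sketch modeled on the project's example merge configs — every path, the base model name, and the template value here are placeholders for your own run:

```shell
# Sketch: a LoRA merge/export config for llamafactory-cli export.
# Key names follow the LLaMA-Factory example merge configs; all values
# below are placeholders for your own base model and checkpoint paths.
cat > merge_lora.yaml <<'EOF'
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
adapter_name_or_path: /data/llama-factory/saves/my-run
template: llama3
finetuning_type: lora
export_dir: /data/llama-factory/saves/merged
EOF
echo "Would run: llamafactory-cli export merge_lora.yaml"
```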
</div><div class="fusion-title title fusion-title-12 fusion-sep-none fusion-title-text fusion-title-size-four"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Training Method Guide</h4></div>
<div class="table-1">
<table width="100%">
<thead>
<tr>
<th align="left">Method</th>
<th align="left">Memory Usage</th>
<th align="left">Training Speed</th>
<th align="left">Use Case</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>LoRA</strong></td>
<td align="left">Low</td>
<td align="left">Fast</td>
<td align="left">Recommended for most Jetson deployments</td>
</tr>
<tr>
<td align="left"><strong>Freeze</strong></td>
<td align="left">Medium</td>
<td align="left">Medium</td>
<td align="left">When you need to preserve base model behavior</td>
</tr>
<tr>
<td align="left"><strong>Full</strong></td>
<td align="left">High</td>
<td align="left">Slow</td>
<td align="left">Small models only (≤1.5B parameters)</td>
</tr>
</tbody>
</table>
</div>
<div class="fusion-title title fusion-title-13 fusion-sep-none fusion-title-text fusion-title-size-four" style="--awb-margin-bottom:-20px;"><h4 class="fusion-title-heading title-heading-left" style="margin:0;">Troubleshooting</h4></div>
<div class="table-1">
<p>&nbsp;</p>
<table width="100%">
<thead>
<tr>
<th align="left">Issue</th>
<th align="left">Fix</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>Out of memory during training</strong></td>
<td align="left">Reduce batch size to 1-2, use LoRA method, or use smaller model</td>
</tr>
<tr>
<td align="left"><strong>Slow model download</strong></td>
<td align="left">Models are cached in <code>/mnt/nvme/cache/huggingface</code>, be patient on first run</td>
</tr>
<tr>
<td align="left"><strong>Connection refused</strong></td>
<td align="left">Ensure port 7860 is not blocked by firewall</td>
</tr>
<tr>
<td align="left"><strong>Training won&#8217;t start</strong></td>
<td align="left">Check dataset format matches the selected template</td>
</tr>
<tr>
<td align="left"><strong>GPU not utilized</strong></td>
<td align="left">Verify with <code>tegrastats</code> and ensure <code>--runtime nvidia</code> is set</td>
</tr>
</tbody>
</table>
</div>
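For the last row, a quick way to watch GPU load on the Jetson host (<code>tegrastats</code> ships with JetPack; its <code>GR3D_FREQ</code> field reports GPU utilization):

```shell
# Sample GPU utilization on the Jetson host for a few seconds.
# GR3D_FREQ near 0% while a training run is active usually means the
# container was started without --runtime nvidia.
if command -v tegrastats >/dev/null 2>&1; then
  timeout 5 tegrastats | grep -o 'GR3D_FREQ [0-9]*%' | head -n 3 || true
else
  echo "tegrastats not found - run this on the Jetson host itself"
fi
```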
<div class="fusion-text fusion-text-9"><p>&nbsp;</p>
<p><strong>For more information</strong> about LLaMA Factory features and supported models, visit the <a href="https://github.com/hiyouga/LLaMA-Factory"><strong style="color: #0fbc00;">LLaMA Factory repository</strong></a>.</p>
</div></div></div></div></div>
<p>The post <a href="https://blog-en.openzeka.com/jetson-generative-ai-llama-factory/">Jetson Generative AI – LLaMA Factory</a> appeared first on <a href="https://blog-en.openzeka.com">OpenZeka EN Blog</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
