Building an Edge AI Assistant with NXP eIQ GenAI Flow 2.0 RAG

1. Introduction

With the rapid development of generative AI, deploying AI assistants on edge devices has become increasingly important for applications requiring privacy, low latency, and offline capabilities.

The Advantech AOM-5521, powered by the NXP i.MX 95 application processor, provides a powerful edge computing platform designed for AI and embedded applications.

By leveraging NXP eIQ GenAI Flow 2.0 together with Retrieval-Augmented Generation (RAG), developers can build AI assistants that retrieve information from local documents and generate context-aware responses directly on the edge device.

In this article, we demonstrate how to build an Edge AI Assistant on the AOM-5521 platform using NXP eIQ GenAI Flow 2.0 with RAG, including document preparation on a host PC, knowledge database generation, and running the assistant on the device.


2. Prerequisites

This solution requires two machines working together: a Host PC (x86) and the AOM-5521 edge device. The overall workflow is divided into two distinct stages, as described below:

Host PC (x86) — Pre-processing Stage: The host PC handles all computationally intensive preparation tasks: parsing PDF documents with Docling, chunking the extracted text, and running an embedding model to generate the RAG knowledge database (rag_database.pkl). A capable x86 machine is required for this stage because the embedding model and document parser demand substantial CPU and memory resources that exceed the AOM-5521’s intended workload.

AOM-5521 — Runtime Stage: Once the rag_database.pkl is transferred to the AOM-5521, the device loads the database at startup and uses it to answer domain-specific questions entirely on-device. At runtime, the i.MX 95 NPU accelerates the LLM inference, while the CPU handles the lightweight similarity search against the pre-built vector database. No cloud connectivity or model retraining is needed.

This division of labor is precisely what makes the architecture practical for edge deployment. Heavy pre-processing is done once on a capable host, and the resulting compact knowledge database runs efficiently on the constrained edge device.
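To make the runtime similarity search concrete, the following is a minimal pure-Python sketch of cosine-similarity retrieval over pre-computed vectors. The function names cosine_similarity and top_k and the toy vectors are illustrative assumptions, not part of the eIQ GenAI Flow code, which may use an optimized numerical library instead.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=3):
    """Return (index, score) pairs for the k most similar chunk vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional vectors (hypothetical); a real embedding model
# produces vectors with hundreds of dimensions.
chunk_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
hits = top_k([1.0, 0.1, 0.0], chunk_vecs, k=2)
print(hits)
```

Because the chunk vectors are computed once on the host, the device only has to embed the query and run a linear scan like this one, which is cheap for a database of a few thousand chunks.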

2.1 Necessary Hardware

Host PC (x86) — Pre-processing

  • x86-64 PC or workstation running Ubuntu 20.04 or Ubuntu 22.04
  • CPU: Intel Core i5 / i7 or AMD Ryzen equivalent (8+ cores recommended)
  • RAM: 16 GB minimum (32 GB recommended when processing large PDF corpora)
  • Storage: 50 GB free disk space (SSD recommended for faster I/O)
  • Network: Internet access required to download Hugging Face models and Python packages

AOM-5521 Target Device — Runtime

  • AOM-5521, a SMARC 2.2 Computer-on-Module (COM) powered by the NXP i.MX 95 SoC
  • SOM-DB2510, an evaluation carrier board designed for Advantech SMARC 2.1 modules
  • 1 × Power Adapter (input: 100-240 V AC, 50/60 Hz; output: 12 V DC, 3 A; Advantech P/N: 96PSA-A36W12R1-3)
  • 1 × HDMI Cable for connecting to a monitor
  • 1 × USB Type-C Conference Microphone for voice input
  • 1 × Monitor (standard HDMI monitor)

2.2 Necessary Software

This project uses a two-stage pipeline: document pre-processing runs on the Host PC, while the AI assistant runs on the AOM-5521 target device.

Host PC Requirements

  • UV Python package manager — for faster Python dependency installation
  • Python 3.11 environment
  • Hugging Face account and personal read access token — required to download AI models
  • NXP eIQ GenAI Flow 2.0 Demonstrator Repository

Target Device (AOM-5521) Requirements

  • Yocto 5.0 OS
  • Git and Git Large File Storage (Git-LFS)
  • Python 3.11 environment
  • NXP eIQ GenAI Flow 2.0 Demonstrator Repository

3. Setting Up the Host PC

The host PC is responsible for parsing PDF documents and generating the RAG knowledge database (rag_database.pkl) that will be deployed to the AOM-5521.

3.1 Install UV and Clone the Repository

Install the UV package manager and clone the NXP eIQ GenAI Flow 2.0 repository:

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
git clone --single-branch -b release/v2.0 https://github.com/nxp-appcodehub/dm-eiq-genai-flow-demonstrator

3.2 Create a Virtual Environment and Install Packages

Navigate to the RAG directory, create a Python 3.11 virtual environment, and install the required packages:

cd dm-eiq-genai-flow-demonstrator/rag
uv venv --python 3.11
uv pip install -e ".[dev]"
uv pip install flash-attn --no-build-isolation
uv run python -m ensurepip --upgrade

3.3 Configure Hugging Face Token

A Hugging Face personal read access token is required to download the AI models. Create a Hugging Face account, generate a personal read access token from your account settings, then export it as an environment variable:

export HF_TOKEN="<your_huggingface_token>"
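Before starting a long download, it can help to confirm that the variable is actually visible to Python. The snippet below is a small sketch; the helper name hf_token_status is hypothetical and is not part of the NXP repository or the Hugging Face tooling.

```python
import os

def hf_token_status(env=None):
    """Return 'missing', 'placeholder', or 'set' for HF_TOKEN in env."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        return "missing"
    # Catch the template value copied verbatim from the instructions.
    if token.startswith("<") and token.endswith(">"):
        return "placeholder"
    return "set"

if __name__ == "__main__":
    print(f"HF_TOKEN status: {hf_token_status()}")
```

This check only inspects the environment; it does not contact Hugging Face, so a status of "set" does not guarantee that the token is valid.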

4. Generating the RAG Knowledge Database (Host PC)

The RAG pipeline on the host PC converts your PDF documentation into a vector database. This involves four steps: placing the PDF files, parsing them, splitting the text into chunks, and generating embeddings.

4.1 Place PDF Files

Copy your PDF documentation into the document parser's input folder.

4.2 Parse PDF Files

Run the document parser (powered by Docling) to extract text and metadata from the PDF files:

uv run -m document_parsing -f all

4.3 Generate Text Chunks

Split the parsed text into chunks using the selected chunking strategy (e.g., spaCy, NLTK) and save the result as chunks.json:

uv run -m rag.preprocessing.generate_chunks -f all
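As an illustration of what sentence-aware chunking does, here is a minimal sketch that packs whole sentences into size-limited chunks. A naive regular-expression splitter stands in for spaCy or NLTK, and the function names, the max_chars parameter, and the record layout are assumptions for illustration; the actual chunks.json schema may differ.

```python
import json
import re

def split_sentences(text):
    """Naive splitter: break after ., !, or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def chunk_sentences(sentences, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = (
    "The AOM-5521 is a SMARC module. It is powered by the NXP i.MX 95. "
    "The NPU accelerates LLM inference. The CPU handles similarity search."
)
chunks = chunk_sentences(split_sentences(text), max_chars=70)
# Hypothetical record layout; the real chunks.json schema may differ.
records = [{"id": i, "text": c} for i, c in enumerate(chunks)]
print(json.dumps(records, indent=2))
```

Keeping sentences intact means each chunk remains readable on its own, which tends to matter when the chunk is later pasted into the LLM prompt as context.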

4.4 Generate Embeddings and Build the Database

Run the embedding model (e.g., all-MiniLM-L6-v2, roughly 22M parameters) to generate vector embeddings for all chunks and produce the final rag_database.pkl file:

uv run -m rag.preprocessing.generate_embeddings -f all

The output file rag_database.pkl will be saved to dm-eiq-genai-flow-demonstrator/rag/src/data/. This file contains the embeddings, chunks, and metadata required for runtime retrieval on the AOM-5521.
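The sketch below shows one plausible way such a database could be stored and sanity-checked with Python's pickle module. The dictionary keys, the sample values, and the helper names save_database and load_database are assumptions for illustration; the real rag_database.pkl layout may differ.

```python
import os
import pickle
import tempfile

# Hypothetical layout: three parallel lists. The real file may differ.
database = {
    "chunks": ["The NPU accelerates LLM inference.", "The CPU runs similarity search."],
    "embeddings": [[0.1, 0.9], [0.8, 0.2]],
    "metadata": [{"source": "manual.pdf", "page": 3}, {"source": "manual.pdf", "page": 4}],
}

def save_database(db, path):
    """Serialize the database dictionary to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(db, f)

def load_database(path):
    """Load the database and check that the parallel lists line up."""
    with open(path, "rb") as f:
        db = pickle.load(f)  # only unpickle files from a trusted source
    sizes = {len(db[key]) for key in ("chunks", "embeddings", "metadata")}
    if len(sizes) != 1:
        raise ValueError("chunks, embeddings, and metadata must have equal length")
    return db

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rag_database.pkl")
    save_database(database, path)
    restored = load_database(path)

print(len(restored["chunks"]), "chunks restored")
```

Since unpickling can execute arbitrary code, a .pkl file should only be loaded on the device if it was generated on a host you control.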


5. Setting Up the AOM-5521

The following steps are performed directly on the AOM-5521 target device running Yocto 5.0 OS.

5.1 Install Git and Git-LFS

If Git is not already installed, build and install it from source:

cd ~
wget https://www.kernel.org/pub/software/scm/git/git-2.9.5.tar.gz
tar -zxf git-2.9.5.tar.gz
cd git-2.9.5
make prefix=/usr/local all
sudo make prefix=/usr/local install

cd ~
rm -r git-2.9.5
rm git-2.9.5.tar.gz

Then install Git-LFS (required for downloading AI model weight files stored as large binaries):

cd ~
wget https://github.com/git-lfs/git-lfs/releases/download/v3.7.1/git-lfs-linux-arm64-v3.7.1.tar.gz
tar -xf git-lfs-linux-arm64-v3.7.1.tar.gz
cd git-lfs-3.7.1
chmod a+x ./install.sh
./install.sh

cd ~
rm -r git-lfs-3.7.1
rm git-lfs-linux-arm64-v3.7.1.tar.gz

5.2 Clone and Install eIQ GenAI Flow 2.0

Clone the repository (reference repo: eIQ GenAI Flow v2.0) and run the installation script:

cd ~
git clone --single-branch -b release/v2.0 https://github.com/nxp-appcodehub/dm-eiq-genai-flow-demonstrator
cd dm-eiq-genai-flow-demonstrator
git lfs pull
./install.sh

6. Launching the Edge AI Assistant with RAG

To launch the eIQ GenAI Flow 2.0 demonstrator with RAG enabled, pass the -r flag:

python3 eiq_genai_flow.py -r

Upon startup, the system loads the following models:

  • Embedding model: all-MiniLM-L6-v2
  • LLM: danube-500M-q8
  • TTS model: english-multi_speaker-16k-quant-encrypted

Type your question at the prompt, and the assistant will retrieve relevant context from the RAG database and generate a response using the on-device NPU.
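The retrieve-then-generate step can be pictured as prompt assembly: the retrieved chunks are placed ahead of the question before the text is handed to the LLM. The following is a minimal sketch under assumed names; build_prompt, the max_context_chars budget, and the prompt wording are hypothetical and do not reflect the template used by eIQ GenAI Flow.

```python
def build_prompt(question, retrieved_chunks, max_context_chars=1000):
    """Join retrieved chunks into a context block, then append the question."""
    context_parts, used = [], 0
    for chunk in retrieved_chunks:  # assumed to be ordered best match first
        if used + len(chunk) > max_context_chars:
            break  # keep the prompt within a small model's context budget
        context_parts.append(chunk)
        used += len(chunk)
    context = "\n".join(f"- {part}" for part in context_parts)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    "Which unit accelerates LLM inference?",
    ["The i.MX 95 NPU accelerates LLM inference.", "The CPU handles similarity search."],
)
print(prompt)
```

A character budget is used here for simplicity; a production pipeline would more likely count tokens, since a small LLM has a limited context window.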

6.1 Optional: Check USB Audio Codec

If using voice input via the USB conference microphone, plug in the USB codec and check the help output to verify that it is detected:

python3 eiq_genai_flow.py -h

The detected USB codec will appear as plughw:CARD=Seri under the --capture-device and --playback-device options.
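If the help output is long, the device identifiers can be picked out programmatically. The helper below is a small sketch; find_plughw_devices is a hypothetical name, and the regular expression assumes identifiers of the form plughw:CARD=<name>, optionally followed by ,DEV=<n>.

```python
import re

def find_plughw_devices(help_text):
    """Return unique ALSA plughw identifiers in order of first appearance."""
    seen = []
    for name in re.findall(r"plughw:CARD=\w+(?:,DEV=\d+)?", help_text):
        if name not in seen:
            seen.append(name)
    return seen

# Made-up sample line for illustration only; it is not real program output.
sample = "capture choices: default, plughw:CARD=Seri,DEV=0"
print(find_plughw_devices(sample))
```

To use it on the device, capture the text printed by python3 eiq_genai_flow.py -h (for example with subprocess.run and capture_output=True) and pass it to the function.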