
Vector Databases + RAG: The Engine Behind AI Agents

Imagine searching for something online, but instead of typing the exact words, the system understands your intent and gives results that truly match what you mean.


You might type: “Shoes that are good for knee pain”


And the results include: “Cushioned athletic sneakers”

“Orthopedic sports footwear”

“Shock absorption trainers”


Even though those products don’t contain the exact words you typed, the system understands your intent.


How? Behind the scenes, many modern AI-powered systems use a vector database. Let’s break it down with a full practical example.


The Problem with Traditional Databases


Traditional databases (SQL) are excellent at structured data and exact matches.

Example:

SELECT * FROM products
WHERE description LIKE '%knee pain%'

This only works if the product description literally contains “knee pain.” If the description says: “Enhanced arch support and joint cushioning”, it may not appear — even though it’s relevant.


Traditional search matches words. Modern AI search matches meaning.


What Is a Vector Database?


A vector database stores embeddings — high-dimensional numerical representations of data. An embedding converts text into numbers while preserving meaning.
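
As a quick illustration, here is a minimal sketch of turning a sentence into an embedding, assuming the open-source sentence-transformers library and its all-MiniLM-L6-v2 model (any embedding model works the same way):

from sentence_transformers import SentenceTransformer

# Load a small open-source embedding model (384-dimensional output).
model = SentenceTransformer("all-MiniLM-L6-v2")

# The model maps the sentence to a point in 384-dimensional space.
vector = model.encode("Sneakers designed for joint and knee support")
print(vector.shape)  # (384,)
print(vector[:3])    # the first three of 384 numbers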


End-to-End Practical Example


Let’s imagine a general online store selling different types of footwear.


Step 1: We Have Product Descriptions

| Product ID | Description                                     |
| ---------- | ----------------------------------------------- |
| 201        | Lightweight running shoes with extra cushioning |
| 202        | Sneakers designed for joint and knee support    |
| 203        | Formal leather office shoes                     |
| 204        | Trail running shoes with shock absorption       |

Step 2: Convert Descriptions into Embeddings


Instead of storing raw text alone, we pass each product description through an embedding model to obtain its semantic vector representation.

The model generates vectors:

| ID  | Embedding (shortened) |
| --- | --------------------- |
| 201 | [0.88, 0.10, -0.41]   |
| 202 | [0.85, 0.09, -0.38]   |
| 203 | [-0.52, 0.73, 0.14]   |
| 204 | [0.90, 0.11, -0.36]   |
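
In code, this step might look like the following sketch, again assuming sentence-transformers (the three-dimensional vectors in the table above are illustrative; a real model produces hundreds of dimensions):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

products = {
    201: "Lightweight running shoes with extra cushioning",
    202: "Sneakers designed for joint and knee support",
    203: "Formal leather office shoes",
    204: "Trail running shoes with shock absorption",
}

# Encode all descriptions in one batch; each row is one product's vector.
embeddings = model.encode(list(products.values()))
for product_id, vector in zip(products, embeddings):
    print(product_id, vector[:3])  # show only the first three dimensions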

Step 3: How Data Looks Inside a Vector Database


Conceptually, the structure looks like this:

| id  | embedding_vector         | metadata                                                       |
| --- | ------------------------ | -------------------------------------------------------------- |
| 201 | [0.88, 0.10, -0.41, ...] | description="Lightweight running shoes with extra cushioning"  |
| 202 | [0.85, 0.09, -0.38, ...] | description="Sneakers designed for joint and knee support"     |
| 203 | [-0.52, 0.73, 0.14, ...] | description="Formal leather office shoes"                      |
| 204 | [0.90, 0.11, -0.36, ...] | description="Trail running shoes with shock absorption"        |
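
To make this concrete, here is a sketch of inserting these records, assuming the chromadb client; other vector databases (Pinecone, Weaviate, etc.) expose a very similar upsert operation:

import chromadb

client = chromadb.Client()  # in-memory instance for demonstration
collection = client.create_collection("products")

# Store each product's vector together with its original description.
collection.add(
    ids=["201", "202", "203", "204"],
    embeddings=[
        [0.88, 0.10, -0.41],
        [0.85, 0.09, -0.38],
        [-0.52, 0.73, 0.14],
        [0.90, 0.11, -0.36],
    ],
    metadatas=[
        {"description": "Lightweight running shoes with extra cushioning"},
        {"description": "Sneakers designed for joint and knee support"},
        {"description": "Formal leather office shoes"},
        {"description": "Trail running shoes with shock absorption"},
    ],
)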

Step 4: User Makes a Search

User types: “Comfortable shoes for knee pain”


What Happens Behind the Scenes?

Just like we converted product descriptions into vectors, we also convert the user’s query into a vector.

Query text: "Comfortable shoes for knee pain"

Query embedding: [0.87, 0.10, -0.39, ...]

Now the query is no longer text — it’s a point in vector space.


Step 5: Similarity Search

The vector database now compares:

  • The query vector

  • With all stored product vectors


It calculates how “close” they are using mathematical distance metrics such as:

  • Cosine Similarity → Measures the angle between vectors (most common)

  • Dot Product → Measures directional alignment

  • Euclidean Distance → Measures straight-line distance

The closer the vectors → the more similar the meaning.
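
As a sketch, here is cosine similarity computed with NumPy over the toy three-dimensional vectors from the tables above (with vectors this short, the exact scores will differ slightly from the illustrative results below, but the ranking idea is the same):

import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths;
    # 1.0 means the vectors point in exactly the same direction.
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = [0.87, 0.10, -0.39]  # "Comfortable shoes for knee pain"
products = {
    201: [0.88, 0.10, -0.41],
    202: [0.85, 0.09, -0.38],
    203: [-0.52, 0.73, 0.14],
    204: [0.90, 0.11, -0.36],
}

# Rank the products by similarity to the query, highest first.
ranked = sorted(products, key=lambda pid: cosine_similarity(query, products[pid]), reverse=True)
for pid in ranked:
    print(pid, round(cosine_similarity(query, products[pid]), 3))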


Similarity Results


| Product ID | Description                                     | Similarity |
| ---------- | ----------------------------------------------- | ---------- |
| 202        | Sneakers designed for joint and knee support    | 0.99       |
| 201        | Lightweight running shoes with extra cushioning | 0.97       |
| 204        | Trail running shoes with shock absorption       | 0.95       |
| 203        | Formal leather office shoes                     | 0.11       |

Why Product 202 Ranked Highest


Product 202 description: “Sneakers designed for joint and knee support”

Even though the words don’t match exactly, the meaning is extremely close to the user’s intent. That’s why it gets the highest similarity score (0.99).


What About Product 203?

Product 203: “Formal leather office shoes”

This has a completely different semantic meaning, so its vector is far from the query vector → similarity score 0.11.


Final Output

The system returns:

  1. Product 202

  2. Product 201

  3. Product 204

Not because of keyword matching — but because their meaning is mathematically close to the query.


Retrieval-Augmented Generation (RAG)


Retrieval-Augmented Generation (RAG) combines vector database retrieval with Large Language Model (LLM) reasoning.


Instead of relying solely on the AI’s memory, RAG allows the model to access relevant documents or data to provide accurate and context-aware answers.


It works in two stages:

  1. Retrieval → Find relevant documents from the vector database

  2. Generation → Use an LLM to reason over retrieved content and produce natural language output


Core Steps to Implement RAG


  1. Load

    Ingest data from multiple sources: text files, CSVs, logs, PDFs, web pages, etc.

    Goal: Consolidate all relevant knowledge into the system pipeline.


  2. Split

    Break large documents into smaller chunks.

    Reason: LLMs have token limits and cannot process very long texts at once. Smaller chunks fit within those limits while preserving enough context for coherent understanding.


  3. Embed

    Convert each chunk into a vector using an embedding model (OpenAI, Cohere, BGE, Sentence-BERT, etc.).

    Embeddings capture meaning, allowing similarity searches based on concepts rather than exact keywords.


  4. Store

    Save embeddings with original text and metadata in a vector database (Pinecone, Faiss, Chroma, Weaviate, Milvus, AstraDB, etc.).

    This enables fast similarity searches when a query comes in.
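
Here is a condensed sketch of all four steps in plain Python, assuming sentence-transformers for embedding and chromadb for storage (the file name catalog.txt and the chunk sizes are hypothetical; production pipelines often use a framework such as LangChain or LlamaIndex for the same flow):

import chromadb
from sentence_transformers import SentenceTransformer

# 1. Load: read raw knowledge from a source file (hypothetical name).
text = open("catalog.txt", encoding="utf-8").read()

# 2. Split: naive fixed-size chunks with a small overlap so that
#    sentences cut at a boundary still appear intact in one chunk.
size, overlap = 500, 50
chunks = [text[i:i + size] for i in range(0, len(text), size - overlap)]

# 3. Embed: one vector per chunk.
model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(chunks)

# 4. Store: keep each vector together with its original text.
collection = chromadb.Client().create_collection("knowledge")
collection.add(
    ids=[str(i) for i in range(len(chunks))],
    embeddings=vectors.tolist(),
    documents=chunks,
)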


Query-to-Answer Workflow in RAG (Shoes Example)


1. Question (User Input)

The user asks: “Which shoes should I buy for knee support and lightweight comfort?”


2. Retrieve (Vector Search)

  • The query is converted into an embedding using an embedding model.

  • The vector database compares this query vector with stored product vectors.

  • It returns the top relevant products: Product 202, Product 201, Product 204.
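
Continuing the chromadb sketch from earlier, the retrieval call might look like this (the query vector here is illustrative):

# Embed the question with the same model used for the products,
# then ask the collection for the closest stored vectors.
results = collection.query(
    query_embeddings=[[0.87, 0.10, -0.39]],  # illustrative vector for the user's question
    n_results=3,                             # top-3 most similar products
)
print(results["ids"])        # e.g. [["202", "201", "204"]]
print(results["metadatas"])  # their stored descriptions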


3. Build Prompt for LLM

  • The retrieved product descriptions are structured into a prompt for the LLM.

  • Example of prompt passed to the LLM:


User asked: "Which shoes should I buy for knee support and lightweight comfort?"


Here are some relevant products:

Product 202: Sneakers designed for joint and knee support.

Product 201: Lightweight running shoes with extra cushioning.

Product 204: Trail running shoes with shock absorption.

Please provide a recommendation based on these options.
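
In code, steps 3 and 4 together might look like this sketch, assuming the openai Python client and the gpt-4o-mini model (any chat-capable LLM works the same way):

from openai import OpenAI

question = "Which shoes should I buy for knee support and lightweight comfort?"
retrieved = [
    "Product 202: Sneakers designed for joint and knee support.",
    "Product 201: Lightweight running shoes with extra cushioning.",
    "Product 204: Trail running shoes with shock absorption.",
]

# Augment the prompt: the model answers from retrieved context, not memory.
prompt = (
    f'User asked: "{question}"\n\n'
    "Here are some relevant products:\n"
    + "\n".join(retrieved)
    + "\n\nPlease provide a recommendation based on these options."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)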


4. LLM Processing

  • The LLM reads the prompt, interprets the question, and prioritizes information from the retrieved products.

  • It generates a coherent and human-friendly answer.


5. Answer (Final Response to User)

"For knee support and lightweight comfort, the sneakers designed for joint and knee support (Product 202) and the lightweight running shoes with extra cushioning (Product 201) are ideal. The trail running shoes (Product 204) could also work if you want extra shock absorption."


RAG Pipeline and Diagram


[Company Data Preparation]
Load → Split → Embed → Store in Vector DB
──────────────────────────────────────────
[User Interaction]
Query → Embed → Retrieve → Augment Prompt → LLM Generate → Answer


Why RAG Matters

  • The vector database ensures that only the most relevant, up-to-date information is retrieved for a given query.

  • The LLM then uses that retrieved context to reason, summarize, and generate a clear, human-like response.

  • Together, they enable AI systems to produce context-aware, grounded, and more accurate answers — far more reliable than relying only on the model’s pre-trained knowledge.


What Are Embeddings?


Embeddings are the numerical vectors themselves, and an embedding model is the piece of code that generates them from your data. You pass your text (or other data) to an embedding model, and it returns a vector representation that can be stored in a vector database for semantic search.


Some popular embedding models available today include:

  • OpenAI Embeddings (text-embedding-3-small/large)

  • Cohere Embeddings

  • Hugging Face Models

  • SentenceTransformers


These models help AI systems understand meaning, making search and recommendations smarter.
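
For example, a minimal call to the first model in the list, assuming the openai client and an OPENAI_API_KEY in the environment:

from openai import OpenAI

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Comfortable shoes for knee pain",
)
vector = response.data[0].embedding
print(len(vector))  # 1536 dimensions for text-embedding-3-small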


Conclusion


Vector databases changed search from matching words to understanding meaning. By storing embeddings and enabling semantic similarity search, they allow systems to retrieve information based on intent rather than exact keywords.

RAG builds on this foundation. The vector database retrieves the most relevant content, and the LLM reasons over it to generate clear, context-aware answers. Retrieval provides accuracy. Generation provides intelligence.

Some of the leading vector databases powering these applications today include:

  • Pinecone

  • Weaviate

  • Milvus

  • Qdrant

  • Chroma


Together, Vector DB + RAG form the backbone of modern AI systems — powering semantic search, AI assistants, recommendation engines, and enterprise knowledge tools that don’t just find information, but truly understand it.



 
 
