The Problem — Why Are Most AI Integrations "Dumb"?
Most enterprise AI integrations today work like this: the LLM (large language model) receives the user's question, runs an SQL query, and returns the result. That's a better search engine — but it's not intelligence.
The real problem: data alone is not knowledge. Knowledge lies in the relationships between data.
Example: Anna Kovács is a contact in the CRM. But what do we actually know about her?
- She received a proposal last week (deal)
- She replied positively via email (Gmail thread)
- There's a consultation with her tomorrow (calendar event)
- Her boss is also attending the consultation (participant)
- We worked together on 3 projects last year (history)
A traditional CRM query might return 1-2 of these data points. A RAG system built on a Knowledge Graph returns all of them — because it understands the relationships.
Knowledge Graph — The Map of Business Relationships
What Is It Exactly?
A Knowledge Graph consists of two building blocks:
- Nodes: entities — contacts, emails, calendar events, deals, invoices, tasks
- Edges: relationships — WHO sent an email TO WHOM, WHO attended the EVENT, which DEAL belongs to which CONTACT
┌────────────┐      SENT        ┌────────────┐
│ Anna Kovács│─────────────────▶│   Email    │
│ (contact)  │                  │ "Proposal  │
│            │◀─────────────────│    OK!"    │
└─────┬──────┘   WAS_RECIPIENT  └────────────┘
      │
      │ ATTENDEE
      ▼
┌────────────┐    BELONGS_TO    ┌─────────────┐
│Consultation│                  │ WebShop Pro │
│  (event)   │─────────────────▶│   (deal)    │
│ 2025.11.04 │                  │   €1,300    │
└────────────┘                  └─────────────┘
The Data Model
A production-ready Knowledge Graph node carries, at minimum: an ID, a tenant identifier (providerId), an entity type, a title, the content, an embedding vector, and metadata with timestamps.
Edge types indicate the nature of the relationship: EMAILED, BOOKED, PAID, MENTIONS, ASSIGNED, ATTENDED_BY. Every edge is weighted (weight), enabling prioritization of relevant connections.
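The data model above can be sketched as PostgreSQL DDL. This is an illustrative schema, not the project's actual one: table and column names (kg_nodes, kg_edges) are assumptions, and the embedding dimension matches the text-embedding-3-small model mentioned later in the article.

```sql
-- Illustrative sketch; requires the pgvector extension (CREATE EXTENSION vector).
CREATE TABLE kg_nodes (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    provider_id UUID NOT NULL,            -- tenant isolation
    node_type   TEXT NOT NULL,            -- contact, email, event, deal, ...
    title       TEXT,
    content     TEXT,
    embedding   vector(1536),             -- text-embedding-3-small dimension
    metadata    JSONB DEFAULT '{}',
    created_at  TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE kg_edges (
    provider_id UUID NOT NULL,
    source_id   UUID REFERENCES kg_nodes(id) ON DELETE CASCADE,
    target_id   UUID REFERENCES kg_nodes(id) ON DELETE CASCADE,
    edge_type   TEXT NOT NULL,            -- EMAILED, BOOKED, PAID, MENTIONS, ...
    weight      REAL DEFAULT 1.0,         -- for prioritizing relevant connections
    PRIMARY KEY (source_id, target_id, edge_type)
);
```

The ON DELETE CASCADE on the edge table is what later makes GDPR erasure a single DELETE on the node.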
Why Not Neo4j?
Fair question. Dedicated graph databases (Neo4j, Amazon Neptune) seem like a natural choice. In practice, however, the PostgreSQL + pgvector combination works surprisingly well when:
- PostgreSQL is already in the stack (no new infrastructure needed)
- The graph size is a few tens of thousands of nodes per tenant (not millions)
- Vector search is also needed (pgvector supports it natively)
- You want to keep everything in one database (ops simplicity)
The trick: recursive CTEs (Common Table Expressions) in PostgreSQL can handle multi-step graph traversal, shortest-path queries, and cycle prevention. For breadth-first traversals, this covers most of what Neo4j's Cypher language provides.
The key is the abstraction layer: if the graph-service API is well designed, switching from PostgreSQL to Neo4j requires zero code changes on the calling side. It's worth starting with this design principle.
RAG — How Does the AI Get Relevant Context?
The Essence of RAG in 30 Seconds
RAG (Retrieval-Augmented Generation) addresses the biggest limitation of LLMs: they only know what was in their training data, and anything outside it they may hallucinate (make up).
RAG solves this by retrieving relevant information from the database before answering the question, and providing it as context to the LLM. The LLM doesn't answer from "memory" but based on the facts it's given.
Why Isn't Simple Search Enough?
Two reasons:
1. Semantic search > keyword search: If the user asks "what did we discuss about the proposal?", keyword search looks for the word "proposal." Semantic search understands that "quote", "pricing", "rate card", and even "I'll send the details" could be relevant.
2. Context linking: Vector search finds the relevant email. But the Knowledge Graph can add who sent it, which deal it belongs to, and what calendar event is connected. This combination makes the answer truly useful.
The 5-Step RAG Pipeline in Practice
Step 1 — Vectorization and Semantic Search
The user's question is converted into an embedding (OpenAI text-embedding-3-small, 1536 dimensions), then pgvector cosine similarity search finds the closest matching content.
Parameters (tested in production):
- Top-K: 8 results
- Threshold: 0.60 cosine similarity (below this, too much noise)
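With those parameters, the Step 1 query is a single pgvector lookup. The sketch below assumes the illustrative kg_nodes schema; `<=>` is pgvector's cosine distance operator, so similarity is `1 - distance`.

```sql
-- Sketch: top-K cosine similarity search with a relevance floor.
SELECT id, title,
       1 - (embedding <=> :query_embedding) AS similarity
FROM kg_nodes
WHERE provider_id = :provider_id                      -- tenant isolation
  AND 1 - (embedding <=> :query_embedding) >= 0.60    -- threshold
ORDER BY embedding <=> :query_embedding               -- ascending distance
LIMIT 8;                                              -- Top-K
```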
Step 2 — Graph Enrichment
The neighbors of the top 3 vector results are also pulled in — 1 step deep, max 5 neighbor nodes per result.
Neighbors receive an inherited relevance score: parent_similarity × 0.8. If an email came in with 92% relevance, the connected contact gets 73.6%. This ensures that context from the graph doesn't overwhelm direct matches.
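The inheritance rule is a one-liner; a minimal sketch, with the function name and depth generalization as my own additions (the article only specifies the 1-hop case and the 0.8 factor):

```typescript
// Sketch of the inherited-relevance rule: each hop away from a direct
// vector match multiplies the score by the decay factor.
const DECAY = 0.8;

function inheritedScore(parentSimilarity: number, depth: number): number {
  return parentSimilarity * Math.pow(DECAY, depth);
}

// An email matched at 92%: its 1-hop neighbor (the contact) inherits ≈ 73.6%.
console.log(inheritedScore(0.92, 1));
```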
Step 3 — Deduplication and Ranking
Merging vector and graph results (vector results get priority), sorted by descending relevance.
Critical element: token budget management. The LLM's context window is finite (and expensive). The pipeline uses a simple but effective estimation: (content_length + title_length + 50) / 4 tokens per node (Hungarian text ≈ 4 characters/token). When the cumulative token count reaches the 3,000 limit, it stops — no overflow, no unnecessary cost.
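The estimation and cut-off can be sketched as follows. The formula and the 3,000-token limit come from the article; the node shape and function names are illustrative assumptions.

```typescript
// Sketch of the Step 3 token-budget cut-off.
interface RankedNode { title: string; content: string; relevance: number; }

const TOKEN_BUDGET = 3000;

function estimateTokens(node: RankedNode): number {
  // ~4 characters per token (Hungarian text), +50 for formatting overhead.
  return Math.ceil((node.content.length + node.title.length + 50) / 4);
}

function selectWithinBudget(ranked: RankedNode[]): RankedNode[] {
  const selected: RankedNode[] = [];
  let used = 0;
  for (const node of ranked) {             // already sorted by relevance desc
    const cost = estimateTokens(node);
    if (used + cost > TOKEN_BUDGET) break; // stop before overflowing
    selected.push(node);
    used += cost;
  }
  return selected;
}
```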
Step 4 — Context Assembly
A structured Markdown context is prepared for the LLM from the selected nodes:
## Relevant Context
### Email
**"RE: Proposal details"** [source: Gmail, relevance: 92%]
> Hi, I reviewed the proposal, it's acceptable for us...
*From: anna.kovacs@company.com | Date: 2025-10-28*
Relationships: → SENT_EMAIL → Anna Kovács (client)
### Calendar
**"Consultation — Anna Kovács"** [source: Google Calendar, relevance: 74%]
*2025-11-04 10:00–11:00 | Attendees: Anna Kovács, Dr. Szabó*
Step 5 — Source Attribution
Detailed source objects are created for the frontend: icon, color, excerpt, relevance percentage, and the entry path (vector or graph). The user sees exactly where the information came from — this is not just UX, it's critical from a compliance perspective.
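A source object of the kind described above might look like this. Field names are illustrative assumptions, not the project's actual frontend API; the example values echo the email from Step 4.

```typescript
// Sketch of a Step 5 source-attribution object handed to the frontend.
type EntryPath = "vector" | "graph";

interface SourceAttribution {
  icon: string;        // e.g. "mail", "calendar"
  color: string;       // UI accent color for the source type
  excerpt: string;     // short quote shown to the user
  relevance: number;   // 0-100, displayed as a percentage
  path: EntryPath;     // how the node entered the context
}

const example: SourceAttribution = {
  icon: "mail",
  color: "#4285F4",
  excerpt: "Hi, I reviewed the proposal, it's acceptable for us...",
  relevance: 92,
  path: "vector",
};
```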
Concrete Use Cases
"What Do We Know About This Client?"
Traditional approach: Open CRM → search contact → review deals → switch to email client → search → open calendar → search. ~5-10 minutes.
Knowledge Graph + RAG: The AI summarizes the contact, last emails, open deals, and upcoming events in response to a single question — with source references. ~5 seconds.
"What Happened Last Week with the WebShop Pro Project?"
Semantic search finds the relevant emails and calendar events. Graph enrichment pulls in connected contacts and deal status. The AI provides a chronological summary with references.
Proactive Alerts
Based on graph statistics, the AI recognizes: "Anna Kovács has 5 incomplete deals and there has been no communication for 2 weeks — it might be worth reaching out." No question needed — the system analyzes the graph on a schedule.
Decision Points for CTOs and IT Leaders
Embedding Model Selection
Recommendation: Start with the text-embedding-3-small model — it's the best compromise between cost, quality, and multilingual accuracy. You can migrate to another model later, at the cost of re-generating the stored embeddings.
Vector Database Selection
Recommendation: Below 100,000 nodes, pgvector is perfectly sufficient and dramatically simplifies infrastructure. Above that, Qdrant or Pinecone are worth considering.
Async vs. Sync Embedding Generation
Embedding generation is expensive (an API call) and slow (100-500 ms per item). Two approaches:
- Synchronous: Generate the embedding immediately when the entity is created. Simple, but slows down writes.
- Asynchronous (recommended): The entity is created, the embedding goes into a queue (BullMQ, RabbitMQ, SQS), and is generated in the background. With rate limiting, parallel processing, and retry logic.
Our production configuration: 3 parallel workers, max 50 jobs/minute rate limit, Redis-based BullMQ queue. This ensures that OpenAI API rate limits never cause errors, while most embeddings are ready within 1 minute.
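That configuration maps directly onto BullMQ's worker options. The queue name, Redis host, and job handler below are illustrative assumptions; concurrency and limiter are real BullMQ WorkerOptions fields.

```typescript
// BullMQ worker options matching the production numbers above.
const embeddingWorkerOptions = {
  connection: { host: "localhost", port: 6379 }, // Redis-backed BullMQ
  concurrency: 3,                                // 3 parallel workers
  limiter: { max: 50, duration: 60_000 },        // max 50 jobs per minute
};
// Used as: new Worker("embedding-generation", processEmbeddingJob, embeddingWorkerOptions)
// where processEmbeddingJob is a hypothetical handler that calls the embedding
// API; retry/backoff is configured per job on queue.add (attempts, backoff).
```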
Security and Data Protection
Tenant Isolation
Every Knowledge Graph query is filtered by providerId. In a multi-tenant system, this is not optional — it's the baseline. Graph traversal, vector search, and RAG context all access only the given provider's data.
Data Minimization in RAG
The RAG pipeline doesn't send the entire Knowledge Graph to the LLM — only the relevant, ranked context, with a token limit. From a GDPR perspective, this is data minimization; from a cost perspective, it's efficiency.
Embeddings and Personal Data
Important to know: an embedding cannot be trivially reversed back to the original text, but the original content is stored alongside it. To comply with GDPR's right to erasure, deleting an entity must remove the node, its content, and the embedding together. Cascade delete (including edges) handles this automatically.
On-Premise Option
For the most sensitive data:
- Embedding: Ollama + nomic-embed-text locally, zero data leaving the network
- Vector search: pgvector on your own PostgreSQL
- Trade-off: lower embedding quality, but full data control
Summary — When Is It Worth Starting?
Knowledge Graph + RAG is worth it if:
- There are multiple data sources (CRM + email + calendar + invoicing)
- User questions are context-dependent ("what do we know about them?" type)
- Data silos are a clear pain point
- Source transparency and auditability matter
It's too early if:
- There's a single, well-structured data source — a simple SQL search suffices
- There aren't 500+ entities — the graph doesn't add value at small data volumes
- Semantic search isn't important — keyword search is sufficient
The Most Important Design Principles
- Abstraction layer above the graph database — PostgreSQL today, Neo4j tomorrow, zero code changes
- Async embedding pipeline — queue + rate limiting + retry
- Token budget management in RAG — we decide what the LLM receives, not the LLM
- Relevance inheritance in graph traversal — decay factor for 2nd-level results
- Source attribution in the UI — the user always knows where the data came from
This article is based on the AIMY project's Knowledge Graph implementation — PostgreSQL + pgvector, OpenAI embeddings, BullMQ pipeline.
If you're considering a similar solution, get in touch with us!