Semantic Search · Embedding · pgvector · RAG · Knowledge Graph · AI Architecture

Semantic Search and Embeddings — Why Keyword Search Falls Short in the AI Era

Ádám Zsolt & AIMY
5 min read

When Search Doesn't Understand What We're Looking For

Imagine: an AI assistant is asked — "When was Kiss Anna's last visit?"

Traditional keyword search returns nothing. The word "last" doesn't appear in any calendar entry. Semantic search, however, understands the meaning: it finds Kiss Anna's most recent calendar entry, because it recognizes that "last visit" = most recent appointment.

This is the difference between word-matching and meaning-matching — and it's the foundation of every modern AI-powered search architecture.


What Is an Embedding?

An embedding transforms text into a numerical vector — typically with 256 to 3,072 dimensions. The key idea: semantically similar texts produce vectors that lie close together, while unrelated texts produce vectors that are far apart.

This is essentially how machines "understand" meaning — they don't compare letters, they compare concepts.

"When was Kiss Anna's last visit?"     →   [0.23, -0.41, 0.87, ...]   ←─┐
                                                                          │ close!
"Kiss Anna's most recent appointment"  →   [0.25, -0.39, 0.85, ...]   ←─┘

"Marketing budget 2026"                →   [0.71, 0.12, -0.33, ...]   ← far away

The two similar questions produce nearly identical vectors, while the unrelated text's vector is far away. Similarity is measured using cosine similarity, which ranges from -1 to 1: values near 1 mean the texts are semantically very similar, values near 0 mean there is no meaningful relationship.
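As a sanity check, cosine similarity can be computed in a few lines of plain Python. The three-dimensional vectors below are just the illustrative prefixes from the diagram above, not real embeddings:

```python
from math import sqrt

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 = same direction, 0.0 = orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

q1 = [0.23, -0.41, 0.87]      # "When was Kiss Anna's last visit?"
q2 = [0.25, -0.39, 0.85]      # "Kiss Anna's most recent appointment"
other = [0.71, 0.12, -0.33]   # "Marketing budget 2026"

print(cosine_similarity(q1, q2))     # close to 1: nearly identical meaning
print(cosine_similarity(q1, other))  # low: unrelated content
```

Real embedding models return the same kind of vector, just with hundreds or thousands of dimensions — the math is identical.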


Why Isn't Vector Search Enough on Its Own?

Semantic search finds similar content — but business questions often require relationships:

  • "When was Kiss Anna's last visit, and what did we do?" → Need the calendar event + client data + notes
  • "How much did she spend in March?" → Need the client + invoices + bookings

This is where a knowledge graph comes in: business entities (email, calendar, client, invoice) are represented as nodes, the relationships between them as edges — and search uses both.

| Search Approach | What Does It Find? | Answers Business Questions? |
| --- | --- | --- |
| Keyword (LIKE, tsvector) | Exact word matches | Rarely |
| Vector (embedding) | Semantically similar content | Partially |
| Vector + Knowledge Graph | Similar + related entities | Yes, with context |

pgvector — Vector Search in Your Existing PostgreSQL

No need for a separate vector database (Pinecone, Qdrant). If you already have PostgreSQL, the pgvector extension is free and can be enabled with a single CREATE EXTENSION vector; statement — keeping vectors in the same database as your business data.

This means: a single SQL query can perform vector search + graph traversal + tenant filtering, with zero network latency between systems.
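In pgvector, the <=> operator computes cosine distance, so the combined query is roughly "WHERE provider_id = ... ORDER BY embedding <=> query LIMIT k". Here is a pure-Python simulation of what that tenant-filtered ranking does, with hypothetical table fields (the SQL orders by distance ascending; the sketch ranks by similarity descending, which is equivalent):

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Hypothetical rows as they might sit in a single Postgres table:
# (id, provider_id, kind, embedding)
rows = [
    ("evt-1", "tenant-a", "calendar_event", [0.25, -0.39, 0.85]),
    ("evt-2", "tenant-b", "calendar_event", [0.24, -0.40, 0.86]),  # other tenant
    ("inv-1", "tenant-a", "invoice",        [0.71,  0.12, -0.33]),
]

query = [0.23, -0.41, 0.87]

# Tenant filter + similarity ranking in one pass, as the SQL would do it:
hits = sorted(
    (r for r in rows if r[1] == "tenant-a"),   # WHERE provider_id = 'tenant-a'
    key=lambda r: cosine(query, r[3]),          # ORDER BY embedding <=> query
    reverse=True,
)[:2]                                           # LIMIT 2

print([h[0] for h in hits])  # tenant-b rows never appear
```

Because the filter and the ranking run in the same engine, there is no second round-trip to a separate vector store — which is exactly the zero-latency point above.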

| Criterion | pgvector | Dedicated Vector DB |
| --- | --- | --- |
| Cost | $0 (extension) | $25-70+/mo |
| SQL integration | Native (JOIN, WHERE) | None |
| Multi-tenant filtering | WHERE provider_id = | Namespace / payload |
| Knowledge graph | Same database | Separate system needed |
| Scalability | Excellent up to ~5M vectors | 100M+ |

At enterprise scale (100M+ vectors), dedicated solutions win — but for most SMEs and mid-market SaaS, pgvector is more than sufficient, and significantly simpler to operate.


The RAG Pipeline in Brief — 5 Steps

The Retrieval-Augmented Generation (RAG) pipeline connects search to the LLM:

  1. Input validation — Short messages (1-2 characters) carry no semantic content; filter them out
  2. Vector search — Compare the question's embedding against database vectors (cosine similarity > 0.60, top-8)
  3. Graph enrichment — Load the top-3 results' neighbors (1-hop neighbors, 0.8 decay factor)
  4. Deduplication + token budget — Unique nodes, ranked by relevance, within a 3000-token limit
  5. Format + inject — Markdown context, grouped by type, into the LLM system message
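The five steps above can be sketched in compact Python. Here, embed, search, and neighbors are hypothetical stand-ins for the real embedding model, pgvector query, and graph lookup; the thresholds mirror the numbers in the list:

```python
def rag_context(question, embed, search, neighbors, budget=3000):
    """Sketch of the 5-step RAG pipeline; returns a markdown context string."""
    # 1. Input validation: 1-2 character messages carry no semantics
    if len(question.strip()) < 3:
        return ""
    # 2. Vector search: top-8 candidates above the 0.60 similarity threshold
    hits = [h for h in search(embed(question), k=8) if h["score"] > 0.60]
    # 3. Graph enrichment: 1-hop neighbors of the top-3, score decayed by 0.8
    enriched = list(hits)
    for hit in hits[:3]:
        for n in neighbors(hit["id"]):
            enriched.append({**n, "score": hit["score"] * 0.8})
    # 4. Deduplication + token budget: keep the best score per node, stay under the limit
    best = {}
    for node in enriched:
        if node["id"] not in best or node["score"] > best[node["id"]]["score"]:
            best[node["id"]] = node
    picked, used = [], 0
    for node in sorted(best.values(), key=lambda n: n["score"], reverse=True):
        cost = len(node["text"]) // 4 + 1  # rough chars-per-token estimate
        if used + cost > budget:
            break
        picked.append(node)
        used += cost
    # 5. Format: markdown lines ready to inject into the LLM system message
    return "\n".join(f"- [{n['type']}] {n['text']}" for n in picked)
```

The helpers do the heavy lifting in production; the point of the sketch is that the orchestration itself is only a few dozen lines.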

The result: the LLM doesn't hallucinate — it responds based on real business data, with source attribution.


3 Practical Takeaways

1. Start Simple

pgvector + OpenAI text-embedding-3-small + cosine search — this works in 30 minutes and is sufficient for most SME use cases. Don't over-engineer!

2. Don't Chunk What's Already a Natural Unit

Emails, calendar events, and client profiles are natural units — no need to split them into 500-token chunks. The entity-based approach (1 business object = 1 node + 1 embedding) is simpler and produces better results.

3. Graph Enrichment Is the Real Differentiator

For "Who?", "When?", and "How much?" questions, vector search alone is weak — loading graph neighbors brings a dramatic quality improvement at minimal additional cost.


Want to Go Deeper?

This article is a condensed version of our Semantic Search and Embedding Strategies — Whitepaper. The full whitepaper covers 15 chapters in detail: embedding model comparison, pgvector indexing (IVFFlat vs. HNSW), BullMQ async pipeline, hybrid search (RRF), re-ranking, RAGAS evaluation, GraphRAG, production monitoring, and embedding drift management.


Want to implement semantic search in your own system? The Atlosz Interactive team has production experience with pgvector, knowledge graph, and RAG pipeline architecture. Get in touch for a free technical consultation!