gig-RAG v8 — the UAE insurer corpus
One review surface for the knowledge layer behind GG: every UAE insurer, four product areas, three data layers (policy PDFs, website claims, and review/reputation). It now runs on the Mac Studio: 10,021 chunks (policy PDFs, GIG website pages, and the 1,000-review eKomi set) are embedded locally with nomic-embed-text and queried by a local LLM (qwen3:8b). No OpenAI. v7 and v8 add a LangGraph agent (hybrid dense+BM25 retrieval, MMR rerank, self-check), plus a knowledge graph (51 edges) and a structured fact store; see the adversarial pass.
What good looks like — June 2026
Production knowledge systems in mid-2026 are not single vector stores. The table sets out, per dimension, the bar, the gap, and the specific move to close it. Aligns with Gully's model.
| Dimension | Good (Jun 2026 best practice) | gig-RAG today | The move |
|---|---|---|---|
| Retrieval | Hybrid: dense vectors + BM25 keyword + rerank | ✓ v7: hybrid dense+BM25 fusion + MMR source-diversity rerank, on Studio | Add a cross-encoder reranker next |
| Structure | GraphRAG — entity graph (insurer→line→benefit→clause) for multi-hop "follow the road" | ✓ v8: knowledge graph (graph.json, 51 edges) | Add multi-hop traversal retrieval |
| Facts | Hard facts (limits, prices, contacts) in structured JSONB/SQL, queried not embedded | ✓ v8: facts.json store, prepended authoritative | Move to DuckDB/SQL + add book data |
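| Coverage | All sources: PDFs + live HTML site + reviews, with a refresh cadence | PDFs + 4 GIG HTML pages + eKomi 1,000 indexed; 18 insurers, competitor sites and Google-Maps still missing; no refresh cadence | Crawl the remaining ~56 mapped giggulf.ae URLs + ingest the full eKomi 1,227 set; quarterly re-verify |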
| Grounding | Guardrail layer: every answer cited, refuse when unsupported, eval harness (faithfulness score) | ✓ v7: self-check faithfulness node (qwen3 judge, 0–1 score) + citation-enforced answers | Add refuse-and-retry loop + batch RAGAS eval |
| Host | On-prem / sovereign for regulated data | ✓ On Studio since 2026-06-19: local nomic + qwen3, no external API | Keep build, index, and the planned DuckDB store on the Mac Studio |
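The retrieval row can be sketched in a few lines. This is a minimal illustration, not the code in rag/studio_graph.py. It assumes reciprocal rank fusion for the dense+BM25 merge (the page does not say which fusion v7 uses) and runs MMR with a same-source test as the redundancy term. `rrf_fuse`, `mmr_rerank`, `source_of`, and the `k` and `lam` defaults are hypothetical names and values.

```python
from collections import defaultdict


def rrf_fuse(dense_ranked, bm25_ranked, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over both ranked lists."""
    scores = defaultdict(float)
    for ranked in (dense_ranked, bm25_ranked):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return dict(scores)


def mmr_rerank(scores, source_of, top_n=5, lam=0.5):
    """Greedy MMR: trade relevance against redundancy with chunks already picked.

    Redundancy is 1.0 when a candidate shares a source file with any selected
    chunk and 0.0 otherwise (an assumed stand-in for embedding similarity).
    """
    if not scores:
        return []
    top = max(scores.values())
    relevance = {cid: s / top for cid, s in scores.items()}  # scale to 0..1
    selected, candidates = [], set(relevance)
    while candidates and len(selected) < top_n:
        def mmr_score(cid):
            redundant = any(source_of[cid] == source_of[s] for s in selected)
            return lam * relevance[cid] - (1 - lam) * (1.0 if redundant else 0.0)

        # sorted() keeps tie-breaking deterministic (lowest chunk id wins).
        best = max(sorted(candidates), key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam` close to 1 the rerank follows fused relevance. Lower values push harder for distinct sources, which is the behaviour the source-diversity cap is after. A cross-encoder reranker would slot in between the two functions.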
v8 score vs good — measured
Scored 0–5 per dimension. Retrieval & grounding rows carry measured numbers from a
6-question eval run through the graph on Studio (rag/eval_v7.json): retrieval hit-rate
0.83, answer-correct 0.83, mean faithfulness 1.0.
| Dimension | v8 state | Score |
|---|---|---|
| Retrieval | Hybrid dense+BM25 + MMR rerank (no cross-encoder yet); measured hit-rate 0.83 | 3 / 5 |
| Structure (graph) | v8: knowledge graph built — graph.json, 51 insurer→benefit edges. Not yet multi-hop GraphRAG traversal. | 2 / 5 |
| Facts (JSONB/SQL) | v8: structured fact store — facts.json (10-insurer motor matrix + reviews + ratings), prepended authoritative in generate. Not SQL/DuckDB yet. | 2 / 5 |
| Coverage | PDFs + 4 GIG pages + eKomi; 18 insurers + competitor sites + Google-Maps missing; no refresh | 2 / 5 |
| Grounding | Self-check + citations + eval harness; judge still too lenient (faithful=1.0 even on a wrong answer — qwen3:8b is a weak judge; needs a deterministic check). No refuse-retry. | 3 / 5 |
| Host | On Studio, local nomic+qwen3, no external API — sovereign | 4 / 5 |
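The grounding row asks for a deterministic check alongside the qwen3 judge. One possible shape, sketched under assumptions: treat every numeric token in the answer as a claim and require it to appear verbatim in at least one cited chunk. The function names and the regex are hypothetical. This only catches numeric drift (limits, scores, prices), not wrong prose.

```python
import re

# Matches integers and decimals such as "5", "4.5" or "37,000".
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def unsupported_numbers(answer, cited_chunks):
    """Return numeric tokens in the answer that appear in no cited chunk."""
    evidence = set()
    for chunk in cited_chunks:
        evidence.update(NUMBER.findall(chunk))
    return [tok for tok in NUMBER.findall(answer) if tok not in evidence]


def deterministic_faithful(answer, cited_chunks):
    """Pass only if there is at least one citation and every number is backed."""
    return bool(cited_chunks) and not unsupported_numbers(answer, cited_chunks)
```

A failed check could feed the refuse-and-retry loop listed as the next move, independent of what the LLM judge scores.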
Goal
Build the comprehensive UAE-insurer knowledge base — three things, for every insurer, across four product areas — in a form we can run retrieval, vectorised search and a knowledge graph over:
- Policy information (PDFs) — wordings, T&C, IPID, tables of benefits — motor, medical, travel, home.
- Website claims — the USPs and offers each insurer markets.
- Review & reputation scores — Google + eKomi first (high-N), Trustpilot directional only.
Layered, not flat
Vector-store-only RAG is the limited version. The store is layered: a compliance/guardrail layer, a firm/USP layer, and structured JSONB context profiles a low-grade model can read in a heartbeat. Only some information goes through RAG retrieval; the rest is structured and linked.
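To make "structured, not embedded" concrete, here is a minimal sketch of prepending hard facts ahead of retrieved passages when building the generation context. The record shape (`insurer`, `field`, `value`), the substring matching rule, and the `build_context` name are assumptions for illustration; the real facts.json layout may differ.

```python
def build_context(question, facts, chunks):
    """Prepend matching structured facts ahead of retrieved passages."""
    q = question.lower()
    # Hypothetical matching rule: both insurer and field named in the question.
    matched = [
        f for f in facts
        if f["insurer"].lower() in q and f["field"].lower() in q
    ]
    lines = ["AUTHORITATIVE FACTS (prefer over retrieved passages):"]
    if matched:
        lines += [f'- {f["insurer"]} | {f["field"]}: {f["value"]}' for f in matched]
    else:
        lines.append("- none matched")
    lines.append("RETRIEVED PASSAGES:")
    lines += [f"[{i}] {text}" for i, text in enumerate(chunks, start=1)]
    return "\n".join(lines)
```

Keeping facts in a separate block means a limit or score is looked up rather than recalled from an embedding neighbour, and the model is told which block wins on conflict.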
What's in it
Three source corpora feed one index today: MotorCompare (competitor motor, 13 entities, 1,010 chunks), MedBench (competitor medical, 6 insurers, 1,198 chunks), and the GIG SiteCorpus (122 of GIG's own PDFs across all lines incl. travel, 7,597 chunks). The new UAE insurer registry extends this to the full market and adds the website-claims and reviews layers.
UAE insurer registry
The spine: 29 entities × 4 product areas × 3 layers, each status-coded as V (verified), U (secondary), Q (quote-only), M (missing), or pending. Sample of the priority set:
| Insurer | Type | L1 PDFs | L2 USP | L3 Reviews |
|---|---|---|---|---|
| GIG Gulf | national | V | pending | V |
| Sukoon | national | V | pending | V |
| Liva | national | V | pending | U |
| Orient | national | U | pending | U |
| ADNIC | national | Q | pending | U |
| Dubai Insurance | national | V | pending | M |
| Now Health | foreign br. | V | pending | V |
| Daman | national | Q | pending | U |
| Emirates / Watania / Fidelity | various | pending | pending | U |
| + 20 more (Salama, RAK, Takaful Emarat, MetLife, Cigna, Bupa…) | — | pending | pending | M / pending |
Full machine-readable registry: UAECorpus/registry.json. Harness for the walled targets is
Browserbase + IPRoyal UAE residential proxy (both verified live); Firecrawl for the rest.
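A sketch of how the status codes could be tallied from the registry. The record shape below is hypothetical (the actual UAECorpus/registry.json schema is not shown on this page); the six sample rows copy the first six lines of the table above.

```python
from collections import Counter

# Hypothetical record shape; statuses copied from the sample table.
SAMPLE = [
    {"insurer": "GIG Gulf", "layers": {"L1": "V", "L2": "pending", "L3": "V"}},
    {"insurer": "Sukoon", "layers": {"L1": "V", "L2": "pending", "L3": "V"}},
    {"insurer": "Liva", "layers": {"L1": "V", "L2": "pending", "L3": "U"}},
    {"insurer": "Orient", "layers": {"L1": "U", "L2": "pending", "L3": "U"}},
    {"insurer": "ADNIC", "layers": {"L1": "Q", "L2": "pending", "L3": "U"}},
    {"insurer": "Dubai Insurance", "layers": {"L1": "V", "L2": "pending", "L3": "M"}},
]


def status_counts(registry, layer):
    """Count how many insurers sit at each status code for one layer."""
    return Counter(entry["layers"][layer] for entry in registry)


def needs_work(registry, layer, done=("V",)):
    """List insurers whose status for the layer is not yet in the done set."""
    return [e["insurer"] for e in registry if e["layers"][layer] not in done]
```

`needs_work(SAMPLE, "L3")` gives the queue for the next reviews pass; widening `done` to `("V", "U")` would accept secondary sources as good enough.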
Reviews & reputation (Layer 3)
First competitor pass (Trustpilot, directional) already surfaces a real signal — GIG's consumer reputation leads the set:
| Insurer | Trustpilot | Reviews | Financial strength | Read |
|---|---|---|---|---|
| GIG Gulf | 4.0 | 169 | — | Replies to 100% of negatives; 830k+ customers. Strongest in set. |
| Sukoon | 1.5 | 65 | — | Service reviews scathing. Big reputation gap vs GIG. |
| Now Health | 4.0 | 821 | — | Strong expat-medical reputation, high volume. |
| Policybazaar (aggr.) | 4.0 | 312 | — | Highest review volume in the space. |
| Orient | 3.0 | 1 | AM Best A+ / S&P A | Consumer N=1; financial strength is its real signal. |
| ADNIC | 3.0 | 1 | S&P A stable | Same — strength over sentiment. |
| Emirates Insurance | — | — | AM Best A- | No consumer footprint; strength is the signal. |
| Watania | — | — | AM Best B (under review −) | Weakest strength rating in set. |
| Fidelity United | 2.9 | 2 | — | Low-N; claims-delay complaints. |
Dubai Insurance, Union, Salama and RAK returned no clean consumer score and are flagged M
for a Google-Maps pass via Browserbase. Data: UAECorpus/L3_reviews.json.
How it's stored & where it runs
Pipeline: source → pdftotext → chunks.jsonl → index.npy → query.py, plus a DuckDB store for the review layer and a graph layer linking insurer → product → benefit. Flat-file and portable.
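The insurer → product → benefit linking is what multi-hop traversal would walk. A minimal sketch, assuming edges are stored as (source, relation, target) triples; that triple format, the relation names, and the function names are hypothetical and not taken from graph.json.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Index (source, relation, target) triples by source node."""
    adjacency = defaultdict(list)
    for src, rel, dst in edges:
        adjacency[src].append((rel, dst))
    return adjacency


def multi_hop(adjacency, start, max_hops=2):
    """Breadth-first walk: every path of 1..max_hops edges leaving start.

    Each path is a flat list [node, relation, node, relation, node, ...].
    """
    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        if (len(path) - 1) // 2 == max_hops:
            continue  # hop budget used up on this branch
        for rel, nxt in adjacency.get(path[-1], []):
            if nxt in path[::2]:
                continue  # skip nodes already on this path (no cycles)
            extended = path + [rel, nxt]
            paths.append(extended)
            queue.append(extended)
    return paths
```

Each returned path can be rendered as one line of context, so a question about a benefit can be answered by following the edge from the insurer rather than hoping the right chunk ranks first.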
The index: local index_local.npy (768-dim nomic) on Studio + the original
OpenAI index.npy (1536-dim) in vault. The legacy flat path mis-ranked; v7 fixes retrieval.
StateGraph: retrieve (hybrid dense+BM25) → MMR rerank
(source-diversity cap) → qwen3 grounded generate → self-check faithfulness node (0–1).
Proven: "free home medication delivery + eKomi score" returned "Not in corpus"
under v6 flat retrieval; under v7 it answers "yes — free home medicine delivery; eKomi 4.5/5",
cited, faithfulness 1.0. File: rag/studio_graph.py. No JSONB knowledge graph yet — next.
Benchmark — completeness vs 100%
Measured against the full intended universe: 29 UAE insurers × 4 product areas × 3 data layers + pricing. The manifest is scored against that complete universe, not against the internal completeness of two corpora.
| Dimension | Captured | Target (100%) | Coverage |
|---|---|---|---|
| Insurers in registry | 29 | ~29 UAE personal-lines | ~100% |
| Policy PDFs (L1) | 11 insurers w/ docs · 137 files (122 GIG-own + 6 motor + 9 medical) | 29 insurers × 4 lines | 38% of insurers |
| Website USPs (L2) | 1 insurer (GIG, 4 pages) | 29 insurers | 3% |
| Reviews / reputation (L3) | 11 insurers · GIG eKomi 37k + Google 4.5/900 | 29 insurers (Google Maps high-N) | 38% |
| Pricing / premiums | 0 verified (quote-walled) + indicative aggregator floors | 29 insurers × lines | 0% |
| RAG index (built) | 10,021 chunks · 768-dim · Studio-local | — | live ✓ |
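The coverage column is simple arithmetic over the insurer count, sketched here so the percentages can be regenerated when the registry changes. `coverage_pct` is a hypothetical helper; the inputs are the figures in the table above.

```python
TARGET_INSURERS = 29  # full intended universe from the registry


def coverage_pct(captured, target=TARGET_INSURERS):
    """Whole-number percentage of the target universe that is captured."""
    return round(100 * captured / target)


# Figures taken from the benchmark table.
l1_policy_pdfs = coverage_pct(11)    # insurers with policy documents
l2_website_usps = coverage_pct(1)    # GIG only
l3_reviews = coverage_pct(11)        # insurers with a review signal
pricing = coverage_pct(0)            # nothing verified yet
pdf_files = 122 + 6 + 9              # GIG-own + motor + medical
```

Note that the L1 and L3 figures count insurers with any data, not insurers × lines, so they overstate coverage against the stricter 29 × 4 target.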
What it can be used for
- GG chatbot — grounded internal Q&A over GIG's own wordings (truth layer under the conversational surface).
- Competitor battlecards — verified, cited cover + reputation comparisons (GIG 4.0 vs Sukoon 1.5).
- Defensible claim engine — every USP checked against policy text before it ships.
- Review intelligence — testimonial harvest, service-recovery queue, SEO review schema, VoC copy (from the 1,227 eKomi set → 37k plan).
- Targeting — match GIG strengths to segments (blocked on book data).
- Repeatable engine — same ingest/extract/embed/graph re-runs for any line, market or competitor.
Kendall — adversarial pass
The doc described the plan as if it were the build
REJECT the "done" framing. Verdict: conditional pass as an internal provenance tracker; reject as a description of a built system. This page now states present-tense truth.
Stated but unproven (each measured against the files)
| Claim | Reality on disk |
|---|---|
| "layered RAG + vector + knowledge-graph" | Flat vector only. 9,805 chunks, one model, cosine. Zero graph / JSONB / guardrail files. The exact "limited" version Gully named. |
| "runs on the Mac Studio" | ✓ RESOLVED 2026-06-19. 10,021 chunks embedded locally (nomic) + queried by qwen3:8b on Studio. No OpenAI. |
| "29 insurers tracked" | Still 11 with data; L2 now started (GIG site in). The other 18 remain pending. |
| "all review sources" | Partly resolved. eKomi 1,000 now in the index; site shows eKomi 4.5/37k. Google-Maps per-insurer (Browserbase) still pending. |
| "full website in the RAG" | Partly resolved. 4 GIG product pages (car/health/travel/home) now ingested; ~56 more URLs mapped, not yet scraped. |
What's missing
PDF coverage for ~18 of 29 insurers · the entire L2 website-claims layer · the graph + JSONB layers Gully specified · eKomi ingestion · GIG book/pricing data (the gate) · GCC markets · and until this version the index wasn't even linked from the site.