GIG Gulf · GG knowledge layer · Articulate

gig-RAG v8 — the UAE insurer corpus

One review surface for the knowledge layer behind GG: every UAE insurer, four product areas, three data layers (policy PDFs, website claims, review/reputation). It now runs on the Mac Studio: 10,021 chunks (policy PDFs, GIG website pages and the 1,000-review eKomi set) embedded locally with nomic-embed-text and queried by a local LLM (qwen3:8b). No OpenAI. v7 added a LangGraph agent (hybrid dense+BM25 retrieval, MMR rerank, self-check); v8 adds a knowledge graph (51 edges) and a structured fact store. See the adversarial pass for what is still unproven.

What good looks like — June 2026

Production knowledge systems in mid-2026 are not single vector stores. For each dimension: the bar, where gig-RAG stands, and the specific move to close the gap. The framing follows Gully's model.

Dimension	Good (Jun 2026 best practice)	gig-RAG today	The move
Retrieval	Hybrid: dense vectors + BM25 keyword + rerank	✓ v7: hybrid dense+BM25 fusion + MMR source-diversity rerank, on Studio	Add a cross-encoder reranker next
Structure	GraphRAG — entity graph (insurer→line→benefit→clause) for multi-hop "follow the road"	✓ v8: knowledge graph (graph.json, 51 edges)	Add multi-hop traversal retrieval
Facts	Hard facts (limits, prices, contacts) in structured JSONB/SQL, queried not embedded	✓ v8: facts.json store, prepended to the prompt as authoritative	Move to DuckDB/SQL + add book data
Coverage	All sources: PDFs + live HTML site + reviews, with a refresh cadence	PDFs + 4 GIG product pages + 1,000 eKomi reviews; ~56 more GIG URLs mapped, not scraped; no refresh	Scrape the rest of giggulf.ae + competitor sites + Google Maps; ingest the remaining eKomi reviews; quarterly re-verify
Grounding	Guardrail layer: every answer cited, refuse when unsupported, eval harness (faithfulness score)	✓ v7: self-check faithfulness node (qwen3 judge, 0–1 score) + citation-enforced answers	Add refuse-and-retry loop + batch RAGAS eval
Host	On-prem / sovereign for regulated data	✓ Build + index on Studio since 2026-06-19; local nomic + qwen3, no external API	Stand up the planned DuckDB fact store on Studio too
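The Structure move (multi-hop traversal retrieval) could start as a depth-bounded breadth-first walk that turns graph paths into citable context lines. A minimal sketch, assuming graph.json can be loaded as (source, relation, target) triples; the node and relation names are toy placeholders following the insurer→line→benefit→clause chain in the Structure row, not the current 51-edge graph.

```python
from collections import defaultdict, deque

# Toy triples in an ASSUMED (source, relation, target) shape; the real graph.json
# schema is not shown in this doc, and these node names are placeholders.
EDGES = [
    ("insurer:A", "writes", "line:A-motor"),
    ("line:A-motor", "includes", "benefit:back-to-invoice"),
    ("benefit:back-to-invoice", "defined_in", "clause:A-motor-4.2"),
    ("insurer:B", "writes", "line:B-motor"),
]


def build_adjacency(edges):
    """Map each node to its outgoing (relation, target) pairs."""
    adj = defaultdict(list)
    for src, rel, dst in edges:
        adj[src].append((rel, dst))
    return adj


def multi_hop(adj, start, max_hops):
    """Breadth-first walk from `start`; returns every path of 1..max_hops edges."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) == max_hops:
            continue  # depth limit reached; also keeps cycles from looping forever
        for rel, dst in adj.get(node, []):
            new_path = path + [(node, rel, dst)]
            paths.append(new_path)
            queue.append((dst, new_path))
    return paths


def render(path):
    """Render a path as 'a -rel-> b -rel-> c' for use as a context line."""
    out = path[0][0]
    for _, rel, dst in path:
        out += f" -{rel}-> {dst}"
    return out


adj = build_adjacency(EDGES)
for p in multi_hop(adj, "insurer:A", max_hops=3):
    print(render(p))
```

Bounding by hop count rather than tracking visited nodes keeps every distinct path up to the limit, which is what a "follow the road" answer wants to cite.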

In one line: good = hybrid retrieval + graph + structured facts + full-source coverage + grounding evals, run on-prem. The six moves above are the path, tracked rather than asserted.

v8 score vs good — measured

Scored 0–5 per dimension. Retrieval & grounding rows carry measured numbers from a 6-question eval run through the graph on Studio (rag/eval_v7.json): retrieval hit-rate 0.83, answer-correct 0.83, mean faithfulness 1.0.

Dimension	v8 state	Score
Retrieval	Hybrid dense+BM25 + MMR rerank (no cross-encoder yet); measured hit-rate 0.83	3 / 5
Structure (graph)	v8: knowledge graph built — `graph.json`, 51 insurer→benefit edges. Not yet multi-hop GraphRAG traversal.	2 / 5
Facts (JSONB/SQL)	v8: structured fact store, `facts.json` (10-insurer motor matrix + reviews + ratings), prepended as authoritative context in the generate step. Not SQL/DuckDB yet.	2 / 5
Coverage	PDFs + 4 GIG pages + eKomi; 18 insurers + competitor sites + Google-Maps missing; no refresh	2 / 5
Grounding	Self-check + citations + eval harness; judge still too lenient (faithful=1.0 even on a wrong answer — qwen3:8b is a weak judge; needs a deterministic check). No refuse-retry.	3 / 5
Host	On Studio, local nomic+qwen3, no external API — sovereign	4 / 5
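With six questions, each metric moves in steps of 1/6 ≈ 0.17, so 0.83 means exactly five of six. A sketch of how the three measured numbers fall out of per-question records, assuming field names that rag/eval_v7.json may not actually use; it also flags the case the Grounding row describes, a high faithfulness score on a wrong answer. The records are toy data, not the real eval.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EvalRecord:
    # ASSUMED fields; the real layout of rag/eval_v7.json is not shown here.
    question: str
    hit: bool            # did retrieval surface the expected source?
    correct: bool        # did the answer match the expected answer?
    faithfulness: float  # qwen3 judge score, 0-1


def summarise(records):
    n = len(records)
    return {
        "hit_rate": sum(r.hit for r in records) / n,
        "answer_correct": sum(r.correct for r in records) / n,
        "mean_faithfulness": mean(r.faithfulness for r in records),
        # Judge said "faithful" but the answer was wrong: the leniency noted in the Grounding row.
        "suspect_judgements": [r.question for r in records if not r.correct and r.faithfulness >= 0.9],
    }


# Toy records (not the real eval): six questions, one retrieval miss, one wrong answer.
toy = [EvalRecord(f"q{i}", hit=(i != 2), correct=(i != 4), faithfulness=1.0) for i in range(1, 7)]
print(summarise(toy))
```

At this sample size a single question shifts any metric by about 17 points, which is one reason fact-specific questions are listed as a next lift.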

Overall: 16 / 30 ≈ 53% of the June-2026 "good" bar (v8), up from 40% (v7) and ~15% (v1). v8 added the graph and facts layers, lifting Structure and Facts off 0. Honestly, though: the 6-question eval stayed at 0.83/0.83/1.0. It does not isolate the new layers, and the faithfulness judge is still weak. Next lifts: a deterministic grounding check, fact-specific eval questions, and the 18-insurer coverage scrape.
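One candidate for the deterministic grounding check: require every figure in an answer (months, limits, scores, counts) to appear in at least one cited chunk. A sketch under that assumption; it catches a wrong number that a lenient judge passes, but says nothing about paraphrased claims, so it would complement the qwen3 judge rather than replace it. The "[n]" citation format and the chunk text are assumptions and toy strings.

```python
import re

CITATION = re.compile(r"\[\d+\]")           # strip "[1]"-style markers before checking
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")  # matches 24, 1,227, 4.5 ...


def numbers_in(text):
    """Numeric tokens with thousands separators removed ('1,227' -> '1227')."""
    return {m.group().replace(",", "") for m in NUMBER.finditer(CITATION.sub("", text))}


def grounding_check(answer, cited_chunks):
    """Deterministic check: every number in the answer must appear in some cited chunk.

    Returns (passed, unsupported_numbers). It judges figures only, not paraphrase.
    """
    supported = set().union(*(numbers_in(c) for c in cited_chunks))
    unsupported = sorted(numbers_in(answer) - supported)
    return (not unsupported, unsupported)


chunk = "Toy text: back-to-invoice cover applies for 24 months from first registration."
print(grounding_check("Back-to-invoice lasts 24 months [1].", [chunk]))  # (True, [])
print(grounding_check("Back-to-invoice lasts 36 months [1].", [chunk]))  # (False, ['36'])
```

A failed check could feed the planned refuse-and-retry loop: regenerate once, then fall back to "Not in corpus".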

Goal

Build the comprehensive UAE-insurer knowledge base — three things, for every insurer, across four product areas — in a form we can run retrieval, vectorised search and a knowledge graph over:

Policy information (PDFs) — wordings, T&C, IPID, tables of benefits — motor, medical, travel, home.
Website claims — the USPs and offers each insurer markets.
Review & reputation scores — Google + eKomi first (high-N), Trustpilot directional only.

Success: every corpus-derived claim on a client deliverable traces to a row here; each named gap carries an owner and a next action; and the whole store runs on-prem on the Studio — sovereign, data never leaves.

Operating principle

Layered, not flat

Vector-store-only RAG is the limited version. The store is layered: a compliance/guardrail layer, a firm/USP layer, and structured JSONB context profiles that even a small model can read at a glance. Only some information goes through RAG; the rest is structured and linked.
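A sketch of the layered idea at prompt time, assuming a plain dict of facts and chunk dicts with `source` and `text` keys (both shapes hypothetical): the structured layer goes first and is marked authoritative, as v8 does with facts.json, and retrieved chunks follow as numbered, citable context. The "Not in corpus" refusal string is the one quoted in the v7 test; the rest of the prompt wording is illustrative.

```python
import json


def build_prompt(question, facts, chunks, max_chunks=5):
    """Layered prompt: structured facts first (authoritative), then cited RAG chunks.

    `facts` is a plain dict and each chunk a {"source": ..., "text": ...} dict;
    both shapes are assumptions for this sketch.
    """
    fact_block = json.dumps(facts, indent=2, sort_keys=True)
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks[:max_chunks], start=1)
    )
    return (
        "AUTHORITATIVE FACTS (prefer these over retrieved text if they conflict):\n"
        f"{fact_block}\n\n"
        "RETRIEVED CONTEXT (cite as [n]):\n"
        f"{context}\n\n"
        f"QUESTION: {question}\n"
        "Answer only from the material above. If it is not supported, reply 'Not in corpus'."
    )


# Toy inputs: placeholder values, not corpus data.
prompt = build_prompt(
    "Does ExampleCo motor include agency repair?",
    {"insurer": "ExampleCo", "line": "motor", "agency_repair_years": 3},
    [{"source": "exampleco_motor.pdf", "text": "Agency repair applies for the first 3 years."}],
)
print(prompt)
```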

What's in it

Insurers tracked

29

UAE insurers + distributors across motor, medical, travel, home.

RAG chunks (Studio-local)

10,021

PDFs + GIG site + eKomi · nomic-embed 768-dim · on Studio.

GIG reviews already audited

1,227

eKomi export (9 Mar–31 May 2026), labelled + analysed.

Portfolio completion

≈20%

Of the full 29-insurer universe; the earlier ~55% covered only the original 3 corpora. Motor strongest, medical/travel thin; see benchmark.

Three source corpora account for 9,805 of the 10,021 chunks: MotorCompare (competitor motor, 13 entities, 1,010 chunks), MedBench (competitor medical, 6 insurers, 1,198 chunks) and the GIG SiteCorpus (122 of GIG's own PDFs across all lines incl. travel, 7,597 chunks). The remaining 216 chunks come from the GIG website pages and the eKomi set. The new UAE insurer registry extends this to the full market and adds the website-claims and reviews layers.

UAE insurer registry

The spine: 29 entities × 4 product areas × 3 layers, each cell status-coded V = verified, U = secondary, Q = quote-only, M = missing, pending = not yet collected. Sample of the priority set:

Insurer	Type	L1 PDFs	L2 USP	L3 Reviews
GIG Gulf	national	V	pending	V
Sukoon	national	V	pending	V
Liva	national	V	pending	U
Orient	national	U	pending	U
ADNIC	national	Q	pending	U
Dubai Insurance	national	V	pending	M
Now Health	foreign br.	V	pending	V
Daman	national	Q	pending	U
Emirates / Watania / Fidelity	various	pending	pending	U
+ 20 more (Salama, RAK, Takaful Emarat, MetLife, Cigna, Bupa…)	—	pending	pending	M / pending

Full machine-readable registry: UAECorpus/registry.json. Harness for the walled targets is Browserbase + IPRoyal UAE residential proxy (both verified live); Firecrawl for the rest.
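Turning the registry into the gap list the success criterion asks for (each gap with an owner and next action) starts with a scan over status codes. A sketch, assuming hypothetical field names (`insurer`, `L1_pdfs`, `L2_usp`, `L3_reviews`) and assuming only V and U count as captured; the three rows mirror the sample table above.

```python
import json

LAYERS = ("L1_pdfs", "L2_usp", "L3_reviews")
# ASSUMPTION: verified or secondary-sourced counts as captured; Q, M and pending stay open.
CAPTURED = {"V", "U"}


def open_gaps(registry):
    """Return (insurer, layer, status) for every registry cell not yet captured."""
    gaps = []
    for entry in registry:
        for layer in LAYERS:
            status = entry.get(layer, "pending")
            if status not in CAPTURED:
                gaps.append((entry["insurer"], layer, status))
    return gaps


# Rows from the sample table; field names are hypothetical. With the real file this
# would be json.load(open("UAECorpus/registry.json")), schema permitting.
registry = json.loads("""[
  {"insurer": "GIG Gulf", "L1_pdfs": "V", "L2_usp": "pending", "L3_reviews": "V"},
  {"insurer": "ADNIC", "L1_pdfs": "Q", "L2_usp": "pending", "L3_reviews": "U"},
  {"insurer": "Dubai Insurance", "L1_pdfs": "V", "L2_usp": "pending", "L3_reviews": "M"}
]""")

for gap in open_gaps(registry):
    print(gap)
```

Each tuple is the natural key to attach an owner and next action to.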

Reviews & reputation (Layer 3)

Source-of-truth correction: reputation is built from eKomi (GIG's own platform — 1,227 reviews already in the vault) and Google Maps (high-N per insurer). Trustpilot is low-N for UAE insurers and is directional only.

First competitor pass (Trustpilot, directional) already surfaces a real signal: GIG's 4.0 ties for the top score in the set (with Now Health and Policybazaar) and sits well clear of Sukoon's 1.5:

Insurer	Trustpilot	Reviews	Financial strength	Read
GIG Gulf	4.0	169	—	Replies to 100% of negatives; 830k+ customers. Joint-top score among insurers (with Now Health).
Sukoon	1.5	65	—	Service reviews scathing. Big reputation gap vs GIG.
Now Health	4.0	821	—	Strong expat-medical reputation, high volume.
Policybazaar (aggr.)	4.0	312	—	Aggregator, not an insurer; second-highest review volume in the set (after Now Health).
Orient	3.0	1	AM Best A+ / S&P A	Consumer N=1; financial strength is its real signal.
ADNIC	3.0	1	S&P A stable	Same — strength over sentiment.
Emirates Insurance	—	—	AM Best A-	No consumer footprint; strength is the signal.
Watania	—	—	AM Best B (under review −)	Weakest strength rating in set.
Fidelity United	2.9	2	—	Low-N; claims-delay complaints.

Dubai Insurance, Union, Salama and RAK returned no clean consumer score and are flagged M for a Google-Maps pass via Browserbase. Data: UAECorpus/L3_reviews.json.

How it's stored & where it runs

Pipeline: source → pdftotext → chunks.jsonl → index.npy → query.py, plus a structured fact store (facts.json today; DuckDB planned) and a graph layer (graph.json) linking insurer → product → benefit. Flat-file and portable.
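The flat-file path can be sketched end to end in a few functions. This assumes text has already come out of pdftotext, uses illustrative chunk sizes, and swaps in a hashed bag-of-words `toy_embed` so the example runs without Ollama; in the real pipeline nomic-embed-text produces the 768-dim vectors, and query.py's actual code is not shown here.

```python
import hashlib

import numpy as np


def chunk_words(text, size=200, overlap=40):
    """Overlapping word windows; size and overlap are illustrative, not the pipeline's settings."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def toy_embed(text, dim=256):
    """Stand-in for nomic-embed-text: hashed bag-of-words, L2-normalised."""
    v = np.zeros(dim)
    for w in text.lower().split():
        v[int(hashlib.md5(w.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v


def build_index(chunks, embed):
    """One unit-norm row per chunk; this matrix is what would be saved as index.npy."""
    return np.vstack([embed(c) for c in chunks])


def query(index, chunks, question, embed, k=3):
    """Cosine top-k: rows and query are unit-norm, so a dot product is the cosine."""
    scores = index @ embed(question)
    top = np.argsort(-scores)[:k]
    return [(chunks[i], float(scores[i])) for i in top]


chunks = ["motor agency repair cover", "travel medical evacuation", "home contents theft"]
index = build_index(chunks, toy_embed)
print(query(index, chunks, "agency repair for motor", toy_embed, k=1))
```

Saving the matrix with `np.save("index.npy", index)` gives the flat-file index. The query must go through the same embedder as the index: a 768-dim nomic query against the 1536-dim OpenAI index.npy fails on the matrix shape before it can return anything.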

Now running on the Mac Studio (M4 Max; host studio / 192.168.1.246 / tailnet 100.82.41.69). Executed 2026-06-19: corpus and scripts pushed to Studio; 10,021 chunks re-embedded locally with nomic-embed-text via Ollama; queries answered locally by qwen3:8b. Proven end-to-end, e.g. "GIG Prestige back-to-invoice = 24 months, market max", with citation. On-prem and sovereign: no external API, and data never leaves GIG hardware.

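Two indexes exist: index_local.npy (768-dim, nomic) on Studio, and the original OpenAI index.npy (1536-dim) in the vault. They are not interchangeable, because a query must be embedded by the same model as the index it searches. The legacy flat cosine path mis-ranked; v7's hybrid retrieval fixes that.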
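One way the hybrid dense+BM25 fusion could work, assuming reciprocal rank fusion (the doc does not name the fusion method): score chunks with BM25, rank them, and merge that ranking with the dense cosine ranking. k1 = 1.5, b = 0.75 and the RRF constant k = 60 are common defaults, assumed here; the dense scores in the example are toy numbers.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 (Lucene-style idf) for one query against each doc; whitespace tokens, lower-cased."""
    toks = [d.lower().split() for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(w for t in toks for w in set(t))  # document frequency per term
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for w in query.lower().split():
            if tf[w] == 0:
                continue
            idf = math.log(1 + (n - df[w] + 0.5) / (df[w] + 0.5))
            s += idf * tf[w] * (k1 + 1) / (tf[w] + k1 * (1 - b + b * len(t) / avgdl))
        scores.append(s)
    return scores


def ranking(scores):
    """Doc indices ordered best-first."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


def rrf(*rankings, k=60):
    """Reciprocal rank fusion: each ranking contributes 1 / (k + rank) per doc."""
    fused = Counter()
    for r in rankings:
        for rank, doc in enumerate(r, start=1):
            fused[doc] += 1.0 / (k + rank)
    return [doc for doc, _ in fused.most_common()]


docs = ["motor back-to-invoice cover 24 months", "free home medicine delivery", "travel cover schengen"]
dense = [0.62, 0.30, 0.55]  # toy cosine scores standing in for the nomic ranking
sparse = bm25_scores("home medicine delivery", docs)
print(rrf(ranking(dense), ranking(sparse)))  # [0, 1, 2]
```

In the toy run the exact-term match climbs from third on dense scores alone to second after fusion. Fusing ranks rather than raw scores avoids calibrating cosine against BM25, one reason RRF is a frequent choice.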

v7: agentic graph live on Studio. A real LangGraph StateGraph: retrieve (hybrid dense+BM25) → MMR rerank (source-diversity cap) → qwen3 grounded generate → self-check faithfulness node (0–1). Proven: "free home medication delivery + eKomi score" returned "Not in corpus" under v6 flat retrieval; under v7 it answers "yes, free home medicine delivery; eKomi 4.5/5", cited, faithfulness 1.0. File: rag/studio_graph.py. v8 has since added graph.json and facts.json; a JSONB/SQL layer is still to come.
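The MMR rerank with a source-diversity cap can be sketched as greedy selection: each step picks the chunk with the best trade-off between relevance to the query and similarity to chunks already chosen, skipping any source that has hit its cap. λ = 0.7 and a cap of 2 per source are assumptions, not rag/studio_graph.py's settings; vectors are assumed unit-norm so dot products are cosines.

```python
import numpy as np


def mmr_with_source_cap(query_vec, doc_vecs, sources, k=5, lam=0.7, max_per_source=2):
    """Maximal marginal relevance over unit-norm vectors, skipping sources already at their cap."""
    relevance = doc_vecs @ query_vec
    selected, per_source = [], {}
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best, best_score = None, -np.inf
        for i in candidates:
            if per_source.get(sources[i], 0) >= max_per_source:
                continue  # this source already supplied its quota
            redundancy = max((float(doc_vecs[i] @ doc_vecs[j]) for j in selected), default=0.0)
            score = lam * relevance[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        if best is None:  # every remaining candidate's source is capped
            break
        selected.append(best)
        per_source[sources[best]] = per_source.get(sources[best], 0) + 1
        candidates.remove(best)
    return selected


q = np.array([1.0, 0.0])
vecs = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.8, 0.6]])  # three duplicate PDF chunks + one web chunk
srcs = ["pdfA", "pdfA", "pdfA", "site"]
print(mmr_with_source_cap(q, vecs, srcs, k=3))  # [0, 1, 3]
```

With the cap at 2, three near-duplicate chunks from one PDF cannot crowd out the website chunk; raising the cap to 3 lets the third duplicate back in.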

Benchmark — completeness vs 100%

Measured against the full intended universe: 29 UAE insurers × 4 product areas × 3 data layers + pricing. This scores the manifest against a complete market, not against the original corpora's internal scope.

Dimension	Captured	Target (100%)	Coverage
Insurers in registry	29	~29 UAE personal-lines	~100%
Policy PDFs (L1)	11 insurers w/ docs · 137 files (122 GIG-own + 6 motor + 9 medical)	29 insurers × 4 lines	38% of insurers
Website USPs (L2)	1 insurer (GIG, 4 pages)	29 insurers	3%
Reviews / reputation (L3)	11 insurers · GIG eKomi 37k + Google 4.5/900	29 insurers (Google Maps high-N)	38%
Pricing / premiums	0 verified (quote-walled) + indicative aggregator floors	29 insurers × lines	0%
RAG index (built)	10,021 chunks · 768-dim · Studio-local	—	live ✓

Overall corpus completeness ≈ 20% of the full 29-insurer data universe (L1 38% · L2 3% · L3 38% · pricing 0%, averaged). The earlier "~55%" was the original 3 corpora's internal scope, not the full market. Biggest single lift: the 18-insurer L1/L2 scrape + Google-Maps L3.
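The ≈20% is an unweighted mean of four per-dimension coverages over the 29-insurer universe; with the counts in the table it comes to 23/116 ≈ 19.8%. A sketch of that arithmetic, which also shows a consequence of equal weighting: while pricing stays at 0, the overall figure cannot exceed 75% however complete L1–L3 become.

```python
UNIVERSE = 29  # insurers in the registry

# Insurers with data per dimension, as counted in the benchmark table.
CAPTURED = {"L1 policy PDFs": 11, "L2 website USPs": 1, "L3 reviews": 11, "pricing": 0}


def completeness(captured, universe=UNIVERSE):
    """Per-dimension coverage and their unweighted mean (every dimension counts equally)."""
    per_dim = {dim: n / universe for dim, n in captured.items()}
    return per_dim, sum(per_dim.values()) / len(per_dim)


per_dim, overall = completeness(CAPTURED)
for dim, share in per_dim.items():
    print(f"{dim:16} {share:5.0%}")
print(f"{'overall':16} {overall:5.0%}")

# Ceiling while pricing stays quote-walled: even with L1-L3 complete, the mean tops out at 75%.
_, ceiling = completeness({"L1 policy PDFs": 29, "L2 website USPs": 29, "L3 reviews": 29, "pricing": 0})
print(f"{'ceiling':16} {ceiling:5.0%}")
```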

What's missing

Blocker — GIG first-party book data (premiums, nationality, vehicle, LTV). Gates the targeting hypothesis; owner: GIG to supply.

High — medical breadth. 6 of ~15 insurers; missing Bupa, Cigna, MetLife, Liva, Salama, Watania, Takaful Emarat. Daman wording image-only (needs OCR).

High — travel is GIG-own only. 4 docs, no UAE travel product, no competitor travel benchmark.

Medium — L2 USP layer barely started; L3 Google/eKomi competitor pass outstanding; GCC markets (KSA/Oman/Kuwait) absent.

What it can be used for

GG chatbot — grounded internal Q&A over GIG's own wordings (truth layer under the conversational surface).
Competitor battlecards — verified, cited cover + reputation comparisons (GIG 4.0 vs Sukoon 1.5).
Defensible claim engine — every USP checked against policy text before it ships.
Review intelligence — testimonial harvest, service-recovery queue, SEO review schema, VoC copy (from the 1,227 eKomi set → 37k plan).
Targeting — match GIG strengths to segments (blocked on book data).
Repeatable engine — same ingest/extract/embed/graph re-runs for any line, market or competitor.

Kendall — adversarial pass

KendallRoy · default-to-NO · the doc reviewed against its own goal

The doc described the plan as if it were the build

REJECT the "done" framing. Verdict: conditional pass as an internal provenance tracker; reject as a description of a built system. This page now states present-tense truth.

Stated but unproven (each measured against the files)

Claim	Reality on disk
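"layered RAG + vector + knowledge-graph"	Partly resolved. Originally flat vector only (9,805 chunks, one model, cosine; zero graph/JSONB/guardrail files). Now hybrid retrieval + self-check (v7) and graph.json (51 edges) + facts.json (v8). Still no JSONB/SQL layer or multi-hop traversal.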
"runs on the Mac Studio"	✓ RESOLVED 2026-06-19. 10,021 chunks embedded locally (nomic) + queried by qwen3:8b on Studio. No OpenAI.
"29 insurers tracked"	Still 11 with data; L2 now started (GIG site in). The other 18 remain pending.
"all review sources"	Partly resolved. eKomi 1,000 now in the index; site shows eKomi 4.5/37k. Google-Maps per-insurer (Browserbase) still pending.
"full website in the RAG"	Partly resolved. 4 GIG product pages (car/health/travel/home) now ingested; ~56 more URLs mapped, not yet scraped.

What's missing

PDF coverage for ~18 of 29 insurers · most of the L2 website-claims layer (only GIG's 4 product pages so far) · the JSONB/SQL layer and multi-hop graph retrieval Gully specified · Google-Maps reviews and the rest of the eKomi set · GIG book/pricing data (the gate) · GCC markets · and until this version the index wasn't even linked from the site.

Honest probabilities: useful internally today 90%; supports an external "complete market view" claim 35%; targeting testable without GIG book data ~5%. In flight now: scrape the ~56 mapped GIG URLs into the RAG; competitor USP pages as L2; per-insurer Google-Maps reviews via Browserbase; then the JSONB/SQL fact layer and multi-hop graph retrieval.