GIG Gulf · GG knowledge layer · Articulate

gig-RAG v8 — the UAE insurer corpus

One review surface for the knowledge layer behind GG: every UAE insurer, four product areas, three data layers — policy PDFs, website claims, and review/reputation. Now running ON the Mac Studio: 10,021 chunks — policy PDFs + GIG website pages + the 1,000-review eKomi set — embedded locally (nomic-embed-text) and queried by a local LLM (qwen3:8b). No OpenAI. v7–v8 add a LangGraph agent — hybrid dense+BM25 retrieval, MMR rerank, self-check, plus a knowledge graph (51 edges) + structured fact store — see the adversarial pass.

What good looks like — June 2026

Production knowledge systems in mid-2026 are not single vector stores. The bar, the gap, and the specific move to close it — per-dimension. Aligns with Gully's model.

| Dimension | Good (Jun 2026 best practice) | gig-RAG today | The move |
| --- | --- | --- | --- |
| Retrieval | Hybrid: dense vectors + BM25 keyword + rerank | ✓ v7: hybrid dense+BM25 fusion + MMR source-diversity rerank, on Studio | Add a cross-encoder reranker next |
| Structure | GraphRAG: entity graph (insurer→line→benefit→clause) for multi-hop "follow the road" | ✓ v8: knowledge graph (graph.json, 51 edges) | Add multi-hop traversal retrieval |
| Facts | Hard facts (limits, prices, contacts) in structured JSONB/SQL, queried not embedded | ✓ v8: facts.json store, prepended authoritative | Move to DuckDB/SQL + add book data |
| Coverage | All sources: PDFs + live HTML site + reviews, with a refresh cadence | 137 PDFs + 4 GIG HTML pages + eKomi 1,000-review set; no refresh cadence | Crawl the remaining giggulf.ae HTML + ingest the rest of the eKomi 1,227; quarterly re-verify |
| Grounding | Guardrail layer: every answer cited, refuse when unsupported, eval harness (faithfulness score) | ✓ v7: self-check faithfulness node (qwen3 judge, 0–1 score) + citation-enforced answers | Add refuse-and-retry loop + batch RAGAS eval |
| Host | On-prem / sovereign for regulated data | ✓ On Studio since 2026-06-19: local nomic-embed-text + qwen3:8b, no external API | Build + index relocated; DuckDB on the Studio still to do |
One line: good = hybrid retrieval + graph + structured facts + full-source coverage + grounding evals, on-prem. The six moves above are the path — tracked, not asserted.
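
The multi-hop idea in the Structure row (insurer→line→benefit→clause) can be illustrated with a small breadth-first walk. This is a minimal sketch under stated assumptions: the edge triples, relation names and the `clause:example-clause` node are hypothetical, since the real graph.json schema is not shown on this page.

```python
from collections import deque

# Hypothetical edge triples (source, relation, target); not the real graph.json schema.
EDGES = [
    ("insurer:GIG Gulf", "offers", "line:motor"),
    ("insurer:GIG Gulf", "offers", "line:home"),
    ("line:motor", "includes", "benefit:back-to-invoice"),
    ("benefit:back-to-invoice", "defined_in", "clause:example-clause"),
]

def build_adjacency(edges):
    """Map each node to the nodes reachable from it in one hop."""
    adjacency = {}
    for source, _relation, target in edges:
        adjacency.setdefault(source, []).append(target)
    return adjacency

def multi_hop(adjacency, start, max_hops=3):
    """Return every path of 1..max_hops edges starting at `start` (breadth-first)."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) - 1 >= max_hops:
            continue  # this path already uses the hop budget
        for target in adjacency.get(path[-1], []):
            if target in path:
                continue  # skip cycles
            extended = path + [target]
            paths.append(extended)
            queue.append(extended)
    return paths

ADJACENCY = build_adjacency(EDGES)
for found in multi_hop(ADJACENCY, "insurer:GIG Gulf"):
    print(" -> ".join(found))
```

A retrieval step could then pull the chunks attached to the nodes on each path, rather than relying on vector similarity alone.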

v8 score vs good — measured

Scored 0–5 per dimension. Retrieval & grounding rows carry measured numbers from a 6-question eval run through the graph on Studio (rag/eval_v7.json): retrieval hit-rate 0.83, answer-correct 0.83, mean faithfulness 1.0.

| Dimension | v8 state | Score |
| --- | --- | --- |
| Retrieval | Hybrid dense+BM25 + MMR rerank (no cross-encoder yet); measured hit-rate 0.83 | 3 / 5 |
| Structure (graph) | v8: knowledge graph built (graph.json, 51 insurer→benefit edges). Not yet multi-hop GraphRAG traversal. | 2 / 5 |
| Facts (JSONB/SQL) | v8: structured fact store (facts.json: 10-insurer motor matrix + reviews + ratings), prepended authoritative in generate. Not SQL/DuckDB yet. | 2 / 5 |
| Coverage | PDFs + 4 GIG pages + eKomi; 18 insurers + competitor sites + Google-Maps missing; no refresh | 2 / 5 |
| Grounding | Self-check + citations + eval harness; judge still too lenient (faithful=1.0 even on a wrong answer; qwen3:8b is a weak judge; needs a deterministic check). No refuse-retry. | 3 / 5 |
| Host | On Studio, local nomic+qwen3, no external API; sovereign | 4 / 5 |
Overall: 16 / 30 ≈ 53% of the June-2026 "good" bar (v8), up from 40% (v7) and ~15% (v1). v8 added the graph and facts layers, lifting Structure and Facts off 0. Honest caveat: the 6-question eval stayed at 0.83 / 0.83 / 1.0 because it does not isolate the new layers, and the faithfulness judge is still weak. Next lifts: a deterministic grounding check, fact-specific eval questions, and the 18-insurer coverage scrape.
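
One possible shape for the deterministic grounding check named above: require every number in an answer to appear in at least one cited chunk. This is a sketch, not the check the project uses; the function names and the `[n]` citation format are assumptions, and the sample chunk text reuses the back-to-invoice figure quoted elsewhere on this page.

```python
import re

NUMBER = re.compile(r"\d+(?:[.,]\d+)*")   # 24, 4.5, 10,021
CITATION = re.compile(r"\[\d+\]")         # assumed citation marker format, e.g. [1]

def numbers_in(text):
    """Collect numeric tokens, ignoring citation markers such as [1]."""
    return set(NUMBER.findall(CITATION.sub(" ", text)))

def grounding_check(answer, cited_chunks):
    """Return (passed, unsupported): numbers in the answer found in no cited chunk."""
    supported = set()
    for chunk in cited_chunks:
        supported |= numbers_in(chunk)
    unsupported = sorted(numbers_in(answer) - supported)
    return len(unsupported) == 0, unsupported

CHUNKS = ["GIG Prestige back-to-invoice = 24 months"]
print(grounding_check("Back-to-invoice cover runs for 24 months [1].", CHUNKS))
print(grounding_check("Back-to-invoice cover runs for 36 months [1].", CHUNKS))
```

Unlike an LLM judge, this check cannot be talked into a pass, but it only covers numeric claims; names and qualitative statements would need their own rules.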

Goal

Build the comprehensive UAE-insurer knowledge base — three things, for every insurer, across four product areas — in a form we can run retrieval, vectorised search and a knowledge graph over:

  1. Policy information (PDFs) — wordings, T&C, IPID, tables of benefits — motor, medical, travel, home.
  2. Website claims — the USPs and offers each insurer markets.
  3. Review & reputation scores — Google + eKomi first (high-N), Trustpilot directional only.

Success: every corpus-derived claim on a client deliverable traces to a row here; each named gap carries an owner and a next action; and the whole store runs on-prem on the Studio: sovereign, data never leaves.

Operating principle

Layered, not flat

Vector-store-only RAG is the limited version. The store is layered: a compliance/guardrail layer, a firm/USP layer, and structured JSONB context profiles that a low-grade model can read in a heartbeat. Only some information goes through RAG retrieval; the rest is structured and linked.
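
To make the layering concrete, here is a minimal sketch of how a prompt context might be assembled in layer order, with structured facts placed ahead of retrieved text. The function name, the fact fields and the source filename are hypothetical; only the ordering principle comes from this page.

```python
import json

def build_context(guardrails, facts, chunks, max_chunks=4):
    """Assemble context in layer order: guardrails, structured facts, retrieved text."""
    lines = ["# Guardrails"]
    lines += [f"- {rule}" for rule in guardrails]
    lines.append("# Facts (authoritative; prefer these over retrieved text)")
    lines += [json.dumps(fact, sort_keys=True) for fact in facts]
    lines.append("# Retrieved passages")
    for number, chunk in enumerate(chunks[:max_chunks], start=1):
        lines.append(f"[{number}] ({chunk['source']}) {chunk['text']}")
    return "\n".join(lines)

CONTEXT = build_context(
    guardrails=["Cite a passage for every claim.", "Answer 'Not in corpus' when unsupported."],
    # Hypothetical fact schema; the real facts.json layout is not shown here.
    facts=[{"insurer": "GIG Gulf", "benefit": "back-to-invoice", "months": 24}],
    chunks=[{"source": "example-policy.pdf", "text": "GIG Prestige back-to-invoice = 24 months"}],
)
print(CONTEXT)
```

Keeping hard facts as JSON rows means a small local model reads them directly instead of inferring them from loosely matched passages.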

What's in it

Insurers tracked: 29. UAE insurers + distributors across motor, medical, travel, home.
RAG chunks (Studio-local): 10,021. PDFs + GIG site + eKomi · nomic-embed 768-dim · on Studio.
GIG reviews already audited: 1,227. eKomi export (9 Mar–31 May 2026), labelled + analysed.
Portfolio completion: ~55% of the original three corpora's internal scope, not the full market. Motor strongest, medical/travel thin; see benchmark.

Three source corpora feed one index today: MotorCompare (competitor motor, 13 entities, 1,010 chunks), MedBench (competitor medical, 6 insurers, 1,198 chunks), and the GIG SiteCorpus (122 of GIG's own PDFs across all lines incl. travel, 7,597 chunks). Together these are 9,805 chunks; the GIG website pages and the eKomi set make up the rest of the 10,021. The new UAE insurer registry extends this to the full market and adds the website-claims and reviews layers.

UAE insurer registry

The spine: 29 entities × 4 product areas × 3 layers, each status-coded: V = verified · U = secondary · Q = quote-only · M = missing · pending. Sample of the priority set:

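| Insurer | Type | L1 PDFs | L2 USP | L3 Reviews |
| --- | --- | --- | --- | --- |
| GIG Gulf | national | V | pending | V |
| Sukoon | national | V | pending | V |
| Liva | national | V | pending | U |
| Orient | national | U | pending | U |
| ADNIC | national | Q | pending | U |
| Dubai Insurance | national | V | pending | M |
| Now Health | foreign br. | V | pending | V |
| Daman | national | Q | pending | U |
| Emirates / Watania / Fidelity | various | pending | pending | U |
| + 20 more (Salama, RAK, Takaful Emarat, MetLife, Cigna, Bupa…) | | pending | pending | M / pending |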

Full machine-readable registry: UAECorpus/registry.json. Harness for the walled targets is Browserbase + IPRoyal UAE residential proxy (both verified live); Firecrawl for the rest.
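
A status-coded registry like this lends itself to a quick per-layer tally. The sketch below uses the eight single-insurer sample rows from the table above; the row layout and field names are assumptions for illustration and may not match UAECorpus/registry.json.

```python
from collections import Counter

STATUS_CODES = {"V": "verified", "U": "secondary", "Q": "quote-only", "M": "missing", "pending": "pending"}

# Sample rows copied from the priority-set table; the dict layout is hypothetical.
REGISTRY = [
    {"insurer": "GIG Gulf", "L1": "V", "L2": "pending", "L3": "V"},
    {"insurer": "Sukoon", "L1": "V", "L2": "pending", "L3": "V"},
    {"insurer": "Liva", "L1": "V", "L2": "pending", "L3": "U"},
    {"insurer": "Orient", "L1": "U", "L2": "pending", "L3": "U"},
    {"insurer": "ADNIC", "L1": "Q", "L2": "pending", "L3": "U"},
    {"insurer": "Dubai Insurance", "L1": "V", "L2": "pending", "L3": "M"},
    {"insurer": "Now Health", "L1": "V", "L2": "pending", "L3": "V"},
    {"insurer": "Daman", "L1": "Q", "L2": "pending", "L3": "U"},
]

def tally(registry, layer):
    """Count status codes for one layer, rejecting codes outside the legend."""
    counts = Counter(row[layer] for row in registry)
    unknown = set(counts) - set(STATUS_CODES)
    if unknown:
        raise ValueError(f"unknown status codes in {layer}: {sorted(unknown)}")
    return counts

for layer in ("L1", "L2", "L3"):
    print(layer, dict(tally(REGISTRY, layer)))
```

Rejecting unknown codes keeps a typo in the registry from silently inflating a coverage figure.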

Reviews & reputation (Layer 3)

Source-of-truth correction: reputation is built from eKomi (GIG's own platform — 1,227 reviews already in the vault) and Google Maps (high-N per insurer). Trustpilot is low-N for UAE insurers and is directional only.

First competitor pass (Trustpilot, directional) already surfaces a real signal — GIG's consumer reputation leads the set:

| Insurer | Trustpilot | Reviews | Financial strength | Read |
| --- | --- | --- | --- | --- |
| GIG Gulf | 4.0 | 169 | | Replies to 100% of negatives; 830k+ customers. Strongest in set. |
| Sukoon | 1.5 | 65 | | Service reviews scathing. Big reputation gap vs GIG. |
| Now Health | 4.0 | 821 | | Strong expat-medical reputation, high volume. |
| Policybazaar (aggr.) | 4.0 | 312 | | Highest review volume in the space. |
| Orient | 3.0 | 1 | AM Best A+ / S&P A | Consumer N=1; financial strength is its real signal. |
| ADNIC | 3.0 | 1 | S&P A stable | Same: strength over sentiment. |
| Emirates Insurance | | | AM Best A- | No consumer footprint; strength is the signal. |
| Watania | | | AM Best B (under review −) | Weakest strength rating in set. |
| Fidelity United | 2.9 | 2 | | Low-N; claims-delay complaints. |

Dubai Insurance, Union, Salama and RAK returned no clean consumer score and are flagged M for a Google-Maps pass via Browserbase. Data: UAECorpus/L3_reviews.json.

How it's stored & where it runs

Pipeline: source → pdftotext → chunks.jsonl → index.npy → query.py, plus a DuckDB store for the review layer and a graph layer linking insurer → product → benefit. Flat-file and portable.
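
The flat query step at the end of this pipeline amounts to a cosine ranking of chunk vectors against a query vector. The sketch below shows that step only, with toy 3-dimensional vectors held in memory; the real index is 768-dimensional and stored in index.npy, and the chunk records and function names here are assumptions rather than the contents of query.py.

```python
import json
import math

# Toy stand-ins for chunks.jsonl lines and their embedding rows (hypothetical data).
CHUNK_LINES = [
    '{"id": "c0", "text": "motor wording"}',
    '{"id": "c1", "text": "motor table of benefits"}',
    '{"id": "c2", "text": "travel wording"}',
]
INDEX = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vector, index, chunks, k=2):
    """Rank every chunk by cosine similarity and return the k best (score, id) pairs."""
    scored = sorted(
        ((cosine(query_vector, vector), position) for position, vector in enumerate(index)),
        reverse=True,
    )
    return [(round(score, 3), chunks[position]["id"]) for score, position in scored[:k]]

CHUNKS = [json.loads(line) for line in CHUNK_LINES]
print(top_k([1.0, 0.0, 0.0], INDEX, CHUNKS))
```

A flat ranking like this has no keyword signal and no diversity control, which is the limitation the hybrid v7 path addresses.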

Now running ON the Mac Studio (M4 Max — studio / 192.168.1.246 / tailnet 100.82.41.69). Executed 2026-06-19: corpus + scripts pushed to Studio; 10,021 chunks re-embedded locally with nomic-embed-text via Ollama; queries answered by qwen3:8b locally. Proven end-to-end (e.g. "GIG Prestige back-to-invoice = 24 months, market max" — cited). On-prem and sovereign — no external API, data never leaves GIG hardware.

The index: local index_local.npy (768-dim nomic) on Studio + the original OpenAI index.npy (1536-dim) in vault. The legacy flat path mis-ranked; v7 fixes retrieval.

v7: agentic graph live on Studio. A real LangGraph StateGraph: retrieve (hybrid dense+BM25) → MMR rerank (source-diversity cap) → qwen3 grounded generate → self-check faithfulness node (0–1). Proven: the query "free home medication delivery + eKomi score" returned "Not in corpus" under v6 flat retrieval; under v7 it answers "yes — free home medicine delivery; eKomi 4.5/5", cited, faithfulness 1.0. File: rag/studio_graph.py. v7 shipped without a knowledge graph; v8 added it (graph.json, 51 edges) alongside the facts.json store.
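
The retrieve and rerank stages can be sketched without any framework. The page does not say which fusion method rag/studio_graph.py uses, so this example swaps in reciprocal rank fusion as an assumption, followed by an MMR-style greedy selection in which redundancy is simply "shares a source with something already picked". Chunk ids, filenames and parameter values are hypothetical.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to a chunk's score."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return scores

def mmr_by_source(scores, source_of, top_n=3, lam=0.7):
    """Greedy MMR-style pick: relevance minus a penalty for reusing a source."""
    best_score = max(scores.values())
    remaining = dict(scores)
    selected = []
    while remaining and len(selected) < top_n:
        used_sources = {source_of[chunk_id] for chunk_id in selected}

        def value(chunk_id):
            relevance = remaining[chunk_id] / best_score
            redundancy = 1.0 if source_of[chunk_id] in used_sources else 0.0
            return lam * relevance - (1.0 - lam) * redundancy

        pick = max(sorted(remaining), key=value)  # sorted() makes ties deterministic
        selected.append(pick)
        del remaining[pick]
    return selected

DENSE = ["a1", "a2", "b1", "c1"]   # hypothetical dense-vector ranking
BM25 = ["a2", "b1", "a1", "c1"]    # hypothetical BM25 keyword ranking
SOURCE_OF = {"a1": "A.pdf", "a2": "A.pdf", "b1": "B.pdf", "c1": "C.pdf"}

FUSED = rrf_fuse([DENSE, BM25])
print(sorted(FUSED, key=FUSED.get, reverse=True))
print(mmr_by_source(FUSED, SOURCE_OF))
```

On this toy input the fused order alone would return two chunks from A.pdf in the top three; the source penalty replaces the second one with a chunk from another document.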

Benchmark — completeness vs 100%

Measured against the full intended universe — 29 UAE insurers × 4 product areas × 3 data layers + pricing. This is the manifest scored against complete, not two corpora's internal %.

| Dimension | Captured | Target (100%) | Coverage |
| --- | --- | --- | --- |
| Insurers in registry | 29 | ~29 UAE personal-lines | ~100% |
| Policy PDFs (L1) | 11 insurers w/ docs · 137 files (122 GIG-own + 6 motor + 9 medical) | 29 insurers × 4 lines | 38% of insurers |
| Website USPs (L2) | 1 insurer (GIG, 4 pages) | 29 insurers | 3% |
| Reviews / reputation (L3) | 11 insurers · GIG eKomi 37k + Google 4.5/900 | 29 insurers (Google Maps high-N) | 38% |
| Pricing / premiums | 0 verified (quote-walled) + indicative aggregator floors | 29 insurers × lines | 0% |
| RAG index (built) | 10,021 chunks · 768-dim · Studio-local | | live ✓ |
Overall corpus completeness ≈ 20% of the full 29-insurer data universe (L1 38% · L2 3% · L3 38% · pricing 0%, averaged). The earlier "~55%" was the original 3 corpora's internal scope, not the full market. Biggest single lift: the 18-insurer L1/L2 scrape + Google-Maps L3.
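
The ≈ 20% figure follows directly from the insurer counts in the table. A short worked check, using only numbers stated above (the variable names are illustrative):

```python
TOTAL_INSURERS = 29

# Insurers with captured data per layer, from the benchmark table.
CAPTURED = {"L1 policy PDFs": 11, "L2 website USPs": 1, "L3 reviews": 11, "pricing": 0}

COVERAGE = {layer: count / TOTAL_INSURERS for layer, count in CAPTURED.items()}
OVERALL = sum(COVERAGE.values()) / len(COVERAGE)  # unweighted mean of the four layers

for layer, share in COVERAGE.items():
    print(f"{layer}: {share:.0%}")
print(f"overall: {OVERALL:.0%}")
```

The mean is unweighted, so the pricing layer at 0% pulls the overall figure down as much as either 38% layer pulls it up.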

What's missing

Blocker — GIG first-party book data (premiums, nationality, vehicle, LTV). Gates the targeting hypothesis; owner: GIG to supply.
High — medical breadth. 6 of ~15 insurers; missing Bupa, Cigna, MetLife, Liva, Salama, Watania, Takaful Emarat. Daman wording image-only (needs OCR).
High — travel is GIG-own only. 4 docs, no UAE travel product, no competitor travel benchmark.
Medium — L2 USP layer barely started; L3 Google/eKomi competitor pass outstanding; GCC markets (KSA/Oman/Kuwait) absent.

What it can be used for

Kendall — adversarial pass

Kendall Roy · default-to-NO · the doc reviewed against its own goal

The doc described the plan as if it were the build

REJECT the "done" framing. Verdict: conditional pass as an internal provenance tracker; reject as a description of a built system. This page now states present-tense truth.

Stated but unproven (each measured against the files)

| Claim | Reality on disk |
| --- | --- |
| "layered RAG + vector + knowledge-graph" | Flat vector only. 9,805 chunks, one model, cosine. Zero graph / JSONB / guardrail files. The exact "limited" version Gully named. |
| "runs on the Mac Studio" | ✓ RESOLVED 2026-06-19. 10,021 chunks embedded locally (nomic) + queried by qwen3:8b on Studio. No OpenAI. |
| "29 insurers tracked" | Still 11 with data; L2 now started (GIG site in). The other 18 remain pending. |
| "all review sources" | Partly resolved. eKomi 1,000 now in the index; site shows eKomi 4.5/37k. Google-Maps per-insurer (Browserbase) still pending. |
| "full website in the RAG" | Partly resolved. 4 GIG product pages (car/health/travel/home) now ingested; ~56 more URLs mapped, not yet scraped. |

What's missing

PDF coverage for ~18 of 29 insurers · the entire L2 website-claims layer · the graph + JSONB layers Gully specified · eKomi ingestion · GIG book/pricing data (the gate) · GCC markets · and until this version the index wasn't even linked from the site.

Honest probabilities: useful internally today 90%; supports an external "complete market view" claim 35%; targeting testable without GIG book data ~5%. In flight now: ingest eKomi + crawl the live GIG website into the RAG; competitor USP pages as L2; then the graph + JSONB layers and the Studio relocation.

Versions: v1 corpus-audit (2026-06-19) → v4 UAE-corpus + reviews → v7 agentic graph → v8 knowledge graph + fact store (current).

Measured from corpus-scorecard.md · rag/manifest.json · MotorCompare/coverage-matrix.md · 06-review-mining (eKomi 1,227) · SiteCorpus/manifest.md. Articulate for GIG Gulf. Internal review surface.