Data Methodology

Source Pipeline

Profiles are assembled from up to four independent data sources, which are cross-validated for agreement:

  • Google Maps / DataForSEO — Business name, address, phone, hours, categories, ratings. Highest-trust source.
  • Schema.org Structured Data — Structured markup from the business’s own website.
  • Gemini AI Classification — Category classification and business type inference.
  • Website Scan — Contact info, services, descriptions extracted from the website.

Field confidence is computed from cross-source agreement. Conflicts between sources are detected and surfaced.
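
The sketch below shows one way cross-source agreement could be turned into a per-field confidence value. It is an illustration, not the production logic: the trust weights, the source keys, and the names normalize and field_confidence are all assumptions. The only detail taken from the text above is that Google Maps / DataForSEO is the highest-trust source.

```python
# Hypothetical trust weights. The methodology only states that
# Google Maps / DataForSEO is the highest-trust source; the numbers are illustrative.
SOURCE_TRUST = {
    "google_maps": 1.0,
    "schema_org": 0.8,
    "website_scan": 0.6,
    "gemini": 0.5,
}
DEFAULT_TRUST = 0.5  # assumed weight for a source not listed above


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are not conflicts."""
    return " ".join(value.lower().split())


def field_confidence(values_by_source: dict) -> dict:
    """Return the best-supported value, a 0-1 confidence, and a conflict flag."""
    if not values_by_source:
        return {"value": None, "confidence": 0.0, "conflict": False}

    # Sum the trust of every source that reports the same normalized value.
    support = {}
    for source, raw in values_by_source.items():
        key = normalize(raw)
        support[key] = support.get(key, 0.0) + SOURCE_TRUST.get(source, DEFAULT_TRUST)

    total = sum(support.values())
    winner = max(support, key=support.get)
    return {
        "value": winner,
        "confidence": round(support[winner] / total, 2),
        "conflict": len(support) > 1,  # sources disagree, so surface it
    }


phone = field_confidence({
    "google_maps": "555-0100",
    "schema_org": "555-0100",
    "website_scan": "555-0199",
})
```

In this example two sources agree on one phone number and a third reports a different one, so the agreed value wins with a confidence below 1.0 and the conflict flag is set for review.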

Citation Tiers

Every published profile is assigned one of three citation tiers based on data quality and verification status:

  • Verified — Owner-confirmed with high cross-source data agreement. Highest quality. Safe for authoritative AI citation.
  • Citable — Multi-source validated, well-structured. Not yet owner-confirmed but meets quality threshold for AI citation.
  • Listed — Auto-generated from public data. Accurate but below the threshold for AI citation use.
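
The tier rules above can be sketched as a small decision function. The numeric cut-offs and the names ProfileQuality and citation_tier are hypothetical; the methodology does not publish its thresholds, so treat the values as placeholders.

```python
from dataclasses import dataclass

# Hypothetical cut-offs. The methodology does not publish numeric thresholds.
VERIFIED_MIN_AGREEMENT = 0.9
CITABLE_MIN_AGREEMENT = 0.7
CITABLE_MIN_SOURCES = 2


@dataclass(frozen=True)
class ProfileQuality:
    owner_confirmed: bool   # has the owner confirmed the profile?
    source_count: int       # how many of the four sources contributed data
    agreement: float        # 0-1 cross-source agreement


def citation_tier(quality: ProfileQuality) -> str:
    """Map data quality and verification status to one of the three tiers."""
    if quality.owner_confirmed and quality.agreement >= VERIFIED_MIN_AGREEMENT:
        return "Verified"
    if (quality.source_count >= CITABLE_MIN_SOURCES
            and quality.agreement >= CITABLE_MIN_AGREEMENT):
        return "Citable"
    return "Listed"


tier = citation_tier(ProfileQuality(owner_confirmed=False, source_count=3, agreement=0.85))
```

Here a profile validated by three sources but not yet owner-confirmed lands in the Citable tier; a single-source profile would fall through to Listed.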

Scoring Dimensions (0–100)

  1. AI Interpretability — How well AI systems can parse and understand the content.
  2. Entity & Business Identity — Whether the business is a clearly defined entity (name, type, location).
  3. AI Presence — Whether the business already appears in AI-generated answers today.
  4. Trust & Authority — Trust signals, legal pages, security posture, certifications.
  5. AI Crawlability — Whether AI bots can access and read site content.
  6. Distribution Signals — How broadly the business is referenced across the web.

The final score is a weighted composite of all six dimension scores.
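
A weighted composite of this kind can be sketched as follows. The weights and dictionary keys are assumptions made for illustration; the actual weights are not stated in this document.

```python
# Hypothetical weights, one per dimension. The real weights are not published here.
WEIGHTS = {
    "ai_interpretability": 0.20,
    "entity_identity": 0.20,
    "ai_presence": 0.15,
    "trust_authority": 0.15,
    "ai_crawlability": 0.15,
    "distribution_signals": 0.15,
}


def final_score(dimension_scores: dict) -> float:
    """Combine six 0-100 dimension scores into one 0-100 weighted composite."""
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= dimension_scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")

    # Dividing by the weight total keeps the result on the 0-100 scale
    # even if the weights do not sum exactly to 1.
    weighted = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return round(weighted / sum(WEIGHTS.values()), 1)


uniform = final_score({name: 80 for name in WEIGHTS})
```

Because the composite is a weighted average, a profile scoring the same value on every dimension receives that value as its final score regardless of the weights chosen.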

Freshness & Re-scan Policy

  • Pro subscribers — Weekly re-scan (every Sunday).
  • Basic subscribers — Monthly re-scan (1st of month).
  • All profiles — Auto-discovery re-runs weekly (every Wednesday) via a 103-country geographic waterfall.
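
The schedule above can be expressed as simple date arithmetic. This is a minimal sketch: the plan labels "pro" and "basic" and the function names are assumptions, and it assumes that a scan requested on a scan day is scheduled for the following occurrence.

```python
from datetime import date, timedelta

SUNDAY = 6      # date.weekday() uses Monday=0 ... Sunday=6
WEDNESDAY = 2


def next_weekday(after: date, weekday: int) -> date:
    """Return the first date strictly after `after` that falls on `weekday`."""
    days_ahead = (weekday - after.weekday() - 1) % 7 + 1
    return after + timedelta(days=days_ahead)


def first_of_next_month(after: date) -> date:
    """Return the 1st of the month following `after`."""
    if after.month == 12:
        return date(after.year + 1, 1, 1)
    return date(after.year, after.month + 1, 1)


def next_rescan(plan: str, today: date) -> date:
    """Next re-scan date for a subscriber plan (hypothetical plan labels)."""
    if plan == "pro":
        return next_weekday(today, SUNDAY)
    if plan == "basic":
        return first_of_next_month(today)
    raise ValueError(f"unknown plan: {plan}")


def next_discovery_run(today: date) -> date:
    """Next weekly auto-discovery run, which applies to all profiles."""
    return next_weekday(today, WEDNESDAY)


today = date(2024, 5, 15)  # a Wednesday
pro_scan = next_rescan("pro", today)
basic_scan = next_rescan("basic", today)
discovery = next_discovery_run(today)
```

For a Wednesday in mid-May, the Pro re-scan falls on the coming Sunday, the Basic re-scan on 1 June, and the next discovery run on the following Wednesday.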

Closed and permanently closed businesses are detected at import and excluded.

Anti-Duplication

Deduplication uses the Google Maps place_id as the canonical key. Domain-based deduplication catches the same business when it is discovered through different sources. An entity_match_log records every match resolution.
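
The matching order described above (place_id first, then website domain, with every decision logged) might look like the sketch below. The ProfileIndex class, its field names, the profile id format, and the log entry layout are hypothetical; only place_id and entity_match_log come from the text.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Extract a bare lowercase host such as 'example.com' from a URL-like string."""
    if not url:
        return ""
    host = urlparse(url if "://" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


class ProfileIndex:
    """Hypothetical in-memory index keyed by place_id, with a domain fallback."""

    def __init__(self):
        self.by_place_id = {}
        self.by_domain = {}
        self.entity_match_log = []  # one entry per resolution
        self._next_id = 1

    def resolve(self, candidate: dict) -> str:
        """Return the profile id for a candidate record, creating one if needed."""
        place_id = candidate.get("place_id")
        domain = domain_of(candidate.get("website") or "")

        if place_id and place_id in self.by_place_id:
            profile_id, rule = self.by_place_id[place_id], "place_id"
        elif domain and domain in self.by_domain:
            profile_id, rule = self.by_domain[domain], "domain"
        else:
            profile_id, rule = f"profile-{self._next_id}", "new"
            self._next_id += 1

        # Register any keys this candidate contributes for future lookups.
        if place_id:
            self.by_place_id.setdefault(place_id, profile_id)
        if domain:
            self.by_domain.setdefault(domain, profile_id)
        self.entity_match_log.append(
            {"profile_id": profile_id, "rule": rule, "place_id": place_id, "domain": domain}
        )
        return profile_id


index = ProfileIndex()
a = index.resolve({"place_id": "ChIJ-example-1", "website": "https://www.example.com/"})
b = index.resolve({"website": "example.com/contact"})
c = index.resolve({"place_id": "ChIJ-example-1"})
d = index.resolve({"place_id": "ChIJ-example-2", "website": "https://other.example"})
```

The first three records resolve to the same profile, the second through its domain and the third through its place_id, while the fourth creates a new profile. Note that this simplified fallback would also merge separate locations of a chain that share one domain, so a real system would likely need additional checks there.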

Deterministic Scoring

All scoring is rule-based; no LLM tokens are used in the final score calculation. This keeps hallucinated values out of the score and ensures reproducibility: the same inputs always produce the same score.
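
To illustrate what rule-based, reproducible scoring means in practice, the sketch below scores one dimension with fixed rules and fingerprints the inputs so a score can be re-derived later. The rule table, point values, field names, and function names are assumptions for illustration; only the name, type, and location criteria come from the Entity & Business Identity description above.

```python
import hashlib
import json

# Hypothetical rule table: points awarded when a field is present.
# The fields mirror the "name, type, location" criteria; the points are illustrative.
IDENTITY_RULES = [
    ("name", 40),
    ("business_type", 30),
    ("location", 30),
]


def entity_identity_score(profile: dict) -> int:
    """Pure function: no randomness, no model calls, only fixed rules."""
    return sum(points for field, points in IDENTITY_RULES if profile.get(field))


def input_fingerprint(profile: dict) -> str:
    """Hash a canonical JSON form so identical inputs give identical fingerprints."""
    canonical = json.dumps(profile, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"name": "Acme Bakery", "location": "Springfield", "business_type": ""}
second = {"business_type": "", "location": "Springfield", "name": "Acme Bakery"}

score_first = entity_identity_score(first)
score_second = entity_identity_score(second)
same_inputs = input_fingerprint(first) == input_fingerprint(second)
```

Both dictionaries hold the same data in a different key order, so they produce the same fingerprint and the same score. Storing the fingerprint next to a score is one way to check later that a score was derived from the inputs it claims.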