Data Methodology
Source Pipeline
Profiles are assembled from up to four independent data sources, which are cross-validated against one another:
- Google Maps / DataForSEO — Business name, address, phone, hours, categories, ratings. Highest-trust source.
- Schema.org Structured Data — Structured markup from the business’s own website.
- Gemini AI Classification — Category classification and business type inference.
- Website Scan — Contact info, services, descriptions extracted from the website.
Field confidence is computed from cross-source agreement. Conflicts between sources are detected and surfaced.
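The agreement check described above can be illustrated with a minimal sketch. The formula here is an assumption, not the published method: confidence is taken as the share of consulted sources that report the winning value, and a conflict is flagged whenever two sources report different values. The function name, source keys, and normalization step are hypothetical.

```python
from collections import Counter

def field_confidence(values_by_source):
    """Return (best_value, confidence, conflict) for a single profile field.

    values_by_source maps a source name to the value it reported,
    or None when that source had nothing for the field.
    """
    # Normalize so trivial formatting differences do not count as conflicts.
    reported = {
        source: str(value).strip().lower()
        for source, value in values_by_source.items()
        if value is not None and str(value).strip()
    }
    if not reported:
        return None, 0.0, False

    counts = Counter(reported.values())
    best_value, votes = counts.most_common(1)[0]
    # Assumed formula: agreeing sources divided by all sources consulted.
    confidence = votes / len(values_by_source)
    conflict = len(counts) > 1
    return best_value, confidence, conflict

phone = field_confidence({
    "google_maps": "555-0100",
    "schema_org": "555-0100",
    "website_scan": "555-0199",
    "gemini": None,
})
print(phone)
```

In this example two of the four sources agree, so the field gets a confidence of 0.5 and is flagged as a conflict for review.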
Citation Tiers
Every published profile is assigned one of three citation tiers based on data quality and verification status:
- Verified — Owner-confirmed with high cross-source data agreement. Highest quality. Safe for authoritative AI citation.
- Citable — Multi-source validated, well-structured. Not yet owner-confirmed but meets quality threshold for AI citation.
- Listed — Auto-generated from public data. Accurate but below the threshold for AI citation use.
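As a rough sketch, the three tiers above could be assigned from owner confirmation, cross-source agreement, and the number of contributing sources. The threshold values and the names assign_tier, HIGH_AGREEMENT, CITABLE_MIN_AGREEMENT, and CITABLE_MIN_SOURCES are assumptions made for illustration; the actual cut-offs are not stated here.

```python
from enum import Enum

class Tier(Enum):
    VERIFIED = "Verified"
    CITABLE = "Citable"
    LISTED = "Listed"

# Hypothetical thresholds; the real values are not published.
HIGH_AGREEMENT = 0.8
CITABLE_MIN_AGREEMENT = 0.6
CITABLE_MIN_SOURCES = 2

def assign_tier(owner_confirmed, agreement, source_count):
    """Map verification status and data quality to a citation tier."""
    # Verified requires both owner confirmation and high agreement.
    if owner_confirmed and agreement >= HIGH_AGREEMENT:
        return Tier.VERIFIED
    # Citable requires multi-source validation above the quality threshold.
    if source_count >= CITABLE_MIN_SOURCES and agreement >= CITABLE_MIN_AGREEMENT:
        return Tier.CITABLE
    # Everything else is published as Listed.
    return Tier.LISTED

print(assign_tier(owner_confirmed=False, agreement=0.9, source_count=3).value)
```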
Scoring Dimensions (0–100)
- AI Interpretability — How well AI systems can parse and understand the content.
- Entity & Business Identity — Whether the business is a clearly defined entity (name, type, location).
- AI Presence — Whether the business already appears in AI-generated answers today.
- Trust & Authority — Trust signals, legal pages, security posture, certifications.
- AI Crawlability — Whether AI bots can access and read site content.
- Distribution Signals — How broadly the business is referenced across the web.
The final score is a weighted composite of all six dimensions.
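A weighted composite of this kind can be sketched as follows. The weights are placeholders chosen only so that they sum to 100; the real weighting is not given here, and the dictionary keys and function name are hypothetical.

```python
# Hypothetical weights in percent; they must sum to 100.
WEIGHTS = {
    "ai_interpretability": 20,
    "entity_identity": 20,
    "ai_presence": 15,
    "trust_authority": 15,
    "ai_crawlability": 15,
    "distribution_signals": 15,
}

def final_score(scores):
    """Combine six 0-100 dimension scores into one 0-100 composite."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    # Weighted sum, divided by 100 because the weights are percentages.
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return round(total / 100, 1)

example = dict.fromkeys(WEIGHTS, 50)
example["ai_interpretability"] = 100
print(final_score(example))
```

Raising the 20%-weighted dimension from 50 to 100 while the others stay at 50 moves the composite from 50.0 to 60.0.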
Freshness & Re-scan Policy
- Pro subscribers — Weekly re-scan (every Sunday).
- Basic subscribers — Monthly re-scan (1st of month).
- All profiles — Auto-discovery re-run weekly (every Wednesday) via 103-country geographic waterfall.
Businesses marked closed or permanently closed are detected at import and excluded.
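The schedule above reduces to simple date arithmetic. The sketch below assumes plan names "pro" and "basic" and hypothetical function names, and always returns a date strictly after the given day.

```python
from datetime import date, timedelta

SUNDAY = 6     # date.weekday() uses Monday=0 .. Sunday=6
WEDNESDAY = 2

def next_weekday(today, weekday):
    """Next date strictly after `today` that falls on `weekday`."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)

def next_first_of_month(today):
    """First day of the month following `today`."""
    if today.month == 12:
        return date(today.year + 1, 1, 1)
    return date(today.year, today.month + 1, 1)

def next_rescan(plan, today):
    """Next subscriber re-scan date: weekly for Pro, monthly for Basic."""
    if plan == "pro":
        return next_weekday(today, SUNDAY)
    if plan == "basic":
        return next_first_of_month(today)
    raise ValueError(f"unknown plan: {plan!r}")

def next_discovery(today):
    """Next weekly auto-discovery run, which applies to all profiles."""
    return next_weekday(today, WEDNESDAY)

print(next_rescan("pro", date(2024, 1, 1)), next_discovery(date(2024, 1, 1)))
```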
Anti-Duplication
Deduplication uses the Google Maps place_id as the canonical key. Domain-based deduplication catches the same business discovered from different sources. An entity_match_log records every match resolution.
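A minimal sketch of this two-stage matching is shown below: place_id first, then the normalized website domain, with every merge written to a log. The record fields (name, place_id, website), the log entry shape, and the helper names are assumptions for illustration; only place_id and entity_match_log come from the description above.

```python
from urllib.parse import urlparse

def normalize_domain(url):
    """Return the lower-cased host without a leading 'www.', or None."""
    if not url:
        return None
    # Prefix scheme-less input so urlparse treats it as a network location.
    host = urlparse(url if "//" in url else "//" + url).hostname
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host

def deduplicate(records):
    """Return (unique_records, entity_match_log)."""
    by_place_id, by_domain = {}, {}
    unique, entity_match_log = [], []
    for record in records:
        place_id = record.get("place_id")
        domain = normalize_domain(record.get("website"))
        if place_id and place_id in by_place_id:
            match, reason = by_place_id[place_id], "place_id"
        elif domain and domain in by_domain:
            match, reason = by_domain[domain], "domain"
        else:
            match, reason = None, None
        if match is None:
            unique.append(record)
            match = record
        else:
            entity_match_log.append(
                {"kept": match["name"], "dropped": record["name"], "matched_on": reason}
            )
        # Index both keys so later records can match on either one.
        if place_id:
            by_place_id.setdefault(place_id, match)
        if domain:
            by_domain.setdefault(domain, match)
    return unique, entity_match_log

records = [
    {"name": "Acme Bakery", "place_id": "abc", "website": "https://www.acme.example/"},
    {"name": "Acme Bakery (Maps)", "place_id": "abc", "website": None},
    {"name": "ACME", "place_id": None, "website": "acme.example"},
]
unique, log = deduplicate(records)
print(len(unique), [entry["matched_on"] for entry in log])
```

A production pipeline would likely need extra checks before merging on domain alone, since several locations of one business can share a website.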
Deterministic Scoring
All scoring is rule-based. No LLM tokens are used in the final score calculation, so the score cannot be affected by model hallucination and is reproducible: the same inputs always produce the same score.
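To illustrate what rule-based scoring means in practice, here is a small sketch for a single dimension. The rule table, signal names, and point values are invented for the example and are not the actual rules; the point is only that the score is a pure function of observed signals, with no randomness and no model call.

```python
# Hypothetical rule table for the Trust & Authority dimension: (signal, points).
TRUST_RULES = (
    ("has_https", 30),
    ("has_privacy_policy", 25),
    ("has_terms_page", 20),
    ("has_certification", 25),
)

def trust_score(signals):
    """Sum the points of every rule whose signal is present (0-100)."""
    return sum(points for name, points in TRUST_RULES if signals.get(name, False))

signals = {"has_https": True, "has_privacy_policy": True}
# Identical inputs always yield identical output.
assert trust_score(signals) == trust_score(dict(signals))
print(trust_score(signals))
```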