Case 11: Cross-Validating Attribution Properties

Real Slack conversation showing the process of testing which attribution properties actually work.

“Check landing page events. Filter by path to see main landings.”

Checking landing page events

What exists:

  • conv_landing_page_view event ✓
  • Has landing_page_source property (full URL)
  • Top landing pages visible immediately

Event exists, create insight

Top landing pages: https://us.posthog.com/project/98417/insights/mwjObaVe
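For reference, the same question can be asked ad hoc through PostHog's query API instead of a saved insight. A minimal sketch: the project ID and the event/property names come from this case, and the API key is a placeholder for a personal key with query read access.

```python
import requests

POSTHOG_HOST = "https://us.posthog.com"
PROJECT_ID = "98417"     # project from this case
API_KEY = "phx_..."      # placeholder: personal API key with query read access

# HogQL: top landing pages for conv_landing_page_view over the last 7 days
HOGQL = """
SELECT properties.landing_page_source AS landing_page, count() AS views
FROM events
WHERE event = 'conv_landing_page_view'
  AND timestamp > now() - INTERVAL 7 DAY
GROUP BY landing_page
ORDER BY views DESC
LIMIT 20
"""

resp = requests.post(
    f"{POSTHOG_HOST}/api/projects/{PROJECT_ID}/query/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"query": {"kind": "HogQLQuery", "query": HOGQL}},
    timeout=30,
)
resp.raise_for_status()
for landing_page, views in resp.json()["results"]:
    print(f"{views:>8}  {landing_page}")
```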

“Group by traffic sources. Where are visitors coming from?”

Need to group by traffic source

Which property to use?

  • utm_source?
  • affid?
  • referring_domain?

Test them all. See what has data.
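One way to test them all in a single pass is to count how often each candidate property is actually populated. A sketch under the same assumptions as the query above; the case calls the referrer property referring_domain, while PostHog's auto-captured name is usually $referring_domain, so use whichever your events carry.

```python
import requests

PROJECT_ID = "98417"
API_KEY = "phx_..."  # placeholder: personal API key with query read access

# How many landing views carry each candidate attribution property?
# Missing properties come back as NULL (or sometimes ''), so both count as "no data".
HOGQL = """
SELECT
    count() AS total,
    countIf(coalesce(properties.utm_source, '') != '') AS has_utm_source,
    countIf(coalesce(properties.affid, '') != '') AS has_affid,
    countIf(coalesce(properties.$referring_domain, '') != '') AS has_referring_domain
FROM events
WHERE event = 'conv_landing_page_view'
  AND timestamp > now() - INTERVAL 7 DAY
"""

resp = requests.post(
    f"https://us.posthog.com/api/projects/{PROJECT_ID}/query/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"query": {"kind": "HogQLQuery", "query": HOGQL}},
    timeout=30,
)
resp.raise_for_status()
total, utm, affid, referrer = resp.json()["results"][0]
for name, n in [("utm_source", utm), ("affid", affid), ("referring_domain", referrer)]:
    print(f"{name}: {n:,}/{total:,} events ({n / total:.0%} coverage)")
```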

Testing properties - referring_domain works

Insights created:

  1. Landing Page Views - Main Landings Only: https://us.posthog.com/project/98417/insights/mwjObaVe
  2. Landing Pages by Affiliate ID (affid): https://us.posthog.com/project/98417/insights/Z90PbMNm
  3. Landing Pages by Referring Domain: https://us.posthog.com/project/98417/insights/l4ecohol

What it shows:

  • Automatically captured by PostHog (no setup)
  • Shows referrer: Facebook, Google, Direct, Snapchat
  • Works for all traffic

Top Referring Domains (Last 7 Days):

  1. Facebook (m.facebook.com + www.facebook.com + l.facebook.com) - ~31,830 views total
  2. Direct traffic ($direct) - 24,839 views
  3. Google (www.google.com) - 4,205 views
  4. Bing/MSN - 897 views combined
  5. Instagram - 751 views
  6. Snapchat - 532 views
  7. Yahoo - 435 views

“25k direct traffic? That seems high. Check affid too.”

Question the numbers

Test alternative: Check affid (affiliate ID) for campaign attribution.
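The quickest way to run that check is to cross-tabulate affid against the auto-captured referrer in one query; rows where the two disagree (a Snapchat affid arriving as "$direct", for example) are the interesting ones. A sketch under the same assumptions and naming caveats as the earlier queries.

```python
import requests

PROJECT_ID = "98417"
API_KEY = "phx_..."  # placeholder: personal API key with query read access

# Landing views by (affid, referrer) pair; adjust the property name if your
# dataset stores referring_domain without the $ prefix.
HOGQL = """
SELECT
    properties.affid AS affid,
    properties.$referring_domain AS referrer,
    count() AS views
FROM events
WHERE event = 'conv_landing_page_view'
  AND timestamp > now() - INTERVAL 7 DAY
GROUP BY affid, referrer
ORDER BY views DESC
LIMIT 100
"""

resp = requests.post(
    f"https://us.posthog.com/api/projects/{PROJECT_ID}/query/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"query": {"kind": "HogQLQuery", "query": HOGQL}},
    timeout=30,
)
resp.raise_for_status()
for affid, referrer, views in resp.json()["results"]:
    print(f"affid={affid!s:<8} referrer={referrer!s:<24} {views:,}")
```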

Affid mapping reveals more detail

Detailed breakdown insights:

When affid exists, it’s highly accurate:

Affid 1000 = Facebook

  • Total views: ~31,765
  • Breakdown:
    • m.facebook.com: 21,844
    • Direct (after Facebook): 3,764
    • l.facebook.com: 2,088
    • Instagram: 745

Affid 1001 = Google Search

  • Total views: ~2,869
  • Breakdown:

Affid 1004 = Snapchat

  • Total views: ~8,072
  • Breakdown:

Affid 1010 = Bing/Yahoo

  • Total views: ~1,917
  • MSN, Yahoo, Direct, Bing, DuckDuckGo

Affid 15975 = Facebook (different campaign)

Affid 3016 = TheOffer

  • Total views: ~2,561
  • Direct + click.mediaforce.com

Coverage:

  • 78% of traffic has affid (55,447 with vs 15,578 without)
  • Highly reliable when present
  • Can map affid → actual traffic sources for attribution
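Downstream, that mapping can be a small lookup, with the auto-captured referrer as the fallback for the remaining ~22% of untagged traffic. The affid values and labels below are the ones cross-validated in this case; the helper itself is only an illustration.

```python
# affid -> traffic source, as cross-validated in this case (not an exhaustive list)
AFFID_SOURCES = {
    "1000": "Facebook",
    "1001": "Google Search",
    "1004": "Snapchat",
    "1010": "Bing/Yahoo",
    "15975": "Facebook (different campaign)",
    "3016": "TheOffer",
}

def attribute(affid: str | None, referring_domain: str | None) -> str:
    """Prefer the affid mapping; fall back to the auto-captured referrer."""
    if affid and affid in AFFID_SOURCES:
        return AFFID_SOURCES[affid]
    if referring_domain and referring_domain != "$direct":
        return f"untagged / {referring_domain}"
    return "unknown / direct"

print(attribute("1004", "$direct"))        # -> Snapchat (affid wins over "direct")
print(attribute(None, "www.google.com"))   # -> untagged / www.google.com
print(attribute(None, "$direct"))          # -> unknown / direct
```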

affid - Most reliable for campaign attribution

  • 78% coverage in this dataset
  • Requires mapping (affid → source name)
  • Shows specific campaigns: affid 1000 = Facebook, 1001 = Google, etc.
  • Can cross-validate with referring_domain to see actual referrer breakdown

referring_domain - Good for cross-validation

  • Automatically captured (no setup)
  • Shows where traffic comes from
  • But: lumps things into “direct” that affid can separate
  • Use to validate affid mappings

utm_source - Mostly broken

  • 91% null in this dataset
  • Requires manual UTM tagging
  • Only works if marketers consistently tag (they don’t)

tid - Useless

  • Almost all values are “1”
  • No differentiation

Don’t trust single properties. Cross-validate.

referring_domain alone showed 25k “direct” traffic. Seems high. Check with affid:

  • affid breaks down that “direct” into actual campaigns
  • Affid 1004 (Snapchat): 7,004 of that “direct” came from Snapchat params
  • Affid 1000 (Facebook): 3,764 “direct” = post-Facebook clicks

The “direct” traffic wasn’t actually direct. You’d miss this without cross-validation.
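The check that exposed this is a single query: keep only the events the referrer labels as direct, then break them down by affid. Same assumptions (project, placeholder key, property naming) as the earlier sketches.

```python
import requests

PROJECT_ID = "98417"
API_KEY = "phx_..."  # placeholder: personal API key with query read access

# Of the "direct" landing views, how many actually carry a campaign affid?
HOGQL = """
SELECT properties.affid AS affid, count() AS views
FROM events
WHERE event = 'conv_landing_page_view'
  AND timestamp > now() - INTERVAL 7 DAY
  AND properties.$referring_domain = '$direct'
GROUP BY affid
ORDER BY views DESC
"""

resp = requests.post(
    f"https://us.posthog.com/api/projects/{PROJECT_ID}/query/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"query": {"kind": "HogQLQuery", "query": HOGQL}},
    timeout=30,
)
resp.raise_for_status()
for affid, views in resp.json()["results"]:
    print(f"affid={affid}: {views:,} 'direct' views")
```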

Use:

  1. affid as primary for campaign attribution (when tagged)
  2. referring_domain to cross-validate and catch untagged traffic
  3. Always compare both - discrepancies reveal insights

Ignore:

  • utm_source (too many nulls)
  • tid (not differentiated)

Not complex. Simple workflow:

  1. Check what properties exist
  2. Test which ones have data (look for nulls)
  3. Question the results (25k direct? Really?)
  4. Cross-validate multiple sources
  5. Use what works, ignore what doesn’t

No meetings. No documentation. Check the data. Question everything.
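Step 1 of that workflow doesn't even need a query: PostHog lists the event and property definitions it has seen through its REST API. A minimal sketch, assuming a personal API key with read access (only the first page of each list is printed).

```python
import requests

PROJECT_ID = "98417"
API_KEY = "phx_..."  # placeholder: personal API key with read access
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE = f"https://us.posthog.com/api/projects/{PROJECT_ID}"

# Step 1: what events and properties exist at all? (Paginated; first page only.)
events = requests.get(f"{BASE}/event_definitions/", headers=HEADERS, timeout=30).json()
for event in events["results"]:
    print("event:", event["name"])

props = requests.get(f"{BASE}/property_definitions/", headers=HEADERS, timeout=30).json()
for prop in props["results"]:
    print("property:", prop["name"])
```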

Without questioning:

  • See referring_domain works → Done!
  • Accept 25k “direct” traffic → Move on
  • Trust the first chart → Make decisions

Result: Wrong conclusions. Bad decisions.

With questioning:

  • See referring_domain works → “But 25k direct seems high”
  • Check with affid → “Oh, that ‘direct’ is actually Snapchat + Facebook”
  • Cross-validate → Find the truth

Result: Accurate understanding. Good decisions.

This applies to everything:

  • Question AI outputs (like this case showed)
  • Question dashboard numbers
  • Question “industry best practices”
  • Question vendor claims
  • Question your own assumptions

The tool (PostHog + MCP) enables fast testing, but it only works if you question the results.

What this case used:

  • PostHog event definitions list
  • Property exploration
  • Insight creation with breakdowns
  • Cross-validation across multiple properties
  • Real-time data querying