Case 11: Cross-Validating Attribution Properties

Real Slack conversation showing the process of testing which attribution properties actually work.

“Check landing page events. Filter by path to see main landings.”

Checking landing page events

What exists:

  • conv_landing_page_view event ✓
  • Has landing_page_source property (full URL)
  • Top landing pages visible immediately

Event exists, create insight

Top landing pages: https://us.posthog.com/project/98417/insights/mwjObaVe
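For reference, the same question can be asked ad hoc through PostHog's query API instead of a saved insight. A minimal sketch: the project ID and the event/property names come from this case, and the API key is a placeholder for a personal key with query read access.

```python
import requests

POSTHOG_HOST = "https://us.posthog.com"
PROJECT_ID = "98417"     # project from this case
API_KEY = "phx_..."      # placeholder: personal API key with query read access

# HogQL: top landing pages for conv_landing_page_view over the last 7 days
HOGQL = """
SELECT properties.landing_page_source AS landing_page, count() AS views
FROM events
WHERE event = 'conv_landing_page_view'
  AND timestamp > now() - INTERVAL 7 DAY
GROUP BY landing_page
ORDER BY views DESC
LIMIT 20
"""

resp = requests.post(
    f"{POSTHOG_HOST}/api/projects/{PROJECT_ID}/query/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"query": {"kind": "HogQLQuery", "query": HOGQL}},
    timeout=30,
)
resp.raise_for_status()
for landing_page, views in resp.json()["results"]:
    print(f"{views:>8}  {landing_page}")
```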

“Group by traffic sources. Where are visitors coming from?”

Need to group by traffic source

Which property to use?

  • utm_source?
  • affid?
  • referring_domain?

Test them all. See what has data.
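One way to test them all in a single pass is to count how often each candidate property is actually populated. A sketch under the same assumptions as the query above; the case calls the referrer property referring_domain, while PostHog's auto-captured name is usually $referring_domain, so use whichever your events carry.

```python
import requests

PROJECT_ID = "98417"
API_KEY = "phx_..."  # placeholder: personal API key with query read access

# How many landing views carry each candidate attribution property?
# Missing properties come back as NULL (or sometimes ''), so both count as "no data".
HOGQL = """
SELECT
    count() AS total,
    countIf(coalesce(properties.utm_source, '') != '') AS has_utm_source,
    countIf(coalesce(properties.affid, '') != '') AS has_affid,
    countIf(coalesce(properties.$referring_domain, '') != '') AS has_referring_domain
FROM events
WHERE event = 'conv_landing_page_view'
  AND timestamp > now() - INTERVAL 7 DAY
"""

resp = requests.post(
    f"https://us.posthog.com/api/projects/{PROJECT_ID}/query/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"query": {"kind": "HogQLQuery", "query": HOGQL}},
    timeout=30,
)
resp.raise_for_status()
total, utm, affid, referrer = resp.json()["results"][0]
for name, n in [("utm_source", utm), ("affid", affid), ("referring_domain", referrer)]:
    print(f"{name}: {n:,}/{total:,} events ({n / total:.0%} coverage)")
```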

Testing properties - referring_domain works

Insights created:

  1. Landing Page Views - Main Landings Only: https://us.posthog.com/project/98417/insights/mwjObaVe
  2. Landing Pages by Affiliate ID (affid): https://us.posthog.com/project/98417/insights/Z90PbMNm
  3. Landing Pages by Referring Domain: https://us.posthog.com/project/98417/insights/l4ecohol

What it shows:

  • Automatically captured by PostHog (no setup)
  • Shows referrer: Facebook, Google, Direct, Snapchat
  • Works for all traffic

Top Referring Domains (Last 7 Days):

  1. Facebook (m.facebook.com + www.facebook.com + l.facebook.com) - ~31,830 views total
  2. Direct traffic ($direct) - 24,839 views
  3. Google (www.google.com) - 4,205 views
  4. Bing/MSN - 897 views combined
  5. Instagram - 751 views
  6. Snapchat - 532 views
  7. Yahoo - 435 views

“25k direct traffic? That seems high. Check affid too.”

Question the numbers

Test alternative: Check affid (affiliate ID) for campaign attribution.
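The quickest way to run that check is to cross-tabulate affid against the auto-captured referrer in one query; rows where the two disagree (a Snapchat affid arriving as "$direct", for example) are the interesting ones. A sketch under the same assumptions and naming caveats as the earlier queries.

```python
import requests

PROJECT_ID = "98417"
API_KEY = "phx_..."  # placeholder: personal API key with query read access

# Landing views by (affid, referrer) pair; adjust the property name if your
# dataset stores referring_domain without the $ prefix.
HOGQL = """
SELECT
    properties.affid AS affid,
    properties.$referring_domain AS referrer,
    count() AS views
FROM events
WHERE event = 'conv_landing_page_view'
  AND timestamp > now() - INTERVAL 7 DAY
GROUP BY affid, referrer
ORDER BY views DESC
LIMIT 100
"""

resp = requests.post(
    f"https://us.posthog.com/api/projects/{PROJECT_ID}/query/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"query": {"kind": "HogQLQuery", "query": HOGQL}},
    timeout=30,
)
resp.raise_for_status()
for affid, referrer, views in resp.json()["results"]:
    print(f"affid={affid!s:<8} referrer={referrer!s:<24} {views:,}")
```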

Affid mapping reveals more detail

Detailed breakdown insights:

When affid exists, it’s highly accurate:

Affid 1000 = Facebook

  • Total views: ~31,765
  • Breakdown:
    • m.facebook.com: 21,844
    • Direct (after Facebook): 3,764
    • l.facebook.com: 2,088
    • Instagram: 745

Affid 1001 = Google Search

  • Total views: ~2,869
  • Breakdown:

Affid 1004 = Snapchat

  • Total views: ~8,072
  • Breakdown:

Affid 1010 = Bing/Yahoo

  • Total views: ~1,917
  • MSN, Yahoo, Direct, Bing, DuckDuckGo

Affid 15975 = Facebook (different campaign)

Affid 3016 = TheOffer

  • Total views: ~2,561
  • Direct + click.mediaforce.com

Coverage:

  • 78% of traffic has affid (55,447 with vs 15,578 without)
  • Highly reliable when present
  • Can map affid → actual traffic sources for attribution
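Downstream, that mapping can be a small lookup, with the auto-captured referrer as the fallback for the remaining ~22% of untagged traffic. The affid values and labels below are the ones cross-validated in this case; the helper itself is only an illustration.

```python
# affid -> traffic source, as cross-validated in this case (not an exhaustive list)
AFFID_SOURCES = {
    "1000": "Facebook",
    "1001": "Google Search",
    "1004": "Snapchat",
    "1010": "Bing/Yahoo",
    "15975": "Facebook (different campaign)",
    "3016": "TheOffer",
}

def attribute(affid: str | None, referring_domain: str | None) -> str:
    """Prefer the affid mapping; fall back to the auto-captured referrer."""
    if affid and affid in AFFID_SOURCES:
        return AFFID_SOURCES[affid]
    if referring_domain and referring_domain != "$direct":
        return f"untagged / {referring_domain}"
    return "unknown / direct"

print(attribute("1004", "$direct"))        # -> Snapchat (affid wins over "direct")
print(attribute(None, "www.google.com"))   # -> untagged / www.google.com
print(attribute(None, "$direct"))          # -> unknown / direct
```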

affid - Most reliable for campaign attribution

  • 78% coverage in this dataset
  • Requires mapping (affid → source name)
  • Shows specific campaigns: affid 1000 = Facebook, 1001 = Google, etc.
  • Can cross-validate with referring_domain to see actual referrer breakdown

referring_domain - Good for cross-validation

  • Automatically captured (no setup)
  • Shows where traffic comes from
  • But: lumps things into “direct” that affid can separate
  • Use to validate affid mappings

utm_source - Mostly broken

  • 91% null in this dataset
  • Requires manual UTM tagging
  • Only works if marketers consistently tag (they don’t)

tid - Useless

  • Almost all values are “1”
  • No differentiation

Don’t trust single properties. Cross-validate.

referring_domain alone showed 25k “direct” traffic. Seems high. Check with affid:

  • affid breaks down that “direct” into actual campaigns
  • Affid 1004 (Snapchat): 7,004 of that “direct” came from Snapchat params
  • Affid 1000 (Facebook): 3,764 “direct” = post-Facebook clicks

The “direct” traffic wasn’t actually direct. You’d miss this without cross-validation.
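The check that exposed this is a single query: keep only the events the referrer labels as direct, then break them down by affid. Same assumptions (project, placeholder key, property naming) as the earlier sketches.

```python
import requests

PROJECT_ID = "98417"
API_KEY = "phx_..."  # placeholder: personal API key with query read access

# Of the "direct" landing views, how many actually carry a campaign affid?
HOGQL = """
SELECT properties.affid AS affid, count() AS views
FROM events
WHERE event = 'conv_landing_page_view'
  AND timestamp > now() - INTERVAL 7 DAY
  AND properties.$referring_domain = '$direct'
GROUP BY affid
ORDER BY views DESC
"""

resp = requests.post(
    f"https://us.posthog.com/api/projects/{PROJECT_ID}/query/",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"query": {"kind": "HogQLQuery", "query": HOGQL}},
    timeout=30,
)
resp.raise_for_status()
for affid, views in resp.json()["results"]:
    print(f"affid={affid}: {views:,} 'direct' views")
```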

Use:

  1. affid as primary for campaign attribution (when tagged)
  2. referring_domain to cross-validate and catch untagged traffic
  3. Always compare both - discrepancies reveal insights

Ignore:

  • utm_source (too many nulls)
  • tid (not differentiated)

Not complex. Simple workflow:

  1. Check what properties exist
  2. Test which ones have data (look for nulls)
  3. Question the results (25k direct? Really?)
  4. Cross-validate multiple sources
  5. Use what works, ignore what doesn’t

No meetings. No documentation. Check the data. Question everything.
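Step 1 of that workflow doesn't even need a query: PostHog lists the event and property definitions it has seen through its REST API. A minimal sketch, assuming a personal API key with read access (only the first page of each list is printed).

```python
import requests

PROJECT_ID = "98417"
API_KEY = "phx_..."  # placeholder: personal API key with read access
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE = f"https://us.posthog.com/api/projects/{PROJECT_ID}"

# Step 1: what events and properties exist at all? (Paginated; first page only.)
events = requests.get(f"{BASE}/event_definitions/", headers=HEADERS, timeout=30).json()
for event in events["results"]:
    print("event:", event["name"])

props = requests.get(f"{BASE}/property_definitions/", headers=HEADERS, timeout=30).json()
for prop in props["results"]:
    print("property:", prop["name"])
```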

Without questioning:

  • See referring_domain works → Done!
  • Accept 25k “direct” traffic → Move on
  • Trust the first chart → Make decisions

Result: Wrong conclusions. Bad decisions.

With questioning:

  • See referring_domain works → “But 25k direct seems high”
  • Check with affid → “Oh, that ‘direct’ is actually Snapchat + Facebook”
  • Cross-validate → Find the truth

Result: Accurate understanding. Good decisions.

This applies to everything:

  • Question AI outputs (like this case showed)
  • Question dashboard numbers
  • Question “industry best practices”
  • Question vendor claims
  • Question your own assumptions

The tool (PostHog + MCP) enables fast testing, but it only works if you question the results.

What this case used:

  • PostHog event definitions list
  • Property exploration
  • Insight creation with breakdowns
  • Cross-validation across multiple properties
  • Real-time data querying