Day 6 was about making themlspulse.com stop needing me. A site that updates itself. Every stat, every standing, every coach, every advanced metric—refreshed automatically, on schedule, from public APIs and Wikipedia.
Three Crons, Three Data Sources
The site now runs three Vercel cron jobs:
| Cron | Schedule | Source | What It Updates |
|---|---|---|---|
| refresh-stats | Daily, 8am UTC | ESPN API | Goals, assists, appearances, minutes, new players |
| refresh-standings | Sundays, 10am UTC | ESPN API + Wikipedia | 2026 standings (East/West/Overall) + head coaches |
| refresh-asa | Wednesdays, 12pm UTC | American Soccer Analysis | xG, xA, goals added, passing stats, GK metrics |
Both ESPN crons were already in place. Today I added the ASA cron and the Wikipedia coach scraper, which runs as part of refresh-standings.
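For reference, the three schedules in the table map onto standard five-field cron expressions, which Vercel evaluates in UTC. Here is a minimal sketch of the `crons` entry as it could appear in `vercel.json`, written as a TypeScript object. The `/api/cron/...` paths are assumptions for illustration, not the site's actual routes.

```typescript
// Shape of one entry in the "crons" array of vercel.json.
interface CronJob {
  path: string;     // route Vercel requests on schedule (hypothetical paths below)
  schedule: string; // five-field cron expression, evaluated in UTC
}

const crons: CronJob[] = [
  { path: "/api/cron/refresh-stats", schedule: "0 8 * * *" },      // daily, 08:00 UTC
  { path: "/api/cron/refresh-standings", schedule: "0 10 * * 0" }, // Sundays, 10:00 UTC
  { path: "/api/cron/refresh-asa", schedule: "0 12 * * 3" },       // Wednesdays, 12:00 UTC
];

// Serialized form, in the layout vercel.json expects.
const vercelJson: string = JSON.stringify({ crons }, null, 2);
```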
The Wikipedia Coach Problem
ESPN's soccer API doesn't expose coaches. That's a gap most sports sites would fill manually. We fill it with a Wikipedia scraper that parses MediaWiki infobox wikitext for all 30 teams every Sunday.
Three edge cases made this interesting:
- LAFC has both `| manager` (ownership group) and `| coach` (head coach). Had to prefer `coach` over `manager`.
- NYCFC and San Jose have `| coach =` (empty) with the actual coach under `| manager`. Had to fall through empty fields.
- Orlando City had `''(interim)''` wiki markup that needed stripping.
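The three edge cases above can be sketched roughly like this. It is a simplified illustration, not the production parser: the function names are hypothetical, the regexes only cover the patterns mentioned here, and the names in any sample wikitext are placeholders.

```typescript
// Read the raw value of one infobox field, e.g. "| coach = [[Jane Doe]]".
// Only spaces and tabs are allowed around "=", so an empty field cannot
// swallow the following line.
function infoboxField(wikitext: string, field: string): string {
  const pattern = new RegExp(`^[ \\t]*\\|[ \\t]*${field}[ \\t]*=[ \\t]*(.*)$`, "m");
  const match = wikitext.match(pattern);
  return match ? match[1].trim() : "";
}

// Strip the wiki markup that showed up in coach fields.
function cleanName(raw: string): string {
  return raw
    .replace(/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/g, "$1") // [[Target|Label]] -> Label
    .replace(/\(interim\)/gi, "")                      // drop the interim tag
    .replace(/'{2,}/g, "")                             // italic/bold quote markup
    .trim();
}

// Prefer "coach" over "manager", falling through fields that are empty.
function extractCoach(wikitext: string): string | null {
  for (const field of ["coach", "manager"]) {
    const name = cleanName(infoboxField(wikitext, field));
    if (name) return name;
  }
  return null;
}
```

Checking `coach` first handles the LAFC case, the empty-string fall-through handles NYCFC and San Jose, and `cleanName` covers the Orlando City markup.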
The parser now handles all 30 teams cleanly and only writes to the database when a coach actually changes. When it does change, it logs the diff: `atlanta-united: Ronny Deila → Gerardo Martino`.
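The write-only-on-change behavior can be sketched as a small diff step. This assumes coaches are held as a plain map from team slug to name; the type and function names are hypothetical, and the actual database write is left to the caller.

```typescript
type CoachMap = Record<string, string>; // team slug -> head coach

// Return one log line per team whose scraped coach differs from the stored one.
function coachChanges(stored: CoachMap, scraped: CoachMap): string[] {
  const changes: string[] = [];
  for (const [team, coach] of Object.entries(scraped)) {
    const previous: string | undefined = stored[team];
    if (previous !== coach) {
      changes.push(`${team}: ${previous ?? "(none)"} → ${coach}`);
    }
  }
  return changes;
}
```

An empty result means nothing changed, so the cron can skip the database write entirely.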
ASA Enrichment: The xG Layer
American Soccer Analysis provides the advanced metrics that separate a stats site from a real analytics resource: expected goals, expected assists, goals added above average, passing completion vs. expected, and goalkeeper performance models.
The matching challenge is the same one we hit with youth pathway data—name normalization. ASA uses different name formats than ESPN. "Danny Musovski" vs. "Daniel Musovski." "Matty Longstaff" vs. "Matthew Longstaff." "Nouhou" vs. "Nouhou Tolo."
We maintain a hand-curated alias map (25 entries) plus fallback matching by last-name+team and first-name+team for mononyms. Current match rate: 99.3% of all ASA players who've appeared in 2026.
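A rough sketch of that matching order: alias map first, then last-name+team, then first-name+team. The `Player` shape, the team slugs, and which spelling belongs to which source are assumptions for illustration. The real alias map has 25 entries; only one is shown here.

```typescript
interface Player {
  name: string;
  team: string; // team slug (hypothetical format)
}

// Hand-curated aliases: ASA spelling -> ESPN spelling (direction assumed).
const ALIASES: Record<string, string> = {
  "Daniel Musovski": "Danny Musovski",
};

// Lowercase and strip diacritics so the two spellings compare equal.
const normalize = (s: string): string =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase().trim();

function matchPlayer(asa: Player, espnRoster: Player[]): Player | undefined {
  const alias: string | undefined = ALIASES[asa.name];
  const target = normalize(alias ?? asa.name);

  // 1. Exact match on the normalized (or aliased) full name.
  const exact = espnRoster.find((p) => normalize(p.name) === target);
  if (exact) return exact;

  const sameTeam = espnRoster.filter((p) => p.team === asa.team);

  // 2. Fallback: a unique last-name match on the same team.
  const lastName = target.split(" ").pop();
  const byLast = sameTeam.filter((p) => normalize(p.name).split(" ").pop() === lastName);
  if (byLast.length === 1) return byLast[0];

  // 3. Fallback for mononyms: a unique first-name match on the same team.
  const firstName = target.split(" ")[0];
  const byFirst = sameTeam.filter((p) => normalize(p.name).split(" ")[0] === firstName);
  return byFirst.length === 1 ? byFirst[0] : undefined;
}
```

Requiring a unique hit in the fallbacks keeps two teammates with the same surname from being merged silently. Those cases are what the alias map is for.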
Youth Networks v2: The Full Pipeline
The youth pathways data from Day 5 was incomplete—single-source, low confidence. Today an 8-agent swarm rebuilt it from scratch:
- FotMob: 679 players with career histories (3,833 career entries)
- Wikipedia: 704 players with 2,961 senior + 529 youth career entries
- Wikidata: 657 players with structured team/education data
- TheSportsDB: 589 players with birth locations and nationalities
Merged result: 866 players (100% coverage), 1,967 unique clubs across 83 countries. 659 high confidence, 131 medium, 76 low.
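To make the merge step concrete, here is one way a multi-source merge with confidence tiers could look. The scoring rule is purely an assumption: it grades a player by how many sources returned data for them (three or more is high, two is medium, one is low). The actual thresholds and record shapes used by the pipeline are not shown in this post.

```typescript
type Source = "fotmob" | "wikipedia" | "wikidata" | "thesportsdb";
type Confidence = "high" | "medium" | "low";

interface SourceRecord {
  playerId: string;
  source: Source;
  clubs: string[];
}

interface MergedPlayer {
  playerId: string;
  sources: Source[];
  clubs: string[]; // de-duplicated across sources
  confidence: Confidence;
}

function mergeSources(records: SourceRecord[]): MergedPlayer[] {
  const byPlayer = new Map<string, { sources: Set<Source>; clubs: Set<string> }>();
  for (const r of records) {
    const entry = byPlayer.get(r.playerId) ?? {
      sources: new Set<Source>(),
      clubs: new Set<string>(),
    };
    entry.sources.add(r.source);
    r.clubs.forEach((club) => entry.clubs.add(club));
    byPlayer.set(r.playerId, entry);
  }
  return Array.from(byPlayer, ([playerId, e]): MergedPlayer => ({
    playerId,
    sources: Array.from(e.sources),
    clubs: Array.from(e.clubs),
    // ASSUMPTION: illustrative thresholds, not the site's actual rule.
    confidence: e.sources.size >= 3 ? "high" : e.sources.size === 2 ? "medium" : "low",
  }));
}
```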
New pages shipped:
- 30 country pathway pages (`/pathways/country/united-states`, etc.)
- 30 team development network pages (`/pathways/network/atlanta-united`, etc.)
- Upgraded hub with geographic breakdown and pipeline type visualization
- 2 pillar articles: "Where Do MLS Players Come From" and "MLS Development Networks"
The Health Check
Built a /murk-checkup command—a 10-check diagnostic that audits database integrity, route health, deployment status, sitemap completeness, content quality, data freshness, internal linking, SEO, build health, and comparison coverage. Three modes: read-only audit, auto-fix, and fix+deploy.
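The three modes can be pictured as one runner with a mode switch. This is a simplified, synchronous sketch with hypothetical names (`Check`, `runCheckup`, the mode strings). Deploying only when a fix was actually applied is an assumption, not a description of how the command is built.

```typescript
type Mode = "audit" | "fix" | "fix-deploy";

interface Check {
  name: string;
  run: () => string[]; // problems found; an empty list means the check passed
  fix?: () => void;    // optional automatic repair
}

interface Report {
  name: string;
  problems: string[];
  fixed: boolean;
}

function runCheckup(checks: Check[], mode: Mode, deploy: () => void): Report[] {
  const reports = checks.map((check): Report => {
    const problems = check.run();
    let fixed = false;
    // Audit mode never mutates anything; the other modes repair what they can.
    if (mode !== "audit" && problems.length > 0 && check.fix) {
      check.fix();
      fixed = true;
    }
    return { name: check.name, problems, fixed };
  });
  // ASSUMPTION: deploy only when at least one fix was applied.
  if (mode === "fix-deploy" && reports.some((r) => r.fixed)) deploy();
  return reports;
}
```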
First run caught three issues:
- Player comparisons had been truncated to 500 rows (should be 100K+). Regenerated.
- Stat leaders dataset was completely missing. Seeded 10 categories, 477 entries.
- 50 meta descriptions exceeded 160 characters. Trimmed to under 155.
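The meta description fix is the easiest of the three to illustrate. Below is a sketch of a trimmer that keeps the result strictly under the limit and cuts back to a word boundary. The function name and the exact trimming rule are assumptions, not the command's actual implementation.

```typescript
// Collapse whitespace, then cut back to the last space so the result is
// strictly shorter than `limit` and never ends in the middle of a word.
function trimDescription(text: string, limit = 155): string {
  const clean = text.replace(/\s+/g, " ").trim();
  if (clean.length < limit) return clean;
  const cut = clean.slice(0, limit - 1); // at most limit - 1 characters
  const lastSpace = cut.lastIndexOf(" ");
  const base = lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
  return base.replace(/[\s,;:-]+$/, ""); // drop dangling punctuation
}
```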
The Numbers
| Metric | Before | After |
|---|---|---|
| Youth clubs tracked | 832 | 1,967 |
| Countries represented | ~40 | 83 |
| Pathway confidence (high) | ~50% | 76% |
| ASA coverage | 61% | 65% (99.3% of eligible) |
| Comparisons in DB | 500 | 100,092 |
| Cron jobs | 2 | 3 |
| Automated data sources | 1 (ESPN) | 3 (ESPN, ASA, Wikipedia) |
What's Next
The site now has 1,400+ pages of real content, automated data pipelines, and a health monitoring system. The bottleneck is no longer content or data quality. It's visibility. Domain Rating is 0. The site hasn't been submitted to Google Search Console. Nothing we've built matters until people can find it.
Day 7 should be about backlinks and indexation.