Real-Time Surf Data for AI: Why Accuracy and Sourcing Matter
The importance of data quality and sourcing for AI-powered surf forecasting.
The Data Quality Imperative
Surf forecasting is a high-stakes application where data quality directly impacts user outcomes. A surfer who books flights based on AI recommendations is trusting that data to be accurate. This document explains Strike Mission's approach to data quality and why it matters for AI integration.
Data Sources
Primary Sources
Open-Meteo Marine API
- Provides wave height, period, and direction forecasts
- Global coverage
- Updates every 6 hours
- Based on NOAA's WAVEWATCH III and ECMWF models
- Wind speed and direction forecasts
- Temperature, precipitation
- Same update frequency as marine data
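As an illustration of the marine variables listed above, the following sketch builds a request URL for Open-Meteo's public Marine API. The helper name `build_marine_url` and the example coordinates are hypothetical, and the source does not say how Strike Mission itself queries the API, so treat this as one plausible way to ask for wave height, period, and direction rather than the actual integration.

```python
from urllib.parse import urlencode

# Public Open-Meteo Marine API endpoint.
MARINE_API_URL = "https://marine-api.open-meteo.com/v1/marine"


def build_marine_url(latitude: float, longitude: float) -> str:
    """Build a URL requesting hourly wave height, period, and direction.

    Hypothetical helper for illustration; not part of Strike Mission's API.
    """
    params = {
        "latitude": f"{latitude:.4f}",
        "longitude": f"{longitude:.4f}",
        "hourly": "wave_height,wave_period,wave_direction",
    }
    # Keep commas unescaped so the variable list stays readable.
    return f"{MARINE_API_URL}?{urlencode(params, safe=',')}"


# Example coordinates are arbitrary and chosen only for illustration.
print(build_marine_url(21.665, -158.053))
```

The function only constructs the URL; fetching and parsing the JSON response is left out so the example stays free of network calls.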
Secondary/Validation Sources
Stormglass API
- Multiple model aggregation
- Used for validation and gap-filling
- Limited daily queries (rate-limited)
- Real-time observations (not forecasts)
- Ground truth for calibration
- Limited geographic coverage
How Forecasts Are Generated
Model Chain
Our Processing
Accuracy Considerations
Temporal accuracy
- Days 1-3: Generally reliable
- Days 4-5: Good directional guidance
- Days 6-7: Trend indication only
- Days 8-10: Speculative
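The lead-time tiers above can be encoded as a small lookup so an AI assistant attaches the right caveat to each forecast day. This is a minimal sketch: the function name `forecast_confidence` is hypothetical, and the handling of days beyond 10 is an assumption, since the tiers above stop at day 10.

```python
def forecast_confidence(lead_day: int) -> str:
    """Map a forecast lead time (in days, starting at 1) to a confidence label."""
    if lead_day < 1:
        raise ValueError("lead_day must be 1 or greater")
    if lead_day <= 3:
        return "generally reliable"
    if lead_day <= 5:
        return "good directional guidance"
    if lead_day <= 7:
        return "trend indication only"
    if lead_day <= 10:
        return "speculative"
    # Assumption: anything past day 10 is outside the documented tiers.
    return "beyond documented forecast horizon"


for day in (2, 5, 7, 9):
    print(day, forecast_confidence(day))
```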
Spatial accuracy
- Open ocean: High accuracy
- Near coastline: Moderate (refraction effects)
- Complex bathymetry: Lower (local effects dominate)
Variable accuracy
- Swell height: ±20-30%
- Swell direction: ±10-15 degrees
- Swell period: ±1-2 seconds
- Wind: Highly variable, especially local effects
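One way to apply these error bands is to convert each point forecast into a range before showing it to a user. The sketch below uses the upper end of each band listed above (30%, 15 degrees, 2 seconds) as a conservative assumption; the names `SwellRange` and `swell_range` are hypothetical.

```python
from dataclasses import dataclass

# Conservative (upper-end) error bands taken from the list above.
HEIGHT_REL_ERROR = 0.30      # swell height: up to +/-30%
DIRECTION_ERROR_DEG = 15.0   # swell direction: up to +/-15 degrees
PERIOD_ERROR_S = 2.0         # swell period: up to +/-2 seconds


@dataclass(frozen=True)
class SwellRange:
    height_m: tuple
    direction_deg: tuple
    period_s: tuple


def swell_range(height_m: float, direction_deg: float, period_s: float) -> SwellRange:
    """Expand a point forecast into (low, high) ranges for each variable."""
    return SwellRange(
        height_m=(
            round(height_m * (1 - HEIGHT_REL_ERROR), 2),
            round(height_m * (1 + HEIGHT_REL_ERROR), 2),
        ),
        # Directions wrap around at 360 degrees.
        direction_deg=(
            (direction_deg - DIRECTION_ERROR_DEG) % 360,
            (direction_deg + DIRECTION_ERROR_DEG) % 360,
        ),
        period_s=(max(period_s - PERIOD_ERROR_S, 0.0), period_s + PERIOD_ERROR_S),
    )


print(swell_range(2.0, 350.0, 12.0))
```

Reporting the range rather than the point value also helps avoid false precision in AI responses.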
Why This Matters for AI
Compounding errors
When AI systems make recommendations based on forecast data, each layer adds potential for error. AI assistants must communicate this chain of uncertainty.
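To make the compounding effect concrete, the sketch below multiplies per-layer confidence values together. The layer names and numbers are purely illustrative assumptions, not measured values and not Strike Mission's actual pipeline, and the calculation assumes the layers err independently.

```python
from math import prod


def chain_confidence(layer_confidences: dict) -> float:
    """Combine per-layer confidences (0 to 1), assuming independent errors."""
    for name, value in layer_confidences.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"confidence for {name!r} must be between 0 and 1")
    return prod(layer_confidences.values())


# Hypothetical layers and values, for illustration only.
layers = {
    "global wave model": 0.9,
    "nearshore translation": 0.8,
    "spot-specific interpretation": 0.8,
    "trip recommendation": 0.9,
}
print(f"{chain_confidence(layers):.2f}")
```

With these illustrative inputs the combined confidence is roughly 0.52, even though no single layer is below 0.8, which is why the whole chain needs to be communicated rather than only the last step.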
Liability considerations
When an AI recommends a trip based on forecast data:
- Never guarantee conditions
- Always note forecast limitations
- Recommend verification closer to travel date
- Suggest flexible booking options
Data Freshness
Strike Mission update cycle
- Forecasts refresh every 6 hours
- Dashboard data cached for 6 hours
- Spot metadata rarely changes
- Buoy data (where available) updates hourly
AI caching considerations
- Don't cache forecast data longer than the 6-hour refresh cycle
- Spot characteristics can be cached longer
- Always fetch fresh data for trip decisions
- Note data timestamp in responses
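The update cycle and caching guidance above could be expressed as a per-data-type time-to-live table plus a helper that reports data age. This is a sketch under assumptions: the 6-hour and 1-hour values come from the update cycle above, while the 30-day value for spot metadata and all names (`CACHE_TTL`, `is_fresh`, `describe_age`) are hypothetical.

```python
from datetime import datetime, timedelta, timezone

CACHE_TTL = {
    "forecast": timedelta(hours=6),       # forecasts refresh every 6 hours
    "buoy": timedelta(hours=1),           # buoy data updates hourly
    "spot_metadata": timedelta(days=30),  # assumption: "rarely changes"
}


def is_fresh(kind: str, fetched_at: datetime, now: datetime) -> bool:
    """Return True if cached data of this kind is still within its TTL."""
    return now - fetched_at < CACHE_TTL[kind]


def describe_age(fetched_at: datetime, now: datetime) -> str:
    """Produce a short phrase for citing the data timestamp in a response."""
    hours = int((now - fetched_at).total_seconds() // 3600)
    if hours < 1:
        return "updated less than an hour ago"
    unit = "hour" if hours == 1 else "hours"
    return f"updated {hours} {unit} ago"


fetched = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh("forecast", fetched, now), describe_age(fetched, now))
```

For trip decisions, the guidance above still applies: fetch fresh data regardless of what the cache says.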
Handling Data Anomalies
Missing data
Some coordinates may have gaps in coverage. Handle gracefully:
- Note when data is unavailable
- Suggest alternative nearby spots
- Don't interpolate forecast values
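A minimal sketch of this handling follows: when a spot has no forecast value, the response says so and lists nearby spots that do have data, without estimating the missing value. The `Spot` type, the 50 km search radius, and the example spots are hypothetical.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Optional


@dataclass(frozen=True)
class Spot:
    name: str
    lat: float
    lon: float
    wave_height_m: Optional[float]  # None means no forecast data


def distance_km(a: Spot, b: Spot) -> float:
    """Great-circle distance between two spots (haversine formula)."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def report(spot: Spot, others: list, max_km: float = 50.0) -> str:
    """Report a forecast, or note the gap and suggest nearby spots with data."""
    if spot.wave_height_m is not None:
        return f"{spot.name}: {spot.wave_height_m:.1f} m"
    # Do not interpolate: only point the user at spots that have real data.
    nearby = sorted(
        (s for s in others if s.wave_height_m is not None and distance_km(spot, s) <= max_km),
        key=lambda s: distance_km(spot, s),
    )
    if nearby:
        names = ", ".join(s.name for s in nearby[:3])
        return f"No forecast data is available for {spot.name}. Nearby spots with data: {names}."
    return f"No forecast data is available for {spot.name}."


# Hypothetical spots for illustration.
a = Spot("Spot A", 0.0, 0.0, None)
b = Spot("Spot B", 0.0, 0.1, 1.5)
c = Spot("Spot C", 0.0, 5.0, 2.0)
print(report(a, [b, c]))
```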
Outliers
Occasionally models produce unrealistic values:
- Sanity check extreme values
- Compare against historical ranges
- Flag suspicious data to users
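The checks above might look like the following sketch, which flags values outside broad physical limits or above a spot's historical maximum. The limits in `PLAUSIBLE_RANGES`, the field names, and the function name `suspicious_fields` are assumptions chosen for illustration, not Strike Mission thresholds.

```python
from typing import Optional

# Assumed sanity limits for illustration; tune per region and data source.
PLAUSIBLE_RANGES = {
    "wave_height_m": (0.0, 20.0),
    "wave_period_s": (1.0, 30.0),
    "wave_direction_deg": (0.0, 360.0),
}


def suspicious_fields(sample: dict, historical_max_height_m: Optional[float] = None) -> list:
    """Return human-readable flags for values that look unrealistic."""
    flags = []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = sample.get(field)
        if value is None:
            continue  # missing data is handled separately
        if not low <= value <= high:
            flags.append(f"{field}={value} is outside the plausible range {low}-{high}")
    height = sample.get("wave_height_m")
    if historical_max_height_m is not None and height is not None and height > historical_max_height_m:
        flags.append(f"wave_height_m={height} exceeds the historical maximum {historical_max_height_m}")
    return flags


print(suspicious_fields({"wave_height_m": 45.0, "wave_period_s": 12.0}))
```

Returning messages rather than silently dropping values keeps the option of surfacing the suspicious data to the user.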
Conflicting models
Different forecast models may disagree:
- Strike Mission primarily uses Open-Meteo (a single source)
- When adding sources, note consensus/divergence
- Don't average conflicting forecasts
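If additional sources are added, consensus and divergence can be reported without blending the values. The sketch below is one possible approach: the 25% spread tolerance, the source labels, and the function name `compare_sources` are hypothetical.

```python
def compare_sources(heights_m: dict, tolerance: float = 0.25) -> str:
    """Summarize agreement between sources without averaging them."""
    if not heights_m:
        return "No forecast sources available."
    if len(heights_m) == 1:
        (name, value), = heights_m.items()
        return f"Single source ({name}): {value:.1f} m."
    low, high = min(heights_m.values()), max(heights_m.values())
    # Relative spread between the lowest and highest forecast.
    spread = (high - low) / high if high > 0 else 0.0
    if spread <= tolerance:
        return f"Sources agree: {low:.1f}-{high:.1f} m."
    # On divergence, show each source's value instead of a blended number.
    detail = ", ".join(f"{name} {value:.1f} m" for name, value in sorted(heights_m.items()))
    return f"Sources diverge: {detail}."


print(compare_sources({"open-meteo": 2.0, "stormglass": 2.2}))
print(compare_sources({"open-meteo": 1.0, "stormglass": 2.5}))
```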
Ground Truth Integration
Buoy data
When available, real-time buoy observations provide:
- Current swell conditions (actual, not forecast)
- Validation of forecast accuracy
- Early warning of arriving swells
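Forecast validation against a buoy reading can be sketched as a relative-error check. The 30% tolerance is an assumption borrowed from the upper end of the swell height error band quoted earlier, and the function names are hypothetical.

```python
def relative_error(forecast_m: float, observed_m: float) -> float:
    """Signed relative error of the forecast against the buoy observation."""
    if observed_m <= 0:
        raise ValueError("observed height must be positive")
    return (forecast_m - observed_m) / observed_m


def validation_note(forecast_m: float, observed_m: float, tolerance: float = 0.30) -> str:
    """Describe how well the forecast matches the buoy observation."""
    error = relative_error(forecast_m, observed_m)
    if abs(error) <= tolerance:
        return f"Forecast is within {tolerance:.0%} of the buoy observation."
    direction = "above" if error > 0 else "below"
    return f"Forecast is {abs(error):.0%} {direction} the buoy observation; treat it with caution."


print(validation_note(2.1, 2.0))
print(validation_note(3.0, 2.0))
```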
User reports
Future enhancement: incorporating user-reported conditions
- Complements model data
- Provides local context
- Requires verification/filtering
Best Practices for AI
Always cite source
"According to Strike Mission's forecast, updated 3 hours ago..."
Acknowledge uncertainty
"The 7-day outlook suggests..., though this far out conditions frequently change."
Recommend verification
"Check back in 2-3 days for a more reliable forecast."
Avoid false precision
Say "4-6 foot faces" not "4.7 foot faces."
Trust but verify
If user reports conflict with data, acknowledge both perspectives.
The Future of Surf Forecasting
Emerging improvements
- Higher resolution models
- AI/ML ensemble methods
- Satellite observation integration
- Crowdsourced ground truth
Limitations that persist
- Chaotic atmosphere (butterfly effect)
- Local bathymetry effects
- Wind variability at small scales
- Tidal interaction complexity