Official APIs vs Web Scraping: How Carrier Tracking Works Behind the Scenes
When you call a tracking API, the data has to come from somewhere. Behind every tracking response is a complex system that retrieves information from hundreds of carriers, each with different technical capabilities. This article explores the two main approaches — official APIs and web scraping — and how they affect the reliability of your tracking data.
The Two Approaches
Official Carrier APIs
Some carriers provide official REST or SOAP APIs for tracking:
Your App → WhereParcel API → Carrier Official API → Tracking Data
Carriers with official APIs include:
- FedEx (REST API with OAuth 2.0)
- UPS (REST API with OAuth 2.0)
- DHL (REST API with API key)
- Korea Post (EMS API)
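FedEx's API uses OAuth 2.0, so an integration has to exchange its client credentials for a short-lived access token. It then reuses that token until it nears expiry. The sketch below shows only that caching logic, under these assumptions:
- The `TokenCache` class and the injected `requestToken` function are hypothetical.
- The 60-second refresh margin is arbitrary.
- The token response carries the standard OAuth 2.0 `access_token` and `expires_in` fields.

```javascript
// Minimal OAuth 2.0 client-credentials token cache (sketch).
// `requestToken` is a hypothetical injected function that performs the token
// request and resolves to { access_token: string, expires_in: number /* s */ }.
class TokenCache {
  constructor(requestToken, { safetyMarginMs = 60_000, now = Date.now } = {}) {
    this.requestToken = requestToken;
    this.safetyMarginMs = safetyMarginMs; // refresh slightly before real expiry
    this.now = now;                       // injectable clock for testing
    this.token = null;
    this.expiresAt = 0;
    this.inFlight = null;                 // shared promise while refreshing
  }

  async getToken() {
    // Reuse the cached token while it is still comfortably valid.
    if (this.token && this.now() < this.expiresAt - this.safetyMarginMs) {
      return this.token;
    }
    // Collapse concurrent refreshes into a single token request.
    if (!this.inFlight) {
      this.inFlight = this.requestToken()
        .then((res) => {
          this.token = res.access_token;
          this.expiresAt = this.now() + res.expires_in * 1000;
          return this.token;
        })
        .finally(() => {
          this.inFlight = null;
        });
    }
    return this.inFlight;
  }
}
```

A real `requestToken` would POST `grant_type=client_credentials`, together with the client credentials, to the carrier's token endpoint. The exact URL and parameter names come from each carrier's developer documentation.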
Advantages:
- High reliability and uptime
- Structured, consistent data format
- Officially supported and documented
- Higher rate limits
Limitations:
- Requires business agreements with each carrier
- API access often requires an approval process
- Some carriers charge for API access
- Not all carriers offer APIs
Web Scraping
Most carriers worldwide don’t offer public APIs. For these, tracking data is retrieved by programmatically visiting the carrier’s tracking page:
Your App → WhereParcel API → Carrier Website (HTML) → Parse → Tracking Data
Carriers that typically require scraping include:
- Most regional and domestic carriers
- Smaller logistics companies
- Postal services without API programs
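The "Parse" step of the scraping path turns a tracking page's HTML into structured events. A minimal sketch follows, with these assumptions:
- The markup is hypothetical: a table with class `tracking-events` and three cells per row.
- The `parseTrackingHtml` name and the event field names are also hypothetical.

Every carrier's page is different. Production scrapers usually rely on a real HTML parser rather than regular expressions.

```javascript
// Sketch of the "Parse" step for a hypothetical tracking page whose events
// are rendered as <tr><td>time</td><td>location</td><td>description</td></tr>.
// Regexes keep the sketch dependency-free; they break easily on markup changes.
function parseTrackingHtml(html) {
  // Isolate the (hypothetical) events table so other tables are ignored.
  const table = html.match(/<table[^>]*class="tracking-events"[^>]*>([\s\S]*?)<\/table>/i);
  if (!table) return null; // layout not recognized: signal a parse failure

  const events = [];
  const rowRe = /<tr[^>]*>([\s\S]*?)<\/tr>/gi;
  let row;
  while ((row = rowRe.exec(table[1])) !== null) {
    // Collect cell text, stripping nested tags and collapsing whitespace.
    const cells = [...row[1].matchAll(/<td[^>]*>([\s\S]*?)<\/td>/gi)].map((m) =>
      m[1].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim()
    );
    // Header rows (<th> only) yield no cells and are skipped.
    if (cells.length === 3) {
      const [time, location, description] = cells;
      events.push({ time, location, description });
    }
  }
  return events;
}
```

Returning `null` for an unrecognized page, rather than an empty list, lets the caller tell "no events yet" apart from "the layout changed".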
Advantages:
- Works with any carrier that has a website
- No business agreements needed
- Makes it possible to cover 500+ carriers right away
Challenges:
- Website changes can break parsing
- IP rate limiting and blocking
- Slower response times
- Less structured data
Why This Matters for Your Application
Reliability Differences
| Aspect | Official API | Web Scraping |
|---|---|---|
| Uptime | 99.9%+ | 95-99% |
| Response time | 200-500ms | 1-5 seconds |
| Data consistency | High | Medium |
| Breaking changes | Versioned, announced | Unannounced |
| Rate limits | Published | Unpredictable |
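Uptime percentages are easier to reason about as downtime. Assume a 30-day month and read the table's figures as monthly availability (the table itself doesn't state a period). Under that assumption, 99.9% uptime allows roughly 43 minutes of unavailability, while 95% allows 36 hours. A quick way to check the arithmetic:

```javascript
// Convert an availability percentage into allowed downtime per 30-day month.
const MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60; // 43,200 minutes

function monthlyDowntimeMinutes(uptimePercent) {
  return MINUTES_PER_30_DAY_MONTH * (1 - uptimePercent / 100);
}

for (const uptime of [99.9, 99, 95]) {
  const minutes = monthlyDowntimeMinutes(uptime);
  console.log(
    `${uptime}% uptime: ~${Math.round(minutes)} min (~${(minutes / 60).toFixed(1)} h) of downtime per month`
  );
}
```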
How WhereParcel Handles Both
WhereParcel automatically uses the best available method for each carrier:
- Official API first — When a carrier offers an API, we always use it
- Scraping fallback — For carriers without APIs, our distributed scraping infrastructure provides reliable data
- Unified response — Regardless of the source, you get the same standardized response format
```json
{
  "success": true,
  "data": {
    "carrier": "kr.cjlogistics",
    "trackingNumber": "1234567890",
    "status": "in_transit",
    "source": "api",
    "events": [...]
  }
}
```
The source field tells you whether the data came from an official API ("api") or from scraping ("scraping"). You can use it to handle the two cases differently, for example by applying different cache durations.
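The "official API first, scraping fallback" policy and the unified response can be sketched as a small dispatcher. This is an illustration, not WhereParcel's implementation. Its assumptions:
- The adapter registry, its `fetchEvents` functions, and `createTracker` are hypothetical.
- Events are already normalized and ordered newest first.
- "unknown" is an assumed status for shipments with no events yet.

```javascript
// Sketch of an "official API first, scraping fallback" dispatcher.
// adapters: { [carrierId]: { api?: Adapter, scraping?: Adapter } } (hypothetical)
function createTracker(adapters) {
  return async function trackShipment(carrier, trackingNumber) {
    const entry = adapters[carrier];
    if (!entry) throw new Error(`Unsupported carrier: ${carrier}`);

    // Use the official API whenever the carrier has one; otherwise scrape.
    const source = entry.api ? "api" : "scraping";
    const adapter = entry.api ?? entry.scraping;
    const events = await adapter.fetchEvents(trackingNumber);

    // Same response shape regardless of where the data came from.
    return {
      success: true,
      data: {
        carrier,
        trackingNumber,
        status: events.length ? events[0].status : "unknown", // newest first (assumed)
        source,
        events,
      },
    };
  };
}
```

Because the caller always receives the same shape, only the source field reveals which path produced the data.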
The Infrastructure Challenge
Running a reliable scraping infrastructure at scale requires solving several hard problems:
IP Rotation
Carrier websites block IPs that make too many requests. A production-grade system needs:
- Multiple IP pools across different regions
- Automatic rotation when blocks are detected
- Distributed architecture to spread requests
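A minimal sketch of the rotation idea hands out proxies round-robin. When a request looks blocked, it benches that proxy for a cooldown period. Its assumptions:
- The `ProxyPool` class and the cooldown length are hypothetical.
- Treating HTTP 403 and 429 as block signals is a simplification; real systems use richer block detection.

```javascript
// Round-robin proxy pool that temporarily benches proxies that appear blocked.
class ProxyPool {
  constructor(proxies, { cooldownMs = 10 * 60 * 1000, now = Date.now } = {}) {
    this.proxies = proxies.map((url) => ({ url, blockedUntil: 0 }));
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock for testing
    this.next = 0;
  }

  // Return the next proxy that is not cooling down, or null if all are blocked.
  acquire() {
    for (let i = 0; i < this.proxies.length; i++) {
      const proxy = this.proxies[(this.next + i) % this.proxies.length];
      if (proxy.blockedUntil <= this.now()) {
        this.next = (this.next + i + 1) % this.proxies.length;
        return proxy.url;
      }
    }
    return null;
  }

  // Record a response status; 403 and 429 are treated as block signals here.
  report(url, httpStatus) {
    if (httpStatus === 403 || httpStatus === 429) {
      const proxy = this.proxies.find((p) => p.url === url);
      if (proxy) proxy.blockedUntil = this.now() + this.cooldownMs;
    }
  }
}
```

When `acquire()` returns null, every proxy in the pool is blocked. That is a natural point to queue the request or draw from a pool in another region.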
Parsing Resilience
When a carrier redesigns its website, parsers written for the old layout often break. Mitigation strategies include:
- Multiple parsing strategies per carrier
- Automated monitoring for parsing failures
- Quick turnaround on parser updates
- Cached data to serve while parsers are being fixed
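The first two mitigations fit together naturally. Try each parsing strategy for a carrier in order, and count which one succeeded. A rising count of total failures is a likely sign of a redesign that monitoring should alert on. In this sketch, the strategy list, the counters, and the convention that a parser returns `null` (or throws) for a page it doesn't recognize are all assumptions.

```javascript
// Try several parsing strategies in order and record the outcome per carrier.
// Assumed convention: a parser returns an array of events, or null/throws
// when it doesn't recognize the page layout.
function createResilientParser(carrier, strategies, metrics) {
  return function parse(html) {
    for (const { name, parse: tryParse } of strategies) {
      let events = null;
      try {
        events = tryParse(html);
      } catch {
        events = null; // a throwing parser counts as "not recognized"
      }
      if (Array.isArray(events)) {
        metrics.record(carrier, name); // which strategy is carrying the load
        return events;
      }
    }
    metrics.record(carrier, null); // every strategy failed: alert-worthy
    return null;
  };
}

// Tiny in-memory counters; a real system would export these to monitoring.
function createMetrics() {
  const counts = new Map();
  const key = (carrier, strategy) => `${carrier}:${strategy ?? "FAILED"}`;
  return {
    record(carrier, strategy) {
      const k = key(carrier, strategy);
      counts.set(k, (counts.get(k) ?? 0) + 1);
    },
    count: (carrier, strategy) => counts.get(key(carrier, strategy)) ?? 0,
  };
}
```

Watching the per-strategy counts also shows when traffic shifts from the primary parser to a fallback. That shift is often the earliest hint that a site is changing.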
Data Normalization
Each carrier’s website presents data differently. The scraping system must normalize:
- Status codes (every carrier has different ones)
- Timestamps (various formats and time zones)
- Location names (abbreviations, different languages)
- Event descriptions (translated, standardized)
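The first two normalization steps can be sketched as a per-carrier status map plus conversion of local timestamps to UTC ISO 8601. The assumptions:
- The carrier ID, the raw status strings, and the normalized codes other than "in_transit" are hypothetical.
- A fixed UTC offset is used, which is only safe for carriers in zones without daylight saving time, such as Korea (UTC+9).

```javascript
// Hypothetical per-carrier mapping from raw status text to normalized codes.
const STATUS_MAP = {
  "xx.examplepost": {
    "Accepted": "picked_up",
    "In transit": "in_transit",
    "Out for delivery": "out_for_delivery",
    "Delivered": "delivered",
  },
};

function normalizeStatus(carrier, rawStatus) {
  const map = STATUS_MAP[carrier] ?? {};
  // Unmapped statuses become "unknown" so they can be logged and mapped later.
  return map[rawStatus.trim()] ?? "unknown";
}

// Convert a local "YYYY-MM-DD HH:mm" (or "YYYY.MM.DD HH:mm") timestamp to UTC
// ISO 8601, given the carrier's fixed UTC offset in minutes (+540 = UTC+9).
function normalizeTimestamp(local, utcOffsetMinutes) {
  const m = local.trim().match(/^(\d{4})[-.\/](\d{2})[-.\/](\d{2})[ T](\d{2}):(\d{2})$/);
  if (!m) return null; // unrecognized format
  const [, y, mo, d, h, mi] = m.map(Number);
  // Date.UTC reads the fields as if they were UTC; subtracting the offset
  // shifts the instant to the true UTC time.
  const utcMs = Date.UTC(y, mo - 1, d, h, mi) - utcOffsetMinutes * 60 * 1000;
  return new Date(utcMs).toISOString();
}
```

For carriers in zones that observe daylight saving time, a timezone-aware library is a safer choice than a fixed offset.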
Best Practices for Your Integration
1. Don’t Assume Instant Data
Scraping-based lookups can take several seconds to return. Design your UI to handle this:
```jsx
// Show loading state with context
function TrackingStatus({ tracking }) {
  if (tracking.loading) {
    return <div>Retrieving tracking data... This may take a few seconds.</div>;
  }
  return <TrackingTimeline events={tracking.events} />;
}
```
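Alongside a loading state, it helps to give scraping-backed lookups a generous timeout rather than letting them hang indefinitely. The framework-independent sketch below uses a hypothetical helper, `withTimeout`, which races any promise against a timer.

```javascript
// Reject if `promise` hasn't settled within `ms` milliseconds.
// The timer is cleared either way so it doesn't keep the process alive.
function withTimeout(promise, ms, message = "Tracking request timed out") {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A typical call is `withTimeout(fetch(trackingUrl), 10_000)`, where `trackingUrl` is your own endpoint and 10 seconds is an arbitrary value above the 1 to 5 second scraping range in the table. This helper only stops waiting; it does not stop the request. To cancel the HTTP request itself, pass `signal: AbortSignal.timeout(ms)` to `fetch`, which modern browsers and Node.js 18+ support.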
2. Use Webhooks for Scraping-Heavy Carriers
For carriers that rely on scraping, webhooks are especially important. They let WhereParcel check the carrier’s website on a schedule and notify you only when something changes, rather than you polling and consuming rate limits.
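On your side, a webhook receiver mostly needs to be idempotent, because webhook deliveries are commonly retried and the same notification can arrive more than once. This sketch has these assumptions:
- The payload mirrors the data object of the tracking response (carrier, trackingNumber, status, events). WhereParcel's actual webhook payload and signature scheme are not described in this article, so check the webhook documentation before relying on any field.
- The handler and the store are hypothetical.

```javascript
// Hypothetical webhook handler: apply a tracking update only if it is new.
// Assumes the payload mirrors the tracking response's `data` object.
function createWebhookHandler(store, onStatusChange) {
  return function handleTrackingWebhook(payload) {
    const key = `${payload.carrier}:${payload.trackingNumber}`;
    const previous = store.get(key);
    const eventCount = payload.events?.length ?? 0;

    // Ignore duplicates: same status and no new events since the last delivery.
    if (previous && previous.status === payload.status && previous.eventCount >= eventCount) {
      return { applied: false };
    }

    store.set(key, { status: payload.status, eventCount });
    // Notify only on actual status transitions, not on every new event.
    if (!previous || previous.status !== payload.status) {
      onStatusChange(key, previous?.status ?? null, payload.status);
    }
    return { applied: true };
  };
}
```

A `Map` works as the store for a single process. With several server instances, a shared database keeps the deduplication consistent.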
3. Cache Aggressively
Since scraping is slower and less predictable than API calls, cache tracking results on your end:
```javascript
// Cache TTLs in seconds, keyed by the response's source field
const CACHE_DURATION = {
  api: 5 * 60,       // 5 min for API-sourced data
  scraping: 15 * 60, // 15 min for scraping-sourced data
};
```
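A minimal in-memory cache can apply those durations based on each response's source field. The `TrackingCache` class and its methods are hypothetical. `CACHE_DURATION` is repeated so the snippet runs on its own. For multiple server instances, a shared store such as Redis would usually replace the `Map`.

```javascript
// TTLs in seconds, keyed by the response's `source` field.
const CACHE_DURATION = { api: 5 * 60, scraping: 15 * 60 };

class TrackingCache {
  constructor(now = Date.now) {
    this.entries = new Map();
    this.now = now; // injectable clock for testing
  }

  get(carrier, trackingNumber) {
    const key = `${carrier}:${trackingNumber}`;
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: force a fresh lookup
      return undefined;
    }
    return entry.response;
  }

  set(response) {
    const { carrier, trackingNumber, source } = response.data;
    // Unknown sources fall back to the shorter API TTL.
    const ttlSeconds = CACHE_DURATION[source] ?? CACHE_DURATION.api;
    this.entries.set(`${carrier}:${trackingNumber}`, {
      response,
      expiresAt: this.now() + ttlSeconds * 1000,
    });
  }
}
```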
4. Handle Graceful Degradation
Occasionally, a carrier’s website may be down or have changed. Handle these cases gracefully:
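```jsx
// Extends TrackingStatus from tip 1; TrackingTimeline and Notice are your app's own components
function TrackingStatus({ tracking }) {
  if (tracking.loading) {
    return <div>Retrieving tracking data... This may take a few seconds.</div>;
  }
  if (tracking.status === 'temporarily_unavailable') {
    // Show last known data with a notice
    return (
      <div>
        <TrackingTimeline events={tracking.lastKnownEvents} />
        <Notice>
          Live tracking is temporarily unavailable for this carrier.
          Showing last known status from {tracking.lastUpdated}.
        </Notice>
      </div>
    );
  }
  return <TrackingTimeline events={tracking.events} />;
}
```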
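The same idea applies in your data layer. If a live lookup fails, return the last successful result marked as unavailable instead of an error, in the shape the snippet above expects. This sketch makes these assumptions:
- `fetchTracking` is a hypothetical function supplied by your app.
- The field names `status`, `lastKnownEvents`, and `lastUpdated` come from the snippet, and the "ok" status is an assumption.
- The result shape is illustrative, not WhereParcel's documented response.

```javascript
// Wrap a live lookup so failures degrade to the last known good result.
// `fetchTracking` is a hypothetical app function resolving to { events: [...] }.
function createDegradingTracker(fetchTracking, now = () => new Date()) {
  const lastGood = new Map(); // trackingNumber -> { events, fetchedAt }

  return async function getTracking(trackingNumber) {
    try {
      const result = await fetchTracking(trackingNumber);
      lastGood.set(trackingNumber, { events: result.events, fetchedAt: now() });
      return { status: "ok", events: result.events };
    } catch (err) {
      const cached = lastGood.get(trackingNumber);
      if (!cached) throw err; // nothing to fall back to
      return {
        status: "temporarily_unavailable",
        lastKnownEvents: cached.events,
        lastUpdated: cached.fetchedAt.toISOString(),
      };
    }
  };
}
```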
The Future: More APIs, Better Scraping
The industry is gradually moving toward more carriers offering official APIs. In the meantime, web scraping remains essential for comprehensive global coverage. WhereParcel invests heavily in both approaches to give you the most reliable tracking data possible across 500+ carriers.
For questions about specific carrier data sources, check our Carriers documentation or contact our team.