Official APIs vs Web Scraping: How Carrier Tracking Works Behind the Scenes

When you call a tracking API, the data has to come from somewhere. Behind every tracking response is a complex system that retrieves information from hundreds of carriers, each with different technical capabilities. This article explores the two main approaches — official APIs and web scraping — and how they affect the reliability of your tracking data.

The Two Approaches

Official Carrier APIs

Some carriers provide official REST or SOAP APIs for tracking:

Your App → WhereParcel API → Carrier Official API → Tracking Data

Carriers with official APIs include:

  • FedEx (REST API with OAuth 2.0)
  • UPS (REST API with OAuth 2.0)
  • DHL (REST API with API key)
  • Korea Post (EMS API)

Advantages:

  • High reliability and uptime
  • Structured, consistent data format
  • Officially supported and documented
  • Higher rate limits

Limitations:

  • Requires business agreements with each carrier
  • API access often requires an approval process
  • Some carriers charge for API access
  • Not all carriers offer APIs

Web Scraping

Most carriers worldwide don’t offer public APIs. For these, tracking data is retrieved by programmatically visiting the carrier’s tracking page:

Your App → WhereParcel API → Carrier Website (HTML) → Parse → Tracking Data

Carriers that require scraping:

  • Most regional and domestic carriers
  • Smaller logistics companies
  • Postal services without API programs

Advantages:

  • Works with any carrier that has a website
  • No business agreements needed
  • Can support tracking across 500+ carriers immediately

Challenges:

  • Website changes can break parsing
  • IP rate limiting and blocking
  • Slower response times
  • Less structured data

Why This Matters for Your Application

Reliability Differences

Aspect             Official API           Web Scraping
Uptime             99.9%+                 95-99%
Response time      200-500ms              1-5 seconds
Data consistency   High                   Medium
Breaking changes   Versioned, announced   Unannounced
Rate limits        Published              Unpredictable

How WhereParcel Handles Both

WhereParcel automatically uses the best available method for each carrier:

  1. Official API first: when a carrier offers an API, we always use it
  2. Scraping fallback: for carriers without APIs, our distributed scraping infrastructure provides reliable data
  3. Unified response: regardless of the source, you get the same standardized response format

{
  "success": true,
  "data": {
    "carrier": "kr.cjlogistics",
    "trackingNumber": "1234567890",
    "status": "in_transit",
    "source": "api",
    "events": [...]
  }
}

The source field tells you whether the data came from an official API (api) or from scraping (scraping), so you can adjust caching, loading states, and user messaging accordingly.
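One way to act on the source field is to turn it into a small set of UI hints. The following is a minimal sketch, assuming the response shape shown above and that api and scraping are the only values; the helper name describeSource and the showDelayNotice flag are hypothetical, not part of the WhereParcel API.

```javascript
// Hypothetical helper: derive UI hints from the `source` field of a tracking response.
function describeSource(response) {
  const source = response && response.data ? response.data.source : undefined;
  switch (source) {
    case 'api':
      // Official API data is structured and fast, so no extra notice is needed.
      return { source, showDelayNotice: false };
    case 'scraping':
      // Scraped data can lag behind, so let the user know.
      return { source, showDelayNotice: true };
    default:
      // Missing or unexpected value (assumption): take the conservative path.
      return { source: 'unknown', showDelayNotice: true };
  }
}
```

Treating unknown values like scraped data keeps the UI honest if new source values are introduced later.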

The Infrastructure Challenge

Running a reliable scraping infrastructure at scale requires solving several hard problems:

IP Rotation

Carrier websites block IPs that make too many requests. A production-grade system needs:

  • Multiple IP pools across different regions
  • Automatic rotation when blocks are detected
  • Distributed architecture to spread requests
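
The rotation idea above can be sketched as a round-robin pool that skips addresses during a cooldown period. This is an illustration only, not WhereParcel's implementation: the ProxyPool class, the 10-minute cooldown, and the injectable clock are all assumptions, and what counts as a "block" (for example an HTTP 403 or 429 response) is left to the caller.

```javascript
// Hypothetical round-robin proxy pool with a cooldown for blocked addresses.
class ProxyPool {
  constructor(proxies, cooldownMs = 10 * 60 * 1000, now = Date.now) {
    this.proxies = proxies.map((address) => ({ address, blockedUntil: 0 }));
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, which makes the pool easy to test
    this.cursor = 0;
  }

  // Return the next address that is not cooling down, or null if all are blocked.
  next() {
    for (let i = 0; i < this.proxies.length; i++) {
      const index = (this.cursor + i) % this.proxies.length;
      const proxy = this.proxies[index];
      if (proxy.blockedUntil <= this.now()) {
        this.cursor = (index + 1) % this.proxies.length;
        return proxy.address;
      }
    }
    return null;
  }

  // Call this when a response looks like a block; the address rests for cooldownMs.
  markBlocked(address) {
    const proxy = this.proxies.find((p) => p.address === address);
    if (proxy) proxy.blockedUntil = this.now() + this.cooldownMs;
  }
}
```

A caller would request `pool.next()` before each fetch and call `pool.markBlocked(address)` when the carrier site rejects the request. A real deployment would keep separate pools per region, as the list above describes.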

Parsing Resilience

When a carrier redesigns its website, the parser written for the old layout can break without warning. Mitigation strategies include:

  • Multiple parsing strategies per carrier
  • Automated monitoring for parsing failures
  • Quick turnaround on parser updates
  • Cached data to serve while parsers are being fixed
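
To illustrate the first two strategies, here is a minimal sketch that tries several parsers in order and reports each failure to a monitoring hook. Everything in it is hypothetical: the two page layouts, the function names, and the onFailure callback are assumptions for illustration. The regular expressions keep the example short; a production parser would more likely use a proper HTML parser.

```javascript
// Hypothetical strategy 1: the page embeds tracking events as JSON.
function parseJsonEmbed(html) {
  const match = html.match(
    /<script id="tracking-data" type="application\/json">([\s\S]*?)<\/script>/
  );
  if (!match) return null;
  return JSON.parse(match[1]).events || null;
}

// Hypothetical strategy 2: the page renders events as two-column table rows.
function parseTableRows(html) {
  const rows = [...html.matchAll(/<tr><td>(.*?)<\/td><td>(.*?)<\/td><\/tr>/g)];
  if (rows.length === 0) return null;
  return rows.map(([, time, description]) => ({ time, description }));
}

// Try each strategy in order; report every failure so monitoring can raise an alert.
function parseWithFallback(html, strategies, onFailure = () => {}) {
  for (const strategy of strategies) {
    let events = null;
    try {
      events = strategy(html);
    } catch {
      events = null; // a throwing parser counts as a failed strategy
    }
    if (Array.isArray(events) && events.length > 0) {
      return { events, strategy: strategy.name };
    }
    onFailure(strategy.name);
  }
  return { events: null, strategy: null };
}
```

When every strategy fails, the `events: null` result is the signal to serve cached data while the parser is being fixed.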

Data Normalization

Each carrier’s website presents data differently. The scraping system must normalize:

  • Status codes (every carrier has different ones)
  • Timestamps (various formats and time zones)
  • Location names (abbreviations, different languages)
  • Event descriptions (translated, standardized)
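
A minimal sketch of what this normalization might look like for a single carrier follows. The carrier ID example.carrier, the status table, and the raw event shape are invented for illustration. Only in_transit appears in the sample response earlier in this article; the other unified codes are assumptions.

```javascript
// Hypothetical mapping from one carrier's status text to unified status codes.
const STATUS_MAP = {
  'example.carrier': {
    'in transit': 'in_transit',
    'out for delivery': 'out_for_delivery', // assumed unified code
    'delivered': 'delivered',               // assumed unified code
  },
};

function normalizeStatus(carrier, rawStatus) {
  const table = STATUS_MAP[carrier] || {};
  return table[rawStatus.trim().toLowerCase()] || 'unknown';
}

// Convert a local "YYYY-MM-DD HH:mm" string plus a UTC offset into ISO 8601 UTC.
function normalizeTimestamp(localDateTime, utcOffset) {
  const match = /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2})$/.exec(localDateTime.trim());
  if (!match) return null;
  const date = new Date(`${match[1]}T${match[2]}:00${utcOffset}`);
  return Number.isNaN(date.getTime()) ? null : date.toISOString();
}

function normalizeEvent(carrier, raw, utcOffset) {
  return {
    status: normalizeStatus(carrier, raw.status),
    time: normalizeTimestamp(raw.time, utcOffset),
    location: raw.location.trim().replace(/\s+/g, ' '), // collapse stray whitespace
    description: raw.description.trim(),
  };
}
```

Mapping unrecognized statuses to `unknown` rather than guessing keeps bad carrier data from being presented as a real status.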

Best Practices for Your Integration

1. Don’t Assume Instant Data

Scraping-based lookups can take 1-5 seconds, compared with 200-500ms for official APIs. Design your UI to handle the wait:

// Show loading state with context
function TrackingStatus({ tracking }) {
  if (tracking.loading) {
    return <div>Retrieving tracking data... This may take a few seconds.</div>;
  }
  return <TrackingTimeline events={tracking.events} />;
}

2. Use Webhooks for Scraping-Heavy Carriers

For carriers that rely on scraping, webhooks are especially important. WhereParcel checks the carrier’s website on a schedule and notifies you only when something changes, so you don’t have to poll repeatedly and use up your rate limit.
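
On the receiving side, a webhook handler mostly needs to decide whether a notification actually changes what you have stored. The sketch below assumes the webhook payload mirrors the data object of a tracking response (carrier, trackingNumber, status, events); the real payload format may differ, so check the webhook documentation. The function name and the Map-based store are hypothetical.

```javascript
// Hypothetical handler: returns true when the payload changed the stored state.
// Assumes the payload has the same fields as `data` in a tracking response.
function handleTrackingWebhook(payload, store) {
  const key = `${payload.carrier}:${payload.trackingNumber}`;
  const previous = store.get(key);

  const changed =
    !previous ||
    previous.status !== payload.status ||
    previous.events.length !== payload.events.length;

  if (changed) {
    store.set(key, { status: payload.status, events: payload.events });
  }
  return changed;
}
```

In a real service, `store` would be your database, and a `true` result is the point at which you would notify the customer or refresh the order page.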

3. Cache Aggressively

Since scraping is slower and less predictable than API calls, cache tracking results on your end:

const CACHE_DURATION = {
  api: 5 * 60,      // 5 min for API-sourced data
  scraping: 15 * 60, // 15 min for scraping-sourced data
};
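
These durations can be wired into a small in-memory cache that picks a time-to-live from the source field of each result. This is a minimal sketch: createTrackingCache and its injectable clock are hypothetical names, and the durations are passed in as an argument so the snippet runs on its own.

```javascript
// Hypothetical in-memory cache whose TTL depends on the `source` of each result.
// `durations` maps a source name to a TTL in seconds, like CACHE_DURATION above.
function createTrackingCache(durations, now = Date.now) {
  const entries = new Map();
  // Unknown sources fall back to the shortest configured TTL.
  const fallbackTtl = Math.min(...Object.values(durations));

  return {
    set(key, data) {
      const ttlSeconds = durations[data.source] ?? fallbackTtl;
      entries.set(key, { data, expiresAt: now() + ttlSeconds * 1000 });
    },
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (entry.expiresAt <= now()) {
        entries.delete(key); // expired: drop it and report a miss
        return undefined;
      }
      return entry.data;
    },
  };
}

const cache = createTrackingCache({ api: 5 * 60, scraping: 15 * 60 });
```

On a miss, call the tracking API and store the `data` object under a key such as the carrier ID plus tracking number.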

4. Handle Graceful Degradation

Occasionally, a carrier’s website may be down or have changed. Handle these cases gracefully:

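// Fall back to the last known data when live tracking is unavailable
function TrackingView({ tracking }) {
  if (tracking.status === 'temporarily_unavailable') {
    // Show last known data with a notice
    return (
      <div>
        <TrackingTimeline events={tracking.lastKnownEvents} />
        <Notice>Live tracking is temporarily unavailable for this carrier.
          Showing last known status from {tracking.lastUpdated}.</Notice>
      </div>
    );
  }
  return <TrackingTimeline events={tracking.events} />;
}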

The Future: More APIs, Better Scraping

The industry is gradually moving toward more carriers offering official APIs. In the meantime, web scraping remains essential for comprehensive global coverage. WhereParcel invests heavily in both approaches to give you the most reliable tracking data possible across 500+ carriers.

For questions about specific carrier data sources, check our Carriers documentation or contact our team.