6 min read · WooScraper team

The four places Shopify ↔ WooCommerce migrations break

After running a thousand-product test, here are the data shapes that punish every CSV-based migration tool — and what we do about them.

We built WooScraper v2 because every CSV-based catalog migration we'd ever seen broke on the same handful of data shapes. After a 1,019-product end-to-end test against allbirds.com, here are the four edges that hurt — and what we do about each.

1. Variant matrices that don't round-trip

Shopify supports at most three option dimensions per product (for example size, color, and material). WooCommerce sets no fixed limit on variation attributes. Moving a Shopify product into Woo is therefore easy. Moving a four-dimensional Woo product into Shopify is where a naive tool either drops the fourth axis or silently flattens it into the title.

Our approach: map up to three options 1:1, then split overflow into Shopify metafields under custom.option_4, custom.option_5, etc. The theme can read them; the import never silently loses data.
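A minimal sketch of that split follows. It assumes each variant's options arrive as ordered (name, value) pairs. It also assumes the overflow is stored as a "Name: Value" string, so a theme can show both the axis label and the value. That storage format is chosen for illustration and is not necessarily what WooScraper writes.

```python
from typing import Dict, List, Tuple

# Shopify allows at most three option dimensions per product.
SHOPIFY_MAX_OPTIONS = 3


def split_options(
    options: List[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Split a variant's (name, value) options into Shopify options and metafields.

    The first three options map 1:1 onto Shopify's three option slots.
    Any further options become metafields keyed custom.option_4,
    custom.option_5, ... so nothing is silently dropped.
    """
    native = options[:SHOPIFY_MAX_OPTIONS]
    overflow: Dict[str, str] = {}
    for index, (name, value) in enumerate(
        options[SHOPIFY_MAX_OPTIONS:], start=SHOPIFY_MAX_OPTIONS + 1
    ):
        # Assumed format: keep the axis name next to the value for the theme.
        overflow[f"custom.option_{index}"] = f"{name}: {value}"
    return native, overflow
```

For example, calling `split_options([("Size", "10"), ("Color", "Black"), ("Material", "Wool"), ("Width", "Wide")])` keeps the first three pairs as native options. It returns `{"custom.option_4": "Width: Wide"}` as the overflow.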

2. Images that work today and 404 next month

Most migration tools generate a CSV whose image columns are hotlinked URLs pointing back at the source store. That works at import time, as long as the destination platform can fetch each image. It fails if you import (or re-import) six weeks later, after the source store has changed its CDN, blocked your IP, or simply deleted the product. The CSV still holds the URL, but the image behind it is gone.

We download every image at scrape time, convert it to WebP, and rehost it on our CDN. The CSV you import references our CDN, so even if the source store disappears tomorrow, your import is stable. Storage costs us money; we'd rather eat it than ship time bombs.
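The sketch below covers the CSV side of rehosting. It rewrites an image column so that each source URL is replaced by a content-addressed URL on a CDN, and the content hash means identical images collapse to one stored object. The CDN base URL and the injected `fetch` function are hypothetical placeholders. The WebP conversion is assumed to happen inside `fetch`, which could for instance use Pillow's `Image.save(..., format="WEBP")`. The sketch also assumes one URL per cell.

```python
import csv
import hashlib
import io
from typing import Callable, Dict

# Hypothetical CDN base URL; the real host is not named in this post.
CDN_BASE = "https://cdn.example.com/images"


def rehosted_url(image_bytes: bytes) -> str:
    """Content-addressed URL: identical image bytes map to the same object."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"{CDN_BASE}/{digest}.webp"


def rewrite_image_column(
    csv_text: str, column: str, fetch: Callable[[str], bytes]
) -> str:
    """Replace source image URLs in `column` with rehosted URLs.

    `fetch` downloads a source URL and returns the (already WebP-converted)
    bytes. It is injected so this sketch stays testable offline.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    cache: Dict[str, str] = {}  # download each source URL only once
    for row in reader:
        src = row[column]
        if src:
            if src not in cache:
                cache[src] = rehosted_url(fetch(src))
            row[column] = cache[src]
        writer.writerow(row)
    return out.getvalue()
```

Because the CSV written out only ever contains CDN URLs, a later import no longer depends on the source store still serving anything.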

3. Reviews stuck in third-party apps

Shopify merchants routinely keep reviews in third-party apps such as Yotpo, Judge.me, Loox, or Stamped. WooCommerce stores use native WordPress comments. There's no canonical format. A migration that leaves reviews behind erases social proof, the single most fragile asset on most ecommerce sites.

When we detect one of those four apps' widgets on the source page, we query that app's public endpoints and normalize everything into one schema. We then emit the import file in whichever format the destination needs: Woo's native reviews endpoint, Judge.me CSV, Loox CSV, or raw JSON. Review export is bundled with every job, never an upsell.
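The "one schema" idea can be sketched as a small dataclass plus a per-source field map. The source names `app_a` and `app_b` and their field names are invented for illustration. They are not the real Yotpo, Judge.me, Loox, or Stamped payloads, which would have to be checked against each app. The output side uses the field names of the WooCommerce REST API's product reviews resource (`product_id`, `review`, `reviewer`, `reviewer_email`, `rating`).

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Review:
    """One review in a source-agnostic shape (hypothetical schema)."""
    product_handle: str
    author: str
    email: str
    rating: int  # 1-5 stars
    body: str
    source: str  # which app the review came from


# Hypothetical per-app field names; real app payloads differ.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "app_a": {"author": "reviewer_name", "email": "reviewer_email",
              "rating": "stars", "body": "content"},
    "app_b": {"author": "name", "email": "email",
              "rating": "score", "body": "text"},
}


def normalize(source: str, product_handle: str, raw: Dict[str, Any]) -> Review:
    """Map one raw review from `source` onto the shared Review schema."""
    fields = FIELD_MAPS[source]
    # Clamp out-of-range scores into 1-5 stars (a choice made for this sketch).
    rating = max(1, min(5, int(raw[fields["rating"]])))
    return Review(
        product_handle=product_handle,
        author=str(raw[fields["author"]]).strip(),
        email=str(raw.get(fields["email"], "")),
        rating=rating,
        body=str(raw[fields["body"]]).strip(),
        source=source,
    )


def to_woo_payload(review: Review, product_id: int) -> Dict[str, Any]:
    """Request body for POST /wp-json/wc/v3/products/reviews."""
    return {
        "product_id": product_id,
        "review": review.body,
        "reviewer": review.author,
        "reviewer_email": review.email,
        "rating": review.rating,
    }
```

Supporting another destination, such as a CSV for a review app, would mean adding one more `to_*` function over the same `Review` type. The per-source parsing does not need to change.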

4. Catalogs that exceed browser-only scrapers

Most browser-extension scrapers do everything in the browser tab: they parse each page, hold the whole catalog in memory, and build the ZIP there. That works at 100 products. It freezes at 1,000 and crashes the tab at 5,000. Worse, if the user closes the tab mid-scrape, all progress is lost.

We use the browser only as a fetcher — the smallest necessary primitive, run with the user's real session so anti-bot protection sees a human. Raw responses ship to our backend in batches of ten. Parsing, normalization, deduplication, image rehosting, CSV generation, and ZIP packaging all happen server-side. The user can close the tab after the first 30 seconds; the backend finishes the job and posts an “export ready” notification.
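The split between a thin fetcher and a server-side pipeline can be sketched in two parts. The first part groups raw responses into batches of ten for upload. The second part is a server-side ingest step that deduplicates by URL. The response shape (a dict with `url` and `body` keys) and both function names are hypothetical. On Python 3.12+, `itertools.batched` does the same grouping but yields tuples instead of lists.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

BATCH_SIZE = 10  # raw responses per upload, as described above


def batched(items: Iterable[T], size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield lists of up to `size` items; the last batch may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def ingest(batches: Iterable[List[dict]]) -> Dict[str, dict]:
    """Server side: keep the first raw response seen for each URL."""
    seen: Dict[str, dict] = {}
    for batch in batches:
        for response in batch:
            seen.setdefault(response["url"], response)
    return seen
```

Because every batch is persisted on the server as it arrives, closing the tab stops new fetches but does not discard work already uploaded.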

The takeaway

The four edges above aren't exotic. Any catalog over 500 products will hit at least two of them. Most CSV-based tools acknowledge zero of them. We don't claim to be the only ones who do this work — we do claim to be the only ones who do it for $19 one-time instead of $40/month.

If you want the platform-specific guides: Shopify → WooCommerce or WooCommerce → Shopify. The full Allbirds case study with raw numbers is here.