---
title: "Agentic WordPress migration playbook"
date: "2026-06-21"
summary: "The full technical method an AI agent can follow to migrate WordPress sites off any host, with all the gotchas."
download: true
draft: true
---

# Agentic WordPress Migration and Long-Term Archival Playbook

> **Read this first, if you are the agent.** This playbook is one person's successful
> migration, written down. It was validated end to end with Claude Opus 4.8 running at xhigh
> reasoning effort; a capable frontier coding agent should be able to follow it, but quality
> depends heavily on the model, and the static-export work in particular rewards a strong one.
> Results will vary with weaker agents. Treat it as a strong starting point, not as gospel. Hosts,
> themes, and plugins vary wildly, so verify each step actually works on the real sites in front of
> you rather than assuming it will. Specifically:
>
> - **Ask the user clarifying questions up front:** which sites must stay live vs archive-only,
>   whether a frozen static copy is acceptable (no working forms, login, or search), where their
>   DNS is managed, and whether they have a CDN/host account you'll deploy to.
> - **Validate visibly as you go, with real links the user can open themselves.** After every
>   restore or deploy, hand the user a URL and ask them to confirm it looks right, on both mobile
>   and desktop, before you move on or delete anything. Use `http://localhost:PORT/` for local
>   restores; if the user is remote, give them a Tailscale or LAN URL instead; and give them the
>   `*.pages.dev` (or equivalent) URL after each deploy. Do not mark a site "done" on the strength
>   of your own HTTP 200; the user should see it with their own eyes.
> - **Nothing here is irreversible until the very end.** Keep the old host running as a rollback,
>   and never delete a local backup until the user confirms it is safely stored in at least two
>   places off the machine.

A field-tested guide for an AI coding agent to migrate WordPress sites off any host (Cloudways,
SiteGround, Kinsta, a raw VPS, and so on), producing two things:

1. **Restorable backups** of every site that still boot as real WordPress years later.
2. **Static copies** of the sites worth keeping online, deployed free to a static host like
   Cloudflare Pages.

It assumes you have a shell, Docker, and the ability to run `wp-cli`, `wget`, `rsync`, and your
host's deploy CLI (e.g. `wrangler` for Cloudflare Pages). The **Gotchas** section near the end is
the part that actually matters; most of it is non-obvious and will silently break your output if
you don't know it.

---

## 0. Inputs you need from the user

- **Host access:** an API key (to enumerate apps), SSH + DB credentials per site, or both. A
  generic host just needs SSH user/host plus the DB name/user/pass (usually readable from each
  site's `wp-config.php`).
- **Static host:** for Cloudflare Pages, the account ID and an API token with Pages:Edit and
  DNS:Edit (the latter for attaching custom domains).
- **Scope decisions:** which sites stay online (keepers) vs archive-only, and whether static
  snapshots are acceptable for the keepers (they usually are for brochure sites).

Treat any pasted API key as a secret: never echo it, and tell the user to rotate it at the end.

## 1. Inventory

Pull the full list of sites with their domains, server, SSH user, DB name, and any domain aliases,
and save it as a manifest (JSON). On a host with an API (Cloudways exposes `GET /server` then
`GET /app` via OAuth), use it; on a raw host, list the vhosts and read each `wp-config.php`.

**Capture the host's internal hostname too** (for example, Cloudways gives each app a
`*.cloudwaysapps.com` FQDN). You'll need it for localization later, because that internal host is
what most of the site's asset URLs actually point at, not the public domain.

Also check the **current live status** of each domain before assuming. Some are already moved to
another host or already offline; you don't want to "migrate" a domain the user retired months ago.
A quick DNS lookup plus a check for WordPress response headers tells you what's really being served.
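
A minimal sketch of that live-status check, assuming `dig` and `curl` are installed. The function names and the marker list are illustrative, not part of any tool; extend the pattern for the hosts you actually see.

```bash
# Sketch: decide whether raw HTTP response headers (on stdin) look like WordPress.
# The marker list is an assumption and is not exhaustive.
looks_like_wordpress() {
  grep -qiE '^(x-powered-by:.*php|link:.*wp-json|set-cookie:.*(wordpress_|wp-settings)|x-pingback:)'
}

# Hypothetical wrapper: print what DNS returns, then classify the response headers.
check_live_status() {
  local domain="$1"
  echo "DNS: $(dig +short "$domain" | tr '\n' ' ')"
  if curl -sI --max-time 15 "https://$domain/" | looks_like_wordpress; then
    echo "$domain: still serving WordPress"
  else
    echo "$domain: no WordPress markers (moved, static, or offline)"
  fi
}
```

Run `check_live_status example.com` per domain in the manifest and record the result next to each entry before planning any migration work.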

## 2. Back up each site (files + database)

Per site, over SSH:
```bash
rsync -az --delete \
  --exclude 'wp-content/cache/' --exclude 'wp-content/wflogs/' \
  --exclude 'wp-content/ai1wm-backups/' --exclude '*.log' \
  user@host:/path/to/public_html/  ./<site>/public_html/
ssh user@host "mysqldump --single-transaction --quick --no-tablespaces -u<dbuser> -p<pw> <db>" \
  | gzip > ./<site>/db.sql.gz
```
Write a `meta.txt` per site (domain, WP version, PHP version, table prefix, plugin list; get these
from `wp-cli` over SSH or by reading `wp-config.php` and the theme's `style.css`). **Validate every
dump:** `gzip -t db.sql.gz`, and confirm it ends with `-- Dump completed`. For very large sites,
make the transfer resumable: re-running the same `rsync` skips files that already arrived, and
`--partial` keeps half-transferred ones.
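
The dump validation can be sketched as a small function. `validate_dump` is a hypothetical helper name; it assumes the dump was gzipped as in the command above.

```bash
# Sketch: a dump is trusted only if the gzip stream is intact AND mysqldump
# reached its final "-- Dump completed" line (a cut-off transfer lacks it).
validate_dump() {  # validate_dump <path/to/db.sql.gz>
  local dump="$1"
  gzip -t "$dump" 2>/dev/null || { echo "CORRUPT gzip: $dump" >&2; return 1; }
  if gzip -dc "$dump" | tail -n 5 | grep -q -- '-- Dump completed'; then
    echo "OK: $dump"
  else
    echo "TRUNCATED dump (no completion marker): $dump" >&2
    return 1
  fi
}
```

Call it right after each backup, for example `validate_dump ./<site>/db.sql.gz || exit 1`, so a bad dump stops the run while the source is still reachable.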

## 3. Restore-verify locally in Docker (this also feeds the static export)

A backup you haven't booted is a guess. For each site, stand it up as real WordPress and confirm it
renders. This same running instance is what you'll crawl in Section 4.
```bash
# work on a COPY, never the pristine backup (APFS copy-on-write is instant on macOS):
cp -c -R <site>/public_html work/public_html 2>/dev/null || cp -R <site>/public_html work/public_html
# point wp-config at a throwaway DB container (use perl, not sed; portable across macOS/Linux):
perl -i -pe "s/define\(\s*['\"]DB_NAME['\"].*/define('DB_NAME','wordpress');/" work/public_html/wp-config.php
# ...same for DB_USER, DB_PASSWORD, DB_HOST='db' ...
docker network create wpnet
docker run -d --name db --network wpnet -e MARIADB_ROOT_PASSWORD=pw \
  -e MARIADB_DATABASE=wordpress -e MARIADB_USER=wordpress -e MARIADB_PASSWORD=pw mariadb:10.11
gzip -dc <site>/db.sql.gz | docker exec -i db mariadb -uroot -ppw wordpress   # after DB is ready
docker run -d --name wp --network wpnet -p 8080:80 \
  -v "$PWD/work/public_html":/var/www/html wordpress:php7.4-apache   # PHP 7.4, see Gotchas
```
Then **localize** it (next section explains why this is mandatory). Confirm HTTP 200, hand the user
the `localhost:8080` link to eyeball, then tear down with `docker rm -f wp db`.

## 4. How the static copies actually work (the core technique)

A "static copy" here is not a framework rebuild. It is the **exact HTML/CSS/JS that WordPress
already produces, frozen as flat files**, so it can sit on any static host with no server, no PHP,
and no database. The only JavaScript on it is the site's own (theme and plugin frontend code); there
is no Next, Astro, or build step involved.

You produce it by **crawling the localized local restore from Section 3**, not the live site. The
steps, and crucially *why* each one matters:

1. **Localize the restore completely first.** Rewrite every occurrence of the production domain, the
   host's internal FQDN, `www.` variants, and any aliases to `http://localhost:PORT`. This is not
   optional and it is the single most common reason exports fail: `wget` is restricted to
   `--domains localhost`, so if the page's internal links still point at the production domain, wget
   never follows them and you get **only the homepage**. Use `wp search-replace` for the database
   (see Gotchas for why a naive replace corrupts WordPress), and also rewrite the on-disk Elementor
   CSS/JSON under `wp-content/uploads/elementor/`.

2. **Crawl it:**
   ```bash
   wget -r -l 12 -p -np -e robots=off --timeout=25 --tries=2 \
     --reject-regex '(/wp-json|/feed/|/wp-login|/wp-admin|/xmlrpc\.php|replytocom=|/cart/|/checkout/)' \
     --domains localhost -nH -P out/<site>/ "http://localhost:8080/"
   ```
   `-p` pulls page requisites (CSS/JS/images), `-r -l 12` follows internal links deep, `-nH` drops
   the hostname directory so paths are clean. For a very large content site (thousands of posts),
   bound the crawl with `--quota 2g` so it stays finite, and note in the archive that that snapshot
   is partial.

3. **Rewrite to root-relative and clean up.** Strip `?query` strings from saved filenames, rewrite
   any absolute `http://localhost:PORT` URLs to root-relative (`/path`) so the export works under
   any future domain, and delete the oEmbed discovery `<link>` tags (they carry a URL-encoded copy
   of the old host that the host-rewrite misses).

4. **Patch in the JS-loaded assets a crawler can't see** (page-builder sites only; see Gotchas):
   copy the full Elementor JS chunk directories and any Lottie animation JSON from the restore into
   the export.

5. **Verify before trusting it:** the export has real page coverage (not just the homepage), zero
   references to the old host or to `localhost`, every referenced CSS/JS/font exists on disk, and the
   root `index.html` is genuinely the homepage. Then serve `out/<site>` locally and have the user
   confirm it looks identical to the original on mobile and desktop.
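
Part of the verification in step 5 can be automated. The sketch below is one possible shape, not the author's script: `verify_export` is a hypothetical name, and it covers page count, leaked hosts, and the homepage title only. It does not check that every referenced asset exists on disk, and a bundled JS library that happens to mention `localhost:PORT` would be a false positive.

```bash
# Sketch: sanity-check a finished export directory.
verify_export() {  # verify_export <export_dir> <old_host>
  local dir="$1" old_host="$2" pages leaks
  [ -f "$dir/index.html" ] || { echo "FAIL: no root index.html" >&2; return 1; }

  pages=$(find "$dir" -type f -name '*.html' | wc -l | tr -d ' ')
  if [ "$pages" -le 1 ]; then
    echo "FAIL: only $pages HTML page; the crawl probably never left the homepage" >&2
    return 1
  fi

  # Any file still mentioning the old host or localhost (plain or URL-encoded) is a leak.
  leaks=$(grep -rlE --include='*.html' --include='*.css' --include='*.js' \
            -e "$old_host" -e 'localhost(:|%3A)[0-9]+' "$dir")
  if [ -n "$leaks" ]; then
    echo "FAIL: leftover host references in:" >&2
    echo "$leaks" >&2
    return 1
  fi

  echo "OK: $pages pages; homepage title: $(grep -o '<title>[^<]*' "$dir/index.html" | head -n 1 | sed 's/<title>//')"
}
```

A pass here is a precondition for showing the user the export, not a substitute for it.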

Static copies lose anything dynamic: contact forms, login, on-site search, and comments stop
functioning. Confirm that's acceptable per site, or keep those particular sites as real WordPress
hosted somewhere cheap instead.

## 5. Deploy keepers to a static host and cut over DNS

```bash
export CLOUDFLARE_API_TOKEN=... CLOUDFLARE_ACCOUNT_ID=...
npx wrangler@latest pages deploy out/<site> --project-name=<proj> --branch=main --commit-dirty=true
```
Confirm `<proj>.pages.dev` renders (give the user that link), then attach the custom domain via the
API (`POST /accounts/{acct}/pages/projects/{proj}/domains {"name":"<domain>"}`) and, for a zone
already on Cloudflare, repoint the apex and `www` DNS records to a proxied CNAME to
`<proj>.pages.dev`. The Pages domain goes "pending" then "active" once the certificate provisions
(roughly 1 to 5 minutes). **Keep the old host running as a rollback** until the live domains have
served cleanly from the new host for a few days. Confirm a domain is really off WordPress by checking
its response has no WordPress markers (`x-powered-by: PHP`, a `Link: <.../wp-json/>` header, or WP
cookies).

## 6. Long-term archive (so the backups outlive everything)

Make the backups folder fully self-contained:
- **Bundle the Docker images** (`docker save wordpress:php7.4-apache mariadb:10.11 wordpress:cli | gzip`),
  one archive **per CPU architecture** (pull each image with `docker pull --platform linux/amd64`
  and save, then repeat with `linux/arm64`) so it restores on any machine with no internet.
- A portable `restore.sh` that loads the arch-matching bundled images, clones a site, patches
  wp-config, neutralizes host quirks (see Gotchas), imports the DB, localizes, and serves it.
- A `_static/` snapshot per site (Section 4 method), openable in any browser with zero dependencies.
- A plain-text restore guide with three independent paths: one-command Docker, open the HTML, or
  rebuild by hand on any PHP 7.x + MySQL.
- A `CHECKSUMS.sha256` over the whole folder for bit-rot detection.
- **Do not bundle Docker itself.** It is welded to the host kernel and useless in the future. Bundle
  the open image format and open data instead; a future runtime or emulator can read those.
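
A minimal sketch of the checksum step, assuming either `sha256sum` (GNU coreutils) or `shasum` (shipped with macOS) is available. The function names are illustrative.

```bash
# Pick whichever SHA-256 tool exists on this machine.
if command -v sha256sum >/dev/null 2>&1; then
  SHA256=(sha256sum)
else
  SHA256=(shasum -a 256)
fi

# Hash every file under the archive folder except the checksum file itself.
write_checksums() {  # write_checksums <archive_dir>
  ( cd "$1" &&
    find . -type f ! -name CHECKSUMS.sha256 -print0 \
      | xargs -0 "${SHA256[@]}" > CHECKSUMS.sha256 )
}

# Non-zero exit means at least one file is missing or no longer matches.
verify_checksums() {  # verify_checksums <archive_dir>
  ( cd "$1" && "${SHA256[@]}" -c CHECKSUMS.sha256 >/dev/null )
}
```

Paths are stored relative to the archive root, so the file stays valid after the folder is moved or renamed.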

## 7. For cloud storage: avoid the million-files trap

A WordPress backup folder is around a million tiny files. Google Drive's web uploader crashes on it
and Finder's "Compress" takes hours. **Package into a few big `tar` files** (no compression, since
the data is already compressed): one tar per site plus one tooling tar, splitting any tar larger than
a few GB into 2 GB parts with `split -b 2G` (uppercase; GNU `split` rejects a lowercase `g`).
Upload with the Drive desktop app or rclone, not the browser. Restore is
`cat <site>.tar.* | tar xf -` (or `tar xf <site>.tar`) then `./restore.sh <site>`.
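
One way to sketch the packaging, with hypothetical helper names. For simplicity it always pipes through `split`, so even a small site ends up as a single `<site>.tar.aa` part; that is a design choice of this sketch, not something the method requires.

```bash
# Sketch: one uncompressed tar per site, cut into fixed-size parts.
pack_site() {  # pack_site <backups_dir> <site> <out_dir> [part_size]
  local root="$1" site="$2" out="$3" part="${4:-2G}"
  mkdir -p "$out"
  tar -cf - -C "$root" "$site" | split -b "$part" - "$out/$site.tar."
}

# Reassemble the parts in suffix order (aa, ab, ...) and extract.
unpack_site() {  # unpack_site <parts_dir> <site> <dest_dir>
  mkdir -p "$3"
  cat "$1/$2".tar.* | tar -xf - -C "$3"
}
```

Passing `-f -` explicitly matters because the default archive device differs between tar implementations.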

## 8. Validate like you mean it

Two layers:
- **With the user, throughout:** every restored or deployed site gets a link they open themselves,
  checked on mobile and desktop, before you proceed. This is how you catch a broken animation or a
  missing font that an HTTP 200 hides.
- **The cold-agent test for the archive:** spawn a fresh agent with zero context, point it at *only*
  the archive folder (hide every other local copy first so it can't cheat), and ask it to bring a
  specific site back. If it can't work it out from your README, the archive isn't self-documenting;
  fix the README, not the test.

## 9. Finish

Only after the keepers are stable and the user confirms which non-keeper domains may go dark: close
the old hosting account and rotate or revoke the API key. Flag any plaintext secrets in your manifest
for the user to remove.

---

# GOTCHAS, read this twice

**PHP version.** Old WordPress and themes use `create_function()` and other functions removed in PHP 8.
Restoring on `wordpress:php8.x` throws
`Fatal error: Uncaught Error: Call to undefined function create_function()`.
Use `wordpress:php7.4-apache` to match what these sites actually ran on.

**Force-HTTPS in `.htaccess`.** Cloudways, SG-Optimizer, and Really-Simple-SSL inject a
`RewriteRule ^ https://... [R=301]` that 301-loops a plain-http local restore. Comment out the
HTTPS-forcing `RewriteCond`/`RewriteRule` lines and keep the WordPress permalink rules.

**Object-cache drop-in equals instant 500.** `wp-content/object-cache.php` (Redis/Memcached, e.g.
Object Cache Pro) loads before plugins and fatals locally because the PHP extension or host service
isn't there. **Delete `wp-content/object-cache.php` and `advanced-cache.php` during restore.** This
is the number-one cause of a 500 on restore. Do not "fix" it by disabling all plugins, which nukes
the page builder and breaks the layout; keep a disable-all-plugins step only as a last resort, after
removing the drop-ins.

**Plugins that fatal.** Beyond the cache drop-ins, security and host-integration plugins (Wordfence,
Breeze, host-specific helpers) can throw or hang locally. Disable those specifically (rename their
folders, or filter `active_plugins`). Disable everything only as a fallback, because page builders
must stay active to render their content.
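
The targeted approach can be sketched as below. `neutralize_host_plugins` is a hypothetical helper, and the slugs you pass are your call per site. It relies on WordPress skipping an active plugin whose main file it cannot find, which is why renaming the folder is enough.

```bash
# Sketch: remove cache drop-ins, then park only the named plugins.
neutralize_host_plugins() {  # neutralize_host_plugins <docroot> <plugin-slug>...
  local docroot="$1" slug; shift
  rm -f "$docroot/wp-content/object-cache.php" "$docroot/wp-content/advanced-cache.php"
  for slug in "$@"; do
    if [ -d "$docroot/wp-content/plugins/$slug" ]; then
      # Renaming is reversible; the page builder folders are never touched.
      mv "$docroot/wp-content/plugins/$slug" "$docroot/wp-content/plugins/$slug.disabled"
    fi
  done
}
```

Example: `neutralize_host_plugins work/public_html wordfence breeze` on the working copy, never on the pristine backup.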

**Localization must use `wp search-replace`, never a naive replace.** WordPress stores data
PHP-serialized; changing string lengths with `sed` or raw SQL without fixing the length prefixes
corrupts it. Run `wp search-replace OLD NEW --all-tables --skip-columns=guid`, and run it with
`--skip-themes --skip-plugins` so wp-cli itself doesn't fatal on a broken theme. Replace **all** of:
the production domain, the host's internal FQDN (most asset URLs use this), `www.` variants, and
aliases. Then `wp elementor flush_css` to regenerate the page builder's CSS with local URLs.
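
A sketch of looping the replacement over every old host. Two things here are assumptions rather than part of the method above: the `WP` variable (so the same function works whether `wp` runs directly or through a container wrapper), and the second pass over JSON-escaped URLs (`https:\/\/host`), which page-builder data often uses and which a plain-URL pass would not match. Verify on your own data that the second pass is needed.

```bash
# Sketch: run wp search-replace for each scheme/host pair, plain and JSON-escaped.
localize_db() {  # localize_db <new_url> <old_host>...
  local new="$1" new_esc host scheme old; shift
  new_esc=$(printf '%s' "$new" | sed 's#/#\\/#g')
  for host in "$@"; do
    for scheme in https http; do
      old="$scheme://$host"
      ${WP:-wp} search-replace "$old" "$new" \
        --all-tables --skip-columns=guid --skip-themes --skip-plugins
      ${WP:-wp} search-replace "$(printf '%s' "$old" | sed 's#/#\\/#g')" "$new_esc" \
        --all-tables --skip-columns=guid --skip-themes --skip-plugins
    done
  done
}
```

Usage would look like `localize_db http://localhost:8080 example.com www.example.com <internal-fqdn>`; setting `WP=echo` first prints the commands without running them.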

**Page builders lazy-load their JavaScript as webpack chunks, and this is why static exports look
broken.** Elementor (and similar) request the actual widget handlers at runtime via JavaScript, so a
crawler that can't run JS never downloads them. The visible symptoms:
- **Animated / rotating headline text** does not animate; the words sit stacked and garbled.
- **Lottie icons and animations** do not render at all.
- **Entrance-animated sections** stay invisible; fonts fall back; widgets are dead.
- The browser console shows `ChunkLoadError: Loading chunk NNN failed`.

The fix: **copy the entire `wp-content/plugins/elementor*/assets/js/` and `/assets/lib/` directories
from the restored site into the static export.** It's a few hundred small files and it makes the
animations and dynamic widgets work in the frozen copy.
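
The copy can be sketched as follows; `copy_elementor_assets` is a hypothetical name, and the glob covers both `elementor` and `elementor-pro` if present.

```bash
# Sketch: merge Elementor's full js/ and lib/ trees into the static export.
copy_elementor_assets() {  # copy_elementor_assets <restore_docroot> <export_dir>
  local src="$1" out="$2" plugin name sub
  for plugin in "$src"/wp-content/plugins/elementor*; do
    [ -d "$plugin/assets" ] || continue
    name=$(basename "$plugin")
    for sub in js lib; do
      [ -d "$plugin/assets/$sub" ] || continue
      mkdir -p "$out/wp-content/plugins/$name/assets/$sub"
      # The trailing "/." copies directory contents, merging with files wget already saved.
      cp -R "$plugin/assets/$sub/." "$out/wp-content/plugins/$name/assets/$sub/"
    done
  done
}
```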

**Lottie animation files are invisible to the crawler.** Elementor's Lottie widget references the
`.json` animation in a `data-settings` attribute, not in a `src`, so `wget` never fetches it. Find
the referenced `/wp-content/uploads/.../*.json` paths in the page and **copy those JSON files into
the export** by hand.
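
A sketch of automating that lookup. It assumes the JSON path appears in the saved HTML either plain or with JSON-escaped slashes (`wp-content\/uploads\/...`), and it will also pick up any other `.json` under `uploads` that a page mentions, which is harmless. `copy_lottie_json` is a hypothetical name.

```bash
# Sketch: find uploads/*.json paths referenced in exported HTML and copy them over.
copy_lottie_json() {  # copy_lottie_json <restore_docroot> <export_dir>
  local src="$1" out="$2" rel
  grep -rhoE --include='*.html' \
      'wp-content\\?/uploads(\\?/[A-Za-z0-9._%-]+)+\.json' "$out" \
    | sed 's#\\/#/#g' | sort -u \
    | while IFS= read -r rel; do
        if [ ! -f "$src/$rel" ]; then
          echo "missing in restore: $rel" >&2
          continue
        fi
        mkdir -p "$out/$(dirname "$rel")"
        cp "$src/$rel" "$out/$rel"
        echo "copied $rel"
      done
}
```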

**The homepage can get clobbered.** WordPress emits `?p=123` permalink fallbacks. `wget` saves
`http://site/?p=123` as `index.html?p=123`; if your post-crawl step strips `?query` from filenames,
it can overwrite the real root `index.html` with an article. After export, **verify the homepage
`<title>` is actually the homepage**, not a random post.

**oEmbed leakage.** WordPress puts `<link rel="alternate" ...oembed...>` tags in `<head>` containing
the site URL in **URL-encoded** form (`localhost%3A8080` once the restore is localized), which a
plain host rewrite misses. Strip the oEmbed `<link>` tags from the HTML.

**Analytics, if you wire it in.** A static page can still carry a privacy-friendly tracker (a single
`<script>` tag pointing at a self-hosted or hosted analytics endpoint). Two things to know when
verifying: self-hosted Umami returns `{"beep":"boop"}` (HTTP 200, not recorded) for bot and headless
user agents, and owners often set an `IGNORE_IP`. So your own and headless test hits won't appear in
the dashboard; confirm the tracker works by checking the script loads (200) and its JS object
initializes, not by watching the counter.

**Shell portability.** macOS's default `/bin/bash` is 3.2, with no associative arrays (`declare -A`
fails with "invalid option"); use `case` statements. The interactive shell may be zsh,
which unlike bash does not word-split unquoted `$vars`, so pass args explicitly or run scripts via
`bash`. Use `perl -i -pe` for in-place edits, because BSD and GNU `sed -i` take different syntax.
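
For illustration, a lookup that would naturally be an associative array can be written as a `case` statement that runs unchanged on bash 3.2. The site names and ports are made up.

```bash
# Sketch: a case statement standing in for an associative-array lookup.
port_for_site() {  # port_for_site <site>
  case "$1" in
    blog) echo 8081 ;;
    shop) echo 8082 ;;
    *)    echo 8080 ;;   # default for any site not listed
  esac
}
```

`port_for_site shop` prints `8082`; an unknown site falls through to the default.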

**Docker architecture.** Images `docker save`d on Apple Silicon are arm64-only and won't run on an
amd64 server without emulation. Bundle both architectures for portability.

**SSH from non-interactive automation.** A passphrase-protected key in the macOS Keychain is
invisible to `ssh host 'cmd'`: that session has no agent (`SSH_AUTH_SOCK` is unset), so it can't
offer the key, even though your interactive login works. Authorize an unencrypted key, or run the
work inside an already-interactive session.

**Restore robustly.** Wait for the site to return a ready code (200/301/302/403), retrying through
transient boot-time `000` and `5xx`. Don't break on the first non-`000` response, or you'll capture
a boot-time error page instead of the real site.
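
The retry logic can be sketched like this, assuming `curl` is available. The function names, attempt count, and delay are illustrative defaults.

```bash
# Sketch: only these status codes count as "the site is really up".
is_ready_code() {
  case "$1" in
    200|301|302|403) return 0 ;;
    *)               return 1 ;;   # 000 (no connection yet), 5xx (still booting), anything else
  esac
}

# Poll until a ready code appears; keep retrying through 000 and 5xx.
wait_ready() {  # wait_ready <url> [max_attempts] [delay_seconds]
  local url="$1" max="${2:-60}" delay="${3:-2}" code="" i=0
  while [ "$i" -lt "$max" ]; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
    if is_ready_code "$code"; then
      echo "$code"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up on $url; last code: $code" >&2
  return 1
}
```

Something like `wait_ready http://localhost:8080/ || exit 1` would sit between starting the containers and starting the crawl.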

**Batch many sites carefully, and watch macOS load.** Process sites one at a time, tearing down each
site's containers before starting the next, and make the run resumable (skip sites that already have
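output) so a crash doesn't restart everything. On macOS specifically, Spotlight will try to index the
huge trees of restored WordPress files and can spike system load to the point where new processes
fail to start; drop a `.metadata_never_index` file in your working directories before you begin. If
Spotlight still indexes them (the marker is most reliably honored at a volume root), give the working
directory a name ending in `.noindex` or add it to Spotlight's Privacy list; `mdutil` switches
indexing off per volume rather than per folder.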
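
A sketch of a resumable one-at-a-time loop. `process_site` is a hypothetical hook you define yourself (restore, crawl, export); `cleanup_site` defaults to removing the `wp` and `db` containers used earlier. A `.done` marker per site is what makes a re-run skip finished work.

```bash
# Default teardown; override if your container names differ.
cleanup_site() { docker rm -f wp db >/dev/null 2>&1 || true; }

# Sketch: process sites sequentially, skip finished ones, record failures and move on.
run_batch() {  # run_batch <work_dir> <site>...
  local work="$1" site; shift
  mkdir -p "$work"
  for site in "$@"; do
    if [ -f "$work/$site/.done" ]; then
      echo "skip $site (already done)"
      continue
    fi
    mkdir -p "$work/$site"
    if process_site "$site" "$work/$site"; then
      touch "$work/$site/.done"
    else
      echo "$site" >> "$work/failed.txt"   # sites without a working snapshot
    fi
    cleanup_site "$site"
  done
}
```

The `failed.txt` list doubles as the record of which sites lack a working snapshot.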

**Accept that a few sites won't render, ever.** Some have a theme-level PHP fatal that survives even
PHP 7.4 plus disabling all plugins. Don't block the whole job on them. The full file and database
backup is still intact and restorable by hand, so leave a short note where the static snapshot would
be, record which sites lack a working snapshot, and move on.
