
Our Zigbee mesh had been sick for months: devices going stuck, new pairings hanging, removals failing, and a network-wide sluggishness that came and went. Around 70 devices — lights, plugs, buttons, climate sensors, mmWave presence radars — on an EFR32MG24-based Ethernet coordinator running Zigbee2MQTT’s ember driver.
Today we moved the whole network to a CC2652P-based coordinator (an SMLIGHT SLZB-06, also over Ethernet) on the zstack driver — in place, preserving the network, with almost no re-pairing. The official Zigbee2MQTT docs call cross-stack restore unsupported: “results might vary.” They did vary. This is the blueprint of what actually happened, including the one-line fix that made it work.
The setup
Two changes, one session:
- Rehome Zigbee2MQTT onto a new VM host (its data had been living on an NFS share; now it’s on local NVMe with hourly snapshots).
- Swap the coordinator: ember/EFR32 → zstack/CC2652P, using Z2M’s backup/restore so devices keep their network.
Both coordinators are reached over TCP, so there is no USB passthrough anywhere. MQTT broker and base_topic stayed identical, which means every Home Assistant entity ID survives untouched. That’s the quiet superpower of the Z2M architecture: HA couples to MQTT topics, not to where Z2M runs or what radio it speaks through.
Finding #1: the secret second client
During pre-flight recon we found something nobody had put in the plan: Home Assistant still carried an enabled ZHA config entry pointing at the same TCP coordinator socket Zigbee2MQTT was using. It had zero devices and had never completed setup, but on every HA restart it would poke the coordinator while Z2M held it.
We got live proof during the migration: the moment the old Z2M instance disconnected, something grabbed the socket, and the new instance crash-looped on the EZSP handshake until the ZHA entry was deleted.
Rule: one coordinator, one client. If you migrated from ZHA to Z2M years ago, check that the old config entry is actually gone. A TCP coordinator makes this failure mode much easier to create than a USB stick ever did.
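A minimal sketch of that recon check, assuming Home Assistant keeps its config entries in a JSON file (commonly `.storage/core.config_entries`) with a `data.entries` list whose items carry `domain` and `disabled_by` fields. The function name and the file layout are assumptions for illustration; verify against your own installation before trusting the result.

```python
import json
from pathlib import Path


def find_enabled_entries(storage: dict, domain: str = "zha") -> list:
    """Return config entries for `domain` that are not disabled.

    Assumes the layout {"data": {"entries": [{"domain": ..., "disabled_by": ...}]}}.
    """
    entries = storage.get("data", {}).get("entries", [])
    return [
        entry for entry in entries
        if entry.get("domain") == domain and entry.get("disabled_by") is None
    ]


def load_storage(path: str) -> dict:
    """Parse a config-entries file; the path is whatever your HA install uses."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Usage would look like `find_enabled_entries(load_storage("/config/.storage/core.config_entries"))`, where the path is hypothetical. Any non-empty result means something other than Z2M may still reach for the coordinator.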
Phase 1: moving the host
Mostly routine — stop, copy data, redeploy elsewhere — with one lesson worth paying for:
Take your backup after stopping Zigbee2MQTT, and checksum it. Z2M rewrites database.db and coordinator_backup.json on shutdown. Our first tar, taken while it was still running, silently differed from the post-stop state. md5sum on both ends caught it.
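The checksum comparison can also be scripted. This is a sketch, not what we ran: it hashes every file under two directories and lists the ones that differ or are missing. It uses SHA-256 from Python's standard library rather than md5sum, and the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def checksum_tree(root: str) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file():
            rel = path.relative_to(base).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(source: str, copy: str) -> list:
    """Return relative paths that are missing on one side or differ in content."""
    a, b = checksum_tree(source), checksum_tree(copy)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty list from `diff_trees(data_dir, backup_dir)` means the copy matches. Run it only after Z2M has stopped, for the reason above.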
Verification that the move was clean: same coordinator, same network, only the client host changed — devices answered a state read round-trip within a minute.
Phase 2: the cross-stack restore (and the trap)
The happy path: stop Z2M, point serial: at the new coordinator with adapter: zstack, power off the old coordinator, start. zigbee-herdsman detects a blank adapter plus a valid backup and restores the network — PAN ID, extended PAN, network key, frame counter — onto the new radio.
What actually happened:
z2m: Error: network commissioning timed out - most likely network
with the same panId or extendedPanId already exists nearby
Crash loop, every ~70 seconds. The “network nearby” was our own live mesh: forty mains-powered routers still beaconing. Z2M wasn’t restoring; it was trying to form a brand-new network with the same parameters, and colliding with itself.
Why? We read the herdsman source inside the container image. The zstack startup strategy requires the backup to exactly match the configuration before it will restore — and that comparison includes the channel list, compared as packed bitmasks:
- Our config said channel: 25.
- The backup, written by the ember driver, contained channel_mask: [11, 12, ..., 26]. The full scan mask. All sixteen channels.
[25] !== [11..26], so herdsman concluded “configuration does not match backup” and silently fell through to forming a new network. The error message about a conflicting PAN nearby is two steps removed from the actual cause. Nothing in the logs says “your channel mask is why.”
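To make the mismatch concrete, here is a small illustration of packing channel lists into bitmasks, one bit per channel number. This follows the usual 2.4 GHz Zigbee channel-mask convention and is not herdsman's actual code; `pack_channels` is a name invented for this sketch.

```python
def pack_channels(channels: list) -> int:
    """Pack Zigbee 2.4 GHz channel numbers (11-26) into a bitmask, one bit per channel."""
    mask = 0
    for ch in channels:
        if not 11 <= ch <= 26:
            raise ValueError(f"channel {ch} is outside 11-26")
        mask |= 1 << ch
    return mask


config_mask = pack_channels([25])                  # what the config asked for
backup_mask = pack_channels(list(range(11, 27)))   # what the ember backup recorded

print(f"config: 0x{config_mask:08X}")   # config: 0x02000000
print(f"backup: 0x{backup_mask:08X}")   # backup: 0x07FFF800
print(config_mask == backup_mask)       # False
```

Any strict equality test on those two values fails, even though channel 25 is a member of the backup's mask.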
The fix is one line. Edit coordinator_backup.json:
"channel_mask": [25]
Set advanced.log_level: debug for the next start and you can watch the decision flip:
(stage-1) adapter is not configured / not commissioned
(stage-2) configuration matches backup
determined startup strategy: restoreBackup
...
zigbee-herdsman started (restored)
Same PAN, same key, channel 25, frame counter carried over. The mesh never knew anything changed.
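If you would rather script the backup edit than hand-edit JSON, something like the following would do it. It is a sketch under two assumptions: that channel_mask is a top-level key in coordinator_backup.json (as it was in ours), and that Z2M is stopped while it runs. The function name and the `.orig` suffix are invented here.

```python
import json
import shutil
from pathlib import Path


def pin_channel_mask(backup_path: str, channel: int) -> list:
    """Set channel_mask to a single channel and return the previous value."""
    path = Path(backup_path)
    # Keep an untouched copy next to the original before editing.
    shutil.copy2(path, path.with_name(path.name + ".orig"))
    backup = json.loads(path.read_text(encoding="utf-8"))
    previous = backup.get("channel_mask", [])
    backup["channel_mask"] = [channel]
    path.write_text(json.dumps(backup, indent=2) + "\n", encoding="utf-8")
    return previous
```

Calling `pin_channel_mask("coordinator_backup.json", 25)` returns the old mask, which is worth logging so the change can be reverted.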
Two more cross-stack footnotes:
- Unplug the old coordinator before the first start. Its radio keeps the network alive independent of any host software. Two coordinators with identical restored parameters is not an experiment you want.
- The restored coordinator IEEE came out byte-reversed relative to the original. It sounds alarming; it’s benign. The radio, the database entry, and all new bind targets are self-consistent, and existing device bindings deliver by network address (0x0000) anyway. The only fallout is a possible stale duplicate “bridge” device in HA’s MQTT discovery.
Results
- ~90% of live devices worked immediately — no re-pairing, no renaming, no HA changes.
- Routes rebuilt over ~5 minutes (the first test wave looked scary at 1-of-6; twenty minutes later the core of the house answered).
- Several devices the old coordinator had lost for days came back on their own.
- A device that had never successfully paired on the ember coordinator — one of the reasons for this migration — paired on the first attempt.
- The stragglers were exactly the devices that were already dead before the migration: flat batteries and wall-switched lamps, not mesh victims.
Appendix: the flooders
Part of the original instability diagnosis was “at least one chatty device.” Measurement (a 45-second MQTT sample, counted per topic) found three Tuya ZY-M100 mmWave presence radars each pushing 30–100 messages per minute — and diffing consecutive payloads showed they were identical, or differed only in link quality. The firmware just re-broadcasts its full state about once a second. The advertised knobs (detection_delay, sensitivity) changed nothing.
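The measurement itself is simple enough to sketch. Assuming you have already captured a list of (topic, payload) pairs from the broker over a known window, this counts messages per topic and how many payloads merely repeat the previous one once link quality is ignored. The `linkquality` key name and the function names are assumptions for this example.

```python
import json
from collections import Counter


def strip_volatile(payload: str, ignore=("linkquality",)):
    """Parse a payload and drop fields that change without carrying new state."""
    try:
        data = json.loads(payload)
    except ValueError:
        return payload  # non-JSON payloads are compared as raw text
    if isinstance(data, dict):
        return {k: v for k, v in data.items() if k not in ignore}
    return data


def summarize(messages, window_seconds: float, ignore=("linkquality",)) -> dict:
    """Return {topic: (messages_per_minute, redundant_count)} for an ordered capture."""
    counts, redundant, last = Counter(), Counter(), {}
    for topic, payload in messages:
        counts[topic] += 1
        current = strip_volatile(payload, ignore)
        if topic in last and last[topic] == current:
            redundant[topic] += 1  # same state as the previous message on this topic
        last[topic] = current
    return {t: (counts[t] * 60.0 / window_seconds, redundant[t]) for t in counts}
```

A topic with a high rate and a redundant count close to its message count is a flooder in the sense described above.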
What worked was Z2M’s per-device debounce option:
debounce: 2
debounce_ignore:
  - presence
That collapsed 50 msgs/min to 6–9 on the MQTT side while keeping presence transitions instant. The radio-side chatter remains (only replacing the hardware fixes that), but the CC2652P absorbs it without drama. It was the previous driver/radio combination that couldn’t.
Credit where it’s due
Two nudges:
To the Zigbee2MQTT team and Koenkk’s ecosystem: the fact that a not-officially-supported cross-stack migration comes down to one JSON field is a testament to how well the open coordinator backup format, the adapter abstraction, and the debug logging are built. Per-device options like debounce, the frontend, the discovery integration — this project carries an absurd amount of the smart-home world on volunteer shoulders. Sponsor it if you rely on it.
On working with an AI pair-operator: this migration was executed interactively with Claude Code driving recon and cutover — and the decisive moment was not automation, it was diagnosis: when the restore crash-looped, it read the herdsman source straight out of the container image, traced the strategy decision to the packed-channel-list comparison, and proposed the one-line backup edit with a debug-log verification plan. Checksum discipline, a live log monitor during the soak, and payload-diffing the flooders came from the same place. The human contribution was judgment: what to risk, when to cut over, which physical plugs to pull. That division of labor felt right.
The blueprint, condensed
- Recon first: confirm nothing else talks to your coordinator socket (looking at you, leftover ZHA entries).
- Update Z2M, then stop it and back up the data dir. Checksum the copy.
- New coordinator: flash current Z-Stack coordinator firmware, Ethernet mode, reserved IP, before touching the network.
- Edit serial: → adapter: zstack, new port. Set coordinator_backup.json channel_mask to your actual channel.
- Power off the old coordinator. Start. Verify restoreBackup in debug logs.
- Wait for routes (minutes, not seconds). Test mains routers first, then battery devices.
- Re-pair only what stays dead — original friendly names mean HA entities survive re-pairing too.
- Measure your chattiest devices; debounce the unfixable ones.
- Keep the old coordinator as a cold spare. Never power both.
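The "verify restoreBackup" step in the list above can be automated with a small log check. This sketch relies only on the debug line we saw in our own logs ("determined startup strategy: ..."); the wording may differ across herdsman versions, and the function name is invented here.

```python
import re
from typing import Optional

STRATEGY_RE = re.compile(r"determined startup strategy: (\w+)")


def startup_strategy(log_text: str) -> Optional[str]:
    """Return the last startup strategy named in the log, or None if none was logged."""
    matches = STRATEGY_RE.findall(log_text)
    return matches[-1] if matches else None
```

Anything other than `restoreBackup` on the first start with the new coordinator is a reason to stop Z2M and look at the backup before retrying.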
Total downtime for the radio cutover: about 30 minutes, most of it deliberate verification. The network came back healthier than it went down.