Engineering Dispatches

I renamed every automation in my house and found four bugs

2026-04-22T09:00:00-04:00

My Home Assistant automations list had grown to 56 entries over four years, each one created in a different mood by the person I was that week. Some started with verbs (Turn off iron after 30 mins), some with subjects (Family Room Block Button), some with vendor names (Reolink driveway person/animal/vehicle notification), one with all-caps for no reason (OFF - Driveway Retaining Wall - 10 PM). When I wanted to find the motion-lights automation for the kitchen, I had to scroll past every automation whose name started with “Turn” before I got there.

I renamed all of them in one sitting. Here’s what I landed on and why.

the convention

Every automation now looks like this:

[Area] Subject — qualifier

Three examples from mine:

[Kitchen] Lights on with motion — after sunset
[Pool] Cover pump off — below 35°F
[Side Yard] Camera — motion describe + notify

[Area] is square-bracketed so it reads as a tag and not as part of an English sentence.
The subject is whatever the thing is — Lights, Camera, Iron, Cover pump. No verb.
The qualifier (after the em-dash) is the narrowing condition — the trigger, the threshold, the time of day.

This beat every other format I considered because of how the HA automations list is sorted: alphabetically, with no grouping. Area-first means everything in the kitchen clusters together; subject-second means I can find “Lights” within an area by eye rather than by search. The qualifier at the end is scannable because the em-dash gives it a visual handle.
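
To make the clustering concrete, here is a minimal Python sketch that parses the `[Area] Subject, qualifier` shape and sorts a few names the way an alphabetical list would. The `parse_name` helper and its regex are illustrative assumptions, not anything Home Assistant provides.

```python
import re

SEP = " \u2014 "  # the em-dash separator used by the convention

# [Area] Subject, optionally followed by the separator and a qualifier
NAME_RE = re.compile(
    r"^\[(?P<area>[^\]]+)\] (?P<subject>.+?)(?:"
    + re.escape(SEP)
    + r"(?P<qualifier>.+))?$"
)


def parse_name(name):
    """Split an automation name into (area, subject, qualifier)."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return match.group("area"), match.group("subject"), match.group("qualifier")


names = [
    "[Pool] Cover pump off" + SEP + "below 35°F",
    "[Kitchen] Lights on with motion" + SEP + "after sunset",
    "[Side Yard] Camera" + SEP + "motion describe + notify",
]

# A plain alphabetical sort is enough: the area tag leads, so areas cluster.
ordered = sorted(names)
areas_in_order = [parse_name(name)[0] for name in ordered]
```

A side benefit of a parseable format is that a script can flag any name that does not match it.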

The format I didn’t use was verb-first (Turn on kitchen lights...). It reads nicely as an English sentence, and it’s the default when you’re writing an automation from scratch. But nearly every verb-first automation I had started with the word “Turn”, which is exactly the column where I needed variation to find things.

the second dimension: labels

HA has a label system that most people seem to ignore. Labels are orthogonal to areas: each label is a tag that can apply across rooms, and an automation can have multiple labels.

I created eleven:

lights     cameras    ai         safety     security
presence   climate    kids       schedule   infra
notifications

Every automation got one to three labels. A motion-triggered light in the kitchen gets lights + presence. The pool cover pump freeze-protector gets safety + climate. The router auto-restart gets infra.

The payoff isn’t the labels themselves; it’s that I can now ask the list “show me everything labeled safety” and get back ten automations that protect the house in some way, across kitchen, bath, deck, pool, and outdoor zones. Before, those ten were scattered across my “Turn”, “Close”, and “OFF -” piles.
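
The "show me everything labeled safety" query is just a set-membership filter across areas. A minimal Python sketch, with made-up aliases standing in for the real list:

```python
# Illustrative data only: the aliases are hypothetical, the label names come
# from the vocabulary above.
automations = [
    {"alias": "[Kitchen] Lights on with motion", "labels": {"lights", "presence"}},
    {"alias": "[Pool] Cover pump off", "labels": {"safety", "climate"}},
    {"alias": "[Basement] Router auto-restart", "labels": {"infra"}},
]


def with_label(items, label):
    """Return the aliases of every automation carrying the label, sorted."""
    return sorted(item["alias"] for item in items if label in item["labels"])
```

Because labels ignore the area prefix, the same filter works no matter how the names are sorted.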

The eleven labels are a deliberately small vocabulary. I resisted the urge to create lighting-bedroom and lighting-outdoor; the area already tells you that. I skipped anything that reads like a workflow tag — daily, weekly, one-off — because schedule does the job.

what I found during the rename

The rename pass doubled as an audit. Things I hadn’t noticed until I read every automation in order:

A duplicate. Two different automations both named “Restart Optimum Switch if Internet is down”. One was a three-line version I’d written eagerly late at night; the other was a properly debounced version I’d written six months later after the first one misfired. I’d never deleted the first one. They were both firing.
A typo. “Reolink frontyard person/animal/vehicle notifcation” — spelled wrong. Two years in the list. Nobody noticed (least of all me).
A copy-paste bug. The gravel-garden motion notification had a second notify block (for a phone I rarely use) that referenced the sideyard image and title — because I’d duplicated the sideyard automation to start the gravel-garden one. The block was disabled, but had it ever been enabled, every gravel-garden alert on that phone would have shown the wrong camera.
A stub. An automation called “New automation” from four months ago, one I’d started and abandoned. It was still wired up, referencing a deleted AI-task entity, quietly erroring at 7 AM every morning.

None of these were findable by reading the file randomly. All of them fell out of a sequential pass.

the tooling part

I did the rename in a script, not by clicking through the UI. A hundred UI clicks is a hundred opportunities to misread my own new convention. The script read automations.yaml, applied a dict of {old_id: new_alias}, wrote it back, reloaded.
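
The core of such a script is small. Here is a minimal sketch of the rename step in Python, working on the already-parsed list of automations. Loading and dumping the YAML and triggering the reload are left out, and the function names are my own, not taken from the actual script.

```python
def apply_renames(automations, renames):
    """Return a copy of the automation list with aliases replaced by id.

    automations: list of dicts as parsed from automations.yaml
    renames: dict mapping an automation's stable id to its new alias
    """
    renamed = []
    for automation in automations:
        updated = dict(automation)  # shallow copy; the input is left untouched
        new_alias = renames.get(updated.get("id"))
        if new_alias is not None:
            updated["alias"] = new_alias
        renamed.append(updated)
    return renamed


def unmatched_ids(automations, renames):
    """Ids in the rename map that match no automation (likely typos)."""
    known = {automation.get("id") for automation in automations}
    return sorted(set(renames) - known)
```

Checking `unmatched_ids` before writing anything back is a cheap guard against a mistyped id silently renaming nothing.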

HA automations carry a stable id field that’s separate from the alias, which makes this safe: renaming the alias doesn’t change the entity ID, which means no dashboards or other automations that reference automation.my_old_name break as a side effect.

Labels and areas were done separately via the WebSocket API (config/entity_registry/update), because those live in the entity registry, not in automations.yaml.
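
For reference, a sketch of what one such registry update message might look like when built in Python. The command type is the one named above; the `area_id` and `labels` field names are what I would expect it to accept, so treat them as assumptions and check them against your Home Assistant version. Opening the WebSocket and authenticating are not shown.

```python
import json


def registry_update_message(msg_id, entity_id, area_id=None, labels=None):
    """Build the JSON text for one config/entity_registry/update command."""
    message = {
        "id": msg_id,  # each WebSocket command carries its own message id
        "type": "config/entity_registry/update",
        "entity_id": entity_id,
    }
    if area_id is not None:
        message["area_id"] = area_id  # assumed field name
    if labels is not None:
        message["labels"] = sorted(labels)  # assumed field name; label ids
    return json.dumps(message)
```

The entity id and label ids passed in are whatever your registry already uses; the function only shapes the payload.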

// the general thing

A naming convention isn’t really about the names. It’s a forcing function for reading everything you’ve built in one sitting. The format you land on matters less than the fact that you have to open every automation to apply it. I found four real bugs doing this — in code that was, by my estimation, “working fine.”

Block out a few hours. Pick any convention that reads cleanly in your automations list, not someone else’s. The part that pays back isn’t the alphabetical clustering — it’s the audit you do on the way there.

When the router dies, the house reboots it

2026-04-21T09:00:00-04:00

The Optimum router in the basement has a habit. Every couple of weeks — no pattern, no warning — the WAN light goes amber, the 5 GHz band gets stuck, and every video call in the house dies at once. The fix is always the same: power-cycle the router. Thirty seconds off, ninety seconds on, you’re back.

For about a year I was the fix. Someone would text me, I’d walk downstairs, pull the plug, count, plug it back in. It happened enough times that I got fast at it. It never happened enough that I fixed it properly.

the ingredients

Two pieces of hardware, no custom code:

A smart plug the router is plugged into. Any HA-controllable switch works; I happen to use one flashed with ESPHome so it’s local-only.
A ping sensor for 8.8.8.8. Configured via HA’s built-in Ping integration — no YAML needed, just Settings → Devices & Services → Add Integration → Ping.

That gives me binary_sensor.8_8_8_8: on when the public internet is reachable, off when it isn’t.

v1: too twitchy

My first automation was three lines of logic: if the ping sensor is off for thirty seconds, turn the switch off, wait ten seconds, turn it back on.

It worked. Too well. The problem wasn’t what you’d guess — Google didn’t go down. What happens is that the ping sensor itself drops a packet, or the HA host’s own network hiccups for a second, and the sensor flips off for forty seconds before it recovers. At which point I’d be on a call and the power to the router would drop, unnecessarily.

A single thirty-second threshold can’t distinguish “WAN is genuinely dead” from “a single packet got lost.” You need to ask the question more than once.

v2: patient

alias: Restart Optimum Switch if Internet is down
description: Restart switch only after 12 failed checks over 60 seconds
triggers:
  - trigger: state
    entity_id: binary_sensor.8_8_8_8
    to: ['off', 'unavailable', 'unknown']
actions:
  # Re-check every 5 seconds, 12 times: a 60-second confirmation window
  - repeat:
      count: 12
      sequence:
        - delay: "00:00:05"
        - condition: state
          entity_id: binary_sensor.8_8_8_8
          state: ['off', 'unavailable', 'unknown']
  # Power-cycle the plug the router is on
  - action: switch.turn_off
    target: { entity_id: switch.optimum_plug }
  - delay: "00:00:10"
  - action: switch.turn_on
    target: { entity_id: switch.optimum_plug }
  # Cooldown while the router boots
  - delay: "00:10:00"
  - action: notify.mobile_app_pixel_8_pro
    data:
      title: Restarted Optimum Router
      message: Internet was down for about a minute. Switch was restarted.

The trick is the repeat with a condition: state check inside it. If the condition ever fails — that is, if the ping sensor flips back to on at any point during the sixty seconds of re-checks — the repeat exits early and skips the whole power-cycle. Only if 8.8.8.8 is consistently unreachable for the entire window does the switch actually cut power.

The ten-minute delay after the power-cycle is a cooldown: without it, the automation would immediately re-trigger during the reboot (since the ping sensor goes off again while the router is still coming back up).

what fires it in practice

A handful of times in the past few months. Each one matched a real outage, not a transient. The phone notification is the first I hear about it — by the time I’d have noticed manually, the house is already back online.

The failure mode I was worried about — false positives cutting the router during normal operation — hasn’t happened once since the rewrite.

the general shape

This pattern — “re-check the trigger condition inside a repeat loop before committing to an irreversible action” — is good for anything where a false positive is expensive. Power-cycling the router is mild. Power-cycling a freezer or an outdoor pump on a bad signal is not. Same debounce, different stakes.

Three pieces: a sensor that might lie, a repeat that keeps asking, an action you only want to take if the sensor is still telling you the same story a minute later.
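
Stripped of Home Assistant, the intended logic fits in a few lines. This is a minimal Python sketch of the pattern, not a translation of how Home Assistant executes the YAML; `is_reachable` is a hypothetical callable standing in for the ping sensor, and `sleep` is injectable so the function can be exercised without waiting.

```python
import time


def confirmed_down(is_reachable, checks=12, interval=5.0, sleep=time.sleep):
    """Return True only if the target stays unreachable for every re-check."""
    for _ in range(checks):
        sleep(interval)
        if is_reachable():
            return False  # recovered: treat the original trigger as a blip
    return True
```

The irreversible action then goes behind `if confirmed_down(...)`, so a single recovered reading anywhere in the window cancels it.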

Honey, what if we painted it all black

2026-04-20T21:30:00-04:00

We’ve lived in our house for a few years and have a running list of “what if we changed the…” arguments that never quite resolve. Repaint the whole thing black? Swap the shingles for standing-seam metal? Gut the landscaping? Every one of those ideas dies somewhere between the conversation and Google Images, because none of those renders are of our house.

So I built a tool that is of our house. One address, a few pre-uploaded angles, a prompt box, and three photorealistic variations per tap. About an evening of work, most of it spent on the loop polish rather than the model call. I’m keeping the URL off this post: it’s a single-address tool for one household and there’s no reason to put up a public sign.

What it does

You open it. A pre-loaded library of seven photos of the house (front, back, side, a few drone shots) is sitting there as thumbnails.
You tap one. You type what you want — “modern farmhouse, board-and-batten, black standing-seam roof”, or “French country with blue shutters and a fountain in the driveway”.
It generates three photorealistic variations in parallel.
You pick the one closest to what you want. It opens in a big hero view with a thumbnail strip for comparison and an ⇄ Compare with original toggle that splits the image so you can see side-by-side what changed.
From there you can tweak: “change the roof to warm terracotta”, “add copper gutters”, “remove the mailbox”. Every edit stacks on the previous render, with a history strip of thumbnails to jump back to any point.
When you like it, you share — a button creates a short URL like reimagine…/s/Ab3xR7_k2g-z with proper Open Graph metadata so WhatsApp renders a preview card instead of a cold link. You and your spouse argue about roof color via that URL instead of via screenshots.

That’s the whole thing. No accounts, no pricing page, no feature gate.

The stack

Google’s Gemini 2.5 Flash Image (codename Nano Banana) does the actual reimagining. Image in, text prompt in, image out. Very fast, surprisingly good at “keep the house, change the skin” style edits when you constrain it properly.
Next.js 15 (App Router) + Tailwind for the UI. Server API routes hide the Gemini API key from the client.
Docker multi-stage build with Next’s output: "standalone" for a small runtime image.
GitHub Actions builds the image on push to main and pushes to GHCR.
Hostinger VPS runs the container behind an existing Traefik reverse proxy. Deployment is docker compose pull && docker compose up -d.
IndexedDB on the client persists an in-progress session (source photo, variations, refinement history) so a refresh doesn’t lose the state of what you’re working on.

Total external dependencies I wrote: zero. It’s stdlib-Next.js + one Google SDK + a 200-line React page.

Three small choices that mattered more than they sound

1. Pre-load the photos

The first version of this tool asked the user to upload a photo each time. That was fine for a one-off. For “my wife and I argue about paint color over a weekend”, the repeated upload was the biggest friction. Solution: a volume-mounted directory on the VPS with every angle of the house we care about. The app lists the filenames as thumbnails. Seven taps away from seven generations.

The photos never touch the git repo — they’re bind-mounted at container runtime. That lets me keep the source public while the actual address imagery stays on the box.

services:
  home-reimagine:
    volumes:
      - ./photos:/app/photos:ro
      - ./shares:/app/data/shares:rw

2. Staple a structural constraint onto every prompt

Nano Banana is happy to “reimagine” anything, including turning a two-story colonial into a mid-century ranch. We didn’t want ranch. We wanted our house, painted differently. So every prompt gets prefixed server-side with a hard constraint:

DO NOT CHANGE THE STRUCTURE OF THE HOUSE: footprint, roofline shape, window and door locations, number and placement of stories, chimneys, dormers, porches, garage, and structural proportions stay exactly as they are. Only surface-level elements may change: cladding, colors, roof material, window frame color, door color, trim, lighting fixtures, landscaping, driveway surface.

Before: variations were creative but often unrecognizable as my house. After: they’re my house with different paint, different roof, different plantings. Every time. The single biggest quality improvement in the whole build, and it was one paragraph of prompt.
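
The mechanics are nothing more than string concatenation on the server. The app itself is Next.js, so this Python version is purely illustrative; `STRUCTURE_CONSTRAINT` is an abbreviated stand-in for the full paragraph above and `build_prompt` is a hypothetical name.

```python
# Abbreviated stand-in for the full constraint paragraph.
STRUCTURE_CONSTRAINT = (
    "DO NOT CHANGE THE STRUCTURE OF THE HOUSE. "
    "Only surface-level elements may change."
)


def build_prompt(user_prompt):
    """Prefix the user's request with the fixed structural constraint."""
    cleaned = user_prompt.strip()
    if not cleaned:
        raise ValueError("prompt is empty")
    return f"{STRUCTURE_CONSTRAINT}\n\n{cleaned}"
```

Doing this server-side means the client never sees the constraint and cannot drop it.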

3. Share a link, not a file

The first share button used the Web Share API with the image file attached. Fine on iOS, nice on Android, but WhatsApp attaching a file takes up a chat slot and doesn’t compose well with “what do you think of this?”. What you actually want is a link with a preview card.

So I added a server-side share store — each generated image gets written to /app/data/shares/.png with an optional JSON sidecar for the label. The share page at /s/ is a tiny server-rendered viewer with full Open Graph metadata:

<meta property="og:title"       content="A Home Reimagining">
<meta property="og:description" content='"Modern farmhouse, black metal roof…"'>
<meta property="og:image"       content="https://…/api/shares/Ab3xR7_k2g-z">
<meta property="og:image:type"  content="image/png">
<meta property="og:image:width" content="1200">
<meta property="og:url"         content="https://…/s/Ab3xR7_k2g-z">

Paste the URL in WhatsApp and it shows the image inline. Paste it in iMessage, it shows the image inline. Paste it in a text to anyone — same thing. That’s the whole point of OG.
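
Generating those tags is a small templating job; the one thing worth getting right is escaping, since the description is user-typed text. A minimal Python sketch (the real page is server-rendered by Next.js, and `og_meta_tags` is a hypothetical helper):

```python
from html import escape


def og_meta_tags(title, description, image_url, page_url, width=1200):
    """Render the Open Graph meta tags for one share page as HTML text."""
    pairs = [
        ("og:title", title),
        ("og:description", description),
        ("og:image", image_url),
        ("og:image:type", "image/png"),
        ("og:image:width", str(width)),
        ("og:url", page_url),
    ]
    # escape() also converts quotes, so a prompt containing " cannot break out
    # of the content attribute.
    return "\n".join(
        f'<meta property="{escape(prop)}" content="{escape(value)}">'
        for prop, value in pairs
    )
```

Escaping each value lets the attribute always use double quotes, whatever the prompt contains.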

The wire cost of sharing dropped from “download, wait, find file, attach, wait, send” to “tap Share link, paste”. Which means it actually gets used.

Working loop, not demo

Most AI-generated-image demos are impressive one-shots: look what I prompted! That’s not useful for decisions. Decisions need a loop: generate, react, tweak, compare, backtrack, commit. The refine step is where the tool earns its keep: each edit builds on the previous render, with a thumbnail strip to revisit any earlier state.

The whole UI is optimized for the fact that you’ll run ten rounds before landing on a direction:

Big hero image so the details are legible.
Compare-with-original always one tap away.
History strip of prior refinements so you can ditch a bad turn and restart from wherever.
Session persisted to IndexedDB so an accidental refresh doesn’t nuke twenty minutes of decisions.

The model gets used as a collaborator, not a slot machine.

What this isn’t

Not a product. There’s no login, no multi-tenant anything, no pricing. It’s literally one Traefik routing rule to one container for one address.
Not a Google Images killer. It’s a house-picture-with-a-prompt app. Deliberately narrow.
Not always right. Nano Banana occasionally hallucinates a door that wasn’t there, or moves a window. The structural constraint catches most of it; some slip through. You tweak or regenerate.

What it is is an example of how cheap it’s become to build a specific tool for a specific problem. The entire setup — Next.js scaffolding, Gemini API calls, Docker + Traefik + GHCR deploy, share-link subsystem, structured-prompt tuning, iterative refinement loop, IndexedDB persistence, WhatsApp-ready OG metadata — took one evening. Five years ago this would have been a company.

Now it’s a git repo I share with my wife.

Cracking a pool pump’s Wi-Fi protocol in an evening

2026-04-20T12:00:00-04:00

Code: github.com/kunalkhosla/ecoplug-homeassistant

HACS PR: hacs/default#7150 (in review)

The device: DEWENWILS Pool Pump Timer (Wi-Fi) on Amazon

I have an outdoor Wi-Fi switch on my pool pump — a DEWENWILS box that runs on the ECO Plugs app. Nice hardware, but the app is the only way to talk to it, and I wanted it in Home Assistant so I could schedule it alongside everything else in the house. None of the obvious paths worked, so I sat down one evening and reverse-engineered the thing.

Total time: about three hours. I worked alongside Claude Code (Anthropic’s CLI coding agent, running as Opus 4.7 with 1M context). I drove from my Mac, walked outside to the plug whenever we needed to confirm something physically, and acted as the human in the loop. Claude Code did the packet analysis, the cryptanalysis, the Python, and the deploy-over-SSH dance. I’d never reverse-engineered a network protocol before.

The whole thing used pretty ordinary tools: Wireshark, PCAPdroid on my Android phone, tcpdump from the HAOS SSH add-on, and Python’s standard library. Nothing exotic.

How it actually went

The first hour was all dead ends

A few things I tried before resorting to packet captures:

Assumed it was a Tuya device. These plugs look like every other rebranded Tuya/Smart Life gadget, so I figured Home Assistant’s Tuya integration would just pick it up. Nope — DEWENWILS uses the ECO Plugs app, which is its own little ecosystem.
Tried the existing pyecoplug HACS integration. Installed cleanly, then sat there forever. Never discovered the plug, never produced a switch entity. It seems to be aimed at an older firmware.
Tried Google Home as a bridge. The ECO Plugs OAuth flow into Google completes the login… and then hands Google zero devices. So that was out.
Looked at flashing Tasmota or ESPHome. The hardware is an ESP8266, so technically possible — but it lives inside a sealed 240V outdoor box on the side of my house. Disassembling and soldering on that felt like the wrong evening project.
Considered just replacing it with a Shelly Pro 2 plus a contactor. Works fine long-term, but it’s roughly $80 plus an electrician.

By that point I was a little annoyed and a lot curious, so we went straight at the protocol.

Watching the wire

First capture, from the HAOS Ethernet port: The plug is chatty. It broadcasts a 272-byte UDP packet to 255.255.255.255:10228 every two seconds, starting with a recognizable magic header that includes the literal string "ECO Plugs". It also resolves server1.eco-plugs.net from time to time, but never actually phones home during my capture. Notably, I saw nothing flowing the other direction — no phone-to-plug traffic at all.

Second capture, while toggling from the phone: Still nothing from phone to plug on the wire. The phone is sending out pyecoplug-style discovery broadcasts on ports 25 and 5888, but the plug is ignoring them — clearly a different protocol version. Meanwhile, toggling from the phone works perfectly (I went outside; the pump turned on and off), and yet the wire shows nothing.

That’s the moment things clicked: most APs don’t bridge Wi-Fi-to-Wi-Fi unicast onto the wired segment. The phone and the plug were both Wi-Fi clients on the same access point, so their conversation never crossed onto Ethernet. HAOS was sitting in the wrong seat.

Third capture, this time from the phone itself using PCAPdroid: There it was. The phone fires UDP unicast from :9090 to the plug at :1022. The plug answers back the same way. Each command gets repeated about four times for reliability. Now we had the channel.

Decoding the packets

Each command is 152 bytes and breaks down like this:

Bytes	What it is
0–3	Transaction ID (random per command; the response echoes it back)
4–15	Fixed header `17 00 00 00 00 00 00 00 DA E2 0C 00`
16–71	XOR-obfuscated body (56 bytes)
72–75	`00 00 00 00`
76–79	Opcode — `6A` for commands, `69` for queries/replies
80–83	State — `00` off, `01` on
84+	Padding or response-only fields

The “encryption” on the body turns out to be XOR with the 4-byte transaction ID, repeated. We figured that out by lining up two same-type packets side by side: the XOR of their bodies matched the XOR of their transaction IDs at every 4-byte boundary. That’s the classic fingerprint of a short repeating-key XOR.
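
The fingerprint is easy to reproduce. In this minimal Python sketch the 56-byte plaintext is a stand-in (any fixed bytes show the effect) and the two transaction IDs are random; XORing the two obfuscated bodies cancels the plaintext and leaves only the two keys XORed together, repeating every 4 bytes.

```python
import os


def xor_repeat(data, key):
    """XOR data with key, repeating the key as often as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Stand-in plaintext: not the real body, just 56 fixed bytes.
plaintext = bytes(range(56))
txid_a, txid_b = os.urandom(4), os.urandom(4)

body_a = xor_repeat(plaintext, txid_a)
body_b = xor_repeat(plaintext, txid_b)

# Equal lengths, so this is a plain byte-by-byte XOR of the two bodies.
body_diff = xor_repeat(body_a, body_b)
key_diff = xor_repeat(txid_a, txid_b)

# The plaintext has cancelled out: what remains is key_diff repeated 14 times.
fingerprint_holds = body_diff == key_diff * (len(body_diff) // 4)
```

Seeing that repetition in real captures is what suggests a short repeating key rather than a real cipher.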

Once you peel the XOR off, the body is the same 56 bytes every time — it starts with the ASCII "yvQC" and is padded with what looks like simple arithmetic-progression filler. The plug doesn’t seem to validate the contents at all, only the structure. So to talk to it, you XOR that known plaintext against a fresh transaction ID and drop in the opcode and state byte.
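
Put together with the layout table, a packet crafter could look like the sketch below. It is not the code from the repository. Several details are assumptions and are marked as such: `BODY_PLAINTEXT` is a placeholder (only the leading "yvQC" comes from the captures), the opcode and state values are assumed to sit in the first byte of their 4-byte fields, and the trailing padding is assumed to be zeros.

```python
import os

PACKET_LEN = 152
HEADER = bytes.fromhex("1700000000000000dae20c00")  # bytes 4-15, from the table
# Placeholder body: starts with "yvQC" as observed; the remaining 52 bytes are made up.
BODY_PLAINTEXT = b"yvQC" + bytes(range(52))
OPCODE_COMMAND = 0x6A


def xor_repeat(data, key):
    """XOR data with key, repeating the key as often as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def craft_command(turn_on, txid=None):
    """Build a 152-byte on/off command with a fresh 4-byte transaction ID."""
    txid = os.urandom(4) if txid is None else txid
    if len(txid) != 4:
        raise ValueError("transaction ID must be 4 bytes")
    packet = bytearray(PACKET_LEN)  # zero-filled, so padding is already in place
    packet[0:4] = txid
    packet[4:16] = HEADER
    packet[16:72] = xor_repeat(BODY_PLAINTEXT, txid)
    packet[76] = OPCODE_COMMAND  # assumed position within bytes 76-79
    packet[80] = 1 if turn_on else 0  # assumed position within bytes 80-83
    return bytes(packet)


def parse_reply(reply):
    """Return (txid, is_on) from a 152-byte reply."""
    if len(reply) != PACKET_LEN:
        raise ValueError("unexpected reply length")
    return reply[0:4], reply[80] == 1
```

Passing a captured transaction ID as `txid` is what makes a byte-for-byte comparison against a captured packet possible.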

The first live test

Before getting clever, I wanted the simplest possible proof that we understood the channel: just replay a captured OFF command, byte for byte, from the HAOS shell.

python3 /tmp/replay_test.py 192.168.0.87
[OFF replay] sending 152 bytes → 192.168.0.87:1022
[OFF replay] REPLY from ('192.168.0.87', 1022): 152 bytes
  state[80:84] = 00000000

I walked outside. The pump was off. Replay works — there’s no nonce, no timestamp, no anti-replay check. The plug just trusts the packet.

Crafting fresh packets

Replay is fine for one plug, but useless for a real integration. So we wrote a small crafter that takes a desired state and produces a valid packet with a fresh random transaction ID. As a sanity check, we re-built every captured command using its captured TXID and confirmed all sixteen matched the originals byte for byte.

Then a live test from the Mac with a transaction ID the plug had never seen before:

[OFF] txid=7cdd2dac sending 152 bytes
  reply: txid=7cdd2dac state=OFF

Pump off. Then on with another fresh ID. Pump on. We were officially driving the thing.

Wrapping it up

custom_components/ecoplug/protocol.py — about 150 lines of pure asyncio, with craft_command, craft_query, and send_and_wait.
custom_components/ecoplug/switch.py — a thin Home Assistant switch wrapper that polls every 10 seconds.
8 unit tests, including a byte-for-byte rebuild of a captured packet.
Deployed via SSH to /config/custom_components/ecoplug/, restart Home Assistant, switch shows up, switch works.
Tagged v0.2.0 and cut a GitHub release so anyone can install it through HACS as a custom repository.

Credit

Investigation, protocol analysis, Python, tests, documentation: Claude Code (Opus 4.7).

Hardware, physical validation, and pointing at the next thing to try: Kunal Khosla.

If you’ve got a DEWENWILS / ECO Plugs box and Google Home is broken for you too, the integration is right here. Issues and PRs welcome.

What a four-year-old Home Assistant config has taught me

2026-04-20T08:00:00-04:00

My configuration.yaml was first written in late 2022. It’s survived three HAOS major upgrades, about forty automations, a decent pile of HACS integrations, one whole-house rewire, and one pool pump that needed its Wi-Fi protocol reverse-engineered (see the companion dispatch).

Here’s what’s actually worked — concrete patterns pulled straight from a live install. And two things I’d fix if I started today. Entity names in the examples are genericized; the structure is not.

Split your config from day one

Even a modest house ends up with hundreds of lines of YAML. My configuration.yaml starts with this:

automation: !include automations.yaml
script:     !include scripts.yaml
scene:      !include scenes.yaml

frontend:
  themes: !include_dir_merge_named themes

That single !include trick is what lets the UI editor write to automations.yaml without clobbering my handwritten configuration.yaml. It also means my visual-editor automations and my hand-rolled template sensors can coexist without stepping on each other.

!include_dir_merge_named does the same for a whole folder of theme files. Every integration I add that’s config-heavy eventually earns its own !include.

Secrets file, no exceptions

Any credential goes in secrets.yaml:

some_integration:
  username:  !secret integration_username
  password:  !secret integration_password
  api_token: !secret integration_api_token

secrets.yaml is in .gitignore if you version-control your config (you should). The payoff isn’t just safety — it’s that I can share screenshots or paste snippets anywhere without thinking twice.

Trust your LAN, ban the internet

Two small blocks give a better security posture than most “hardened” setups I’ve seen online:

http:
  ip_ban_enabled: true
  login_attempts_threshold: 10

homeassistant:
  auth_providers:
    - type: homeassistant
    - type: trusted_networks
      allow_bypass_login: true
      trusted_networks:
        - 192.168.1.0/24

Anything on the trusted LAN walks in; anything from the internet gets banned after ten bad guesses. No 2FA nag when someone in the house opens the app at 2 AM; no patience for random brute-force attempts from anywhere else.
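
The "is this client on my LAN" question underneath that config is a CIDR membership test. A minimal Python sketch of the idea using the standard library; it illustrates the check and is not Home Assistant's auth code.

```python
import ipaddress

# Same network as in the config above.
TRUSTED_NETWORKS = [ipaddress.ip_network("192.168.1.0/24")]


def is_trusted(address):
    """True if the address falls inside any trusted network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in TRUSTED_NETWORKS)
```

A /24 covers 192.168.1.0 through 192.168.1.255, so a neighbouring subnet such as 192.168.2.x is not trusted.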

Build template sensors that represent intent

The single most-useful sensor in my install isn’t from an integration — it’s five lines of template:

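binary_sensor:
  - platform: template
    sensors:
      any_door_open:
        friendly_name: "Any Door Open"
        # Door entity names below are placeholders; list your own sensors.
        value_template: >-
          {{ is_state('binary_sensor.front_door', 'on')
             or is_state('binary_sensor.back_door', 'on')
             or is_state('binary_sensor.garage_door', 'on') }}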

Every automation that used to be a multi-way OR — “turn on the foyer light if any of a handful of doors open” — now just watches binary_sensor.any_door_open. When I added a new door sensor last spring, I changed one template and every downstream automation got it for free.

The same pattern shows up for unit conversion, time-of-day flags, “is anyone home”, “is it dark outside”, or any other question my house needs to keep answering.

Safety timers instead of discipline

I used to rely on myself to turn things off. Now I don’t. A representative automation:

alias: Turn off iron after 30 mins
triggers:
  - trigger: state
    entity_id: switch.iron_plug
    to: "on"
    for: "00:30:00"
actions:
  - action: switch.turn_off
    target: { entity_id: switch.iron_plug }

Three lines, two minutes to write, saves your house.

I have a handful of these — appliances that shouldn’t run forever (irons, towel warmers, specific outdoor pumps in cold weather). Every one of them used to depend on me remembering. Now none of them do.

The cold-weather case is the bonus version: a numeric-state trigger on the outdoor temperature sensor cuts power before the outdoor device can damage itself.
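
The `for:` on the trigger is what turns a state change into a timer: it only fires if the state has held continuously for the whole duration. A minimal Python sketch of that rule, assuming a hypothetical list of (timestamp in seconds, state) changes where each entry is a real change of state:

```python
def held_long_enough(changes, now, hold_seconds=30 * 60):
    """True if the latest state is 'on' and has held for at least hold_seconds.

    changes: list of (timestamp, state) tuples in time order, one per change
    now: current time, in the same units as the timestamps
    """
    if not changes:
        return False
    last_time, last_state = changes[-1]
    return last_state == "on" and now - last_time >= hold_seconds
```

Any flip to off and back on resets the clock, because only the most recent change counts.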

Emergencies should have reflexes

Nothing in HA is more satisfying than this automation:

Smoke / Carbon Monoxide Emergency — Announce and Turn OFF HVAC

Triggered by any smoke or CO detector going to on. Actions: turn off HVAC blower, turn on every light in the house, broadcast a TTS announcement over the speakers.

It’s sixteen lines of YAML and it has never fired in anger. The day it does, I want the house to react while I’m still figuring out what’s happening.

Let the cameras narrate

The camera notification automations used to say:

Motion detected at driveway

Now they use the Google Gen AI integration to caption the frame:

- action: google_generative_ai_conversation.generate_content
  data:
    prompt: >-
      Describe what's happening in this image in one short sentence.
      Focus on the person or vehicle and what they're doing.
    image_filename: /config/www/snapshots/driveway.jpg

The result is notifications like “A delivery driver in a blue polo is leaving a package on the front porch” instead of generic motion pings. The difference in signal-to-noise is enormous.

React to the weather you actually have

Two automations I’m proud of because they replace judgment I used to exercise manually:

Close awning if raining — triggers on weather.home transitioning to rainy or pouring.
Close awning if windy — numeric-state trigger on wind speed above a threshold.

These aren’t clever. They just mean a retractable awning stops being a weekend chore.

Scenes as named states, not light shows

My scenes aren’t for ambiance — they’re for states the house can be in:

Away — relevant automations flip into their away posture.
All Lights On — what it says, for when something goes wrong.
Bedtime — coming in a future refactor.

Scenes are checkpoints. Automations can call them with one line, which keeps the individual automations clean.

HACS for anything that isn’t native

Sixteen custom integrations currently live in /config/custom_components/, installed via HACS. Plus one I wrote myself for the pool pump last Saturday.

The rule I’ve settled on: if the first-party integration doesn’t exist, or if it requires a cloud account I don’t want to maintain, check HACS before I assume I’m stuck. Nine times out of ten someone’s already done the work, and when they haven’t, the pool pump dispatch next door shows it’s surprisingly tractable to fill in.

the dashboard, room by room

This is what I see when I open the app:

Top row is a set of state pills: alarm state, whether any door is open, irrigation, patio lights, TVs, the pool pump. Each one is a one-tap toggle and a glance-able current value. No dashboard panel, no deep-link — the pills are the index of “things I touch often.”

Below the greeting is a commute estimate and a weather card. Then the presence row: one avatar per person, with a green home badge when they’re on-network. Underneath, the three thermostat tiles for the zones I actively tune.

Past the fold — not in the screenshot — is a list of rooms: Office, Kitchen, Family Room, Living Room, Bedroom Hallway, Master Bedroom. Each one is a tile. Tapping a tile doesn’t toggle anything; it opens a dedicated page for that room. Lights, sensors, thermostat, occupancy, entertainment — the controls and readings scoped to that room, with nothing from the rest of the house to scroll through.

The design rule is: the landing page is for state I want to see, and the room pages are for things I want to change. When someone asks “is the dishwasher still running?”, they don’t read the landing view — they tap the kitchen tile. When I walk into the living room at 9 PM, I don’t need to see the garage thermostat.

This is worth the setup cost because it fixes the one thing Home Assistant does badly out of the box: the default “Overview” wants to show you everything at once. Everything is nothing.

// two things I’d change

Being honest with myself:

1. Move configuration into packages/. My configuration.yaml is 150 lines and growing. HA has supported packaged configuration for years — one file per domain (kitchen, security, notifications, pool), auto-merged at boot. My current single-file setup works, but reviewing a change means scrolling past unrelated MQTT, template, and http blocks to find the thing I’m touching. Packages would fix that.

2. Use blueprints for the motion-light pattern. I have at least seven automations that all boil down to “if motion sensor X goes on, turn on light Y, turn it off after Z minutes.” Each one was a separate editor session in 2023. A single blueprint with three parameters would replace all of them and give me one place to fix the inevitable edge cases.
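
The idea behind item 2, sketched in Python instead of blueprint syntax: one parameterized definition stamps out every motion-light automation. The function and the entity names are hypothetical, and the dict only mirrors the YAML shape used elsewhere in this post; it is not the blueprint schema.

```python
def motion_light(motion_sensor, light, off_after_minutes):
    """Build one motion-light automation as a plain, YAML-shaped dict."""
    return {
        "alias": f"Motion light: {light}",
        "triggers": [
            {"trigger": "state", "entity_id": motion_sensor, "to": "on"},
        ],
        "actions": [
            {"action": "light.turn_on", "target": {"entity_id": light}},
            {"delay": {"minutes": off_after_minutes}},
            {"action": "light.turn_off", "target": {"entity_id": light}},
        ],
    }


# Three parameters per room replace a hand-written automation per room.
rooms = [
    ("binary_sensor.kitchen_motion", "light.kitchen", 5),
    ("binary_sensor.hallway_motion", "light.hallway", 2),
]
generated = [motion_light(*room) for room in rooms]
```

An edge case fixed inside `motion_light` is fixed for every room at once, which is the same payoff a blueprint gives.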

Neither of these is urgent. Neither is sexy. Both will pay back fast once I get around to them.

// the common thread

The patterns that have aged well all share one property: they push state and decisions out of individual automations and into structures the whole system can share. Template sensors, scenes, trusted-network auth, safety timers — each one is a tiny reusable primitive that dozens of automations lean on. Nothing in this post required writing a single line of Python; Home Assistant already ships with the toolbox.

The ones I regret were the opposite: one-off automations that repeat logic, that know too much about specific entities, that made perfect sense at 11 PM on a Tuesday and no sense at all six months later.

Build the primitives. Everything else gets cheap.

Three VLANs, one household: how my home network is actually laid out

2026-04-20T06:00:00-04:00

My UniFi controller currently shows the map below. Three VLANs, each with its own subnet, its own SSID, and its own opinions about what’s allowed to talk to what.

VLAN	ID	Subnet	Active leases	What lives here
IoT	1	`192.168.10.0/24`	67	Everything Wi-Fi-connected that you don’t touch daily
Guest	2	`192.168.20.0/24`	3	Visitors — captive portal, internet-only
Primary	3	`192.168.30.0/24`	18	Humans — phones, laptops, tablets

88 active leases right now, and Home Assistant has tracked 193 distinct MAC addresses across them over time. The ratio of “things in the house that are on the internet” to “humans in the house” is roughly 4-to-1 and climbing.
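
One practical payoff of giving each VLAN its own /24 is that any IP in a log identifies its VLAN mechanically. A small sketch using Python's standard `ipaddress` module, with the subnets from the table; the `vlan_for` helper and the sample address are made up for illustration.

```python
import ipaddress

# Subnets copied from the table above.
VLANS = {
    "IoT": ipaddress.ip_network("192.168.10.0/24"),
    "Guest": ipaddress.ip_network("192.168.20.0/24"),
    "Primary": ipaddress.ip_network("192.168.30.0/24"),
}


def vlan_for(ip: str):
    """Return the VLAN name whose subnet contains ip, or None if none does."""
    addr = ipaddress.ip_address(ip)
    for name, net in VLANS.items():
        if addr in net:
            return name
    return None


print(vlan_for("192.168.10.42"))  # a hypothetical IoT lease
```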

Why three VLANs

Two reasons, in order of how much they bothered me.

1. Trust asymmetry. Most of the devices on a home network should not be trusted. That Wi-Fi candle from the Christmas box is running a five-year-old ARM firmware with a hard-coded telnet password and a DNS query for some server you’ve never heard of. My laptop and my bank’s 2FA app used to live on the same flat LAN as that candle. There’s no compelling technical reason for the candle and the laptop to be able to ping each other, and if the candle ever joins a botnet, I’d prefer it couldn’t ARP-scan my printer.

2. Inventory hygiene. It’s almost impossible to keep mental track of which device is which on a flat network of 200 clients. Separating “things humans interact with” from “infrastructure that quietly does its job” makes everything easier — finding a device, blocking a device, rebooting a rogue device, auditing what’s phoning home at 3 AM.

The three VLANs

IoT (VLAN 1, `192.168.10.0/24`)

The heaviest VLAN by a wide margin — 67 active leases today. Smart bulbs, plugs, cameras, thermostats, the pool pump from last week’s dispatch, the garage-door controllers, every appliance that ships with a Wi-Fi chip, the robot vacuum, the weather station, the irrigation controller.

Home Assistant itself lives here. HAOS has a DHCP reservation in this subnet. That is the single most important design choice on this page, because:

HA is, by volume, a piece of IoT infrastructure. It talks to 60+ devices that all live on VLAN 1. Keeping HA on the same subnet means cross-VLAN firewall rules aren’t in the hot path — every automation, every poll, every sensor update stays at L2 inside the same broadcast domain.
It inverts the usual “how do I let HA reach my isolated IoT devices?” question into the much simpler “how do I let my Primary-LAN phone reach HA?” question, which is one firewall rule instead of dozens.
It also means HA, if it were ever compromised, is already segmented from the machines I bank on. The trust asymmetry stays intact.

Guest (VLAN 2, `192.168.20.0/24`)

Captive portal. Three active leases, which is honestly about right for an afternoon. Anyone who visits connects, puts in their name, and gets an internet-only connection that’s walled off from both IoT and Primary.

The important bit — easy to miss in UniFi’s UI — is Client Device Isolation on the guest SSID. Without it, a friend’s phone can see my parents’ laptop if both are on Guest. With it on, every guest is cordoned into their own tiny bubble.

Primary (VLAN 3, `192.168.30.0/24`)

Humans. 18 active leases today — phones, laptops, tablets, the Apple TV in the living room, my work MacBook. Small by device count, but the whole point of segmentation is that these 18 devices are the trusted ones. Primary can reach the internet, it can reach HA on IoT via one specific rule, and it can participate in cross-VLAN casting via mDNS reflection. That’s it. Nothing else reaches into Primary from anywhere.

The part nobody warns you about

The moment you isolate IoT from Primary, a lot of stuff quietly stops working.

Chromecasts vanish from the phone because mDNS does not cross VLAN boundaries by default.
SSDP / UPnP discovery for media stops working.
HomeKit / AirPlay targets on the other VLAN go dark.
The printer on IoT becomes invisible to the laptop on Primary.
Any new integration you try in Home Assistant that relies on broadcast discovery silently fails — the integration adds fine, it just finds zero devices.

Segmentation is not a free lunch. You pay for it in packets that used to travel freely and now need explicit permission to cross a boundary. UniFi exposes two separate knobs that matter:

Firewall / Traffic rules — who can open a unicast connection to whom.
mDNS reflector per-VLAN toggle — whether multicast service discovery gets repeated into neighboring VLANs.

You need both. The firewall gets the data across; the reflector gets the announcement across so the sending side knows the receiver exists.

The rules I wrote

Around half a dozen, all labelled descriptively so future-me remembers why they exist.

Allow Primary → HA (8123) — phones and laptops on Primary need to reach http://homeassistant.local:8123 and its API. One rule, one direction, one port. That’s the Primary-to-IoT bridge in its entirety for day-to-day use.
Allow HA → IoT internal ports — HA lives on IoT so most traffic is intra-subnet, but a few integrations need ports or protocols that the VLAN’s default egress rules would otherwise drop (specifically outbound multicast for certain Wi-Fi plugs and Matter devices). This rule is narrow and exists because one vendor decided their protocol needed TTL > 1.
Allow Primary → IoT (cameras + media + mgmt) — direct RTSP from cameras into VLC on the laptop, Plex on the media server, SSH into the NVR for maintenance. Separate from the HA rule because the audit trail is clearer.
Allow Chromecast reflection — combined with UniFi’s mDNS reflector enabled on both Primary and IoT, this lets the phone’s Cast picker see the Chromecasts on IoT. Without it, casting silently fails with a “device not found” that’s nearly impossible to debug.
Allow Guest mDNS for casting — same idea as the Chromecast reflection rule above, but narrower: guests can cast to the living-room TV, which is on IoT. No unicast, no device control, just enough multicast for the Cast picker to populate.
Block IoT → Primary — this is the default, but I have an explicit rule near the top of the chain that drops any IoT-initiated connection into Primary. Belt and suspenders. The day an IoT device gets popped, I want the answer to “could it reach the laptop?” to be no, twice over.
Device-group-based egress restrictions for a handful of devices that should only talk to specific WAN destinations (a couple of appliances I don’t trust with open internet). Per-device isolation at the firewall level is easier once you have a few days of traffic flow data to look at.

Every rule has a label that names why it exists, not what it does. The what is in the rule body; the why is the part I need to read six months later when the printer stops working.
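
For anyone who hasn't lived inside a firewall chain, here is a toy model of how an ordered rule list with "why" labels behaves. It is a sketch under loud assumptions: first matching rule wins, inter-VLAN traffic is dropped by default, and rules match whole VLANs rather than individual hosts. It is not how UniFi stores or evaluates rules internally, and the `Rule` and `decide` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    why: str                    # the label: why the rule exists
    action: str                 # "allow" or "drop"
    src: str                    # source VLAN name
    dst: str                    # destination VLAN name
    port: Optional[int] = None  # None matches any port


def decide(rules, src, dst, port, default="drop"):
    """Return (action, why) for a new connection; the first matching rule wins."""
    for rule in rules:
        if rule.src == src and rule.dst == dst and rule.port in (None, port):
            return rule.action, rule.why
    # Assumption: anything no rule mentions falls through to a default drop.
    return default, "default policy"


# Two of the rules described above, simplified to VLAN granularity.
RULES = [
    Rule("IoT must never initiate into Primary", "drop", "IoT", "Primary"),
    Rule("Phones and laptops need the HA UI and API", "allow", "Primary", "IoT", 8123),
]

print(decide(RULES, "Primary", "IoT", 8123))
print(decide(RULES, "IoT", "Primary", 443))
```

Returning the label alongside the verdict is the part worth copying: when something stops working, the answer to "which rule did this?" arrives with its reason attached.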

While we’re here: the DNS layer

One benefit of HA living on the IoT VLAN with a DHCP reservation is that it’s a stable, always-on box with a known IP. Which makes it the obvious place to host AdGuard Home — a DNS-level ad and tracker blocker. It runs as an add-on inside HAOS, listens on port 53, and does two things that compound:

Blocks ads and trackers at the DNS layer, network-wide. Every device on every VLAN — the phones on Primary, the TV on IoT, even the guest’s laptop if they’re using DHCP DNS — resolves through AdGuard. Devices that have no plausible way to run their own ad blocker (smart TVs, every IoT appliance that quietly beacons telemetry) get the same filtering for free.
Surfaces what’s actually happening on the network. The AdGuard UI shows which client made which DNS query. When a new IoT gadget gets added and I want to know who it’s phoning home to at 3 AM, I just look — the queries are all there, grouped by client IP. This is the only time I’ve ever found vendor-surveillance concerns to be inspectable rather than hand-wavy.

The router is configured to hand out HAOS’s IoT-VLAN IP as the DNS server in every DHCP lease, across all three VLANs. AdGuard forwards anything it doesn’t block to a real upstream (1.1.1.1 with DNS-over-TLS).
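
The two jobs described above, filtering and per-client visibility, fit in a few lines if you squint. This is a rough Python sketch of the idea, not AdGuard Home's implementation: real filter lists have a much richer rule syntax, and the blocklist entries, the parent-domain matching, and the function names here are all assumptions for illustration.

```python
from collections import defaultdict

BLOCKLIST = {"ads.example.com", "telemetry.example.net"}  # hypothetical entries
UPSTREAM = "1.1.1.1"  # the upstream named above, reached over DNS-over-TLS


def is_blocked(domain: str) -> bool:
    """True if domain or any of its parent domains is on the blocklist."""
    labels = domain.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in BLOCKLIST for i in range(len(labels)))


def handle_query(client_ip: str, domain: str, log) -> str:
    """Record the query under its client, then block it or pass it upstream."""
    log[client_ip].append(domain)
    return "blocked" if is_blocked(domain) else f"forward to {UPSTREAM}"


# Queries grouped by client IP: the "who is it phoning home to" view.
query_log = defaultdict(list)
```

Because every query is logged before the block decision, blocked lookups still show up per client, which is what makes the 3 AM audit possible.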

The tradeoff: if HA goes down, DNS goes down for the whole house — which, in practice, means the internet feels broken until the box is back. I considered this for a while. Counter-arguments that won me over: HA hasn’t crashed in any way that took out the container in the year I’ve been running this, AdGuard’s own uptime is better than most consumer routers’ built-in DNS, and the “wait, is the internet down?” failure mode is not meaningfully different from “wait, did the router reboot?” — which I used to get from ISP-supplied hardware routinely.

What I’d do differently

Start with the three VLANs on day one, not after two years of one flat LAN. Migrating 150+ devices across VLANs means re-pairing a chunk of them, because vendor apps cache the original subnet and the device sullenly refuses to rejoin. Start clean and the pain is frontloaded and smaller.

Put HA on the IoT VLAN from the start. I did not do this initially; HA was on Primary for about a year. Cross-VLAN firewall rules for every single integration is a worse life than just treating HA as IoT infrastructure and moving it where the traffic naturally is.

DHCP reservations over static IPs. Every integration doc says “set the device to a static IP.” Don’t. Use DHCP reservations at the controller. If you ever renumber the subnet — as I did when I split the VLANs — it’s one file to edit instead of forty devices to walk around the house to.
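
Why reservations make renumbering cheap: the whole move is a mechanical rewrite of one table. A minimal sketch with Python's standard `ipaddress` module, assuming reservations are kept as a MAC-to-IP mapping and that the old and new subnets are the same size. The `renumber` helper, the MAC address, and the old flat subnet in the example are hypothetical.

```python
import ipaddress


def renumber(reservations, old_net, new_net):
    """Move every reserved IP from old_net to new_net, keeping the host part."""
    old = ipaddress.ip_network(old_net)
    new = ipaddress.ip_network(new_net)
    if old.prefixlen != new.prefixlen:
        raise ValueError("sketch assumes both subnets have the same prefix length")
    moved = {}
    for mac, ip in reservations.items():
        addr = ipaddress.ip_address(ip)
        if addr not in old:
            moved[mac] = ip  # reservation lives in another subnet; leave it alone
            continue
        offset = int(addr) - int(old.network_address)
        moved[mac] = str(new.network_address + offset)
    return moved


# Hypothetical: a device reserved at .20 on an old flat LAN moves to the IoT subnet.
print(renumber({"aa:bb:cc:00:00:01": "192.168.1.20"},
               "192.168.1.0/24", "192.168.10.0/24"))
```

With static IPs, the same change is the loop body executed by hand, once per device, while standing on a ladder.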

Short lease on the IoT VLAN. Many IoT devices don’t gracefully handle IP changes. A 1-hour lease means that when a device misbehaves and you force a rejoin, its stale lease expires within the hour instead of lingering for a day. The default 24-hour lease is purgatory.

Point DHCP at AdGuard before you do anything else. If I’d started with the DNS layer in place, I would have caught a handful of “that integration sends every API call through a telemetry domain” decisions much earlier.

The bigger lesson

Network segmentation at home is mostly a documentation problem disguised as a networking problem. The VLAN setup takes an afternoon. What takes months is remembering which rule applies to what, which device you put on which VLAN, and why the printer stopped working two years later.

Name your VLANs something descriptive. Name your firewall rules better than the UI suggests. Write them down somewhere you’ll actually look. Future-you will be grateful.