# Homelab Infrastructure — Project Knowledge
## Host Machine
- **ThinkStation-P710** — Primary Proxmox host
- IP: `192.168.88.25` (jg-hud)
- Specs: Dual Xeon E5 v4, 110GB ECC RAM, 16 cores
- OS: Proxmox VE
- SSH: `root@192.168.88.25` (key: SHA256:i4whMVfdf+SxGttDKhL9PEltWv2ABPQs7BRUunCfcf8)
## Local Workstation
- **ThinkStation-P710**
- Specs: Dual Xeon E5 v4
- OS: Linux Mint
- Also runs Frigate NVR directly at `192.168.88.41`
## Network
- **MikroTik router**: `192.168.88.1`
- Main LAN: `192.168.88.0/24`
- IoT VLAN: `192.168.2.0/24`
- Static WAN IP: `184.170.161.177`
- **Synology RT6600ax**: WiFi AP only
- **DNS**: Split-DNS via MikroTik static entries + Pi-hole forwarding local domains
- **Cloudflare**:
- Zone ID: `3511a9ac47fa469b24a5c0f411063da4`
- DNS API token: `gEgJiSdJKSLhnCwQGXRiRWgz5WhTmkZQk0H89X8p`
- DNS Edit token: `cfat_RhlfvleyZGst8k6rTVx9Ti7x3b8NuE3uOxExka3i9d128e8c`
- All jgitta.com subdomains point to WAN IP
## Proxmox VMs & Containers
| ID | Name | IP | Specs | Role |
|---|---|---|---|---|
| VM102 | jellyfin | 192.168.88.10 | 2c/12GB | Jellyfin + *arr stack + qBittorrent/ProtonVPN |
| VM103 | next | 192.168.88.62 | 2c/8GB | Nextcloud + OnlyOffice |
| VM104 | Windows11-NVR | — | 8c/11GB | Windows VM |
| VM105 | openclaw-debian | — | 4c/4GB | — |
| VM106 | haos | 192.168.88.39 | 2c/8GB | Home Assistant OS |
| VM107 | pbs | 192.168.88.60 | 4c/8GB | Proxmox Backup Server (nightly full backups) |
| VM112 | siklos/docker-server | 192.168.88.27 | 4c/12GB | Main Docker host |
| VM113 | photos | 192.168.88.32 | 4c/16GB | PhotoPrism + Immich (dedicated photo VM) |
| VM113 | zorin-os | — | 2c/4GB | — |
| CT200 | gitea | 192.168.88.200 | 1c/512MB | Gitea git server (port 3000) |
| CT201 | openclaw | 192.168.88.29 | — | AI gateway (Gemini + Telegram bot, port 18789) |
| CT202 | caddy-proxy | 192.168.88.110 | 2c/4GB | Caddy reverse proxy |
| VM111 | debian-1 | — | — | Stopped/unused |
## Storage Pools (on jg-hud)
| Pool | Size | Type |
|---|---|---|
| nvme-thin | 4.3TB | LVM |
| SSD-1.7T | 1.7TB | — |
| big-11t | 10.8TB | — |
| HD4T-1 | 3.6TB | — |
| HD4T-2 | 3.6TB | — |
| nextcloud-fast | 1.8TB | ZFS |
| pbs | 39TB | PBS |
- **TrueNAS**: `192.168.88.24` — 25.5TB (TrueNAS-NFS pool)
- **Photos NFS**: `/mnt/big-11t/photos` on Proxmox host, NFS-exported to siklos and ThinkStation
## SSH Access Pattern
- Direct to Proxmox host: `root@192.168.88.25`
- To Siklos (VM112): `ssh -o StrictHostKeyChecking=no jgitta@192.168.88.27` (from jg-hud)
- To other VMs: use VM103 or VM107 as jump hosts
- Public key: `ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMRHmoQ63d1qi5yjYoFm8FgnBwUo5uNyRCPChW25DmjF root@jg-hud`
- **Note**: Reach Siklos via Proxmox host shell SSH — QEMU agent times out on long commands
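
The jump-host pattern above can be expressed with OpenSSH's `-J` (ProxyJump) option. This is a minimal sketch: the helper name `jump_ssh_cmd` is hypothetical, and the user names on the jump host and target are assumptions, since only the Siklos login is documented here.

```bash
#!/usr/bin/env bash
# Build (print) an ssh command that reaches a target through a jump host.
# Usage: jump_ssh_cmd <user@jump-host> <user@target-host>
jump_ssh_cmd() {
  local jump="$1" target="$2"
  printf 'ssh -J %s %s\n' "$jump" "$target"
}

# Example (user names are assumptions): reach VM102 through VM103.
# jump_ssh_cmd jgitta@192.168.88.62 jgitta@192.168.88.10
```

Printing the command instead of running it keeps the helper safe to source; paste or `eval` the output once the user names are confirmed.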
## Caddy Reverse Proxy (CT202 — 192.168.88.110)
- Version: v2.11.2 with Cloudflare DNS plugin
- Config: `/etc/caddy/Caddyfile` imports `/etc/caddy/snippets.caddy` and `/etc/caddy/sites/*.caddy`
- TLS: DNS challenge via Cloudflare
- Snippets: `headers_base`, `internal_only`, `web_secure`, `web_media`, `stream_secure`, `proxy_timeouts`, `pve_proxy`
### sites/infrastructure.caddy
| Subdomain | Backend | Notes |
|---|---|---|
| proxmox.jgitta.com | 192.168.88.25:8006 | internal_only |
| pbs.jgitta.com | 192.168.88.60:8007 | internal_only |
| mesh.jgitta.com | 192.168.88.27:444 | MeshCentral |
| pihole.jgitta.com | 192.168.88.27:8080 | internal_only |
| homarr.jgitta.com | 192.168.88.27:7575 | — |
| apache.jgitta.com | 192.168.88.27:8383 | Guacamole |
| notes.jgitta.com | 192.168.88.27:3010 | Karakeep |
| links.jgitta.com | 192.168.88.27:3015 | Linkwarden |
| gitea.jgitta.com | 192.168.88.200:3000 | — |
| status.jgitta.com | 192.168.88.27:3001 | Uptime Kuma, internal_only |
| grafana.jgitta.com | 192.168.88.27:3020 | internal_only |
| glances.jgitta.com | 192.168.88.27:61208 | internal_only |
| dashboard.jgitta.com | 192.168.88.27:8096 | custom dashboard |
| blue.jgitta.com | 192.168.88.47:75 | — |
| claw.jgitta.com | 192.168.88.29:18789 | internal_only |
| office.jgitta.com | 192.168.88.27:8880 | OnlyOffice |
| jgitta.com / www | 192.168.88.27:8095 | WordPress |
### sites/media.caddy
| Subdomain | Backend | Notes |
|---|---|---|
| jellyfin.jgitta.com | 192.168.88.10:8096 | stream_secure |
| sonarr.jgitta.com | 192.168.88.10:8989 | internal_only |
| radarr.jgitta.com | 192.168.88.10:7878 | internal_only |
| prowlarr.jgitta.com | 192.168.88.10:9696 | internal_only |
| ha.jgitta.com | 192.168.88.39:8123 | Home Assistant |
| cameras.jgitta.com | 192.168.88.41:8971 | Frigate, stream_secure |
| next.jgitta.com | 192.168.88.62:80 | Nextcloud |
| collabora.jgitta.com | 192.168.88.62:9980 | Collabora (may be replaced by OnlyOffice) |
| hub.jgitta.com | 192.168.88.65 | Hubitat |
| pictures.jgitta.com | 192.168.88.32:2283 | Immich (VM113) |
| photos.jgitta.com | 192.168.88.32:2342 | PhotoPrism (VM113) |
### When adding new subdomains
Always handle DNS (Cloudflare) and Caddy changes together as a single operation.
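
A rough sketch of doing both halves in one pass is shown below. The function names are hypothetical, the zone ID and token are read from the environment variables `CF_ZONE_ID` and `CF_API_TOKEN` (names chosen for this example), `proxied:false` is an assumption about how the records are configured, and importing a snippet without arguments is a guess at how the snippets in `snippets.caddy` are used.

```bash
#!/usr/bin/env bash
# Print a Caddy site block for a new subdomain.
# Usage: caddy_site <fqdn> <backend host:port> [snippet name]
caddy_site() {
  local fqdn="$1" backend="$2" snippet="${3:-}"
  printf '%s {\n' "$fqdn"
  if [ -n "$snippet" ]; then
    printf '\timport %s\n' "$snippet"
  fi
  printf '\treverse_proxy %s\n}\n' "$backend"
}

# Print the JSON body for a Cloudflare "create DNS record" request (A record).
# ttl=1 means "automatic" in the Cloudflare API.
cf_record_json() {
  local fqdn="$1" ip="$2"
  printf '{"type":"A","name":"%s","content":"%s","ttl":1,"proxied":false}\n' "$fqdn" "$ip"
}

# Create the A record. Requires CF_ZONE_ID and CF_API_TOKEN in the environment.
cf_create_record() {
  curl -sS -X POST \
    "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(cf_record_json "$1" "$2")"
}
```

For example, `caddy_site demo.jgitta.com 192.168.88.27:9000 internal_only` prints a block that could be saved under `/etc/caddy/sites/`, and `cf_create_record demo.jgitta.com 184.170.161.177` would add the public record. Internal clients additionally need the matching MikroTik static DNS entry described in the split-DNS section.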
## Jellyfin (VM102 — 192.168.88.10)
- URL: `https://jellyfin.jgitta.com`
- Port: 8096
- API Token: `b861664663924b68858a67b91f6d1188`
- Admin User ID: `12659173-DBBE-43E5-B116-DE70B63A470D`
- DB: `/var/lib/jellyfin/data/jellyfin.db` (SQLite)
- Virtual folders config: `/var/lib/jellyfin/root/default/<LibraryName>/options.xml`
- LiveTV config: `/etc/jellyfin/livetv.xml`
- HDHomeRun tuner: `http://192.168.88.22` (device ID: 10B00DE1)
### Media Libraries
| Library | CollectionFolder ID | Path |
|---|---|---|
| Shows | A656B907-EB3A-7353-2E40-E44B968D0225 | `/mnt/media/tv` |
| Movies | F137A2DD-21BB-C1B9-9AA5-C0F6BF02A805 | `/mnt/media/movies` |
| Recordings | 79A2726D-3C50-E769-A8AF-1E4184E4FCCF | `/mnt/media/recordings` |

Media mount: `/mnt/media` (5.8TB, ~55% used) — subdirs: `movies/`, `tv/`, `recordings/`, `torrents/`
### Critical: Library Path Management
**NEVER edit `options.xml` directly to change library paths.** Jellyfin ignores changes made directly to the file while running, and on restart it overwrites the file from its internal state.
Always use the API to add/remove paths:
```bash
# Add a path to a library
curl -X POST "http://localhost:8096/Library/VirtualFolders/Paths" \
-H "Authorization: MediaBrowser Token=\"<token>\"" \
-H "Content-Type: application/json" \
-d '{"Name":"Shows","PathInfo":{"Path":"/mnt/media/tv"}}'
# Remove a path from a library
curl -X DELETE "http://localhost:8096/Library/VirtualFolders/Paths?name=Shows&path=%2Fmnt%2Fmedia%2Ftv&refreshLibrary=false" \
-H "Authorization: MediaBrowser Token=\"<token>\""
```
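
Before and after changing paths it helps to list what Jellyfin currently has configured, and to percent-encode the path for the `DELETE` query string. The helpers below are a sketch: the function names are hypothetical and the token is assumed to be in an environment variable named `JELLYFIN_TOKEN`.

```bash
#!/usr/bin/env bash
# Percent-encode a string for use as a query parameter (ASCII input assumed).
urlencode() {
  local s="$1" out="" c i
  for ((i = 0; i < ${#s}; i++)); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9.~_-]) out+="$c" ;;
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# List all libraries with their configured paths (GET /Library/VirtualFolders).
jellyfin_list_folders() {
  curl -sS "http://localhost:8096/Library/VirtualFolders" \
    -H "Authorization: MediaBrowser Token=\"${JELLYFIN_TOKEN}\""
}

# Build the URL for removing <path> from library <name>.
jellyfin_remove_path_url() {
  printf 'http://localhost:8096/Library/VirtualFolders/Paths?name=%s&path=%s&refreshLibrary=false\n' \
    "$(urlencode "$1")" "$(urlencode "$2")"
}
```

`jellyfin_remove_path_url Shows /mnt/media/tv` yields the same URL as the `DELETE` example above, so the helper can be used for paths that contain spaces or other characters needing encoding.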
### Root Cause History: TV Shows Not Appearing (April 2026)
The Recordings library was accidentally configured with path `/mnt/media` (the entire media mount) instead of `/mnt/media/recordings`. This caused:
- Recordings library to scan and "own" all TV shows and movies
- Shows library to have no path configured (empty PathInfos) — never scanned `/mnt/media/tv`
- All items getting `TopParentId` pointing to the `/mnt/media` Folder (not Shows CollectionFolder)
- Jellyfin auto-creating a `Recordings2` library whenever LiveTV found that its recording path `/mnt/media/recordings` did not match the Recordings library path `/mnt/media`

Fix: Use the API to remove `/mnt/media` from Recordings, add `/mnt/media/recordings` to Recordings, and add `/mnt/media/tv` to Shows; then delete Recordings2, wipe the BaseItems table (preserving UserData), restart, and rescan.
### DB Wipe Command (preserves UserData/watch history)
```sql
DELETE FROM AncestorIds; DELETE FROM BaseItemImageInfos;
DELETE FROM BaseItemMetadataFields; DELETE FROM BaseItemProviders;
DELETE FROM BaseItemTrailerTypes; DELETE FROM Chapters;
DELETE FROM MediaStreamInfos; DELETE FROM AttachmentStreamInfos;
DELETE FROM TrickplayInfos; DELETE FROM KeyframeData;
DELETE FROM PeopleBaseItemMap; DELETE FROM Peoples;
DELETE FROM MediaSegments; DELETE FROM ItemValuesMap;
DELETE FROM ItemValues; DELETE FROM BaseItems; VACUUM;
```
Stop Jellyfin first (`sudo systemctl stop jellyfin`), then run the statements as: `sudo sqlite3 /var/lib/jellyfin/data/jellyfin.db "..."`
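
Because the wipe is destructive, taking a copy of the database right after stopping the service is a cheap safeguard. This is a sketch, not part of the original procedure; the function names and the `.bak-<timestamp>` naming are assumptions.

```bash
#!/usr/bin/env bash
# Build a backup file name: <path>.bak-<timestamp>
backup_name() {
  printf '%s.bak-%s\n' "$1" "$2"
}

# Stop Jellyfin and copy the database aside. The service is left stopped
# so the DELETE statements can be run next.
jellyfin_db_backup() {
  local db="/var/lib/jellyfin/data/jellyfin.db" dest
  dest="$(backup_name "$db" "$(date +%Y%m%d-%H%M%S)")"
  sudo systemctl stop jellyfin &&
    sudo cp -a "$db" "$dest" &&
    echo "backup written to $dest"
}
```

If `jellyfin.db-wal` or `jellyfin.db-shm` files are still present next to the database after the service stops, copy them as well so the backup is complete.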
### Recordings2 Auto-Creation
If a `Recordings2` library appears, it means LiveTV's recording path in `/etc/jellyfin/livetv.xml` doesn't match the Recordings library path. Fix via API (remove/add paths) rather than editing XML files.
## Dotfiles
- Gitea: `http://192.168.88.200:3000/jgitta/dotfiles.git`
- Token (read/write): `56ac53def371df34d8f9c4b5580b28d4f62ab1ab`
- Includes: micro config, bash aliases, `update-dotfiles` alias
- Preferred editor: `micro` (never nano)
## Backups
- PBS (VM107, 192.168.88.60): scheduled backups of all VMs/CTs except VM107 itself (Mon/Wed/Fri at 21:00; see Backup Schedule & Retention)
- TrueNAS (192.168.88.24): bulk storage backups
## Key Notes
- KSM enabled with ksmtuned on Proxmox host
- Watchtower runs on Siklos but Pi-hole is excluded from auto-updates
- When restoring VMs: restore all disks to HD4T-2 first, then `qm disk move` OS disk to nvme-thin
- `delete_vm` MCP tool removes all disks — always confirm before running
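
The restore note above can be turned into a small dry-run helper that only prints the two commands. This is a sketch: `restore_plan` is a hypothetical name, the backup volume ID is a placeholder to be filled in from the storage's backup list, and `scsi0` as the OS disk is taken from the disk mapping table rather than verified per VM.

```bash
#!/usr/bin/env bash
# Print (do not run) the restore sequence for one VM:
#   1. restore every disk to HD4T-2
#   2. move the OS disk to nvme-thin
# Usage: restore_plan <vmid> <backup volume id> [os disk, default scsi0]
restore_plan() {
  local vmid="$1" archive="$2" disk="${3:-scsi0}"
  printf 'qmrestore %s %s --storage HD4T-2\n' "$archive" "$vmid"
  printf 'qm disk move %s %s nvme-thin\n' "$vmid" "$disk"
}
```

Without `--delete 1`, `qm disk move` keeps the original volume on HD4T-2 as an unused disk, which leaves a fallback until the moved disk is confirmed working.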
## DNS Resolution Flow (Split-DNS)
All `*.jgitta.com` subdomains resolve internally to Caddy (192.168.88.110) via MikroTik static DNS entries — **not** through Cloudflare/public DNS. If a subdomain doesn't resolve internally, check MikroTik static DNS first, then Caddy config.
- **Internal**: MikroTik static entries → 192.168.88.110 (Caddy)
- **External**: Cloudflare A records → 184.170.161.177 (WAN) → NAT → Caddy
- **Other names**: Pi-hole (192.168.88.27:53) handles upstream forwarding
- **`.homenet` names**: MikroTik static DNS only (e.g., `siklos.homenet`, `jellyfin.homenet`)
- **IoT DNS**: Force-redirected to Pi-hole via MikroTik DSTNAT rules

**Firewall gotcha**: `vlan20-IoT` is dynamically added to the WAN interface list. NAT rules using `in-interface-list=WAN` will match IoT traffic. WAN-facing rules must use `in-interface=ether1` instead.
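
The resolution flow can be spot-checked by asking a specific resolver for a name and classifying the answer. A minimal sketch, assuming `dig` is installed; the function names are hypothetical.

```bash
#!/usr/bin/env bash
CADDY_IP="192.168.88.110"   # internal answer (Caddy)
WAN_IP="184.170.161.177"    # external answer (static WAN IP)

# Classify an A-record answer according to the split-DNS layout.
classify_answer() {
  case "$1" in
    "$CADDY_IP") echo "internal" ;;
    "$WAN_IP")   echo "external" ;;
    "")          echo "unresolved" ;;
    *)           echo "unexpected" ;;
  esac
}

# Resolve <name> against <dns server> and report how the answer classifies.
check_name() {
  local name="$1" server="$2" ip
  ip="$(dig +short A "$name" "@$server" | head -n 1)"
  printf '%s via %s -> %s (%s)\n' "$name" "$server" "${ip:-none}" "$(classify_answer "$ip")"
}
```

Running `check_name jellyfin.jgitta.com 192.168.88.1` should classify as `internal` when the MikroTik static entry exists, while querying a public resolver should classify as `external`; anything else points at a missing static entry or a stale Cloudflare record.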
## Service Architecture — Native vs Docker
**VM102 (jellyfin, 192.168.88.10)** — native systemd for core services:
- `jellyfin.service` — native systemd
- `qbittorrent-nox.service` — native systemd
- `me.proton.vpn.split_tunneling.service` — native systemd
- Radarr, Sonarr, Prowlarr, FlareSolverr — **Docker containers**

**VM103 (next, 192.168.88.62)** — Nextcloud native Apache:
- Nextcloud 33.0.0 — native Apache (`/etc/apache2/sites-enabled/nextcloud.conf`)
- Web root: `/var/www/nextcloud/`
- Config: `/var/www/nextcloud/config/config.php`
- Data dir: `/mnt/nextcloud-data` (local 2TB disk `/dev/sdb1`, ~65% used)
- DB: MySQL on localhost, db `nextcloud`, user `nextcloud`
- Tiered storage mounts:
- Warm: `192.168.88.25:/nvme5-1tb/nextcloud-data` → `/mnt/warm-storage` (900GB NFS)
- Cold: `192.168.88.24:/mnt/pool1/cold-storage` → `/mnt/cold-storage` (58TB NFS from TrueNAS)
- Media: `192.168.88.25:/mnt/big-11t/media` → `/mnt/media`

**VM112 / Siklos (192.168.88.27)** — all services Docker; compose files at `/srv/docker/<service>/` (RAM reduced to 12GB after photo services migrated out)

**VM113 / photos (192.168.88.32)** — dedicated photo services VM, Docker:
- PhotoPrism — port 2342, compose `/srv/docker/photoprism/docker-compose.yml`
- Storage: `/srv/docker/photoprism/storage_data/` (14GB)
- DB: MariaDB 11 at `/srv/docker/photoprism/db_data/`
- Immich — port 2283, compose `/srv/docker/immich/docker-compose.yml`
- Library: `/srv/docker/immich/library/` (22GB)
- DB: PostgreSQL at `/srv/docker/immich/postgres/`
- NFS: `/mnt/photos` from `192.168.88.25:/mnt/big-11t/photos`
- Node exporter: port 9100

**ThinkStation workstation (192.168.88.41)** — Linux Mint 22.3 (separate machine from Proxmox host at .25):
- Frigate NVR — Docker, image `ghcr.io/blakeblackshear/frigate:stable-tensorrt`
- Compose: `/opt/frigate/docker-compose.yml`
- Ports: 8971 (web UI), 5000, 8554-8555 (RTSP), 8555/udp
- Open-WebUI — Docker (no external ports)
- Disks: OS `/dev/sda2` (234GB, 73%), `/dev/sdb2` 14TB HD at `/mnt/14TB-HD` (72%), `/dev/nvme0n1p1` 469GB at `/mnt/INTEL-SSD`, `/dev/sdc1` 1.8TB at `/mnt/StorFly-SSD`
- NFS mount: `/mnt/photos` from `192.168.88.25:/mnt/big-11t/photos`
## VM Disk → Storage Pool Mapping
| VM | Disk | Pool | Size | Notes |
|---|---|---|---|---|
| VM102 jellyfin | scsi0 (OS) | big-11t | 32GB | |
| VM102 jellyfin | scsi1 (media) | big-11t | 6TB | `/mnt/media` inside VM |
| VM103 next | scsi0 (OS) | nvme-thin | 32GB | |
| VM103 next | scsi1 (data) | nvme-thin | 2TB | local nextcloud data disk |
| VM106 haos | scsi0 | HD4T-2 | 32GB | |
| VM107 pbs | scsi0 | SSD-1.7T | 32GB | |
| VM112 siklos | scsi0 | nvme-thin | 250GB | |

**Storage pool usage** (as of April 2026):
| Pool | Type | Total | Used% |
|---|---|---|---|
| nvme-thin | lvmthin | 4.4TB | 39% |
| big-11t | dir | 10.8TB | 59% |
| SSD-1.7T | dir | 1.7TB | 70% |
| HD4T-1 | dir | 3.6TB | 70% |
| HD4T-2 | dir | 3.6TB | 0.2% |
| pbs | PBS | 64TB | 23% |
| nextcloud-fast | zfspool | — | INACTIVE |
| nvme4-2tb | zfspool | 1.8TB | 0% |
| nvme5-1tb | zfspool | 900GB | 0% |
## NFS Export Map
**From jg-hud (192.168.88.25)**:
| Export | Clients | Mounted by |
|---|---|---|
| `/mnt/big-11t/media` | 192.168.88.0/24 | VM102 (`/mnt/media`), VM103 (`/mnt/media`) |
| `/mnt/big-11t/photos` | 192.168.88.0/24 | VM112 (`/mnt/photos`), ThinkStation (`/mnt/photos`) |
| `/nvme5-1tb/nextcloud-data` | 192.168.88.62 only | VM103 (`/mnt/warm-storage`) |

**From TrueNAS (192.168.88.24)**:
| Export | Mounted by |
|---|---|
| `/mnt/pool1/cold-storage` | VM103 (`/mnt/cold-storage`, 58TB) |
| PBS dataset | VM107 PBS (`/mnt/pbsdataset/pbs-store`) |
## Backup Schedule & Retention
**Job 1 — PBS (enabled)**: Mon/Wed/Fri at 21:00
- All VMs/CTs **except VM107** (PBS itself)
- Storage: `pbs` (PBS datastore `pbs-store` on TrueNAS NFS at `/mnt/pbsdataset/pbs-store`)
- Retention: keep-daily=7, keep-weekly=4, keep-monthly=3
- Email notification: `jgitta@jgitta.com`

**Job 2 — TrueNAS NFS (disabled)**: Sun/Tue/Thu/Sat at 00:00
- All VMs, storage: `Proxmox-TrueNAS-NFS`, keep-last=5
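
As a worked example of the Job 1 retention settings: each `keep-*` option retains at most one snapshot per period, so the upper bound per guest is 7 + 4 + 3 = 14 snapshots. The sketch below computes that bound and prints a prune dry-run command for previewing what would be kept; the function names are hypothetical, and the repository is assumed to be supplied separately via `--repository` or the `PBS_REPOSITORY` environment variable.

```bash
#!/usr/bin/env bash
# Upper bound on snapshots retained per guest: one per daily/weekly/monthly slot.
# Usage: max_kept <keep-daily> <keep-weekly> <keep-monthly>
max_kept() {
  echo $(( $1 + $2 + $3 ))
}

# Print a prune dry-run command for one backup group (for example vm/102).
prune_dry_run_cmd() {
  printf 'proxmox-backup-client prune %s --dry-run --keep-daily 7 --keep-weekly 4 --keep-monthly 3\n' "$1"
}
```

Since backups run only on Mon/Wed/Fri and days without a backup do not consume a daily slot, `keep-daily=7` spans a little over two weeks of calendar time rather than seven consecutive days.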
## TrueNAS (192.168.88.24)
- TrueNAS SCALE 24.10.2.4
- **pool1**: RAIDZ2, 8 disks, ~63.7TB used / ~104.4TB total, ONLINE/healthy
- Hosts PBS backup store (NFS-exported to VM107)
- Hosts cold-storage NFS export (58TB, mounted on VM103)
- API: use `curl -sk https://192.168.88.24/api/v2.0/<endpoint> -u "root:<password>"`
- MCP connection is broken — use curl API directly
- Disks with SMART warnings: sda/9JH31S2T (40 unreadable, 24 ATA errors), sdh/9JG47UTT (13 ATA errors)
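
Since the MCP connection is broken, a thin wrapper around the documented `curl` call saves retyping. This is a sketch: the function names are hypothetical and the root password is assumed to be in an environment variable named `TRUENAS_PASSWORD`.

```bash
#!/usr/bin/env bash
TRUENAS_HOST="192.168.88.24"

# Build the full URL for an API v2.0 endpoint (a leading slash is optional).
truenas_url() {
  printf 'https://%s/api/v2.0/%s\n' "$TRUENAS_HOST" "${1#/}"
}

# GET an endpoint. -k skips certificate verification, as in the documented call.
truenas_get() {
  curl -sk "$(truenas_url "$1")" -u "root:${TRUENAS_PASSWORD}"
}
```

For example, `truenas_get pool` returns the pool list as JSON, which is a quick way to confirm that `pool1` is still ONLINE given the SMART warnings above.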
## Log Locations
| Service | Location | Command |
|---|---|---|
| Jellyfin | `/var/log/jellyfin/` | `tail -f /var/log/jellyfin/jellyfin$(date +%Y%m%d).log` on VM102 |
| FFmpeg transcode | `/var/log/jellyfin/FFmpeg.Transcode-*.log` | VM102 |
| Caddy | systemd journal | `journalctl -u caddy -f` on CT202 |
| Pi-hole | FTL DB / journal | `/etc/pihole/pihole-FTL.db`, `journalctl -u pihole-FTL` on Siklos |
| Docker services (Siklos) | docker logs | `docker compose logs -f` in `/srv/docker/<service>/` |
| Proxmox | `/var/log/pve/` | on jg-hud (192.168.88.25) |
| qBittorrent | systemd journal | `journalctl -u qbittorrent-nox -f` on VM102 |
| ProtonVPN | systemd journal | `journalctl -u me.proton.vpn.split_tunneling -f` on VM102 |
## Uptime Kuma Monitors (Siklos, port 3001)
- HTTP checks: Caddy Health, Gitea, Glances, Grafana, Homarr, Home Assistant, Immich, Jellyfin, Karakeep, Linkwarden, MeshCentral, Nextcloud, PBS, Pi-hole web, Prometheus
- Port checks: MikroTik SSH (:22), Pi-hole DNS (:53), Proxmox SSH (:22), Siklos SSH (:22)
- DNS check: Pi-hole DNS resolution (google.com via 192.168.88.27:53)
- Metric check: Node Exporter on ThinkStation (192.168.88.41:9100)
- **Not monitored**: VM102 SSH, qBittorrent, ProtonVPN status, Radarr/Sonarr/Prowlarr health, TrueNAS, Frigate