<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://cryptojones.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://cryptojones.github.io/" rel="alternate" type="text/html" /><updated>2026-06-15T16:10:57+00:00</updated><id>https://cryptojones.github.io/feed.xml</id><title type="html">CryptoJones</title><subtitle>Principal SWE · AI agents &amp; security</subtitle><entry><title type="html">XSpaceWar-AI: We Rebuilt Spacewar! With Real Gravity — and It’s Free</title><link href="https://cryptojones.github.io/XSpaceWar-AI-Open-Source-Newtonian-Spacewar/" rel="alternate" type="text/html" title="XSpaceWar-AI: We Rebuilt Spacewar! With Real Gravity — and It’s Free" /><published>2026-06-14T00:00:00+00:00</published><updated>2026-06-14T00:00:00+00:00</updated><id>https://cryptojones.github.io/XSpaceWar-AI-Open-Source-Newtonian-Spacewar</id><content type="html" xml:base="https://cryptojones.github.io/XSpaceWar-AI-Open-Source-Newtonian-Spacewar/"><![CDATA[<p><em>Sixty-four years ago a couple of MIT hackers wired two spaceships and a star
into a PDP-1 and invented the multiplayer video game. We took that idea, gave it
honest Newtonian physics, generated every pixel and every sound from math, taught
the bots to be genuinely mean, and put 16 ships in the same arena. It’s done, it
runs on Windows, macOS, and Linux, it’s Apache-2.0 open source, and it costs
nothing.</em></p>

<p><strong>👉 Play it now on <a href="https://cryptojones.itch.io/xspacewar-ai">itch.io</a> — or grab
a build straight from
<a href="https://github.com/CryptoJones/XSpaceWar-AI/releases/latest">GitHub Releases</a>.</strong></p>

<ul>
  <li>🎮 <strong>itch.io:</strong> <a href="https://cryptojones.itch.io/xspacewar-ai">cryptojones.itch.io/xspacewar-ai</a></li>
  <li>🌐 <strong>Landing page:</strong> <a href="https://cryptojones.github.io/XSpaceWar-AI/">cryptojones.github.io/XSpaceWar-AI</a></li>
  <li>⬇️ <strong>Direct downloads (Win · macOS · Linux):</strong> <a href="https://github.com/CryptoJones/XSpaceWar-AI/releases/latest">GitHub Releases</a></li>
  <li>🛠️ <strong>Source (Apache-2.0):</strong> <a href="https://github.com/CryptoJones/XSpaceWar-AI">github.com/CryptoJones/XSpaceWar-AI</a></li>
</ul>

<h2 id="the-first-networked-match-said-everything">The first networked match said everything</h2>

<p>Dedicated server on one machine, two pilots joining from two more. The human —
barely moving, just letting the star do the work — beat the AI <strong>3 to −19</strong>,
while the bot fell into the gravity well <strong>fifty-nine times</strong>. Newtonian gravity
is undefeated.</p>

<p><img src="/images/xspacewar-ai-victory.png" alt="The first networked deathmatch — human beats the AI 3 to −19" /></p>

<p>That score isn’t a difficulty setting. It’s physics. The star pulls with pure
<code class="language-plaintext highlighter-rouge">G·M / r²</code> — no caps, no clamps, no softening. Fly too close and the pull
genuinely diverges; you cannot thrust your way out of a star you’ve already
fallen into. That’s not a bug, it’s a star.</p>
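<p>As a rough illustration of what an uncapped inverse-square pull looks like in code, here is a minimal Python sketch. The game itself is built in Godot, so this is not its source; the constants <code class="language-plaintext highlighter-rouge">G</code> and <code class="language-plaintext highlighter-rouge">STAR_MASS</code> and the function name are assumptions made up for the example.</p>

```python
import math

G = 1.0             # gravitational constant in game units (assumed value)
STAR_MASS = 5000.0  # hypothetical star mass


def gravity_accel(ship_x, ship_y, star_x=0.0, star_y=0.0):
    """Acceleration on a ship from pure inverse-square gravity, with no softening."""
    dx = star_x - ship_x
    dy = star_y - ship_y
    r2 = dx * dx + dy * dy          # squared distance to the star
    r = math.sqrt(r2)
    magnitude = G * STAR_MASS / r2  # G*M / r^2, no cap and no epsilon term
    # Scale the unit vector that points from the ship toward the star.
    return (magnitude * dx / r, magnitude * dy / r)
```

<p>Halving the distance quadruples the pull, and because there is no softening term the magnitude has no upper bound as the ship approaches the star.</p>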

<p><img src="/images/xspacewar-ai-star.png" alt="The gravity well claims another ship" /></p>

<h2 id="what-it-actually-is">What it actually is</h2>

<p>A modern, top-down <strong>space-fighter</strong> with a heavy AI focus — a 2026 reimagining
of the classic networked <em>Spacewar!</em> / <code class="language-plaintext highlighter-rouge">xspacewar</code>. Bird’s-eye view like the 1962
original, but built in <strong>Godot 4</strong> with:</p>

<ul>
  <li><strong>Honest Newtonian flight.</strong> Conserved momentum, inverse-square gravity on ships
<em>and</em> torpedoes, gravity-assist slingshots, hyperspace. No drag — to slow down
you flip 180° and burn against your own velocity, the same maneuver Spacewar!
pilots have flown since ‘62.</li>
  <li><strong>100% procedural everything.</strong> Stars, planets, moons, asteroid fields, nebulae,
ships, VFX — and <em>audio</em>. There’s not a single hand-drawn art asset or recorded
sound file in the game; every sound is synthesized from math at load. Every
match is a fresh seeded arena.</li>
  <li><strong>Up to 16 ships</strong>, four game modes: Free-for-all, Team battle, AI bots
(<strong>Rookie → Insane</strong>), and <strong>Movie Mode</strong> — an all-AI attract reel that
regenerates a brand-new arena, roster, and teams every 30 minutes. Leave it
running on a second monitor; it never plays the same match twice.</li>
  <li><strong>Multiplayer that works today.</strong> Host-authoritative netcode with client-side
prediction and reconciliation, automatic <strong>LAN discovery</strong>, <strong>direct IP</strong>, and
platform-agnostic <strong>internet play</strong> through the project’s own relay/master
server — room codes, an online server browser, no port forwarding and no
platform account required.</li>
  <li><strong>Seven UI languages</strong>, including right-to-left Arabic.</li>
</ul>
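<p>The "fresh seeded arena" idea from the list above can be sketched in a few lines of Python. This is an illustration of seeded procedural generation in general, not the game's generator; the body kinds, coordinate ranges, and the <code class="language-plaintext highlighter-rouge">generate_arena</code> name are all hypothetical.</p>

```python
import random


def generate_arena(seed, body_count=5):
    """Build a deterministic arena layout from a seed (hypothetical fields)."""
    rng = random.Random(seed)  # private generator, global random state is untouched
    bodies = []
    for _ in range(body_count):
        bodies.append({
            "kind": rng.choice(["planet", "moon", "asteroid_field", "nebula"]),
            "x": rng.uniform(-1000.0, 1000.0),
            "y": rng.uniform(-1000.0, 1000.0),
            "radius": rng.uniform(10.0, 80.0),
        })
    return bodies
```

<p>The same seed always yields the same layout, which is what lets every peer in a match rebuild an identical arena from a single number instead of shipping the whole map.</p>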

<h2 id="standing-on-64-years-of-shoulders">Standing on 64 years of shoulders</h2>

<p>XSpaceWar-AI is a clean-room implementation, but it knows exactly where it comes
from: <strong>Spacewar!</strong> (MIT, 1962, on the DEC PDP-1), the 1974 PDP-11 networked
port, and <strong>Ron Frederick’s</strong> 1992 X11 <code class="language-plaintext highlighter-rouge">xspacewar 1.2</code> — an early <em>serverless</em>
networked game where peers shared state directly. Huge thanks to Ron
(<a href="https://github.com/ronf">github.com/ronf</a>), whose 1992 networked Spacewar is
this game’s direct ancestor.</p>

<h2 id="get-in-a-ship">Get in a ship</h2>

<p>The whole thing is free and open source under Apache-2.0. No launcher, no account,
no telemetry, ~100 MB on disk.</p>

<ul>
  <li><strong>Easiest:</strong> play in the browser or download from
<strong><a href="https://cryptojones.itch.io/xspacewar-ai">itch.io</a></strong>.</li>
  <li><strong>Prebuilt binaries:</strong> the zip for your platform on
<strong><a href="https://github.com/CryptoJones/XSpaceWar-AI/releases/latest">GitHub Releases</a></strong>
(Windows, macOS universal, Linux x86_64).</li>
  <li><strong>From source:</strong> clone
<a href="https://github.com/CryptoJones/XSpaceWar-AI">the repo</a>, run <code class="language-plaintext highlighter-rouge">./install.sh</code>
(it fetches and SHA-512-verifies Godot 4.6.3 for you), then <code class="language-plaintext highlighter-rouge">godot --path .</code>.</li>
  <li><strong>Steam:</strong> coming soon.</li>
</ul>

<blockquote>
  <p><strong>macOS note:</strong> the build is ad-hoc signed but not notarized, so Gatekeeper may
claim it’s “damaged.” It isn’t — right-click → <strong>Open</strong>, or
<code class="language-plaintext highlighter-rouge">xattr -dr com.apple.quarantine /path/to/XSpaceWar-AI.app</code>. Full instructions are
on the <a href="https://github.com/CryptoJones/XSpaceWar-AI/releases/latest">releases page</a>.</p>
</blockquote>

<p>Bring friends. Pick <em>HOST — LAN skirmish</em>, share a 4-letter room code over the
internet, or just turn on Movie Mode and watch the bots feed themselves to the
star. The gravity is real, the maps are infinite, and the source is yours.</p>

<p>🚀 <strong><a href="https://cryptojones.itch.io/xspacewar-ai">Play XSpaceWar-AI →</a></strong></p>

<hr />

<p><em>Apache-2.0 · <a href="https://github.com/CryptoJones/XSpaceWar-AI">GitHub</a> ·
<a href="https://codeberg.org/CryptoJones/XSpaceWar-AI">Codeberg mirror</a>. Dedicated to
JVL. Proudly Made in Nebraska. 🌽</em></p>]]></content><author><name></name></author><summary type="html"><![CDATA[Sixty-four years ago a couple of MIT hackers wired two spaceships and a star into a PDP-1 and invented the multiplayer video game. We took that idea, gave it honest Newtonian physics, generated every pixel and every sound from math, taught the bots to be genuinely mean, and put 16 ships in the same arena. It’s done, it runs on Windows, macOS, and Linux, it’s Apache-2.0 open source, and it costs nothing.]]></summary></entry><entry><title type="html">From a Memory Aid to Multi-Tenant: The .NET Decision That Exploded the Scope</title><link href="https://cryptojones.github.io/From-Memory-Aid-to-Multi-Tenant-The-NET-Decision-That-Exploded-The-Scope/" rel="alternate" type="text/html" title="From a Memory Aid to Multi-Tenant: The .NET Decision That Exploded the Scope" /><published>2026-06-08T00:00:00+00:00</published><updated>2026-06-08T00:00:00+00:00</updated><id>https://cryptojones.github.io/From-Memory-Aid-to-Multi-Tenant-The-NET-Decision-That-Exploded-The-Scope</id><content type="html" xml:base="https://cryptojones.github.io/From-Memory-Aid-to-Multi-Tenant-The-NET-Decision-That-Exploded-The-Scope/"><![CDATA[<p><em>ApplyTrack started life as a glorified memory aid — a folder of Markdown files
so I’d stop forgetting where I’d applied and when to follow up. It is now an
open-source, multi-tenant, self-hostable application with magic-link auth, a
two-runtime backend, and a CI pipeline that ships container images. This post is
about the single design decision that turned the first thing into the second one
— moving the core API off pure Python and onto .NET — and the avalanche of scope
that came with it.</em></p>

<p><em>It’s also, secretly, a post about systems design. I’ve been told the thing that
separates a senior or staff engineer from a strong mid-level one isn’t knowing
more frameworks — it’s reasoning about <strong>trade-offs, invariants, coupling, and
failure modes</strong> before you write a line of code. So as I walk through this
rewrite I’m going to narrate the systems-design thinking out loud: the concepts I
reached for at each fork, and why. The project was, honestly, an excuse to
practice exactly that muscle.</em></p>

<h2 id="what-it-actually-was-at-the-start">What it actually was at the start</h2>

<p>The honest origin story: I kept losing track of job applications. Did I already
apply to that company? When was I supposed to follow up on the screen? What
salary did the post say before they took it down?</p>

<p>So I built the dumbest possible thing that solved it. Every application was a
Markdown file with some YAML frontmatter:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">company</span><span class="pi">:</span> <span class="s">Acme Corp</span>
<span class="na">role</span><span class="pi">:</span> <span class="s">Senior Platform Engineer</span>
<span class="na">lane</span><span class="pi">:</span> <span class="s">applied</span>
<span class="na">status</span><span class="pi">:</span> <span class="s">screen</span>
<span class="na">link</span><span class="pi">:</span> <span class="s">https://example.com/jobs/123</span>
<span class="na">salary</span><span class="pi">:</span> <span class="s2">"</span><span class="s">$180k–$210k"</span>
<span class="na">applied</span><span class="pi">:</span> <span class="s">2026-05-01</span>
<span class="na">followup</span><span class="pi">:</span> <span class="s">2026-05-08</span>
<span class="na">score</span><span class="pi">:</span> <span class="m">87</span>
<span class="nn">---</span>

Recruiter was responsive. Take-home is a Postgres schema design.
</code></pre></div></div>

<p>The whole “database” was a directory of those files. State lived in three JSON
files next to it — <code class="language-plaintext highlighter-rouge">.criteria.json</code> for my search keywords, <code class="language-plaintext highlighter-rouge">.blacklist.json</code>
for companies I never wanted to see again, <code class="language-plaintext highlighter-rouge">.seen.json</code> so the discovery poller
wouldn’t show me the same listing twice. A thin FastAPI app put a vanilla-JS
single-page app in front of it, and a Python poller scraped a handful of public
job boards and dropped fresh matches in as new files.</p>
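<p>Reading one of those files takes very little code. The sketch below is an assumption about how such a loader could look, not the original implementation: it only understands flat <code class="language-plaintext highlighter-rouge">key: value</code> lines and is not a full YAML parser.</p>

```python
def parse_frontmatter(text):
    """Split a Markdown file into (fields, body).

    Simplification: handles only flat `key: value` lines, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    fields = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the free-text note.
            body = "\n".join(lines[index + 1:]).strip()
            return fields, body
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return {}, text  # no closing delimiter: treat the file as plain Markdown
```

<p>Splitting on the first colon only is what keeps a value such as a URL intact. Every value comes back as a string, so a field like <code class="language-plaintext highlighter-rouge">score</code> would still need converting by the caller.</p>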

<p>That’s it. <strong>Single user. Files on disk. No auth worth the name.</strong> It was a
memory enhancement utility, and for one person it was <em>perfect</em>. The “schema”
was a slug naming convention. The “migration story” was <code class="language-plaintext highlighter-rouge">git add</code>.</p>

<h2 id="the-deceptively-small-product-decision">The deceptively small product decision</h2>

<p>Then I decided to open-source it and make it something other people could
self-host for their own job search. On the surface that sounds like a packaging
problem — write a README, add a Dockerfile, push to GitHub. Ship it.</p>

<p>It is not a packaging problem. The moment “one person” becomes “many people on
one deployment,” every cozy assumption the file-based design rested on falls
over at once:</p>

<ul>
  <li>A folder of Markdown files has no concept of <em>whose</em> files they are.</li>
  <li>Three shared JSON config files can’t hold one search profile <strong>per user</strong>.</li>
  <li>“No auth” stops being charmingly minimal and starts being a data breach.</li>
  <li>A <code class="language-plaintext highlighter-rouge">.seen.json</code> that dedupes for <em>me</em> will happily hide your leads behind mine.</li>
</ul>

<p>This is the first place systems-design thinking earns its keep. Multi-tenancy
isn’t a feature you bolt on the side — it’s a <strong>system invariant</strong>: a property
that has to hold on <em>every single read and write in the system</em>, in both
runtimes, forever. Invariants are the things you design the system <em>around</em>, not
checks you remember to sprinkle in. And the test of a design is whether it can
even <em>express</em> the invariant you need. The file-on-disk model couldn’t express
“this row belongs to that user” at all — there was no place for the constraint to
live. That’s the tell that you’ve outgrown a data model: not that it’s slow, but
that the property you now need is <em>inexpressible</em> in it.</p>

<h2 id="the-fork-in-the-road-stay-python-or-move-the-core-to-net">The fork in the road: stay Python, or move the core to .NET</h2>

<p>Here’s where the real decision was. The path of least resistance was obvious:
keep it all Python. FastAPI was already there. Reach for SQLAlchemy, bolt on an
auth library, add a <code class="language-plaintext highlighter-rouge">tenant_id</code> foreign key, and grind it out in the language
the whole thing was already written in. One runtime, one mental model, no
context-switching.</p>

<p>I didn’t do that. I moved the core API to <strong>.NET 10</strong> — ASP.NET Core Minimal
APIs on Kestrel, Dapper + Npgsql over Postgres, DbUp for migrations — and kept
Python <em>only</em> for the discovery poller.</p>

<p>The reasons that won the argument were all systems-design arguments, not language
preferences:</p>

<ul>
  <li><strong>The auth + session + concurrency surface is exactly what this stack is built
for.</strong> Multi-tenant CRUD with optimistic locking, server-side sessions, and a
hard request-scoped security boundary is bread-and-butter ASP.NET Core. The
design principle here is <em>don’t hand-roll your load-bearing primitives</em> —
identity, sessions, and concurrency control are exactly the parts where a
framework’s well-trodden spine beats a bespoke assembly of libraries, because
the failure mode of getting them subtly wrong is “data breach,” not “bug.”</li>
  <li><strong>Hand-written SQL over a typed data layer.</strong> Dapper maps SQL straight to
records with no ORM mystery. This is a <em>legibility</em> trade-off: for a schema that
two different runtimes have to agree on byte-for-byte, I wanted the SQL to be
the explicit, reviewable contract, not an abstraction generating queries I’d
have to reverse-engineer. An ORM optimizes for developer convenience; I was
optimizing for <em>the boundary being inspectable</em>.</li>
  <li><strong>I wanted the invariant to be structurally enforced, not remembered.</strong> A
compiled, statically typed API layer lets me make “every query filters
<code class="language-plaintext highlighter-rouge">tenant_id</code>” a property of the <em>type system and the DI graph</em> rather than a rule
in a code-review checklist. The best way to enforce an invariant is to make
violating it <em>unrepresentable</em> — and a typed boundary gets you closer to that
than a dynamically typed one.</li>
</ul>

<p>I knew, choosing it, that I was trading a weekend of Python grinding for
something much bigger. I underestimated by how much — but that’s the other
half of staff-level thinking: the cheap option and the <em>right</em> option are often
different options, and the skill is knowing when the goal has changed enough that
the cheap one is now the expensive one.</p>

<h2 id="the-avalanche">The avalanche</h2>

<p>Picking .NET didn’t just change the language. It detonated the scope, because
once you commit to a <em>real</em> API you have to actually build all the things the
file-based toy got to skip:</p>

<p><strong>A real schema, and migrations to evolve it.</strong> The folder-of-files became nine
DbUp migrations — <code class="language-plaintext highlighter-rouge">applications</code>, <code class="language-plaintext highlighter-rouge">search_profiles</code>, <code class="language-plaintext highlighter-rouge">blacklist</code>, <code class="language-plaintext highlighter-rouge">users</code>,
<code class="language-plaintext highlighter-rouge">magic_tokens</code>, <code class="language-plaintext highlighter-rouge">sessions</code>, <code class="language-plaintext highlighter-rouge">seen</code>, <code class="language-plaintext highlighter-rouge">poll_requests</code>, and the cascade wiring.
Idempotent <code class="language-plaintext highlighter-rouge">.sql</code> scripts that run on startup. The slug naming convention became
a <code class="language-plaintext highlighter-rouge">UNIQUE (tenant_id, name)</code> constraint and a validation choke-point.</p>

<p><strong>Real authentication.</strong> I built passwordless magic-link sign-in: request a
link, get a single-use token (only its SHA-256 is stored, 15-minute TTL), verify
it, and mint an <strong>opaque server-side session</strong> — deliberately not a JWT, so
logout is instant revocation, not “wait for the token to expire.” The endpoint
that requests a link always returns <code class="language-plaintext highlighter-rouge">200</code> whether or not the account exists, so
it can’t be used to enumerate who has signed up.</p>

<p><strong>A tenancy choke-point.</strong> This is the heart of the whole rewrite, and it’s the
single design pattern I’m proudest of. The systems-design idea is <em>funnel the
dangerous decision through exactly one place</em>. One middleware resolves the
session cookie to a tenant, and it is the <strong>only</strong> place a <code class="language-plaintext highlighter-rouge">tenant_id</code> enters the
system. Endpoints are handed a repository from DI that’s already scoped to the
caller — endpoint code physically cannot query another tenant’s rows, because it
never sees a tenant id to get wrong. That’s the difference between a security
<em>control</em> and a security <em>invariant</em>: instead of N endpoints each remembering to
filter correctly (N chances to leak), there’s one choke-point and N endpoints
that <em>structurally can’t</em>. You shrink the attack surface to a single auditable
function:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// every read and write carries the tenant, unconditionally</span>
<span class="n">UPDATE</span> <span class="n">applications</span>
   <span class="n">SET</span> <span class="p">...,</span> <span class="n">version</span> <span class="p">=</span> <span class="n">version</span> <span class="p">+</span> <span class="m">1</span>
 <span class="n">WHERE</span> <span class="n">id</span> <span class="p">=</span> <span class="n">@id</span> <span class="n">AND</span> <span class="n">tenant_id</span> <span class="p">=</span> <span class="n">@tenantId</span> <span class="n">AND</span> <span class="n">version</span> <span class="p">=</span> <span class="n">@expectedVersion</span><span class="p">;</span>
<span class="c1">// 0 rows affected -&gt; 409 Conflict</span>
</code></pre></div></div>
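<p>The scoped-repository half of that pattern can be sketched in Python with the standard-library <code class="language-plaintext highlighter-rouge">sqlite3</code> module. This is not ApplyTrack's code (which uses C#, Dapper, and Postgres); the class name, the table shape, and <code class="language-plaintext highlighter-rouge">make_demo_db</code> are assumptions for the example.</p>

```python
import sqlite3


class TenantScopedApplications:
    """Repository bound to one tenant at construction time (hypothetical class)."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id  # supplied once, by whatever resolves the session

    def list_names(self):
        rows = self._conn.execute(
            "SELECT name FROM applications WHERE tenant_id = ? ORDER BY name",
            (self._tenant_id,),
        )
        return [row[0] for row in rows]

    def add(self, name):
        self._conn.execute(
            "INSERT INTO applications (tenant_id, name) VALUES (?, ?)",
            (self._tenant_id, name),
        )


def make_demo_db():
    """In-memory database with a simplified applications table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE applications ("
        " id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL,"
        " name TEXT NOT NULL, UNIQUE (tenant_id, name))"
    )
    return conn
```

<p>Callers of <code class="language-plaintext highlighter-rouge">list_names</code> and <code class="language-plaintext highlighter-rouge">add</code> never pass a tenant id, so there is no argument for endpoint code to get wrong. Two tenants can both hold a row named the same thing without seeing each other's.</p>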

<p><strong>Optimistic concurrency.</strong> The old design’s “version” was a file’s mtime and
size. On a multi-user database that’s meaningless, so every application row got a
real <code class="language-plaintext highlighter-rouge">version</code> column. Writes pass <code class="language-plaintext highlighter-rouge">?expected_version=</code>, a mismatch answers
<strong>409 Conflict</strong>, and the SPA’s overwrite-confirm flow drives off that — two
open tabs can’t silently clobber each other. The design decision underneath is
<em>optimistic vs. pessimistic locking</em>: I bet that write-write conflicts are rare
(two people rarely edit the same application row at the same instant), so I don’t
pay the cost of locking rows on every read — I just detect the rare collision and
make the client resolve it. Choosing the concurrency-control strategy that
matches your <em>actual</em> contention profile, rather than reflexively reaching for
locks, is a very systems-design call.</p>
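<p>A minimal Python sketch of that compare-and-bump update, again using <code class="language-plaintext highlighter-rouge">sqlite3</code> as a stand-in for Postgres. The <code class="language-plaintext highlighter-rouge">VersionConflict</code> exception and the <code class="language-plaintext highlighter-rouge">update_status</code> helper are hypothetical names; in an HTTP API the exception would map to the 409 response.</p>

```python
import sqlite3


class VersionConflict(Exception):
    """Raised when the row changed since the caller last read it."""


def update_status(conn, tenant_id, app_id, new_status, expected_version):
    """Write only if the stored version still matches; return the new version."""
    cursor = conn.execute(
        "UPDATE applications SET status = ?, version = version + 1"
        " WHERE id = ? AND tenant_id = ? AND version = ?",
        (new_status, app_id, tenant_id, expected_version),
    )
    if cursor.rowcount == 0:
        # Missing row, wrong tenant, or stale version all look the same here.
        raise VersionConflict(
            f"application {app_id} is not at version {expected_version}"
        )
    return expected_version + 1
```

<p>No row is locked while the user is reading or editing. The only cost is one extra predicate in the <code class="language-plaintext highlighter-rouge">WHERE</code> clause, and a stale writer finds out at write time.</p>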

<p><strong>Security designed in, not bolted on — graded against 20 years of OWASP.</strong> Once
strangers trust the thing with their data, “I’ll secure it later” is a design
smell, so I audited the app against the <strong>OWASP Top 10</strong> — and not just the
current list. I ran it against two decades of the Top 10, every edition back to
the 2007 list, because the categories that <em>rotate off</em> the list don’t
stop being exploitable; they just stop being fashionable. (CSRF dropped off years
ago, but a multi-tab session app still has to answer for it.) The threat model is
the union of all of them, not the latest snapshot. What came out of that audit: a
strict CSP and a security-header middleware, per-IP rate limits, an SSRF-hardened
link probe (because a server that fetches user-supplied URLs is a confused deputy
waiting to happen), DOMPurify on rendered notes to kill stored XSS, generic 500s
that don’t leak internals, the no-account-enumeration auth surface, and a
dependency-audit CI job so a vulnerable transitive package fails the build instead
of shipping.</p>
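<p>To make the SSRF point concrete, here is a hedged Python sketch of the first gate such a link probe might apply. It is an assumption about the general technique, not the project's actual check, and it is deliberately incomplete: it only inspects the scheme and IP literals.</p>

```python
import ipaddress
from urllib.parse import urlsplit


def is_probe_target_allowed(url):
    """Reject URLs that are not http(s) or that name a non-public IP literal.

    Simplification: hostnames pass this check. A real probe would also have to
    resolve the name and re-check every resolved address before connecting.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        address = ipaddress.ip_address(parts.hostname)
    except ValueError:
        return True  # not an IP literal; see the simplification note above
    return address.is_global  # False for loopback, private, and link-local ranges
```

<p>The check turns away targets such as a loopback address or a cloud metadata endpoint on a link-local address, which are the classic ways a URL-fetching server gets tricked into reading its own internal network.</p>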

<p><strong>And then everything else that comes after “it works on my machine”:</strong> a
three-service <code class="language-plaintext highlighter-rouge">docker compose up</code>, account export (a zip of your Markdown — your
data stays yours) and one-call account deletion via <code class="language-plaintext highlighter-rouge">ON DELETE CASCADE</code>, and a
tag-driven release pipeline that publishes both runtimes as container images. None
of that existed in the memory-aid version. None of it is optional once strangers
trust the thing with their data.</p>
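<p>The one-call deletion relies on the foreign keys doing the cleanup. A small Python sketch with <code class="language-plaintext highlighter-rouge">sqlite3</code> shows the mechanism; the two-table schema and the assumption that <code class="language-plaintext highlighter-rouge">tenant_id</code> references <code class="language-plaintext highlighter-rouge">users(id)</code> are simplifications for the example, not the real migrations.</p>

```python
import sqlite3


def make_db():
    """In-memory database where applications cascade from users (assumed shape)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE applications ("
        " id INTEGER PRIMARY KEY,"
        " tenant_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,"
        " name TEXT NOT NULL)"
    )
    return conn


def delete_account(conn, user_id):
    """One statement; the foreign keys remove the dependent rows."""
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
```

<p>Putting the cascade in the schema means a newly added table cannot be forgotten by the deletion code, as long as its foreign key is declared the same way.</p>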

<h2 id="the-compromise-that-kept-it-sane-polyglot-with-the-schema-as-the-contract">The compromise that kept it sane: polyglot, with the schema as the contract</h2>

<p>The one thing I refused to do was rewrite the poller. It already had eight source
fetchers, the HTML scraping, and the scoring/dedup logic — all in working,
tested Python. Re-implementing that in C# would have been throwing away the part
that already worked to satisfy a purity I didn’t care about.</p>

<p>So ApplyTrack is deliberately <strong>polyglot</strong>: a .NET API and a Python poller that
<strong>never call each other</strong>. They share exactly one thing — the Postgres schema —
and that schema <em>is</em> the contract between them.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>            ┌─────────────────────────────────┐
Browser ──► │ ASP.NET Core (.NET 10, Kestrel)  │
 (the SPA)  │  • serves the SPA + JSON API     │──┐
            │  • magic-link auth + sessions    │  │
            │  • CRUD + criteria + blacklist   │  │
            └─────────────────────────────────┘  ├──► Postgres  (shared schema
            ┌─────────────────────────────────┐  │              = the contract)
 Cron  ───► │ Python poller                    │──┘
            │  • fetch + score + dedupe leads  │
            │  • drain the on-demand poll queue│
            └─────────────────────────────────┘
</code></pre></div></div>

<p>The systems-design backbone here is <strong>clear data ownership</strong>. Every table has
exactly one writer-of-record: .NET owns auth, sessions, and CRUD, and it owns the
migrations; Python writes new leads and reads profiles, the seen-ledger, and the
active-user list. Ambiguous ownership (“either service might write this”) is how
you get races and corruption that no amount of locking saves you from, so I made
ownership explicit and one-directional. <strong>Both runtimes unconditionally filter
<code class="language-plaintext highlighter-rouge">WHERE tenant_id</code></strong> — the cron worker doesn’t get to bypass the choke-point just
because it’s a background job; it builds a tenant-scoped reader <em>inside</em> its
per-tenant loop. (The invariant doesn’t get a day off because the request didn’t
come from a browser.)</p>

<p>Two more design choices in that diagram are doing quiet work:</p>

<ul>
  <li><strong>Decoupling via a queue.</strong> The “Poll now” button doesn’t shell out to Python
from C# — synchronous cross-runtime calls would couple their uptime and latency
together. Instead it drops a row in a <code class="language-plaintext highlighter-rouge">poll_requests</code> queue that the worker
drains on its own schedule. That’s the classic move: replace a <em>synchronous</em> call
with an <em>asynchronous</em> handoff, which removes the temporal coupling. The API can answer “queued” in milliseconds even if
the poller is mid-run or restarting; neither runtime blocks on the other.</li>
  <li><strong>Failure isolation / blast radius.</strong> The worker processes tenants in a loop
where one tenant’s failure is caught and <em>can’t</em> abort the others. When you have
a shared background job, the design question is always “what’s the blast radius
of one bad input?” — and the answer here is “one tenant’s poll, not everyone’s.”</li>
</ul>
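<p>Both choices in the list above can be shown together in one Python sketch: a queue table the API writes to, and a worker loop that contains each tenant's failure. The function names and the single-column queue are assumptions, and <code class="language-plaintext highlighter-rouge">sqlite3</code> again stands in for Postgres.</p>

```python
import sqlite3


def enqueue_poll(conn, tenant_id):
    """API side: record the request and return immediately."""
    conn.execute("INSERT INTO poll_requests (tenant_id) VALUES (?)", (tenant_id,))


def drain_queue(conn, poll_tenant):
    """Worker side: process each queued tenant; one failure does not stop the rest."""
    rows = conn.execute(
        "SELECT id, tenant_id FROM poll_requests ORDER BY id"
    ).fetchall()
    failed = []
    for request_id, tenant_id in rows:
        try:
            poll_tenant(tenant_id)
        except Exception:
            failed.append(tenant_id)  # contained: the loop moves on to the next tenant
        # The request is removed either way; retry policy is left out of this sketch.
        conn.execute("DELETE FROM poll_requests WHERE id = ?", (request_id,))
    return failed
```

<p>The API never calls <code class="language-plaintext highlighter-rouge">poll_tenant</code> itself, so a slow or crashing fetcher cannot hold up a request. Whether a failed request should be retried or dropped is a separate decision that this sketch does not make.</p>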

<p>This contract is the part I’d defend hardest in an interview. The temptation when
you go polyglot is to have the two halves talk over an internal HTTP API — and
then you’ve signed up for versioning, retries, authentication <em>between your own
services</em>, and a second contract to keep in sync. That’s <strong>accidental coupling</strong>
dressed up as architecture. Making the <em>database the single source of truth</em> and
the <em>schema the contract</em> meant there was exactly one thing to keep honest, with
no network partition to reason about between the halves and a schema-shape test on
each side to guard against drift. Fewer moving parts that can disagree is, almost
always, the better system.</p>
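<p>A schema-shape test of the kind mentioned above could look roughly like this on the Python side. The expected column sets are hypothetical, and the sketch reads SQLite's <code class="language-plaintext highlighter-rouge">PRAGMA table_info</code>; against Postgres the same idea would query <code class="language-plaintext highlighter-rouge">information_schema.columns</code> instead.</p>

```python
import sqlite3

# Hypothetical column sets this runtime depends on, not ApplyTrack's real schema.
EXPECTED_COLUMNS = {
    "applications": {"id", "tenant_id", "name", "version"},
    "poll_requests": {"id", "tenant_id"},
}


def schema_drift(conn):
    """Return {table: missing_columns} for every table lacking an expected column."""
    drift = {}
    for table, expected in EXPECTED_COLUMNS.items():
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
        actual = {row[1] for row in rows}  # index 1 of table_info is the column name
        missing = expected - actual
        if missing:
            drift[table] = missing
    return drift
```

<p>Each runtime lists only the columns it actually reads or writes, so a migration that renames or drops one of them fails that side's test run rather than surfacing as a runtime error in production.</p>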

<h2 id="was-the-net-decision-worth-it">Was the .NET decision worth it?</h2>

<p>Yes — but I want to be honest about what “yes” cost.</p>

<p>If the goal had stayed “help <em>me</em> remember my applications,” choosing .NET would
have been malpractice. The Markdown-files version was better at that job:
zero infrastructure, grep-able, git-versioned, done in an afternoon. The whole
multi-tenant edifice would have been a monument to a problem I didn’t have.</p>

<p>The decision was right <em>because the goal changed</em>. The instant the target became
“many people, one deployment, self-hostable, don’t leak anyone’s data,” I needed
a real security boundary, real sessions, real concurrency control, and a schema
that could evolve under live data. Those are precisely the problems .NET’s spine
is shaped to hold, and leaning on it — instead of hand-assembling the same guards
out of Python libraries — is why the tenancy story is <em>one middleware and a
scoped repo</em> instead of a discipline I have to re-prove in every endpoint.</p>

<p>The lesson I’m taking with me is a systems-design one: <strong>the scope explosion
wasn’t caused by the technology choice — it was <em>revealed</em> by it.</strong> The
multi-tenant complexity was an inherent property of the problem the moment I went
from one user to many; pure Python wouldn’t have made it smaller, just quieter and
easier to get subtly wrong. Picking the stack that makes the invariant loud forced
me to actually build the invariant. Going from a memory aid to a real application
was never going to be a packaging task. It was a different application wearing the
old one’s name.</p>

<p>And that, I think, is the actual content of “be good at systems design” — the
thing I keep hearing is the gate to senior and staff roles. It isn’t memorizing
patterns. It’s the habit of, at every fork, asking the questions this rewrite kept
forcing on me: <em>What invariant must hold, everywhere, forever? Where does the
dangerous decision get funneled so it’s auditable in one place? Who owns this
data, and is that ownership unambiguous? What’s coupled to what, and can I trade a
synchronous dependency for an asynchronous one? What’s the blast radius when this
fails — because it will fail?</em> The stack I picked didn’t make me ask those
questions. It just made it impossible to skip them. Honestly, that’s why I built
the thing the hard way: a side project is the cheapest place there is to practice
the expensive kind of thinking.</p>

<h2 id="the-part-i-deliberately-left-out-a-local-frontier-model">The part I deliberately left out: a local frontier model</h2>

<p>There’s a feature I cut from v1 on purpose — an AI engine for drafting tailored
cover letters and application materials. It was the heaviest thing to build and
the scariest data to handle, so the disciplined call was to ship the tracker
without it and leave a clean seam for later. (Knowing what <em>not</em> to build yet is
its own systems-design skill; scope is a design decision.) But I keep turning over
what that seam could become, and the answer that excites me isn’t “call an API.”
It’s: <strong>run a powerful frontier model locally, right next to the data.</strong></p>

<p>Think about the shape of this app. It’s self-hosted, no telemetry, your data
stays on your box — that’s the whole pitch. The instant you bolt on a hosted LLM,
you’ve quietly broken that promise: now your entire job search, your résumé, every
note you wrote about a recruiter, is flowing to a third party’s servers to be
logged and maybe trained on. A <em>local</em> model is the only addition that doesn’t
violate the thing that makes ApplyTrack worth self-hosting in the first place.
<strong>Data locality isn’t just a performance idea — here it’s the privacy
architecture.</strong> The model comes to the data; the data never leaves.</p>

<p>And once a capable model is sitting in the same trust boundary as the database, a
lot of the app’s rough edges turn into soft ones:</p>

<ul>
  <li><strong>Discovery gets a brain.</strong> Today the poller scores leads with keyword matching
— blunt, full of false positives. A local model does <em>semantic</em> relevance: it
actually reads the posting against your history and your real preferences, not
just your keyword list. It can extract clean structured fields out of the messy
free-text every board formats differently, and explain <em>why</em> a lead scored the
way it did.</li>
  <li><strong>The materials engine, finally.</strong> Draft a cover letter grounded in the specific
posting <em>and</em> your own past applications — retrieval over your history, not a
generic template. Tailor a résumé summary per role. All of it on hardware you
control, with your writing never leaving the building.</li>
  <li><strong>A research assistant over your own funnel.</strong> “What stage is everything in, and
what’s gone cold?” “Summarize every interview note for this company before my
onsite.” “Which of these three offers fits what I said I wanted?” That’s RAG over
data the app already owns.</li>
</ul>

<p>Here’s the systems-design payoff, and the reason I’m dwelling on it: <strong>the
architecture I already built is exactly the shape that makes this a clean add.</strong>
The polyglot, schema-as-contract design means a model runtime is just <em>another
worker against the same database</em>, behind the same tenant choke-point, owning its
own writes — no different in kind from the Python poller. I wouldn’t be retrofitting
AI into a monolith; I’d be adding a third lane to a system that was already
designed as decoupled lanes sharing one contract. The decoupling I did for boring
reasons (keep the Python fetchers, don’t couple the runtimes) turns out to be the
thing that makes the <em>interesting</em> future cheap. Good boundaries pay you back in
directions you didn’t predict when you drew them — which is, maybe, the whole
argument for caring about them.</p>

<h2 id="where-this-falls-over-at-enterprise-scale">Where this falls over at enterprise scale</h2>

<p>I want to close on the most senior-engineer move there is: being honest about the
limits of your own design. ApplyTrack is built for <em>self-hosting and small
multi-tenant deployments</em> — dozens, maybe low hundreds of users on one box. The
architecture is deliberately right-sized for that. If someone asked me to take it
to “enterprise” — thousands of orgs, millions of applications, an SLA — here’s
where I already know it would break, roughly in the order the cracks would show:</p>

<ul>
  <li>
    <p><strong>The single Postgres is the first ceiling.</strong> Every layer leans on one database:
CRUD, the session lookup on <em>every authenticated request</em>, the poll queue, and
the poller’s writes all land on the same instance. That’s a single point of
failure and a single point of contention. The first work is the boring,
load-bearing stuff: a primary with read replicas, a connection pooler
(PgBouncer) so thousands of clients don’t exhaust backends, and moving the
hot-path session check off Postgres into a cache like Redis. Server-side sessions
bought me instant revocation; at scale that convenience becomes a per-request
database read I’d have to buy back with a cache + invalidation.</p>
  </li>
  <li>
    <p><strong>Pooled (shared-schema) tenancy hits a wall — both technically and on
compliance.</strong> Row-level <code class="language-plaintext highlighter-rouge">tenant_id</code> filtering is the right call for hundreds of
tenants, but it has a <em>noisy-neighbor</em> problem (one tenant’s heavy queries
degrade everyone on the shared tables) and a <em>trust</em> problem (enterprise buyers,
SOC 2, data-residency rules often demand physical isolation, not “we promise the
<code class="language-plaintext highlighter-rouge">WHERE</code> clause is always there”). The escape path is a tenancy spectrum:
Postgres row-level security as belt-and-suspenders, then schema-per-tenant, then
database-per-tenant or sharding by tenant for the big customers. None of that is
free — it turns one migration into N, and the choke-point has to learn to route.</p>
  </li>
  <li>
    <p><strong>The poller doesn’t scale horizontally — it’s one cron worker in a <code class="language-plaintext highlighter-rouge">for</code>
loop.</strong> Fetch-once-per-run is a clever rate-limit dodge at small scale, but the
scoring pass is O(tenants × listings) on a <em>single</em> process, and a slow source
stalls the whole sweep. Enterprise needs the work fanned across many workers
(tenant sharding with leader election or partition assignment, so two workers
never double-process a tenant), a real broker (SQS/Redis/Kafka) instead of a
database-polling <code class="language-plaintext highlighter-rouge">poll_requests</code> table, per-source rate-limit <em>budgets</em>, and
backpressure so a board outage doesn’t cascade. Scraping public boards at all is
itself brittle at volume — IP bans, CAPTCHAs, ToS exposure — so at scale you’d
push toward official APIs and a shared caching layer in front of every source.</p>
  </li>
  <li>
    <p><strong>The schema-as-contract that saved me small becomes a coupling tax large.</strong>
Two runtimes sharing one database is the perfect amount of architecture for two
components; the <em>shared-database integration</em> pattern is a known anti-pattern
the moment you have several teams and services, because it couples their deploys
and their schema evolution. Past a certain size you have to break the shared DB
into service-owned stores that talk over events or APIs — which reintroduces
exactly the versioning-and-contracts cost I was so happy to avoid. That’s not a
mistake in the current design; it’s the trade-off having an expiry date.</p>
  </li>
  <li>
    <p><strong>The identity model assumes one human per tenant.</strong> Today <code class="language-plaintext highlighter-rouge">tenant_id == user.id</code>
— fine for individuals self-hosting. Enterprise means <em>organizations</em>: many users
per tenant, teams, roles and RBAC, SSO/SAML/OIDC and SCIM provisioning instead of
magic links, and an audit log of who-did-what for compliance. The schema was
named to future-proof orgs, but actually building them is a real project on its
own.</p>
  </li>
  <li>
    <p><strong>And everything operational I got to ignore at one box.</strong> Single-node Docker
Compose has no HA, no rolling deploys, no autoscaling — that’s Kubernetes or a
managed platform, multiple stateless API replicas behind a load balancer (easy,
since sessions live in the data tier) and a poller that’s safe to run more than
one of (hard). Plus the things multi-tenant SaaS lives or dies on: per-tenant
metering and quotas so one tenant can’t starve the poller, distributed tracing
and structured logs, alerting, and an unbounded-growth story (partitioning and
archival for <code class="language-plaintext highlighter-rouge">applications</code> and the <code class="language-plaintext highlighter-rouge">seen</code> ledger) before the tables get slow.</p>
  </li>
</ul>
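
<p>To make the poller item concrete, here is a minimal sketch of static partition assignment, one way to guarantee that two workers never process the same tenant. It is not code from ApplyTrack: <code class="language-plaintext highlighter-rouge">worker_for_tenant</code>, <code class="language-plaintext highlighter-rouge">tenants_for_worker</code>, and the fixed worker count are assumptions for illustration, and a real deployment would still need leader election or a coordinator to handle workers joining and leaving.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib


def worker_for_tenant(tenant_id: str, worker_count: int) -> int:
    """Map a tenant to exactly one worker index using a stable hash."""
    if max(worker_count, 0) == 0:
        raise ValueError("worker_count must be positive")
    # hashlib gives the same digest in every process, unlike built-in hash().
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % worker_count


def tenants_for_worker(tenant_ids, worker_index: int, worker_count: int):
    """Return the tenants this worker owns; every tenant has one owner."""
    return [t for t in tenant_ids
            if worker_for_tenant(t, worker_count) == worker_index]
</code></pre></div></div>

<p>Plain modulo hashing reassigns most tenants whenever the worker count changes, so a scheme like consistent hashing would likely be the next step if workers scale up and down often.</p>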

<p>The through-line: almost none of these are <em>bugs</em>. They’re trade-offs I made
<strong>on purpose</strong> for the scale I’m actually at, each with a known escape hatch for
the day the scale changes. That’s the real deliverable of systems-design thinking
— not a system that’s ready for a million users, but one where you can name,
precisely, what would have to change to get there, and why you correctly chose not
to build it yet.</p>

<hr />

<p><em>ApplyTrack is open-source (Apache-2.0) and self-hostable —
<a href="https://github.com/CryptoJones/OSApplyTrack">github.com/CryptoJones/OSApplyTrack</a>.
One <code class="language-plaintext highlighter-rouge">docker compose up</code> brings up the database, the API, and the poller. Your
data is yours: one-click Markdown export, one-call account deletion, no
telemetry, no SaaS.</em></p>]]></content><author><name></name></author><summary type="html"><![CDATA[ApplyTrack started life as a glorified memory aid — a folder of Markdown files so I’d stop forgetting where I’d applied and when to follow up. It is now an open-source, multi-tenant, self-hostable application with magic-link auth, a two-runtime backend, and a CI pipeline that ships container images. This post is about the single design decision that turned the first thing into the second one — moving the core API off pure Python and onto .NET — and the avalanche of scope that came with it.]]></summary></entry><entry><title type="html">LoRA, For Real: The Tech Stack Behind the Videos</title><link href="https://cryptojones.github.io/LoRA-For-Real-The-Tech-Stack-Behind-The-Videos/" rel="alternate" type="text/html" title="LoRA, For Real: The Tech Stack Behind the Videos" /><published>2026-06-07T00:00:00+00:00</published><updated>2026-06-07T00:00:00+00:00</updated><id>https://cryptojones.github.io/LoRA-For-Real-The-Tech-Stack-Behind-The-Videos</id><content type="html" xml:base="https://cryptojones.github.io/LoRA-For-Real-The-Tech-Stack-Behind-The-Videos/"><![CDATA[<p><strong>Huge thanks to Ronin 48 and Thomas Wenzke for the initial push to do this.</strong></p>

<p><em>I made two videos about LoRA. One is 90 seconds for people who have never
written a line of code; the other is a 13-minute walk through building adapters
for frontier models. This post is neither of those. The videos use analogies —
“a tiny notepad clipped onto a frozen brain.” This post is the actual stack: the
config files, the quantization scheme, the data pipeline, and the GPU bill.</em></p>

<h2 id="the-two-videos">The two videos</h2>

<ul>
  <li><strong>The 90-second version — “LoRA: WTF Is It?”</strong> — zero CS background required:
<a href="https://www.youtube.com/watch?v=XniGimn0Eng">youtube.com/watch?v=XniGimn0Eng</a></li>
  <li><strong>The full explainer — “Building LoRA Adapters for Frontier Models”</strong> — the
deep dive:
<a href="https://www.youtube.com/watch?v=2UOfcOxyAfA">youtube.com/watch?v=2UOfcOxyAfA</a></li>
</ul>

<p>Watch those for the <em>why</em>. Read this for the <em>how</em>. The code that backs all of
it is the <strong>SELMA</strong> project — an Apache-2.0 legal-reasoning model fine-tuned with
QLoRA — and it’s public: <a href="https://github.com/CryptoJones/SELMA">github.com/CryptoJones/SELMA</a>.</p>

<h2 id="what-were-actually-building">What we’re actually building</h2>

<p>Full fine-tuning of a 70B-parameter model means updating all 70 billion weights.
That needs the model, its gradients, and optimizer state resident in VRAM at
once — comfortably 1TB+ across a node of A100s. LoRA sidesteps that: freeze every
original weight, and inject a pair of small low-rank matrices (<code class="language-plaintext highlighter-rouge">A</code> and <code class="language-plaintext highlighter-rouge">B</code>)
alongside the layers you want to adapt. Only <code class="language-plaintext highlighter-rouge">A</code> and <code class="language-plaintext highlighter-rouge">B</code> train. The adapted
weight is the frozen <code class="language-plaintext highlighter-rouge">W</code> plus a scaled low-rank update, <code class="language-plaintext highlighter-rouge">W + (B·A) · (alpha/r)</code>, where <code class="language-plaintext highlighter-rouge">r</code> is the
rank and <code class="language-plaintext highlighter-rouge">alpha</code> is a scaling term.</p>
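
<p>A rough way to see why this is cheap is to count the parameters for a single weight matrix. The sketch below is illustrative only: the helper names are made up, and the 8192 × 8192 shape is an assumed example of a large square projection, not a number taken from the SELMA repo.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    # A is r x d_in and B is d_out x r, so B @ A has the same shape as W.
    return r * d_in + d_out * r


def lora_scale(alpha: float, r: int) -> float:
    """Multiplier applied to B @ A before it is added to the frozen W."""
    return alpha / r


# Assumed example: a square 8192 x 8192 projection at rank 64.
full = 8192 * 8192
added = lora_param_count(8192, 8192, 64)
print(full, added, added / full)  # the ratio is 1/64, about 1.6%
print(lora_scale(128, 64))        # alpha=128, r=64 gives a scale of 2.0
</code></pre></div></div>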

<p>QLoRA goes one step further: it loads the frozen base model in <strong>4-bit</strong> so it
fits in a fraction of the memory, then trains the LoRA matrices in higher
precision on top. That’s how a 70B model fine-tunes on a single 80GB card
instead of a cluster.</p>
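
<p>The arithmetic behind that claim is short enough to write down. This is a back-of-the-envelope sketch that counts the base weights only; gradients for the adapters, optimizer state, activations, and the quantization constants all come on top, so treat the numbers as a floor. The function name is an assumption for illustration.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N = 70e9  # parameter count of the base model

print(weight_memory_gb(N, 16))  # fp16/bf16 weights: 140.0
print(weight_memory_gb(N, 4))   # 4-bit weights, before quantization constants: 35.0
</code></pre></div></div>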

<h2 id="the-base-model">The base model</h2>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Base:        meta-llama/Llama-3.3-70B-Instruct
Method:      QLoRA (4-bit NF4 + Low-Rank Adaptation)
Context:     128K tokens (native)
Quantization: NF4 double-quant via bitsandbytes, bf16 compute
</code></pre></div></div>

<p>Llama 3.3 70B is gated, so the pipeline assumes you’ve accepted the license on
HuggingFace and authenticated (<code class="language-plaintext highlighter-rouge">huggingface-cli login</code>, or <code class="language-plaintext highlighter-rouge">export HF_TOKEN=...</code>
before the run). The full rationale for picking this base — license, context
window, provenance — is in the repo’s <code class="language-plaintext highlighter-rouge">docs/MODEL_SELECTION.md</code>.</p>

<h2 id="the-qlora-config">The QLoRA config</h2>

<p>This is the heart of it. From <code class="language-plaintext highlighter-rouge">configs/training_config.yaml</code>:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">quantization</span><span class="pi">:</span>
  <span class="na">load_in_4bit</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">bnb_4bit_compute_dtype</span><span class="pi">:</span> <span class="s2">"</span><span class="s">bfloat16"</span>
  <span class="na">bnb_4bit_quant_type</span><span class="pi">:</span> <span class="s2">"</span><span class="s">nf4"</span>
  <span class="na">bnb_4bit_use_double_quant</span><span class="pi">:</span> <span class="no">true</span>

<span class="na">lora</span><span class="pi">:</span>
  <span class="na">r</span><span class="pi">:</span> <span class="m">64</span>
  <span class="na">lora_alpha</span><span class="pi">:</span> <span class="m">128</span>
  <span class="na">lora_dropout</span><span class="pi">:</span> <span class="m">0.05</span>
  <span class="na">target_modules</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">q_proj</span><span class="pi">,</span> <span class="nv">k_proj</span><span class="pi">,</span> <span class="nv">v_proj</span><span class="pi">,</span> <span class="nv">o_proj</span><span class="pi">,</span> <span class="nv">gate_proj</span><span class="pi">,</span> <span class="nv">up_proj</span><span class="pi">,</span> <span class="nv">down_proj</span><span class="pi">]</span>

<span class="na">training</span><span class="pi">:</span>
  <span class="na">num_train_epochs</span><span class="pi">:</span> <span class="m">3</span>
  <span class="na">per_device_train_batch_size</span><span class="pi">:</span> <span class="m">2</span>
  <span class="na">gradient_accumulation_steps</span><span class="pi">:</span> <span class="m">8</span>      <span class="c1"># effective batch = 16</span>
  <span class="na">learning_rate</span><span class="pi">:</span> <span class="s">2.0e-4</span>
  <span class="na">lr_scheduler_type</span><span class="pi">:</span> <span class="s2">"</span><span class="s">cosine"</span>
  <span class="na">warmup_ratio</span><span class="pi">:</span> <span class="m">0.05</span>
  <span class="na">max_seq_length</span><span class="pi">:</span> <span class="m">4096</span>
  <span class="na">gradient_checkpointing</span><span class="pi">:</span> <span class="no">true</span>
</code></pre></div></div>

<p>A few decisions worth calling out:</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">r: 64</code>, <code class="language-plaintext highlighter-rouge">alpha: 128</code>.</strong> A 2:1 alpha-to-rank ratio is a common, stable
starting point. Higher rank buys more capacity (and more trainable params) at
the cost of memory and overfitting risk.</li>
  <li><strong>Target the whole transformer block, not just attention.</strong> Adapting the MLP
projections (<code class="language-plaintext highlighter-rouge">gate/up/down_proj</code>) in addition to the attention projections
(<code class="language-plaintext highlighter-rouge">q/k/v/o_proj</code>) consistently helps on knowledge-heavy fine-tunes. Attention-only
LoRA is cheaper but leaves capability on the table.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">gradient_checkpointing: true</code></strong> trades compute for memory — it recomputes
activations on the backward pass instead of storing them. On a 70B QLoRA run
it’s the difference between fitting and OOM.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">group_by_length</code></strong> batches similar-length sequences together to cut padding
waste.</li>
</ul>
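
<p>For readers who want to see how a config like this typically reaches the libraries, here is a hedged sketch that maps the <code class="language-plaintext highlighter-rouge">quantization</code> and <code class="language-plaintext highlighter-rouge">lora</code> blocks onto <code class="language-plaintext highlighter-rouge">BitsAndBytesConfig</code> (transformers) and <code class="language-plaintext highlighter-rouge">LoraConfig</code> (PEFT). It is not a copy of SELMA’s <code class="language-plaintext highlighter-rouge">train_qlora.py</code>; the helper names and the <code class="language-plaintext highlighter-rouge">CAUSAL_LM</code> task type are assumptions, and the heavy imports sit inside <code class="language-plaintext highlighter-rouge">build_configs</code> so the rest runs without a GPU stack installed.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def quantization_kwargs(cfg: dict) -> dict:
    """Copy the quantization block out as keyword arguments."""
    return dict(cfg["quantization"])


def lora_kwargs(cfg: dict) -> dict:
    """Copy the lora block out, adding a task type (assumed: causal LM)."""
    kwargs = dict(cfg["lora"])
    kwargs["task_type"] = "CAUSAL_LM"
    return kwargs


def build_configs(cfg: dict):
    """Build the library config objects; needs torch, transformers, peft."""
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    q = quantization_kwargs(cfg)
    # The YAML stores the dtype as a string; convert it to a torch dtype.
    q["bnb_4bit_compute_dtype"] = getattr(torch, q["bnb_4bit_compute_dtype"])
    return BitsAndBytesConfig(**q), LoraConfig(**lora_kwargs(cfg))


# The same values as the YAML above, written out as a dict.
CONFIG = {
    "quantization": {
        "load_in_4bit": True,
        "bnb_4bit_compute_dtype": "bfloat16",
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_use_double_quant": True,
    },
    "lora": {
        "r": 64,
        "lora_alpha": 128,
        "lora_dropout": 0.05,
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                           "gate_proj", "up_proj", "down_proj"],
    },
}
</code></pre></div></div>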

<h2 id="the-version-you-can-actually-run-at-home">The version you can actually run at home</h2>

<p>The 70B run needs an A100/H100. Most people don’t have one, so there’s a second
config — <code class="language-plaintext highlighter-rouge">configs/training_config_8b.yaml</code> — that targets <strong>Llama 3.1 8B</strong> and
fits on a single 24GB consumer card:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">model</span><span class="pi">:</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s2">"</span><span class="s">meta-llama/Llama-3.1-8B-Instruct"</span>
  <span class="na">attn_implementation</span><span class="pi">:</span> <span class="s2">"</span><span class="s">flash_attention_2"</span>
<span class="na">lora</span><span class="pi">:</span>
  <span class="na">r</span><span class="pi">:</span> <span class="m">32</span>
  <span class="na">lora_alpha</span><span class="pi">:</span> <span class="m">64</span>
<span class="na">training</span><span class="pi">:</span>
  <span class="na">max_seq_length</span><span class="pi">:</span> <span class="m">2048</span>      <span class="c1"># reduced from 4096 to fit in VRAM</span>
</code></pre></div></div>

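<p>Rough wall-clock for the 8B run: ~2–3 hours on an RTX 4090, ~6–8 hours on a free
Colab T4 with <code class="language-plaintext highlighter-rouge">batch_size=1</code>. (The T4 is a Turing card: FlashAttention-2 does not
support it and it has no native bf16, so expect to swap in another attention
implementation and fp16 there.) This is the one to start with: iterate on 8B,
then scale the recipe to 70B once it works.</p>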

<h2 id="multi-state-architecture-many-small-adapters-one-frozen-giant">Multi-state architecture: many small adapters, one frozen giant</h2>

<p>This is where LoRA stops being a memory trick and starts being an architecture.
Instead of one model that tries to know all 50 states’ criminal codes (and
constantly confuses Georgia’s assault statute with California’s), SELMA trains
<strong>one adapter per jurisdiction</strong> — 50 states plus a federal baseline — all
sharing the same frozen Llama base.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>models/
├── federal/      # 18 U.S.C. — baseline for every model
├── georgia/      # federal + O.C.G.A. Title 16
├── california/   # federal + Cal. Penal Code
└── ...           # one directory per state
</code></pre></div></div>

<p>The payoff is exactly the “swap the notepad” idea from the video, made concrete:</p>

<ul>
  <li><strong>Update independence</strong> — amend Georgia’s code, retrain only the Georgia
adapter. The other 49 are untouched.</li>
  <li><strong>Deployment flexibility</strong> — an agency ships only the adapter for its
jurisdiction; the multi-gigabyte base is shared.</li>
  <li><strong>Less cross-contamination</strong> — a narrow adapter hallucinates less across
jurisdictions than one overloaded generalist.</li>
</ul>

<p>Each adapter is a small fraction of the ~140GB base: from a few hundred MB up to
a few GB, depending on rank, target modules, and save precision. Fifty specialists
for the storage cost of one model plus change.</p>
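
<p>Picking the right adapter at serve time can be as simple as a directory lookup. The sketch below assumes the <code class="language-plaintext highlighter-rouge">models/</code> layout shown above; the function name, the lowercase-with-underscores naming rule, and the fallback to <code class="language-plaintext highlighter-rouge">federal/</code> when a state adapter is missing are all assumptions for illustration, not documented SELMA behavior.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from pathlib import Path


def adapter_dir(models_root, jurisdiction: str) -> Path:
    """Return the adapter directory for a jurisdiction, else the federal one."""
    root = Path(models_root)
    # Normalize "New York" to "new_york"; this naming scheme is an assumption.
    name = jurisdiction.strip().lower().replace(" ", "_")
    candidate = root / name
    if candidate.is_dir():
        return candidate
    return root / "federal"
</code></pre></div></div>

<p>With PEFT, the returned path would typically be handed to <code class="language-plaintext highlighter-rouge">PeftModel.from_pretrained(base_model, path)</code>, with the frozen base loaded once and shared across jurisdictions.</p>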

<h2 id="the-data-pipeline">The data pipeline</h2>

<p>Adapters are only as good as what you feed them. SELMA’s training mix:</p>

<table>
  <thead>
    <tr>
      <th>Source</th>
      <th>What it is</th>
      <th>Size</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>U.S. Code Title 18</td>
      <td>Federal criminal statutes (USLM XML)</td>
      <td>~2,700 sections</td>
    </tr>
    <tr>
      <td>State criminal codes</td>
      <td>e.g. O.C.G.A. Title 16</td>
      <td>~500 sections each</td>
    </tr>
    <tr>
      <td>ALEA US Courts</td>
      <td>Federal filings with NOS codes</td>
      <td>491K examples</td>
    </tr>
    <tr>
      <td>LegalBench</td>
      <td>Legal-reasoning benchmark tasks</td>
      <td>91.8K examples</td>
    </tr>
    <tr>
      <td>CaseHOLD</td>
      <td>Holding classification</td>
      <td>585K examples</td>
    </tr>
    <tr>
      <td>Synthetic</td>
      <td>Generated incident→statute mappings</td>
      <td>~50K examples</td>
    </tr>
  </tbody>
</table>

<p>The flow is three steps:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># 1. fetch raw sources (statutes auto-discovered from the current release)</span>
python scripts/data_collection/fetch_federal_statutes.py
python scripts/data_collection/generate_synthetic.py   <span class="c"># ~50K incident→charge pairs</span>

<span class="c"># 2. combine + split into instruction-tuning JSONL</span>
python scripts/training/prepare_dataset.py
<span class="c">#    -&gt; data/processed/train.jsonl + eval.jsonl</span>

<span class="c"># 3. train</span>
python scripts/training/train_qlora.py <span class="nt">--config</span> configs/training_config.yaml
</code></pre></div></div>

<p>The synthetic step is the quiet hero: hand-written statute text teaches the model
<em>what the law says</em>, but the ~50K generated incident-to-statute examples teach it
<em>how to apply the law to a fact pattern</em> — which is the actual task.</p>
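
<p>To give a feel for what “instruction-tuning JSONL” means in step 2, here is a minimal sketch of turning incident-to-statute pairs into records and splitting them. The field names (<code class="language-plaintext highlighter-rouge">instruction</code>, <code class="language-plaintext highlighter-rouge">input</code>, <code class="language-plaintext highlighter-rouge">output</code>), the prompt wording, and the 10% eval fraction are hypothetical; the actual schema lives in <code class="language-plaintext highlighter-rouge">prepare_dataset.py</code> and may differ.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import json
import random


def to_record(incident: str, statute: str, jurisdiction: str) -> dict:
    """One instruction-tuning example; the field names are hypothetical."""
    return {
        "instruction": f"Identify the {jurisdiction} statute that applies.",
        "input": incident,
        "output": statute,
    }


def split_records(records, eval_fraction=0.1, seed=0):
    """Shuffle deterministically, then split into (train, eval) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
</code></pre></div></div>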

<h2 id="the-training-run-and-the-merge-gotcha">The training run, and the merge gotcha</h2>

<p>On an A100-80GB, the 70B QLoRA run lands around <strong>72GB VRAM</strong> and takes <strong>6–10 hours</strong>.
The one that bites people isn’t training — it’s the merge:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python scripts/training/merge_adapter.py <span class="nt">--config</span> configs/model_config.yaml
</code></pre></div></div>

<p>Merging the LoRA weights back into the base produces a standalone model you can
serve without PEFT — but it loads the full 70B in fp16 <strong>on CPU</strong>, which wants
<strong>~140GB of system RAM</strong>. Many GPU pods don’t have that. The fix is to skip the merge
on the training box (<code class="language-plaintext highlighter-rouge">train.sh --skip-merge</code>), upload the adapter to HuggingFace,
and merge later on a high-memory CPU instance — or just <strong>serve the adapter
unmerged</strong>, which is the whole point of LoRA anyway.</p>

<p>Deployment ends up trivial: the adapter ships to HuggingFace
(<a href="https://huggingface.co/Ronin48LLC/selma-lora-adapter">Ronin48LLC/selma-lora-adapter</a>),
a GGUF export feeds llama.cpp / LM Studio / Ollama, and <code class="language-plaintext highlighter-rouge">ollama run</code> serves it
with no Python at all.</p>

<h2 id="bonus-how-the-videos-themselves-were-built">Bonus: how the videos themselves were built</h2>

<p>Same spirit — small, scriptable, no proprietary stack. The whole pipeline is
plain Python and CLI tools:</p>

<ol>
  <li><strong>Deck</strong> — generated with <code class="language-plaintext highlighter-rouge">python-pptx</code>. A house-style palette (WCAG-AAA
contrast), Calibri, and a PIL/Noto text-fitter that measures each line so
titles never overflow after the font substitution that happens during render.</li>
  <li><strong>Render</strong> — <code class="language-plaintext highlighter-rouge">soffice --headless --convert-to pdf</code>, then
<code class="language-plaintext highlighter-rouge">pdftoppm -scale-to-x 1920 -scale-to-y 1080 -png</code> to get one 1080p frame per
slide.</li>
  <li><strong>Narration</strong> — a cloned voice via ElevenLabs TTS, one MP3 per slide, driven
by a script split on <code class="language-plaintext highlighter-rouge">[SLIDE N]</code> markers. (Rule I learned the hard way: render
one or two sample slides and approve the voice <em>before</em> paying for the full run.)</li>
  <li><strong>Assembly</strong> — <code class="language-plaintext highlighter-rouge">ffmpeg</code> stitches each still to its narration with a short
lead/tail of silence, then concatenates to a single 1920×1080 H.264/AAC file.</li>
  <li><strong>Captions</strong> — <code class="language-plaintext highlighter-rouge">faster-whisper</code> with word timestamps does forced-ish
alignment: the caption <em>wording</em> stays exactly the script’s, but the <em>timing</em>
is pulled from the real audio, so subtitles track the voice instead of drifting.</li>
</ol>

<p>That’s it. Two YouTube videos and a 70B legal model, and not one piece of the
stack is closed-source or unavailable to you. The adapters are small, the recipe
is in a YAML file, and the giant stays frozen.</p>

<hr />

<p><em>The code: <a href="https://github.com/CryptoJones/SELMA">github.com/CryptoJones/SELMA</a>
(Apache-2.0). Questions or corrections — find me where I usually am.</em></p>]]></content><author><name></name></author><summary type="html"><![CDATA[Huge thanks to Ronin 48 and Thomas Wenzke for the initial push to do this.]]></summary></entry><entry><title type="html">Narrating a Novel Locally: A Voice-Cloning Audio Pipeline</title><link href="https://cryptojones.github.io/Narrating-a-Novel-Locally-A-Voice-Cloning-Audio-Pipeline/" rel="alternate" type="text/html" title="Narrating a Novel Locally: A Voice-Cloning Audio Pipeline" /><published>2026-06-07T00:00:00+00:00</published><updated>2026-06-07T00:00:00+00:00</updated><id>https://cryptojones.github.io/Narrating-a-Novel-Locally-A-Voice-Cloning-Audio-Pipeline</id><content type="html" xml:base="https://cryptojones.github.io/Narrating-a-Novel-Locally-A-Voice-Cloning-Audio-Pipeline/"><![CDATA[<p><em>I wanted a multi-voice audiobook — a full narrator plus distinct character voices —
produced entirely on my own hardware, for free, without cloning a single living
person. This is the audio pipeline that got me there: zero-shot voice cloning on an
8GB consumer GPU, a casting system, and the one trick that matters more than any of
it — keeping the read from jumping around.</em></p>

<p>The commercial route priced itself out immediately. A cloud TTS read of an entire
novel would have run tens of thousands of credits and months of wall-clock on a
metered plan. So the whole thing runs locally on <a href="https://github.com/SWivid/F5-TTS">F5-TTS</a>,
a zero-shot voice-cloning model: give it ~10 seconds of reference audio plus the
transcript of that snippet, and it speaks new text in that voice. No fine-tuning, no
training run — it clones from the reference at inference time on an 8GB card.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">f5_tts.api</span> <span class="kn">import</span> <span class="n">F5TTS</span>
<span class="n">f5</span> <span class="o">=</span> <span class="n">F5TTS</span><span class="p">()</span>
<span class="n">f5</span><span class="p">.</span><span class="n">infer</span><span class="p">(</span>
    <span class="n">ref_file</span><span class="o">=</span><span class="n">ref_wav</span><span class="p">,</span> <span class="n">ref_text</span><span class="o">=</span><span class="n">ref_transcript</span><span class="p">,</span>   <span class="c1"># ~10s seed + its exact transcript
</span>    <span class="n">gen_text</span><span class="o">=</span><span class="n">line</span><span class="p">,</span>                                <span class="c1"># the sentence to speak
</span>    <span class="n">file_wave</span><span class="o">=</span><span class="n">out_path</span><span class="p">,</span>
    <span class="n">remove_silence</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="n">speed</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">seed</span><span class="o">=</span><span class="mi">42</span><span class="p">,</span>                           <span class="c1"># see "consistency" below
</span><span class="p">)</span>
</code></pre></div></div>

<h2 id="the-rule-that-shaped-everything-consent">The rule that shaped everything: consent</h2>

<p>Before any tech: <strong>no cloning living, identifiable people.</strong> Not actors, not
celebrities, nobody who didn’t agree to it. Every voice in the production comes from
one of two places:</p>

<ol>
  <li><strong>Public-domain audio seeds</strong> — LibriVox / archive.org recordings old enough or
licensed to be free to use.</li>
  <li><strong>My own voice</strong>, which I own.</li>
</ol>

<p>That constraint isn’t a footnote, it drove the whole casting process. The dead-sounding
“ROM construct” character, for instance, started life as a public-domain LibriVox
chapter read by a volunteer, then got machined into something inhuman with DSP (more on
that below) — <em>not</em> lifted from any film performance. If you take one thing from this
post, let it be that you can do expressive, characterful voice work without scraping a
real person’s identity.</p>

<h2 id="consistency-beats-per-voice-perfection">Consistency beats per-voice perfection</h2>

<p>Here’s the counterintuitive lesson. The thing that makes AI narration sound <em>amateur</em>
isn’t a slightly-off character voice — it’s the audio <strong>lurching</strong> between clips:
timbre and pace drifting from one sentence to the next. Zero-shot TTS re-estimates
duration and prosody on every independent generation, so back-to-back clips wander in
speed and tone even from the same reference.</p>

<p>Fixing that took a four-part “de-jitter” recipe, and it’s worth more than any amount of
per-character polish:</p>

<ul>
  <li><strong>Pin the seed.</strong> A fixed RNG seed (<code class="language-plaintext highlighter-rouge">seed=42</code>) gives every clip the same latent
initialization, which holds timbre steady across thousands of generations.</li>
  <li><strong>Pin the speed.</strong> <code class="language-plaintext highlighter-rouge">speed=1.0</code> everywhere. No per-line pace, ever. (I tried slowing a
drugged-out character down to 0.85 — it read as a <em>jump</em>, so it got reverted.)</li>
  <li>
    <p><strong>One loudness target on every clip.</strong> Every single cue ends its filter chain with
the <em>identical</em> <code class="language-plaintext highlighter-rouge">loudnorm</code>:</p>

    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>loudnorm=I=-16:TP=-1.5:LRA=11
</code></pre></div>    </div>

    <p>No per-voice loudness shaping. The whole production sits at one level.</p>
  </li>
  <li>
    <p><strong>Short crossfades at every seam.</strong> Clips are joined with a tiny triangular
crossfade so the boundaries don’t click or lurch:</p>

    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>acrossfade=d=0.12:c1=tri:c2=tri
</code></pre></div>    </div>
  </li>
</ul>
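
<p>The crossfade step can be sketched as a small filtergraph builder. This is an illustration under assumptions, not the stitcher from this project: the function name and the <code class="language-plaintext highlighter-rouge">[a1]</code>/<code class="language-plaintext highlighter-rouge">[out]</code> labels are made up, and it assumes every clip is a separate ffmpeg input that has already been through its own <code class="language-plaintext highlighter-rouge">loudnorm</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CROSSFADE = "acrossfade=d=0.12:c1=tri:c2=tri"


def seam_filtergraph(n_clips: int) -> str:
    """Build an ffmpeg -filter_complex string that crossfades clips in order."""
    if max(n_clips, 1) == 1:
        raise ValueError("need at least two clips to have a seam")
    steps = []
    prev = "[0:a]"
    for i in range(1, n_clips):
        # The last join gets the label [out]; earlier joins get [a1], [a2], ...
        label = "[out]" if i == n_clips - 1 else f"[a{i}]"
        steps.append(f"{prev}[{i}:a]{CROSSFADE}{label}")
        prev = label
    return ";".join(steps)
</code></pre></div></div>

<p>For three clips the result would be passed along the lines of <code class="language-plaintext highlighter-rouge">ffmpeg -i a.wav -i b.wav -i c.wav -filter_complex "..." -map "[out]" chapter.wav</code>. A chapter with thousands of clips would probably need to be joined in batches rather than in one invocation.</p>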

<p>A useful gotcha: a “casting sheet” that slams five voices together back-to-back will
sound <em>jumpier</em> than the real thing, because it’s nothing but seams. A real chapter is
~80% one narrator, so it flows. Judge consistency on real material, not the stress test.</p>

<p>A second gotcha worth saving you the rabbit hole: <strong>moving the same model to the cloud
does not fix drift.</strong> It’s the same model — hosted F5 only offloads your GPU, it doesn’t
change the prosody behavior. Don’t chase consistency by changing where it runs.</p>

<h2 id="casting-one-source-of-truth">Casting: one source of truth</h2>

<p>Every speaker — the narrator, each principal, and each one-line walk-on — maps to a
voice in a single table that both the renderer and the stitcher read. A row is just:
<em>tag → (seed, speed, DSP chain)</em>.</p>

<ul>
  <li><strong>Principals</strong> get a fixed public-domain seed, chosen once and reused everywhere so a
character sounds the same in chapter 2 and chapter 20.</li>
  <li><strong>Distinctness comes from DSP, not from hunting down 30 seeds.</strong> Two characters can
share a base seed and still sound clearly different if each gets its own <em>internally
uniform</em> processing. Which leads to the timbre chains…</li>
  <li><strong>Walk-on “drop” characters</strong> (a guard, a clerk — one or two lines) get a
<strong>deterministically random</strong> voice: hash the character’s name, pick a (seed, pitch)
pair from a gender-appropriate pool, and guarantee no two walk-ons collide <em>and</em> that
none lands too close in pitch to a principal on the same seed. Everybody’s different,
nobody had to be hand-cast.</li>
</ul>
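
<p>The walk-on rule is easy to get subtly wrong, so here is one possible sketch of it. Everything named here is hypothetical: the pool contents, the <code class="language-plaintext highlighter-rouge">min_gap</code> threshold, and the linear-probe collision strategy are assumptions chosen to illustrate “hash the name, skip anything taken or too close to a principal”, not the casting code used for this production.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import hashlib


def cast_walk_on(name, pool, taken, principal_pitches, min_gap=1.0):
    """Pick a (seed, pitch) pair for a walk-on, deterministically by name.

    pool: ordered list of (seed, pitch_semitones) candidates.
    taken: set of pairs already assigned to other walk-ons (updated in place).
    principal_pitches: dict mapping a seed to its principals' pitch offsets.
    """
    if not pool:
        raise LookupError("the voice pool is empty")
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(pool)
    # Probe from the hashed slot so a collision moves to the next candidate.
    for step in range(len(pool)):
        seed, pitch = pool[(start + step) % len(pool)]
        if (seed, pitch) in taken:
            continue
        too_close = any(min_gap > abs(pitch - p)
                        for p in principal_pitches.get(seed, []))
        if too_close:
            continue
        taken.add((seed, pitch))
        return seed, pitch
    raise LookupError(f"no free voice left for {name!r}")
</code></pre></div></div>

<p>One caveat with this approach: when two names hash to the same slot, the result depends on which character is cast first, so the cast list should be processed in a fixed order.</p>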

<h2 id="timbre-is-an-ffmpeg-chain">Timbre is an ffmpeg chain</h2>

<p>Once a voice’s <em>identity</em> lives in its seed, its <em>character</em> is just a deterministic
audio-filter chain — independently dialable, and applied identically to every line that
character speaks (consistency again). Pitch shifts use the resample trick (<code class="language-plaintext highlighter-rouge">asetrate</code>
to move pitch+formants, <code class="language-plaintext highlighter-rouge">atempo</code> to restore the original duration), so they sound like
a differently-sized person rather than a chipmunk artifact:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># deepen a voice ~1.25 semitones, add body</span>
<span class="nv">asetrate</span><span class="o">=</span>24000<span class="k">*</span>0.93,aresample<span class="o">=</span>24000,atempo<span class="o">=</span>1.0753,bass<span class="o">=</span><span class="nv">g</span><span class="o">=</span>2:f<span class="o">=</span>120, ... ,loudnorm<span class="o">=</span>...

<span class="c"># an AI/comms character: telephone band + a touch of bitcrush</span>
<span class="nv">highpass</span><span class="o">=</span><span class="nv">f</span><span class="o">=</span>300,lowpass<span class="o">=</span><span class="nv">f</span><span class="o">=</span>3400,acompressor<span class="o">=</span>...:ratio<span class="o">=</span>4,acrusher<span class="o">=</span><span class="nv">bits</span><span class="o">=</span>8:mode<span class="o">=</span>log:mix<span class="o">=</span>0.15,loudnorm<span class="o">=</span>...

<span class="c"># the "dead ROM construct": pitch-down, flat affect, digital grit, hollow comb echo</span>
<span class="nv">asetrate</span><span class="o">=</span>24000<span class="k">*</span>0.95,aresample<span class="o">=</span>24000,atempo<span class="o">=</span>1.0526,acompressor<span class="o">=</span>...:ratio<span class="o">=</span>4,acrusher<span class="o">=</span><span class="nv">bits</span><span class="o">=</span>7:mode<span class="o">=</span>log:mix<span class="o">=</span>0.35,aecho<span class="o">=</span>0.85:0.6:18:0.35,highpass<span class="o">=</span><span class="nv">f</span><span class="o">=</span>80,loudnorm<span class="o">=</span>...
</code></pre></div></div>

<p>The compressor flattens emotional affect, <code class="language-plaintext highlighter-rouge">acrusher</code> adds digital grit, and an 18 ms <code class="language-plaintext highlighter-rouge">aecho</code>
gives a hollow comb. Every chain ends in the same <code class="language-plaintext highlighter-rouge">loudnorm</code>; that part is
non-negotiable.</p>
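<p>The numbers in the pitch chains are linked: a rate factor <em>f</em> shifts pitch by 12·log2(<em>f</em>) semitones, and <code class="language-plaintext highlighter-rouge">atempo</code> has to be 1/<em>f</em> to get the duration back. A small sketch of that arithmetic follows; the helper name <code class="language-plaintext highlighter-rouge">pitch_chain</code> is made up for illustration, and the 24000 Hz rate is taken from the chains above.</p>

```python
import math


def pitch_chain(factor, rate=24000):
    """Return (ffmpeg filter fragment, pitch shift in semitones) for a rate factor."""
    semitones = 12 * math.log2(factor)  # negative means deeper
    atempo = 1 / factor                 # undo the speed change asetrate introduced
    chain = f"asetrate={rate}*{factor},aresample={rate},atempo={atempo:.4f}"
    return chain, semitones


chain, semitones = pitch_chain(0.93)
print(chain)  # asetrate=24000*0.93,aresample=24000,atempo=1.0753
print(f"{semitones:.2f} semitones")
```

<p>The same helper with a factor of 0.95 gives <code class="language-plaintext highlighter-rouge">atempo=1.0526</code>, matching the third chain above.</p>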

<h2 id="the-script-verbatim-text-then-heuristic-attribution">The script: verbatim text, then heuristic attribution</h2>

<p>The audio is only as good as the script feeding it, and there were two rules:</p>

<ol>
  <li><strong>Text is verbatim from the source file — never reconstructed from memory.</strong> An LLM
will happily “remember” a famous novel and paraphrase it. For a faithful reading
that’s poison. Every line is sliced out of the actual source text; the model never
gets to improvise the words.</li>
  <li><strong>Attribution is heuristic.</strong> To turn flat prose into tagged cues
(<code class="language-plaintext highlighter-rouge">[NARRATION]</code>, <code class="language-plaintext highlighter-rouge">[CASE]</code>, <code class="language-plaintext highlighter-rouge">[MOLLY]</code>, …) the tagger matches balanced quote pairs and
then guesses the speaker from the surrounding text: self-introductions (“my name
is…”), speech-verb patterns (<em>name + said</em>, <em>said + name</em>, <em>pronoun + verb</em> resolved
by gender), action-beat subjects, and a running 2–3 party “who’s in this scene”
tracker that resets after long stretches of narration. It lands around 80% right on
the first pass — good enough to review and nudge, and narration (over half the lines)
is 100% correct because it’s just the un-quoted remainder.</li>
</ol>

<p>Prose gets grouped a few sentences per cue; dialogue is one cue per quote. Quotation marks
are stripped (the lines are <em>spoken</em>, not read aloud as “quote… unquote”).</p>
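<p>A stripped-down version of the tagging idea looks roughly like this. It is only a sketch under simplifying assumptions: it handles just the <em>name + said</em> and <em>said + name</em> patterns, has no pronoun resolution or scene tracker, and the names <code class="language-plaintext highlighter-rouge">tag_cues</code>, <code class="language-plaintext highlighter-rouge">QUOTE</code>, and <code class="language-plaintext highlighter-rouge">SPEECH</code> are invented for the example.</p>

```python
import re

# A quoted span: straight or curly double quotes around text with no quote marks inside.
QUOTE = re.compile(r'[“"]([^“”"]+)[”"]')
# Two speech-verb patterns only: "Name said" and "said Name".
SPEECH = re.compile(r'\b(?:([A-Z][a-z]+) said|said ([A-Z][a-z]+))\b')


def tag_cues(text):
    """Split prose into (tag, text) cues; narration is the un-quoted remainder."""
    cues = []
    pos = 0
    matches = list(QUOTE.finditer(text))
    for i, m in enumerate(matches):
        before = text[pos:m.start()].strip()
        if before:
            cues.append(("NARRATION", before))
        # Look for a speaker right after the quote first, then just before it.
        next_start = matches[i + 1].start() if i + 1 != len(matches) else len(text)
        after = text[m.end():next_start]
        hit = SPEECH.search(after) or SPEECH.search(before)
        speaker = (hit.group(1) or hit.group(2)).upper() if hit else "UNKNOWN"
        cues.append((speaker, m.group(1).strip()))  # quote marks are dropped here
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        cues.append(("NARRATION", tail))
    return cues
```

<p>Anything tagged <code class="language-plaintext highlighter-rouge">UNKNOWN</code> is what the review pass has to resolve by hand.</p>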

<h2 id="stitching-it-together">Stitching it together</h2>

<p>Per chapter the pipeline is:</p>

<ol>
  <li><strong>Render</strong> every cue to its own wav (<code class="language-plaintext highlighter-rouge">output/&lt;chapter&gt;/NNN_TAG.wav</code>). The whole loop
is <strong>resumable</strong> — it skips any clip that already exists, so a multi-hour run on a
small GPU can be killed and restarted without losing work. The model loads once and
stays resident across thousands of cues.</li>
  <li><strong>Process</strong> each cue with its character’s filter chain (ending in the shared
<code class="language-plaintext highlighter-rouge">loudnorm</code>).</li>
  <li><strong>Crossfade</strong> the processed cues into one continuous chapter track with the 0.12s
triangular fade at every seam.</li>
  <li><strong>Mux</strong> the chapter audio under its video and move on.</li>
</ol>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># resumable render: existing clips are skipped, model stays loaded</span>
python3 render_all.py            <span class="c"># all chapters</span>
python3 render_all.py ch05 ch06  <span class="c"># or just a few</span>

<span class="c"># per-chapter stitch: per-voice DSP + uniform loudnorm + crossfades</span>
bash stitch_chapter.sh ch05
</code></pre></div></div>
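<p>The resumable part of the render step boils down to “skip what already exists.” A minimal sketch, assuming a hypothetical <code class="language-plaintext highlighter-rouge">synthesize(tag, text)</code> callable that returns finished wav bytes (the real model call is not shown in this post). Writing to a temporary file and renaming it is my own addition here, so that a killed run cannot leave a half-written clip that later gets skipped.</p>

```python
from pathlib import Path


def render_chapter(cues, out_dir, synthesize):
    """Render each (tag, text) cue to NNN_TAG.wav, skipping clips that already exist."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rendered = 0
    for index, (tag, text) in enumerate(cues, start=1):
        clip = out_dir / f"{index:03d}_{tag}.wav"
        if clip.exists():
            continue  # finished on an earlier run
        tmp = clip.with_suffix(".part")
        tmp.write_bytes(synthesize(tag, text))  # hypothetical model call
        tmp.replace(clip)  # rename only once the clip is complete
        rendered += 1
    return rendered
```

<p>Calling it twice on the same chapter renders everything the first time and nothing the second, which is the behavior that makes a multi-hour run safe to interrupt.</p>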

<p>Everything is plain Python and <code class="language-plaintext highlighter-rouge">ffmpeg</code>. No proprietary stack, no per-minute meter, no
nonconsensual voice in the building. The visual side — the animated ASCII-art
backgrounds and the lo-fi score each chapter plays over — is its own pure-Python engine
I’ve open-sourced separately as
<strong><a href="https://github.com/CryptoJones/TheFlatlineSessions">The Flatline Sessions</a></strong>.</p>

<hr />

<p><em>The takeaways, if you’re building something similar: clone only what you have the
right to clone; pin your seed, speed, and loudness so the read doesn’t lurch; let DSP
do the work of making voices distinct; and keep your script verbatim from a real source
instead of trusting a model’s memory.</em></p>]]></content><author><name></name></author><summary type="html"><![CDATA[I wanted a multi-voice audiobook — a full narrator plus distinct character voices — produced entirely on my own hardware, for free, without cloning a single living person. This is the audio pipeline that got me there: zero-shot voice cloning on an 8GB consumer GPU, a casting system, and the one trick that matters more than any of it — keeping the read from jumping around.]]></summary></entry><entry><title type="html">For Sale: A 12 GB GPU That Runs Local LLMs at ~60 tok/s</title><link href="https://cryptojones.github.io/For-Sale-RTX-3060-12GB-Local-LLM-Card/" rel="alternate" type="text/html" title="For Sale: A 12 GB GPU That Runs Local LLMs at ~60 tok/s" /><published>2026-06-05T00:00:00+00:00</published><updated>2026-06-05T00:00:00+00:00</updated><id>https://cryptojones.github.io/For-Sale-RTX-3060-12GB-Local-LLM-Card</id><content type="html" xml:base="https://cryptojones.github.io/For-Sale-RTX-3060-12GB-Local-LLM-Card/"><![CDATA[<p><em>GIGABYTE RTX 3060 EAGLE OC 12 GB — $350 shipped anywhere in CONUS.</em></p>

<p><img src="/images/rtx3060-eagle-oc.png" alt="GIGABYTE RTX 3060 EAGLE OC 12 GB specifications" /></p>

<p><em>Specs via <a href="https://www.techpowerup.com/gpu-specs/gigabyte-rtx-3060-eagle-oc.b8629">TechPowerUp’s GPU database</a>.</em></p>

<p>I’m selling a graphics card. The quick pitch: it’s a <strong>GIGABYTE RTX 3060 EAGLE OC</strong> with <strong>12 GB of VRAM</strong> — and that VRAM is the whole story for local AI.</p>

<p>Twelve gigs is the sweet spot for running modern small models entirely on your own hardware — no API keys, no per-token billing, nothing leaving the box:</p>

<ul>
  <li><strong>8B-class LLMs at ~60 tok/s.</strong> Qwen3-8B fits comfortably and generates at around 60 tokens/second — fast enough to feel interactive.</li>
  <li><strong>Runs Google’s new Gemma 4 with function calling</strong>, so you can wire it into tool-using agents locally.</li>
  <li><strong>All in <a href="https://github.com/vllm-project/vllm">vLLM</a></strong> — batched, production-grade serving on a single card.</li>
</ul>

<p>For a homelab inference box, a quiet always-on coding assistant, or just learning how local models actually behave, 12 GB of VRAM at this price is hard to beat.</p>

<p><strong>Specs at a glance</strong></p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th> </th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>GPU</td>
      <td>NVIDIA GeForce RTX 3060 (GA106)</td>
    </tr>
    <tr>
      <td>Memory</td>
      <td>12 GB GDDR6</td>
    </tr>
    <tr>
      <td>Card</td>
      <td>GIGABYTE EAGLE OC, dual-fan</td>
    </tr>
    <tr>
      <td>Price</td>
      <td><strong>$350 shipped — CONUS only</strong></td>
    </tr>
  </tbody>
</table>

<p>Interested? <strong><a href="https://infosec.exchange/@CryptoJones/116699970101195679">Reply to the original post on Mastodon</a></strong> and we’ll sort out the details.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[GIGABYTE RTX 3060 EAGLE OC 12 GB — $350 shipped anywhere in CONUS.]]></summary></entry><entry><title type="html">Reach Your Whole Home Lab From Anywhere: Tailscale on a UniFi Dream Machine Pro</title><link href="https://cryptojones.github.io/Tailscale-on-a-UniFi-Dream-Machine-Pro/" rel="alternate" type="text/html" title="Reach Your Whole Home Lab From Anywhere: Tailscale on a UniFi Dream Machine Pro" /><published>2026-06-05T00:00:00+00:00</published><updated>2026-06-05T00:00:00+00:00</updated><id>https://cryptojones.github.io/Tailscale-on-a-UniFi-Dream-Machine-Pro</id><content type="html" xml:base="https://cryptojones.github.io/Tailscale-on-a-UniFi-Dream-Machine-Pro/"><![CDATA[<p><em>Turn your UDM Pro into a Tailscale subnet router so every device on your LAN is reachable from anywhere — no port forwarding, no public exposure.</em></p>

<h2 id="why-do-this-on-the-router-instead-of-each-device">Why do this on the router instead of each device?</h2>

<p>You <em>could</em> install Tailscale on every machine you want to reach. But installing it <strong>on the UDM Pro as a subnet router</strong> is the power move: one install advertises your entire LAN onto your private Tailscale network. After that, any of your Tailscale devices (laptop, phone) can talk to <em>anything</em> on your home network — a NAS, a local LLM server, a Pi, a printer — using its normal LAN IP, from anywhere in the world.</p>

<p>Crucially, <strong>nothing is exposed to the public internet.</strong> No ports are forwarded. Tailscale builds an encrypted WireGuard tunnel that only your own devices can join, and it punches through NAT automatically — so this works even if your ISP has you behind carrier-grade NAT where port forwarding is impossible.</p>

<p><strong>The catch:</strong> Tailscale isn’t official Ubiquiti software. We use the excellent community installer <a href="https://github.com/SierraSoftworks/tailscale-unifi"><code class="language-plaintext highlighter-rouge">SierraSoftworks/tailscale-unifi</code></a>, which is widely used and handles the tricky part — surviving reboots and firmware updates. You’re modifying your edge router, so go slow and read each step.</p>

<h2 id="prerequisites">Prerequisites</h2>

<ul>
  <li>A UniFi gateway running <strong>UniFi OS 2.x or later</strong> (UDM Pro, UDM SE, UDR, Cloud Gateway, etc.). <em>Not</em> supported: UniFi OS 1.x, Cloud Key Gen 1, the old USG, or BusyBox-based devices.</li>
  <li>A free <a href="https://tailscale.com">Tailscale account</a> (personal use is free for up to 100 devices).</li>
  <li>Admin access to your UniFi console and a few minutes of SSH time.</li>
</ul>

<blockquote>
  <p><strong>Worked example used throughout:</strong> my LAN is <code class="language-plaintext highlighter-rouge">192.168.1.0/24</code> and my gateway (the UDM Pro) is <code class="language-plaintext highlighter-rouge">192.168.1.1</code>. <strong>Substitute your own values</strong> — find yours under <em>UniFi Network → Settings → Networks</em>, or on a Mac with <code class="language-plaintext highlighter-rouge">route -n get default | grep gateway</code>.</p>
</blockquote>
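<p>If you only know your gateway address and netmask, the CIDR you need later can be derived with Python’s standard <code class="language-plaintext highlighter-rouge">ipaddress</code> module. A small sketch using the example values above; the helper name <code class="language-plaintext highlighter-rouge">lan_cidr</code> and the <code class="language-plaintext highlighter-rouge">255.255.255.0</code> mask are assumptions for illustration.</p>

```python
import ipaddress


def lan_cidr(gateway_ip, netmask):
    """Return the network in CIDR form for a gateway address and netmask."""
    return str(ipaddress.ip_interface(f"{gateway_ip}/{netmask}").network)


cidr = lan_cidr("192.168.1.1", "255.255.255.0")
print(cidr)  # 192.168.1.0/24

# Check that a device you want to reach falls inside that subnet.
print(ipaddress.ip_address("192.168.1.50") in ipaddress.ip_network(cidr))  # True
```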

<h2 id="step-1--enable-ssh-on-the-udm-pro">Step 1 — Enable SSH on the UDM Pro</h2>

<ol>
  <li>Open your UniFi console (e.g. <code class="language-plaintext highlighter-rouge">https://192.168.1.1</code> or <code class="language-plaintext highlighter-rouge">unifi.ui.com</code>).</li>
  <li>Go to <strong>UniFi OS → Settings → System → Advanced</strong>.</li>
  <li>Toggle <strong>SSH</strong> on, and <strong>set a strong root password</strong> (this is the password you’ll use to log in as <code class="language-plaintext highlighter-rouge">root</code>).</li>
</ol>

<h2 id="step-2--ssh-into-the-gateway">Step 2 — SSH into the gateway</h2>

<p>From your computer:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ssh root@192.168.1.1
</code></pre></div></div>

<p>Enter the SSH password you just set. You’re now on the gateway.</p>

<h2 id="step-3--install-tailscale">Step 3 — Install Tailscale</h2>

<p>Run the community installer:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl <span class="nt">-sSLq</span> https://raw.githubusercontent.com/SierraSoftworks/tailscale-unifi/main/install.sh | sh
</code></pre></div></div>

<p>This installs <code class="language-plaintext highlighter-rouge">tailscaled</code> as a systemd service under <code class="language-plaintext highlighter-rouge">/data/tailscale/</code>, and adds a boot hook in <code class="language-plaintext highlighter-rouge">/data/on_boot.d/</code> so it <strong>persists across reboots and firmware upgrades</strong> (UniFi OS otherwise wipes non-persistent storage on update).</p>

<p>Verify the install:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tailscale status
</code></pre></div></div>

<h2 id="step-4--bring-it-up-as-a-subnet-router">Step 4 — Bring it up as a subnet router</h2>

<p>This is the line that advertises your LAN. <strong>Replace the CIDR with your own subnet.</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tailscale up <span class="se">\</span>
  <span class="nt">--advertise-routes</span><span class="o">=</span><span class="s2">"192.168.1.0/24"</span> <span class="se">\</span>
  <span class="nt">--snat-subnet-routes</span><span class="o">=</span><span class="nb">false</span> <span class="se">\</span>
  <span class="nt">--accept-routes</span>
</code></pre></div></div>

<p>What the flags do:</p>

<table>
  <thead>
    <tr>
      <th>Flag</th>
      <th>Purpose</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">--advertise-routes=...</code></td>
      <td>Offers your LAN subnet to your tailnet so remote devices can reach it.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">--snat-subnet-routes=false</code></td>
      <td>Preserves the <em>original</em> client IP for traffic crossing the tunnel (nicer for logging/ACLs).</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">--accept-routes</code></td>
      <td>Lets the UDM also use routes other subnet routers advertise. Optional.</td>
    </tr>
  </tbody>
</table>

<p>Want the UDM to double as an <strong>exit node</strong> (route <em>all</em> your internet traffic through home when you’re traveling)? Add <code class="language-plaintext highlighter-rouge">--advertise-exit-node</code>.</p>

<p>When you run this, Tailscale prints a login URL. Open it in a browser and authenticate to attach the UDM to your tailnet.</p>

<blockquote>
  <p>Subnet routing needs IP forwarding, which is <strong>already enabled by default</strong> on UniFi OS gateways — no extra sysctl tweaks required.</p>
</blockquote>

<h2 id="step-5--approve-the-device-and-routes-in-the-admin-console">Step 5 — Approve the device and routes in the admin console</h2>

<p>The advertised routes don’t go live until you approve them:</p>

<ol>
  <li>Open the <a href="https://login.tailscale.com/admin/machines">Tailscale admin console → <strong>Machines</strong></a>.</li>
  <li>Find your UDM Pro in the list.</li>
  <li>Click it → <strong>Edit route settings</strong> → <strong>enable</strong> the subnet route(s) you advertised (and the exit node, if you added it).</li>
  <li><strong>Strongly recommended:</strong> also <strong>disable key expiry</strong> for the UDM. Otherwise the node key expires (180 days by default) and your remote access silently dies until you re-authenticate on the router.</li>
</ol>

<h2 id="step-6--test-it">Step 6 — Test it</h2>

<p>Install Tailscale on a phone or laptop, sign in with the <strong>same account</strong>, then — from a coffee shop, on cellular, anywhere — hit a device on your home LAN by its normal IP. For example, a local web service on your LAN:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>curl http://192.168.1.50:8080/
</code></pre></div></div>

<p>If that responds from off-network, you’re done. 🎉</p>
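<p>If the service you care about is not HTTP, a plain TCP connect is enough to confirm the route works. A minimal sketch using only the standard library; the function name <code class="language-plaintext highlighter-rouge">port_open</code> is made up, and the host and port you pass in are whatever applies to your LAN.</p>

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

<p>From an off-network device, <code class="language-plaintext highlighter-rouge">port_open("192.168.1.50", 8080)</code> returning <code class="language-plaintext highlighter-rouge">True</code> means the subnet route is live; <code class="language-plaintext highlighter-rouge">False</code> usually sends you back to Step 5.</p>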

<h2 id="day-2-operations">Day-2 operations</h2>

<p><strong>Update Tailscale:</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/data/tailscale/manage.sh update
</code></pre></div></div>

<p><strong>Restart the service:</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>systemctl restart tailscaled
</code></pre></div></div>

<p><strong>Check status / your tailnet IP:</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tailscale status
tailscale ip <span class="nt">-4</span>
</code></pre></div></div>

<p><strong>Uninstall cleanly:</strong></p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/data/tailscale/manage.sh uninstall
</code></pre></div></div>

<h2 id="troubleshooting">Troubleshooting</h2>

<ul>
  <li><strong>Routes not working from remote devices?</strong> Re-check Step 5 — unapproved subnet routes are the #1 cause. The route must show as enabled in the admin console.</li>
  <li><strong>Access died after a few months?</strong> Key expiry. Disable it on the UDM node (Step 5) and run <code class="language-plaintext highlighter-rouge">tailscale up ...</code> again to re-auth.</li>
  <li><strong>Gone after a firmware update?</strong> The installer’s boot hook normally handles this; if not, re-run the Step 3 install command — it’s idempotent.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">tailscale: command not found</code> after SSH?</strong> Use the full path <code class="language-plaintext highlighter-rouge">/data/tailscale/tailscale</code>, or log out and back in to pick up the PATH.</li>
</ul>

<h2 id="security-notes">Security notes</h2>

<ul>
  <li>Tailscale exposes your LAN <strong>only to devices in your own tailnet</strong> — it is not a public hole in your firewall. But that also means <em>every</em> device you connect from must run Tailscale and be signed into your account.</li>
  <li>For finer control (e.g. “my laptop can reach the LLM server but not the whole LAN”), use <a href="https://tailscale.com/kb/1018/acls">Tailscale ACLs</a> to scope what each device may reach.</li>
  <li>This is third-party software on your edge router. It’s well-maintained and popular, but it’s not Ubiquiti-supported — keep that in mind for a device your whole network depends on.</li>
</ul>

<hr />

<p><em>Installer credit: <a href="https://github.com/SierraSoftworks/tailscale-unifi">SierraSoftworks/tailscale-unifi</a>.</em></p>]]></content><author><name></name></author><summary type="html"><![CDATA[Turn your UDM Pro into a Tailscale subnet router so every device on your LAN is reachable from anywhere — no port forwarding, no public exposure.]]></summary></entry><entry><title type="html">Forgot Domain Admin Password Windows Server 2022</title><link href="https://cryptojones.github.io/Forgot-Domain-Admin-Password-Windows-Server-2022/" rel="alternate" type="text/html" title="Forgot Domain Admin Password Windows Server 2022" /><published>2024-08-22T00:00:00+00:00</published><updated>2024-08-22T00:00:00+00:00</updated><id>https://cryptojones.github.io/Forgot-Domain-Admin-Password-Windows-Server-2022</id><content type="html" xml:base="https://cryptojones.github.io/Forgot-Domain-Admin-Password-Windows-Server-2022/"><![CDATA[<p>##Forgot Domain Admin Password Windows Server 2022</p>

<p>–Boot to recovery mode with ISO</p>

<p>– Click Next, then “Repair your computer”</p>

<p>– Click Troubleshoot, then click the Command Prompt option</p>

<p>– list disks by opening diskpart, then “list volume”</p>

<p>–run “select volume 1” (assuming 1 is your main hdd)</p>

<p>–run “assign letter C”</p>

<p>–run “exit” to leave diskpart</p>

<p>–run “C:” to switch to your main hard drive</p>

<p>–run “cd \Windows\System32” in order to change directories</p>

<p>–run “rename Utilman.exe Utilman.bak” to backup the utility</p>

<p>–run “copy cmd.exe utilman.exe”</p>

<p>–run “D:”</p>

<p>–run “bcdedit /set {bootmgr} timeout 15”</p>

<p>–run “bcdedit /set {bootmgr} displaybootmenu yes”</p>

<p>–exit and reboot computer after removing the iso</p>

<p>–select windows in the boot menu then hit enter</p>

<p>–go to the login screen, but instead of typing a username and password, click the accessibility icon to get a command prompt</p>

<p>–in the command prompt run “net user” to list the accounts</p>

<p>–to change the password on one of the accounts type “net user username password /domain”</p>]]></content><author><name></name></author><summary type="html"><![CDATA[##Forgot Domain Admin Password Windows Server 2022]]></summary></entry><entry><title type="html">You’re up and running!</title><link href="https://cryptojones.github.io/Hello-World/" rel="alternate" type="text/html" title="You’re up and running!" /><published>2014-03-03T00:00:00+00:00</published><updated>2014-03-03T00:00:00+00:00</updated><id>https://cryptojones.github.io/Hello-World</id><content type="html" xml:base="https://cryptojones.github.io/Hello-World/"><![CDATA[<p>Next you can update your site name, avatar and other options using the _config.yml file in the root of your repository (shown below).</p>

<p><img src="/images/config.png" alt="_config.yml" /></p>

<p>The easiest way to make your first post is to edit this one. Go into /_posts/ and update the Hello World markdown file. For more instructions head over to the <a href="https://github.com/barryclark/jekyll-now">Jekyll Now repository</a> on GitHub.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Next you can update your site name, avatar and other options using the _config.yml file in the root of your repository (shown below).]]></summary></entry></feed>