CryptoJones

XSpaceWar-AI: We Rebuilt Spacewar! With Real Gravity — and It’s Free

2026-06-14T00:00:00+00:00

Sixty-four years ago a couple of MIT hackers wired two spaceships and a star into a PDP-1 and invented the multiplayer video game. We took that idea, gave it honest Newtonian physics, generated every pixel and every sound from math, taught the bots to be genuinely mean, and put 16 ships in the same arena. It’s done, it runs on Windows, macOS, and Linux, it’s Apache-2.0 open source, and it costs nothing.

👉 Play it now on itch.io — or grab a build straight from GitHub Releases.

🎮 itch.io: cryptojones.itch.io/xspacewar-ai
🌐 Landing page: cryptojones.github.io/XSpaceWar-AI
⬇️ Direct downloads (Win · macOS · Linux): GitHub Releases
🛠️ Source (Apache-2.0): github.com/CryptoJones/XSpaceWar-AI

The first networked match said everything

Dedicated server on one machine, two pilots joining from two more. The human — barely moving, just letting the star do the work — beat the AI 3 to −19, while the bot fell into the gravity well fifty-nine times. Newtonian gravity is undefeated.

That score isn’t a difficulty setting. It’s physics. The star pulls with pure G·M / r² — no caps, no clamps, no softening. Fly too close and the pull genuinely diverges; you cannot thrust your way out of a star you’ve already fallen into. That’s not a bug, it’s a star.
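To make "no caps, no clamps, no softening" concrete, here is a minimal sketch of an uncapped inverse-square pull, assuming a 2D arena with the star at the origin. The function name and the value of `GM` are illustrative assumptions, not taken from the game's Godot source.

```python
import math

GM = 1000.0  # hypothetical gravitational parameter G*M, arbitrary units


def gravity_accel(x: float, y: float, gm: float = GM) -> tuple[float, float]:
    """Acceleration toward a star at the origin: pure G*M / r^2, no softening.

    There is deliberately no minimum radius here, so the magnitude grows
    without bound as r -> 0 (and r == 0 itself raises ZeroDivisionError).
    """
    r2 = x * x + y * y
    r = math.sqrt(r2)
    a = gm / r2
    # Scale the unit vector that points from the ship back to the star.
    return (-a * x / r, -a * y / r)
```

Halving the distance quadruples the pull: with these numbers the acceleration is 10 units at r = 10 and 40 units at r = 5, which is why a fixed-thrust engine eventually loses.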

What it actually is

A modern, top-down space-fighter with a heavy AI focus — a 2026 reimagining of the classic networked Spacewar! / xspacewar. Bird’s-eye view like the 1962 original, but built in Godot 4 with:

Honest Newtonian flight. Conserved momentum, inverse-square gravity on ships and torpedoes, gravity-assist slingshots, hyperspace. No drag — to slow down you flip 180° and burn against your own velocity, the same maneuver Spacewar! pilots have flown since ‘62.
100% procedural everything. Stars, planets, moons, asteroid fields, nebulae, ships, VFX — and audio. There’s not a single hand-drawn art asset or recorded sound file in the game; every sound is synthesized from math at load. Every match is a fresh seeded arena.
Up to 16 ships, four game modes: Free-for-all, Team battle, AI bots (Rookie → Insane), and Movie Mode — an all-AI attract reel that regenerates a brand-new arena, roster, and teams every 30 minutes. Leave it running on a second monitor; it never plays the same match twice.
Multiplayer that works today. Host-authoritative netcode with client-side prediction and reconciliation, automatic LAN discovery, direct IP, and platform-agnostic internet play through the project’s own relay/master server — room codes, an online server browser, no port forwarding and no platform account required.
Seven UI languages, including right-to-left Arabic.
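For readers new to the netcode terms above, here is a rough 1D sketch of client-side prediction and reconciliation. It is an illustration of the general technique, not the game's implementation: `ShipState`, `PredictingClient`, the tick length, and the message shape are all assumptions.

```python
from dataclasses import dataclass, field

DT = 1.0 / 60.0  # hypothetical fixed tick length in seconds


@dataclass
class ShipState:
    pos: float = 0.0
    vel: float = 0.0


def step(state: ShipState, thrust: float) -> ShipState:
    """One deterministic tick with no drag; client and host run the same code."""
    vel = state.vel + thrust * DT
    return ShipState(pos=state.pos + vel * DT, vel=vel)


@dataclass
class PredictingClient:
    state: ShipState = field(default_factory=ShipState)
    pending: list = field(default_factory=list)  # (seq, thrust) not yet acked
    next_seq: int = 0

    def apply_input(self, thrust: float) -> int:
        """Predict locally right away and remember the input for replay."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, thrust))
        self.state = step(self.state, thrust)
        return seq

    def reconcile(self, server_state: ShipState, last_acked_seq: int) -> None:
        """Adopt the host's state, then replay inputs the host has not seen."""
        self.pending = [(s, t) for s, t in self.pending if s > last_acked_seq]
        state = server_state
        for _, thrust in self.pending:
            state = step(state, thrust)
        self.state = state
```

When the host agrees with the prediction, reconciliation is invisible; when it disagrees, the client snaps to the host's truth and re-applies only the inputs still in flight.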

Standing on 64 years of shoulders

XSpaceWar-AI is a clean-room implementation, but it knows exactly where it comes from: Spacewar! (MIT, 1962, on the DEC PDP-1), the 1974 PDP-11 networked port, and Ron Frederick’s 1992 X11 xspacewar 1.2 — an early serverless networked game where peers shared state directly. Huge thanks to Ron (github.com/ronf), whose 1992 networked Spacewar is this game’s direct ancestor.

Get in a ship

The whole thing is free and open source under Apache-2.0. No launcher, no account, no telemetry, ~100 MB on disk.

Easiest: play in the browser or download from itch.io.
Prebuilt binaries: the zip for your platform on GitHub Releases (Windows, macOS universal, Linux x86_64).
From source: clone the repo, run ./install.sh (it fetches and SHA-512-verifies Godot 4.6.3 for you), then launch with godot --path . from the repo root.
Steam: coming soon.

macOS note: the build is ad-hoc signed but not notarized, so Gatekeeper may claim it’s “damaged.” It isn’t — right-click → Open, or xattr -dr com.apple.quarantine /path/to/XSpaceWar-AI.app. Full instructions are on the releases page.

Bring friends. Pick HOST for a LAN skirmish, share a 4-letter room code over the internet, or just turn on Movie Mode and watch the bots feed themselves to the star. The gravity is real, the maps are infinite, and the source is yours.

🚀 Play XSpaceWar-AI →

Apache-2.0 · GitHub · Codeberg mirror. Dedicated to JVL. Proudly Made in Nebraska. 🌽

From a Memory Aid to Multi-Tenant: The .NET Decision That Exploded the Scope

2026-06-08T00:00:00+00:00

ApplyTrack started life as a glorified memory aid — a folder of Markdown files so I’d stop forgetting where I’d applied and when to follow up. It is now an open-source, multi-tenant, self-hostable application with magic-link auth, a two-runtime backend, and a CI pipeline that ships container images. This post is about the single design decision that turned the first thing into the second one — moving the core API off pure Python and onto .NET — and the avalanche of scope that came with it.

It’s also, secretly, a post about systems design. I’ve been told the thing that separates a senior or staff engineer from a strong mid-level one isn’t knowing more frameworks — it’s reasoning about trade-offs, invariants, coupling, and failure modes before you write a line of code. So as I walk through this rewrite I’m going to narrate the systems-design thinking out loud: the concepts I reached for at each fork, and why. The project was, honestly, an excuse to practice exactly that muscle.

What it actually was at the start

The honest origin story: I kept losing track of job applications. Did I already apply to that company? When was I supposed to follow up on the screen? What salary did the post say before they took it down?

So I built the dumbest possible thing that solved it. Every application was a Markdown file with some YAML frontmatter:

---
company: Acme Corp
role: Senior Platform Engineer
lane: applied
status: screen
link: https://example.com/jobs/123
salary: "$180k–$210k"
applied: 2026-05-01
followup: 2026-05-08
score: 87
---

Recruiter was responsive. Take-home is a Postgres schema design.

The whole “database” was a directory of those files. State lived in three JSON files next to it — .criteria.json for my search keywords, .blacklist.json for companies I never wanted to see again, .seen.json so the discovery poller wouldn’t show me the same listing twice. A thin FastAPI app put a vanilla-JS single-page app in front of it, and a Python poller scraped a handful of public job boards and dropped fresh matches in as new files.
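Reading one of those files back takes very little code. The sketch below is a stdlib-only illustration that handles flat `key: value` headers like the example above; `parse_application` is a hypothetical name, and a real implementation would more likely hand the header to a proper YAML parser.

```python
def parse_application(text: str) -> tuple[dict, str]:
    """Split a Markdown file into (frontmatter dict, body).

    Only flat `key: value` pairs are understood, and every value is
    returned as a string (so `score: 87` comes back as "87").
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)
    except ValueError:
        return {}, text  # no closing delimiter: treat it all as body
    meta = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs in values survive intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body
```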

That’s it. Single user. Files on disk. No auth worth the name. It was a memory enhancement utility, and for one person it was perfect. The “schema” was a slug naming convention. The “migration story” was git add.

The deceptively small product decision

Then I decided to open-source it and make it something other people could self-host for their own job search. On the surface that sounds like a packaging problem — write a README, add a Dockerfile, push to GitHub. Ship it.

It is not a packaging problem. The moment “one person” becomes “many people on one deployment,” every cozy assumption the file-based design rested on falls over at once:

A folder of Markdown files has no concept of whose files they are.
Three shared JSON config files can’t hold one search profile per user.
“No auth” stops being charmingly minimal and starts being a data breach.
A .seen.json that dedupes for me will happily hide your leads behind mine.

This is the first place systems-design thinking earns its keep. Multi-tenancy isn’t a feature you bolt on the side — it’s a system invariant: a property that has to hold on every single read and write in the system, in both runtimes, forever. Invariants are the things you design the system around, not checks you remember to sprinkle in. And the test of a design is whether it can even express the invariant you need. The file-on-disk model couldn’t express “this row belongs to that user” at all — there was no place for the constraint to live. That’s the tell that you’ve outgrown a data model: not that it’s slow, but that the property you now need is inexpressible in it.

The fork in the road: stay Python, or move the core to .NET

Here’s where the real decision was. The path of least resistance was obvious: keep it all Python. FastAPI was already there. Reach for SQLAlchemy, bolt on an auth library, add a tenant_id foreign key, and grind it out in the language the whole thing was already written in. One runtime, one mental model, no context-switching.

I didn’t do that. I moved the core API to .NET 10 — ASP.NET Core Minimal APIs on Kestrel, Dapper + Npgsql over Postgres, DbUp for migrations — and kept Python only for the discovery poller.

The reasons that won the argument were all systems-design arguments, not language preferences:

The auth + session + concurrency surface is exactly what this stack is built for. Multi-tenant CRUD with optimistic locking, server-side sessions, and a hard request-scoped security boundary is bread-and-butter ASP.NET Core. The design principle here is don’t hand-roll your load-bearing primitives — identity, sessions, and concurrency control are exactly the parts where a framework’s well-trodden spine beats a bespoke assembly of libraries, because the failure mode of getting them subtly wrong is “data breach,” not “bug.”
Hand-written SQL over a typed data layer. Dapper maps SQL straight to records with no ORM mystery. This is a legibility trade-off: for a schema that two different runtimes have to agree on byte-for-byte, I wanted the SQL to be the explicit, reviewable contract, not an abstraction generating queries I’d have to reverse-engineer. An ORM optimizes for developer convenience; I was optimizing for the boundary being inspectable.
I wanted the invariant to be structurally enforced, not remembered. A compiled, statically-typed API layer lets me make “every query filters tenant_id” a property of the type system and the DI graph rather than a rule in a code-review checklist. The best way to enforce an invariant is to make violating it unrepresentable — and a typed boundary gets you closer to that than a dynamically-typed one.

I knew, choosing it, that I was trading a weekend of Python grinding for something much bigger. I underestimated by how much — but that’s the other half of staff-level thinking: the cheap option and the right option are often different options, and the skill is knowing when the goal has changed enough that the cheap one is now the expensive one.

The avalanche

Picking .NET didn’t just change the language. It detonated the scope, because once you commit to a real API you have to actually build all the things the file-based toy got to skip:

A real schema, and migrations to evolve it. The folder-of-files became nine DbUp migrations — applications, search_profiles, blacklist, users, magic_tokens, sessions, seen, poll_requests, and the cascade wiring. Idempotent .sql scripts that run on startup. The slug naming convention became a UNIQUE (tenant_id, name) constraint and a validation choke-point.

Real authentication. I built passwordless magic-link sign-in: request a link, get a single-use token (only its SHA-256 is stored, 15-minute TTL), verify it, and mint an opaque server-side session — deliberately not a JWT, so logout is instant revocation, not “wait for the token to expire.” The endpoint that requests a link always returns 200 whether or not the account exists, so it can’t be used to enumerate who has signed up.

A tenancy choke-point. This is the heart of the whole rewrite, and it’s the single design pattern I’m proudest of. The systems-design idea is funnel the dangerous decision through exactly one place. One middleware resolves the session cookie to a tenant, and it is the only place a tenant_id enters the system. Endpoints are handed a repository from DI that’s already scoped to the caller — endpoint code physically cannot query another tenant’s rows, because it never sees a tenant id to get wrong. That’s the difference between a security control and a security invariant: instead of N endpoints each remembering to filter correctly (N chances to leak), there’s one choke-point and N endpoints that structurally can’t. You shrink the attack surface to a single auditable function:

-- every read and write carries the tenant, unconditionally
UPDATE applications
   SET ..., version = version + 1
 WHERE id = @id AND tenant_id = @tenantId AND version = @expectedVersion;
-- 0 rows affected -> 409 Conflict
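The choke-point pattern itself is language-agnostic. Here is a rough Python sketch using sqlite3 instead of the project's actual C#/Dapper/Postgres stack; `TenantScopedRepo`, `repo_for_session`, and the two-column table are assumptions made for illustration.

```python
import sqlite3


class TenantScopedRepo:
    """Every query this object can issue is already bound to one tenant."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: int):
        self._conn = conn
        self._tenant_id = tenant_id  # set once by the choke-point

    def list_applications(self) -> list:
        cur = self._conn.execute(
            "SELECT id, company FROM applications "
            "WHERE tenant_id = ? ORDER BY id",
            (self._tenant_id,),
        )
        return cur.fetchall()

    def add_application(self, company: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO applications (tenant_id, company) VALUES (?, ?)",
            (self._tenant_id, company),
        )
        return cur.lastrowid


def repo_for_session(conn: sqlite3.Connection, sessions: dict, cookie: str) -> TenantScopedRepo:
    """The single place where a session cookie turns into a tenant id."""
    tenant_id = sessions[cookie]  # a KeyError here would map to a 401
    return TenantScopedRepo(conn, tenant_id)
```

Endpoint code receives only the repo object, so there is no tenant id parameter for it to pass incorrectly.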

Optimistic concurrency. The old design’s “version” was a file’s mtime and size. On a multi-user database that’s meaningless, so every application row got a real version column. Writes pass ?expected_version=, a mismatch answers 409 Conflict, and the SPA’s overwrite-confirm flow drives off that — two open tabs can’t silently clobber each other. The design decision underneath is optimistic vs. pessimistic locking: I bet that write-write conflicts are rare (two people rarely edit the same application row at the same instant), so I don’t pay the cost of locking rows on every read — I just detect the rare collision and make the client resolve it. Choosing the concurrency-control strategy that matches your actual contention profile, rather than reflexively reaching for locks, is a very systems-design call.
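A compact way to see the version check in action, again sketched with Python and sqlite3 rather than the real stack. `VersionConflict`, `update_status`, and the `status` column are hypothetical; the WHERE clause mirrors the SQL shown earlier.

```python
import sqlite3


class VersionConflict(Exception):
    """The row changed since the caller last read it (would map to HTTP 409)."""


def update_status(conn, tenant_id, app_id, new_status, expected_version):
    """Apply the write only if the row is still at expected_version."""
    cur = conn.execute(
        "UPDATE applications SET status = ?, version = version + 1 "
        "WHERE id = ? AND tenant_id = ? AND version = ?",
        (new_status, app_id, tenant_id, expected_version),
    )
    # Zero rows also covers "no such row for this tenant"; a fuller version
    # would tell that case apart and answer 404 instead.
    if cur.rowcount == 0:
        raise VersionConflict(
            f"application {app_id} is no longer at version {expected_version}"
        )
    return expected_version + 1
```

No lock is held between read and write; the second of two stale writers simply matches zero rows and is told to re-read.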

Security designed in, not bolted on, and graded against 20 years of OWASP. Once strangers trust the thing with their data, “I’ll secure it later” is a design smell, so I audited the app against the OWASP Top 10, and not just the current list. I ran it against roughly two decades of the Top 10, every edition back to the 2007 list, because the categories that rotate off the list don’t stop being exploitable; they just stop being fashionable. (CSRF dropped off years ago, but a multi-tab session app still has to answer for it.) The threat model is the union of all of them, not the latest snapshot. What came out of that audit: a strict CSP and a security-header middleware, per-IP rate limits, an SSRF-hardened link probe (because a server that fetches user-supplied URLs is a confused deputy waiting to happen), DOMPurify on rendered notes to kill stored XSS, generic 500s that don’t leak internals, the no-account-enumeration auth surface, and a dependency-audit CI job so a vulnerable transitive package fails the build instead of shipping.
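As one example of what "SSRF-hardened" can mean in practice, the core of such a probe is an address check like the sketch below. It is an assumption about the general technique, not the project's code, and `is_public_address` is a hypothetical name.

```python
import ipaddress


def is_public_address(ip_text: str) -> bool:
    """True only for addresses a link probe should be allowed to connect to."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # not an IP literal at all
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )
```

The check has to run on the address the server actually connects to, after DNS resolution and again on every redirect hop; validating only the hostname the user typed leaves the probe open to DNS rebinding.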

And then everything else that comes after “it works on my machine”: a three-service docker compose up, account export (a zip of your Markdown — your data stays yours) and one-call account deletion via ON DELETE CASCADE, and a tag-driven release pipeline that publishes both runtimes as container images. None of that existed in the memory-aid version. None of it is optional once strangers trust the thing with their data.

The compromise that kept it sane: polyglot, with the schema as the contract

The one thing I refused to do was rewrite the poller. It already had eight source fetchers, the HTML scraping, and the scoring/dedup logic — all in working, tested Python. Re-implementing that in C# would have been throwing away the part that already worked to satisfy a purity I didn’t care about.

So ApplyTrack is deliberately polyglot: a .NET API and a Python poller that never call each other. They share exactly one thing — the Postgres schema — and that schema is the contract between them.

            ┌─────────────────────────────────┐
Browser ──► │ ASP.NET Core (.NET 10, Kestrel)  │
 (the SPA)  │  • serves the SPA + JSON API     │──┐
            │  • magic-link auth + sessions    │  │
            │  • CRUD + criteria + blacklist   │  │
            └─────────────────────────────────┘  ├──► Postgres  (shared schema
            ┌─────────────────────────────────┐  │              = the contract)
 Cron  ───► │ Python poller                    │──┘
            │  • fetch + score + dedupe leads  │
            │  • drain the on-demand poll queue│
            └─────────────────────────────────┘

The systems-design backbone here is clear data ownership. Every table has exactly one writer-of-record: .NET owns auth, sessions, and CRUD, and it owns the migrations; Python writes new leads and reads profiles, the seen-ledger, and the active-user list. Ambiguous ownership (“either service might write this”) is how you get races and corruption that no amount of locking saves you from, so I made ownership explicit and one-directional. Both runtimes unconditionally filter WHERE tenant_id — the cron worker doesn’t get to bypass the choke-point just because it’s a background job; it builds a tenant-scoped reader inside its per-tenant loop. (The invariant doesn’t get a day off because the request didn’t come from a browser.)

Two more design choices in that diagram are doing quiet work:

Decoupling via a queue. The “Poll now” button doesn’t shell out to Python from C# — synchronous cross-runtime calls would couple their uptime and latency together. Instead it drops a row in a poll_requests queue that the worker drains on its own schedule. That’s the classic move: turn a temporal coupling into an asynchronous one. The API can answer “queued” in milliseconds even if the poller is mid-run or restarting; neither runtime blocks on the other.
Failure isolation / blast radius. The worker processes tenants in a loop where one tenant’s failure is caught and can’t abort the others. When you have a shared background job, the design question is always “what’s the blast radius of one bad input?” — and the answer here is “one tenant’s poll, not everyone’s.”
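Both ideas fit in a dozen lines. The sketch below uses an in-memory deque in place of the poll_requests table; `enqueue_poll`, `drain`, and the result strings are illustrative names rather than the project's API.

```python
from collections import deque
from typing import Callable


def enqueue_poll(queue: deque, tenant_id: int) -> str:
    """API side: record the request and return immediately."""
    queue.append(tenant_id)
    return "queued"


def drain(queue: deque, poll_tenant: Callable[[int], None]) -> dict:
    """Worker side: process every queued tenant; one failure never aborts the rest."""
    results = {}
    while queue:
        tenant_id = queue.popleft()
        try:
            poll_tenant(tenant_id)
            results[tenant_id] = "ok"
        except Exception as exc:  # blast radius: this tenant only
            results[tenant_id] = f"failed: {exc}"
    return results
```

The API never waits on `drain`, and a bad source for one tenant shows up as a single failed entry instead of a dead sweep.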

This contract is the part I’d defend hardest in an interview. The temptation when you go polyglot is to have the two halves talk over an internal HTTP API — and then you’ve signed up for versioning, retries, authentication between your own services, and a second contract to keep in sync. That’s accidental coupling dressed up as architecture. Making the database the single source of truth and the schema the contract meant there was exactly one thing to keep honest, with no network partition to reason about between the halves and a schema-shape test on each side to guard against drift. Fewer moving parts that can disagree is, almost always, the better system.

Was the .NET decision worth it?

Yes — but I want to be honest about what “yes” cost.

If the goal had stayed “help me remember my applications,” choosing .NET would have been malpractice. The Markdown-files version was better at that job: zero infrastructure, grep-able, git-versioned, done in an afternoon. The whole multi-tenant edifice would have been a monument to a problem I didn’t have.

The decision was right because the goal changed. The instant the target became “many people, one deployment, self-hostable, don’t leak anyone’s data,” I needed a real security boundary, real sessions, real concurrency control, and a schema that could evolve under live data. Those are precisely the problems .NET’s spine is shaped to hold, and leaning on it — instead of hand-assembling the same guards out of Python libraries — is why the tenancy story is one middleware and a scoped repo instead of a discipline I have to re-prove in every endpoint.

The lesson I’m taking with me is a systems-design one: the scope explosion wasn’t caused by the technology choice — it was revealed by it. The multi-tenant complexity was an inherent property of the problem the moment I went from one user to many; pure Python wouldn’t have made it smaller, just quieter and easier to get subtly wrong. Picking the stack that makes the invariant loud forced me to actually build the invariant. Going from a memory aid to a real application was never going to be a packaging task. It was a different application wearing the old one’s name.

And that, I think, is the actual content of “be good at systems design” — the thing I keep hearing is the gate to senior and staff roles. It isn’t memorizing patterns. It’s the habit of, at every fork, asking the questions this rewrite kept forcing on me: What invariant must hold, everywhere, forever? Where does the dangerous decision get funneled so it’s auditable in one place? Who owns this data, and is that ownership unambiguous? What’s coupled to what, and can I trade a synchronous dependency for an asynchronous one? What’s the blast radius when this fails — because it will fail? The stack I picked didn’t make me ask those questions. It just made it impossible to skip them. Honestly, that’s why I built the thing the hard way: a side project is the cheapest place there is to practice the expensive kind of thinking.

The part I deliberately left out: a local frontier model

There’s a feature I cut from v1 on purpose — an AI engine for drafting tailored cover letters and application materials. It was the heaviest thing to build and the scariest data to handle, so the disciplined call was to ship the tracker without it and leave a clean seam for later. (Knowing what not to build yet is its own systems-design skill; scope is a design decision.) But I keep turning over what that seam could become, and the answer that excites me isn’t “call an API.” It’s: run a powerful frontier model locally, right next to the data.

Think about the shape of this app. It’s self-hosted, no telemetry, your data stays on your box — that’s the whole pitch. The instant you bolt on a hosted LLM, you’ve quietly broken that promise: now your entire job search, your résumé, every note you wrote about a recruiter, is flowing to a third party’s servers to be logged and maybe trained on. A local model is the only addition that doesn’t violate the thing that makes ApplyTrack worth self-hosting in the first place. Data locality isn’t just a performance idea — here it’s the privacy architecture. The model comes to the data; the data never leaves.

And once a capable model is sitting in the same trust boundary as the database, a lot of the app’s rough edges turn into soft ones:

Discovery gets a brain. Today the poller scores leads with keyword matching — blunt, full of false positives. A local model does semantic relevance: it actually reads the posting against your history and your real preferences, not just your keyword list. It can extract clean structured fields out of the messy free-text every board formats differently, and explain why a lead scored the way it did.
The materials engine, finally. Draft a cover letter grounded in the specific posting and your own past applications — retrieval over your history, not a generic template. Tailor a résumé summary per role. All of it on hardware you control, with your writing never leaving the building.
A research assistant over your own funnel. “What stage is everything in, and what’s gone cold?” “Summarize every interview note for this company before my onsite.” “Which of these three offers fits what I said I wanted?” That’s RAG over data the app already owns.

Here’s the systems-design payoff, and the reason I’m dwelling on it: the architecture I already built is exactly the shape that makes this a clean add. The polyglot, schema-as-contract design means a model runtime is just another worker against the same database, behind the same tenant choke-point, owning its own writes — no different in kind from the Python poller. I wouldn’t be retrofitting AI into a monolith; I’d be adding a third lane to a system that was already designed as decoupled lanes sharing one contract. The decoupling I did for boring reasons (keep the Python fetchers, don’t couple the runtimes) turns out to be the thing that makes the interesting future cheap. Good boundaries pay you back in directions you didn’t predict when you drew them — which is, maybe, the whole argument for caring about them.

Where this falls over at enterprise scale

I want to close on the most senior-engineer move there is: being honest about the limits of your own design. ApplyTrack is built for self-hosting and small multi-tenant deployments — dozens, maybe low hundreds of users on one box. The architecture is deliberately right-sized for that. If someone asked me to take it to “enterprise” — thousands of orgs, millions of applications, an SLA — here’s where I already know it would break, roughly in the order the cracks would show:

The single Postgres is the first ceiling. Every layer leans on one database: CRUD, the session lookup on every authenticated request, the poll queue, and the poller’s writes all land on the same instance. That’s a single point of failure and a single point of contention. The first work is the boring, load-bearing stuff: a primary with read replicas, a connection pooler (PgBouncer) so thousands of clients don’t exhaust backends, and moving the hot-path session check off Postgres into a cache like Redis. Server-side sessions bought me instant revocation; at scale that convenience becomes a per-request database read I’d have to buy back with a cache + invalidation.
Pooled (shared-schema) tenancy hits a wall — both technically and on compliance. Row-level tenant_id filtering is the right call for hundreds of tenants, but it has a noisy-neighbor problem (one tenant’s heavy queries degrade everyone on the shared tables) and a trust problem (enterprise buyers, SOC 2, data-residency rules often demand physical isolation, not “we promise the WHERE clause is always there”). The escape path is a tenancy spectrum: Postgres row-level security as belt-and-suspenders, then schema-per-tenant, then database-per-tenant or sharding by tenant for the big customers. None of that is free — it turns one migration into N, and the choke-point has to learn to route.
The poller doesn’t scale horizontally — it’s one cron worker in a for loop. Fetch-once-per-run is a clever rate-limit dodge at small scale, but the scoring pass is O(tenants × listings) on a single process, and a slow source stalls the whole sweep. Enterprise needs the work fanned across many workers (tenant sharding with leader election or partition assignment, so two workers never double-process a tenant), a real broker (SQS/Redis/Kafka) instead of a database-polling poll_requests table, per-source rate-limit budgets, and backpressure so a board outage doesn’t cascade. Scraping public boards at all is itself brittle at volume — IP bans, CAPTCHAs, ToS exposure — so at scale you’d push toward official APIs and a shared caching layer in front of every source.
The schema-as-contract that saved me small becomes a coupling tax large. Two runtimes sharing one database is the perfect amount of architecture for two components; the shared-database integration pattern is a known anti-pattern the moment you have several teams and services, because it couples their deploys and their schema evolution. Past a certain size you have to break the shared DB into service-owned stores that talk over events or APIs — which reintroduces exactly the versioning-and-contracts cost I was so happy to avoid. That’s not a mistake in the current design; it’s the trade-off having an expiry date.
The identity model assumes one human per tenant. Today tenant_id == user.id — fine for individuals self-hosting. Enterprise means organizations: many users per tenant, teams, roles and RBAC, SSO/SAML/OIDC and SCIM provisioning instead of magic links, and an audit log of who-did-what for compliance. The schema was named to future-proof orgs, but actually building them is a real project on its own.
And everything operational I got to ignore at one box. Single-node Docker Compose has no HA, no rolling deploys, no autoscaling — that’s Kubernetes or a managed platform, multiple stateless API replicas behind a load balancer (easy, since sessions live in the data tier) and a poller that’s safe to run more than one of (hard). Plus the things multi-tenant SaaS lives or dies on: per-tenant metering and quotas so one tenant can’t starve the poller, distributed tracing and structured logs, alerting, and an unbounded-growth story (partitioning and archival for applications and the seen ledger) before the tables get slow.
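Of the escape hatches above, partition assignment is the easiest to picture. A minimal sketch, assuming a fixed worker count and string tenant ids; `worker_for_tenant` and `tenants_for_worker` are hypothetical helpers, not part of ApplyTrack.

```python
import hashlib


def worker_for_tenant(tenant_id: str, n_workers: int) -> int:
    """Stable tenant -> worker mapping, identical on every process and restart.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used to get the same answer everywhere.
    """
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_workers


def tenants_for_worker(tenants, worker_index: int, n_workers: int) -> list:
    """The slice of tenants this worker owns; slices never overlap."""
    return [t for t in tenants if worker_for_tenant(t, n_workers) == worker_index]
```

Plain modulo hashing reshuffles most tenants whenever the worker count changes, which is where consistent hashing or a leader-managed assignment table would come in.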

The through-line: almost none of these are bugs. They’re trade-offs I made on purpose for the scale I’m actually at, each with a known escape hatch for the day the scale changes. That’s the real deliverable of systems-design thinking — not a system that’s ready for a million users, but one where you can name, precisely, what would have to change to get there, and why you correctly chose not to build it yet.

ApplyTrack is open-source (Apache-2.0) and self-hostable — github.com/CryptoJones/OSApplyTrack. One docker compose up brings up the database, the API, and the poller. Your data is yours: one-click Markdown export, one-call account deletion, no telemetry, no SaaS.

LoRA, For Real: The Tech Stack Behind the Videos

2026-06-07T00:00:00+00:00

Huge thanks to Ronin 48 and Thomas Wenzke for the initial push to do this.

I made two videos about LoRA. One is 90 seconds for people who have never written a line of code; the other is a 13-minute walk through building adapters for frontier models. This post is neither of those. The videos use analogies — “a tiny notepad clipped onto a frozen brain.” This post is the actual stack: the config files, the quantization scheme, the data pipeline, and the GPU bill.

The two videos

The 90-second version — “LoRA: WTF Is It?” — zero CS background required: youtube.com/watch?v=XniGimn0Eng
The full explainer — “Building LoRA Adapters for Frontier Models” — the deep dive: youtube.com/watch?v=2UOfcOxyAfA

Watch those for the why. Read this for the how. The code that backs all of it is the SELMA project — an Apache-2.0 legal-reasoning model fine-tuned with QLoRA — and it’s public: github.com/CryptoJones/SELMA.

What we’re actually building

Full fine-tuning of a 70B-parameter model means updating all 70 billion weights. That needs the model, its gradients, and optimizer state resident in VRAM at once, comfortably 1TB+ across a node of A100s. LoRA sidesteps that: freeze every original weight, and inject a pair of small low-rank matrices (A and B) alongside the layers you want to adapt. Only A and B train. The adapted weight is W + ΔW, where the update ΔW is the low-rank product (B·A) · (alpha/r), r is the rank, and alpha is a scaling term.
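A tiny worked example of that formula, in dependency-free Python so the arithmetic is easy to follow. For a weight of shape d_out × d_in, A is r × d_in and B is d_out × r; `matmul` and `lora_merge` are illustrative helpers, not functions from any LoRA library.

```python
def matmul(x, y):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_merge(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A) for nested-list matrices."""
    scale = alpha / r
    BA = matmul(B, A)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, BA)
    ]


W = [[1, 0], [0, 1]]  # frozen 2x2 base weight
A = [[1, 2]]          # r x d_in with r = 1
B = [[1], [0]]        # d_out x r
merged = lora_merge(W, A, B, alpha=2, r=1)  # [[3.0, 4.0], [0.0, 1.0]]
```

In the usual LoRA initialization B starts at zero, so B·A is zero and the adapted model begins exactly equal to the frozen base.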

QLoRA goes one step further: it loads the frozen base model in 4-bit so it fits in a fraction of the memory, then trains the LoRA matrices in higher precision on top. That’s how a 70B model fine-tunes on a single 80GB card instead of a cluster.

The base model

Base:        meta-llama/Llama-3.3-70B-Instruct
Method:      QLoRA (4-bit NF4 + Low-Rank Adaptation)
Context:     128K tokens (native)
Quantization: NF4 double-quant via bitsandbytes, bf16 compute

Llama 3.3 70B is gated, so the pipeline assumes you’ve accepted the license on HuggingFace and authenticated (huggingface-cli login, or export HF_TOKEN=... before the run). The full rationale for picking this base — license, context window, provenance — is in the repo’s docs/MODEL_SELECTION.md.

The QLoRA config

This is the heart of it. From configs/training_config.yaml:

quantization:
  load_in_4bit: true
  bnb_4bit_compute_dtype: "bfloat16"
  bnb_4bit_quant_type: "nf4"
  bnb_4bit_use_double_quant: true

lora:
  r: 64
  lora_alpha: 128
  lora_dropout: 0.05
  target_modules: [q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj]

training:
  num_train_epochs: 3
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 8      # effective batch = 16
  learning_rate: 2.0e-4
  lr_scheduler_type: "cosine"
  warmup_ratio: 0.05
  max_seq_length: 4096
  gradient_checkpointing: true

A few decisions worth calling out:

r: 64, alpha: 128. A 2:1 alpha-to-rank ratio is a common, stable starting point. Higher rank buys more capacity (and more trainable params) at the cost of memory and overfitting risk.
Target the whole transformer block, not just attention. Adapting the MLP projections (gate/up/down_proj) in addition to the attention projections (q/k/v/o_proj) consistently helps on knowledge-heavy fine-tunes. Attention-only LoRA is cheaper but leaves capability on the table.
gradient_checkpointing: true trades compute for memory — it recomputes activations on the backward pass instead of storing them. On a 70B QLoRA run it’s the difference between fitting and OOM.
group_by_length, which is not shown in the excerpt above, batches similar-length sequences together to cut padding waste.
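To make the rank trade-off concrete, here is a small sketch of how LoRA's trainable-parameter count scales. It relies on the standard LoRA formulation (two low-rank matrices of shapes r × d_in and d_out × r, with the update scaled by alpha / r). The 4096 × 4096 layer size is an illustrative assumption, not a dimension taken from the Llama configs.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    return r * (d_in + d_out)  # A is (r x d_in), B is (d_out x r)

def lora_scale(alpha: int, r: int) -> float:
    """Factor applied to the low-rank update before it joins the frozen output."""
    return alpha / r

D = 4096  # hypothetical square projection, for illustration only
full = D * D
for r in (16, 32, 64):
    added = lora_params(D, D, r)
    print(f"r={r}: {added:,} trainable params ({added / full:.2%} of the frozen layer)")

print("scale for r=64, alpha=128:", lora_scale(128, 64))
```

Adapter size grows linearly with rank, while a fixed 2:1 alpha-to-rank ratio keeps the update scale at 2.0 whatever rank you pick.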

The version you can actually run at home

The 70B run needs an A100/H100. Most people don’t have one, so there’s a second config — configs/training_config_8b.yaml — that targets Llama 3.1 8B and fits on a single 24GB consumer card:

model:
  name: "meta-llama/Llama-3.1-8B-Instruct"
  attn_implementation: "flash_attention_2"
lora:
  r: 32
  lora_alpha: 64
training:
  max_seq_length: 2048      # reduced from 4096 to fit in VRAM

Rough wall-clock for the 8B run: ~2–3 hours on an RTX 4090, ~6–8 hours on a free Colab T4 with batch_size=1. This is the one to start with — iterate on 8B, then scale the recipe to 70B once it works.

Multi-state architecture: many small adapters, one frozen giant

This is where LoRA stops being a memory trick and starts being an architecture. Instead of one model that tries to know all 50 states’ criminal codes (and constantly confuses Georgia’s assault statute with California’s), SELMA trains one adapter per jurisdiction — 50 states plus a federal baseline — all sharing the same frozen Llama base.

models/
├── federal/      # 18 U.S.C. — baseline for every model
├── georgia/      # federal + O.C.G.A. Title 16
├── california/   # federal + Cal. Penal Code
└── ...           # one directory per state

The payoff is exactly the “swap the notepad” idea from the video, made concrete:

Update independence: amend Georgia’s code, retrain only the Georgia adapter. Every other adapter is untouched.
Deployment flexibility: an agency ships only the adapter for its jurisdiction; the multi-gigabyte base is shared.
Less cross-contamination: a narrow adapter hallucinates less across jurisdictions than one overloaded generalist.

Each adapter is a few hundred MB against a ~140GB base. Fifty specialists for the storage cost of one model plus change.
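A minimal sketch of how a serving layer might pick the right adapter directory from that layout. The function name, the lowercase-with-underscores naming rule, and the error behavior are all assumptions for illustration; the repo's actual loader may work differently.

```python
from pathlib import Path

def resolve_adapter(jurisdiction: str, root: Path = Path("models")) -> Path:
    """Return the adapter directory for a jurisdiction, e.g. 'georgia' or 'federal'."""
    # Assumed naming rule: lowercase, spaces replaced by underscores.
    name = jurisdiction.strip().lower().replace(" ", "_")
    path = root / name
    if not path.is_dir():
        raise FileNotFoundError(f"no adapter directory for {jurisdiction!r} under {root}")
    return path

# Usage (needs a models/ tree on disk):
#   adapter_dir = resolve_adapter("Georgia")
```

The returned path is what you would hand to whatever adapter loader you use; the frozen base is loaded once and never changes.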

The data pipeline

Adapters are only as good as what you feed them. SELMA’s training mix:

Source	What it is	Size
U.S. Code Title 18	Federal criminal statutes (USLM XML)	~2,700 sections
State criminal codes	e.g. O.C.G.A. Title 16	~500 sections each
ALEA US Courts	Federal filings with NOS codes	491K examples
LegalBench	Legal-reasoning benchmark tasks	91.8K examples
CaseHOLD	Holding classification	585K examples
Synthetic	Generated incident→statute mappings	~50K examples

The flow is three scripts:

# 1. fetch raw sources (statutes auto-discovered from the current release)
python scripts/data_collection/fetch_federal_statutes.py
python scripts/data_collection/generate_synthetic.py   # ~50K incident→charge pairs

# 2. combine + split into instruction-tuning JSONL
python scripts/training/prepare_dataset.py
#    -> data/processed/train.jsonl + eval.jsonl

# 3. train
python scripts/training/train_qlora.py --config configs/training_config.yaml

The synthetic step is the quiet hero: hand-written statute text teaches the model what the law says, but the ~50K generated incident-to-statute examples teach it how to apply the law to a fact pattern — which is the actual task.
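For a sense of what step 2 produces, here is a hedged sketch of turning incident-to-statute pairs into instruction-tuning JSONL with a deterministic train/eval split. The field names, the prompt wording, and the 90/10 split are assumptions, not the format prepare_dataset.py actually emits, and the data is placeholder text.

```python
import json
import random

def to_record(incident: str, statute: str) -> dict:
    # Field names are hypothetical; match whatever the real pipeline expects.
    return {
        "instruction": "Identify the statute that applies to this incident.",
        "input": incident,
        "output": statute,
    }

def split_records(records: list, eval_fraction: float = 0.1, seed: int = 0):
    """Shuffle deterministically, then cut off an eval slice."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]

def to_jsonl(records: list) -> str:
    """One JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

pairs = [(f"placeholder incident {i}", f"placeholder statute {i}") for i in range(20)]
train, eval_ = split_records([to_record(i, s) for i, s in pairs])
print(len(train), "train /", len(eval_), "eval")
print(to_jsonl(train[:1]))
```

Seeding the shuffle keeps the eval set stable between runs, so eval numbers from different training runs stay comparable.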

The training run, and the merge gotcha

On an A100-80GB, the 70B QLoRA run lands at roughly 72GB of VRAM and takes 6–10 hours. The step that bites people isn’t training; it’s the merge:

python scripts/training/merge_adapter.py --config configs/model_config.yaml

Merging the LoRA weights back into the base produces a standalone model you can serve without PEFT — but it loads the full 70B in fp16 on CPU, which wants ~140GB of system RAM. GPU pods don’t have that. The fix is to skip the merge on the training box (train.sh --skip-merge), upload the adapter to HuggingFace, and merge later on a high-memory CPU instance — or just serve the adapter unmerged, which is the whole point of LoRA anyway.

Deployment ends up trivial: the adapter ships to HuggingFace (Ronin48LLC/selma-lora-adapter), a GGUF export feeds llama.cpp / LM Studio / Ollama, and ollama run serves it with no Python at all.

Bonus: how the videos themselves were built

Same spirit — small, scriptable, no proprietary stack. The whole pipeline is plain Python and CLI tools:

Deck — generated with python-pptx. A house-style palette (WCAG-AAA contrast), Calibri, and a PIL/Noto text-fitter that measures each line so titles never overflow after the font substitution that happens during render.
Render — soffice --headless --convert-to pdf, then pdftoppm -scale-to-x 1920 -scale-to-y 1080 -png to get one 1080p frame per slide.
Narration — a cloned voice via ElevenLabs TTS, one MP3 per slide, driven by a script split on [SLIDE N] markers. (Rule I learned the hard way: render one or two sample slides and approve the voice before paying for the full run.)
Assembly — ffmpeg stitches each still to its narration with a short lead/tail of silence, then concatenates to a single 1920×1080 H.264/AAC file.
Captions — faster-whisper with word timestamps does forced-ish alignment: the caption wording stays exactly the script’s, but the timing is pulled from the real audio, so subtitles track the voice instead of drifting.
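The caption step boils down to pairing script text with timings taken from the audio. Here is a minimal sketch of the formatting half, assuming the captions are written as SubRip (.srt) and that start and end times in seconds are already available from the alignment pass. The post does not name the caption format, so treat SRT as an assumption.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp style SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def srt_block(index: int, start: float, end: float, text: str) -> str:
    """One numbered caption: wording from the script, timing from the audio."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

# Hypothetical timings for one script line
print(srt_block(1, 0.0, 2.25, "LoRA keeps the giant frozen."))
```

Because the text argument comes straight from the script, recognition errors in the transcription can only affect timing, never wording.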

That’s it. Two YouTube videos and a 70B legal model, and not one piece of the stack is closed-source or unavailable to you. The adapters are small, the recipe is in a YAML file, and the giant stays frozen.

The code: github.com/CryptoJones/SELMA (Apache-2.0). Questions or corrections — find me where I usually am.

Narrating a Novel Locally: A Voice-Cloning Audio Pipeline

2026-06-07T00:00:00+00:00

I wanted a multi-voice audiobook — a full narrator plus distinct character voices — produced entirely on my own hardware, for free, without cloning a single living person. This is the audio pipeline that got me there: zero-shot voice cloning on an 8GB consumer GPU, a casting system, and the one trick that matters more than any of it — keeping the read from jumping around.

The commercial route priced itself out immediately. A cloud TTS read of an entire novel would have run tens of thousands of credits and months of wall-clock on a metered plan. So the whole thing runs locally on F5-TTS, a zero-shot voice-cloning model: give it ~10 seconds of reference audio plus the transcript of that snippet, and it speaks new text in that voice. No fine-tuning, no training run — it clones from the reference at inference time on an 8GB card.

from f5_tts.api import F5TTS
f5 = F5TTS()
f5.infer(
    ref_file=ref_wav, ref_text=ref_transcript,   # ~10s seed + its exact transcript
    gen_text=line,                                # the sentence to speak
    file_wave=out_path,
    remove_silence=True,
    speed=1.0, seed=42,                           # see "consistency" below
)

Before any tech: no cloning living, identifiable people. Not actors, not celebrities, nobody who didn’t agree to it. Every voice in the production comes from one of two places:

Public-domain audio seeds — LibriVox / archive.org recordings old enough or licensed to be free to use.
My own voice, which I own.

That constraint isn’t a footnote, it drove the whole casting process. The dead-sounding “ROM construct” character, for instance, started life as a public-domain LibriVox chapter read by a volunteer, then got machined into something inhuman with DSP (more on that below) — not lifted from any film performance. If you take one thing from this post, let it be that you can do expressive, characterful voice work without scraping a real person’s identity.

Consistency beats per-voice perfection

Here’s the counterintuitive lesson. The thing that makes AI narration sound amateur isn’t a slightly-off character voice — it’s the audio lurching between clips: timbre and pace drifting from one sentence to the next. Zero-shot TTS re-estimates duration and prosody on every independent generation, so back-to-back clips wander in speed and tone even from the same reference.

Fixing that took a four-part “de-jitter” recipe, and it’s worth more than any amount of per-character polish:

Pin the seed. A fixed RNG seed (seed=42) gives every clip the same latent initialization, which holds timbre steady across thousands of generations.
Pin the speed. speed=1.0 everywhere. No per-line pace, ever. (I tried slowing a drugged-out character down to 0.85 — it read as a jump, so it got reverted.)
One loudness target on every clip. Every single cue ends its filter chain with the identical loudnorm:
```
loudnorm=I=-16:TP=-1.5:LRA=11
```
No per-voice loudness shaping. The whole production sits at one level.
Short crossfades at every seam. Clips are joined with a tiny triangular crossfade so the boundaries don’t click or lurch:
```
acrossfade=d=0.12:c1=tri:c2=tri
```
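One side effect of crossfading is worth knowing if the audio has to line up with video: acrossfade overlaps the tail of one clip with the head of the next, so every seam shortens the track by the fade length. A small sketch of that arithmetic, assuming every clip is longer than the fade (the function name is illustrative):

```python
def stitched_duration(clip_seconds: list[float], crossfade: float = 0.12) -> float:
    """Length of a track built by crossfading clips end to end."""
    if not clip_seconds:
        return 0.0
    seams = len(clip_seconds) - 1
    return sum(clip_seconds) - seams * crossfade

# Three hypothetical clips lose 2 x 0.12 s to their two seams
print(stitched_duration([2.0, 3.0, 4.0]))
```

Over a chapter with hundreds of cues the lost time adds up to tens of seconds, so measure the stitched track rather than summing clip lengths.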

A useful gotcha: a “casting sheet” that slams five voices together back-to-back will sound jumpier than the real thing, because it’s nothing but seams. A real chapter is ~80% one narrator, so it flows. Judge consistency on real material, not the stress test.

A second gotcha worth saving you the rabbit hole: moving the same model to the cloud does not fix drift. It’s the same model — hosted F5 only offloads your GPU, it doesn’t change the prosody behavior. Don’t chase consistency by changing where it runs.

Casting: one source of truth

Every speaker — the narrator, each principal, and each one-line walk-on — maps to a voice in a single table that both the renderer and the stitcher read. A row is just: tag → (seed, speed, DSP chain).

Principals get a fixed public-domain seed, chosen once and reused everywhere so a character sounds the same in chapter 2 and chapter 20.
Distinctness comes from DSP, not from hunting down 30 seeds. Two characters can share a base seed and still sound clearly different if each gets its own internally uniform processing. Which leads to the timbre chains…
Walk-on “drop” characters (a guard, a clerk — one or two lines) get a deterministically random voice: hash the character’s name, pick a (seed, pitch) pair from a gender-appropriate pool, and guarantee no two walk-ons collide and that none lands too close in pitch to a principal on the same seed. Everybody’s different, nobody had to be hand-cast.
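The walk-on rule can be sketched in a few lines. This is an illustration of the idea, not the production casting code: the pool contents, the 1-semitone minimum gap, and the linear-probing fallback are assumptions. It uses hashlib rather than Python's built-in hash(), because the built-in is salted per process and would recast everyone on each run.

```python
import hashlib

def pick_walk_on(name, pool, taken, principals=None, min_gap=1.0):
    """Pick a (seed, pitch) pair for a walk-on, stable across runs.

    pool:       list of (seed, pitch_semitones) candidates
    taken:      set of pairs already assigned; updated in place
    principals: optional {seed: pitch_semitones} for voices to stay clear of
    """
    principals = principals or {}
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(pool)
    # Probe forward from the hashed slot until a usable voice turns up.
    for offset in range(len(pool)):
        seed, pitch = pool[(start + offset) % len(pool)]
        too_close = seed in principals and abs(pitch - principals[seed]) < min_gap
        if (seed, pitch) not in taken and not too_close:
            taken.add((seed, pitch))
            return seed, pitch
    raise ValueError(f"no free voice left for {name!r}")

# Hypothetical pool: two seeds, two pitch offsets each
pool = [("seed_a", -2.0), ("seed_a", 0.0), ("seed_b", 1.0), ("seed_b", 3.0)]
taken = set()
print(pick_walk_on("Guard", pool, taken), pick_walk_on("Clerk", pool, taken))
```

The same name always hashes to the same starting slot, so the result stays reproducible as long as characters are cast in the same order.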

Timbre is an ffmpeg chain

Once a voice’s identity lives in its seed, its character is just a deterministic audio-filter chain — independently dialable, and applied identically to every line that character speaks (consistency again). Pitch shifts use the resample trick (asetrate to move pitch+formants, atempo to restore the original duration), so they sound like a differently-sized person rather than a chipmunk artifact:

# deepen a voice ~1.25 semitones, add body
asetrate=24000*0.93,aresample=24000,atempo=1.0753,bass=g=2:f=120, ... ,loudnorm=...

# an AI/comms character: telephone band + a touch of bitcrush
highpass=f=300,lowpass=f=3400,acompressor=...:ratio=4,acrusher=bits=8:mode=log:mix=0.15,loudnorm=...

# the "dead ROM construct": pitch-down, flat affect, digital grit, hollow comb echo
asetrate=24000*0.95,aresample=24000,atempo=1.0526,acompressor=...:ratio=4,acrusher=bits=7:mode=log:mix=0.35,aecho=0.85:0.6:18:0.35,highpass=f=80,loudnorm=...

Compressor flattens emotional affect, acrusher adds digital grit, an 18ms aecho gives a hollow comb. Every chain ends in the same loudnorm — that’s the non-negotiable.
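The resample trick is mechanical enough to generate. The sketch below builds a chain string from a pitch factor and always appends the shared loudnorm; the helper names are made up, and the example chain is a shortened illustration rather than one of the full production chains.

```python
import math

LOUDNORM = "loudnorm=I=-16:TP=-1.5:LRA=11"  # the shared final stage

def pitch_filters(factor: float, rate: int = 24000) -> list[str]:
    """Shift pitch and formants by `factor`, then restore the original duration."""
    return [
        f"asetrate={rate}*{factor}",  # play slower or faster, moving pitch
        f"aresample={rate}",          # back to the working sample rate
        f"atempo={1 / factor:.4f}",   # undo the duration change
    ]

def semitones(factor: float) -> float:
    """Pitch shift in semitones for a given rate factor."""
    return 12 * math.log2(factor)

def build_chain(*filters: str) -> str:
    """Join per-voice filters and always finish with the shared loudnorm."""
    return ",".join([*filters, LOUDNORM])

print(build_chain(*pitch_filters(0.93), "bass=g=2:f=120"))
print(f"{semitones(0.93):.2f} semitones")
```

Routing every chain through one builder makes the final loudnorm impossible to forget, which matters more than any individual filter.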

The script: verbatim text, then heuristic attribution

The audio is only as good as the script feeding it, and there were two rules:

Text is verbatim from the source file — never reconstructed from memory. An LLM will happily “remember” a famous novel and paraphrase it. For a faithful reading that’s poison. Every line is sliced out of the actual source text; the model never gets to improvise the words.
Attribution is heuristic. To turn flat prose into tagged cues ([NARRATION], [CASE], [MOLLY], …) the tagger matches balanced quote pairs and then guesses the speaker from the surrounding text: self-introductions (“my name is…”), speech-verb patterns (name + said, said + name, pronoun + verb resolved by gender), action-beat subjects, and a running 2–3 party “who’s in this scene” tracker that resets after long stretches of narration. It lands around 80% right on the first pass — good enough to review and nudge, and narration (over half the lines) is 100% correct because it’s just the un-quoted remainder.

Prose gets grouped a few sentences per cue; dialogue is one cue per quote. Quotes are stripped (they’re spoken, not read aloud as “quote… unquote”).
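A heavily simplified sketch of the quote-then-attribute idea. It handles only the “said Name” and “Name said” patterns within a single paragraph and leaves everything else as UNKNOWN; the real tagger's scene tracker, pronoun resolution, and cue grouping are not reproduced here. The sample sentence is made up, not quoted from any source text.

```python
import re

QUOTE = re.compile(r"“([^”]*)”")  # balanced curly-quote pairs
SPEECH = re.compile(r"\b(?:said\s+([A-Z][a-z]+)|([A-Z][a-z]+)\s+said)\b")

def tag_paragraph(text: str) -> list[tuple[str, str]]:
    """Split a paragraph into (tag, text) cues with quote marks stripped."""
    # Guess the speaker from the prose outside the quotes.
    match = SPEECH.search(QUOTE.sub(" ", text))
    speaker = (match.group(1) or match.group(2)).upper() if match else "UNKNOWN"
    cues, pos = [], 0
    for quote in QUOTE.finditer(text):
        narration = text[pos:quote.start()].strip()
        if narration:
            cues.append(("NARRATION", narration))
        cues.append((speaker, quote.group(1).strip()))
        pos = quote.end()
    tail = text[pos:].strip()
    if tail:
        cues.append(("NARRATION", tail))
    return cues

for cue in tag_paragraph("The door opened. “Get up,” said Molly. “We are leaving.”"):
    print(cue)
```

Every cue is a slice of the input string, so the words stay verbatim; only the tags are guessed.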

Stitching it together

Per chapter the pipeline is:

Render every cue to its own wav (output//NNN_TAG.wav). The whole loop is resumable — it skips any clip that already exists, so a multi-hour run on a small GPU can be killed and restarted without losing work. The model loads once and stays resident across thousands of cues.
Process each cue with its character’s filter chain (ending in the shared loudnorm).
Crossfade the processed cues into one continuous chapter track with the 0.12s triangular fade at every seam.
Mux the chapter audio under its video and move on.
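The resumable part of step one is just a skip-if-exists check. Here is a sketch, assuming cues arrive as (tag, text) pairs numbered from 1 and that `synthesize` is a caller-supplied function wrapping the already-loaded TTS model. Both are assumptions about render_all.py, and the write-then-rename step is an addition so that a clip cut off mid-write is never mistaken for a finished one.

```python
from pathlib import Path
from typing import Callable, Iterable

def render_chapter(
    cues: Iterable[tuple[str, str]],
    out_dir: Path,
    synthesize: Callable[[str, str, Path], None],
) -> int:
    """Render each (tag, text) cue to NNN_TAG.wav, skipping clips that exist."""
    out_dir.mkdir(parents=True, exist_ok=True)
    rendered = 0
    for index, (tag, text) in enumerate(cues, start=1):
        out_path = out_dir / f"{index:03d}_{tag}.wav"
        if out_path.exists():
            continue  # finished in an earlier run
        tmp_path = out_path.with_name(out_path.stem + ".partial.wav")
        synthesize(tag, text, tmp_path)  # model stays loaded in the caller
        tmp_path.replace(out_path)       # only complete clips get the final name
        rendered += 1
    return rendered
```

Killing the run loses at most the clip in flight; the next run picks up at the first missing file.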

# resumable render: existing clips are skipped, model stays loaded
python3 render_all.py            # all chapters
python3 render_all.py ch05 ch06  # or just a few

# per-chapter stitch: per-voice DSP + uniform loudnorm + crossfades
bash stitch_chapter.sh ch05

Everything is plain Python and ffmpeg. No proprietary stack, no per-minute meter, no nonconsensual voice in the building. The visual side — the animated ASCII-art backgrounds and the lo-fi score each chapter plays over — is its own pure-Python engine I’ve open-sourced separately as The Flatline Sessions.

The takeaways, if you’re building something similar: clone only what you have the right to clone; pin your seed, speed, and loudness so the read doesn’t lurch; let DSP do the work of making voices distinct; and keep your script verbatim from a real source instead of trusting a model’s memory.

For Sale: A 12 GB GPU That Runs Local LLMs at ~60 tok/s

2026-06-05T00:00:00+00:00

GIGABYTE RTX 3060 EAGLE OC 12 GB — $350 shipped anywhere in CONUS.

Specs via TechPowerUp’s GPU database.

I’m selling a graphics card. The quick pitch: it’s a GIGABYTE RTX 3060 EAGLE OC with 12 GB of VRAM — and that VRAM is the whole story for local AI.

Twelve gigs is the sweet spot for running modern small models entirely on your own hardware — no API keys, no per-token billing, nothing leaving the box:

8B-class LLMs at ~60 tok/s. Qwen3-8B fits comfortably and generates at around 60 tokens/second — fast enough to feel interactive.
Runs Google’s new Gemma 4 with function calling, so you can wire it into tool-using agents locally.
All in vLLM — batched, production-grade serving on a single card.

For a homelab inference box, a quiet always-on coding assistant, or just learning how local models actually behave, 12 GB of VRAM at this price is hard to beat.

Specs at a glance


Spec	Detail
GPU	NVIDIA GeForce RTX 3060 (GA106)
Memory	12 GB GDDR6
Card	GIGABYTE EAGLE OC, dual-fan
Price	$350 shipped — CONUS only

Interested? Reply to the original post on Mastodon and we’ll sort out the details.

Reach Your Whole Home Lab From Anywhere: Tailscale on a UniFi Dream Machine Pro

2026-06-05T00:00:00+00:00

Turn your UDM Pro into a Tailscale subnet router so every device on your LAN is reachable from anywhere — no port forwarding, no public exposure.

Why do this on the router instead of each device?

You could install Tailscale on every machine you want to reach. But installing it on the UDM Pro as a subnet router is the power move: one install advertises your entire LAN onto your private Tailscale network. After that, any of your Tailscale devices (laptop, phone) can talk to anything on your home network — a NAS, a local LLM server, a Pi, a printer — using its normal LAN IP, from anywhere in the world.

Crucially, nothing is exposed to the public internet. No ports are forwarded. Tailscale builds an encrypted WireGuard tunnel that only your own devices can join, and it punches through NAT automatically — so this works even if your ISP has you behind carrier-grade NAT where port forwarding is impossible.

The catch: Tailscale isn’t official Ubiquiti software. We use the excellent community installer SierraSoftworks/tailscale-unifi, which is widely used and handles the tricky part — surviving reboots and firmware updates. You’re modifying your edge router, so go slow and read each step.

Prerequisites

A UniFi gateway running UniFi OS 2.x or later (UDM Pro, UDM SE, UDR, Cloud Gateway, etc.). Not supported: UniFi OS 1.x, Cloud Key Gen 1, the old USG, or BusyBox-based devices.
A free Tailscale account (personal use is free for up to 100 devices).
Admin access to your UniFi console and a few minutes of SSH time.

Worked example used throughout: my LAN is 192.168.1.0/24 and my gateway (the UDM Pro) is 192.168.1.1. Substitute your own values — find yours under UniFi Network → Settings → Networks, or on a Mac with route -n get default | grep gateway.

Step 1 — Enable SSH on the UDM Pro

Open your UniFi console (e.g. https://192.168.1.1 or unifi.ui.com).
Go to UniFi OS → Settings → System → Advanced.
Toggle SSH on, and set a strong root password (this is the password you’ll use to log in as root).

Step 2 — SSH into the gateway

From your computer:

ssh root@192.168.1.1

Enter the SSH password you just set. You’re now on the gateway.

Step 3 — Install Tailscale

Run the community installer:

curl -sSLq https://raw.githubusercontent.com/SierraSoftworks/tailscale-unifi/main/install.sh | sh

This installs tailscaled as a systemd service under /data/tailscale/, and adds a boot hook in /data/on_boot.d/ so it persists across reboots and firmware upgrades (UniFi OS otherwise wipes non-persistent storage on update).

Verify the install:

tailscale status

Step 4 — Bring it up as a subnet router

This is the line that advertises your LAN. Replace the CIDR with your own subnet.

tailscale up \
  --advertise-routes="192.168.1.0/24" \
  --snat-subnet-routes=false \
  --accept-routes

What the flags do:

Flag	Purpose
`--advertise-routes=...`	Offers your LAN subnet to your tailnet so remote devices can reach it.
`--snat-subnet-routes=false`	Preserves the original client IP for traffic crossing the tunnel (nicer for logging/ACLs).
`--accept-routes`	Lets the UDM also use routes other subnet routers advertise. Optional.

Want the UDM to double as an exit node (route all your internet traffic through home when you’re traveling)? Add --advertise-exit-node.
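If you script this step, it is worth validating the subnet before handing it to the router, since a typo in the CIDR advertises the wrong network. A small sketch that builds the same command from the flags above; the function name is made up, and it only assembles the argument list rather than running anything.

```python
import ipaddress
import shlex

def tailscale_up_args(cidrs, exit_node=False):
    """Build the `tailscale up` argument list for a subnet router."""
    routes = []
    for cidr in cidrs:
        # strict=True rejects host bits, e.g. 192.168.1.1/24 instead of 192.168.1.0/24
        routes.append(str(ipaddress.ip_network(cidr, strict=True)))
    args = [
        "tailscale", "up",
        f"--advertise-routes={','.join(routes)}",  # multiple routes are comma-separated
        "--snat-subnet-routes=false",
        "--accept-routes",
    ]
    if exit_node:
        args.append("--advertise-exit-node")
    return args

print(shlex.join(tailscale_up_args(["192.168.1.0/24"])))
```

Passing a gateway address such as 192.168.1.1/24 raises a ValueError instead of silently advertising something unintended.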

When you run this, Tailscale prints a login URL. Open it in a browser and authenticate to attach the UDM to your tailnet.

Subnet routing needs IP forwarding, which is already enabled by default on UniFi OS gateways — no extra sysctl tweaks required.

Step 5 — Approve the device and routes in the admin console

The advertised routes don’t go live until you approve them:

Open the Tailscale admin console → Machines.
Find your UDM Pro in the list.
Click it → Edit route settings → enable the subnet route(s) you advertised (and the exit node, if you added it).
Strongly recommended: also disable key expiry for the UDM. Otherwise the node key expires (after 180 days by default) and your remote access silently dies until you re-authenticate on the router.

Step 6 — Test it

Install Tailscale on a phone or laptop, sign in with the same account, then — from a coffee shop, on cellular, anywhere — hit a device on your home LAN by its normal IP. For example, a local web service on your LAN:

curl http://192.168.1.50:8080/

If that responds from off-network, you’re done. 🎉
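For a scripted version of the same check, a plain TCP connect is enough to tell whether the route works. A minimal sketch follows; the host and port in the usage comment are the example values from above, and the function name is illustrative. It confirms only that the port accepts connections, not that the service behind it is healthy.

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Usage from an off-network device on your tailnet:
#   can_reach("192.168.1.50", 8080)
```

A timeout rather than an immediate refusal usually means the route is not approved yet, which points back to Step 5.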

Day-2 operations

Update Tailscale:

/data/tailscale/manage.sh update

Restart the service:

systemctl restart tailscaled

Check status / your tailnet IP:

tailscale status
tailscale ip -4

Uninstall cleanly:

/data/tailscale/manage.sh uninstall

Troubleshooting

Routes not working from remote devices? Re-check Step 5 — unapproved subnet routes are the #1 cause. The route must show as enabled in the admin console.
Access died after a few months? Key expiry. Disable it on the UDM node (Step 5) and run tailscale up ... again to re-auth.
Gone after a firmware update? The installer’s boot hook normally handles this; if not, re-run the Step 3 install command — it’s idempotent.
tailscale: command not found after SSH? Use the full path /data/tailscale/tailscale, or log out and back in to pick up the PATH.

Security notes

Tailscale exposes your LAN only to devices in your own tailnet — it is not a public hole in your firewall. But that also means every device you connect from must run Tailscale and be signed into your account.
For finer control (e.g. “my laptop can reach the LLM server but not the whole LAN”), use Tailscale ACLs to scope what each device may reach.
This is third-party software on your edge router. It’s well-maintained and popular, but it’s not Ubiquiti-supported — keep that in mind for a device your whole network depends on.

Installer credit: SierraSoftworks/tailscale-unifi.

Forgot Domain Admin Password Windows Server 2022

2024-08-22T00:00:00+00:00


–Boot to recovery mode with ISO

– Click Next, Then “Repair my computer”

– Click Troubleshoot, then click the Command Prompt option

– run “diskpart”, then “list volume” to list the volumes

–run “select volume 1” (assuming 1 is your main hdd)

–run “assign letter C”

–run “exit” to leave diskpart

–run “C:” to switch to your main hard drive

–run “cd \Windows\System32” in order to change directories

–run “rename Utilman.exe Utilman.bak” to backup the utility

–run “copy cmd.exe utilman.exe”

–run “D:”

–run “bcdedit /set {bootmgr} timeout 15”

–run “bcdedit /set {bootmgr} displaybootmenu yes”

–exit and reboot computer after removing the iso

–select windows in the boot menu then hit enter

–go to login but instead of typing a username and password click the accessibility icon to get a command prompt

–in the command prompt run “net user” to list the accounts

–to change the password on one of the accounts, run “net user username password /domain”, replacing username and password with the account name and the new password

You’re up and running!

2014-03-03T00:00:00+00:00

Next you can update your site name, avatar and other options using the _config.yml file in the root of your repository.

The easiest way to make your first post is to edit this one. Go into /_posts/ and update the Hello World markdown file. For more instructions head over to the Jekyll Now repository on GitHub.