Week of 2025-12-08

Featuring more LLM discovery, a great time in Seattle, and reflecting on identity

December 16, 2025

Late again--I'll blame IKEA for this one. At least I got some of their new Matter gadgets, and I possess a real nightstand for once in my life.

My wife and I are allocating Monday as a group chore day for the foreseeable future, so I'll move my notes day to Sunday to better stay on schedule. Hopefully that will improve my timing--these notes usually take me all day.

Most of my week happened on the two-day trip up to Seattle and back. If you didn't read my notes last week, I was headed up to attend the Letta Social Agents meetup.

Ministral 3 and llamacpp

There was a workshop planned at the meetup, so I wanted to make sure my self-hosted LLM setup was in tiptop shape. Alas--the workshop fell through due to timing, but the prep period was valuable to me nonetheless.

Long story short, I switched from vLLM to llama.cpp, and I deployed Ministral 3 14B Reasoning. I was motivated by two thoughts:

If I'm reading the vLLM logs properly, I'm only serving 1.4 requests simultaneously. To hell with parallelism--let's just make that a round number.
If I'm not serving parallel requests, I don't need a KV cache--so let's find the biggest, newest model I can fit in my 16GB of VRAM.

So far, so good. The Ministral 3 model is noticeably less "anxious" and emoji-happy than Qwen3. It also seems to adhere to the system prompt better, which is better for chat and agent use cases.

I'm also pleasantly surprised by the feature set for llama.cpp. I was peripherally aware of how pervasive its usage is--after all, the first quantizations for any new model are usually .gguf files--but I didn't expect using it to be so nice.

llama.cpp provides an ideal mix of the Ollama and vLLM features. While the basic usage allows you to pull any model off of Hugging Face and deploy it to an OpenAI-compatible endpoint--just as the other two servers do--you can optionally route between multiple models, or even enable the same sort of KV cache used for parallel request handling. In concert with first-class support for GGUF, I don't think I would choose another runtime unless I was utilizing multiple GPUs.

The Meetup(s)

Graham

4mo

Here’s the one blurry PS Vita pic I remembered to take which is of @cameron.pfiffer.org presenting, lmao

An extremely low-quality photo of Cameron delivering a presentation. A TV displays an unreadable slide on the left, there are two guitars on stands in the middle, and Cameron stands behind his laptop on a podium on the right using his hands to articulate his point.

llama.cpp happened at the start of my trip--the rest of it was occupied by the meetup itself and some other socializing. I had an incredible time chatting with folks who are passionate about atproto and social agents--March and ATmosphereConf can't possibly come soon enough.

In the meantime, I hope to channel my anticipation into some more project work. may have convinced me to pick Cistern back up again--we'll see. Before I commit to doing any more project work, I need to work out a system for organizing my existing projects. Too much excitement and too many ideas makes it hard for me to focus.

"What do you do?"

Every time I'm asked that question, time slows to a stop. I panic a little. I never really know how to answer--which is funny, because if I've resolved to continue attending meetups where I'm destined to be asked such a thing, you'd think I would have a formulated answer by now. I was asked it at least twice during my trip.

Truthfully, I didn't see the pattern until I was on the train home from Seattle. I usually come up with my answer in the moment, so I don't think much about the interaction past the initial discomfort.

After finally inspecting that discomfort, I realized that when someone asks what I do, two thoughts occur simultaneously:

Intellectually, I do a lot of things, which I can't possibly describe in a couple seconds. I write websites, API backends, and DevOps pipelines. I'm a CAD and 3D-printing enthusiast; I'm a stay-at-home cat dad and house husband. I'm a walking catalog for EDC bags, JS libraries, and self-hostable services. How can I possibly pick a short statement to summarize it all?

Emotionally, either I do none of those things, or none of them are relevant to the person asking. I don't have an employer, a job title, or a singular project to wield as an identity. Sometimes I say that I'm between jobs or that I'm a contract developer, but I'm not exactly looking for work right now. I just... am. I'm around. I'm interested and actively working on stuff, allegedly.

If a software developer does "stuff" and has no title, deliverables, or achievements to point at, are they even a software developer?

The answer is, of course, yes. I don't need to make the next React or Bluesky to be who I claim to be. It's enough that I'm keeping up with the state of software development, and always starting--though rarely finishing--little projects here and there. It just makes life a little tough when I can't deliver that subtext in the midst of conversation.

At the end of the day, my situation is complicated, and I need to reflect on how best to describe it--there's a succinct identity in there somewhere. Hopefully, I can attend ATmosphereConf with a confident answer.

At any rate, thanks for sticking with me. My notes got a bit more personal than normal this week. As always, I hope you have a great week!

weekly recap

A Pocket for my Weeks

Weekly notes by Graham