While I've been impressed with the advancements in software engineering LLMs over the past six months, it's hard for me to say that anything has been truly paradigm-shifting. What I mean by that is that I feel increasingly comfortable offloading larger and more nuanced tasks to Claude or whatever the hottest LLM of the given week is, but it's not really changing how I approach building software or what my day looks like. It simply makes me faster and more productive. This has been a bit of a through line for me since Copilot came out: a lot of the iteration in this space feels like strong, impressive, incremental improvement, but with obvious jank, especially once you escape the lovely sandbox of small, un-nuanced apps and run into accreted technical debt, erstwhile design decisions, or anything that doesn't fit neatly into a corpus trained on Stack Overflow and similar sites.
Outside of the realm of software engineering, I have found a tool that has changed the way I work on a day-to-day basis. I've used it for a grand sum of one month, so this is not exactly a gargantuan sample size — but the tool has held up and done remarkably well in such a way that my day viscerally feels different than it used to. This tool is called Aqua Voice.
Aqua Voice is a speech-to-text transcriber. It's an LLM thing, technically. I say technically because you don't really feel like you're interacting with an LLM. There's no chat paradigm. There's not a lot of losslessness there. It feels the way LLM-powered search feels: here's this thing that has always been possible, with varying degrees of fidelity, speed, and other trade-offs, and now it's just much better in ways that are hard to quantify but easy to internalize.
I spend a lot of my days writing things. I write blog posts (including this one!), long-winded pull request comments, and many, many emails to prospects, active customers, potential team members, and existing team members. This is frankly kind of tiring. I'm old now: my wrists and fingers aren't as bereft of carpal tunnel as they used to be, and I often find myself intentionally limiting my communication because writing out something nuanced, complicated, and perhaps long-winded takes real effort. It's hard to pour everything you have into a five-paragraph email when you know that once you hit send, there are 47 left.
And that brings me to Aqua Voice, which I am (as you might have guessed) using to write this very blog post, though I likely won't use it for others (my prose is verbose enough as-is, and this just makes it worse!). Aqua is just a really, really good transcriber. There's nothing else to say about it from a product or UX perspective, though maybe that'll change. Really, it's just an example of something being so, so good that you start finding places to use it that you never would have imagined. That's a marked difference from my previous experiences with speech-to-text, where things were fine and solid, but you couldn't really trust the output to be character-perfect. You had to do a lot of massaging and editing after the fact, which, at the volume I write, often led me to conclude that it wasn't worth it compared to just typing things out myself or, in very specific cases, sending a voice memo.
I want to give two examples of things that I am just straight up doing differently now compared to a month ago because I have this tool:
- Writing issue descriptions in Linear or whatever tool you use. I'm really bad at this, in no small part because it's hard for me to dump the rat's nest of context and state and known knowns and known unknowns and all of the other flotsam required to write up a good ticket. What ends up happening is that 95% of the time, I write a ticket with a title and no description, and if I end up picking it up a couple of months down the line, hopefully I remember what it's about. If someone else takes it, they have to figure out what it means, and maybe we just cancel it because I can't remember. Now I'm just writing a bunch of ticket descriptions.
- The blank screen problem. Regardless of the artifact, whether it's a changelog entry, a blog post, or a position paper, I often find there's a huge mental tax in just getting the first couple of bullet points out the door. This tax feels heavier when I'm tired, when I have a lot on my mind, or when it's been a tricky day in some respect. And yet with Aqua, I can just start talking. Often the talking is kind of rambly, and I might want to go back and delete a paragraph or two, which is of course fine, because the hard work (going from zero to non-zero words) has already been done.
I think so much of the AI discourse, such as it is, is prescriptivist: you implicitly take a position before evaluating the thing, whether the thing is image generation or automatic code review or whatever (a failure mode I have certainly been guilty of, on both sides!).
But to me, what feels like both the more ideologically pure and more pragmatic thing to do is pretend the AI does not exist. Pretend the LLM is an implementation detail to which you as a user are not privy, and evaluate the tool, whatever it is, on its own merits. This is where Aqua is really interesting to me — it could secretly be backed by nothing LLM-ish at all, and it's just really, really great from a performance perspective. I don't care. It's a super useful tool that makes my day easier, and I am grateful that it exists.
I wrote this essay around three weeks ago and let it sit for no other reason than laziness and a very long to-do list. In the three weeks since, I've continued to use Aqua.
Whether it's hedonic adaptation or some other cause, the experience has kind of gotten worse, in the way that AI experiences sadly tend to. Aqua started rewriting my words in certain composers, like Gmail's. Suddenly, the sub-500-millisecond wake time would drop my first sentence or two. How much of this is due to AI? How much is due to my own shifting expectations for how snappy this should be? How much is just code being code, or network connections being network connections? It is impossible to tell, which is one of the tricky things about this bit of software. I'm still happy enough with the experience to keep using it over any of its competitors, but the tools I use for years on end are the ones that benefit from confidence and muscle memory in their usage. And as I read back this essay, I am reminded that it is very hard for LLM tools to develop any sort of patina at all.