Scattered thoughts on LLM tools
- Claude Desktop is a poor app in bizarre and inexplicable ways — stale table view cells, constant reauthentication requests, a markedly worse harness and response rate. I am reminded of Paul's excellent review of GPT-5, in particular this passage:
  > ChatGPT 5 is an incrementally better, higher-quality experience than its predecessors, and it lets you use an LLM in many different ways. But as a piece of software, it's absolutely bananas how busted it is—and I think we've all gone so far down the rabbit hole that we're not seeing it.
- Cursor's roadmap is best understood through what their most prominent ICs are posting on X the everything app, which right now is cloud agent workflows. When everything else is just so, the logical endpoint seems to be infinite, perfectly abstracted sandboxes with previewing, isolation, and very tight feedback loops. But right now the largest gap between where we (and most other organizations) are and that brilliant future is not on the AI side but in all the calls coming from inside the house that make it difficult to sandbox a mature application.
- I am still loving Conductor; it is the interface through which I do the majority of my LLM experimentation, and yet none of my long-term fears about their business prospects as outlined in that essay have been quelled. I, right this very second, find Conductor valuable enough to pay for; I don't think I represent the majority.
- Internal LLM tools are having a bit of a moment in the spotlight. Ramp pushed theirs through a couple of PR cycles (Why we built our background agent) and Stripe is now attempting to do the same (Minions: Stripe's one-shot end-to-end coding agents). Naturally, there are a hundred or so independent projects on GitHub trying to recreate the behavior.
- Everyone has built AI code review; nobody's made it stick. Linear is actively working on code review; GitHub, the pre-eminent market leader, has somehow destroyed their own app to the point where I am prompted, on a twenty-file pull request, to instead view each of those files on its own page to improve performance.
- I am sure the landscape will look different in a few months' time, but not tremendously so. Everyone appears to be coalescing on the same handful of truisms from slightly different vantage points:
  - LLMs that run in sandboxed cloud contexts are more horizontally scalable;
  - LLMs work best when as much extracurricular data as possible is provided;
  - Improving the feedback loop of an LLM is as important as (if not more important than) improving the LLM itself;
  - The chokepoints in high-level processes are not yet being addressed by LLMs in a systemic way.