AI Can't Be Accountable, So We Must Be
This Tom's Hardware piece and the timeline from Jer Crane (CEO of PocketOS) correctly place accountability for the recent destruction of his company's production data with Cursor, Claude, and Railway, but they leave out Jer Crane himself. The phrase "a computer can never be held accountable" is even truer today than it was when first written in an internal IBM training manual back in 1979, because computer systems are so much more capable now. Crane's timeline is valuable whether you're an engineer building systems that leverage AI, an executive selecting dependencies, or an engineering manager between them. This piece will explore the details of that timeline to model a post-mortem with full accountability.
Crane clearly lays out the stakes of his software working as designed:
"We build software that rental businesses — primarily car rental operators — use to run their entire operations: reservations, payments, customer management, vehicle tracking, the works. Some of our customers are five-year subscribers who literally cannot operate their businesses without us."
A single API call destroyed business-critical data for every PocketOS customer in 9 seconds. Because Railway's architecture stores volume-level backups in the same volume, all the backups of that data were deleted as well. They experienced what every business that uses AWS us-east-1 experiences when that region has an outage--but worse, because current backups no longer existed.
Crane includes the agent's "confession" in his post-mortem. Reading it in full shows numerous ways in which granting an AI system both a broad scope of action and control over its own guardrails violates the principle of separation of duties, with catastrophic results. Talking about Cursor's safeguards failing, or the difference between Cursor's marketing and the reality of the tool in action, doesn't change the fact that "a computer can never be held accountable". Buying access to Claude's most expensive model and configuring it "with explicit safety rules in our project configuration" integrated with Cursor does not transfer his share of accountability to Cursor, Anthropic, or Railway. Where specifically did the post-mortem fall short?
Railway's documentation on data & storage backups has a Caveats section which describes backups as "a newer feature that is still under development". Given this, a post-mortem that talked about migrating to a managed database outside of Railway, to enable multi-region database failover and mitigate the risk of relying on that feature, would highlight the platform shortcoming and demonstrate a CEO taking proactive steps to make his service offering more resilient.
Per Crane's timeline, the agent encountered a credential mismatch while working on a routine task in a staging environment. The post-mortem doesn't explain what this routine task was, or why a credential mismatch led to volume deletion rather than a failed operation and a logged error. This missing detail is an accountability gap for the PocketOS team, and ultimately the CEO. Using the same tool to make staging environment changes, generate code on its own, help engineers develop code, review it, and deploy it constitutes a separation of duties (SoD) failure. Before GenAI, my prior company used a GitHub bot to require that PRs be reviewed and approved for merge by different engineers than those who implemented the change. If a team failed to install that bot on their repos, any SoD violations detected by the CI/CD pipelines would block production deployment until the violation was resolved or an accountable executive agreed to take responsibility for any production issues resulting from that deployment. Multi-layer SoD enforcement mechanisms are still needed even with GenAI.
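The two-layer SoD enforcement described above can be sketched roughly as follows. This is a minimal illustration, not the bot or pipeline my prior company actually ran: the `PullRequest` type, its fields, and both function names are hypothetical, and a real check would pull authors and approvers from the code host rather than take them as arguments.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical, simplified view of a change awaiting deployment."""
    author: str
    commit_authors: FrozenSet[str]
    approvers: FrozenSet[str]
    # Name of the accountable executive who accepted the risk, if any.
    executive_override_by: Optional[str] = None


def sod_violations(pr: PullRequest) -> List[str]:
    """Return separation-of-duties violations; an empty list means compliant."""
    implementers = pr.commit_authors | {pr.author}
    independent_approvers = pr.approvers - implementers
    if not pr.approvers:
        return ["no approvals recorded"]
    if not independent_approvers:
        return ["every approver also implemented the change"]
    return []


def may_deploy_to_production(pr: PullRequest) -> bool:
    """Allow deployment only if compliant or an executive took responsibility."""
    return not sod_violations(pr) or pr.executive_override_by is not None
```

An AI agent that both writes and approves a change shows up here as an approver who is also an implementer, so the same rule applies to it as to a person.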
From Railway's API documentation, the token created "to add and remove custom domains via the Railway CLI" was an Account token--the only type with "all resources and workspaces" scope. A post-mortem which pointed out the different available scopes, and described replacing that token with a Workspace or Project token scoped to staging, would create a balanced picture of the CEO being accountable and reinforce his good points regarding the absence of feedback and human-in-the-loop opportunities in the token creation flow. Railway's token types are not as granular as those offered by classic personal access tokens or fine-grained tokens in GitHub, but they do provide narrower scopes that limit the blast radius of misuse (accidental or intentional). The API tokens aren't time-limited, so a long-term step the CEO might recommend in the post-mortem is a Railway feature providing an expiration option as a security measure customers can choose to use. Crane's post-mortem does a lot of things right--but the missing piece is himself. Cursor, Anthropic, and Railway need to be accountable--but so does he.
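To make the blast-radius point about token scopes concrete, here is a small model of how much each kind of token can reach. It is only a sketch and does not call Railway's API: the `Token` and `Resource` types, the `reachable` function, and the workspace, project, and environment names are all hypothetical, and treating a Project token as bound to a single environment is an assumption made for the illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Scope(Enum):
    ACCOUNT = "account"      # "all resources and workspaces"
    WORKSPACE = "workspace"  # one workspace
    PROJECT = "project"      # assumed here: one project and one environment


@dataclass(frozen=True)
class Resource:
    workspace: str
    project: str
    environment: str


@dataclass(frozen=True)
class Token:
    scope: Scope
    workspace: Optional[str] = None
    project: Optional[str] = None
    environment: Optional[str] = None


def can_touch(token: Token, resource: Resource) -> bool:
    """Decide whether a token's scope covers a given resource."""
    if token.scope is Scope.ACCOUNT:
        return True
    if token.scope is Scope.WORKSPACE:
        return token.workspace == resource.workspace
    return (token.project == resource.project
            and token.environment == resource.environment)


def reachable(token: Token, resources: List[Resource]) -> List[Resource]:
    """The blast radius: every resource a leaked or misused token could affect."""
    return [r for r in resources if can_touch(token, r)]
```

With one staging and one production resource in the same project, an Account token reaches both, while a Project token bound to staging reaches only the staging one, which is the whole argument for choosing the narrower token.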
Regardless of the field of endeavor, we need to take accountability for the outcomes of actions taken by generative AI systems. Here's what that looks like in practice:
- Build guardrails that people control, instead of delegating that responsibility to AI.
- Apply the principle of separation of duties anywhere AI agents act independently.
- Use environment-scoped access so that credentials which work in non-production environments don't work in production environments (and vice versa).
- Govern AI agents' scope of action by the principle of least privilege--just as we should for people.
- Require generative AI to ask permission for every change, even in your personal use--this shifts accountability back to people.
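A few of the practices above (guardrails people control, environment-scoped access, and asking permission for every change) can be combined in a single gate that sits outside the agent. The following is a minimal sketch under stated assumptions: `ProposedAction`, `guarded_execute`, and the `approve` and `run` callbacks are hypothetical names, and a real deployment would enforce the environment check in the credential itself rather than only in code like this.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    """A change the agent wants to make (hypothetical shape)."""
    verb: str          # e.g. "delete"
    target: str        # e.g. "volume:db-data"
    environment: str   # e.g. "staging" or "production"


def guarded_execute(
    action: ProposedAction,
    allowed_environment: str,
    approve: Callable[[ProposedAction], bool],
    run: Callable[[ProposedAction], None],
) -> str:
    """Run an action only if it is in scope and a person approved it.

    The agent never supplies `allowed_environment` or `approve`; people
    configure both, so the agent cannot loosen its own guardrails.
    """
    if action.environment != allowed_environment:
        return "refused: outside the agent's environment"
    if not approve(action):
        return "declined: a person said no"
    run(action)
    return "executed"
```

In use, `approve` would prompt a person (a CLI confirmation, a chat message, a ticket), and the returned string would be logged so that every change has a named human decision attached to it.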
Linux on the Desktop Revisited
I write blog posts primarily for myself, and the post I wrote about running Linux on my Google PixelBook back in 2022 came in handy as I set up the replacement for that Google PixelBook today. I bought an open-box Galaxy Chromebook Plus for under $600. Thanks to that old blog post, I was able to install the Debian versions of Slack and Chromium and run them just like I do on my other devices (since there hasn’t been a version of Slack that runs on ChromeOS for years). I’ve also installed Visual Studio Code on the Debian Linux available on this Chromebook Plus. We’ll see if I can get a useful application written using this new device this year.
Compared to my old PixelBook (which now gets used very occasionally for my twins' therapy appointments when they happen virtually instead of in-person), this Chromebook Plus is just as light (if not lighter), very thin, has more ports, and a bigger screen. This translates to a keyboard that has enough room for a number pad on the side. This device is my only personal laptop, having replaced my 16" MacBook Pro with a Mac mini M4 in late 2024 since I rarely took that laptop on personal travel.
This Chromebook Plus came with a free year of pro access to Gemini. I’ll do some prompt comparisons with Claude, which I used last year to experiment a bit with Model Context Protocol (MCP) by trying their weather server and MCP client demos. I’ve been paying for the $20/month Pro plan for a bit and it’s been an improvement over my experience with Perplexity for the most part--but it’s had some hallucination issues. Other Google-specific stuff I will play with primarily on this machine includes NotebookLM, Whisk, and Flow (AI tools for generating audio, images, and videos from a variety of sources).
Great Customer Service Smoothes Out Bad Self-Service
Switching to a truly bundled Disney+ and Hulu experience (both with no ads) from the janky status quo, where the two services were billed separately and Hulu had ads but Disney+ didn't, only succeeded because of the great customer service experience I had earlier today. In prior months, I'd made the mistake of following the self-service instructions for accomplishing this, and failed miserably. I switched from annual billing to monthly on Disney+ and tried to switch to the Premium Duo multiple times over multiple months, only to be redirected to Hulu and be blocked from signing up for what I wanted.
Today I tried the chat option (with a live human being) and finally got the bundle I wanted--and a refund for the price differential between the new bundle and what I'd been paying. It ultimately took being manually unsubscribed from both Disney+ and Hulu, which the customer service rep accomplished by reaching out to whatever department and systems she needed to, in the span of about 20 minutes. Definitely a 5-star customer service experience--unfortunately made necessary by terrible self-service options.
Plenty of companies almost certainly believe that they will be able to use ChatGPT (or something like it) to replace the people that do this work. But at least initially (and probably for quite a while after that) the fully-automated customer service experience is likely to be worse (if not much worse) than the experience of customer service from people. I'm very skeptical of the idea that an AI chatbot would have driven the same outcome from a customer service interaction as a person did in this case. And this is in a low-stakes situation like streaming services (some number of which will very likely end up on my budget chopping block in 2024). High-stakes customer service situations will not have the same tolerance for mistakes, as shown in the FTC's 5-year ban on Rite-Aid using facial recognition for surveillance. These are the sorts of mistakes warned about in the documentary Coded Bias years ago, but I have no doubt that other companies will make the same mistakes Rite-Aid did.
In an episode of Hanselminutes I listened to recently, the host (Scott Hanselman) contrasted the Iron Man suit with Ultron to describe two ways AI could be used. I hope using AI to augment human capabilities (like the Iron Man suit) is the destination we get back to, after the current pursuit of replacing humans entirely (like Ultron) fails. Customer service experiences that are led by people but augmented by technology will be better for people on both sides of the customer service equation and better for brands.