AI Can't Be Accountable, So We Must Be

This Tom's Hardware piece and the timeline from Jer Crane (CEO of PocketOS) correctly place accountability for the recent destruction of his company's production data with Cursor, Claude, and Railway, but neither assigns any to Jer Crane himself. The phrase "a computer can never be held accountable" is even truer today than it was when first written in an internal IBM training manual back in 1979, because computer systems are so much more capable now. Crane's timeline is valuable whether you're an engineer building systems that leverage AI, an executive selecting dependencies, or an engineering manager between them. This piece explores the details of that timeline to model a post-mortem with full accountability.

Crane clearly lays out the stakes of his software working as designed:
"We build software that rental businesses — primarily car rental operators — use to run their entire operations: reservations, payments, customer management, vehicle tracking, the works. Some of our customers are five-year subscribers who literally cannot operate their businesses without us."
A single API call destroyed business-critical data for every PocketOS customer in 9 seconds. Because Railway's architecture stores volume-level backups in the same volume, all the backups of that data were deleted as well. They experienced what every business that uses AWS us-east-1 experiences when that region has an outage--but worse, because current backups no longer existed.

Crane includes the agent's "confession" in his post-mortem. Read in full, it shows numerous ways in which granting an AI system both a broad scope of action and control over its own guardrails violates the principle of separation of duties, with catastrophic results. Talking about Cursor's safeguards failing, or the difference between Cursor's marketing and the reality of the tool in action, doesn't change the fact that "a computer can never be held accountable." Buying access to Claude's most expensive model, configuring it "with explicit safety rules in our project configuration," and integrating it with Cursor does not transfer his share of accountability to Cursor, Anthropic, or Railway. Where specifically did the post-mortem fall short?

Railway's documentation on data & storage backups has a Caveats section that describes backups as "a newer feature that is still under development". Given this, a post-mortem that discussed migrating to a managed database outside of Railway, enabling multi-region database failover to mitigate the risk of relying on that feature, would highlight the platform shortcoming and demonstrate a CEO taking proactive steps to make his service offering more resilient.

Per Crane's timeline, the agent encountered a credential mismatch while working on a routine task in a staging environment. The post-mortem doesn't explain what this routine task was, or why a credential mismatch led to volume deletion rather than a failed operation and a logged error. The lack of additional detail on this is an accountability gap for the PocketOS team, and ultimately the CEO. The same tool being used to make staging environment changes, generate code on its own, help engineers develop code, review it, and deploy it constitutes a separation of duties (SoD) failure. Before GenAI, my prior company used a GitHub bot to require that PRs be reviewed and approved for merge by engineers other than those who implemented the change. If a team failed to install that bot on their repos, any SoD violations detected by the CI/CD pipelines would block production deployment until the violation was resolved or an accountable executive agreed to take responsibility for any production issues resulting from that deployment. Multi-layer SoD enforcement mechanisms are still needed even with GenAI.
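The kind of check described above can be sketched in a few lines. This is a minimal illustration, not the bot my prior company used: the `Change` record, its fields, and the `sod_violations` function are hypothetical names, and a real pipeline would pull authors and approvers from its source-control system rather than from hand-built objects.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """Hypothetical record of one change headed for production."""
    author: str                                   # person or agent that implemented the change
    approvers: set[str] = field(default_factory=set)
    deployer: str = ""                            # identity that triggers the deployment


def sod_violations(change: Change, agents: set[str]) -> list[str]:
    """Return separation-of-duties violations; an empty list means none were found."""
    violations = []
    # An approval only counts if it comes from a person other than the author.
    human_approvers = change.approvers - {change.author} - agents
    if not human_approvers:
        violations.append("no independent human approver")
    # An AI agent must not deploy the change it wrote.
    if change.author in agents and change.deployer == change.author:
        violations.append("agent deployed its own change")
    return violations
```

A pipeline using a check like this would block deployment whenever the returned list is non-empty, leaving the override decision to an accountable person.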

From Railway's API documentation, the token created "to add and remove custom domains via the Railway CLI" was an Account token--the only type with "all resources and workspaces" scope. A post-mortem that pointed out the different available scopes and committed to replacing that token with a Workspace or Project token scoped to staging would create a balanced picture of the CEO being accountable and reinforce his good points regarding the absence of feedback and human-in-the-loop opportunities in the token creation flow. Railway's token types are not as granular as GitHub's classic personal access tokens or fine-grained tokens, but they do provide a narrower scope that limits the blast radius of misuse (accidental or intentional). The API tokens aren't time-limited, so a long-term step the CEO might recommend in the post-mortem is a Railway feature providing an expiration option as a security measure customers can choose to use. Crane's post-mortem does a lot of things right--but the missing piece is himself. Cursor, Anthropic, and Railway need to be accountable--but so does he.
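To make the scope and expiration points concrete, here is a small sketch of how a scoped, expiring token limits blast radius. Everything in it is an assumption made for illustration: the `Token` model, the scope strings, and the `allows` function are hypothetical and do not reflect Railway's actual API or data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical token model; not Railway's actual API."""
    scope: str                               # "account", "workspace:<name>", or "project:<name>"
    expires_at: Optional[datetime] = None    # None means the token never expires


def allows(token: Token, workspace: str, project: str, now: datetime) -> bool:
    """Decide whether a token may act on a project at a given time."""
    if token.expires_at is not None and now >= token.expires_at:
        return False
    if token.scope == "account":
        return True                          # account scope reaches every workspace and project
    kind, _, name = token.scope.partition(":")
    if kind == "workspace":
        return name == workspace
    if kind == "project":
        return name == project
    return False                             # unknown scopes are denied by default
```

Under this model, a token scoped to `project:staging` is refused for any other project, and an expired token is refused everywhere, while an account-scoped token with no expiry is never refused.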

Regardless of the field of endeavor, we need to take accountability for the outcomes of actions taken by generative AI systems. Here's what that looks like in practice:

  • Build guardrails that people control, instead of delegating that responsibility to AI.
  • Apply the principle of separation of duties anywhere AI agents act independently.
  • Use environment-scoped access so that credentials which work in non-production environments don't work in production environments (and vice versa).
  • Govern AI agents' scope of action by the principle of least privilege--just as we should for people.
  • Require generative AI to ask permission for every change, even in your personal use--this shifts accountability back to people.
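Two of these practices, environment-scoped access and asking permission for every change, can be sketched as a single gate that sits between an agent and the systems it acts on. This is a minimal sketch under stated assumptions: `run_agent_action`, its parameters, and the callback signatures are hypothetical, and the `approve` callback stands in for whatever mechanism puts a person in the loop.

```python
from typing import Callable


def run_agent_action(
    action: str,
    environment: str,
    credential_environment: str,
    approve: Callable[[str, str], bool],
    execute: Callable[[str, str], str],
) -> str:
    """Run an agent-requested action only if people-controlled checks pass."""
    # Environment-scoped access: a staging credential must not act on production.
    if credential_environment != environment:
        raise PermissionError(
            f"credential for {credential_environment!r} cannot act on {environment!r}"
        )
    # Human in the loop: every change needs explicit approval from a person.
    if not approve(action, environment):
        raise PermissionError(f"{action!r} on {environment!r} was not approved")
    return execute(action, environment)
```

The important design choice is that the gate lives outside the agent: the agent can request an action, but it cannot change the checks or approve its own request.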

Linux on the Desktop Revisited

I write blog posts primarily for myself, and the post I wrote back in 2022 about running Linux on my Google PixelBook came in handy as I set up the replacement for that PixelBook today. I bought an open-box Galaxy Chromebook Plus for under $600. Thanks to that old blog post, I was able to install the Debian versions of Slack and Chromium and run them just like I do on my other devices (since there hasn’t been a version of Slack that runs on ChromeOS for years). I’ve also installed Visual Studio Code on the Debian Linux available on this Chromebook Plus. We’ll see if I can get a useful application written using this new device this year.

Compared to my old PixelBook (which now gets used very occasionally for my twins' therapy appointments when they happen virtually instead of in person), this Chromebook Plus is just as light (if not lighter) and very thin, and it has more ports and a bigger screen. The bigger screen translates to a keyboard with enough room for a number pad on the side. This device is my only personal laptop; I replaced my 16" MacBook Pro with a Mac mini M4 in late 2024 since I rarely took that laptop on personal travel.

This Chromebook Plus came with a free year of pro access to Gemini. I’ll do some prompt comparisons with Claude, which I used last year to experiment a bit with Model Context Protocol (MCP), trying their weather server and MCP client demos. I’ve been paying for the $20/month Pro plan for a bit and it’s been an improvement over my experience with Perplexity for the most part--but it’s had some hallucination issues. Other Google-specific stuff I will play with primarily on this machine includes NotebookLM, Whisk, and Flow (AI tools for generating audio, images, and videos from a variety of sources).


Great Customer Service Smooths Out Bad Self-Service

Switching to a truly bundled Disney+ and Hulu experience (both with no ads) from the janky status quo, where both services were billed separately and Hulu had ads but Disney+ didn't, finally succeeded thanks to the great customer service experience I had earlier today. In prior months, I'd made the mistake of following the self-service instructions for accomplishing this, and failed miserably. I switched from annual billing to monthly on Disney+ and tried to switch to the Premium Duo multiple times over multiple months, only to be redirected to Hulu and blocked from signing up for what I wanted.

Today I tried the chat option (with a live human being) and finally got the bundle I wanted--and a refund for the price differential between the new bundle and what I'd been paying. It ultimately took being manually unsubscribed from both Disney+ and Hulu, which the customer service rep accomplished by reaching out to whatever department and systems she needed to, in the span of about 20 minutes. Definitely a 5-star customer service experience--unfortunately made necessary by terrible self-service options.

Plenty of companies almost certainly believe that they will be able to use ChatGPT (or something like it) to replace the people who do this work. But at least initially (and probably for quite a while after that) the fully-automated customer service experience is likely to be worse (if not much worse) than the experience of customer service from people. I'm very skeptical of the idea that an AI chatbot would have driven the same outcome from a customer service interaction as a person did in this case. And this is in a low-stakes situation like streaming services (some number of which will very likely end up on my budget chopping block in 2024). High-stakes customer service situations will not have the same tolerance for mistakes, as shown in the FTC's 5-year ban on Rite Aid using facial recognition for surveillance. These are the sorts of mistakes warned about in the documentary Coded Bias years ago, but I have no doubt that other companies will make the same mistakes Rite Aid did.

In an episode of Hanselminutes I listened to recently, the host (Scott Hanselman) contrasted two ways AI could be used: the Iron Man suit and Ultron. I hope using AI to augment human capabilities (like the Iron Man suit) is the destination we get back to, after the current pursuit of replacing humans entirely (like Ultron) fails. Customer service experiences that are led by people but augmented by technology will be better for people on both sides of the customer service equation and better for brands.


Will AI Change My Job or Replace It?

One of my Twitter mutuals recently shared the following tweet with me regarding AI:

[twitter.com/carnage4l...](https://twitter.com/carnage4life/status/1648509247246974977?s=61&t=WjLQbL9JF8UV2BDiOn9Ung)


I found Dare Obasanjo's commentary especially interesting because my connection to Stack Overflow runs a bit deeper than it might for some developers. As I mentioned in a much older post, I was a beta tester for the original stackoverflow.com. Every beta tester contributed some of the original questions still on the site today. While the careers site Stack Overflow went on to create was sunset as a feature last year, it helped me find a role in healthcare IT where I spent a few years of my career before returning to the management ranks. Why is this relevant to AI? Because the purpose of Stack Overflow was (and is) to provide a place for software engineers to ask questions of other software developers and get answers to help them solve programming problems. Obasanjo's takeaway from the CEO's letter is that this decade-plus-old collection of questions and answers about software development challenges will be used as input for an AI that can replace software engineers altogether. My main takeaway from the same letter is that at some point this summer (possibly later) Stack Overflow and Stack Overflow for Teams (their corporate product) will get some sort of conversational AI capability added, perhaps even without the "hallucination problems" that have made the news recently.

Part of the reason I'm more inclined to believe that [chatbot] + [10+ years of programming Q & A site data] = [better programming Q & A resource] or [better starter app scaffolder] instead of [replacement for junior engineers] is knowing just how long we've been trying to replace people with expertise in software development with tools that will enable people without expertise to create software. While enough engineers have copied and pasted code from Stack Overflow into their own projects that it led to an April Fool's gag product (which later became a real product), I believe we're probably still quite some distance away from text prompts generating working Java APIs. I've lost track of how many companies have come and gone who put products into the market promising to let businesses replace software developers with tools that let you draw what you want and generate working software, or drag and drop boxes and arrows you can connect together that will yield working software, or some other variation on this theme of [idea] + [magic tool] = [working software product] with no testing, validation, or software developers in between. The truth is that there's much more mileage to be gained from tools that help software developers do their jobs better and more quickly.

ReSharper is a tool I used for many years when I was writing production C# code, and it went a long way toward reducing (if not eliminating) a lot of the drudgery of software development. Boilerplate code, variable renaming, and class renaming are just a few of the boring (and time-consuming) things it accelerated immensely. And that's before you get to the numerous quick fixes it suggested to improve your code, and static code analysis to find and warn you of potential problems. I haven't used GitHub Copilot (Microsoft's so-called "AI pair programmer") myself (in part because I'm management and don't write production code anymore, in part because there are probably unpleasant legal ramifications to giving such a tool access to code owned by an employer), but it sounds very much like ReSharper on steroids.

Anthony B (on Twitter and Substack) has a far more profane, hilarious (and accurate) take on what ChatGPT, Bard, and other systems some (very) generously call conversational AI actually are:

[twitter.com/swearyant...](https://twitter.com/swearyanthony/status/1635979825428205568?s=61&t=V5fIhk2nXZQISXUJjoshxQ)

His Substack piece goes into more detail, and as amusing as the term "spicy autocomplete" is, his comparison of how large language model systems handle uncertainty to how spam detection systems handle uncertainty provides real insight into the limitations of these systems in their current state. Another aspect of the challenge he touches on briefly in the piece is training data. In the case of Stack Overflow in particular, having asked and answered dozens of questions that will presumably be part of the training data set for their chatbot, I know that the quality of both questions and answers varies widely. The upvotes and downvotes for each are decent clues to quality but are not necessarily authoritative. A Stack Overflow chatbot could conceivably respond with an answer based on something with a lot of upvotes that might not actually be correct.

There's an entirely different discussion to be had (and litigation in progress against an AI image generation startup, and a lawsuit against Microsoft, GitHub, and OpenAI) regarding the training of large language models on copyrighted material without paying copyright holders. How the lawsuits turn out (via judgments or settlements) should answer at least some questions about what would-be chatbot creators can use for training data (and how lucrative it might be for copyright holders to make some of their material available for this purpose). But in the meantime, I do not expect my job to be replaced by AI anytime soon.