Data Analysis · June 2026

Did Claude Increase Bugs in rsync?

A simple distributional analysis of every rsync release with bug data. No model. No assumptions. Just placement.

Repository: RsyncProject/rsync
Method: bugs per 10 commits, permutation test

1 · Background: The "rsync Outrage"

In late May 2026, rsync blew up. GitHub, Hacker News, Lobsters: hundreds of people arguing about whether open-source maintainers can ship AI-written code and have it be reliable — and whether the people taking the code for free get to demand how it is made.

On May 30, 2026, a GitHub issue titled "Please Do Not Vibe Fuck Up This Software" was opened against the rsync repository. It attached a screenshot of a Mastodon post criticizing the project's use of Claude. No bug report. No technical content. What followed was extraordinary: 329 comments and counting, ranging from thoughtful concern to outright harassment.

GitHub issue screenshot
The GitHub issue that started it all. The original post was a screenshot of a Mastodon critique, no bug report, no technical content. It has since accumulated 329 comments.
Hacker News thread screenshot
The thread quickly escalated, from "the software is free, if you don't like it then fork it or fuck off" to: "just because you are giving free soup to the homeless, does not mean you can piss in it".

The thread did not stop at words. One user posted My Little Pony drawings of themselves strangling the "project janitor that pushed vibecoded commits":

Threatening drawing
A user posting drawings depicting violence against the rsync maintainer, one of several threats that escalated the issue from heated debate to harassment.

It spread to Hacker News and Lobsters, generating hundreds more comments. The central claim, repeated everywhere: Claude-assisted development introduced bugs into a previously stable tool.

People are very justifiably angry that a very stable, well trusted tool, has started to immediately go downhill… all because the main dev is vibecoding that software.

— fao_ on Hacker News

On Lobsters, user boramalper wrote:

It'd be interesting if someone actually did a timechart of regressions after each release (if at all possible) to see if the number actually went up recently or not.

— boramalper on Lobsters

User bitshift replied: "I would also love to see such a chart. It wouldn't be completely informative… But at least it would be something objective we could measure."

This analysis is that chart. One metric, every release, no model.

On the HN thread, user zos_kia pointed at the confound directly:

From a cursory look, it looks like a security fix in response to a CVE surfaced a coding error which has been present in the code since 2007. This is so banal that it's actually hilarious to see people lose their shit over it.

— zos_kia on Hacker News

On Lobsters, user jbert spelled out the causal chain:

The trigger for the increased volume of changes (and hence increased number of regressions) was the influx of (mostly) LLM-enabled security issues. i.e. the causal chain was: LLMs → more known security issues → more changes needed than usual → more regressions than usual.

— jbert on Lobsters

These users identified the exact confound: it wasn't AI writing the code that caused regressions. It was AI finding security holes that forced rsync's maintainer, Andrew Tridgell ("tridge"), to ship more changes than usual, and more changes means more regressions, regardless of who wrote them. This is not a Claude problem. It is a "more changes" problem.

Tridgell himself confirmed this causal chain in his response, describing how a flood of AI-generated CVE reports forced rapid, extensive changes to rsync's attack surface. A retired developer who would rather be sailing, he reached for Claude to help with the volume: writing test suites, adding defence-in-depth hardening, and working through the security backlog. He acknowledged the regressions in v3.4.3 but said he had deliberately prioritized security fixes over edge-case compatibility.

2 · Executive Summary

3 · The Metric

The analysis uses a single metric: bugs per 10 commits (bugs/10c). For each release, divide the number of bugs attributed to it by the number of commits in its range, then multiply by 10. This normalizes for release size.

bugs/10c = (bug_count ÷ total_commits) × 10
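As a minimal sketch of the formula above (the function name is illustrative, not taken from the analysis code), the rates for the two Claude releases can be recomputed from the bug and commit counts reported in the Results section:

```python
def bugs_per_10_commits(bug_count: int, total_commits: int) -> float:
    """Bugs attributed to a release, normalized per 10 commits in its range."""
    if total_commits <= 0:
        raise ValueError("a release range must contain at least one commit")
    return bug_count / total_commits * 10


# Counts as reported in the Results section.
print(round(bugs_per_10_commits(4, 50), 2))   # v3.4.2
print(round(bugs_per_10_commits(23, 34), 2))  # v3.4.3
```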

How commits are assigned to releases

Every commit on the default branch was ordered by committer date to produce a sequential timeline. Each git tag points to a specific commit in this timeline. A release's range is all commits between the previous tag and its own tag. Pre-release tags ("pre", "rc") are skipped as boundaries and absorbed into their final release. Every commit belongs to exactly one release.
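The assignment rule can be sketched as follows. This is an illustration under stated assumptions, not the script used for the analysis: the function name, the tag-matching regex, and the toy commit ids and tag names are hypothetical. It also assumes the tagged commit closes its own release's range and that commits after the last tag stay unassigned.

```python
import re

# Assumption: pre-release tags end in "pre" or "rc" plus an optional number.
PRERELEASE = re.compile(r"(pre|rc)\d*$")


def assign_commits(commits, tags):
    """Map each release to the commits in its range.

    commits: commit ids already sorted by committer date, oldest first.
    tags:    dict of tag name -> id of the commit the tag points to.
    """
    position = {sha: i for i, sha in enumerate(commits)}
    # Keep only final-release tags as boundaries, in timeline order.
    boundaries = sorted(
        (position[sha], name)
        for name, sha in tags.items()
        if not PRERELEASE.search(name)
    )
    releases, start = {}, 0
    for end, name in boundaries:
        releases[name] = commits[start:end + 1]  # range includes the tagged commit
        start = end + 1
    return releases  # commits after the last tag are not yet released


# Toy timeline: "v1.1pre1" is skipped, so c3 is absorbed into v1.1.
commits = ["c1", "c2", "c3", "c4", "c5", "c6"]
tags = {"v1.0": "c2", "v1.1pre1": "c3", "v1.1": "c5"}
ranges = assign_commits(commits, tags)
print(ranges)
```

Because each range starts where the previous one ended, no commit can be counted twice.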

How bugs are found and attributed

Bug counts come from three sources: GitHub issues in the rsync repository, the rsync Bugzilla instance, and the rsync mailing list. Issues filed against the rsync project were collected via the GitHub REST API. Bugs from the mailing list were identified by parsing message subjects for bug report patterns and cross-referencing with the project's issue tracking. Bugzilla entries were collected via the Bugzilla API; each entry has a "Version" field that explicitly states which release the bug was reported against, and bugs are attributed to that release. GitHub issues and mailing-list bugs are attributed to the most recent release that shipped before the bug was reported.
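A rough sketch of the attribution rule, assuming each report carries a date and, for Bugzilla, an optional "Version" value. The function name, the release dates, and the treatment of a bug reported on a release day (charged to the previous release) are illustrative assumptions, not details from the analysis.

```python
from bisect import bisect_left
from datetime import date


def attribute_bug(reported_on, releases, version_field=None):
    """Return the release a bug is charged to, or None if it predates all releases.

    releases:      list of (release_date, name), sorted by date.
    version_field: the Bugzilla "Version" value, when the report has one.
    """
    if version_field is not None:
        return version_field  # Bugzilla names the release explicitly
    dates = [d for d, _ in releases]
    i = bisect_left(dates, reported_on)  # count of releases shipped strictly before the report
    return releases[i - 1][1] if i > 0 else None


# Hypothetical release dates, for illustration only.
RELEASES = [(date(2020, 1, 1), "v1.0"), (date(2021, 1, 1), "v1.1")]
print(attribute_bug(date(2020, 6, 1), RELEASES))                        # GitHub or mailing-list report
print(attribute_bug(date(2022, 3, 1), RELEASES, version_field="v1.0"))  # Bugzilla report
```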

Why this metric

The critics' claim is a simple comparison: did the rate go up? The simplest honest response is a simple rate. If the Claude releases sit in the middle of the historical distribution, the burden shifts to the critics to explain why this particular middle is somehow worse than all the other middles that came before it.

What this metric does not do

It does not control for commit complexity, security intensity, or bug severity. It does not distinguish between a one-line typo fix and a CVE patch. It is a blunt instrument. But the critics' accusation is also blunt: "Claude is making things worse." A blunt instrument is the fairest response.

4 · Results

Claude Releases

v3.4.2

0.80 bugs/10c
4 bugs · 50 commits · 9 Claude
31st percentile (rank 11 of 35)

v3.4.3

6.76 bugs/10c
23 bugs · 34 commits · 28 Claude
74th percentile (rank 26 of 35)
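One definition of percentile that reproduces these ranks is the share of the 35 pre-Claude releases with a strictly lower rate. The sketch below assumes that definition; the rates are copied from the release table, rounded to two decimals.

```python
# Bugs/10c for the 35 pre-Claude releases (v2.4.6 through v3.4.1, chronological).
HISTORICAL = [
    1.54, 0.55, 0.58, 0.51, 2.38, 2.50, 0.59, 0.30, 0.11, 17.06, 1.29, 0.29,
    1.10, 0.23, 1.67, 2.03, 0.70, 0.59, 11.11, 4.00, 2.98, 10.30, 9.65, 14.26,
    0.79, 1.43, 3.45, 10.57, 1.36, 2.26, 3.93, 21.33, 20.00, 1.00, 113.33,
]


def rank_below(rate, history):
    """Count historical releases with a strictly lower rate (assumed definition)."""
    return sum(1 for h in history if h < rate)


for name, rate in [("v3.4.2", 0.80), ("v3.4.3", 6.76)]:
    rank = rank_below(rate, HISTORICAL)
    pct = round(100 * rank / len(HISTORICAL))
    print(f"{name}: rank {rank} of {len(HISTORICAL)}, percentile {pct}")
```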

How Normal Are the Claude Releases?

46%
of random pairs match or exceed the Claude mean
Of the 595 possible pairs drawn from the 35 historical releases, 271 have a mean bugs/10c ≥ 3.78. Nearly half. The Claude releases sit where a typical pair lands: the middle of the distribution, not the tail.

Claude mean: 3.78 · Historical mean: 7.59
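The pair comparison can be checked by exhaustive enumeration. This sketch assumes the test compares the Claude mean against the mean of every unordered pair of historical releases, using the rounded rates from the release table; the variable names are illustrative.

```python
from itertools import combinations
from statistics import mean

# Bugs/10c for the 35 pre-Claude releases (v2.4.6 through v3.4.1, chronological).
HISTORICAL = [
    1.54, 0.55, 0.58, 0.51, 2.38, 2.50, 0.59, 0.30, 0.11, 17.06, 1.29, 0.29,
    1.10, 0.23, 1.67, 2.03, 0.70, 0.59, 11.11, 4.00, 2.98, 10.30, 9.65, 14.26,
    0.79, 1.43, 3.45, 10.57, 1.36, 2.26, 3.93, 21.33, 20.00, 1.00, 113.33,
]
CLAUDE = [0.80, 6.76]  # v3.4.2, v3.4.3

claude_mean = mean(CLAUDE)
pairs = list(combinations(HISTORICAL, 2))  # every unordered pair of historical releases
at_least = sum(1 for a, b in pairs if (a + b) / 2 >= claude_mean)

print(f"{at_least} of {len(pairs)} pairs ({at_least / len(pairs):.0%}) "
      f"have a mean >= {claude_mean:.2f}")
```

With only 35 releases, enumerating all pairs is cheap, so no random sampling is needed.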

The Distribution

Here is where these releases fall in the distribution of all prior releases:

Dot plot of bugs/10c per release (log scale, 0.01 to 100). Legend: Historical, Claude, Middle 50% (IQR), Outside IQR. Both v3.4.2 and v3.4.3 are marked inside the middle 50%.

Each dot is a release. The shaded green band is the interquartile range (IQR) — the middle 50% of historical releases, from 0.65 to 6.82 bugs/10c. Half of all historical releases fall inside this band, and half fall outside. The darker regions on either side are the lower and upper quarters. The Claude releases (green dots) both fall inside the IQR — their bug rates are indistinguishable from the typical historical range.
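The band can be approximated from the release table. The quartile method is an assumption: linear interpolation between sorted data points (Python's "inclusive" method) lands within rounding of the reported 0.65 and 6.82 when fed the table's two-decimal rates.

```python
from statistics import quantiles

# Bugs/10c for the 35 pre-Claude releases (v2.4.6 through v3.4.1, chronological).
HISTORICAL = [
    1.54, 0.55, 0.58, 0.51, 2.38, 2.50, 0.59, 0.30, 0.11, 17.06, 1.29, 0.29,
    1.10, 0.23, 1.67, 2.03, 0.70, 0.59, 11.11, 4.00, 2.98, 10.30, 9.65, 14.26,
    0.79, 1.43, 3.45, 10.57, 1.36, 2.26, 3.93, 21.33, 20.00, 1.00, 113.33,
]
CLAUDE = {"v3.4.2": 0.80, "v3.4.3": 6.76}

# n=4 returns the three quartile cut points; "inclusive" interpolates linearly.
q1, _median, q3 = quantiles(HISTORICAL, n=4, method="inclusive")

print(f"IQR: {q1:.3f} to {q3:.3f}")
for name, rate in CLAUDE.items():
    print(name, "inside IQR" if q1 <= rate <= q3 else "outside IQR")
```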

Regime Check

The historical mean is 7.59 bugs/10c, but it blends two regimes: v2.x releases average 2.04 bugs/10c, while v3.x releases (all 21, the two Claude releases included) average 11.46. Even within v3.x, the Claude releases are unremarkable. Ranked from lowest rate to highest, v3.4.2 is 4th of 21 v3.x releases and v3.4.3 is 13th of 21.

A runs test on the 35 non-Claude releases finds 14 runs (18.5 expected under randomness; z = -1.54, p = 0.123). That is fewer runs than expected, but not significantly so at the conventional 0.05 level, so the sequence is consistent with a random draw from the same distribution.
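A sketch of a Wald-Wolfowitz runs test on the chronological sequence. Two choices here are assumptions, picked because they reproduce the reported run count: releases are split at the median rate, and the median release itself counts as "above". The p-value uses the usual normal approximation.

```python
from math import erf, sqrt
from statistics import median

# Bugs/10c for the 35 pre-Claude releases (v2.4.6 through v3.4.1, chronological).
HISTORICAL = [
    1.54, 0.55, 0.58, 0.51, 2.38, 2.50, 0.59, 0.30, 0.11, 17.06, 1.29, 0.29,
    1.10, 0.23, 1.67, 2.03, 0.70, 0.59, 11.11, 4.00, 2.98, 10.30, 9.65, 14.26,
    0.79, 1.43, 3.45, 10.57, 1.36, 2.26, 3.93, 21.33, 20.00, 1.00, 113.33,
]

cut = median(HISTORICAL)
above = [x >= cut for x in HISTORICAL]  # assumption: the median counts as "above"
n1, n2 = above.count(True), above.count(False)

# A new run starts wherever consecutive releases fall on different sides of the cut.
runs = 1 + sum(a != b for a, b in zip(above, above[1:]))

n = n1 + n2
expected = 2 * n1 * n2 / n + 1
variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
z = (runs - expected) / sqrt(variance)
p = 1 + erf(-abs(z) / sqrt(2))  # two-sided p-value, normal approximation

print(f"runs={runs}, expected={expected:.1f}, z={z:.2f}, p={p:.3f}")
```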

The Outlier Nobody Noticed

113.33
bugs per 10 commits — v3.4.1, no Claude
The highest bug rate in the entire dataset: 102 bugs in 9 commits, a hotfix release the day after v3.4.0. It is more than five times the rate of the next-highest release (v3.2.7, at 21.33). Nobody noticed. There was no AI to blame, so there was no GitHub issue with 300 comments, no death threats, no threats to fork or move to openrsync. A maintainer shipped a broken release and fixed it. This happens. The only thing that made v3.4.3 special was the availability of an enemy everyone had already decided to hate.

All Releases (chronological)

| Release | Bugs | Commits | Claude commits | Bugs/10c | Percentile |
|---|---|---|---|---|---|
| v2.4.6 | 2 | 13 | 0 | 1.54 | 46th |
| v2.5.0 | 4 | 73 | 0 | 0.55 | 14th |
| v2.5.1 | 4 | 69 | 0 | 0.58 | 17th |
| v2.5.2 | 6 | 117 | 0 | 0.51 | 11th |
| v2.5.4 | 5 | 21 | 0 | 2.38 | 57th |
| v2.5.5 | 22 | 88 | 0 | 2.50 | 60th |
| v2.5.6 | 14 | 239 | 0 | 0.59 | 20th |
| v2.6.0 | 8 | 267 | 0 | 0.30 | 9th |
| v2.6.1 | 5 | 444 | 0 | 0.11 | 0th |
| v2.6.2 | 29 | 17 | 0 | 17.06 | 89th |
| v2.6.3 | 49 | 381 | 0 | 1.29 | 37th |
| v2.6.4 | 22 | 760 | 0 | 0.29 | 6th |
| v2.6.5 | 16 | 146 | 0 | 1.10 | 34th |
| v2.6.7 | 15 | 649 | 0 | 0.23 | 3rd |
| v2.6.8 | 12 | 72 | 0 | 1.67 | 49th |
| v2.6.9 | 53 | 261 | 0 | 2.03 | 51st |
| v3.0.0 | 64 | 909 | 0 | 0.70 | 26th |
| v3.0.1 | 6 | 102 | 0 | 0.59 | 23rd |
| v3.0.2 | 10 | 9 | 0 | 11.11 | 83rd |
| v3.0.3 | 22 | 55 | 0 | 4.00 | 71st |
| v3.1.0 | 170 | 571 | 0 | 2.98 | 63rd |
| v3.1.1 | 68 | 66 | 0 | 10.30 | 77th |
| v3.1.2 | 55 | 57 | 0 | 9.65 | 74th |
| v3.1.3 | 87 | 61 | 0 | 14.26 | 86th |
| v3.2.0 | 24 | 304 | 0 | 0.79 | 29th |
| v3.2.1 | 9 | 63 | 0 | 1.43 | 43rd |
| v3.2.2 | 20 | 58 | 0 | 3.45 | 66th |
| v3.2.3 | 166 | 157 | 0 | 10.57 | 80th |
| v3.2.4 | 29 | 213 | 0 | 1.36 | 40th |
| v3.2.5 | 12 | 53 | 0 | 2.26 | 54th |
| v3.2.6 | 11 | 28 | 0 | 3.93 | 69th |
| v3.2.7 | 128 | 60 | 0 | 21.33 | 94th |
| v3.3.0 | 76 | 38 | 0 | 20.00 | 91st |
| v3.4.0 | 6 | 60 | 0 | 1.00 | 31st |
| v3.4.1 | 102 | 9 | 0 | 113.33 | 97th |
| v3.4.2 | 4 | 50 | 9 | 0.80 | 31st |
| v3.4.3 | 23 | 34 | 28 | 6.76 | 74th |

5 · What the Data Is Consistent and Inconsistent With

Consistent with

"The Claude releases are statistically indistinguishable from historical releases"
Both releases fall inside the middle 50% of the historical distribution. The permutation test shows 46% of random pairs score equal or worse. There is no signal of abnormality.
"The outrage selected on a single tail event and narrativized it"
A Mastodon user noticed a regression in v3.4.3, saw Claude commits, and concluded causation. But v3.4.3 at 6.76 bugs/10c is at the 74th percentile: elevated but not extreme. 9 historical releases scored higher. The correlation is noise.
"Claude may have reduced the bug rate"
The Claude mean (3.78) is half the historical mean (7.59). But with only 2 releases, this difference is not statistically distinguishable from chance. The data cannot tell us the magnitude. It can tell us the direction: not harmful.

Inconsistent with

"Claude clearly made things worse"
Both Claude releases fall inside the middle 50% of historical releases. There is no distributional evidence of harm. The claim rests entirely on a post-hoc correlation observed by a social media user.
"The regressions speak for themselves"
v3.4.1, a pre-Claude release, has the highest bug rate in the dataset (113.33 bugs/10c). Nobody noticed, because there was no AI to be angry at. The regressions only "speak" when you ignore the historical distribution.
"Just wait, more bugs will surface"
v3.4.3 has been out long enough that its rate (6.76) is already comparable to historical releases. The "wait and see" argument is an appeal to an unknowable future that shifts the burden of proof away from the critics. If more bugs surface, they will enter the distribution like every other release. There is no reason to expect a regime break.

…for the people saying things like "I'm a PhD from xyz uni and I'm telling you LLMs are just stochastic tools that make everything up and the world will fall apart if you use them", I'm here to tell you that you are out of date. The world of software engineering has changed dramatically in the last few months. The world of IT security and maintaining software in the face of the flood of reports has completely and utterly changed just in the last few weeks. Anything you learned about this stuff last year might as well be from another planet… Bottom line is I do know (well, roughly!) how LLMs work, but that doesn't make them not useful. It does mean you have to be cautious, but I am being cautious, or as cautious as I can be given my desire to be sailing and not dealing with a flood of gunk from so-called internet experts.

Andrew Tridgell