The Vibecoder Problem
Generative AI made it possible to ship software without understanding it. The hiring funnel has not caught up — and another layer of AI is not the fix.
A senior engineering manager at a mid-stage fintech startup recently described an interview to a colleague this way: the candidate had a polished portfolio, a working full-stack application live on a public domain, and a confident answer for every behavioral question. Asked to explain why his application was returning 502 errors under load, he could not. Asked what was on the other side of the API call he had written, he was not certain. Asked to draw the request lifecycle on a whiteboard, he produced a diagram that looked, in the manager's words, “like an architecture review written by someone who had only read the table of contents.”
The candidate was not lying. He had built the product. He had simply built it without understanding it.
This is the new shape of the technical hiring problem. Two years after the consumerization of large language models, a generation of developers has entered the job market having shipped real software without ever having to reason about it from first principles. The industry has a name for the practice now — vibecoding — and the term has migrated from a self-deprecating joke on developer forums into a serious concern in engineering leadership circles. According to GitHub's own developer survey, 92% of US-based professional developers now use AI coding tools at work. Stack Overflow's 2024 survey found 76% of respondents using or planning to use AI tools in their development process. The question facing employers is no longer whether candidates use these tools. It is what, if anything, candidates can do without them.
The answer, increasingly, is not much. A 2024 study by GitClear examined 153 million changed lines of code across the period in which AI assistants became ubiquitous and found code churn — code that is written, then revised or removed within two weeks of being committed — rising sharply, on pace to roughly double against its pre-assistant baseline. Atlassian's 2024 State of Developer Experience report found that two-thirds of developers do not believe AI tools have improved the quality of the software they ship. McKinsey, in a separate 2024 analysis, estimated that 70% of generative AI projects across the enterprise fail to reach the value their sponsors projected. The throughput is real. The understanding is not.
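Churn in that sense is measurable on any repository an employer can already see. The sketch below is a rough, file-level proxy rather than GitClear's methodology: it shells out to `git log --numstat` and counts lines deleted from a file within two weeks of lines being added to it. It overstates churn in frequently edited files and ignores renames, but it is enough to watch the trend on one's own codebase.

```python
import subprocess
from collections import defaultdict

WINDOW_SECONDS = 14 * 24 * 3600  # "churn" window: two weeks, per the definition above


def numstat_events(repo_path="."):
    """Yield (commit_unix_time, file_path, lines_added, lines_deleted), oldest first."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--reverse", "--numstat",
         "--pretty=format:commit %at"],
        capture_output=True, text=True, check=True,
    ).stdout
    when = None
    for line in log.splitlines():
        if line.startswith("commit "):
            when = int(line.split()[1])
        elif line.strip():
            added, deleted, path = line.split("\t", 2)
            if added != "-":          # numstat reports "-" for binary files
                yield when, path, int(added), int(deleted)


def churn_ratio(repo_path="."):
    """Fraction of added lines deleted again from the same file within the window."""
    fresh = defaultdict(list)         # path -> [(added_at, lines_still_fresh), ...]
    churned = total_added = 0
    for when, path, added, deleted in numstat_events(repo_path):
        remaining = deleted
        kept = []
        for added_at, count in fresh[path]:
            if when - added_at > WINDOW_SECONDS:
                continue              # too old to count as churn
            eaten = min(count, remaining)
            churned += eaten
            remaining -= eaten
            if count > eaten:
                kept.append((added_at, count - eaten))
        fresh[path] = kept
        if added:
            fresh[path].append((when, added))
            total_added += added
    return churned / total_added if total_added else 0.0


if __name__ == "__main__":
    print(f"two-week churn (rough proxy): {churn_ratio('.'):.1%}")
```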
For the hiring manager, this is the heart of the difficulty. The traditional filters of the technical funnel were never designed to detect the absence of comprehension behind a working artifact. They were designed to detect the absence of the artifact altogether.
Consider what those filters actually measure. The resume measures the candidate's willingness and ability to describe past work. The brand of the school on the resume measures the candidate's standardized test performance four to ten years ago. The LeetCode score measures pattern recognition on a constrained subset of algorithmic problems that bear, by the admission of nearly every working engineer, almost no relationship to production software. None of these signals were robust before generative AI. They are, in the present moment, close to inert. A candidate who has never debugged a memory leak can pass a LeetCode screen with the help of a model. A candidate who has never set up a CI pipeline can describe one in a resume bullet that an applicant tracking system will rate highly. The filters are still there. The thing they were filtering for has slipped through.
The industry's instinct, predictably, has been to add more filters. A growing class of vendors now sells AI-conducted interviews — voice agents that screen candidates by phone, video systems that score facial expressions and verbal latency, autonomous interviewers that ask follow-up questions and grade the responses. The pitch is volume reduction at constant rigor. The actual record is more difficult to defend.
A 2022 audit by researchers at the University of Cambridge of commercial AI hiring tools concluded that many of them function as “automated pseudoscience,” identifying patterns in superficial features — accent, lighting, headshot framing — that have no demonstrated relationship to job performance. The Equal Employment Opportunity Commission has opened multiple inquiries into video interview platforms over potential disparate-impact violations. Workday is the named defendant in a federal class action alleging that its AI-driven applicant screening discriminates against candidates over forty. None of this is a fringe concern. It is the operating environment of the very tools being marketed as the answer to AI-assisted candidates.
The deeper problem is structural. AI interviewers are trained on the same kinds of artifacts — resumes, transcripts, video clips — that AI candidates are now optimizing for. The system is, in effect, asking one model to evaluate the output of another, with the candidate's career on the line and the employer's capital footing the bill. The vendors who sell these products rarely publish the audit trail. They publish testimonials, time-to-hire reductions, and demo-day metrics. They almost never publish the one number a serious buyer should demand: the post-hire performance of candidates the system advanced versus the candidates it filtered out, measured a year on. The aftermath of the decision is the part that matters. It is also the part that is consistently absent from the marketing.
“The most expensive interview is the one you did not need to run,” an engineering director at a Series C infrastructure company observed in a recent industry roundtable. “The second-most expensive is the one a machine ran for you and got wrong.”
There is a more honest place to look. It is not on the resume, not in the transcript, and not on a video call mediated by a model. It is in the work the candidate is doing right now, in the open, on a system designed to make the work visible: the public commit history.
The case for evaluating candidates by their open-source contributions and active personal projects is not a new one. What is new is how decisive the signal has become. A commit history is the one artifact in the entire hiring process that cannot be vibecoded into existence. It is dated. It is incremental. It is reviewable line by line, including the dead ends, the reverts, the messages written at 11:47 on a Tuesday because something broke in production. It is the audit trail of a practitioner's reasoning, kept in real time, with no opportunity for retroactive polish.
There is a philosophical point here that deserves to be made plainly. A candidate's ongoing work is the only signal in the hiring process that is neither self-reported nor mediated by a third party. A resume is a claim about the past. An algorithmic score is a claim about the candidate made by a vendor. A degree is a claim about the candidate made by an institution. A commit history is the candidate, working, in public, for an audience that includes no one in particular. It is the closest thing the industry has to a genuine writing sample for software engineers.
What the commit history reveals, when read carefully, is not principally what the candidate knows. It is how the candidate works. There are three signals in particular that no other artifact in the hiring funnel can produce; a sketch of how each can be read from GitHub's public data follows the list:
- Consistency. Does the candidate ship in a sustained way, across months and years, or only in the weeks before a job search? A contribution graph is a longitudinal record of habit, and habit is the variable that correlates most reliably with engineering performance once a candidate is hired.
- Curiosity. Are the candidate's repositories an expanding map of the languages, frameworks, and problem domains they are willing to engage with on their own time? A developer who has spent the last six months teaching themselves Rust because the work in front of them required a memory-safe systems language is communicating something about their relationship to the craft that no resume bullet can plausibly fake.
- Collaboration under review. Pull requests and code reviews are the closest thing in software to a peer-reviewed publication. A candidate's history of accepted contributions to projects they do not own, and the substance of the back-and-forth in those threads, is a more demanding test of engineering judgment than any algorithm-puzzle interview ever invented.
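These three signals are also the easiest part of the record to pull programmatically. The sketch below reads them from GitHub's public REST API. The endpoints are real; the scoring heuristics are deliberately crude: distinct active weeks in the recent public events feed (which GitHub limits to roughly the last 90 days of activity), the spread of primary languages across non-fork repositories, and merged pull requests in repositories the candidate does not own. The `octocat` login is a placeholder, and in practice an authenticated token is needed to get past the unauthenticated rate limit.

```python
from collections import Counter
from datetime import datetime

import requests

API = "https://api.github.com"


def get(path, **params):
    # Unauthenticated calls are limited to 60 requests/hour; send an
    # Authorization header with a personal access token for real use.
    resp = requests.get(f"{API}{path}", params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()


def consistency(user):
    """Distinct ISO weeks with at least one push in the recent public events feed."""
    events = get(f"/users/{user}/events/public", per_page=100)
    weeks = {
        datetime.fromisoformat(e["created_at"].replace("Z", "+00:00")).isocalendar()[:2]
        for e in events
        if e["type"] == "PushEvent"
    }
    return len(weeks)


def curiosity(user):
    """Primary languages across the user's own (non-fork) repositories."""
    repos = get(f"/users/{user}/repos", per_page=100, sort="pushed")
    return Counter(r["language"] for r in repos if not r["fork"] and r["language"])


def collaboration(user):
    """Merged pull requests the user authored in repositories they do not own."""
    result = get("/search/issues", q=f"type:pr is:merged author:{user} -user:{user}")
    return result["total_count"]


if __name__ == "__main__":
    candidate = "octocat"  # placeholder login
    print("active weeks in recent feed:", consistency(candidate))
    print("languages across own repos: ", curiosity(candidate))
    print("merged PRs in others' repos:", collaboration(candidate))
```

None of this replaces reading the pull-request threads themselves; it only narrows the question of where to start reading.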
None of this is theoretical. The labor-market behavior of the most selective engineering organizations has been quietly shifting in this direction for some time. Internal recruiters at companies including Vercel, Replicate, and Anthropic have publicly described their pipelines as substantially driven by the candidate's public output rather than by inbound resumes. The signal has always been available. What has been missing is the infrastructure to make the signal central to the hiring decision rather than incidental to it.
That is the conviction on which GitHire is built, and it is the reason the platform makes the choices it does. There is no resume upload because the resume has, in this market, become the least informative document in the funnel. There is no AI interviewer because the technology has not earned the trust that it is being asked to bear, and the candidates who would suffer most from its errors are precisely the ones whose actual work would speak for them most loudly. There is no agentic screening layer because the act of inserting another model between the employer and the candidate's real output is the diagnosis, not the cure.
What there is, instead, is a structured way for an employer to evaluate what a candidate is doing now — the languages they are writing in this quarter, the problems they have decided are worth their unpaid evenings, the cadence at which they ship, the quality of the conversations they hold with strangers in the comments of their pull requests. Those signals are durable. They are difficult to fabricate. They get stronger, not weaker, the longer the candidate spends building. They do not require artificial intelligence to interpret. They require an employer willing to read.
The vibecoder problem is, at root, a measurement problem. The industry briefly believed it had built a hiring funnel that selected for engineering ability. What it had built was a funnel that selected for the production of artifacts that resembled engineering ability. Generative AI has lowered the cost of producing those artifacts to nearly zero, and the funnel has, accordingly, stopped working.
There are two ways forward. One is to stack another generation of artificial-intelligence tools on top of the broken funnel and hope, against the available evidence, that the resulting system behaves better than its parts. The other is to step outside the funnel and look directly at the only thing in the entire process that the candidate could not have generated last Tuesday with a model and a weekend: the slow, public, dated record of their actual work.
GitHire is a bet on the second path. It is not a more sophisticated filter. It is the absence of the filter, and the presence, in its place, of the practitioner's own commits. In a hiring market that has lost its ability to tell the difference between a developer and a demo, that may turn out to be the only product-market fit that matters.