Can You Tell Which AI Wrote That Code? A New Method Tries to Fingerprint LLM-Generated Programs
Researchers propose a method to identify not just whether code was written by an AI, but which specific AI model produced it—by separating what the code does from how it does it.
The Problem
Most existing research on AI-generated code focuses on a binary question: did a human or a machine write this? But in practice, that's often not enough. When a vulnerability surfaces, a licensing conflict arises, or an incident investigation begins, the relevant question is more specific: *which* AI model produced this code? [§1] This is the problem of LLM Code Source Attribution (LLMCSA)—figuring out whether a snippet came from ChatGPT, Claude, DeepSeek, Qwen, or another model.
The challenge is that code generated by different LLMs for the same task tends to look very similar on the surface. If you ask four models to implement a binary search in Python, they'll all produce syntactically valid code that follows the same algorithmic logic. The differences are subtle: variable naming conventions, comment styles, structural organization, whitespace preferences. These are what the authors call "generative fingerprints"—model-dependent stylistic and structural variations introduced by differences in training data, architectures, alignment strategies, and decoding mechanisms [§1].
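To make "generative fingerprint" concrete, here is a minimal sketch of hand-computed versions of a few such signals (naming convention, nesting depth, comment ratio) for Python snippets, using only the standard library. The function name, the specific features, and the metrics are illustrative choices of mine, not anything from the paper, which learns its signals rather than engineering them:

```python
import ast
import re

def style_fingerprint(code: str) -> dict:
    """Crude, illustrative stylistic signals for a Python snippet.
    DCAN learns signals like these implicitly; this hand-rolled
    version only shows what 'fingerprint' features can look like."""
    tree = ast.parse(code)

    # Naming convention bias: snake_case vs. camelCase identifiers.
    names = [n.id for n in ast.walk(tree) if isinstance(n, ast.Name)]
    snake = sum(1 for n in names if "_" in n)
    camel = sum(1 for n in names if re.search(r"[a-z][A-Z]", n))

    # Structural depth: maximum AST nesting depth.
    def depth(node, d=0):
        kids = [depth(c, d + 1) for c in ast.iter_child_nodes(node)]
        return max(kids, default=d)

    # Verbosity proxy: fraction of lines that are comments.
    lines = code.splitlines()
    comments = sum(1 for l in lines if l.lstrip().startswith("#"))

    return {
        "snake_case": snake,
        "camelCase": camel,
        "max_depth": depth(tree),
        "comment_ratio": comments / max(len(lines), 1),
    }
```

Run on two models' solutions to the same task, a vector like this would agree on little besides the task itself, which is exactly the residue an attribution method wants to exploit.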
Prior passive attribution methods (ones that work without modifying the generation process) have relied on handcrafted stylometric features or language-specific indicators, which limits their scalability across programming languages [§2]. Active methods like watermarking require access to the generation pipeline, making them impractical for forensic scenarios where you're analyzing code after the fact [§2].
What They Did
A team from Sichuan University proposes the Disentangled Code Attribution Network (DCAN), which frames the attribution problem as a representation disentanglement task [§1]. The core idea: any code snippet contains two types of information tangled together. There's **source-agnostic information**—the functional semantics dictated by the programming task ("implement a sorting algorithm"). And there's **source-specific information**—the stylistic fingerprints unique to the model that generated it (how it names variables, how deeply it nests structures, how verbose it is) [§3.1].
Conventional detection methods learn representations dominated by task-dependent semantics, since functional correctness and solution structure are the most prominent patterns in code. As a result, subtle model-specific fingerprints get overshadowed [§3.1]. DCAN's approach is to explicitly pull these two types of information apart.
Concretely, the system assumes that a code snippet's latent representation can be decomposed additively: h = z_c + z_s, where z_c captures the task logic and z_s captures the model's stylistic signature [§3.1]. The framework uses a base encoder to produce an initial representation, then a disentanglement module to separate these components. A "representation consistency loss" aligns the source-agnostic representations of code generated by different models for the same task—the intuition being that if two models solve the same problem, their task-related representations should be similar, and whatever's left over is the model-specific signal [§3.1, Figure 2]. A contrastive learning objective helps isolate these discriminative signals, and a linear classifier then uses the source-specific component to predict which LLM produced the code [Figure 2].
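The decomposition and the two losses can be sketched in a few lines. This is a toy, framework-free rendering of the idea under the paper's additive assumption h = z_c + z_s: the split of the vector into halves, the helper names, and the temperature value are my assumptions for illustration, not the authors' actual neural architecture:

```python
import math

def split(h):
    """Toy disentanglement: pretend the first half of h is the task
    component z_c and the second half the style component z_s, each
    zero-padded back to full dimension so that z_c + z_s == h."""
    half = len(h) // 2
    z_c = h[:half] + [0.0] * (len(h) - half)
    z_s = [0.0] * half + h[half:]
    return z_c, z_s

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def consistency_loss(h1, h2):
    """Representation consistency: two models solving the SAME task
    should end up with similar task components z_c."""
    z_c1, _ = split(h1)
    z_c2, _ = split(h2)
    return mse(z_c1, z_c2)

def contrastive_loss(z_s, pos, negs, tau=0.5):
    """InfoNCE-style contrast on the style component: pull snippets
    from the same model together, push other models' snippets apart."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a)) or 1.0
        nb = math.sqrt(sum(x * x for x in b)) or 1.0
        return dot / (na * nb)
    num = math.exp(cos(z_s, pos) / tau)
    den = num + sum(math.exp(cos(z_s, n) / tau) for n in negs)
    return -math.log(num / den)
```

In the real system both components come out of a learned disentanglement module rather than a fixed coordinate split, and the classifier sees only z_s, so the training signal pushes everything task-related into z_c.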
To evaluate this, the team built what they describe as the first large-scale benchmark dataset for LLMCSA: 91,804 code samples generated by four LLMs (DeepSeek, Claude, Qwen, and ChatGPT) across four programming languages (Python, Java, C, and Go), collected under two settings—with and without comments [§1]. The dataset uses a controlled generation and quality-control pipeline to ensure reliability and diversity [§1].
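A benchmark like this boils down to labeled (code, model) pairs over a fixed candidate set. A minimal sketch of what one record and a closed-set attribution target could look like follows; the field and type names are my guesses, since the paper's schema isn't reproduced here:

```python
from dataclasses import dataclass

MODELS = ["DeepSeek", "Claude", "Qwen", "ChatGPT"]   # candidate set [§1]
LANGUAGES = ["Python", "Java", "C", "Go"]            # four languages [§1]

@dataclass
class CodeSample:
    code: str            # the generated snippet
    model: str           # attribution label, one of MODELS
    language: str        # programming language of the snippet
    with_comments: bool  # generation setting: comments kept or stripped

def label_index(sample: CodeSample) -> int:
    """Closed-set attribution target: an index into the known model
    list. Raises ValueError for a model outside the candidate set,
    which is exactly the open-world case this framing can't handle."""
    return MODELS.index(sample.model)
```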
The Results
The paper states that DCAN achieves "reliable attribution performance across diverse settings" [Abstract]. The framework is evaluated across all four languages and both coding settings (with and without comments).
The authors frame the source-specific signals they extract in terms of concrete stylistic dimensions visible in Figure 1: code verbosity (how much boilerplate or explanation a model includes), lexical density (token-level preferences), naming convention bias (camelCase vs. snake_case tendencies), structural depth (nesting patterns), and what they call "generative distinctiveness" [Figure 1]. These aren't hand-engineered features—they emerge from the disentanglement process.
It's worth noting several limitations. The benchmark covers only four LLMs and four languages. Real-world attribution scenarios might involve dozens of models, including fine-tuned variants and open-source derivatives that share architectural DNA. The paper's conditional invariance assumptions—that source-agnostic information is truly invariant across models, and source-specific information is stable across tasks—are stated as approximations (using ≈ rather than =) [§3.1, Equation 2], acknowledging these are idealizations. The approach also assumes you know the candidate set of models in advance; it's a closed-set classification problem, not an open-world detection task.
Additionally, the evaluation uses code generated under controlled conditions from competitive programming problems. Production code—with its dependencies, boilerplate, and human edits layered on top of AI suggestions—would present a much harder attribution target.
Why It Matters
For practitioners, this work matters because it addresses a gap between what current AI-code detection tools do (binary human-vs-machine classification) and what governance scenarios actually require (identifying the specific source) [§1]. If your organization needs to audit which AI tools contributed to a codebase—for licensing compliance, security triage, or policy enforcement—binary detection isn't sufficient.
The disentanglement approach is conceptually clean: rather than trying to find model fingerprints in a sea of task-related noise, explicitly remove the noise first. The dataset and implementation are publicly available [§1, footnote], which means other researchers can build on this benchmark. But the gap between a four-model lab setting and the messy reality of modern software supply chains—where code passes through multiple models, gets edited by humans, and is mixed with library code—remains substantial. This is a first step toward model-level code forensics, not a finished tool.