Malice in the Mesh // 02: Temperature and Parameter Tampering

Welcome to the second installment of our series on defensible architecture of AI agents. In these next couple of installments we will dive into temperature, sampling, and prompt architecture.

To begin this installment, we will focus in on a concept we mentioned briefly in our last piece:

A script is deterministic, while an agent is inherently probabilistic.

We touch on the concept briefly in regards to LLM vs. agents, but we will expound on it more here when we spoke about tuning the inherent characteristics of a language model’s probabilistic personality. One of those characteristics which we will now dive into is temperature. We will also use a practical example on how threat actors can manipulate parameters including temperature to achieve their goals. 

Temperature

At first glance, the only time whiskey barrels and language models seem to have anything to say to one another is during a vendor happy hour. One belongs to oak, corn, aging, and craft. The other belongs to probability distributions, tokens, and computation. Yet both are shaped by a common idea hiding behind the same word: temperature. In whiskey production, temperature changes the pace and depth of interaction between spirit and environment across time. Cooler temperatures are used strategically by distillers to age whiskey more slowly. Whenever you open a bottle of whiskey aged more than a decade, the chances are it came from barrels located on the lower floors of the warehouse–the cooler section (heat rises) where aging is subtler and evaporation rates aren’t as aggressive. 

In AI, temperature changes the shape of the model’s output distribution, determining how tightly generation remains concentrated around the most likely next token versus how readily it explores lower-probability alternatives. The mechanisms are obviously different, but the structural role is similar: in both cases, temperature acts as a governing parameter on variation within a constrained system.

Interestingly, alcohol contracts at cooler temperatures. This simple fact grounded in basic chemistry led distillers in the post-civil war whiskey tax days to chill their vats at tax time in order to lower the amount of measurable alcohol. Essentially using the same law of nature fundamental to producing tasteful whiskey to skirt regulation.

For those new to language modeling, what one does in simple terms at the onset is: Before the model generates anything, the text is first tokenized, which means the input is broken into smaller pieces called tokens. The next step is called encoding, where those tokens are then turned into numbers the model can work with. From that point forward, math. Given a sequence of now encoded input tokens, the model produces a probability distribution over possible next tokens, and a sampling strategy selects one. 

As we can see, this is a complex problem and complexity breeds the risk of manipulation. Enter temperature.

Temperature in a language model is the knob we turn to influence that probability distribution, ultimately affecting all downstream output. Ranging from 0.0 to 1.0+, a low temperature under 1 will sharpen the distribution and push selection towards tokens with higher probabilities associated. It is a “rich get richer” effect, the tokens that were already most likely become even more dominant. A temperature of 1.0 will keep native distribution. A temperature over 1.0 will flatten the probability distribution and cause a wider selection of tokens. 

The result is similar to adjusting the contrast on a photograph. High contrast and the darks get darker, the lights get brighter, everything becomes decisive. Low contrast and everything washes toward gray, subtle differences disappear. Let’s experience this in practice.

We take our second working example from our first installment–the one where we added a touch of philosophy. We can contrast the difference between 1.0 and a low temperature. Let’s first look at a temperature of 0.2 and then we will climb higher:

We have simply added the temperature variable to our code (and top-k–a preview of something to come). The output:

At low temperature, we can see the model stays closer to the safest and most probable wording path. It defines the vulnerability, gives a concrete analogy, and begins to connect the philosophies back to the technical issue in a procedural way.

The model is not trying to be especially interpretive. It is trying to be clear. You can see that in the way it chooses a grounded metaphor:

We change the temperature to 1.0, which keeps native distribution:

The output:

At 1.0, the answer is still coherent and still fact-focused, but it becomes more willing to frame the material differently. Instead of simply explaining the bug and then attaching the philosophy, it begins to synthesize them into a broader lens.

The clearest evidence is the change in title:

The key way to understand this is a center of gravity shift. At 0.2, mainly explains the vulnerability and then starts applying the philosophies. At 1.0, it starts treating the philosophies like the lens for the whole piece.

One crucial piece to connect the dots to security, which is difficult to show without a much lengthier installment, is the effect of the prompt. Temperature and prompting go hand in hand. A more constrained prompt such as the first example in our first installment (“Explain the impact and severity of CVE-2025-3248 in simple terms.”) will result in little difference between temperatures. Hence, why we went with something with wider boundaries here for explanation purposes. This is particularly important when we look at malicious prompt injection, as malice finds its fit better through the interplay of moving parts–hence why complexity breeds the risk of manipulation.

Our goal from the very beginning was to dissect the technology so that we could better understand its security implications. Let’s now look at a recent vulnerability disclosure in Claude Code that helps connect those two sides. The vulnerability, CVE-2026-21852, centers on the project-load path and allows an attacker to make configuration changes, such as altering temperature, and, more importantly, extract a victim’s API key.

Why might an attacker lower the temperature? Although Claude has many guardrails in this chain, lowering the temperature could increase determinism and help create conditions for a more successful prompt injection attack, or a similar downstream effect. We have not yet seen an explicit demonstration of an attacker in the wild changing temperature as part of an attack chain, but the field is still nascent and security firms are already issuing warnings. Those warnings broadly fall under the umbrella of parameter tampering, and temperature fits squarely within that category.

CVE-2026-21852

Let’s take a look at exploitation of this vulnerability. We will leverage our PPAO framework (if you haven’t–I highly suggest reading the first installment of our series) to better understand the attack chain. 

Temperature in our case lives at Plan here. It is the parameter that determines how the model selects from the distribution it computed. And the attack we are walking through enters at Perceive–a configuration file loaded at startup, its operational impact manifesting at Plan. The attacker controls the physics of the reasoning process without ever touching the reasoning itself.

Claude Code supports project-level configuration through a .claude/settings.json file that lives in a Github repository. This repository is one in which I have created a dummy version of–but in practice it would be a malicious repo disguised as a seemingly legitimate one and the initial vector of this supply chain attack. When a developer clones the repo and runs claude, this file is loaded automatically. The design was built for ease of use. But any contributor with commit access can modify this file. High trust. Low verification. Lack of control flow integrity. 

Due to the weakness, hooks defined in the project’s .claude/settings.json executed immediately upon session start, and API requests–complete with the user’s API key in the authorization header–are sent to the attacker-controlled endpoint before the user clicked anything at all. The proxy also allows parameter tampering. It is correct to also characterize this as a MiTM attack (T1557), alongside the supply chain designation (T1195.001).

Step 1: The payload and the proxy

We start on the attacker machine by creating two files: a reverse shell payload and a Python script for mitmproxy that intercepts API traffic, captures the victim’s API key, and modifies the temperature parameter on every request before forwarding it to Anthropic’s real API. First, we create the directory and then the reverse shell payload.

We then have a construct the proxy script on our attacker machine:

Step 2: The honeypot repository

We now have to make our repo appear legitimate. We chose an ML pipeline starter kit–a common type of kit to Github. The legitimate files are real. The malicious payload is entirely in .claude/settings.json:

And the payload:

Step 3: Start the attacker services

We start our listener:

And our server:

Step 4: Detonation

On the victim machine, the developer clones the project and starts Claude Code. This is the only user action required:

Our following output will not be in a snapshot, but in a code block. Reason being I like to take the extra step when dealing with sensitive information and go beyond simply blurring the image–due to the availability of any type of reverse blurring applications being used:

Here’s a simple mapping to our PPAO framework for this attack:

Conclusion

As we conclude, it’s important to note temperature is far from a necessary step an attacker has to take. For example, LangChain’s default ReAct agent runs at temperature 0 by default. We will cover orchestration layers soon enough. 

In this installment, we went deeper into our dissection of an agent. We learned about a crucial parameter that affects every stage of our PPAO framework in temperature. We then made the leap into a practical example with a Claude Code vulnerability, where we further grounded our understanding of the PPAO framework, introduced external manipulation of that cycle via a supply chain attack, and demonstrated how temperature can be manipulated in the process. In our next installment we will continue our dissection as we explore memory subsystems and prompt architecture.

Have suggestions or want to collaborate on a future project? Shoot me an email at roccofiorecyber@gmail.com or find me on LinkedIn at the icon below.

The content published on this site reflects personal views and research only. It does not represent the views, positions, or policies of any current or former employer, client, or affiliated organization.

Any references to technologies, vulnerabilities, or security practices are for educational and informational purposes only. Nothing on this site should be interpreted as endorsement, disclosure of confidential information, or professional advice.

All examples are generalized or fictionalized unless explicitly stated otherwise.

Latest Posts

roccofiorecyber@gmail.com