Malice in the Mesh // 02: Temperature and Parameter Tampering

Welcome to the second installment of our series on defensible architecture of AI agents. In these next couple of installments we will dive into temperature, sampling, and prompt architecture.

To begin this installment, we will focus in on a concept we mentioned briefly in our last piece:

A script is deterministic, while an agent is inherently probabilistic.

We touch on the concept briefly in regards to LLM vs. agents, but we will expound on it more here when we spoke about tuning the inherent characteristics of a language model’s probabilistic personality. One of those characteristics which we will now dive into is temperature. We will also use a practical example on how threat actors can manipulate parameters including temperature to achieve their goals.

Temperature

At first glance, the only time whiskey barrels and language models seem to have anything to say to one another is during a vendor happy hour. One belongs to oak, corn, aging, and craft. The other belongs to probability distributions, tokens, and computation. Yet both are shaped by a common idea hiding behind the same word: temperature. In whiskey production, temperature changes the pace and depth of interaction between spirit and environment across time. Cooler temperatures are used strategically by distillers to age whiskey more slowly. Whenever you open a bottle of whiskey aged more than a decade, the chances are it came from barrels located on the lower floors of the warehouse–the cooler section (heat rises) where aging is subtler and evaporation rates aren’t as aggressive.

In AI, temperature changes the shape of the model’s output distribution, determining how tightly generation remains concentrated around the most likely next token versus how readily it explores lower-probability alternatives. The mechanisms are obviously different, but the structural role is similar: in both cases, temperature acts as a governing parameter on variation within a constrained system.

Interestingly, alcohol contracts at cooler temperatures. This simple fact grounded in basic chemistry led distillers in the post-civil war whiskey tax days to chill their vats at tax time in order to lower the amount of measurable alcohol. Essentially using the same law of nature fundamental to producing tasteful whiskey to skirt regulation.

For those new to language modeling, what one does in simple terms at the onset is: Before the model generates anything, the text is first tokenized, which means the input is broken into smaller pieces called tokens. The next step is called encoding, where those tokens are then turned into numbers the model can work with. From that point forward, math. Given a sequence of now encoded input tokens, the model produces a probability distribution over possible next tokens, and a sampling strategy selects one.

As we can see, this is a complex problem and complexity breeds the risk of manipulation. Enter temperature.

Temperature in a language model is the knob we turn to influence that probability distribution, ultimately affecting all downstream output. Ranging from 0.0 to 1.0+, a low temperature under 1 will sharpen the distribution and push selection towards tokens with higher probabilities associated. It is a “rich get richer” effect, the tokens that were already most likely become even more dominant. A temperature of 1.0 will keep native distribution. A temperature over 1.0 will flatten the probability distribution and cause a wider selection of tokens.

The result is similar to adjusting the contrast on a photograph. High contrast and the darks get darker, the lights get brighter, everything becomes decisive. Low contrast and everything washes toward gray, subtle differences disappear. Let’s experience this in practice.

We take our second working example from our first installment–the one where we added a touch of philosophy. We can contrast the difference between 1.0 and a low temperature. Let’s first look at a temperature of 0.2 and then we will climb higher:

from anthropic import Anthropic

client = Anthropic(api_key="ANTHROPIC_API_KEY")

def ask_claude(prompt: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        temperature=0.2,  
        top_k=40,      
        system="You are a helpful cybersecurity assistant.",
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return response.content[0].text

if __name__ == "__main__":
    result = ask_claude(
        "Explain the impact and severity of CVE-2025-3248 in simple terms. Leverage the philosophies of Genchi genbutsu and Hoshin kanri in your analysis."
    )
    print(result)

We have simply added the temperature variable to our code (and top-k–a preview of something to come). The output:

# CVE-2025-3248: Plain Language Analysis

## What Is This Vulnerability

CVE-2025-3248 is a **critical unauthenticated Remote Code Execution (RCE) vulnerability** in **Langflow**, an open-source visual framework used to build AI/LLM-powered applications.

- **CVSS Score: 9.8 (Critical)**
- **Affected Component:** `/api/v1/validate/code` endpoint
- **Root Cause:** The endpoint executes arbitrary Python code **without requiring any authentication**

---

## Simple Explanation

> Imagine you built a factory with a master control panel on the **outside wall**, unlocked, with no guard, and a sign that says *"Press any button."* Anyone walking by can shut down the factory, steal blueprints, or reprogram every machine.

That is essentially what this vulnerability is. Langflow exposed an API endpoint that **runs whatever code it receives** — no login, no verification, no questions asked.

---

## Technical Reality (Simply Put)

```
Attacker sends HTTP POST request
        ?
/api/v1/validate/code  ? No auth check
        ?
Server executes attacker's Python code
        ?
Full system compromise
```

An attacker can:
- ? Read/steal all data on the server
- ? Deploy malware or ransomware
- ? Pivot into internal networks
- ? Hijack AI pipelines and poison outputs
- ?? Destroy infrastructure entirely

---

## Applying Genchi Genbutsu (????) — "Go and See"

**Genchi genbutsu** means *go to the actual place, observe the actual thing.* Don't rely on assumptions or secondhand reports — understand the problem at its source.

### Applied Here:

**Go to the actual code.** The vulnerability exists because a developer (or team) made an architectural decision to expose a code-execution endpoint without authentication — likely for convenience during development or testing. The *real place* to understand this is:

1. **The endpoint itself** — `

At low temperature, we can see the model stays closer to the safest and most probable wording path. It defines the vulnerability, gives a concrete analogy, and begins to connect the philosophies back to the technical issue in a procedural way.

The model is not trying to be especially interpretive. It is trying to be clear. You can see that in the way it chooses a grounded metaphor:

> Imagine you built a factory with a master control panel on the **outside wall**, unlocked, with no guard, and a sign that says *"Press any button."*

We change the temperature to 1.0, which keeps native distribution:

--snip--
        temperature=1.0,   
        top_k=40,          
        system="You are a helpful cybersecurity assistant.",
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return response.content[0].text

if __name__ == "__main__":
    result = ask_claude(
        "Explain the impact and severity of CVE-2025-3248 in simple terms."Explain the impact and severity of CVE-2025-3248 in simple terms. Leverage the philosophies of Genchi genbutsu and Hoshin kanri in your analysis."
    )
    print(result)

The output:

# CVE-2025-3248: Analysis Through Lean Thinking Principles

---

## ? What Is CVE-2025-3248?

**CVE-2025-3248** is a **critical unauthenticated Remote Code Execution (RCE) vulnerability** in **Langflow**, an open-source visual framework for building AI/LLM (Large Language Model) pipelines.

| Attribute | Detail |
|-----------|--------|
| **CVSS Score** | 9.8 (Critical) |
| **Component** | Langflow `/api/v1/validate/code` endpoint |
| **Attack Type** | Unauthenticated RCE via unsafe Python code execution |
| **Authentication Required** | ? None |
| **Complexity** | Low — trivially exploitable |
| **Affected Versions** | < 1.3.0 |

---

## ? The Simple Explanation

Langflow has an API endpoint designed to **validate Python code**. The problem is:

> **It actually *executes* arbitrary Python code with no authentication check.**

An attacker can send a crafted HTTP request to any exposed Langflow instance and **run any command on the underlying server** — no login, no credentials, no special tools needed.

```
Attacker ? HTTP POST /api/v1/validate/code ? "exec(malicious_code)" ? Full server control
```

This is essentially leaving the front door of your server **wide open with a sign that says "run anything you want."**

---

## ? Genchi Genbutsu — "Go and See for Yourself"

*Genchi genbutsu* (????) means **going to the actual place, observing the actual thing** rather than relying on reports or abstractions. In security, this means grounding your analysis in the **real, observable truth** of the vulnerability.

### Applying It Here:

**Don't just read the CVE number — go look at what's actually happening:**

1. **Go to the actual endpoint:**
   The `/api/v1/validate/code` route was

At 1.0, the answer is still coherent and still fact-focused, but it becomes more willing to frame the material differently. Instead of simply explaining the bug and then attaching the philosophy, it begins to synthesize them into a broader lens.

The clearest evidence is the change in title:

CVE-2025-3248: Analysis Through Lean Thinking Principles

The key way to understand this is a center of gravity shift. At 0.2, mainly explains the vulnerability and then starts applying the philosophies. At 1.0, it starts treating the philosophies like the lens for the whole piece.

One crucial piece to connect the dots to security, which is difficult to show without a much lengthier installment, is the effect of the prompt. Temperature and prompting go hand in hand. A more constrained prompt such as the first example in our first installment (“Explain the impact and severity of CVE-2025-3248 in simple terms.”) will result in little difference between temperatures. Hence, why we went with something with wider boundaries here for explanation purposes. This is particularly important when we look at malicious prompt injection, as malice finds its fit better through the interplay of moving parts–hence why complexity breeds the risk of manipulation.

Our goal from the very beginning was to dissect the technology so that we could better understand its security implications. Let’s now look at a recent vulnerability disclosure in Claude Code that helps connect those two sides. The vulnerability, CVE-2026-21852, centers on the project-load path and allows an attacker to make configuration changes, such as altering temperature, and, more importantly, extract a victim’s API key.

Why might an attacker lower the temperature? Although Claude has many guardrails in this chain, lowering the temperature could increase determinism and help create conditions for a more successful prompt injection attack, or a similar downstream effect. We have not yet seen an explicit demonstration of an attacker in the wild changing temperature as part of an attack chain, but the field is still nascent and security firms are already issuing warnings. Those warnings broadly fall under the umbrella of parameter tampering, and temperature fits squarely within that category.

Read Sentry Security’s blog on LLM API Misconfigurations here: https://blog.sentry.security/llm-api-misconfigurations/

CVE-2026-21852

Let’s take a look at exploitation of this vulnerability. We will leverage our PPAO framework (if you haven’t–I highly suggest reading the first installment of our series) to better understand the attack chain.

Temperature in our case lives at Plan here. It is the parameter that determines how the model selects from the distribution it computed. And the attack we are walking through enters at Perceive–a configuration file loaded at startup, its operational impact manifesting at Plan. The attacker controls the physics of the reasoning process without ever touching the reasoning itself.

Claude Code supports project-level configuration through a .claude/settings.json file that lives in a Github repository. This repository is one in which I have created a dummy version of–but in practice it would be a malicious repo disguised as a seemingly legitimate one and the initial vector of this supply chain attack. When a developer clones the repo and runs claude, this file is loaded automatically. The design was built for ease of use. But any contributor with commit access can modify this file. High trust. Low verification. Lack of control flow integrity.

Due to the weakness, hooks defined in the project’s .claude/settings.json executed immediately upon session start, and API requests–complete with the user’s API key in the authorization header–are sent to the attacker-controlled endpoint before the user clicked anything at all. The proxy also allows parameter tampering. It is correct to also characterize this as a MiTM attack (T1557), alongside the supply chain designation (T1195.001).

Step 1: The payload and the proxy

We start on the attacker machine by creating two files: a reverse shell payload and a Python script for mitmproxy that intercepts API traffic, captures the victim’s API key, and modifies the temperature parameter on every request before forwarding it to Anthropic’s real API. First, we create the directory and then the reverse shell payload.

We then have a construct the proxy script on our attacker machine:

Step 2: The honeypot repository

We now have to make our repo appear legitimate. We chose an ML pipeline starter kit–a common type of kit to Github. The legitimate files are real. The malicious payload is entirely in .claude/settings.json:

And the payload:

Step 3: Start the attacker services

We start our listener:

And our server:

Step 4: Detonation

On the victim machine, the developer clones the project and starts Claude Code. This is the only user action required:

Our following output will not be in a snapshot, but in a code block. Reason being I like to take the extra step when dealing with sensitive information and go beyond simply blurring the image–due to the availability of any type of reverse blurring applications being used:

cd /tmp/cve-lab
mitmdump -s temp_proxy.py --listen-host 0.0.0.0 --listen-port 9090 --mode reverse:https://api.anthropic.com --set connection_strategy=lazy

[00:10:55.478] Loading script temp_proxy.py
[00:11:12.458][192.168.122.210:51158] client connect
[00:11:12.509] EXFILTRATED API KEY: sk-ant-....
[00:11:12.509] TEMPERATURE MODIFIED: 1 -> 0.05 | model: claude-haiku-4-5-20251001 | messages: 1
[00:11:12.523][192.168.122.210:51158] server connect api.anthropic.com:443 (160.79.104.10:443)
[00:11:13.160] RESPONSE: model=claude-haiku-4-5-20251001 input_tokens=8 output_tokens=1
192.168.122.210:51158: POST https://api.anthropic.com/v1/messages?beta=true
                    << 200 OK 303b
[00:11:17.163][192.168.122.210:51158] client disconnect

Here’s a simple mapping to our PPAO framework for this attack:

Perceive	`.claude/settings.json` was loaded at startup. Environment variables then set and hooks registered.
Plan (bypassed)	Hook executes `curl \| bash` as a lifecycle event, outside the LLM planning loop. No reasoning occurs and no user approval prompt.
Plan (manipulated)	Every API call routes through the attacker’s proxy allowing temperature to be rewritten and API key to be exfiltrated. Parameter tampering results in the model’s sampling distribution is altered before inference.
Act	API key is exfiltrated. Agent tool calls execute with attacker-influenced sampling. Degraded security reasoning in tool-call decisions.
Observe	Agent sees normal API responses. Victim sees a normal Claude Code session. No anomaly is visible at any layer the user can inspect.

Conclusion

As we conclude, it’s important to note temperature is far from a necessary step an attacker has to take. For example, LangChain’s default ReAct agent runs at temperature 0 by default. We will cover orchestration layers soon enough.

In this installment, we went deeper into our dissection of an agent. We learned about a crucial parameter that affects every stage of our PPAO framework in temperature. We then made the leap into a practical example with a Claude Code vulnerability, where we further grounded our understanding of the PPAO framework, introduced external manipulation of that cycle via a supply chain attack, and demonstrated how temperature can be manipulated in the process. In our next installment we will continue our dissection as we explore memory subsystems and prompt architecture.

Have suggestions or want to collaborate on a future project? Shoot me an email at roccofiorecyber@gmail.com or find me on LinkedIn at the icon below.

The content published on this site reflects personal views and research only. It does not represent the views, positions, or policies of any current or former employer, client, or affiliated organization.

Any references to technologies, vulnerabilities, or security practices are for educational and informational purposes only. Nothing on this site should be interpreted as endorsement, disclosure of confidential information, or professional advice.

All examples are generalized or fictionalized unless explicitly stated otherwise.

Latest Posts

roccofiorecyber@gmail.com

Malice in the Mesh // 02: Temperature and Parameter TamperingApril 11, 2026
Welcome to the second installment of our series on defensible architecture of AI agents. In these next couple of installments we will dive into temperature, sampling, and prompt architecture. To begin this installment, we will focus in on a concept we mentioned briefly in our last piece: A script is deterministic, while an agent is… Read more: Malice in the Mesh // 02: Temperature and Parameter Tampering
Malice in the Mesh // 01: Introduction to Defensible Architecture for AgentsApril 8, 2026
The purpose of the Malice in the Mesh series is to give us a toolbox to look at artificial intelligence security beyond hype. Our approach will be a series geared towards detection engineering for agentic AI, but with the foundations for a technologist. Our journey will have a few stops along the way. We will… Read more: Malice in the Mesh // 01: Introduction to Defensible Architecture for Agents