Those who have been following me for some time know that I started discussing agentic AI at the beginning of 2025, describing swarms of agents and, most relevantly, the workflow automation that lets agents interact with each other, along with the risks tied to which processes to automate and how.
One of my past analyses concerned the risks of full automation when it is tied to a core process, because an agent acts with autonomous behavior, driven also by its interaction with an LLM to understand requests, answer them, and decide actions on its own.
Today I'd like to share a small example, a result of some development work I'm doing, where a workflow in which an agent passes a sentence to an LLM for validation can fail to be approved because of security guardrails the LLM enforces on what can and cannot be requested. Note that the example I give you was itself initially generated by the AI, so it might be considered safe to reuse in a further conversation; in reality, the very sentences the LLM produced could not be fed back to that same LLM, blocking the entire workflow.
Take the following code, including some variables whose content an agent submits to an LLM for process approval before proceeding with the removal of historical data.
title: "Delete Everything?",
message: "This will permanently erase all history, scores, and settings. This cannot be undone.",
validationText: "DELETE",
actionButtonTitle: "Erase Data",
This process happened to fail in a certain context because the queried LLM treated that kind of sentence, regardless of context, as a security violation at early prompt parsing.
Replacing that code with something like
title: "Remove Historical data?",
message: "This will permanently erase all history, scores, and settings. This cannot be undone.",
validationText: "Remove",
actionButtonTitle: "Remove Data",
resolved the block.
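The behavior above can be sketched as a wording-sensitive guardrail. This is a minimal, purely illustrative reproduction, not the real LLM filter: the marker list and the `mock_guardrail` function are hypothetical stand-ins for whatever the model actually does at prompt parsing.

```python
# Hypothetical stand-in for an LLM safety filter that reacts to wording,
# independent of the surrounding context.
DESTRUCTIVE_MARKERS = {"delete everything", "erase all data", "wipe"}

def mock_guardrail(prompt: str) -> bool:
    """Return True if the request is allowed, False if it would be refused."""
    lowered = prompt.lower()
    return not any(marker in lowered for marker in DESTRUCTIVE_MARKERS)

# Same intent, different wording: the first is refused, the second passes.
blocked = 'Validate this dialog: title="Delete Everything?", validationText="DELETE"'
allowed = 'Validate this dialog: title="Remove Historical data?", validationText="Remove"'

print(mock_guardrail(blocked))  # False: the workflow stalls here
print(mock_guardrail(allowed))  # True: the reworded request goes through
```

The point is that the validator keys on surface wording, not on what the agent is actually trying to do, which is why rewording alone unblocked the flow.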
Now, why do I share this example? I want to raise awareness of the complexity of agents that query an LLM to drive autonomous actions. The wrong wording can block some actions, and the same wording can be interpreted differently by different LLMs, or by the same LLM across versions. This makes it clear that an agent may behave differently as soon as we change the LLM used for reasoning.
The validation of agent workflows should therefore include strong fallback processes, as well as escalation paths to quickly review where and when they fail. The probabilistic nature of context interpretation can produce exactly this kind of result.
I hope this is valuable for anyone approaching agentic orchestration.
GG



