Model-Based Reflex Agents

Model-based reflex agents extend simple reflex behavior with an internal state model.

They keep short memory about the environment (what happened before), then still use reflex-style rules to act.

How it works (architecture)

Observe current input.
Update internal state (recent context, status, last actions).
Match rules using both current input and state.
Execute action.

This is common in support bots, workflow bots, and stateful assistants where "current message only" is not enough.

Best use cases

Environments where not everything is visible at once
Systems that need short memory
Stateful workflows
Session-aware assistants
Incident handling with state transitions

Trade-offs

Better than pure reflex systems
Still weaker than goal/utility planners for complex tasks
State design can become fragile if not versioned

Real-world company and service examples

Examples where stateful reflex behavior is common:

OpenAI - Responses API tool-calling flows that keep conversation state and tool outputs in context.
Approximate API range: from about $0.20 input / $1.25 output up to $2.50 input / $15 output per 1M tokens depending on model class.
Anthropic Claude API - stateful chat and tool sessions where decisions depend on prior turns.
Approximate API range: around $1 input / $5 output to $3 input / $15 output per 1M tokens for many common model tiers.
Customer support platforms (Salesforce, Zendesk AI add-ons, Intercom AI) - ticket status and prior steps influence next action.
Approximate pricing: usually subscription or seat-based; often tens to hundreds USD per agent/seat monthly plus usage.

Design tips

Keep state minimal and explicit (stage, last_action, confidence).
Add timeout/expiry for stale state.
Store critical state server-side, not only in prompts.
Add fallback: if state is inconsistent, route to human or reset safely.

When to choose this type

Choose model-based reflex before goal-based if: - The process is mostly deterministic. - You need short memory but not full planning. - You want low latency and low operational complexity.