PART 2: Building AI Agents: A Developer’s Guide to Frameworks, Architecture, and Cost
Welcome back to Part 2 of our AI Agent series. In Part 1, we covered the conceptual foundations of AI agents. Now, it’s time to put on our developer hats.
If you are a software engineer tasked with adding “agentic” capabilities to your application, this guide is for you. We will break down the architectural choices, compare the most popular frameworks, analyze the hidden costs of agentic loops, and look at the code required to make it all work.
1. The Anatomy of an Agent System
As a developer, you shouldn’t think of an agent as a magic black box. Think of it as a standard software system where the control flow is partially determined by a non-deterministic natural language processor (the LLM).
A production-ready agent architecture generally consists of four pillars:
- The Routing/Orchestration Layer: The main loop (often a while loop or a state machine) that queries the LLM, parses its desired actions, executes them, and feeds the results back.
- The LLM (The “Brain”): The model doing the reasoning. It must be exceptionally good at Function Calling (tool use) and adhering to JSON schemas.
- The Tool Registry: A set of strictly typed functions (APIs, database queries, calculators) that the LLM is aware of and authorized to use.
- State/Memory Management: A system to persist the conversation history, agent scratchpad (intermediate thoughts), and long-term memory (often via a Vector DB like Pinecone or Qdrant).
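To make the Tool Registry pillar concrete, here is a minimal sketch of a registry as a plain object that maps each tool name to a schema and a handler. It is only one way to structure this: toolRegistry, toOpenAITools, and executeTool are hypothetical names introduced for illustration, and validation is limited to checking that required fields are present.

```javascript
// Hypothetical registry: each entry pairs a JSON-schema description with a handler.
const toolRegistry = {
  get_weather: {
    description: "Get the current weather for a location",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string", description: "City and state, e.g. San Francisco, CA" },
      },
      required: ["location"],
    },
    // Mock handler; a real one would call a weather API.
    handler: ({ location }) => ({ location, temperature: "72F", conditions: "Sunny" }),
  },
};

// Convert the registry into the `tools` array shape used by Chat Completions.
function toOpenAITools(registry) {
  return Object.entries(registry).map(([name, tool]) => ({
    type: "function",
    function: { name, description: tool.description, parameters: tool.parameters },
  }));
}

// Run a tool by name. Always returns a string, so the result can be sent back
// to the model even when the tool is unknown or the arguments are invalid.
function executeTool(registry, name, rawArguments) {
  if (!Object.hasOwn(registry, name)) {
    return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
  const tool = registry[name];

  let args;
  try {
    args = JSON.parse(rawArguments);
  } catch {
    return JSON.stringify({ error: "Arguments were not valid JSON" });
  }
  if (args === null || typeof args !== "object") {
    return JSON.stringify({ error: "Arguments must be a JSON object" });
  }

  const missing = (tool.parameters.required ?? []).filter((key) => !(key in args));
  if (missing.length > 0) {
    return JSON.stringify({ error: `Missing required arguments: ${missing.join(", ")}` });
  }
  return JSON.stringify(tool.handler(args));
}

console.log(executeTool(toolRegistry, "get_weather", '{"location":"Miami, FL"}'));
console.log(executeTool(toolRegistry, "delete_database", "{}"));
```

Returning errors as strings instead of throwing is a deliberate choice in this sketch: the model sees the failure as an observation and gets a chance to correct its own call.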
2. Choosing the Right Framework
The AI ecosystem is moving incredibly fast. The framework you choose depends heavily on your use case and your tolerance for abstraction.
A. LangChain & LangGraph
- Best for: Complex, stateful, and cyclic workflows.
- Pros: Massive ecosystem, integrations for nearly every tool and database in existence. LangGraph addresses the “infinite loop” problem of early agents by modeling the agent as a stateful graph with explicit nodes and edges, and by enforcing a configurable recursion limit.
- Cons: Extremely heavy abstraction. It can be difficult to debug when things go wrong because there is so much “magic” happening under the hood.
B. LlamaIndex
- Best for: Data-heavy agents (RAG - Retrieval-Augmented Generation).
- Pros: Best-in-class tools for ingesting, chunking, and querying your own documents. If your agent’s primary job is synthesizing internal data, start here.
- Cons: Not as generalized for complex multi-tool orchestration as LangGraph.
C. CrewAI / Microsoft AutoGen
- Best for: Multi-agent systems.
- Pros: Designed around the concept of “agents with distinct roles” (e.g., a “Researcher” agent passing data to a “Writer” agent, overseen by a “Manager” agent). Great for simulating complex business processes.
- Cons: Overkill for single-agent tasks. Can quickly run up API bills due to agents talking to each other extensively.
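The role-based handoff described above can be sketched without any framework. This is an illustration of the pattern, not how CrewAI or AutoGen are implemented: runCrew, the role prompts, and the callModel(systemPrompt, input) function are all hypothetical, with callModel standing in for whichever LLM SDK you use.

```javascript
// Hypothetical role definitions; the prompts are illustrative only.
const roles = {
  researcher: "You are a Researcher. Collect the key facts for the task.",
  writer: "You are a Writer. Turn the research notes into a short answer.",
  manager: "You are a Manager. Reply APPROVED or list the required changes.",
};

// `callModel(systemPrompt, input)` is an assumed async function that wraps
// an LLM call and resolves to the model's text output.
async function runCrew(task, callModel) {
  let calls = 0;
  const ask = async (role, input) => {
    calls++; // Every handoff is another billed model call.
    return callModel(roles[role], input);
  };

  const notes = await ask("researcher", task);
  const draft = await ask("writer", `Task: ${task}\nNotes: ${notes}`);
  const review = await ask("manager", `Task: ${task}\nDraft: ${draft}`);

  return { draft, review, calls };
}

// Stub model for a dry run: echoes which role was asked.
const stubModel = async (systemPrompt, input) =>
  `[${systemPrompt.split(".")[0]}] handled ${input.length} chars`;

runCrew("Summarize our Q3 churn data", stubModel).then((result) => console.log(result));
```

Even this smallest possible crew makes three model calls for one task, and each later call carries the earlier outputs as input, which is where the API bill comes from.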
D. The Pure SDK Approach (OpenAI / Anthropic SDKs + Vercel AI SDK)
- Best for: Production applications requiring absolute control and minimal dependencies.
- Pros: Zero abstraction. You handle the while loop, you handle the tool execution. It is highly predictable, easy to debug, and lightweight.
- Cons: You have to write boilerplate code for managing conversation history and parsing function calls.
Recommendation: If you are building a learning project, use LangChain or CrewAI. If you are building a highly reliable production microservice, stick closer to the raw API or a lightweight wrapper like the Vercel AI SDK.
3. The Economics of Agents (Pricing it Out)
Unlike traditional software, where compute is cheap, LLM inference is expensive, and agents consume a lot of tokens.
The “Agentic Loop” Tax
In a standard LLM chat, you send Prompt -> Output.
In an agent loop, the model might need 5 steps to solve a problem. Because LLMs are stateless, you must resend the entire conversation history, plus the tool outputs, on every single step.
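If Step 1 consumes 1,000 input tokens and each step adds roughly the same amount, Step 5 might consume 5,000 input tokens just to pass the accumulated context. Across the whole run you are billed for about 15,000 input tokens (1,000 + 2,000 + 3,000 + 4,000 + 5,000), not 5,000.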
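The arithmetic behind the loop tax is easy to check in a few lines. This sketch assumes, purely for illustration, that every step adds the same number of tokens to the history; inputTokensPerStep and totalInputTokens are hypothetical helper names.

```javascript
// Input tokens billed at each step when every step resends the full history.
// Assumes each step adds a fixed number of tokens to the conversation.
function inputTokensPerStep(steps, tokensAddedPerStep) {
  return Array.from({ length: steps }, (_, i) => (i + 1) * tokensAddedPerStep);
}

// Total input tokens billed across the whole run.
function totalInputTokens(steps, tokensAddedPerStep) {
  return inputTokensPerStep(steps, tokensAddedPerStep).reduce((sum, n) => sum + n, 0);
}

console.log(inputTokensPerStep(5, 1000)); // [ 1000, 2000, 3000, 4000, 5000 ]
console.log(totalInputTokens(5, 1000));   // 15000
```

Input cost therefore grows roughly with the square of the step count, which is why capping the number of iterations matters for your bill as well as for safety.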
Cost Breakdown (Estimates as of early 2026)
- Premium Models (GPT-4o, Claude 3.5 Sonnet):
- Best for: Complex reasoning, reliable JSON formatting, minimal hallucinations.
- Cost: ~$5.00 per 1M Input Tokens / ~$15.00 per 1M Output Tokens.
- Agent Run Estimate: A 5-step agent task can easily cost $0.05 - $0.15 per execution.
- Fast/Cheaper Models (GPT-4o-mini, Claude 3 Haiku):
- Best for: Simple routing, basic data extraction, fast responses.
- Cost: ~$0.15 per 1M Input Tokens / ~$0.60 per 1M Output Tokens.
- Open Source (Llama 3.3 via Groq or Together AI):
- Best for: High-volume, specific tasks where you can tolerate slight reliability drops or are willing to fine-tune.
- Cost: Often under $0.10 per 1M tokens. Extremely fast time-to-first-token (TTFT).
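To sanity-check the per-run estimate, you can plug the list prices above into a small calculator. This is a rough sketch under stated assumptions: the history grows by 1,000 input tokens per step, the model emits 200 output tokens per step (a number chosen only for illustration), and PRICING and estimateRunCost are hypothetical names.

```javascript
// Prices in USD per 1M tokens, copied from the estimates above.
const PRICING = {
  premium: { input: 5.0, output: 15.0 },
  fast: { input: 0.15, output: 0.6 },
};

// Estimate the cost of one agent run where each step resends the full history.
function estimateRunCost({ steps, tokensAddedPerStep, outputTokensPerStep }, price) {
  let inputTokens = 0;
  for (let step = 1; step <= steps; step++) {
    inputTokens += step * tokensAddedPerStep; // history grows every step
  }
  const outputTokens = steps * outputTokensPerStep;
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

// Assumed workload: 5 steps, 1,000 new input tokens and 200 output tokens per step.
const workload = { steps: 5, tokensAddedPerStep: 1000, outputTokensPerStep: 200 };

console.log(estimateRunCost(workload, PRICING.premium).toFixed(2)); // "0.09"
console.log(estimateRunCost(workload, PRICING.fast).toFixed(5));
```

Under these assumptions the premium run lands at about $0.09, inside the $0.05 - $0.15 range quoted above, while the same workload on the cheaper tier costs a fraction of a cent.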
Cost Optimization Strategies:
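- Semantic Caching: Do not send a request to the LLM if the same task, or one that is semantically equivalent (for example, matched by embedding similarity), was answered recently. Return the cached result instead.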
- Small-to-Big Routing: Use a cheap model (like Haiku) to classify the user’s intent, and only trigger the expensive agent (Sonnet/GPT-4o) if complex tool use is required.
- Strict Context Windows: Summarize or truncate the agent’s “scratchpad” (intermediate thoughts) before the context window balloons and increases your input token costs.
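Two of these strategies are small enough to sketch directly. The code below shows a semantic cache keyed by embedding vectors and a history trimmer for the scratchpad. Both are illustrations under assumptions: SemanticCache and trimHistory are hypothetical names, producing the embedding vectors is left to whichever embedding model you use, and the 0.95 similarity threshold is an arbitrary starting point.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Semantic cache: returns a stored answer when a new request's embedding is
// close enough to a previous one. The caller supplies the vectors.
class SemanticCache {
  constructor(threshold = 0.95) {
    this.threshold = threshold;
    this.entries = []; // { vector, answer }
  }
  lookup(vector) {
    let best = null;
    let bestScore = this.threshold;
    for (const entry of this.entries) {
      const score = cosineSimilarity(vector, entry.vector);
      if (score >= bestScore) {
        best = entry;
        bestScore = score;
      }
    }
    return best ? best.answer : null;
  }
  store(vector, answer) {
    this.entries.push({ vector, answer });
  }
}

// Keep all system messages plus the most recent `maxRecent` other messages.
// The kept tail never starts on a "tool" message, because a tool result
// without the assistant message that requested it is invalid history.
function trimHistory(messages, maxRecent) {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  let start = Math.max(0, rest.length - maxRecent);
  while (start < rest.length && rest[start].role === "tool") start++;
  return [...system, ...rest.slice(start)];
}
```

A linear scan over cache entries is fine for a sketch; at volume you would typically store the vectors in a vector database. Truncation also loses information, so summarizing older turns into a single message is the usual next step once simple trimming starts to hurt answer quality.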
4. Code Example: A Minimal Agent Loop (Node.js)
To demystify the magic, here is what a barebones agent looks like using the official OpenAI SDK. No frameworks, just the core pattern: Function Calling inside a loop.
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// 1. Define your tool (Function)
const tools = [{
  type: "function",
  function: {
    name: "get_weather",
    description: "Get the current weather for a location",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string", description: "City and state, e.g. San Francisco, CA" },
      },
      required: ["location"],
    },
  },
}];

// Mock implementation of the tool
function getWeather(location) {
  console.log(`[Tool Execution] Fetching weather for ${location}...`);
  return JSON.stringify({ location, temperature: "72F", conditions: "Sunny" });
}
// 2. The Agent Loop
async function runAgent(userPrompt) {
  const messages = [
    { role: "system", content: "You are a helpful assistant with access to tools. Solve the user's request." },
    { role: "user", content: userPrompt }
  ];
  let isTaskComplete = false;
  let loopCount = 0;
  const MAX_LOOPS = 5; // Prevent infinite loops!

  while (!isTaskComplete && loopCount < MAX_LOOPS) {
    loopCount++;
    console.log(`--- Loop Iteration ${loopCount} ---`);

    // Call the LLM
    const response = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: messages,
      tools: tools,
      tool_choice: "auto",
    });
    const responseMessage = response.choices[0].message;
    messages.push(responseMessage); // Save assistant's response to history

    // Check if the LLM wants to call a tool
    if (responseMessage.tool_calls) {
      for (const toolCall of responseMessage.tool_calls) {
        // Every tool call needs a matching "tool" message,
        // otherwise the next API request is rejected.
        let toolResult;
        if (toolCall.function.name === "get_weather") {
          const args = JSON.parse(toolCall.function.arguments);
          toolResult = getWeather(args.location);
        } else {
          toolResult = JSON.stringify({ error: `Unknown tool: ${toolCall.function.name}` });
        }
        // Append the tool result back to the conversation
        messages.push({
          role: "tool",
          tool_call_id: toolCall.id,
          name: toolCall.function.name,
          content: toolResult,
        });
      }
    } else {
      // If no tool calls, the agent has its final answer
      console.log("\n[Final Answer]:", responseMessage.content);
      isTaskComplete = true;
    }
  }

  // Make it visible when the loop budget ran out before an answer was produced
  if (!isTaskComplete) {
    console.warn(`[Agent] Stopped after ${MAX_LOOPS} iterations without a final answer.`);
  }
}

// Run the agent (and surface API or parsing errors instead of failing silently)
runAgent("What is the weather like in Miami right now? Should I pack a jacket?").catch(console.error);
Why this approach works:
This loop handles the core of all agentic behavior: Thought -> Action -> Observation -> Final Output. By building this once from scratch, you will deeply understand what LangChain or AutoGen are doing behind the scenes.
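One practical benefit of owning the loop is that you can test it without spending tokens. The sketch below restates the same Thought -> Action -> Observation cycle with the model injected as a function, then drives it with a scripted stand-in. All names here (runLoop, makeScriptedModel, callModel, handlers) are hypothetical, and the scripted replies only mimic the shape of a Chat Completions assistant message.

```javascript
// Generic agent loop. `callModel(messages)` is an assumed async function that
// resolves to an assistant message: { role: "assistant", content, tool_calls? }.
async function runLoop({ callModel, handlers, messages, maxSteps = 5 }) {
  for (let step = 1; step <= maxSteps; step++) {
    const reply = await callModel(messages); // Thought
    messages.push(reply);

    if (!reply.tool_calls || reply.tool_calls.length === 0) {
      return { answer: reply.content, steps: step }; // Final output
    }

    for (const call of reply.tool_calls) { // Action
      const name = call.function.name;
      // A production loop would also guard JSON.parse against malformed arguments.
      const content = Object.hasOwn(handlers, name)
        ? JSON.stringify(handlers[name](JSON.parse(call.function.arguments)))
        : JSON.stringify({ error: `Unknown tool: ${name}` });
      messages.push({ role: "tool", tool_call_id: call.id, content }); // Observation
    }
  }
  return { answer: null, steps: maxSteps }; // Step budget exhausted
}

// Scripted stand-in for the LLM: first requests the weather tool, then answers.
function makeScriptedModel() {
  const script = [
    {
      role: "assistant",
      content: null,
      tool_calls: [{
        id: "call_1",
        type: "function",
        function: { name: "get_weather", arguments: '{"location":"Miami, FL"}' },
      }],
    },
    { role: "assistant", content: "It is 72F and sunny in Miami, so you can skip the jacket." },
  ];
  let i = 0;
  return async () => script[i++];
}

runLoop({
  callModel: makeScriptedModel(),
  handlers: { get_weather: ({ location }) => ({ location, temperature: "72F", conditions: "Sunny" }) },
  messages: [{ role: "user", content: "What is the weather like in Miami?" }],
}).then((result) => console.log(result));
```

With this script the loop finishes in two steps: one tool call, then the final answer. Swapping the scripted function for a real SDK call is the only change needed to go live.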
Conclusion
Building AI agents requires a shift from traditional deterministic programming to orchestrating probabilistic systems. Start small: build a simple loop like the one above. Once you feel the pain of managing complex state and 20+ tools, then graduate to frameworks like LangGraph. And always keep an eye on your token counts!