Introduction to AI Agents
Understand the fundamentals of AI agents and how they're transforming the way we interact with technology
Understanding AI Agents
AI agents represent the next evolution in artificial intelligence, moving beyond passive models that simply respond to prompts toward autonomous systems that can perceive, reason, plan, and act to achieve specific goals.
Key Insight
AI agents combine the reasoning capabilities of large language models with the ability to interact with the external world through tools, creating systems that can autonomously solve complex problems and perform tasks on behalf of users.
What Makes an AI Agent?
An AI agent is characterised by several key capabilities that distinguish it from traditional AI systems:
- Autonomy: Ability to operate independently with minimal human supervision
- Goal-Orientation: Working toward specific objectives rather than just responding to prompts
- Tool Use: Leveraging external tools and APIs to interact with the world
- Memory: Maintaining context and learning from past interactions
- Planning: Breaking down complex tasks into manageable steps
- Adaptability: Adjusting strategies based on feedback and changing conditions
The Evolution of AI Systems
| System Type | Characteristics | Examples |
|---|---|---|
| Traditional AI | Rule-based, narrow focus, explicit programming | Expert systems, chess engines, traditional chatbots |
| Machine Learning | Data-driven, pattern recognition, statistical models | Recommendation systems, image classifiers, predictive analytics |
| Large Language Models | Generative, broad knowledge, natural language understanding | ChatGPT, Claude, Llama, text generation systems |
| AI Agents | Autonomous, goal-oriented, tool-using, adaptive | Personal assistants, autonomous researchers, workflow automators |
The Agent Cognition Loop
At the core of every AI agent is the agent cognition loop—a continuous cycle of perception, reasoning, planning, and action that allows the agent to interact with its environment and work toward its goals.
The Basic Agent Loop:
- Observe: Gather information from the environment or user input
- Orient: Interpret the information and update internal state
- Decide: Determine the best course of action based on goals and current state
- Act: Execute the chosen action using available tools
- Learn: Update knowledge and strategies based on outcomes
Agent Loop Implementation
class AIAgent:
def __init__(self, llm, tools, memory=None):
self.llm = llm
self.tools = {tool.name: tool for tool in tools}
self.memory = memory or []
self.state = {}
def run(self, user_input):
"""Execute the agent cognition loop."""
# 1. Observe - gather input
observation = self._observe(user_input)
# 2. Orient - interpret and update state
self._orient(observation)
# Continue the loop until goal is reached or max iterations
max_iterations = 10
for _ in range(max_iterations):
# 3. Decide - determine next action
action = self._decide()
# 4. Act - execute the action
result = self._act(action)
            # 5. Learn - update knowledge based on result
            self._learn(action, result)
            # Update state and memory with the action's result
            self._orient(result)
            # Check if we've reached the goal
            if self._goal_achieved():
                break
return self._generate_response()
def _observe(self, user_input):
"""Gather information from user input."""
return {"type": "user_input", "content": user_input}
def _orient(self, observation):
"""Interpret the observation and update internal state."""
# Add observation to memory
self.memory.append(observation)
# Update state based on observation
if observation["type"] == "user_input":
self.state["current_goal"] = self._extract_goal(observation["content"])
elif observation["type"] == "tool_result":
self.state["last_tool_result"] = observation["content"]
def _decide(self):
"""Determine the next action based on current state."""
# Construct prompt with memory and state
prompt = self._construct_decision_prompt()
# Get decision from LLM
response = self.llm.predict(prompt)
# Parse the response to extract tool name and arguments
tool_name, tool_args = self._parse_tool_call(response)
return {"tool": tool_name, "args": tool_args}
def _act(self, action):
"""Execute the chosen action using available tools."""
tool_name = action["tool"]
tool_args = action["args"]
if tool_name in self.tools:
tool = self.tools[tool_name]
try:
result = tool(**tool_args)
return {"type": "tool_result", "tool": tool_name, "content": result}
except Exception as e:
return {"type": "error", "tool": tool_name, "content": str(e)}
else:
return {"type": "error", "content": f"Tool {tool_name} not found"}
    def _learn(self, action, result):
        """Update knowledge and strategies based on action results."""
        # Record the action taken; the result itself is stored when
        # _orient() processes it, so avoid appending it to memory twice
        self.memory.append({"type": "action", "content": action})
# Update state based on result
if result["type"] == "error":
self.state["last_error"] = result["content"]
# Could implement more sophisticated learning here
def _goal_achieved(self):
"""Check if the current goal has been achieved."""
# Construct prompt to check goal completion
prompt = self._construct_goal_check_prompt()
# Get assessment from LLM
response = self.llm.predict(prompt)
# Parse response to determine if goal is achieved
return "GOAL_ACHIEVED" in response
def _generate_response(self):
"""Generate a final response to the user."""
# Construct prompt for response generation
prompt = self._construct_response_prompt()
# Get response from LLM
return self.llm.predict(prompt)
# Helper methods
def _extract_goal(self, user_input):
"""Extract the user's goal from their input."""
prompt = f"Extract the user's goal from their input: {user_input}"
return self.llm.predict(prompt)
def _construct_decision_prompt(self):
"""Construct a prompt for the decision phase."""
# Implementation details omitted for brevity
pass
def _parse_tool_call(self, llm_response):
"""Parse the LLM's response to extract tool name and arguments."""
# Implementation details omitted for brevity
pass
def _construct_goal_check_prompt(self):
"""Construct a prompt to check if the goal has been achieved."""
# Implementation details omitted for brevity
pass
def _construct_response_prompt(self):
"""Construct a prompt for generating the final response."""
# Implementation details omitted for brevity
pass
Types of AI Agents
AI agents come in various forms, each designed for specific types of tasks and interaction patterns.
By Autonomy Level
Autonomy Spectrum:
| Agent Type | Autonomy Level | Human Involvement | Best For |
|---|---|---|---|
| Assistive Agents | Low | Frequent guidance and confirmation | High-stakes decisions, creative tasks, personalised assistance |
| Collaborative Agents | Medium | Occasional input and direction | Complex problem-solving, research, content creation |
| Autonomous Agents | High | Initial setup and periodic review | Routine tasks, monitoring, data processing, scheduled actions |
| Fully Autonomous Systems | Very High | Oversight only | Continuous operations, real-time responses, system management |
By Functional Role
Common Agent Roles:
- Personal Assistants: Help individuals with daily tasks, scheduling, information retrieval
- Research Agents: Gather, analyze, and synthesize information from multiple sources
- Creative Agents: Generate content, designs, or creative works based on specifications
- Workflow Agents: Automate business processes and coordinate tasks across systems
- Customer Service Agents: Handle inquiries, troubleshoot issues, and provide support
- Data Agents: Process, analyze, and extract insights from large datasets
- DevOps Agents: Monitor systems, detect issues, and manage infrastructure
- Learning Agents: Personalize educational content and provide tutoring
By Architecture
Agent Architectures:
| Architecture | Description | Advantages | Limitations |
|---|---|---|---|
| Single-LLM Agents | One LLM handles all reasoning and decision-making | Simple, coherent reasoning, easy to implement | Limited by context window, potential for hallucination |
| Multi-LLM Agents | Different LLMs handle specialized tasks | Specialized expertise, cost optimization | Coordination overhead, potential inconsistencies |
| Hierarchical Agents | Manager agents delegate to specialized sub-agents | Complex task handling, separation of concerns | Complex implementation, communication overhead |
| Multi-Agent Systems | Multiple agents collaborate to solve problems | Parallel processing, diverse perspectives | Coordination challenges, resource intensive |
| Hybrid Symbolic-Neural Agents | Combine LLMs with symbolic reasoning systems | Reliable reasoning, verifiable outputs | Implementation complexity, integration challenges |
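Of these architectures, the hierarchical pattern is the easiest to picture in code. The sketch below is illustrative only (the `ManagerAgent` and `SubAgent` names are assumptions, not from any specific framework): a manager looks up the specialist sub-agent that matches a subtask and delegates to it.

```python
class SubAgent:
    """A specialist sub-agent that handles one category of task."""
    def __init__(self, speciality: str):
        self.speciality = speciality

    def handle(self, task: str) -> str:
        # A real sub-agent would run its own cognition loop here
        return f"[{self.speciality}] completed: {task}"


class ManagerAgent:
    """Delegates each subtask to the sub-agent whose speciality matches."""
    def __init__(self, sub_agents):
        self.sub_agents = {agent.speciality: agent for agent in sub_agents}

    def delegate(self, speciality: str, task: str) -> str:
        if speciality not in self.sub_agents:
            return f"Error: no sub-agent for '{speciality}'"
        return self.sub_agents[speciality].handle(task)


manager = ManagerAgent([SubAgent("research"), SubAgent("writing")])
print(manager.delegate("research", "find sources on sea level rise"))
```

The separation of concerns in the table shows up directly: adding a new capability means adding a sub-agent, not changing the manager's routing logic.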
Core Components of AI Agents
Effective AI agents are built from several essential components that work together to enable autonomous operation.
1. Reasoning Engine
The reasoning engine is typically a large language model that provides the cognitive capabilities for the agent, including:
- Natural Language Understanding: Interpreting user requests and information
- Problem Solving: Breaking down complex tasks into steps
- Decision Making: Choosing appropriate actions based on context
- Explanation Generation: Providing rationales for decisions
Reasoning Engine Selection
Choose your reasoning engine based on the agent's requirements:
- GPT-4 or Claude 3 Opus: For complex reasoning and sophisticated agents
- GPT-3.5 or Claude 3 Sonnet: For simpler agents with good balance of cost and performance
- Llama 3 or Mistral: For locally-deployed agents with privacy requirements
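Whichever model you choose, the examples in this chapter only ever call a single `predict(prompt)` method on the reasoning engine. A thin interface like the following (a sketch; the `ReasoningEngine` and `StubEngine` names are assumptions, not a vendor SDK) keeps the model swappable behind that one method:

```python
from abc import ABC, abstractmethod


class ReasoningEngine(ABC):
    """The minimal interface the rest of the agent code relies on."""
    @abstractmethod
    def predict(self, prompt: str) -> str:
        ...


class StubEngine(ReasoningEngine):
    """Stand-in for local testing; a real engine would call a model API."""
    def predict(self, prompt: str) -> str:
        return f"(stub) responding to: {prompt[:40]}"


# Swapping models means swapping one object, not rewriting the agent
engine: ReasoningEngine = StubEngine()
print(engine.predict("Summarise the latest climate report"))
```

This is why the later examples can use a `MockLLM` for demonstration: any object exposing `predict` satisfies the contract.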
2. Tool Use Framework
The tool use framework enables agents to interact with external systems, APIs, and data sources:
- Tool Definition: Specifying available tools and their parameters
- Tool Selection: Choosing the right tool for a given task
- Parameter Preparation: Formatting inputs correctly for tools
- Result Handling: Processing and interpreting tool outputs
from typing import List, Dict, Any, Callable
from pydantic import BaseModel, Field
# Define a tool using Pydantic for parameter validation
class Tool:
def __init__(self, name: str, description: str, function: Callable, schema: BaseModel = None):
self.name = name
self.description = description
self.function = function
self.schema = schema
def __call__(self, **kwargs):
"""Execute the tool with the provided arguments."""
        # Validate arguments if a schema is provided
        # (Pydantic v1 API: .dict(); in Pydantic v2 this is .model_dump())
        if self.schema:
            validated_args = self.schema(**kwargs).dict()
return self.function(**validated_args)
return self.function(**kwargs)
def get_schema(self) -> Dict[str, Any]:
"""Get the JSON schema for this tool."""
if self.schema:
schema_dict = self.schema.schema()
return {
"name": self.name,
"description": self.description,
"parameters": schema_dict
}
return {
"name": self.name,
"description": self.description,
"parameters": {"type": "object", "properties": {}}
}
# Example tool definitions
class SearchParameters(BaseModel):
query: str = Field(..., description="The search query")
num_results: int = Field(5, description="Number of results to return")
def search_function(query: str, num_results: int = 5) -> List[Dict[str, str]]:
"""Simulated search function."""
# In a real implementation, this would call a search API
return [{"title": f"Result {i} for {query}", "url": f"https://example.com/{i}"} for i in range(num_results)]
search_tool = Tool(
name="search",
description="Search the web for information. Use this when you need to find facts or current information.",
function=search_function,
schema=SearchParameters
)
class CalculatorParameters(BaseModel):
expression: str = Field(..., description="Mathematical expression to evaluate")
def calculator_function(expression: str) -> Any:
    """Evaluate a mathematical expression.

    Warning: eval() on untrusted input is a security risk; passing empty
    globals/builtins limits, but does not eliminate, the exposure. Use a
    proper expression parser in production.
    """
    try:
        return eval(expression, {"__builtins__": {}}, {})
    except Exception as e:
        return f"Error: {str(e)}"
calculator_tool = Tool(
name="calculator",
description="Evaluate mathematical expressions. Use this for calculations.",
function=calculator_function,
schema=CalculatorParameters
)
# Tool use framework
class ToolUseFramework:
def __init__(self, tools: List[Tool]):
self.tools = {tool.name: tool for tool in tools}
def get_tool_descriptions(self) -> str:
"""Get formatted descriptions of all available tools."""
descriptions = []
for name, tool in self.tools.items():
descriptions.append(f"{name}: {tool.description}")
return "\n".join(descriptions)
def get_tool_schemas(self) -> List[Dict[str, Any]]:
"""Get JSON schemas for all tools."""
return [tool.get_schema() for tool in self.tools.values()]
def execute_tool(self, tool_name: str, **kwargs) -> Any:
"""Execute a tool with the provided arguments."""
if tool_name not in self.tools:
return f"Error: Tool '{tool_name}' not found. Available tools: {', '.join(self.tools.keys())}"
tool = self.tools[tool_name]
try:
return tool(**kwargs)
except Exception as e:
return f"Error executing {tool_name}: {str(e)}"
# Example usage
tools = [search_tool, calculator_tool]
tool_framework = ToolUseFramework(tools)
# Get tool descriptions for prompts
tool_descriptions = tool_framework.get_tool_descriptions()
print(tool_descriptions)
# Execute a tool
result = tool_framework.execute_tool("calculator", expression="2 + 2 * 3")
print(result) # Output: 8
3. Memory Systems
Memory systems allow agents to maintain context, learn from past interactions, and build knowledge over time:
- Short-term Memory: Recent conversation history and current context
- Working Memory: Active information needed for the current task
- Long-term Memory: Persistent knowledge and learned patterns
- Episodic Memory: Records of past interactions and their outcomes
from typing import List, Dict, Any, Optional
import json
from datetime import datetime
class MemorySystem:
def __init__(self, max_short_term_items: int = 10):
self.short_term_memory = [] # Recent interactions
self.working_memory = {} # Current task state
self.long_term_memory = [] # Persistent knowledge
self.max_short_term_items = max_short_term_items
def add_to_short_term(self, item: Dict[str, Any]) -> None:
"""Add an item to short-term memory."""
# Add timestamp if not present
if "timestamp" not in item:
item["timestamp"] = datetime.now().isoformat()
self.short_term_memory.append(item)
# Trim if exceeding max size
if len(self.short_term_memory) > self.max_short_term_items:
self.short_term_memory = self.short_term_memory[-self.max_short_term_items:]
def update_working_memory(self, key: str, value: Any) -> None:
"""Update a value in working memory."""
self.working_memory[key] = value
def clear_working_memory(self) -> None:
"""Clear working memory for a new task."""
self.working_memory = {}
def add_to_long_term(self, item: Dict[str, Any]) -> None:
"""Add an item to long-term memory."""
# Add timestamp if not present
if "timestamp" not in item:
item["timestamp"] = datetime.now().isoformat()
self.long_term_memory.append(item)
def search_long_term(self, query: str, limit: int = 5) -> List[Dict[str, Any]]:
"""Search long-term memory for relevant items.
In a real implementation, this would use embeddings and vector search.
This is a simplified version that does basic keyword matching.
"""
results = []
for item in self.long_term_memory:
# Simple keyword matching (would use embeddings in practice)
content = json.dumps(item).lower()
if query.lower() in content:
results.append(item)
if len(results) >= limit:
break
return results
def get_relevant_context(self, query: str) -> Dict[str, Any]:
"""Get all relevant context for a query."""
# Combine short-term and relevant long-term memory
long_term_results = self.search_long_term(query)
return {
"short_term": self.short_term_memory,
"working_memory": self.working_memory,
"relevant_long_term": long_term_results
}
def summarize_short_term(self) -> str:
"""Summarize short-term memory for long-term storage.
In a real implementation, this would use an LLM to generate a summary.
This is a simplified placeholder.
"""
# Placeholder for LLM-based summarization
return f"Summary of {len(self.short_term_memory)} recent interactions"
def commit_short_term_to_long_term(self) -> None:
"""Summarize short-term memory and commit to long-term memory."""
if not self.short_term_memory:
return
summary = self.summarize_short_term()
self.add_to_long_term({
"type": "conversation_summary",
"summary": summary,
"original_items": self.short_term_memory.copy()
})
# Example usage
memory = MemorySystem()
# Add user interaction to short-term memory
memory.add_to_short_term({
"type": "user_message",
"content": "I need to find information about climate change impacts."
})
# Update working memory with current task
memory.update_working_memory("current_task", "research_climate_change")
memory.update_working_memory("search_queries_used", ["climate change impacts", "sea level rise"])
# Add some knowledge to long-term memory
memory.add_to_long_term({
"type": "learned_fact",
"topic": "climate_change",
"fact": "Global sea levels rose about 8-9 inches since 1880."
})
# Get relevant context for a query
context = memory.get_relevant_context("climate change sea level")
print(json.dumps(context, indent=2))
# At the end of a session, commit short-term to long-term
memory.commit_short_term_to_long_term()
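As the docstring for `search_long_term` notes, production systems replace keyword matching with embedding-based vector search. The toy sketch below conveys the idea by substituting a bag-of-words count vector for a real embedding model (the `embed`, `cosine`, and `vector_search` helpers are illustrative, not a real library API):

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector. A real system would
    call an embedding model and store dense float vectors instead."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def vector_search(query: str, documents: list, limit: int = 2) -> list:
    """Rank stored items by similarity to the query vector."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:limit]


docs = [
    "Global sea levels rose about 8-9 inches since 1880.",
    "Solar panels convert sunlight into electricity.",
]
print(vector_search("sea level rise since 1880", docs, limit=1))
```

Swapping this logic into `search_long_term` (with real embeddings and a vector index) is what turns the memory system from exact-match recall into semantic recall.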
4. Planning and Execution
Planning and execution systems enable agents to break down complex tasks into manageable steps and carry them out effectively:
- Goal Decomposition: Breaking high-level goals into subgoals
- Step Sequencing: Determining the optimal order of operations
- Progress Tracking: Monitoring completion of steps
- Error Handling: Adapting to failures and unexpected outcomes
from typing import List, Dict, Any, Optional
from enum import Enum
from datetime import datetime
import json
import uuid
class TaskStatus(Enum):
PENDING = "pending"
IN_PROGRESS = "in_progress"
COMPLETED = "completed"
FAILED = "failed"
class Task:
def __init__(self, description: str, dependencies: List[str] = None):
self.id = str(uuid.uuid4())[:8]
self.description = description
self.dependencies = dependencies or []
self.status = TaskStatus.PENDING
self.result = None
self.error = None
def to_dict(self) -> Dict[str, Any]:
return {
"id": self.id,
"description": self.description,
"dependencies": self.dependencies,
"status": self.status.value,
"result": self.result,
"error": self.error
}
class Plan:
def __init__(self, goal: str):
self.id = str(uuid.uuid4())[:8]
self.goal = goal
self.tasks = {} # id -> Task
self.created_at = datetime.now().isoformat()
def add_task(self, description: str, dependencies: List[str] = None) -> str:
"""Add a task to the plan and return its ID."""
task = Task(description, dependencies)
self.tasks[task.id] = task
return task.id
def get_next_tasks(self) -> List[Task]:
"""Get tasks that are ready to be executed (all dependencies satisfied)."""
next_tasks = []
for task in self.tasks.values():
if task.status == TaskStatus.PENDING:
dependencies_met = True
for dep_id in task.dependencies:
if dep_id not in self.tasks or self.tasks[dep_id].status != TaskStatus.COMPLETED:
dependencies_met = False
break
if dependencies_met:
next_tasks.append(task)
return next_tasks
def update_task(self, task_id: str, status: TaskStatus, result: Any = None, error: str = None) -> None:
"""Update the status and result of a task."""
if task_id not in self.tasks:
raise ValueError(f"Task {task_id} not found in plan")
task = self.tasks[task_id]
task.status = status
task.result = result
task.error = error
def is_completed(self) -> bool:
"""Check if all tasks in the plan are completed."""
return all(task.status == TaskStatus.COMPLETED for task in self.tasks.values())
def has_failed_tasks(self) -> bool:
"""Check if any tasks have failed."""
return any(task.status == TaskStatus.FAILED for task in self.tasks.values())
def get_summary(self) -> Dict[str, Any]:
"""Get a summary of the plan's status."""
total = len(self.tasks)
completed = sum(1 for task in self.tasks.values() if task.status == TaskStatus.COMPLETED)
in_progress = sum(1 for task in self.tasks.values() if task.status == TaskStatus.IN_PROGRESS)
pending = sum(1 for task in self.tasks.values() if task.status == TaskStatus.PENDING)
failed = sum(1 for task in self.tasks.values() if task.status == TaskStatus.FAILED)
return {
"id": self.id,
"goal": self.goal,
"total_tasks": total,
"completed": completed,
"in_progress": in_progress,
"pending": pending,
"failed": failed,
"is_completed": self.is_completed(),
"has_failed": self.has_failed_tasks()
}
def to_dict(self) -> Dict[str, Any]:
"""Convert the plan to a dictionary."""
return {
"id": self.id,
"goal": self.goal,
"tasks": {task_id: task.to_dict() for task_id, task in self.tasks.items()},
"created_at": self.created_at,
"summary": self.get_summary()
}
class PlanningSystem:
def __init__(self, llm):
self.llm = llm
self.plans = {} # plan_id -> Plan
def create_plan(self, goal: str) -> str:
"""Create a new plan for a goal and return its ID."""
# Use LLM to break down the goal into tasks
planning_prompt = f"""
Create a step-by-step plan to achieve this goal: {goal}
For each step, consider:
1. What needs to be done
2. What information is needed
3. What dependencies exist (which steps must be completed first)
Format your response as a numbered list of steps.
"""
plan_text = self.llm.predict(planning_prompt)
# Create a new plan
plan = Plan(goal)
# Parse the plan text and add tasks
# This is a simplified parser that assumes a numbered list
lines = plan_text.strip().split('\n')
task_map = {} # step number -> task_id
for line in lines:
line = line.strip()
if not line:
continue
# Try to extract step number
parts = line.split('.', 1)
if len(parts) == 2 and parts[0].strip().isdigit():
step_num = int(parts[0].strip())
description = parts[1].strip()
# Assume dependencies are previous steps
dependencies = [task_map[i] for i in range(1, step_num) if i in task_map]
task_id = plan.add_task(description, dependencies)
task_map[step_num] = task_id
# Store the plan
self.plans[plan.id] = plan
return plan.id
def execute_plan(self, plan_id: str, tool_framework) -> Dict[str, Any]:
"""Execute a plan using the provided tool framework."""
if plan_id not in self.plans:
return {"error": f"Plan {plan_id} not found"}
plan = self.plans[plan_id]
# Continue executing until plan is completed or all available tasks are attempted
while not plan.is_completed() and not plan.has_failed_tasks():
next_tasks = plan.get_next_tasks()
if not next_tasks:
break
# Execute each ready task
for task in next_tasks:
# Update task status
plan.update_task(task.id, TaskStatus.IN_PROGRESS)
# Determine which tool to use and how to use it
tool_selection_prompt = f"""
Task: {task.description}
Available tools:
{tool_framework.get_tool_descriptions()}
Which tool should be used for this task? If no tool is needed, respond with "NONE".
If a tool is needed, specify the tool name and the parameters in this format:
TOOL: [tool_name]
PARAMS: [JSON formatted parameters]
"""
tool_response = self.llm.predict(tool_selection_prompt)
# Parse the tool response
tool_name = None
tool_params = {}
if "TOOL:" in tool_response:
tool_lines = tool_response.split('\n')
for line in tool_lines:
if line.startswith("TOOL:"):
tool_name = line.replace("TOOL:", "").strip()
elif line.startswith("PARAMS:"):
params_text = line.replace("PARAMS:", "").strip()
try:
tool_params = json.loads(params_text)
                            except json.JSONDecodeError:
                                # If JSON parsing fails, fall back to an empty dict
                                tool_params = {}
# Execute the tool if specified
if tool_name and tool_name != "NONE":
try:
result = tool_framework.execute_tool(tool_name, **tool_params)
plan.update_task(task.id, TaskStatus.COMPLETED, result=result)
except Exception as e:
plan.update_task(task.id, TaskStatus.FAILED, error=str(e))
else:
# No tool needed, mark as completed
plan.update_task(task.id, TaskStatus.COMPLETED)
return plan.to_dict()
def get_plan(self, plan_id: str) -> Optional[Dict[str, Any]]:
"""Get a plan by ID."""
if plan_id not in self.plans:
return None
return self.plans[plan_id].to_dict()
# Example usage (assuming llm and tool_framework are defined)
# planning_system = PlanningSystem(llm)
# plan_id = planning_system.create_plan("Research the impact of climate change on agriculture")
# plan_result = planning_system.execute_plan(plan_id, tool_framework)
# print(json.dumps(plan_result, indent=2))
5. User Interaction Interface
The user interaction interface enables communication between the agent and its users:
- Input Processing: Interpreting user requests and commands
- Output Generation: Creating clear, helpful responses
- Clarification Mechanisms: Asking for additional information when needed
- Progress Updates: Keeping users informed about task status
from typing import List, Dict, Any, Optional, Callable
import json
class UserInteractionInterface:
def __init__(self, llm, memory_system):
self.llm = llm
self.memory_system = memory_system
self.response_formatters = {
"text": self._format_text_response,
"search_results": self._format_search_results,
"error": self._format_error_response,
"plan": self._format_plan_response,
"clarification": self._format_clarification_request
}
def process_user_input(self, user_input: str) -> Dict[str, Any]:
"""Process user input and store in memory."""
# Store in memory
self.memory_system.add_to_short_term({
"type": "user_input",
"content": user_input
})
# Analyze the input
analysis_prompt = f"""
Analyze this user input: "{user_input}"
Determine:
1. The primary intent (question, command, clarification, etc.)
2. The main topic or subject
3. Any specific constraints or preferences mentioned
4. Whether additional information is needed from the user
Format your response as JSON with these fields:
{{
"intent": "string",
"topic": "string",
"constraints": ["string"],
"needs_clarification": boolean,
"clarification_question": "string" (if needs_clarification is true)
}}
"""
analysis_response = self.llm.predict(analysis_prompt)
# Parse the JSON response
try:
analysis = json.loads(analysis_response)
        except json.JSONDecodeError:
# Fallback if JSON parsing fails
analysis = {
"intent": "unknown",
"topic": "unknown",
"constraints": [],
"needs_clarification": False
}
        # Store analysis in working memory (use .get in case the model omitted keys)
        self.memory_system.update_working_memory("current_intent", analysis.get("intent", "unknown"))
        self.memory_system.update_working_memory("current_topic", analysis.get("topic", "unknown"))
return analysis
def generate_response(self, response_type: str, content: Any) -> str:
"""Generate a formatted response based on type and content."""
if response_type in self.response_formatters:
formatter = self.response_formatters[response_type]
response = formatter(content)
else:
# Default to text response
response = str(content)
# Store in memory
self.memory_system.add_to_short_term({
"type": "agent_response",
"response_type": response_type,
"content": response
})
return response
def request_clarification(self, question: str) -> str:
"""Generate a clarification request."""
return self.generate_response("clarification", question)
def _format_text_response(self, content: str) -> str:
"""Format a simple text response."""
return content
def _format_search_results(self, results: List[Dict[str, str]]) -> str:
"""Format search results."""
if not results:
return "I couldn't find any relevant information."
formatted = "Here's what I found:\n\n"
for i, result in enumerate(results, 1):
formatted += f"{i}. {result['title']}\n {result['url']}\n"
return formatted
def _format_error_response(self, error: str) -> str:
"""Format an error response."""
return f"I encountered an issue: {error}\n\nCould you try rephrasing your request or providing more information?"
def _format_plan_response(self, plan: Dict[str, Any]) -> str:
"""Format a plan response."""
formatted = f"I've created a plan to achieve your goal: {plan['goal']}\n\n"
# Add summary
summary = plan['summary']
formatted += f"Progress: {summary['completed']}/{summary['total_tasks']} tasks completed\n\n"
# Add tasks
formatted += "Tasks:\n"
for task_id, task in plan['tasks'].items():
            status_symbol = {"completed": "✓", "in_progress": "⋯", "failed": "✗"}.get(task['status'], "○")
formatted += f"{status_symbol} {task['description']}\n"
if task['result']:
formatted += f" Result: {task['result']}\n"
if task['error']:
formatted += f" Error: {task['error']}\n"
return formatted
def _format_clarification_request(self, question: str) -> str:
"""Format a clarification request."""
return f"To better assist you, I need some additional information:\n\n{question}"
# Example usage (assuming llm and memory_system are defined)
# interface = UserInteractionInterface(llm, memory_system)
# analysis = interface.process_user_input("Can you help me find information about renewable energy?")
#
# if analysis["needs_clarification"]:
# response = interface.request_clarification(analysis["clarification_question"])
# else:
# # Simulate search results
# search_results = [
# {"title": "Renewable Energy Explained", "url": "https://example.com/renewable-energy"},
# {"title": "Solar and Wind Power Basics", "url": "https://example.com/solar-wind"}
# ]
# response = interface.generate_response("search_results", search_results)
#
# print(response)
Building Your First AI Agent
Let's put everything together to build a simple but functional AI agent that can help with research tasks.
Step 1: Define the Agent's Purpose and Capabilities
Research Assistant Agent Specification:
- Purpose: Help users find, summarize, and synthesize information on specific topics
- Capabilities:
- Search for information online
- Extract key points from articles and websites
- Summarise findings in structured formats
- Answer questions based on gathered information
- Tools:
- Web search
- Web browsing
- Text extraction
- Summarisation
Step 2: Implement the Core Components
import os
import json
import requests
from typing import List, Dict, Any, Optional
from pydantic import BaseModel, Field
from datetime import datetime
# For simplicity, we'll use a mock LLM class
class MockLLM:
def predict(self, prompt: str) -> str:
"""Simulate LLM prediction (in a real implementation, this would call an actual LLM API)."""
# This is just a placeholder that returns a simple response based on the prompt
if "search" in prompt.lower():
return "I'll search for that information."
elif "summarise" in prompt.lower():
return "Here's a summary of the key points..."
elif "analyze" in prompt.lower():
return "Based on my analysis..."
else:
return "I understand your request and will help with that."
# Tool definitions
class SearchParameters(BaseModel):
query: str = Field(..., description="The search query")
num_results: int = Field(5, description="Number of results to return")
def search_function(query: str, num_results: int = 5) -> List[Dict[str, str]]:
"""Simulated search function."""
# In a real implementation, this would call a search API
return [{"title": f"Result {i} for {query}", "url": f"https://example.com/{i}"} for i in range(num_results)]
class BrowseParameters(BaseModel):
url: str = Field(..., description="The URL to browse")
def browse_function(url: str) -> str:
"""Simulated web browsing function."""
# In a real implementation, this would fetch and parse a webpage
return f"Content from {url}: This is simulated webpage content for demonstration purposes."
class SummariseParameters(BaseModel):
text: str = Field(..., description="The text to summarise")
max_length: int = Field(200, description="Maximum length of the summary")
def summarise_function(text: str, max_length: int = 200) -> str:
"""Simulated text summarisation function."""
# In a real implementation, this would use an LLM to summarise text
if len(text) > max_length:
return text[:max_length] + "..."
return text
# Tool class
class Tool:
    def __init__(self, name: str, description: str, function, schema: BaseModel = None):
        self.name = name
        self.description = description
        self.function = function
        self.schema = schema

    def __call__(self, **kwargs):
        """Execute the tool with the provided arguments."""
        # Validate arguments if schema is provided
        if self.schema:
            validated_args = self.schema(**kwargs).dict()
            return self.function(**validated_args)
        return self.function(**kwargs)

    def get_schema(self) -> Dict[str, Any]:
        """Get the JSON schema for this tool."""
        if self.schema:
            schema_dict = self.schema.schema()
            return {
                "name": self.name,
                "description": self.description,
                "parameters": schema_dict
            }
        return {
            "name": self.name,
            "description": self.description,
            "parameters": {"type": "object", "properties": {}}
        }
# Memory system
class MemorySystem:
    def __init__(self, max_short_term_items: int = 10):
        self.short_term_memory = []   # Recent interactions
        self.working_memory = {}      # Current task state
        self.long_term_memory = []    # Persistent knowledge
        self.max_short_term_items = max_short_term_items

    def add_to_short_term(self, item: Dict[str, Any]) -> None:
        """Add an item to short-term memory."""
        # Add timestamp if not present
        if "timestamp" not in item:
            item["timestamp"] = datetime.now().isoformat()
        self.short_term_memory.append(item)
        # Trim if exceeding max size
        if len(self.short_term_memory) > self.max_short_term_items:
            self.short_term_memory = self.short_term_memory[-self.max_short_term_items:]

    def update_working_memory(self, key: str, value: Any) -> None:
        """Update a value in working memory."""
        self.working_memory[key] = value

    def get_short_term_memory(self) -> List[Dict[str, Any]]:
        """Get all items in short-term memory."""
        return self.short_term_memory

    def get_working_memory(self) -> Dict[str, Any]:
        """Get all items in working memory."""
        return self.working_memory
# Research Assistant Agent
class ResearchAssistantAgent:
    def __init__(self):
        # Initialize LLM
        self.llm = MockLLM()
        # Initialize memory
        self.memory = MemorySystem()
        # Initialize tools
        self.tools = {
            "search": Tool(
                name="search",
                description="Search the web for information",
                function=search_function,
                schema=SearchParameters
            ),
            "browse": Tool(
                name="browse",
                description="Browse a specific webpage",
                function=browse_function,
                schema=BrowseParameters
            ),
            "summarise": Tool(
                name="summarise",
                description="Summarise a piece of text",
                function=summarise_function,
                schema=SummariseParameters
            )
        }

    def get_tool_descriptions(self) -> str:
        """Get formatted descriptions of all available tools."""
        descriptions = []
        for name, tool in self.tools.items():
            descriptions.append(f"{name}: {tool.description}")
        return "\n".join(descriptions)
    def process_user_input(self, user_input: str) -> str:
        """Process user input and generate a response."""
        # Store user input in memory
        self.memory.add_to_short_term({
            "type": "user_input",
            "content": user_input
        })

        # Determine the appropriate action
        action_prompt = f"""
User input: "{user_input}"

Based on this input, determine what action to take.

Available tools:
{self.get_tool_descriptions()}

If a tool should be used, respond in this format:
TOOL: [tool_name]
PARAMS: [JSON formatted parameters]

If no tool is needed, respond with:
RESPONSE: [Your direct response to the user]
"""
        action_decision = self.llm.predict(action_prompt)

        # Parse the action decision
        if "TOOL:" in action_decision:
            # Extract tool name and parameters
            tool_name = None
            tool_params = {}
            for line in action_decision.split('\n'):
                if line.startswith("TOOL:"):
                    tool_name = line.replace("TOOL:", "").strip()
                elif line.startswith("PARAMS:"):
                    params_text = line.replace("PARAMS:", "").strip()
                    try:
                        tool_params = json.loads(params_text)
                    except json.JSONDecodeError:
                        # If JSON parsing fails, use an empty dict
                        tool_params = {}

            # Execute the tool
            if tool_name in self.tools:
                try:
                    tool_result = self.tools[tool_name](**tool_params)
                    # Store tool execution in memory
                    self.memory.add_to_short_term({
                        "type": "tool_execution",
                        "tool": tool_name,
                        "params": tool_params,
                        "result": tool_result
                    })
                    # Generate response based on tool result
                    response_prompt = f"""
User input: "{user_input}"
Tool used: {tool_name}
Tool result: {tool_result}

Generate a helpful response to the user based on this information.
"""
                    response = self.llm.predict(response_prompt)
                    # Store response in memory
                    self.memory.add_to_short_term({
                        "type": "agent_response",
                        "content": response
                    })
                    return response
                except Exception as e:
                    error_message = f"Error executing tool {tool_name}: {e}"
                    # Store error in memory
                    self.memory.add_to_short_term({
                        "type": "error",
                        "tool": tool_name,
                        "error": str(e)
                    })
                    return error_message
            else:
                return f"Tool {tool_name} not found. Available tools: {', '.join(self.tools.keys())}"
        elif "RESPONSE:" in action_decision:
            # Extract direct response
            response_parts = action_decision.split("RESPONSE:")
            if len(response_parts) > 1:
                response = response_parts[1].strip()
                # Store response in memory
                self.memory.add_to_short_term({
                    "type": "agent_response",
                    "content": response
                })
                return response

        # Fallback response
        fallback_response = "I'm not sure how to help with that. Could you provide more details or rephrase your request?"
        # Store fallback response in memory
        self.memory.add_to_short_term({
            "type": "agent_response",
            "content": fallback_response
        })
        return fallback_response

    def get_conversation_history(self) -> List[Dict[str, Any]]:
        """Get the conversation history from memory."""
        return [item for item in self.memory.get_short_term_memory()
                if item["type"] in ["user_input", "agent_response"]]
# Example usage
agent = ResearchAssistantAgent()

# Simulate a conversation
responses = []
responses.append(agent.process_user_input("I need to research the impact of artificial intelligence on healthcare"))
responses.append(agent.process_user_input("Can you find information about AI diagnostic tools?"))
responses.append(agent.process_user_input("Summarise the key benefits of AI in healthcare"))

# Print conversation history
print("Conversation History:")
for item in agent.get_conversation_history():
    if item["type"] == "user_input":
        print(f"User: {item['content']}")
    else:
        print(f"Agent: {item['content']}")
Step 3: Enhance with Advanced Features
Once you have a basic agent working, you can enhance it with more advanced features:
Advanced Features to Add:
- Multi-step Planning: Break complex research tasks into steps
- Source Tracking: Keep track of where information came from
- Fact Verification: Cross-check information across multiple sources
- Personalisation: Remember user preferences and adapt accordingly
- Visualisation Generation: Create charts or diagrams to illustrate findings
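The first of these, multi-step planning, can be sketched as an ordered list of steps that the agent executes in sequence. The `PlanStep` structure and the fixed search → browse → summarise pipeline below are illustrative assumptions, not part of the agent above; a real planner would ask the LLM to generate the steps:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class PlanStep:
    tool: str            # which tool to invoke ("search", "browse", "summarise")
    params: dict         # arguments for the tool
    done: bool = False
    result: object = None

def plan_research(topic: str) -> List[PlanStep]:
    """Break a research topic into a fixed search -> browse -> summarise pipeline.
    A real planner would prompt the LLM to produce these steps dynamically."""
    return [
        PlanStep(tool="search", params={"query": topic, "num_results": 3}),
        PlanStep(tool="browse", params={"url": "https://example.com/0"}),
        PlanStep(tool="summarise", params={"text": "", "max_length": 200}),
    ]

def run_plan(steps: List[PlanStep], tools: Dict[str, callable]) -> dict:
    """Execute steps in order, collecting each result into a shared state dict."""
    state = {}
    for step in steps:
        step.result = tools[step.tool](**step.params)
        step.done = True
        state[step.tool] = step.result
    return state
```

Keeping results keyed by tool in a state dict also gives you a head start on source tracking: every entry records which tool produced it.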
Implementation Tips
- Start Simple: Begin with core functionality and add features incrementally
- Test Thoroughly: Test each component individually before integration
- Monitor Performance: Track response quality, tool usage, and error rates
- Gather Feedback: Use real user interactions to identify improvement areas
- Iterate Rapidly: Continuously refine based on testing and feedback
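The "Monitor Performance" tip can start very small: a counter of calls and failures per tool. The class below is a minimal sketch; the `AgentMetrics` name and `record` API are assumptions, not part of the agent above:

```python
from collections import Counter

class AgentMetrics:
    """Track tool usage and error rates across a session."""
    def __init__(self):
        self.tool_calls = Counter()
        self.tool_errors = Counter()

    def record(self, tool_name: str, ok: bool) -> None:
        """Record one tool invocation and whether it succeeded."""
        self.tool_calls[tool_name] += 1
        if not ok:
            self.tool_errors[tool_name] += 1

    def error_rate(self, tool_name: str) -> float:
        """Fraction of calls to this tool that failed (0.0 if never called)."""
        calls = self.tool_calls[tool_name]
        return self.tool_errors[tool_name] / calls if calls else 0.0
```

Calling `record(tool_name, ok=...)` from the agent's tool-execution branch is enough to surface which tools fail most often.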
Common AI Agent Patterns and Anti-Patterns
Understanding common patterns and anti-patterns will help you build more effective AI agents.
Effective Patterns
Successful AI Agent Patterns:
| Pattern | Description | Benefits |
|---|---|---|
| Separation of Concerns | Divide agent functionality into distinct components | Modularity, maintainability, easier testing |
| Progressive Disclosure | Reveal capabilities and options gradually as needed | Reduced cognitive load, focused interactions |
| Explicit Reasoning | Make reasoning process visible to users | Transparency, trust, easier debugging |
| Graceful Degradation | Maintain functionality when optimal resources unavailable | Reliability, consistent user experience |
| Contextual Memory | Maintain relevant context without overwhelming the system | Coherent conversations, personalised responses |
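As a concrete example of Graceful Degradation, an agent can wrap a preferred resource with ordered fallbacks and drop down the list on failure. This is a minimal sketch under assumed resource functions:

```python
def with_fallbacks(primary, *fallbacks):
    """Return a callable that tries primary, then each fallback in order."""
    def call(*args, **kwargs):
        for fn in (primary, *fallbacks):
            try:
                return fn(*args, **kwargs)
            except Exception:
                continue  # degrade to the next option
        raise RuntimeError("all options failed")
    return call
```

For instance, wrapping a live search API with a cached-results fallback lets the agent keep answering, with reduced freshness, when the API is down.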
Anti-Patterns to Avoid
AI Agent Anti-Patterns:
| Anti-Pattern | Description | Consequences |
|---|---|---|
| Monolithic Design | Building the entire agent as a single, tightly-coupled system | Difficult to maintain, test, or extend |
| Capability Overload | Adding too many features without clear organisation | User confusion, diluted effectiveness |
| Excessive Autonomy | Giving agents too much freedom without appropriate guardrails | Unpredictable behaviour, potentially harmful actions |
| Context Flooding | Providing too much context to the LLM | Token waste, diluted focus, increased costs |
| Tool Proliferation | Adding too many similar or overlapping tools | Decision paralysis, inefficient tool selection |
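Context Flooding in particular has a cheap mitigation: cap what you send to the LLM. The sketch below keeps only the most recent messages that fit a rough character budget (the four-characters-per-token estimate is a common heuristic, not an exact tokeniser):

```python
def trim_context(messages, max_tokens=1000, chars_per_token=4):
    """Keep the most recent messages that fit a rough token budget."""
    budget = max_tokens * chars_per_token
    kept, used = [], 0
    for msg in reversed(messages):       # newest first
        cost = len(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order
```

Applied to the agent's short-term memory before building `action_prompt`, this bounds token spend while keeping the freshest context.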
Next Steps in Your AI Journey
Now that you understand the fundamentals of AI agents, you're ready to explore more advanced agent design patterns and architectures.
Key Takeaways from This Section:
- AI agents combine LLMs with tools, memory, and planning to create autonomous systems
- The agent cognition loop (Observe, Orient, Decide, Act, Learn) drives agent behaviour
- Different types of agents serve different purposes, from assistive to fully autonomous
- Core components include the LLM, tools, memory, and planning module
- Benefits include automation and efficiency, while challenges include reliability, control, and security
- Ethical development requires transparency, accountability, bias mitigation, and user control
In the next section, we delve into specific Agentic Design Patterns, providing blueprints for constructing robust and effective AI agents for various applications.
Continue to Agentic Design Patterns →