Step 8: Test & Optimize

Now that you've built your value investing AI agent, it's time to test and optimize it to ensure it provides accurate, reliable, and useful investment recommendations. Thorough testing and optimization are crucial steps that can make the difference between a mediocre tool and a truly valuable investment assistant.

Why Testing and Optimization Matter

Testing and optimization serve several important purposes for your value investing AI agent:

Key Testing Objectives

Verify accuracy of financial data retrieval and calculations
Validate investment recommendations against established value investing principles
Ensure the agent performs well across different market sectors and conditions
Identify and fix bugs, edge cases, and performance issues

Data Accuracy

Your agent's recommendations are only as good as the data it works with. Testing ensures that:

Financial data is correctly retrieved from APIs and databases
Calculations (ratios, scores, etc.) are mathematically correct
Missing or anomalous data is handled appropriately
Data updates are timely and reflect current market conditions
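
One way to check the "missing or anomalous data" point is to validate metrics before they reach the scoring logic. The sketch below is illustrative only: the `validate_metrics` helper and the plausibility ranges in `PLAUSIBLE_RANGES` are assumptions, not part of the agent built in earlier steps, and the ranges should be tuned to your own data source.

```python
import math

# Hypothetical plausibility ranges; adjust these to your own data source.
PLAUSIBLE_RANGES = {
    "pe_ratio": (0.0, 500.0),
    "pb_ratio": (0.0, 100.0),
    "roe": (-1.0, 2.0),
    "debt_to_equity": (0.0, 50.0),
}


def validate_metrics(data):
    """Return (clean, issues): in-range values and a list of problems found."""
    clean, issues = {}, []
    for name, (low, high) in PLAUSIBLE_RANGES.items():
        value = data.get(name)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            issues.append(f"{name}: missing")
        elif not low <= value <= high:
            issues.append(f"{name}: {value} outside [{low}, {high}]")
        else:
            clean[name] = value
    return clean, issues


# Example: one missing value (NaN) and one implausible value.
clean, issues = validate_metrics(
    {"pe_ratio": 12.0, "pb_ratio": float("nan"), "roe": 0.18, "debt_to_equity": 900.0}
)
print(clean)
print(issues)
```

Returning the issues alongside the clean values lets a test assert on exactly which fields were rejected, rather than only on the final rating.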

Recommendation Validity

Your agent should provide recommendations that align with value investing principles:

Companies with strong fundamentals and low valuations should receive positive ratings
Overvalued companies should receive appropriate caution signals
Recommendations should be consistent with the criteria you defined in Step 1
Explanations should be logical and help users understand the reasoning

Robustness Across Markets

Your agent should work well across different types of companies and market conditions:

Different sectors (tech, finance, healthcare, etc.) have different typical financial profiles
Companies of different sizes (small-cap, mid-cap, large-cap) should be analyzed appropriately
The agent should adapt to different market cycles (bull markets, bear markets, etc.)
International stocks may require different considerations than domestic ones

Technical Performance

Your agent should function reliably and efficiently:

Response times should be reasonable, even when analyzing multiple stocks
Error handling should be robust and provide useful feedback
Edge cases (e.g., IPOs with limited history, companies with unusual financials) should be handled gracefully
Resource usage (memory, CPU, API calls) should be optimized
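
Response times are easier to reason about once they are measured. The following is a minimal sketch using the standard library's `time.perf_counter`; the `timed` helper and the `fake_analyze` stand-in are hypothetical names, and in practice you would time your own fetch-and-analyze call instead.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def fake_analyze(ticker):
    # Hypothetical stand-in for agent.analyze(agent.fetch_data(ticker)).
    return {"ticker": ticker, "rating": "Hold"}


# Record how long each ticker takes so slow cases stand out.
timings = {}
for ticker in ["AAA", "BBB", "CCC"]:
    _, timings[ticker] = timed(fake_analyze, ticker)

slowest = max(timings, key=timings.get)
print(f"Slowest ticker: {slowest} ({timings[slowest]:.6f}s)")
```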

Testing Strategies

Let's explore different strategies for testing your value investing AI agent:

Python: Unit Testing for Value Investing Agent

# test_value_agent.py

import unittest
import pandas as pd
import numpy as np
from unittest.mock import patch, MagicMock

# Import your agent class
# In a real test, you would import your actual agent class
# For this example, we'll define a simplified version
class SimpleValueInvestingAgent:
    def __init__(self):
        self.criteria = {
            'pe_ratio': {'max': 15, 'weight': 0.15, 'better': 'lower'},
            'pb_ratio': {'max': 3, 'weight': 0.15, 'better': 'lower'},
            'roe': {'min': 0.15, 'weight': 0.15, 'better': 'higher'},
            'debt_to_equity': {'max': 1.0, 'weight': 0.1, 'better': 'lower'},
            'fcf_yield': {'min': 0.02, 'weight': 0.15, 'better': 'higher'},
            'dividend_yield': {'min': 0.01, 'weight': 0.1, 'better': 'higher'},
            'earnings_growth': {'min': 0.05, 'weight': 0.1, 'better': 'higher'},
            'margin_of_safety': {'min': 0.2, 'weight': 0.1, 'better': 'higher'}
        }
    
    def fetch_data(self, ticker):
        # This would normally call an API
        pass
    
    def analyze(self, financial_data):
        # Simplified analysis logic
        if not financial_data:
            return None
        
        results = {
            'company_name': financial_data.get('name', 'Unknown Company'),
            'ticker': financial_data.get('ticker', 'Unknown'),
            'total_score': 0,
            'max_possible_score': sum(criterion['weight'] for criterion in self.criteria.values()),
            'metric_scores': {},
            'explanations': []
        }
        
        # Calculate scores for each metric
        for metric_name, criterion in self.criteria.items():
            if metric_name in financial_data and not pd.isna(financial_data[metric_name]):
                value = financial_data[metric_name]
                score = 0
                
                if criterion['better'] == 'lower' and 'max' in criterion:
                    if value <= criterion['max']:
                        score = criterion['weight'] * (1 - value / criterion['max'])
                        if score < 0:
                            score = 0
                elif criterion['better'] == 'higher' and 'min' in criterion:
                    if value >= criterion['min']:
                        score = criterion['weight'] * min(1, (value - criterion['min']) / (criterion['min'] * 2))
                
                results['metric_scores'][metric_name] = score
                results['total_score'] += score
        
        # Calculate percentage score
        if results['max_possible_score'] > 0:
            results['percentage_score'] = (results['total_score'] / results['max_possible_score']) * 100
        else:
            results['percentage_score'] = 0
        
        # Generate recommendation
        score = results['percentage_score']
        if score >= 70:
            results['rating'] = "Strong Buy"
        elif score >= 60:
            results['rating'] = "Buy"
        elif score >= 40:
            results['rating'] = "Hold"
        elif score >= 30:
            results['rating'] = "Sell"
        else:
            results['rating'] = "Strong Sell"
        
        return results

class TestValueInvestingAgent(unittest.TestCase):
    """Test cases for the Value Investing Agent."""
    
    def setUp(self):
        """Set up test fixtures."""
        self.agent = SimpleValueInvestingAgent()
        
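        # Sample test data for a value stock.
        # Scores scale linearly from each threshold, so the values must sit well
        # inside the criteria (not merely pass them) to reach a "Buy" rating.
        self.value_stock_data = {
            'ticker': 'VALUE',
            'name': 'Value Company',
            'pe_ratio': 6.0,
            'pb_ratio': 0.9,
            'roe': 0.30,
            'debt_to_equity': 0.3,
            'fcf_yield': 0.06,
            'dividend_yield': 0.03,
            'earnings_growth': 0.12,
            'margin_of_safety': 0.40
        }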
        
        # Sample test data for an overvalued stock
        self.overvalued_stock_data = {
            'ticker': 'OVER',
            'name': 'Overvalued Company',
            'pe_ratio': 50.0,
            'pb_ratio': 10.0,
            'roe': 0.10,
            'debt_to_equity': 2.0,
            'fcf_yield': 0.01,
            'dividend_yield': 0.005,
            'earnings_growth': 0.03,
            'margin_of_safety': 0.05
        }
        
        # Sample test data with missing values
        self.incomplete_stock_data = {
            'ticker': 'INCOMPLETE',
            'name': 'Incomplete Data Company',
            'pe_ratio': 12.0,
            'pb_ratio': np.nan,
            'roe': 0.18,
            'debt_to_equity': np.nan,
            'fcf_yield': 0.03,
            'dividend_yield': np.nan,
            'earnings_growth': np.nan,
            'margin_of_safety': 0.15
        }
    
    def test_analyze_value_stock(self):
        """Test that a value stock receives a positive rating."""
        result = self.agent.analyze(self.value_stock_data)
        
        # Check that the analysis was performed
        self.assertIsNotNone(result)
        
        # Check that the score is high (should be a "Buy" or "Strong Buy")
        self.assertGreaterEqual(result['percentage_score'], 60)
        self.assertIn(result['rating'], ["Buy", "Strong Buy"])
        
        # Check individual metric scores
        self.assertGreater(result['metric_scores']['pe_ratio'], 0)
        self.assertGreater(result['metric_scores']['roe'], 0)
    
    def test_analyze_overvalued_stock(self):
        """Test that an overvalued stock receives a negative rating."""
        result = self.agent.analyze(self.overvalued_stock_data)
        
        # Check that the analysis was performed
        self.assertIsNotNone(result)
        
        # Check that the score is low (should be a "Sell" or "Strong Sell")
        self.assertLessEqual(result['percentage_score'], 40)
        self.assertIn(result['rating'], ["Sell", "Strong Sell"])
        
        # Check individual metric scores
        self.assertEqual(result['metric_scores'].get('pe_ratio', 0), 0)
        self.assertEqual(result['metric_scores'].get('pb_ratio', 0), 0)
    
    def test_analyze_incomplete_data(self):
        """Test that the agent handles incomplete data gracefully."""
        result = self.agent.analyze(self.incomplete_stock_data)
        
        # Check that the analysis was performed despite missing data
        self.assertIsNotNone(result)
        
        # Check that only available metrics were scored
        self.assertIn('pe_ratio', result['metric_scores'])
        self.assertIn('roe', result['metric_scores'])
        self.assertIn('fcf_yield', result['metric_scores'])
        self.assertNotIn('pb_ratio', result['metric_scores'])
        self.assertNotIn('debt_to_equity', result['metric_scores'])
    
    def test_analyze_empty_data(self):
        """Test that the agent handles empty data gracefully."""
        result = self.agent.analyze({})
        
        # Check that the analysis returns None for empty data
        self.assertIsNone(result)
    
    def test_analyze_none_data(self):
        """Test that the agent handles None data gracefully."""
        result = self.agent.analyze(None)
        
        # Check that the analysis returns None for None data
        self.assertIsNone(result)
    
    @patch.object(SimpleValueInvestingAgent, 'fetch_data')
    def test_fetch_data(self, mock_fetch):
        """Test that the agent fetches data correctly."""
        # Mock the fetch_data method to return our test data
        mock_fetch.return_value = self.value_stock_data
        
        # Call the method
        data = self.agent.fetch_data('VALUE')
        
        # Check that the data was fetched
        self.assertEqual(data, self.value_stock_data)
        
        # Check that the method was called with the correct ticker
        mock_fetch.assert_called_once_with('VALUE')

if __name__ == '__main__':
    unittest.main()
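
The rating cut-offs in `analyze` (70, 60, 40, and 30) are a natural place for off-by-one mistakes, so it can help to test values on both sides of each boundary. The sketch below assumes the rating logic has been pulled out into a helper; `rating_for` is a hypothetical name that mirrors the thresholds above and is not part of the agent class as written.

```python
import unittest


def rating_for(score):
    """Hypothetical helper mirroring the rating thresholds used in analyze()."""
    if score >= 70:
        return "Strong Buy"
    if score >= 60:
        return "Buy"
    if score >= 40:
        return "Hold"
    if score >= 30:
        return "Sell"
    return "Strong Sell"


class TestRatingBoundaries(unittest.TestCase):
    """Check the rating on and just below each threshold."""

    def test_boundaries(self):
        cases = [
            (70, "Strong Buy"), (69.99, "Buy"),
            (60, "Buy"), (59.99, "Hold"),
            (40, "Hold"), (39.99, "Sell"),
            (30, "Sell"), (29.99, "Strong Sell"),
            (0, "Strong Sell"),
        ]
        for score, expected in cases:
            # subTest reports every failing boundary, not just the first one
            with self.subTest(score=score):
                self.assertEqual(rating_for(score), expected)
```

Run it with `python -m unittest` alongside the tests above.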

Backtesting with Historical Data

One of the most important ways to test a value investing agent is to see how it would have performed in the past. This is called backtesting:

Python: Backtesting Value Investing Agent

# backtest_value_agent.py

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
from datetime import datetime, timedelta
import os

# Import your agent class
# In a real test, you would import your actual agent class
# For this example, we'll define a simplified version
class SimpleValueInvestingAgent:
    def __init__(self):
        self.criteria = {
            'pe_ratio': {'max': 15, 'weight': 0.15, 'better': 'lower'},
            'pb_ratio': {'max': 3, 'weight': 0.15, 'better': 'lower'},
            'roe': {'min': 0.15, 'weight': 0.15, 'better': 'higher'},
            'debt_to_equity': {'max': 1.0, 'weight': 0.1, 'better': 'lower'},
            'fcf_yield': {'min': 0.02, 'weight': 0.15, 'better': 'higher'},
            'dividend_yield': {'min': 0.01, 'weight': 0.1, 'better': 'higher'},
            'earnings_growth': {'min': 0.05, 'weight': 0.1, 'better': 'higher'},
            'margin_of_safety': {'min': 0.2, 'weight': 0.1, 'better': 'higher'}
        }
    
    def analyze_historical(self, financial_data):
        """Analyze historical financial data."""
        if not financial_data:
            return None
        
        results = {
            'company_name': financial_data.get('name', 'Unknown Company'),
            'ticker': financial_data.get('ticker', 'Unknown'),
            'date': financial_data.get('date', 'Unknown'),
            'total_score': 0,
            'max_possible_score': sum(criterion['weight'] for criterion in self.criteria.values()),
            'metric_scores': {},
        }
        
        # Calculate scores for each metric
        for metric_name, criterion in self.criteria.items():
            if metric_name in financial_data and not pd.isna(financial_data[metric_name]):
                value = financial_data[metric_name]
                score = 0
                
                if criterion['better'] == 'lower' and 'max' in criterion:
                    if value <= criterion['max']:
                        score = criterion['weight'] * (1 - value / criterion['max'])
                        if score < 0:
                            score = 0
                elif criterion['better'] == 'higher' and 'min' in criterion:
                    if value >= criterion['min']:
                        score = criterion['weight'] * min(1, (value - criterion['min']) / (criterion['min'] * 2))
                
                results['metric_scores'][metric_name] = score
                results['total_score'] += score
        
        # Calculate percentage score
        if results['max_possible_score'] > 0:
            results['percentage_score'] = (results['total_score'] / results['max_possible_score']) * 100
        else:
            results['percentage_score'] = 0
        
        # Generate recommendation
        score = results['percentage_score']
        if score >= 70:
            results['rating'] = "Strong Buy"
        elif score >= 60:
            results['rating'] = "Buy"
        elif score >= 40:
            results['rating'] = "Hold"
        elif score >= 30:
            results['rating'] = "Sell"
        else:
            results['rating'] = "Strong Sell"
        
        return results

class BacktestEngine:
    """Engine for backtesting a value investing agent."""
    
    def __init__(self, agent, start_date, end_date, rebalance_period='quarterly'):
        """
        Initialize the backtest engine.
        
        Parameters:
        -----------
        agent : ValueInvestingAgent
            The value investing agent to backtest
        start_date : str
            Start date for the backtest (format: 'YYYY-MM-DD')
        end_date : str
            End date for the backtest (format: 'YYYY-MM-DD')
        rebalance_period : str, optional
            How often to rebalance the portfolio ('monthly', 'quarterly', 'annually')
        """
        self.agent = agent
        self.start_date = start_date
        self.end_date = end_date
        self.rebalance_period = rebalance_period
        
        # Define rebalance frequency in months
        if rebalance_period == 'monthly':
            self.rebalance_months = 1
        elif rebalance_period == 'quarterly':
            self.rebalance_months = 3
        elif rebalance_period == 'annually':
            self.rebalance_months = 12
        else:
            raise ValueError("rebalance_period must be 'monthly', 'quarterly', or 'annually'")
    
    def get_historical_financial_data(self, ticker, date):
        """
        Get historical financial data for a ticker at a specific date.
        
        In a real implementation, this would fetch historical financial statements.
        For this example, we'll use a simplified approach with random data.
        """
        # This is a placeholder. In a real implementation, you would:
        # 1. Fetch historical financial statements from a database or API
        # 2. Calculate the financial metrics as they were at that point in time
        
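        # For demonstration, we'll generate synthetic data. The seed combines the
        # date and the ticker so each stock gets its own reproducible values
        # (seeding on the date alone would give every ticker identical metrics).
        date_seed = int(datetime.strptime(date, '%Y-%m-%d').timestamp())
        ticker_seed = sum(ord(ch) for ch in ticker)
        np.random.seed((date_seed + ticker_seed) % (2**32))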
        
        # Base values that change over time
        base_pe = 15 + np.random.normal(0, 3)
        base_pb = 2 + np.random.normal(0, 0.5)
        base_roe = 0.15 + np.random.normal(0, 0.03)
        base_de = 0.8 + np.random.normal(0, 0.2)
        base_fcf = 0.03 + np.random.normal(0, 0.01)
        base_div = 0.02 + np.random.normal(0, 0.005)
        base_growth = 0.06 + np.random.normal(0, 0.02)
        
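        # Get stock price at that date
        try:
            stock = yf.Ticker(ticker)
            as_of = datetime.strptime(date, '%Y-%m-%d')
            hist = stock.history(start=date, end=(as_of + timedelta(days=5)).strftime('%Y-%m-%d'))
            if not hist.empty:
                price = hist.iloc[0]['Close']
                # 52-week high over the year *before* the analysis date. An explicit
                # start/end window avoids relying on how `period` and `end` interact,
                # which could otherwise pull in prices from after the analysis date.
                prior_year = stock.history(
                    start=(as_of - timedelta(days=365)).strftime('%Y-%m-%d'), end=date
                )
                high_52w = prior_year['High'].max()
                # Discount to the 52-week high is only a rough proxy for margin of safety
                margin_of_safety = (high_52w - price) / high_52w
            else:
                price = 100
                margin_of_safety = 0.1
        except Exception:
            price = 100
            margin_of_safety = 0.1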
        
        return {
            'ticker': ticker,
            'name': f"{ticker} Inc.",
            'date': date,
            'price': price,
            'pe_ratio': base_pe,
            'pb_ratio': base_pb,
            'roe': base_roe,
            'debt_to_equity': base_de,
            'fcf_yield': base_fcf,
            'dividend_yield': base_div,
            'earnings_growth': base_growth,
            'margin_of_safety': margin_of_safety
        }
    
    def get_stock_returns(self, ticker, start_date, end_date):
        """Get stock returns between two dates."""
        try:
            stock = yf.Ticker(ticker)
            hist = stock.history(start=start_date, end=end_date)
            
            if hist.empty:
                return 0
            
            start_price = hist.iloc[0]['Close']
            end_price = hist.iloc[-1]['Close']
            
            # Calculate total return including dividends.
            # history() returns split- and dividend-adjusted prices by default
            # (auto_adjust=True), so the change in Close already reflects dividends.
            # Adding hist['Dividends'] on top would count them twice.
            total_return = (end_price / start_price) - 1
            
            return total_return
            
        except Exception as e:
            print(f"Error getting returns for {ticker}: {e}")
            return 0
    
    def generate_rebalance_dates(self):
        """Generate dates for portfolio rebalancing."""
        start = datetime.strptime(self.start_date, '%Y-%m-%d')
        end = datetime.strptime(self.end_date, '%Y-%m-%d')
        
        dates = []
        current = start
        
        while current <= end:
            dates.append(current.strftime('%Y-%m-%d'))
            
            # Move to next rebalance date
            year = current.year + ((current.month - 1 + self.rebalance_months) // 12)
            month = ((current.month - 1 + self.rebalance_months) % 12) + 1
            current = datetime(year, month, min(current.day, 28))
        
        return dates
    
    def run_backtest(self, universe, top_n=5):
        """
        Run the backtest.
        
        Parameters:
        -----------
        universe : list
            List of ticker symbols to consider for the portfolio
        top_n : int, optional
            Number of top-rated stocks to include in the portfolio
            
        Returns:
        --------
        dict
            Backtest results
        """
        # Generate rebalance dates
        rebalance_dates = self.generate_rebalance_dates()
        
        # Initialize results
        portfolio_values = [1.0]  # Start with $1
        benchmark_values = [1.0]  # Start with $1
        dates = [self.start_date]
        holdings = []
        
        # Run the backtest
        for i in range(len(rebalance_dates) - 1):
            current_date = rebalance_dates[i]
            next_date = rebalance_dates[i + 1]
            
            print(f"Analyzing period: {current_date} to {next_date}")
            
            # Analyze each stock in the universe
            stock_analyses = []
            for ticker in universe:
                financial_data = self.get_historical_financial_data(ticker, current_date)
                analysis = self.agent.analyze_historical(financial_data)
                
                if analysis:
                    stock_analyses.append(analysis)
            
            # Sort by value score
            stock_analyses.sort(key=lambda x: x['percentage_score'], reverse=True)
            
            # Select top N stocks
            selected_stocks = stock_analyses[:top_n]
            
            # Record holdings
            holdings.append({
                'date': current_date,
                'stocks': [{'ticker': s['ticker'], 'rating': s['rating'], 'score': s['percentage_score']} for s in selected_stocks]
            })
            
            # Calculate returns for the period
            portfolio_return = 0
            for stock in selected_stocks:
                stock_return = self.get_stock_returns(stock['ticker'], current_date, next_date)
                portfolio_return += stock_return / len(selected_stocks)  # Equal weighting
            
            # Calculate benchmark return (S&P 500)
            benchmark_return = self.get_stock_returns('SPY', current_date, next_date)
            
            # Update portfolio and benchmark values
            portfolio_values.append(portfolio_values[-1] * (1 + portfolio_return))
            benchmark_values.append(benchmark_values[-1] * (1 + benchmark_return))
            dates.append(next_date)
            
            print(f"Period return: Portfolio: {portfolio_return:.2%}, Benchmark: {benchmark_return:.2%}")
        
        # Calculate performance metrics
        total_portfolio_return = portfolio_values[-1] - 1
        total_benchmark_return = benchmark_values[-1] - 1
        
        # Calculate annualized returns
        years = (datetime.strptime(self.end_date, '%Y-%m-%d') - datetime.strptime(self.start_date, '%Y-%m-%d')).days / 365.25
        annualized_portfolio_return = (1 + total_portfolio_return) ** (1 / years) - 1
        annualized_benchmark_return = (1 + total_benchmark_return) ** (1 / years) - 1
        
        # Calculate excess return
        excess_return = annualized_portfolio_return - annualized_benchmark_return
        
        # Calculate drawdowns
        portfolio_drawdowns = []
        benchmark_drawdowns = []
        
        portfolio_peak = portfolio_values[0]
        benchmark_peak = benchmark_values[0]
        
        for i in range(len(portfolio_values)):
            if portfolio_values[i] > portfolio_peak:
                portfolio_peak = portfolio_values[i]
            
            if benchmark_values[i] > benchmark_peak:
                benchmark_peak = benchmark_values[i]
            
            portfolio_drawdown = (portfolio_values[i] - portfolio_peak) / portfolio_peak
            benchmark_drawdown = (benchmark_values[i] - benchmark_peak) / benchmark_peak
            
            portfolio_drawdowns.append(portfolio_drawdown)
            benchmark_drawdowns.append(benchmark_drawdown)
        
        max_portfolio_drawdown = min(portfolio_drawdowns)
        max_benchmark_drawdown = min(benchmark_drawdowns)
        
        # Compile results
        results = {
            'dates': dates,
            'portfolio_values': portfolio_values,
            'benchmark_values': benchmark_values,
            'holdings': holdings,
            'total_portfolio_return': total_portfolio_return,
            'total_benchmark_return': total_benchmark_return,
            'annualized_portfolio_return': annualized_portfolio_return,
            'annualized_benchmark_return': annualized_benchmark_return,
            'excess_return': excess_return,
            'max_portfolio_drawdown': max_portfolio_drawdown,
            'max_benchmark_drawdown': max_benchmark_drawdown
        }
        
        return results
    
    def plot_results(self, results):
        """Plot backtest results."""
        plt.figure(figsize=(12, 8))
        
        # Plot portfolio vs benchmark
        plt.subplot(2, 1, 1)
        plt.plot(results['dates'], results['portfolio_values'], label='Value Portfolio')
        plt.plot(results['dates'], results['benchmark_values'], label='S&P 500')
        plt.title('Portfolio Performance')
        plt.xlabel('Date')
        plt.ylabel('Value ($)')
        plt.legend()
        plt.grid(True)
        
        # Add performance metrics as text
        plt.figtext(0.15, 0.85, f"Total Return: {results['total_portfolio_return']:.2%} vs {results['total_benchmark_return']:.2%} (S&P 500)", fontsize=12)
        plt.figtext(0.15, 0.82, f"Annualized Return: {results['annualized_portfolio_return']:.2%} vs {results['annualized_benchmark_return']:.2%} (S&P 500)", fontsize=12)
        plt.figtext(0.15, 0.79, f"Excess Return: {results['excess_return']:.2%}", fontsize=12)
        plt.figtext(0.15, 0.76, f"Max Drawdown: {results['max_portfolio_drawdown']:.2%} vs {results['max_benchmark_drawdown']:.2%} (S&P 500)", fontsize=12)
        
        # Plot holdings over time
        plt.subplot(2, 1, 2)
        
        # Extract holdings data
        holding_dates = [h['date'] for h in results['holdings']]
        tickers = set()
        for h in results['holdings']:
            for s in h['stocks']:
                tickers.add(s['ticker'])
        
        # Create a matrix of holdings
        tickers = sorted(list(tickers))
        holdings_matrix = np.zeros((len(holding_dates), len(tickers)))
        
        for i, h in enumerate(results['holdings']):
            for s in h['stocks']:
                if s['ticker'] in tickers:
                    j = tickers.index(s['ticker'])
                    holdings_matrix[i, j] = 1
        
        plt.imshow(holdings_matrix, aspect='auto', cmap='Blues')
        plt.yticks(range(len(holding_dates)), holding_dates)
        plt.xticks(range(len(tickers)), tickers, rotation=90)
        plt.title('Portfolio Holdings Over Time')
        plt.xlabel('Stock')
        plt.ylabel('Rebalance Date')
        plt.colorbar(label='Holding Weight')
        
        plt.tight_layout()
        plt.savefig('backtest_results.png')
        plt.close()
        
        print("Backtest results plot saved as 'backtest_results.png'")

# Example usage
if __name__ == "__main__":
    # Create the agent
    agent = SimpleValueInvestingAgent()
    
    # Create the backtest engine
    backtest = BacktestEngine(
        agent=agent,
        start_date='2018-01-01',
        end_date='2023-01-01',
        rebalance_period='quarterly'
    )
    
    # Define universe of stocks to consider
    universe = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'BRK-B', 'JNJ', 'JPM', 'V', 'PG', 
                'UNH', 'HD', 'BAC', 'XOM', 'NVDA', 'DIS', 'ADBE', 'CRM', 'NFLX', 'CSCO']
    
    # Run the backtest
    results = backtest.run_backtest(universe=universe, top_n=5)
    
    # Plot the results
    backtest.plot_results(results)
    
    # Print summary
    print("\nBacktest Summary:")
    print(f"Period: {backtest.start_date} to {backtest.end_date}")
    print(f"Rebalance Frequency: {backtest.rebalance_period}")
    print(f"Total Return: {results['total_portfolio_return']:.2%} vs {results['total_benchmark_return']:.2%} (S&P 500)")
    print(f"Annualized Return: {results['annualized_portfolio_return']:.2%} vs {results['annualized_benchmark_return']:.2%} (S&P 500)")
    print(f"Excess Return: {results['excess_return']:.2%}")
    print(f"Max Drawdown: {results['max_portfolio_drawdown']:.2%} vs {results['max_benchmark_drawdown']:.2%} (S&P 500)")

Optimizing Your Agent

Based on your testing results, you'll likely want to optimize your value investing agent. Here are some approaches to optimization:

Optimization Strategies

Consider these strategies for improving your value investing AI agent:

1. Criteria Tuning

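Adjust your value investing criteria based on backtest results. Weights chosen this way are fitted to the period they were tuned on, so confirm them on a separate date range before relying on them: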


# Example of criteria tuning
def tune_criteria(agent, universe, start_date, end_date):
    """Tune criteria weights to optimize performance."""
    best_return = -float('inf')
    best_weights = None
    
    # Define weight combinations to test
    weight_options = [0.05, 0.1, 0.15, 0.2, 0.25]
    
    # Test different weight combinations
    for pe_weight in weight_options:
        for pb_weight in weight_options:
            for roe_weight in weight_options:
                # Ensure weights sum to 1.0
                remaining_weight = 1.0 - (pe_weight + pb_weight + roe_weight)
                if remaining_weight <= 0:
                    continue
                
                # Update agent criteria weights
                agent.criteria['pe_ratio']['weight'] = pe_weight
                agent.criteria['pb_ratio']['weight'] = pb_weight
                agent.criteria['roe']['weight'] = roe_weight
                agent.criteria['debt_to_equity']['weight'] = remaining_weight / 5
                agent.criteria['fcf_yield']['weight'] = remaining_weight / 5
                agent.criteria['dividend_yield']['weight'] = remaining_weight / 5
                agent.criteria['earnings_growth']['weight'] = remaining_weight / 5
                agent.criteria['margin_of_safety']['weight'] = remaining_weight / 5
                
                # Run backtest with these weights
                backtest = BacktestEngine(agent, start_date, end_date)
                results = backtest.run_backtest(universe)
                
                # Check if this is the best performance so far
                if results['excess_return'] > best_return:
                    best_return = results['excess_return']
                    best_weights = {
                        'pe_ratio': pe_weight,
                        'pb_ratio': pb_weight,
                        'roe': roe_weight,
                        'debt_to_equity': remaining_weight / 5,
                        'fcf_yield': remaining_weight / 5,
                        'dividend_yield': remaining_weight / 5,
                        'earnings_growth': remaining_weight / 5,
                        'margin_of_safety': remaining_weight / 5
                    }
    
    return best_weights, best_return

2. Threshold Optimization

Fine-tune the thresholds for each criterion:


# Example of threshold optimization
def optimize_thresholds(agent, universe, start_date, end_date):
    """Optimize criteria thresholds."""
    best_return = -float('inf')
    best_thresholds = None
    
    # Define threshold options to test
    pe_options = [10, 15, 20, 25]
    pb_options = [1, 2, 3, 4]
    roe_options = [0.1, 0.15, 0.2, 0.25]
    
    # Test different threshold combinations
    for pe_max in pe_options:
        for pb_max in pb_options:
            for roe_min in roe_options:
                # Update agent criteria thresholds
                agent.criteria['pe_ratio']['max'] = pe_max
                agent.criteria['pb_ratio']['max'] = pb_max
                agent.criteria['roe']['min'] = roe_min
                
                # Run backtest with these thresholds
                backtest = BacktestEngine(agent, start_date, end_date)
                results = backtest.run_backtest(universe)
                
                # Check if this is the best performance so far
                if results['excess_return'] > best_return:
                    best_return = results['excess_return']
                    best_thresholds = {
                        'pe_ratio_max': pe_max,
                        'pb_ratio_max': pb_max,
                        'roe_min': roe_min
                    }
    
    return best_thresholds, best_return

3. Sector-Specific Adjustments

Customize criteria for different market sectors:


# Example of sector-specific criteria
sector_criteria = {
    'Technology': {
        'pe_ratio': {'max': 25, 'weight': 0.15},  # Higher P/E acceptable for tech
        'pb_ratio': {'max': 5, 'weight': 0.15},   # Higher P/B acceptable for tech
        'roe': {'min': 0.2, 'weight': 0.2},       # Higher ROE expected for tech
        # Other criteria...
    },
    'Financial': {
        'pe_ratio': {'max': 12, 'weight': 0.1},   # Lower P/E expected for financials
        'pb_ratio': {'max': 1.5, 'weight': 0.2},  # P/B more important for financials
        'roe': {'min': 0.12, 'weight': 0.15},     # Different ROE expectations
        # Other criteria...
    },
    # Other sectors...
}

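# Modify agent to use sector-specific criteria
def analyze_with_sector(financial_data, default_criteria):
    """Pick the criteria for the company's sector, falling back to the defaults."""
    sector = financial_data.get('sector', 'Unknown')
    # default_criteria is the agent's original criteria dict (passed in by the caller)
    criteria = sector_criteria.get(sector, default_criteria)
    # Proceed with analysis using sector-specific criteria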
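
The sector tables above list only a few criteria each, with the rest elided. One possible design, sketched below, treats each sector entry as a set of overrides applied on top of the default criteria, so a sector only needs to spell out what differs. This is an assumption about how you might structure it, not the tutorial's implementation; criteria_for_sector and the sample dictionaries are hypothetical.

```python
import copy


def criteria_for_sector(sector, default_criteria, sector_overrides):
    """Return a new criteria dict: the defaults with sector overrides applied.

    Neither input dictionary is modified.
    """
    merged = copy.deepcopy(default_criteria)
    for name, settings in sector_overrides.get(sector, {}).items():
        # Update only the keys the sector specifies; keep the rest from defaults
        merged.setdefault(name, {}).update(settings)
    return merged


# Sample data for illustration only
defaults = {
    'pe_ratio': {'max': 15, 'weight': 0.15},
    'debt_to_equity': {'max': 1.0, 'weight': 0.1},
}
overrides = {
    'Technology': {'pe_ratio': {'max': 25}},
}

tech = criteria_for_sector('Technology', defaults, overrides)
other = criteria_for_sector('Utilities', defaults, overrides)
print(tech['pe_ratio'])   # threshold overridden, weight inherited
print(other == defaults)  # unknown sectors fall back to the defaults
```

If an override changes any weights, check that the merged weights still sum to 1 before scoring.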

4. Performance Optimization

Improve the technical performance of your agent:

Caching: Cache API responses to reduce redundant calls
Parallel Processing: Use multiprocessing for analyzing multiple stocks
Batch Processing: Process stocks in batches to optimize API usage
Error Handling: Improve robustness with better error handling


# Example of implementing caching
import functools
import time

# Simple time-based cache decorator
def cache_with_timeout(timeout_seconds=3600):
    """Cache function results with a timeout."""
    cache = {}
    
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Sort kwargs so the key does not depend on keyword-argument order
            key = repr(args) + repr(sorted(kwargs.items()))
            current_time = time.time()
            
            # Check if result is in cache and not expired
            if key in cache and current_time - cache[key]['timestamp'] < timeout_seconds:
                return cache[key]['result']
            
            # Call the function and cache the result
            result = func(*args, **kwargs)
            cache[key] = {
                'result': result,
                'timestamp': current_time
            }
            
            return result
        return wrapper
    return decorator

# Apply to data fetching method
@cache_with_timeout(timeout_seconds=3600)  # Cache for 1 hour
def fetch_data(ticker):
    # Fetch data from API...
    pass
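
The parallel and batch processing ideas from the list above can be sketched with the standard library's concurrent.futures. This sketch uses threads rather than multiprocessing, on the assumption that fetching data is I/O-bound; analyze_many, batched, and fake_fetch are hypothetical names, and a real API's rate limits may call for a smaller max_workers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def analyze_many(tickers, fetch_fn, max_workers=4):
    """Call fetch_fn for each ticker in parallel.

    Returns (results, errors): results maps ticker -> return value,
    errors maps ticker -> the exception raised for that ticker.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_fn, ticker): ticker for ticker in tickers}
        for future in as_completed(futures):
            ticker = futures[future]
            try:
                results[ticker] = future.result()
            except Exception as exc:
                # One failing ticker should not abort the whole run
                errors[ticker] = exc
    return results, errors


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_fetch(ticker):
    """Stand-in for a real data fetch, used only for demonstration."""
    if ticker == 'BAD':
        raise ValueError('no data for ' + ticker)
    return {'ticker': ticker, 'pe_ratio': 12.0}


for batch in batched(['AAPL', 'MSFT', 'BAD', 'JPM', 'XOM'], size=2):
    results, errors = analyze_many(batch, fake_fetch)
    print(sorted(results), sorted(errors))
```

Collecting errors per ticker instead of raising also covers the error-handling item: the run finishes, and the failures can be retried or reported afterwards.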

A/B Testing Different Versions

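Once you've developed multiple versions of your agent through optimization, you can compare them with an A/B test: backtest every version on the same stock universe, period, and rebalancing schedule, then compare the results side by side: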

Python: A/B Testing Different Agent Versions

# ab_test_agents.py

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# BacktestEngine is the backtesting class used in the optimization examples above;
# import it from the module where you defined it.

# Define different agent versions
class BaseValueAgent:
    """Base value investing agent with original criteria."""
    
    def __init__(self):
        self.name = "Base Agent"
        self.criteria = {
            'pe_ratio': {'max': 15, 'weight': 0.15, 'better': 'lower'},
            'pb_ratio': {'max': 3, 'weight': 0.15, 'better': 'lower'},
            'roe': {'min': 0.15, 'weight': 0.15, 'better': 'higher'},
            'debt_to_equity': {'max': 1.0, 'weight': 0.1, 'better': 'lower'},
            'fcf_yield': {'min': 0.02, 'weight': 0.15, 'better': 'higher'},
            'dividend_yield': {'min': 0.01, 'weight': 0.1, 'better': 'higher'},
            'earnings_growth': {'min': 0.05, 'weight': 0.1, 'better': 'higher'},
            'margin_of_safety': {'min': 0.2, 'weight': 0.1, 'better': 'higher'}
        }
    
    def analyze(self, financial_data):
        """Analyze company data using value investing criteria."""
        # Implementation as before...
        pass

class OptimizedWeightsAgent(BaseValueAgent):
    """Agent with optimized criteria weights."""
    
    def __init__(self):
        super().__init__()
        self.name = "Optimized Weights Agent"
        # Update weights based on optimization results
        self.criteria['pe_ratio']['weight'] = 0.2
        self.criteria['pb_ratio']['weight'] = 0.1
        self.criteria['roe']['weight'] = 0.2
        self.criteria['debt_to_equity']['weight'] = 0.05
        self.criteria['fcf_yield']['weight'] = 0.2
        self.criteria['dividend_yield']['weight'] = 0.05
        self.criteria['earnings_growth']['weight'] = 0.1
        self.criteria['margin_of_safety']['weight'] = 0.1

class OptimizedThresholdsAgent(BaseValueAgent):
    """Agent with optimized criteria thresholds."""
    
    def __init__(self):
        super().__init__()
        self.name = "Optimized Thresholds Agent"
        # Update thresholds based on optimization results
        self.criteria['pe_ratio']['max'] = 20
        self.criteria['pb_ratio']['max'] = 4
        self.criteria['roe']['min'] = 0.12
        self.criteria['debt_to_equity']['max'] = 1.2
        self.criteria['fcf_yield']['min'] = 0.015
        self.criteria['dividend_yield']['min'] = 0.005
        self.criteria['earnings_growth']['min'] = 0.04
        self.criteria['margin_of_safety']['min'] = 0.15

class SectorSpecificAgent(BaseValueAgent):
    """Agent with sector-specific criteria."""
    
    def __init__(self):
        super().__init__()
        self.name = "Sector-Specific Agent"
        # Define sector-specific criteria
        self.sector_criteria = {
            'Technology': {
                'pe_ratio': {'max': 25, 'weight': 0.15, 'better': 'lower'},
                'pb_ratio': {'max': 5, 'weight': 0.15, 'better': 'lower'},
                'roe': {'min': 0.2, 'weight': 0.2, 'better': 'higher'},
                'debt_to_equity': {'max': 1.2, 'weight': 0.05, 'better': 'lower'},
                'fcf_yield': {'min': 0.015, 'weight': 0.2, 'better': 'higher'},
                'dividend_yield': {'min': 0.005, 'weight': 0.05, 'better': 'higher'},
                'earnings_growth': {'min': 0.08, 'weight': 0.1, 'better': 'higher'},
                'margin_of_safety': {'min': 0.15, 'weight': 0.1, 'better': 'higher'}
            },
            'Financial': {
                'pe_ratio': {'max': 12, 'weight': 0.1, 'better': 'lower'},
                'pb_ratio': {'max': 1.5, 'weight': 0.2, 'better': 'lower'},
                'roe': {'min': 0.12, 'weight': 0.15, 'better': 'higher'},
                'debt_to_equity': {'max': 5.0, 'weight': 0.05, 'better': 'lower'},
                'fcf_yield': {'min': 0.03, 'weight': 0.15, 'better': 'higher'},
                'dividend_yield': {'min': 0.02, 'weight': 0.15, 'better': 'higher'},
                'earnings_growth': {'min': 0.04, 'weight': 0.1, 'better': 'higher'},
                'margin_of_safety': {'min': 0.2, 'weight': 0.1, 'better': 'higher'}
            },
            # Other sectors...
        }
    
    def analyze(self, financial_data):
        """Analyze company data using sector-specific criteria."""
        sector = financial_data.get('sector', 'Unknown')
        
        # Use sector-specific criteria if available, otherwise use default
        if sector not in self.sector_criteria:
            return super().analyze(financial_data)
        
        # Temporarily swap in the sector criteria; restore them even if analysis raises
        original_criteria = self.criteria
        self.criteria = self.sector_criteria[sector]
        try:
            return super().analyze(financial_data)
        finally:
            self.criteria = original_criteria

def run_ab_test(agents, universe, start_date, end_date, rebalance_period='quarterly', top_n=5):
    """
    Run A/B test comparing multiple agent versions.
    
    Parameters:
    -----------
    agents : list
        List of agent instances to compare
    universe : list
        List of ticker symbols to consider
    start_date : str
        Start date for the test
    end_date : str
        End date for the test
    rebalance_period : str, optional
        Rebalance frequency
    top_n : int, optional
        Number of stocks to include in each portfolio
        
    Returns:
    --------
    dict
        Test results
    """
    results = {}
    
    for agent in agents:
        print(f"\nTesting {agent.name}...")
        
        # Create backtest engine for this agent
        backtest = BacktestEngine(
            agent=agent,
            start_date=start_date,
            end_date=end_date,
            rebalance_period=rebalance_period
        )
        
        # Run backtest
        agent_results = backtest.run_backtest(universe=universe, top_n=top_n)
        
        # Store results
        results[agent.name] = agent_results
    
    return results

def plot_ab_test_results(results, start_date, end_date):
    """Plot A/B test results."""
    plt.figure(figsize=(12, 10))
    
    # Plot portfolio values
    plt.subplot(2, 1, 1)
    
    # Get the first agent's dates for x-axis
    first_agent = list(results.keys())[0]
    dates = results[first_agent]['dates']
    
    # Plot each agent's portfolio value
    for agent_name, agent_results in results.items():
        plt.plot(dates, agent_results['portfolio_values'], label=agent_name)
    
    # Plot benchmark (S&P 500)
    plt.plot(dates, results[first_agent]['benchmark_values'], label='S&P 500', linestyle='--')
    
    plt.title('Portfolio Performance Comparison')
    plt.xlabel('Date')
    plt.ylabel('Value ($)')
    plt.legend()
    plt.grid(True)
    
    # Plot performance metrics
    plt.subplot(2, 1, 2)
    
    # Extract metrics
    agent_names = list(results.keys())
    total_returns = [results[name]['total_portfolio_return'] for name in agent_names]
    annualized_returns = [results[name]['annualized_portfolio_return'] for name in agent_names]
    excess_returns = [results[name]['excess_return'] for name in agent_names]
    max_drawdowns = [results[name]['max_portfolio_drawdown'] for name in agent_names]
    
    # Add benchmark
    agent_names.append('S&P 500')
    total_returns.append(results[first_agent]['total_benchmark_return'])
    annualized_returns.append(results[first_agent]['annualized_benchmark_return'])
    excess_returns.append(0)  # Benchmark excess return is 0 by definition
    max_drawdowns.append(results[first_agent]['max_benchmark_drawdown'])
    
    # Create bar chart
    x = np.arange(len(agent_names))
    width = 0.2
    
    plt.bar(x - width*1.5, [r*100 for r in total_returns], width, label='Total Return (%)')
    plt.bar(x - width/2, [r*100 for r in annualized_returns], width, label='Annualized Return (%)')
    plt.bar(x + width/2, [r*100 for r in excess_returns], width, label='Excess Return (%)')
    plt.bar(x + width*1.5, [r*100 for r in max_drawdowns], width, label='Max Drawdown (%)')
    
    plt.xlabel('Agent')
    plt.ylabel('Percentage (%)')
    plt.title('Performance Metrics Comparison')
    plt.xticks(x, agent_names, rotation=45)
    plt.legend()
    plt.grid(True, axis='y')
    
    plt.tight_layout()
    plt.savefig('ab_test_results.png')
    plt.close()
    
    print("A/B test results plot saved as 'ab_test_results.png'")
    
    # Create a summary table
    summary = pd.DataFrame({
        'Agent': agent_names,
        'Total Return (%)': [r*100 for r in total_returns],
        'Annualized Return (%)': [r*100 for r in annualized_returns],
        'Excess Return (%)': [r*100 for r in excess_returns],
        'Max Drawdown (%)': [r*100 for r in max_drawdowns]
    })
    
    # Sort by annualized return
    summary = summary.sort_values('Annualized Return (%)', ascending=False)
    
    # Save to CSV
    summary.to_csv('ab_test_summary.csv', index=False)
    print("A/B test summary saved as 'ab_test_summary.csv'")
    
    return summary

# Example usage
if __name__ == "__main__":
    # Create agent instances
    agents = [
        BaseValueAgent(),
        OptimizedWeightsAgent(),
        OptimizedThresholdsAgent(),
        SectorSpecificAgent()
    ]
    
    # Define universe of stocks
    universe = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'BRK-B', 'JNJ', 'JPM', 'V', 'PG', 
                'UNH', 'HD', 'BAC', 'XOM', 'NVDA', 'DIS', 'ADBE', 'CRM', 'NFLX', 'CSCO']
    
    # Run A/B test
    results = run_ab_test(
        agents=agents,
        universe=universe,
        start_date='2018-01-01',
        end_date='2023-01-01',
        rebalance_period='quarterly',
        top_n=5
    )
    
    # Plot and summarize results
    summary = plot_ab_test_results(results, '2018-01-01', '2023-01-01')
    
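    # Print the winner (exclude the S&P 500 benchmark row so it cannot be reported as an agent)
    agents_only = summary[summary['Agent'] != 'S&P 500']
    winner = agents_only.iloc[0]['Agent']
    winner_return = agents_only.iloc[0]['Annualized Return (%)']
    print(f"\nThe best performing agent is: {winner} with an annualized return of {winner_return:.2f}%")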
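
The comparison above relies on metrics such as total return, annualized return, and maximum drawdown that the backtest reports for each agent. To help interpret them, here is a sketch of how these metrics are commonly computed from a series of portfolio values. These are the usual textbook definitions and may differ in detail (for example, in the sign convention for drawdown) from what your BacktestEngine does; the function names are hypothetical.

```python
import numpy as np


def total_return(values):
    """Fractional gain from the first to the last portfolio value."""
    values = np.asarray(values, dtype=float)
    return values[-1] / values[0] - 1.0


def annualized_return(values, years):
    """Constant yearly growth rate that would produce the same total return."""
    return (1.0 + total_return(values)) ** (1.0 / years) - 1.0


def max_drawdown(values):
    """Largest peak-to-trough decline, as a negative fraction (0.0 if none)."""
    values = np.asarray(values, dtype=float)
    running_peak = np.maximum.accumulate(values)  # highest value seen so far
    drawdowns = values / running_peak - 1.0
    return drawdowns.min()


# Sample values for illustration only: two years of portfolio snapshots
portfolio = [100.0, 120.0, 90.0, 121.0]
print(total_return(portfolio))
print(annualized_return(portfolio, years=2))
print(max_drawdown(portfolio))
```

Excess return is then the portfolio's return minus the benchmark's over the same period, which is why the benchmark's own excess return is plotted as 0.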

Knowledge Check

What is the primary purpose of backtesting a value investing AI agent?

To predict future stock prices with high accuracy
To evaluate how the agent's investment strategy would have performed in the past
To automatically execute trades in the stock market
To generate random stock recommendations

Which of the following is NOT a common approach to optimizing a value investing AI agent?

Adjusting the weights of different criteria
Fine-tuning thresholds for financial metrics
Implementing sector-specific criteria
Maximizing trading frequency to capture short-term price movements