Building an AI Agent for Value Investing

A step-by-step guide for beginners

Step 2: Collect Financial & Market Data

Now that you've defined your value investing criteria, the next step is to collect the financial and market data needed to evaluate companies. This data will serve as the foundation for your AI agent's analysis.

Data Sources for Value Investing

There are several reliable sources for financial and market data, ranging from free APIs to comprehensive financial databases. Here are the main categories:

Key Data Sources

  • Financial APIs (Yahoo Finance, Alpha Vantage, Financial Modeling Prep, EOD Historical Data)
  • Financial Reports (10-K and 10-Q filings from the SEC EDGAR database)
  • Market Sentiment (News headlines, analyst reports, social media feeds)

Detailed Explanations

Financial APIs: These provide programmatic access to financial data, making them ideal for automated systems.

  • Yahoo Finance: Offers a wide range of financial data, including stock prices, financial statements, and key statistics. The unofficial, open-source Python library yfinance (not affiliated with Yahoo) makes this data easy to access.
  • Alpha Vantage: Provides real-time and historical stock data, forex and cryptocurrency data, and company fundamentals such as income statements, balance sheets, and cash flow statements. A free API key is available, with a limited number of requests.
  • Financial Modeling Prep: Provides financial statements, ratios, and other financial data. Offers both free and paid tiers.
  • EOD Historical Data: Comprehensive financial data including end-of-day prices, fundamentals, and more.
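Behind libraries like yfinance and alpha_vantage, most of these services are plain HTTPS endpoints that return JSON. The sketch below assembles a request URL for Alpha Vantage's `query` endpoint using only Python's standard library. The helper names are our own, and `demo` is the key Alpha Vantage uses in its documentation examples, so you will need your own key for other symbols.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base endpoint shared by Alpha Vantage requests
ALPHA_VANTAGE_BASE_URL = "https://www.alphavantage.co/query"


def build_alpha_vantage_url(function: str, symbol: str, api_key: str) -> str:
    """Assemble a request URL; `function` selects the dataset (e.g. "OVERVIEW")."""
    params = {"function": function, "symbol": symbol, "apikey": api_key}
    return f"{ALPHA_VANTAGE_BASE_URL}?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Download a URL and decode its JSON body (requires network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    url = build_alpha_vantage_url("OVERVIEW", "IBM", "demo")
    print(url)
    # Uncomment to actually send the request; free keys are rate-limited
    # data = fetch_json(url)
    # print(list(data)[:5])
```

Other functions such as `INCOME_STATEMENT` or `BALANCE_SHEET` follow the same pattern, which is what the alpha_vantage library wraps for you.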

Financial Reports: These provide detailed information about a company's financial health and operations.

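  • SEC EDGAR Database: Contains the filings that public companies submit to the U.S. Securities and Exchange Commission, including annual reports (10-K) and quarterly reports (10-Q). Access is free, and the financial data tagged in these filings (XBRL) is also available through the SEC's JSON APIs at data.sec.gov.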
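Rather than parsing 10-K documents by hand, you can pull the numbers a company reported as structured XBRL data. This sketch assumes the JSON layout of the SEC's company-facts endpoint (`facts`, then the taxonomy such as `us-gaap`, then the concept, then `units`, then a list of facts carrying `val` and `frame` fields). The helper names and the `User-Agent` placeholder are illustrative. Concept names differ between companies (revenue, for example, may appear as `Revenues` or as `RevenueFromContractWithCustomerExcludingAssessedTax`), so check which one your company uses.

```python
import json
import re
from urllib.request import Request, urlopen

# The SEC asks automated tools to send a descriptive User-Agent;
# replace this placeholder with your own name and contact email.
USER_AGENT = "Your Name your.email@example.com"


def companyfacts_url(cik: int) -> str:
    """Build the company-facts URL; the CIK is zero-padded to 10 digits."""
    return f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"


def fetch_company_facts(cik: int) -> dict:
    """Download every XBRL fact a company has reported (needs network access)."""
    request = Request(companyfacts_url(cik), headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


def annual_values(facts: dict, concept: str, unit: str = "USD") -> dict:
    """Return {calendar year: value} for one us-gaap concept.

    Keeps only facts tagged with an annual frame such as "CY2023". A frame
    should be attached to at most one fact per period, which filters out the
    duplicates created when later filings repeat prior-year figures.
    """
    entries = facts["facts"]["us-gaap"][concept]["units"][unit]
    result = {}
    for entry in entries:
        match = re.fullmatch(r"CY(\d{4})", entry.get("frame", ""))
        if match:
            result[int(match.group(1))] = entry["val"]
    return result
```

Annual frames are calendar-aligned, so for a company whose fiscal year ends in, say, September, each value is labeled with the calendar year it most closely matches.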

Market Sentiment: These sources provide insights into market perception and sentiment.

  • News Headlines: Financial news can provide context and sentiment around companies and markets.
  • Analyst Reports: Professional analyses of companies and their prospects.
  • Social Media: Platforms like X (formerly Twitter) can provide real-time sentiment data.
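To turn headlines into a number your agent can use, one very simple approach is to count positive and negative words. The word lists below are hypothetical and deliberately tiny; a real project would more likely use a finance-specific lexicon (such as the Loughran-McDonald word lists) or a trained sentiment model, but the scoring idea is the same.

```python
import re

# Hypothetical, hand-picked word lists for illustration only
POSITIVE_WORDS = {"beat", "beats", "growth", "record", "upgrade", "raises", "profit"}
NEGATIVE_WORDS = {"miss", "misses", "lawsuit", "downgrade", "cuts", "loss", "recall"}


def headline_score(headline: str) -> int:
    """Return (# positive words) minus (# negative words) in one headline."""
    words = re.findall(r"[a-z]+", headline.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


def average_sentiment(headlines: list[str]) -> float:
    """Average headline score; 0.0 when there are no headlines."""
    if not headlines:
        return 0.0
    return sum(headline_score(h) for h in headlines) / len(headlines)


headlines = [
    "Company beats earnings estimates, raises guidance",
    "Regulator opens lawsuit over product recall",
    "Shares flat ahead of annual meeting",
]
for headline in headlines:
    print(headline_score(headline), headline)
print("Average sentiment:", average_sentiment(headlines))
```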

Using Python to Collect Financial Data

Python offers several libraries that make it easy to collect financial data from various sources. Let's look at how to use yfinance, one of the most popular libraries for accessing Yahoo Finance data.

Python: Collecting Stock Data with yfinance
# Install required libraries (run this once)
# pip install yfinance pandas matplotlib

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta

# Define the ticker symbol and time period
ticker_symbol = "AAPL"  # Apple Inc.
end_date = datetime.now()
start_date = end_date - timedelta(days=365)  # 1 year of data

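# Create a Ticker object once; it is reused below for prices, metrics, and statements
company_info = yf.Ticker(ticker_symbol)

# Download daily price history. Ticker.history() returns flat columns
# (Open, High, Low, Close, Volume, ...), whereas yf.download() in recent
# yfinance versions may return multi-level column headers even for one ticker.
stock_data = company_info.history(start=start_date, end=end_date)

# Display the first few rows of data
print(f"Stock data for {ticker_symbol}:")
print(stock_data.head())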

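# Fetch the info dictionary once and reuse it
info = company_info.info

def as_percent(value):
    """Format a fraction such as 0.25 as '25.00%'; return 'N/A' if missing."""
    return f"{value * 100:.2f}%" if isinstance(value, (int, float)) else "N/A"

# Get key financial metrics (.get() falls back to 'N/A' when Yahoo omits a field)
print("\nKey Financial Metrics:")
market_cap = info.get('marketCap')
# The ',' format spec only works on numbers, so handle a missing value separately
print(f"Market Cap: ${market_cap:,}" if isinstance(market_cap, (int, float)) else "Market Cap: N/A")
print(f"P/E Ratio: {info.get('trailingPE', 'N/A')}")
print(f"Forward P/E: {info.get('forwardPE', 'N/A')}")
print(f"Price-to-Book: {info.get('priceToBook', 'N/A')}")
# Yahoo has changed the scale of dividendYield before; check whether your value
# is a fraction (0.005) or already a percentage (0.5) before converting it
print(f"Dividend Yield: {as_percent(info.get('dividendYield'))}")
print(f"Return on Equity: {as_percent(info.get('returnOnEquity'))}")
# Yahoo may report debt-to-equity as a percentage (e.g. 150 meaning 1.5x)
print(f"Debt-to-Equity: {info.get('debtToEquity', 'N/A')}")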

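# Get income statement data (rows are line items, columns are fiscal-year end dates, newest first)
income_stmt = company_info.income_stmt
print("\nAnnual Total Revenue (oldest to newest):")
if not income_stmt.empty and 'Total Revenue' in income_stmt.index:
    print(income_stmt.loc['Total Revenue'].iloc[::-1])

# Get balance sheet data
balance_sheet = company_info.balance_sheet
print("\nAnnual Total Assets (oldest to newest):")
if not balance_sheet.empty and 'Total Assets' in balance_sheet.index:
    print(balance_sheet.loc['Total Assets'].iloc[::-1])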

# Store data in a structured format (CSV in this example)
stock_data.to_csv(f"{ticker_symbol}_stock_data.csv")
print(f"\nStock data saved to {ticker_symbol}_stock_data.csv")

# Plot the stock price
plt.figure(figsize=(12, 6))
plt.plot(stock_data['Close'])
plt.title(f"{ticker_symbol} Stock Price - Past Year")
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.grid(True)
plt.tight_layout()
plt.savefig(f"{ticker_symbol}_stock_chart.png")
print(f"Stock chart saved to {ticker_symbol}_stock_chart.png")
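The revenue and asset rows printed above are most useful to a value investor as growth rates. As a minimal sketch, assuming the yfinance statement layout (line items as rows, one column per fiscal year, newest first), these helpers compute year-over-year growth and the compound annual growth rate, CAGR = (last / first)^(1 / years) - 1. The function names are our own, and `years` is simply the number of intervals between the available columns, which assumes the columns are consecutive fiscal years.

```python
import math

import pandas as pd


def annual_series(statement: pd.DataFrame, row: str) -> pd.Series:
    """Return one line item ordered oldest to newest, with missing years dropped."""
    return statement.loc[row].dropna().sort_index().astype(float)


def yoy_growth(statement: pd.DataFrame, row: str = "Total Revenue") -> pd.Series:
    """Year-over-year growth of a line item as fractions (0.10 means +10%)."""
    return annual_series(statement, row).pct_change().dropna()


def cagr(statement: pd.DataFrame, row: str = "Total Revenue") -> float:
    """Compound annual growth rate between the oldest and newest values.

    Returns NaN when there are fewer than two years or the starting value is
    not positive, because the formula is undefined there.
    """
    series = annual_series(statement, row)
    years = len(series) - 1
    if years < 1 or series.iloc[0] <= 0:
        return math.nan
    return (series.iloc[-1] / series.iloc[0]) ** (1 / years) - 1


if __name__ == "__main__":
    # Made-up numbers in the yfinance layout (newest year first)
    dates = pd.to_datetime(["2024-12-31", "2023-12-31", "2022-12-31"])
    sample = pd.DataFrame([[121.0, 110.0, 100.0]], index=["Total Revenue"], columns=dates)
    print(yoy_growth(sample))
    print(f"CAGR: {cagr(sample):.1%}")
```

With real data you could call `yoy_growth(company_info.income_stmt)`, or pass `"Total Assets"` together with `company_info.balance_sheet`.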

Collecting Data from Multiple Sources

For a comprehensive analysis, you'll want to collect data from multiple sources. Here's an example of how to collect data from both Yahoo Finance and Alpha Vantage:

Python: Collecting Data from Multiple Sources
# Install required libraries (run this once)
# pip install yfinance alpha_vantage pandas

import yfinance as yf
from alpha_vantage.fundamentaldata import FundamentalData
import pandas as pd
import os

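# Set up Alpha Vantage API (you need to get your own API key)
# Sign up at: https://www.alphavantage.co/support/#api-key
# Reading the key from an environment variable keeps it out of your code;
# the name ALPHA_VANTAGE_API_KEY is just the convention used here
alpha_vantage_api_key = os.environ.get("ALPHA_VANTAGE_API_KEY", "YOUR_API_KEY")
# output_format="pandas" makes each call return a DataFrame, so .to_csv() works below
fd = FundamentalData(key=alpha_vantage_api_key, output_format="pandas")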

# Define the ticker symbol
ticker_symbol = "MSFT"  # Microsoft Corporation

# Create a directory to store the data
os.makedirs("financial_data", exist_ok=True)

# 1. Collect data from Yahoo Finance
print(f"Collecting data for {ticker_symbol} from Yahoo Finance...")
yf_ticker = yf.Ticker(ticker_symbol)
info = yf_ticker.info  # fetch the info dictionary once and reuse it

# Get company profile
company_profile = {
    "Name": info.get('longName', 'N/A'),
    "Industry": info.get('industry', 'N/A'),
    "Sector": info.get('sector', 'N/A'),
    "Country": info.get('country', 'N/A'),
    "Website": info.get('website', 'N/A'),
    "Summary": info.get('longBusinessSummary', 'N/A')
}

# Save company profile
pd.DataFrame([company_profile]).to_csv(f"financial_data/{ticker_symbol}_profile.csv", index=False)
print(f"Company profile saved to financial_data/{ticker_symbol}_profile.csv")

def pct(value):
    """Convert a fraction such as 0.25 to 25.0; return 'N/A' if missing."""
    return value * 100 if isinstance(value, (int, float)) else 'N/A'

# Get financial ratios
financial_ratios = {
    "P/E Ratio": info.get('trailingPE', 'N/A'),
    "Forward P/E": info.get('forwardPE', 'N/A'),
    "Price-to-Book": info.get('priceToBook', 'N/A'),
    "Price-to-Sales": info.get('priceToSalesTrailing12Months', 'N/A'),
    # Check whether your yfinance version already returns dividendYield as a percentage
    "Dividend Yield (%)": pct(info.get('dividendYield')),
    "ROE (%)": pct(info.get('returnOnEquity')),
    "ROA (%)": pct(info.get('returnOnAssets')),
    "Debt-to-Equity": info.get('debtToEquity', 'N/A')
}

# Save financial ratios
pd.DataFrame([financial_ratios]).to_csv(f"financial_data/{ticker_symbol}_ratios_yf.csv", index=False)
print(f"Financial ratios saved to financial_data/{ticker_symbol}_ratios_yf.csv")

# 2. Collect data from Alpha Vantage (if API key is provided)
if alpha_vantage_api_key != "YOUR_API_KEY":
    print(f"\nCollecting data for {ticker_symbol} from Alpha Vantage...")
    
    try:
        # Get income statement
        income_statement, _ = fd.get_income_statement_annual(ticker_symbol)
        income_statement.to_csv(f"financial_data/{ticker_symbol}_income_statement_av.csv")
        print(f"Income statement saved to financial_data/{ticker_symbol}_income_statement_av.csv")
        
        # Get balance sheet
        balance_sheet, _ = fd.get_balance_sheet_annual(ticker_symbol)
        balance_sheet.to_csv(f"financial_data/{ticker_symbol}_balance_sheet_av.csv")
        print(f"Balance sheet saved to financial_data/{ticker_symbol}_balance_sheet_av.csv")
        
        # Get cash flow statement
        cash_flow, _ = fd.get_cash_flow_annual(ticker_symbol)
        cash_flow.to_csv(f"financial_data/{ticker_symbol}_cash_flow_av.csv")
        print(f"Cash flow statement saved to financial_data/{ticker_symbol}_cash_flow_av.csv")
        
        # Get company overview (contains many financial metrics)
        overview, _ = fd.get_company_overview(ticker_symbol)
        overview.to_csv(f"financial_data/{ticker_symbol}_overview_av.csv")
        print(f"Company overview saved to financial_data/{ticker_symbol}_overview_av.csv")
    
    except Exception as e:
        print(f"Error collecting data from Alpha Vantage: {e}")
        print("Note: Alpha Vantage has rate limits for free API keys.")
else:
    print("\nSkipping Alpha Vantage data collection. Please provide a valid API key.")

print("\nData collection complete!")
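Collecting the same metric from two providers lets you catch stale or inconsistent values before they reach your analysis. The sketch below flags metrics whose values differ by more than a relative tolerance (5% here, an arbitrary choice). The `AV_TO_LABEL` mapping is hypothetical, based on field names we expect in Alpha Vantage's company overview; confirm them against the `_overview_av.csv` file you saved. Some disagreement is normal, since providers may compute ratios over different periods or with different prices.

```python
import math


def compare_metrics(primary: dict, secondary: dict, rel_tol: float = 0.05) -> dict:
    """Compare metrics that both sources report.

    Values go through float(), so numeric strings compare against numbers;
    anything non-numeric (such as 'N/A' or 'None') is skipped.
    Returns {metric: (primary_value, secondary_value, agrees)}.
    """
    report = {}
    for name in sorted(primary.keys() & secondary.keys()):
        try:
            a, b = float(primary[name]), float(secondary[name])
        except (TypeError, ValueError):
            continue  # missing or non-numeric in at least one source
        report[name] = (a, b, math.isclose(a, b, rel_tol=rel_tol))
    return report


# Hypothetical mapping from Alpha Vantage overview field names to the labels
# used in financial_ratios; verify the field names against your own output
AV_TO_LABEL = {"PERatio": "P/E Ratio", "PriceToBookRatio": "Price-to-Book"}

if __name__ == "__main__":
    # Made-up values for illustration
    yf_ratios = {"P/E Ratio": 30.0, "Price-to-Book": 45.0, "ROE (%)": "N/A"}
    av_overview = {"PERatio": "31.0", "PriceToBookRatio": "60.0"}
    av_ratios = {AV_TO_LABEL[k]: v for k, v in av_overview.items() if k in AV_TO_LABEL}
    for metric, (a, b, ok) in compare_metrics(yf_ratios, av_ratios).items():
        print(f"{metric}: {a} vs {b} -> {'OK' if ok else 'CHECK'}")
```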

Storing Financial Data

Once you've collected the data, you'll need to store it in a structured format for later analysis. Here are some common options:

Data Storage Options

Choose the storage option that best fits your project's needs:

CSV Files

Pros: Simple, portable, human-readable, works well with pandas

Cons: Not suitable for very large datasets, limited query capabilities

Best for: Small to medium-sized projects, prototyping
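One catch with CSV storage is that dates come back as plain text. The sketch below reloads a price file, such as the `{ticker_symbol}_stock_data.csv` saved earlier, and converts its first column back into a `DatetimeIndex`. Converting to UTC is a choice made here because daily timestamps from Yahoo can carry different UTC offsets before and after a daylight-saving change; the function name is our own.

```python
import io

import pandas as pd


def load_price_csv(path_or_buffer) -> pd.DataFrame:
    """Reload a price table saved with DataFrame.to_csv().

    The first column is parsed back into a DatetimeIndex; utc=True copes with
    mixed offsets such as -05:00 and -04:00 around a daylight-saving change.
    """
    df = pd.read_csv(path_or_buffer, index_col=0)
    df.index = pd.to_datetime(df.index, utc=True)
    return df


if __name__ == "__main__":
    # A tiny in-memory CSV standing in for a file on disk (made-up prices)
    csv_text = (
        "Date,Close\n"
        "2024-03-08 00:00:00-05:00,100.0\n"
        "2024-03-11 00:00:00-04:00,101.5\n"
    )
    prices = load_price_csv(io.StringIO(csv_text))
    print(prices)
```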

SQL Databases (e.g., MySQL, PostgreSQL)

Pros: Structured, efficient queries, good for relational data, ACID compliance

Cons: Requires database setup, schema design

Best for: Projects with complex relationships between data, when query performance is important
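If you choose SQL, SQLite is an easy way to start: it ships with Python as the `sqlite3` module and stores everything in a single file, so there is no server to set up. This sketch uses a hypothetical `daily_prices` table and a made-up `DEMO` ticker; the composite primary key means re-running your collector updates existing rows instead of duplicating them.

```python
import sqlite3

# One row per ticker per trading day
SCHEMA = """
CREATE TABLE IF NOT EXISTS daily_prices (
    ticker TEXT NOT NULL,
    date   TEXT NOT NULL,  -- ISO 8601 text such as '2024-01-02' sorts correctly
    close  REAL NOT NULL,
    PRIMARY KEY (ticker, date)
)
"""


def save_prices(conn: sqlite3.Connection, ticker: str, rows) -> None:
    """Insert (date, close) pairs, replacing any existing row for that day."""
    conn.executemany(
        "INSERT OR REPLACE INTO daily_prices (ticker, date, close) VALUES (?, ?, ?)",
        [(ticker, date, close) for date, close in rows],
    )
    conn.commit()


def closes_between(conn: sqlite3.Connection, ticker: str, start: str, end: str):
    """Return [(date, close), ...] for one ticker within [start, end], oldest first."""
    cursor = conn.execute(
        "SELECT date, close FROM daily_prices "
        "WHERE ticker = ? AND date BETWEEN ? AND ? ORDER BY date",
        (ticker, start, end),
    )
    return cursor.fetchall()


# ":memory:" keeps this demo self-contained; pass a file path such as
# "financial_data/prices.db" to keep the data between runs
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_prices(conn, "DEMO", [("2024-01-02", 100.0), ("2024-01-03", 101.5)])
save_prices(conn, "DEMO", [("2024-01-03", 102.0)])  # re-run: updates, no duplicate
print(closes_between(conn, "DEMO", "2024-01-01", "2024-01-31"))
```

If you prefer to stay in DataFrames, pandas can read from and write to the same connection with `pd.read_sql_query(...)` and `DataFrame.to_sql(...)`.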

NoSQL Databases (e.g., MongoDB)

Pros: Flexible schema, good for semi-structured data, scalable

Cons: Less structured, may require more complex queries

Best for: Projects with evolving data structures, when storing JSON-like documents

Time Series Databases (e.g., InfluxDB)

Pros: Optimized for time-series data like stock prices

Cons: Specialized, may have a learning curve

Best for: Projects focused heavily on time-series analysis

Knowledge Check

Which of the following is NOT a common source of financial data for value investing analysis?

  • Yahoo Finance API
  • SEC EDGAR Database
  • Weather Data Services
  • Financial News Headlines

Which Python library is commonly used to access Yahoo Finance data?

  • yfinance
  • pandas-finance
  • yahoo-api
  • finance-py