Step 2: Collect Financial & Market Data
Now that you've defined your value investing criteria, the next step is to collect the financial and market data needed to evaluate companies. This data will serve as the foundation for your AI agent's analysis.
Data Sources for Value Investing
There are several reliable sources for financial and market data, ranging from free APIs to comprehensive financial databases. Here are the main categories:
Key Data Sources
- Financial APIs (Yahoo Finance, Alpha Vantage, Financial Modeling Prep, EOD Historical Data)
- Financial Reports (10-Ks and 10-Qs from SEC EDGAR database)
- Market Sentiment (News headlines, analyst reports, social media feeds)
Detailed Explanations
Financial APIs: These provide programmatic access to financial data, making them ideal for automated systems.
- Yahoo Finance: Offers a wide range of financial data including stock prices, financial statements, and key statistics. The unofficial Python library yfinance makes it easy to access this data.
- Alpha Vantage: Provides real-time and historical stock data, forex, and cryptocurrency data. Offers a free API key with limited requests.
- Financial Modeling Prep: Provides financial statements, ratios, and other financial data. Offers both free and paid tiers.
- EOD Historical Data: Comprehensive financial data including end-of-day prices, fundamentals, and more.
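To make "programmatic access" concrete, here is a minimal sketch of how a request to one of these APIs is typically assembled. It uses Alpha Vantage's query endpoint with its function, symbol, and apikey parameters; the helper name build_alpha_vantage_url is hypothetical and introduced only for illustration, and the key is a placeholder.

```python
from urllib.parse import urlencode

ALPHA_VANTAGE_BASE_URL = "https://www.alphavantage.co/query"


def build_alpha_vantage_url(function: str, symbol: str, api_key: str) -> str:
    """Return the full request URL for one Alpha Vantage query (hypothetical helper)."""
    params = {"function": function, "symbol": symbol, "apikey": api_key}
    return f"{ALPHA_VANTAGE_BASE_URL}?{urlencode(params)}"


# Build (but do not send) a company overview request for Microsoft
url = build_alpha_vantage_url("OVERVIEW", "MSFT", "YOUR_API_KEY")
print(url)
```

The sketch only builds the URL. Sending it requires a real API key, and the free tier limits how many requests you can make.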
Financial Reports: These provide detailed information about a company's financial health and operations.
- SEC EDGAR Database: Contains all the filings that public companies are required to submit to the U.S. Securities and Exchange Commission, including annual reports (10-K) and quarterly reports (10-Q).
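As a rough sketch of how 10-K and 10-Q filings might be located programmatically, the code below targets EDGAR's public submissions endpoint at data.sec.gov. The function names are hypothetical, and the payload layout (parallel arrays under filings.recent) reflects my understanding of that endpoint rather than anything stated in this guide, so check it against a real response. The sample payload is hand-made, not real filing data.

```python
import json
import urllib.request


def edgar_submissions_url(cik: int) -> str:
    """EDGAR identifies companies by a CIK, zero-padded to 10 digits in this URL."""
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"


def fetch_submissions(cik: int, user_agent: str) -> dict:
    """Download a company's filing index; the SEC asks for a descriptive User-Agent."""
    request = urllib.request.Request(
        edgar_submissions_url(cik), headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def recent_filings(submissions: dict, form_types=("10-K", "10-Q")) -> list:
    """Pick filings of the requested types out of the parallel 'recent' arrays."""
    recent = submissions["filings"]["recent"]
    rows = zip(recent["form"], recent["filingDate"], recent["accessionNumber"])
    return [
        {"form": form, "filed": filed, "accession": accession}
        for form, filed, accession in rows
        if form in form_types
    ]


# Offline illustration with a hand-made payload (assumed shape, not real data)
sample = {"filings": {"recent": {
    "form": ["10-Q", "8-K", "10-K"],
    "filingDate": ["2024-05-01", "2024-03-15", "2024-02-01"],
    "accessionNumber": ["0000000000-24-000003", "0000000000-24-000002", "0000000000-24-000001"],
}}}
for filing in recent_filings(sample):
    print(filing)
```

To query live data you would call fetch_submissions with a company's CIK (Apple's is 320193) and a User-Agent string that identifies you, then pass the result to recent_filings.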
Market Sentiment: These sources provide insights into market perception and sentiment.
- News Headlines: Financial news can provide context and sentiment around companies and markets.
- Analyst Reports: Professional analyses of companies and their prospects.
- Social Media: Platforms like Twitter can provide real-time sentiment data.
Using Python to Collect Financial Data
Python offers several libraries that make it easy to collect financial data from various sources. Let's look at how to use yfinance, one of the most popular libraries for accessing Yahoo Finance data.
# Install required libraries (run this once)
# pip install yfinance pandas matplotlib
import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
# Define the ticker symbol and time period
ticker_symbol = "AAPL" # Apple Inc.
end_date = datetime.now()
start_date = end_date - timedelta(days=365) # 1 year of data
# Download stock data
stock_data = yf.download(ticker_symbol, start=start_date, end=end_date)
# Display the first few rows of data
print(f"Stock data for {ticker_symbol}:")
print(stock_data.head())
# Get company information
company_info = yf.Ticker(ticker_symbol)
# Get key financial metrics
print("\nKey Financial Metrics:")
info = company_info.info  # fetch the metadata dictionary once and reuse it
market_cap = info.get('marketCap')
# Apply the thousands separator only to numbers; formatting the 'N/A' string with ':,' raises ValueError
print(f"Market Cap: ${market_cap:,}" if isinstance(market_cap, (int, float)) else "Market Cap: N/A")
print(f"P/E Ratio: {info.get('trailingPE', 'N/A')}")
print(f"Forward P/E: {info.get('forwardPE', 'N/A')}")
print(f"Price-to-Book: {info.get('priceToBook', 'N/A')}")
# Assumes these two fields are fractions (0.05 = 5%); check the scale your yfinance version returns
dividend_yield = info.get('dividendYield')
return_on_equity = info.get('returnOnEquity')
print(f"Dividend Yield: {f'{dividend_yield * 100:.2f}%' if dividend_yield else 'N/A'}")
print(f"Return on Equity: {f'{return_on_equity * 100:.2f}%' if return_on_equity else 'N/A'}")
print(f"Debt-to-Equity: {info.get('debtToEquity', 'N/A')}")
# Get income statement data
income_stmt = company_info.income_stmt
print("\nRecent Annual Revenue:")
if not income_stmt.empty:
    print(income_stmt.loc['Total Revenue'].iloc[::-1])
# Get balance sheet data
balance_sheet = company_info.balance_sheet
print("\nRecent Annual Total Assets:")
if not balance_sheet.empty:
    print(balance_sheet.loc['Total Assets'].iloc[::-1])
# Store data in a structured format (CSV in this example)
stock_data.to_csv(f"{ticker_symbol}_stock_data.csv")
print(f"\nStock data saved to {ticker_symbol}_stock_data.csv")
# Plot the stock price
plt.figure(figsize=(12, 6))
plt.plot(stock_data['Close'])
plt.title(f"{ticker_symbol} Stock Price - Past Year")
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.grid(True)
plt.tight_layout()
plt.savefig(f"{ticker_symbol}_stock_chart.png")
print(f"Stock chart saved to {ticker_symbol}_stock_chart.png")
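Fields in the metadata dictionary are sometimes missing, so formatting each one by hand gets repetitive. One possible way to centralize it is a small helper like the one below. The name format_metric and the sample values are hypothetical, and the percent option assumes the field is stored as a fraction.

```python
def format_metric(info: dict, key: str, as_percent: bool = False, decimals: int = 2) -> str:
    """Format a numeric field from an info-style dict, or return 'N/A' if it is absent."""
    value = info.get(key)
    # Treat missing values, strings, and booleans as unavailable
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "N/A"
    if as_percent:
        # Assumes the value is a fraction, e.g. 0.25 meaning 25%
        return f"{value * 100:.{decimals}f}%"
    return f"{value:,.{decimals}f}"


# Made-up values for illustration only
sample_info = {"trailingPE": 28.5, "returnOnEquity": 0.25, "marketCap": 2500000000000}

print(format_metric(sample_info, "trailingPE"))                      # 28.50
print(format_metric(sample_info, "returnOnEquity", as_percent=True)) # 25.00%
print(format_metric(sample_info, "marketCap", decimals=0))           # 2,500,000,000,000
print(format_metric(sample_info, "priceToBook"))                     # N/A
```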
Collecting Data from Multiple Sources
For a comprehensive analysis, you'll want to collect data from multiple sources. Here's an example of how to collect data from both Yahoo Finance and Alpha Vantage:
# Install required libraries (run this once)
# pip install yfinance alpha_vantage pandas
import yfinance as yf
from alpha_vantage.fundamentaldata import FundamentalData
import pandas as pd
import os
# Set up Alpha Vantage API (you would need to get your own API key)
# Sign up at: https://www.alphavantage.co/support/#api-key
alpha_vantage_api_key = "YOUR_API_KEY" # Replace with your actual API key
fd = FundamentalData(key=alpha_vantage_api_key)
# Define the ticker symbol
ticker_symbol = "MSFT" # Microsoft Corporation
# Create a directory to store the data
os.makedirs("financial_data", exist_ok=True)
# 1. Collect data from Yahoo Finance
print(f"Collecting data for {ticker_symbol} from Yahoo Finance...")
yf_ticker = yf.Ticker(ticker_symbol)
# Get company profile
company_profile = {
    "Name": yf_ticker.info.get('longName', 'N/A'),
    "Industry": yf_ticker.info.get('industry', 'N/A'),
    "Sector": yf_ticker.info.get('sector', 'N/A'),
    "Country": yf_ticker.info.get('country', 'N/A'),
    "Website": yf_ticker.info.get('website', 'N/A'),
    "Summary": yf_ticker.info.get('longBusinessSummary', 'N/A')
}
# Save company profile
pd.DataFrame([company_profile]).to_csv(f"financial_data/{ticker_symbol}_profile.csv", index=False)
print(f"Company profile saved to financial_data/{ticker_symbol}_profile.csv")
# Get financial ratios
financial_ratios = {
    "P/E Ratio": yf_ticker.info.get('trailingPE', 'N/A'),
    "Forward P/E": yf_ticker.info.get('forwardPE', 'N/A'),
    "Price-to-Book": yf_ticker.info.get('priceToBook', 'N/A'),
    "Price-to-Sales": yf_ticker.info.get('priceToSalesTrailing12Months', 'N/A'),
    "Dividend Yield (%)": yf_ticker.info.get('dividendYield', 'N/A') * 100 if yf_ticker.info.get('dividendYield') else 'N/A',
    "ROE (%)": yf_ticker.info.get('returnOnEquity', 'N/A') * 100 if yf_ticker.info.get('returnOnEquity') else 'N/A',
    "ROA (%)": yf_ticker.info.get('returnOnAssets', 'N/A') * 100 if yf_ticker.info.get('returnOnAssets') else 'N/A',
    "Debt-to-Equity": yf_ticker.info.get('debtToEquity', 'N/A')
}
# Save financial ratios
pd.DataFrame([financial_ratios]).to_csv(f"financial_data/{ticker_symbol}_ratios_yf.csv", index=False)
print(f"Financial ratios saved to financial_data/{ticker_symbol}_ratios_yf.csv")
# 2. Collect data from Alpha Vantage (if API key is provided)
if alpha_vantage_api_key != "YOUR_API_KEY":
    print(f"\nCollecting data for {ticker_symbol} from Alpha Vantage...")
    try:
        # Get income statement
        income_statement, _ = fd.get_income_statement_annual(ticker_symbol)
        income_statement.to_csv(f"financial_data/{ticker_symbol}_income_statement_av.csv")
        print(f"Income statement saved to financial_data/{ticker_symbol}_income_statement_av.csv")
        # Get balance sheet
        balance_sheet, _ = fd.get_balance_sheet_annual(ticker_symbol)
        balance_sheet.to_csv(f"financial_data/{ticker_symbol}_balance_sheet_av.csv")
        print(f"Balance sheet saved to financial_data/{ticker_symbol}_balance_sheet_av.csv")
        # Get cash flow statement
        cash_flow, _ = fd.get_cash_flow_annual(ticker_symbol)
        cash_flow.to_csv(f"financial_data/{ticker_symbol}_cash_flow_av.csv")
        print(f"Cash flow statement saved to financial_data/{ticker_symbol}_cash_flow_av.csv")
        # Get company overview (contains many financial metrics)
        overview, _ = fd.get_company_overview(ticker_symbol)
        # The overview may come back as a plain dict rather than a DataFrame; wrap it so to_csv works
        if isinstance(overview, dict):
            overview = pd.DataFrame([overview])
        overview.to_csv(f"financial_data/{ticker_symbol}_overview_av.csv")
        print(f"Company overview saved to financial_data/{ticker_symbol}_overview_av.csv")
    except Exception as e:
        print(f"Error collecting data from Alpha Vantage: {e}")
        print("Note: Alpha Vantage has rate limits for free API keys.")
else:
    print("\nSkipping Alpha Vantage data collection. Please provide a valid API key.")
print("\nData collection complete!")
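Once the same metric is available from two providers, it can be worth checking that they roughly agree before trusting either one. The sketch below shows one way to do that. The function name compare_sources, the 5% tolerance, and all of the numbers are assumptions made for illustration. It also assumes the values have already been converted to numbers, since some APIs return them as strings.

```python
def compare_sources(primary: dict, secondary: dict, tolerance: float = 0.05) -> dict:
    """Label each metric 'match', 'mismatch', or 'missing' across two sources."""
    report = {}
    for name in sorted(set(primary) | set(secondary)):
        a, b = primary.get(name), secondary.get(name)
        # A metric that is absent or non-numeric in either source cannot be compared
        if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
            report[name] = "missing"
            continue
        baseline = max(abs(a), abs(b))
        relative_gap = abs(a - b) / baseline if baseline else 0.0
        report[name] = "match" if relative_gap <= tolerance else "mismatch"
    return report


# Made-up ratios standing in for values collected from two providers
yahoo_ratios = {"P/E Ratio": 30.0, "Price-to-Book": 10.0, "Debt-to-Equity": "N/A"}
alpha_ratios = {"P/E Ratio": 30.6, "Price-to-Book": 12.5, "Debt-to-Equity": 0.4}

for metric, status in compare_sources(yahoo_ratios, alpha_ratios).items():
    print(f"{metric}: {status}")
```

A mismatch does not say which source is wrong. It only flags the metric for a closer look, for example against the company's own filings.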
Storing Financial Data
Once you've collected the data, you'll need to store it in a structured format for later analysis. Here are some common options:
Data Storage Options
Choose the storage option that best fits your project's needs:
CSV Files
Pros: Simple, portable, human-readable, works well with pandas
Cons: Not suitable for very large datasets, limited query capabilities
Best for: Small to medium-sized projects, prototyping
SQL Databases (e.g., MySQL, PostgreSQL)
Pros: Structured, efficient queries, good for relational data, ACID compliance
Cons: Requires database setup, schema design
Best for: Projects with complex relationships between data, when query performance is important
NoSQL Databases (e.g., MongoDB)
Pros: Flexible schema, good for semi-structured data, scalable
Cons: Less structured, may require more complex queries
Best for: Projects with evolving data structures, when storing JSON-like documents
Time Series Databases (e.g., InfluxDB)
Pros: Optimized for time-series data like stock prices
Cons: Specialized, may have a learning curve
Best for: Projects focused heavily on time-series analysis
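To give a feel for the SQL option without setting up a database server, here is a minimal sketch using SQLite from Python's standard library. SQLite is swapped in for the MySQL and PostgreSQL examples named above purely for convenience. The table name, column names, and price rows are made up for illustration.

```python
import sqlite3

# Made-up closing prices: (ticker, date, close)
rows = [
    ("AAPL", "2024-01-02", 185.0),
    ("AAPL", "2024-01-03", 184.0),
    ("MSFT", "2024-01-02", 370.0),
]

# An in-memory database keeps the example self-contained; pass a file path to persist it
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS daily_prices (
           ticker TEXT NOT NULL,
           trade_date TEXT NOT NULL,
           close REAL NOT NULL,
           PRIMARY KEY (ticker, trade_date)
       )"""
)
# INSERT OR REPLACE lets the same day be reloaded without creating duplicates
conn.executemany("INSERT OR REPLACE INTO daily_prices VALUES (?, ?, ?)", rows)
conn.commit()

# Query one ticker's average close with a parameterized statement
avg_close = conn.execute(
    "SELECT AVG(close) FROM daily_prices WHERE ticker = ?", ("AAPL",)
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM daily_prices").fetchone()[0]
print(f"AAPL average close: {avg_close}, rows stored: {row_count}")
```

The composite primary key on ticker and date is one reasonable design choice for daily price data, because re-running a download then overwrites existing rows instead of duplicating them.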