Step 4: Feature Engineering
Feature engineering is a critical step in building your value investing AI agent. It involves creating new features or transforming existing ones to better represent the underlying patterns that determine a company's intrinsic value. Well-designed features can significantly improve your model's ability to identify undervalued companies.
What is Feature Engineering?
Feature engineering is the process of using domain knowledge to extract or create features (variables) from raw data that make machine learning algorithms work more effectively. For value investing, this means transforming raw financial data into meaningful indicators that align with value investing principles.
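As a concrete example of turning raw financial data into indicators, here is a minimal pandas sketch that derives the ratio features used later in this step from raw statement items. The column names (Net_Income, Shareholders_Equity, Total_Debt, and so on), the helper name add_ratio_features, and the numbers are all hypothetical. Real inputs would come from your own data source.

```python
import pandas as pd

# Hypothetical raw fundamentals (illustrative numbers, not real company data).
# Monetary values and share counts are in millions; Price is dollars per share.
raw = pd.DataFrame({
    "Ticker": ["AAA", "BBB"],
    "Price": [50.0, 20.0],
    "Shares_Outstanding": [100.0, 400.0],
    "Net_Income": [400.0, 500.0],
    "Shareholders_Equity": [2000.0, 4000.0],
    "Total_Debt": [1000.0, 6000.0],
    "Free_Cash_Flow": [300.0, 480.0],
    "Dividends_Paid": [100.0, 240.0],
})

def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive value-investing ratios from raw fundamentals (hypothetical helper)."""
    out = df.copy()
    market_cap = out["Price"] * out["Shares_Outstanding"]
    eps = out["Net_Income"] / out["Shares_Outstanding"]
    book_per_share = out["Shareholders_Equity"] / out["Shares_Outstanding"]
    out["PE_Ratio"] = out["Price"] / eps                       # price / earnings per share
    out["PB_Ratio"] = out["Price"] / book_per_share            # price / book value per share
    out["ROE"] = out["Net_Income"] / out["Shareholders_Equity"]
    out["Debt_to_Equity"] = out["Total_Debt"] / out["Shareholders_Equity"]
    out["FCF_Yield"] = out["Free_Cash_Flow"] / market_cap
    out["Dividend_Yield"] = out["Dividends_Paid"] / market_cap
    return out

features = add_ratio_features(raw)
print(features[["Ticker", "PE_Ratio", "PB_Ratio", "ROE",
                "Debt_to_Equity", "FCF_Yield", "Dividend_Yield"]])
```

The sketch does not guard against zero or negative earnings or equity, where P/E, P/B, and ROE stop being meaningful; a real pipeline would need to flag or exclude such cases.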
Key Value Investing Features
Here are some essential features to engineer for your value investing AI agent:
- Intrinsic Value Estimation (e.g., Discounted Cash Flow)
- Financial Ratios (ROE, P/E, P/B, etc.)
- Valuation Scores Based on Thresholds
- Sentiment Scores from NLP Analysis
Detailed Explanations
Intrinsic Value Estimation: Estimating what a company is worth based on its fundamentals, often using methods like Discounted Cash Flow (DCF) analysis.
Financial Ratios: Derived metrics that provide insights into a company's financial health, profitability, and valuation.
Valuation Scores: Composite scores created by applying thresholds to various metrics (e.g., P/E < 15 → score +1).
Sentiment Scores: Numerical representations of market sentiment derived from news articles, financial reports, and social media.
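The threshold idea behind Valuation Scores (e.g., P/E < 15 → score +1) can be sketched as a simple rule count. The thresholds, the helper name threshold_score, and the sample values are illustrative assumptions, not recommendations. The Composite Value Score later in this step refines this idea by scaling and weighting each criterion instead of awarding a flat point.

```python
import pandas as pd

# Illustrative thresholds (assumptions); each satisfied rule adds +1 to the score.
THRESHOLD_RULES = {
    "PE_Ratio": lambda v: v < 15,         # cheap relative to earnings
    "PB_Ratio": lambda v: v < 3,          # cheap relative to book value
    "ROE": lambda v: v > 0.15,            # profitable use of equity
    "Debt_to_Equity": lambda v: v < 1.0,  # conservative leverage
}

def threshold_score(row: pd.Series) -> int:
    """Count how many threshold rules a company satisfies (hypothetical helper)."""
    return sum(int(rule(row[col])) for col, rule in THRESHOLD_RULES.items())

df = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC"],
    "PE_Ratio": [12.0, 25.0, 14.0],
    "PB_Ratio": [1.5, 6.0, 4.0],
    "ROE": [0.18, 0.30, 0.10],
    "Debt_to_Equity": [0.4, 0.8, 1.6],
})
df["Valuation_Score"] = df.apply(threshold_score, axis=1)
print(df[["Ticker", "Valuation_Score"]])
```

AAA meets all four rules, BBB meets two (ROE and leverage), and CCC meets only the P/E rule.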
Calculating Intrinsic Value with DCF
One of the most important features for value investing is an estimate of a company's intrinsic value. The Discounted Cash Flow (DCF) method is a popular approach for this calculation:
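In equation form, the two-stage model implemented by the calculate_dcf code below is shown here. FCF_0 is current_fcf, g_1 and g_2 are the growth rates for years 1-5 and 6-10, g_T is terminal_growth_rate, r is discount_rate, N is shares_outstanding, IV is the intrinsic value per share, and P is current_price. The terminal value uses the Gordon growth formula, which is only defined when r > g_T.

```latex
FCF_t =
\begin{cases}
FCF_0 \, (1+g_1)^{t}, & 1 \le t \le 5 \\
FCF_5 \, (1+g_2)^{t-5}, & 6 \le t \le 10
\end{cases}

TV = \frac{FCF_{10}\,(1+g_T)}{r - g_T}, \qquad r > g_T

PV = \sum_{t=1}^{10} \frac{FCF_t}{(1+r)^{t}} + \frac{TV}{(1+r)^{10}}

IV = \frac{PV}{N}, \qquad
\text{Margin of safety} = \frac{IV - P}{IV} \times 100\%
```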
# Install required libraries (run this once)
# pip install pandas numpy matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def calculate_dcf(
    ticker,
    current_fcf,
    growth_rate_years_1_5,
    growth_rate_years_6_10,
    terminal_growth_rate,
    discount_rate,
    shares_outstanding,
    current_price=None
):
    """
    Calculate intrinsic value using the Discounted Cash Flow method.
    Parameters:
    -----------
    ticker : str
        Company ticker symbol
    current_fcf : float
        Current annual free cash flow in millions
    growth_rate_years_1_5 : float
        Expected annual growth rate for years 1-5 (e.g., 0.15 for 15%)
    growth_rate_years_6_10 : float
        Expected annual growth rate for years 6-10 (e.g., 0.10 for 10%)
    terminal_growth_rate : float
        Expected perpetual growth rate after year 10 (e.g., 0.03 for 3%)
    discount_rate : float
        Required rate of return (e.g., 0.10 for 10%); must exceed terminal_growth_rate
    shares_outstanding : float
        Number of shares outstanding in millions
    current_price : float, optional
        Current market price per share
    Returns:
    --------
    dict
        Dictionary containing DCF results
    """
    # The Gordon growth terminal value is only defined when r > g
    if discount_rate <= terminal_growth_rate:
        raise ValueError("discount_rate must be greater than terminal_growth_rate")
    # Lists for projected FCF and its present value in years 1-10
    fcf_projections = []
    present_values = []
    # Project FCF for years 1-5
    fcf = current_fcf
    for _ in range(5):
        fcf = fcf * (1 + growth_rate_years_1_5)
        fcf_projections.append(fcf)
        # Discount by (1 + r)^t, where t is the year just projected
        present_values.append(fcf / ((1 + discount_rate) ** len(fcf_projections)))
    # Project FCF for years 6-10
    for _ in range(5):
        fcf = fcf * (1 + growth_rate_years_6_10)
        fcf_projections.append(fcf)
        present_values.append(fcf / ((1 + discount_rate) ** len(fcf_projections)))
    # Terminal value at the end of year 10 (Gordon growth model)
    terminal_value = fcf_projections[-1] * (1 + terminal_growth_rate) / (discount_rate - terminal_growth_rate)
    # Discount terminal value to present
    discounted_terminal_value = terminal_value / ((1 + discount_rate) ** 10)
    # Sum all present values
    total_present_value = sum(present_values) + discounted_terminal_value
    # Calculate intrinsic value per share
    intrinsic_value_per_share = total_present_value / shares_outstanding
    # Calculate margin of safety if current price is provided
    margin_of_safety = None
    if current_price is not None:
        margin_of_safety = (intrinsic_value_per_share - current_price) / intrinsic_value_per_share * 100
    # Prepare results
    results = {
        'ticker': ticker,
        'current_fcf_millions': current_fcf,
        'projected_fcf_millions': fcf_projections,
        'present_values_millions': present_values,
        'terminal_value_millions': terminal_value,
        'discounted_terminal_value_millions': discounted_terminal_value,
        'total_present_value_millions': total_present_value,
        'shares_outstanding_millions': shares_outstanding,
        'intrinsic_value_per_share': intrinsic_value_per_share,
        'current_price': current_price,
        'margin_of_safety_percent': margin_of_safety
    }
    return results
# Example usage: Calculate DCF for a fictional company
dcf_results = calculate_dcf(
ticker='EXCO',
current_fcf=1000, # $1 billion in FCF
growth_rate_years_1_5=0.15, # 15% growth for first 5 years
growth_rate_years_6_10=0.08, # 8% growth for years 6-10
terminal_growth_rate=0.03, # 3% perpetual growth
discount_rate=0.10, # 10% discount rate
shares_outstanding=500, # 500 million shares
current_price=75 # $75 per share
)
# Print key results
print(f"DCF Analysis for {dcf_results['ticker']}")
print(f"Current Free Cash Flow: ${dcf_results['current_fcf_millions']} million")
print(f"Intrinsic Value per Share: ${dcf_results['intrinsic_value_per_share']:.2f}")
if dcf_results['current_price'] is not None:
    print(f"Current Market Price: ${dcf_results['current_price']:.2f}")
    print(f"Margin of Safety: {dcf_results['margin_of_safety_percent']:.2f}%")
    if dcf_results['margin_of_safety_percent'] > 0:
        print("Verdict: Potentially Undervalued")
    else:
        print("Verdict: Potentially Overvalued")
# Visualize the projected cash flows
plt.figure(figsize=(12, 6))
years = list(range(1, 11))
plt.bar(years, dcf_results['projected_fcf_millions'], color='skyblue')
plt.title(f"Projected Free Cash Flow for {dcf_results['ticker']}")
plt.xlabel('Year')
plt.ylabel('Free Cash Flow ($ millions)')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.savefig('projected_fcf.png')
print("Projected FCF chart saved as 'projected_fcf.png'")
# Create a bar chart comparing the DCF components
components = [
'Sum of Discounted FCF',
'Discounted Terminal Value',
'Total Present Value'
]
values = [
sum(dcf_results['present_values_millions']),
dcf_results['discounted_terminal_value_millions'],
dcf_results['total_present_value_millions']
]
plt.figure(figsize=(10, 6))
plt.bar(components[0], values[0], color='lightblue')
plt.bar(components[1], values[1], color='lightgreen')
plt.bar(components[2], values[2], color='coral')
plt.title(f"DCF Components for {dcf_results['ticker']}")
plt.ylabel('Value ($ millions)')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.savefig('dcf_components.png')
print("DCF components chart saved as 'dcf_components.png'")
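Because intrinsic value is very sensitive to the discount rate and the terminal growth rate, it can help to check how stable the estimate is before relying on it as a feature. The sketch below is a compact, hypothetical re-implementation of the same two-stage logic as calculate_dcf (named dcf_value_per_share for illustration). It tabulates intrinsic value per share for the EXCO inputs over a small grid of discount and terminal growth rates; the grid values are arbitrary choices.

```python
import pandas as pd

def dcf_value_per_share(fcf0, g1, g2, g_terminal, r, shares):
    """Two-stage DCF per share (hypothetical compact version of calculate_dcf)."""
    if r <= g_terminal:
        raise ValueError("discount rate must exceed terminal growth rate")
    fcf, pv = fcf0, 0.0
    for year in range(1, 11):
        fcf *= 1 + (g1 if year <= 5 else g2)    # stage 1 for years 1-5, stage 2 for 6-10
        pv += fcf / (1 + r) ** year             # discount each year's FCF
    terminal_value = fcf * (1 + g_terminal) / (r - g_terminal)
    pv += terminal_value / (1 + r) ** 10        # discount terminal value from year 10
    return pv / shares

# Same inputs as the EXCO example; vary the two most subjective assumptions
discount_rates = [0.08, 0.09, 0.10, 0.11, 0.12]
terminal_rates = [0.02, 0.025, 0.03]
grid = pd.DataFrame(
    [[dcf_value_per_share(1000, 0.15, 0.08, g, r, 500) for g in terminal_rates]
     for r in discount_rates],
    index=[f"r={r:.0%}" for r in discount_rates],
    columns=[f"g={g:.1%}" for g in terminal_rates],
)
print(grid.round(2))
```

If the verdict relative to the $75 market price flips within a range of assumptions you consider plausible, the margin-of-safety feature for that company may deserve less weight.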
Creating a Composite Value Score
A composite value score combines multiple financial metrics into a single score that represents how well a company aligns with value investing principles:
# Install required libraries (run this once)
# pip install pandas numpy matplotlib seaborn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Sample data for multiple companies
companies_data = {
'Ticker': ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'NVDA', 'JPM', 'V', 'JNJ'],
'Name': ['Apple Inc.', 'Microsoft Corp.', 'Alphabet Inc.', 'Amazon.com Inc.', 'Meta Platforms Inc.',
'Tesla Inc.', 'NVIDIA Corp.', 'JPMorgan Chase & Co.', 'Visa Inc.', 'Johnson & Johnson'],
'PE_Ratio': [25.6, 30.2, 22.8, 40.5, 18.7, 55.3, 45.8, 12.3, 28.9, 15.6],
'PB_Ratio': [35.2, 12.8, 5.3, 10.2, 4.8, 15.7, 25.3, 1.5, 12.8, 5.2],
'ROE': [0.35, 0.42, 0.25, 0.22, 0.18, 0.15, 0.38, 0.12, 0.32, 0.21],
'Debt_to_Equity': [1.2, 0.5, 0.3, 0.8, 0.4, 0.6, 0.2, 2.5, 0.7, 0.9],
'FCF_Yield': [0.03, 0.025, 0.02, 0.015, 0.035, 0.01, 0.02, 0.045, 0.03, 0.04],
'Dividend_Yield': [0.005, 0.008, 0.0, 0.0, 0.0, 0.0, 0.001, 0.03, 0.007, 0.025],
'Sentiment_Score': [0.65, 0.72, 0.58, 0.62, 0.45, 0.52, 0.78, 0.55, 0.60, 0.63]
}
# Create DataFrame
df = pd.DataFrame(companies_data)
print("Company Financial Data:")
print(df[['Ticker', 'Name', 'PE_Ratio', 'PB_Ratio', 'ROE']].head())
# Define value investing criteria and thresholds
value_criteria = {
# Price ratios (lower is better)
'PE_Ratio': {'max': 20, 'weight': 0.15, 'better': 'lower'},
'PB_Ratio': {'max': 3, 'weight': 0.15, 'better': 'lower'},
# Profitability (higher is better)
'ROE': {'min': 0.15, 'weight': 0.15, 'better': 'higher'},
# Debt metrics (lower is better)
'Debt_to_Equity': {'max': 1.0, 'weight': 0.1, 'better': 'lower'},
# Cash flow metrics (higher is better)
'FCF_Yield': {'min': 0.02, 'weight': 0.15, 'better': 'higher'},
# Dividend metrics (higher is better)
'Dividend_Yield': {'min': 0.01, 'weight': 0.1, 'better': 'higher'},
# Sentiment (higher is better)
'Sentiment_Score': {'min': 0.5, 'weight': 0.2, 'better': 'higher'}
}
# Function to calculate value score for each criterion
def calculate_criterion_score(value, criterion_details):
    if criterion_details['better'] == 'lower':
        # For metrics where lower is better (like P/E ratio)
        if 'max' in criterion_details:
            if value <= criterion_details['max']:
                # Scale the score by how far below the max the value is
                # The lower, the better (down to a reasonable minimum)
                reasonable_min = criterion_details['max'] * 0.2  # Assume 20% of max is a reasonable minimum
                normalized = 1 - max(0, min(1, (value - reasonable_min) / (criterion_details['max'] - reasonable_min)))
                return normalized * criterion_details['weight']
            else:
                return 0
    else:  # 'higher' is better
        # For metrics where higher is better (like ROE)
        if 'min' in criterion_details:
            if value >= criterion_details['min']:
                # Scale the score by how far above the min the value is
                # The higher, the better (up to a reasonable maximum)
                reasonable_max = criterion_details['min'] * 3  # Assume 3x min is a reasonable maximum
                normalized = min(1, (value - criterion_details['min']) / (reasonable_max - criterion_details['min']))
                return normalized * criterion_details['weight']
            else:
                return 0
    return 0  # Default case
# Calculate value scores for each company
def calculate_value_score(company_data):
    scores = {}
    total_score = 0
    for criterion, details in value_criteria.items():
        if criterion in company_data:
            criterion_score = calculate_criterion_score(company_data[criterion], details)
            scores[criterion] = criterion_score
            total_score += criterion_score
    # The weights in value_criteria sum to 1.0, so multiplying by 100 gives a 0-100 scale
    normalized_score = total_score * 100
    return scores, normalized_score
# Apply scoring to each company (score each row once, then split the results into two columns)
score_results = [calculate_value_score(row) for _, row in df.iterrows()]
df['Criterion_Scores'] = pd.Series([scores for scores, _ in score_results], index=df.index)
df['Value_Score'] = pd.Series([total for _, total in score_results], index=df.index)
# Sort by value score (descending)
df_sorted = df.sort_values('Value_Score', ascending=False).reset_index(drop=True)
print("\nCompanies Ranked by Value Score:")
print(df_sorted[['Ticker', 'Name', 'Value_Score']].head(10))
# Print detailed breakdown for top company
top_company = df_sorted.iloc[0]
print(f"\nDetailed Breakdown for Top Value Company: {top_company['Name']} ({top_company['Ticker']})")
print(f"Overall Value Score: {top_company['Value_Score']:.2f}/100")
print("\nIndividual Criterion Scores:")
for criterion, score in top_company['Criterion_Scores'].items():
    max_score = value_criteria[criterion]['weight'] * 100
    print(f"- {criterion}: {score*100:.2f}/{max_score:.2f}")
# Visualize value scores
plt.figure(figsize=(12, 6))
sns.barplot(x='Ticker', y='Value_Score', data=df_sorted)
plt.title('Companies Ranked by Value Score')
plt.xlabel('Company')
plt.ylabel('Value Score (0-100)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('value_scores.png')
print("\nValue scores chart saved as 'value_scores.png'")
# Create a heatmap of the criteria scores for all companies
plt.figure(figsize=(14, 8))
heatmap_data = pd.DataFrame(df_sorted['Criterion_Scores'].tolist(), index=df_sorted['Ticker']).fillna(0)
# Divide by each criterion's weight so every column is shown on the same 0-1 scale
for col in heatmap_data.columns:
    heatmap_data[col] = heatmap_data[col] / value_criteria[col]['weight']
sns.heatmap(heatmap_data, annot=True, cmap='YlGnBu', fmt='.2f')
plt.title('Value Investing Criteria Scores by Company')
plt.tight_layout()
plt.savefig('criteria_heatmap.png')
print("Criteria heatmap saved as 'criteria_heatmap.png'")
# Save the results
df_sorted.to_csv('value_investing_scores.csv', index=False)
print("\nValue investing scores saved to 'value_investing_scores.csv'")
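The scoring rule implemented by calculate_criterion_score and calculate_value_score above can be summarized as follows. Here x_i is a company's value for criterion i, w_i is its weight (the weights in value_criteria sum to 1, so the total lies between 0 and 100), M_i is a 'max' threshold, and m_i is a 'min' threshold.

```latex
\text{Value Score} = 100 \sum_i w_i \, s_i

s_i^{\text{lower}} =
\begin{cases}
1 - \operatorname{clip}\!\left(\dfrac{x_i - 0.2M_i}{0.8M_i},\; 0,\; 1\right), & x_i \le M_i \\
0, & x_i > M_i
\end{cases}

s_i^{\text{higher}} =
\begin{cases}
\min\!\left(1,\; \dfrac{x_i - m_i}{2m_i}\right), & x_i \ge m_i \\
0, & x_i < m_i
\end{cases}
```

For example, JPM's P/E of 12.3 against M = 20 gives (12.3 - 4) / 16 = 0.519, so s = 0.481 and the P/E criterion contributes 0.15 × 0.48125 × 100 ≈ 7.22 of its possible 15 points.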
Incorporating Sentiment Analysis
Sentiment analysis from news and financial reports can provide valuable context for your value investing decisions. Here's how to incorporate sentiment into your feature set:
Sentiment Integration Approaches
Consider these approaches for incorporating sentiment into your value investing model:
1. Direct Feature Integration
Include sentiment scores directly as features in your model:
# Example of adding sentiment as a direct feature
features = [
'PE_Ratio', 'PB_Ratio', 'ROE', 'Debt_to_Equity',
'FCF_Yield', 'Dividend_Yield', 'Sentiment_Score'
]
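If the model receiving these features is sensitive to feature scale (for example, linear models or k-nearest neighbours), sentiment scores between 0 and 1 and P/E ratios in the tens can first be put on a comparable scale. Below is a minimal sketch, assuming z-score standardization, a hypothetical helper named build_feature_matrix, and made-up sample values.

```python
import pandas as pd

features = [
    'PE_Ratio', 'PB_Ratio', 'ROE', 'Debt_to_Equity',
    'FCF_Yield', 'Dividend_Yield', 'Sentiment_Score'
]

def build_feature_matrix(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Select feature columns and z-score each one (mean 0, std 1)."""
    X = df[columns].astype(float)
    std = X.std(ddof=0).replace(0, 1.0)  # avoid division by zero for constant columns
    return (X - X.mean()) / std

# Hypothetical sample values for three companies
sample = pd.DataFrame({
    'PE_Ratio': [12.0, 20.0, 28.0],
    'PB_Ratio': [1.5, 3.0, 6.0],
    'ROE': [0.12, 0.20, 0.30],
    'Debt_to_Equity': [2.0, 0.8, 0.3],
    'FCF_Yield': [0.05, 0.03, 0.01],
    'Dividend_Yield': [0.03, 0.01, 0.0],
    'Sentiment_Score': [0.55, 0.60, 0.75],
})
X = build_feature_matrix(sample, features)
print(X.round(3))
```

In a backtest, the mean and standard deviation should be computed on the training period only and reused on later data to avoid look-ahead bias. Tree-based models are generally insensitive to this kind of order-preserving rescaling.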
2. Sentiment-Adjusted Metrics
Adjust traditional financial metrics based on sentiment:
# Example of sentiment-adjusted P/E ratio
def calculate_sentiment_adjusted_pe(pe_ratio, sentiment_score):
    # Assumes sentiment_score lies in [0, 1]; maps it to a factor in [0.5, 1.5]
    # (0.5 = most negative, 1.0 = neutral, 1.5 = most positive)
    sentiment_factor = 0.5 + sentiment_score
    # Adjust P/E ratio (lower for positive sentiment, higher for negative)
    adjusted_pe = pe_ratio / sentiment_factor
    return adjusted_pe
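Evaluating the function for a P/E of 20 at three sentiment levels shows how the adjustment behaves. A neutral score of 0.5 leaves P/E unchanged, the most negative score doubles it, and the most positive score cuts it by a third. The function is repeated so the snippet runs on its own.

```python
def calculate_sentiment_adjusted_pe(pe_ratio, sentiment_score):
    # Assumes sentiment_score lies in [0, 1]; factor ranges from 0.5 to 1.5
    sentiment_factor = 0.5 + sentiment_score
    return pe_ratio / sentiment_factor

# Compare negative, neutral, and positive sentiment for the same P/E
for sentiment in (0.0, 0.5, 1.0):
    print(sentiment, round(calculate_sentiment_adjusted_pe(20.0, sentiment), 2))
```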
3. Sentiment as a Filter
Use sentiment as a secondary filter after financial analysis:
# Example of using sentiment as a filter
def apply_sentiment_filter(companies, sentiment_threshold=0.5):
    # Keep only companies whose Sentiment_Score meets the threshold
    return companies[companies['Sentiment_Score'] >= sentiment_threshold]
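A minimal usage sketch showing the intended order: a financial screen first, then the sentiment filter. The universe, the scores, and the 50-point Value_Score cutoff are hypothetical.

```python
import pandas as pd

def apply_sentiment_filter(companies, sentiment_threshold=0.5):
    # Keep only companies whose Sentiment_Score meets the threshold
    return companies[companies['Sentiment_Score'] >= sentiment_threshold]

universe = pd.DataFrame({
    'Ticker': ['AAA', 'BBB', 'CCC', 'DDD'],
    'Value_Score': [72.0, 65.0, 58.0, 30.0],
    'Sentiment_Score': [0.45, 0.62, 0.50, 0.80],
})
# Step 1: financial screen (the 50-point cutoff is an arbitrary illustrative choice)
screened = universe[universe['Value_Score'] >= 50]
# Step 2: sentiment as a secondary filter
shortlist = apply_sentiment_filter(screened, sentiment_threshold=0.5)
print(shortlist)
```

AAA passes the financial screen but is dropped for weak sentiment, while DDD never reaches the sentiment step despite its high sentiment score.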
4. Time Series Sentiment Trends
Track sentiment changes over time for more nuanced analysis:
# Example of tracking sentiment trends
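import numpy as np
def calculate_sentiment_momentum(sentiment_history):
    # Slope of a least-squares line through all readings in sentiment_history;
    # pass a slice such as sentiment_history[-n:] to use only the last n periods
    return np.polyfit(range(len(sentiment_history)), sentiment_history, 1)[0]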
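For example, a steadily improving series yields a positive slope and a deteriorating one a negative slope, measured in sentiment units per period. The two series below are made up for illustration, and the function is repeated so the snippet is self-contained.

```python
import numpy as np

def calculate_sentiment_momentum(sentiment_history):
    # Slope of a degree-1 least-squares fit over all readings
    return np.polyfit(range(len(sentiment_history)), sentiment_history, 1)[0]

improving = [0.40, 0.45, 0.50, 0.55, 0.60]
deteriorating = [0.70, 0.66, 0.60, 0.58, 0.50]
print(round(calculate_sentiment_momentum(improving), 3))
print(round(calculate_sentiment_momentum(deteriorating), 3))
# Use only the most recent three periods
print(round(calculate_sentiment_momentum(deteriorating[-3:]), 3))
```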
Knowledge Check
What is the primary purpose of the Discounted Cash Flow (DCF) method in value investing?
In a composite value score, why might you assign different weights to different criteria?