Step 4: Feature Engineering

Feature engineering is a critical step in building your value investing AI agent. It involves creating new features or transforming existing ones to better represent the underlying patterns that determine a company's intrinsic value. Well-designed features can significantly improve your model's ability to identify undervalued companies.

What is Feature Engineering?

Feature engineering is the process of using domain knowledge to extract or create features (variables) from raw data that make machine learning algorithms work more effectively. For value investing, this means transforming raw financial data into meaningful indicators that align with value investing principles.

Key Value Investing Features

Here are some essential features to engineer for your value investing AI agent:

Intrinsic Value Estimation (e.g., Discounted Cash Flow)
Financial Ratios (ROE, P/E, P/B, etc.)
Valuation Scores Based on Thresholds
Sentiment Scores from NLP Analysis

Detailed Explanations

Intrinsic Value Estimation: Calculating what a company is truly worth based on its fundamentals, often using methods like Discounted Cash Flow (DCF) analysis.

Financial Ratios: Derived metrics that provide insights into a company's financial health, profitability, and valuation.

Valuation Scores: Composite scores created by applying thresholds to various metrics (e.g., P/E < 15 → score +1).

Sentiment Scores: Numerical representations of market sentiment derived from news articles, financial reports, and social media.
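As a minimal sketch of the ratio and threshold ideas above, the following computes P/E, P/B, and ROE from raw figures and then awards one point per threshold passed. The two companies, their numbers, the column names, and every threshold other than the P/E < 15 rule are illustrative assumptions, not recommendations.

```python
import pandas as pd

# Hypothetical raw fundamentals for two fictional companies (illustrative numbers only)
raw = pd.DataFrame({
    'Ticker': ['AAA', 'BBB'],
    'Price': [30.0, 120.0],                  # share price in dollars
    'EPS': [3.0, 4.0],                       # earnings per share
    'Book_Value_Per_Share': [20.0, 15.0],
    'Net_Income': [120.0, 400.0],            # in millions
    'Shareholder_Equity': [1000.0, 1500.0],  # in millions
})

# Standard ratio definitions
raw['PE_Ratio'] = raw['Price'] / raw['EPS']
raw['PB_Ratio'] = raw['Price'] / raw['Book_Value_Per_Share']
raw['ROE'] = raw['Net_Income'] / raw['Shareholder_Equity']

# Threshold-based valuation score: +1 for each rule a company passes.
# The P/E < 15 rule comes from the text; the other two thresholds are assumptions.
raw['Threshold_Score'] = (
    (raw['PE_Ratio'] < 15).astype(int)
    + (raw['PB_Ratio'] < 3).astype(int)
    + (raw['ROE'] > 0.15).astype(int)
)

print(raw[['Ticker', 'PE_Ratio', 'PB_Ratio', 'ROE', 'Threshold_Score']])
```

Here the first company passes the two price rules but not the profitability rule, while the second passes only the profitability rule, which shows how a simple point system can separate cheap-but-weak from strong-but-expensive companies.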

Calculating Intrinsic Value with DCF

One of the most important features for value investing is an estimate of a company's intrinsic value. The Discounted Cash Flow (DCF) method is a popular approach for this calculation:

Python: Discounted Cash Flow (DCF) Calculation

# Install required libraries (run this once)
# pip install pandas numpy matplotlib

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def calculate_dcf(
    ticker,
    current_fcf,
    growth_rate_years_1_5,
    growth_rate_years_6_10,
    terminal_growth_rate,
    discount_rate,
    shares_outstanding,
    current_price=None
):
    """
    Calculate intrinsic value using Discounted Cash Flow method.
    
    Parameters:
    -----------
    ticker : str
        Company ticker symbol
    current_fcf : float
        Current annual free cash flow in millions
    growth_rate_years_1_5 : float
        Expected annual growth rate for years 1-5 (e.g., 0.15 for 15%)
    growth_rate_years_6_10 : float
        Expected annual growth rate for years 6-10 (e.g., 0.10 for 10%)
    terminal_growth_rate : float
        Expected perpetual growth rate after year 10 (e.g., 0.03 for 3%)
    discount_rate : float
        Required rate of return (e.g., 0.10 for 10%)
    shares_outstanding : float
        Number of shares outstanding in millions
    current_price : float, optional
        Current market price per share
        
    Returns:
    --------
    dict
        Dictionary containing DCF results
    """
    # Initialize lists to store values
    years = list(range(1, 11))
    fcf_projections = []
    present_values = []
    
    # Project FCF for years 1-5
    fcf = current_fcf
    for _ in range(5):
        fcf = fcf * (1 + growth_rate_years_1_5)
        fcf_projections.append(fcf)
        present_values.append(fcf / ((1 + discount_rate) ** len(fcf_projections)))
    
    # Project FCF for years 6-10
    for _ in range(5):
        fcf = fcf * (1 + growth_rate_years_6_10)
        fcf_projections.append(fcf)
        present_values.append(fcf / ((1 + discount_rate) ** len(fcf_projections)))
    
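    # Calculate terminal value (Gordon growth model); this is only valid
    # when the discount rate exceeds the terminal growth rate
    if discount_rate <= terminal_growth_rate:
        raise ValueError("discount_rate must be greater than terminal_growth_rate")
    terminal_value = fcf_projections[-1] * (1 + terminal_growth_rate) / (discount_rate - terminal_growth_rate)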
    
    # Discount terminal value to present
    discounted_terminal_value = terminal_value / ((1 + discount_rate) ** 10)
    
    # Sum all present values
    total_present_value = sum(present_values) + discounted_terminal_value
    
    # Calculate intrinsic value per share
    intrinsic_value_per_share = total_present_value / shares_outstanding
    
    # Calculate margin of safety if current price is provided
    margin_of_safety = None
    if current_price is not None:
        margin_of_safety = (intrinsic_value_per_share - current_price) / intrinsic_value_per_share * 100
    
    # Prepare results
    results = {
        'ticker': ticker,
        'current_fcf_millions': current_fcf,
        'projected_fcf_millions': fcf_projections,
        'present_values_millions': present_values,
        'terminal_value_millions': terminal_value,
        'discounted_terminal_value_millions': discounted_terminal_value,
        'total_present_value_millions': total_present_value,
        'shares_outstanding_millions': shares_outstanding,
        'intrinsic_value_per_share': intrinsic_value_per_share,
        'current_price': current_price,
        'margin_of_safety_percent': margin_of_safety
    }
    
    return results

# Example usage: Calculate DCF for a fictional company
dcf_results = calculate_dcf(
    ticker='EXCO',
    current_fcf=1000,  # $1 billion in FCF
    growth_rate_years_1_5=0.15,  # 15% growth for first 5 years
    growth_rate_years_6_10=0.08,  # 8% growth for years 6-10
    terminal_growth_rate=0.03,  # 3% perpetual growth
    discount_rate=0.10,  # 10% discount rate
    shares_outstanding=500,  # 500 million shares
    current_price=75  # $75 per share
)

# Print key results
print(f"DCF Analysis for {dcf_results['ticker']}")
print(f"Current Free Cash Flow: ${dcf_results['current_fcf_millions']} million")
print(f"Intrinsic Value per Share: ${dcf_results['intrinsic_value_per_share']:.2f}")
if dcf_results['margin_of_safety_percent'] is not None:
    print(f"Current Market Price: ${dcf_results['current_price']:.2f}")
    print(f"Margin of Safety: {dcf_results['margin_of_safety_percent']:.2f}%")
    if dcf_results['margin_of_safety_percent'] > 0:
        print("Verdict: Potentially Undervalued")
    else:
        print("Verdict: Potentially Overvalued")

# Visualize the projected cash flows
plt.figure(figsize=(12, 6))
years = list(range(1, 11))
plt.bar(years, dcf_results['projected_fcf_millions'], color='skyblue')
plt.title(f"Projected Free Cash Flow for {dcf_results['ticker']}")
plt.xlabel('Year')
plt.ylabel('Free Cash Flow ($ millions)')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.savefig('projected_fcf.png')
print("Projected FCF chart saved as 'projected_fcf.png'")

# Create a bar chart comparing the DCF components
components = [
    'Sum of Discounted FCF',
    'Discounted Terminal Value',
    'Total Present Value'
]
values = [
    sum(dcf_results['present_values_millions']),
    dcf_results['discounted_terminal_value_millions'],
    dcf_results['total_present_value_millions']
]

plt.figure(figsize=(10, 6))
plt.bar(components[0], values[0], color='lightblue')
plt.bar(components[1], values[1], color='lightgreen')
plt.bar(components[2], values[2], color='coral')
plt.title(f"DCF Components for {dcf_results['ticker']}")
plt.ylabel('Value ($ millions)')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.savefig('dcf_components.png')
print("DCF components chart saved as 'dcf_components.png'")
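
For reference, the calculation performed by calculate_dcf can be written out as follows. The symbols are introduced here for illustration only: FCF_0 is current_fcf, g_1 and g_2 are the growth rates for years 1-5 and 6-10, g_T is the terminal growth rate, r is the discount rate, N is shares_outstanding, and P is current_price.

```latex
\begin{aligned}
\mathrm{FCF}_t &=
  \begin{cases}
    \mathrm{FCF}_0\,(1+g_1)^{t}, & 1 \le t \le 5 \\
    \mathrm{FCF}_5\,(1+g_2)^{t-5}, & 6 \le t \le 10
  \end{cases} \\
\mathrm{TV} &= \frac{\mathrm{FCF}_{10}\,(1+g_T)}{r - g_T}, \qquad r > g_T \\
V &= \sum_{t=1}^{10} \frac{\mathrm{FCF}_t}{(1+r)^{t}} + \frac{\mathrm{TV}}{(1+r)^{10}} \\
\text{Intrinsic value per share} &= \frac{V}{N} \\
\text{Margin of safety (\%)} &= \frac{V/N - P}{V/N} \times 100
\end{aligned}
```

Because the terminal value divides by r - g_T, the estimate is only meaningful when the discount rate exceeds the terminal growth rate, and it becomes very sensitive to both inputs as they approach each other.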

Creating a Composite Value Score

A composite value score combines multiple financial metrics into a single score that represents how well a company aligns with value investing principles:

Python: Creating a Composite Value Score

# Install required libraries (run this once)
# pip install pandas numpy matplotlib seaborn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Sample data for multiple companies
companies_data = {
    'Ticker': ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'NVDA', 'JPM', 'V', 'JNJ'],
    'Name': ['Apple Inc.', 'Microsoft Corp.', 'Alphabet Inc.', 'Amazon.com Inc.', 'Meta Platforms Inc.', 
             'Tesla Inc.', 'NVIDIA Corp.', 'JPMorgan Chase & Co.', 'Visa Inc.', 'Johnson & Johnson'],
    'PE_Ratio': [25.6, 30.2, 22.8, 40.5, 18.7, 55.3, 45.8, 12.3, 28.9, 15.6],
    'PB_Ratio': [35.2, 12.8, 5.3, 10.2, 4.8, 15.7, 25.3, 1.5, 12.8, 5.2],
    'ROE': [0.35, 0.42, 0.25, 0.22, 0.18, 0.15, 0.38, 0.12, 0.32, 0.21],
    'Debt_to_Equity': [1.2, 0.5, 0.3, 0.8, 0.4, 0.6, 0.2, 2.5, 0.7, 0.9],
    'FCF_Yield': [0.03, 0.025, 0.02, 0.015, 0.035, 0.01, 0.02, 0.045, 0.03, 0.04],
    'Dividend_Yield': [0.005, 0.008, 0.0, 0.0, 0.0, 0.0, 0.001, 0.03, 0.007, 0.025],
    'Sentiment_Score': [0.65, 0.72, 0.58, 0.62, 0.45, 0.52, 0.78, 0.55, 0.60, 0.63]
}

# Create DataFrame
df = pd.DataFrame(companies_data)
print("Company Financial Data:")
print(df[['Ticker', 'Name', 'PE_Ratio', 'PB_Ratio', 'ROE']].head())

# Define value investing criteria and thresholds
value_criteria = {
    # Price ratios (lower is better)
    'PE_Ratio': {'max': 20, 'weight': 0.15, 'better': 'lower'},
    'PB_Ratio': {'max': 3, 'weight': 0.15, 'better': 'lower'},
    
    # Profitability (higher is better)
    'ROE': {'min': 0.15, 'weight': 0.15, 'better': 'higher'},
    
    # Debt metrics (lower is better)
    'Debt_to_Equity': {'max': 1.0, 'weight': 0.1, 'better': 'lower'},
    
    # Cash flow metrics (higher is better)
    'FCF_Yield': {'min': 0.02, 'weight': 0.15, 'better': 'higher'},
    
    # Dividend metrics (higher is better)
    'Dividend_Yield': {'min': 0.01, 'weight': 0.1, 'better': 'higher'},
    
    # Sentiment (higher is better)
    'Sentiment_Score': {'min': 0.5, 'weight': 0.2, 'better': 'higher'}
}

# Function to calculate value score for each criterion
def calculate_criterion_score(value, criterion_details):
    if criterion_details['better'] == 'lower':
        # For metrics where lower is better (like P/E ratio)
        if 'max' in criterion_details:
            if value <= criterion_details['max']:
                # Scale the score based on how much below the max it is
                # The lower, the better (up to a reasonable minimum)
                reasonable_min = criterion_details['max'] * 0.2  # Assume 20% of max is a reasonable minimum
                normalized = 1 - max(0, min(1, (value - reasonable_min) / (criterion_details['max'] - reasonable_min)))
                return normalized * criterion_details['weight']
            else:
                return 0
    else:  # 'higher' is better
        # For metrics where higher is better (like ROE)
        if 'min' in criterion_details:
            if value >= criterion_details['min']:
                # Scale the score based on how much above the min it is
                # The higher, the better (up to a reasonable maximum)
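                # Assume 3x min is a reasonable maximum. For metrics capped at 1.0
                # (e.g., Sentiment_Score with min 0.5), this maximum is unreachable,
                # so such metrics can never earn their full weight.
                reasonable_max = criterion_details['min'] * 3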
                normalized = min(1, (value - criterion_details['min']) / (reasonable_max - criterion_details['min']))
                return normalized * criterion_details['weight']
            else:
                return 0
    
    return 0  # Default case

# Calculate value scores for each company
def calculate_value_score(company_data):
    scores = {}
    total_score = 0
    
    for criterion, details in value_criteria.items():
        if criterion in company_data:
            criterion_score = calculate_criterion_score(company_data[criterion], details)
            scores[criterion] = criterion_score
            total_score += criterion_score
    
    # Convert to a 0-100 scale (assumes the weights in value_criteria sum to 1.0)
    normalized_score = total_score * 100
    
    return scores, normalized_score

# Apply scoring to each company
df['Criterion_Scores'] = df.apply(lambda row: calculate_value_score(row)[0], axis=1)
df['Value_Score'] = df.apply(lambda row: calculate_value_score(row)[1], axis=1)

# Sort by value score (descending)
df_sorted = df.sort_values('Value_Score', ascending=False).reset_index(drop=True)

print("\nCompanies Ranked by Value Score:")
print(df_sorted[['Ticker', 'Name', 'Value_Score']].head(10))

# Print detailed breakdown for top company
top_company = df_sorted.iloc[0]
print(f"\nDetailed Breakdown for Top Value Company: {top_company['Name']} ({top_company['Ticker']})")
print(f"Overall Value Score: {top_company['Value_Score']:.2f}/100")
print("\nIndividual Criterion Scores:")
for criterion, score in top_company['Criterion_Scores'].items():
    max_score = value_criteria[criterion]['weight'] * 100
    print(f"- {criterion}: {score*100:.2f}/{max_score:.2f}")

# Visualize value scores
plt.figure(figsize=(12, 6))
sns.barplot(x='Ticker', y='Value_Score', data=df_sorted)
plt.title('Companies Ranked by Value Score')
plt.xlabel('Company')
plt.ylabel('Value Score (0-100)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('value_scores.png')
print("\nValue scores chart saved as 'value_scores.png'")

# Create a heatmap of the criteria scores for all companies
plt.figure(figsize=(14, 8))
heatmap_data = pd.DataFrame(df_sorted['Criterion_Scores'].tolist(), index=df_sorted['Ticker']).fillna(0)
# Divide by each criterion's weight so every column is on a 0-1 scale
for col in heatmap_data.columns:
    heatmap_data[col] = heatmap_data[col] / value_criteria[col]['weight']
sns.heatmap(heatmap_data, annot=True, cmap='YlGnBu', fmt='.2f')
plt.title('Value Investing Criteria Scores by Company')
plt.tight_layout()
plt.savefig('criteria_heatmap.png')
print("Criteria heatmap saved as 'criteria_heatmap.png'")

# Save the results
df_sorted.to_csv('value_investing_scores.csv', index=False)
print("\nValue investing scores saved to 'value_investing_scores.csv'")
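
To make the scaling inside the criterion scoring easier to follow, here is a stripped-down sketch of the same linear interpolation without the weights. The helper name linear_score is hypothetical; the 20% floor and 3x ceiling mirror the assumptions in the listing above.

```python
def linear_score(value, threshold, better):
    """Return a 0-1 score using linear scaling between a threshold and an assumed best value."""
    if better == 'lower':
        floor = threshold * 0.2      # values at or below this earn the full score
        if value > threshold:
            return 0.0
        return 1 - max(0.0, min(1.0, (value - floor) / (threshold - floor)))
    ceiling = threshold * 3          # values at or above this earn the full score
    if value < threshold:
        return 0.0
    return min(1.0, (value - threshold) / (ceiling - threshold))

# P/E of 12.3 against a maximum of 20: the floor is 4, so the score is 1 - 8.3/16
pe_score = linear_score(12.3, 20, 'lower')
# ROE of 0.35 against a minimum of 0.15: the ceiling is 0.45, so the score is 0.20/0.30
roe_score = linear_score(0.35, 0.15, 'higher')

print(f"P/E score: {pe_score:.3f}")
print(f"ROE score: {roe_score:.3f}")
```

One consequence of this design is that a metric sitting exactly on its threshold scores zero, the same as a metric that fails it; credit only accrues as the value moves past the threshold toward the assumed best value.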

Incorporating Sentiment Analysis

Sentiment analysis from news and financial reports can provide valuable context for your value investing decisions, because it captures how the market currently perceives a company rather than what its fundamentals say.

Sentiment Integration Approaches

Consider these approaches for incorporating sentiment into your value investing model:

1. Direct Feature Integration

Include sentiment scores directly as features in your model:


# Example of adding sentiment as a direct feature
features = [
    'PE_Ratio', 'PB_Ratio', 'ROE', 'Debt_to_Equity', 
    'FCF_Yield', 'Dividend_Yield', 'Sentiment_Score'
]
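
A minimal sketch of turning that feature list into a model-ready matrix might look like the following. It reuses three rows from the sample data earlier in this step; the z-score standardization is an assumption added for illustration, since not every model needs scaled inputs.

```python
import pandas as pd

features = [
    'PE_Ratio', 'PB_Ratio', 'ROE', 'Debt_to_Equity',
    'FCF_Yield', 'Dividend_Yield', 'Sentiment_Score'
]

# Three rows taken from the sample data earlier in this step
df = pd.DataFrame({
    'Ticker': ['AAPL', 'JPM', 'JNJ'],
    'PE_Ratio': [25.6, 12.3, 15.6],
    'PB_Ratio': [35.2, 1.5, 5.2],
    'ROE': [0.35, 0.12, 0.21],
    'Debt_to_Equity': [1.2, 2.5, 0.9],
    'FCF_Yield': [0.03, 0.045, 0.04],
    'Dividend_Yield': [0.005, 0.03, 0.025],
    'Sentiment_Score': [0.65, 0.55, 0.63],
})

# Select the feature columns, indexed by ticker
X = df.set_index('Ticker')[features]

# Standardize each column (z-score) so ratios on different scales are comparable.
# This scaling step is an assumption; tree-based models, for example, do not need it.
X_scaled = (X - X.mean()) / X.std()

print(X_scaled.round(2))
```

With this approach sentiment is treated like any other column, so the model itself decides how much weight it deserves.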

2. Sentiment-Adjusted Metrics

Adjust traditional financial metrics based on sentiment:


# Example of sentiment-adjusted P/E ratio
def calculate_sentiment_adjusted_pe(pe_ratio, sentiment_score):
    # Map a sentiment score in [0, 1] to a factor in [0.5, 1.5]
    # (0.5 for fully negative, 1.5 for fully positive)
    sentiment_factor = 0.5 + sentiment_score
    # Dividing by the factor lowers the P/E for positive sentiment and raises it for negative
    adjusted_pe = pe_ratio / sentiment_factor
    return adjusted_pe
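
The same adjustment can be applied to a whole table at once. The sketch below assumes sentiment scores lie in [0, 1] and uses three rows from the sample data; the Adjusted_PE column name is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Ticker': ['AAPL', 'META', 'JPM'],
    'PE_Ratio': [25.6, 18.7, 12.3],
    'Sentiment_Score': [0.65, 0.45, 0.55],   # assumed to lie in [0, 1]
})

# Same formula as calculate_sentiment_adjusted_pe, applied column-wise
df['Adjusted_PE'] = df['PE_Ratio'] / (0.5 + df['Sentiment_Score'])

print(df.round(2))
```

A sentiment score of exactly 0.5 leaves the P/E unchanged; scores above it make the company look cheaper, and scores below it (META in this sample) make it look more expensive.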

3. Sentiment as a Filter

Use sentiment as a secondary filter after financial analysis:


# Example of using sentiment as a filter
def apply_sentiment_filter(companies, sentiment_threshold=0.5):
    return companies[companies['Sentiment_Score'] >= sentiment_threshold]
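
As a usage sketch, the filter can be chained after a financial screen. The P/E cutoff of 20 below matches the threshold used in the composite score; the four rows come from the sample data, and the two-stage pipeline itself is an illustrative assumption.

```python
import pandas as pd

def apply_sentiment_filter(companies, sentiment_threshold=0.5):
    return companies[companies['Sentiment_Score'] >= sentiment_threshold]

df = pd.DataFrame({
    'Ticker': ['AAPL', 'META', 'JPM', 'JNJ'],
    'PE_Ratio': [25.6, 18.7, 12.3, 15.6],
    'Sentiment_Score': [0.65, 0.45, 0.55, 0.63],
})

# Stage 1: financial screen (keep companies with P/E of 20 or less)
shortlist = df[df['PE_Ratio'] <= 20]

# Stage 2: drop shortlisted companies whose sentiment is below the threshold
final = apply_sentiment_filter(shortlist)

print(final['Ticker'].tolist())
```

AAPL is removed by the financial screen and META by the sentiment filter, leaving JPM and JNJ.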

4. Time Series Sentiment Trends

Track sentiment changes over time for more nuanced analysis:


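# Example of tracking sentiment trends
import numpy as np

def calculate_sentiment_momentum(sentiment_history):
    # Slope of a least-squares line fitted to the sentiment readings (oldest first).
    # Positive means sentiment is improving; at least two readings are required.
    return np.polyfit(range(len(sentiment_history)), sentiment_history, 1)[0]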
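
A short usage sketch follows, with two hypothetical sentiment histories (oldest reading first). The readings are made up for illustration.

```python
import numpy as np

def calculate_sentiment_momentum(sentiment_history):
    # Slope of a least-squares line through the sentiment readings
    return np.polyfit(range(len(sentiment_history)), sentiment_history, 1)[0]

# Hypothetical weekly sentiment readings, oldest first
improving = [0.40, 0.45, 0.50, 0.55, 0.60]
deteriorating = [0.70, 0.62, 0.58, 0.49, 0.45]

improving_slope = calculate_sentiment_momentum(improving)
deteriorating_slope = calculate_sentiment_momentum(deteriorating)

print(f"Improving trend slope: {improving_slope:+.3f}")
print(f"Deteriorating trend slope: {deteriorating_slope:+.3f}")
```

The slope is expressed in sentiment points per period, so its sign gives the direction of the trend and its size gives the pace of change. It can be added to the feature set alongside the latest sentiment level.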

Knowledge Check

What is the primary purpose of the Discounted Cash Flow (DCF) method in value investing?

To estimate a company's intrinsic value based on projected future cash flows
To calculate a company's current market capitalization
To determine a company's historical growth rate
To predict short-term stock price movements

In a composite value score, why might you assign different weights to different criteria?

To make the calculation simpler
Because all criteria use different units of measurement
To reflect the relative importance of each criterion in your value investing strategy
To ensure the final score is always between 0 and 100