AgenticTradingML — AI-Powered Autonomous Trading Platform

Feature Engineering: The Art of Prep

Machine learning models are only as good as the data you feed them. Feature engineering is the process of turning raw price and volume data into input variables that a model can actually learn from.

Price-Based Features

Returns: (P_t - P_{t-1}) / P_{t-1} (simple percentage change)
Log Returns: ln(P_t / P_{t-1}) (additive across time, so multi-period returns are a simple sum)
Rolling Mean: Average price over last N periods (trend)
Rolling Std: Standard deviation of price over last N periods (a rough volatility measure)
Price vs. SMA: How far price is from its moving average, as a fraction of that average (mean-reversion signal)
High/Low Range: (High - Low) / Close (intraday volatility)

python

import pandas as pd
import numpy as np

df = pd.read_csv('SPY.csv')

# Create features
df['Return'] = df['Close'].pct_change()
df['Log_Return'] = np.log(df['Close'] / df['Close'].shift(1))
df['SMA_20'] = df['Close'].rolling(20).mean()
df['STD_20'] = df['Close'].rolling(20).std()
df['Price_vs_SMA'] = (df['Close'] - df['SMA_20']) / df['SMA_20']
df['Range'] = (df['High'] - df['Low']) / df['Close']
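To make the difference between simple and log returns concrete, here is a small sketch on a made-up price path (the four prices are illustrative, not market data). It shows that log returns sum to the log of total growth, while simple returns have to be compounded by multiplication.

```python
import numpy as np

# Hypothetical price path, chosen only for illustration
prices = np.array([100.0, 105.0, 99.75, 103.74])

simple = prices[1:] / prices[:-1] - 1        # per-period simple returns
log_ret = np.log(prices[1:] / prices[:-1])   # per-period log returns

total_simple = prices[-1] / prices[0] - 1    # return over the whole path

# Log returns add up: their sum is the log of total growth
total_from_logs = np.exp(log_ret.sum()) - 1

# Simple returns compound: multiply (1 + r) terms, then subtract 1
total_from_simple = np.prod(1 + simple) - 1

print(total_simple, total_from_logs, total_from_simple)
print(simple.sum())  # naive sum of simple returns does not match
```

Summing the simple returns directly gives a different number from the true total, which is why log returns are often preferred when returns are aggregated over time.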

Technical Indicator Features

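RSI: Momentum oscillator bounded between 0 and 100
MACD: Trend-following indicator (difference between a fast and a slow exponential moving average)
ATR: Volatility measure (average true range of each bar)
OBV: On-Balance Volume (accumulation/distribution)
Bollinger Bands: Bands a set number of standard deviations around a moving average (overbought/oversold zones)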
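The indicators listed here can be computed with plain pandas. The following is a minimal sketch, not a reference implementation: the OHLCV data is synthetic so the block runs on its own, the window lengths (14, 12/26/9, 20, 2 standard deviations) are the conventional defaults rather than values from this lesson, and the RSI uses simple rolling averages instead of Wilder's smoothed averages.

```python
import numpy as np
import pandas as pd

# Synthetic OHLCV data (an assumption so the sketch is self-contained;
# in practice df would be loaded from a file such as SPY.csv)
rng = np.random.default_rng(0)
n = 300
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n)))
spread = np.abs(rng.normal(0, 0.5, n))
df = pd.DataFrame({
    'Close': close,
    'High': close + spread,
    'Low': close - spread,
    'Volume': rng.integers(1_000_000, 5_000_000, n),
})

# RSI(14): average gain vs. average loss, mapped onto a 0-100 scale
delta = df['Close'].diff()
avg_gain = delta.clip(lower=0).rolling(14).mean()
avg_loss = (-delta.clip(upper=0)).rolling(14).mean()
df['RSI'] = 100 - 100 / (1 + avg_gain / avg_loss)

# MACD: fast EMA minus slow EMA, plus a signal line (EMA of MACD)
ema_fast = df['Close'].ewm(span=12, adjust=False).mean()
ema_slow = df['Close'].ewm(span=26, adjust=False).mean()
df['MACD'] = ema_fast - ema_slow
df['MACD_signal'] = df['MACD'].ewm(span=9, adjust=False).mean()

# ATR(14): rolling mean of the true range
prev_close = df['Close'].shift(1)
true_range = pd.concat([
    df['High'] - df['Low'],
    (df['High'] - prev_close).abs(),
    (df['Low'] - prev_close).abs(),
], axis=1).max(axis=1)
df['ATR'] = true_range.rolling(14).mean()

# OBV: running total of volume, signed by the direction of the close
direction = np.sign(df['Close'].diff()).fillna(0)
df['OBV'] = (direction * df['Volume']).cumsum()

# Bollinger %B: 0 at the lower band, 1 at the upper band
mid = df['Close'].rolling(20).mean()
std = df['Close'].rolling(20).std()
df['BB_pct'] = (df['Close'] - (mid - 2 * std)) / (4 * std)
```

Each indicator leaves NaN values in its warm-up period (the first 14 to 20 rows here), which need to be dropped before training.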

Important: Bring all features onto a comparable scale (for example, standardized to mean 0 and standard deviation 1, or min-max scaled to [0, 1]) so that scale-sensitive models don't overweight large-magnitude features.

python

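from sklearn.preprocessing import StandardScaler

# Assumes df already contains 'RSI', 'MACD' and 'ATR' columns
indicator_cols = ['RSI', 'MACD', 'ATR']

# Fit on the earliest 80% of rows only (an illustrative split) so that
# statistics from the later test period don't leak into the features
train_end = int(len(df) * 0.8)
scaler = StandardScaler()
scaler.fit(df.iloc[:train_end][indicator_cols])

# Standardize every row using the training mean and std
features_scaled = scaler.transform(df[indicator_cols])
df['RSI_scaled'] = features_scaled[:, 0]
df['MACD_scaled'] = features_scaled[:, 1]
df['ATR_scaled'] = features_scaled[:, 2]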

Volume Features

Volume Ratio: Today's volume / 20-day average
Volume Trend: Is volume increasing or decreasing?
VWAP Deviation: How far from volume-weighted average price
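A possible way to compute these three volume features is sketched below. The data is synthetic, and two choices are assumptions rather than part of the lesson: "volume trend" is expressed as a 5-day average over a 20-day average, and VWAP is approximated on daily bars as a rolling 20-day volume-weighted typical price (a true intraday VWAP needs intraday data).

```python
import numpy as np
import pandas as pd

# Synthetic daily bars (an assumption so the sketch runs on its own)
rng = np.random.default_rng(1)
n = 120
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, n)))
spread = np.abs(rng.normal(0, 0.5, n))
df = pd.DataFrame({
    'Close': close,
    'High': close + spread,
    'Low': close - spread,
    'Volume': rng.integers(1_000_000, 5_000_000, n).astype(float),
})

# Volume Ratio: today's volume relative to its 20-day average
# (the 20-day window includes today)
vol_avg_20 = df['Volume'].rolling(20).mean()
df['Volume_Ratio'] = df['Volume'] / vol_avg_20

# Volume Trend: short average over long average; above 1 means rising volume
df['Volume_Trend'] = df['Volume'].rolling(5).mean() / vol_avg_20

# VWAP Deviation: distance of close from a rolling volume-weighted price
typical = (df['High'] + df['Low'] + df['Close']) / 3
vwap_20 = ((typical * df['Volume']).rolling(20).sum()
           / df['Volume'].rolling(20).sum())
df['VWAP_Dev'] = (df['Close'] - vwap_20) / vwap_20
```

All three features are ratios, so they are comparable across tickers with very different share prices and trading volumes.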

Time Features (Seasonality)

Day of Week: Monday, Tuesday, etc. (some studies report that returns differ by weekday)
Month: Certain months have historically shown higher average returns (e.g., the January Effect)
Quarter End: End-of-period portfolio rebalancing effects
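Calendar features like these come straight from a datetime index. The sketch below assumes a DataFrame indexed by trading dates; the 2024 business-day calendar is a stand-in that ignores exchange holidays, and the column names are illustrative. The quarter-end flag marks the last date present in the data for each quarter, because the calendar quarter end can fall on a weekend.

```python
import pandas as pd

# Stand-in trading calendar: weekdays only, exchange holidays not removed
dates = pd.bdate_range('2024-01-01', '2024-12-31')
df = pd.DataFrame(index=dates)

# Day of week as an integer: 0 = Monday ... 4 = Friday
df['Day_of_Week'] = df.index.dayofweek

# Month as an integer: 1 = January ... 12 = December
df['Month'] = df.index.month

# Quarter end: 1 on the last date in the data for each (year, quarter)
last_day = (df.index.to_series()
            .groupby([df.index.year, df.index.quarter])
            .transform('max'))
df['Is_Quarter_End'] = (df.index == last_day.to_numpy()).astype(int)

# One-hot encode the weekday so the model doesn't read it as an ordered number
df = df.join(pd.get_dummies(df['Day_of_Week'], prefix='DOW', dtype=int))
```

One-hot encoding matters mostly for linear models and neural networks; tree-based models can usually work with the integer codes directly.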

Label Creation: What Are We Predicting?

Binary Classification: Will price be higher in 5 days? (1=yes, 0=no)
Multi-Class: Up/Down/Flat over next 5 days
Regression: Predict the exact return (%, continuous number)
Direction Only: Will it go up or down? (ignore magnitude)

python

# Create binary label: will price be higher in 5 days?
df['Return_5d'] = df['Close'].shift(-5) / df['Close'] - 1
df['Label'] = (df['Return_5d'] > 0).astype(int)  # 1 if up, 0 if down or flat

# Lag features by one bar so each row only uses data available at decision time
feature_columns = ['Return', 'Log_Return', 'SMA_20', 'STD_20', 'Price_vs_SMA', 'Range']
for col in feature_columns:
    df[col] = df[col].shift(1)  # Shift to prevent lookahead bias

# Remove NaN rows
df = df.dropna()
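The regression and multi-class labels follow the same pattern as the binary one. Here is a minimal sketch on synthetic prices; the 1% "flat" threshold and the column name `Label_3c` are hypothetical choices for illustration, and a sensible threshold depends on the asset's volatility and your trading costs.

```python
import numpy as np
import pandas as pd

# Synthetic close prices (an assumption so the sketch runs on its own)
rng = np.random.default_rng(2)
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 250)))
df = pd.DataFrame({'Close': close})

horizon = 5        # days ahead, matching the 5-day label in this lesson
threshold = 0.01   # hypothetical: moves smaller than 1% count as flat

# Regression target: the forward 5-day return itself
df['Return_5d'] = df['Close'].shift(-horizon) / df['Close'] - 1

# Multi-class target: 1 = up, -1 = down, 0 = flat
df['Label_3c'] = np.select(
    [df['Return_5d'] > threshold, df['Return_5d'] < -threshold],
    [1, -1],
    default=0,
)

# The last `horizon` rows have no forward price, so their labels are
# meaningless; drop them before training
df = df.dropna(subset=['Return_5d'])

print(df['Label_3c'].value_counts())
```

Checking the class counts is worthwhile: a threshold that is too wide makes almost every row "flat", and one that is too narrow collapses the problem back to binary.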

Watch for Lookahead Bias

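CRITICAL: Create your label using forward prices (5 days ahead), but create your features using only data that was available when the trade decision is made. If a feature on day t contains any information from after that decision point (a centered rolling window, a scaler fit on the full dataset, or day t's close when you actually trade before the close), the model is seeing the future and your backtest results will be inflated. Lagging features by one bar is the conservative fix.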
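One way to test for lookahead bias is a truncation check: compute features on the full history, recompute them on a history cut off at some date, and confirm the overlapping rows match. If chopping off future data changes a past feature value, that feature was using the future. The sketch below is illustrative; `build_features`, `leaky_features`, and `has_lookahead` are hypothetical helper names, and the prices are synthetic.

```python
import numpy as np
import pandas as pd

def build_features(prices: pd.DataFrame) -> pd.DataFrame:
    """Backward-looking features, lagged by one bar."""
    out = pd.DataFrame(index=prices.index)
    out['Return'] = prices['Close'].pct_change()
    out['SMA_20'] = prices['Close'].rolling(20).mean()
    return out.shift(1)

def leaky_features(prices: pd.DataFrame) -> pd.DataFrame:
    """Deliberately wrong: a centered window reads 10 bars into the future."""
    out = pd.DataFrame(index=prices.index)
    out['SMA_centered'] = prices['Close'].rolling(21, center=True).mean()
    return out

def has_lookahead(feature_fn, prices: pd.DataFrame, cutoff: int) -> bool:
    """True if features for the first `cutoff` rows change when later rows are removed."""
    full = feature_fn(prices).iloc[:cutoff]
    truncated = feature_fn(prices.iloc[:cutoff])
    same = np.allclose(full.to_numpy(), truncated.to_numpy(), equal_nan=True)
    return not same

# Synthetic close prices (an assumption so the sketch runs on its own)
rng = np.random.default_rng(3)
prices = pd.DataFrame({'Close': 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 300)))})

print(has_lookahead(build_features, prices, cutoff=150))
print(has_lookahead(leaky_features, prices, cutoff=150))
```

The check only catches leaks inside the feature function itself. Leaks introduced later, such as fitting a scaler on the full dataset, need the same discipline applied to the whole pipeline.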
