Executive summary.
Financial markets are interconnected systems where one asset's move cascades through the network. Understanding that structure is the whole game for risk management, portfolio construction, and spotting systemic vulnerabilities before they matter.
This project analyzes correlation networks among S&P 500 stocks to reveal hidden market structure. By applying network science to nearly six years of daily price data, I identified the most systemically important nodes, the community structure the graph settles into, and, most usefully, how that topology degrades during periods of market stress.
During the 2022 drawdown, network density rose 37%. Diversification fails precisely when investors need it most.
When the market sells off, pairwise correlations converge toward one. The graph densifies, community structure blurs, and what looked like an 11-sector market starts behaving like a single factor.
Methodology & pipeline.
The analysis pipeline was built in Python for reproducibility and scalability. Data was sourced from Yahoo Finance via the yfinance library, cleaned with Pandas, and modeled as a weighted graph in which nodes represent stocks and edges represent pairwise Pearson correlations whose magnitude exceeds a threshold.
1 · Data ingestion
Collected daily adjusted closing prices for all current S&P 500 constituents (Jan 2020 – Oct 2025). Computed daily log returns to stabilize variance and strip out price-level effects before any correlation is taken.
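A minimal sketch of this step follows. It assumes the prices arrive as a date-indexed pandas DataFrame with one column per ticker. The `fetch_adjusted_closes` helper is hypothetical. It relies on yfinance's `download` with `auto_adjust=True` and is not exercised below; the example uses four days of synthetic prices instead.

```python
import numpy as np
import pandas as pd


def log_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Daily log returns r_t = ln(P_t / P_{t-1}) for each ticker column.

    The first row (no previous price) is dropped. Gaps in the input
    (e.g. a ticker not yet listed) propagate as NaN rather than being filled.
    """
    return np.log(prices / prices.shift(1)).iloc[1:]


def fetch_adjusted_closes(tickers, start, end):
    """Hypothetical helper: download split/dividend-adjusted closes with yfinance.

    Imported lazily so the rest of this sketch runs without network access.
    """
    import yfinance as yf

    data = yf.download(tickers, start=start, end=end, auto_adjust=True, progress=False)
    return data["Close"]


# Synthetic example (assumption): two tickers, four business days.
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0, 99.0], "BBB": [50.0, 50.0, 55.0, 60.5]},
    index=pd.date_range("2024-01-02", periods=4, freq="B"),
)
rets = log_returns(prices)
print(rets.round(4))
```

Leaving NaN gaps in place, rather than forward-filling, works because pandas' `DataFrame.corr` excludes missing values pairwise. A stock that joined the index mid-sample simply contributes fewer overlapping days.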
2 · Graph construction
Built the graph using NetworkX. To control noise, I applied a thresholding step: only pairs with |ρ| > 0.5 become edges. Weights carry the sign, so the layout respects both co-movement and counter-movement.
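The thresholding rule can be sketched as follows. It assumes `returns` holds the daily log returns from step 1. The function name and the extra `abs_weight` edge attribute are illustrative choices, not necessarily how the project stores its edges. The synthetic data plants one factor that drives A and B up and C down, plus an unrelated series D.

```python
import networkx as nx
import numpy as np
import pandas as pd


def build_correlation_graph(returns: pd.DataFrame, threshold: float = 0.5) -> nx.Graph:
    """Turn a Pearson correlation matrix into a signed, weighted graph.

    An edge (i, j) exists only when |rho_ij| > threshold. "weight" keeps the
    sign of rho_ij; "abs_weight" (hypothetical name) stores |rho_ij| for
    algorithms that expect non-negative weights.
    """
    corr = returns.corr(method="pearson")
    tickers = list(corr.columns)
    G = nx.Graph()
    G.add_nodes_from(tickers)
    for a in range(len(tickers)):
        for b in range(a + 1, len(tickers)):
            rho = float(corr.iat[a, b])
            if np.isfinite(rho) and abs(rho) > threshold:
                G.add_edge(tickers[a], tickers[b], weight=rho, abs_weight=abs(rho))
    return G


# Synthetic returns (assumption): one common factor f plus idiosyncratic noise.
rng = np.random.default_rng(0)
f = rng.standard_normal(500)
noise = rng.standard_normal((500, 4))
returns = pd.DataFrame({
    "A": f + 0.3 * noise[:, 0],
    "B": f + 0.3 * noise[:, 1],
    "C": -f + 0.3 * noise[:, 2],  # moves against A and B
    "D": noise[:, 3],             # unrelated to the factor
})
G = build_correlation_graph(returns, threshold=0.5)
print(sorted(G.edges(data="weight")))
```

On this toy data, the function should keep the three factor-driven pairs, with negative weights on the two edges touching C, and leave D isolated.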
The graph, interactive.
A force-directed layout of 60 representative constituents across the 11 Louvain communities. Raising the correlation threshold strips out low-signal edges and leaves the network's backbone.
Correlation breakdown.
The headline finding of this project lives in a single visual: the sector-level ρ-matrix, compared across three regimes (calm, stressed, and crisis windows). In calm markets, sectors are cleanly separated, with a dark diagonal and a light off-diagonal. In a crash, the whole matrix floods red as every sector moves together.
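One way to build a sector-level ρ-matrix is to compute stock-level Pearson correlations inside a window and then average them over each pair of sectors. The sketch below assumes a ticker-to-sector mapping is available. The project's exact regime windows are not stated here, so the example contrasts a hypothetical calm market (weak common factor) with a crisis market (dominant common factor) on simulated data. `simulate` and `off_diagonal_mean` are illustrative helpers.

```python
import numpy as np
import pandas as pd


def sector_corr_matrix(returns: pd.DataFrame, sector_of: dict) -> pd.DataFrame:
    """Average stock-level Pearson correlations into a sector-by-sector matrix.

    Entry (s, t) is the mean of rho_ij over stocks i in sector s and j in
    sector t; the trivial rho_ii = 1 self-correlations are excluded.
    """
    corr = returns.corr(method="pearson")
    rho = corr.to_numpy().copy()
    np.fill_diagonal(rho, np.nan)  # drop self-correlations
    labels = np.array([sector_of[t] for t in corr.columns])
    sectors = sorted({sector_of[t] for t in corr.columns})
    out = pd.DataFrame(index=sectors, columns=sectors, dtype=float)
    for s in sectors:
        for t in sectors:
            block = rho[np.ix_(labels == s, labels == t)]
            out.loc[s, t] = np.nanmean(block)
    return out


def off_diagonal_mean(m: pd.DataFrame) -> float:
    """Mean cross-sector correlation: how 'red' the off-diagonal is."""
    a = m.to_numpy()
    return float(a[~np.eye(len(a), dtype=bool)].mean())


def simulate(market_weight: float, n_days: int = 750, seed: int = 0):
    """Hypothetical toy market: each stock = market + own sector factor + noise."""
    rng = np.random.default_rng(seed)
    market = rng.standard_normal(n_days)
    cols, sector_of = {}, {}
    for sector in ("Energy", "Financials", "Tech"):
        sector_factor = rng.standard_normal(n_days)
        for k in range(3):
            name = f"{sector[:3].upper()}{k}"
            cols[name] = (market_weight * market + sector_factor
                          + 0.5 * rng.standard_normal(n_days))
            sector_of[name] = sector
    return pd.DataFrame(cols), sector_of


calm_returns, sector_of = simulate(market_weight=0.2)  # weak common factor
crisis_returns, _ = simulate(market_weight=2.0)        # common factor dominates
calm = sector_corr_matrix(calm_returns, sector_of)
crisis = sector_corr_matrix(crisis_returns, sector_of)
print(calm.round(2))
print(crisis.round(2))
print(off_diagonal_mean(calm), off_diagonal_mean(crisis))
```

Averaging within sector blocks is one choice among several. An alternative is to correlate sector-level index returns directly, which weights large constituents more heavily.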
Results & findings.
Sectoral clustering (community detection)
Using the Louvain algorithm for modularity optimization, the network was partitioned into 11 distinct communities. These largely aligned with GICS sectors, but the algorithm surfaced a handful of cross-sector dependencies worth naming:
- Tech – Consumer bridge. AMZN and AAPL act as bridges between the Technology and Consumer Discretionary clusters. Remove them and the graph's diameter jumps.
- Energy isolation. Energy forms the most isolated cluster — the lowest mean correlation with everything else. In network terms, it's the best diversifier on the board.
- Financials & Industrials couple. These two sectors show the highest inter-cluster edge density, consistent with shared exposure to the rate cycle.
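The following is a hedged sketch of the community step. It swaps in NetworkX's built-in `louvain_communities` in place of the python-louvain package listed in the stack. It runs on |ρ| (stored under a hypothetical `abs_weight` attribute) because modularity is defined for non-negative weights. The `inter_community_density` helper, also hypothetical, divides the number of edges between two communities by the number of possible cross pairs. That is one simple way to make a Financials/Industrials coupling claim measurable.

```python
import itertools

import networkx as nx


def louvain_partition(G: nx.Graph, seed: int = 42) -> dict:
    """Map each node to a community id using NetworkX's Louvain implementation.

    Modularity assumes non-negative weights, so the |rho| attribute is used.
    """
    communities = nx.community.louvain_communities(G, weight="abs_weight", seed=seed)
    return {node: cid for cid, members in enumerate(communities) for node in members}


def inter_community_density(G: nx.Graph, part: dict) -> dict:
    """Edges between each pair of communities divided by the possible cross pairs."""
    groups = {}
    for node, cid in part.items():
        groups.setdefault(cid, set()).add(node)
    density = {}
    for a, b in itertools.combinations(sorted(groups), 2):
        n_edges = sum(1 for u in groups[a] for v in groups[b] if G.has_edge(u, v))
        density[(a, b)] = n_edges / (len(groups[a]) * len(groups[b]))
    return density


# Toy graph (assumption): three cliques with strong edges, joined by weak bridges.
G = nx.Graph()
for prefix, size in (("a", 4), ("b", 4), ("c", 3)):
    members = [f"{prefix}{k}" for k in range(size)]
    G.add_edges_from(itertools.combinations(members, 2), abs_weight=1.0)
G.add_edges_from([("a0", "b0"), ("a1", "b1"), ("b0", "c0")], abs_weight=0.2)

part = louvain_partition(G)
dens = inter_community_density(G, part)
print(len(set(part.values())), dens)
```

In the toy graph, the three cliques come back as three communities. The a/b pair, with two bridges over 16 possible cross pairs, has the highest cross-density at 0.125.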
Centrality & systemic risk
Computed eigenvector centrality to identify the most influential nodes. Unlike market cap, eigenvector centrality scores a stock by how strongly it connects to other highly connected stocks, which is closer to how regulators think about systemic importance.
Finding. Financials and Industrials showed higher average centrality than Tech, despite smaller market caps. The practical read: a shock originating at a major bank would likely spread through the system faster than a same-magnitude shock to a tech giant, because the bank sits closer to the weighted core of the graph.
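Eigenvector centrality is available directly in NetworkX. The sketch below computes it on |ρ| (a hypothetical `abs_weight` attribute, since power iteration can misbehave with signed weights) and averages it per sector, which is the comparison the finding rests on. The six-node graph is invented: a tightly linked "Financials" triangle with one peripheral "Tech" stock hanging off each node.

```python
import networkx as nx
import pandas as pd


def centrality_by_sector(G: nx.Graph, sector_of: dict) -> pd.Series:
    """Eigenvector centrality per node, averaged within each sector.

    Uses |rho| ("abs_weight") because the power iteration behind
    nx.eigenvector_centrality is only well-behaved for non-negative weights.
    """
    ec = nx.eigenvector_centrality(G, weight="abs_weight", max_iter=1000)
    scores = pd.Series(ec)
    sectors = scores.index.map(sector_of)
    return scores.groupby(sectors).mean().sort_values(ascending=False)


# Toy graph (assumption): a dense core plus peripheral nodes.
G = nx.Graph()
G.add_edges_from([("F1", "F2"), ("F1", "F3"), ("F2", "F3"),
                  ("F1", "T1"), ("F2", "T2"), ("F3", "T3")], abs_weight=1.0)
sector_of = {"F1": "Financials", "F2": "Financials", "F3": "Financials",
             "T1": "Tech", "T2": "Tech", "T3": "Tech"}
by_sector = centrality_by_sector(G, sector_of)
print(by_sector)
```

A sector's mean centrality can be high even when its total market cap is small. That is the distinction the finding draws.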
Technical stack.
The project leverages:
- Pandas / NumPy. Vectorized operations for return computation and correlation matrices.
- NetworkX. Graph algorithms — centrality, MST, efficiency, shortest paths.
- python-louvain. Modularity-optimizing community detection.
- Matplotlib / Seaborn. Static visualization of heatmaps and degree distributions.
- yfinance. Bulk OHLCV ingestion for the full universe.
Future directions.
The current analysis uses static correlation windows. To sharpen predictive power for trading applications, the next phase will implement:
- Dynamic Time Warping. To capture lagged and non-linear co-movement that Pearson misses.
- Rolling-window topology. Density and modularity as a time series — early-warning indicators for regime change.
- Intraday graphs. Rebuild the network at 5-minute bars; ask whether the "density spike" precedes drawdowns or just accompanies them.
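The rolling-window idea could start as simply as the sketch below. It rebuilds the |ρ| > 0.5 graph on each trailing window, then records the graph's density (`nx.density`, edges over possible edges) and the modularity of a Louvain partition. The window length, step size, and simulated calm-then-crisis returns are all assumptions for illustration, not choices the project has made.

```python
import networkx as nx
import numpy as np
import pandas as pd


def rolling_topology(returns: pd.DataFrame, window: int = 60, step: int = 20,
                     threshold: float = 0.5, seed: int = 0) -> pd.DataFrame:
    """Density and Louvain modularity of the |rho| > threshold graph per window."""
    tickers = list(returns.columns)
    iu, ju = np.triu_indices(len(tickers), k=1)  # each unordered pair once
    rows = []
    for end in range(window, len(returns) + 1, step):
        corr = returns.iloc[end - window:end].corr().to_numpy()
        G = nx.Graph()
        G.add_nodes_from(tickers)
        for i, j in zip(iu, ju):
            if abs(corr[i, j]) > threshold:
                G.add_edge(tickers[i], tickers[j], weight=abs(corr[i, j]))
        if G.number_of_edges():
            comms = nx.community.louvain_communities(G, weight="weight", seed=seed)
            q = nx.community.modularity(G, comms, weight="weight")
        else:
            q = np.nan  # modularity is undefined on an empty graph
        rows.append({"end": returns.index[end - 1],
                     "density": nx.density(G), "modularity": q})
    return pd.DataFrame(rows).set_index("end")


# Simulated returns (assumption): two 3-stock sectors; the common market
# factor is weak in the first half and dominant in the second half.
rng = np.random.default_rng(1)
n = 240
mkt_w = np.where(np.arange(n) < n // 2, 0.2, 2.0)
market = rng.standard_normal(n)
cols = {}
for s in ("X", "Y"):
    sector = rng.standard_normal(n)
    for k in range(3):
        cols[f"{s}{k}"] = mkt_w * market + sector + 0.5 * rng.standard_normal(n)
returns = pd.DataFrame(cols, index=pd.date_range("2024-01-01", periods=n, freq="B"))

topo = rolling_topology(returns)
print(topo.round(3))
```

On the simulated series, early windows should show two separate sector blocks (lower density, modularity near 0.5). Later windows should collapse into one dense block with modularity near zero, mirroring the breakdown described in the correlation section.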