Financial Sentiment Models, Phases, GDELT, and Industry Usage

Date: 2026-05-08 EST

Why This Document Exists

This note captures the current thinking around building a financial news intelligence and sentiment system for the DGX trading stack.

The goals are:

understand what model types are available
choose a sane phased rollout
decide where GDELT fits
understand how large industry players actually use these signals
identify how to find "gems" or anomalies instead of just generic positive or negative headlines

Executive Summary

The most important design principle is this:

Do not build one giant "financial AI model" first.

Instead, build a layered system:

fast article-level sentiment scoring
entity and event extraction
time-windowed aggregation and feature engineering
backtesting and alpha validation
optional LLM reasoning and analyst support

The common failure mode in this space is building a large chatbot or generalized finance copilot before proving that a structured text-derived signal actually improves trading decisions.

My recommended order for this project is:

start with a small, fast, finance-tuned sentiment classifier
write scores into Postgres
build ticker-level news-flow features
test whether those features improve existing models or rules
only then add event taxonomies, GDELT overlays, and LLM reasoning layers

The Model Landscape

There are four major model families worth separating conceptually.

1. Sentiment Classifiers

These are the best starting point for production article scoring.

Examples:

ProsusAI/finbert
ahmedrachid/FinancialBERT-Sentiment-Analysis
mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis

What they do well:

classify text as positive, neutral, or negative
provide stable probability outputs
run much faster and cheaper than large generative models
are easy to benchmark honestly
are easy to convert into a continuous score such as P_positive - P_negative

What they do not do well:

long-context reasoning
nuanced event interpretation
explanation and summarization
higher-order inference such as "this is positive for suppliers but negative for competitors"

2. Finance-Domain Encoders

These are not necessarily production scorers on day one, but they matter if we later fine-tune our own tasks.

Examples:

FinBERT variants
FLANG-BERT
SEC-BERT

What they are good for:

creating finance-aware text embeddings
building custom heads for classification or ranking
extending beyond headlines into filings, earnings calls, and disclosures

Where they fit:

later-stage feature engineering
custom fine-tuning
document-type-specific models

3. Finance LLMs

Examples:

BloombergGPT (proprietary; weights not publicly released)
FinGPT
other instruction-tuned finance models

These are important conceptually, but they should not be the first production scoring model for this stack.

What they are good for:

summarization
clustering and explanation
analyst note generation
contradiction detection
question answering over stored news and model outputs

What they are not ideal for initially:

high-volume low-latency scoring of every article
stable, cheap, auditable classification at scale

4. Event and News Intelligence Systems

This is how industry-grade pipelines usually look in practice.

They do not stop at "sentiment."

They add:

entity recognition
relevance scoring
novelty scoring
event classification
source credibility
temporal aggregation
cross-entity relationship logic

This is closer to how Bloomberg, RavenPack, and similar platforms create value.

Specific Model Opinions

`ProsusAI/finbert`

This remains the cleanest baseline to start with.

Why:

finance-domain BERT
widely referenced
straightforward to deploy
a strong baseline for Financial PhraseBank-style sentiment

Recommended use:

first production baseline
benchmark anchor
article scoring reference model

`ahmedrachid/FinancialBERT-Sentiment-Analysis`

A strong practical comparison model.

Why:

also finance-tuned
easy to compare against FinBERT
useful as a second baseline to test robustness

Recommended use:

benchmark challenger
model comparison table in the dashboard

`mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis`

Attractive for speed-sensitive pipelines.

Why:

lighter
faster inference
easier to scale across large article volumes

Recommended use:

throughput-oriented benchmark
candidate production scorer when volume matters more than marginal accuracy

FLANG-BERT

Interesting for later custom work.

Why:

trained with finance-aware objectives
useful if we later train custom task heads

Recommended use:

phase 3 or phase 4 experimentation
embedding backbone for clustering or custom ranking tasks

SEC-BERT

Useful, but not first for headline scoring.

Why:

better aligned with SEC filings and formal disclosures
more valuable when processing 10-K, 10-Q, 8-K, or transcript data

Recommended use:

filing analysis
earnings call follow-up work
risk/disclosure change detection

BloombergGPT / FinGPT / Finance LLMs

These matter strategically, but should come after the structured scorer pipeline.

Recommended use:

explanation layer
analyst workflows
anomaly narration
retrieval and reasoning over scored news and model outputs

My Recommended Model Strategy

If I were building this stack for DGX from scratch, I would do this:

FinBERT as the canonical baseline
FinancialBERT as the second benchmark
mrm8488 DistilRoBERTa as the speed benchmark
one of those three becomes the production scorer
later add FLANG or SEC-BERT for broader text tasks
only after that add an LLM for explanation and research workflows

The reason is simple:

Discriminative models are faster, cheaper, easier to backtest, and easier to operationalize honestly.

Why Sentiment Alone Is Not Enough

Raw sentiment is usually too naive.

What matters more in practice is:

who the article is actually about
whether the article is new or repetitive
whether the source matters
whether the market already priced it in
whether the event type is important
whether the price response matches the text signal

This means the real production object is not just an article sentiment score.

It is a news intelligence feature store.

Phased Rollout

Phase 1: Article-Level Sentiment Scoring

Objective:

score stored news articles using one or more models
save the raw model outputs into the database

Minimum schema fields:

article_id
ticker
model_id
model_version
label
confidence
prob_positive
prob_neutral
prob_negative
score
latency_ms
scored_at

Primary formula:

score = P_positive - P_negative
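
A minimal sketch of how the schema fields and the score formula could fit together. The `ArticleSentiment` class and `build_row` helper are hypothetical names, the label keys ("positive", "neutral", "negative") are an assumption about how the model output has been normalized, and the ticker and version strings are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArticleSentiment:
    """One row of the article-level sentiment table (fields from the list above)."""
    article_id: str
    ticker: str
    model_id: str
    model_version: str
    label: str
    confidence: float
    prob_positive: float
    prob_neutral: float
    prob_negative: float
    score: float
    latency_ms: float
    scored_at: datetime


def build_row(article_id, ticker, model_id, model_version, probs, latency_ms):
    """Turn class probabilities into a storable row.

    Assumption: `probs` is a dict keyed by "positive", "neutral", "negative".
    Real models may use different label names and need a mapping step first.
    """
    label = max(probs, key=probs.get)  # most likely class
    return ArticleSentiment(
        article_id=article_id,
        ticker=ticker,
        model_id=model_id,
        model_version=model_version,
        label=label,
        confidence=probs[label],
        prob_positive=probs["positive"],
        prob_neutral=probs["neutral"],
        prob_negative=probs["negative"],
        score=probs["positive"] - probs["negative"],  # P_positive - P_negative
        latency_ms=latency_ms,
        scored_at=datetime.now(timezone.utc),  # assumption: store timestamps in UTC
    )


row = build_row("a-1", "NVDA", "ProsusAI/finbert", "v1",
                {"positive": 0.70, "neutral": 0.20, "negative": 0.10}, 12.5)
print(row.label, round(row.score, 2))
```

Keeping the three raw probabilities next to the derived score means the score definition can be changed later without rescoring any articles.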

Goal:

prove the ingestion-to-scoring-to-storage path works end to end

Phase 2: News-Flow Feature Engineering

Objective:

stop thinking in single articles
build ticker-level rolling features

Recommended windows:

1 hour
4 hours
24 hours
72 hours

Recommended fields:

score_raw
score_flow
article_count
positive_ratio
negative_ratio
neutral_ratio
velocity
novelty_count
source_weighted_score
relevance_weighted_score
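
As a rough illustration of the windows and a few of the fields above, here is a standard-library sketch. The tuple layout, the `window_features` name, the `mean_score` field, and the definition of velocity as articles per hour are all assumptions for the example; this note does not fix those definitions.

```python
from datetime import datetime, timedelta

WINDOWS_HOURS = (1, 4, 24, 72)


def window_features(rows, ticker, as_of, hours):
    """Aggregate article scores for one ticker over the trailing `hours`.

    `rows` is an iterable of (ticker, scored_at, label, score) tuples, a
    simplified stand-in for the article-level table.
    """
    start = as_of - timedelta(hours=hours)
    hits = [r for r in rows if r[0] == ticker and start < r[1] <= as_of]
    n = len(hits)
    if n == 0:
        return {"window_hours": hours, "article_count": 0}
    labels = [r[2] for r in hits]
    return {
        "window_hours": hours,
        "article_count": n,
        "mean_score": sum(r[3] for r in hits) / n,
        "positive_ratio": labels.count("positive") / n,
        "negative_ratio": labels.count("negative") / n,
        "neutral_ratio": labels.count("neutral") / n,
        "velocity": n / hours,  # assumption: articles per hour
    }


# Toy rows for illustration only.
now = datetime(2026, 5, 8, 12, 0)
rows = [
    ("AAPL", now - timedelta(minutes=30), "positive", 0.8),
    ("AAPL", now - timedelta(hours=3), "negative", -0.4),
    ("AAPL", now - timedelta(hours=30), "neutral", 0.0),
    ("MSFT", now - timedelta(minutes=10), "positive", 0.5),
]
features = [window_features(rows, "AAPL", now, h) for h in WINDOWS_HOURS]
```

In production this would more likely be a SQL aggregate or a scheduled job over the sentiment table; the point here is only the shape of the output.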

Goal:

create model-ready and strategy-ready features instead of raw text outputs

Phase 3: Research and Backtesting

Objective:

determine whether the new features improve decision quality

Questions to test:

does sentiment improve entry timing
does sentiment improve exit timing
does it reduce false positives in technical systems
does it improve cross-sectional ranking
does it improve regime detection
does it improve timing for leveraged ETFs such as TQQQ and SQQQ

Key principle:

sentiment must prove incremental value over existing technical and model signals
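
One simple way to frame the incremental-value question is to compare the signals an existing system already fires with the subset that also passes a sentiment gate. The sketch below assumes each signal has a realized forward return and a sentiment score attached; the `incremental_value` name and the `min_score` threshold are illustrative, and the numbers are toy values, not results.

```python
from statistics import mean


def incremental_value(trades, min_score=0.0):
    """Compare baseline signals with the sentiment-gated subset.

    `trades` is a list of (forward_return, sentiment_score) pairs, one per
    signal fired by an existing technical system. `min_score` is an
    illustrative threshold, not a tuned value.
    """
    baseline = [ret for ret, _ in trades]
    gated = [ret for ret, score in trades if score >= min_score]
    return {
        "baseline_n": len(baseline),
        "baseline_mean": mean(baseline) if baseline else None,
        "gated_n": len(gated),
        "gated_mean": mean(gated) if gated else None,
    }


# Toy numbers for illustration only, not real results.
trades = [(0.02, 0.4), (-0.01, -0.6), (0.01, 0.1), (-0.03, -0.2)]
report = incremental_value(trades)
```

A real study would also need out-of-sample periods, transaction costs, and a check that the gated sample is large enough to mean anything.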

Phase 4: Event and Entity Intelligence

Objective:

separate sentiment from event meaning

Add:

event type classification
entity linking
topic tags
source credibility
novelty flags
multi-ticker co-mention logic

Examples of event categories:

earnings
analyst actions
product launches
legal/regulatory issues
layoffs
capital raises
guidance revisions
macro policy
supply chain disruptions
sanctions and geopolitics

Goal:

stop treating every positive article as economically equivalent
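
A small sketch of what separating sentiment from event meaning can look like in data terms: the categories above become an enum, and scores are summarized per event type rather than pooled. The `EventType` member names and the `mean_score_by_event` helper are hypothetical; how an article gets its tag (rules, a classifier, an LLM) is deliberately left out.

```python
from collections import defaultdict
from enum import Enum


class EventType(Enum):
    EARNINGS = "earnings"
    ANALYST_ACTION = "analyst_action"
    PRODUCT_LAUNCH = "product_launch"
    LEGAL_REGULATORY = "legal_regulatory"
    LAYOFFS = "layoffs"
    CAPITAL_RAISE = "capital_raise"
    GUIDANCE_REVISION = "guidance_revision"
    MACRO_POLICY = "macro_policy"
    SUPPLY_CHAIN = "supply_chain"
    SANCTIONS_GEOPOLITICS = "sanctions_geopolitics"


def mean_score_by_event(tagged):
    """Average sentiment score per event type instead of one pooled number.

    `tagged` is a list of (EventType, score) pairs.
    """
    buckets = defaultdict(list)
    for event_type, score in tagged:
        buckets[event_type].append(score)
    return {et: sum(s) / len(s) for et, s in buckets.items()}


# Toy scores for illustration only.
tagged = [
    (EventType.EARNINGS, 0.6),
    (EventType.CAPITAL_RAISE, 0.5),  # upbeat wording, possibly dilutive
    (EventType.EARNINGS, 0.2),
]
by_event = mean_score_by_event(tagged)
```

With the event type stored next to the score, downstream features can treat a positive earnings article and a positive capital-raise article differently.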

Phase 5: Global Event and Macro Overlay

Objective:

bring in world and regime context

This is where GDELT or a similar event database becomes useful.

Goal:

connect single-name news to macro, geopolitical, and sector context

Phase 6: LLM Analyst Layer

Objective:

provide explanation and research ergonomics

Potential tasks:

summarize last 24 hours of sentiment flow
explain why a ticker's sentiment diverged from price
identify which event clusters drove a score change
generate analyst notes
answer questions over scored articles, aggregates, and model outputs

Goal:

help humans reason about the signal
do not confuse explanation with production alpha generation

Where GDELT Fits

GDELT is useful, but only for the right problem.

What GDELT Is Good At

macro and geopolitical awareness
regime detection
sanctions, conflict, protest, and diplomatic developments
multi-language global event tracking
broad narrative shifts across geographies and sectors

What GDELT Is Less Good At

high-precision single-name equity alpha by itself
clean company-specific finance event detection
reliable ticker-level corporate action interpretation without extra filtering

Best Use of GDELT in This Stack

Use GDELT as a macro overlay, not as the only news source for stock-level sentiment.

Recommended uses:

detect elevated geopolitical risk
detect sector-level narrative stress
detect macro regime changes
identify geographic hotspots tied to holdings or suppliers
build global stress indicators

Why Not Use GDELT Alone

Single-stock alpha often needs:

cleaner company-level articles
better entity precision
less noise
stronger alignment to corporate financial events

So the practical architecture is:

finance-native corporate news for ticker scoring
GDELT for macro and event overlays
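
To make the split concrete, here is one possible shape for the overlay: the ticker-level signal comes from corporate news, and a macro stress reading only scales it. Everything here is an assumption for illustration: `macro_stress_z` stands in for a z-score of some GDELT-derived daily stress series built elsewhere, and the threshold and damping factor are placeholders, not recommendations.

```python
def apply_macro_overlay(ticker_signal, macro_stress_z,
                        stress_threshold=2.0, damp=0.5):
    """Scale a ticker-level signal down when macro stress is elevated.

    The overlay never creates a signal on its own; it only modifies one
    that came from the finance-native news pipeline.
    """
    if macro_stress_z >= stress_threshold:
        return ticker_signal * damp
    return ticker_signal


calm = apply_macro_overlay(0.8, macro_stress_z=0.3)      # unchanged
stressed = apply_macro_overlay(0.8, macro_stress_z=2.5)  # damped
```

Whether damping, gating, or feeding the stress value in as a model feature works best is exactly the kind of question the backtesting phase should answer.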

How Industry Actually Uses This

Large firms usually do not trade "raw sentiment" alone.

Instead, they build machine-readable news pipelines with multiple layers of enrichment.

Typical production features include:

sentiment
relevance
novelty
event category
temporal decay
source ranking
entity mapping
confidence
cross-asset or cross-sector context

This is consistent with what commercial platforms such as Bloomberg and RavenPack describe in their public materials.

The pattern is:

structured news analytics
integration into existing factor, risk, or execution systems
alpha plus monitoring plus risk control

What "The Big Guys" Usually Do

They tend to:

rank names cross-sectionally
gate trades using news conditions
downweight signals around adverse event shocks
identify risk clusters
feed text features into larger multi-factor or ML systems
use analysts and PMs to interpret the structured outputs

They usually do not:

trade solely on "headline sounds positive"
rely on one monolithic model
ignore event taxonomy and price reaction

How To Find Gems and Anomalies

The edge is often in mismatches, not in raw positivity.

High-Value Mismatch Patterns

positive sentiment with weak price response
negative sentiment that the market refuses to sell off
strong novelty with low article count but high relevance
small-cap or undercovered names with unusually strong sentiment acceleration
sector spillover where the obvious name gets attention but a related name is underreacting
disagreement between company PR and independent press coverage
repeated negative headlines with improving price behavior
repeated positive headlines with deteriorating price behavior

Useful Derived Features

sentiment z-score
novelty z-score
relevance-weighted sentiment
source-dispersion score
article velocity change
sector-relative sentiment
price-response gap
volume-response gap
sentiment acceleration
cross-entity co-mention shock
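
Two of these are easy to sketch with the standard library. The z-score below is the usual standardization against a trailing history. The price-response gap is defined here as sentiment z-score minus return z-score, which is my assumption for the example; this note does not fix a formula. The histories are toy numbers.

```python
from statistics import mean, pstdev


def zscore(history, latest):
    """Standardize `latest` against a trailing history of the same feature."""
    sd = pstdev(history)
    if sd == 0:
        return 0.0  # flat history: no meaningful surprise
    return (latest - mean(history)) / sd


def price_response_gap(sentiment_z, return_z):
    """Positive when the text signal is stronger than the price move.

    Assumption: a plain difference of z-scores; other definitions are possible.
    """
    return sentiment_z - return_z


# Toy histories for illustration only.
sent_hist = [0.0, 0.1, -0.1, 0.0]
ret_hist = [0.00, 0.01, -0.01, 0.00]
s_z = zscore(sent_hist, 0.3)   # unusually positive text
r_z = zscore(ret_hist, 0.0)    # price did nothing
gap = price_response_gap(s_z, r_z)
```

A large positive gap corresponds to the "positive sentiment with weak price response" pattern, and a large negative gap to the reverse.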

Why Co-Mention Networks Matter

The first-order story is not always the best trade.

Example:

an AI infrastructure headline may benefit a supplier more than the headline company
a negative regulatory headline on one firm may imply market-share gains for a competitor
a supply-chain disruption story may hit a downstream firm harder than the name in the headline

This is how "gems" often appear:

the market focuses on the headline ticker, while the second-order beneficiary or loser is mispriced.
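
A minimal sketch of the bookkeeping behind a co-mention network, assuming an upstream entity-linking step has already produced the list of tickers for each article. The function names and the example tickers are illustrative only.

```python
from collections import Counter
from itertools import combinations


def co_mention_counts(article_tickers):
    """Count how often each unordered ticker pair appears in the same article."""
    pairs = Counter()
    for tickers in article_tickers:
        # sorted(set(...)) removes duplicates and gives each pair a stable key
        for a, b in combinations(sorted(set(tickers)), 2):
            pairs[(a, b)] += 1
    return pairs


def neighbors(pairs, ticker):
    """Tickers co-mentioned with `ticker`, most frequent first."""
    linked = Counter()
    for (a, b), n in pairs.items():
        if a == ticker:
            linked[b] += n
        elif b == ticker:
            linked[a] += n
    return linked.most_common()


# Toy articles for illustration only.
articles = [["NVDA", "TSM"], ["NVDA", "TSM", "AMD"], ["AMD", "INTC"]]
pairs = co_mention_counts(articles)
print(neighbors(pairs, "NVDA"))
```

Once pair counts exist, a sentiment shock on the headline ticker can be checked against how its most frequent neighbors have moved.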

Common Mistakes To Avoid

Mistake 1: Treating Sentiment as Universal

Some news is not universally positive or negative.

Example:

rising rates are not uniformly bearish or bullish
the implication depends on asset class, sector, positioning, and what was already priced in

Mistake 2: Ignoring Event Type

"Positive language" does not always mean positive market impact.

Examples:

"strong capital raise" may still be dilutive
"aggressive restructuring" may sound disciplined but imply distress

Mistake 3: Mixing Horizons

Some stories matter intraday. Some matter for days. Some matter for months.

The system should eventually support multiple horizons.

Mistake 4: Ignoring Price Response

Text alone is incomplete.

The best anomalies often come from:

text signal strong, price weak
text weak, price strong
text improving, price deteriorating

Mistake 5: Not Normalizing for Coverage Volume

Large-cap names almost always get more articles, so raw counts and summed scores are biased toward them.

Use normalized features such as:

rolling z-scores
relative counts
surprise versus baseline coverage
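
A quick sketch of a relative count: today's article count divided by the ticker's own baseline. The `coverage_ratio` name and the add-one smoothing (so thinly covered names do not divide by zero) are assumptions for the example, and the counts are toy values.

```python
from statistics import mean


def coverage_ratio(baseline_counts, today_count):
    """Today's coverage relative to the ticker's own trailing average.

    Assumption: add-one smoothing on both sides to keep the ratio finite
    for names that normally get no coverage at all.
    """
    return (today_count + 1) / (mean(baseline_counts) + 1)


# Illustrative daily counts: a heavily covered name and a thinly covered one.
large_cap = coverage_ratio([100, 110, 90, 100], 120)
small_cap = coverage_ratio([1, 2, 0, 1], 6)
```

In raw terms the large-cap name has twenty times the articles, but relative to its own baseline the small-cap name is the bigger coverage surprise.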

Recommended Architecture for This Repo

Near-Term Production Path

score existing news_articles using one baseline model
write outputs into a new sentiment table
aggregate to ticker-level windows
expose /newsflow in the dashboard
test feature value against existing TQQQ and other models

Recommended Initial Benchmark Set

FinBERT
FinancialBERT
DistilRoBERTa financial sentiment model

Recommended Database Objects

At minimum:

article-level sentiment table
ticker-level rolling aggregate table
benchmark results table
model registry metadata

Recommended Dashboard Panels

latest scored articles
ticker heat map
rolling sentiment by ticker
model comparison table
anomaly watchlist
sentiment vs price divergence panel

Research Questions Worth Testing

Does negative sentiment reduce false positives in long-only technical signals?
Does positive sentiment improve breakout confirmation?
Are sentiment-flow changes more useful than raw level?
Does novelty outperform raw article count?
Does source weighting improve predictive value?
Does sector-relative sentiment outperform absolute sentiment?
Do co-mention network shocks lead to second-order trades?
Does GDELT improve regime filters for leveraged ETFs and macro-sensitive names?

Practical Recommendation

If we want the shortest path to value:

start with one article-level scorer
build aggregated news-flow features
test whether they improve existing systems
only then widen to event intelligence, GDELT, and LLM reasoning

This is the safest path because it is measurable, operationally simple, and aligned with how serious shops treat text as one signal layer among many.

Recommended Next Steps for This Repository

create the article-level sentiment schema
score a historical sample of stored articles
benchmark the three initial models
build 1h, 4h, 24h, and 72h aggregates
surface results in the dashboard
evaluate signal contribution against TQQQ and LDTM workflows
add event tagging and novelty scoring
add GDELT as a macro overlay after the core pipeline is proven

References

FinBERT Hugging Face model card: https://huggingface.co/ProsusAI/finbert
FinBERT paper: https://arxiv.org/abs/1908.10063
FinBERT in financial communications: https://arxiv.org/abs/2006.08097
FinancialBERT Hugging Face model card: https://huggingface.co/ahmedrachid/FinancialBERT-Sentiment-Analysis
DistilRoBERTa finance sentiment model: https://huggingface.co/mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis
FLANG-BERT and FLUE benchmark: https://arxiv.org/abs/2211.00083
SEC-BERT Hugging Face model card: https://huggingface.co/nlpaueb/sec-bert-base
BloombergGPT paper: https://arxiv.org/abs/2303.17564
FinGPT paper: https://arxiv.org/abs/2306.06031
PIXIU / FinMA paper entry: https://huggingface.co/papers/2306.05443
Financial LLM survey: https://arxiv.org/abs/2402.02315
GDELT data overview: https://www.gdeltproject.org/data.html
Event Registry overview: https://eventregistry.org/about
Bloomberg public news analytics examples: https://www.bloomberg.com/professional/insights/data/can-get-edge-trading-news-sentiment-data/ and https://www.bloomberg.com/professional/insights/data/trading-news-use-machine-readable-data-find-alpha/
RavenPack news analytics product page: https://www.ravenpack.com/products/edge/data/news-analytics