Financial Sentiment Models, Phases, GDELT, and Industry Usage
Date: 2026-05-08 EST
Why This Document Exists
This note captures the current thinking around building a financial news intelligence and sentiment system for the DGX trading stack.
The goals are:
- understand what model types are available
- choose a sane phased rollout
- decide where GDELT fits
- understand how large industry players actually use these signals
- identify how to find "gems" or anomalies instead of just generic positive or negative headlines
Executive Summary
The most important design principle is this:
Do not build one giant "financial AI model" first.
Instead, build a layered system:
- fast article-level sentiment scoring
- entity and event extraction
- time-windowed aggregation and feature engineering
- backtesting and alpha validation
- optional LLM reasoning and analyst support
The common failure mode in this space is building a large chatbot or generalized finance copilot before proving that a structured text-derived signal actually improves trading decisions.
My recommended order for this project is:
- start with a small, fast, finance-tuned sentiment classifier
- write scores into Postgres
- build ticker-level news-flow features
- test whether those features improve existing models or rules
- only then add event taxonomies, GDELT overlays, and LLM reasoning layers
The Model Landscape
There are four major model families worth separating conceptually.
1. Sentiment Classifiers
These are the best starting point for production article scoring.
Examples:
- ProsusAI/finbert
- ahmedrachid/FinancialBERT-Sentiment-Analysis
- mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis
What they do well:
- classify text as positive, neutral, or negative
- provide stable probability outputs
- run much faster and cheaper than large generative models
- are easy to benchmark honestly
- are easy to convert into a continuous score such as P_positive - P_negative
What they do not do well:
- long-context reasoning
- nuanced event interpretation
- explanation and summarization
- higher-order inference such as "this is positive for suppliers but negative for competitors"
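The continuous score mentioned above is simple enough to sketch directly. This is a minimal illustration, not any model's actual API: the logits are made up, and the (positive, neutral, negative) ordering is an assumption that must be checked against each model's own label mapping.

```python
import math

def softmax(logits):
    """Convert raw classifier logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sentiment_score(prob_positive, prob_negative):
    """Continuous score in [-1, 1]: P_positive - P_negative."""
    return prob_positive - prob_negative

# Hypothetical logits, ASSUMED to be ordered (positive, neutral, negative).
# Real models differ; read the order from the model's own label mapping.
probs = softmax([2.0, 0.5, -1.0])
score = sentiment_score(probs[0], probs[2])
```

The neutral probability drops out of the score on purpose: a confidently neutral article lands near zero instead of being forced to one side.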
2. Finance-Domain Encoders
These are not necessarily production scorers on day one, but they matter if we later fine-tune our own tasks.
Examples:
- FinBERT variants
- FLANG-BERT
- SEC-BERT
What they are good for:
- creating finance-aware text embeddings
- building custom heads for classification or ranking
- extending beyond headlines into filings, earnings calls, and disclosures
Where they fit:
- later-stage feature engineering
- custom fine-tuning
- document-type-specific models
3. Finance LLMs
Examples:
- BloombergGPT
- FinGPT
- other instruction-tuned finance models
These are important conceptually, but they should not be the first production scoring model for this stack.
What they are good for:
- summarization
- clustering and explanation
- analyst note generation
- contradiction detection
- question answering over stored news and model outputs
What they are not ideal for initially:
- high-volume low-latency scoring of every article
- stable, cheap, auditable classification at scale
4. Event and News Intelligence Systems
This is how industry-grade pipelines usually look in practice.
They do not stop at "sentiment."
They add:
- entity recognition
- relevance scoring
- novelty scoring
- event classification
- source credibility
- temporal aggregation
- cross-entity relationship logic
This is closer to how Bloomberg, RavenPack, and similar platforms create value.
Specific Model Opinions
ProsusAI/finbert
This remains the cleanest baseline to start with.
Why:
- finance-domain BERT
- widely referenced
- straightforward to deploy
- a strong baseline for Financial PhraseBank-style sentiment
Recommended use:
- first production baseline
- benchmark anchor
- article scoring reference model
ahmedrachid/FinancialBERT-Sentiment-Analysis
A strong practical comparison model.
Why:
- also finance-tuned
- easy to compare against FinBERT
- useful as a second baseline to test robustness
Recommended use:
- benchmark challenger
- model comparison table in the dashboard
mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis
Attractive for speed-sensitive pipelines.
Why:
- lighter
- faster inference
- easier to scale across large article volumes
Recommended use:
- throughput-oriented benchmark
- candidate production scorer when volume matters more than marginal accuracy
FLANG-BERT
Interesting for later custom work.
Why:
- trained with finance-aware objectives
- useful if we later train custom task heads
Recommended use:
- phase 3 or phase 4 experimentation
- embedding backbone for clustering or custom ranking tasks
SEC-BERT
Useful, but not first for headline scoring.
Why:
- better aligned with SEC filings and formal disclosures
- more valuable when processing 10-K, 10-Q, 8-K, or transcript data
Recommended use:
- filing analysis
- earnings call follow-up work
- risk/disclosure change detection
BloombergGPT / FinGPT / Finance LLMs
These matter strategically, but should come after the structured scorer pipeline.
Recommended use:
- explanation layer
- analyst workflows
- anomaly narration
- retrieval and reasoning over scored news and model outputs
My Recommended Model Strategy
If I were building this stack for DGX from scratch, I would do this:
- FinBERT as the canonical baseline
- FinancialBERT as the second benchmark
- mrm8488 DistilRoBERTa as the speed benchmark
- one of those three becomes the production scorer
- later add FLANG or SEC-BERT for broader text tasks
- only after that add an LLM for explanation and research workflows
The reason is simple:
discriminative models are faster, cheaper, easier to backtest, and easier to operationalize honestly.
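Picking one of the three candidates implies a like-for-like benchmark on accuracy and latency. A rough sketch of such a harness follows. The `benchmark` helper and the keyword scorer are hypothetical stand-ins, and the four labeled headlines are invented for illustration; in practice each real model would be wrapped in a callable of the same shape and run over a proper labeled set.

```python
import time

def benchmark(scorer, samples):
    """Return (accuracy, mean latency in ms) for one scorer on labeled samples."""
    correct = 0
    started = time.perf_counter()
    for text, expected in samples:
        if scorer(text) == expected:
            correct += 1
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    return correct / len(samples), elapsed_ms / len(samples)

# Stand-in scorer: a keyword rule, NOT a real model. It exists only so the
# harness can run without downloading anything.
def keyword_scorer(text):
    lowered = text.lower()
    if "beats" in lowered or "raises" in lowered:
        return "positive"
    if "misses" in lowered or "cuts" in lowered:
        return "negative"
    return "neutral"

# Invented headlines with hand-assigned labels, for illustration only.
samples = [
    ("Company beats earnings estimates", "positive"),
    ("Company cuts full-year guidance", "negative"),
    ("Company schedules annual meeting", "neutral"),
    ("Company misses revenue forecast", "negative"),
]

accuracy, mean_latency_ms = benchmark(keyword_scorer, samples)
```

Running every candidate through the same function on the same samples is what keeps the comparison honest.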
Why Sentiment Alone Is Not Enough
Raw sentiment is usually too naive.
What matters more in practice is:
- who the article is actually about
- whether the article is new or repetitive
- whether the source matters
- whether the market already priced it in
- whether the event type is important
- whether the price response matches the text signal
This means the real production object is not just an article sentiment score.
It is a news intelligence feature store.
Phased Rollout
Phase 1: Article-Level Sentiment Scoring
Objective:
- score stored news articles using one or more models
- save the raw model outputs into the database
Minimum schema fields:
- article_id
- ticker
- model_id
- model_version
- label
- confidence
- prob_positive
- prob_neutral
- prob_negative
- score
- latency_ms
- scored_at
Primary formula:
score = P_positive - P_negative
Goal:
- prove the ingestion-to-scoring-to-storage path works end-to-end
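One way to picture the Phase 1 row is as a typed record built from a model's class probabilities. The sketch below uses the field names from the schema list. The Python types, the `build_row` helper, the sample values, and the choice to store the winning class probability as `confidence` are all assumptions, not a settled design.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ArticleSentiment:
    # Field names follow the schema list; the types are assumed.
    article_id: int
    ticker: str
    model_id: str
    model_version: str
    label: str
    confidence: float
    prob_positive: float
    prob_neutral: float
    prob_negative: float
    score: float
    latency_ms: float
    scored_at: datetime

def build_row(article_id, ticker, model_id, model_version, probs, latency_ms):
    """Build one row from a dict of class probabilities (hypothetical helper)."""
    label = max(probs, key=probs.get)  # most probable class
    return ArticleSentiment(
        article_id=article_id,
        ticker=ticker,
        model_id=model_id,
        model_version=model_version,
        label=label,
        confidence=probs[label],  # assumption: confidence = winning probability
        prob_positive=probs["positive"],
        prob_neutral=probs["neutral"],
        prob_negative=probs["negative"],
        score=probs["positive"] - probs["negative"],
        latency_ms=latency_ms,
        scored_at=datetime.now(timezone.utc),
    )

# Illustrative values only.
row = build_row(
    article_id=1, ticker="TQQQ", model_id="ProsusAI/finbert", model_version="v1",
    probs={"positive": 0.70, "neutral": 0.20, "negative": 0.10}, latency_ms=12.5,
)
```

Keeping all three probabilities alongside the derived score means the score formula can be changed later without rescoring any articles.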
Phase 2: News-Flow Feature Engineering
Objective:
- stop thinking in single articles
- build ticker-level rolling features
Recommended windows:
- 1 hour
- 4 hours
- 24 hours
- 72 hours
Recommended fields:
- score_raw
- score_flow
- article_count
- positive_ratio
- negative_ratio
- neutral_ratio
- velocity
- novelty_count
- source_weighted_score
- relevance_weighted_score
Goal:
- create model-ready and strategy-ready features instead of raw text outputs
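A small sketch of the rolling aggregation for a single ticker is below. It covers only the fields whose meaning is unambiguous here; `score_flow`, novelty, and the weighted scores are left out because their definitions are not fixed yet. Treating `score_raw` as the window mean and `velocity` as articles per hour are my assumptions, as are the sample rows.

```python
from datetime import datetime, timedelta

def window_features(rows, as_of, hours):
    """Aggregate (scored_at, score, label) tuples for one ticker over a trailing window."""
    start = as_of - timedelta(hours=hours)
    in_window = [r for r in rows if start < r[0] <= as_of]
    count = len(in_window)
    if count == 0:
        return {"article_count": 0, "score_raw": None, "positive_ratio": None,
                "negative_ratio": None, "neutral_ratio": None, "velocity": 0.0}
    labels = [label for _, _, label in in_window]
    return {
        "article_count": count,
        "score_raw": sum(score for _, score, _ in in_window) / count,  # assumed: mean
        "positive_ratio": labels.count("positive") / count,
        "negative_ratio": labels.count("negative") / count,
        "neutral_ratio": labels.count("neutral") / count,
        "velocity": count / hours,  # assumed definition: articles per hour
    }

# Illustrative rows; timestamps are assumed to share one consistent timezone.
as_of = datetime(2026, 5, 8, 16, 0)
rows = [
    (datetime(2026, 5, 8, 15, 30), 0.6, "positive"),
    (datetime(2026, 5, 8, 13, 0), -0.4, "negative"),
    (datetime(2026, 5, 7, 20, 0), 0.0, "neutral"),
    (datetime(2026, 5, 4, 9, 0), 0.8, "positive"),  # older than 72h, never counted
]
features = {h: window_features(rows, as_of, h) for h in (1, 4, 24, 72)}
```

Empty windows return None for the ratios rather than zero, so a quiet ticker is not mistaken for a neutral one.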
Phase 3: Research and Backtesting
Objective:
- determine whether the new features improve decision quality
Questions to test:
- does sentiment improve entry timing
- does sentiment improve exit timing
- does it reduce false positives in technical systems
- does it improve cross-sectional ranking
- does it improve regime detection
- does it improve timing for leveraged ETFs such as TQQQ and SQQQ
Key principle:
- sentiment must prove incremental value over existing technical and model signals
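The incremental-value principle can be made concrete with a before-and-after comparison: take the signals an existing system already produced, apply a sentiment gate, and see whether the surviving subset is better. The sketch below does this with hit rate only. The function names, the gate threshold, and every number in the sample are invented for illustration and are not results.

```python
def hit_rate(trades):
    """Fraction of trades with a positive forward return; None if no trades."""
    if not trades:
        return None
    return sum(1 for t in trades if t["fwd_return"] > 0) / len(trades)

def compare_with_gate(signals, min_score=0.0):
    """Compare the baseline signal set against the subset passing a sentiment gate."""
    gated = [s for s in signals if s["sentiment"] >= min_score]
    return {
        "baseline_trades": len(signals),
        "baseline_hit_rate": hit_rate(signals),
        "gated_trades": len(gated),
        "gated_hit_rate": hit_rate(gated),
    }

# Made-up records: each is one long signal from an existing technical system,
# with the sentiment score at entry and the realized forward return.
signals = [
    {"sentiment": 0.5, "fwd_return": 0.02},
    {"sentiment": 0.3, "fwd_return": 0.01},
    {"sentiment": -0.6, "fwd_return": -0.03},
    {"sentiment": -0.2, "fwd_return": 0.01},
    {"sentiment": 0.1, "fwd_return": -0.01},
]
report = compare_with_gate(signals)
```

A real test would also need out-of-sample splits, transaction costs, and a check that the gate does not simply cut the trade count to a handful, which is why the report carries trade counts next to hit rates.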
Phase 4: Event and Entity Intelligence
Objective:
- separate sentiment from event meaning
Add:
- event type classification
- entity linking
- topic tags
- source credibility
- novelty flags
- multi-ticker co-mention logic
Examples of event categories:
- earnings
- analyst actions
- product launches
- legal/regulatory issues
- layoffs
- capital raises
- guidance revisions
- macro policy
- supply chain disruptions
- sanctions and geopolitics
Goal:
- stop treating every positive article as economically equivalent
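To show the shape of the event layer, here is a deliberately crude sketch: the categories above as an enum, plus a keyword tagger. The keyword lists and the `tag_event` helper are placeholders I made up, and the headline is invented. A real version would be a trained classifier, but the output contract (an event type stored next to the sentiment score) would look similar.

```python
from enum import Enum

class EventType(Enum):
    EARNINGS = "earnings"
    ANALYST_ACTION = "analyst actions"
    PRODUCT_LAUNCH = "product launches"
    LEGAL_REGULATORY = "legal/regulatory issues"
    LAYOFFS = "layoffs"
    CAPITAL_RAISE = "capital raises"
    GUIDANCE_REVISION = "guidance revisions"
    MACRO_POLICY = "macro policy"
    SUPPLY_CHAIN = "supply chain disruptions"
    SANCTIONS_GEOPOLITICS = "sanctions and geopolitics"

# Placeholder keywords for a few categories only; purely illustrative.
KEYWORDS = {
    EventType.EARNINGS: ("earnings", "quarterly results"),
    EventType.CAPITAL_RAISE: ("offering", "capital raise"),
    EventType.LAYOFFS: ("layoff", "job cuts"),
    EventType.GUIDANCE_REVISION: ("guidance",),
}

def tag_event(headline):
    """Return the first matching EventType, or None if nothing matches."""
    lowered = headline.lower()
    for event_type, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return event_type
    return None

# Invented headline: upbeat wording, but the event is a potentially dilutive raise.
event = tag_event("Acme prices secondary stock offering amid strong demand")
```

With the tag stored alongside the score, a positive article about a capital raise can be handled differently from a positive article about earnings.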
Phase 5: Global Event and Macro Overlay
Objective:
- bring in world and regime context
This is where GDELT or a similar event database becomes useful.
Goal:
- connect single-name news to macro, geopolitical, and sector context
Phase 6: LLM Analyst Layer
Objective:
- provide explanation and research ergonomics
Potential tasks:
- summarize last 24 hours of sentiment flow
- explain why a ticker's sentiment diverged from price
- identify which event clusters drove a score change
- generate analyst notes
- answer questions over scored articles, aggregates, and model outputs
Goal:
- help humans reason about the signal
- do not confuse explanation with production alpha generation
Where GDELT Fits
GDELT is useful, but only for the right problem.
What GDELT Is Good At
- macro and geopolitical awareness
- regime detection
- sanctions, conflict, protest, and diplomatic developments
- multi-language global event tracking
- broad narrative shifts across geographies and sectors
What GDELT Is Less Good At
- high-precision single-name equity alpha by itself
- clean company-specific finance event detection
- reliable ticker-level corporate action interpretation without extra filtering
Best Use of GDELT in This Stack
Use GDELT as a macro overlay, not as the only news source for stock-level sentiment.
Recommended uses:
- detect elevated geopolitical risk
- detect sector-level narrative stress
- detect macro regime changes
- identify geographic hotspots tied to holdings or suppliers
- build global stress indicators
Why Not Use GDELT Alone
Single-stock alpha often needs:
- cleaner company-level articles
- better entity precision
- less noise
- stronger alignment to corporate financial events
So the practical architecture is:
- finance-native corporate news for ticker scoring
- GDELT for macro and event overlays
How Industry Actually Uses This
Large firms usually do not trade "raw sentiment" alone.
Instead, they build machine-readable news pipelines with multiple layers of enrichment.
Typical production features include:
- sentiment
- relevance
- novelty
- event category
- temporal decay
- source ranking
- entity mapping
- confidence
- cross-asset or cross-sector context
This is consistent with what vendors such as Bloomberg and RavenPack describe in their public materials.
The pattern is:
- structured news analytics
- integration into existing factor, risk, or execution systems
- alpha plus monitoring plus risk control
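Of the features listed above, temporal decay is the easiest to pin down numerically. A minimal sketch, assuming exponential decay with a configurable half-life, follows. The half-life value and the two sample articles are arbitrary, and vendors do not publish their actual decay schemes, so this is one plausible form rather than a description of any product.

```python
def decayed_sentiment(items, half_life_hours):
    """Weighted mean of scores, each weight halving every half_life_hours of age."""
    weighted_sum = 0.0
    weight_total = 0.0
    for age_hours, score in items:
        weight = 0.5 ** (age_hours / half_life_hours)
        weighted_sum += weight * score
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0

# (age in hours, article score): a fresh positive article and a 6h-old negative one.
items = [(0.0, 0.8), (6.0, -0.4)]
value = decayed_sentiment(items, half_life_hours=6.0)
```

With a six-hour half-life the older article carries half the weight of the fresh one, so the blend is (0.8 - 0.2) / 1.5 = 0.4 rather than the unweighted mean of 0.2.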
What "The Big Guys" Usually Do
They tend to:
- rank names cross-sectionally
- gate trades using news conditions
- downweight signals around adverse event shocks
- identify risk clusters
- feed text features into larger multi-factor or ML systems
- use analysts and PMs to interpret the structured outputs
They usually do not:
- trade solely on "headline sounds positive"
- rely on one monolithic model
- ignore event taxonomy and price reaction
How To Find Gems and Anomalies
The edge is often in mismatches, not in raw positivity.
High-Value Mismatch Patterns
- positive sentiment with weak price response
- negative sentiment that the market refuses to sell off
- strong novelty with low article count but high relevance
- small-cap or undercovered names with unusually strong sentiment acceleration
- sector spillover where the obvious name gets attention but a related name is underreacting
- disagreement between company PR and independent press coverage
- repeated negative headlines with improving price behavior
- repeated positive headlines with deteriorating price behavior
Useful Derived Features
- sentiment z-score
- novelty z-score
- relevance-weighted sentiment
- source-dispersion score
- article velocity change
- sector-relative sentiment
- price-response gap
- volume-response gap
- sentiment acceleration
- cross-entity co-mention shock
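Two of these features are sketched below: a z-score against a trailing history, and a price-response gap. Defining the gap as sentiment z-score minus return z-score is my assumption, chosen because it makes "text strong, price flat" show up as a large positive number. The histories are invented.

```python
from statistics import mean, pstdev

def z_score(value, history):
    """Standardize value against a trailing history; 0.0 if the history is flat."""
    spread = pstdev(history)
    if spread == 0:
        return 0.0
    return (value - mean(history)) / spread

def price_response_gap(sentiment_z, return_z):
    """ASSUMED definition: how far the text signal exceeds the price move, in z units."""
    return sentiment_z - return_z

# Invented trailing windows for one ticker.
sentiment_history = [0.0, 0.1, -0.1, 0.0, 0.1, -0.1]
return_history = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]

s_z = z_score(0.5, sentiment_history)  # today's sentiment is far above its norm
r_z = z_score(0.0, return_history)     # today's return is unremarkable
gap = price_response_gap(s_z, r_z)
```

A large positive gap flags the "positive sentiment with weak price response" pattern, and a large negative gap flags the reverse.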
Why Co-Mention Networks Matter
The first-order story is not always the best trade.
Examples:
- an AI infrastructure headline may benefit a supplier more than the headline company
- a negative regulatory headline on one firm may imply positive share gain for a competitor
- a supply-chain disruption story may hit a downstream firm harder than the name in the headline
This is how "gems" often appear:
the market focuses on the headline ticker, while the second-order beneficiary or loser is mispriced.
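A co-mention network can start as nothing more than pair counts over the tickers tagged on each article. The sketch below assumes entity linking has already produced a ticker list per article. The function names and the tickers are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_mention_counts(articles):
    """Count how often each unordered ticker pair appears in the same article."""
    pairs = Counter()
    for tickers in articles:
        # sorted(set(...)) removes duplicates and fixes the pair orientation
        for a, b in combinations(sorted(set(tickers)), 2):
            pairs[(a, b)] += 1
    return pairs

def related_names(pairs, ticker):
    """Tickers co-mentioned with `ticker`, most frequent first."""
    related = Counter()
    for (a, b), n in pairs.items():
        if a == ticker:
            related[b] += n
        elif b == ticker:
            related[a] += n
    return related.most_common()

# Hypothetical tickers; each inner list is the set of names tagged on one article.
articles = [["AAA", "BBB"], ["AAA", "BBB", "CCC"], ["AAA", "CCC"], ["BBB"], ["AAA", "BBB"]]
pairs = co_mention_counts(articles)
neighbors = related_names(pairs, "AAA")
```

When a headline hits AAA, `neighbors` gives the shortlist of names to check for an under-reaction. Whether a neighbor is a supplier, customer, or competitor still has to come from somewhere else, since co-occurrence alone does not give the sign of the effect.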
Common Mistakes To Avoid
Mistake 1: Treating Sentiment as Universal
Some news is not globally positive or negative.
Example:
- rising rates are not uniformly bearish or bullish
- the implication depends on asset class, sector, positioning, and what was already priced in
Mistake 2: Ignoring Event Type
"Positive language" does not always mean positive market impact.
Examples:
- "strong capital raise" may still be dilutive
- "aggressive restructuring" may sound disciplined but imply distress
Mistake 3: Mixing Horizons
Some stories matter intraday. Some matter for days. Some matter for months.
The system should eventually support multiple horizons.
Mistake 4: Ignoring Price Response
Text alone is incomplete.
The best anomalies often come from:
- text signal strong, price weak
- text weak, price strong
- text improving, price deteriorating
Mistake 5: Not Normalizing for Coverage Volume
Large-cap names typically get far more articles than small caps, so raw article counts are not comparable across names.
Use normalized features such as:
- rolling z-scores
- relative counts
- surprise versus baseline coverage
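The relative-count idea fits in a few lines: compare today's article count with the name's own trailing average instead of with other names. The helper name and all counts below are invented for illustration.

```python
from statistics import mean

def relative_coverage(todays_count, baseline_counts):
    """Today's article count as a multiple of the name's own trailing average."""
    baseline = mean(baseline_counts)
    if baseline == 0:
        return None  # no coverage history to compare against
    return todays_count / baseline

# Invented daily counts: a heavily covered large cap and a thinly covered small cap.
large_cap = relative_coverage(120, [100, 110, 90, 100])
small_cap = relative_coverage(6, [1, 2, 1, 2])
```

Here the large cap has twenty times the articles but only 1.2x its usual coverage, while the small cap is at 4x its norm, which makes the small cap the more unusual day.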
Recommended Architecture for This Repo
Near-Term Production Path
- score existing news_articles using one baseline model
- write outputs into a new sentiment table
- aggregate to ticker-level windows
- expose /newsflow in the dashboard
- test feature value against existing TQQQ and other models
Recommended Initial Benchmark Set
- FinBERT
- FinancialBERT
- DistilRoBERTa financial sentiment model
Recommended Database Objects
At minimum:
- article-level sentiment table
- ticker-level rolling aggregate table
- benchmark results table
- model registry metadata
Recommended Dashboard Panels
- latest scored articles
- ticker heat map
- rolling sentiment by ticker
- model comparison table
- anomaly watchlist
- sentiment vs price divergence panel
Research Questions Worth Testing
- Does negative sentiment reduce false positives in long-only technical signals
- Does positive sentiment improve breakout confirmation
- Are sentiment-flow changes more useful than raw level
- Does novelty outperform raw article count
- Does source weighting improve predictive value
- Does sector-relative sentiment outperform absolute sentiment
- Do co-mention network shocks lead to second-order trades
- Does GDELT improve regime filters for leveraged ETFs and macro-sensitive names
Practical Recommendation
If we want the shortest path to value:
- start with one article-level scorer
- build aggregated news-flow features
- test whether they improve existing systems
- only then widen to event intelligence, GDELT, and LLM reasoning
This is the safest path because it is measurable, operationally simple, and aligned with how serious shops treat text as one signal layer among many.
Recommended Next Steps for This Repository
- create the article-level sentiment schema
- score a historical sample of stored articles
- benchmark the three initial models
- build 1h, 4h, 24h, and 72h aggregates
- surface results in the dashboard
- evaluate signal contribution against TQQQ and LDTM workflows
- add event tagging and novelty scoring
- add GDELT as a macro overlay after the core pipeline is proven
References
- FinBERT Hugging Face model card: https://huggingface.co/ProsusAI/finbert
- FinBERT paper: https://arxiv.org/abs/1908.10063
- FinBERT in financial communications: https://arxiv.org/abs/2006.08097
- FinancialBERT Hugging Face model card: https://huggingface.co/ahmedrachid/FinancialBERT-Sentiment-Analysis
- DistilRoBERTa finance sentiment model: https://huggingface.co/mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis
- FLANG-BERT and FLUE benchmark: https://arxiv.org/abs/2211.00083
- SEC-BERT Hugging Face model card: https://huggingface.co/nlpaueb/sec-bert-base
- BloombergGPT paper: https://arxiv.org/abs/2303.17564
- FinGPT paper: https://arxiv.org/abs/2306.06031
- PIXIU / FinMA paper entry: https://huggingface.co/papers/2306.05443
- Financial LLM survey: https://arxiv.org/abs/2402.02315
- GDELT data overview: https://www.gdeltproject.org/data.html
- Event Registry overview: https://eventregistry.org/about
- Bloomberg public news analytics examples: https://www.bloomberg.com/professional/insights/data/can-get-edge-trading-news-sentiment-data/ and https://www.bloomberg.com/professional/insights/data/trading-news-use-machine-readable-data-find-alpha/
- RavenPack news analytics product page: https://www.ravenpack.com/products/edge/data/news-analytics