Building a news curator tailored to you
The problem with news today #
Let’s be honest: consuming news in 2025 is exhausting. Between algorithmic feeds that optimize for engagement over quality, clickbait headlines designed to trigger emotional responses, and the overwhelming volume of information competing for our attention, finding genuinely useful news feels like searching for a needle in a haystack.
What if we could build something simpler? Something that learns what we actually care about and gets better over time, without the complexity of neural networks or the privacy concerns of cloud-based recommendation engines?
That’s exactly what we’re going to build: a personalized news curator that runs locally, learns from your preferences through simple like/dislike feedback, and gets smarter about what articles to show you.
Why build your own news curator? #
Privacy is a major factor: your reading habits and preferences stay on your machine rather than being harvested by third parties. You also get complete control over which news sources are considered trustworthy, and you can see exactly how the recommendation algorithm works under the hood.
The core insight is simple: if you like articles about “renewable energy” and “climate change,” you’ll probably like more articles containing those keywords. If you dislike articles about “celebrity gossip,” the system should avoid showing you more of that content.
The technical approach #
The 70/30 strategy #
Here’s one of the most interesting design decisions: our recommendation engine uses a 70/30 split. Seventy percent of articles come from personalized content based on your learned keyword preferences, while thirty percent comes from general trending news to prevent you from living in a filter bubble.
This prevents the system from becoming too narrow in its recommendations while still giving you mostly content you’ll find relevant. It’s a much simpler approach than complex diversity algorithms used by major platforms.
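In code, the split is just per-page arithmetic; a minimal sketch (the helper name and page-size parameter are illustrative, not from the project):

def split_counts(limit: int, personalized_ratio: float = 0.7) -> tuple[int, int]:
    # How many articles to draw from each pool for one page of results
    n_personalized = round(limit * personalized_ratio)
    return n_personalized, limit - n_personalized

print(split_counts(10))  # -> (7, 3): 7 personalized, 3 general trending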
Keyword-based preference learning #
Instead of analyzing semantic meaning or using word embeddings, we use a straightforward keyword extraction and weighting system:
from typing import List

def _extract_keywords(self, text: str) -> List[str]:
    # Simple stopwords to filter out (abridged here; the real set is longer)
    stopwords = {"the", "a", "an", "and", "or", "but", ...}
    # Basic tokenization: lowercase, strip common punctuation, split on whitespace
    words = text.lower().replace(",", "").replace(".", "").split()
    # Keep words longer than two characters that aren't stopwords
    keywords = [word for word in words if len(word) > 2 and word not in stopwords]
    # Cap each article's contribution at its first 5 keywords
    return keywords[:5]
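To see what this yields, here's a quick illustrative call (the output assumes “new” and “for” appear in the full stopword set):

title = "New breakthrough in renewable energy storage for climate goals"
print(engine._extract_keywords(title))  # engine: whatever object defines the method
# -> ['breakthrough', 'renewable', 'energy', 'storage', 'climate']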
When you like an article, every keyword gets a +0.1 weight boost. Dislike it? Each keyword gets -0.1. Over time, this builds a preference profile that’s both transparent and effective.
Building the curator #
Initial setup #
Let’s build this thing! We’ll need both a Python backend and a React frontend.
First, install the prerequisites. We’ll assume you’re on macOS:
brew install python node
Install the Python package manager uv using the official instructions.
Create your project directory:
mkdir simple-news-curator
cd simple-news-curator
Backend setup #
Let’s start with the backend. Create a backend directory and set up the Python environment:
mkdir backend
cd backend
uv init --name news-curator .
Add the required dependencies:
uv add fastapi uvicorn python-dotenv requests
The core architecture consists of three main components:
- News fetcher: Interfaces with TheNewsAPI.com to retrieve articles
- Database manager: Handles SQLite storage for articles, preferences, and user reactions
- Recommendation engine: Combines personalized and general content
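To make that shape concrete, here is a minimal sketch of how the pieces could be wired together in a FastAPI app. The module and class names (NewsFetcher, DatabaseManager, RecommendationEngine) and the two routes are illustrative assumptions, not the project's exact layout:

# main.py - hypothetical wiring of the three components
from fastapi import FastAPI

from news_fetcher import NewsFetcher          # talks to TheNewsAPI.com
from database import DatabaseManager          # SQLite storage
from recommender import RecommendationEngine  # preference weights + 70/30 mixing

app = FastAPI()
db = DatabaseManager("curator.db")
engine = RecommendationEngine(db=db, fetcher=NewsFetcher())

@app.get("/articles")
def get_articles(limit: int = 10):
    # Returns the personalized/general mix for the frontend to render
    return engine.recommend(limit=limit)

@app.post("/articles/{article_id}/react")
def react(article_id: str, reaction: str):
    # reaction is "like" or "dislike"; updates keyword weights
    engine.process_reaction(article_id, reaction)
    return {"status": "ok"}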
The news fetcher #
TheNewsAPI.com uses a specific query syntax that we can leverage:
import requests
from typing import Dict, List

def fetch_by_keyword(self, keywords: List[str], limit: int = 10) -> List[Dict]:
    search_query = " | ".join(keywords)  # OR logic between keywords
    params = {
        "api_token": self.api_key,  # assumes the key is stored on the fetcher
        "search": search_query,
        "search_fields": "keywords,title,description",
        "exclude_categories": ",".join(EXCLUDED_CATEGORIES),
        "domains": ",".join(TRUSTED_DOMAINS),
        "published_after": self._thirty_days_ago().strftime("%Y-%m-%d"),
        "limit": limit,
    }
    # TheNewsAPI's "all news" endpoint
    response = requests.get("https://api.thenewsapi.com/v1/news/all", params=params)
    response.raise_for_status()
    return response.json().get("data", [])
The " | " syntax tells the API to use OR logic between keywords rather than AND. This is crucial: if a user likes “machine learning” and “climate change,” we want articles about either topic, not just articles that mention both. The search_fields parameter lets us search across title, description, and the API’s own keyword extraction, giving us broader coverage than title-only matching.
The preference engine #
The learning mechanism is where things get interesting. When you react to an article, the system extracts keywords and adjusts weights:
def process_reaction(self, article_id: str, reaction: str):
    # Look up the stored article so we can read its saved keywords
    # (db.get_article is an assumed DatabaseManager helper)
    article = self.db.get_article(article_id)
    keywords = [kw.strip() for kw in article["keywords"].split(",") if kw.strip()]
    # +0.1 for a like, -0.1 for a dislike
    adjustment = self.weight_adjustment if reaction == "like" else -self.weight_adjustment
    for keyword in keywords:
        current_weight = self.db.get_preference_weight(keyword)
        # Clamp to [min_weight, max_weight], i.e. [-1.0, +1.0]
        new_weight = max(self.min_weight, min(self.max_weight, current_weight + adjustment))
        self.db.update_preference_weight(keyword, new_weight)
Each keyword gets a +0.1 or -0.1 adjustment, with weights clamped between -1.0 and +1.0. This creates a natural saturation effect: if you consistently dislike “crypto” articles, that keyword’s weight will bottom out at -1.0, and further reactions have no additional effect. The system also handles the cold-start problem with bootstrap preferences, giving slight positive weights to broad categories like “science” and “technology” for new users.
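The bootstrap itself can be a handful of seeded rows on first run; a minimal sketch, assuming the category list, the 0.1 seed weight, and the has_any_preferences helper (all illustrative):

# Hypothetical cold-start seeding for brand-new users
BOOTSTRAP_PREFERENCES = {"science": 0.1, "technology": 0.1, "health": 0.1}

def bootstrap_if_new(self):
    if not self.db.has_any_preferences():  # empty preferences table = new user
        for keyword, weight in BOOTSTRAP_PREFERENCES.items():
            self.db.update_preference_weight(keyword, weight)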
Database design #
The SQLite schema is simple:
-- Store article metadata and content
CREATE TABLE articles (
id TEXT PRIMARY KEY,
title TEXT NOT NULL,
content TEXT,
url TEXT UNIQUE NOT NULL,
source TEXT,
published_at TEXT,
keywords TEXT
);
-- Track keyword preferences (-1.0 to +1.0)
CREATE TABLE preferences (
keyword TEXT PRIMARY KEY,
weight REAL NOT NULL DEFAULT 0.0
);
-- Log user reactions for learning
CREATE TABLE user_reactions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
article_id TEXT NOT NULL,
reaction TEXT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
No complex relationships, no normalization overhead - just the minimum viable schema for personalized recommendations.
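With this schema, pulling the user's strongest interests for the personalized fetch stays trivial. A sketch using Python's built-in sqlite3 module (the helper itself is illustrative; only the table layout comes from the schema above):

import sqlite3

def top_positive_keywords(db_path: str, n: int = 5) -> list[str]:
    # The n most-liked keywords drive the personalized API query
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT keyword FROM preferences WHERE weight > 0 "
            "ORDER BY weight DESC LIMIT ?",
            (n,),
        ).fetchall()
    return [keyword for (keyword,) in rows]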
Frontend with React Router #
The frontend is a single-page application using React Router v7. Here’s one of the most satisfying implementation details: making the entire article card clickable while preventing conflicts with the reaction buttons:
const handleCardClick = () => {
  window.open(article.url, '_blank', 'noopener,noreferrer');
};

const handleReaction = async (reaction: string, event: React.MouseEvent) => {
  event.stopPropagation(); // Prevent card click when clicking reaction buttons
  setReacting(true);
  try {
    await onReaction(article.id, reaction);
  } finally {
    setReacting(false); // Reset even if the request fails
  }
};
This kind of interaction design detail makes the difference between a clunky prototype and something genuinely pleasant to use.
Environment setup #
You’ll need an API key from TheNewsAPI.com (they offer a free tier). Create a .env file in the backend directory:
THENEWSAPI_KEY=your_api_key_here
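Loading the key in the backend is standard python-dotenv usage, along these lines:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory
API_KEY = os.environ["THENEWSAPI_KEY"]  # raises KeyError early if the key is missing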
Running the system #
Start the backend #
In the backend directory:
uv run main.py
The FastAPI server starts on http://localhost:8000 with automatic API documentation at /docs.
Start the frontend #
Set up the React frontend:
cd ../frontend
npm install
npm run dev
The frontend runs on http://localhost:5173.
Using the curator #
Open your browser to http://localhost:5173. You’ll see a clean interface with article cards. Click cards to read full articles, and use the 👍/👎 buttons to train your preferences.
The magic happens behind the scenes: each reaction updates keyword weights in the database, and the next time you refresh, the recommendation algorithm incorporates your feedback.
Interesting technical nuggets #
Keyword extraction vs API keywords #
An interesting discovery was that TheNewsAPI.com provides its own keyword extraction, but we also extract keywords from titles using simple tokenization. This dual approach captures both the API’s semantic understanding and our own domain-specific parsing. The API keywords tend to be more general (“technology”, “business”) while our extracted keywords are more specific (“cryptocurrency”, “renewable-energy”).
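In sketch form, merging the two sources when an article is stored might look like this (the comma-separated keywords field matches the snippets above; the helper itself is an illustration):

def combined_keywords(self, article: dict) -> list[str]:
    # Keywords the API attached (often broad, e.g. "technology", "business")
    api_keywords = [kw.strip() for kw in article.get("keywords", "").split(",") if kw.strip()]
    # Our own, more specific extraction from the title
    title_keywords = self._extract_keywords(article.get("title", ""))
    # dict.fromkeys dedupes while preserving order, API keywords first
    return list(dict.fromkeys(api_keywords + title_keywords))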
The 70/30 recommendation mixing #
The mixing strategy happens at the API level, not in post-processing. We make separate API calls for personalized content (using the user’s top 5 positive keywords) and general trending content, then merge the results. This is more efficient than fetching everything and filtering locally, and it means the API’s relevance scoring works properly for each content type.
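A sketch of that flow, reusing fetch_by_keyword from earlier (fetch_trending and get_top_positive_keywords are assumed counterparts, not confirmed names):

def recommend(self, limit: int = 10) -> list[dict]:
    # 70% personalized: query the API with the user's top 5 liked keywords
    top_keywords = self.db.get_top_positive_keywords(5)
    n_personalized = round(limit * 0.7)
    personalized = self.fetcher.fetch_by_keyword(top_keywords, limit=n_personalized)
    # 30% general trending, fetched separately so the API scores each pool properly
    general = self.fetcher.fetch_trending(limit=limit - n_personalized)
    return personalized + general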
Preference weight saturation #
The [-1.0, +1.0] weight bounds create interesting behavior. Once a keyword hits -1.0 (maximum dislike), it stops affecting the user’s experience - the system essentially “gives up” on that topic. This prevents the preference learning from becoming overly sensitive to single bad articles in otherwise-good categories.
What makes this approach special #
Unlike complex recommendation systems that require massive datasets and opaque neural networks, this curator is transparent (you can see exactly why articles were recommended), fast (no training cycles or model deployment complexity), private (everything runs locally on your machine), and controllable (easy to modify news sources, categories, or recommendation logic).
The system demonstrates that effective personalization doesn’t require sophisticated machine learning. Sometimes the simplest approach that actually works is better than the most advanced approach that’s too complex to understand or modify.
Concluding thoughts #
Building your own news curator is surprisingly satisfying. You end up with a tool that genuinely gets better at serving your interests while maintaining complete transparency about how it works.
The 70/30 personalization strategy strikes a nice balance between relevance and discovery. The keyword-based learning is simple enough to debug but sophisticated enough to provide real value. And the local-first approach means your reading habits stay private.
More importantly, you gain insight into how recommendation systems work under the hood. In an era where algorithmic feeds shape so much of our information consumption, understanding these systems becomes a form of digital literacy.
You can find the full code for this news curator here.