AI News Generation: How It Works
AI news generation in production is a pipeline, not a single chat prompt. Signals enter as RSS or API items, are normalized and deduplicated, optionally clustered, then turned into drafts under strict templates. Translation, images, and publishing each have their own failure modes—so each stage needs observability.
Each stage should emit structured logs: item ID, model version, template ID, latency, and error class. Without that, “the AI wrote something wrong” is not debuggable—it is folklore. Treat prompts and templates as versioned artifacts, like application code.
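For concreteness, a minimal sketch of such a record in Python. The function name and call shape are hypothetical; only the five field names come from the list above.

```python
import json
import logging
import time

log = logging.getLogger("pipeline")

def emit_stage_log(item_id: str, stage: str, model_version: str,
                   template_id: str, started_at: float,
                   error_class: str | None = None) -> None:
    """Emit one structured record per item per stage.

    started_at is a time.monotonic() value captured when the stage began.
    """
    log.info(json.dumps({
        "item_id": item_id,
        "stage": stage,
        "model_version": model_version,
        "template_id": template_id,
        "latency_ms": round((time.monotonic() - started_at) * 1000),
        "error_class": error_class,  # None on success
    }))
```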
Stages at a glance
```mermaid
flowchart LR
    A[RSS/API ingest] --> B[Normalize]
    B --> C[Dedupe + cluster]
    C --> D[Draft + guardrails]
    D --> E[Translate optional]
    E --> F[Media + WP publish]
```
Models propose wording; your configuration enforces structure, tone, banned claims, and attribution. Retries and dead-letter queues matter as much as the model choice when WordPress or plugins misbehave.
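A minimal retry-plus-dead-letter sketch, assuming the publish call (for example, a WordPress REST request) is injected; names and backoff values are illustrative.

```python
import time

MAX_ATTEMPTS = 3

def publish_with_retry(post: dict, publish_fn, dead_letter: list) -> bool:
    """Retry a flaky publish call with backoff; park persistent failures for humans."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            publish_fn(post)          # e.g. a WordPress REST call, injected for testability
            return True
        except Exception as exc:      # narrow to transport errors in real code
            if attempt == MAX_ATTEMPTS:
                dead_letter.append({"post": post, "error": repr(exc)})
                return False
            time.sleep(2 ** attempt)  # exponential backoff between attempts
    return False
```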
Clustering is optional but powerful: when five items describe the same acquisition, you may want one merged article with a timeline instead of five thin duplicates. The decision rules belong in configuration—not in someone’s daily judgment.
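One way to encode those decision rules as configuration, with illustrative names and thresholds:

```python
CLUSTER_RULES = {
    "min_items_to_merge": 3,        # fewer than this: publish items individually
    "max_cluster_age_hours": 24,    # stop merging once the story is a day old
    "similarity_threshold": 0.82,   # applied upstream when forming clusters
}

def should_merge(cluster_size: int, age_hours: float, rules=CLUSTER_RULES) -> bool:
    """Decide merge-vs-separate from configuration, not someone's daily judgment."""
    return (cluster_size >= rules["min_items_to_merge"]
            and age_hours <= rules["max_cluster_age_hours"])
```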
Why templates win
Headline length, paragraph count, and required source links should be explicit contracts. That keeps hundreds of posts consistent and makes regression tests meaningful when you change models.
Templates also constrain cost: you can cap tokens per stage, forbid open-ended “write more” instructions, and require summaries before expansion. Unbounded prompts are how bills spike before quality does.
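A sketch of per-stage token caps enforced in code; the stage names and numbers are placeholders:

```python
STAGE_TOKEN_CAPS = {"summary": 300, "draft": 1200, "expand": 800}  # illustrative caps

def check_budget(stage: str, tokens_used: int) -> None:
    """Fail fast when a stage exceeds its cap instead of letting costs drift upward."""
    cap = STAGE_TOKEN_CAPS[stage]
    if tokens_used > cap:
        raise RuntimeError(f"{stage} used {tokens_used} tokens, cap is {cap}")
```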
Metrics that matter
Track publish success rate, time-to-publish, and indexation—not “vibes.” Alert on rising edit rates or duplicate escapes before readers notice.
Segment metrics by language and by topic: a spike in one vertical often means a bad feed or a bad template, not a global model failure.
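A minimal sketch of per-segment counting with the standard library; a real deployment would more likely export labeled counters to a metrics system:

```python
from collections import Counter

edit_rate_events: Counter = Counter()

def record_edit(language: str, topic: str) -> None:
    """Count post-publish human edits per (language, topic) so spikes localize to one vertical."""
    edit_rate_events[(language, topic)] += 1

# An alerting job would then flag any segment whose count jumps against its trailing baseline.
```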
Safety, rights, and compliance
Blocklists for people, companies, and sensitive geographies should run before generation when possible. If your pipeline touches elections, health, or finance, add human review gates or stricter templates—platform policy and local law still apply to automated output.
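A pre-generation gate could look like this sketch; the blocklist entries are placeholders, and the substring match is deliberately naive:

```python
BLOCKLIST = {
    "people": {"jane doe"},
    "companies": {"acme corp"},
    "geos": {"region-x"},
}

def passes_blocklist(item_text: str) -> bool:
    """Run before generation: cheaper to skip an item than to retract a post."""
    lowered = item_text.lower()
    return not any(term in lowered
                   for terms in BLOCKLIST.values()
                   for term in terms)
```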
Keep records of third-party content licenses and embedding rights for images. “The model found it on the web” is not a rights strategy.
Changing models without drama
When you switch vendors or bump model versions, run shadow comparisons on frozen fixtures: same inputs, diff outputs. Ship only after headline and body similarity scores stay within thresholds you define.
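Here is one possible shadow-comparison check using Python's standard-library difflib; the thresholds are yours to define, as the text says:

```python
from difflib import SequenceMatcher

HEADLINE_MIN_SIMILARITY = 0.80  # illustrative thresholds
BODY_MIN_SIMILARITY = 0.70

def within_thresholds(old: dict, new: dict) -> bool:
    """Compare frozen-fixture outputs from the current and candidate models."""
    head = SequenceMatcher(None, old["headline"], new["headline"]).ratio()
    body = SequenceMatcher(None, old["body"], new["body"]).ratio()
    return head >= HEADLINE_MIN_SIMILARITY and body >= BODY_MIN_SIMILARITY
```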
Avoid “big bang” cutovers on Friday afternoons. Roll out per language or per topic, watch error budgets, and roll back if publish latency doubles.
Data contracts between stages
Treat handoffs like API boundaries. The normalize stage should output validated fields with explicit nullability: if summary is empty, downstream stages must not “invent” one from the title alone unless that behavior is documented and tested. The draft stage should return structured sections—lead, bullets, sources—not a blob of markdown that your WordPress importer parses by prayer.
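A sketch of those two contracts as frozen dataclasses; the field names mirror the sections named above, everything else is illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class NormalizedItem:
    item_id: str
    title: str
    summary: Optional[str]   # explicitly nullable: downstream must handle None,
    source_url: str          # never silently invent a summary from the title

@dataclass(frozen=True)
class DraftOutput:
    lead: str                # structured sections, not one markdown blob
    bullets: list[str]
    sources: list[str]
```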
Use schema validation in CI: a failing fixture should break the build. That sounds heavy for a blog pipeline; it is lighter than explaining to legal why a thousand posts share the same malformed disclaimer.
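And a matching CI check, assuming pytest as the runner; the fixture path and contracts module are hypothetical, reusing the NormalizedItem sketched above:

```python
import json

import pytest

from contracts import NormalizedItem  # the dataclass sketched above

@pytest.mark.parametrize("path", ["fixtures/normalized/sample_story.json"])
def test_fixture_matches_contract(path):
    """A fixture that no longer fits the contract should break the build, not production."""
    with open(path, encoding="utf-8") as fh:
        raw = json.load(fh)
    item = NormalizedItem(**raw)  # TypeError on missing or extra fields fails CI
    assert item.summary is None or item.summary.strip(), "summary may be null, not blank"
```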
Evaluation sets: what to freeze before you “tune the prompt”
Build a frozen evaluation set from real items: include duplicates, multilingual headlines, unusually short posts, and items with ambiguous timestamps. Score outputs with a mix of automatic checks (banned phrases, required links) and periodic human rubric reviews on a sample.
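The automatic half of that scoring might look like this sketch; the banned phrases are placeholders:

```python
BANNED_PHRASES = ["sources say", "it is rumored"]  # illustrative entries

def automatic_checks(draft: dict) -> list[str]:
    """Cheap deterministic checks run on every eval item; human rubric review samples the rest."""
    failures = []
    text = draft["body"].lower()
    failures += [f"banned phrase: {p}" for p in BANNED_PHRASES if p in text]
    if not draft.get("sources"):
        failures.append("missing required source links")
    return failures
```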
When someone proposes a clever new instruction, run it against the full eval set—not just the three examples in the Slack thread. Most prompt regressions are “better on my demo, worse on the world.”
Latency budgets end-to-end
Set a target from ingest to live URL—say three minutes for breaking tiers and thirty minutes for digests. Decompose the budget: fetch, normalize, generate, media, publish. If generation dominates, consider smaller contexts or cheaper draft-then-expand flows. If WordPress dominates, fix plugins before you buy a bigger LLM.
Expose queue age as a metric. Rising queue age is an early warning of systemic overload, not “random slowness.”
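A sketch of both ideas; the per-stage numbers are illustrative and happen to sum to the three-minute breaking-tier target above:

```python
import time

STAGE_BUDGET_SECONDS = {"fetch": 10, "normalize": 5, "generate": 90, "media": 30, "publish": 45}

def over_budget(stage: str, elapsed: float) -> bool:
    """Per-stage decomposition makes it obvious whether generation or WordPress dominates."""
    return elapsed > STAGE_BUDGET_SECONDS[stage]

def queue_age_seconds(oldest_enqueued_at: float) -> float:
    """Age of the oldest unprocessed item; a rising value signals systemic overload."""
    return time.time() - oldest_enqueued_at
```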
FAQ: operations and governance
Who approves template changes? At minimum: an editorial owner for voice, an engineer for blast radius, and a security reviewer if prompts include new third-party URLs or tools. Keep approvals in tickets tied to version tags.
How do we stop bad posts quickly? Prefer disabling a template or a feed tier over scrambling to edit individual URLs. Bulk operations should be scripted and logged—panic edits in the CMS do not scale and rarely audit well.
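One possible shape for such a scripted, logged kill switch; the file-based mechanism is purely illustrative:

```python
import json
import os
import time

KILL_FILE = "disabled_templates.json"  # hypothetical kill-switch location

def disable_template(template_id: str, operator: str, reason: str) -> None:
    """One scripted switch stops a whole class of posts and leaves an audit trail."""
    disabled = {}
    if os.path.exists(KILL_FILE):
        with open(KILL_FILE) as fh:
            disabled = json.load(fh)
    disabled[template_id] = {"by": operator, "reason": reason, "at": time.time()}
    with open(KILL_FILE, "w") as fh:
        json.dump(disabled, fh, indent=2)
```

The draft stage would then consult this file before generating, so disabling a template takes effect pipeline-wide without touching individual URLs.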
What do we tell readers? A short, plain-language automation disclosure builds trust. It can live in the footer and be linked from automated posts. Avoid both overclaiming (“AI verified everything”) and underclaiming (“may contain errors” with no process).
Long-form note for printouts
If you are reading this as a printed playbook, circle the metrics section and assign owners in pen—literally. Pipelines fail when metrics exist only in Grafana and accountability exists only in Slack memory. The goal of this article is not to convince you that LLMs are magic; it is to help you ship news-like content with the same seriousness you would apply to payments, email delivery, or identity systems. The story is never done when the text looks fluent; it is done when you can trace it, test it, and shut it down safely.
Appendix: model selection without marketing noise
Ignore leaderboard hype; choose models against your eval set and your cost constraints. A slightly smaller model with tight templates often beats a flagship model with loose prompts. Measure latency and failure modes, not benchmark trivia.
Keep two providers in testing even if you standardize on one in production—vendor outages are not theoretical.
Appendix: multilingual pipelines
If you translate, decide whether translation happens before or after templating; both orders have tradeoffs. Ensure proper nouns and legal language are protected by glossary rules. Per-language eval sets are mandatory—do not assume parity across locales.
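A common placeholder technique for protecting glossary terms through translation, sketched with illustrative entries:

```python
GLOSSARY = {"Acme Corp", "force majeure"}  # illustrative never-translate terms

def protect_terms(text: str) -> tuple[str, dict]:
    """Swap protected terms for placeholders before translation, restore them after."""
    mapping = {}
    for i, term in enumerate(sorted(GLOSSARY)):
        token = f"__TERM_{i}__"
        if term in text:
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping

def restore_terms(text: str, mapping: dict) -> str:
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text
```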
Appendix: example draft template outlines (copy-paste)
Treat these as contracts your model output must satisfy—store them next to template IDs in git.
```yaml
template_id: BREAKING_BRIEF
required_sections:
  headline: { max_chars: 90, must_include_source_name: true }
  dek: { max_chars: 180, ban_claims: [prediction, uncited_quote] }
  bullets: { count: 3, each_max_chars: 140, source_link_per_bullet: true }
  sources_block: { min_links: 1, label: "Read the primary coverage" }
forbidden: [first_person, investment_advice, medical_diagnosis]
tone: neutral_wire
language: inherit_from_pipeline
```

```yaml
template_id: WEEKLY_ROUNDUP
required_sections:
  title: { pattern: "Week in [Vertical]: {date_range}" }
  intro: { max_words: 90, must_state_scope: true }
  clusters:
    - cluster_headline
    - one_paragraph_summary
    - link_to_representative_source
  footnote: { automation_disclosure: site_policy_url }
caps:
  max_clusters: 8
  max_tokens_total: 2200
forbidden: [breaking_tense_for_past_events]
```

Appendix: document length and printing
This article is intentionally long enough to print as a standalone operations guide. The diagrams, checklists, and appendices are not decorative—they reduce on-call panic and prevent “tribal” fixes. If your printed copy is heavily annotated, it is succeeding.
