Help:Ephemera Agent/Context Assembly

From Encyclopedia Ephemera

Overview

Before any generative LLM call, agent.php assembles a structured context package from the wiki. This ensures the generator model has access to relevant canon, instructions, and evidence without needing to make additional API calls during generation.

The pipeline runs whenever agent.php is deployed on your server. For simple query tasks, a fast pre-check bypasses it.


When Context Assembly Is Bypassed

The system may skip context assembly and call the LLM directly when:

  • The task is classified as a simple query
  • No page creation or editing is required
  • The user is performing a quick lookup, browse action, or lightweight maintenance task
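
A minimal sketch of how a pre-check of this kind might decide that a task is a simple query. It is written in Python for illustration and is not the PHP code in agent.php. Only "show", "list" and "search" are named on this page; the generative keyword list is an assumption.

```python
# Only "show", "list" and "search" come from this page. The generative
# keywords below are assumptions made for illustration.
READ_ONLY_PREFIXES = ("show", "list", "search")
GENERATIVE_KEYWORDS = {"create", "write", "edit", "expand", "generate"}


def is_obviously_query(task: str) -> bool:
    """Return True when the task clearly looks read-only."""
    words = task.lower().split()
    if not words:
        return False
    # The task must open with a read-only verb...
    starts_read_only = words[0] in READ_ONLY_PREFIXES
    # ...and must not mention any generative keyword anywhere.
    has_generative = any(
        w.strip(".,:;!?\"'") in GENERATIVE_KEYWORDS for w in words
    )
    return starts_read_only and not has_generative
```

With these lists, "list all pages in Sources" would be treated as a simple query, while "show the page and then edit it" would fall through to the full pipeline.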

Pipeline Steps

  1. Load Instructions:Config from the wiki to establish runtime parameters
  2. Fast pre-check (is_obviously_query()): if the task is clearly read-only (starts with "show", "list", "search", etc. and contains no generative keywords), skip the remaining steps and route directly to the generator
  3. Planner LLM call: classify task type and extract named entities (cheap/fast model)
  4. Load World Bible: always first; anchors all generation to the universe
  5. Load core instructions: Canon Policy, Continuity Rules
  6. Resolve instruction dependency graph: BFS traversal of requires links, up to max_depth hops
  7. Fetch encyclopedia pages: one per named entity
  8. 1-hop link expansion: fetch top-k related pages ranked by co-occurrence frequency across seed pages
  9. Search the Sources namespace: top results per entity
  10. Fetch filtered talk sections: Reliability Assessment, Bias Analysis, Editorial Notes only (never full talk pages)
  11. Wikipedia lookup: if enabled, fetch the introductory paragraph for each entity from the Wikipedia API
  12. Deduplicate: where a page appears in more than one section, keep a single copy
  13. Rank by keyword overlap: score pages by overlap with the task description using TF-style scoring
  14. Trim to token budget: drop lowest-priority content first (Wikipedia → talk → sources → encyclopedia). Instructions are never trimmed.
  15. Serialise: assemble into a structured prompt string with section headers
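
The ranking and trimming steps can be sketched as follows. This is an illustration in Python, not the PHP implementation in agent.php. The page dictionaries, the four-characters-per-token estimate, and dropping the lowest-scoring page first within a section are all assumptions; only the section drop order and the rule that instructions are never trimmed come from this page.

```python
import re
from collections import Counter

# Sections in the order they are dropped. "instructions" is absent on
# purpose: instruction pages are never trimmed.
DROP_ORDER = ["wikipedia", "talk", "sources", "encyclopedia"]


def estimate_tokens(text: str) -> int:
    # Assumption: roughly four characters per token.
    return max(1, len(text) // 4)


def overlap_score(task: str, content: str) -> int:
    """Count how often the task's words occur in the content (TF-style)."""
    task_words = set(re.findall(r"\w+", task.lower()))
    counts = Counter(re.findall(r"\w+", content.lower()))
    return sum(counts[w] for w in task_words)


def trim_to_budget(pages: list[dict], task: str, budget: int) -> list[dict]:
    """Drop pages until the estimated token total fits the budget."""
    kept = list(pages)
    total = sum(estimate_tokens(p["content"]) for p in kept)
    for section in DROP_ORDER:
        # Assumption: within a section, the lowest-scoring pages go first.
        candidates = sorted(
            (p for p in kept if p["section"] == section),
            key=lambda p: overlap_score(task, p["content"]),
        )
        for page in candidates:
            if total <= budget:
                return kept
            kept.remove(page)
            total -= estimate_tokens(page["content"])
    return kept
```

Each page here is a hypothetical dictionary with "section", "title" and "content" keys. If the instructions alone exceed the budget, this sketch still returns them untouched.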

Context Prompt Format

The assembled context is prepended to the generator's system prompt in this order:

[INSTRUCTIONS]
--- Instructions:World Bible ---
{content}
--- Instructions:Core/Canon Policy ---
{content}
...

[ENCYCLOPEDIA CONTEXT]
--- PageTitle ---
{content}
...

[SOURCES]
--- Sources:DocumentTitle ---
{content}
...

[EDITORIAL NOTES]
--- Talk: Sources:DocumentTitle ---
= Reliability Assessment =
{content}
= Bias Analysis =
{content}
...

[REAL-WORLD REFERENCE — Wikipedia]
Factual real-world information for grounding only.
Do NOT treat as in-universe canon for Encyclopedia Ephemera.
--- EntityName ---
Wikipedia: ArticleTitle
{intro paragraph}
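
A minimal sketch of how a string in this layout might be assembled, written in Python for illustration rather than taken from agent.php. The function name, its input shape, and the choice to skip empty sections are assumptions. The sub-headings inside editorial notes and the preamble of the Wikipedia section are left out for brevity.

```python
def serialise(sections: list[tuple[str, list[tuple[str, str]]]]) -> str:
    """Join (header, pages) pairs into one prompt string.

    Each page is a (title, content) pair. Empty sections are skipped,
    which is an assumption rather than documented behaviour.
    """
    blocks = []
    for header, pages in sections:
        if not pages:
            continue
        lines = [f"[{header}]"]
        for title, content in pages:
            lines.append(f"--- {title} ---")
            lines.append(content)
        blocks.append("\n".join(lines))
    # A blank line separates sections, as in the layout above.
    return "\n\n".join(blocks)
```

Passing the sections in the order shown above (instructions first, real-world reference last) keeps the instructions at the top of the prompt.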

Task Types

The planner model classifies each task into one of these types:

Type                          Description
create_encyclopedia_article   Create a new encyclopedia article about an entity, event, or concept
create_source                 Create an in-universe source document
edit_page                     Edit or update an existing wiki page
maintenance_report            Produce a maintenance or audit report
query                         Read-only information retrieval
batch_operation               Perform the same operation across multiple pages
stub_expansion                Expand a stub or incomplete article
red_link_generation           Create a page for a detected red link
unknown                       Could not be determined; defaults to generative path

Additional task types can be added via Instructions:Config without PHP changes.

Instruction Dependency Graph

Each instruction page declares its dependencies in a requires metadata field. The pipeline resolves this graph using breadth-first traversal:

  • max_depth (default: 2) — stop after this many hops
  • fully_resolve (default: false) — if true, ignore max_depth and resolve the complete graph

Example: selecting Instructions:Create/Source/Interview also fetches Instructions:Create/Source (Base Workflow) at depth 1 and Instructions:Core/Continuity Rules at depth 2.
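
The traversal can be sketched as a standard breadth-first search with a depth limit. This is an illustration in Python, not the PHP code in agent.php. The function name and the dictionary that maps each page to its requires list are assumptions; max_depth and fully_resolve mirror the parameters described above.

```python
from collections import deque


def resolve_instructions(start, requires, max_depth=2, fully_resolve=False):
    """Breadth-first walk of the requires graph.

    `requires` is a hypothetical dict mapping a page title to the list of
    titles it depends on. Returns the pages in the order they are reached.
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        # Stop expanding once the hop limit is reached, unless the
        # complete graph was requested.
        if not fully_resolve and depth >= max_depth:
            continue
        for dep in requires.get(page, []):
            if dep not in seen:  # also guards against cycles
                seen.add(dep)
                order.append(dep)
                queue.append((dep, depth + 1))
    return order
```

With the default max_depth of 2, a chain A → B → C → D resolves to A, B and C. Setting fully_resolve to true would also pull in D.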

Debugging

After each context-aware call, the ▶ CONTEXT panel in the agent log shows what was assembled. For deeper debugging, the raw context_meta object is included in every agent.php response and is visible in browser developer tools.