Latest revision as of 16:29, 12 May 2026

Instruction Metadata
id	create-source-ingest
type	workflow
applies_to	ingest
task_type
priority	high
status	active
canonical	true
include_by_default	yes
requires
tags	source,ingest,entities,pre-linking

Summary

Workflow for ingesting source documents into Encyclopedia Ephemera. Extracts named entities, creates a Sources: page, and links it to related encyclopedia pages. Entity extraction uses a staged pipeline: deterministic wikilink detection first, then optional LLM calibration guided by this page's configuration.

Workflow

User fills in source metadata (title, author, date, type, publisher, url, summary) in the ingestion form.
User pastes the source document.
System extracts explicit wikilinks from the document and validates each against the wiki.
If AI extraction is enabled: LLM identifies additional named entities using the entity type schema below.
User reviews the entity list, grouped as Known / Suggested / Low Confidence, and adds or removes entities as needed.
User confirms. System creates the Sources: page with metadata, Related Pages section, and provenance comment.
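Steps 3 to 5 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the PHP implementation. The regex covers only the plain `[[Target]]`, `[[Target|label]]` and `[[Target#Section]]` forms. `existing_titles` stands in for a real lookup against the wiki. The LLM confidence scores and the 0.6 `threshold` that separates Suggested from Low Confidence are hypothetical, and so is the choice to show an explicit link to a missing page as Suggested.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section]]; group 1 is the page title.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalise_title(raw: str) -> str:
    """Underscores to spaces, collapse whitespace, capitalise the first letter (MediaWiki default)."""
    title = " ".join(raw.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def extract_wikilinks(text: str) -> list[str]:
    """Return unique explicit link targets in order of first appearance."""
    seen: dict[str, None] = {}
    for match in WIKILINK_RE.finditer(text):
        title = normalise_title(match.group(1))
        if title:
            seen.setdefault(title, None)
    return list(seen)


def group_entities(explicit: list[str], existing_titles: set[str],
                   llm_scores: dict[str, float], threshold: float = 0.6) -> dict[str, list[str]]:
    """Sort candidates into the Known / Suggested / Low Confidence review groups."""
    groups: dict[str, list[str]] = {"Known": [], "Suggested": [], "Low Confidence": []}
    seen: set[str] = set()
    for title in explicit:
        # Assumption: an explicit link to a missing page is offered as Suggested, not dropped.
        groups["Known" if title in existing_titles else "Suggested"].append(title)
        seen.add(title)
    # LLM entities exist only when AI extraction is enabled; the scores are hypothetical.
    for name, score in llm_scores.items():
        title = normalise_title(name)
        if title in seen:
            continue
        seen.add(title)
        if title in existing_titles:
            groups["Known"].append(title)
        elif score >= threshold:
            groups["Suggested"].append(title)
        else:
            groups["Low Confidence"].append(title)
    return groups


doc = "Reports from [[New Troy]] cite [[Jovian Union|the Union]], [[new_Troy]] and [[Arcadia#History]]."
links = extract_wikilinks(doc)
print(links)
print(group_entities(links, {"New Troy", "Jovian Union", "Maya Sato"},
                     {"Maya Sato": 0.9, "Asterion Protocol": 0.8, "Yuemin District": 0.3}))
```

Titles are normalised before comparison, so `[[new_Troy]]` and `[[New Troy]]` count as the same link and the reviewer sees each entity only once.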

Integration Decisions

After source creation, the integration review workflow (see Instructions:Maintenance/Source Integration Review) evaluates how related encyclopedia pages should be updated. Valid decisions per candidate:

no_action: Article already covers this information adequately.
citation_only: Add a citation link only; no content change needed.
citation_with_note: Add citation plus a brief inline note.
expansion_needed: Article requires substantive expansion using this source.
contradiction_review: Source conflicts with existing article content; flag for human review.
new_page: Entity appears in source but has no encyclopedia article yet.
defer: Insufficient information to decide now; revisit when more sources exist.

Most candidates should receive no_action. Only create integration tasks for clear, high-confidence needs.
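The decision vocabulary and the "clear, high-confidence needs only" rule could be encoded as follows. This is a hypothetical Python sketch. The `Candidate` type, the numeric `confidence` score and the 0.8 cutoff are assumptions. So is treating `defer` like `no_action` when deciding whether to open a task.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """The seven valid integration decisions listed on this page."""
    NO_ACTION = "no_action"
    CITATION_ONLY = "citation_only"
    CITATION_WITH_NOTE = "citation_with_note"
    EXPANSION_NEEDED = "expansion_needed"
    CONTRADICTION_REVIEW = "contradiction_review"
    NEW_PAGE = "new_page"
    DEFER = "defer"


# Assumption: these decisions never produce an integration task.
NO_TASK = {Decision.NO_ACTION, Decision.DEFER}


@dataclass(frozen=True)
class Candidate:
    page: str
    decision: Decision
    confidence: float  # hypothetical 0..1 score; this page does not define one


def parse_decision(raw: str) -> Decision:
    """Map a reviewer or LLM answer to a Decision; raises ValueError for anything else."""
    return Decision(raw.strip().lower())


def tasks_to_create(candidates: list[Candidate], min_confidence: float = 0.8) -> list[Candidate]:
    """Keep only clear, high-confidence candidates that actually call for a change."""
    return [c for c in candidates
            if c.decision not in NO_TASK and c.confidence >= min_confidence]


candidates = [
    Candidate("New Troy", parse_decision("no_action"), 0.99),
    Candidate("Arcadia", parse_decision(" Expansion_Needed "), 0.9),
    Candidate("Maya Sato", parse_decision("citation_only"), 0.5),
    Candidate("Yuemin District Unrest", parse_decision("defer"), 0.95),
]
print([c.page for c in tasks_to_create(candidates)])
```

Parsing through the `Decision` enum rejects any answer outside the seven valid values. This prevents an unknown task type from being created silently.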

Pre-Linking Configuration

PHP reads the values below at ingestion time. Edit these to tune behaviour for your deployment.

source_subtypes: Available source type options for the ingestion form dropdown.; News Article; Interview; Personal Log; Official Statement; Academic Paper; Corporate Advertisement; Government Resolution; Government Report; Propaganda Broadcast; Legal Document

pre_link_min_title_length: Skip wiki titles shorter than this character count. Default: 4.; 4

pre_link_stoplist: Wiki page titles to skip even when they appear in a document. These are titles that match too broadly — real words that are also article names but shouldn't be auto-linked.; Source; Project; Help; Template; Category; Sol; Earth; Mars; Energy; Field; Law; Station

pre_link_prefixes: Honorific, title, and article prefixes to strip when matching entity names. One entry per line.; Dr.; Prof.; Cmdr.; Admiral; Captain; Director; Chief; Minister; Secretary; Commissioner; The; A; An
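Below is a minimal sketch of how these three settings might be applied. It assumes each field is available in the `name: description; value; value` shape shown on this page; the storage format PHP actually reads may differ. The helper names, the case-insensitive comparisons and the sample wiki titles (including `Chen Wei`) are illustrative assumptions.

```python
from typing import Optional


def parse_field(line: str) -> tuple[str, list[str]]:
    """Split 'name: description; value; value' into (name, values), dropping the description."""
    name, _, rest = line.partition(":")
    parts = [p.strip() for p in rest.split(";")]
    return name.strip(), [p for p in parts[1:] if p]


def linkable_titles(titles: list[str], min_len: int, stoplist: list[str]) -> list[str]:
    """Drop titles shorter than min_len and titles on the stoplist (case-insensitive)."""
    stop = {s.casefold() for s in stoplist}
    return [t for t in titles if len(t) >= min_len and t.casefold() not in stop]


def strip_prefixes(name: str, prefixes: list[str]) -> str:
    """Remove leading honorifics/articles repeatedly, but never the last remaining word."""
    lowered = {p.casefold() for p in prefixes}
    words = name.split()
    while len(words) > 1 and words[0].casefold() in lowered:
        words.pop(0)
    return " ".join(words)


def match_entity(name: str, linkable: set[str], prefixes: list[str]) -> Optional[str]:
    """Return the wiki title an entity name should link to, or None."""
    candidate = strip_prefixes(name, prefixes)
    return candidate if candidate in linkable else None


# Sample values copied from the fields above (descriptions shortened).
CONFIG_LINES = [
    "pre_link_min_title_length: Skip wiki titles shorter than this character count. Default: 4.; 4",
    "pre_link_stoplist: Titles that match too broadly.; Source; Project; Help; Template; Category; "
    "Sol; Earth; Mars; Energy; Field; Law; Station",
    "pre_link_prefixes: Prefixes to strip.; Dr.; Prof.; Cmdr.; Admiral; Captain; Director; Chief; "
    "Minister; Secretary; Commissioner; The; A; An",
]
config = dict(parse_field(line) for line in CONFIG_LINES)
min_len = int(config["pre_link_min_title_length"][0])
wiki_titles = ["Sol", "Mars", "Station", "New Troy", "Chen Wei", "Jovian Union"]  # hypothetical
linkable = set(linkable_titles(wiki_titles, min_len, config["pre_link_stoplist"]))
print(sorted(linkable))
print(match_entity("Director Chen Wei", linkable, config["pre_link_prefixes"]))
print(match_entity("The Jovian Union", linkable, config["pre_link_prefixes"]))
```

Stripping stops at the last word, so a bare title such as "Director" is never reduced to an empty string.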

Entity Ontology

These fields tell the LLM what kinds of named entities Encyclopedia Ephemera tracks, and provide examples to anchor its extraction. Edit examples as the wiki grows.

entity_types: Type schema for LLM entity extraction. Format: TypeName: short description.; People: named individuals — characters, officials, journalists, scientists, historical figures; Places: locations, regions, habitats, stations, settlements, orbital structures, planetary bodies; Organisations: factions, corporations, governments, institutions, fleets, unions, authorities; Events: named incidents, treaties, conflicts, discoveries, programmes, missions, crises; Technologies: named systems, vessel classes, devices, protocols, artefacts, programmes

example_entities: Hand-curated examples per type, used to anchor LLM extraction. Format: TypeName: Example1, Example2, Example3.; People: Alex Chambers, Maya Sato, Director Chen Wei; Places: New Troy, AquaNebula, Arcadia, Yuemin District; Organisations: Jovian Union, MercuryLink, Hegemony Worlds Authority; Events: Yuemin District Unrest; Technologies: Asterion Protocol
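One way to turn `entity_types` and `example_entities` into the type-schema part of the extraction prompt is sketched below. The prompt wording, the helper names and the rule that every example type must be declared in `entity_types` are assumptions. The sample values are shortened from the fields above.

```python
def parse_typed_list(values: list[str]) -> dict[str, str]:
    """Turn 'TypeName: text' entries into {TypeName: text}; entries without a colon are skipped."""
    parsed: dict[str, str] = {}
    for entry in values:
        type_name, sep, text = entry.partition(":")
        if sep and type_name.strip():
            parsed[type_name.strip()] = text.strip()
    return parsed


def ontology_section(entity_types: dict[str, str], examples: dict[str, str]) -> str:
    """Render the type schema, anchoring each type with its curated examples when present."""
    # Assumption: example types that are not declared in entity_types are a configuration error.
    unknown = sorted(set(examples) - set(entity_types))
    if unknown:
        raise ValueError(f"example_entities uses undeclared types: {unknown}")
    lines = ["Extract named entities of the following types:"]
    for type_name, description in entity_types.items():
        line = f"- {type_name}: {description}"
        if type_name in examples:
            line += f" (examples: {examples[type_name]})"
        lines.append(line)
    return "\n".join(lines)


types = parse_typed_list(["People: named individuals, characters, officials",
                          "Events: named incidents, treaties, conflicts"])
examples = parse_typed_list(["People: Alex Chambers, Maya Sato, Director Chen Wei",
                             "Events: Yuemin District Unrest"])
print(ontology_section(types, examples))
```

Keeping the examples next to their type lets a new example be added by editing only `example_entities`, as the section above intends.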

LLM Extraction

boilerplate_filter_instruction: Appended verbatim to the LLM entity extraction prompt.; Do NOT return: volume numbers, issue numbers, page numbers, journal names, publisher names, citation fragments, partial strings, dates, generic terms, or common English words that are not proper nouns. Do NOT return single letters, abbreviations without clear referents, or entries from the pre-link stoplist above.
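The sketch below appends the boilerplate filter verbatim to the end of the prompt, then runs an optional, hypothetical post-check on the LLM's answer. The prompt layout and `post_filter` are assumptions, not part of the configured pipeline. The post-check only re-applies the mechanical rules (single characters, purely numeric strings, stoplist entries). Judgement calls such as "generic terms" are left to the LLM.

```python
import re

# Strings made only of digits, whitespace and separators: page/issue/volume numbers, numeric dates.
NUMERIC_ONLY = re.compile(r"^[\d\s.,:/-]+$")


def build_extraction_prompt(ontology: str, document: str, boilerplate_filter: str) -> str:
    """Assemble the prompt; the boilerplate filter is appended verbatim as the last block."""
    return f"{ontology}\n\nDocument:\n{document}\n\n{boilerplate_filter}"


def post_filter(entities: list[str], stoplist: list[str]) -> list[str]:
    """Hypothetical safety net that re-checks the mechanical parts of the filter on LLM output."""
    stop = {s.casefold() for s in stoplist}
    kept: list[str] = []
    for raw in entities:
        entity = raw.strip()
        if len(entity) <= 1:              # empty strings and single characters
            continue
        if NUMERIC_ONLY.match(entity):    # volume, issue and page numbers, numeric dates
            continue
        if entity.casefold() in stop:     # entries from the pre-link stoplist
            continue
        if entity not in kept:            # drop exact duplicates
            kept.append(entity)
    return kept


# The second sentence of boilerplate_filter_instruction, shortened for brevity.
BOILERPLATE_FILTER = ("Do NOT return single letters, abbreviations without clear referents, "
                      "or entries from the pre-link stoplist above.")
prompt = build_extraction_prompt("Extract named entities of the following types:",
                                 "Maya Sato met officials in New Troy.", BOILERPLATE_FILTER)
print(prompt)
print(post_filter(["Maya Sato", "X", "42", "12/05/2026", "Earth", "Asterion Protocol", "Maya Sato", " "],
                  ["Earth", "Mars"]))
```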