Instructions:Create/Source/Ingest

From Encyclopedia Ephemera
Revision as of 16:29, 12 May 2026 by EphemeraAdmin
Instruction Metadata
id create-source-ingest
type workflow
applies_to ingest
task_type
priority high
status active
canonical true
include_by_default yes
requires
tags source,ingest,entities,pre-linking


Summary

Workflow for ingesting source documents into Encyclopedia Ephemera. Extracts named entities, creates a Sources: page, and links it to related encyclopedia pages. Entity extraction uses a staged pipeline: deterministic wikilink detection first, then optional LLM calibration guided by this page's configuration.

Workflow

  1. User fills in source metadata (title, author, date, type, publisher, url, summary) in the ingestion form.
  2. User pastes the source document.
  3. System extracts explicit wikilinks from the document and validates each against the wiki.
  4. If AI extraction is enabled, the LLM identifies additional named entities using the entity type schema below.
  5. User reviews the entity list, which is split into three groups (Known / Suggested / Low Confidence), and adds or removes entities as needed.
  6. User confirms. System creates the Sources: page with metadata, Related Pages section, and provenance comment.
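Step 3 can be sketched as follows. This is a minimal Python illustration, not the extension's PHP code. It assumes standard MediaWiki link syntax ([[Target]], [[Target|label]], [[Target#Section]]) and that the set of existing page titles is already available. The helper names normalise_title, extract_wikilinks and validate_links are hypothetical.

```python
import re

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]].
# Group 1 captures the target title without the anchor or label.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def normalise_title(title: str) -> str:
    """Approximate MediaWiki title normalisation (hypothetical helper):
    underscores become spaces, runs of whitespace collapse, and the
    first character is upper-cased."""
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def extract_wikilinks(text: str) -> list[str]:
    """Return unique link targets in order of first appearance."""
    targets: list[str] = []
    for match in WIKILINK_RE.finditer(text):
        target = normalise_title(match.group(1))
        if target and target not in targets:
            targets.append(target)
    return targets


def validate_links(targets: list[str], existing_titles: set[str]) -> tuple[list[str], list[str]]:
    """Split targets into (known, missing) against the wiki's existing titles."""
    known = [t for t in targets if t in existing_titles]
    missing = [t for t in targets if t not in existing_titles]
    return known, missing
```

For a document linking [[Jovian Union]] and [[Nowhere]], validate_links would place Jovian Union in the known list and Nowhere in the missing list, assuming only the former exists as a page. How missing targets are presented during review is left open here.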

Integration Decisions

After source creation, the integration review workflow (see Instructions:Maintenance/Source Integration Review) evaluates how related encyclopedia pages should be updated. Valid decisions per candidate:

no_action
Article already covers this information adequately.
citation_only
Add a citation link only; no content change needed.
citation_with_note
Add citation plus a brief inline note.
expansion_needed
Article requires substantive expansion using this source.
contradiction_review
Source conflicts with existing article content; flag for human review.
new_page
Entity appears in source but has no encyclopedia article yet.
defer
Insufficient information to decide now; revisit when more sources exist.

Most candidates should receive no_action. Only create integration tasks for clear, high-confidence needs.
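The decision set maps naturally onto an enumeration, and the "mostly no_action" guidance maps onto a filter. The sketch below is a hypothetical Python model, not the integration review's actual code: the Candidate type, its confidence score, and the 0.8 threshold are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """The seven integration decisions listed on this page."""
    NO_ACTION = "no_action"
    CITATION_ONLY = "citation_only"
    CITATION_WITH_NOTE = "citation_with_note"
    EXPANSION_NEEDED = "expansion_needed"
    CONTRADICTION_REVIEW = "contradiction_review"
    NEW_PAGE = "new_page"
    DEFER = "defer"


# Hypothetical cutoff: the page only asks for "clear, high-confidence needs".
TASK_CONFIDENCE_THRESHOLD = 0.8

# Decisions that never produce an integration task.
NON_TASK_DECISIONS = {Decision.NO_ACTION, Decision.DEFER}


@dataclass
class Candidate:
    page: str
    decision: Decision
    confidence: float  # hypothetical 0.0 to 1.0 score from the review step


def parse_decision(value: str) -> Decision:
    """Map a decision string to the enum; raises ValueError for unknown values."""
    return Decision(value.strip())


def integration_tasks(candidates: list[Candidate]) -> list[Candidate]:
    """Keep only actionable, high-confidence candidates; all others create no task."""
    return [
        c for c in candidates
        if c.decision not in NON_TASK_DECISIONS
        and c.confidence >= TASK_CONFIDENCE_THRESHOLD
    ]
```

Treating defer like no_action means neither creates a task, which matches their descriptions; a deployment might instead route contradiction_review to human review regardless of score.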

Pre-Linking Configuration

The PHP backend reads the values below at ingestion time. Edit them to tune behaviour for your deployment.

source_subtypes
Available source type options for the ingestion form dropdown.
News Article
Interview
Personal Log
Official Statement
Academic Paper
Corporate Advertisement
Government Resolution
Government Report
Propaganda Broadcast
Legal Document
pre_link_min_title_length
Skip wiki titles shorter than this character count. Default: 4.
4
pre_link_stoplist
Wiki page titles to skip even when they appear in a document. These titles match too broadly: they are ordinary words that are also article names but should not be auto-linked.
Source
Project
Help
Template
Category
Sol
Earth
Mars
Energy
Field
Law
Station
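The minimum length and the stoplist combine into a simple filter applied to wiki titles before any matching happens. A hypothetical Python sketch follows; case-insensitive stoplist comparison and whole-word matching in the document are assumptions about how the PHP code behaves, not documented behaviour.

```python
import re

# Values copied from this page; a real deployment would load them from the wiki.
PRE_LINK_MIN_TITLE_LENGTH = 4
PRE_LINK_STOPLIST = {
    "Source", "Project", "Help", "Template", "Category", "Sol",
    "Earth", "Mars", "Energy", "Field", "Law", "Station",
}


def prelink_candidates(titles, min_length=PRE_LINK_MIN_TITLE_LENGTH, stoplist=PRE_LINK_STOPLIST):
    """Drop titles that are too short or stoplisted (case-insensitive, an assumption)."""
    stop = {s.casefold() for s in stoplist}
    return [t for t in titles if len(t) >= min_length and t.casefold() not in stop]


def find_mentions(text, titles):
    """Return titles that occur in text as whole words or phrases (case-sensitive)."""
    found = []
    for title in titles:
        # Lookarounds stop "Arcadia" from matching inside "Arcadians".
        pattern = r"(?<!\w)" + re.escape(title) + r"(?!\w)"
        if re.search(pattern, text):
            found.append(title)
    return found
```

Under these assumptions, "Arcadia Station" in a document links Arcadia but not Station, and "Sol" is excluded twice over, by length and by the stoplist.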
pre_link_prefixes
Honorifics, titles, and leading articles to strip when matching entity names. One entry per line.
Dr.
Prof.
Cmdr.
Admiral
Captain
Director
Chief
Minister
Secretary
Commissioner
The
A
An
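Because the list mixes honorifics with short articles such as "A", prefixes are best compared as whole words rather than as raw string prefixes. A minimal Python sketch, assuming exact, case-sensitive token comparison; the function name strip_prefixes is hypothetical.

```python
# Values copied from this page.
PRE_LINK_PREFIXES = {
    "Dr.", "Prof.", "Cmdr.", "Admiral", "Captain", "Director", "Chief",
    "Minister", "Secretary", "Commissioner", "The", "A", "An",
}


def strip_prefixes(name: str, prefixes=PRE_LINK_PREFIXES) -> str:
    """Remove leading prefix tokens, compared as whole words, so that
    "A" never truncates "Arcadia". Always keeps at least one word."""
    words = name.split()
    while len(words) > 1 and words[0] in prefixes:
        words.pop(0)
    return " ".join(words)
```

Matching could then compare strip_prefixes(mention) with strip_prefixes(title), so that "Director Chen Wei" in a document can match a page titled "Chen Wei". Stripping repeated prefixes, as this sketch does for "Chief Minister Alex Chambers", is a design choice; it can also over-strip names where the article is part of the name.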

Entity Ontology

These fields tell the LLM what kinds of named entities Encyclopedia Ephemera tracks, and provide examples to anchor its extraction. Edit examples as the wiki grows.

entity_types
Type schema for LLM entity extraction. Format: TypeName: short description.
People: named individuals — characters, officials, journalists, scientists, historical figures
Places: locations, regions, habitats, stations, settlements, orbital structures, planetary bodies
Organisations: factions, corporations, governments, institutions, fleets, unions, authorities
Events: named incidents, treaties, conflicts, discoveries, programmes, missions, crises
Technologies: named systems, vessel classes, devices, protocols, artefacts, programmes
example_entities
Hand-curated examples per type, used to anchor LLM extraction. Format: TypeName: Example1, Example2, Example3.
People: Alex Chambers, Maya Sato, Director Chen Wei
Places: New Troy, AquaNebula, Arcadia, Yuemin District
Organisations: Jovian Union, MercuryLink, Hegemony Worlds Authority
Events: Yuemin District Unrest
Technologies: Asterion Protocol
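Both fields use a "TypeName: value" line format, so one parser can read them. The sketch below is a hypothetical Python reading of that format: it keeps the whole description for entity_types, splits example_entities on commas, and skips lines without a colon (an assumption; the real parser may reject them). It also flags example groups whose type is missing from the schema.

```python
def parse_typed_lines(lines, split_values=False):
    """Parse "TypeName: value" lines into a dict.

    With split_values=True the value is split on commas (example_entities
    format); otherwise the description is kept whole (entity_types format).
    """
    result = {}
    for line in lines:
        # Split on the first colon only; descriptions may contain more text.
        type_name, sep, value = line.partition(":")
        if not sep:
            continue  # assumption: malformed lines are skipped silently
        type_name, value = type_name.strip(), value.strip()
        if split_values:
            result[type_name] = [v.strip() for v in value.split(",") if v.strip()]
        else:
            result[type_name] = value
    return result


def untyped_examples(entity_types, example_entities):
    """Return example groups whose type name is not in the schema."""
    return sorted(set(example_entities) - set(entity_types))
```

A check like untyped_examples is one way to catch a renamed type in entity_types that was not also renamed in example_entities.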

LLM Extraction

boilerplate_filter_instruction
Appended verbatim to the LLM entity extraction prompt.
Do NOT return: volume numbers, issue numbers, page numbers, journal names, publisher names, citation fragments, partial strings, dates, generic terms, or common English words that are not proper nouns. Do NOT return single letters, abbreviations without clear referents, or entries from the pre-link stoplist above.
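The ontology fields and this instruction come together in a single extraction prompt. A minimal sketch follows, assuming a plain-text prompt layout; the section headings and the function name build_extraction_prompt are hypothetical. The instruction refers to "the pre-link stoplist above", which the model can only honour if the stoplist itself appears earlier in the prompt, so the sketch lists it before appending the instruction verbatim at the end.

```python
def build_extraction_prompt(document, entity_types, example_entities, stoplist, boilerplate_filter):
    """Assemble a hypothetical extraction prompt.

    entity_types maps type name to description; example_entities maps
    type name to a list of example names. The stoplist is listed before
    the filter instruction so its reference to "the pre-link stoplist
    above" resolves; the filter text is appended verbatim as the final section.
    """
    parts = ["Extract named entities from the document below.", "", "Entity types:"]
    parts += [f"- {name}: {desc}" for name, desc in entity_types.items()]
    parts += ["", "Examples:"]
    parts += [f"- {name}: {', '.join(ex)}" for name, ex in example_entities.items()]
    parts += ["", "Pre-link stoplist: " + ", ".join(stoplist)]
    parts += ["", "Document:", document, "", boilerplate_filter]
    return "\n".join(parts)
```

Appending the filter last keeps it closest to the model's output, though the page does not specify where in the prompt the PHP code places it.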