Creating an Extended Matrix from Different Sources
The Extended Matrix (EM) can be created through several pathways, each
suited to different project needs and workflows. This guide covers the
supported methods, focusing on the unified em_data.xlsx flow
introduced in EMtools 1.5 / s3Dgraphy 1.6.
The Knowledge Tree
The Extended Matrix knowledge system works like a tree: the GraphML file is the trunk, providing the stratigraphic sequence, chronological scaffolding, and fundamental relationships between units. The leaves are the detailed, granular data that give richness to each stratigraphic unit: definitions, interpretations, materials, measurements, dating evidence, and so on.
A single em_data.xlsx file carries both the trunk and the leaves
for a given graph. It is consumed in one pass by the UnifiedXLSXImporter
to produce a complete s3Dgraphy graph, ready to be written as GraphML
(for yEd editing) or merged into an already-loaded GraphML (conflict
resolution).
See also
For a full explanation of this architecture, see The Knowledge Tree in the Extended Matrix documentation.
Overview
Three parallel paths are supported for creating and evolving an EM graph:
From GraphML (yEd) — Manual creation or editing of the GraphML file with the yEd graph editor. Full control over the graph structure, traditional stratigrapher-driven workflow.
From em_data.xlsx — Structured tabular input using the unified 5-sheet schema. Two sub-paths:
AI-assisted — copy the StratiMiner prompt into Claude / ChatGPT / Gemini, attach the PDFs, paste the returned xlsx.
Manual — save the empty template and fill it by hand, ideal for migrating pre-existing archaeological databases with explicit stratigraphic relations.
From existing databases — Import from pyArchInit and other tabular sources via the s3Dgraphy mapping system. See From Existing Databases at the end of this page.
Paths 1 and 2 converge on the same in-memory graph and can be mixed freely in the same project.
From GraphML (yEd)
The traditional method for creating an EM is to use the yEd Graph Editor to manually build the GraphML file. This approach gives full control over the graph structure and is well-suited for:
Small to medium stratigraphic sequences;
Projects where the stratigrapher directly builds the graph;
Fine-tuning and validation of automatically generated graphs.
For details on the GraphML structure and node types, see EM Data Tree.
Note
For a comprehensive guide on the Extended Matrix formal language, node types, and how to construct a valid EM graph, refer to the Extended Matrix documentation. The nodes introduction and stratigraphic nodes pages are particularly useful for understanding what each node type represents.
From em_data.xlsx (Unified schema)
The unified xlsx format is a single file with five typed sheets
that together describe both the stratigraphic skeleton and its full
paradata chain. It replaces the legacy two-file workflow
(stratigraphy.xlsx + em_paradata.xlsx) used by earlier EMtools
versions.
The StratiMiner panel in the EMtools EM Bridge tab offers two ways
to create an em_data.xlsx and two ways to use one.
Create em_data.xlsx
Option A — AI-assisted
Open EM Bridge → StratiMiner (Experimental) (requires the Experimental Features flag).
Under CREATE em_data.xlsx, set the Language (default: the same as the source document) and the Documents folder pointing at the directory that holds the source PDFs.
Select the optional toggles:
Validation script: includes a Python snippet the AI must run on its output to catch duplicates, missing references, missing COMBINER_REASONING, and stratigraphic cycles. Strongly recommended.
End-of-session checklist: the AI-side QA list for the final handoff.
Include stratigraphy-only mode: appends an extra section describing the reduced flow for legacy databases with no paradata attribution. Enable only if your source data matches that case.
Click Copy StratiMiner Prompt. The prompt is placed in the clipboard, with the documents-folder path injected and all the toggles applied.
Paste the prompt into your AI assistant (Claude, ChatGPT, Gemini) together with the PDFs. The AI returns a single
em_data.xlsx.
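One of the checks attributed to the validation script, the search for stratigraphic cycles, can be illustrated with a short depth-first search. This is a minimal sketch and not the snippet shipped with the StratiMiner prompt: the function name `find_cycle`, the edge-pair input, and the example unit ids are all assumptions made for illustration, and every relation is assumed to have been oriented the same way beforehand (for example "source is later than target").

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle as a list of unit ids, or None if there is none.

    `edges` is an iterable of (source_id, target_id) pairs, all oriented
    the same way. The input shape is hypothetical, chosen for the sketch.
    """
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = defaultdict(int)

    def visit(node, path):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            if colour[nxt] == GREY:
                # nxt is already on the current path: the loop is closed
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt, path)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(graph):
        if colour[start] == WHITE:
            cycle = visit(start, [])
            if cycle:
                return cycle
    return None
```

The recursive form keeps the sketch short; for very long sequences an iterative traversal would avoid Python's recursion limit.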
Option B — Manual
Click Save em_data.xlsx Template under Option B. A Save dialog opens; choose a directory. An empty em_data_template.xlsx is copied from the s3Dgraphy package.
Open the template in Excel or LibreOffice. Every header cell carries a tooltip that describes the expected content.
Fill the five sheets (see The 5-sheet schema below). The minimal required content is at least one row in each of Units, Authors, and Claims.
Both options produce the same file format and are interchangeable.
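Before handing a hand-filled file to the importer, the minimal-content rule (at least one row in Units, Authors, and Claims) can be checked with a few lines of Python. The sketch below is an illustration under stated assumptions, not part of EMtools or s3Dgraphy: the function name is hypothetical, the workbook is represented as a plain mapping of sheet name to rows, and the first row of each sheet is assumed to be the header. With openpyxl, such rows could be obtained from `worksheet.iter_rows(values_only=True)`.

```python
REQUIRED_NON_EMPTY = ("Units", "Authors", "Claims")


def missing_minimal_content(workbook):
    """Return the required sheets that are absent or contain no data row.

    `workbook` maps a sheet name to a list of rows (sequences of cell
    values). Row 0 is assumed to be the header and is skipped.
    """
    problems = []
    for sheet in REQUIRED_NON_EMPTY:
        rows = list(workbook.get(sheet, []))
        # A data row counts only if at least one cell is actually filled.
        data_rows = [
            row for row in rows[1:]
            if any(cell not in (None, "") for cell in row)
        ]
        if not data_rows:
            problems.append(sheet)
    return problems
```

An empty result means the file meets the minimum; any returned sheet name still needs at least one filled row.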
Use em_data.xlsx
Path A — Build a brand-new GraphML
Under USE em_data.xlsx, pick the em_data.xlsx file.
Optionally tick Also write .graphml on import and pick the output path. (The panel auto-suggests one next to the xlsx.)
Click Build GraphML from em_data.xlsx. The xlsx is parsed by UnifiedXLSXImporter into a fresh in-memory graph and, if you enabled the option, immediately written out as .graphml.
The resulting .graphml can be opened in yEd for visual editing, or imported back into EMtools via the standard Import EM file flow.
Path B — Merge into an already-loaded GraphML
Make sure a GraphML is loaded and active in the EM tree tab.
Under USE em_data.xlsx → Merge into active GraphML, click Merge into Active Graph….
A file picker opens; select the em_data.xlsx. The merger auto-detects the unified 5-sheet schema (falling back to the legacy stratigraphy.xlsx format for backward compatibility) and compares it with the active graph.
Differences surface in the Conflict Resolution panel: qualia added, qualia value changed, new per-claim attribution sources, added authors / documents / epochs, relation-edge attribution changes. You accept or reject each conflict.
Apply the resolutions; accepted changes are written into the active in-memory graph. Save the graph with Save GraphML / Save As… to persist the merged state to disk.
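The compare-then-accept logic of the merge can be pictured with plain dictionaries. The sketch below is only a conceptual model, not the EMtools merger: both function names and the `{unit_id: {property: value}}` layout are assumptions, and it covers just the "qualia added" and "qualia value changed" cases.

```python
def diff_qualia(active, incoming):
    """Compare two {unit_id: {property: value}} mappings.

    Returns (kind, unit_id, property, old_value, new_value) tuples, where
    kind is "added" or "changed". Properties absent from `incoming` are
    not reported: the sketch only looks at what the xlsx adds or alters.
    """
    conflicts = []
    for unit_id, props in incoming.items():
        current = active.get(unit_id, {})
        for name, new_value in props.items():
            if name not in current:
                conflicts.append(("added", unit_id, name, None, new_value))
            elif current[name] != new_value:
                conflicts.append(
                    ("changed", unit_id, name, current[name], new_value)
                )
    return conflicts


def apply_accepted(active, conflicts, accepted):
    """Write only the accepted conflicts (given by index) into `active`."""
    for index in accepted:
        _kind, unit_id, name, _old, new_value = conflicts[index]
        active.setdefault(unit_id, {})[name] = new_value
    return active
```

Rejected conflicts are simply never applied, so the active mapping keeps its previous value for them.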
The 5-sheet schema
An em_data.xlsx file has exactly five sheets, in this order:
1. Units — the stratigraphic skeleton
| Column | Required | Description |
|---|---|---|
| | Yes | Unique unit id ( |
| | Yes | Stratigraphic class: |
| | No | Short human label. Falls back to |
Units only declares the existence of a node. Every fact about it
(dimensions, materials, datation, relationships) goes into the
Claims sheet.
2. Epochs — swimlanes and non-overlapping phases
| Column | Required | Description |
|---|---|---|
| | Yes | Short phase code ( |
| | Yes | Human-readable name ( |
| | Yes | Start year as an integer (negative = BCE) |
| | Yes | End year as an integer |
| | No | Swimlane fill colour ( |
Epochs must be non-overlapping. A unit that spans multiple phases
still gets a single belongs_to_epoch claim pointing at its primary
phase; additional survival spans are handled by survive_in_epoch
edges added by the downstream chronology resolver.
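The non-overlap rule is easy to verify mechanically once the start and end years are available as integers. The following sketch assumes each epoch is given as a `(code, start_year, end_year)` tuple; the function name, the tuple layout, and the example codes are hypothetical. It also assumes that two epochs sharing a single boundary year (one ends in the year the next begins) are acceptable, which may need tightening for a given project.

```python
def overlapping_epochs(epochs):
    """Return pairs of epoch codes whose year ranges overlap.

    `epochs` is a list of (code, start_year, end_year) tuples with integer
    years (negative = BCE) and start_year <= end_year.
    """
    ordered = sorted(epochs, key=lambda e: (e[1], e[2]))
    clashes = []
    for i, (code_a, _start_a, end_a) in enumerate(ordered):
        for code_b, start_b, _end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start: no later epoch can overlap code_a
            clashes.append((code_a, code_b))
    return clashes
```

For example, with epochs `("P1", -100, 50)`, `("P2", 50, 300)` and `("P3", 250, 476)`, only the pair `("P2", "P3")` is reported.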
3. Claims — the long-table, one row per asserted fact
Every piece of information about a unit (or an epoch) lives here. A row carries one of four kinds of content:
Scalar qualia: PROPERTY_TYPE ∈ definition, material_type, length, width, height, shape, conservation_state, interpretation, comparanda, …
Temporal qualia: absolute_time_start / absolute_time_end. These feed the DP-32 chronology resolver.
Epoch membership: belongs_to_epoch with TARGET2_ID pointing at an Epochs.ID.
Stratigraphic relation: overlies, cuts, fills, abuts, bonded_to, equals, is_after, …; TARGET_ID is the source endpoint, TARGET2_ID the target endpoint.
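The four kinds of row can be told apart from the claim type alone. The sketch below is a hedged illustration, not s3Dgraphy code: it assumes that relation names and belongs_to_epoch are stored in the same PROPERTY_TYPE column as the qualia, it represents a row as a dictionary keyed by column name, and its relation set contains only the names listed above (the real list is longer). Both function names are hypothetical.

```python
TEMPORAL = {"absolute_time_start", "absolute_time_end"}
# Only the relations named in the text; the actual vocabulary is longer.
RELATIONS = {
    "overlies", "cuts", "fills", "abuts", "bonded_to", "equals", "is_after",
}


def classify_claim(row):
    """Return which of the four kinds of content a Claims row carries."""
    prop = row["PROPERTY_TYPE"]
    if prop == "belongs_to_epoch":
        return "epoch_membership"
    if prop in RELATIONS:
        return "stratigraphic_relation"
    if prop in TEMPORAL:
        return "temporal_qualia"
    return "scalar_qualia"


def missing_target2(row):
    """True when a row that needs a second endpoint has no TARGET2_ID."""
    kind = classify_claim(row)
    needs_second = kind in {"epoch_membership", "stratigraphic_relation"}
    return needs_second and not row.get("TARGET2_ID")
```

Treating everything that is not recognised as a scalar quale mirrors the open-ended list of scalar properties; a stricter validator would check against an explicit vocabulary instead.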
Each row also carries its own per-claim attribution:
| Column group | Fields | Meaning |
|---|---|---|
| Attribution #1 | | The verbatim excerpt ( |
| Attribution #2 (optional) | | Second converging source. When both #1 and #2 are populated, |
4. Authors — the normalized author catalog
| Column | Required | Description |
|---|---|---|
| | Yes | |
| | Yes | |
| | No | Human-readable display ( |
| | No | ORCID for humans; model version / pipeline id for AI agents |
| | No | Institutional affiliation |
5. Documents — the normalized source catalog
| Column | Required | Description |
|---|---|---|
| | Yes | |
| | Yes | Filename on disk |
| | No | Full bibliographic title |
| | No | Publication year |
| | No | Comma-separated |
From Existing Databases
EMtools supports import from archaeological database systems via s3Dgraphy’s mapping system.
pyArchInit
pyArchInit is an archaeological information system based on QGIS.
Architecture
The integration uses three independent layers — each one can be maintained, upgraded, or replaced without breaking the others.
+------------------------------------------------------------------+
| |
| PyArchInit project (QGIS plugin) |
| - Stratigraphic records, 2D GIS data |
| - Maintained by Luca Mandolesi and the PyArchInit community |
| |
+-----------------------------+------------------------------------+
|
| s3Dgraphy library (mapping layer)
| - Reads the PyArchInit database
| - Either references records live,
| or bakes them into the EM graph
| - pyarchinit_us_mapping is the
| canonical mapping for the US table
|
v
+------------------------------------------------------------------+
| |
| EM-Tools (Blender add-on) |
| - Consumes the s3Dgraphy graph |
| - Drives the Extended Matrix workflow |
| |
+------------------------------------------------------------------+
Two integration modes
Connection mode (recommended for live projects) — the PyArchInit database stays the source of truth for stratigraphic records. s3Dgraphy reads it on demand. Changes in PyArchInit propagate to EM on the next read.
Bake mode — the PyArchInit records are imported once into the EM graph as auxiliary nodes. Subsequent edits happen on the EM side. Useful for archive projects or for finalised excavations.
Operational workflow
There are two ways to use pyArchInit data with the Extended Matrix:
1. Generate GraphML from pyArchInit (creating the trunk)
pyArchInit has a built-in tool that can export stratigraphic data directly as a GraphML file in Extended Matrix format. This is the recommended approach when you want to create a new EM graph from an existing pyArchInit database. See the pyArchInit documentation on the Harris Matrix for Extended Matrix Tool (in Italian).
2. Import pyArchInit as auxiliary file (adding leaves)
When you already have a GraphML and want to enrich it with property
data from a pyArchInit database, you can add it as an auxiliary
file in EMtools. In this mode, the pyArchInit SQLite database is
imported using the pyarchinit mapping type, and properties are
added to existing graph nodes (matched by unit ID). The graph
structure is not modified.
To import as auxiliary:
Import your GraphML into EMtools first
In the EM Data Tree panel, add an auxiliary file
Select file type pyArchInit
Select the SQLite database file
Choose the appropriate mapping (pyarchinit_us_mapping)
Click Import: properties from the database are added to matching nodes
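The matching rule behind the auxiliary import (properties attached to existing nodes by unit id, graph structure untouched) can be modelled in a few lines. This is a conceptual sketch only, not the s3Dgraphy mapping code: the function name and data layout are hypothetical, and the choice to keep a value that the node already has is an assumption, since the behaviour on clashes is not stated here.

```python
def add_properties(nodes, records):
    """Attach database fields to nodes that share the same unit id.

    `nodes` maps unit id -> property dict (standing in for graph nodes);
    `records` is an iterable of (unit_id, fields) pairs from the database.
    Returns the unit ids found in the database but not in the graph.
    """
    unmatched = []
    for unit_id, fields in records:
        if unit_id not in nodes:
            # No node is created, so the graph structure stays as it was.
            unmatched.append(unit_id)
            continue
        for name, value in fields.items():
            # Assumption: a value already on the node is kept, not replaced.
            nodes[unit_id].setdefault(name, value)
    return unmatched
```

The returned list is a convenient way to spot database records whose unit id has no counterpart in the GraphML.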
Legacy two-file workflow (deprecated)
Before EMtools 1.5 the AI-assisted flow produced two files
(stratigraphy.xlsx + em_paradata.xlsx) that had to be imported
in separate steps. That workflow is deprecated but still usable for
backward compatibility:
The MappedXLSXImporter + QualiaImporter pair is still shipped and registered.
The em.merge_xlsx_start operator auto-detects the legacy schema (sheet named Stratigraphy) and falls back to the old importer path.
Legacy xlsx files can be converted to the unified schema by importing them with the legacy pair and exporting the resulting graph to em_data.xlsx via UnifiedXLSXExporter.
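The schema auto-detection can be approximated by looking at sheet names only. The sketch below is an assumption-laden illustration rather than the logic of em.merge_xlsx_start: the function name is hypothetical, and the five unified sheet names are taken from the section headings of this page and assumed to match the workbook exactly. With openpyxl, the list of names is available as `workbook.sheetnames`.

```python
UNIFIED_SHEETS = ["Units", "Epochs", "Claims", "Authors", "Documents"]


def detect_schema(sheet_names):
    """Classify a workbook as "unified", "legacy" or "unknown"."""
    names = list(sheet_names)
    if names == UNIFIED_SHEETS:
        # Exactly the five sheets, in the documented order.
        return "unified"
    if "Stratigraphy" in names:
        # The marker sheet of the deprecated two-file workflow.
        return "legacy"
    return "unknown"
```

Checking the unified layout first means a file is routed to the legacy importers only when it does not match the five-sheet schema.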
New projects should use the unified em_data.xlsx schema from the
start.