em_data

The em_data.xlsx workbook is the tabular root of the Extended Matrix: the place where the leaves of the knowledge tree live as spreadsheet rows. Alongside the GraphML file (the visual root) and the JSON export (the publication root), it is one of the three file-shaped surfaces from which an s3D Graph can be reconstructed in memory and from which all downstream artifacts (em-graph in yEd, Heriverse JSON, the CIDOC triplestore) are derived.

This page is the canonical reference for the workbook itself: its sheets, its conventions, and the contracts each concept page (Property, Document, Author, License, Activity, Location, …) satisfies when it projects itself onto a column.

Note

This is a living reference. The base shape (5 sheets, the Units / Epochs / Claims / Authors / Documents skeleton already described in The Knowledge Tree) is stable. The per-concept column contracts are the edge that grows: each new concept page in this manual closes one such contract. Readers following an older revision of the workbook should look at the concept pages first — they are now the source of truth for which columns belong to which concept.

Why a tabular companion to the graph?

The graph is good at structure: nodes, typed arcs, propagation rules. A spreadsheet is good at the leaves: descriptions, dates, dimensions, literature references — the long-form data that no graph editor can edit pleasantly. em_data.xlsx is the single place where those leaves live, so that:

  • Humans can author them with familiar tools (Excel, LibreOffice, Google Sheets, pandas).

  • AI extractors (StratiMiner is the canonical one — see The Knowledge Tree) can produce them as their natural output.

  • The s3Dgraphy UnifiedXLSXImporter can ingest them into an in-memory s3D Graph from which a GraphML, a JSON, or a CIDOC triplestore can all be regenerated deterministically.

The contract is bidirectional: the workbook is both an input format (plant the matrix from a spreadsheet) and an output format (export the in-memory graph back into a workbook for hand-editing). Round-trip is preserved as long as you stick to the conventions on this page.

Workbook shape

The base em_data.xlsx has five sheets:

| Sheet | Purpose |
|---|---|
| Units | One row per stratigraphic unit (real or virtual): ID, type (US / USVs / USVn / SF / VSF / USD …), description, interpretation, building technique, materials, dimensions, and the per-concept columns described below (epochs, locations, activities, …). |
| Epochs | The temporal scaffolding: epoch ID, name, start / end, description. Referenced from Units and from per-property temporal stamps. |
| Claims | The paradata layer: each row is one interpretive claim about a Unit (e.g. an attribution of dimension, a chronological inference, a building-technique determination), with the Authors and Documents that justify it. |
| Authors | Registry of human and AI authors. Referenced by ID from Claims and from header metadata. |
| Documents | Registry of source documents (photographs, reports, drawings, analyses), with role / content_nature / geometry classifiers and a relative path into DosCo/. Referenced by ID from Claims and from interpretive columns. |
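As a quick sanity check before import, the five-sheet shape can be verified from a plain list of sheet names. The sketch below is illustrative only: `EXPECTED_SHEETS` and `missing_sheets` are hypothetical names, not part of s3Dgraphy, and it assumes that sheet names match exactly (case-sensitive) and that extra sheets are tolerated. With openpyxl, the list of names could come from `load_workbook(path).sheetnames`.

```python
# The five sheet names of the base workbook, in the order used on this page.
EXPECTED_SHEETS = ("Units", "Epochs", "Claims", "Authors", "Documents")


def missing_sheets(sheet_names):
    """Return the expected sheets that are absent from sheet_names.

    Assumption: names are compared exactly; extra sheets are ignored.
    """
    present = set(sheet_names)
    return [name for name in EXPECTED_SHEETS if name not in present]


# A complete workbook reports nothing missing.
print(missing_sheets(["Units", "Epochs", "Claims", "Authors", "Documents"]))
# An incomplete one lists the gaps in table order.
print(missing_sheets(["Units", "Claims", "Notes"]))
```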

For the broader project layout (my_project.graphml / em_data.xlsx / source_list.xlsx / DosCo/) see Nodes overview and Project Organization and Workflow.

Authoring conventions

Four small conventions make the workbook robust under round-trip and under collaboration. Every concept page in this manual assumes them.

Multi-valued cells

When a column accepts more than one value, the values are separated by a semicolon (;) followed by a single space:

topo:Pompei; study:Saggio4/A2; func:Casa_del_Fauno/Room12

Trailing semicolons are tolerated. Quoting is not required — the separator is reserved.
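The convention above can be sketched as a pair of helpers, one for reading a cell and one for writing it back. `split_cell` and `join_cell` are hypothetical names chosen for this example, not functions of s3Dgraphy; the sketch also assumes that an empty or missing cell means "no values".

```python
def split_cell(cell):
    """Split a multi-valued cell on ';' and drop empty fragments.

    Whitespace around each value is stripped, so a trailing semicolon
    or a missing space after the separator is tolerated on input.
    """
    if cell is None:
        return []
    parts = (part.strip() for part in cell.split(";"))
    return [part for part in parts if part]


def join_cell(values):
    """Serialise values back into the canonical '; ' separated form."""
    return "; ".join(values)


cell = "topo:Pompei; study:Saggio4/A2; func:Casa_del_Fauno/Room12;"
values = split_cell(cell)
print(values)
print(join_cell(values))
```

Writing cells through a single canonical serialiser is one way to keep the round trip stable: whatever spacing a human typed, the exported cell always comes back in the same form.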

Hierarchical paths

When a value is hierarchical, its levels are joined by a forward slash (/), root first, with no surrounding spaces:

Pompei/Sector_4/Casa_del_Fauno/Room_12

Slashes inside a single level are not supported — use underscores or camel case to disambiguate.
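A minimal sketch of how such a path might be split and checked follows. The names `split_path` and `ancestors` are hypothetical, and the choice to reject empty levels and levels with surrounding spaces (rather than silently repairing them) is an assumption of this example.

```python
def split_path(path):
    """Split a hierarchical value into its levels, root first.

    Raises ValueError for empty levels or levels padded with spaces.
    """
    levels = path.split("/")
    for level in levels:
        if not level or level != level.strip():
            raise ValueError(f"malformed level {level!r} in path {path!r}")
    return levels


def ancestors(path):
    """Return every prefix of the path, from the root down to the path itself."""
    levels = split_path(path)
    return ["/".join(levels[: i + 1]) for i in range(len(levels))]


print(split_path("Pompei/Sector_4/Casa_del_Fauno/Room_12"))
print(ancestors("Pompei/Sector_4/Casa_del_Fauno"))
```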

Kind prefixes

When a column carries values that belong to multiple epistemic planes, each value is prefixed with a short kind tag and a colon. The Location concept uses this convention with three tags (topo:, study:, func:); the same pattern is reused by other concepts that need plane-discrimination on the same axis. See the relevant concept page for the tag vocabulary.
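To illustrate, a prefixed cell can be grouped by kind tag as sketched below. `LOCATION_KINDS`, `parse_tagged`, and `group_by_kind` are hypothetical names; the tag set is the one this page gives for Location, and treating an unknown or missing tag as an error is an assumption of the sketch.

```python
# Tag vocabulary stated on this page for the Location concept.
LOCATION_KINDS = frozenset({"topo", "study", "func"})


def parse_tagged(value, allowed=LOCATION_KINDS):
    """Split 'kind:rest' at the first colon and check the kind tag."""
    kind, sep, rest = value.partition(":")
    if not sep or kind not in allowed or not rest:
        raise ValueError(f"expected '<kind>:<value>' with kind in {sorted(allowed)}, got {value!r}")
    return kind, rest


def group_by_kind(cell, allowed=LOCATION_KINDS):
    """Group the values of a multi-valued cell by their kind tag."""
    groups = {}
    for raw in cell.split(";"):
        raw = raw.strip()
        if not raw:
            continue  # tolerate trailing semicolons
        kind, rest = parse_tagged(raw, allowed)
        groups.setdefault(kind, []).append(rest)
    return groups


print(group_by_kind("topo:Pompei; study:Saggio4/A2; func:Casa_del_Fauno/Room12"))
```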

Primary value

When a column accepts multiple values and the formalism allows one of them to be primary (the typical case for membership relations rendered as yEd group folders in em-graph), the first value is treated as primary by default. Concept pages that override this default say so explicitly.
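The default rule can be sketched in a few lines. `primary_and_rest` is a hypothetical helper, and returning `None` for an empty cell is an assumption of this example rather than documented behaviour.

```python
def primary_and_rest(cell):
    """Return (primary, others) for a multi-valued cell.

    The first value is primary by default; an empty cell has no primary.
    """
    values = [v.strip() for v in cell.split(";") if v.strip()]
    if not values:
        return None, []
    return values[0], values[1:]


primary, others = primary_and_rest("topo:Pompei; study:Saggio4/A2")
print(primary)
print(others)
```

Because the primary value is positional, reordering the values in a cell changes its meaning, so tools that rewrite cells should preserve the original order.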

Concept-by-concept projection

Each concept formalised in this manual describes its own column contract on its dedicated page. The pages currently closed are:

| Concept | Sheet · column(s) | Manual page |
|---|---|---|
| Location | Units · location_paths | Location |
| Activity | Units · activity_path | Activity Nodes Group |
| (more to come) | This list grows as concept pages migrate to the new template. | |

If you are looking for a column that does not appear above yet, the authoritative source is the column header in the workbook produced by s3Dgraphy’s exporter; the concept-page contracts will catch up during the EM 1.5 / 1.6 documentation refactor.

Round-trip with AI extraction

The Knowledge Tree page describes how an em_data.xlsx can be produced by AI from PDFs and field notes via the StratiMiner prompt. The conventions on this page are the contract that the AI output is held to: a workbook produced by StratiMiner that respects multi-valued cells, hierarchical paths, and kind prefixes is ingestible by s3Dgraphy’s UnifiedXLSXImporter without manual post-processing. Concept pages that introduce a new column also publish the prompt fragment that StratiMiner uses to populate it.

See also

  • The Knowledge Tree — the wider metaphor: graph as trunk, workbook as one of the three roots, AI / yEd / pyArchInit as entry points.

  • Nodes overview — the project-folder shape that hosts em_data.xlsx alongside the GraphML and the DosCo/ directory.

  • Location — first concept page that closes a column contract on this template.