Document Nodes: Managing Archaeological Sources
Introduction
Document nodes (also known as source nodes) are fundamental elements in the Extended Matrix framework, representing primary and secondary sources that support our archaeological interpretations. As described in the paradata nodes section, they form part of the validation chain for archaeological properties. This chapter provides an operational deep-dive into how to effectively manage and organize these sources in practice. Each document is assigned a unique identifier (e.g., “D.01”, “D.02”) that serves as a reference throughout the documentation process.
Note
The structured version of the Document (Source) Nodes is available in JSON format at EM Blender Tools - Document Types JSON.
3D representation of Document Nodes
Document nodes can be represented in 3D space as a collection of digital assets, each corresponding to a specific source. These assets can be visualized in a virtual environment, providing a spatial representation of the documentation sources. The 3D representation is normally created within the context of a 3D model of the archaeological site or object. In the EM framework, it is used to visualize the spatial distribution of sources and their relationships to the archaeological properties they validate. The assets are created in Blender and can be exported along with the overall scene in glTF format for reuse in EMviq or in the Heriverse web app.
Document Types and Classification
The Extended Matrix framework organizes documentation into standardized categories, each mapped to established cultural heritage vocabularies.
Spatial Documentation
- Getty AAT:
- CIDOC CRM: E36_Visual_Item with property P67_refers_to
- Dublin Core: dcterms:spatial
Documents containing spatial information and measurements:
- 3D Models
Formats: GLTF, OBJ, PLY, FBX, 3DS, E57
Key extractions: dimensions, spatial relationships, geometric features
- Required metadata:
creation_date
creator
software_used
coordinate_system
spatial_resolution
- Optional metadata:
accuracy_assessment
processing_workflow
registration_method
point_cloud_density
- Supported extractors:
3D model analysis
Geometric analysis
Spatial pattern analysis
- Technical Drawings
Formats: DWG, DXF, PDF, SVG
Key extractions: dimensions, construction details, spatial layout
- Required metadata:
creation_date
author
scale
drawing_type
reference_system
- Optional metadata:
revision_history
drawing_conventions
associated_specifications
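The format lists above can be used to suggest a document type from a file extension. The following is a minimal sketch, not part of EM Blender Tools: the names `SPATIAL_FORMATS` and `candidate_spatial_types` are hypothetical, and the mapping simply copies the formats listed for the two spatial document types.

```python
from pathlib import Path

# Format lists copied from the Spatial Documentation category above.
# The dictionary name and structure are illustrative assumptions.
SPATIAL_FORMATS = {
    "3D Models": {"gltf", "obj", "ply", "fbx", "3ds", "e57"},
    "Technical Drawings": {"dwg", "dxf", "pdf", "svg"},
}


def candidate_spatial_types(filename: str) -> list[str]:
    """Return the spatial document types whose format list includes the file's extension."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return [
        doc_type
        for doc_type, formats in SPATIAL_FORMATS.items()
        if extension in formats
    ]
```

Because some formats (PDF in particular) also appear under other categories, the result is best treated as a suggestion for the person filling in the Source List rather than a definitive classification.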
Scientific Documentation
- Getty AAT:
- CIDOC CRM: E31_Document with property P140_assigned_attribute_to
- Material Analysis Reports
Formats: PDF, DOCX, XLSX
- Key extractions:
material_composition
physical_properties
chemical_properties
degradation_patterns
- Required metadata:
analysis_date
laboratory
analysis_method
sampling_strategy
analyst
- Optional metadata:
equipment_used
calibration_data
error_margins
- Dating Analysis Reports
Formats: PDF, DOCX, XLSX
- Key extractions:
absolute_date
date_range
dating_method_reliability
chronological_context
- Required metadata:
analysis_date
laboratory
dating_method
sample_description
calibration_curve
Historical Documentation
- Getty AAT:
- CIDOC CRM: E31_Document with property P70_documents
- Dublin Core: dcterms:source
- Archival Documents
Formats: PDF, TXT, DOCX, TIFF
- Key extractions:
historical_context
construction_history
ownership_history
modification_events
- Required metadata:
archive_reference
document_date
document_type
archival_location
- Optional metadata:
transcription_details
preservation_state
access_restrictions
- Historical Photographs
Formats: TIFF, JPG, PDF
- Key extractions:
historical_appearance
temporal_changes
architectural_features
urban_context
- Required metadata:
photo_date
photographer
archive_reference
subject_location
- Optional metadata:
camera_details
print_type
negative_reference
Conservation Documentation
- Getty AAT:
- CIDOC CRM: E31_Document with property P140_assigned_attribute_to
- Dublin Core: dcterms:provenance
- Condition Reports
Formats: PDF, DOCX, XLSX
- Key extractions:
conservation_state
degradation_patterns
risk_factors
intervention_priorities
- Required metadata:
assessment_date
assessor
assessment_method
condition_classification
- Optional metadata:
environmental_data
previous_treatments
monitoring_history
- Intervention Reports
Formats: PDF, DOCX, XLSX
- Key extractions:
treatment_methods
materials_used
intervention_results
follow_up_recommendations
- Required metadata:
intervention_date
conservator
intervention_type
materials_used
documentation_method
- Optional metadata:
preliminary_tests
environmental_conditions
post_treatment_monitoring
Note
All Getty AAT links point to the Art & Architecture Thesaurus, providing standardized terminology for cultural heritage documentation. CIDOC CRM mappings follow version 7.1.1 of the standard.
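The required metadata lists in the categories above lend themselves to a mechanical completeness check. The sketch below is illustrative only: `REQUIRED_METADATA` and `missing_metadata` are hypothetical names, only three of the document types are included to keep it short, and it does not reflect how the Document Types JSON is actually structured.

```python
# Required metadata fields copied from three of the document types above.
# Names and structure are assumptions made for illustration.
REQUIRED_METADATA = {
    "3D Models": [
        "creation_date", "creator", "software_used",
        "coordinate_system", "spatial_resolution",
    ],
    "Dating Analysis Reports": [
        "analysis_date", "laboratory", "dating_method",
        "sample_description", "calibration_curve",
    ],
    "Condition Reports": [
        "assessment_date", "assessor",
        "assessment_method", "condition_classification",
    ],
}


def missing_metadata(doc_type, metadata):
    """Return the required fields that are absent or blank for the given document type."""
    missing = []
    for field in REQUIRED_METADATA[doc_type]:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

For example, a condition report record that only carries `assessment_date` and `assessor` would be reported as missing `assessment_method` and `condition_classification`.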
Source List Tool
Fig. 21 The Source List tool provides a structured approach to collecting and organizing documentary sources. Each row represents a document with its metadata and potential validation properties.
The Source List is designed to track:
- Document identification (unique ID)
- Description of the source
- Original bibliographic reference or URL
- Properties that can be validated using this source
- Document type (3D model, photo, drawing, text, etc.)
- Preview (when available)
Source List schema
Added in version 1.3: Introduced as the formalized source list for data collection.
The Source List is a single-purpose XLSX file (source_list.xlsx)
sitting at the project root next to the .graphml. It registers
every bibliographic and archival source referenced by Document nodes
in the graph and assigns each one a stable project-local identifier
(D.NN) that propagates to the DosCo folder and to the graph itself.
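Since the D.NN identifier is reused in the DosCo folder and in the graph, it can help to check identifiers before they propagate. The following is a minimal sketch under a stated assumption: the text only shows examples such as D.01 and D.02, so the pattern "D." followed by at least two digits is a guess at the rule, and both function names are hypothetical.

```python
import re

# Assumption: an identifier is "D." followed by two or more digits (D.01, D.02, ...).
DOC_ID_PATTERN = re.compile(r"D\.(\d{2,})")


def is_valid_doc_id(doc_id: str) -> bool:
    """Check that the whole string matches the assumed D.NN pattern."""
    return DOC_ID_PATTERN.fullmatch(doc_id) is not None


def next_doc_id(existing_ids) -> str:
    """Propose the next free identifier after the highest valid one in use."""
    matches = (DOC_ID_PATTERN.fullmatch(doc_id) for doc_id in existing_ids)
    numbers = [int(match.group(1)) for match in matches if match]
    highest = max(numbers, default=0)
    return f"D.{highest + 1:02d}"
```

Deriving the next identifier from the highest existing number, rather than from the row count, avoids reusing an identifier when a row has been deleted from the middle of the list.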
Column reference
| Column | Purpose | Format | Example | Required |
|---|---|---|---|---|
| Name | Project-local unique ID | | | yes |
| Description | Natural-language description of the source | Free text, ~1 sentence | “Photogrammetric model of the Great Temple, 2015” | yes |
| Url | Citation / DOI / web URL | Bibliographic citation or URL | “Daicoviciu H. et al., Sargetia XIV, 1979” | recommended |
| Property that can validate | Qualia / properties this source can support | Comma-separated names (see Properties (Qualia)) | | recommended |
| original id. | Archive or library reference | Free text | “ASR, Fondo Disegni, b.12, c.34r” | optional |
| Type | Source typology | Free text | | yes |
| Preview | Optional thumbnail | Embedded image cell | - | optional |
| Notes | Free-form annotations | Free text | “OCR quality low for pp. 142–148” | optional |
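The Required column of the reference above can be turned into a simple row check. This is a sketch, assuming each Source List row has already been read into a dictionary keyed by column name (reading the XLSX file itself is left out); `check_row` and the two tuples are hypothetical names.

```python
# Column requirement levels taken from the column reference above.
REQUIRED_COLUMNS = ("Name", "Description", "Type")
RECOMMENDED_COLUMNS = ("Url", "Property that can validate")


def check_row(row):
    """Return (errors, warnings) for one Source List row given as a dict."""

    def is_blank(column):
        value = row.get(column)
        return value is None or not str(value).strip()

    errors = [
        f"missing required column: {column}"
        for column in REQUIRED_COLUMNS
        if is_blank(column)
    ]
    warnings = [
        f"missing recommended column: {column}"
        for column in RECOMMENDED_COLUMNS
        if is_blank(column)
    ]
    return errors, warnings
```

Separating errors from warnings mirrors the distinction between "yes" and "recommended" in the table: a row with no Type would be rejected, while a row with no Url would only be flagged.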
Worked example (excerpt)
| Name | Description | Url | Property that can validate | Type | Notes |
|---|---|---|---|---|---|
| D.01 | Photogrammetric model of the Great Temple | Demetrescu E., 2015 (unpublished) | geometry, material, elevation, surface_treatment | 3D | |
| D.02 | Excavation report 1975–1977 | Daicoviciu H. et al., Sargetia XIV, 1979, pp. 139–154 | stratigraphy, architecture, dimensions, construction_technique | | OCR low pp. 142–148 |
Note
A revised schema with a two-sheet split (Analytical Sources / Comparative Sources), a closed Type controlled vocabulary and an explicit mapping to the DocumentNode three-axis classification is being prepared for EM 1.6 under DP-58. See the development projects index at https://docs.extendedmatrix.org/projects/development-projects/ for the design status. The schema documented above remains the stable, supported one for the entire 1.5.x line.
See also
Extractor Types — how the Property that can validate column drives the validation chain.
Properties (Qualia) — the property vocabulary used in column 4.
Project Organization and Workflow — DosCo folder layout and
D.NN ID propagation from the Source List to the file system.
Team Organization: The Source Hunter
The collection and organization of sources can be efficiently managed by assigning a dedicated team member (the “source hunter”) to:
- Search and collect relevant documentation from libraries and archives
- Organize digital resources
- Maintain the source list
- Track validation properties for each source
Document Organization: The DosCo System
Sources are organized in a Dossier Comparatif (DosCo) folder structure where:
- Each document keeps its unique identifier as a prefix
- Original filenames are preserved after the prefix
Digital files follow the naming convention:
D.XX_original_filename.extension
Example:
DosCo/
├── D.01_photogrammetric_survey_temple.pdf
├── D.02_dodwell_engraving_1834.jpg
├── D.03_castrum_reconstruction.pdf
└── ...
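The naming convention above can be applied and reversed with a few lines of code. The sketch below is illustrative: the function names are hypothetical, and it assumes the identifier itself never contains an underscore, so the first underscore in the filename marks the end of the prefix.

```python
def dosco_filename(doc_id: str, original_filename: str) -> str:
    """Prefix the original filename with the document identifier."""
    return f"{doc_id}_{original_filename}"


def split_dosco_filename(filename: str):
    """Split 'D.XX_original_filename.extension' into (doc_id, original_filename)."""
    # Assumption: the identifier contains no underscore, so split on the first one.
    doc_id, separator, original = filename.partition("_")
    if not separator or not doc_id.startswith("D.") or not original:
        raise ValueError(f"not a DosCo filename: {filename!r}")
    return doc_id, original
```

Splitting on the first underscore only keeps original filenames that contain underscores of their own, such as `dodwell_engraving_1834.jpg`, intact.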
Properties Validation Column
A key feature of the Source List is the “Property that can validate” column, which:
- Identifies specific properties that can be validated using each source
- Helps in building the validation chain through paradata nodes
- Guides the creation of extractor nodes
- Supports evidence-based property documentation

Examples of validation properties:
- Geometrical measurements
- Material identification
- Construction techniques
- Architectural details
- Site morphology
- Spatial relationships
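When building the validation chain it is often useful to ask the reverse question: which sources can support a given property? The sketch below inverts the “Property that can validate” column into a property-to-sources index. It assumes rows are dictionaries keyed by column name and that the cell holds comma-separated names, as in the column reference; the function names are hypothetical.

```python
from collections import defaultdict


def parse_properties(cell):
    """Split a comma-separated cell into a list of trimmed property names."""
    return [name.strip() for name in (cell or "").split(",") if name.strip()]


def sources_by_property(rows):
    """Map each property name to the identifiers of the sources that can validate it."""
    index = defaultdict(list)
    for row in rows:
        for prop in parse_properties(row.get("Property that can validate")):
            index[prop].append(row["Name"])
    return dict(index)
```

With the two rows of the worked example, looking up `geometry` would return D.01 and looking up `stratigraphy` would return D.02, which gives a quick view of properties that are backed by a single source or by none.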
Best Practices
Source Collection:
- Systematically search both physical and digital archives
- Document the origin and reliability of each source
- Maintain high-quality digital copies

Documentation:
- Use consistent naming conventions
- Keep the Source List updated
- Link sources to the specific properties they can validate

Team Coordination:
- Assign clear responsibilities for source collection
- Update the Source List regularly
- Communicate validation needs clearly

Digital Organization:
- Maintain organized DosCo folders
- Use consistent file naming
- Ensure proper backup of digital sources

This systematic approach to source management ensures that:
- All interpretations are properly documented
- Sources are easily retrievable
- The validation chain remains clear and verifiable
- Team members can efficiently collaborate on documentation