Mapping System Guide
The s3dgraphy mapping system enables import of tabular data from XLSX and SQLite databases by defining how columns map to graph nodes and attributes.
Note
This guide focuses on the practical aspects of creating and using mappings for data import.
Overview
The mapping system consists of:
JSON mapping files that define column-to-attribute mappings
Mapping registry that manages available mappings
Importers (MappedXLSXImporter, PyArchInitImporter) that use mappings
Column normalization for flexible matching
Why Use Mappings?
Mappings solve these problems:
Diverse data formats: Different projects use different column names
Flexible imports: Same importer works with multiple formats
Reusability: Share mappings across projects
Validation: Define required fields and data types
Documentation: Mappings serve as data format documentation
Mapping File Structure
Basic Structure
A mapping file is a JSON document with this structure:
{
"mapping_name": "unique_identifier",
"description": "Human-readable description",
"version": "1.0",
"format_type": "xlsx" or "sqlite",
"table_settings": {
"sheet_name": "Sheet1", // For XLSX
"table_name": "us_table", // For SQLite
"header_row": 0,
"id_column": "US" // Or formula for SQLite
},
"column_mappings": {
"ColumnName": {
"is_id": false,
"required": false,
"node_attribute": "attribute_name",
"default_value": null,
"data_type": "string"
}
},
"node_settings": {
"default_node_type": "US",
"create_properties": true,
"id_format": "{column1}_{column2}"
}
}
Section Breakdown
Metadata Section
{
"mapping_name": "emdb_basic",
"description": "Basic EMdb format for stratigraphic units",
"version": "1.0",
"format_type": "xlsx"
}
mapping_name: Unique identifier (used in code to reference mapping)
description: Human-readable description
version: Mapping version for tracking changes
format_type:
xlsxorsqlite
Table Settings
For XLSX files:
{
"table_settings": {
"sheet_name": "Stratigraphic Units",
"header_row": 0
}
}
For SQLite databases:
{
"table_settings": {
"table_name": "us_table",
"id_column": "sito||'_'||area||'_'||us"
}
}
sheet_name (XLSX): Name of the Excel sheet
header_row (XLSX): Row number containing column headers (0-indexed)
table_name (SQLite): Database table name
id_column (SQLite): Column name or SQL expression for generating IDs
Column Mappings
{
"column_mappings": {
"US": {
"is_id": true,
"required": true,
"node_attribute": "node_id"
},
"Definition": {
"node_attribute": "description",
"default_value": ""
},
"Chronology": {
"node_attribute": "dating"
},
"Material": {
"node_attribute": "material",
"data_type": "string"
},
"Area": {
"node_attribute": "area",
"required": false
}
}
}
Column mapping properties:
is_id: Marks this column as the unique identifier
required: Import fails if column is missing
node_attribute: Target attribute name in the graph node
default_value: Default if cell is empty
data_type: Expected data type (for validation)
Node Settings
{
"node_settings": {
"default_node_type": "US",
"create_properties": true,
"id_format": "{site}_{area}_{us}",
"prefix": "SITE01"
}
}
default_node_type: Node type to create (US, USV, DOC, etc.)
create_properties: Create PropertyNode for each column
id_format: Template for generating node IDs from columns
prefix: Optional prefix for all node IDs
Column Normalization
The mapping system normalizes both Excel column names and JSON mapping keys to enable flexible matching.
Normalization Rules
Convert to uppercase
Replace spaces with underscores
Replace hyphens, slashes, parentheses, brackets, dots with underscores
Replace special dashes (–, —) with underscores
Remove multiple consecutive underscores
Remove leading/trailing underscores
Examples
# Original -> Normalized
"US Number" -> "US_NUMBER"
"Type (Primary)" -> "TYPE_PRIMARY"
"Date - Start" -> "DATE_START"
"Area/Sector" -> "AREA_SECTOR"
"Phase.Name" -> "PHASE_NAME"
This means your mapping can use "US Number" and it will match Excel columns named:
- "US Number"
- "us number"
- "US_NUMBER"
- "us-number"
- "US.Number"
Built-in Mappings
s3dgraphy includes several predefined mappings:
EMdb Mappings
For the EMdb (Extended Matrix Database) format:
emdb_basic - Basic stratigraphic units
{
"mapping_name": "emdb_basic",
"description": "Basic EMdb format for stratigraphic units",
"format_type": "xlsx",
"table_settings": {
"sheet_name": "US",
"header_row": 0
},
"column_mappings": {
"US": {
"is_id": true,
"required": true,
"node_attribute": "node_id"
},
"Type": {
"node_attribute": "node_type",
"default_value": "US"
},
"Definition": {
"node_attribute": "description"
},
"Chronology": {
"node_attribute": "dating"
},
"Material": {
"node_attribute": "material"
}
},
"node_settings": {
"default_node_type": "US"
}
}
emdb_extended - Extended EMdb with all fields
Includes additional columns for conservation state, interpretation, excavation method, etc.
PyArchInit Mappings
For pyArchInit database format:
pyarchinit_us_table - US table mapping
{
"mapping_name": "pyarchinit_us_table",
"description": "PyArchInit US table mapping",
"format_type": "sqlite",
"table_settings": {
"table_name": "us_table",
"id_column": "sito||'_'||area||'_'||us"
},
"column_mappings": {
"sito": {
"node_attribute": "site"
},
"area": {
"node_attribute": "area"
},
"us": {
"is_id": true,
"node_attribute": "us_number"
},
"d_stratigrafica": {
"node_attribute": "description"
},
"interpretazione": {
"node_attribute": "interpretation"
},
"colore": {
"node_attribute": "color"
},
"consistenza": {
"node_attribute": "consistency"
}
},
"node_settings": {
"default_node_type": "US",
"id_format": "{site}_{area}_{us}"
}
}
Creating Custom Mappings
Step 1: Analyze Your Data
First, examine your data structure:
import pandas as pd
# For XLSX
df = pd.read_excel("your_data.xlsx", sheet_name="Sheet1")
print("Columns:", df.columns.tolist())
print("\nFirst row:")
print(df.iloc[0])
Step 2: Create Mapping File
Create a JSON file with your mapping:
{
"mapping_name": "myproject_excavation",
"description": "Custom format for My Project excavation data",
"version": "1.0",
"format_type": "xlsx",
"table_settings": {
"sheet_name": "Excavation Units",
"header_row": 0
},
"column_mappings": {
"Unit ID": {
"is_id": true,
"required": true,
"node_attribute": "node_id"
},
"Unit Type": {
"node_attribute": "node_type",
"required": true,
"default_value": "US"
},
"Description": {
"node_attribute": "description"
},
"Excavator": {
"node_attribute": "excavator"
},
"Date Excavated": {
"node_attribute": "excavation_date",
"data_type": "date"
},
"Area": {
"node_attribute": "area"
},
"Phase": {
"node_attribute": "chronological_phase"
}
},
"node_settings": {
"default_node_type": "US",
"create_properties": true
}
}
Step 3: Save Mapping File
Save to the mappings directory:
# For project-specific mappings
mkdir -p s3dgraphy/mappings/custom
# Save as: myproject_excavation.json
Step 4: Register Mapping
from s3dgraphy.mappings import mapping_registry
import json
# Load mapping file
with open('path/to/myproject_excavation.json') as f:
mapping_data = json.load(f)
# Register mapping
mapping_registry.register_mapping(
'myproject_excavation',
mapping_data
)
Step 5: Use Mapping
from s3dgraphy.importer import MappedXLSXImporter
from s3dgraphy import Graph
graph = Graph("my_excavation")
importer = MappedXLSXImporter(
filepath="excavation_data.xlsx",
mapping_name="myproject_excavation",
graph=graph
)
graph = importer.parse()
print(f"Imported {len(graph.nodes)} units")
Advanced Mapping Features
ID Format Templates
Generate complex IDs from multiple columns:
{
"node_settings": {
"id_format": "{site}_{trench}_{context}_{year}"
},
"column_mappings": {
"Site": {"node_attribute": "site"},
"Trench": {"node_attribute": "trench"},
"Context": {"node_attribute": "context"},
"Year": {"node_attribute": "year"}
}
}
This creates node IDs like: POMPEII_TR01_0045_2024
Conditional Defaults
Use default values for empty cells:
{
"column_mappings": {
"Preservation": {
"node_attribute": "preservation",
"default_value": "unknown"
},
"Excavation Method": {
"node_attribute": "method",
"default_value": "manual"
}
}
}
Creating Property Nodes
Automatically create PropertyNode for each attribute:
{
"node_settings": {
"create_properties": true
}
}
This creates:
- StratigraphicNode for the unit
- PropertyNode for each attribute
- has_property edges connecting them
Multi-Sheet Import
For XLSX files with multiple sheets:
# Create mapping for each sheet
mappings = {
'stratigraphic_units': 'emdb_basic',
'special_finds': 'emdb_finds',
'samples': 'emdb_samples'
}
graph = Graph("multi_sheet_import")
for sheet_name, mapping_name in mappings.items():
importer = MappedXLSXImporter(
filepath="data.xlsx",
mapping_name=mapping_name,
graph=graph
)
graph = importer.parse()
Troubleshooting
Common Issues
Issue: “Column not found” errors
Solution: Check column normalization
# Debug normalization
original_name = "US Number"
normalized = original_name.upper().replace(' ', '_')
print(f"'{original_name}' -> '{normalized}'")
# Check actual Excel columns
df = pd.read_excel("file.xlsx")
print("Actual columns:", df.columns.tolist())
Issue: “No ID column found”
Solution: Ensure one column has "is_id": true
{
"column_mappings": {
"US": {
"is_id": true, // ← Must have this
"required": true,
"node_attribute": "node_id"
}
}
}
Issue: “Required column missing”
Solution: Make column optional or provide default
{
"Material": {
"node_attribute": "material",
"required": false, // ← Make optional
"default_value": "unknown" // ← Provide default
}
}
Debugging Imports
Enable detailed logging:
importer = MappedXLSXImporter(
filepath="data.xlsx",
mapping_name="my_mapping",
graph=graph
)
# Importer prints detailed progress
graph = importer.parse()
# Check warnings
importer.display_warnings()
# Inspect what was imported
print(f"Nodes: {len(graph.nodes)}")
print(f"Node types: {set(n.node_type for n in graph.nodes)}")
Validate Mapping Before Use
def validate_mapping(mapping_data):
"""Validate mapping structure"""
errors = []
# Check required fields
if 'mapping_name' not in mapping_data:
errors.append("Missing 'mapping_name'")
if 'column_mappings' not in mapping_data:
errors.append("Missing 'column_mappings'")
# Check for ID column
has_id = False
for col, config in mapping_data.get('column_mappings', {}).items():
if config.get('is_id'):
has_id = True
break
if not has_id:
errors.append("No column marked as ID (is_id: true)")
if errors:
print("Mapping validation errors:")
for error in errors:
print(f" - {error}")
return False
print("✓ Mapping is valid")
return True
Best Practices
1. Document Your Mappings
Add clear descriptions:
{
"mapping_name": "site_format_2024",
"description": "Excavation format for Site X, Season 2024. Uses area codes and sequential numbering.",
"version": "1.0"
}
2. Version Your Mappings
Update version when changing mapping:
{
"version": "1.1", // Increment when modifying
"changelog": [
"1.1: Added 'preservation' field",
"1.0: Initial version"
]
}
3. Use Consistent Naming
Follow naming conventions:
mapping_name: lowercase_with_underscores
node_attribute: lowercase_with_underscores
Match s3dgraphy node attribute names when possible
4. Test with Sample Data
Always test with a small dataset first:
# Test with first 10 rows
df = pd.read_excel("data.xlsx", nrows=10)
df.to_excel("test_data.xlsx", index=False)
# Import test data
importer = MappedXLSXImporter(
filepath="test_data.xlsx",
mapping_name="my_mapping",
graph=test_graph
)
graph = importer.parse()
# Verify results
print(f"Test import: {len(graph.nodes)} nodes")
5. Handle Missing Data Gracefully
Use defaults and make columns optional:
{
"column_mappings": {
"Optional Field": {
"node_attribute": "optional_attr",
"required": false,
"default_value": null
}
}
}
Example Workflows
Complete XLSX Import Workflow
import json
from s3dgraphy import Graph
from s3dgraphy.importer import MappedXLSXImporter
from s3dgraphy.mappings import mapping_registry
# 1. Load and register mapping
with open('mappings/excavation_2024.json') as f:
mapping = json.load(f)
mapping_registry.register_mapping('excavation_2024', mapping)
# 2. Create graph
graph = Graph("excavation_2024")
# 3. Import data
importer = MappedXLSXImporter(
filepath="data/stratigraphic_units.xlsx",
mapping_name="excavation_2024",
graph=graph
)
# 4. Parse and check warnings
graph = importer.parse()
importer.display_warnings()
# 5. Validate imported data
print(f"✓ Imported {len(graph.nodes)} nodes")
print(f"✓ Imported {len(graph.edges)} edges")
# 6. Export for verification
from s3dgraphy.exporter import export_to_json
export_to_json("output/verification.json", [graph.graph_id])
PyArchInit Database Import
from s3dgraphy import Graph
from s3dgraphy.importer import PyArchInitImporter
# Create graph
graph = Graph("pyarchinit_import")
# Import from database
importer = PyArchInitImporter(
filepath="excavation.db",
mapping_name="pyarchinit_us_table",
graph=graph
)
graph = importer.parse()
importer.display_warnings()
print(f"Imported from SQLite: {len(graph.nodes)} nodes")
See Also
Import and Export - Complete import/export guide
JSON Configuration Files - JSON configuration files
Integration with EM-tools for Blender - EM-tools integration examples
s3dgraphy Classes Reference - Complete API reference