eCTD XML Backbone: Complete Technical Guide to Structure, DTD, and Validation
The eCTD XML backbone consists of index.xml and regional.xml files that serve as the navigational foundation for regulatory submissions. Index.xml defines the structure of Modules 2-5 using leaf elements with unique IDs, MD5 checksums, and lifecycle operations (new, replace, append, delete), while regional.xml files contain Module 1 content specific to FDA, EMA, Health Canada, or PMDA requirements. XML validation errors account for 23% of all eCTD gateway rejections, making backbone validation critical for submission success.
The eCTD XML backbone is the foundational XML file structure that provides navigation, metadata, and lifecycle management for Electronic Common Technical Document (eCTD) submissions. Consisting of the index.xml master file and region-specific XML files (us-regional.xml, eu-regional.xml), the eCTD XML backbone creates the hyperlinked structure that enables regulatory reviewers to navigate through thousands of documents efficiently.
For regulatory teams, understanding the eCTD XML backbone is essential for successful submission publishing. XML validation errors account for 23% of all gateway rejections - the single largest category of eCTD failures. A malformed backbone file means your entire submission cannot be processed, regardless of how perfect your PDF documents are.
In this guide, you'll learn:
- The complete technical structure of index.xml and how leaf elements define document metadata
- Regional.xml requirements for FDA, EMA, Health Canada, and PMDA submissions
- DTD and schema specifications including ich-ectd-3-2.dtd and v4.0 XSD
- Lifecycle operations (new, replace, append, delete) and when to use each
- Common eCTD XML validation errors and how to prevent them
What Is the eCTD XML Backbone?
The eCTD XML backbone is the set of XML files (index.xml and regional.xml) that serve as the structural foundation and navigation layer for an eCTD submission, defining document inclusion, location, relationships, and lifecycle changes across submission sequences. The backbone enables regulatory reviewers and automated systems to process submissions and validates against Document Type Definition (DTD) or XML Schema (XSD) specifications.
Key characteristics of the eCTD XML backbone:
- Creates the navigable table of contents for regulatory reviewers
- Stores metadata for every document (title, location, checksum, operation)
- Enables lifecycle management across multiple submission sequences
- Validates against Document Type Definition (DTD) or XML Schema (XSD)
- Supports hyperlinked navigation between related documents
The eCTD specification was originally developed by the ICH M2 Expert Working Group in the early 2000s and is now maintained under ICH M8. It has evolved through version 3.2.2 (DTD-based) to version 4.0 (XSD-based), with v3.2.2 remaining the most widely implemented globally as of 2026.
The backbone consists of two primary file types that work together:
| File Type | Purpose | Scope |
|---|---|---|
| index.xml | Master navigation file containing Modules 2-5 structure | Global (harmonized) |
| regional.xml | Region-specific Module 1 content and metadata | Regional (varies by agency) |
Together, these files create a complete map of the submission that both human reviewers and automated validation systems use to process the eCTD.
Understanding eCTD XML Structure: index.xml Explained
The index.xml file is the heart of the eCTD XML backbone. It serves as the master table of contents for the entire submission, defining the structure of harmonized Modules 2 through 5 and linking to regional XML files for Module 1 content.
index.xml File Location and Purpose
The index.xml file must be located at the root of each sequence folder:
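As a rough illustration of this placement rule, the following Python sketch lists sequence folders that have no index.xml at their root. The four-digit folder naming (0000, 0001, ...) and the helper names are assumptions made for the example, not something this guide or any publishing tool prescribes.

```python
import tempfile
from pathlib import Path


def sequences_missing_index(application_root: Path) -> list[str]:
    """Return names of sequence folders that lack index.xml at their root."""
    missing = []
    for entry in sorted(application_root.iterdir()):
        # Assumption: sequence folders are named with four digits (0000, 0001, ...).
        if entry.is_dir() and entry.name.isdigit() and len(entry.name) == 4:
            if not (entry / "index.xml").is_file():
                missing.append(entry.name)
    return missing


def build_demo_tree(root: Path) -> None:
    """Create a throwaway layout: 0000 has index.xml, 0001 does not."""
    (root / "0000").mkdir()
    (root / "0000" / "index.xml").write_text("<placeholder/>", encoding="utf-8")
    (root / "0001" / "m1").mkdir(parents=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_tree(Path(tmp))
        print(sequences_missing_index(Path(tmp)))
```

Running a check like this before packaging is cheap, and it catches the case where index.xml was written one folder too deep or too shallow.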
index.xml Structure Breakdown
The index.xml follows a hierarchical structure matching the eCTD module organization:
Always validate your index.xml against the correct DTD version before submission. Version mismatches (e.g., a `dtd-version` attribute that does not match the DTD file named in the DOCTYPE declaration) are among the easiest errors to introduce and the hardest to catch without proper validation tools. Use an XML editor with built-in DTD validation to catch these errors as you edit.
XML Declaration and DTD Reference
Every index.xml must begin with the XML declaration and DTD reference:
| Element | Purpose | Required |
|---|---|---|
| `<?xml version="1.0" encoding="UTF-8"?>` | Declares XML version and character encoding | Yes |
| `<!DOCTYPE ectd:ectd SYSTEM "...">` | References the DTD for validation | Yes |
| `xmlns:ectd` | eCTD namespace declaration | Yes |
| `xmlns:xlink` | XLink namespace for hyperlinks | Yes |
| `dtd-version` | Specifies DTD version (3.2 for v3.2.2) | Yes |
Root Element Requirements
The root element must include:
Namespace requirements:
- `xmlns:ectd="http://www.ich.org/ectd"` - Required for all eCTD elements
- `xmlns:xlink="http://www.w3.org/1999/xlink"` - Required for document references
- `dtd-version="3.2"` - Must match the referenced DTD version
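The root-element requirements above can be checked with Python's standard library. This is a minimal sketch: the sample backbone, the DOCTYPE path, and the function name `check_root` are illustrative assumptions, and the parser used here does not validate against the DTD.

```python
import io
import xml.etree.ElementTree as ET

ECTD_NS = "http://www.ich.org/ectd"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical minimal backbone; the DOCTYPE path is an assumption.
SAMPLE_INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ectd:ectd SYSTEM "util/dtd/ich-ectd-3-2.dtd">
<ectd:ectd xmlns:ectd="http://www.ich.org/ectd"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           dtd-version="3.2">
</ectd:ectd>
"""


def check_root(xml_bytes: bytes, expected_dtd_version: str = "3.2") -> list[str]:
    """Return a list of problems found on the root element (empty if none)."""
    problems = []
    # "start-ns" events yield (prefix, uri) pairs for every namespace declaration.
    declared = dict(
        payload
        for _event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",))
    )
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{ECTD_NS}}}ectd":
        problems.append(f"unexpected root element: {root.tag}")
    if declared.get("ectd") != ECTD_NS:
        problems.append("xmlns:ectd is missing or wrong")
    if declared.get("xlink") != XLINK_NS:
        problems.append("xmlns:xlink is missing or wrong")
    if root.get("dtd-version") != expected_dtd_version:
        problems.append(f"dtd-version is {root.get('dtd-version')!r}")
    return problems


if __name__ == "__main__":
    print(check_root(SAMPLE_INDEX))
```

An empty list means the three requirements hold for the sample; it says nothing about the rest of the file.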
eCTD Leaf Elements: The Building Blocks of XML Backbone
Leaf elements are the fundamental building blocks of the eCTD XML backbone. Each leaf represents a single document in the submission and contains all metadata necessary for validation, navigation, and lifecycle management.
Leaf Element Anatomy
Leaf Element Attributes Reference
| Attribute | Description | Required | Values |
|---|---|---|---|
| `ID` | Unique identifier for the document | Yes | Alphanumeric, unique within submission |
| `operation` | Lifecycle action for this document | Yes | new, replace, append, delete |
| `checksum` | MD5 hash of the file content | Yes | 32-character hexadecimal string |
| `checksum-type` | Hash algorithm used | Yes | md5 (only valid value) |
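| `modified-file` | Reference to the leaf being modified | Conditional (replace, append, delete) | ID of the leaf being modified; in v3.2.2 written as the path to the earlier sequence's index.xml followed by `#` and the leaf ID (e.g., `../0000/index.xml#m3-s-stability-001`) |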
| `xlink:href` | Relative path to the document | Yes | Valid relative path from sequence root |
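To make the attribute table concrete, here is a small Python sketch that parses one hypothetical leaf and reports any required attribute that is missing. The sample leaf, its file path, and the helper names are assumptions for illustration; the checksum shown is simply the MD5 of an empty file used as a placeholder.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced attributes as "{namespace-uri}local-name".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
REQUIRED_ATTRIBUTES = ("ID", "operation", "checksum", "checksum-type", XLINK_HREF)

# Hypothetical leaf; the checksum is the MD5 of an empty file (placeholder only).
SAMPLE_LEAF = """
<leaf xmlns:xlink="http://www.w3.org/1999/xlink"
      ID="m3-s-stability-001"
      operation="new"
      checksum="d41d8cd98f00b204e9800998ecf8427e"
      checksum-type="md5"
      xlink:href="m3/32-body-data/stability.pdf">
  <title>Stability Data</title>
</leaf>
"""


def missing_attributes(leaf: ET.Element) -> list[str]:
    """Return the required attribute names that are absent or empty."""
    return [name for name in REQUIRED_ATTRIBUTES if not leaf.get(name)]


def summarize_leaf(leaf: ET.Element) -> dict:
    """Collect the metadata a reviewer tool would typically display."""
    return {
        "id": leaf.get("ID"),
        "operation": leaf.get("operation"),
        "href": leaf.get(XLINK_HREF),
        "title": leaf.findtext("title"),
    }


if __name__ == "__main__":
    leaf = ET.fromstring(SAMPLE_LEAF)
    print(summarize_leaf(leaf))
    print(missing_attributes(leaf))
```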
Leaf ID Best Practices
The leaf ID must be unique within the entire submission lifecycle, not just the current sequence:
Recommended leaf ID conventions:
| Module | Pattern | Example |
|---|---|---|
| Module 2 | `m2-[section]-[number]` | `m2-23-qos-001` |
| Module 3 | `m3-[substance/product]-[section]-[number]` | `m3-s-stability-001` |
| Module 4 | `m4-[study-type]-[number]` | `m4-tox-repeat-001` |
| Module 5 | `m5-[study-id]-[doc-type]` | `m5-study001-csr` |
Leaf ID rules:
- Must start with a letter (not a number)
- Can contain letters, numbers, hyphens, and underscores
- No spaces or special characters
- Maximum 128 characters (recommended under 64)
- Must remain consistent across sequences when referencing the same document
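The character and length rules above translate directly into a regular expression. The sketch below assumes the 128-character limit stated in the list; the pattern and function name are illustrative, not taken from any specification.

```python
import re

# A letter first, then up to 127 letters, digits, hyphens, or underscores.
LEAF_ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{0,127}")


def is_valid_leaf_id(leaf_id: str) -> bool:
    """Check a leaf ID against the naming rules listed above."""
    return LEAF_ID_PATTERN.fullmatch(leaf_id) is not None


if __name__ == "__main__":
    for candidate in ("m2-23-qos-001", "001-spec", "m3 spec", "m5-study001-csr"):
        print(candidate, is_valid_leaf_id(candidate))
```

A check like this covers only syntax. Whether an ID stays consistent across sequences still has to be tracked separately, for example in the registry described below.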
Using inconsistent leaf IDs across sequences is one of the most common causes of lifecycle operation failures. Establish a leaf ID naming convention before your first submission and maintain it throughout the product lifecycle.
Create a master leaf ID registry spreadsheet documenting every leaf ID, its module/section, first appearance sequence, and all subsequent lifecycle operations. This prevents duplicate IDs and makes it easy to verify modified-file references are correct when using replace operations.
Regional.xml: Region-Specific eCTD XML Requirements
The regional.xml file contains Module 1 content and region-specific metadata. Unlike index.xml, regional files vary significantly between regulatory authorities.
Regional XML File Naming by Agency
| Region | File Name | DTD Reference |
|---|---|---|
| FDA (US) | `us-regional.xml` | `us-regional-v3-0.dtd` |
| EMA (EU) | `eu-regional.xml` | `eu-regional.dtd` |
| Health Canada | `ca-regional.xml` | `ca-regional.dtd` |
| PMDA (Japan) | `jp-regional.xml` | `jp-regional.dtd` |
| TGA (Australia) | `au-regional.xml` | `au-regional.dtd` |
FDA us-regional.xml Structure
FDA Application Type Codes
The us-regional.xml requires specific application type codes:
| Code | Application Type | Description |
|---|---|---|
| `nda` | New Drug Application | Original NDA submission |
| `anda` | Abbreviated New Drug Application | Generic drug application |
| `bla` | Biologics License Application | Biologic product application |
| `ind` | Investigational New Drug | Clinical trial application |
| `dmf` | Drug Master File | Manufacturing information file |
| `pmsr` | Post-Marketing Safety Report | Safety update reporting |
FDA Submission Type Codes
| Code | Submission Type | Usage |
|---|---|---|
| `orig` | Original Application | Initial submission |
| `efficacy-suppl` | Efficacy Supplement | New indication |
| `manuf-suppl` | Manufacturing Supplement | CMC changes |
| `labeling-suppl` | Labeling Supplement | Labeling changes |
| `safety-suppl` | Safety Supplement | Safety updates |
| `annual-report` | Annual Report | IND/NDA annual reports |
| `amendment` | Amendment | Pre-approval amendments |
EMA eu-regional.xml Structure
Regional XML Comparison Table
| Element | FDA (US) | EMA (EU) | Health Canada |
|---|---|---|---|
| Root element | `us:us-regional` | `eu:eu-regional` | `ca:ca-regional` |
| Application ID | `us:application-number` | `eu:procedure-number` | `ca:control-number` |
| Module 1 structure | Sections 1.1-1.16 | Sections 1.0-1.10 | Sections 1.0-1.7 |
| Forms section | `m1-1-forms` | `m1-2-application-form` | `m1-1-forms` |
| Labeling section | `m1-14-labeling` | `m1-3-pi` | `m1-3-1-product-monograph` |
| DTD version | v3.0 | v3.0 | v3.0 |
eCTD DTD Specifications: Technical Reference
The Document Type Definition (DTD) specifies the valid structure, elements, and attributes for eCTD XML files. Understanding DTD specifications is essential for troubleshooting validation errors.
ICH eCTD DTD Versions
| Version | Release | Status | Key Changes |
|---|---|---|---|
| ich-ectd-2-0.dtd | 2002 | Obsolete | Initial stable release |
| ich-ectd-3-0.dtd | 2003 | Legacy | Added lifecycle operations |
| ich-ectd-3-2.dtd | 2004 (v3.2.2 specification: 2008) | Current Standard | Refined element structure |
| eCTD v4.0 XSD | 2024 | Implementing | Schema-based validation |
DTD Location Requirements
DTD files must be placed in the util/dtd folder within each sequence:
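One way to confirm that the DTD named in the DOCTYPE declaration is actually present is to read the system identifier and resolve it against the sequence folder. The sketch below uses Python's built-in expat bindings; the sample backbone, the DOCTYPE path `util/dtd/ich-ectd-3-2.dtd`, and the helper names are assumptions made for the example.

```python
import tempfile
import xml.parsers.expat
from pathlib import Path
from typing import Optional

# Hypothetical backbone header; the DOCTYPE path is an assumption.
SAMPLE_INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ectd:ectd SYSTEM "util/dtd/ich-ectd-3-2.dtd">
<ectd:ectd xmlns:ectd="http://www.ich.org/ectd" dtd-version="3.2"/>
"""


def doctype_system_id(xml_bytes: bytes) -> Optional[str]:
    """Return the SYSTEM identifier from the DOCTYPE, or None if there is none."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["system_id"] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)
    return found.get("system_id")


def dtd_is_present(sequence_dir: Path, xml_bytes: bytes) -> bool:
    """Check that the referenced DTD exists relative to the sequence folder."""
    system_id = doctype_system_id(xml_bytes)
    return system_id is not None and (sequence_dir / system_id).is_file()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sequence_dir = Path(tmp) / "0000"
        dtd_dir = sequence_dir / "util" / "dtd"
        dtd_dir.mkdir(parents=True)
        print(dtd_is_present(sequence_dir, SAMPLE_INDEX))  # DTD not copied yet
        (dtd_dir / "ich-ectd-3-2.dtd").write_text("", encoding="utf-8")
        print(dtd_is_present(sequence_dir, SAMPLE_INDEX))
```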
Key DTD Element Definitions
The ich-ectd-3-2.dtd defines the valid elements for index.xml:
Module container elements:
Leaf element definition:
DTD Validation Rules
| Rule | Requirement | Error if Violated |
|---|---|---|
| Element order | Elements must appear in DTD-specified order | `Element out of order` |
| Required attributes | ID, operation, checksum, xlink:href always required | `Missing required attribute` |
| Attribute values | operation must be new/replace/append/delete | `Invalid attribute value` |
| ID uniqueness | Each ID must be unique within document | `Duplicate ID value` |
| IDREF validity | modified-file must reference existing ID | `Invalid IDREF` |
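Two of the rules in this table, ID uniqueness and allowed operation values, are easy to pre-check in a script before running a full validator. The following sketch is an approximation only: the sample fragment and its `<fragment>` wrapper are hypothetical, and the messages merely echo the error names used in the table.

```python
from collections import Counter
import xml.etree.ElementTree as ET

ALLOWED_OPERATIONS = {"new", "replace", "append", "delete"}

# Hypothetical fragment containing two deliberate mistakes.
SAMPLE = """
<fragment>
  <leaf ID="m3-spec-001" operation="new"/>
  <leaf ID="m3-spec-001" operation="new"/>
  <leaf ID="m3-spec-002" operation="update"/>
</fragment>
"""


def rule_violations(root: ET.Element) -> list[str]:
    """Report duplicate leaf IDs and operation values outside the allowed set."""
    leaves = list(root.iter("leaf"))
    violations = []
    id_counts = Counter(leaf.get("ID") for leaf in leaves)
    for leaf_id, count in id_counts.items():
        if count > 1:
            violations.append(f"Duplicate ID value: {leaf_id}")
    for leaf in leaves:
        operation = leaf.get("operation")
        if operation not in ALLOWED_OPERATIONS:
            violations.append(
                f"Invalid attribute value: operation={operation!r} on {leaf.get('ID')}"
            )
    return violations


if __name__ == "__main__":
    for line in rule_violations(ET.fromstring(SAMPLE)):
        print(line)
```

Element order and nesting are not covered here; those checks need a validating parser working from the DTD itself.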
eCTD Lifecycle Operations: new, replace, append, delete
Lifecycle operations define how documents change across submission sequences. Proper use of lifecycle operations is critical for maintaining submission integrity and regulatory compliance.
Operation Definitions and Usage
| Operation | Purpose | When to Use | modified-file Required |
|---|---|---|---|
| new | First appearance of document | Initial sequence or new content | No |
| replace | Replaces entire document | Updated version of existing doc | Yes |
| append | Adds content to existing doc | Additional data for same topic | Yes |
| delete | Removes document from submission | Withdrawn or superseded content | Yes |
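The last column of this table reduces to a single rule that a pre-submission script can enforce. A minimal sketch, with a hypothetical function name and message wording:

```python
from typing import Optional

# Operations that, per the table above, must carry a modified-file reference.
NEEDS_MODIFIED_FILE = {"replace", "append", "delete"}


def modified_file_problem(operation: str, modified_file: Optional[str]) -> Optional[str]:
    """Return a message if the operation lacks a required modified-file value."""
    if operation in NEEDS_MODIFIED_FILE and not modified_file:
        return f"'{operation}' requires a modified-file reference"
    return None


if __name__ == "__main__":
    examples = [("new", None), ("replace", None), ("delete", "m3-spec-001")]
    for operation, reference in examples:
        print(operation, reference, "->", modified_file_problem(operation, reference))
```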
Operation Examples with XML
New operation (initial submission):
Replace operation (updated document):
Delete operation (document withdrawal):
Lifecycle Operation Best Practices
Operation selection criteria:
| Scenario | Correct Operation | Incorrect Operation |
|---|---|---|
| First time document appears | `new` | replace (nothing to replace) |
| Document content updated | `replace` | new (loses history) |
| Additional stability data | `append` or `new` leaf | replace (overwrites existing) |
| Document no longer relevant | `delete` | Just omitting (ghost reference) |
| Correcting title only | Keep the same leaf ID and correct the title | new (breaks traceability) |
Using `new` instead of `replace` for updated documents breaks the submission audit trail. Regulatory agencies track document history using the modified-file reference chain - breaking this chain can raise data integrity questions.
Sequence-Based Lifecycle Tracking
Across multiple sequences, the lifecycle chain must be maintained:
This creates an auditable chain: spec-001 -> spec-002 -> spec-003
Before submitting a multi-sequence eCTD, trace the full lifecycle chain for each document by following the modified-file references backwards from the final sequence. Any broken link in the chain is a critical error that will likely be caught by the gateway. Regulatory agencies audit these chains during reviews to verify data integrity.
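The backward trace described above can be automated once leaf IDs and their modified-file values have been collected from every sequence. The sketch below assumes that data is already held in a dictionary keyed by leaf ID; the dictionary, the function name, and the handling of `#` fragments are illustrative assumptions.

```python
from typing import Optional

# Hypothetical data: leaf ID -> modified-file value (None for a "new" leaf).
MODIFIED_FILE_BY_ID: dict[str, Optional[str]] = {
    "spec-001": None,
    "spec-002": "spec-001",
    "spec-003": "../0001/index.xml#spec-002",  # path-plus-fragment form also accepted
}


def trace_chain(last_id: str, modified_file_by_id: dict[str, Optional[str]]) -> list[str]:
    """Follow modified-file references backwards; raise if the chain is broken."""
    chain = [last_id]
    current = last_id
    while True:
        if current not in modified_file_by_id:
            raise KeyError(f"broken link: no leaf with ID {current!r}")
        reference = modified_file_by_id[current]
        if reference is None:
            return chain  # reached the original "new" leaf
        # Keep only the leaf ID if the reference includes a file path and '#'.
        current = reference.rsplit("#", 1)[-1]
        if current in chain:
            raise ValueError(f"cycle detected at {current!r}")
        chain.append(current)


if __name__ == "__main__":
    print(" <- ".join(trace_chain("spec-003", MODIFIED_FILE_BY_ID)))
```

Raising on the first missing ID is a deliberate choice here, since a single broken link invalidates everything after it in the chain.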
MD5 Checksum in eCTD: File Integrity Verification
The MD5 checksum is a critical component of the eCTD XML backbone that ensures file integrity throughout the submission process.
What Is the MD5 Checksum?
The MD5 (Message-Digest Algorithm 5) checksum is a 128-bit hash value that acts as a fingerprint of a file's content. Any modification to a file, even a single byte change, produces a completely different checksum.
MD5 checksum characteristics:
- 32-character hexadecimal string
- Deterministic (same file always produces same checksum)
- One-way (cannot reverse-engineer file from checksum)
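- Sensitive to accidental change (in practice, different files produce different checksums), although MD5 is no longer considered collision-resistant against deliberate attacks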
Checksum Format in eCTD XML
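A quick format check for the checksum attribute, based on the 32-character hexadecimal rule stated above. Accepting both upper and lower case is an assumption of this sketch; the function name is hypothetical.

```python
import re

# Exactly 32 hexadecimal characters (128 bits); case handling is an assumption.
MD5_HEX_PATTERN = re.compile(r"[0-9a-fA-F]{32}")


def looks_like_md5(value: str) -> bool:
    """Return True if the value has the shape of an MD5 hex digest."""
    return MD5_HEX_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ("d41d8cd98f00b204e9800998ecf8427e", "d41d8cd9", "not-a-checksum"):
        print(value, looks_like_md5(value))
```

This confirms only the shape of the value. Whether it matches the file is a separate check.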
Checksum Validation Process
| Step | Process | Validation Check |
|---|---|---|
| 1 | Publisher calculates MD5 of PDF file | Store in leaf checksum attribute |
| 2 | Gateway receives submission | Recalculates MD5 of each file |
| 3 | Gateway compares checksums | If mismatch, reject submission |
| 4 | Reviewer accesses document | Integrity verified via checksum |
Common Checksum Errors and Prevention
Error causes:
| Cause | Description | Prevention |
|---|---|---|
| Post-calculation modification | File edited after checksum generated | Lock files immediately after publishing |
| Antivirus modification | AV software modifies file during scan | Disable real-time scanning during assembly |
| Compression issues | ZIP/unzip alters file content | Verify after decompression |
| Line-ending conversion | Text-mode transfer or tooling rewrites line endings (CR/LF) | Transfer files in binary mode and disable automatic line-ending conversion |
| Publishing tool error | Tool calculates incorrectly | Verify with independent tool |
Verification command (cross-platform):
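Because Python's `hashlib` behaves the same on Windows, macOS, and Linux, a short script is one portable way to recompute a checksum independently of the publishing tool. The function names below are hypothetical; the file is read in binary mode and in chunks so large PDFs do not need to fit in memory.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in binary chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path: Path, expected: str) -> bool:
    """Compare the recomputed digest with the value stored in the leaf."""
    return md5_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.pdf"
        sample.write_bytes(b"abc")  # stand-in content, not a real PDF
        print(md5_of_file(sample))
```

Comparing this value with the `checksum` attribute of the matching leaf shows whether the file changed after publishing.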
Common eCTD XML Validation Errors and Solutions
XML validation errors are the leading cause of eCTD gateway rejections. Understanding common errors and their solutions enables faster troubleshooting and prevention.
Top 10 eCTD XML Validation Errors
| Rank | Error | Frequency | Impact | Typical Fix Time |
|---|---|---|---|---|
| 1 | Schema/DTD validation failure | 23% | Critical | 2-4 hours |
| 2 | Invalid leaf operation | 15% | Critical | 1-2 hours |
| 3 | Checksum mismatch | 12% | Critical | 1 hour |
| 4 | Missing required element | 10% | Critical | 1-2 hours |
| 5 | Invalid attribute value | 9% | High | 30 min - 1 hour |
| 6 | Duplicate leaf ID | 8% | High | 1 hour |
| 7 | Broken xlink:href path | 7% | High | 30 min |
| 8 | Invalid modified-file reference | 6% | High | 1-2 hours |
| 9 | Character encoding error | 5% | Medium | 30 min |
| 10 | Namespace declaration error | 5% | Medium | 30 min |
Error 1: Schema/DTD Validation Failure
Symptoms:
- Gateway returns "XML parsing error" or "DTD validation failed"
- Validation tool reports "Element not allowed" or "Invalid content"
Common causes and solutions:
| Cause | Error Message | Solution |
|---|---|---|
| Missing closing tag | `Element not closed` | Add missing `</element>` tag |
| Wrong element order | `Element out of order` | Reorder per DTD specification |
| Invalid nesting | `Element not allowed here` | Check parent-child relationships |
| Wrong DTD version | `DTD not found` | Verify DTD path and version |
Prevention: Use XML-aware editors with real-time validation against the DTD.
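As a first line of defense, a well-formedness check catches the "missing closing tag" class of errors before any DTD validation runs. The sketch below uses Python's standard-library parser, which is non-validating: it will not detect wrong element order or invalid nesting, so it complements a DTD-aware validator and does not replace one. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return the parser's error message, or None if the XML is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # The second sample is missing the closing </title> tag.
    samples = [
        "<leaf><title>Specification</title></leaf>",
        "<leaf><title>Specification</leaf>",
    ]
    for sample in samples:
        print(well_formedness_error(sample))
```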
Error 2: Invalid Leaf Operation
Symptoms:
- "Invalid operation for leaf" error
- "modified-file reference not found" error
Cause-solution mapping:
| Scenario | Error | Correct Approach |
|---|---|---|
| Using `replace` without prior leaf | No leaf to replace | Use `new` for first appearance |
| Using `new` for updated document | Breaks lifecycle chain | Use `replace` with modified-file |
| modified-file points to wrong ID | Reference not found | Verify ID matches prior sequence |
| Using `delete` without modified-file | Missing reference | Include modified-file attribute |
Error 3: Checksum Mismatch
Symptoms:
- "MD5 checksum does not match" error
- "File integrity verification failed"
Diagnostic steps:
- Regenerate checksum for the file independently
- Compare with the checksum value in index.xml
- If different, file was modified after publishing
- Check for antivirus, compression, or transfer issues
Error 7: Broken xlink:href Path
Symptoms:
- "File not found" error
- "Invalid href reference"
Path validation checklist:
| Requirement | Correct | Incorrect |
|---|---|---|
| Relative path | `m3/32-body-data/spec.pdf` | `C:\ectd\m3\32-body-data\spec.pdf` |
| Case sensitivity | `m3/32-body-data/Spec.pdf` if file is Spec.pdf | `m3/32-body-data/spec.pdf` when file is Spec.pdf |
| Forward slashes | `m3/32-body-data/spec.pdf` | `m3\32-body-data\spec.pdf` |
| No spaces | `m3/32-body-data/spec-final.pdf` | `m3/32-body-data/spec final.pdf` |
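The checklist above can be approximated in code. This sketch separates the purely syntactic checks (relative path, forward slashes, no spaces) from the case-sensitive existence check, which compares each path segment with the actual directory listing so that it behaves the same on case-insensitive file systems. All function names and the demo layout are assumptions for illustration.

```python
import re
import tempfile
from pathlib import Path


def href_problems(href: str) -> list[str]:
    """Return syntactic problems with an xlink:href value (empty if none)."""
    problems = []
    if re.match(r"[A-Za-z]:", href) or href.startswith(("/", "\\")):
        problems.append("absolute path")
    if "\\" in href:
        problems.append("backslashes")
    if " " in href:
        problems.append("contains spaces")
    return problems


def exists_with_exact_case(sequence_dir: Path, href: str) -> bool:
    """Check that every path segment exists with exactly the same letter case."""
    current = sequence_dir
    for part in href.split("/"):
        if not current.is_dir():
            return False
        if part not in {entry.name for entry in current.iterdir()}:
            return False
        current = current / part
    return True


if __name__ == "__main__":
    print(href_problems("m3/32-body-data/spec final.pdf"))
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "m3" / "32-body-data"
        folder.mkdir(parents=True)
        (folder / "Spec.pdf").write_bytes(b"")
        print(exists_with_exact_case(Path(tmp), "m3/32-body-data/spec.pdf"))
```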
XML Validation Against Schema: eCTD v4.0 Considerations
eCTD version 4.0 transitions from DTD-based to XSD (XML Schema Definition) validation, introducing more rigorous validation capabilities.
DTD vs. XSD Comparison
| Feature | DTD (v3.2.2) | XSD (v4.0) |
|---|---|---|
| Syntax | Own syntax | XML-based |
| Data types | Limited (CDATA, ID) | Rich types (date, integer, etc.) |
| Validation strength | Basic structure | Structure + content |
| Namespace support | Limited | Full support |
| Extensibility | Difficult | Built-in extension mechanisms |
| Controlled vocabulary | External | Integrated |
eCTD v4.0 XML Structure Changes
Key structural changes in v4.0:
Controlled Vocabulary in v4.0
eCTD v4.0 introduces controlled vocabulary for standardized values:
| Element | v3.2.2 Approach | v4.0 Approach |
|---|---|---|
| Operation | Enumerated attribute value (new, replace, append, delete) | CV code (1=new, 2=replace) |
| Application type | Regional code | Global CV |
| Submission type | Regional code | Global CV with regional extensions |
| Document type | Title text | Standardized CV code |
Key Takeaways
The eCTD XML backbone is the set of XML files (index.xml and regional.xml) that provide navigation, metadata, and lifecycle management for Electronic Common Technical Document submissions. The backbone creates a hyperlinked table of contents using leaf elements that define each document's location, checksum, and lifecycle operation. All major regulatory agencies (FDA, EMA, Health Canada, PMDA) require valid XML backbone files for eCTD submission acceptance.
- The eCTD XML backbone consists of index.xml and regional.xml files that together create the navigable structure, metadata layer, and lifecycle management system for regulatory submissions. XML validation errors account for 23% of gateway rejections - the largest single error category.
- Leaf elements are the building blocks of eCTD XML containing unique IDs, lifecycle operations (new, replace, append, delete), MD5 checksums, and file path references. Consistent leaf ID conventions across sequences are essential for maintaining audit trails.
- Regional XML files vary significantly between agencies with FDA using us-regional.xml (sections 1.1-1.16), EMA using eu-regional.xml (sections 1.0-1.10), and each requiring region-specific DTD validation and application type codes.
- DTD validation ensures XML structural compliance while MD5 checksums verify file integrity. Both validation layers must pass for gateway acceptance. eCTD v4.0 introduces XSD-based validation with richer data typing and integrated controlled vocabularies.
- Lifecycle operations must follow strict rules where `replace` requires modified-file references to prior leaf IDs, creating an auditable chain across sequences. Breaking this chain raises data integrity concerns during regulatory review.
Next Steps
Understanding the eCTD XML backbone is essential for regulatory submission success, but manually verifying XML structure across thousands of documents is error-prone and time-consuming. XML validation errors remain the leading cause of gateway rejections.
Eliminate XML backbone errors before submission. Assyro's AI-powered platform validates your eCTD XML structure against all ICH M8 and regional DTD specifications, checking leaf elements, lifecycle operations, checksums, and cross-references in real-time during publishing - not just at final assembly.
