Abstract
After excavation, triage, and custodial clearance, the Archaeobytologist faces a fundamental technical question: how do we preserve this artifact? This chapter examines two competing preservation paradigms—emulation (recreating the original execution environment) and migration (translating artifacts into contemporary formats)—alongside hybrid approaches that navigate the tensions between authenticity and accessibility. We explore format obsolescence theory, the Ship of Theseus problem in digital preservation, practical decision frameworks for preservation strategy selection, and case studies demonstrating each approach's trade-offs. The goal is not to declare a single "correct" method, but to equip practitioners with the conceptual tools to make informed, context-sensitive preservation decisions.
The Archaeobyte Lives—Now What?
You have done the work. You excavated the artifact from the digital strata, diagnosed its ontological state with your three-axis Microscope, and cleared it through the Custodial Filter. The Archaeobyte sits before you—rescued from oblivion, ethically approved, ready for preservation. And yet, a new crisis emerges: what form should this preservation take?
In traditional archaeology, preservation is often straightforward. A clay pot can be cleaned, stabilized with consolidants, and displayed in a climate-controlled case. The pot remains, materially, a pot. But digital artifacts exist at the volatile intersection of hardware, software, and data, in what preservation theorist Jeff Rothenberg calls "a dance between content and machine."¹ When the machine stops dancing, the content becomes unintelligible.
Consider a 1985 Lotus 1-2-3 spreadsheet stored on a 5.25" floppy disk. To preserve this artifact, we face cascading dependencies:
- The physical disk (magnetic medium vulnerable to bit rot)
- The disk drive (hardware requiring vintage IBM PC architecture)
- The operating system (MS-DOS 3.x or compatible)
- The application (Lotus 1-2-3 Release 1A with its proprietary .WKS file format)
- The contextual ecosystem (printer drivers, macro libraries, user documentation)
Each dependency layer introduces vulnerability. As David Bearman notes, "Digital preservation is not a product but a process—one that must be continuously enacted against the tide of technological obsolescence."² This is the paradox of digital preservation: formats are mortal, but meaning must persist.
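The cascade can be modeled as an ordered stack in which every layer must still be supplied, natively or by substitution, for the content to stay intelligible. The minimal sketch below assumes a hypothetical `Layer` type and illustrative availability flags; it shows the weakest-link logic, not the real status of any component.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    name: str
    available: bool  # can this layer still be supplied today, natively or by substitution?


def first_broken_layer(stack: List[Layer]) -> Optional[str]:
    """Return the first layer that can no longer be supplied, or None if the chain holds."""
    for layer in stack:
        if not layer.available:
            return layer.name
    return None


# Hypothetical availability flags for the 1985 spreadsheet example.
stack = [
    Layer("5.25-inch floppy disk", available=True),
    Layer("disk drive and IBM PC hardware", available=False),
    Layer("MS-DOS 3.x", available=True),
    Layer("Lotus 1-2-3", available=True),
    Layer("printer drivers and macro libraries", available=False),
]

print(first_broken_layer(stack))  # disk drive and IBM PC hardware
```

A single missing layer is enough to break access. Emulation restores the chain by substituting software for the missing layers; migration shortens it by moving the content onto layers that still exist.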
Two competing philosophies have emerged to address this challenge. Emulation argues we should recreate the original machine—build a virtual IBM PC, install MS-DOS, run Lotus 1-2-3 exactly as it was in 1985, preserving not just data but the authentic experience of interaction. Migration argues we should translate the spreadsheet into a contemporary format like .XLSX or .CSV, accepting some loss of authenticity in exchange for guaranteed readability on modern systems.
Neither approach is universally correct. Each carries philosophical assumptions about what constitutes "preservation"—what matters most, what can be sacrificed, who the future audience is. This chapter provides the conceptual foundation and practical frameworks to navigate these decisions. We begin with format obsolescence theory, explore the emulation-migration spectrum, examine hybrid strategies, and conclude with a preservation decision matrix calibrated to your artifact's diagnostic profile.
"To preserve is to choose what survives. To choose is to declare what matters."
I. The Problem of Format Obsolescence
Formats as Sociotechnical Contracts
A file format is more than a technical specification: it is a sociotechnical contract between creator and future reader, mediated by software. When you save a document as .DOCX, you are making an implicit bet that Microsoft Word (or compatible software) will exist in the future to render that document. As Matthew Kirschenbaum observes, "File formats are stabilized agreements about how to encode meaning in machine-readable form."³
But agreements expire. Vendors go bankrupt. Standards bodies dissolve. Software architectures evolve. What happens when the contract can no longer be honored? Format obsolescence occurs when:
- Software Extinction: No compatible application exists to parse the format (e.g., Adobe PageMaker .PM6 files after Adobe discontinued the product line)
- Hardware Dependency: Access requires obsolete hardware (e.g., data stranded on Iomega Zip disks)
- Documentation Loss: The format specification is proprietary and no longer available (e.g., undocumented features in early QuickTime codecs)
- Ecosystem Collapse: The format relies on external resources that no longer exist (e.g., Flash files calling remote APIs, multiplayer game servers)
The UK National Archives estimates that 40% of digital government records from the 1990s are already at risk of format obsolescence, with proprietary database formats and email systems particularly vulnerable.⁴ This is not a theoretical problem; it is an ongoing crisis.
The Spectrum of Format Risk
Not all formats are equally vulnerable. The Library of Congress Sustainability of Digital Formats project identifies key risk factors:⁵
| Risk Factor | Low Risk Example | High Risk Example |
|---|---|---|
| Documentation | PDF/A (ISO 19005 standard) | Proprietary game saves |
| Adoption | JPEG (universal support) | Amiga IFF ILBM |
| Transparency | Plain text (.TXT) | Encrypted containers |
| Dependencies | Standalone HTML | Flash with server calls |
| Self-containment | PNG (metadata in file) | JPEG + sidecar XMP |
This risk assessment informs preservation strategy. Low-risk formats may only require refreshing (copying to new media) and monitoring (watching for software drift). High-risk formats demand immediate intervention—either emulation to freeze the execution environment or migration to a more sustainable format.
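The triage rule in the preceding paragraph can be sketched as a flag check: a format with no high-risk factors is refreshed and monitored, while any high-risk factor triggers intervention. Factor names follow the table; the per-format flags are hypothetical judgments, and a real assessment would weigh factors rather than simply detect them.

```python
from typing import Dict, Tuple

# Factor names follow the risk table above.
RISK_FACTORS: Tuple[str, ...] = (
    "documentation", "adoption", "transparency", "dependencies", "self_containment",
)


def triage_format(high_risk: Dict[str, bool]) -> str:
    """Map per-factor high-risk flags to the two courses of action described above."""
    flagged = [f for f in RISK_FACTORS if high_risk.get(f, False)]
    if not flagged:
        return "refresh + monitor"
    return "intervene (emulate or migrate): " + ", ".join(flagged)


# Hypothetical flags for two formats from the table.
pdf_a = {}  # no factor judged high-risk
flash_with_server_calls = {"dependencies": True}

print(triage_format(pdf_a))                    # refresh + monitor
print(triage_format(flash_with_server_calls))  # intervene (emulate or migrate): dependencies
```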
The Ship of Theseus Problem
The ancient paradox haunts digital preservation: if you replace every plank of a ship over time, is it still the same ship? If you migrate a WordPerfect document through five format transformations (WPD → DOC → DOCX → ODT → HTML → PDF), is it still the same document?
Borrowing from philosopher of technology Don Ihde's phenomenology, we can distinguish between instrumental preservation (preserving function: "can I still read it?") and hermeneutic preservation (preserving meaning: "does it still mean what it meant?").⁶ A spreadsheet migrated to CSV preserves the data (instrumental success) but loses formulas, formatting, and macros (hermeneutic loss). An emulated Lotus 1-2-3 session preserves the entire phenomenological experience (loading times, menu navigation, even the CGA color palette) but requires future users to learn obsolete interfaces.
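The irreversibility of serial migration can be made concrete. In the sketch below, a property survives the WordPerfect chain only if every conversion step retains it, so the result is the intersection of the per-step sets. The property names and retention sets are hypothetical illustrations, not an audit of what real converters do.

```python
from typing import List, Set


def surviving_properties(original: Set[str], hops: List[Set[str]]) -> Set[str]:
    """A property survives serial migration only if every hop retains it."""
    surviving = set(original)
    for retained in hops:
        surviving &= retained  # once dropped, a property cannot be recovered downstream
    return surviving


original = {"text", "styles", "footnotes", "revision history", "embedded macros"}
# Hypothetical retention sets, one per conversion in WPD -> DOC -> DOCX -> ODT -> HTML -> PDF.
hops = [
    {"text", "styles", "footnotes", "revision history", "embedded macros"},  # WPD -> DOC
    {"text", "styles", "footnotes", "revision history"},                     # DOC -> DOCX
    {"text", "styles", "footnotes", "revision history"},                     # DOCX -> ODT
    {"text", "styles", "footnotes"},                                         # ODT -> HTML
    {"text", "styles", "footnotes", "revision history"},                     # HTML -> PDF
]

print(sorted(surviving_properties(original, hops)))  # ['footnotes', 'styles', 'text']
```

The final hop could carry revision history again, yet it cannot restore it: once a property is dropped upstream, no downstream format brings it back.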
There is no escape from this paradox. Every preservation decision is an act of interpretation—a declaration of what aspects of the artifact constitute its "essential character." The question is not how to avoid choosing, but how to choose consciously and transparently.
II. Strategy One: Emulation (The Museum Approach)
Philosophy and Implementation
Emulation preserves the artifact by recreating its original execution environment in software. Rather than translate the content, we build a virtual machine that mimics the hardware and operating system the artifact was designed for. Jeff Rothenberg, emulation's chief advocate, argues: "The only reliable way to preserve digital documents is to preserve the ability to recreate the original environment in which they were created."⁷
The emulation chain looks like this:
Modern OS (Windows 11/macOS/Linux)
└─ Emulator (DOSBox, QEMU, MAME, Basilisk II)
   └─ Guest OS (MS-DOS, System 7, Windows 3.1)
      └─ Historical Software (Lotus 1-2-3, HyperCard, Director)
         └─ Original Artifact (the .WKS file, HyperCard stack, .DIR project)
Each layer introduces complexity, but the payoff is authentic preservation. When you open a 1991 HyperCard stack in a Macintosh System 7 emulator, you experience the artifact as its creators intended—complete with pixel-perfect rendering, original sound synthesis, and interaction timing.
Case Study: The Internet Archive's Emulation Infrastructure
The Internet Archive's Historical Software Collection demonstrates emulation at scale. Using Emularity (a JavaScript wrapper for emscripten-compiled emulators), they deliver in-browser experiences of MS-DOS games, early Macintosh software, and arcade cabinets.⁸ Users can boot The Oregon Trail (1985) or Prince of Persia (1989) directly in their web browser, experiencing pixel-perfect recreations of the original software.
The technical stack:
- Emscripten: Compiles C/C++ emulators (DOSBox, MAME) to WebAssembly
- EM-DOSBox: Browser-based DOS environment
- JSMESS: JavaScript port of MESS (Multi Emulator Super System), the sibling project later merged into MAME (Multiple Arcade Machine Emulator)
- Metadata Layer: JSON manifests describing boot sequences, required files, input mappings
As of 2024, the archive hosts over 100,000 emulated software titles spanning four decades of computing history. This is emulation as public infrastructure—a commitment to preserving not just files but computational experiences.
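The metadata layer's role can be illustrated with a hypothetical manifest. The field names below (`emulator`, `boot_sequence`, `required_files`, `input_mappings`) are assumptions for illustration, not Emularity's actual schema; the idea worth keeping is pairing each required file with a checksum so that a missing or altered BIOS image, disk image, or executable is caught before boot.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(emulator: str, boot_sequence: List[str],
                   files: List[Path], input_mappings: Dict[str, str]) -> str:
    """Serialize a hypothetical emulation manifest (illustrative field names)."""
    manifest = {
        "emulator": emulator,
        "boot_sequence": boot_sequence,
        "required_files": [{"path": p.name, "sha256": sha256_of(p)} for p in files],
        "input_mappings": input_mappings,
    }
    return json.dumps(manifest, indent=2)


def missing_or_altered(manifest_json: str, base_dir: Path) -> List[str]:
    """Return required files that are absent from base_dir or no longer match their checksum."""
    problems = []
    for entry in json.loads(manifest_json)["required_files"]:
        candidate = base_dir / entry["path"]
        if not candidate.is_file() or sha256_of(candidate) != entry["sha256"]:
            problems.append(entry["path"])
    return problems
```

Storing checksums rather than the files themselves keeps the manifest small and lets one manifest validate copies held in different locations.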
Advantages of Emulation
- ✓ Authenticity: Preserves the original artifact bit-for-bit, including bugs, quirks, and interaction patterns
- ✓ Format Agnosticism: Works regardless of format opacity; in principle, if the original software could run it, a faithful emulator can too
- ✓ Ecosystem Preservation: Captures compound objects (applications + documents + plugins + settings)
- ✓ Research Value: Enables study of software archaeology itself: how did people actually use this tool?
Disadvantages and Limitations
- ✗ Complexity Cascade: Emulators themselves become obsolete, raising the prospect of "emulators for emulators" (the regress that Lorie's Universal Virtual Computer proposal was designed to avoid)⁹
- ✗ Performance Overhead: Each emulation layer adds computational cost; nested emulation can be unusably slow
- ✗ Legal Ambiguity: Requires ROM dumps, BIOS images, and historical OS copies (see Chapter 3's copyright test)
- ✗ Accessibility Barriers: Forces future users to learn obsolete interfaces (try explaining floppy disk boot sequences to someone born in 2010)
- ✗ Maintenance Burden: Requires ongoing curation (emulator updates, bug fixes, configuration management)
Emulation is the museum approach—it preserves artifacts in their original context, but at the cost of accessibility and long-term sustainability. It is ideal for high-value artifacts where authenticity outweighs usability (interactive fiction, digital art, software as cultural artifact).
III. Strategy Two: Migration (The Translation Approach)
Philosophy and Implementation
Migration preserves the artifact by translating it into a contemporary format. Rather than recreate the past, we bring the artifact into the present, accepting transformation as the price of survival. As digital preservation theorist Adrian Brown argues, "Migration acknowledges that change is inevitable—the question is whether we manage that change deliberately or let it happen through neglect."¹⁰
Migration strategies include:
- Format Conversion: Translate to a more sustainable format (WPD → DOCX, TIFF → PNG, QuickTime → MP4)
- Encapsulation: Wrap the original file in a preservation-friendly container (METS, BagIt, WARC)
- Normalization: Extract content to a minimal canonical representation (all word processors → plain text + structural markup)
- Regeneration: Recreate the artifact using contemporary tools (rebuild a website using modern HTML/CSS while preserving visual design)
The goal is not pixel-perfect reproduction, but semantic preservation—maintaining the artifact's informational content and functional purpose even as its material form changes.
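Encapsulation is the most mechanical of these strategies. The sketch below writes a minimal bag following the BagIt packaging format (RFC 8493): a `bagit.txt` declaration, the untouched payload under `data/`, and a SHA-256 manifest. It assumes a flat list of uniquely named files whose names need no escaping, and it omits optional tag files such as `bag-info.txt`; production work would more likely rely on an established tool such as the Library of Congress's bagit-python library.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def make_bag(bag_dir: Path, payload: List[Path]) -> None:
    """Write a minimal BagIt 1.0 bag containing copies of the payload files."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True)  # raises if a bag already exists here
    manifest_lines = []
    for src in payload:
        dest = data_dir / src.name
        shutil.copy2(src, dest)  # copy bytes (and timestamps) without modification
        digest = hashlib.sha256(dest.read_bytes()).hexdigest()
        manifest_lines.append(f"{digest}  data/{src.name}\n")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    (bag_dir / "manifest-sha256.txt").write_text("".join(manifest_lines), encoding="utf-8")
```

The original bytes are never altered; the bag simply adds enough structure for any later custodian to verify that nothing has changed.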
Case Study: The UK Web Archive's Migration Pipeline
The UK Web Archive, operated by the British Library, captures over 1 billion UK web pages annually. Given the scale and format diversity (HTML, CSS, JavaScript, Flash, Java applets, video codecs), emulation is impractical. Instead, they employ a normalization-on-ingest migration strategy:¹¹
- Harvest websites using the Heritrix web crawler (stored as WARC files)
- Identify file formats using DROID (Digital Record Object Identification)
- Migrate high-risk formats:
  - Flash SWF → HTML5 Canvas (via Ruffle emulator or manual reconstruction)
  - Java Applets → JavaScript equivalents (where possible)
  - Obsolete video codecs → H.264/MP4
  - Proprietary fonts → web-safe alternatives with visual similarity scoring
- Preserve both original and migrated versions (with provenance metadata linking them)
- Serve migrated versions by default; offer originals for scholarly research
This hybrid approach balances accessibility (users see working pages) with scholarly integrity (researchers can audit transformations). The system processes ~40TB of web data annually, demonstrating migration at institutional scale.
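The identify, decide, and record steps of such a pipeline can be sketched as follows. Real pipelines delegate identification to DROID and its PRONOM signature registry; the handful of leading-byte signatures here is a simplified stand-in, the policy table is hypothetical, and the sketch records a planned action rather than performing any conversion.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A tiny stand-in for PRONOM signatures: leading "magic" bytes only.
SIGNATURES: List[Tuple[bytes, str]] = [
    (b"FWS", "swf"), (b"CWS", "swf"), (b"ZWS", "swf"),  # Flash: uncompressed, zlib, LZMA
    (b"GIF87a", "gif"), (b"GIF89a", "gif"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
]

# Hypothetical migration policy: only formats judged high-risk get a target.
MIGRATION_POLICY: Dict[str, str] = {
    "swf": "HTML5 Canvas (via Ruffle or manual reconstruction)",
}


@dataclass
class IngestRecord:
    original_sha256: str  # provenance anchor linking the original to any migrated copy
    fmt: str
    action: str


def identify(data: bytes) -> str:
    for magic, fmt in SIGNATURES:
        if data.startswith(magic):
            return fmt
    return "unknown"


def ingest(data: bytes) -> IngestRecord:
    """Identify a harvested object and record the planned action; nothing is converted here."""
    fmt = identify(data)
    target = MIGRATION_POLICY.get(fmt)
    return IngestRecord(
        original_sha256=hashlib.sha256(data).hexdigest(),
        fmt=fmt,
        action=f"migrate -> {target}" if target else "keep as-is",
    )
```

Unknown formats fall through to "keep as-is" in this sketch; a more cautious pipeline would route them to manual review, since failing to identify a format is itself a risk signal.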
Advantages of Migration
- ✓ Accessibility: Artifacts open natively on contemporary systems without specialized software
- ✓ Searchability: Normalized formats enable full-text indexing, metadata extraction, computational analysis
- ✓ Integration: Migrated artifacts can participate in modern workflows (embed in documents, cite in databases, mashup with APIs)
- ✓ Legal Clarity: Avoids ROM/BIOS copyright issues; depending on jurisdiction, migration may qualify as fair use or fall under a preservation exception
- ✓ Proactive Intervention: Addresses format obsolescence before software becomes unavailable
Disadvantages and Losses
- ✗ Semantic Loss: Each migration discards "inessential" features, but who decides what's essential?
- ✗ Error Accumulation: Serial migration creates a "degradation cascade" (like photocopying a photocopy)
- ✗ Context Collapse: Migrated artifacts lose their original software ecosystem (macros, plugins, settings)
- ✗ Ongoing Commitment: Formats continue to evolve; migration is not "one and done" but a perpetual process
- ✗ Triage Overhead: Requires format risk assessment, conversion validation, quality control for every artifact
Migration is the translation approach—it prioritizes usability and integration over authenticity, accepting that the artifact will change over time. It is ideal for large-scale collections where access outweighs perfect preservation (government documents, email archives, web crawls).
IV. Hybrid Strategies and the Middle Path
Beyond Binary Choices
The emulation-versus-migration debate presents a false dichotomy. In practice, most preservation projects employ hybrid strategies that combine both approaches, tailored to specific use cases and user communities.
Consider three increasingly sophisticated hybrid models:
Model 1: Dual Preservation
Preserve both the original format (for emulation) and a migrated version (for access). The Internet Archive's Historical Software Collection does this: they offer in-browser emulation of MS-DOS games while also providing extracted screenshots, manual PDFs, and gameplay videos in contemporary formats. Users choose their preferred access method based on their needs—researchers use emulation for authentic study, casual users view screenshots.
Cost: Doubles storage and maintenance overhead
Benefit: Serves multiple audiences without compromise
Model 2: Layered Preservation
Create a preservation pyramid with different strategies at different access tiers:¹²
- Tier 1 (Public): Migrated versions optimized for accessibility (HTML5, PDF/A, MP4)
- Tier 2 (Researchers): Original formats with emulation-on-demand
- Tier 3 (Dark Archive): Bit-level preservation (disk images, raw bitstreams) for future unknown tools
Example: The Library of Congress Web Archive serves normalized HTML to the public, but preserves original WARC files for computational researchers
Model 3: Progressive Enhancement
Start with migration and layer emulation on top only when necessary. This is the approach championed by the Digital Preservation Coalition (DPC): migrate by default to reduce maintenance burden, but flag artifacts where migration causes unacceptable loss (interactive media, software as art, legally significant documents where bit-perfect preservation is required).¹³ These flagged items enter an emulation track with dedicated resources.
Decision Trigger: Migration quality score < 0.85 (significant properties loss > 15%) → escalate to emulation
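One plausible way to compute the quality score, sketched here as an assumption (the text does not define the metric), is the weighted share of significant properties retained, so that "loss > 15%" corresponds to a score below 0.85. The property names and weights are hypothetical.

```python
from typing import Dict


def migration_quality(weights: Dict[str, float], preserved: Dict[str, bool]) -> float:
    """Weighted share (0.0 to 1.0) of significant properties that survived a migration."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one significant property needs a positive weight")
    kept = sum(w for prop, w in weights.items() if preserved.get(prop, False))
    return kept / total


def route(weights: Dict[str, float], preserved: Dict[str, bool], threshold: float = 0.85) -> str:
    """Escalate to the emulation track when the quality score falls below the threshold."""
    score = migration_quality(weights, preserved)
    return "emulation track" if score < threshold else "migration track"


# Hypothetical weights for an interactive artwork migrated to video.
weights = {"imagery": 5, "sound": 2, "interactivity": 3}
preserved = {"imagery": True, "sound": True, "interactivity": False}
print(migration_quality(weights, preserved))  # 0.7
print(route(weights, preserved))              # emulation track
```

The weights are where curatorial judgment enters: deciding that interactivity counts for 3 out of 10 is exactly the "who decides what's essential?" question, now recorded as a number that can be audited.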
Format-Specific Strategies
Some formats demand specialized approaches that blend emulation and migration:
| Format Type | Hybrid Strategy |
|---|---|
| Video Games | Emulate gameplay, migrate supporting materials (manuals, screenshots, let's-play videos) |
| Interactive Art | Emulate original + create documentation (artist interviews, behavior descriptions, recreation instructions) |
| Email Archives | Migrate to MBOX/EML + preserve original PST in disk image for forensic analysis |
| Web Archives | Normalize HTML/CSS/JS, embed emulated Flash content via Ruffle, preserve WARC originals |
| Office Documents | Migrate to PDF/A for access, preserve original .DOC/.XLS for macro recovery if needed |
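For the email row, the migration half can be sketched with Python's standard `mailbox` module, assuming the messages have already been exported from the PST file as individual `.eml` files (reading PST itself requires a separate tool, which this sketch does not cover). The original PST and its disk image are left untouched.

```python
import email
import mailbox
from email import policy
from pathlib import Path


def emls_to_mbox(eml_dir: Path, mbox_path: Path) -> int:
    """Append every .eml file in eml_dir to an mbox file and return how many were added.

    Assumes a single writer; call box.lock() first if other processes may touch the file.
    """
    box = mailbox.mbox(str(mbox_path), create=True)
    count = 0
    try:
        for eml in sorted(eml_dir.glob("*.eml")):
            message = email.message_from_bytes(eml.read_bytes(), policy=policy.compat32)
            box.add(message)
            count += 1
        box.flush()
    finally:
        box.close()
    return count
```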
The key insight: preservation strategy should follow from preservation intent. What aspects of the artifact are considered "significant properties"? Who is the intended audience? What resources are available? These questions must be answered before choosing a technical approach.
V. The Preservation Decision Matrix
Mapping Triage State to Strategy
Preservation strategy should align with your Chapter 2 triage diagnosis. The three-axis framework (Technical Legibility, Functional Integrity, Contextual Ecosystem) provides decision inputs:
| Triage State | Technical Legibility | Functional Integrity | Contextual Ecosystem | Recommended Strategy |
|---|---|---|---|---|
| Vivibyte | ✓ YES | ✓ YES | ✓ YES | Monitor + Refresh: Low-risk; watch for format drift but no immediate action needed |
| Umbrabyte (Ecosystem Extinct) | ✓ YES | ✓ YES | ✗ NO | Migration Priority: Content readable; extract before software vanishes. Emulate only if context is research-critical |
| Umbrabyte (Conceptual Ghost) | ⚠ CONDITIONAL | ✗ NO | ✗ NO | Reconstruct + Document: Gather surrounding artifacts (screenshots, descriptions, code fragments). Migration impossible; focus on archaeological reconstruction |
| Umbrabyte (Resurrected Fossil) | ✓ YES | ⚠ CONDITIONAL | ⚠ CONDITIONAL | Dual Preservation: Emulate for authenticity research + migrate for functional access |
| Petribyte | ✗ NO | ✗ NO | ✗ NO | Forensic Recovery Only: Bit-level preservation (disk imaging). Wait for future tools; neither emulation nor migration viable |
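Because the matrix is a lookup from three diagnostic values to a strategy, it translates directly into a table-driven function. This sketch encodes only the five rows shown; sending every other combination to the scorecard is an assumption that follows the text's advice for ambiguous triage states.

```python
from enum import Enum
from typing import Dict, Tuple


class Axis(Enum):
    YES = "yes"
    CONDITIONAL = "conditional"
    NO = "no"


Y, C, N = Axis.YES, Axis.CONDITIONAL, Axis.NO

# (Technical Legibility, Functional Integrity, Contextual Ecosystem) -> (triage state, strategy)
DECISION_MATRIX: Dict[Tuple[Axis, Axis, Axis], Tuple[str, str]] = {
    (Y, Y, Y): ("Vivibyte", "Monitor + Refresh"),
    (Y, Y, N): ("Umbrabyte (Ecosystem Extinct)", "Migration Priority"),
    (C, N, N): ("Umbrabyte (Conceptual Ghost)", "Reconstruct + Document"),
    (Y, C, C): ("Umbrabyte (Resurrected Fossil)", "Dual Preservation"),
    (N, N, N): ("Petribyte", "Forensic Recovery Only"),
}


def recommend(legibility: Axis, integrity: Axis, ecosystem: Axis) -> Tuple[str, str]:
    """Look up a matrix row; profiles the matrix does not list fall back to the scorecard."""
    return DECISION_MATRIX.get(
        (legibility, integrity, ecosystem),
        ("unlisted profile", "score with the Preservation Strategy Scorecard"),
    )
```

For instance, `recommend(Y, C, N)`, a legible artifact with partially working functions and no surviving ecosystem, matches no row and is sent to the scorecard.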
The Preservation Strategy Scorecard
When triage state is ambiguous or resources are constrained, use this weighted decision scorecard:
Score each factor from 1-5, then calculate:
Emulation Score = (Authenticity Need × 3) + (Format Opacity × 2) + (Research Value × 2) − (Scale × 2) + (Maintenance Capacity × 1)
Migration Score = (Access Demand × 3) + (Format Transparency × 2) + (Scale × 2) − (Semantic Loss Risk × 2) − (Ongoing Commitment × 1)
Factor Definitions:
- Authenticity Need: 5 = legally significant/artwork, 1 = informational content only
- Format Opacity: 5 = proprietary/undocumented, 1 = open standard
- Research Value: 5 = unique historical artifact, 1 = one of many similar items
- Access Demand: 5 = public archive, 1 = dark archive
- Format Transparency: 5 = well-documented/parseable, 1 = binary blob
- Scale: 5 = millions of files, 1 = single artifact
- Semantic Loss Risk: 5 = rich interactive media, 1 = plain text
- Maintenance Capacity: 5 = well-staffed institution, 1 = solo practitioner
- Ongoing Commitment: 5 = rapid format evolution, 1 = stable format
Decision Rule: If Emulation Score exceeds Migration Score by 5 or more points → Emulate. If Migration Score exceeds Emulation Score by 5 or more points → Migrate. If the scores differ by fewer than 5 points → Hybrid strategy.
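The scorecard and decision rule can be sketched as follows. Factor names follow the definitions above, and the 1-5 range check is an added safeguard. In this sketch, Maintenance Capacity is added to the emulation score: the definitions rate a well-staffed institution 5, and more staff should make emulation more feasible, not less. Format Opacity and Format Transparency describe one property from opposite ends; the sketch scores them independently, as the scorecard does. The example ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Factors:
    """Each factor scored 1-5 as defined in the scorecard."""
    authenticity_need: int
    format_opacity: int
    research_value: int
    access_demand: int
    format_transparency: int
    scale: int
    semantic_loss_risk: int
    maintenance_capacity: int
    ongoing_commitment: int

    def __post_init__(self) -> None:
        for name, value in vars(self).items():
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


def emulation_score(f: Factors) -> int:
    return (f.authenticity_need * 3 + f.format_opacity * 2 + f.research_value * 2
            - f.scale * 2 + f.maintenance_capacity * 1)


def migration_score(f: Factors) -> int:
    return (f.access_demand * 3 + f.format_transparency * 2 + f.scale * 2
            - f.semantic_loss_risk * 2 - f.ongoing_commitment * 1)


def decide(f: Factors) -> str:
    gap = emulation_score(f) - migration_score(f)
    if gap >= 5:
        return "Emulate"
    if gap <= -5:
        return "Migrate"
    return "Hybrid"


# Hypothetical ratings for a single interactive artwork.
artwork = Factors(authenticity_need=5, format_opacity=5, research_value=5,
                  access_demand=2, format_transparency=1, scale=1,
                  semantic_loss_risk=5, maintenance_capacity=3, ongoing_commitment=3)
print(emulation_score(artwork), migration_score(artwork), decide(artwork))  # 36 -3 Emulate
```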
The Preservation Action Report
Just as Chapter 3 required a Custodial Report, Chapter 4 requires a Preservation Action Report documenting your strategy decision. This creates an audit trail for future practitioners and ensures transparency.
Preservation Action Report Template
Artifact Identifier:
[Title, creator, date, format, unique ID]
Triage State:
[Vivibyte/Umbrabyte/Petribyte with three-axis scores]
Chosen Strategy:
[Emulation / Migration / Hybrid / Forensic Recovery Only]
Rationale:
[Why this strategy? What factors were weighted? What trade-offs accepted?]
Implementation Details:
If Emulation: Emulator used, guest OS version, required files (BIOS/ROMs), access method
If Migration: Source format → target format, conversion tools, validation method, quality score
Significant Properties Preserved:
[What aspects of the artifact are guaranteed to survive this strategy?]
Known Losses:
[What will be lost? Why is this loss acceptable?]
Maintenance Schedule:
[When will this strategy be reviewed? What triggers re-evaluation?]
Practitioner:
[Name, organization, date, contact]
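Keeping the report machine-readable makes the audit trail searchable across a collection. A minimal sketch, assuming JSON as the serialization (the template itself prescribes no file format); the class and field names are hypothetical, and institutions already using a preservation metadata schema such as PREMIS would map these fields onto it instead. Restricting the strategy to the four options in the template catches typos early.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

STRATEGIES = {"Emulation", "Migration", "Hybrid", "Forensic Recovery Only"}


@dataclass
class PreservationActionReport:
    """One field per heading in the template above."""
    artifact_identifier: str
    triage_state: str
    chosen_strategy: str
    rationale: List[str]
    implementation_details: List[str]
    significant_properties_preserved: List[str]
    known_losses: List[str]
    maintenance_schedule: str
    practitioner: str

    def __post_init__(self) -> None:
        if self.chosen_strategy not in STRATEGIES:
            raise ValueError(f"chosen_strategy must be one of {sorted(STRATEGIES)}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)
```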
Example: GeoCities Vienna (from Chapter 1)
Let's apply the framework to the GeoCities Vienna case study:
Artifact Identifier: GeoCities Vienna neighborhood (Austrian personal homepages, 1996-2009, HTML/GIF/MIDI)
Triage State: Umbrabyte (Ecosystem Extinct) — Technical Legibility: YES, Functional Integrity: CONDITIONAL (broken links), Contextual Ecosystem: NO (GeoCities server architecture gone)
Chosen Strategy: Hybrid (Layered Preservation)
Rationale:
- Scale: 38,000 pages → migration more practical than per-page emulation
- Research value: High (early web culture, non-English internet history)
- Authenticity need: Moderate (scholarly interest in "look and feel," but content takes priority)
Implementation Details:
- Tier 1 (Public): Migrated to static HTML with broken links patched, MIDI converted to MP3
- Tier 2 (Researchers): Original HTML preserved, viewable via Internet Archive's Wayback Machine (emulated 1998 Netscape Navigator available on request)
- Tier 3 (Dark Archive): Bit-level WARC files stored offline
Significant Properties Preserved: Textual content, visual layout (CSS polyfilled), embedded images, site structure/navigation
Known Losses: JavaScript behaviors (limited in 1990s sites), MIDI as editable event data with device-dependent playback (rendered to a single fixed MP3), GeoCities watermarks/ads
Maintenance Schedule: Annual format risk assessment; re-migration if HTML/CSS standards shift significantly (monitoring the W3C CSS Working Group)
Practitioner: [Your name], Austrian Web Archive, November 21, 2025
Conclusion: The Operational Anvil
You have now assembled the complete toolkit:
- The Trowel (Chapter 1): You can excavate digital artifacts from obscured strata
- The Microscope (Chapter 2): You can diagnose their ontological state with precision
- The Filter (Chapter 3): You can determine ethical clearance for preservation
- The Anvil (Chapter 4): You can forge preservation strategies that balance authenticity and accessibility
The Anvil is where theory becomes practice, where abstract debates about "what is preservation?" meet the concrete reality of storage budgets, staff time, and user needs. Preservation is not a technical problem with a universal solution—it is a hermeneutic problem requiring contextual judgment.
Emulation preserves the past at the cost of the present. Migration brings artifacts into the present at the cost of the past. Hybrid strategies attempt to serve both gods, but require twice the devotion. There is no escape from trade-offs. The question is not which strategy is "correct," but which trade-offs you—as custodian, not owner—are willing to accept on behalf of future communities.
As digital preservation pioneer Margaret Hedstrom wrote: "We preserve not things but the capacity to make meaning from things."¹⁴ Whether that meaning emerges from pixel-perfect emulation or semantically migrated data is secondary to the fact that meaning can still emerge at all. The worst preservation strategy is the one never implemented: the artifact left to rot because we demanded perfection instead of accepting pragmatism.
You are now equipped to act. Excavate, diagnose, clear, and preserve. The Archaeobytes wait. The Anvil is hot. Strike while the iron—and the bits—still hold form.
"Digital preservation is an act of optimism—a belief that the future will care about what we cared about, and that we can build bridges across the chasm of obsolescence to make that caring possible."
— Trevor Owens, The Theory and Craft of Digital Preservation, 2018
Works Cited
1. Rothenberg, Jeff. "Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation." Council on Library and Information Resources, 1999.
2. Bearman, David. "Reality and Chimeras in the Preservation of Electronic Records." D-Lib Magazine 5.4 (1999).
3. Kirschenbaum, Matthew G. Mechanisms: New Media and the Forensic Imagination. MIT Press, 2008.
4. Brown, Adrian. Practical Digital Preservation: A How-To Guide for Organizations of Any Size. Facet Publishing, 2013.
5. Library of Congress. "Sustainability of Digital Formats: Planning for Library of Congress Collections." Library of Congress, 2024. https://www.loc.gov/preservation/digital/formats/
6. Ihde, Don. Technology and the Lifeworld: From Garden to Earth. Indiana University Press, 1990.
7. Rothenberg, Jeff. "Using Emulation to Preserve Digital Documents." Koninklijke Bibliotheek, 2000.
8. Espenschied, Dragan, et al. "Browser-Based Emulation for Software Preservation." Proceedings of iPres 2013, Lisbon, Portugal.
9. Lorie, Raymond A. "The UVC: A Method for Preserving Digital Documents." IBM Research Report RJ 10185, 2002.
10. Brown, Adrian. "Selecting File Formats for Long-Term Preservation." Digital Preservation Coalition, 2008.
11. Pennock, Maureen. "Web Archiving at the British Library: Current Approaches to Capturing the UK Web." International Internet Preservation Consortium, 2013.
12. Caplan, Priscilla. Understanding PREMIS: Preservation Metadata. Library of Congress Network Development and MARC Standards Office, 2009.
13. Digital Preservation Coalition. "Digital Preservation Handbook, 2nd Edition." DPC, 2015. https://www.dpconline.org/handbook
14. Hedstrom, Margaret. "Digital Preservation: A Time Bomb for Digital Libraries." Computers and the Humanities 31.3 (1997): 189-202.