Chapter 2: Forensics and the Tangible Archaeobyte

The File as Physical Artifact


Abstract

Before we can triage an Archaeobyte's ontological state, we must understand what it is materially. A digital file is not a transparent window to the past—it is a layered archaeological object encoded with philosophical choices, technical constraints, and cultural assumptions. This chapter introduces forensic materialism as a methodology for reading file formats not as neutral containers but as arguments about how information should exist. Through close analysis of format headers, compression algorithms, metadata schemas, and encoding standards, we learn to see the file as Kirschenbaum's "mechanisms"—physical inscriptions carrying "frictional data" that reveal the technological worldviews of their creators. We examine five case studies of format philosophy: tracker modules (.MOD) as compositional databases, compression wars (.SIT vs .ZIP) as platform ideology, bitmap headers (.BMP) as Windows hegemony, streaming protocols (RealMedia) as control architectures, and ANSI art as constraint-driven creativity.

The Illusion of Transparency

Open a .TXT file. You see words. The file appears obvious—a neutral carrier of semantic content, a transparent window into an author's thoughts. But this is an illusion. That .TXT file is not "just text." It is:

Even the simplest digital object carries what Matthew Kirschenbaum calls "frictional data"—the material traces of its creation, storage, and transmission across computational infrastructures.1 To practice forensic materialism is to refuse the illusion of transparency and instead ask: What does this file tell us about the world that made it?

Wendy Chun argues that digital media operate through an "enduring ephemeral" paradox: they appear weightless and immaterial while simultaneously depending on deeply material infrastructures of silicon, magnetic flux, and electrical current.2 The file format is the site where this paradox crystallizes. It is simultaneously:

  • Logical: An abstract specification (e.g., "PNG consists of an 8-byte signature followed by a series of chunks...")
  • Physical: A specific sequence of magnetized domains on a hard drive platter or voltage states in flash memory
  • Philosophical: An argument about how images/sounds/texts should be structured (lossless vs. lossy, open vs. proprietary, metadata-rich vs. minimal)
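The "logical" layer can be made concrete with a few lines of code. The following is a minimal sketch, assuming only the PNG layout quoted above (an 8-byte signature followed by length/type/payload/CRC chunks); the helper names and the tiny 1×1 demo image are hypothetical and exist only for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature every PNG starts with


def iter_png_chunks(data: bytes):
    """Yield (chunk_type, payload) pairs from raw PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG: signature mismatch")
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        yield ctype.decode("ascii"), payload
        pos += 12 + length


def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Build one chunk (used here only to fabricate a demo file in memory)."""
    crc = zlib.crc32(ctype + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


# Hypothetical 1x1 grayscale image: width, height, bit depth, colour type, etc.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # one filter byte plus one pixel
demo_png = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

chunk_types = [name for name, _ in iter_png_chunks(demo_png)]
print(chunk_types)
```

The same bytes that a viewer treats as "an image" appear here as a sequence of typed, length-prefixed records, which is the structure the specification argues for.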

This chapter trains you to read files forensically—not as containers to be opened, but as archaeological strata to be excavated. We will examine five formats that embody distinct technical and cultural worldviews, learning to see the file not as transparent data but as opaque material artifact.

I. The Tracker Module: Composition as Database

Artifact Description: .MOD, .XM, .IT, .S3M

In 1987, Karsten Obarski released Ultimate Soundtracker for the Amiga, inaugurating a new musical paradigm: the tracker module. Unlike MIDI (which stores instructions for external synthesizers) or digital audio (which stores waveforms), a .MOD file is a self-contained compositional database storing:

  1. Sample Library: Short waveform recordings (instrument sounds) as raw PCM audio
  2. Pattern Matrix: A grid of notes, effects commands, and timing data
  3. Playback Sequence: An order list dictating which patterns play in which sequence
  4. Metadata: Song title, instrument names, and song length encoded in the header (the pattern count is inferred from the order list)

The genius of the tracker format is its ontological completeness—the file contains everything needed to reproduce the composition. No external dependencies. No proprietary synthesizers. The .MOD file is the instrument, the score, and the performance simultaneously.

Forensic Reading: The .MOD Header

Open a .MOD file in a hex editor and you encounter its material structure:

Offset  Hex Values                                       ASCII
------  -----------------------------------------------  ----------------
0x0000  4D 6F 6E 6B 65 79 20 49 73 6C 61 6E 64 20 54 68  Monkey Island Th
0x0010  65 6D 65 00 00 00 00 00 00 00 00 00 00 00 00 00  eme.............
...
0x0438  4D 2E 4B 2E                                      M.K.

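The first 20 bytes are the song title (null-padded ASCII). At offset 0x438, the magic number M.K. identifies this as a 4-channel ProTracker module. Between these landmarks lie 31 instrument definitions of 30 bytes each (a 22-byte name, the sample length in 16-bit words, a finetune value, a default volume, and loop start and length), followed by a song-length byte, a restart byte, and a 128-byte order table.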
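The header layout can be read programmatically. What follows is a sketch under stated assumptions: it handles only the 31-instrument "M.K." layout (title at 0x000, 30-byte instrument records from 0x014, song length at 0x3B6, order table at 0x3B8, magic at 0x438), and the function name, the sample name "lead", and the in-memory demo header are hypothetical.

```python
import struct

SAMPLE_RECORD = struct.Struct(">22sHBBHH")  # 30 bytes, big-endian (Amiga/68k order)


def parse_mod_header(data: bytes) -> dict:
    """Parse the fixed 1084-byte header of a 31-instrument .MOD file."""
    if len(data) < 1084:
        raise ValueError("too short to hold a 31-instrument MOD header")
    title = data[0:20].split(b"\x00", 1)[0].decode("ascii", errors="replace")
    samples = []
    for i in range(31):
        name, length_words, finetune, volume, loop_start, loop_len = \
            SAMPLE_RECORD.unpack_from(data, 20 + 30 * i)
        samples.append({
            "name": name.split(b"\x00", 1)[0].decode("ascii", errors="replace"),
            "length_bytes": length_words * 2,  # lengths are stored in 16-bit words
            "finetune": finetune & 0x0F,
            "volume": volume,
        })
    song_length = data[950]
    order_table = data[952:1080]
    return {
        "title": title,
        "samples": samples,
        "order": list(order_table[:song_length]),
        # Common convention: highest pattern number in the table, plus one.
        "pattern_count": max(order_table) + 1,
        "magic": data[1080:1084].decode("ascii", errors="replace"),
    }


# Fabricated header for demonstration; a real file would be read from disk.
demo = bytearray(1084)
demo[0:19] = b"Monkey Island Theme"
SAMPLE_RECORD.pack_into(demo, 20, b"lead", 256, 0, 64, 0, 1)
demo[950] = 2                      # two positions in the order list
demo[952:954] = bytes([0, 1])      # play pattern 0, then pattern 1
demo[1080:1084] = b"M.K."

info = parse_mod_header(bytes(demo))
print(info["title"], info["magic"], info["order"])
```

Other tracker variants (15-instrument Soundtracker files, 6- and 8-channel tags) use different layouts and are deliberately out of scope here.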

This header structure is not accidental—it is a philosophical statement about music composition. The tracker format argues that:

  • Samples over synthesis: Real-world sounds (even if lo-fi) are preferable to abstract oscillators
  • Patterns over timelines: Musical structure should be modular and reusable (verse-chorus-verse as data structure)
  • Constraints breed creativity: The 31-instrument limit and 4-channel polyphony forced composers to innovate (sample reuse, creative panning effects)
  • Composition = Programming: Hex effect codes (e.g., EDx for note delay, Cxx for set volume) position the musician as coder
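To see how literally the format treats notes as data, consider a single pattern cell. This is a minimal sketch assuming the standard ProTracker packing (four bytes per cell: a split sample number, a 12-bit period, a 4-bit effect command, and an 8-bit parameter); the function names are hypothetical and only two effects are decoded.

```python
def decode_cell(cell: bytes) -> dict:
    """Unpack one 4-byte ProTracker pattern cell."""
    b0, b1, b2, b3 = cell
    return {
        "sample": (b0 & 0xF0) | (b2 >> 4),   # high nibble in byte 0, low nibble in byte 2
        "period": ((b0 & 0x0F) << 8) | b1,   # 12-bit Amiga period (pitch)
        "effect": b2 & 0x0F,                 # effect command, 0x0 to 0xF
        "param": b3,                         # effect parameter
    }


def describe_effect(effect: int, param: int) -> str:
    """Give a human-readable label for the two effects mentioned in the text."""
    code = f"{effect:X}{param:02X}"
    if effect == 0xC:
        return f"{code}: set volume to {param}"
    if effect == 0xE and param >> 4 == 0xD:
        return f"{code}: note delay of {param & 0x0F} ticks"
    return f"{code}: (not decoded in this sketch)"


# Sample 1 at period 428 with effect C40, then an empty cell carrying ED3.
loud_note = decode_cell(bytes([0x01, 0xAC, 0x1C, 0x40]))
delayed = decode_cell(bytes([0x00, 0x00, 0x0E, 0xD3]))

print(loud_note)
print(describe_effect(loud_note["effect"], loud_note["param"]))
print(describe_effect(delayed["effect"], delayed["param"]))
```

The composer's gesture ("play this louder, a little late") survives in the file as nibbles, which is what the "musician as coder" claim looks like at the byte level.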

Cultural Context: The Demoscene's Medium

Tracker modules became the soundtrack of the demoscene because the format embodied demoscene values: technical mastery within constraints, self-sufficiency (no external gear needed), and size optimization (a 200KB .MOD could contain minutes of music). As demoscene historian Markku Reunanen notes, "The tracker was not just a tool but an aesthetic statement—music as code, composition as optimization problem."3

When you excavate a .MOD file, you are not merely recovering a song. You are recovering:

The .MOD format is frictional data incarnate—every byte tells a story about the technological possibilities and cultural priorities of its moment.

II. Compression Wars: .SIT vs .ZIP as Platform Ideology

Artifact Description: StuffIt (.SIT) vs. PKZIP (.ZIP)

In the late 1980s, two compression formats waged a silent war for dominance: Raymond Lau's StuffIt (.SIT) on Macintosh and Phil Katz's PKZIP (.ZIP) on DOS/Windows. Both solved the same technical problem (lossless compression for file distribution), but their design philosophies diverged radically.

Dimension | StuffIt (.SIT) | PKZIP (.ZIP)
---------------- | -------------- | ------------
Philosophy | Preserve everything: Mac resource forks, file types, creator codes, icon positions | Compress data only: cross-platform compatibility over fidelity
Metadata | Rich (Mac OS Type/Creator, Finder info) | Minimal (DOS 8.3 filenames, timestamps)
Licensing | Proprietary (shareware → commercial) | Open spec (led to WinZip, Info-ZIP, etc.)
Cultural Context | Mac's "it just works" ecosystem | DOS/Windows "lowest common denominator"
Winner | Lost (macOS moved away from resource forks) | Won (universal support by 2000s)

Forensic Reading: Resource Forks as Philosophical Commitment

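The Mac's HFS filesystem stored files as dual-fork structures: a data fork holding the file's unstructured contents, and a resource fork holding structured resources such as icons, menus, and dialog layouts.4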

StuffIt's .SIT format preserved both forks, maintaining the Mac's "document knows its creator" philosophy. A .SIT archive could store a SimpleText document complete with its application icon, window position, and font styling—the file carried its entire context.

PKZIP's .ZIP format ignored this complexity. It compressed the data fork, discarded the resource fork, and relied on filename extensions (.txt, .jpg) to infer file type. This was a philosophical rejection of Mac's integrated ecosystem in favor of DOS/Windows' file-extension-as-identity model.
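The minimalism of a basic .ZIP entry is easy to observe. The sketch below uses Python's standard zipfile module to build a one-file archive in memory and list what the entry records; the filename, date, and contents are hypothetical. It shows only what a plain entry carries and does not model later conventions by which Mac tools tuck fork data into extra entries.

```python
import io
import zipfile

# Build a tiny archive in memory, roughly as a DOS-era tool might have.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    entry_info = zipfile.ZipInfo("README.TXT", date_time=(1994, 6, 1, 12, 0, 0))
    archive.writestr(entry_info, "Hello from the data fork.\n",
                     compress_type=zipfile.ZIP_DEFLATED)

# Read it back and collect the per-file metadata the format recorded.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    entry = archive.infolist()[0]
    recovered = {
        "filename": entry.filename,      # identity rests on the name and extension
        "date_time": entry.date_time,    # DOS timestamp, two-second resolution
        "file_size": entry.file_size,
        "compress_type": entry.compress_type,
        "extra": entry.extra,            # optional extension field, empty here
    }

print(recovered)
```

Nothing in this record says which application created the file or how it should be opened; that knowledge has to be inferred from ".TXT".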

The Archaeological Consequence

When you excavate a 1990s Mac .SIT archive, you recover:

  • Contextual Richness: Custom icons, Finder comments, label colors—evidence of how users organized their digital lives
  • Application Dependencies: Type/Creator codes reveal which apps were used (e.g., CWIE for CodeWarrior, MSWD for Word)
  • Platform Ideology: The very existence of dual forks embodies Apple's "unified object" philosophy vs. Microsoft's "file = data + extension" pragmatism

When you excavate a .ZIP archive from the same era, you recover data stripped of context. The .ZIP format's victory represents not just technical efficiency but ideological consolidation—the triumph of cross-platform minimalism over single-platform richness.

As media archaeologist Jussi Parikka writes, "File formats are political—they encode assumptions about who has power, what platforms matter, and whose workflows deserve preservation."5 The .SIT vs .ZIP war was never just about compression ratios. It was about what aspects of digital life should survive transmission.

III. The BMP Header: Windows Hegemony in 54 Bytes

Artifact Description: The Windows Bitmap (.BMP)

The .BMP format is notoriously "wasteful"—it stores images as uncompressed raster data, resulting in massive file sizes compared to JPEG or PNG. Yet .BMP dominated Windows imaging from 1990-2005 precisely because it was simple, uncompressed, and baked into the OS. The format's verbosity is not a bug—it is a statement of Windows' market hegemony: "We don't need efficiency. We have hard drive space. We have RAM. We are inevitable."

Forensic Reading: The BITMAPINFOHEADER

A typical .BMP file begins with 54 bytes of header (a 14-byte BITMAPFILEHEADER followed by a 40-byte BITMAPINFOHEADER) revealing Microsoft's architectural priorities:

Offset  Field Name         Size  Example Value  Meaning
------  -----------------  ----  -------------  ---------------------------
0x0000  Signature          2     "BM"           File type magic number
0x0002  FileSize           4     0x00300036     3,145,782 bytes (54 + 1024 × 1024 × 3)
0x0006  Reserved1          2     0x0000         (unused)
0x0008  Reserved2          2     0x0000         (unused)
0x000A  DataOffset         4     0x00000036     Pixel data starts at byte 54
0x000E  HeaderSize         4     0x00000028     40 bytes (BITMAPINFOHEADER)
0x0012  Width              4     0x00000400     1024 pixels
0x0016  Height             4     0x00000400     1024 pixels
0x001A  Planes             2     0x0001         Always 1
0x001C  BitCount           2     0x0018         24 bits per pixel (RGB)
0x001E  Compression        4     0x00000000     0 = BI_RGB (uncompressed)
0x0022  ImageSize          4     0x00000000     0 allowed when uncompressed
0x0026  XPelsPerMeter      4     0x00000EC4     3,780 px/m horizontal (≈96 DPI)
0x002A  YPelsPerMeter      4     0x00000EC4     3,780 px/m vertical (≈96 DPI)
0x002E  ColorsUsed         4     0x00000000     0 = use all colors
0x0032  ColorsImportant    4     0x00000000     0 = all colors important

Notice what this header reveals:

  • Redundancy: Two reserved fields (bytes 0x06-0x09) waste 4 bytes for potential future use that never materialized
  • Physical-world assumptions: XPelsPerMeter and YPelsPerMeter betray print-oriented origins (most software ignored these fields)
  • Simplicity as ideology: Compression = 0 means "store every pixel RGB value sequentially"—no entropy encoding, no Huffman trees, no complexity
  • Little-endian hegemony: All multi-byte integers use Intel byte order (x86 dominance)
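The field table translates directly into two little-endian struct layouts. This is a sketch assuming the common 14-byte file header plus 40-byte BITMAPINFOHEADER; other header versions exist and are not handled. The demo header is fabricated in memory for a 1024 × 1024, 24-bit image, so its file size is 54 + 1024 × 1024 × 3 bytes.

```python
import struct

FILE_HEADER = struct.Struct("<2sIHHI")        # 14 bytes, little-endian
INFO_HEADER = struct.Struct("<IiiHHIIiiII")   # 40 bytes, little-endian


def parse_bmp_header(data: bytes) -> dict:
    """Decode the 54-byte header of a BMP that uses BITMAPINFOHEADER."""
    signature, file_size, _res1, _res2, data_offset = FILE_HEADER.unpack_from(data, 0)
    if signature != b"BM":
        raise ValueError("missing 'BM' signature")
    (header_size, width, height, planes, bit_count, compression,
     image_size, x_ppm, y_ppm, colors_used, colors_important) = \
        INFO_HEADER.unpack_from(data, 14)
    if header_size != 40:
        raise ValueError("not a BITMAPINFOHEADER; this sketch handles only that variant")
    return {
        "file_size": file_size,
        "data_offset": data_offset,
        "width": width,
        "height": height,
        "bit_count": bit_count,
        "compression": compression,
        # Pixels per metre converted to dots per inch (1 inch = 0.0254 m).
        "dpi_x": round(x_ppm * 0.0254),
        "dpi_y": round(y_ppm * 0.0254),
    }


demo_header = (FILE_HEADER.pack(b"BM", 54 + 1024 * 1024 * 3, 0, 0, 54)
               + INFO_HEADER.pack(40, 1024, 1024, 1, 24, 0, 0, 3780, 3780, 0, 0))

bmp = parse_bmp_header(demo_header)
print(bmp)
```

Every "<" in the format strings is the x86 byte order the header takes for granted.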

The Padding Problem: Architectural Imperialism

BMP's most revealing quirk: scanline padding. Each row of pixels must align to a 4-byte boundary. If a row's width doesn't naturally align, padding bytes are added:6

Example: 7-pixel wide, 24-bit RGB image

  • 7 pixels × 3 bytes (RGB) = 21 bytes per row
  • 21 bytes ÷ 4 = 5 remainder 1 → need 3 padding bytes
  • Actual storage: 24 bytes per row (21 data + 3 padding)
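The arithmetic in the example generalizes to any width and bit depth. A small sketch (function names are hypothetical) that rounds each row up to the next 4-byte boundary:

```python
def bmp_row_stride(width_px: int, bits_per_pixel: int) -> int:
    """Bytes stored per scanline, rounded up to a multiple of 4."""
    return ((width_px * bits_per_pixel + 31) // 32) * 4


def bmp_row_padding(width_px: int, bits_per_pixel: int) -> int:
    """Padding bytes appended to each scanline."""
    data_bytes = (width_px * bits_per_pixel + 7) // 8
    return bmp_row_stride(width_px, bits_per_pixel) - data_bytes


narrow_stride = bmp_row_stride(7, 24)      # the 7-pixel example above
narrow_padding = bmp_row_padding(7, 24)
wide_padding = bmp_row_padding(1024, 24)   # widths divisible by 4 need none

print(narrow_stride, narrow_padding, wide_padding)
```

A forensic reader can run this check in reverse: if a file's pixel data length is not height × stride, something other than a standard writer produced it.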

Why? Because Windows' GDI (Graphics Device Interface) was optimized for 32-bit memory alignment on x86 processors. The padding exists not for the image but for the CPU architecture. As Matthew Fuller argues in Media Ecologies, "Technical standards are ecologies—they shape and are shaped by hardware, software, and institutional power."7

When you excavate a .BMP file, you are excavating x86 hegemony. The format assumes Intel processors, Windows OS, abundant storage, and a print-oriented workflow (DPI fields). It is uncompressed not because compression was unknown (TIFF had supported LZW since revision 5.0 in 1988) but because Microsoft had the market power to say: Our inefficiency is your problem. Buy more RAM.

IV. RealMedia: Streaming as Control Architecture

Artifact Description: .RM, .RA, .RAM

In 1995, Progressive Networks (renamed RealNetworks in 1997) introduced RealAudio, one of the first practical streaming media formats. Before YouTube, before Spotify, there was RealPlayer, buffering 28.8k modem streams of college radio and webcam feeds. RealMedia formats (.RM for video, .RA for audio, .RAM for metafiles) dominated early web streaming, but they embodied a deeply proprietary control architecture that ultimately led to their extinction.

Forensic Reading: The .RAM Metafile

Most users never directly handled .RM files—they clicked .RAM metafiles, which were plain text "pointers":

rtsp://streaming.example.edu:554/lectures/bioethics_1997_03_15.rm

This single line encodes an entire control infrastructure:

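  • Protocol: RTSP (Real Time Streaming Protocol), an IETF standard (RFC 2326) that RealNetworks co-authored; in practice RealPlayer paired it with the proprietary RDT data transport and needed a dedicated streaming server such as RealServer, unlike plain HTTP delivery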
  • Port 554: Standard RTSP port, but often blocked by firewalls (forcing users to "upgrade" network permissions)
  • No local storage: The .RM file remained on the server; you streamed but never "owned" the content
  • DRM readiness: RealMedia supported time-limited access, geographic restrictions, user authentication

The Archaeological Dilemma: The Undownloadable Artifact

When you excavate a .RAM file from a 1998 university website, you encounter a conceptual ghost—a pointer to a stream that no longer exists. The .RAM file is technically intact, but its referent has vanished. The streaming architecture intentionally prevented local archiving. As Tarleton Gillespie writes, "Platforms are not neutral—they are architectures of control that shape what can be created, distributed, and preserved."8

RealMedia's design philosophy prioritized:

  1. Server-side control: Content owners maintained permanent control over distribution
  2. Low-bandwidth optimization: Aggressive lossy compression (the earliest RealAudio codec targeted roughly 8 kbps mono audio for 14.4k modems) prioritized accessibility over quality
  3. Proprietary lock-in: Codec secrets forced dependency on RealPlayer (pre-reverse-engineering era)
  4. Metrics capture: RTSP enabled server-side tracking of play counts, skip patterns, geographic distribution

The format's death (accelerated by Flash video, then HTML5 <video>) was not just technical obsolescence—it was architectural rejection. The web community chose formats that could be downloaded, embedded, remixed. RealMedia's control architecture became its death warrant.

Frictional Data: What Survives

When you excavate RealMedia artifacts today, you find:

The RealMedia format teaches us that preservation-hostile architectures die. Formats that resist local storage, demand proprietary servers, and encode vendor lock-in are evolutionary dead ends. The file format is not just technical—it is political, encoding assumptions about ownership, access, and power.

V. ANSI Art: Constraint-Driven Creativity as Format Philosophy

Artifact Description: .ANS, .ASC, .NFO

ANSI art—text-mode graphics built from IBM's Code Page 437 character set and ANSI X3.64 escape codes—flourished in BBS culture (1985-1995) despite (or because of) brutal constraints: 80×25 character grid, 16 colors, no anti-aliasing, no gradients, no pixels. Yet within these limits, artists created intricate logos, splash screens, and scene release .NFO files that remain iconic artifacts of digital subculture.

Forensic Reading: The .ANS Encoding

An ANSI art file is not an image—it is a script for a terminal emulator. Open a .ANS file in a hex editor:

Offset  Hex Values                                       CP437 Interpretation
------  -----------------------------------------------  --------------------
0x0000  1B 5B 30 6D 1B 5B 31 3B 33 34 6D DB B1 B1 B1 B1  ←[0m←[1;34m█▒▒▒▒
0x0010  1B 5B 30 3B 33 37 6D 20 20 20 1B 5B 31 3B 33 31  ←[0;37m   ←[1;31
0x0020  6D DB DC DF                                      m█▄▀

The 1B byte is the ESC character (0x1B = 27 decimal), triggering ANSI escape sequences:

  • ESC[0m: Reset all attributes (back to the default light gray on black)
  • ESC[1;34m: Set bold/high intensity (1) with a blue (34) foreground, which renders as bright blue
  • ESC[0;37m: Reset to normal intensity (0) with a white (37) foreground

Between the escape codes are extended ASCII characters from Code Page 437—box-drawing glyphs (╔═╗║), block elements (█▓▒░), and special symbols (♫☺◘) designed for text-mode graphics on IBM PC clones.9
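The "script for a terminal" reading can be demonstrated by splitting a byte stream into colour commands and text runs. This is a minimal sketch that recognizes only the ESC[...m colour sequences discussed here; cursor-movement and other sequences are left untouched inside the text runs. The sample bytes are a short illustrative stream, not taken from a real artwork.

```python
import re

SGR_PATTERN = re.compile(rb"\x1b\[([0-9;]*)m")   # ESC [ params m


def tokenize_ansi(data: bytes) -> list:
    """Split raw .ANS bytes into ('sgr', [params]) and ('text', str) tokens."""
    tokens = []
    position = 0
    for match in SGR_PATTERN.finditer(data):
        if match.start() > position:
            # Bytes between escape codes are glyphs from Code Page 437.
            tokens.append(("text", data[position:match.start()].decode("cp437")))
        raw_params = match.group(1).decode("ascii")
        tokens.append(("sgr", [int(p) for p in raw_params.split(";") if p]))
        position = match.end()
    if position < len(data):
        tokens.append(("text", data[position:].decode("cp437")))
    return tokens


sample = bytes.fromhex(
    "1B5B306D 1B5B313B33346D DBB1B1B1B1 "
    "1B5B303B33376D 202020 1B5B313B33316D DBDCDF"
)
tokens = tokenize_ansi(sample)
for kind, value in tokens:
    print(kind, repr(value))
```

Each block glyph occupies exactly one byte in the file; the multi-byte forms only appear once the text is re-encoded for a modern Unicode terminal.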

The Philosophy of Constraint

ANSI art embodied what Lars Konzack calls "retro-futurist aesthetics"—artists weaponized technical limitations as creative constraints.10 The format's restrictions became design principles:

  • Character-as-pixel: Each glyph became a building block (█ = solid, ▓ = 75% fill, ▒ = 50%, ░ = 25%)
  • Dithering without pixels: Artists simulated gradients by mixing characters and colors (blue ░ next to cyan ▒ creates optical mixing)
  • Semantic ambiguity: Is ═ a horizontal line or a typographic element? ANSI art exploited this slippage
  • Temporal revelation: Because ANSI files "played" sequentially (escape codes processed left-to-right, top-to-bottom), artists could create animations within a static file format

The Preservation Crisis: Rendering as Interpretation

ANSI art presents a unique preservation challenge: the file is inseparable from its rendering context. To "see" an .ANS file, you need:

  1. Correct character set: Code Page 437 (not Unicode, not Latin-1, not UTF-8)
  2. Correct font: IBM VGA 8×16 pixel font (modern monospace fonts distort aspect ratios)
  3. Correct palette: EGA/VGA 16-color palette (not web-safe colors, not sRGB)
  4. Correct terminal behavior: Some ANSI art exploits cursor positioning tricks (moving "backward" to overwrite characters)

Archaeobytologists excavating ANSI art must recognize: the file is not the artifact—the file + rendering environment is the artifact. As media theorist Lev Manovich argues, "New media objects are not static—they are generated by algorithms each time they are accessed."11 ANSI art literalizes this: the .ANS file is a recipe, not an image.

Frictional Data: Encoding Subculture

When you excavate an ANSI art .NFO file (scene release info document), you recover:

ANSI art teaches us that file formats are cultural artifacts, not neutral containers. The .ANS format encoded BBS culture's values: technical skill within constraints, subcultural identity through stylistic convention, and rejection of WYSIWYG GUI paradigms in favor of terminal-based textuality.

VI. Forensic Materialism as Methodology

The Five-Question Framework

When you encounter an excavated file, before rushing to triage its ontological state (Chapter 3), conduct a forensic material analysis:

  1. What is this file's material structure?

    Open it in a hex editor. Identify the header, magic numbers, data layout. What does the byte-level organization reveal about the format's priorities?

  2. What philosophical arguments does this format embody?

    Does it prioritize compression or accessibility? Openness or control? Cross-platform portability or platform-specific richness? Simplicity or expressiveness?

  3. What technical constraints shaped this format?

    Hardware limitations (RAM, CPU speed, storage)? Network conditions (modem bandwidth)? Software ecosystems (OS file systems, APIs)?

  4. What cultural context produced this format?

    BBS culture? Corporate software wars? Demoscene aesthetics? Open-source movements? Who used this format, and what communities formed around it?

  5. What frictional data survives in the file?

    Metadata fields (author, timestamps, software version)? Encoding choices (character sets, compression settings)? Structural quirks (padding bytes, reserved fields, deprecated features)?

Tools for Forensic Analysis

Equip your forensic toolkit:

Tool | Purpose | Use Case
---- | ------- | --------
Hex Editor (HxD, Hex Fiend, xxd) | View raw byte structure | Identify headers, magic numbers, hidden metadata
file / TrID | Identify file format by signature | Verify format claims (is this really a .JPG?)
ExifTool | Extract metadata | Read EXIF, XMP, IPTC tags in images/video
MediaInfo | Analyze multimedia formats | Codecs, bitrates, container structures
strings | Extract text from binary files | Find embedded URLs, software names, comments
binwalk | Detect embedded files/archives | Compound formats (e.g., .DOC with embedded .GIF)
Format specs (LOC, FileFormat.info) | Consult official documentation | Understand intended structure vs. actual implementation
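Signature-based identification, the job done by file and TrID, can be approximated in a few lines. The sketch below is far less thorough than those tools: it checks a small, hand-picked table of well-known magic numbers, and the table contents, labels, and function name are choices made for this illustration.

```python
# (offset, magic bytes, label); longer and more specific signatures come first.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"GIF87a", "GIF image"),
    (0, b"GIF89a", "GIF image"),
    (0, b"PK\x03\x04", "ZIP archive"),
    (0, b".RMF", "RealMedia file"),
    (0, b"\xff\xd8\xff", "JPEG image"),
    (1080, b"M.K.", "ProTracker module"),
    (0, b"BM", "Windows bitmap"),   # only two bytes, so checked last
]


def sniff_format(data: bytes) -> str:
    """Return a label for the first matching signature, or 'unknown'."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


def extension_matches(filename: str, data: bytes, expected_label: str) -> bool:
    """Compare what the extension claims with what the bytes say."""
    return sniff_format(data) == expected_label


# A file named photo.jpg whose bytes actually begin with a BMP header.
suspect = b"BM" + bytes(52)
verdict = sniff_format(suspect)
claim_holds = extension_matches("photo.jpg", suspect, "JPEG image")
print(verdict, claim_holds)
```

A mismatch between claimed and verified format is itself frictional data: it records a rename, a conversion, or a tool that did not care about extensions.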

The Forensic Report Template

Document your forensic findings before proceeding to triage:

Forensic Material Analysis Report

Artifact Identifier:

[Filename, excavation source, MD5 hash]

Claimed Format:

[File extension, MIME type if known]

Verified Format:

[Magic number/signature, format family, version if applicable]

Material Structure:

[Header size, data layout, container vs. stream, compression type]

Embedded Metadata:

[Creation date, author, software version, character encoding, anything extractable]

Philosophical Arguments:

[What priorities does this format encode? Openness vs. control? Efficiency vs. fidelity?]

Cultural Context:

[Platform (Mac/Windows/Unix), era, subculture, distribution method]

Frictional Data Highlights:

[Unusual findings: deprecated features, proprietary extensions, evidence of format abuse/creative misuse]

Preservation Implications:

[Does this format require specific rendering environment? Proprietary codecs? Extinct software?]
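The first field of the template can be filled mechanically. This is a sketch using the standard hashlib module; the dataclass, its field names, and the example source string are hypothetical, and recording a SHA-256 digest alongside the MD5 hash the template asks for is a suggestion of this example, not part of the template.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ArtifactIdentifier:
    filename: str
    source: str
    size_bytes: int
    md5: str
    sha256: str

    def as_report_line(self) -> str:
        """Format the identifier as a single line for the report's first field."""
        return (f"{self.filename}, {self.source}, {self.size_bytes} bytes, "
                f"MD5 {self.md5}, SHA-256 {self.sha256}")


def identify_artifact(filename: str, source: str, data: bytes) -> ArtifactIdentifier:
    """Compute fixity hashes over the artifact's raw bytes."""
    return ArtifactIdentifier(
        filename=filename,
        source=source,
        size_bytes=len(data),
        md5=hashlib.md5(data).hexdigest(),
        sha256=hashlib.sha256(data).hexdigest(),
    )


# An empty placeholder artifact; real use would read the file's bytes from disk.
artifact = identify_artifact("EMPTY.ANS", "hypothetical BBS archive dump", b"")
print(artifact.as_report_line())
```

Hashing before any other tool touches the file gives later readers a way to confirm they are analyzing the same bytes.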

Conclusion: The File as Argument

We have excavated five file formats—tracker modules, compression archives, bitmaps, streaming protocols, terminal art—and discovered that each is not a neutral container but a philosophical argument encoded in bytes. The .MOD file argues that music is data and composition is programming. The .SIT archive argues that context matters more than cross-platform compatibility. The .BMP file argues that Microsoft's market dominance permits inefficiency. The .RM stream argues that content should remain server-controlled. The .ANS file argues that constraint breeds creativity and subculture demands stylistic codes.

Forensic materialism refuses the illusion of transparency. It demands we read files as archaeological strata, excavating not just content but context, ideology, and technological worldview. As Friedrich Kittler famously declared, "There is no software"—only hardware performing interpretation.12 Every file format is a hardware-software assemblage, a material-conceptual hybrid that reveals the technological possibilities and cultural priorities of its moment.

Before you can triage an Archaeobyte (Chapter 3), before you can decide preservation strategy (Chapter 5), you must first understand what it is materially. This is the Tangible Archaeobyte—the file as physical artifact of code, the byte sequence as historical document, the format specification as ideological manifesto.

You have learned to read frictional data. Now, when you excavate a file, you see not just its content but its contestations—the technical debates, platform wars, and cultural negotiations that crystallized into format specifications. The file is never just data. The file is always an argument about how digital life should be structured, stored, and transmitted.

Armed with forensic materialism, you are ready to proceed to triage. But you will never again open a file naively. You have learned to see the Archaeobyte's tangible layer—the format as artifact, the specification as archaeology, the byte as evidence.

"To understand a medium, you must not just use it—you must examine its infrastructure, its materiality, its politics. The file format is where theory becomes matter."
— Jussi Parikka, What is Media Archaeology?, 2012

Works Cited

  1. Kirschenbaum, Matthew G. Mechanisms: New Media and the Forensic Imagination. MIT Press, 2008.
  2. Chun, Wendy Hui Kyong. "The Enduring Ephemeral, or the Future Is a Memory." Critical Inquiry 35.1 (2008): 148-171.
  3. Reunanen, Markku. Computer Demos—What Makes Them Tick? Aalto University, 2010.
  4. Siracusa, John. "Mac OS X 10.0: The Review." Ars Technica, 2001. (Detailed technical history of HFS resource forks)
  5. Parikka, Jussi. What is Media Archaeology? Polity Press, 2012.
  6. Microsoft Corporation. "BITMAPFILEHEADER Structure (Windows GDI)." Microsoft Developer Network, 1990.
  7. Fuller, Matthew. Media Ecologies: Materialist Energies in Art and Technoculture. MIT Press, 2005.
  8. Gillespie, Tarleton. "The Politics of 'Platforms'." New Media & Society 12.3 (2010): 347-364.
  9. IBM Corporation. "IBM PC Hardware Reference Library: Technical Reference." 1981. (Code Page 437 specification)
  10. Konzack, Lars. "Retro-Futurism in Computer Games." DiGRA Conference Proceedings, 2005.
  11. Manovich, Lev. The Language of New Media. MIT Press, 2001.
  12. Kittler, Friedrich A. "There Is No Software." The Truth of the Technological World: Essays on the Genealogy of Presence. Stanford University Press, 2013.