Abstract
Before we can triage an Archaeobyte's ontological state, we must understand what it is materially. A digital file is not a transparent window to the past—it is a layered archaeological object encoded with philosophical choices, technical constraints, and cultural assumptions. This chapter introduces forensic materialism as a methodology for reading file formats not as neutral containers but as arguments about how information should exist. Through close analysis of format headers, compression algorithms, metadata schemas, and encoding standards, we learn to see the file as Kirschenbaum's "mechanisms"—physical inscriptions carrying "frictional data" that reveal the technological worldviews of their creators. We examine five case studies of format philosophy: tracker modules (.MOD) as compositional databases, compression wars (.SIT vs .ZIP) as platform ideology, bitmap headers (.BMP) as Windows hegemony, streaming protocols (RealMedia) as control architectures, and ANSI art as constraint-driven creativity.
The Illusion of Transparency
Open a .TXT file. You see words. The file appears obvious—a neutral carrier of semantic content, a transparent window into an author's thoughts. But this is an illusion. That .TXT file is not "just text." It is:
- An encoding decision: ASCII? UTF-8? Windows-1252? Each choice privileges certain languages, certain character sets, certain cultural contexts.
- A line-ending philosophy: CRLF (Windows), LF (Unix), or CR (classic Mac)? Each reflects assumptions about what constitutes a "line."
- A byte order mark (or its absence): Does the file announce its encoding, or demand contextual interpretation?
- An operating system artifact: The file extension itself (.txt vs .TXT) reveals whether it was created on a case-sensitive filesystem.
Even the simplest digital object carries what Matthew Kirschenbaum calls "frictional data"—the material traces of its creation, storage, and transmission across computational infrastructures.1 To practice forensic materialism is to refuse the illusion of transparency and instead ask: What does this file tell us about the world that made it?
Wendy Chun argues that digital media operate through a "enduring ephemeral" paradox—they appear weightless and immaterial while simultaneously depending on deeply material infrastructures of silicon, magnetic flux, and electrical current.2 The file format is the site where this paradox crystallizes. It is simultaneously:
- Logical: An abstract specification (e.g., "PNG consists of an 8-byte signature followed by a series of chunks...")
- Physical: A specific sequence of magnetized domains on a hard drive platter or voltage states in flash memory
- Philosophical: An argument about how images/sounds/texts should be structured (lossless vs. lossy, open vs. proprietary, metadata-rich vs. minimal)
This chapter trains you to read files forensically—not as containers to be opened, but as archaeological strata to be excavated. We will examine five formats that embody distinct technical and cultural worldviews, learning to see the file not as transparent data but as opaque material artifact.
I. The Tracker Module: Composition as Database
Artifact Description: .MOD, .XM, .IT, .S3M
In 1987, Karsten Obarski released Ultimate Soundtracker for the Amiga, inaugurating a new musical paradigm: the tracker module. Unlike MIDI (which stores instructions for external synthesizers) or digital audio (which stores waveforms), a .MOD file is a self-contained compositional database storing:
- Sample Library: Short waveform recordings (instrument sounds) as raw PCM audio
- Pattern Matrix: A grid of notes, effects commands, and timing data
- Playback Sequence: An order list dictating which patterns play in which sequence
- Metadata: Song title, instrument names, pattern count encoded in the header
The genius of the tracker format is its ontological completeness—the file contains everything needed to reproduce the composition. No external dependencies. No proprietary synthesizers. The .MOD file is the instrument, the score, and the performance simultaneously.
Forensic Reading: The .MOD Header
Open a .MOD file in a hex editor and you encounter its material structure:
Offset Hex Values ASCII ------ ----------------------------------------------- -------------------- 0x0000 4D 6F 6E 6B 65 79 20 49 73 6C 61 6E 64 20 54 Monkey Island T 0x0010 68 65 6D 65 00 00 00 00 00 00 00 00 00 00 00 heme............ ... 0x0438 31 34 43 4E 30 31 31 36 30 38 30 36 30 34 30 14CN011608060402 0x0448 32 4D 2E 4B 2E 2M.K.
The first 20 bytes are the song title (null-padded ASCII). At offset 0x438, the magic number M.K. identifies this as a 4-channel ProTracker module. Between these landmarks lie 31 instrument definitions, each containing:
- 22 bytes: Instrument name (often revealing: "piano.22k" tells us sample rate and source)
- 2 bytes: Sample length (in words, not bytes—betraying Amiga's 16-bit architecture)
- 1 byte: Finetune value (-8 to +7, for pitch correction)
- 1 byte: Default volume (0-64, revealing Paula chip's 6-bit DAC limitations)
- 2 bytes: Loop start point
- 2 bytes: Loop length
This header structure is not accidental—it is a philosophical statement about music composition. The tracker format argues that:
- Samples over synthesis: Real-world sounds (even if lo-fi) are preferable to abstract oscillators
- Patterns over timelines: Musical structure should be modular and reusable (verse-chorus-verse as data structure)
- Constraints breed creativity: The 31-instrument limit and 4-channel polyphony forced composers to innovate (sample reuse, creative panning effects)
- Composition = Programming: Hex effect codes (e.g.,
E8xfor delay,Cxxfor volume) position the musician as coder
Cultural Context: The Demoscene's Medium
Tracker modules became the soundtrack of the demoscene because the format embodied demoscene values: technical mastery within constraints, self-sufficiency (no external gear needed), and size optimization (a 200KB .MOD could contain minutes of music). As demoscene historian Markku Reunanen notes, "The tracker was not just a tool but an aesthetic statement—music as code, composition as optimization problem."3
When you excavate a .MOD file, you are not merely recovering a song. You are recovering:
- Evidence of hardware constraints (Amiga Paula chip's 4-channel architecture)
- Traces of software workflows (instrument names revealing ProTracker vs. FastTracker vs. Impulse Tracker)
- Artifacts of distribution culture (BBS filename conventions, scene group tags in sample names)
- A compositional philosophy that blurred music and programming
The .MOD format is frictional data incarnate—every byte tells a story about the technological possibilities and cultural priorities of its moment.
II. Compression Wars: .SIT vs .ZIP as Platform Ideology
Artifact Description: StuffIt (.SIT) vs. PKZIP (.ZIP)
In the late 1980s, two compression formats waged a silent war for dominance: Raymond Lau's StuffIt (.SIT) on Macintosh and Phil Katz's PKZIP (.ZIP) on DOS/Windows. Both solved the same technical problem (lossless compression for file distribution), but their design philosophies diverged radically.
| Dimension | StuffIt (.SIT) | PKZIP (.ZIP) |
|---|---|---|
| Philosophy | Preserve everything—Mac resource forks, file types, creator codes, icon positions | Compress data only—cross-platform compatibility over fidelity |
| Metadata | Rich (Mac OS Type/Creator, Finder info) | Minimal (DOS 8.3 filenames, timestamps) |
| Licensing | Proprietary (shareware → commercial) | Open spec (led to WinZip, Info-ZIP, etc.) |
| Cultural Context | Mac's "it just works" ecosystem | DOS/Windows "lowest common denominator" |
| Winner | Lost (macOS abandoned resource forks) | Won (universal support by 2000s) |
Forensic Reading: Resource Forks as Philosophical Commitment
The Mac's HFS filesystem stored files as dual-fork structures:4
- Data Fork: The "content" (text of a document, pixels of an image)
- Resource Fork: Metadata, icons, menus, dialog boxes, even executable code
StuffIt's .SIT format preserved both forks, maintaining the Mac's "document knows its creator" philosophy. A .SIT archive could store a SimpleText document complete with its application icon, window position, and font styling—the file carried its entire context.
PKZIP's .ZIP format ignored this complexity. It compressed the data fork, discarded the resource fork, and relied on filename extensions (.txt, .jpg) to infer file type. This was a philosophical rejection of Mac's integrated ecosystem in favor of DOS/Windows' file-extension-as-identity model.
The Archaeological Consequence
When you excavate a 1990s Mac .SIT archive, you recover:
- Contextual Richness: Custom icons, Finder comments, label colors—evidence of how users organized their digital lives
- Application Dependencies: Type/Creator codes reveal which apps were used (e.g.,
CWIEfor CodeWarrior,MSWDfor Word) - Platform Ideology: The very existence of dual forks embodies Apple's "unified object" philosophy vs. Microsoft's "file = data + extension" pragmatism
When you excavate a .ZIP archive from the same era, you recover data stripped of context. The .ZIP format's victory represents not just technical efficiency but ideological consolidation—the triumph of cross-platform minimalism over single-platform richness.
As media archaeologist Jussi Parikka writes, "File formats are political—they encode assumptions about who has power, what platforms matter, and whose workflows deserve preservation."5 The .SIT vs .ZIP war was never just about compression ratios. It was about what aspects of digital life should survive transmission.
III. The BMP Header: Windows Hegemony in 54 Bytes
Artifact Description: The Windows Bitmap (.BMP)
The .BMP format is notoriously "wasteful"—it stores images as uncompressed raster data, resulting in massive file sizes compared to JPEG or PNG. Yet .BMP dominated Windows imaging from 1990-2005 precisely because it was simple, uncompressed, and baked into the OS. The format's verbosity is not a bug—it is a statement of Windows' market hegemony: "We don't need efficiency. We have hard drive space. We have RAM. We are inevitable."
Forensic Reading: The BITMAPINFOHEADER
Every .BMP file begins with a 54-byte header revealing Microsoft's architectural priorities:
Offset Field Name Size Example Value Meaning ------ ----------------- ---- ------------- --------------------------- 0x0000 Signature 2 "BM" File type magic number 0x0002 FileSize 4 0x0003624E 3,564,110 bytes 0x0006 Reserved1 2 0x0000 (unused) 0x0008 Reserved2 2 0x0000 (unused) 0x000A DataOffset 4 0x00000036 Pixel data starts at byte 54 0x000E HeaderSize 4 0x00000028 40 bytes (BITMAPINFOHEADER) 0x0012 Width 4 0x00000400 1024 pixels 0x0016 Height 4 0x00000400 1024 pixels 0x001A Planes 2 0x0001 Always 1 0x001C BitCount 2 0x0018 24 bits per pixel (RGB) 0x001E Compression 4 0x00000000 0 = BI_RGB (uncompressed) 0x0022 ImageSize 4 0x00000000 0 allowed when uncompressed 0x0026 XPelsPerMeter 4 0x00000EC4 3,780 DPI horizontal 0x002A YPelsPerMeter 4 0x00000EC4 3,780 DPI vertical 0x002E ColorsUsed 4 0x00000000 0 = use all colors 0x0032 ColorsImportant 4 0x00000000 0 = all colors important
Notice what this header reveals:
- Redundancy: Two reserved fields (bytes 0x06-0x09) waste 4 bytes for potential future use that never materialized
- Physical-world assumptions:
XPelsPerMeterandYPelsPerMeterbetray print-oriented origins (most software ignored these fields) - Simplicity as ideology:
Compression = 0means "store every pixel RGB value sequentially"—no entropy encoding, no Huffman trees, no complexity - Little-endian hegemony: All multi-byte integers use Intel byte order (x86 dominance)
The Padding Problem: Architectural Imperialism
BMP's most revealing quirk: scanline padding. Each row of pixels must align to a 4-byte boundary. If a row's width doesn't naturally align, padding bytes are added:6
Example: 7-pixel wide, 24-bit RGB image
- 7 pixels × 3 bytes (RGB) = 21 bytes per row
- 21 bytes ÷ 4 = 5 remainder 1 → need 3 padding bytes
- Actual storage: 24 bytes per row (21 data + 3 padding)
Why? Because Windows' GDI (Graphics Device Interface) was optimized for 32-bit memory alignment on x86 processors. The padding exists not for the image but for the CPU architecture. As Matthew Fuller argues in Media Ecologies, "Technical standards are ecologies—they shape and are shaped by hardware, software, and institutional power."7
When you excavate a .BMP file, you are excavating x86 hegemony. The format assumes Intel processors, Windows OS, abundant storage, and a print-oriented workflow (DPI fields). It is uncompressed not because compression was unknown (TIFF supported LZW since 1986) but because Microsoft had the market power to say: Our inefficiency is your problem. Buy more RAM.
IV. RealMedia: Streaming as Control Architecture
Artifact Description: .RM, .RA, .RAM
In 1995, RealNetworks introduced RealAudio—the first practical streaming media format. Before YouTube, before Spotify, there was RealPlayer, buffering 28.8k modem streams of college radio and webcam feeds. RealMedia formats (.RM for video, .RA for audio, .RAM for metafiles) dominated early web streaming, but they embodied a deeply proprietary control architecture that ultimately led to their extinction.
Forensic Reading: The .RAM Metafile
Most users never directly handled .RM files—they clicked .RAM metafiles, which were plain text "pointers":
rtsp://streaming.example.edu:554/lectures/bioethics_1997_03_15.rm
This single line encodes an entire control infrastructure:
- Protocol: RTSP (Real Time Streaming Protocol) — Proprietary, requiring RealServer software (unlike HTTP)
- Port 554: Standard RTSP port, but often blocked by firewalls (forcing users to "upgrade" network permissions)
- No local storage: The .RM file remained on the server; you streamed but never "owned" the content
- DRM readiness: RealMedia supported time-limited access, geographic restrictions, user authentication
The Archaeological Dilemma: The Undownloadable Artifact
When you excavate a .RAM file from a 1998 university website, you encounter a conceptual ghost—a pointer to a stream that no longer exists. The .RAM file is technically intact, but its referent has vanished. The streaming architecture intentionally prevented local archiving. As Tarleton Gillespie writes, "Platforms are not neutral—they are architectures of control that shape what can be created, distributed, and preserved."8
RealMedia's design philosophy prioritized:
- Server-side control: Content owners maintained permanent control over distribution
- Low-bandwidth optimization: Aggressive lossy compression (often 16 kHz mono audio at 8 kbps) prioritized accessibility over quality
- Proprietary lock-in: Codec secrets forced dependency on RealPlayer (pre-reverse-engineering era)
- Metrics capture: RTSP enabled server-side tracking of play counts, skip patterns, geographic distribution
The format's death (accelerated by Flash video, then HTML5 <video>) was not just technical obsolescence—it was architectural rejection. The web community chose formats that could be downloaded, embedded, remixed. RealMedia's control architecture became its death warrant.
Frictional Data: What Survives
When you excavate RealMedia artifacts today, you find:
- .RAM ghosts: Pointers to dead servers (e.g.,
rtsp://rave.ohiolink.edu:554/) - .RM orphans: Locally cached files that require obsolete RealPlayer builds (or FFmpeg with specially compiled codecs)
- Metadata traces: RealMedia headers stored title, author, copyright—but only if content creators bothered to set them (many didn't)
- Quality degradation: Even preserved .RM files bear the scars of extreme compression—8 kHz audio sampling, posterized video
The RealMedia format teaches us that preservation-hostile architectures die. Formats that resist local storage, demand proprietary servers, and encode vendor lock-in are evolutionary dead ends. The file format is not just technical—it is political, encoding assumptions about ownership, access, and power.
V. ANSI Art: Constraint-Driven Creativity as Format Philosophy
Artifact Description: .ANS, .ASC, .NFO
ANSI art—text-mode graphics built from IBM's Code Page 437 character set and ANSI X3.64 escape codes—flourished in BBS culture (1985-1995) despite (or because of) brutal constraints: 80×25 character grid, 16 colors, no anti-aliasing, no gradients, no pixels. Yet within these limits, artists created intricate logos, splash screens, and scene release .NFO files that remain iconic artifacts of digital subculture.
Forensic Reading: The .ANS Encoding
An ANSI art file is not an image—it is a script for a terminal emulator. Open a .ANS file in a hex editor:
Offset Hex Values ASCII Interpretation ------ ----------------------------------------------- ------------------------- 0x0000 1B 5B 30 6D 1B 5B 31 3B 33 34 6D E2 96 91 E2 ←[0m←[1;34m█▒▒ 0x0010 96 92 E2 96 92 1B 5B 30 3B 33 37 6D 20 20 20 ▒▒←[0;37m 0x0020 1B 5B 31 3B 33 31 6D E2 96 91 E2 96 84 E2 96 ←[1;31m█▄▀
The 1B byte is the ESC character (0x1B = 27 decimal), triggering ANSI escape sequences:
ESC[0m— Reset all attributes (back to white on black)ESC[1;34m— Set bold (1) bright blue (34) foregroundESC[0;37m— Set normal (0) white (37) foreground
Between the escape codes are extended ASCII characters from Code Page 437—box-drawing glyphs (╔═╗║), block elements (█▓▒░), and special symbols (♫☺◘) designed for text-mode graphics on IBM PC clones.9
The Philosophy of Constraint
ANSI art embodied what Lars Konzack calls "retro-futurist aesthetics"—artists weaponized technical limitations as creative constraints.10 The format's restrictions became design principles:
- Character-as-pixel: Each glyph became a building block (█ = solid, ▓ = 75% fill, ▒ = 50%, ░ = 25%)
- Dithering without pixels: Artists simulated gradients by mixing characters and colors (blue ░ next to cyan ▒ creates optical mixing)
- Semantic ambiguity: Is ═ a horizontal line or a typographic element? ANSI art exploited this slippage
- Temporal revelation: Because ANSI files "played" sequentially (escape codes processed left-to-right, top-to-bottom), artists could create animations within a static file format
The Preservation Crisis: Rendering as Interpretation
ANSI art presents a unique preservation challenge: the file is inseparable from its rendering context. To "see" an .ANS file, you need:
- Correct character set: Code Page 437 (not Unicode, not Latin-1, not UTF-8)
- Correct font: IBM VGA 8×16 pixel font (modern monospace fonts distort aspect ratios)
- Correct palette: EGA/VGA 16-color palette (not web-safe colors, not sRGB)
- Correct terminal behavior: Some ANSI art exploits cursor positioning tricks (moving "backward" to overwrite characters)
Archaeobytologists excavating ANSI art must recognize: the file is not the artifact—the file + rendering environment is the artifact. As media theorist Lev Manovich argues, "New media objects are not static—they are generated by algorithms each time they are accessed."11 ANSI art literalizes this: the .ANS file is a recipe, not an image.
Frictional Data: Encoding Subculture
When you excavate an ANSI art .NFO file (scene release info document), you recover:
- Group identity: ASCII art logos as territorial markers (e.g., Fairlight, Razor 1911)
- Technical prowess: Complex ANSI art signaled BBS operator skill and dedication
- Subcultural codes: Formatting conventions (centered text, specific borders) signaled membership in the warez/demoscene community
- Platform specificity: The format assumes DOS/VGA architecture—it is illegible on Mac, Unix, or modern web browsers without emulation
ANSI art teaches us that file formats are cultural artifacts, not neutral containers. The .ANS format encoded BBS culture's values: technical skill within constraints, subcultural identity through stylistic convention, and rejection of WYSIWYG GUI paradigms in favor of terminal-based textuality.
VI. Forensic Materialism as Methodology
The Five-Question Framework
When you encounter an excavated file, before rushing to triage its ontological state (Chapter 3), conduct a forensic material analysis:
- What is this file's material structure?
Open it in a hex editor. Identify the header, magic numbers, data layout. What does the byte-level organization reveal about the format's priorities?
- What philosophical arguments does this format embody?
Does it prioritize compression or accessibility? Openness or control? Cross-platform portability or platform-specific richness? Simplicity or expressiveness?
- What technical constraints shaped this format?
Hardware limitations (RAM, CPU speed, storage)? Network conditions (modem bandwidth)? Software ecosystems (OS file systems, APIs)?
- What cultural context produced this format?
BBS culture? Corporate software wars? Demoscene aesthetics? Open-source movements? Who used this format, and what communities formed around it?
- What frictional data survives in the file?
Metadata fields (author, timestamps, software version)? Encoding choices (character sets, compression settings)? Structural quirks (padding bytes, reserved fields, deprecated features)?
Tools for Forensic Analysis
Equip your forensic toolkit:
| Tool | Purpose | Use Case |
|---|---|---|
| Hex Editor (HxD, Hex Fiend, xxd) |
View raw byte structure | Identify headers, magic numbers, hidden metadata |
| file / TrID | Identify file format by signature | Verify format claims (is this really a .JPG?) |
| ExifTool | Extract metadata | Read EXIF, XMP, IPTC tags in images/video |
| MediaInfo | Analyze multimedia formats | Codecs, bitrates, container structures |
| strings | Extract text from binary files | Find embedded URLs, software names, comments |
| binwalk | Detect embedded files/archives | Compound formats (e.g., .DOC with embedded .GIF) |
| Format specs (LOC, FileFormat.info) |
Consult official documentation | Understand intended structure vs. actual implementation |
The Forensic Report Template
Document your forensic findings before proceeding to triage:
Forensic Material Analysis Report
Artifact Identifier:
[Filename, excavation source, MD5 hash]
Claimed Format:
[File extension, MIME type if known]
Verified Format:
[Magic number/signature, format family, version if applicable]
Material Structure:
[Header size, data layout, container vs. stream, compression type]
Embedded Metadata:
[Creation date, author, software version, character encoding, anything extractable]
Philosophical Arguments:
[What priorities does this format encode? Openness vs. control? Efficiency vs. fidelity?]
Cultural Context:
[Platform (Mac/Windows/Unix), era, subculture, distribution method]
Frictional Data Highlights:
[Unusual findings: deprecated features, proprietary extensions, evidence of format abuse/creative misuse]
Preservation Implications:
[Does this format require specific rendering environment? Proprietary codecs? Extinct software?]
Conclusion: The File as Argument
We have excavated five file formats—tracker modules, compression archives, bitmaps, streaming protocols, terminal art—and discovered that each is not a neutral container but a philosophical argument encoded in bytes. The .MOD file argues that music is data and composition is programming. The .SIT archive argues that context matters more than cross-platform compatibility. The .BMP file argues that Microsoft's market dominance permits inefficiency. The .RM stream argues that content should remain server-controlled. The .ANS file argues that constraint breeds creativity and subculture demands stylistic codes.
Forensic materialism refuses the illusion of transparency. It demands we read files as archaeological strata, excavating not just content but context, ideology, and technological worldview. As Friedrich Kittler famously declared, "There is no software"—only hardware performing interpretation.12 Every file format is a hardware-software assemblage, a material-conceptual hybrid that reveals the technological possibilities and cultural priorities of its moment.
Before you can triage an Archaeobyte (Chapter 3), before you can decide preservation strategy (Chapter 5), you must first understand what it is materially. This is the Tangible Archaeobyte—the file as physical artifact of code, the byte sequence as historical document, the format specification as ideological manifesto.
You have learned to read frictional data. Now, when you excavate a file, you see not just its content but its contestations—the technical debates, platform wars, and cultural negotiations that crystallized into format specifications. The file is never just data. The file is always an argument about how digital life should be structured, stored, and transmitted.
Armed with forensic materialism, you are ready to proceed to triage. But you will never again open a file naively. You have learned to see the Archaeobyte's tangible layer—the format as artifact, the specification as archaeology, the byte as evidence.
"To understand a medium, you must not just use it—you must examine its infrastructure, its materiality, its politics. The file format is where theory becomes matter."
— Jussi Parikka, What is Media Archaeology?, 2012
Works Cited
- Kirschenbaum, Matthew G. Mechanisms: New Media and the Forensic Imagination. MIT Press, 2008.
- Chun, Wendy Hui Kyong. "The Enduring Ephemeral, or the Future Is a Memory." Critical Inquiry 35.1 (2008): 148-171.
- Reunanen, Markku. Computer Demos—What Makes Them Tick? Aalto University, 2010.
- Siracusa, John. "Mac OS X 10.0: The Review." Ars Technica, 2001. (Detailed technical history of HFS resource forks)
- Parikka, Jussi. What is Media Archaeology? Polity Press, 2012.
- Microsoft Corporation. "BITMAPFILEHEADER Structure (Windows GDI)." Microsoft Developer Network, 1990.
- Fuller, Matthew. Media Ecologies: Materialist Energies in Art and Technoculture. MIT Press, 2005.
- Gillespie, Tarleton. "The Politics of 'Platforms'." New Media & Society 12.3 (2010): 347-364.
- IBM Corporation. "IBM PC Hardware Reference Library: Technical Reference." 1981. (Code Page 437 specification)
- Konzack, Lars. "Retro-Futurism in Computer Games." DiGRA Conference Proceedings, 2005.
- Manovich, Lev. The Language of New Media. MIT Press, 2001.
- Kittler, Friedrich A. "There Is No Software." The Truth of the Technological World: Essays on the Genealogy of Presence. Stanford University Press, 2013.