Chapter 3: The Custodial Filter

Course: Archaeobytology 200: Advanced Triage & Methodology

Section: Part II - The Anvil

Status: Final Academic Draft

Abstract

This chapter establishes the ethical and legal framework that governs all preservation decisions in Archaeobytology: the Custodial Filter. While Chapters 1 and 2 provided the methodology for finding and classifying artifacts, this chapter addresses the critical question that precedes all technical work: Should this artifact be preserved, and if so, how should access be controlled? Drawing on privacy law, copyright doctrine, archival ethics, and emerging frameworks for digital stewardship, this text provides a formal decision protocol for navigating the tension between preservation (the imperative to save) and protection (the duty to prevent harm). The Custodial Filter is not a barrier to preservation but a necessary safeguard that ensures the Archaeobytologist operates as a responsible steward, not a digital grave robber.

Preamble: The Crisis of Custody

The Excavation Protocol (Chapter 1) taught the practitioner how to find the artifact. The Triage Protocol (Chapter 2) taught them how to classify its state. But between diagnosis and preservation lies a third, more profound challenge: the question of custody.

The term "custody" comes from the Latin custodia, meaning "guardianship" or "protection."^[1] In law, custody implies not just possession but responsibility—the duty to care for something on behalf of others, including those who cannot speak for themselves. The Archaeobytologist, upon excavating an artifact, becomes its custodian. This is not a right but a burden.

This burden is unique to the digital realm. In physical archaeology, the artifact is dead matter: a potsherd, a bone, a tool. It cannot speak, cannot be harmed further, and its "life" ended millennia ago. But the digital artifact is different. It may contain the living voices of real people—their names, their faces, their private communications. It may be copyrighted, owned by entities that still exist. It may carry cultural significance to communities that never consented to its preservation. It may, if mishandled, cause tangible harm.

This is the crisis of custody: the digital past is not safely dead. It is entangled with the living present. The practitioner who excavates without ethical consideration is not an archaeologist but a voyeur, or worse, a thief.

The Custodial Filter is the formal protocol that prevents this. It is the set of tests, questions, and frameworks that must be applied to every artifact before preservation work begins. It asks three foundational questions:

Privacy: Does this artifact contain information that could harm living individuals?
Legality: Do I have the legal right to preserve and share this artifact?
Ethics: Even if legal, is preservation the right thing to do?

This chapter provides the formal methodology for answering these questions. It is the conscience of the discipline.

Filter 1: The Privacy Test (Protecting the Living)

The first and most urgent filter is privacy. The question is simple but consequential: Does this artifact contain Personally Identifiable Information (PII) that could be used to identify, contact, or harm living individuals?

The Taxonomy of PII

Not all personal information is equally sensitive. Privacy practice commonly distinguishes three categories of identifiers, drawing on the GDPR's definitions of personal data and special categories of data and on privacy impact assessment frameworks:^[2]

Category 1: Direct Identifiers

These are data points that directly identify a specific individual without need for additional information.

Full legal names (especially when combined with location)
Email addresses
Phone numbers
Social Security Numbers or national ID numbers
Street addresses
Photographs of faces (especially with metadata)

Risk Level: HIGH. These must be redacted unless explicit consent exists or the individual is a public figure.

Category 2: Quasi-Identifiers

These are data points that, when combined, can be used to re-identify individuals even if no direct identifier is present. Computer scientist Latanya Sweeney estimated that 87% of the U.S. population could be uniquely identified using just three quasi-identifiers: 5-digit ZIP code, date of birth, and gender.^[3]

Dates of birth (especially when combined with location)
ZIP codes or postal codes
Occupation or job title
Educational institution attended
Usernames (especially if reused across platforms)

Risk Level: MEDIUM-HIGH. These require contextual assessment. A single quasi-identifier may be acceptable; multiple in combination are not.

Category 3: Sensitive Attributes

These are data points that, while not directly identifying, reveal information that could cause harm, discrimination, or distress if exposed.

Medical information or health status
Sexual orientation or gender identity
Religious or political beliefs
Financial information (debts, income)
Criminal history or allegations
Biometric data (fingerprints, DNA)

Risk Level: CONTEXT-DEPENDENT. Even if anonymized, this information may cause harm to the individual if the context makes re-identification possible.
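The three categories above can be sketched as a simple lookup that sorts an artifact's fields and flags the cases the risk levels describe. This is a minimal illustration, not a standard schema: the field names, the FIELD_CATEGORIES mapping, and the threshold of two quasi-identifiers are all assumptions chosen for the example, and a real artifact would need its own mapping and a human review.

```python
from enum import Enum


class PIICategory(Enum):
    DIRECT = "direct identifier"
    QUASI = "quasi-identifier"
    SENSITIVE = "sensitive attribute"


# Hypothetical field names; a real artifact needs its own mapping.
FIELD_CATEGORIES = {
    "full_name": PIICategory.DIRECT,
    "email": PIICategory.DIRECT,
    "phone": PIICategory.DIRECT,
    "street_address": PIICategory.DIRECT,
    "birthdate": PIICategory.QUASI,
    "zip_code": PIICategory.QUASI,
    "occupation": PIICategory.QUASI,
    "username": PIICategory.QUASI,
    "health_status": PIICategory.SENSITIVE,
    "religion": PIICategory.SENSITIVE,
}


def assess_fields(fields):
    """Group an artifact's field names by PII category and flag review needs."""
    found = {category: [] for category in PIICategory}
    for name in fields:
        category = FIELD_CATEGORIES.get(name)
        if category is not None:
            found[category].append(name)
    return {
        "found": found,
        # Direct identifiers are high risk on their own.
        "needs_redaction": bool(found[PIICategory.DIRECT]),
        # Assumed threshold: one quasi-identifier may pass, two or more do not.
        "quasi_combination_risk": len(found[PIICategory.QUASI]) >= 2,
        # Sensitive attributes always call for a contextual judgment.
        "has_sensitive": bool(found[PIICategory.SENSITIVE]),
    }


report = assess_fields(["username", "zip_code", "birthdate", "post_body"])
print(report["quasi_combination_risk"])  # True: three quasi-identifiers together
```

The output of such a check is a prompt for review, not a verdict; fields the mapping does not know about (here, post_body) may still contain PII in free text.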

The REDACT vs. PRESERVE Decision Tree

Once PII is identified, the practitioner must make a formal decision: redact, embargo, or preserve as-is.

Path 1: REDACT (Anonymize the Artifact)

When to use: The artifact has high research value, but PII is present and cannot be justified as necessary.

Methodology:

Automated Redaction: Use tools like scrubadub (Python) or regex patterns to find and replace PII with [REDACTED] or pseudonyms.
Manual Review: Automated tools miss context. A human must verify that quasi-identifiers in combination have been addressed.
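Cryptographic Hashing: If preserving structure matters (e.g., for network analysis), replace names with consistent pseudonyms derived from a keyed hash (e.g., "Alice" always becomes "User_A3F7B2"). A plain, unkeyed hash of a name can be reversed by hashing a list of candidate names, so the key must be kept secret and stored apart from the archive.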
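The automated and hashing steps above might look like the following sketch, which uses only the Python standard library rather than scrubadub. The two regular expressions are deliberately simple assumptions that catch common email and US-style phone formats only, and the pseudonym function uses a keyed hash (HMAC-SHA256) instead of a plain hash, a choice made here because unkeyed hashes of names can be guessed from a name list. The key shown is a placeholder.

```python
import hashlib
import hmac
import re

# Deliberately simple patterns: they catch common formats only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text):
    """Replace email addresses and US-style phone numbers with [REDACTED]."""
    text = EMAIL_RE.sub("[REDACTED]", text)
    return US_PHONE_RE.sub("[REDACTED]", text)


def pseudonym(name, secret_key):
    """Map a name to a stable label using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    # Six hex characters keep labels readable; longer prefixes reduce collisions.
    return "User_" + digest[:6].upper()


key = b"placeholder-key-stored-apart-from-the-archive"  # hypothetical key
print(redact("Write to alice@example.com or call 555-123-4567."))
# Write to [REDACTED] or call [REDACTED].
print(pseudonym("Alice", key) == pseudonym("Alice", key))  # True: same input, same label
```

Because the same name always maps to the same label, reply chains and friendship graphs survive pseudonymization. Patterns like these will miss names, nicknames, and identifying details in free text, which is why the manual review step remains necessary.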

Academic Grounding: This approach aligns with the "k-anonymity" framework, where data is modified so that each individual's combination of quasi-identifiers is shared by at least k-1 other individuals in the dataset.^[4]
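One way to make the k-anonymity idea concrete is to measure k for a dataset: group the records by their quasi-identifier values and take the size of the smallest group. The sketch below assumes records are dictionaries and uses invented field names and values; it measures k only and does not perform the generalization or suppression needed to raise it.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Invented records; the ZIP codes are already generalized to a prefix.
records = [
    {"zip": "021**", "birth_year": 1988, "gender": "F"},
    {"zip": "021**", "birth_year": 1988, "gender": "F"},
    {"zip": "021**", "birth_year": 1990, "gender": "M"},
]

print(k_anonymity(records, ["zip", "birth_year", "gender"]))  # 1: the third record is unique
print(k_anonymity(records, ["zip"]))  # 3: all records share the ZIP prefix
```

A result of 1 means at least one person is unique on the chosen quasi-identifiers and the dataset needs further generalization or suppression before release.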

Path 2: EMBARGO (Restrict Access)

When to use: The artifact's research value depends on preserving the PII (e.g., a study of online harassment patterns requires seeing actual usernames), but public release would cause harm.

Methodology:

Researcher-Only Access: Place the artifact in a restricted archive accessible only to vetted researchers with IRB approval.
Time-Limited Embargo: Set a release date (e.g., 25 years, 50 years) after which the artifact becomes public, following archival norms for sensitive materials.
Aggregated Access: Provide statistical summaries or visualizations of the data without releasing the raw artifact.
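The three embargo mechanisms above can be combined into a single access rule. The sketch below is one possible arrangement under stated assumptions: the tier names, the precedence (an elapsed embargo opens the artifact to everyone; before that, vetted researchers see the raw data and everyone else sees aggregates), and the release date are illustrative, not prescribed by any archival standard.

```python
from datetime import date


def access_level(today, release_date, is_vetted_researcher):
    """Decide what a requester may see under a time-limited embargo."""
    if today >= release_date:
        return "public"  # embargo has lapsed
    if is_vetted_researcher:
        return "restricted-raw"  # researcher-only access during the embargo
    return "aggregate-only"  # statistical summaries only


release = date(2050, 1, 1)  # hypothetical release date
print(access_level(date(2030, 6, 1), release, is_vetted_researcher=False))  # aggregate-only
print(access_level(date(2030, 6, 1), release, is_vetted_researcher=True))   # restricted-raw
```

Whether a requester counts as vetted (for example, holding IRB approval) is a human determination that this function takes as an input rather than deciding itself.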

Precedent: This mirrors practices in oral history archives, where interviews containing sensitive material are embargoed until the interviewee's death or a specified date.^[5]

Path 3: PRESERVE AS-IS (Public Release Justified)

When to use: The individuals are public figures, the information is already public, or the historical significance outweighs privacy concerns.

Justification Framework:

Public Figure Exception: Elected officials, CEOs, public intellectuals—their digital traces are matters of public record.
Already Public: If the artifact was publicly accessible (e.g., a public forum post, a published blog), preservation is not creating new exposure.
Historical Significance: In rare cases, the artifact's importance to understanding a historical moment (e.g., documentation of a social movement) justifies preservation despite PII presence. This requires rigorous justification and peer review.

Warning: This path must be taken with extreme caution. Privacy scholar Helen Nissenbaum's theory of contextual integrity holds that privacy depends on context: something posted publicly in 2005 may not have been intended to be searchable, archived, and analyzed in 2025.^[6] The "already public" exception is not a blank check.
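The three paths can be summarized as a small decision function. This is a sketch under assumptions, not a substitute for judgment: the PrivacyFacts fields and the order in which the paths are checked are choices made for illustration. In particular, path3_justified is assumed to be set only after a human reviewer has accepted a public-figure, already-public, or historical-significance justification, since those questions cannot be answered mechanically.

```python
from dataclasses import dataclass


@dataclass
class PrivacyFacts:
    pii_present: bool
    research_needs_pii: bool  # would redaction destroy the research value?
    path3_justified: bool     # a human reviewer accepted a Path 3 justification


def choose_path(facts):
    """Return the privacy path for one artifact."""
    if not facts.pii_present:
        return "PRESERVE AS-IS"  # nothing to protect
    if facts.path3_justified:
        return "PRESERVE AS-IS"  # Path 3: public release justified
    if facts.research_needs_pii:
        return "EMBARGO"         # Path 2: keep the PII, restrict access
    return "REDACT"              # Path 1: anonymize the artifact


harassment_study = PrivacyFacts(
    pii_present=True, research_needs_pii=True, path3_justified=False
)
print(choose_path(harassment_study))  # EMBARGO
```

Defaulting to REDACT when no other condition holds reflects the cautious reading of the filter: PII stays in the artifact only when someone has argued for it.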

Filter 2: The Copyright Test (Respecting Ownership)

The second filter addresses legality: Do I have the legal right to preserve and distribute this artifact?

Unlike physical archaeology, where artifacts are often unclaimed or fall under "finders keepers" doctrines, digital artifacts almost always have an owner. An .mp3 file, a blog post, and a screenshot of a website are each typically protected by copyright from the moment they are created. The practitioner is not excavating abandoned property; they are handling someone's intellectual property.

The Four Pillars of Fair Use

In U.S. law, the "fair use" doctrine provides a limited exception to copyright, allowing use of copyrighted material without permission for purposes such as research, education, and criticism.^[7] Fair use is determined by a four-factor test:

Factor 1: Purpose and Character of Use

Question: Is the use transformative? Is it for nonprofit educational or research purposes?

Archaeobytology Application: Preservation for scholarly research is generally favored under this factor. The use is transformative (the artifact is being studied, not consumed as entertainment) and nonprofit.

Example: Preserving a GeoCities homepage to study early web design patterns is transformative. Mirroring a GeoCities homepage to "revive cool 90s sites" for nostalgia traffic is not.

Factor 2: Nature of the Copyrighted Work

Question: Is the work factual or creative? Is it published or unpublished?

Archaeobytology Application: Factual works (databases, documentation, logs) receive less copyright protection than creative works (art, music, fiction). Unpublished works (private emails, drafts) receive stronger protection.

Example: Preserving a README.txt file (factual, functional) is more defensible than preserving an unpublished novel found on an abandoned server.

Factor 3: Amount and Substantiality

Question: How much of the work is being used? Is the "heart" of the work being taken?

Archaeobytology Application: Preserving a portion of a work for analysis is easier to defend than preserving it in its entirety for redistribution. However, preserving the whole work is sometimes necessary for research (e.g., a complete website for link analysis).

Example: Preserving a single blog post from a 500-post blog is more defensible than mirroring the entire blog for public access.

Factor 4: Effect on the Market

Question: Does this use harm the market value of the original work?

Archaeobytology Application: If the artifact is abandoned (the creator is unreachable, the platform is dead), market harm is unlikely. If the creator is active and still selling the work, preservation could harm their market.

Example: Preserving a Flash game from a dead website (no market exists) is defensible. Preserving and redistributing a Flash game still sold on Steam is not.

The Orphan Works Problem

The most common copyright challenge in Archaeobytology is the "orphan work": a copyrighted artifact whose owner cannot be identified or located. Legally, these works remain under copyright (in the U.S., generally for 70 years after the author's death), but practically, no one can grant permission.^[8]

Best Practice: Document a "good faith effort" to locate the copyright holder. This includes:

Checking WHOIS records for domain ownership
Searching for the creator's name + contact info
Posting a public call for the owner on relevant forums

If after reasonable effort the owner cannot be found, preservation under fair use (with full citation and takedown policy) is the least-worst option. Copyright scholar Lawrence Lessig argues that the law's treatment of orphan works is broken;^[9] in practice, the risk of a lawsuit from an untraceable owner is low, though not zero.

The CITE & PRESERVE Protocol

For all copyrighted artifacts preserved under fair use, the practitioner must:

Cite the Original Creator: Include full attribution in metadata (creator name, original URL, date).
Provide Context: Explain the research purpose in accompanying documentation.
Offer Takedown: Include a clear contact method for copyright holders to request removal.
Limit Access if Necessary: If full public release is questionable, use researcher-only access or time-limited embargo.
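The four requirements above lend themselves to a metadata record stored alongside each artifact. The sketch below shows one possible shape for such a record; the function name, the field names, and the example values (including the example.org addresses) are hypothetical, and the access levels are the ones used in this chapter's Custodial Report.

```python
import json

ACCESS_LEVELS = {"PUBLIC", "EMBARGO", "RESEARCHER-ONLY", "REDACTED VERSION ONLY"}


def build_sidecar(creator, original_url, original_date, research_purpose,
                  takedown_contact, access="PUBLIC"):
    """Assemble the attribution, context, takedown, and access fields for one artifact."""
    required = {
        "creator": creator,
        "original_url": original_url,
        "original_date": original_date,
        "research_purpose": research_purpose,
        "takedown_contact": takedown_contact,
    }
    # Refuse to build a record that omits attribution, context, or takedown.
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    if access not in ACCESS_LEVELS:
        raise ValueError("unknown access level: " + access)
    return {**required, "access": access}


sidecar = build_sidecar(
    creator="J. Doe",                                  # hypothetical creator
    original_url="https://example.org/~jdoe/index.html",
    original_date="1999-04-12",
    research_purpose="Study of early personal homepage layout conventions",
    takedown_contact="takedown@example.org",
    access="RESEARCHER-ONLY",
)
print(json.dumps(sidecar, indent=2))
```

Raising an error on a missing field makes the protocol hard to skip: an artifact without attribution or a takedown contact cannot be written to the archive at all.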

This protocol balances preservation (the public good) with respect for ownership (the private right).

Filter 3: The Ethics Test (Beyond Law)

The third filter is the most difficult because it cannot be codified. It asks: Even if preservation is legal and privacy-compliant, is it ethical?

Law defines the floor of acceptable behavior. Ethics defines the ceiling of responsible behavior. The Archaeobytologist must operate at the ceiling.

The Harm Principle

The foundational ethical guideline comes from philosopher John Stuart Mill's "harm principle": actions are permissible unless they harm others.^[10] In Archaeobytology, this translates to: Preservation is justified unless it causes tangible harm to individuals or communities.

Types of Potential Harm:

Reputational Harm: Exposing past statements or affiliations that could damage someone's current reputation (e.g., teenage forum posts, early blog entries with now-disavowed views).
Safety Harm: Exposing location data, abuse survivor identities, or whistleblower information that could lead to physical danger.
Dignitary Harm: Violating the dignity of the dead by exposing private communications without family consent (e.g., preserved private messages from a deceased individual).
Community Harm: Exposing cultural practices or knowledge that a community considers sacred or private (e.g., indigenous language resources not intended for outsiders).

Test: For each artifact, ask: "If this person saw this preserved and public, would they feel violated? Would they be endangered?" If yes, preservation must be reconsidered.

The Community Consent Model

One of the most ethically complex scenarios is the preservation of artifacts created by marginalized or indigenous communities. These artifacts may be legally "public" (posted on open forums) but culturally "private" (intended only for in-group consumption).

Drawing on indigenous data sovereignty frameworks, scholars like Tahu Kukutai argue that communities should have the right to control how their cultural data is used, even if that data is technically public.^[11] This has led to the development of the CARE Principles for Indigenous Data Governance: Collective benefit, Authority to control, Responsibility, and Ethics.^[12]

Best Practice for Community-Sensitive Artifacts:

Identify the Community: Is this artifact associated with a specific cultural, ethnic, or marginalized group?
Seek Input: Contact community leaders, cultural organizations, or tribal councils to request guidance.
Respect Refusal: If the community requests that the artifact not be preserved or made public, honor that request even if legal preservation would be possible.
Co-Stewardship: Where possible, work with the community to determine preservation and access terms.

Example: A researcher discovers an archived forum for LGBTQ+ youth from the early 2000s. Even if the forum was technically public, preserving and publicizing it without consulting LGBTQ+ advocacy groups could expose individuals who were closeted at the time. Ethical practice requires community engagement.

The Right to Be Forgotten

The European Union's GDPR enshrines a "right to be forgotten": the ability for individuals to request deletion of personal data under certain conditions.^[13] This right is generally not enforceable against archival research conducted in the U.S., and the GDPR itself provides exceptions for archiving in the public interest, but it represents an important ethical principle: individuals should have agency over their digital past.

Ethical Protocol:

If an individual contacts you requesting removal of their artifact from your archive, seriously consider it.
If the artifact is of genuine historical significance (e.g., a public statement by a government official), you may decline, but you must provide a clear justification.
If the artifact is of minimal research value, removal is the ethical choice even if legal preservation is possible.

As digital archivist Kate Theimer argues, "The digital archive must be a living, responsive institution, not a prison for the past."^[14]

The Custodial Report: Formal Documentation

Just as the Triage Protocol requires a formal Triage Report, the Custodial Filter requires a Custodial Report documenting all ethical and legal decisions.

Required Elements of a Custodial Report:

Artifact Identifier: File name, URL, or unique descriptor
PII Assessment:
- PII Present? (YES/NO)
- If YES: Category (Direct/Quasi/Sensitive)
- Action Taken: (REDACT/EMBARGO/PRESERVE AS-IS)
- Justification: (2-3 sentences)
Copyright Assessment:
- Copyright Status: (Public Domain/Fair Use/Permission Granted/Orphan Work)
- Fair Use Factors: (Four-factor analysis summary)
- Owner Contact Attempt: (YES/NO, with documentation)
Ethical Assessment:
- Potential Harm: (NONE/LOW/MEDIUM/HIGH)
- If MEDIUM or HIGH: Harm Type (Reputational/Safety/Dignitary/Community)
- Mitigation Steps Taken
Access Control: (PUBLIC/EMBARGO/RESEARCHER-ONLY/REDACTED VERSION ONLY)
Takedown Policy: Contact method and response protocol
Date of Assessment: Because legal and ethical standards evolve
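The required elements above can be captured as a structured record with a completeness check, so that an incomplete report is caught before preservation work begins. This is a sketch of one possible encoding: the class name, field names, and the wording of the problem messages are assumptions, and the checks cover only the rules the template states explicitly.

```python
from dataclasses import dataclass, field
from datetime import date

PII_ACTIONS = {"REDACT", "EMBARGO", "PRESERVE AS-IS"}
HARM_LEVELS = {"NONE", "LOW", "MEDIUM", "HIGH"}
ACCESS_LEVELS = {"PUBLIC", "EMBARGO", "RESEARCHER-ONLY", "REDACTED VERSION ONLY"}


@dataclass
class CustodialReport:
    artifact_id: str
    pii_present: bool
    copyright_status: str
    fair_use_summary: str
    owner_contact_attempted: bool
    potential_harm: str
    access_control: str
    takedown_contact: str
    assessed_on: date
    pii_categories: list = field(default_factory=list)
    pii_action: str = "PRESERVE AS-IS"
    pii_justification: str = ""
    harm_types: list = field(default_factory=list)
    mitigation: str = ""

    def problems(self):
        """List the template rules this report fails to satisfy."""
        issues = []
        if self.pii_present and not self.pii_categories:
            issues.append("PII present but no category recorded")
        if self.pii_action not in PII_ACTIONS:
            issues.append("unknown PII action")
        if self.pii_present and not self.pii_justification:
            issues.append("PII action lacks a justification")
        if self.potential_harm not in HARM_LEVELS:
            issues.append("unknown harm level")
        if self.potential_harm in {"MEDIUM", "HIGH"} and not self.harm_types:
            issues.append("harm type required for MEDIUM or HIGH")
        if self.access_control not in ACCESS_LEVELS:
            issues.append("unknown access level")
        return issues


draft = CustodialReport(
    artifact_id="forum_export.csv",  # hypothetical artifact
    pii_present=True,
    copyright_status="Orphan Work",
    fair_use_summary="Research use; factual; analyzed, not redistributed; no market",
    owner_contact_attempted=True,
    potential_harm="MEDIUM",
    access_control="RESEARCHER-ONLY",
    takedown_contact="takedown@example.org",
    assessed_on=date(2025, 11, 21),
    pii_categories=["Direct"],
    pii_action="EMBARGO",
)
print(draft.problems())
# ['PII action lacks a justification', 'harm type required for MEDIUM or HIGH']
```

An empty list from problems() means only that the form is complete; whether the justifications recorded in it are sound remains a matter for the practitioner and any reviewers.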

Example Custodial Report:

Artifact: myspace_messages_2006.csv (Exported private messages from MySpace)

PII Assessment:

PII Present: YES
Category: Direct Identifiers (full names, email addresses in signatures)
Action: EMBARGO with researcher-only access after redaction of email addresses
Justification: Messages contain valuable data on early social network communication patterns, but direct identifiers create reputational risk. Emails redacted; names retained in hashed form for network analysis.

Copyright Assessment:

Copyright Status: Orphan Work (messages authored by users who could not be reached; platform defunct)
Fair Use Analysis: Transformative research use (Factor 1: ✓), factual communication (Factor 2: ✓), dataset used for analysis not redistribution (Factor 3: ✓), no market harm as platform dead (Factor 4: ✓)
Owner Contact: Attempted contact with 5 most active users via archived email addresses; 0 responses after 60 days

Ethical Assessment:

Potential Harm: MEDIUM (reputational harm if adolescent messages made public)
Harm Type: Reputational (users were teenagers; messages may contain now-embarrassing content)
Mitigation: Researcher-only access with IRB oversight; pseudonymization of all usernames; embargo until 2040 (when most users will be 40+ years old)

Access Control: RESEARCHER-ONLY (Pseudonymized version); FULL EMBARGO on raw data until 2040

Takedown Policy: Any individual identified in dataset may request full removal via [email protected]; requests reviewed within 30 days

Assessment Date: November 21, 2025

Conclusion: The Guardian, Not the Grave Robber

The Custodial Filter is not an obstacle to preservation. It is the foundation of responsible preservation. Without it, the Archaeobytologist is not a scholar but a data hoarder, a privacy violator, or a copyright infringer.

The practitioner who applies the Custodial Filter acknowledges a fundamental truth: the digital past is not abandoned property. It is entangled with living people, living communities, and living systems of ownership and meaning. To excavate without care is to harm.

The Custodial Filter ensures that every preserved artifact can be defended on three grounds:

Privacy: It does not expose individuals to harm.
Legality: It respects intellectual property rights.
Ethics: It honors the dignity and agency of those whose lives are reflected in the artifact.

This is the discipline's vow: to be a guardian of the past, not a grave robber. To preserve with purpose, not curiosity. To protect, not exploit.

With the artifact now excavated (Chapter 1), triaged (Chapter 2), and cleared through the Custodial Filter (Chapter 3), the practitioner is finally ready to proceed to the technical work of preservation: the choice between migration, emulation, and reconstruction. That is the work of Chapter 4.

But preservation without custody is theft. The Filter comes first. Always.

Works Cited

[1] Oxford English Dictionary. (2025). "Custody." Retrieved from https://www.oed.com/. Etymology from Latin custodia, establishing the concept of guardianship.
[2] European Union. (2016). General Data Protection Regulation (GDPR), Articles 4 & 9. Official Journal of the European Union. The GDPR's taxonomy of personal data categories provides the foundational framework for PII classification.
[3] Sweeney, L. (2000). "Simple Demographics Often Identify People Uniquely." Carnegie Mellon University, Data Privacy Working Paper 3. Foundational work demonstrating re-identification risk from quasi-identifiers.
[4] Sweeney, L. (2002). "k-Anonymity: A Model for Protecting Privacy." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5), 557-570. The foundational framework for anonymization in research datasets.
[5] Oral History Association. (2009). "Principles and Best Practices." Retrieved from https://oralhistory.org/. Guidelines for embargo practices in sensitive oral history collections.
[6] Nissenbaum, H. (2009). Privacy in Context: Technology, Policy, and the Integrity of Social Life. Stanford University Press. Nissenbaum's theory of "contextual integrity" explains why "already public" does not equal "consent to archive."
[7] U.S. Copyright Act, 17 U.S.C. § 107 (Fair Use). The four-factor test remains the foundational framework for fair use analysis in U.S. law.
[8] U.S. Copyright Office. (2015). Orphan Works and Mass Digitization. A Report of the Register of Copyrights. Documents the orphan works problem and proposed (but not enacted) solutions.
[9] Lessig, L. (2004). Free Culture: How Big Media Uses Technology and the Law to Lock Down Culture and Control Creativity. Penguin Press. Lessig's critique of orphan works law provides the ethical justification for preservation despite copyright uncertainty.
[10] Mill, J. S. (1859). On Liberty. John W. Parker and Son. Mill's harm principle remains foundational to ethical frameworks in research and archival practice.
[11] Kukutai, T., & Taylor, J. (Eds.). (2016). Indigenous Data Sovereignty: Toward an Agenda. ANU Press. Foundational work on the right of indigenous communities to control their data.
[12] Carroll, S. R., et al. (2020). "The CARE Principles for Indigenous Data Governance." Data Science Journal, 19(43), 1-12. The CARE Principles provide an actionable framework for ethical data stewardship with indigenous communities.
[13] European Union. (2016). GDPR Article 17: Right to Erasure ("Right to Be Forgotten"). While not directly applicable in the U.S., this principle represents an important ethical standard.
[14] Theimer, K. (2018). "Digital Archives Should Be Living, Responsive Institutions." ArchivesNext (blog). Retrieved from http://www.archivesnext.com/. Theimer's advocacy for responsive, ethical archival practice.