The Triage Workflow: From Discovery to Access
Transform Panic into Systematic Rescue Operations
When a platform announces a shutdown, time is the enemy. This 8-Phase Workflow transforms panic into a systematic rescue operation.
Derived from Chapter 10: Triage Workflow and the Archaeobytology Protocol v1.0, this guide operationalizes preservation ethics into actionable steps.
The 8-Phase Workflow
Detect endangerment via "Canary" monitoring systems. Watch for:
- Official shutdown notices
- Mass user exodus patterns
- Terms of Service changes signaling platform pivot
- Leadership turnover or acquisition announcements
- Sudden removal of API access or export tools
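As a rough illustration of a "Canary" check, the sketch below scans the text of a platform announcement for some of the warning signs listed above. The pattern names and regular expressions are hypothetical and would need tuning to each platform's wording. Signals such as a mass user exodus cannot be found by text matching and are left out.

```python
import re

# Hypothetical keyword patterns, one per text-detectable warning sign.
SIGNAL_PATTERNS = {
    "shutdown_notice": r"\b(shut(ting)?\s?down|sunset(ting)?|discontinu\w+|end of life)\b",
    "tos_change": r"\bterms of service\b",
    "acquisition": r"\b(acquir\w+|acquisition|merger)\b",
    "export_removal": r"\b(api|export)\b.*\b(deprecat\w+|remov\w+|retir\w+)\b",
}

def detect_signals(text: str) -> list[str]:
    """Return the names of warning signs whose pattern appears in the text."""
    lowered = text.lower()
    return [name for name, pattern in SIGNAL_PATTERNS.items()
            if re.search(pattern, lowered)]
```

A match is only a prompt for a human to look closer. It is not proof that a platform is endangered.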
Determine three critical factors:
- Scope: How much data exists? (GB? TB? Estimate server load)
- Urgency: How many days remain? (Critical if < 7 days)
- Ethics: Does this pass the Custodial Filter?
Decision Point:
Apply the 5-Step Ethical Decision Matrix (see below). If the artifact fails the ethics check, do not proceed.
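The three assessment factors and the decision point can be written down as a small record and a triage function. This is a minimal sketch: the 7-day threshold comes from the text above, while the class name, field names, and the labels "critical" and "scheduled" are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date

CRITICAL_DAYS = 7  # from the text: critical if fewer than 7 days remain

@dataclass
class Assessment:
    estimated_bytes: int   # scope: rough size of the data to capture
    shutdown_date: date    # urgency: announced shutdown date
    passes_ethics: bool    # outcome of the ethics review (Custodial Filter)

def days_remaining(a: Assessment, today: date) -> int:
    """Whole days between today and the announced shutdown."""
    return (a.shutdown_date - today).days

def triage(a: Assessment, today: date) -> str:
    """Ethics failure stops the rescue; otherwise classify by urgency."""
    if not a.passes_ethics:
        return "do-not-proceed"
    if days_remaining(a, today) < CRITICAL_DAYS:
        return "critical"
    return "scheduled"
```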
The 5-Step Ethical Decision Matrix
Before preserving or publishing an artifact, you must apply the following filter:
Does this artifact represent a community, movement, or moment that would otherwise be lost?
Priority: Elevate marginalized voices and grassroots movements over corporate or mainstream content.
How close is this to extinction?
Action: Critical fragility (48 hours to shutdown) overrides lower-priority concerns.
Is it technically feasible to save this with current resources?
Trade-off: Do not spend 100 hours on one low-value artifact if that costs you the chance to save 1,000 high-value ones.
Is the Internet Archive or Library of Congress already saving this?
Rule: Do not duplicate effort unless you are adding unique context or fidelity.
- The Harm Principle: Does preserving this cause direct harm (doxxing, revenge porn)? If yes, do not preserve.
- The Right to be Forgotten: Did the creator explicitly delete this? If so, respect the deletion unless there is an overriding public interest (e.g., public official accountability).
- Context Collapse: Will preserving this expose a private community to public scrutiny they did not consent to?
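One way to make the matrix harder to skip under time pressure is to encode its questions as explicit yes/no fields. The sketch below is an assumption-laden illustration, not the protocol's official form: every field name and return label is hypothetical, and fragility (hours to shutdown) is left out because it sets priority rather than a go/no-go answer. Because the text does not prescribe an action for context collapse, the sketch routes it to human review.

```python
from dataclasses import dataclass

@dataclass
class Artifact:
    would_otherwise_be_lost: bool            # community, movement, or moment at risk
    technically_feasible: bool               # savable with current resources
    already_archived: bool                   # another institution is saving it
    adds_unique_context: bool = False        # our copy adds context or fidelity
    causes_direct_harm: bool = False         # e.g. doxxing
    creator_deleted: bool = False            # creator explicitly removed it
    overriding_public_interest: bool = False
    exposes_private_community: bool = False  # context collapse risk

def decide(a: Artifact) -> str:
    # Ethics checks run first: a failure here stops everything else.
    if a.causes_direct_harm:
        return "do-not-preserve: harm"
    if a.creator_deleted and not a.overriding_public_interest:
        return "do-not-preserve: deleted by creator"
    if not a.would_otherwise_be_lost:
        return "skip: not at risk of loss"
    if not a.technically_feasible:
        return "skip: not feasible"
    if a.already_archived and not a.adds_unique_context:
        return "skip: duplicate effort"
    if a.exposes_private_community:
        return "needs-review: context collapse"
    return "preserve"
```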
Assemble the rescue team and select appropriate tools:
- For static sites: wget, HTTrack
- For dynamic content: Selenium, Puppeteer
- For API-accessible data: Custom API scrapers
- For social media: Platform-specific tools (e.g., gallery-dl, yt-dlp)
Strategy:
- Breadth-First: Capture the list of all URLs first (the "index"). This ensures you know what exists before the server dies.
- Depth-Second: Download heavy media/content after ensuring the index is safe.
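The two-pass strategy can be sketched as a function that refuses to download anything until the index has been saved. The function name and its three callables are hypothetical placeholders. A real rescue would plug in a crawler, a storage writer, and a downloader.

```python
from typing import Callable, Iterable

def rescue(list_urls: Callable[[], Iterable[str]],
           save_index: Callable[[list[str]], None],
           download: Callable[[str], None]) -> list[str]:
    """Capture the index first, then fetch the content it points to."""
    # Pass 1 (breadth-first): enumerate every URL and persist the list.
    index = sorted(set(list_urls()))
    save_index(index)
    # Pass 2 (depth-second): only now fetch the heavy content.
    for url in index:
        download(url)
    return index
```

If the server dies partway through the second pass, the saved index still records what existed and what was never captured.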
⚠️ Critical Rule:
Respect rate limits to avoid crashing the dying server. You are preserving, not attacking. Use delays between requests and monitor server response times.
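A minimal sketch of polite fetching, assuming the caller supplies a fetch function that returns an HTTP status code. The thresholds are illustrative assumptions and not recommended values: a 1-second base delay, a 60-second cap, and a 5-second response time treated as "slow". The pause doubles when the server signals overload (429 or 503) or answers slowly, and eases back toward the base delay when it recovers.

```python
import time
from typing import Callable, Iterable

def next_delay(current: float, response_seconds: float, status: int,
               base: float = 1.0, max_delay: float = 60.0) -> float:
    """Choose the pause before the next request."""
    if status in (429, 503) or response_seconds > 5.0:
        # Server is struggling: back off exponentially, up to the cap.
        return min(current * 2, max_delay)
    # Server looks healthy: halve the delay, never going below the base.
    return max(base, current / 2)

def polite_fetch(urls: Iterable[str], fetch: Callable[[str], int],
                 sleep: Callable[[float], None] = time.sleep,
                 base: float = 1.0) -> list[tuple[str, int]]:
    """Fetch each URL in turn, pausing between requests."""
    delay = base
    results = []
    for url in urls:
        start = time.monotonic()
        status = fetch(url)
        elapsed = time.monotonic() - start  # monitor server response time
        results.append((url, status))
        delay = next_delay(delay, elapsed, status, base=base)
        sleep(delay)
    return results
```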
Verify data integrity to ensure nothing was corrupted during capture:
- Generate and store checksums for all files (prefer SHA-256; MD5 detects accidental corruption but is not collision-resistant)
- Verify WARC file integrity if using Web ARChive format
- Spot-check random samples to ensure content is complete and readable
- Document any missing or corrupted files
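The checksum steps above could look like the following sketch, which uses Python's standard hashlib module. It builds a SHA-256 manifest for a capture directory and later reports any file that is missing or changed. The function names and the manifest layout (relative path mapped to hex digest) are assumptions for illustration. WARC-specific validation is out of scope here.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large captures do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest differs."""
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

Storing the manifest alongside every copy lets any steward re-run the verification later.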
Follow the 3-2-1 backup rule:
- 3 copies of the data
- 2 different storage formats/media types
- 1 copy stored off-site (different physical location)
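The three bullets above describe what is commonly called the 3-2-1 backup rule, and a storage plan can be checked against it mechanically. In this sketch the Copy record and its fields are hypothetical. The check simply counts copies, distinct media types, and off-site copies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Copy:
    location: str   # e.g. "office" or "partner-archive"
    medium: str     # e.g. "hdd", "tape", "object-storage"
    offsite: bool   # stored in a different physical location

def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """True if there are >= 3 copies, >= 2 media types, and >= 1 off-site copy."""
    return (len(copies) >= 3
            and len({c.medium for c in copies}) >= 2
            and any(c.offsite for c in copies))
```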
Apply LOCKSS Principles:
Lots of Copies Keep Stuff Safe. Distribute copies to multiple trusted stewards when possible. Single points of failure doom artifacts to eventual loss.
Determine appropriate access level based on ethical considerations:
- Public Archive: Open web access (appropriate for public content with no privacy concerns)
- Restricted Access: Researchers only, requires authentication (for sensitive but historically valuable content)
- Dark Archive: Preserved but sealed (content with privacy concerns but historical value; accessible only with special permission or after a time delay)
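The three access levels can be modeled as an enumeration with a simple selection rule. This is only a sketch: the two boolean inputs are a hypothetical simplification, and a real access decision needs human ethical review rather than two flags.

```python
from enum import Enum

class AccessLevel(Enum):
    PUBLIC = "public archive"
    RESTRICTED = "restricted access"
    DARK = "dark archive"

def choose_access(privacy_concerns: bool, sensitive: bool) -> AccessLevel:
    """Pick the most protective level that applies."""
    if privacy_concerns:
        return AccessLevel.DARK        # preserved but sealed
    if sensitive:
        return AccessLevel.RESTRICTED  # authenticated researchers only
    return AccessLevel.PUBLIC          # open web access
```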
Record the Chain of Custody:
Without provenance, the artifact is just a file. Documentation transforms it into a historical record.
Document the following:
- Who: Who performed the capture? (Individual or organization)
- When: Date and time of capture
- How: What tools were used? What parameters?
- Where: Original source URLs and server information
- Why: Context about the platform shutdown and preservation rationale
- Integrity: Checksums and verification methods
- Completeness: Known gaps or limitations in the capture
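A chain-of-custody record covering the seven items above could be stored as JSON next to the captured files. The sketch below is one possible shape, not a standard schema: the class name and field types are assumptions, and the values in the usage example are placeholders.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class CustodyRecord:
    who: str          # individual or organization that performed the capture
    when: str         # ISO 8601 timestamp of the capture
    how: dict         # tools used and their parameters
    where: list       # original source URLs and server information
    why: str          # shutdown context and preservation rationale
    integrity: dict   # checksum algorithm and manifest reference
    completeness: list = field(default_factory=list)  # known gaps

def to_json(record: CustodyRecord) -> str:
    """Serialize the record with stable key order for easy diffing."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)

# Usage with placeholder values:
example = CustodyRecord(
    who="Example Rescue Team",
    when="2024-03-05T14:30:00+00:00",
    how={"tool": "wget", "parameters": "placeholder"},
    where=["https://example.org/"],
    why="Platform announced shutdown; community history at risk.",
    integrity={"algorithm": "SHA-256", "manifest": "manifest.json"},
    completeness=["User avatars could not be captured"],
)
```

Calling to_json(example) produces a document that can travel with every copy of the archive.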
Summary: When Time is the Enemy
Platform deaths are unpredictable. The difference between a successful rescue and permanent loss often comes down to having a rehearsed workflow.
This 8-Phase system ensures that when the alarm sounds, your team moves with purpose rather than panic.
Remember: Preservation is an act of power. Wield it responsibly.
Source Note
This workflow is derived from "Chapter 10: Triage Workflow," "Chapter 9: The Custodial Filter," and the "Archaeobytology Protocol v1.0."