Foundations Series / Vol 01 Est. 2025

Chapter 10: Triage Workflow — From Discovery to Preservation


Opening: The Clock Is Always Ticking

March 17, 2023, 9:47 AM: A Discord message in the Archive Team channel: "Credit Karma is shutting down their forums on April 15th. 28 days. Thousands of posts about personal finance from 2007-2023. Anyone on this?"

9:52 AM: Three people respond. They've never worked together before. One is a college student in California. One is a librarian in Germany. One is a retired programmer in Ohio.

10:15 AM: They've created a shared spreadsheet, assigned tasks, and started reconnaissance.

April 14th, 11:58 PM: The scraping is complete. 47,000 posts, 8,200 users, 16 years of financial advice, all captured. Total time since the first message: 28 days, 14 hours. They did it.

April 15th, 12:01 AM: Credit Karma's forums go offline. The original URLs return 404 errors. But the archive exists—backed up to Internet Archive, stored on three personal servers, uploaded as a torrent.

This is the triage workflow in action: from discovery to preservation in less than a month. Every step matters. Every hour counts. One mistake, one delay, and the content is lost forever.

This chapter teaches you the complete triage workflow—an 8-phase process tested across hundreds of platform deaths. Whether you have 48 hours or 6 months, this framework will guide you from panic to preservation.


The 8-Phase Triage Workflow

Overview

Phase 1: Discovery — Detecting that content is endangered
Phase 2: Assessment — Understanding scope, urgency, and feasibility
Phase 3: Mobilization — Assembling team and resources
Phase 4: Capture — Executing the scrape/download/preservation
Phase 5: Validation — Verifying data integrity
Phase 6: Storage — Securing long-term preservation
Phase 7: Access — Making content discoverable and usable
Phase 8: Documentation — Recording what you did and why

Each phase has specific goals, tools, and decision points. Let's explore them in detail.


Phase 1: Discovery — Detecting Endangerment

Goal

Identify that content is at risk of disappearing before it's too late.

Common Discovery Channels

1. Official Announcements

2. Financial/Business Signals

3. User Exodus

4. Technical Degradation

5. Community Monitoring

6. Policy Changes

Discovery Tools and Practices

Proactive Monitoring:

Reactive Response:

Decision Point: Is This Worth Investigating?

Rapid Assessment (5 minutes):

If answers suggest "yes, endangered and valuable," proceed to Phase 2.


Phase 2: Assessment — Understanding the Challenge

Goal

Determine scope, technical requirements, ethical concerns, and resource needs before committing to preservation.

Assessment Checklist

A. Scope Assessment

Content Inventory:

Example: Credit Karma Forums

Storage Estimate:

B. Technical Assessment

Platform Architecture:

Preservation Difficulty:

Tools Needed:

Example: Credit Karma Forums Technical Profile

C. Urgency Assessment

Time Until Loss:

Timeline Categories:

Example: Credit Karma

D. Resource Assessment

Labor:

Infrastructure:

Budget:

Example: Credit Karma

E. Ethical Assessment (Custodial Filter)

Cultural Significance: Medium-high (personal finance advice, especially recession-era)

Technical Fragility: High (28 days to shutdown)

Rescue Feasibility: Medium (doable with browser automation)

Redundancy: None (no other known preservation effort)

Ethical Concerns:

Decision: Preserve with restricted access

Copyright:

Terms of Service:

Privacy Laws:

Risk Assessment:

Example: Credit Karma

Output of Phase 2: Go/No-Go Decision

After assessment, decide:

GO: Proceed with preservation

NO-GO: Don't preserve (because...)

DEFER: Monitor but don't act yet
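
The four custodial-filter factors above (cultural significance, technical fragility, rescue feasibility, redundancy) and the three possible outcomes can be sketched as a small data structure. The three-step rating scale, the field names, and the decision rule below are illustrative assumptions, not a rule this chapter prescribes; treat it as a way to make your own reasoning explicit.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    GO = "GO"
    NO_GO = "NO-GO"
    DEFER = "DEFER"


# Hypothetical three-step scale for the filter's ratings.
LOW, MEDIUM, HIGH = 1, 2, 3


@dataclass
class CustodialFilter:
    cultural_significance: int  # LOW..HIGH
    technical_fragility: int    # LOW..HIGH (HIGH means loss is imminent)
    rescue_feasibility: int     # LOW..HIGH
    other_efforts: int          # number of known parallel preservation efforts


def decide(f: CustodialFilter) -> Decision:
    """Illustrative rule only; every threshold here is an assumption."""
    if f.rescue_feasibility == LOW or f.cultural_significance == LOW:
        return Decision.NO_GO
    if f.other_efforts > 0 or f.technical_fragility == LOW:
        return Decision.DEFER
    return Decision.GO


# Ratings taken from the Credit Karma example ("medium-high" rounded to HIGH).
credit_karma = CustodialFilter(HIGH, HIGH, MEDIUM, other_efforts=0)
print(decide(credit_karma).value)  # GO
```

Writing the ratings down this way also gives you a record of why the decision was made, which feeds directly into the documentation phase.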


Phase 3: Mobilization — Assembling Resources

Goal

Get team, tools, and infrastructure ready before capture begins.

3A: Team Formation

Solo vs. Collaborative:

When to work solo:

When to recruit team:

Recruiting:

Team Roles:

Example: Credit Karma Team

3B: Tool Selection and Setup

Scraping Tools:

Static Sites:

Dynamic Sites (JavaScript-heavy):

API Harvesting:

Forensic Recovery:

Example: Credit Karma Stack

3C: Infrastructure Setup

Storage:

Compute:

Bandwidth:

Backup Strategy:

Example: Credit Karma Infrastructure

3D: Coordination Tools

Communication:

Documentation:

Example: Credit Karma Coordination


Phase 4: Capture — Executing the Preservation

Goal

Download/scrape/capture the endangered content before it disappears.

4A: Capture Strategy

Breadth vs. Depth:

Example:

Best practice: Breadth first (get IDs of everything), then depth (fill in details). If time runs out, you at least have a list of what existed.
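
The breadth-then-depth practice can be sketched with a small SQLite progress table (the Credit Karma documentation excerpt in Phase 8 lists SQLite for progress tracking). The schema, the function names, and the `fetch` callback are hypothetical; a real crawl would plug in its own page-fetching code.

```python
import sqlite3
from typing import Callable, Iterable


def record_ids(db: sqlite3.Connection, ids: Iterable[str]) -> None:
    """Breadth pass: remember every item that exists, without its content."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id TEXT PRIMARY KEY, body TEXT, done INTEGER NOT NULL DEFAULT 0)"
    )
    # INSERT OR IGNORE makes the breadth pass safe to repeat.
    db.executemany("INSERT OR IGNORE INTO items (id) VALUES (?)", [(i,) for i in ids])
    db.commit()


def fill_details(db: sqlite3.Connection, fetch: Callable[[str], str]) -> int:
    """Depth pass: download content for every item not yet marked done."""
    pending = [row[0] for row in db.execute("SELECT id FROM items WHERE done = 0")]
    saved = 0
    for item_id in pending:
        try:
            body = fetch(item_id)
        except Exception:
            continue  # leave done = 0 so a later run retries this item
        db.execute("UPDATE items SET body = ?, done = 1 WHERE id = ?", (body, item_id))
        db.commit()  # commit per item so an interrupted run keeps its progress
        saved += 1
    return saved


# Demo with an in-memory database and a stand-in fetcher.
demo_db = sqlite3.connect(":memory:")
record_ids(demo_db, ["t1", "t2", "t3"])
fill_details(demo_db, lambda item_id: f"content of {item_id}")
print(demo_db.execute("SELECT COUNT(*) FROM items WHERE done = 1").fetchone()[0])  # 3
```

Because the ID list is stored before any content is fetched, the table doubles as the inventory of what existed if time runs out mid-download.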

Parallelization:

4B: Capture Execution

Step 1: Initial Crawl (Breadth)

Step 2: Content Download (Depth)

Step 3: Iterative Refinement

Example: Credit Karma Capture Process

Day 1-3: Reconnaissance

Day 4-7: Initial Crawl

Day 8-20: Content Download

Day 21-26: Gap Filling

Day 27: Final Validation

4C: Dealing with Technical Challenges

Challenge 1: Rate Limiting

Challenge 2: JavaScript Rendering

Challenge 3: Login Walls

Challenge 4: CAPTCHAs

Challenge 5: Dynamic URLs

Challenge 6: Server Instability
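
Rate limiting and server instability are commonly handled by retrying with exponential backoff, so the scraper slows down instead of hammering a struggling server. The sketch below is one generic way to do that; `with_backoff`, its parameters, and the delay values are illustrative assumptions, and the `sleep` argument exists only so the example can be exercised without real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    action: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying on any exception with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries when several workers fail together.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")


# Demo: a stand-in fetcher that fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("HTTP 429: slow down")
    return "page body"


print(with_backoff(flaky_fetch, sleep=lambda seconds: None))  # page body
```

In a real scrape you would likely catch only the network errors you expect rather than every exception, and keep the maximum delay well below the time remaining before shutdown.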

4D: Ethical Boundaries During Capture

Don't:

Do:


Phase 5: Validation — Verifying Data Integrity

Goal

Ensure captured data is complete, accurate, and uncorrupted.

5A: Completeness Checks

Quantitative:

Example: Credit Karma
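
Using the figures reported in this chapter's Credit Karma documentation excerpt (47,155 posts captured, 56 posts that returned 404), the completeness percentage works out as follows. Treating captured plus known-missing as the estimated total is an assumption about how the estimate was formed, and the function name is illustrative.

```python
def completeness(captured: int, missing: int) -> float:
    """Percentage of the estimated total (captured + known missing) that was saved."""
    total = captured + missing
    if total == 0:
        raise ValueError("nothing to measure")
    return 100.0 * captured / total


pct = completeness(captured=47_155, missing=56)
print(f"{pct:.2f}% of {47_155 + 56:,} posts")  # 99.88% of 47,211 posts
```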

Qualitative:

5B: Integrity Checks

File Corruption:
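
A common way to detect file corruption is to record a SHA-256 checksum for every file at capture time and re-check it later. The sketch below uses only Python's standard library; the function names and the manifest layout (relative path mapped to hex digest) are illustrative assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large captures do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose contents changed."""
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

Build the manifest right after capture, store it alongside the archive, and run `verify` against each copy before trusting it. An empty list means every file in the manifest is present and unchanged.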

Format Validation:

Metadata Accuracy:

5C: Documentation of Gaps

What's Missing:

Known Issues:

Why Documentation Matters:


Phase 6: Storage — Long-Term Preservation

Goal

Store captured data securely with redundancy for decades-long access.

6A: Storage Formats

Raw Captures:

Derived Formats:

Media:

Example: Credit Karma Storage

6B: Redundancy Strategy

Local Redundancy:

Cloud Redundancy:

Community Redundancy:

Example: Credit Karma Redundancy

  1. USB drive #1 (coordinator's backup)
  2. USB drive #2 (technical lead's backup)
  3. USB drive #3 (scraper's backup)
  4. Internet Archive upload (public institution)
  5. BitTorrent (uploaded to Archive Team tracker)

Result: 5 copies, multiple custodians, extremely unlikely to be fully lost

6C: Metadata Preservation

Collection-level metadata:

Item-level metadata:

Technical metadata:

Store metadata in:
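
One simple option is a JSON sidecar file stored next to the data. The sketch below writes collection-level metadata using values from this chapter's Credit Karma example; the field names and the `metadata.json` filename are illustrative assumptions, not a formal metadata standard.

```python
import json
from pathlib import Path

# Values come from the chapter's Credit Karma example; field names are hypothetical.
collection_metadata = {
    "title": "Credit Karma Forums Archive",
    "source_shutdown_date": "2023-04-15",
    "capture_start": "2023-03-19",
    "capture_end": "2023-04-14",
    "posts_captured": 47155,
    "unique_users": 8200,
    "known_gaps": "56 posts returned 404 during capture",
    "tools": ["Selenium (Python)", "SQLite", "rsync"],
}


def write_sidecar(directory: Path, metadata: dict) -> Path:
    """Write metadata as UTF-8 JSON beside the archive and return the file path."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "metadata.json"
    target.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False, sort_keys=True) + "\n",
        encoding="utf-8",
    )
    return target
```

A plain-text sidecar stays readable without special software, which matters when the archive may outlive the tools that created it.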


Phase 7: Access — Making Content Discoverable

Goal

Ensure captured content is usable by researchers, communities, and the public.

7A: Access Levels

Public Access:

Researcher Access:

Community Access:

Dark Archive:

Example: Credit Karma Access Decision

7B: Access Infrastructure

Static HTML Site:

Database + Web Interface:

Upload to Platforms:

Example: Credit Karma Access

7C: Discovery Mechanisms

How do people find this archive?

Documentation:

Indexing:

Community Outreach:


Phase 8: Documentation — Recording the Process

Goal

Document what you did, why, and what happened for future archivists and researchers.

8A: Technical Documentation

Scraping Process:

Challenges Encountered:

Final Statistics:

Example: Credit Karma Documentation (excerpt)

# Credit Karma Forums Archive - Technical Documentation

## Timeline
- Discovery: March 17, 2023
- Capture: March 19 - April 14, 2023 (27 days)
- Shutdown: April 15, 2023

## Team
- 3 volunteers (Archive Team)

## Tools
- Selenium (Python) for browser automation
- SQLite for progress tracking
- rsync for backups

## Statistics
- 47,155 posts captured (99.88% of estimated total)
- 8,200 unique users
- 2007-2023 (16 years of content)
- Total size: 187 MB (compressed)

## Challenges
- Dynamic pagination required browser automation
- Server timeouts during peak hours (scraped during US nighttime)
- 56 posts returned 404 (likely deleted by users before scrape)

## Storage
- 5 redundant copies (3 USB drives, Internet Archive, BitTorrent)

8B: Ethical Documentation

Decisions Made:

Takedown Policy:

Future Considerations:

8C: Historical Documentation

Why This Mattered:

Contextual Essay:

Example: Credit Karma Context (excerpt)

Credit Karma was a free credit-monitoring service that launched forums in 2007. During the Great Recession (2007-2009), these forums became a vital resource for people navigating financial hardship: debt, bankruptcy, foreclosure, unemployment. Users shared advice, support, and strategies for rebuilding credit. The forums remained active through 2023, documenting 16 years of American financial struggles and recovery. Credit Karma shut down the forums as part of a platform redesign focused on mobile apps. The decision prioritized sleek user experience over community memory, erasing 16 years of peer support and financial education.

8D: Lessons Learned

What Would You Do Differently?

Advice for Future Archivists:

Meta-Reflection:


Case Study: The Complete Triage Workflow in Action

The GeoCities Rescue (2009)

Let's trace the entire workflow through Archive Team's legendary GeoCities rescue:

Phase 1: Discovery

Phase 2: Assessment

Phase 3: Mobilization

Phase 4: Capture

Phase 5: Validation

Phase 6: Storage

Phase 7: Access

Phase 8: Documentation

Legacy:


Workflow Variations for Different Scenarios

Scenario 1: Emergency Triage (< 48 hours)

Compress the workflow:

Priority: Speed over perfection. Save something rather than nothing.

Scenario 2: Systematic Preservation (6+ months)

Expand the workflow:

Priority: Quality and comprehensiveness. Create gold-standard archive.

Scenario 3: Guerrilla Preservation

Stealth considerations:

Priority: Survival (yours and the archive's). Preserve ethically but carefully.


Conclusion: The Workflow Is Your Map

Digital preservation under deadline is chaos. Platforms die with little warning. Servers vanish. URLs break. The clock ticks down.

The workflow is your map through chaos. It won't make preservation easy, but it will make it systematic. When you panic (and you will), return to the workflow:

  1. What phase am I in?
  2. What's the goal of this phase?
  3. What's the next action?

The workflow has been tested across hundreds of platform deaths. It works. It's saved millions of digital artifacts. It will guide you through your first rescue—and your hundredth.

Next chapter: Part III, Institution Building, begins. We've learned to excavate, analyze, triage, and preserve. Now we must build institutions that can sustain this work for decades: organizations that outlive founders, survive funding crises, and resist corporate capture.

The rescue is only the beginning. The real work is building systems that prevent future murders.

But first: practice the workflow. Find an endangered platform. Walk through the phases. Preserve something.

The clock is always ticking. Start now.


Discussion Questions

  1. Personal Experience: Have you ever tried to preserve digital content before a deadline (even personal—like backing up your own social media)? What went well? What did you wish you'd known?

  2. Workflow Adaptation: Which scenario (emergency, systematic, guerrilla) would be hardest for you? Why? What skills would you need to develop?

  3. Team Dynamics: The Credit Karma example had 3 strangers collaborate effectively. What made that work? What could go wrong?

  4. Validation Trade-offs: In emergency triage, validation is minimal. How do you decide "good enough" when perfection isn't possible?

  5. Access Decisions: The Credit Karma archive restricted full content to researchers. Agree or disagree? Where would you draw the line?

  6. Future Scenarios: Imagine a platform shutdown in 2030. What might be different (technology, laws, culture)? How would the workflow need to adapt?


Exercise: Conduct a Practice Triage

Scenario: It's November 2025. A small platform called "BookTalk" (fictional) announces it will shut down in 60 days. It's a reading discussion forum with:

Your Task: Walk through the 8-phase workflow.

Phase 1: Discovery (Already done—you just heard the news)

Phase 2: Assessment (500 words)

Phase 3: Mobilization (300 words)

Phase 4: Capture (500 words)

Phase 5: Validation (200 words)

Phase 6: Storage (300 words)

Phase 7: Access (300 words)

Phase 8: Documentation (200 words)

Reflection (300 words)



End of Chapter 10

Next: Part III: Institution Building
Chapter 11: Sustainable Preservation Organizations: Building the Archive