This appendix provides a curated catalog of tools, software, services, and resources essential for Archaeobytological practice. Tools are organized by function and annotated with:
Purpose: What the tool does
Skill Level: Beginner, Intermediate, Advanced
Cost: Free, Freemium, Paid
Platform: Windows, macOS, Linux, Web-based
Open Source: Yes/No
Tools are current as of 2025, but the digital preservation landscape evolves rapidly. Check the Archaeobytology community wiki (archaeobytology.org/wiki) for updates.
Purpose: Command-line tool for downloading websites recursively
Skill Level: Beginner-Intermediate
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://www.gnu.org/software/wget/
What It Does: Downloads web pages and their linked resources (images, CSS, JavaScript). Creates mirror copies of websites on your local machine.
Basic Usage:
wget --recursive --level=2 --no-parent --wait=1 https://example.com
Best For: Static HTML sites, simple scraping projects
Limitations: Doesn't handle JavaScript-heavy sites well, can't navigate login walls
Purpose: Website copier with a graphical interface
Skill Level: Beginner
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://www.httrack.com/
What It Does: Similar to wget, but with a graphical interface. Easier for beginners who prefer to avoid command-line tools.
Best For: One-time website archiving, beginners
Limitations: Slower than command-line tools, less flexible configuration
Purpose: Self-hosted web archiving platform
Skill Level: Intermediate
Cost: Free
Platform: Windows, macOS, Linux, Docker
Open Source: Yes
Website: https://archivebox.io/
What It Does: Creates permanent archives of web pages including HTML, screenshots, PDFs, videos, and git repositories. Provides web interface for browsing archives.
Features:
Multiple capture methods (wget, Chrome headless, youtube-dl, etc.)
Scheduled archiving (cron jobs)
Full-text search
Deduplication
Best For: Personal archiving projects, research collections, small organizations
Setup Complexity: Requires server or Docker knowledge
Purpose: Browser-based interactive web archiving
Skill Level: Beginner
Cost: Free
Platform: Web (browser extension also available)
Open Source: Yes
Website: https://archiveweb.page/
What It Does: Records your browsing session including JavaScript interactions, videos, and dynamic content. Creates WARC (Web ARChive) files you can replay.
Features:
Captures JavaScript-heavy sites
Records social media feeds (Twitter, Instagram)
Exports to standard WARC format
Replay archives offline
Best For: Social media archiving, dynamic websites, personal projects
Unique Advantage: Works in browser, no installation required
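To make the WARC format mentioned above less abstract, the following Python sketch assembles a single WARC `resource` record in memory. It illustrates the record layout only (version line, named headers, a blank line, the payload, then two trailing CRLF pairs). It is not how ArchiveWeb.page writes its files, and the helper name `build_warc_record` and the example URL are assumptions made for illustration.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, payload: bytes, content_type: str) -> bytes:
    """Assemble one WARC 'resource' record (hypothetical helper, layout only)."""
    captured_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.1",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {captured_at}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines are CRLF-terminated; an empty line separates them from the payload.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record ends with two CRLF pairs after the payload.
    return head + payload + b"\r\n\r\n"


page = b"<html><body>Hello, archive.</body></html>"
record = build_warc_record("https://example.com/", page, "text/html")
print(record.decode("utf-8"))
```

For real collections, records produced by an established capture tool or a maintained WARC library are a safer choice than hand-built ones; the sketch is only meant to show why WARC files remain readable with simple tooling.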
Purpose: Industrial-strength web crawler
Skill Level: Advanced
Cost: Free
Platform: Java (cross-platform)
Open Source: Yes
Website: https://github.com/internetarchive/heritrix3
What It Does: Internet Archive's production crawler. Designed for massive-scale archiving (billions of URLs).
Features:
Highly configurable crawl policies
Distributed crawling
Respects robots.txt
Creates WARC files
Best For: Large institutions, comprehensive web archiving
Limitations: Steep learning curve, requires significant infrastructure
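The "Respects robots.txt" feature listed above can be illustrated with a short Python sketch using the standard library's `urllib.robotparser`. This is not Heritrix's implementation (Heritrix is written in Java); the robots.txt content, the user-agent string `example-archiver`, and the URLs are all assumptions chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object a crawler can query."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = build_policy(ROBOTS_TXT)
agent = "example-archiver"  # hypothetical user-agent name

allowed_public = policy.can_fetch(agent, "https://example.com/index.html")
allowed_private = policy.can_fetch(agent, "https://example.com/private/notes.html")
delay_seconds = policy.crawl_delay(agent)

print(allowed_public, allowed_private, delay_seconds)
```

A polite crawler would check `can_fetch` before every request and sleep for at least the advertised crawl delay between requests to the same host.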
Purpose: High-fidelity browser-based crawling
Skill Level: Intermediate-Advanced
Cost: Free
Platform: Docker
Open Source: Yes
Website: https://github.com/webrecorder/browsertrix-crawler
What It Does: Uses real browsers (Chrome) to capture JavaScript-heavy sites with high fidelity. Creates WARC files.
Best For: Modern web apps, single-page applications, sites requiring JavaScript
Purpose: Subscription web archiving service
Skill Level: Beginner
Cost: Paid (subscription based on storage)
Platform: Web-based
Open Source: No
Website: https://archive-it.org/
What It Does: Managed web archiving service by Internet Archive. Point-and-click interface for creating and managing web archives.
Features:
Scheduled recurring crawls
Metadata management
Public or private collections
Integration with Wayback Machine
Best For: Institutions without technical staff, organizations needing reliable managed service
Cost: Starts ~$1,500/year for small collections
Purpose: Video downloader for YouTube and 1000+ sites
Skill Level: Intermediate
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://github.com/yt-dlp/yt-dlp
What It Does: Downloads videos from streaming platforms including metadata, subtitles, thumbnails.
Basic Usage:
yt-dlp --write-description --write-info-json --write-thumbnail https://youtube.com/watch?v=VIDEO_ID
Best For: Video archiving, preserving YouTube/Vimeo/TikTok content
Note: yt-dlp is an actively maintained fork of youtube-dl; prefer yt-dlp over the original.
Purpose: Image gallery downloader
Skill Level: Intermediate
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://github.com/mikf/gallery-dl
What It Does: Downloads images from image hosting sites (Imgur, Flickr, DeviantArt, Twitter, etc.)
Best For: Image archiving, art preservation, meme collections
Purpose: Multimedia conversion and processing
Skill Level: Intermediate-Advanced
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://ffmpeg.org/
What It Does: Converts video/audio formats, extracts frames, creates thumbnails, transcodes for preservation.
Best For: Format migration, creating preservation masters, generating access copies
Example:
ffmpeg -i input.flv -c:v libx264 -c:a aac output.mp4
Purpose: Flash game and animation preservation
Skill Level: Beginner
Cost: Free
Platform: Windows, Linux
Open Source: Partially
Website: https://flashpointarchive.org/
What It Does: Preserves and plays 500,000+ Flash games and animations using embedded emulators.
Features:
Curated, playable collection
Built-in launcher
Metadata and screenshots
Regular updates
Best For: Playing preserved Flash content, research, nostalgia
Download Size: ~1TB for full collection (smaller curated versions available)
Purpose: Flash Player emulator in Rust
Skill Level: Beginner
Cost: Free
Platform: Web (browser extension), Desktop
Open Source: Yes
Website: https://ruffle.rs/
What It Does: Open-source Flash Player replacement that runs in browsers and as standalone app.
Best For: Viewing archived Flash content, embedding Flash in modern websites
Status: Under active development, not 100% compatible yet
Purpose: Arcade game preservation
Skill Level: Intermediate
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://www.mamedev.org/
What It Does: Emulates arcade hardware to preserve vintage arcade games.
Best For: Arcade game preservation, historical research
Purpose: DOS emulator
Skill Level: Beginner
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://www.dosbox.com/
What It Does: Emulates MS-DOS environment for running old DOS games and software.
Best For: 1980s-1990s software preservation
Purpose: Browser-based emulation framework
Skill Level: Advanced (for setup), Beginner (for use)
Cost: Free
Platform: Web-based
Open Source: Yes
Website: https://github.com/db48x/emularity
What It Does: JavaScript framework for running emulators in web browsers. Powers Internet Archive's software collection.
Best For: Making preserved software publicly playable via web
Purpose: Digital forensics and file recovery
Skill Level: Advanced
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://www.sleuthkit.org/
What It Does: Analyzes disk images, recovers deleted files, examines file systems.
Best For: Forensic analysis of hard drives, recovering deleted content
Purpose: Disk imaging tool
Skill Level: Intermediate
Cost: Free
Platform: Windows
Open Source: No
Website: https://www.exterro.com/ftk-imager
What It Does: Creates forensic disk images (bit-by-bit copies) for preservation and analysis.
Best For: Creating preservation masters of physical media
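A bit-by-bit image is only useful as a preservation master if it can later be shown to be unchanged. The Python sketch below shows one common way to do that: record a SHA-256 digest when the image is created and recompute it later. It is independent of FTK Imager itself; the file name `floppy.img` and the helper `sha256_of_file` are assumptions for illustration, and the "image" is a small stand-in file created on the fly.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large images never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as workdir:
    image_path = os.path.join(workdir, "floppy.img")  # hypothetical image file
    with open(image_path, "wb") as image:
        image.write(b"\x00" * 4096 + b"boot sector and data")

    # Record the digest at acquisition time, then re-check it.
    recorded = sha256_of_file(image_path)
    intact = sha256_of_file(image_path) == recorded

    # Simulate silent corruption by flipping one byte, then check again.
    with open(image_path, "r+b") as image:
        image.seek(10)
        image.write(b"\x01")
    damaged = sha256_of_file(image_path) != recorded

print(intact, damaged)
```

Storing the recorded digest alongside the image, in a plain-text manifest, lets anyone repeat the check years later with whatever hashing tool is then available.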
Purpose: File recovery tool
Skill Level: Intermediate
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://www.cgsecurity.org/wiki/PhotoRec
What It Does: Recovers deleted files from hard drives, memory cards, etc.
Best For: Recovering accidentally deleted content, salvaging corrupted media
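Recovery tools of this kind typically work by "carving": scanning raw bytes for known file signatures rather than trusting the file system. The Python sketch below shows the idea for JPEG data only, using the JPEG start marker (`FF D8 FF`) and end marker (`FF D9`). It is a deliberately naive illustration, not PhotoRec's algorithm, and the function name `carve_jpegs` and the synthetic test data are assumptions.

```python
JPEG_START = b"\xff\xd8\xff"  # start-of-image marker plus first segment byte
JPEG_END = b"\xff\xd9"        # end-of-image marker


def carve_jpegs(raw: bytes) -> list[bytes]:
    """Return every byte run that begins with a JPEG start marker and ends at the next end marker."""
    carved = []
    position = 0
    while True:
        start = raw.find(JPEG_START, position)
        if start == -1:
            break
        end = raw.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break  # truncated file: no end marker found
        carved.append(raw[start:end + len(JPEG_END)])
        position = end + len(JPEG_END)
    return carved


# Synthetic "disk" contents: two fake JPEG payloads surrounded by unrelated bytes.
first = JPEG_START + b"\xe0" + b"first picture" + JPEG_END
second = JPEG_START + b"\xe1" + b"second picture" + JPEG_END
disk = b"\x00" * 16 + first + b"unrelated bytes" + second + b"\x00" * 8

recovered = carve_jpegs(disk)
print(len(recovered))
```

Real carvers must also cope with fragmented files and with end markers that appear inside embedded thumbnails, which this sketch ignores.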
Purpose: Digital forensics tool for extracting information
Skill Level: Advanced
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://github.com/simsong/bulk_extractor
What It Does: Scans disk images and extracts emails, credit cards, URLs, etc. without mounting filesystem.
Best For: Analyzing large datasets, finding specific types of information
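Scanning "without mounting the filesystem" means treating the image as one long byte string and pattern-matching over it. The Python sketch below illustrates that approach with two simplified regular expressions for email addresses and URLs. It is not bulk_extractor's scanner; the patterns, the `scan_raw_bytes` helper, and the sample data are assumptions and will miss or over-match real-world cases.

```python
import re

# Simplified byte-level patterns (assumptions for illustration, not exhaustive).
EMAIL = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL = re.compile(rb"https?://[^\s\"'<>\x00]+")


def scan_raw_bytes(raw: bytes) -> list[tuple[int, str, str]]:
    """Return (byte offset, kind, text) for each match, ordered by offset."""
    hits = []
    for kind, pattern in (("email", EMAIL), ("url", URL)):
        for match in pattern.finditer(raw):
            hits.append((match.start(), kind, match.group().decode("ascii", "replace")))
    return sorted(hits)


# Synthetic image contents: text fragments separated by binary noise.
raw = b"\x00\x01junk alice@example.org\x00\xffmore https://example.com/page\x00"
hits = scan_raw_bytes(raw)
for offset, kind, text in hits:
    print(offset, kind, text)
```

Recording the byte offset of each hit matters in forensic work because it lets an analyst return to the exact location in the image and inspect the surrounding data.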
Purpose: Research photo management and annotation
Skill Level: Beginner
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://tropy.org/
What It Does: Organize, annotate, and tag research photos. Add metadata, transcribe, and create citations.
Best For: Managing archival photos, research collections
Purpose: Web publishing platform for digital collections
Skill Level: Intermediate
Cost: Free (self-hosted) or Paid (Omeka.net)
Platform: Web-based (PHP)
Open Source: Yes
Website: https://omeka.org/
What It Does: Create online exhibits and digital archives with rich metadata.
Features:
Dublin Core metadata
Exhibit builder
Item management
Public/private collections
Best For: Small museums, libraries, digital humanities projects
Purpose: Museum and archival collections management
Skill Level: Advanced
Cost: Free
Platform: Web-based (PHP)
Open Source: Yes
Website: https://collectiveaccess.org/
What It Does: Full-featured collections management system with cataloging, media management, and public access.
Best For: Museums, archives, larger institutions
Purpose: eBook management
Skill Level: Beginner
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://calibre-ebook.com/
What It Does: Organize, convert, and read eBooks. Powerful metadata editor.
Best For: Managing personal book collections, format conversion
Purpose: Self-hosted cloud storage
Skill Level: Intermediate
Cost: Free (self-hosted)
Platform: Web-based (PHP)
Open Source: Yes
Website: https://nextcloud.com/
What It Does: Personal cloud storage like Dropbox but self-hosted. Sync files across devices.
Features:
File sharing
Calendar/contacts
Collaborative editing
End-to-end encryption
Best For: Personal sovereignty, institutional storage
Purpose: Peer-to-peer file synchronization
Skill Level: Beginner-Intermediate
Cost: Free
Platform: Windows, macOS, Linux, Android
Open Source: Yes
Website: https://syncthing.net/
What It Does: Syncs files between devices without central server. True peer-to-peer.
Best For: Personal backups, distributed storage without cloud dependency
Purpose: Encrypted backup program
Skill Level: Intermediate
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://restic.net/
What It Does: Fast, encrypted, deduplicated backups to local or cloud storage.
Best For: Secure long-term backups
Purpose: Encrypted backup with cloud support
Skill Level: Beginner-Intermediate
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://www.duplicati.com/
What It Does: Encrypted backups to cloud storage (S3, Google Drive, Dropbox, etc.)
Best For: Encrypted cloud backups, scheduled backups
Purpose: Distributed file storage protocol
Skill Level: Advanced
Cost: Free
Platform: Cross-platform
Open Source: Yes
Website: https://ipfs.tech/
What It Does: Content-addressed, peer-to-peer file system. Files stored across network, retrieved by hash.
Best For: Censorship-resistant storage, distributed archives
Challenges: Requires peers to "pin" content (host it) or content disappears
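Content addressing and pinning, as described above, can be sketched in a few lines of Python. In this toy model the address of a block is simply the SHA-256 hex digest of its bytes, and unpinned blocks vanish when the node cleans up. Real IPFS uses structured content identifiers (CIDs) and a network of peers, so the `ContentStore` class and its method names are assumptions for illustration only.

```python
import hashlib


class ContentStore:
    """Toy single-node store: blocks are keyed by the hash of their content."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}
        self._pinned: set[str] = set()

    def put(self, data: bytes, pin: bool = False) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data  # identical content always lands at the same address
        if pin:
            self._pinned.add(address)
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # The address doubles as an integrity check on what was retrieved.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored block does not match its address")
        return data

    def __contains__(self, address: str) -> bool:
        return address in self._blocks

    def garbage_collect(self) -> int:
        """Drop every block nobody pinned; return how many were removed."""
        unpinned = [address for address in self._blocks if address not in self._pinned]
        for address in unpinned:
            del self._blocks[address]
        return len(unpinned)


store = ContentStore()
pinned_address = store.put(b"archived homepage", pin=True)
same_address = store.put(b"archived homepage")
temporary_address = store.put(b"scratch data")
removed = store.garbage_collect()
print(pinned_address == same_address, removed, temporary_address in store)
```

The sketch shows why pinning is a preservation commitment: content that no node has chosen to keep is not guaranteed to remain retrievable, however permanent its address looks.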
Purpose: Peer-to-peer file sharing
Skill Level: Beginner
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes (qBittorrent)
Website: https://www.qbittorrent.org/
What It Does: Download and seed torrents. Distributed file sharing without central server.
Best For: Distributing large archives (GeoCities torrent), redundant preservation
Purpose: Peer-to-peer data sharing protocol
Skill Level: Advanced
Cost: Free
Platform: Cross-platform
Open Source: Yes
Website: https://dat.foundation/
What It Does: Like BitTorrent but with versioning and live updates. Share datasets P2P.
Best For: Scientific datasets, collaborative archiving
Purpose: Content management system
Skill Level: Beginner-Intermediate
Cost: Free (software), hosting costs vary
Platform: Web-based (PHP)
Open Source: Yes
Website: https://wordpress.org/
What It Does: Build websites and blogs with full control. Powers 40%+ of the web.
Best For: Personal websites, institutional sites, blogs with custom domains
Purpose: Publishing platform for newsletters and memberships
Skill Level: Beginner (hosted) to Intermediate (self-hosted)
Cost: Freemium (Ghost Pro) or Free (self-hosted)
Platform: Web-based (Node.js)
Open Source: Yes
Website: https://ghost.org/
What It Does: Blogging platform with built-in newsletter and membership features.
Best For: Writers who want sovereignty + monetization
Purpose: Generate static HTML websites from markdown
Skill Level: Intermediate
Cost: Free
Platform: Cross-platform
Open Source: Yes
Websites: https://gohugo.io/ | https://jekyllrb.com/
What It Does: Convert markdown files to HTML websites. No database, just files.
Best For: Fast, secure websites, GitHub Pages hosting, technical users
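The "no database, just files" idea can be shown with a toy converter in Python. The sketch handles only two heading levels and plain paragraphs, far less than Hugo or Jekyll support, and is meant purely to show that a static site is the output of a text transformation. The function name `render_markdown` and the sample page are assumptions.

```python
import html


def render_markdown(source: str) -> str:
    """Convert a tiny Markdown subset (#, ## headings and paragraphs) to HTML."""
    rendered = []
    for block in source.strip().split("\n\n"):
        text = " ".join(line.strip() for line in block.splitlines())
        if text.startswith("## "):
            rendered.append(f"<h2>{html.escape(text[3:])}</h2>")
        elif text.startswith("# "):
            rendered.append(f"<h1>{html.escape(text[2:])}</h1>")
        else:
            rendered.append(f"<p>{html.escape(text)}</p>")
    return "\n".join(rendered)


page_source = "# Field Notes\n\nRescued pages & other finds.\n\n## Sources\n\nSee the archive."
page_html = render_markdown(page_source)
print(page_html)
```

Because the result is plain HTML, it can be served by any web server, copied to any host, and captured cleanly by the crawlers described earlier in this appendix.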
Purpose: Federated social networking
Skill Level: Advanced (self-hosting), Beginner (joining instance)
Cost: Free (software), hosting costs vary
Platform: Web-based (Ruby)
Open Source: Yes
Website: https://joinmastodon.org/
What It Does: Twitter-like social media but federated. Anyone can run an instance.
Best For: Social networking with sovereignty, community hosting
Purpose: File format identification
Skill Level: Intermediate
Cost: Free
Platform: Windows, macOS, Linux (Java)
Open Source: Yes
Website: https://digital-preservation.github.io/droid/
What It Does: Identifies file formats and versions for preservation planning.
Best For: Surveying collections, format migration planning
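Format identification generally relies on byte signatures ("magic numbers") at the start of a file rather than on file extensions, which are easily wrong. The Python sketch below checks a handful of well-known signatures. It is a simplified stand-in, not the signature set or matching logic the tool above uses, and the `SIGNATURES` table and `identify_format` helper are assumptions for illustration.

```python
# A few well-known leading byte signatures (a small, illustrative subset).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP container"),
    (b"FWS", "Flash SWF (uncompressed)"),
    (b"CWS", "Flash SWF (zlib-compressed)"),
]


def identify_format(data: bytes) -> str:
    """Return a format label based on the first bytes of the content."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


samples = {
    "report.dat": b"%PDF-1.7\n...",
    "banner.bin": b"GIF89a\x01\x00\x01\x00",
    "notes.txt": b"plain text with no signature",
}
survey = {name: identify_format(content) for name, content in samples.items()}
print(survey)
```

A ZIP signature is a reminder of the limits of this approach: many modern formats (office documents, EPUB) are ZIP containers, so telling them apart requires looking inside, which is where dedicated identification tools earn their keep.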
Purpose: Digital preservation system
Skill Level: Advanced
Cost: Free
Platform: Linux
Open Source: Yes
Website: https://www.archivematica.org/
What It Does: Complete digital preservation workflow from ingest to access. Creates OAIS-compliant preservation packages.
Best For: Institutions with professional archival requirements
Purpose: Distributed preservation network
Skill Level: Advanced
Cost: Free (software), membership costs vary
Platform: Linux
Open Source: Yes
Website: https://www.lockss.org/
What It Does: Creates distributed dark archives where institutions preserve copies of content.
Best For: Consortial preservation, journal archiving
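The value of keeping copies at several institutions comes from comparing them: if one copy drifts, the others can outvote and repair it. The Python sketch below models that idea with a simple majority vote over SHA-256 digests. It is a loose illustration of the principle, not the LOCKSS polling protocol, and the node names, the `audit_copies` helper, and the sample content are assumptions.

```python
import hashlib
from collections import Counter


def audit_copies(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority digest and the nodes whose copy disagrees with it."""
    digests = {node: hashlib.sha256(data).hexdigest() for node, data in copies.items()}
    consensus, _votes = Counter(digests.values()).most_common(1)[0]
    damaged = sorted(node for node, digest in digests.items() if digest != consensus)
    return consensus, damaged


article = b"Volume 12, Issue 3: full text"
copies = {
    "library-a": article,
    "library-b": article,
    "library-c": article[:-1] + b"?",  # simulated bit rot at one site
}

consensus, damaged = audit_copies(copies)

# Repair each damaged copy from any node that agrees with the consensus.
healthy_copy = next(data for node, data in copies.items() if node not in damaged)
for node in damaged:
    copies[node] = healthy_copy

_, damaged_after_repair = audit_copies(copies)
print(damaged, damaged_after_repair)
```

A majority vote only works with enough independent copies; with two copies that disagree, there is no way to tell which one is correct from the digests alone.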
Purpose: Documentation of rescue projects and methods
Skill Level: All levels
Cost: Free
Website: https://wiki.archiveteam.org/
What It Provides: Guides for archiving specific platforms, tool documentation, community projects.
Best For: Learning web archiving, finding ongoing rescue projects
Purpose: Professional organization for digital preservation
Skill Level: All levels
Cost: Free resources, membership fees for institutions
Website: https://www.dpconline.org/
What It Provides: Handbook, tech watch reports, training, community
Purpose: Government standards and guidance
Skill Level: All levels
Cost: Free
Website: https://www.loc.gov/preservation/digital/
What It Provides: Format recommendations, case studies, technical standards
Purpose: Academic paper search in preserved collections
Skill Level: Beginner
Cost: Free
Website: https://scholar.archive.org/
What It Provides: Search 25+ million academic articles including deleted/paywalled content
Purpose: Web annotation tool
Skill Level: Beginner
Cost: Free
Platform: Web (browser extension)
Open Source: Yes
Website: https://hypothes.is/
What It Does: Collaborative annotation of web pages and PDFs.
Best For: Research collaboration, shared analysis of archived content
Purpose: Reference management
Skill Level: Beginner
Cost: Free (with storage limits)
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://www.zotero.org/
What It Does: Organize research sources, generate citations, share libraries.
Best For: Academic research, bibliography management
Purpose: Web scraping framework
Skill Level: Advanced
Cost: Free
Platform: Python (cross-platform)
Open Source: Yes
Website: https://scrapyd.readthedocs.io/
What It Does: Deploy and run web scraping spiders at scale.
Best For: Custom large-scale scraping projects
Purpose: Browser automation
Skill Level: Advanced
Cost: Free
Platform: Cross-platform
Open Source: Yes
Website: https://www.selenium.dev/
What It Does: Automate web browser interactions for scraping JavaScript-heavy sites.
Best For: Archiving dynamic web apps, automated testing
Purpose: HTML/XML parsing library (Python)
Skill Level: Intermediate
Cost: Free
Platform: Python (cross-platform)
Open Source: Yes
Website: https://www.crummy.com/software/BeautifulSoup/
What It Does: Parse and extract data from HTML documents.
Best For: Custom scrapers, data extraction
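To show what "parse and extract data from HTML" looks like in practice, the Python sketch below pulls link targets and their text out of a page. To stay dependency-free it uses the standard library's `html.parser` rather than the library described above, so the API shown is not that library's API. The `LinkExtractor` class and the sample HTML are assumptions for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from anchor tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._current_href = None
        self._current_text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._current_text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            self.links.append((self._current_href, "".join(self._current_text).strip()))
            self._current_href = None


archived_page = """
<html><body>
  <h1>Neighborhood index</h1>
  <a href="/area51/index.html">Area 51</a>
  <a href="/hollywood/index.html">Hollywood <b>fan pages</b></a>
  <a name="top">No link target here</a>
</body></html>
"""

extractor = LinkExtractor()
extractor.feed(archived_page)
print(extractor.links)
```

Extracted link lists like this are a common starting point for deciding which pages of a rescued site still need to be captured.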
Purpose: Metadata reading/writing for images
Skill Level: Intermediate
Cost: Free
Platform: Windows, macOS, Linux
Open Source: Yes
Website: https://exiftool.org/
What It Does: Read and write metadata in images, videos, PDFs.
Best For: Extracting metadata, adding preservation info to files
Purpose: Text analysis and visualization
Skill Level: Beginner
Cost: Free
Platform: Web-based
Open Source: Yes
Website: https://voyant-tools.org/
What It Does: Analyze text corpora with visualizations (word clouds, trends, etc.)
Best For: Analyzing preserved text collections, research
Purpose: Cloud VPS hosting
Skill Level: Intermediate-Advanced
Cost: Paid (~$5-50/month)
Platform: Cloud infrastructure
Website: Various
What They Do: Provide virtual servers for hosting tools, websites, archives.
Best For: Self-hosting projects, institutional infrastructure
Purpose: Version control and code hosting
Skill Level: Intermediate
Cost: Free (with limits)
Platform: Web-based
Open Source: Partially (self-hosted GitLab is open source; GitHub is not)
Website: https://github.com | https://gitlab.com
What They Do: Host code repositories, documentation, static sites (GitHub Pages).
Best For: Collaborative development, documentation, portfolio
This toolkit represents the essential software infrastructure for Archaeobytological practice. Key principles:
Open Source First: Prioritize tools you control (no vendor lock-in)
Standard Formats: Use WARC, JSON, CSV, plain text (future-proof)
Redundancy: Multiple tools for critical functions (no single points of failure)
Learning Curve: Start with beginner tools, grow into advanced ones
Community: Join user communities (Archive Team, COPTR, DPC) to learn
The field evolves rapidly—new tools emerge, old ones are abandoned. Check the Archaeobytology Wiki regularly for updates.
Next Steps:
Install 3-5 tools from this list that match your current projects
Join Archive Team IRC or Discord to see tools in action
Contribute to tool documentation when you learn something useful
Tools are just means—the goal is preserving digital culture and building sovereign alternatives. Choose tools that serve those ends.
For Updates: Visit archaeobytology.org/tools for latest versions and community recommendations.