Steal the Internet - Archiving Everything and Backing Up Your Investigations

Follow along at https://aramzs.me/stealit

Hi! 👋 I'm Aram Zucker-Scharff (@chronotope.aramzs.xyz)

Every time you visit a website, it downloads itself to your computer...

You should get to keep it.

Websites die, get taken offline, or get removed to hide the truth.

Between 2013 and 2024, 66.5% of the links on the web went dead.

66.5%

2014: 70% of the links within legal journals and 50% of the URLs from Supreme Court decisions did not contain the originally cited material.

2012: 30% of social media links were dead within two years.

The Trump administration has been deleting things off the web left and right.

And no university seems able to keep their own pages online.

What about The Internet Archive?

People can request takedowns from the archive, or block the archive's crawlers.

And The Internet Archive is in danger...

Help The Internet Archive if you can!

but don't rely on it to be the only solution.

The archive can use your donations

Do not use archive dot is. The service does not persist effectively and isn't trustworthy.

So if you want to make sure a citation or a source stays available...

Archive it yourself!

Meet Conifer!

Let's quickly try Conifer out

Browsertrix is a high-powered tool for crawling and archiving tons of pages at once.

You can share the collections you've created.

A quick Browsertrix demo

Webrecorder puts archive tools in your browser

Webrecorder's browser extension, ArchiveWeb.page, archives pages as you look at them and lets you replay and export them.

Once you've created an archive you can replay it, share it and store it.

The files created are WARC and WACZ files

Web ARChive files are the main way web site archives are built, stored and made transportable. They can be played back and explored.
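
A WACZ file is a ZIP package wrapped around one or more WARC files, so you can peek inside one with nothing but the Python standard library. The sketch below is a rough illustration, not how Webrecorder's tools do it. `summarize_wacz` and `build_demo_wacz` are made-up helper names, and the layout it looks for (WARCs under `archive/`, a page list at `pages/pages.jsonl`) is an assumption based on the WACZ spec, so check it against a real file.

```python
import io
import json
import zipfile


def summarize_wacz(wacz_file):
    """List the WARC files and page URLs inside a WACZ package (a ZIP file)."""
    with zipfile.ZipFile(wacz_file) as zf:
        names = zf.namelist()
        # Assumption: raw captures live under archive/ as .warc or .warc.gz files.
        warcs = sorted(
            n for n in names
            if n.startswith("archive/") and n.endswith((".warc", ".warc.gz"))
        )
        pages = []
        # Assumption: pages/pages.jsonl holds one JSON object per line;
        # lines without a "url" key (such as a header line) are skipped.
        if "pages/pages.jsonl" in names:
            for raw in zf.read("pages/pages.jsonl").decode("utf-8").splitlines():
                if not raw.strip():
                    continue
                entry = json.loads(raw)
                if "url" in entry:
                    pages.append(entry["url"])
    return {"warcs": warcs, "pages": pages}


def build_demo_wacz():
    """Build a tiny stand-in package in memory so the sketch runs offline."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("archive/data.warc.gz", b"")  # empty placeholder, not a real capture
        zf.writestr(
            "pages/pages.jsonl",
            '{"format": "json-pages-1.0", "id": "pages"}\n'
            '{"url": "https://example.com/", "title": "Example"}\n',
        )
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(summarize_wacz(build_demo_wacz()))
```

To try it on a real export, pass a path instead of the demo buffer, for example `summarize_wacz("my-collection.wacz")` (a hypothetical filename).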

You can embed re-playable web page archives in other pages, like your articles.

Here's how.

Learn more about how web archives and WARC files work.

Let's try out Webrecorder

Got WordPress? You can save the text of webpages with a nifty plugin:

PressForward

PressForward creates text-only archives, and allows you to subscribe to RSS feeds and collaborate with others.

Let's take a look

ArchiveBox is a really useful tool to host your very own web archive site.

You can use ArchiveBox to archive groups of pages, whole sites, sitemaps or RSS feeds. It also makes copies of videos with yt-dlp.

You can also make a copy of the files it creates.

Or use it to create a static website with all your archives.
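
Archiving from an RSS feed boils down to pulling each item's link out of the feed and handing that list to an archiver. Here is a minimal sketch of that first step using only the Python standard library. It is not ArchiveBox's own code: `links_from_rss` is a hypothetical helper, the sample feed is invented, and it only handles plain RSS 2.0 `<item><link>` elements (not Atom).

```python
import xml.etree.ElementTree as ET

# Invented sample feed so the sketch runs without a network connection.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>One</title><link>https://example.com/one</link></item>
    <item><title>Two</title><link> https://example.com/two </link></item>
    <item><title>One again</title><link>https://example.com/one</link></item>
  </channel>
</rss>
"""


def links_from_rss(feed_xml):
    """Return the unique item links from an RSS 2.0 feed, in feed order."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if not link or not link.strip():
            continue  # skip items that carry no link
        url = link.strip()
        if url not in links:  # drop repeats so each page is queued once
            links.append(url)
    return links


if __name__ == "__main__":
    for url in links_from_rss(SAMPLE_FEED):
        print(url)
```

The resulting list is what you would feed to whichever archiving tool you prefer.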

Let's try it out!

Archives should be:

  • portable
  • sharable
  • re-playable
  • able to be taken offline
  • marked with useful metadata

Holding on to them is invaluable when you report on web-based information.
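
One way to cover the "useful metadata" point is to keep a small sidecar record next to each archive file: where it came from, when you captured it, and a checksum so you can later show the file hasn't changed. The sketch below is one possible shape for that record, not a standard. `describe_archive` and its field names are assumptions made up for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone


def describe_archive(data, source_url, captured_at):
    """Build a metadata record for the raw bytes of one archive file."""
    return {
        "source_url": source_url,
        # Store the capture time in UTC so records compare cleanly.
        "captured_at": captured_at.astimezone(timezone.utc).isoformat(),
        "size_bytes": len(data),
        # A SHA-256 digest lets anyone verify the file later.
        "sha256": hashlib.sha256(data).hexdigest(),
    }


if __name__ == "__main__":
    record = describe_archive(
        b"stand-in for the bytes of a WARC or WACZ file",
        "https://example.com/",
        datetime.now(timezone.utc),
    )
    print(json.dumps(record, indent=2))
```

Saving that JSON alongside the archive keeps the context with the file when you move it or take it offline.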

Want to help the archive?

Archive Team is a really cool collective of volunteers who work on their own to save the history of the internet. If you have a little extra cash, maybe support their efforts!

If you're interested in how they do it, it is all open source, so check that out too! You could even participate!

Check out their work!

Worksheet

More Resources

Check out this great zine on web archiving techniques and tools.

It has tips on how best to use Webrecorder and Browsertrix, plus overall best practices.

Here's my blog post that was the beginning of this presentation.

Steal this site at https://aramzs.me/stealit

Make your own copy from its open source code!

Donate to The Internet Archive, if you can.

Steal the Internet by Aram Zucker-Scharff is licensed under CC BY 4.0