
Update README.md

Nick Sweeting 2019-03-19 01:20:34 -04:00 committed by GitHub
parent 53127fc469
commit a868dc320c
GPG key ID: 4AEE18F83AFDEB23


@@ -85,6 +85,10 @@ ArchiveBox imports a list of URLs from stdin, remote URL, or file, then adds the
Using multiple methods and the market-dominant browser to execute JS ensures we can save even the most complex, finicky websites in at least a few high-quality, long-term data formats.
+Archiving is additive so you can schedule `./archive` to [run regularly](https://github.com/pirate/ArchiveBox/wiki/Scheduled-Archiving) and pull new links into the index. For each link, it saves the first successful snapshot (it will retry any that failed on the next run). Support for saving multiple snapshots of each site over time will be added soon (along with the ability to view diffs of the changes between runs).
+All the archived links are stored by date bookmarked in `output/archive/<timestamp>`, and everything is indexed nicely with JSON & HTML files. The intent is for all the content to be viewable with common software in 50 - 100 years without needing to run ArchiveBox in a VM.
#### Can import links from many formats:
```bash ```bash
@@ -116,7 +120,7 @@ Using multiple methods and the market-dominant browser to execute JS ensures we
It does everything out-of-the-box by default, but you can disable or tweak [individual archive methods](https://github.com/pirate/ArchiveBox/wiki/Configuration) via environment variables or config file.
-#### Key features
+## Key Features
- **Free & open source**, doesn't require signing up for anything, stores all data locally
- **Few dependencies** and simple command line interface
@@ -127,10 +131,6 @@ It does everything out-of-the-box by default, but you can disable or tweak [indi
- Can **run scripts during archiving** to scroll pages, close modals, expand comment threads, etc.
- Can also **mirror content to 3rd-party archiving services** automatically for redundancy
-The archiving is additive so you can schedule `./archive` to [run regularly](https://github.com/pirate/ArchiveBox/wiki/Scheduled-Archiving) and pull new links into the index. For each link, it saves the first successful snapshot (it will retry any that failed on the next run). Support for saving multiple snapshots of each site over time will be added soon (along with the ability to view diffs of the changes between runs).
-All the archived links are stored by date bookmarked in `output/archive/<timestamp>`, and everything is indexed nicely with JSON & HTML files. The intent is for all the content to be viewable with common software in 50 - 100 years without needing to run ArchiveBox in a VM.
## Background & Motivation
Vast treasure troves of knowledge are lost every day on the internet to link rot. As a society, we have an imperative to preserve some important parts of that treasure, just like we preserve our books, paintings, and music in physical libraries long after the originals go out of print or fade into obscurity.