mirror of https://github.com/iipc/awesome-web-archiving.git synced 2024-10-30 12:13:56 -04:00

Added warc-safe to list (#148)

lasztoth 2024-05-06 14:26:07 +02:00 committed by GitHub
parent 8e713a4388
commit 99241ae461
GPG key ID: B5690EEEBB952194


@@ -153,6 +153,7 @@ This list of tools and software is intended to briefly describe some of the most
* [Warchaeology](https://nlnwa.github.io/warchaeology/) - Warchaeology is a collection of tools for inspecting, manipulating, deduplicating and validating WARC-files. *Stable*
* [warcdb](https://github.com/florents-Tselai/warcdb) - A command line utility (Python) for importing WARC files into a SQLite database. *(Stable)*
* [warcdedupe](https://gitlab.com/taricorp/warcdedupe) - WARC deduplication tool (and WARC library) written in Rust. (In Development)
* [warc-safe](https://github.com/natliblux/warc-safe) - Automatic detection of viruses and NSFW content in WARC files.
* [WarcPartitioner](https://github.com/helgeho/WarcPartitioner) - Partition (W)ARC Files by MIME Type and Year. *(Stable)*
* [warcrefs](https://github.com/arcalex/warcrefs) - Web archive deduplication tools. *Stable*
* [webarchive-indexing](https://github.com/ikreymer/webarchive-indexing) - Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.