Ross Williams | 310b4d1242 | Add htmltotext extractor. Saves HTML text nodes and selected element attributes in `htmltotext.txt` for each Snapshot. Primarily intended to be used for search indexing. | 2023-10-23 21:42:32 -04:00 |
Cristian | f6ce1de882 | fix: archivebox version was being called as root | 2020-10-27 09:15:14 -05:00 |
Cristian | e1d0b8bce7 | feat: Initialize django at the beginning | 2020-10-26 07:45:21 -05:00 |
Cristian | 62ed11a5ca | fix: Improve headers handling | 2020-09-24 12:55:51 -05:00 |
ttimasdf | e3329be291 | tests: add test for mercury-parser | 2020-09-22 18:44:12 -05:00 |
Cristian | 8aa7b34de7 | tests: Add readability to ignored methods in tests | 2020-08-11 08:58:49 -05:00 |
Cristian | 5429096c30 | tests: Add mechanism to avoid using extractors that we are not testing | 2020-08-04 08:42:30 -05:00 |
Cristian | d5fc13b34e | refactor: Move pytest fixtures to its own file | 2020-07-07 08:36:58 -05:00 |