15 commits

Author SHA1 Message Date
Frédéric Guillot
df2bebaf3d Update scraper rule for heise.de 2018-08-25 10:33:18 -07:00
Frédéric Guillot
dbcc5d8a97 Use canonical imports 2018-08-24 21:56:39 -07:00
Frédéric Guillot
1eba1730d1 Move HTTP client to its own package 2018-04-28 10:51:07 -07:00
aniran
322b265d7a Scrape parent element for iframe
Current behavior: if you have an `iframe` scraper rule, `scrapContent`
tries to return the inner HTML of the `iframe`, which turns up blank.

New behavior: as with `img` elements, if an `iframe` is matched by a scraper rule,
the parent element's inner HTML (i.e. the `iframe`) is returned.
2018-04-27 17:57:22 -07:00
Frédéric Guillot
1d7fe892e1 Add scraper rule for darkreading.com 2018-01-06 13:25:12 -08:00
Frédéric Guillot
48aa0d07ef Add more scraper rules 2018-01-04 19:32:24 -08:00
Frédéric Guillot
3c3f397bf5 Make sure the scraper parses only HTML documents 2018-01-02 18:32:01 -08:00
Frédéric Guillot
c454f67037 Add scraper rules for version2.dk and ing.dk 2017-12-27 19:44:23 -08:00
Frédéric Guillot
d4839b5597 Add more scraper rules 2017-12-27 13:36:07 -08:00
Frédéric Guillot
1d8193b892 Add logger 2017-12-15 18:55:57 -08:00
Frédéric Guillot
c6d9eb3614 Improve content scraper 2017-12-13 21:30:40 -08:00
Frédéric Guillot
84d912c979 Rewrite imports 2017-12-12 21:48:13 -08:00
Frédéric Guillot
ef097f02fe Add the possibility to enable crawler for feeds 2017-12-12 19:19:36 -08:00
Frédéric Guillot
87ccad5c7f Add scraper rules 2017-12-10 20:51:04 -08:00
Frédéric Guillot
7a35c58f53 Add readability package to fetch original content 2017-12-10 19:01:38 -08:00
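
The behavior change described in commit 322b265d7a can be illustrated with a minimal sketch. This is not the project's code: it assumes well-formed markup, uses Python's standard `xml.etree.ElementTree` in place of a real HTML parser, and matches by tag name rather than by a CSS selector rule. The names `PARENT_SCRAPED_TAGS`, `inner_html`, and `scrape` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tags whose own inner HTML is empty, so the parent's content is returned instead.
# "img" and "iframe" come from the commit message; the set name is hypothetical.
PARENT_SCRAPED_TAGS = {"img", "iframe"}


def inner_html(element):
    """Serialize the text and children of an element, without its own tag."""
    parts = [element.text or ""]
    for child in element:
        parts.append(ET.tostring(child, encoding="unicode", method="html"))
    return "".join(parts).strip()


def scrape(document, tag):
    """Return the scraped HTML for the first element with the given tag name."""
    root = ET.fromstring(document)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for element in root.iter(tag):
        if element.tag in PARENT_SCRAPED_TAGS and element in parents:
            # Old behavior would return inner_html(element), which is blank.
            return inner_html(parents[element])
        return inner_html(element)
    return ""


page = (
    "<html><body><div>"
    '<iframe src="https://example.org/embed"></iframe>'
    "</div></body></html>"
)
print(scrape(page, "iframe"))
```

Returning the parent's inner HTML keeps the `iframe` tag and its attributes in the scraped content, whereas the element's own inner HTML would be an empty string.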