From d52ea10139e6a9f3e7d3c8d3afeb530f906dfc53 Mon Sep 17 00:00:00 2001
From: Vinta
Date: Tue, 5 Aug 2014 22:10:47 +0800
Subject: [PATCH] add textract to Web Content Extracting section

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 0eaecb1..6b41f60 100644
--- a/README.md
+++ b/README.md
@@ -565,6 +565,7 @@ A curated list of awesome Python frameworks, libraries and software. Inspired by
 * [Haul](https://github.com/vinta/Haul) - An Extensible Image Crawler.
 * [python-readability](https://github.com/buriy/python-readability) - Fast Python port of arc90's readability tool.
 * [opengraph](https://github.com/erikriver/opengraph) - A Python module to parse the Open Graph Protocol
+* [textract](https://github.com/deanmalmgren/textract) - Extract text from any document, Word documents, PowerPoint presentations, PDFs, etc.
 
 ## Forms
 