add Specific Formats Processing section

Vinta 2014-08-13 22:55:29 +08:00
parent 7d672d832b
commit 10e09daa89
1 changed file with 26 additions and 15 deletions

@@ -11,6 +11,7 @@ A curated list of awesome Python frameworks, libraries and software. Inspired by
- [Files](#files)
- [Date and Time](#date-and-time)
- [Text Processing](#text-processing)
- [Specific Formats Processing](#specific-formats-processing)
- [Natural Language Processing](#natural-language-processing)
- [Documentation](#documentation)
- [Configuration](#configuration)
@@ -158,7 +159,7 @@ A curated list of awesome Python frameworks, libraries and software. Inspired by
## Text Processing
*Libraries for parsing and manipulating text.*
*Libraries for parsing and manipulating plain text.*
* General
* [difflib](https://docs.python.org/2/library/difflib.html) - (Python standard library) Helpers for computing deltas.
@@ -166,25 +167,15 @@ A curated list of awesome Python frameworks, libraries and software. Inspired by
* [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy) - Fuzzy String Matching.
* [esmre](https://code.google.com/p/esmre/) - Regular expression accelerator.
* [shortuuid](https://github.com/stochastic-technologies/shortuuid) - A generator library for concise, unambiguous and URL-safe UUIDs.
* [awesome-slugify](https://github.com/dimka665/awesome-slugify) - A Python slugify library that can preserve unicode.
* [python-slugify](https://github.com/un33k/python-slugify) - A Python slugify library that translates unicode to ASCII.
* [unicode-slugify](https://github.com/mozilla/unicode-slugify) - A slugifier that generates unicode slugs; requires Django.
* [unidecode](https://pypi.python.org/pypi/Unidecode) - ASCII transliterations of Unicode text.
* [chardet](https://github.com/chardet/chardet) - Python 2/3 compatible character encoding detector.
* [xpinyin](https://github.com/lxneng/xpinyin) - A library to translate Chinese hanzi (漢字) to pinyin (拼音).
* [pangu.py](https://github.com/vinta/pangu.py) - Inserts spacing between CJK characters and alphanumerics.
* [pyfiglet](https://github.com/pwaller/pyfiglet) - An implementation of figlet written in Python.
* Specific Formats
* [tablib](https://github.com/kennethreitz/tablib) - A module for Tabular Datasets in XLS, CSV, JSON, YAML.
* [python-docx](https://github.com/mikemaccana/python-docx) - Reads, queries and modifies Microsoft Word 2007/2008 docx files.
* [xlwt](https://github.com/python-excel/xlwt) / [xlrd](https://github.com/python-excel/xlrd) - Packages for writing and reading data and formatting information to and from Excel files.
* [XlsxWriter](https://xlsxwriter.readthedocs.org/) - A Python module for creating Excel .xlsx files.
* [mm](https://github.com/brianray/mm) - Python powered spreadsheets.
* [PDFMiner](https://github.com/euske/pdfminer) - A tool for extracting information from PDF documents.
* [PyPDF2](https://github.com/mstamy2/PyPDF2) - A pure Python PDF library capable of splitting, merging and transforming PDF pages.
* [Python-Markdown](https://github.com/waylan/Python-Markdown) - A Python implementation of John Gruber's Markdown.
* [Mistune](https://github.com/lepture/mistune) - The fastest full-featured pure Python parser for Markdown.
* [PyYAML](http://pyyaml.org/) - YAML implementations for Python.
* Slugify
* [awesome-slugify](https://github.com/dimka665/awesome-slugify) - A Python slugify library that can preserve unicode.
* [python-slugify](https://github.com/un33k/python-slugify) - A Python slugify library that translates unicode to ASCII.
* [unicode-slugify](https://github.com/mozilla/unicode-slugify) - A slugifier that generates unicode slugs; requires Django.
* Parser
* [phonenumbers](https://github.com/daviddrysdale/python-phonenumbers) - Library for parsing, formatting, storing and validating international phone numbers.
* [python-user-agents](https://github.com/selwin/python-user-agents) - Browser user agent parser.
@@ -193,6 +184,26 @@ A curated list of awesome Python frameworks, libraries and software. Inspired by
* [python-nameparser](https://github.com/derek73/python-nameparser) - A simple Python module for parsing human names into their individual components.
* [pyparsing](http://pyparsing.wikispaces.com/) - A general-purpose framework for generating parsers.
## Specific Formats Processing
*Libraries for parsing and manipulating specific text formats.*
* General
* [tablib](https://github.com/kennethreitz/tablib) - A module for Tabular Datasets in XLS, CSV, JSON, YAML.
* Office
* [python-docx](https://github.com/mikemaccana/python-docx) - Reads, queries and modifies Microsoft Word 2007/2008 docx files.
* [xlwt](https://github.com/python-excel/xlwt) / [xlrd](https://github.com/python-excel/xlrd) - Packages for writing and reading data and formatting information to and from Excel files.
* [XlsxWriter](https://xlsxwriter.readthedocs.org/) - A Python module for creating Excel .xlsx files.
* [Marmir](https://github.com/brianray/mm) - Takes Python data structures and turns them into spreadsheets.
* PDF
* [PDFMiner](https://github.com/euske/pdfminer) - A tool for extracting information from PDF documents.
* [PyPDF2](https://github.com/mstamy2/PyPDF2) - A pure Python PDF library capable of splitting, merging and transforming PDF pages.
* Markdown
* [Python-Markdown](https://github.com/waylan/Python-Markdown) - A Python implementation of John Gruber's Markdown.
* [Mistune](https://github.com/lepture/mistune) - The fastest full-featured pure Python parser for Markdown.
* YAML
* [PyYAML](http://pyyaml.org/) - YAML implementations for Python.
## Natural Language Processing
*Libraries for working with human languages.*