A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php.

If you want to contribute to this list, send me a pull request or contact me [@josephmisiti](https://www.twitter.com/josephmisiti).

## Python
#### Natural Language Processing
* [NLTK](http://www.nltk.org/) - A leading platform for building Python programs to work with human language data.
* [Pattern](http://www.clips.ua.ac.be/pattern) - A web mining module for the Python programming language. It has tools for natural language processing, machine learning, among others.
* [TextBlob](http://textblob.readthedocs.org/) - Providing a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.
* [jieba](https://github.com/fxsjy/jieba#jieba-1) - Chinese word segmentation utilities.
* [SnowNLP](https://github.com/isnowfy/snownlp) - A library for processing Chinese text.
* [loso](https://github.com/victorlin/loso) - Another Chinese segmentation library.
* [genius](https://github.com/duanhongyi/genius) - Chinese word segmentation based on Conditional Random Fields.
#### General-Purpose Machine Learning
* [scikit-learn](http://scikit-learn.org/) - A Python module for machine learning built on top of SciPy.
* [pattern](https://github.com/clips/pattern) - Web mining module for Python.
* [NuPIC](https://github.com/numenta/nupic) - Numenta Platform for Intelligent Computing.
* [Pylearn2](https://github.com/lisa-lab/pylearn2) - A Machine Learning library based on [Theano](https://github.com/Theano/Theano).
* [hebel](https://github.com/hannes-brt/hebel) - GPU-Accelerated Deep Learning Library in Python.
* [gensim](https://github.com/piskvorky/gensim) - Topic Modelling for Humans.
* [PyBrain](https://github.com/pybrain/pybrain) - Another Python Machine Learning Library.
* [Crab](https://github.com/muricoca/crab) - A flexible, fast recommender engine.
* [python-recsys](https://github.com/ocelma/python-recsys) - A Python library for implementing a Recommender System.
* [BayesPy](https://github.com/maxsklar/BayesPy)
#### Data Analysis / Data Visualization
* [SciPy](http://www.scipy.org/) - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
* [NumPy](http://www.numpy.org/) - A fundamental package for scientific computing with Python.
* [Numba](http://numba.pydata.org/) - Python JIT (just-in-time) compiler to LLVM aimed at scientific Python by the developers of Cython and NumPy.
* [NetworkX](https://networkx.github.io/) - High-productivity software for complex networks.
* [Pandas](http://pandas.pydata.org/) - A library providing high-performance, easy-to-use data structures and data analysis tools.
* [Open Mining](https://github.com/avelino/mining) - Business Intelligence (BI) in Python (Pandas web interface)
* [PyMC](https://github.com/pymc-devs/pymc) - Markov Chain Monte Carlo sampling toolkit.
* [zipline](https://github.com/quantopian/zipline) - A Pythonic algorithmic trading library.
* [PyDy](https://pydy.org/) - Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.
* [SymPy](https://github.com/sympy/sympy) - A Python library for symbolic mathematics.
* [statsmodels](https://github.com/statsmodels/statsmodels) - Statistical modeling and econometrics in Python.
* [astropy](http://www.astropy.org/) - A community Python library for Astronomy.
* [matplotlib](http://matplotlib.org/) - A Python 2D plotting library.
* [bokeh](https://github.com/ContinuumIO/bokeh) - Interactive Web Plotting for Python.
* [plotly](https://plot.ly/python) - Collaborative web plotting for Python and matplotlib.
* [vincent](https://github.com/wrobstory/vincent) - A Python to Vega translator.
* [d3py](https://github.com/mikedewar/d3py) - A plotting library for Python, based on [D3.js](http://d3js.org/).
* [ggplot](https://github.com/yhat/ggplot) - Same API as ggplot2 for R.
* [Kartograph.py](https://github.com/kartograph/kartograph.py) - Rendering beautiful SVG maps in Python.
* [pygal](http://pygal.org/) - A Python SVG Charts Creator.
* [pycascading](https://github.com/twitter/pycascading)
#### Misc Scripts / IPython Notebooks
* [pattern_classification](https://github.com/rasbt/pattern_classification)
* [Think Stats 2](https://github.com/Wavelets/ThinkStats2)
* [hyperopt-sklearn](https://github.com/hyperopt/hyperopt-sklearn)
* [nupic](https://github.com/numenta/nupic)
* [2012-paper-diginorm](https://github.com/ged-lab/2012-paper-diginorm)
* [ipython-notebooks](https://github.com/ogrisel/notebooks)
* [decision-weights](https://github.com/CamDavidsonPilon/decision-weights)
## Ruby
#### Natural Language Processing
* [Treat](https://github.com/louismullie/treat) - Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I've encountered so far for Ruby.
* [Ruby Linguistics](http://www.deveiate.org/projects/Linguistics/) - NLTK for Ruby
* [Stemmer](https://github.com/aurelian/ruby-stemmer)
* [Ruby Wordnet](http://www.deveiate.org/projects/Ruby-WordNet/)
* [Raspell](http://sourceforge.net/projects/raspell/)
* [UEA Stemmer](https://github.com/ealdent/uea-stemmer)
* [Twitter-text-rb](https://github.com/twitter/twitter-text-rb)
#### General-Purpose Machine Learning
* [Ruby Machine Learning](https://github.com/tsycho/ruby-machine-learning)
* [Machine Learning Ruby](https://github.com/mizoR/machine-learning-ruby)
* [jRuby Mahout](https://github.com/vasinov/jruby_mahout)
* [CardMagic-Classifier](https://github.com/cardmagic/classifier)
#### Data Analysis / Data Visualization
* [rsruby](https://github.com/alexgutteridge/rsruby)
* [data-visualization-ruby](https://github.com/chrislo/data_visualisation_ruby)
* [ruby-plot](https://www.ruby-toolbox.com/projects/ruby-plot)
* [plot-rb](https://github.com/zuhao/plotrb)
* [scruffy](http://www.rubyinside.com/scruffy-a-beautiful-graphing-toolkit-for-ruby-194.html)
* [SciRuby](http://sciruby.com/)
## JavaScript
#### Natural Language Processing
* [Twitter-text-js](https://github.com/twitter/twitter-text-js)
* [NLP.js](https://github.com/nicktesla/nlpjs)
#### Data Analysis / Data Visualization
* [High Charts](http://www.highcharts.com/)
* [NVD3.js](http://nvd3.org/)
* [dc.js](http://dc-js.github.io/dc.js/)
* [chartjs](http://www.chartjs.org/)
* [dimple](http://dimplejs.org/)
* [amCharts](http://www.amcharts.com/)
#### General-Purpose Machine Learning
* [Convnet.js](http://cs.stanford.edu/people/karpathy/convnetjs/) [DEEP LEARNING]
* [Clustering.js](https://github.com/tixz/clustering.js)
* [Decision Trees](https://github.com/serendipious/nodejs-decision-tree-id3)
* [Node-fann](https://github.com/rlidwka/node-fann)
* [Kmeans.js](https://github.com/tixz/kmeans.js)
* [LDA.js](https://github.com/primaryobjects/lda)
* [Learning.js](https://github.com/yandongliu/learningjs)
* [Machine Learning](http://joonku.com/project/machine_learning)
* [Node-SVM](https://github.com/nicolaspanel/node-svm)
* [Brain](https://github.com/harthur/brain)
## Scala
#### Natural Language Processing
* TODO
#### Data Analysis / Data Visualization
* [Scalding](https://github.com/twitter/scalding)
* [Summingbird](https://github.com/twitter/summingbird)
* [Algebird](https://github.com/twitter/algebird)
#### General-Purpose Machine Learning
* [Conjecture](https://github.com/etsy/Conjecture)
## Java
#### Natural Language Processing
* [CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml)
* [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml)
* [Stanford POS Tagger](http://nlp.stanford.edu/software/tagger.shtml)
* [Stanford Named Entity Recognizer](http://nlp.stanford.edu/software/CRF-NER.shtml)
* [Stanford Word Segmenter](http://nlp.stanford.edu/software/segmenter.shtml)
* [Tregex, Tsurgeon and Semgrex](http://nlp.stanford.edu/software/tregex.shtml)
* [Stanford Phrasal: A Phrase-Based Translation System](http://nlp.stanford.edu/software/phrasal/)
* [Stanford English Tokenizer](http://nlp.stanford.edu/software/tokenizer.shtml)
* [Stanford Tokens Regex](http://nlp.stanford.edu/software/tokensregex.shtml)
* [Stanford Temporal Tagger](http://nlp.stanford.edu/software/sutime.shtml)
* [Stanford SPIED](http://nlp.stanford.edu/software/patternslearning.shtml)
* [Stanford Topic Modeling Toolbox](http://nlp.stanford.edu/software/tmt/tmt-0.4/)
* [Twitter Text Java](https://github.com/twitter/twitter-text-java)
#### General-Purpose Machine Learning
* [Mahout](https://github.com/apache/mahout)
* [Stanford Classifier](http://nlp.stanford.edu/software/classifier.shtml)
#### Data Analysis / Data Visualization
* [Hadoop](https://github.com/apache/hadoop-mapreduce)
* [Spark](https://github.com/apache/spark)
* [Impala](https://github.com/cloudera/impala)
## Go
#### Natural Language Processing
* TODO
#### General-Purpose Machine Learning
* [Go Learn](https://github.com/sjwhitworth/golearn)
#### Data Analysis / Data Visualization
* TODO
## Matlab
#### Natural Language Processing
* [NLP](https://amplab.cs.berkeley.edu/2012/05/05/an-nlp-library-for-matlab/)
#### General-Purpose Machine Learning
* [Training a deep autoencoder or a classifier on MNIST digits](http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html) [DEEP LEARNING]
* [t-Distributed Stochastic Neighbor Embedding](http://homepage.tudelft.nl/19j49/t-SNE.html)
* [Spider](http://people.kyb.tuebingen.mpg.de/spider/)
* [LibSVM](http://www.csie.ntu.edu.tw/~cjlin/libsvm/#matlab)
* [LibLinear](http://www.csie.ntu.edu.tw/~cjlin/liblinear/#download)
#### Data Analysis / Data Visualization
* [matlab_bgl](https://www.cs.purdue.edu/homes/dgleich/packages/matlab_bgl/)
* [gaimc](http://www.mathworks.com/matlabcentral/fileexchange/24134-gaimc---graph-algorithms-in-matlab-code)
## Julia
#### General-Purpose Machine Learning
* [PGM](https://github.com/JuliaStats/PGM.jl)
* [DA](https://github.com/trthatcher/DA.jl)
* [Regression](https://github.com/lindahua/Regression.jl)
* [Local Regression](https://github.com/dcjones/Loess.jl)
* [Naive Bayes](https://github.com/nutsiepully/NaiveBayes.jl)
* [Mixed Models](https://github.com/dmbates/MixedModels.jl)
* [Simple MCMC](https://github.com/fredo-dedup/SimpleMCMC.jl)
* [Distance](https://github.com/JuliaStats/Distance.jl)
* [Decision Tree](https://github.com/bensadeghi/DecisionTree.jl)
* [Neural](https://github.com/compressed/neural.jl)
* [MCMC](https://github.com/doobwa/MCMC.jl)
* [GLM](https://github.com/JuliaStats/GLM.jl)
* [Online Learning](https://github.com/lendle/OnlineLearning.jl)
* [GLMNet](https://github.com/simonster/GLMNet.jl)
* [Clustering](https://github.com/JuliaStats/Clustering.jl)
* [SVM](https://github.com/JuliaStats/SVM.jl)
* [Kernel Density](https://github.com/JuliaStats/KernelDensity.jl)
* [Dimensionality Reduction](https://github.com/JuliaStats/DimensionalityReduction.jl)
* [NMF](https://github.com/JuliaStats/NMF.jl)
#### Natural Language Processing
* [Topic Models](https://github.com/slycoder/TopicModels.jl)
* [Text Analysis](https://github.com/johnmyleswhite/TextAnalysis.jl)
#### Data Analysis / Data Visualization
* [Graph Layout](https://github.com/IainNZ/GraphLayout.jl)
* [Data Frames Meta](https://github.com/JuliaStats/DataFramesMeta.jl)
* [Julia Data](https://github.com/nfoti/JuliaData)
* [Data Read](https://github.com/WizardMac/DataRead.jl)
* [Hypothesis Tests](https://github.com/JuliaStats/HypothesisTests.jl)
* [Gadfly](https://github.com/dcjones/Gadfly.jl)
* [Stats](https://github.com/johnmyleswhite/stats.jl)
* [RDataSets](https://github.com/johnmyleswhite/RDatasets.jl)
* [DataFrames](https://github.com/JuliaStats/DataFrames.jl)
* [Distributions](https://github.com/JuliaStats/Distributions.jl)
* [Data Arrays](https://github.com/JuliaStats/DataArrays.jl)
* [Time Series](https://github.com/JuliaStats/TimeSeries.jl)
* [Sampling](https://github.com/JuliaStats/Sampling.jl)
#### Misc Stuff / Presentations
* [JuliaCon Presentations](https://github.com/JuliaCon/presentations)
* [SignalProcessing](https://github.com/davidavdav/SignalProcessing)
* [Images](https://github.com/timholy/Images.jl)
## Credits
* Some of the Python libraries were cut-and-pasted from [vinta](https://github.com/vinta/awesome-python).