A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php.

If you want to contribute to this list (please do), send me a pull request or contact me [@josephmisiti](https://www.twitter.com/josephmisiti).

## Python

#### Natural Language Processing

* [NLTK](http://www.nltk.org/) - A leading platform for building Python programs to work with human language data.
* [Pattern](http://www.clips.ua.ac.be/pattern) - A web mining module for the Python programming language. It has tools for natural language processing and machine learning, among others.
* [TextBlob](http://textblob.readthedocs.org/) - Provides a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.
* [jieba](https://github.com/fxsjy/jieba#jieba-1) - Chinese word segmentation utilities.
* [SnowNLP](https://github.com/isnowfy/snownlp) - A library for processing Chinese text.
* [loso](https://github.com/victorlin/loso) - Another Chinese segmentation library.
* [genius](https://github.com/duanhongyi/genius) - A Chinese segmenter based on Conditional Random Fields.

#### General-Purpose Machine Learning

* [scikit-learn](http://scikit-learn.org/) - A Python module for machine learning built on top of SciPy.
* [OpenCV](http://opencv.org) - OpenCV has C++, C, Python, Java and MATLAB interfaces and supports Windows, Linux, Android and Mac OS.
* [BigML](https://bigml.com) - A library that contacts external servers.
* [pattern](https://github.com/clips/pattern) - Web mining module for Python.
* [NuPIC](https://github.com/numenta/nupic) - Numenta Platform for Intelligent Computing.
* [Pylearn2](https://github.com/lisa-lab/pylearn2) - A Machine Learning library based on [Theano](https://github.com/Theano/Theano).
* [hebel](https://github.com/hannes-brt/hebel) - GPU-Accelerated Deep Learning Library in Python.
* [gensim](https://github.com/piskvorky/gensim) - Topic Modelling for Humans.
* [PyBrain](https://github.com/pybrain/pybrain) - Another Python Machine Learning Library.
* [Crab](https://github.com/muricoca/crab) - A flexible, fast recommender engine.
* [python-recsys](https://github.com/ocelma/python-recsys) - A Python library for implementing a Recommender System.
* [BayesPy](https://github.com/maxsklar/BayesPy)

#### Data Analysis / Data Visualization

* [SciPy](http://www.scipy.org/) - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
* [NumPy](http://www.numpy.org/) - A fundamental package for scientific computing with Python.
* [Numba](http://numba.pydata.org/) - Python JIT (just-in-time) compiler to LLVM aimed at scientific Python by the developers of Cython and NumPy.
* [NetworkX](https://networkx.github.io/) - High-productivity software for complex networks.
* [Pandas](http://pandas.pydata.org/) - A library providing high-performance, easy-to-use data structures and data analysis tools.
* [Open Mining](https://github.com/avelino/mining) - Business Intelligence (BI) in Python (Pandas web interface)
* [PyMC](https://github.com/pymc-devs/pymc) - Markov Chain Monte Carlo sampling toolkit.
* [zipline](https://github.com/quantopian/zipline) - A Pythonic algorithmic trading library.
* [PyDy](https://pydy.org/) - Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.
* [SymPy](https://github.com/sympy/sympy) - A Python library for symbolic mathematics.
* [statsmodels](https://github.com/statsmodels/statsmodels) - Statistical modeling and econometrics in Python.
* [astropy](http://www.astropy.org/) - A community Python library for Astronomy.
* [matplotlib](http://matplotlib.org/) - A Python 2D plotting library.
* [bokeh](https://github.com/ContinuumIO/bokeh) - Interactive Web Plotting for Python.
* [plotly](https://plot.ly/python) - Collaborative web plotting for Python and matplotlib.
* [vincent](https://github.com/wrobstory/vincent) - A Python to Vega translator.
* [d3py](https://github.com/mikedewar/d3py) - A plotting library for Python, based on [D3.js](http://d3js.org/).
* [ggplot](https://github.com/yhat/ggplot) - Same API as ggplot2 for R.
* [Kartograph.py](https://github.com/kartograph/kartograph.py) - Rendering beautiful SVG maps in Python.
* [pygal](http://pygal.org/) - A Python SVG Charts Creator.
* [pycascading](https://github.com/twitter/pycascading)

#### Misc Scripts / iPython Notebooks

* [pattern_classification](https://github.com/rasbt/pattern_classification)
* [thinking stats 2](https://github.com/Wavelets/ThinkStats2)
* [hyperopt](https://github.com/hyperopt/hyperopt-sklearn)
* [nupic](https://github.com/numenta/nupic)
* [2012-paper-diginorm](https://github.com/ged-lab/2012-paper-diginorm)
* [ipython-notebooks](https://github.com/ogrisel/notebooks)
* [decision-weights](https://github.com/CamDavidsonPilon/decision-weights)

## Ruby

#### Natural Language Processing

* [Treat](https://github.com/louismullie/treat) - Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby
* [Ruby Linguistics](http://www.deveiate.org/projects/Linguistics/) - Linguistics is a framework for building linguistic utilities for Ruby objects in any language. It includes a generic language-independent front end, a module for mapping language codes into language names, and a module which contains various English-language utilities.
* [Stemmer](https://github.com/aurelian/ruby-stemmer) - Exposes libstemmer_c to Ruby
* [Ruby Wordnet](http://www.deveiate.org/projects/Ruby-WordNet/) - This library is a Ruby interface to WordNet
* [Raspell](http://sourceforge.net/projects/raspell/) - raspell is an interface binding for Ruby
* [UEA Stemmer](https://github.com/ealdent/uea-stemmer) - Ruby port of UEALite Stemmer, a conservative stemmer for search and indexing
* [Twitter-text-rb](https://github.com/twitter/twitter-text-rb) - A library that does auto linking and extraction of usernames, lists and hashtags in tweets

#### General-Purpose Machine Learning

* [Ruby Machine Learning](https://github.com/tsycho/ruby-machine-learning) - Some Machine Learning algorithms, implemented in Ruby
* [Machine Learning Ruby](https://github.com/mizoR/machine-learning-ruby)
* [jRuby Mahout](https://github.com/vasinov/jruby_mahout) - JRuby Mahout is a gem that unleashes the power of Apache Mahout in the world of JRuby.
* [CardMagic-Classifier](https://github.com/cardmagic/classifier) - A general classifier module to allow Bayesian and other types of classifications.
* [Neural Networks and Deep Learning](https://github.com/mnielsen/neural-networks-and-deep-learning) - Code samples for my book "Neural Networks and Deep Learning" [DEEP LEARNING]

#### Data Analysis / Data Visualization

* [rsruby](https://github.com/alexgutteridge/rsruby) - Ruby - R bridge
* [data-visualization-ruby](https://github.com/chrislo/data_visualisation_ruby) - Source code and supporting content for my Ruby Manor presentation on Data Visualisation with Ruby
* [ruby-plot](https://www.ruby-toolbox.com/projects/ruby-plot) - gnuplot wrapper for Ruby, especially for plotting ROC curves into SVG files
* [plot-rb](https://github.com/zuhao/plotrb) - A plotting library in Ruby built on top of Vega and D3.
* [scruffy](http://www.rubyinside.com/scruffy-a-beautiful-graphing-toolkit-for-ruby-194.html) - A beautiful graphing toolkit for Ruby
* [SciRuby](http://sciruby.com/)
* [Glean](https://github.com/glean/glean) - A data management tool for humans
* [Bioruby](https://github.com/bioruby/bioruby)
* [Arel](https://github.com/nkallen/arel)

#### Misc

* [Big Data For Chimps](https://github.com/infochimps-labs/big_data_for_chimps)

## R

#### General-Purpose Machine Learning

* [Clever Algorithms For Machine Learning](https://github.com/jbrownlee/CleverAlgorithmsMachineLearning)
* [Machine Learning For Hackers](https://github.com/johnmyleswhite/ML_for_Hackers)

#### Data Analysis / Data Visualization

* [Learning Statistics Using R](http://health.adelaide.edu.au/psychology/ccs/teaching/lsr/)

## Javascript

#### Natural Language Processing

* [Twitter-text-js](https://github.com/twitter/twitter-text-js) - A JavaScript implementation of Twitter's text processing library
* [NLP.js](https://github.com/nicktesla/nlpjs) - NLP utilities in javascript and coffeescript

#### Data Analysis / Data Visualization

* [High Charts](http://www.highcharts.com/)
* [NVD3.js](http://nvd3.org/)
* [dc.js](http://dc-js.github.io/dc.js/)
* [chartjs](http://www.chartjs.org/)
* [dimple](http://dimplejs.org/)
* [amCharts](http://www.amcharts.com/)

#### General-Purpose Machine Learning

* [Convnet.js](http://cs.stanford.edu/people/karpathy/convnetjs/) - ConvNetJS is a Javascript library for training Deep Learning models [DEEP LEARNING]
* [Clustering.js](https://github.com/tixz/clustering.js) - Clustering algorithms implemented in Javascript for Node.js and the browser
* [Decision Trees](https://github.com/serendipious/nodejs-decision-tree-id3) - NodeJS Implementation of Decision Tree using ID3 Algorithm
* [Node-fann](https://github.com/rlidwka/node-fann) - FANN (Fast Artificial Neural Network Library) bindings for Node.js
* [Kmeans.js](https://github.com/tixz/kmeans.js) - Simple Javascript implementation of the
k-means algorithm, for node.js and the browser
* [LDA.js](https://github.com/primaryobjects/lda) - LDA topic modeling for node.js
* [Learning.js](https://github.com/yandongliu/learningjs) - Javascript implementation of logistic regression/c4.5 decision tree
* [Machine Learning](http://joonku.com/project/machine_learning) - Machine learning library for Node.js
* [Node-SVM](https://github.com/nicolaspanel/node-svm) - Support Vector Machine for nodejs
* [Brain](https://github.com/harthur/brain) - Neural networks in JavaScript

## Scala

#### Natural Language Processing

* [ScalaNLP](http://www.scalanlp.org/) - ScalaNLP is a suite of machine learning and numerical computing libraries.
* [Breeze](https://github.com/scalanlp/breeze) - Breeze is a numerical processing library for Scala.
* [Chalk](https://github.com/scalanlp/chalk) - Chalk is a natural language processing library.
* [FACTORIE](https://github.com/factorie/factorie) - FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.

#### Data Analysis / Data Visualization

* [Scalding](https://github.com/twitter/scalding) - A Scala API for Cascading
* [Summing Bird](https://github.com/twitter/summingbird) - Streaming MapReduce with Scalding and Storm
* [Algebird](https://github.com/twitter/algebird) - Abstract Algebra for Scala
* [xerial](https://github.com/xerial/xerial) - Data management utilities for Scala
* [simmer](https://github.com/avibryant/simmer) - Reduce your data. A unix filter for algebird-powered aggregation.
* [PredictionIO](https://github.com/PredictionIO/PredictionIO) - PredictionIO, a machine learning server for software developers and data engineers.

#### General-Purpose Machine Learning

* [Conjecture](https://github.com/etsy/Conjecture) - Scalable Machine Learning in Scalding
* [brushfire](https://github.com/avibryant/brushfire) - decision trees for scalding
* [ganitha](https://github.com/tresata/ganitha) - scalding powered machine learning
* [adam](https://github.com/bigdatagenomics/adam) - A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.
* [bioscala](https://github.com/bioscala/bioscala) - Bioinformatics for the Scala programming language

## Java

#### Natural Language Processing

* [CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml) - Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words
* [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml) - A natural language parser is a program that works out the grammatical structure of sentences
* [Stanford POS Tagger](http://nlp.stanford.edu/software/tagger.shtml) - A Part-Of-Speech Tagger (POS Tagger)
* [Stanford Named Entity Recognizer](http://nlp.stanford.edu/software/CRF-NER.shtml) - Stanford NER is a Java implementation of a Named Entity Recognizer.
* [Stanford Word Segmenter](http://nlp.stanford.edu/software/segmenter.shtml) - Tokenization of raw text is a standard pre-processing step for many NLP tasks.
* [Tregex, Tsurgeon and Semgrex](http://nlp.stanford.edu/software/tregex.shtml) - Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions").
* [Stanford Phrasal: A Phrase-Based Translation System](http://nlp.stanford.edu/software/phrasal/) - Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java.
* [Stanford English Tokenizer](http://nlp.stanford.edu/software/tokenizer.shtml) - A tokenizer divides text into a sequence of tokens, which roughly correspond to "words"
* [Stanford Tokens Regex](http://nlp.stanford.edu/software/tokensregex.shtml)
* [Stanford Temporal Tagger](http://nlp.stanford.edu/software/sutime.shtml) - SUTime is a library for recognizing and normalizing time expressions.
* [Stanford SPIED](http://nlp.stanford.edu/software/patternslearning.shtml) - Learning entities from unlabeled text starting with seed sets using patterns in an iterative fashion
* [Stanford Topic Modeling Toolbox](http://nlp.stanford.edu/software/tmt/tmt-0.4/) - Topic modeling tools for social scientists and others who wish to perform analysis on datasets
* [Twitter Text Java](https://github.com/twitter/twitter-text-java) - A Java implementation of Twitter's text processing library
* [MALLET](http://mallet.cs.umass.edu/) - A Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
* [OpenNLP](https://opennlp.apache.org/) - A machine learning based toolkit for the processing of natural language text.
* [LingPipe](http://alias-i.com/lingpipe/index.html) - A tool kit for processing text using computational linguistics.

#### General-Purpose Machine Learning

* [Mahout](https://github.com/apache/mahout) - Distributed machine learning
* [Stanford Classifier](http://nlp.stanford.edu/software/classifier.shtml) - A classifier is a machine learning tool that will take data items and place them into one of k classes.
* [Weka](http://www.cs.waikato.ac.nz/ml/weka/) - Weka is a collection of machine learning algorithms for data mining tasks

#### Data Analysis / Data Visualization

* [Hadoop](https://github.com/apache/hadoop-mapreduce) - Hadoop/HDFS
* [Spark](https://github.com/apache/spark) - Spark is a fast and general engine for large-scale data processing.
* [Impala](https://github.com/cloudera/impala) - Real-time Query for Hadoop

## Go

#### Natural Language Processing

* [go-porterstemmer](https://github.com/reiver/go-porterstemmer) - A native Go clean room implementation of the Porter Stemming algorithm.
* [paicehusk](https://github.com/Rookii/paicehusk) - Golang implementation of the Paice/Husk Stemming Algorithm
* [snowball](https://bitbucket.org/tebeka/snowball) - Snowball Stemmer for Go.

#### General-Purpose Machine Learning

* [Go Learn](https://github.com/sjwhitworth/golearn) - Machine Learning for Go
* [go-pr](https://github.com/daviddengcn/go-pr) - Pattern recognition package in Go lang.
* [bayesian](https://github.com/jbrukh/bayesian) - Naive Bayesian Classification for Golang.
* [go-galib](https://github.com/thoj/go-galib) - Genetic Algorithms library written in Go / golang

#### Data Analysis / Data Visualization

* [go-graph](https://github.com/StepLg/go-graph) - Graph library for Go/golang language.
* [SVGo](http://www.svgopen.org/2011/papers/34-SVGo_a_Go_Library_for_SVG_generation/) - The Go Language library for SVG generation

## Matlab

#### Natural Language Processing

* [NLP](https://amplab.cs.berkeley.edu/2012/05/05/an-nlp-library-for-matlab/) - An NLP library for Matlab

#### General-Purpose Machine Learning

* [Training a deep autoencoder or a classifier on MNIST digits](http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html) - Training a deep autoencoder or a classifier on MNIST digits [DEEP LEARNING]
* [t-Distributed Stochastic Neighbor Embedding](http://homepage.tudelft.nl/19j49/t-SNE.html) - t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets.
* [Spider](http://people.kyb.tuebingen.mpg.de/spider/) - The spider is intended to be a complete object orientated environment for machine learning in Matlab.
* [LibSVM](http://www.csie.ntu.edu.tw/~cjlin/libsvm/#matlab) - A Library for Support Vector Machines
* [LibLinear](http://www.csie.ntu.edu.tw/~cjlin/liblinear/#download) - A Library for Large Linear Classification

#### Data Analysis / Data Visualization

* [matlab_bgl](https://www.cs.purdue.edu/homes/dgleich/packages/matlab_bgl/) - MatlabBGL is a Matlab package for working with graphs.
* [gaimc](http://www.mathworks.com/matlabcentral/fileexchange/24134-gaimc---graph-algorithms-in-matlab-code) - Efficient pure-Matlab implementations of graph algorithms to complement MatlabBGL's mex functions.

## Julia

#### General-Purpose Machine Learning

* [PGM](https://github.com/JuliaStats/PGM.jl) - A Julia framework for probabilistic graphical models.
* [DA](https://github.com/trthatcher/DA.jl) - Julia package for Regularized Discriminant Analysis
* [Regression](https://github.com/lindahua/Regression.jl) - Algorithms for regression analysis (e.g. linear regression and logistic regression)
* [Local Regression](https://github.com/dcjones/Loess.jl) - Local regression, so smooooth!
* [Naive Bayes](https://github.com/nutsiepully/NaiveBayes.jl) - Simple Naive Bayes implementation in Julia
* [Mixed Models](https://github.com/dmbates/MixedModels.jl) - A Julia package for fitting (statistical) mixed-effects models
* [Simple MCMC](https://github.com/fredo-dedup/SimpleMCMC.jl) - basic mcmc sampler implemented in Julia
* [Distance](https://github.com/JuliaStats/Distance.jl) - Julia module for Distance evaluation
* [Decision Tree](https://github.com/bensadeghi/DecisionTree.jl) - Decision Tree Classifier and Regressor
* [Neural](https://github.com/compressed/neural.jl) - A neural network in Julia
* [MCMC](https://github.com/doobwa/MCMC.jl) - MCMC tools for Julia
* [GLM](https://github.com/JuliaStats/GLM.jl) - Generalized linear models in Julia
* [Online Learning](https://github.com/lendle/OnlineLearning.jl)
* [GLMNet](https://github.com/simonster/GLMNet.jl) - Julia wrapper for fitting Lasso/ElasticNet GLM models using glmnet
* [Clustering](https://github.com/JuliaStats/Clustering.jl) - Basic functions for clustering data: k-means, dp-means, etc.
* [SVM](https://github.com/JuliaStats/SVM.jl) - SVMs for Julia
* [Kernel Density](https://github.com/JuliaStats/KernelDensity.jl) - Kernel density estimators for Julia
* [Dimensionality Reduction](https://github.com/JuliaStats/DimensionalityReduction.jl) - Methods for dimensionality reduction
* [NMF](https://github.com/JuliaStats/NMF.jl) - A Julia package for non-negative matrix factorization

#### Natural Language Processing

* [Topic Models](https://github.com/slycoder/TopicModels.jl) - TopicModels for Julia
* [Text Analysis](https://github.com/johnmyleswhite/TextAnalysis.jl) - Julia package for text analysis

#### Data Analysis / Data Visualization

* [Graph Layout](https://github.com/IainNZ/GraphLayout.jl) - Graph layout algorithms in pure Julia
* [Data Frames Meta](https://github.com/JuliaStats/DataFramesMeta.jl) - Metaprogramming tools for DataFrames
* [Julia Data](https://github.com/nfoti/JuliaData) - library for working with tabular data in Julia
* [Data Read](https://github.com/WizardMac/DataRead.jl) - Read files from Stata, SAS, and SPSS
* [Hypothesis Tests](https://github.com/JuliaStats/HypothesisTests.jl) - Hypothesis tests for Julia
* [Gadfly](https://github.com/dcjones/Gadfly.jl) - Crafty statistical graphics for Julia.
* [Stats](https://github.com/johnmyleswhite/stats.jl) - Statistical tests for Julia
* [RDataSets](https://github.com/johnmyleswhite/RDatasets.jl) - Julia package for loading many of the data sets available in R
* [DataFrames](https://github.com/JuliaStats/DataFrames.jl) - library for working with tabular data in Julia
* [Distributions](https://github.com/JuliaStats/Distributions.jl) - A Julia package for probability distributions and associated functions.
* [Data Arrays](https://github.com/JuliaStats/DataArrays.jl) - Data structures that allow missing values
* [Time Series](https://github.com/JuliaStats/TimeSeries.jl) - Time series toolkit for Julia
* [Sampling](https://github.com/JuliaStats/Sampling.jl) - Basic sampling algorithms for Julia

#### Misc Stuff / Presentations

* [JuliaCon Presentations](https://github.com/JuliaCon/presentations) - Presentations for JuliaCon
* [SignalProcessing](https://github.com/davidavdav/SignalProcessing) - Signal Processing tools for Julia
* [Images](https://github.com/timholy/Images.jl) - An image library for Julia

## Credits

* Some of the Python libraries were cut-and-pasted from [vinta](https://github.com/vinta/awesome-python)
* The few Go references I found were pulled from [this page](https://code.google.com/p/go-wiki/wiki/Projects#Machine_Learning)