
A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php.
If you want to contribute to this list, send me a pull request or contact me [@josephmisiti](
## Python
#### Natural Language Processing
* [NLTK]( - A leading platform for building Python programs to work with human language data.
* [Pattern]( - A web mining module for the Python programming language. It has tools for natural language processing and machine learning, among others.
* [TextBlob]( - Provides a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.
* [jieba]( - Chinese word segmentation utilities.
* [SnowNLP]( - A library for processing Chinese text.
* [loso]( - Another Chinese segmentation library.
* [genius]( - A Chinese word segmenter based on conditional random fields.
#### General-Purpose Machine Learning
* [scikit-learn]( - A Python module for machine learning built on top of SciPy.
* [pattern]( - Web mining module for Python.
* [NuPIC]( - Numenta Platform for Intelligent Computing.
* [Pylearn2]( - A Machine Learning library based on [Theano](
* [hebel]( - GPU-Accelerated Deep Learning Library in Python.
* [gensim]( - Topic Modelling for Humans.
* [PyBrain]( - Another Python Machine Learning Library.
* [Crab]( - A flexible, fast recommender engine.
* [python-recsys]( - A Python library for implementing a Recommender System.
* [BayesPy](
#### Data Analysis / Data Visualization
* [SciPy]( - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
* [NumPy]( - A fundamental package for scientific computing with Python.
* [Numba]( - Python JIT (just-in-time) compiler to LLVM aimed at scientific Python by the developers of Cython and NumPy.
* [NetworkX]( - High-productivity software for complex networks.
* [Pandas]( - A library providing high-performance, easy-to-use data structures and data analysis tools.
* [Open Mining]( - Business Intelligence (BI) in Python (Pandas web interface)
* [PyMC]( - Markov Chain Monte Carlo sampling toolkit.
* [zipline]( - A Pythonic algorithmic trading library.
* [PyDy]( - Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion based around NumPy, SciPy, IPython, and matplotlib.
* [SymPy]( - A Python library for symbolic mathematics.
* [statsmodels]( - Statistical modeling and econometrics in Python.
* [astropy]( - A community Python library for Astronomy.
* [matplotlib]( - A Python 2D plotting library.
* [bokeh]( - Interactive Web Plotting for Python.
* [plotly]( - Collaborative web plotting for Python and matplotlib.
* [vincent]( - A Python to Vega translator.
* [d3py]( - A plotting library for Python, based on [D3.js](
* [ggplot]( - Same API as ggplot2 for R.
* []( - Rendering beautiful SVG maps in Python.
* [pygal]( - A Python SVG Charts Creator.
* [pycascading](
#### Misc Scripts / IPython Notebooks
* [pattern_classification](
* [thinking stats 2](
* [hyperopt](
* [numpic](
* [2012-paper-diginorm](
* [ipython-notebooks](
* [decision-weights](
## Ruby
#### Natural Language Processing
* [Treat]( - Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I've encountered so far for Ruby
* [Ruby Linguistics]( - Linguistics is a framework for building linguistic utilities for Ruby objects in any language. It includes a generic language-independent front end, a module for mapping language codes into language names, and a module which contains various English-language utilities.
* [Stemmer]( - Expose libstemmer_c to Ruby
* [Ruby Wordnet]( - This library is a Ruby interface to WordNet
* [Raspell]( - raspell is an interface binding for Ruby
* [UEA Stemmer]( - Ruby port of UEALite Stemmer - a conservative stemmer for search and indexing
* [Twitter-text-rb]( - A library that does auto linking and extraction of usernames, lists and hashtags in tweets
#### General-Purpose Machine Learning
* [Ruby Machine Learning]( - Some Machine Learning algorithms, implemented in Ruby
* [Machine Learning Ruby](
* [jRuby Mahout]( - JRuby Mahout is a gem that unleashes the power of Apache Mahout in the world of JRuby.
* [CardMagic-Classifier]( - A general classifier module to allow Bayesian and other types of classifications.
#### Data Analysis / Data Visualization
* [rsruby]( - Ruby - R bridge
* [data-visualization-ruby]( - Source code and supporting content for my Ruby Manor presentation on Data Visualisation with Ruby
* [ruby-plot]( - gnuplot wrapper for Ruby, especially for plotting ROC curves into SVG files
* [plot-rb]( - A plotting library in Ruby built on top of Vega and D3.
* [scruffy]( - A beautiful graphing toolkit for Ruby
* [SciRuby](
* [Glean]( - A data management tool for humans
* [Bioruby](
* [Arel](
#### Misc
* [Big Data For Chimps](
## R
#### General-Purpose Machine Learning
* [Clever Algorithms For Machine Learning](
* [Machine Learning For Hackers](
#### Data Analysis / Data Visualization
* [Learning Statistics Using R](
## JavaScript
#### Natural Language Processing
* [Twitter-text-js](
* [NLP.js](
#### Data Analysis / Data Visualization
* [High Charts](
* [NVD3.js](
* [dc.js](
* [chartjs](
* [dimple](
* [amCharts](
#### General-Purpose Machine Learning
* [Convnet.js]( [DEEP LEARNING]
* [Clustering.js](
* [Decision Trees](
* [Node-fann](
* [Kmeans.js](
* [LDA.js](
* [Learning.js](
* [Machine Learning](
* [Node-SVM](
* [Brain](
## Scala
#### Natural Language Processing
* [ScalaNLP]( - ScalaNLP is a suite of machine learning and numerical computing libraries.
* [Breeze]( - Breeze is a numerical processing library for Scala.
* [Chalk]( - Chalk is a natural language processing library.
* [FACTORIE]( - FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
#### Data Analysis / Data Visualization
* [Scalding]( - A Scala API for Cascading
* [Summing Bird]( - Streaming MapReduce with Scalding and Storm
* [Algebird]( - Abstract Algebra for Scala
* [xerial]( - Data management utilities for Scala
* [simmer]( - Reduce your data. A unix filter for algebird-powered aggregation.
* [PredictionIO]( - PredictionIO, a machine learning server for software developers and data engineers.
#### General-Purpose Machine Learning
* [Conjecture]( - Scalable Machine Learning in Scalding
* [brushfire]( - Decision trees for Scalding
* [ganitha]( - Scalding-powered machine learning
* [adam]( - A genomics processing engine and specialized file format built using Apache Avro, Apache Spark and Parquet. Apache 2 licensed.
* [bioscala]( - Bioinformatics for the Scala programming language
## Java
#### Natural Language Processing
* [CoreNLP]( - Stanford CoreNLP provides a set of natural language analysis tools which can take raw English language text input and give the base forms of words.
* [Stanford Parser]( - A natural language parser is a program that works out the grammatical structure of sentences.
* [Stanford POS Tagger]( - A Part-Of-Speech Tagger (POS Tagger).
* [Stanford Named Entity Recognizer]( - Stanford NER is a Java implementation of a Named Entity Recognizer.
* [Stanford Word Segmenter]( - Tokenization of raw text is a standard pre-processing step for many NLP tasks.
* [Tregex, Tsurgeon and Semgrex]( - Tregex is a utility for matching patterns in trees, based on tree relationships and regular expression matches on nodes (the name is short for "tree regular expressions").
* [Stanford Phrasal: A Phrase-Based Translation System]( - Stanford Phrasal is a state-of-the-art statistical phrase-based machine translation system, written in Java.
* [Stanford English Tokenizer]( - A tokenizer divides text into a sequence of tokens, which roughly correspond to "words".
* [Stanford Tokens Regex]( - A framework for defining patterns over sequences of text tokens.
* [Stanford Temporal Tagger]( - SUTime is a library for recognizing and normalizing time expressions.
* [Stanford SPIED]( - Learning entities from unlabeled text starting with seed sets using patterns in an iterative fashion
* [Stanford Topic Modeling Toolbox]( - Topic modeling tools for social scientists and others who wish to perform analysis on datasets
* [Twitter Text Java]( - A Java implementation of Twitter's text processing library
#### General-Purpose Machine Learning
* [Mahout]( - Distributed machine learning
* [Stanford Classifier]( - A classifier is a machine learning tool that will take data items and place them into one of k classes.
#### Data Analysis / Data Visualization
* [Hadoop]( - Hadoop/HDFS
* [Spark]( - Spark is a fast and general engine for large-scale data processing.
* [Impala]( - Real-time Query for Hadoop
## Go
#### Natural Language Processing
* [go-porterstemmer]( - A native Go clean room implementation of the Porter Stemming algorithm.
* [paicehusk]( - Golang implementation of the Paice/Husk Stemming Algorithm
* [snowball]( - Snowball Stemmer for Go.
#### General-Purpose Machine Learning
* [Go Learn]( - Machine Learning for Go
* [go-pr]( - Pattern recognition package in Go.
* [bayesian]( - Naive Bayesian Classification for Golang.
* [go-galib]( - Genetic Algorithms library written in Go / golang
#### Data Analysis / Data Visualization
* [go-graph]( - Graph library for Go/golang language.
* [SVGo]( - The Go Language library for SVG generation
## Matlab
#### Natural Language Processing
* [NLP]( - An NLP library for Matlab
#### General-Purpose Machine Learning
* [Training a deep autoencoder or a classifier on MNIST digits]( - Training a deep autoencoder or a classifier on MNIST digits.
* [t-Distributed Stochastic Neighbor Embedding]( - t-Distributed Stochastic Neighbor Embedding (t-SNE) is a (prize-winning) technique for dimensionality reduction that is particularly well suited for the visualization of high-dimensional datasets.
* [Spider]( - The spider is intended to be a complete object-oriented environment for machine learning in Matlab.
* [LibSVM]( - A Library for Support Vector Machines
* [LibLinear]( - A Library for Large Linear Classification
#### Data Analysis / Data Visualization
* [matlab_bgl]( - MatlabBGL is a Matlab package for working with graphs.
* [gaimc]( - Efficient pure-Matlab implementations of graph algorithms to complement MatlabBGL's mex functions.
## Julia
#### General-Purpose Machine Learning
* [PGM]( - A Julia framework for probabilistic graphical models.
* [DA]( - Julia package for Regularized Discriminant Analysis
* [Regression]( - Algorithms for regression analysis (e.g. linear regression and logistic regression)
* [Local Regression]( - Local regression, so smooooth!
* [Naive Bayes]( - Simple Naive Bayes implementation in Julia
* [Mixed Models]( - A Julia package for fitting (statistical) mixed-effects models
* [Simple MCMC]( - Basic MCMC sampler implemented in Julia
* [Distance]( - Julia module for Distance evaluation
* [Decision Tree]( - Decision Tree Classifier and Regressor
* [Neural]( - A neural network in Julia
* [MCMC]( - MCMC tools for Julia
* [GLM]( - Generalized linear models in Julia
* [Online Learning](
* [GLMNet]( - Julia wrapper for fitting Lasso/ElasticNet GLM models using glmnet
* [Clustering]( - Basic functions for clustering data: k-means, dp-means, etc.
* [SVM]( - SVMs for Julia
* [Kernel Density]( - Kernel density estimators for Julia
* [Dimensionality Reduction]( - Methods for dimensionality reduction
* [NMF]( - A Julia package for non-negative matrix factorization
#### Natural Language Processing
* [Topic Models]( - TopicModels for Julia
* [Text Analysis]( - Julia package for text analysis
#### Data Analysis / Data Visualization
* [Graph Layout]( - Graph layout algorithms in pure Julia
* [Data Frames Meta]( - Metaprogramming tools for DataFrames
* [Julia Data]( - library for working with tabular data in Julia
* [Data Read]( - Read files from Stata, SAS, and SPSS
* [Hypothesis Tests]( - Hypothesis tests for Julia
* [Gadfly]( - Crafty statistical graphics for Julia.
* [Stats]( - Statistical tests for Julia
* [RDataSets]( - Julia package for loading many of the data sets available in R
* [DataFrames]( - library for working with tabular data in Julia
* [Distributions]( - A Julia package for probability distributions and associated functions.
* [Data Arrays]( - Data structures that allow missing values
* [Time Series]( - Time series toolkit for Julia
* [Sampling]( - Basic sampling algorithms for Julia
#### Misc Stuff / Presentations
* [JuliaCon Presentations]( - Presentations for JuliaCon
* [SignalProcessing]( - Signal Processing tools for Julia
* [Images]( - An image library for Julia
## Credits
* Some of the Python libraries were cut-and-pasted from [vinta](
* The few Go references I found were pulled from [this page](