diff --git a/README.md b/README.md
index 67bc951..6444ad1 100644
--- a/README.md
+++ b/README.md
@@ -60,8 +60,8 @@ Further resources:
 - [Erlang](#erlang)
     - [General-Purpose Machine Learning](#erlang-general-purpose-machine-learning)
 - [Fortran](#fortran)
-    - [General-Purpose Machine Learning](#general-purpose-machine-learning-8)
-    - [Data Analysis / Data Visualization](#data-analysis--data-visualization)
+    - [General-Purpose Machine Learning](#fortran-general-purpose-machine-learning)
+    - [Data Analysis / Data Visualization](#fortran-data-analysis--data-visualization)
 - [Go](#go)
     - [Natural Language Processing](#go-natural-language-processing)
     - [General-Purpose Machine Learning](#go-general-purpose-machine-learning)
@@ -124,7 +124,7 @@ Further resources:
     - [Misc Scripts / iPython Notebooks / Codebases](#python-misc-scripts--ipython-notebooks--codebases)
     - [Neural Networks](#python-neural-networks)
     - [Survival Analysis](#python-survival-analysis)
-    - [Federated Learning](#federated-learning)
+    - [Federated Learning](#python-federated-learning)
     - [Kaggle Competition Source Code](#python-kaggle-competition-source-code)
     - [Reinforcement Learning](#python-reinforcement-learning)
 - [Ruby](#ruby)
@@ -133,7 +133,7 @@ Further resources:
     - [Data Analysis / Data Visualization](#ruby-data-analysis--data-visualization)
     - [Misc](#ruby-misc)
 - [Rust](#rust)
-    - [General-Purpose Machine Learning](#general-purpose-machine-learning)
+    - [General-Purpose Machine Learning](#rust-general-purpose-machine-learning)
 - [R](#r)
     - [General-Purpose Machine Learning](#r-general-purpose-machine-learning)
     - [Data Analysis / Data Visualization](#r-data-analysis--data-visualization)
@@ -400,7 +400,7 @@ Further resources:
 * [neural-fortran](https://github.com/modern-fortran/neural-fortran) - A parallel neural net microframework.
 Read the paper [here](https://arxiv.org/abs/1902.06714).
-
+
 #### Data Analysis / Data Visualization
 * [ParaMonte](https://github.com/cdslaborg/paramonte) - A general-purpose Fortran library for Bayesian data analysis and visualization via serial/parallel Monte Carlo and MCMC simulations. Documentation can be found [here](https://www.cdslab.org/paramonte/).
@@ -427,7 +427,7 @@ Read the paper [here](https://arxiv.org/abs/1902.06714).
 * [gobrain](https://github.com/goml/gobrain) - Neural Networks written in Go.
 * [go-featureprocessing](https://github.com/nikolaydubina/go-featureprocessing) - Fast and convenient feature processing for low latency machine learning in Go.
 * [go-mxnet-predictor](https://github.com/songtianyi/go-mxnet-predictor) - Go binding for MXNet c_predict_api to do inference with a pre-trained model.
-* [go-ml-benchmarks](https://github.com/nikolaydubina/go-ml-benchmarks) — benchmarks of machine learning inference for Go
+* [go-ml-benchmarks](https://github.com/nikolaydubina/go-ml-benchmarks) — benchmarks of machine learning inference for Go.
 * [go-ml-transpiler](https://github.com/znly/go-ml-transpiler) - An open source Go transpiler for machine learning models.
 * [golearn](https://github.com/sjwhitworth/golearn) - Machine learning for Go.
 * [goml](https://github.com/cdipaolo/goml) - Machine learning library written in pure Go.
@@ -520,18 +520,19 @@ Read the paper [here](https://arxiv.org/abs/1902.06714).
 * [aerosolve](https://github.com/airbnb/aerosolve) - A machine learning library by Airbnb designed from the ground up to be human friendly.
 * [AMIDST Toolbox](http://www.amidsttoolbox.com/) - A Java Toolbox for Scalable Probabilistic Machine Learning.
+* [Chips-n-Salsa](https://github.com/cicirello/Chips-n-Salsa) - A Java library for genetic algorithms, evolutionary computation, and stochastic local search, with a focus on self-adaptation / self-tuning, as well as parallel execution.
 * [Datumbox](https://github.com/datumbox/datumbox-framework) - Machine Learning framework for rapid development of Machine Learning and Statistical applications.
 * [ELKI](https://elki-project.github.io/) - Java toolkit for data mining. (unsupervised: clustering, outlier detection etc.)
-* [Encog](https://github.com/encog/encog-java-core) - An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trains using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.
+* [Encog](https://github.com/encog/encog-java-core) - An advanced neural network and machine learning framework. Encog contains classes to create a wide variety of networks, as well as support classes to normalize and process data for these neural networks. Encog trainings using multithreaded resilient propagation. Encog can also make use of a GPU to further speed processing time. A GUI based workbench is also provided to help model and train neural networks.
 * [FlinkML in Apache Flink](https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/ml/index.html) - Distributed machine learning library in Flink.
 * [H2O](https://github.com/h2oai/h2o-3) - ML engine that supports distributed learning on Hadoop, Spark or your laptop via APIs in R, Python, Scala, REST/JSON.
 * [htm.java](https://github.com/numenta/htm.java) - General Machine Learning library using Numenta’s Cortical Learning Algorithm.
 * [liblinear-java](https://github.com/bwaldvogel/liblinear-java) - Java version of liblinear.
 * [Mahout](https://github.com/apache/mahout) - Distributed machine learning.
 * [Meka](http://meka.sourceforge.net/) - An open source implementation of methods for multi-label classification and evaluation (extension to Weka).
-* [MLlib in Apache Spark](https://spark.apache.org/docs/latest/mllib-guide.html) - Distributed machine learning library in Spark
+* [MLlib in Apache Spark](https://spark.apache.org/docs/latest/mllib-guide.html) - Distributed machine learning library in Spark.
 * [Hydrosphere Mist](https://github.com/Hydrospheredata/mist) - a service for deployment Apache Spark MLLib machine learning models as realtime, batch or reactive web services.
-* [Neuroph](http://neuroph.sourceforge.net/) - Neuroph is lightweight Java neural network framework
+* [Neuroph](http://neuroph.sourceforge.net/) - Neuroph is lightweight Java neural network framework.
 * [ORYX](https://github.com/oryxproject/oryx) - Lambda Architecture Framework using Apache Spark and Apache Kafka with a specialization for real-time large-scale machine learning.
 * [Samoa](https://samoa.incubator.apache.org/) SAMOA is a framework that includes distributed machine learning for data streams with an interface to plug-in different stream processing platforms.
 * [RankLib](https://sourceforge.net/p/lemur/wiki/RankLib/) - RankLib is a library of learning to rank algorithms. **[Deprecated]**
@@ -564,7 +565,7 @@ Read the paper [here](https://arxiv.org/abs/1902.06714).
 #### Deep Learning
 * [Deeplearning4j](https://github.com/deeplearning4j/deeplearning4j) - Scalable deep learning for industry with parallel GPUs.
-* [Keras Beginner Tutorial](https://victorzhou.com/blog/keras-neural-network-tutorial/) - Friendly guide on using Keras to implement a simple Neural Network in Python
+* [Keras Beginner Tutorial](https://victorzhou.com/blog/keras-neural-network-tutorial/) - Friendly guide on using Keras to implement a simple Neural Network in Python.
 * [deepjavalibrary/djl](https://github.com/deepjavalibrary/djl) - Deep Java Library (DJL) is an open-source, high-level, engine-agnostic Java framework for deep learning, designed to be easy to get started with and simple to use for Java developers.
@@ -578,7 +579,7 @@ Read the paper [here](https://arxiv.org/abs/1902.06714).
 * [Knwl.js](https://github.com/loadfive/Knwl.js) - A Natural Language Processor in JS.
 * [Retext](https://github.com/retextjs/retext) - Extensible system for analyzing and manipulating natural language.
 * [NLP Compromise](https://github.com/spencermountain/compromise) - Natural Language processing in the browser.
-* [nlp.js](https://github.com/axa-group/nlp.js) - An NLP library built in node over Natural, with entity extraction, sentiment analysis, automatic language identify, and so more
+* [nlp.js](https://github.com/axa-group/nlp.js) - An NLP library built in node over Natural, with entity extraction, sentiment analysis, automatic language identify, and so more.
@@ -643,7 +644,7 @@ Read the paper [here](https://arxiv.org/abs/1902.06714).
 * [JSMLT](https://github.com/jsmlt/jsmlt) - Machine learning toolkit with classification and clustering for Node.js; supports visualization (see [visualml.io](https://visualml.io)).
 * [xgboost-node](https://github.com/nuanio/xgboost-node) - Run XGBoost model and make predictions in Node.js.
 * [Netron](https://github.com/lutzroeder/netron) - Visualizer for machine learning models.
-* [tensor-js](https://github.com/Hoff97/tensorjs) - A deep learning library for the browser, accelerated by WebGL and WebAssembly
+* [tensor-js](https://github.com/Hoff97/tensorjs) - A deep learning library for the browser, accelerated by WebGL and WebAssembly.
 * [WebDNN](https://github.com/mil-tokyo/webdnn) - Fast Deep Neural Network Javascript Framework. WebDNN uses next generation JavaScript API, WebGPU for GPU execution, and WebAssembly for CPU execution.
@@ -706,7 +707,7 @@ Read the paper [here](https://arxiv.org/abs/1902.06714).
 * [ScikitLearn](https://github.com/cstjean/ScikitLearn.jl) - Julia implementation of the scikit-learn API.
 * [Knet](https://github.com/denizyuret/Knet.jl) - Koç University Deep Learning Framework.
 * [Flux](https://fluxml.ai/) - Relax!
Flux is the ML library that doesn't make you tensor
-* [MLJ](https://github.com/alan-turing-institute/MLJ.jl) - A Julia machine learning framework
+* [MLJ](https://github.com/alan-turing-institute/MLJ.jl) - A Julia machine learning framework.
 #### Natural Language Processing
@@ -1222,7 +1223,7 @@ be
 * [Eurybia](https://github.com/MAIF/eurybia): Eurybia monitors data and model drift over time and securizes model deployment with data validation.
 * [Colossal-AI](https://github.com/hpcaitech/ColossalAI): An open-source deep learning system for large-scale model training and inference with high efficiency and low cost.
 * [dirty_cat](https://github.com/dirty-cat/dirty_cat) - facilitates machine-learning on dirty, non-curated categories. It provides transformers and encoders robust to morphological variants, such as typos.
-* [Upgini](https://github.com/upgini/river): Free automated data & feature enrichment library for machine learning - automatically searches through thousands of ready-to-use features from public and community shared data sources and enriches your training dataset with only the accuracy improving features.
+* [Upgini](https://github.com/upgini/upgini): Free automated data & feature enrichment library for machine learning - automatically searches through thousands of ready-to-use features from public and community shared data sources and enriches your training dataset with only the accuracy improving features.
 #### Data Analysis / Data Visualization
@@ -1361,7 +1362,7 @@ be
 * [lifelines](https://github.com/CamDavidsonPilon/lifelines) - lifelines is a complete survival analysis library, written in pure Python
 * [Scikit-Survival](https://github.com/sebp/scikit-survival) - scikit-survival is a Python module for survival analysis built on top of scikit-learn. It allows doing survival analysis while utilizing the power of scikit-learn, e.g., for pre-processing or doing cross-validation.
-
+
 #### Federated Learning
 * [Flower](https://flower.dev/) - A unified approach to federated learning, analytics, and evaluation. Federate any workload, any ML framework, and any programming language.
 * [PySyft](https://github.com/OpenMined/PySyft) - A Python library for secure and private Deep Learning.
@@ -1711,9 +1712,11 @@ be
 * [Notebooks](https://github.com/rlan/notebooks) - A starter kit for Jupyter notebooks and machine learning. Companion docker images consist of all combinations of python versions, machine learning frameworks (Keras, PyTorch and Tensorflow) and CPU/CUDA versions.
 * [DVC](https://github.com/iterative/dvc) - Data Science Version Control is an open-source version control system for machine learning projects with pipelines support. It makes ML projects reproducible and shareable.
 * [DVClive](https://github.com/iterative/dvclive) - Python library for experiment metrics logging into simply formatted local files.
+* [VDP](https://github.com/instill-ai/vdp) - open source visual data ETL to streamline the end-to-end visual data processing pipeline: extract unstructured visual data from pre-built data sources, transform it into analysable structured insights by Vision AI models imported from various ML platforms, and load the insights into warehouses or applications.
 * [Kedro](https://github.com/quantumblacklabs/kedro/) - Kedro is a data and development workflow framework that implements best practices for data pipelines with an eye towards productionizing machine learning models.
 * [guild.ai](https://guild.ai/) - Tool to log, analyze, compare and "optimize" experiments. It's cross-platform and framework independent, and provided integrated visualizers such as tensorboard.
 * [Sacred](https://github.com/IDSIA/sacred) - Python tool to help you configure, organize, log and reproduce experiments. Like a notebook lab in the context of Chemistry/Biology. The community has built multiple add-ons leveraging the proposed standard.
+* [Comet](https://www.comet.com/) - ML platform for tracking experiments, hyper-parameters, artifacts and more. It's deeply integrated with over 15+ deep learning frameworks and orchestration tools. Users can also use the platform to monitor their models in production.
 * [MLFlow](https://mlflow.org/) - platform to manage the ML lifecycle, including experimentation, reproducibility and deployment. Framework and language agnostic, take a look at all the built-in integrations.
 * [Weights & Biases](https://www.wandb.com/) - Machine learning experiment tracking, dataset versioning, hyperparameter search, visualization, and collaboration
 * More tools to improve the ML lifecycle: [Catalyst](https://github.com/catalyst-team/catalyst), [PachydermIO](https://www.pachyderm.io/). The following are Github-alike and targeting teams [Weights & Biases](https://www.wandb.com/), [Neptune.ai](https://neptune.ai/), [Comet.ml](https://www.comet.ml/), [Valohai.ai](https://valohai.com/), [DAGsHub](https://DAGsHub.com/).
diff --git a/books.md b/books.md
index a871a90..b03e0f7 100644
--- a/books.md
+++ b/books.md
@@ -34,12 +34,12 @@ The following is a list of free and/or open source books on machine learning, st
 * [Bayesian Reasoning and Machine Learning](http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage) Book+MatlabToolBox
 * [R Programming for Data Science](https://leanpub.com/rprogramming)
 * [Data Mining - Practical Machine Learning Tools and Techniques](https://cdn.preterhuman.net/texts/science_and_technology/artificial_intelligence/Data%20Mining%20Practical%20Machine%20Learning%20Tools%20and%20Techniques%202d%20ed%20-%20Morgan%20Kaufmann.pdf) Book
-* [Machine Learning with TensorFlow](https://www.manning.com/books/machine-learning-with-tensorflow) Early access book
-* [Machine Learning Systems](https://www.manning.com/books/machine-learning-systems) Early access book
+* [Machine Learning with TensorFlow](https://www.manning.com/books/machine-learning-with-tensorflow) Early book access
+* [Machine Learning Systems](https://www.manning.com/books/machine-learning-systems) Early book access
 * [Hands‑On Machine Learning with Scikit‑Learn and TensorFlow](http://index-of.es/Varios-2/Hands%20on%20Machine%20Learning%20with%20Scikit%20Learn%20and%20Tensorflow.pdf) - Aurélien Géron
-* [R for Data Science: Import, Tidy, Transform, Visualize, and Model Data](https://r4ds.had.co.nz/) - Wickham and Grolemund. Great as introduction on how to use R.
+* [R for Data Science: Import, Tidy, Transform, Visualize, and Model Data](https://r4ds.had.co.nz/) - Wickham and Grolemund. Great introduction on how to use R language.
 * [Advanced R](http://adv-r.had.co.nz/) - Hadley Wickham. More advanced usage of R for programming.
-* [Graph-Powered Machine Learning](https://www.manning.com/books/graph-powered-machine-learning) - Alessandro Negro.
Combining graph theory and models to improve machine learning projects
+* [Graph-Powered Machine Learning](https://www.manning.com/books/graph-powered-machine-learning) - Alessandro Negro. Combining graph theory and models to improve machine learning projects.
 * [Machine Learning for Dummies](https://mscdss.ds.unipi.gr/wp-content/uploads/2018/02/Untitled-attachment-00056-2-1.pdf)
 * [Machine Learning for Mortals (Mere and Otherwise)](https://www.manning.com/books/machine-learning-for-mortals-mere-and-otherwise) - Early access book that provides basics of machine learning and using R programming language.
 * [Grokking Machine Learning](https://www.manning.com/books/grokking-machine-learning) - Early access book that introduces the most valuable machine learning techniques.
@@ -64,6 +64,7 @@ The following is a list of free and/or open source books on machine learning, st
 - [Metalearning: Applications to Automated Machine Learning and Data Mining](https://link.springer.com/content/pdf/10.1007/978-3-030-67024-5.pdf) - Pavel Brazdil, Jan N. van Rijn, Carlos Soares, Joaquin Vanschoren
 - [Managing Machine Learning Projects: From design to deployment](https://www.manning.com/books/managing-machine-learning-projects) - Simon Thompson
 - [Causal Machine Learning](https://www.manning.com/books/causal-machine-learning) - Robert Ness - Practical introduction to building AI models that can reason about causality.
+- [Bayesian Optimization in Action](https://www.manning.com/books/bayesian-optimization-in-action) - Quan Nguyen - Book about building Bayesian optimization systems from the ground up.
 ## Deep Learning