A curated list of awesome machine learning frameworks, libraries and software (by language). Inspired by awesome-php.

If you want to contribute to this list, send me a pull request or contact me [@josephmisiti](https://www.twitter.com/josephmisiti).

## Python

#### Natural Language Processing

* [NLTK](http://www.nltk.org/) - A leading platform for building Python programs to work with human language data.
* [Pattern](http://www.clips.ua.ac.be/pattern) - A web mining module for the Python programming language. It has tools for natural language processing and machine learning, among others.
* [TextBlob](http://textblob.readthedocs.org/) - Provides a consistent API for diving into common natural language processing (NLP) tasks. Stands on the giant shoulders of NLTK and Pattern, and plays nicely with both.
* [jieba](https://github.com/fxsjy/jieba#jieba-1) - Chinese word segmentation utilities.
* [SnowNLP](https://github.com/isnowfy/snownlp) - A library for processing Chinese text.
* [loso](https://github.com/victorlin/loso) - Another Chinese segmentation library.
* [genius](https://github.com/duanhongyi/genius) - A Chinese segmentation library based on Conditional Random Fields.

#### General-Purpose Machine Learning

* [scikit-learn](http://scikit-learn.org/) - A Python module for machine learning built on top of SciPy.
* [pattern](https://github.com/clips/pattern) - Web mining module for Python.
* [NuPIC](https://github.com/numenta/nupic) - Numenta Platform for Intelligent Computing.
* [Pylearn2](https://github.com/lisa-lab/pylearn2) - A machine learning library based on [Theano](https://github.com/Theano/Theano).
* [hebel](https://github.com/hannes-brt/hebel) - GPU-accelerated deep learning library in Python.
* [gensim](https://github.com/piskvorky/gensim) - Topic modelling for humans.
* [PyBrain](https://github.com/pybrain/pybrain) - Another Python machine learning library.
* [Crab](https://github.com/muricoca/crab) - A flexible, fast recommender engine.
* [python-recsys](https://github.com/ocelma/python-recsys) - A Python library for implementing a recommender system.
* [BayesPy](https://github.com/maxsklar/BayesPy)

#### Data Analysis / Data Visualization

* [SciPy](http://www.scipy.org/) - A Python-based ecosystem of open-source software for mathematics, science, and engineering.
* [NumPy](http://www.numpy.org/) - A fundamental package for scientific computing with Python.
* [Numba](http://numba.pydata.org/) - Python JIT (just-in-time) compiler to LLVM aimed at scientific Python, by the developers of Cython and NumPy.
* [NetworkX](https://networkx.github.io/) - High-productivity software for complex networks.
* [Pandas](http://pandas.pydata.org/) - A library providing high-performance, easy-to-use data structures and data analysis tools.
* [Open Mining](https://github.com/avelino/mining) - Business Intelligence (BI) in Python (Pandas web interface).
* [PyMC](https://github.com/pymc-devs/pymc) - Markov Chain Monte Carlo sampling toolkit.
* [zipline](https://github.com/quantopian/zipline) - A Pythonic algorithmic trading library.
* [PyDy](https://pydy.org/) - Short for Python Dynamics. Assists with the workflow of modeling dynamic motion, built around NumPy, SciPy, IPython, and matplotlib.
* [SymPy](https://github.com/sympy/sympy) - A Python library for symbolic mathematics.
* [statsmodels](https://github.com/statsmodels/statsmodels) - Statistical modeling and econometrics in Python.
* [astropy](http://www.astropy.org/) - A community Python library for astronomy.
* [matplotlib](http://matplotlib.org/) - A Python 2D plotting library.
* [bokeh](https://github.com/ContinuumIO/bokeh) - Interactive web plotting for Python.
* [plotly](https://plot.ly/python) - Collaborative web plotting for Python and matplotlib.
* [vincent](https://github.com/wrobstory/vincent) - A Python to Vega translator.
* [d3py](https://github.com/mikedewar/d3py) - A plotting library for Python, based on [D3.js](http://d3js.org/).
* [ggplot](https://github.com/yhat/ggplot) - Same API as ggplot2 for R.
* [Kartograph.py](https://github.com/kartograph/kartograph.py) - Rendering beautiful SVG maps in Python.
* [pygal](http://pygal.org/) - A Python SVG charts creator.

#### Misc Scripts / IPython Notebooks

* [pattern_classification](https://github.com/rasbt/pattern_classification)
* [thinking stats 2](https://github.com/Wavelets/ThinkStats2)
* [hyperopt](https://github.com/hyperopt/hyperopt-sklearn)
* [nupic](https://github.com/numenta/nupic)
* [2012-paper-diginorm](https://github.com/ged-lab/2012-paper-diginorm)
* [ipython-notebooks](https://github.com/ogrisel/notebooks)
* [decision-weights](https://github.com/CamDavidsonPilon/decision-weights)

## Ruby

#### Natural Language Processing

* [Treat](https://github.com/louismullie/treat) - Text REtrieval and Annotation Toolkit, definitely the most comprehensive toolkit I’ve encountered so far for Ruby.
* [Ruby Linguistics](http://www.deveiate.org/projects/Linguistics/) - NLTK for Ruby.
* [Stemmer](https://github.com/aurelian/ruby-stemmer)
* [Ruby Wordnet](http://www.deveiate.org/projects/Ruby-WordNet/)
* [Raspell](http://sourceforge.net/projects/raspell/)
* [UEA Stemmer](https://github.com/ealdent/uea-stemmer)

#### General-Purpose Machine Learning

* [Ruby Machine Learning](https://github.com/tsycho/ruby-machine-learning)
* [Machine Learning Ruby](https://github.com/mizoR/machine-learning-ruby)
* [jRuby Mahout](https://github.com/vasinov/jruby_mahout)
* [CardMagic-Classifier](https://github.com/cardmagic/classifier)

#### Data Analysis / Data Visualization

* [rsruby](https://github.com/alexgutteridge/rsruby)
* [data-visualization-ruby](https://github.com/chrislo/data_visualisation_ruby)
* [ruby-plot](https://www.ruby-toolbox.com/projects/ruby-plot)
* [plot-rb](https://github.com/zuhao/plotrb)
* [scruffy](http://www.rubyinside.com/scruffy-a-beautiful-graphing-toolkit-for-ruby-194.html)

## Scala

#### Natural Language Processing

* TODO

#### Data Analysis / Data Visualization

* TODO

#### General-Purpose Machine Learning

* [Conjecture](https://github.com/etsy/Conjecture)

## Java

#### Natural Language Processing

* [CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml)
* [Stanford Parser](http://nlp.stanford.edu/software/lex-parser.shtml)
* [Stanford POS Tagger](http://nlp.stanford.edu/software/tagger.shtml)
* [Stanford Named Entity Recognizer](http://nlp.stanford.edu/software/CRF-NER.shtml)
* [Stanford Word Segmenter](http://nlp.stanford.edu/software/segmenter.shtml)
* [Tregex, Tsurgeon and Semgrex](http://nlp.stanford.edu/software/tregex.shtml)
* [Stanford Phrasal: A Phrase-Based Translation System](http://nlp.stanford.edu/software/phrasal/)
* [Stanford English Tokenizer](http://nlp.stanford.edu/software/tokenizer.shtml)
* [Stanford Tokens Regex](http://nlp.stanford.edu/software/tokensregex.shtml)
* [Stanford Temporal Tagger](http://nlp.stanford.edu/software/sutime.shtml)
* [Stanford SPIED](http://nlp.stanford.edu/software/patternslearning.shtml)
* [Stanford Topic Modeling Toolbox](http://nlp.stanford.edu/software/tmt/tmt-0.4/)

#### General-Purpose Machine Learning

* [Mahout](https://github.com/apache/mahout)
* [Stanford Classifier](http://nlp.stanford.edu/software/classifier.shtml)

#### Data Analysis / Data Visualization

* [Hadoop](https://github.com/apache/hadoop-mapreduce)
* [Spark](https://github.com/apache/spark)
* [Impala](https://github.com/cloudera/impala)

## Go

#### Natural Language Processing

* TODO

#### General-Purpose Machine Learning

* [Go Learn](https://github.com/sjwhitworth/golearn)

#### Data Analysis / Data Visualization

* TODO

## R

#### Natural Language Processing

* TODO

#### General-Purpose Machine Learning

* TODO

#### Data Analysis / Data Visualization

* TODO

## Matlab

#### Natural Language Processing

* TODO

#### General-Purpose Machine Learning

* TODO

#### Data Analysis / Data Visualization

* TODO

## Julia

#### General-Purpose Machine Learning

* [PGM](https://github.com/JuliaStats/PGM.jl)
* [DA](https://github.com/trthatcher/DA.jl)
* [Regression](https://github.com/lindahua/Regression.jl)

#### Natural Language Processing

* TODO

#### Data Analysis / Data Visualization

* [Graph Layout](https://github.com/IainNZ/GraphLayout.jl)
* [Data Frames Meta](https://github.com/JuliaStats/DataFramesMeta.jl)
* [Julia Data](https://github.com/nfoti/JuliaData)
* [Data Read](https://github.com/WizardMac/DataRead.jl)

#### Misc Scripts + Presentations

* [JuliaCon Presentations](https://github.com/JuliaCon/presentations)

## Credits

* Some of the Python libraries were cut-and-pasted from [vinta](https://github.com/vinta/awesome-python).