All libraries below are free, and most are open-source.
Machine Learning
General purpose Machine Learning
- scikit-learn – machine learning in Python
- Shogun – machine learning toolbox
- xLearn – High Performance, Easy-to-use, and Scalable Machine Learning Package
- Reproducible Experiment Platform (REP) – Machine Learning toolbox for Humans
- modAL – a modular active learning framework for Python3
- Sparkit-learn – PySpark + Scikit-learn = Sparkit-learn
- mlpack – a scalable C++ machine learning library (Python bindings)
- dlib – A toolkit for making real world machine learning and data analysis applications in C++ (Python bindings)
- MLxtend – extension and helper modules for Python’s data analysis and machine learning libraries
- tick – module for statistical learning, with a particular emphasis on time-dependent modelling
- sklearn-extensions – a consolidated package of small extensions to scikit-learn
- civisml-extensions – scikit-learn-compatible estimators from Civis Analytics
- scikit-multilearn – multi-label classification for python
- tslearn – machine learning toolkit dedicated to time-series data
- seqlearn – a sequence classification toolkit for Python
- pystruct – Simple structured learning framework for python
- sklearn-expertsys – Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models
- skutil – A set of scikit-learn and h2o extension classes (as well as caret classes for python)
- sklearn-crfsuite – scikit-learn inspired API for CRFsuite
- RuleFit – implementation of the RuleFit algorithm
- metric-learn – metric learning algorithms in Python
- pyGAM – Generalized Additive Models in Python
- luminol – Anomaly Detection and Correlation library
Automated machine learning
- TPOT – Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming
- auto-sklearn – an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator
- MLBox – a powerful Automated Machine Learning python library.
Ensemble methods
- ML-Ensemble – high performance ensemble learning
- brew – Python Ensemble Learning API
- Stacking – Simple and useful stacking library, written in Python.
- stacked_generalization – library for machine learning stacking generalization.
- vecstack – Python package for stacking (machine learning technique)
Imbalanced datasets
- imbalanced-learn – module to perform under sampling and over sampling with various techniques
- imbalanced-algorithms – Python-based implementations of algorithms for learning on imbalanced data.
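
For readers new to the term, the over-sampling mentioned above can be illustrated with a minimal pure-Python sketch. The helper `random_oversample` is hypothetical and written only for this example; it is not the API of imbalanced-learn or of any other library listed here, which offer considerably more refined techniques.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class is as large as the largest one (illustrative only)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: four samples of class 0 and a single sample of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Under-sampling is the mirror image: instead of duplicating minority samples, majority samples are randomly dropped until the classes are balanced.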
Random Forests
- rpforest – a forest of random projection trees
- Random Forest Clustering – Unsupervised Clustering using Random Forests
- sklearn-random-bits-forest – wrapper of the Random Bits Forest program written by Wang et al. (2016)
- rgf_python – Python Wrapper of Regularized Greedy Forest
Extreme Learning Machine
- Python-ELM – Extreme Learning Machine implementation in Python
- Python Extreme Learning Machine (ELM) – a machine learning technique used for classification/regression tasks
- hpelm ![alt text][gpu] – High performance implementation of Extreme Learning Machines (fast randomized neural networks).
Kernel methods
- pyFM – Factorization machines in python
- fastFM – a library for Factorization Machines
- tffm – TensorFlow implementation of an arbitrary order Factorization Machine
- liquidSVM – an implementation of SVMs
- scikit-rvm – Relevance Vector Machine implementation using the scikit-learn API
Gradient boosting
- XGBoost ![alt text][gpu] – Scalable, Portable and Distributed Gradient Boosting
- LightGBM ![alt text][gpu] – a fast, distributed, high performance gradient boosting framework by Microsoft
- CatBoost ![alt text][gpu] – an open-source gradient boosting on decision trees library by Yandex
- InfiniteBoost – building infinite ensembles with gradient descent
- TGBoost – Tiny Gradient Boosting Tree
Deep Learning
Keras
- Keras – a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano
- keras-contrib – Keras community contributions
- Hyperas – Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization
- Elephas – Distributed Deep learning with Keras & Spark
- Hera – Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.
- dist-keras – Distributed Deep Learning, with a focus on distributed training
- Conx – The On-Ramp to Deep Learning
- Keras add-ons
PyTorch
- PyTorch – Tensors and Dynamic neural networks in Python with strong GPU acceleration
- torchvision – Datasets, Transforms and Models specific to Computer Vision
- torchtext – Data loaders and abstractions for text and NLP
- torchaudio – an audio library for PyTorch
- ignite – high-level library to help with training neural networks in PyTorch
- PyToune – a Keras-like framework and utilities for PyTorch
- skorch – a scikit-learn compatible neural network library that wraps pytorch
- PyTorchNet – an abstraction to train neural networks
- Aorun – aims to provide an API similar to Keras with PyTorch as backend.
- pytorch_geometric – Geometric Deep Learning Extension Library for PyTorch
TensorFlow
- TensorFlow – Computation using data flow graphs for scalable machine learning by Google
- TensorLayer – Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
- TFLearn – Deep learning library featuring a higher-level API for TensorFlow
- Sonnet – TensorFlow-based neural network library by DeepMind
- TensorForce – a TensorFlow library for applied reinforcement learning
- tensorpack – a Neural Net Training Interface on TensorFlow
- Polyaxon – a platform that helps you build, manage and monitor deep learning models
- Horovod – Distributed training framework for TensorFlow
- tfdeploy – Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy
- hiptensorflow ![alt text][amd] – ROCm/HIP enabled Tensorflow
- TensorFlow Fold – Deep learning with dynamic computation graphs in TensorFlow
- tensorlm – wrapper library for text generation / language models at char and word level with RNN
- TensorLight – a high-level framework for TensorFlow
- Mesh TensorFlow – Model Parallelism Made Easier
Theano
Warning: Theano development has ceased
- Theano – a Python library that allows you to define, optimize, and evaluate mathematical expressions
- Lasagne – Lightweight library to build and train neural networks in Theano
- Lasagne add-ons…
- nolearn – scikit-learn compatible neural network library (mainly for Lasagne)
- Blocks – a Theano framework for building and training neural networks
- platoon – Multi-GPU mini-framework for Theano
- NeuPy – a Python library for Artificial Neural Networks and Deep Learning
- scikit-neuralnetwork – Deep neural networks without the learning cliff
- Theano-MPI – MPI Parallel framework for training deep learning models built in Theano
MXNet
- MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler
- Gluon – a clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet)
- MXbox – simple, efficient and flexible vision toolbox for mxnet framework.
- gluon-cv – provides implementations of the state-of-the-art deep learning models in computer vision.
- gluon-nlp – NLP made easy
- MXNet ![alt text][amd] – HIP Port of MXNet
Caffe
- Caffe – a fast open framework for deep learning
- Caffe2 – a lightweight, modular, and scalable deep learning framework
- hipCaffe ![alt text][amd] – the HIP port of Caffe
CNTK
- CNTK – Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit
Chainer
- Chainer – a flexible framework for neural networks
- ChainerRL – a deep reinforcement learning library built on top of Chainer.
- ChainerCV – a Library for Deep Learning in Computer Vision
- ChainerMN – scalable distributed deep learning with Chainer
- scikit-chainer – scikit-learn like interface to chainer
- chainer_sklearn – Sklearn (Scikit-learn) like interface for Chainer
Others
- SKIL – Skymind’s platform for distributed training of machine learning models, tracking machine learning experiments, deploying models to production and managing them over their lifecycle.
- Neon – Intel Nervana™ reference deep learning framework committed to best performance on all hardware
- Tangent – Source-to-Source Debuggable Derivatives in Pure Python
- autograd – Efficiently computes derivatives of numpy code
- Myia – deep learning framework (pre-alpha)
- nnabla – Neural Network Libraries by Sony
Model explanation
- Auralisation – auralisation of learned features in CNN (for audio)
- CapsNet-Visualization – a visualization of the CapsNet layers to better understand how it works
- lucid – a collection of infrastructure and tools for research in neural network interpretability.
- Netron – visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks)
- FlashLight – visualization Tool for your NeuralNetwork
- tensorboard-pytorch – tensorboard for pytorch (and chainer, mxnet, numpy, …)
- anchor – code for “High-Precision Model-Agnostic Explanations” paper
- aequitas – Bias and Fairness Audit Toolkit
- Contrastive Explanation – Contrastive Explanation (Foil Trees)
- yellowbrick – visual analysis and diagnostic tools to facilitate machine learning model selection
- scikit-plot – an intuitive library to add plotting functionality to scikit-learn objects
- shap – a unified approach to explain the output of any machine learning model
- ELI5 – a library for debugging/inspecting machine learning classifiers and explaining their predictions
- Lime – Explaining the predictions of any machine learning classifier
- FairML – a Python toolbox for auditing machine learning models for bias
- L2X – Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
- PDPbox – partial dependence plot toolbox
- pyBreakDown – Python implementation of R package breakDown
- PyCEbox – Python Individual Conditional Expectation Plot Toolbox
- Skater – Python Library for Model Interpretation
- tensorflow/model-analysis – Model analysis tools for TensorFlow
- themis-ml – a library that implements fairness-aware machine learning algorithms
- treeinterpreter ![alt text][skl] – interpreting scikit-learn’s decision tree and random forest predictions
Reinforcement Learning
- OpenAI Gym – a toolkit for developing and comparing reinforcement learning algorithms.
Distributed computing systems
- PySpark – exposes the Spark programming model to Python
- Veles – Distributed machine learning platform by Samsung
- Jubatus – Framework and Library for Distributed Online Machine Learning
- DMTK – Microsoft Distributed Machine Learning Toolkit
- PaddlePaddle – PArallel Distributed Deep LEarning by Baidu
- dask-ml – Distributed and parallel machine learning
- Distributed – Distributed computation in Python
Probabilistic methods
- pomegranate ![alt text][cp] – probabilistic and graphical models for Python
- pyro – a flexible, scalable deep probabilistic programming library built on PyTorch.
- ZhuSuan – Bayesian Deep Learning
- PyMC – Bayesian Stochastic Modelling in Python
- PyMC3 – Python package for Bayesian statistical modeling and Probabilistic Machine Learning
- sampled – Decorator for reusable models in PyMC3
- Edward – A library for probabilistic modeling, inference, and criticism.
- InferPy – Deep Probabilistic Modelling Made Easy
- GPflow – Gaussian processes in TensorFlow
- PyStan – Bayesian inference using the No-U-Turn sampler (Python interface)
- gelato – Bayesian dessert for Lasagne
- sklearn-bayes – Python package for Bayesian Machine Learning with scikit-learn API
- bayesloop – Probabilistic programming framework that facilitates objective model selection for time-varying parameter models
- PyFlux – Open source time series library for Python
- skggm – estimation of general graphical models
- pgmpy – a python library for working with Probabilistic Graphical Models.
- skpro – supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute
- Aboleth – a bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation
- PtStat – Probabilistic Programming and Statistical Inference in PyTorch
- PyVarInf – Bayesian Deep Learning methods with Variational Inference for PyTorch
- emcee – The Python ensemble sampling toolkit for affine-invariant MCMC
- hsmmlearn – a library for hidden semi-Markov models with explicit durations
- pyhsmm – bayesian inference in HSMMs and HMMs
- GPyTorch – a highly efficient and modular implementation of Gaussian Processes in PyTorch
- Bayes – Python implementations of Naive Bayes algorithm variants
Genetic Programming
- gplearn – Genetic Programming in Python
- DEAP – Distributed Evolutionary Algorithms in Python
- karoo_gp – A Genetic Programming platform for Python with GPU support
- monkeys – A strongly-typed genetic programming framework for Python
- sklearn-genetic – Genetic feature selection module for scikit-learn
Optimization
- Spearmint – Bayesian optimization
- SMAC3 – Sequential Model-based Algorithm Configuration
- Optunity – a library containing various optimizers for hyperparameter tuning.
- hyperopt – Distributed Asynchronous Hyperparameter Optimization in Python
- hyperopt-sklearn – hyper-parameter optimization for sklearn
- sklearn-deap – use evolutionary algorithms instead of gridsearch in scikit-learn
- sigopt_sklearn – SigOpt wrappers for scikit-learn methods
- Bayesian Optimization – A Python implementation of global optimization with gaussian processes.
- SafeOpt – Safe Bayesian Optimization
- scikit-optimize – Sequential model-based optimization with a `scipy.optimize` interface
- Solid – A comprehensive gradient-free optimization framework written in Python
- PySwarms – A research toolkit for particle swarm optimization in Python
- Platypus – A Free and Open Source Python Library for Multiobjective Optimization
- GPflowOpt – Bayesian Optimization using GPflow
- POT – Python Optimal Transport library
- Talos – Hyperparameter Optimization for Keras Models
Natural Language Processing
- NLTK – modules, data sets, and tutorials supporting research and development in Natural Language Processing
- CLTK – The Classical Language Toolkit
- gensim – Topic Modelling for Humans
- PSI-Toolkit – a natural language processing toolkit by Adam Mickiewicz University in Poznań
- pyMorfologik – Python binding for Morfologik (Polish morphological analyzer)
- skift – scikit-learn wrappers for Python fastText.
- Phonemizer – Simple text to phonemes converter for multiple languages
Computer Audition
- librosa – Python library for audio and music analysis
- Yaafe – Audio features extraction
- aubio – a library for audio and music analysis
- Essentia – library for audio and music analysis, description and synthesis
- LibXtract – a simple, portable, lightweight library of audio feature extraction functions
- Marsyas – Music Analysis, Retrieval and Synthesis for Audio Signals
- muda – a library for augmenting annotated audio data
- madmom – Python audio and music signal processing library
Computer Vision
- OpenCV – Open Source Computer Vision Library
- scikit-image – Image Processing SciKit (Toolbox for SciPy)
- imgaug – image augmentation for machine learning experiments
- imgaug_extension – additional augmentations for imgaug
- Augmentor – Image augmentation library in Python for machine learning
- albumentations – fast image augmentation library and easy to use wrapper around other libraries
Feature engineering
- Featuretools – automated feature engineering
- scikit-feature – feature selection repository in python
- skl-groups – scikit-learn addon to operate on set/“group”-based features
- Feature Forge – a set of tools for creating and testing machine learning features
- boruta_py – implementations of the Boruta all-relevant feature selection method
- BoostARoota – a fast xgboost feature selection algorithm
- few – a feature engineering wrapper for sklearn
- scikit-rebate – a scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning
- scikit-mdr – a sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.
- tsfresh – Automatic extraction of relevant features from time series
Data manipulation & pipelines
- pandas – powerful Python data analysis toolkit
- sklearn-pandas – Pandas integration with sklearn
- alexander – wrapper that aims to make scikit-learn fully compatible with pandas
- blaze – NumPy and Pandas interface to Big Data
- pandasql – allows you to query pandas DataFrames using SQL syntax
- pandas-gbq – Pandas Google Big Query
- xpandas – universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute
- Fuel – data pipeline framework for machine learning
- Arctic – high performance datastore for time series and tick data
- pdpipe – easy pipelines for pandas DataFrames.
- SSPipe – Python pipe (|) operator with support for DataFrames, NumPy and PyTorch
- meza – a Python toolkit for processing tabular data
- pandas-ply – functional data manipulation for pandas
- Dplython – Dplyr for Python
- pysparkling – a pure Python implementation of Apache Spark’s RDD and DStream interfaces
- quinn – pyspark methods to enhance developer productivity
- Dataset – helps you conveniently work with random or sequential batches of your data and define data processing
- swifter – a package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
Statistics
- statsmodels – statistical modeling and econometrics in Python
- stockstats – supplies a wrapper `StockDataFrame` based on the `pandas.DataFrame` with inline stock statistics/indicators support.
- simplestatistics – simple statistical functions implemented in readable Python.
- weightedcalcs – pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more
- scikit-posthocs – Pairwise Multiple Comparisons Post-hoc Tests
- pysie – a Python implementation of a statistical inference engine
Experiments tools
- Sacred – a tool to help you configure, organize, log and reproduce experiments by IDSIA
- Xcessiv – a web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling
- Persimmon – A visual dataflow programming language for sklearn
Visualization
- Matplotlib – plotting with Python
- seaborn – statistical data visualization using matplotlib
- Bokeh – Interactive Web Plotting for Python
- HoloViews – stop plotting your data – annotate your data and let it visualize itself
- Alphalens – performance analysis of predictive (alpha) stock factors by Quantopian
- python-ternary – ternary plotting library for python with matplotlib
- Naarad – framework for performance analysis & rating of sharded & stateful services.
Evaluation
- kaggle-metrics – Metrics for Kaggle competitions
- Metrics – machine learning evaluation metrics
- sklearn-evaluation – scikit-learn model evaluation made easy: plots, tables and markdown reports
Computations
- numpy – the fundamental package needed for scientific computing with Python.
- Dask – parallel computing with task scheduling
- bottleneck – Fast NumPy array functions written in C
- minpy – NumPy interface with mixed backend execution
- CuPy – NumPy-like API accelerated with CUDA
- scikit-tensor – Python library for multilinear algebra and tensor factorizations
- numdifftools – solve automatic numerical differentiation problems in one or more variables
- quaternion – Add built-in support for quaternions to numpy
- adaptive – Tools for adaptive and parallel sampling of mathematical functions
Spatial analysis
Quantum Computing
- QML – a Python Toolkit for Quantum Machine Learning
Conversion
- sklearn-porter – transpile trained scikit-learn estimators to C, Java, JavaScript and others
- ONNX – Open Neural Network Exchange
- MMdnn – a set of tools to help users inter-operate among different deep learning frameworks.