A Directory of Python Machine Learning and Data Science Frameworks

In Artificial Intelligence by Christian Hissibini

All libraries below are free, and most are open-source.

Machine Learning

General purpose Machine Learning

  • scikit-learn – machine learning in Python
  • Shogun – machine learning toolbox
  • xLearn – High Performance, Easy-to-use, and Scalable Machine Learning Package
  • Reproducible Experiment Platform (REP) – Machine Learning toolbox for Humans
  • modAL – a modular active learning framework for Python3
  • Sparkit-learn – PySpark + Scikit-learn = Sparkit-learn
  • mlpack – a scalable C++ machine learning library (Python bindings)
  • dlib – A toolkit for making real world machine learning and data analysis applications in C++ (Python bindings)
  • MLxtend – extension and helper modules for Python’s data analysis and machine learning libraries
  • tick – module for statistical learning, with a particular emphasis on time-dependent modelling
  • sklearn-extensions – a consolidated package of small extensions to scikit-learn
  • civisml-extensions – scikit-learn-compatible estimators from Civis Analytics
  • scikit-multilearn – multi-label classification for python
  • tslearn – machine learning toolkit dedicated to time-series data
  • seqlearn – a sequence classification toolkit for Python
  • pystruct – Simple structured learning framework for python
  • sklearn-expertsys – Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models
  • skutil – A set of scikit-learn and h2o extension classes (as well as caret classes for python)
  • sklearn-crfsuite – scikit-learn inspired API for CRFsuite
  • RuleFit – implementation of the RuleFit algorithm
  • metric-learn – metric learning algorithms in Python
  • pyGAM – Generalized Additive Models in Python
  • luminol – Anomaly Detection and Correlation library
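
Several entries above describe themselves as scikit-learn compatible or as offering a scikit-learn style API. As a rough illustration of that convention, the sketch below defines a toy estimator that implements `fit` and `predict`, which is enough for it to be passed to `cross_val_score`. The class name `MajorityClassifier` is hypothetical and invented for this example; it is not part of any library listed here.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score


class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Toy estimator (hypothetical): always predicts the most frequent training label."""

    def fit(self, X, y):
        # Learned attributes end with an underscore, following scikit-learn convention.
        labels, counts = np.unique(y, return_counts=True)
        self.classes_ = labels
        self.majority_ = labels[np.argmax(counts)]
        return self

    def predict(self, X):
        # Ignore the features and repeat the majority label for every row.
        return np.full(len(X), self.majority_)


X, y = load_iris(return_X_y=True)
# Because the class follows the fit/predict convention, scikit-learn's
# model-selection utilities accept it like any built-in estimator.
scores = cross_val_score(MajorityClassifier(), X, y, cv=5)
print(scores.mean())
```

The same interface is what lets compatible third-party estimators be dropped into scikit-learn pipelines and grid searches.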

Automated machine learning

  • TPOT – Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming
  • auto-sklearn – an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator
  • MLBox – a powerful Automated Machine Learning python library.

Ensemble methods

  • ML-Ensemble – high performance ensemble learning
  • brew – Python Ensemble Learning API
  • Stacking – Simple and useful stacking library, written in Python.
  • stacked_generalization – library for machine learning stacking generalization.
  • vecstack – Python package for stacking (machine learning technique)
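
For readers unfamiliar with stacking, here is a minimal sketch of the idea using scikit-learn's own `StackingClassifier` rather than any of the packages listed above. The choice of base models, dataset, and parameters is an assumption made purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Base models produce out-of-fold predictions (cv=5); the final estimator
# is then trained on those predictions instead of on the raw features.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```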

Imbalanced datasets

  • imbalanced-learn – module to perform under sampling and over sampling with various techniques
  • imbalanced-algorithms – Python-based implementations of algorithms for learning on imbalanced data.
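
To make "over sampling" concrete, the following is a hand-rolled NumPy sketch of random over-sampling. It is not the implementation used by imbalanced-learn; the function name `random_oversample` and the toy data are assumptions for illustration only.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    selected = []
    for label, count in zip(classes, counts):
        idx = np.flatnonzero(y == label)
        # Sample with replacement only the number of rows this class is missing.
        extra = rng.choice(idx, size=target - count, replace=True)
        selected.append(np.concatenate([idx, extra]))
    order = np.concatenate(selected)
    return X[order], y[order]


# Toy data: 8 samples of class 0 and 2 samples of class 1.
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_res, y_res = random_oversample(X, y)
print(np.bincount(y_res))
```

In practice, resampling should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.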

Random Forests

Extreme Learning Machine

  • Python-ELM – Extreme Learning Machine implementation in Python
  • Python Extreme Learning Machine (ELM) – a machine learning technique used for classification/regression tasks
  • hpelm (GPU) – High performance implementation of Extreme Learning Machines (fast randomized neural networks).

Kernel methods

  • pyFM – Factorization machines in python
  • fastFM – a library for Factorization Machines
  • tffm – TensorFlow implementation of an arbitrary order Factorization Machine
  • liquidSVM – an implementation of SVMs
  • scikit-rvm – Relevance Vector Machine implementation using the scikit-learn API

Gradient boosting

  • XGBoost (GPU) – Scalable, Portable and Distributed Gradient Boosting
  • LightGBM (GPU) – a fast, distributed, high performance gradient boosting framework by Microsoft
  • CatBoost (GPU) – an open-source gradient boosting on decision trees library by Yandex
  • InfiniteBoost – building infinite ensembles with gradient descent
  • TGBoost – Tiny Gradient Boosting Tree

Deep Learning

Keras

  • Keras – a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano
  • keras-contrib – Keras community contributions
  • Hyperas – Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization
  • Elephas – Distributed Deep learning with Keras & Spark
  • Hera – Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser.
  • dist-keras – Distributed Deep Learning, with a focus on distributed training
  • Conx – The On-Ramp to Deep Learning
  • Keras add-ons

PyTorch

  • PyTorch – Tensors and Dynamic neural networks in Python with strong GPU acceleration
  • torchvision – Datasets, Transforms and Models specific to Computer Vision
  • torchtext – Data loaders and abstractions for text and NLP
  • torchaudio – an audio library for PyTorch
  • ignite – high-level library to help with training neural networks in PyTorch
  • PyToune – a Keras-like framework and utilities for PyTorch
  • skorch – a scikit-learn compatible neural network library that wraps pytorch
  • PyTorchNet – an abstraction to train neural networks
  • Aorun – aims to provide an API similar to Keras with PyTorch as backend.
  • pytorch_geometric – Geometric Deep Learning Extension Library for PyTorch

TensorFlow

  • TensorFlow – Computation using data flow graphs for scalable machine learning by Google
  • TensorLayer – Deep Learning and Reinforcement Learning Library for Researchers and Engineers.
  • TFLearn – Deep learning library featuring a higher-level API for TensorFlow
  • Sonnet – TensorFlow-based neural network library by DeepMind
  • TensorForce – a TensorFlow library for applied reinforcement learning
  • tensorpack – a Neural Net Training Interface on TensorFlow
  • Polyaxon – a platform that helps you build, manage and monitor deep learning models
  • Horovod – Distributed training framework for TensorFlow
  • tfdeploy – Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy
  • hiptensorflow (AMD) – ROCm/HIP enabled TensorFlow
  • TensorFlow Fold – Deep learning with dynamic computation graphs in TensorFlow
  • tensorlm – wrapper library for text generation / language models at char and word level with RNN
  • TensorLight – a high-level framework for TensorFlow
  • Mesh TensorFlow – Model Parallelism Made Easier

Theano

Warning: Theano development has ceased

  • Theano – a Python library that allows you to define, optimize, and evaluate mathematical expressions
  • Lasagne – Lightweight library to build and train neural networks in Theano
  • Lasagne add-ons…
  • nolearn – scikit-learn compatible neural network library (mainly for Lasagne)
  • Blocks – a Theano framework for building and training neural networks
  • platoon – Multi-GPU mini-framework for Theano
  • NeuPy – a Python library for Artificial Neural Networks and Deep Learning
  • scikit-neuralnetwork – Deep neural networks without the learning cliff
  • Theano-MPI – MPI Parallel framework for training deep learning models built in Theano

MXNet

  • MXNet – Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler
  • Gluon – a clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet)
  • MXbox – simple, efficient and flexible vision toolbox for mxnet framework.
  • gluon-cv – provides implementations of the state-of-the-art deep learning models in computer vision.
  • gluon-nlp – NLP made easy
  • MXNet (AMD) – HIP Port of MXNet

Caffe

  • Caffe – a fast open framework for deep learning
  • Caffe2 – a lightweight, modular, and scalable deep learning framework
  • hipCaffe (AMD) – the HIP port of Caffe

CNTK

  • CNTK – Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit

Chainer

  • Chainer – a flexible framework for neural networks
  • ChainerRL – a deep reinforcement learning library built on top of Chainer.
  • ChainerCV – a Library for Deep Learning in Computer Vision
  • ChainerMN – scalable distributed deep learning with Chainer
  • scikit-chainer – scikit-learn like interface to chainer
  • chainer_sklearn – Sklearn (Scikit-learn) like interface for Chainer

Others

  • SKIL – Skymind’s platform for distributed training of machine learning models, tracking machine learning experiments, deploying models to production and managing them over their lifecycle.
  • Neon – Intel Nervana™ reference deep learning framework committed to best performance on all hardware
  • Tangent – Source-to-Source Debuggable Derivatives in Pure Python
  • autograd – Efficiently computes derivatives of numpy code
  • Myia – deep learning framework (pre-alpha)
  • nnabla – Neural Network Libraries by Sony

Model explanation

  • Auralisation – auralisation of learned features in CNN (for audio)
  • CapsNet-Visualization – a visualization of the CapsNet layers to better understand how it works
  • lucid – a collection of infrastructure and tools for research in neural network interpretability.
  • Netron – visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks)
  • FlashLight – visualization Tool for your NeuralNetwork
  • tensorboard-pytorch – tensorboard for pytorch (and chainer, mxnet, numpy, …)
  • anchor – code for “High-Precision Model-Agnostic Explanations” paper
  • aequitas – Bias and Fairness Audit Toolkit
  • Contrastive Explanation – Contrastive Explanation (Foil Trees)
  • yellowbrick – visual analysis and diagnostic tools to facilitate machine learning model selection
  • scikit-plot – an intuitive library to add plotting functionality to scikit-learn objects
  • shap – a unified approach to explain the output of any machine learning model
  • ELI5 – a library for debugging/inspecting machine learning classifiers and explaining their predictions
  • Lime – Explaining the predictions of any machine learning classifier
  • FairML – a Python toolbox for auditing machine learning models for bias
  • L2X – Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation
  • PDPbox – partial dependence plot toolbox
  • pyBreakDown – Python implementation of R package breakDown
  • PyCEbox – Python Individual Conditional Expectation Plot Toolbox
  • Skater – Python Library for Model Interpretation
  • tensorflow/model-analysis – Model analysis tools for TensorFlow
  • themis-ml – a library that implements fairness-aware machine learning algorithms
  • treeinterpreter – interpreting scikit-learn’s decision tree and random forest predictions

Reinforcement Learning

  • OpenAI Gym – a toolkit for developing and comparing reinforcement learning algorithms.

Distributed computing systems

  • PySpark – exposes the Spark programming model to Python
  • Veles – Distributed machine learning platform by Samsung
  • Jubatus – Framework and Library for Distributed Online Machine Learning
  • DMTK – Microsoft Distributed Machine Learning Toolkit
  • PaddlePaddle – PArallel Distributed Deep LEarning by Baidu
  • dask-ml – Distributed and parallel machine learning
  • Distributed – Distributed computation in Python

Probabilistic methods

  • pomegranate – probabilistic and graphical models for Python
  • pyro – a flexible, scalable deep probabilistic programming library built on PyTorch.
  • ZhuSuan – Bayesian Deep Learning
  • PyMC – Bayesian Stochastic Modelling in Python
  • PyMC3 – Python package for Bayesian statistical modeling and Probabilistic Machine Learning
  • sampled – Decorator for reusable models in PyMC3
  • Edward – A library for probabilistic modeling, inference, and criticism.
  • InferPy – Deep Probabilistic Modelling Made Easy
  • GPflow – Gaussian processes in TensorFlow
  • PyStan – Bayesian inference using the No-U-Turn sampler (Python interface)
  • gelato – Bayesian dessert for Lasagne
  • sklearn-bayes – Python package for Bayesian Machine Learning with scikit-learn API
  • bayesloop – Probabilistic programming framework that facilitates objective model selection for time-varying parameter models
  • PyFlux – Open source time series library for Python
  • skggm – estimation of general graphical models
  • pgmpy – a python library for working with Probabilistic Graphical Models.
  • skpro – supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute
  • Aboleth – a bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation
  • PtStat – Probabilistic Programming and Statistical Inference in PyTorch
  • PyVarInf – Bayesian Deep Learning methods with Variational Inference for PyTorch
  • emcee – The Python ensemble sampling toolkit for affine-invariant MCMC
  • hsmmlearn – a library for hidden semi-Markov models with explicit durations
  • pyhsmm – bayesian inference in HSMMs and HMMs
  • GPyTorch – a highly efficient and modular implementation of Gaussian Processes in PyTorch
  • Bayes – Python implementations of Naive Bayes algorithm variants

Genetic Programming

  • gplearn – Genetic Programming in Python
  • DEAP – Distributed Evolutionary Algorithms in Python
  • karoo_gp – A Genetic Programming platform for Python with GPU support
  • monkeys – A strongly-typed genetic programming framework for Python
  • sklearn-genetic – Genetic feature selection module for scikit-learn

Optimization

  • Spearmint – Bayesian optimization
  • SMAC3 – Sequential Model-based Algorithm Configuration
  • Optunity – a library containing various optimizers for hyperparameter tuning.
  • hyperopt – Distributed Asynchronous Hyperparameter Optimization in Python
  • hyperopt-sklearn – hyper-parameter optimization for sklearn
  • sklearn-deap – use evolutionary algorithms instead of gridsearch in scikit-learn
  • sigopt_sklearn – SigOpt wrappers for scikit-learn methods
  • Bayesian Optimization – A Python implementation of global optimization with gaussian processes.
  • SafeOpt – Safe Bayesian Optimization
  • scikit-optimize – Sequential model-based optimization with a scipy.optimize interface
  • Solid – A comprehensive gradient-free optimization framework written in Python
  • PySwarms – A research toolkit for particle swarm optimization in Python
  • Platypus – A Free and Open Source Python Library for Multiobjective Optimization
  • GPflowOpt – Bayesian Optimization using GPflow
  • POT – Python Optimal Transport library
  • Talos – Hyperparameter Optimization for Keras Models
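
Many of the tools above automate hyperparameter search. As a baseline for comparison, here is a standard-library sketch of plain random search. The `objective` function is a made-up stand-in for a validation loss, and the parameter names `lr` and `depth` are hypothetical; none of this reflects the API of any library listed here.

```python
import random


def objective(params):
    """Hypothetical validation loss, lowest at lr=0.1 and depth=6."""
    return (params["lr"] - 0.1) ** 2 + 0.01 * (params["depth"] - 6) ** 2


def random_search(objective, n_trials=200, seed=0):
    """Sample configurations independently and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-3, 0),  # log-uniform over [0.001, 1]
            "depth": rng.randint(2, 10),     # integer range, both ends inclusive
        }
        loss = objective(params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(objective)
print(best_params, best_loss)
```

Bayesian and sequential model-based optimizers aim to reach a comparable result in fewer trials by using earlier evaluations to decide where to sample next.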

Natural Language Processing

  • NLTK – modules, data sets, and tutorials supporting research and development in Natural Language Processing
  • CLTK – The Classical Language Toolkit
  • gensim – Topic Modelling for Humans
  • PSI-Toolkit – a natural language processing toolkit by Adam Mickiewicz University in Poznań
  • pyMorfologik – Python binding for Morfologik (Polish morphological analyzer)
  • skift – scikit-learn wrappers for Python fastText.
  • Phonemizer – Simple text to phonemes converter for multiple languages

Computer Audition

  • librosa – Python library for audio and music analysis
  • Yaafe – Audio features extraction
  • aubio – a library for audio and music analysis
  • Essentia – library for audio and music analysis, description and synthesis
  • LibXtract – a simple, portable, lightweight library of audio feature extraction functions
  • Marsyas – Music Analysis, Retrieval and Synthesis for Audio Signals
  • muda – a library for augmenting annotated audio data
  • madmom – Python audio and music signal processing library

Computer Vision

  • OpenCV – Open Source Computer Vision Library
  • scikit-image – Image Processing SciKit (Toolbox for SciPy)
  • imgaug – image augmentation for machine learning experiments
  • imgaug_extension – additional augmentations for imgaug
  • Augmentor – Image augmentation library in Python for machine learning
  • albumentations – fast image augmentation library and easy to use wrapper around other libraries

Feature engineering

  • Featuretools – automated feature engineering
  • scikit-feature – feature selection repository in python
  • skl-groups – scikit-learn addon to operate on set/“group”-based features
  • Feature Forge – a set of tools for creating and testing machine learning features
  • boruta_py – implementations of the Boruta all-relevant feature selection method
  • BoostARoota – a fast xgboost feature selection algorithm
  • few – a feature engineering wrapper for sklearn
  • scikit-rebate – a scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning
  • scikit-mdr – a sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.
  • tsfresh – Automatic extraction of relevant features from time series

Data manipulation & pipelines

  • pandas – powerful Python data analysis toolkit
  • sklearn-pandas – Pandas integration with sklearn
  • alexander – wrapper that aims to make scikit-learn fully compatible with pandas
  • blaze – NumPy and Pandas interface to Big Data
  • pandasql – allows you to query pandas DataFrames using SQL syntax
  • pandas-gbq – Pandas Google Big Query
  • xpandas – universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute
  • Fuel – data pipeline framework for machine learning
  • Arctic – high performance datastore for time series and tick data
  • pdpipe – easy pipelines for pandas DataFrames.
  • SSPipe – Python pipe (|) operator with support for DataFrames and Numpy and Pytorch
  • meza – a Python toolkit for processing tabular data
  • pandas-ply – functional data manipulation for pandas
  • Dplython – Dplyr for Python
  • pysparkling – a pure Python implementation of Apache Spark’s RDD and DStream interfaces
  • quinn – pyspark methods to enhance developer productivity
  • Dataset – helps you conveniently work with random or sequential batches of your data and define data processing
  • swifter – a package which efficiently applies any function to a pandas dataframe or series in the fastest available manner
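
Several of the packages above build pipeline abstractions on top of pandas. For context, pandas itself offers a basic form of chaining through `DataFrame.pipe`, sketched below. The helper functions and column names are hypothetical and exist only for this example.

```python
import pandas as pd


def drop_missing(df):
    """Remove rows that contain any missing value."""
    return df.dropna()


def add_ratio(df, num, den, name):
    """Return a copy of df with a new column holding num / den."""
    out = df.copy()
    out[name] = out[num] / out[den]
    return out


raw = pd.DataFrame({"clicks": [10, 5, None, 8], "views": [100, 50, 80, 40]})

# Each step receives the previous step's DataFrame as its first argument.
clean = raw.pipe(drop_missing).pipe(add_ratio, num="clicks", den="views", name="ctr")
print(clean)
```

Because each step returns a new DataFrame, the original `raw` frame is left unchanged.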

Statistics

  • statsmodels – statistical modeling and econometrics in Python
  • stockstats – supplies a wrapper, StockDataFrame, based on pandas.DataFrame with inline stock statistics/indicators support.
  • simplestatistics – simple statistical functions implemented in readable Python.
  • weightedcalcs – pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more
  • scikit-posthocs – Pairwise Multiple Comparisons Post-hoc Tests
  • pysie – provides python implementation of statistical inference engine

Experiments tools

  • Sacred – a tool to help you configure, organize, log and reproduce experiments by IDSIA
  • Xcessiv – a web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling
  • Persimmon – A visual dataflow programming language for sklearn

Visualization

  • Matplotlib – plotting with Python
  • seaborn – statistical data visualization using matplotlib
  • Bokeh – Interactive Web Plotting for Python
  • HoloViews – stop plotting your data – annotate your data and let it visualize itself
  • Alphalens – performance analysis of predictive (alpha) stock factors by Quantopian
  • python-ternary – ternary plotting library for python with matplotlib
  • Naarad – framework for performance analysis & rating of sharded & stateful services.

Evaluation

  • kaggle-metrics – Metrics for Kaggle competitions
  • Metrics – machine learning evaluation metric
  • sklearn-evaluation – scikit-learn model evaluation made easy: plots, tables and markdown reports

Computations

  • numpy – the fundamental package needed for scientific computing with Python.
  • Dask – parallel computing with task scheduling
  • bottleneck – Fast NumPy array functions written in C
  • minpy – NumPy interface with mixed backend execution
  • CuPy – NumPy-like API accelerated with CUDA
  • scikit-tensor – Python library for multilinear algebra and tensor factorizations
  • numdifftools – solve automatic numerical differentiation problems in one or more variables
  • quaternion – Add built-in support for quaternions to numpy
  • adaptive – Tools for adaptive and parallel sampling of mathematical functions

Spatial analysis

  • GeoPandas – Python tools for geographic data
  • PySAL – Python Spatial Analysis Library

Quantum Computing

  • QML – a Python Toolkit for Quantum Machine Learning

Conversion

  • sklearn-porter – transpile trained scikit-learn estimators to C, Java, JavaScript and others
  • ONNX – Open Neural Network Exchange
  • MMdnn – a set of tools to help users inter-operate among different deep learning frameworks.

See Also

Ref
https://skymind.ai