Resources

Python

Books

Algorithms and Design Patterns in Python

Python implementation of algorithms and design patterns.

Compatibility

Libraries for migrating from Python 2 to 3.

  • Python-Future – The missing compatibility layer between Python 2 and Python 3.
  • Python-Modernize – Modernizes Python code for eventual Python 3 migration.
  • Six – Python 2 and 3 compatibility utilities.

Cluster Computing

Frameworks and libraries for Cluster Computing.

  • PySpark – Apache Spark Python API.
  • dask – A flexible parallel computing library for analytic computing.
  • faust – A stream processing library, porting the ideas from Kafka Streams to Python.
  • luigi – A module that helps you build complex pipelines of batch jobs.
  • mrjob – Run MapReduce jobs on Hadoop or Amazon Web Services.
  • streamparse – Run Python code against real-time streams of data via Apache Storm.

Computer Vision

Libraries for computer vision.

Concurrency and Parallelism

Libraries for concurrent and parallel execution.

  • concurrent.futures – (Python standard library) A high-level interface for asynchronously executing callables.
  • multiprocessing – (Python standard library) Process-based "threading" interface.
  • eventlet – Asynchronous framework with WSGI support.
  • gevent – A coroutine-based Python networking library that uses greenlet.
  • SCOOP – Scalable Concurrent Operations in Python.
  • Tomorrow – Magic decorator syntax for asynchronous code.
  • uvloop – Ultra fast implementation of asyncio event loop on top of libuv.
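
As a rough illustration of what "asynchronously executing callables" means for concurrent.futures, here is a minimal sketch using only the standard library. The function names square and square_all are hypothetical and exist only for this example; the pool size of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n: int) -> int:
    """Hypothetical stand-in for a slow, independent unit of work."""
    return n * n


def square_all(values):
    """Run square() over values on a small thread pool."""
    # Executor.map yields results in input order, regardless of
    # which task happens to finish first.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(square, values))


if __name__ == "__main__":
    print(square_all([1, 2, 3, 4]))
```

For CPU-bound work, swapping in ProcessPoolExecutor (same interface) is usually the more suitable choice, since threads share one interpreter.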

Deep Learning

Frameworks for Neural Networks and Deep Learning. See: awesome-deep-learning.

  • TensorFlow – The most popular Deep Learning framework created by Google.
  • Caffe – A fast open framework for deep learning.
  • Keras – A high-level neural networks library capable of running on top of either TensorFlow or Theano.
  • MXNet – A deep learning framework designed for both efficiency and flexibility.
  • Neupy – Running and testing different Artificial Neural Networks algorithms.
  • PyTorch – Tensors and dynamic neural networks in Python with strong GPU acceleration.
  • Serpent.AI – Game agent framework. Use any video game as a deep learning sandbox.
  • Theano – A library for fast numerical computation.

Machine Learning

Libraries for Machine Learning. See: awesome-machine-learning.

  • H2O – Open Source Fast Scalable Machine Learning Platform.
  • Metrics – Machine learning evaluation metrics.
  • NuPIC – Numenta Platform for Intelligent Computing.
  • scikit-learn – The most popular Python library for Machine Learning.
  • Spark ML – Apache Spark's scalable Machine Learning library.
  • vowpal_porpoise – A lightweight Python wrapper for Vowpal Wabbit.
  • xgboost – A scalable, portable, and distributed gradient boosting library.

Data Analysis

Libraries for data analysis.

  • Pandas – A library providing high-performance, easy-to-use data structures and data analysis tools.
  • Blaze – NumPy and Pandas interface to Big Data.
  • Open Mining – Business Intelligence (BI) in Pandas interface.
  • Orange – Data mining, data visualization, analysis and machine learning through visual programming or scripts.
  • Optimus – Cleansing, pre-processing, feature engineering, exploratory data analysis and easy Machine Learning with a PySpark backend.

Data Visualization

Libraries for visualizing data. See: awesome-javascript.

  • Altair – Declarative statistical visualization library for Python.
  • Bokeh – Interactive Web Plotting for Python.
  • bqplot – Interactive Plotting Library for the Jupyter Notebook.
  • ggplot – Same API as ggplot2 for R.
  • Matplotlib – A Python 2D plotting library.
  • Pygal – A Python SVG Charts Creator.
  • PyGraphviz – Python interface to Graphviz.
  • PyQtGraph – Interactive and realtime 2D/3D/Image plotting and science/engineering widgets.
  • Seaborn – Statistical data visualization using Matplotlib.
  • VisPy – High-performance scientific visualization based on OpenGL.

Database

Databases implemented in Python.

  • pickleDB – A simple and lightweight key-value store for Python.
  • TinyDB – A tiny, document-oriented database.
  • ZODB – A native object database for Python. A key-value and object graph database.

Database Drivers

Libraries for connecting and operating databases.

  • MySQL – awesome-mysql
    • mysqlclient – MySQL connector with Python 3 support (mysql-python fork).
    • oursql – A better MySQL connector with support for native prepared statements and BLOBs.
    • PyMySQL – A pure Python MySQL driver compatible with mysql-python.
  • PostgreSQL – awesome-postgres
    • psycopg2 – The most popular PostgreSQL adapter for Python.
    • queries – A wrapper of the psycopg2 library for interacting with PostgreSQL.
    • txpostgres – Twisted based asynchronous driver for PostgreSQL.
  • Other Relational Databases
    • apsw – Another Python SQLite wrapper.
    • dataset – Store Python dicts in a database – works with SQLite, MySQL, and PostgreSQL.
    • pymssql – A simple database interface to Microsoft SQL Server.
  • NoSQL Databases
    • cassandra-driver – The Python Driver for Apache Cassandra.
    • HappyBase – A developer-friendly library for Apache HBase.
    • kafka-python – The Python client for Apache Kafka.
    • py2neo – Python wrapper client for Neo4j's RESTful interface.
    • PyMongo – The official Python client for MongoDB.
    • redis-py – The Python client for Redis.
  • Asynchronous Clients
    • Motor – The async Python driver for MongoDB.
    • telephus – Twisted based client for Cassandra.
    • txRedis – Twisted based client for Redis.

Documentation

Libraries for generating project documentation.

  • Sphinx – Python Documentation generator.
  • MkDocs – Markdown friendly documentation generator.
  • pdoc – Epydoc replacement to auto generate API documentation for Python libraries.
  • Pycco – The literate-programming-style documentation generator.

Environment Management

Libraries for Python version and environment management.

  • Pipenv – Sacred Marriage of Pipfile, Pip, & Virtualenv.
  • p – Dead simple interactive Python version management.
  • pyenv – Simple Python version management.
  • venv – (Python standard library in Python 3.3+) Creating lightweight virtual environments.
  • virtualenv – A tool to create isolated Python environments.
  • virtualenvwrapper – A set of extensions to virtualenv.

HTML Manipulation

Libraries for working with HTML and XML.

  • BeautifulSoup – Providing Pythonic idioms for iterating, searching, and modifying HTML or XML.
  • bleach – A whitelist-based HTML sanitization and text linkification library.
  • cssutils – A CSS library for Python.
  • html5lib – A standards-compliant library for parsing and serializing HTML documents and fragments.
  • lxml – A very fast, easy-to-use and versatile library for handling HTML and XML.
  • MarkupSafe – Implements an XML/HTML/XHTML markup-safe string for Python.
  • pyquery – A jQuery-like library for parsing HTML.
  • untangle – Converts XML documents to Python objects for easy access.
  • WeasyPrint – A visual rendering engine for HTML and CSS that can export to PDF.
  • xmldataset – Simple XML Parsing.
  • xmltodict – Makes working with XML feel like working with JSON.

HTTP

Libraries for working with HTTP.

  • grequests – requests + gevent for asynchronous HTTP requests.
  • httplib2 – Comprehensive HTTP client library.
  • requests – HTTP Requests for Humans.
  • treq – A Python requests-like API built on top of Twisted's HTTP client.
  • urllib3 – An HTTP library with thread-safe connection pooling and file post support; sanity friendly.

Natural Language Processing

Libraries for working with human languages.

  • gensim – Topic Modelling for Humans.
  • Jieba – Chinese text segmentation.
  • langid.py – Stand-alone language identification system.
  • NLTK – A leading platform for building Python programs to work with human language data.
  • Pattern – A web mining module for Python.
  • polyglot – Natural language pipeline supporting hundreds of languages.
  • SnowNLP – A library for processing Chinese text.
  • spaCy – A library for industrial-strength natural language processing in Python and Cython.
  • TextBlob – Providing a consistent API for diving into common NLP tasks.
  • PyTorch-NLP – A toolkit enabling rapid deep learning NLP prototyping for research.

Networking

Libraries for network programming.

  • asyncio – (Python standard library) Asynchronous I/O, event loop, coroutines and tasks.
  • diesel – Greenlet-based event I/O Framework for Python.
  • pulsar – Event-driven concurrent framework for Python.
  • pyzmq – A Python wrapper for the ZeroMQ message library.
  • Twisted – An event-driven networking engine.
  • txZMQ – Twisted based wrapper for the ZeroMQ message library.
  • NAPALM – Cross-vendor API to manipulate network devices.
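
To make the asyncio entry above more concrete, here is a minimal sketch of coroutines running concurrently on an event loop. The names fetch_label, gather_labels and run_demo are hypothetical, and asyncio.sleep stands in for real network I/O.

```python
import asyncio


async def fetch_label(label: str, delay: float) -> str:
    """Hypothetical coroutine; the sleep simulates waiting on I/O."""
    await asyncio.sleep(delay)
    return label.upper()


async def gather_labels():
    # Both coroutines wait at the same time; gather() returns the
    # results in the order the awaitables were passed in.
    return await asyncio.gather(
        fetch_label("a", 0.02),
        fetch_label("b", 0.01),
    )


def run_demo():
    # asyncio.run() creates an event loop, runs the coroutine, and closes the loop.
    return asyncio.run(gather_labels())


if __name__ == "__main__":
    print(run_demo())
```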

Package Management

Libraries for package and dependency management.

  • pip – The Python package and dependency manager.
  • conda – Cross-platform, Python-agnostic binary package manager.
  • Curdling – Curdling is a command line tool for managing Python packages.
  • pip-tools – A set of tools to keep your pinned Python dependencies fresh.
  • wheel – The new standard of Python distribution, intended to replace eggs.

Queue

Libraries for working with event and task queues.

  • celery – An asynchronous task queue/job queue based on distributed message passing.
  • huey – Little multi-threaded task queue.
  • mrq – Mr. Queue – A distributed worker task queue in Python using Redis & gevent.
  • rq – Simple job queues for Python.
  • simpleq – A simple, infinitely scalable, Amazon SQS based queue.

Recommender Systems

Libraries for building recommender systems.

  • annoy – Approximate Nearest Neighbors in C++/Python optimized for memory usage.
  • fastFM – A library for Factorization Machines.
  • implicit – A fast Python implementation of collaborative filtering for implicit datasets.
  • libffm – A library for Field-aware Factorization Machine (FFM).
  • LightFM – A Python implementation of a number of popular recommendation algorithms.
  • Spotlight – Deep recommender models using PyTorch.
  • surprise – A scikit for building and analyzing recommender systems.
  • TensorRec – A Recommendation Engine Framework in TensorFlow.

RESTful API

Libraries for developing RESTful APIs.

  • Django
  • Flask
    • eve – REST API framework powered by Flask, MongoDB and good intentions.
    • flask-api-utils – Taking care of API representation and authentication for Flask.
    • flask-api – Browsable Web APIs for Flask.
    • flask-restful – Quickly building REST APIs for Flask.
    • flask-restless – Generating RESTful APIs for database models defined with SQLAlchemy.
  • Pyramid
    • cornice – A RESTful framework for Pyramid.
  • Framework agnostic
    • falcon – A high-performance framework for building cloud APIs and web app backends.
    • hug – A Python3 framework for cleanly exposing APIs over HTTP and the Command Line with automatic documentation and validation.
    • restless – Framework agnostic REST framework based on lessons learned from Tastypie.
    • ripozo – Quickly creating REST/HATEOAS/Hypermedia APIs.
    • sandman – Automated REST APIs for existing database-driven systems.
    • apistar – A smart Web API framework, designed for Python 3.

Science

Libraries for scientific computing.

  • astropy – A community Python library for Astronomy.
  • bcbio-nextgen – Providing best-practice pipelines for fully automated high throughput sequencing analysis.
  • bccb – Collection of useful code related to biological analysis.
  • Biopython – Biopython is a set of freely available tools for biological computation.
  • cclib – A library for parsing and interpreting the results of computational chemistry packages.
  • Colour – A colour science package implementing a comprehensive number of colour theory transformations and algorithms.
  • NetworkX – A high-productivity software for complex networks.
  • NIPY – A collection of neuroimaging toolkits.
  • NumPy – A fundamental package for scientific computing with Python.
  • Open Babel – A chemical toolbox designed to speak the many languages of chemical data.
  • ObsPy – A Python toolbox for seismology.
  • PyDy – Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion.
  • PyMC – Markov Chain Monte Carlo sampling toolkit.
  • QuTiP – Quantum Toolbox in Python.
  • RDKit – Cheminformatics and Machine Learning Software.
  • SciPy – A Python-based ecosystem of open-source software for mathematics, science, and engineering.
  • statsmodels – Statistical modeling and econometrics in Python.
  • SymPy – A Python library for symbolic mathematics.
  • Zipline – A Pythonic algorithmic trading library.
  • SimPy – A process-based discrete-event simulation framework.

Search

Libraries and software for indexing and performing search queries on data.

Serialization

Libraries for serializing complex data types.

  • marshmallow – marshmallow is an ORM/ODM/framework-agnostic library for converting complex datatypes, such as objects, to and from native Python datatypes.

Serverless Frameworks

Frameworks for developing serverless Python code.

  • apex – Build, deploy, and manage AWS Lambda functions with ease.
  • python-lambda – A toolkit for developing and deploying Python code in AWS Lambda.
  • Zappa – A tool for deploying WSGI applications on AWS Lambda and API Gateway.

Document Manipulation

Libraries for parsing and manipulating specific text formats.

  • General
    • tablib – A module for Tabular Datasets in XLS, CSV, JSON, YAML.
  • Office
    • Marmir – Takes Python data structures and turns them into spreadsheets.
    • openpyxl – A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
    • pyexcel – Providing one API for reading, manipulating and writing csv, ods, xls, xlsx and xlsm files.
    • python-docx – Reads, queries and modifies Microsoft Word 2007/2008 docx files.
    • python-pptx – Python library for creating and updating PowerPoint (.pptx) files.
    • relatorio – Templating OpenDocument files.
    • unoconv – Convert between any document format supported by LibreOffice/OpenOffice.
    • XlsxWriter – A Python module for creating Excel .xlsx files.
    • xlwings – A BSD-licensed library that makes it easy to call Python from Excel and vice versa.
    • xlwt / xlrd – Writing and reading data and formatting information from Excel files.
  • PDF
    • PDFMiner – A tool for extracting information from PDF documents.
    • PyPDF2 – A library capable of splitting, merging and transforming PDF pages.
    • ReportLab – Allowing Rapid creation of rich PDF documents.
  • Markdown
    • Mistune – The fastest, full-featured pure Python Markdown parser.
    • Python-Markdown – A Python implementation of John Gruber’s Markdown.
  • YAML
    • PyYAML – YAML implementations for Python.
  • CSV
    • csvkit – Utilities for converting to and working with CSV.
  • Archive
    • unp – A command line tool that can unpack archives easily.

Testing

Libraries for testing codebases and generating test data.

  • Testing Frameworks
    • hypothesis – Hypothesis is an advanced Quickcheck style property based testing library.
    • mamba – The definitive testing tool for Python. Born under the banner of BDD.
    • nose – A nicer unittest for Python.
    • nose2 – The successor to nose, based on unittest2.
    • pytest – A mature full-featured Python testing tool.
    • Robot Framework – A generic test automation framework.
    • unittest – (Python standard library) Unit testing framework.
  • Test Runners
    • green – A clean, colorful test runner.
    • tox – Auto builds and tests distributions in multiple Python versions.
  • GUI / Web Testing
    • locust – Scalable user load testing tool written in Python.
    • PyAutoGUI – PyAutoGUI is a cross-platform GUI automation Python module for human beings.
    • Selenium – Python bindings for Selenium WebDriver.
    • sixpack – A language-agnostic A/B Testing framework.
    • splinter – Open source tool for testing web applications.
  • Mock
    • doublex – Powerful test doubles framework for Python.
    • freezegun – Travel through time by mocking the datetime module.
    • httmock – A mocking library for requests for Python 2.6+ and 3.2+.
    • httpretty – HTTP request mock tool for Python.
    • mock – (Python standard library, available as unittest.mock since Python 3.3) A mocking and patching library.
    • Mocket – Socket Mock Framework plus HTTP[S]/asyncio/gevent mocking library with recording/replaying capability.
    • responses – A utility library for mocking out the requests Python library.
    • VCR.py – Record and replay HTTP interactions on your tests.
  • Object Factories
    • factory_boy – A test fixtures replacement for Python.
    • mixer – Another fixtures replacement. Supports Django, Flask, SQLAlchemy, Peewee, etc.
    • model_mommy – Creating random fixtures for testing in Django.
  • Code Coverage
    • coverage – Code coverage measurement.
  • Fake Data
    • mimesis – A Python library that helps you generate fake data.
    • fake2db – Fake database generator.
    • faker – A Python package that generates fake data.
    • radar – Generate random datetime / time.
  • Error Handler
    • FuckIt.py – FuckIt.py uses state-of-the-art technology to make sure your Python code runs whether it has any right to or not.
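
As a small sketch of the mocking approach listed under Mock above, the standard library's unittest.mock can replace a collaborator with a test double. The function describe_status, the "/status" path, and the client object are all hypothetical and exist only for illustration.

```python
from unittest import mock


def describe_status(client) -> str:
    """Hypothetical code under test: asks a client for a status page."""
    response = client.get("/status")
    return "up" if response.status_code == 200 else "down"


def check_with_fake_client(status_code: int = 200) -> str:
    # Build a test double whose get() returns a canned response object.
    fake_client = mock.Mock()
    fake_client.get.return_value = mock.Mock(status_code=status_code)

    result = describe_status(fake_client)

    # Verify the code under test made exactly the expected call.
    fake_client.get.assert_called_once_with("/status")
    return result


if __name__ == "__main__":
    print(check_with_fake_client())
```

No network access is needed, which is the main reason to reach for a mock in a unit test.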

Text Processing

Libraries for parsing and manipulating plain text.

  • General
    • chardet – Python 2/3 compatible character encoding detector.
    • difflib – (Python standard library) Helpers for computing deltas.
    • ftfy – Makes Unicode text less broken and more consistent automagically.
    • fuzzywuzzy – Fuzzy String Matching.
    • Levenshtein – Fast computation of Levenshtein distance and string similarity.
    • pangu.py – Spacing texts for CJK and alphanumerics.
    • pyfiglet – An implementation of figlet written in Python.
    • pypinyin – Convert Chinese hanzi to pinyin.
    • shortuuid – A generator library for concise, unambiguous and URL-safe UUIDs.
    • textdistance – Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.
    • unidecode – ASCII transliterations of Unicode text.
    • uniout – Print readable chars instead of the escaped string.
    • xpinyin – A library to translate Chinese hanzi (漢字) to pinyin (拼音).
  • Slugify
    • awesome-slugify – A Python slugify library that can preserve unicode.
    • python-slugify – A Python slugify library that translates unicode to ASCII.
    • unicode-slugify – A slugifier that generates unicode slugs with Django as a dependency.
  • Parser
    • phonenumbers – Parsing, formatting, storing and validating international phone numbers.
    • PLY – Implementation of lex and yacc parsing tools for Python.
    • Pygments – A generic syntax highlighter.
    • pyparsing – A general purpose framework for generating parsers.
    • python-nameparser – Parsing human names into their individual components.
    • python-user-agents – Browser user agent parser.
    • sqlparse – A non-validating SQL parser.
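
The difflib entry above ("helpers for computing deltas") can be tried without installing anything. The following is a minimal sketch; the helper names similarity and closest are hypothetical wrappers around two real difflib calls.

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the strings are identical."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def closest(word: str, candidates: list) -> list:
    """Return at most one candidate that best matches word."""
    # get_close_matches drops candidates scoring below its default cutoff of 0.6.
    return difflib.get_close_matches(word, candidates, n=1)


if __name__ == "__main__":
    print(similarity("kitten", "sitting"))
    print(closest("appel", ["ape", "apple", "peach"]))
```

Third-party options from the list, such as fuzzywuzzy or Levenshtein, cover similar ground with different scoring and performance trade-offs.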

Third-party APIs

Libraries for accessing third-party service APIs. See: List of Python API Wrappers and Libraries.

  • apache-libcloud – One Python library for all clouds.
  • boto3 – Python interface to Amazon Web Services.
  • django-wordpress – WordPress models and views for Django.
  • facebook-sdk – Facebook Platform Python SDK.
  • facepy – Facepy makes it really easy to interact with Facebook's Graph API.
  • gmail – A Pythonic interface for Gmail.
  • google-api-python-client – Google APIs Client Library for Python.
  • gspread – Google Spreadsheets Python API.
  • twython – A Python wrapper for the Twitter API.

Audio

Libraries for manipulating audio.

  • audiolazy – Expressive Digital Signal Processing (DSP) package for Python.
  • audioread – Cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding.
  • beets – A music library manager and MusicBrainz tagger.
  • eyeD3 – A tool for working with audio files, specifically MP3 files containing ID3 metadata.
  • id3reader – A Python module for reading MP3 meta data.
  • m3u8 – A module for parsing m3u8 files.
  • mingus – An advanced music theory and notation package with MIDI file and playback support.
  • mutagen – A Python module to handle audio metadata.
  • pyAudioAnalysis – Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications.
  • pydub – Manipulate audio with a simple and easy high level interface.
  • pyechonest – Python client for the Echo Nest API.
  • talkbox – A Python library for speech/signal processing.
  • TimeSide – Open web audio processing framework.
  • tinytag – A library for reading music meta data of MP3, OGG, FLAC and Wave files.

Video

Libraries for manipulating video and GIFs.

  • moviepy – A module for script-based movie editing with many formats, including animated GIFs.
  • scikit-video – Video processing routines for SciPy.

WSGI Servers

WSGI-compatible web servers.

  • bjoern – Asynchronous, very fast and written in C.
  • fapws3 – Asynchronous (network side only), written in C.
  • gunicorn – Pre-forked, partly written in C.
  • meinheld – Asynchronous, partly written in C.
  • netius – Asynchronous, very fast.
  • rocket – Multi-threaded.
  • uWSGI – A project that aims to develop a full stack for building hosting services, written in C.
  • waitress – Multi-threaded, powers Pyramid.
  • Werkzeug – A WSGI utility library for Python that powers Flask and can easily be embedded into your own projects.

Web Content Extracting

Libraries for extracting web content.

  • Haul – An Extensible Image Crawler.
  • html2text – Convert HTML to Markdown-formatted text.
  • lassie – Web Content Retrieval for Humans.
  • micawber – A small library for extracting rich content from URLs.
  • newspaper – News extraction, article extraction and content curation in Python.
  • python-goose – HTML Content/Article Extractor.
  • python-readability – Fast Python port of arc90’s readability tool.
  • requests-html – Pythonic HTML Parsing for Humans.
  • sanitize – Bringing sanity to the world of messed-up data.
  • sumy – A module for automatic summarization of text documents and HTML pages.
  • textract – Extract text from any document, Word, PowerPoint, PDFs, etc.
  • toapi – Every web site provides APIs.

Web Crawling & Web Scraping

Libraries to automate data extraction from websites.

  • cola – A distributed crawling framework.
  • Demiurge – PyQuery-based scraping micro-framework.
  • feedparser – Universal feed parser.
  • Grab – Site scraping framework.
  • MechanicalSoup – A Python library for automating interaction with websites.
  • portia – Visual scraping for Scrapy.
  • pyspider – A powerful spider system.
  • RoboBrowser – A simple, Pythonic library for browsing the web without a standalone web browser.
  • Scrapy – A fast high-level screen scraping and web crawling framework.

Web Frameworks

Full stack web frameworks.

  • Django – The most popular web framework in Python.
  • Flask – A microframework for Python.
  • Pyramid – A small, fast, down-to-earth, open source Python web framework.
  • Sanic – Web server that’s written to go fast.
  • Tornado – A Web framework and asynchronous networking library.
  • Vibora – Fast, efficient and asynchronous Web framework inspired by Flask.

WebSocket

Libraries for working with WebSocket.

Resources

Twitter

Websites

Github
