Cross-platform Python databases

Introduction

Recently I have been looking for a way to save the state of my Python scripts and return to it freely later. I came up with the idea of using an embedded database. An additional criterion was that it should be easily transferable between environments. I tried a few modules, but in the end I didn't know which of them was the fastest. That's why I created a benchmark of embedded databases that can be installed directly with pip and used easily from code alone.
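To make the idea concrete, here is a minimal sketch of saving and restoring script state with an embedded key/value store. It uses dbm.dumb from the standard library (one of the modules tested below) so that nothing has to be installed. The helper names save_state and load_state, the JSON encoding of values, and the file name are assumptions made for illustration, not part of the benchmark.

```python
import dbm.dumb
import json
import os
import tempfile


def save_state(path, state):
    """Write every entry of the `state` dict to the database at `path`."""
    with dbm.dumb.open(path, "c") as db:  # "c": create the files if missing
        for key, value in state.items():
            db[key] = json.dumps(value)  # values are stored as JSON text


def load_state(path):
    """Read every entry of the database at `path` back into a dict."""
    with dbm.dumb.open(path, "r") as db:
        # dbm returns keys and values as bytes, so decode them on the way out.
        return {key.decode("utf-8"): json.loads(db[key]) for key in db.keys()}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        db_path = os.path.join(workdir, "script_state")  # hypothetical name
        save_state(db_path, {"last_page": 42, "done": ["a", "b"]})
        print(load_state(db_path))
```

Because the data lives in ordinary files next to the script, there is no background service to set up. Keys must be strings here, since they are written as-is and decoded as UTF-8 on the way back.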

Database libraries

I found a lot of great Redis-like key/value databases, but most of them didn't meet my requirements. I wanted the database to be as portable as possible. That's why I looked for database modules that are fully integrated into Python code: you can install them with pip alone, without running any external services in the background. These are the modules I tested:

  • vedis-python
  • unqlite-python
  • sqlitedict
  • tinydb
  • pickledb
  • semidbm
  • dbm.dumb

For comparison, I also used a Python client connected to a Redis instance running in Docker:

  • redis-py

Tests were made with:

  • macOS Mojave 10.14.6
  • Python 3.7

My code is a fork of a benchmark found on the https://charlesleifer.com blog. The databases tested there are not easy enough to set up as a cross-platform solution: there is always something that does not work out of the box in a non-Unix environment, and vice versa. That's why I came up with my own list. Let's see whether these databases are worth considering for your projects at all.

For the tests I recorded the time needed to write 10K key/value pairs and then the time spent reading all the values back. Then I repeated the test with 100K pairs, but I had to exclude the two slowest modules (spoiler: tinydb and pickledb).
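The measurement described above can be sketched roughly as follows. This is not the code from the gist, only an illustration of the write-then-read timing loop. The bench function, the key and value formats, and the file name are assumptions, and dbm.dumb stands in for any of the dict-like modules from the list.

```python
import dbm.dumb
import os
import tempfile
import time


def bench(open_db, n):
    """Return (write_seconds, read_seconds) for n key/value pairs.

    `open_db` is any zero-argument callable returning a dict-like
    object that also has a close() method.
    """
    db = open_db()
    try:
        start = time.perf_counter()
        for i in range(n):
            db[f"k{i}"] = f"v{i}"
        write_seconds = time.perf_counter() - start

        start = time.perf_counter()
        for i in range(n):
            db[f"k{i}"]  # the lookup itself is what gets timed
        read_seconds = time.perf_counter() - start
    finally:
        db.close()
    return write_seconds, read_seconds


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "bench_db")  # hypothetical file name
        # "n": always start from a new, empty database
        w, r = bench(lambda: dbm.dumb.open(path, "n"), 10_000)
        print(f"dbm.dumb write: {w:.3f}s read: {r:.3f}s")
```

Passing the opener as a callable keeps the timing loop identical for every module, so only the line that opens the database changes between runs.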

The results

The ranking is the same for 10K and 100K pairs. It looks like Vedis (#1) and UnQLite (#2) are still the quickest embedded cross-platform databases for Python. Semidbm also did very well (#3). dbm.dumb (#4) and SQLiteDict (#5) are no speed demons. Tinydb (#7) and Pickledb (#8) were so slow that I had to remove them from the graph to keep it readable. The Redis results (#6) were also below expectations (write: 18 s, read: 20 s), but that might be because I ran the Docker instance on my Mac. My comparison is probably just as unscientific as the original one.

Graph

Benchmark

I share my benchmark as a gist here. Please feel free to run the tests yourself or extend it with other libraries.