NovelEssay.com Programming Blog

Exploration of Big Data, Machine Learning, Natural Language Processing, and other fun problems.

Fast Persistent Key Value Pairs in C# with LevelDb



Let's say we want to crawl the internet, but we don't want to request any given URL more than once. We need a collection of URL keys that we can look up quickly. Key-value pairs would be even better: each URL key can then carry a value (the time of the last request, for example) in case we change our minds and want to allow a URL to be requested again every X days. We want the store to handle billions of records and be really fast (and free). This article shows how to accomplish that using LevelDb and its C# wrapper.


To begin, start a Visual Studio C# project and download the LevelDb.Net NuGet package. There are a few different ones, but this is my favorite.


You can also find LevelDb.Net on GitHub:

https://github.com/AntShares/leveldb


First, I'm going to show how to use LevelDb from C#. Later in this article, we'll insert and select a large number of records to test its speed.


Let's create a LevelDb:

            Options levelDbOptions = new Options();
            levelDbOptions.CreateIfMissing = true;
            LevelDB.DB levelDb = LevelDB.DB.Open("myLevelDb.dat", levelDbOptions);

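Next, we'll insert some keys. Note that Key1 is written three times; each Put overwrites the previous value for that key, so only the last one survives: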

            levelDb.Put(LevelDB.WriteOptions.Default, "Key1", "Value1");
            levelDb.Put(LevelDB.WriteOptions.Default, "Key1", "Value2");
            levelDb.Put(LevelDB.WriteOptions.Default, "Key1", "Value3");
            levelDb.Put(LevelDB.WriteOptions.Default, "Key2", "Value2");

Then, we'll select some keys:

            LevelDB.Slice outputValue;
            if (levelDb.TryGet(LevelDB.ReadOptions.Default, "Key2", out outputValue))
            {
                Console.WriteLine("Key2: Value = " + outputValue.ToString());// Expect: Value2
            }
            if (levelDb.TryGet(LevelDB.ReadOptions.Default, "Key1", out outputValue))
            {
                Console.WriteLine("Key1: Value = " + outputValue.ToString()); // Expect: Value3
            }
            if (!levelDb.TryGet(LevelDB.ReadOptions.Default, "KeyXYZ", out outputValue))
            {
                Console.WriteLine("KeyXYZ: NOT FOUND.");
            }

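Under the hood, LevelDb stores keys and values as raw byte arrays, and the wrapper accepts many different types for both (string, int, float, byte[], and so on).

To recap, the whole workflow is three steps: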

  1. Open instance handle.
  2. Insert = Put
  3. Select = TryGet

That's it! 
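Going back to the crawler scenario from the introduction, here is a minimal sketch of the "allow a URL to be requested again every X days" idea. It makes several assumptions: a plain in-memory Dictionary stands in for the LevelDb handle so the sample runs without the package, the value stored for each URL is the last-crawl time as UTC ticks in a string, and the names RecrawlDemo, ShouldCrawl and MarkCrawled are made up for illustration. With the real store, the dictionary lookup would become a TryGet call and the assignment would become a Put call.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class RecrawlDemo
{
    // Hypothetical stand-in for the persistent store: URL -> last-crawl time (UTC ticks as a string).
    static readonly Dictionary<string, string> store = new Dictionary<string, string>();

    // True if the URL was never crawled, or was last crawled at least maxAge ago.
    public static bool ShouldCrawl(string url, DateTime nowUtc, TimeSpan maxAge)
    {
        string stored;
        if (!store.TryGetValue(url, out stored))
        {
            return true; // key not found: never requested before
        }
        long ticks = long.Parse(stored, CultureInfo.InvariantCulture);
        DateTime lastCrawl = new DateTime(ticks, DateTimeKind.Utc);
        return nowUtc - lastCrawl >= maxAge;
    }

    // Record (or overwrite) the last-crawl time for the URL.
    public static void MarkCrawled(string url, DateTime nowUtc)
    {
        store[url] = nowUtc.Ticks.ToString(CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        string url = "http://example.com/";
        DateTime now = new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        TimeSpan maxAge = TimeSpan.FromDays(7);

        Console.WriteLine(ShouldCrawl(url, now, maxAge));            // True: never seen
        MarkCrawled(url, now);
        Console.WriteLine(ShouldCrawl(url, now.AddDays(1), maxAge)); // False: crawled 1 day ago
        Console.WriteLine(ShouldCrawl(url, now.AddDays(8), maxAge)); // True: older than 7 days
    }
}
```

Storing a timestamp rather than a constant flag is what makes the "every X days" policy possible later without rewriting the existing records.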

But how fast is it?

Let's build a collection of MD5 hash keys and insert them:

            // Build 500,000 distinct 32-character keys: the lowercase hex MD5 hash of "0", "1", "2", ...
            List<string> seedHashes = new List<string>();
            using (MD5 md5 = MD5.Create()) // one hash object, reused for every key
            {
                for (int idx = 0; idx < 500000; idx++)
                {
                    byte[] input = Encoding.UTF8.GetBytes(idx.ToString());
                    byte[] hash = md5.ComputeHash(input);
                    string encoded = BitConverter.ToString(hash).Replace("-", string.Empty).ToLower();
                    seedHashes.Add(encoded);
                }
            }

            // Start Insert Speed Tests
            Stopwatch stopWatch = new Stopwatch();
            stopWatch.Start();
            foreach(var key in seedHashes)
            {
                levelDb.Put(LevelDB.WriteOptions.Default, key, "1");
            }
            stopWatch.Stop();
            Console.WriteLine("LevelDb Inserts took (ms) " + stopWatch.ElapsedMilliseconds);


Next, let's look up every key we just inserted, ten times over (5 million lookups in total):

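            // Start Lookup Speed Tests
            // Restart() zeroes the timer first; a plain Start() would resume it and add the insert time to the lookup total.
            stopWatch.Restart();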
            for (int loopIndex = 0; loopIndex < 10; loopIndex++)
            {
                for(int seedIndex = 0; seedIndex < seedHashes.Count; seedIndex++)
                {
                    if (!levelDb.TryGet(LevelDB.ReadOptions.Default, seedHashes[seedIndex], out outputValue))
                    {
                        Console.WriteLine("ERROR: Key Not Found: " + seedHashes[seedIndex]);
                    }
                }
            }
            stopWatch.Stop();
            Console.WriteLine("LevelDb Lookups took (ms) " + stopWatch.ElapsedMilliseconds);
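Raw millisecond totals are easier to compare once they are converted to operations per second. The helper below is a small sketch of that arithmetic; the name OpsPerSecond is hypothetical, and the figures passed in are round numbers chosen for illustration rather than measured output.

```csharp
using System;

static class ThroughputDemo
{
    // Convert an operation count and an elapsed time in milliseconds to operations per second.
    public static double OpsPerSecond(long operations, long elapsedMilliseconds)
    {
        if (elapsedMilliseconds <= 0)
        {
            throw new ArgumentOutOfRangeException("elapsedMilliseconds", "Elapsed time must be positive.");
        }
        return operations * 1000.0 / elapsedMilliseconds;
    }

    static void Main()
    {
        // Illustrative round numbers: 500,000 operations in 60 s, 5,000,000 operations in 120 s.
        Console.WriteLine(OpsPerSecond(500000, 60000).ToString("F0"));   // 8333
        Console.WriteLine(OpsPerSecond(5000000, 120000).ToString("F0")); // 41667
    }
}
```

In the test program, the same helper could be fed seedHashes.Count and stopWatch.ElapsedMilliseconds after each timed section.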

On my junky 4-year-old desktop, 500,000 inserts took just under 60 seconds and 5 million selects took just over 2 minutes. Here's the program output:


The complete code sample is below:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using LevelDB;
using System.Security.Cryptography;
using System.Diagnostics;

namespace LevelDbExample
{
    class Program
    {
        static void Main(string[] args)
        {

            Options levelDbOptions = new Options();
            levelDbOptions.CreateIfMissing = true;
            LevelDB.DB levelDb = LevelDB.DB.Open("myLevelDb.dat", levelDbOptions);

            // Insert some records
            levelDb.Put(LevelDB.WriteOptions.Default, "Key1", "Value1");
            levelDb.Put(LevelDB.WriteOptions.Default, "Key1", "Value2");
            levelDb.Put(LevelDB.WriteOptions.Default, "Key1", "Value3");
            levelDb.Put(LevelDB.WriteOptions.Default, "Key2", "Value2");

            // Select some records
            LevelDB.Slice outputValue;
            if (levelDb.TryGet(LevelDB.ReadOptions.Default, "Key2", out outputValue))
            {
                Console.WriteLine("Key2: Value = " + outputValue.ToString());// Expect: Value2
            }
            if (levelDb.TryGet(LevelDB.ReadOptions.Default, "Key1", out outputValue))
            {
                Console.WriteLine("Key1: Value = " + outputValue.ToString()); // Expect: Value3
            }
            if (!levelDb.TryGet(LevelDB.ReadOptions.Default, "KeyXYZ", out outputValue))
            {
                Console.WriteLine("KeyXYZ: NOT FOUND.");
            }

            // Build a collection of hash keys: the lowercase hex MD5 hash of "0", "1", "2", ...
            List<string> seedHashes = new List<string>();
            using (MD5 md5 = MD5.Create()) // one hash object, reused for every key
            {
                for (int idx = 0; idx < 500000; idx++)
                {
                    byte[] input = Encoding.UTF8.GetBytes(idx.ToString());
                    byte[] hash = md5.ComputeHash(input);
                    string encoded = BitConverter.ToString(hash).Replace("-", string.Empty).ToLower();
                    seedHashes.Add(encoded);
                }
            }

            // Start Insert Speed Tests
            Stopwatch stopWatch = new Stopwatch();
            stopWatch.Start();
            foreach(var key in seedHashes)
            {
                levelDb.Put(LevelDB.WriteOptions.Default, key, "1");
            }
            stopWatch.Stop();
            Console.WriteLine("LevelDb Inserts took (ms) " + stopWatch.ElapsedMilliseconds);

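            // Start Lookup Speed Tests
            // Restart() zeroes the timer first; a plain Start() would resume it and add the insert time to the lookup total.
            stopWatch.Restart();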
            for (int loopIndex = 0; loopIndex < 10; loopIndex++)
            {
                for(int seedIndex = 0; seedIndex < seedHashes.Count; seedIndex++)
                {
                    if (!levelDb.TryGet(LevelDB.ReadOptions.Default, seedHashes[seedIndex], out outputValue))
                    {
                        Console.WriteLine("ERROR: Key Not Found: " + seedHashes[seedIndex]);
                    }
                }
            }
            stopWatch.Stop();
            Console.WriteLine("LevelDb Lookups took (ms) " + stopWatch.ElapsedMilliseconds);

            return;
        }
    }
}