Tuesday, February 11, 2014

Code Kata, Simple implementation of Bloom Filters

We are doing kata at work this month. It is pretty exciting as you spend around 30 mins everyday learning a new technique or stretching your coding abilities. I tried the code kata website for some fun exercises.

One thing that I've been particularly interested in the last year or so is Bloom Filters. Joins are so expensive! Bloom filters are simply amazing. They help you find if a value is NOT in a particular.

Here's my implementation of Bloom Filters in Ruby. It is not perfect and can use a BitSet ruby implementation to save on some memory. Also, not tested very thoroughly but you get the idea.


# http://codekata.com/kata/kata05-bloom-filters/

  class BloomFilter
    def initialize(bitmap_size)
      @bitmap = Array.new
      @bitmap_max_size = bitmap_size
    end

    def hash_function_1(some_object)
      raw_val = some_object.inspect.each_byte.inject do |sum,c|
        sum += c
      end

      raw_val % @bitmap_max_size
    end

    def hash_function_2(some_object)
      raw_val = some_object.inspect.each_byte.inject do |sum,c|
        sum += c
      end

      (raw_val + raw_val.to_s.length**3) % @bitmap_max_size
    end

    def hash_function_3(some_object)
      raw_val = some_object.inspect.each_byte.inject do |sum,c|
        sum += c
      end

      (raw_val + raw_val.to_s.split().last.to_i**8) % @bitmap_max_size
    end

    def put(put_object)
      @bitmap[hash_function_1(put_object)] = 1
      @bitmap[hash_function_2(put_object)] = 1
      @bitmap[hash_function_3(put_object)] = 1
    end

    def exists(put_object)
      @bitmap[hash_function_1(put_object)] == 1 &&
      @bitmap[hash_function_2(put_object)] == 1 &&
      @bitmap[hash_function_3(put_object)] == 1
    end
  end


a = BloomFilter.new(1000)
(1..1000).each {|x| a.put("test#{x}")}
(1..1000).each {|x| puts "#{x}" unless a.exists("test#{x}")}


This was quickly hacked, so let me know your comments on how this can be improved.

No comments:

Post a Comment