We are doing kata at work this month. It is pretty exciting as you spend around 30 mins everyday learning a new technique or stretching your coding abilities. I tried the code kata website for some fun exercises.
One thing that I've been particularly interested in the last year or so is Bloom Filters. Joins are so expensive! Bloom filters are simply amazing. They help you find if a value is NOT in a particular.
Here's my implementation of Bloom Filters in Ruby. It is not perfect and can use a BitSet ruby implementation to save on some memory. Also, not tested very thoroughly but you get the idea.
# http://codekata.com/kata/kata05-bloom-filters/
class BloomFilter
def initialize(bitmap_size)
@bitmap = Array.new
@bitmap_max_size = bitmap_size
end
def hash_function_1(some_object)
raw_val = some_object.inspect.each_byte.inject do |sum,c|
sum += c
end
raw_val % @bitmap_max_size
end
def hash_function_2(some_object)
raw_val = some_object.inspect.each_byte.inject do |sum,c|
sum += c
end
(raw_val + raw_val.to_s.length**3) % @bitmap_max_size
end
def hash_function_3(some_object)
raw_val = some_object.inspect.each_byte.inject do |sum,c|
sum += c
end
(raw_val + raw_val.to_s.split().last.to_i**8) % @bitmap_max_size
end
def put(put_object)
@bitmap[hash_function_1(put_object)] = 1
@bitmap[hash_function_2(put_object)] = 1
@bitmap[hash_function_3(put_object)] = 1
end
def exists(put_object)
@bitmap[hash_function_1(put_object)] == 1 &&
@bitmap[hash_function_2(put_object)] == 1 &&
@bitmap[hash_function_3(put_object)] == 1
end
end
a = BloomFilter.new(1000)
(1..1000).each {|x| a.put("test#{x}")}
(1..1000).each {|x| puts "#{x}" unless a.exists("test#{x}")}
This was quickly hacked, so let me know your comments on how this can be improved.
One thing that I've been particularly interested in the last year or so is Bloom Filters. Joins are so expensive! Bloom filters are simply amazing. They help you find if a value is NOT in a particular.
Here's my implementation of Bloom Filters in Ruby. It is not perfect and can use a BitSet ruby implementation to save on some memory. Also, not tested very thoroughly but you get the idea.
# http://codekata.com/kata/kata05-bloom-filters/
class BloomFilter
def initialize(bitmap_size)
@bitmap = Array.new
@bitmap_max_size = bitmap_size
end
def hash_function_1(some_object)
raw_val = some_object.inspect.each_byte.inject do |sum,c|
sum += c
end
raw_val % @bitmap_max_size
end
def hash_function_2(some_object)
raw_val = some_object.inspect.each_byte.inject do |sum,c|
sum += c
end
(raw_val + raw_val.to_s.length**3) % @bitmap_max_size
end
def hash_function_3(some_object)
raw_val = some_object.inspect.each_byte.inject do |sum,c|
sum += c
end
(raw_val + raw_val.to_s.split().last.to_i**8) % @bitmap_max_size
end
def put(put_object)
@bitmap[hash_function_1(put_object)] = 1
@bitmap[hash_function_2(put_object)] = 1
@bitmap[hash_function_3(put_object)] = 1
end
def exists(put_object)
@bitmap[hash_function_1(put_object)] == 1 &&
@bitmap[hash_function_2(put_object)] == 1 &&
@bitmap[hash_function_3(put_object)] == 1
end
end
a = BloomFilter.new(1000)
(1..1000).each {|x| a.put("test#{x}")}
(1..1000).each {|x| puts "#{x}" unless a.exists("test#{x}")}
This was quickly hacked, so let me know your comments on how this can be improved.
No comments:
Post a Comment