A couple of weeks ago, in an article about the science behind the Message Machine project, we mentioned the custom key-value store we built to store non-relational data. Today, we're open-sourcing the library, which we're calling Daybreak.
Daybreak is a simple key-value store for Ruby that operates just like a Ruby hash. Commits to the database are stored in an append-only file and flushed asynchronously, though there are options to atomically commit a write. It is faster than pstore and simpler to use than dbm. Because your data is stored in an in-memory hash table, you also get Ruby conveniences like each, filter, map and reduce. You can install it by running gem install daybreak, and the code is over on GitHub.
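To make the storage model concrete, here is a minimal sketch of the same idea: writes are appended to a log file, reads are served from an in-memory hash, and the log is replayed when the file is reopened. TinyLogStore and its record format are hypothetical and are not Daybreak's implementation; the sketch leaves out asynchronous flushing, locking and compaction.

```ruby
require 'tmpdir'

# Hypothetical toy store (NOT Daybreak's code): an append-only log on disk
# plus an in-memory hash that answers all reads.
class TinyLogStore
  include Enumerable

  def initialize(path)
    @path = path
    @table = {}
    replay if File.exist?(@path)
    @log = File.open(@path, 'ab')
  end

  def [](key)
    @table[key.to_s]
  end

  def []=(key, value)
    record = Marshal.dump([key.to_s, value])
    # Length-prefix each record so the log can be replayed later.
    @log.write([record.bytesize].pack('N') + record)
    @table[key.to_s] = value
  end

  # Delegating to the hash gives us map, reduce and friends via Enumerable.
  def each(&block)
    @table.each(&block)
  end

  def close
    @log.close
  end

  private

  # Rebuild the in-memory hash; later records overwrite earlier ones.
  def replay
    File.open(@path, 'rb') do |file|
      while (header = file.read(4))
        size = header.unpack('N').first
        key, value = Marshal.load(file.read(size))
        @table[key] = value
      end
    end
  end
end

path = File.join(Dir.mktmpdir, 'toy.db')
db = TinyLogStore.new(path)
db[1] = 'one'
db[1] = 'uno' # appended after the first record, so it wins on replay
db['two'] = [2]
db.close

reopened = TinyLogStore.new(path)
reopened['1']                  # => "uno"
reopened.map { |k, _| k }.sort # => ["1", "two"]
reopened.close
```

Because nothing is ever rewritten in place, a log like this grows with every update, which is why a real store of this kind needs some form of compaction.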
Daybreak's API mirrors Ruby's hash interface, which makes it convenient to use. The docs have a simple walkthrough of the API, but let's create a simple search engine to showcase Daybreak's abilities. Here's the class we'll fill out in this post:
class Search
  def initialize
    # tk
  end

  def query(query)
    # tk
  end

  def add(docs)
    # tk
  end
end
First, let's store the documents in a database:
require 'daybreak'

class Search
  def initialize
    # create the storage database
    @docs_db = Daybreak::DB.new('./docs.db')
  end

  # ...

  # add some documents
  def add(docs)
    # Daybreak keys are strings so we'll want to convert them back to
    # integers to find the next key
    max = @docs_db.keys.map(&:to_i).max || 0
    docs.each do |doc|
      max += 1
      @docs_db[max] = doc
    end
    # we'll make sure our changes are flushed to disk
    @docs_db.flush!
  end
end
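The integer conversion in add matters because string keys compare character by character, so the largest string is not necessarily the largest number. A small sketch using a plain array of keys rather than a Daybreak database (next_id is a hypothetical helper, not part of Daybreak):

```ruby
# Hypothetical helper: compute the next integer id from string keys.
def next_id(keys)
  (keys.map(&:to_i).max || 0) + 1
end

keys = %w[1 2 9 10]
keys.max       # => "9"  (string comparison: "9" sorts after "10")
next_id(keys)  # => 11
next_id([])    # => 1    (max is nil for an empty list, so we fall back to 0)
```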
In order to have an effective search engine, we'll also need an index that records which documents each word appears in. So let's add another database to hold that index:
require 'daybreak'
require 'set'

class Search
  def initialize
    # create the storage database
    @docs_db = Daybreak::DB.new('./docs.db')
    # Ruby objects work too, this db will have a default value of an empty
    # set
    @index_db = Daybreak::DB.new('./index.db') { |k| Set.new }
  end

  # add some documents
  def add(docs)
    # Daybreak keys are strings so we'll want to convert them back to
    # integers to find the next key
    max = @docs_db.keys.map(&:to_i).max || 0
    docs.each do |doc|
      max += 1
      @docs_db[max] = doc
      tokens = doc.split(/ +/)
      # build a simple inverted index: each lowercased word maps to the
      # set of ids of the documents that contain it
      tokens.each { |t| @index_db[t.downcase] = @index_db[t.downcase] << max }
    end
    # we'll make sure our changes are flushed to disk
    @docs_db.flush!
    @index_db.flush!
  end
end
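To see what the index ends up holding, here is the same tokenizing and indexing logic run against a plain Ruby Hash instead of a Daybreak database. The build_index helper and the two sample documents are made up for illustration.

```ruby
require 'set'

# Hypothetical helper: build a word => Set-of-ids index from an
# id => text hash, using the same split and downcase steps as add.
def build_index(docs)
  index = Hash.new { |hash, key| hash[key] = Set.new }
  docs.each do |id, text|
    text.split(/ +/).each { |token| index[token.downcase] << id }
  end
  index
end

index = build_index(1 => 'The human condition.', 2 => 'Human destiny')
index['human'].to_a      # => [1, 2]
index['destiny'].to_a    # => [2]
index.key?('condition')  # => false
index.key?('condition.') # => true
```

Splitting on spaces alone leaves punctuation attached to words, so "condition." and "condition" are indexed as different terms. That is fine for a demo, but a real tokenizer would strip it.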
And finally, let's write the method that takes a query and returns every document containing at least one of the query's words:
class Search
  def query(query)
    tokens = query.split(/ +/)
    # Find documents with the query terms
    ids = tokens.reduce([]) { |m, t| m + @index_db[t.downcase].to_a }.uniq
    # Finally grab the text and return it.
    ids.map { |id| @docs_db[id] }
  end
end
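The reduce in query takes the union of the id sets, so a document matches if it contains any of the query terms. The sketch below contrasts that with a stricter variant that requires every term. It runs against a plain Hash rather than Daybreak, and match_any, match_all and the sample index are hypothetical.

```ruby
require 'set'

# Union of the posting sets: a document matches if it has ANY term.
def match_any(index, query)
  query.split(/ +/).reduce(Set.new) do |ids, term|
    ids | index.fetch(term.downcase, Set.new)
  end.to_a
end

# Intersection of the posting sets: a document must have EVERY term.
def match_all(index, query)
  sets = query.split(/ +/).map { |term| index.fetch(term.downcase, Set.new) }
  return [] if sets.empty?
  sets.reduce(:&).to_a
end

index = { 'human' => Set[1, 4, 5], 'condition' => Set[1] }
match_any(index, 'human condition') # => [1, 4, 5]
match_all(index, 'human condition') # => [1]
```

Which behavior you want depends on the application: the union favors recall, while the intersection returns fewer, more specific results.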
Here's an example of how to use the above class:
searcher = Search.new
searcher.add([
  "To define the reality of the human condition and to make our definitions public.",
  "To confront the new facts of history-making in our time, and their meaning for the problem of political responsibility.",
  "Continually to investigate the causes of war, and among them to locate the decisions and defaults of elite circles.",
  "To release the human imagination, to explore all the alternatives now open to the human community by transcending both the mere exhortation of grand principle and the mere opportunist reaction.",
  "To demand full information of relevance to human destiny and the end of decisions made in irresponsible secrecy.",
  "To cease being the intellectual dupes of political patrioteers."
])
searcher.query("human condition")
>> ["To define the reality of the human condition and to make our definitions public.",
    "To release the human imagination, to explore all the alternatives now open to the human community by transcending both the mere exhortation of grand principle and the mere opportunist reaction.",
    "To demand full information of relevance to human destiny and the end of decisions made in irresponsible secrecy."]
In another program we can reopen the database and perform another query:
search2 = Search.new
search2.query("explore")
>> ["To release the human imagination, to explore all the alternatives now open to the human community by transcending both the mere exhortation of grand principle and the mere opportunist reaction."]
Those are the basics. If it sounds useful to you, head on over to GitHub and kick the tires. If you run into bugs, open up an issue on GitHub, and of course we're always happy to receive pull requests!
Let us know if you end up using it in a project by emailing us at [email protected].