Today we’re releasing a new open source project that will enable any organization with a DocumentCloud account to do crowdsourcing using documents.
Since we wrapped up our Free the Files project after last year’s U.S. election, many people and organizations have asked us how they could build their own web applications like Free the Files to crowdsource their caches of documents. The full Free the Files codebase is undocumented, a bit messy and not easy to deploy in environments other than our own, so we decided to extract the salient bits into a Rails plugin we’re calling Transcribable.
Transcribable allows you to drop a RubyGem into your Rails app, and instantly add “transcribability” to any attribute on a given model. So, for example, if you have a Filing model, and you’d like users to be able to transcribe buyers and amounts, you could write:
class Filing < ActiveRecord::Base
  transcribable :buyer, :amount
end
Once you’ve defined which details you’d like the crowd to help you find, there is a generator that writes the rest of the code for you, including a beautiful “casino-driven” transcription form with automatically created fields for your attributes. That page looks something like this:
To make sure your crowdsourced data is accurate, Transcribable will “verify” your users’ transcriptions by comparing multiple users’ answers, and then committing only the agreed-upon ones to the master model. Within your model, you can set a threshold for how much agreement among users you require on a filing’s attributes, and Transcribable does the rest.
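To give a feel for the idea, here is a minimal sketch of threshold-based verification in plain Ruby. It is not Transcribable’s actual code: the method names (normalize, verified_attributes), the use of a simple vote count as the threshold, and the case-insensitive matching are all assumptions made for illustration.

```ruby
# Hypothetical sketch of threshold-based verification.
# This is NOT Transcribable's implementation; all names here are made up.

# Treat answers that differ only by case or surrounding whitespace as equal.
def normalize(value)
  value.to_s.strip.downcase
end

# transcriptions: array of hashes, one per user, mapping attribute => answer.
# attributes:     the attribute names to verify.
# threshold:      minimum number of users who must give the same answer.
# Returns a hash containing only the attributes that reached the threshold,
# mapped to the agreed-upon (normalized) answer.
def verified_attributes(transcriptions, attributes, threshold)
  attributes.each_with_object({}) do |attr, verified|
    # Count how many users gave each distinct, non-blank answer.
    counts = Hash.new(0)
    transcriptions.each do |t|
      answer = normalize(t[attr])
      counts[answer] += 1 unless answer.empty?
    end

    # Pick the most common answer; on a tie, the first one seen wins.
    answer, votes = counts.max_by { |_, n| n }
    verified[attr] = answer if votes && votes >= threshold
  end
end

transcriptions = [
  { buyer: "Acme PAC",       amount: "1500" },
  { buyer: "acme pac ",      amount: "1500" },
  { buyer: "Acme Political", amount: "15000" }
]

verified = verified_attributes(transcriptions, [:buyer, :amount], 2)
```

With a threshold of 2, both attributes are committed here, because two of the three users agree on each. Raise the threshold to 3 and nothing is committed, since the third user disagrees on both.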
The RubyGem also comes with a few other cool features, such as a task that will slurp all the documents in a DocumentCloud project into your database to await transcription. It also lets you specify fields you’d like users to fill out, but not necessarily verify (for things like notes).
To start using Transcribable, just drop it into your Gemfile as you normally would. Instructions for that, and more of the nitty-gritty, are available in the documentation on the GitHub page.
Happy crowdsourcing!