We are no longer accepting applications for this position. See available positions here.
ProPublica is seeking a data expert and programmer to implement an Extract, Transform, Load (ETL) workflow that will help us maintain some of our most important online databases, and to build infrastructure that makes maintaining our work easier over the long term.
Proficiency in an open-source ETL system like Talend Open Studio or Pentaho Kettle is required.
ProPublica leads the industry in the number and complexity of data sets that we maintain continuously and make available to the public through our interactive graphics and news applications online. The person in this position can help revolutionize how we, and the entire news industry, use ETL in our data workflows.
For instance: We’ve maintained our Dollars for Docs database since 2010. Though it’s gotten easier over the years, it’s still a huge job. In addition to that, we have about 20 data sets that we’re committed to keeping up to date. Some are quick and easily scripted. Some are more involved and require manual fiddling. We need help making our processes consistent and predictable. That’s where you come in.
As our data machinist you will build us better tools for doing this work. This includes:
- Researching and gaining a thorough understanding of our existing data sets and news apps.
- Creating ETL workflows and helping us adopt ETL tools and infrastructure to support updating and maintaining those apps.
- Updating existing data sets using the new tools.
- Sharing skills and know-how with the News Apps team, and perhaps writing about the work in public posts on our blog.
There are two kinds of ability needed for this work:
- Using “editorial judgment” to clean dirty data in a consistent, defensible, documented, and repeatable way. That could mean canonicalizing inconsistent fields, joining data sets, etc.
- Creating automated data workflows and building tools to support them. Some of that will require custom code (Python or Ruby preferred), though it will also require adopting and training the rest of the team in an ETL system like Talend Open Studio or Pentaho Kettle.
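To make the two abilities above concrete, here is a minimal sketch of a repeatable cleaning step in plain Python. It is only an illustration, not ProPublica's actual workflow or the way Talend Open Studio or Pentaho Kettle would express it. The company names, the `company` and `amount` columns, and the `CANONICAL_COMPANIES` mapping are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical lookup of known spelling variants -> canonical name.
# Keeping every decision in one reviewable table is what makes the
# cleaning consistent, documented, and repeatable.
CANONICAL_COMPANIES = {
    "acme pharma": "Acme Pharma Inc.",
    "acme pharma inc": "Acme Pharma Inc.",
    "acme pharmaceuticals": "Acme Pharma Inc.",
}

# Hypothetical raw extract with inconsistent names and amount formats.
RAW_PAYMENTS = '''company,amount
"ACME Pharma, Inc.","$1,200.50"
acme pharmaceuticals,300
Unknown Labs ,75
'''


def canonicalize(name):
    """Normalize case, punctuation, and whitespace, then look up the canonical form."""
    key = " ".join(name.lower().replace(".", "").replace(",", "").split())
    # Unknown values pass through (trimmed) rather than being guessed at.
    return CANONICAL_COMPANIES.get(key, name.strip())


def transform(rows):
    """Return cleaned copies of the rows; the input rows are left untouched."""
    cleaned = []
    for row in rows:
        amount = float(row["amount"].replace("$", "").replace(",", ""))
        cleaned.append({"company": canonicalize(row["company"]), "amount": amount})
    return cleaned


def total_by_company(rows):
    """Aggregate payments per canonical company name."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["company"]] += row["amount"]
    return dict(totals)


payments = transform(csv.DictReader(io.StringIO(RAW_PAYMENTS)))
totals = total_by_company(payments)
```

Because the variant table is data rather than ad hoc edits, rerunning the script on a fresh extract applies the same decisions every time, and any change to the table can be reviewed like any other code change.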
The ideal candidate is an ETL expert who’s a news junkie. News experience is not a requirement, but excellent communication skills and a desire to share skills are critical. We also know that great candidates may bring skills to ProPublica that we haven’t thought of, and may not fit everything we’ve described above. If this is you, don’t hesitate to apply, and tell us what unique contributions you can offer.
This is a contract position that runs through the end of 2017. ProPublica’s offices are in New York City, but we’re open to having you work remotely. This position is full-time and includes benefits.
You can apply using this form. Applications will be reviewed on a rolling basis.
ProPublica is dedicated to improving our newsroom, in part by better reflecting the people we cover. Therefore, we are committed to diversity and especially encourage members of underrepresented communities to apply, including women, people of color, LGBTQ people and people with disabilities.
Have questions? Email [email protected].