Stimulus critics spent this week flogging the federal Web site Recovery.gov for flaws in its first big data release. Problems ranged from confusing variations and gaps in job numbers, to mistakes that put projects in nonexistent congressional districts, to spending that never made it into the data.
Even stimulus backers demanded fixes. Rep. David Obey, D-Wis., chairman of the House Committee on Appropriations and one of the chief architects of the nearly $800 billion stimulus package, demanded that the Obama administration "correct the ludicrous mistakes."
Earl Devaney, the top stimulus watchdog charged with running Recovery.gov, downplayed the problems, saying he's less interested in data glitches than in what's happening with taxpayer money.
"Things like inappropriate congressional districts have no effect on our ability to combat waste, fraud and abuse," said Devaney, chairman of the Recovery Accountability and Transparency Board.
The administration's repeated promise that Americans would be able to track the course of every stimulus dollar set high expectations for the first round of spending reports. And compared with many other government databases, the error rate in the stimulus data isn't all that bad.
With any database, the process of collecting information is susceptible to computer glitches and data entry errors. In the case of Recovery.gov, the flashy charts and maps all rely on a collection of reports filed by grant, loan and contract recipients nationwide.
With 131,000 filers -- ranging from cities and school districts to small paving companies -- the administration expected some errors.
After the data came in, recipients had 10 days to correct their errors. Meanwhile, agencies started reviewing the data. After that 10-day window, agencies went back to recipients to point out errors they had found, according to Danny Werfel, controller for the Office of Management and Budget.
Agencies could only review the data; they could not make changes, Werfel said.
OMB also reviewed the data. "We wake up every day trying to make sure the data is as accurate as possible," said OMB Deputy Director Rob Nabors. "There's going to be a balance between transparency and speed and data accuracy."
Twelve filings seemed so out of whack that OMB and the Recovery Board removed them from the data. Among them was one from Shelton State Community College in Tuscaloosa, Ala., which said its $27,502 stimulus funding would result in 14,500 jobs.
Ed DeSeve, recovery spokesman at the White House, was quoted as saying he had been scrubbing job estimates so much that he had "dishpan hands and my fingers are worn to the nub." Recipients were supposed to estimate how many jobs the money they received had created or saved, but many didn't report a figure or reported it in a way that made the numbers hard to add up.
Nabors acknowledged that because so much energy was spent on job numbers, OMB may have overlooked other problems.
Some fixes would be easy to make. Consider those phantom congressional districts.
Recovery.gov data show 15 congressional districts for Alaska -- which has just one. The site also lists 22 districts for Arizona, which has only eight. And District of Columbia residents can remove their "taxation without representation" license plates: Recovery.gov gives D.C. 12 members in Congress.
While the data might show money going to a district that does not exist, the money probably didn't go into a black hole. It likely went to a district that does exist, but was entered wrong.
With other geographical information in the data, such as ZIP code or city, correcting the congressional district is a simple task (assuming the ZIP code is correct).
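One way to make that correction can be sketched as follows, under stated assumptions: look up each record's ZIP code in a ZIP-to-district table and reassign the district only when the ZIP maps to exactly one district in the record's state. The lookup table, the record fields (`state`, `zip`, `district`) and the function name are hypothetical; a real crosswalk would have to be built from official geographic data. Some ZIP codes straddle more than one district, so the sketch leaves those records for a person to review rather than guessing.

```python
# Hypothetical ZIP-to-district crosswalk: each ZIP maps to the set of
# (state, district) pairs it overlaps. A real table would be far larger.
# 99501 (Anchorage) can only be in Alaska's single district; "00001" is an
# invented ZIP standing in for one that straddles two districts.
ZIP_TO_DISTRICTS = {
    "99501": {("AK", 1)},
    "00001": {("AZ", 1), ("AZ", 2)},
}

def correct_district(record: dict) -> dict:
    """Return a copy of `record` with its district fixed from the ZIP code.

    The district is replaced only when the ZIP maps to exactly one district
    in the record's state; otherwise the record is marked for manual review.
    This assumes the ZIP code itself was entered correctly.
    """
    fixed = dict(record)
    candidates = {
        d for (st, d) in ZIP_TO_DISTRICTS.get(record["zip"], set())
        if st == record["state"]
    }
    if len(candidates) == 1:
        fixed["district"] = candidates.pop()
        fixed["needs_review"] = False
    else:
        # Unknown ZIP, or a ZIP that spans several districts: don't guess.
        fixed["needs_review"] = True
    return fixed

# A record claiming Alaska's 15th district is moved to its only district.
print(correct_district({"state": "AK", "zip": "99501", "district": 15}))
# → {'state': 'AK', 'zip': '99501', 'district': 1, 'needs_review': False}
```

Flagging ambiguous ZIPs instead of picking one keeps the automated fix from introducing new errors of the same kind it is meant to remove.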
The recovery board says that for the next reporting period, in January, recipients will be able to enter only a district that exists in their state. And on Wednesday the board released a statement saying it would fix the congressional district errors.
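The board's planned check, which accepts only a district that exists in the filer's state, amounts to a simple range test. The sketch below is hypothetical, not Recovery.gov's actual code: it assumes districts are numbered 1 through N in each state (with an at-large state such as Alaska recorded as district 1), and its seat table is deliberately partial, holding only the two counts cited in this article (Alaska, one; Arizona, eight).

```python
# Partial, illustrative table of House seats per state in 2009, using the
# figures cited in this article. A real validator would cover every state.
SEATS_PER_STATE = {"AK": 1, "AZ": 8}

def district_exists(state: str, district: int) -> bool:
    """True if `district` is a real district in `state`.

    Assumes districts are numbered 1..N, with an at-large state such as
    Alaska recorded as district 1.
    """
    seats = SEATS_PER_STATE.get(state)
    if seats is None:
        raise KeyError(f"no seat count on file for {state!r}")
    return 1 <= district <= seats

def reject_bad_districts(records: list[dict]) -> list[dict]:
    """Return the records whose (state, district) pair does not exist."""
    return [r for r in records if not district_exists(r["state"], r["district"])]

filings = [
    {"state": "AK", "district": 1},
    {"state": "AK", "district": 15},  # Alaska has only one district
    {"state": "AZ", "district": 22},  # Arizona has only eight
    {"state": "AZ", "district": 8},
]
print(reject_bad_districts(filings))
# → [{'state': 'AK', 'district': 15}, {'state': 'AZ', 'district': 22}]
```

Run at entry time, the same test would stop a bad district before it reaches the public data instead of requiring a cleanup afterward.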
But bad congressional districts are just one weakness in the data.
According to Devaney, thousands of recipients failed to report at all. Once the board has a complete list of delinquent filers, it will publish their names on Recovery.gov, he said. The list is currently under review by the federal agencies that handed out those stimulus awards. OMB estimated that about 10 percent of recipients -- some 15,000 -- failed to file their information.
"There's no penalty for not complying with this ARRA Act," Devaney noted. "I'm not a big fan of programs that don't have rules."
Devaney said he would support legislation to create penalties for not reporting. Existing administrative and contractual rules might allow agencies to take action against, or withhold funding from, recipients that don't report.
The stimulus job estimates have drawn by far the most attention.
Multiple stories, including many by ProPublica, have pointed to overreported figures. But there is also likely underreporting. In at least 400 cases, we found recipients that entered zero for the number of jobs created or retained but then went on to describe the positions they were adding or keeping.
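The article doesn't describe how those 400 cases were found. One plausible approach, offered here as an assumption rather than ProPublica's actual method, is to pull records that report zero jobs and whose free-text job description contains hiring-related words. The field names (`jobs`, `job_description`) and the keyword list are hypothetical. A keyword match only flags a record for a reporter to read: a description such as "no positions were created" would also match.

```python
import re

# Hypothetical keyword list: words suggesting a recipient added or kept staff.
HIRING_WORDS = re.compile(
    r"\b(hired|hiring|retained|positions?|employees?|workers?)\b",
    re.IGNORECASE,
)

def zero_jobs_with_description(records: list[dict]) -> list[dict]:
    """Return records that report 0 jobs but mention hiring in the text.

    Matches are candidates for manual review, not confirmed underreporting.
    """
    return [
        r for r in records
        if r["jobs"] == 0 and HIRING_WORDS.search(r.get("job_description") or "")
    ]

reports = [
    {"id": 1, "jobs": 0, "job_description": "Hired two laborers for paving."},
    {"id": 2, "jobs": 3, "job_description": "Retained three teachers."},
    {"id": 3, "jobs": 0, "job_description": ""},
    {"id": 4, "jobs": 0, "job_description": None},
]
print([r["id"] for r in zero_jobs_with_description(reports)])
# → [1]
```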
Few reports have broken down the extent of the data errors on Recovery.gov. Our analysis found that less than 1 percent of the records have bad congressional districts assigned to them.
As anyone who works with data knows, there is no such thing as perfection.
"In the past this data would be scrubbed by agencies for months before it was released to the public," Devaney said, noting that the government finance site USAspending.gov had a 53 percent error rate when it launched in 2007. It's now down to 10 percent, he said.
But few government databases have been as public as the stimulus data.
"Credibility counts in government, and stupid mistakes like this undermine it," Obey said in a prepared statement. "We've got too many serious problems in this country to let that happen."
The House Committee on Oversight and Government Reform is having a hearing this morning about the quality of stimulus reporting. Devaney is among those scheduled to testify.