Old-Guard Archivists Keep Federal Data Safer Than You Think


At least twice a day, the data recorder on board NASA's Lunar Reconnaissance Orbiter beams images to a station in White Sands, New Mexico. That data gets copied to the Goddard Space Flight Center in Maryland, and then copied again to computers at Arizona State University in Tempe. Two other copies go to an off-campus building where they live on different access-controlled computer systems.

Mark Robinson, the researcher running the team that operates the LRO's cameras, analyzes that data. Every three months his team uploads raw and calibrated images to NASA's public website for anyone to access.

That's five layers of redundancy. "Nobody would ever be able to delete these data," Robinson says. "I've dedicated my life to preserving data. Buildings can collapse. Computers fail. But the LROC data will still be around."

For one thing, deleting federal records is illegal. The National Archives' Office of the Inspector General investigates claims of record fraud and can refer cases for prosecution. For another, NASA shares and backs up all of its datasets across multiple government research facilities and academic institutions around the country. So there's no easy way to erase every copy, even if a webpage or two were to go missing.


And NASA says none of its datasets have gone missing. "The availability of NASA Earth science data has not changed in recent months, nor have any Earth science datasets been taken offline," the agency said in a statement.

So why, then, did web archivers at a data rescue event in Berkeley last week flag an atmospheric carbon dioxide dataset as missing? NASA had migrated it to a new location when the entire Earth Observing System site underwent a redesign in January of 2013. As for a Global Change Data Center reports repository that one web archiver worried was empty, the reality was that GCDC scientists never put any files there in the first place.

Since election night, hundreds of people have come to events across the country to log data they feared would get disappeared by government agencies in the Trump administration, because of connections to climate change, gun violence, or any number of other subjects. Coordinated by groups like DataRefuge and the Environmental Data and Governance Initiative (EDGI), these would-be archivists have grown suspicious of every 404 error on a government website.

But not every 404 error is evidence that some critical dataset got tossed. The webpages might be the same; it's the world around them that changed.

Into the Archives

So NASA is on the record saying the agency hasn’t put anything down the memory hole. The National Oceanic and Atmospheric Administration also confirmed that none of its datasets have been removed since January 20, 2017, and the agency has no plans to take any down in the immediate future. The Environmental Protection Agency did not respond to a request for comment.


EDGI, though, is recruiting domain analysts to help distinguish relevant gaps from harmless artifacts in the data they scrape. For now they're treating every lead as relevant. "All these errors are worth investigating," says Lindsey Dillon, EDGI's steering committee chair. "They might turn out to be nothing, but the interesting thing is that in this moment, as never before, people are finding a deleted page or an absent report politically meaningful."

Politically meaningful or not, the government has taken down only one federal database since January 20. The US Department of Agriculture scrubbed animal welfare records from its site, according to the Sunlight Foundation, a nonprofit that advocates for government transparency and data access. The erasure caused a public outcry, and as of Friday morning, the USDA had begun restoring some documents.

Alex Howard, Sunlight's deputy director, warns that it's easy to read malice into every broken link or changed text on a webpage, but that it could just as easily be incompetence, ignorance, something totally unrelated, or nothing at all. "We don't want to play into the dynamic that is rushing toward us here," he says, "where a vacuum of confirmed, trustworthy information from the top levels of government is filled up instead with our fear."

New Kids On The Web

Groups like DataRefuge and EDGI organized quickly, getting a national movement off the ground in a matter of months. They operate from a "triage and prioritize" posture, based on tips they get from government scientists and with an eye toward moves inside the White House and on Capitol Hill.

Worthy, sure, but long before Trump entered the political picture, open government and open data evangelists had been preserving all kinds of data collected and stored by the government, from crime statistics to unemployment rates to trade deficits.


Some changes, like a redo of the White House website, are normal parts of a presidential transition. Others, like the Department of Labor removing its blog posts on how it calculates the unemployment rate, or the Department of Energy changing its language around climate change, are worth keeping an eye on. "The more the merrier," says James Jacobs, who runs Free Government Information, which tracks and stores government web data. "It's been really hard for librarians to convince people that preserving the web is important. Google has done a very good job of making people think that once it's online it's there forever."

Together with the Internet Archive, the Library of Congress, and the Government Publishing Office, Jacobs coordinates the End of Term project, a once-every-four-years web harvest of all .gov and .mil sites. He's also considering conducting annual harvests under the Trump administration.

All the extra hands will make lighter work for people like Jacobs, and having lots of copies of things is always better than having none at all. But even as groups of galvanized guerrilla archivists join the fray, breathing life into a cause to which Jacobs has committed his career, he is clear-eyed about the limitations of his line of work.

Archiving is inherently static. It's a snapshot you take of a moment in time, whether that's text on a webpage or surface water temperature measurements from the Chukchi Sea in February.

Datasets, on the other hand, are dynamic. Keeping open data pipelines running, and preserving the funding that makes them possible, is what scientists and concerned citizens should really be worried about. So while seeding web crawlers and downloading satellite images might make people feel a little less helpless in a time of digital uncertainty, a dataset is only as useful as its last upload. What matters is not whether the data disappears; it's whether people will still be collecting it tomorrow, next month, and next year.
