How can we know which data to target so we don’t duplicate the work of other Data Rescue events?
We are currently gathering information from experts, scientists, and community members about particularly valuable and vulnerable data. Fill out the form before January 7 to help us! For events that would like help choosing areas to focus on, we aim to provide lists of the most important datasets and sources we’ve identified, so that each event can tackle a piece of the larger set without too much duplication. That said, working out which data is most valuable and vulnerable within your own community can be an important part of your Data Rescue event.
Where will downloaded data go?
If your institution can’t host the data your Data Rescue event downloads, you will be able to use a repository we are developing. It is built on Amazon Web Services and integrated with CKAN (an open source data catalog), and it will be available to Data Rescue events for storing copies of data and making them accessible.
Will you be providing best practices for creating reliable copies?
Yes! And we welcome your collaboration. In general, we recommend that materials that can be captured through web crawling and the activities of the End of Term Harvest be captured that way. We will rely in part on the toolkit developed after the event at the University of Toronto, as well as on locally developed code to seed the harvester. Data that doesn’t fit well in the Internet Archive will be added to the open data catalog mentioned above.
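One common way to check that a downloaded copy stays reliable is to record a checksum for each file and compare it again later. The Python sketch below illustrates that idea only; it is an assumption on our part and not the toolkit or workflow described above. The function names (sha256_of, build_manifest, changed_files) are hypothetical and chosen for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's path (relative to directory) to its SHA-256 digest."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(directory, manifest):
    """List files from the manifest that are now missing or have different contents."""
    current = build_manifest(directory)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

A manifest built right after download can be saved next to the data. Running changed_files against the same folder later (or against a copy at another institution) returns an empty list if every recorded file is still present and unchanged.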