Data curation at Purdue

On Wednesday, September 22, 2010, a number of us from Notre Dame (Eric Morgan, Julie Arnott, Michelle Hudson, and Rick Johnson) went on a road trip to visit colleagues at Purdue (Christopher Miller, Dean Lingley, Jacob Carlson, Mark Newton, Matthew Riehle, Michael Witt, and Megan Sapp Nelson). Our joint goal was to share our experiences with data curation.

After introductions, we listed quite a few talking points (Purdue’s IMLS project, their E-Data Taskforce and data repository prototype, data literacy, data citation, and the curation of data from electronic theses & dissertations). But we spent the majority of our formal meeting talking about their Data Curation Profile toolkit. Consisting of a set of questions, the toolkit is intended to provide a framework for discussions with data creators (researchers, scholars, graduate students, faculty, etc.). “It is a way to engage the faculty… It is generic (modular) and intended to form a baseline for understanding… It is used as a guide to learn about the data and the researcher’s needs.” From the toolkit’s introduction:

A completed Data Curation Profile will contain two types of information about a data set. First, the Profile will contain information about the data set itself, including its current lifecycle, purpose, forms, and perceived value. Second, a Data Curation Profile will contain information regarding a researcher’s needs for the data including how and when the data should be made accessible to others, what documentation and description for the data are needed, and details regarding the need for the preservation of the data.

The Purdue folks are tentatively scheduled to give workshops around the country on the use of the toolkit, and I believe one of those workshops will be at the upcoming International Digital Curation Conference taking place in Chicago (December 6-8).

We also talked about infrastructure, both technical and human. We agreed that the technical infrastructure, while not necessarily trivial, could be created and maintained. On the other hand, the human infrastructure may be more difficult to establish. There are a host of issues to address, listed here in no particular order: copyright & other legal issues, privacy & anonymity, business models & finances, workflows & true integration of services, and the articulation of the roles played by librarians, curators, faculty, etc. “There is a need to build lots of infrastructure, both technical as well as human. The human infrastructure does not scale as well as the technical infrastructure.” Two examples were outlined: one required a couple of years of relationship building, and the other required bridging cultural differences.

We then retired to lunch and shared more of our experiences in a less formal atmosphere. We discussed “micro curation services”, data repository bibliographies, and the need for more collaboration, since our two universities have more similarities than differences. Ironically, one of the projects being worked on at Purdue involves Notre Dame faculty, but alas, none of us from Notre Dame knew the project’s specifics.

Yes, the drive was long and the meeting relatively short, but everybody went away feeling like their time was well-spent. “Thank you for hosting us!”
