Archive for September, 2010

Diddling with data

Posted on September 30, 2010 in Uncategorized

Michele Hudson and I have begun visiting faculty across campus in an effort to learn about data needs and desires. We’re diddling with data. To date, we have only visited two (and a half) people, but I have learned a few things:

  • Darwin Core – Apparently there is a metadata schema used for describing biological content called Darwin Core. I wonder where they got that name?
  • ease-of-use – At least one faculty member said a University-wide effort to collect and curate scientific data would be a good thing, as long as the overhead of doing so was minimal.
  • evaluation – Some scientists generate data and then work with computer scientists to do the analysis and look for patterns. This seems akin to the relationship between social scientists and statisticians.
  • Google Groups – When it comes to collaboration, Google Groups & Friends was used as a tool. Consequently much of their content is saved in Google’s “cloud”.
  • MBLWHOI Library – A faculty member suggested we get in touch with the librarian at the MBLWHOI Library because they are doing similar work, and “That person changed forever my perception of librarians.”
  • notebooks – Many scientists record their actions in notebooks, a la the way science was first conducted. Sometimes these notebooks are physical items, and sometimes they are digital manifestations. The physical items may benefit from digitization. The “born digital” notebooks may benefit from preservation. Everything goes into them. Data. Observations. Cogitations. In this vein, a number of software packages were brought to our attention including: YoGo and CambridgeSoft.
  • scooped – One faculty member thought that sharing data enables “scooping”, the process of stealing another’s ideas. While they advocated sharing, they thought it would be better if it were brought up through the ranks and manifested in undergraduate research. This is what has essentially happened with the local implementation of the Excellent Undergraduate Research.

That is what I’ve learned so far. ‘More later.

Data curation in Purdue

Posted on September 23, 2010 in Uncategorized

On Wednesday, September 22, 2010 a number of us from Notre Dame (Eric Morgan, Julie Arnott, Michelle Hudson, and Rick Johnson) went on a road trip to visit a number of people from Purdue (Christopher Miller, Dean Lingley, Jacob Carlson, Mark Newton, Matthew Riehle, Michael Witt, and Megan Sapp Nelson). Our joint goal was to share with each other our experiences regarding data curation.

After introductions, quite a number of talking points were enumerated (Purdue’s IMLS project, their E-Data Taskforce and data repository prototype, data literacy, data citation, and electronic theses & dissertation data curation). But we spent the majority of our formal meeting talking about their Data Curation Profile toolkit. Consisting of a number of questions, the toolkit is intended to provide the framework for discussions with data creators (researchers, scholars, graduate students, faculty, etc.). “It is a way to engage the faculty… It is generic (modular) and intended to form a baseline for understanding… It is used as a guide to learn about the data and the researcher’s needs.” From the toolkit’s introduction:

A completed Data Curation Profile will contain two types of information about a data set. First, the Profile will contain information about the data set itself, including its current lifecycle, purpose, forms, and perceived value. Second, a Data Curation Profile will contain information regarding a researcher’s needs for the data including how and when the data should be made accessible to others, what documentation and description for the data are needed, and details regarding the need for the preservation of the data.

The Purdue folks are tentatively scheduled to give workshops around the country on the use of the toolkit, and I believe one of those workshops will be at the upcoming International Digital Curation Conference taking place in Chicago (December 6-8).

We also talked about infrastructure, both technical and human. We agreed that the technical infrastructure, while not necessarily trivial, could be created and maintained. On the other hand, the human infrastructure may be more difficult to establish. There is a host of issues to address, listed here in no particular order: copyright & other legal issues, privacy & anonymity, business models & finances, workflows & true integration of services, and the articulation of roles played by librarians, curators, faculty, etc. “There is a need to build lots of infrastructure, both technical as well as human. The human infrastructure does not scale as well as the technical infrastructure.” Two examples were outlined. One required a couple of years of relationship building, and the other required cultural differences to be bridged.

We then retired to lunch and shared more of our experiences in a less formal atmosphere. We discussed “micro curation services”, data repository bibliographies, and the need to do more collaboration since our two universities have more in common than differences. Ironically, one of the projects being worked on at Purdue involves Notre Dame faculty, but alas, none of us from Notre Dame knew of the project’s specifics.

Yes, the drive was long and the meeting relatively short, but everybody went away feeling like their time was well-spent. “Thank you for hosting us!”