{"id":122,"date":"2019-10-29T16:08:10","date_gmt":"2019-10-29T20:08:10","guid":{"rendered":"http:\/\/sites.nd.edu\/marble\/?p=122"},"modified":"2019-10-30T16:51:23","modified_gmt":"2019-10-30T20:51:23","slug":"metadata-connections","status":"publish","type":"post","link":"https:\/\/sites.nd.edu\/marble\/metadata-connections\/","title":{"rendered":"Metadata connections"},"content":{"rendered":"<p>By Hanna Bertoldi&nbsp;<\/p>\n<h2><strong>What is metadata?<\/strong><\/h2>\n<p><span style=\"font-weight: 400\">I am constantly thinking about metadata, or the data that describes other data.&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">While most people may not often think about metadata, everyone who uses search tools benefits from clean, well-structured metadata.&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">Think about a music file. Its metadata might contain the artist\u2019s name, song title, album title, and year it was released. Spotify uses this metadata to help you find the songs you want and also to suggest music you might like. 
Some of these terms may be hidden from you, such as terms that describe the mood or genre of a song, but they all work in the background to create easily browsable categories.&nbsp;<\/span><\/p>\n<div id=\"attachment_123\" style=\"width: 1440px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-123\" class=\"size-full wp-image-123\" src=\"http:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Spotify.png\" alt=\"Screenshot of Spotify music player\" width=\"1430\" height=\"911\" srcset=\"https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Spotify.png 1430w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Spotify-300x191.png 300w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Spotify-768x489.png 768w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Spotify-1024x652.png 1024w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Spotify-471x300.png 471w\" sizes=\"auto, (max-width: 1430px) 100vw, 1430px\" \/><p id=\"caption-attachment-123\" class=\"wp-caption-text\">Figure 1: Spotify browse page<\/p><\/div>\n<h2><strong>Museums and metadata<\/strong><\/h2>\n<p><span style=\"font-weight: 400\">Like Spotify, museums also create and use metadata, but, instead of music, the metadata describes pieces of art.&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">As its best self, museum metadata has the potential to bridge gaps between cultural heritage institutions and provide access beyond virtual and physical barriers. This unbounded potential can only be leveraged by using accurate and standardized metadata.&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">But, when I think about museum metadata, I mostly think about how messy it is. (Apparently, <\/span><a href=\"https:\/\/www.digitalmusicnews.com\/2018\/03\/14\/spotify-fix-bad-metadata-160mm-users\/\"><span style=\"font-weight: 400\">Spotify thinks about this too<\/span><\/a><span style=\"font-weight: 400\">). 
In the screenshot below, the fields \u201cCreation Date\u201d and \u201cCentury\u201d reveal some examples of data inconsistency and redundancy. For instance, you can see that at one time items without an identifiable creation date were marked with \u201cno date\u201d whereas others were marked with the abbreviation \u201cn.d.\u201d In addition, the same date is repeated in both the \u201cCreation Date\u201d and \u201cCentury\u201d fields for many of these objects. These kinds of metadata errors can prevent database users from finding objects that meet their search criteria.<\/span><\/p>\n<div id=\"attachment_124\" style=\"width: 1504px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-124\" class=\"size-full wp-image-124\" src=\"http:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-CreationDate.png\" alt=\"Screenshot of database page showing lists of dates \" width=\"1494\" height=\"1196\" srcset=\"https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-CreationDate.png 1494w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-CreationDate-300x240.png 300w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-CreationDate-768x615.png 768w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-CreationDate-1024x820.png 1024w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-CreationDate-375x300.png 375w\" sizes=\"auto, (max-width: 1494px) 100vw, 1494px\" \/><p id=\"caption-attachment-124\" class=\"wp-caption-text\">Figure 2: Screenshot of creation date export from EmbARK, the museum&#8217;s collections management system<\/p><\/div>\n<h2><strong>Organizing the \u201cmess\u201d with controlled vocabularies<\/strong><\/h2>\n<p><span style=\"font-weight: 400\">Controlled vocabularies are a critical element of creating consistent metadata. 
A controlled vocabulary is an organized arrangement of words and phrases used to index content and to retrieve content by browsing and searching.<\/span><\/p>\n<p><span style=\"font-weight: 400\">If we wanted to build an application where you could search every artwork in the world, every arts organization or collector would have to agree to use the same vocabulary, or a \u201ccontrolled vocabulary,\u201d to describe their objects. We couldn\u2019t have the Getty Museum using \u201cClaude Monet\u201d while the Louvre was using \u201cOscar-Claude Monet\u201d to describe the artist that painted water lilies in the nineteenth century. They would have to agree on the same version of the artist\u2019s name.&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">Beginning in the 1980s, the <\/span><a href=\"http:\/\/www.getty.edu\/research\/index.html\"><span style=\"font-weight: 400\">Getty Research Institute<\/span><\/a><span style=\"font-weight: 400\"> started developing controlled vocabularies for cataloging visual arts and cultural heritage.<\/span><span style=\"font-weight: 400\"> The terms and associated information in the <\/span><a href=\"http:\/\/www.getty.edu\/research\/tools\/vocabularies\/index.html\"><span style=\"font-weight: 400\">Getty Vocabularies<\/span><\/a><span style=\"font-weight: 400\"> are valued as authoritative because they are derived from published sources and represent current research and usage in art history and cultural heritage communities.<\/span><span style=\"font-weight: 400\">&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">In short, the Getty Vocabularies provide a convenient system for mapping data, without spending a lot of wasted time reinventing what already exists.&nbsp;<\/span><\/p>\n<div id=\"attachment_125\" style=\"width: 1494px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-125\" class=\"size-full wp-image-125\" 
src=\"http:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Subject-Terms.png\" alt=\"Database screenshot showing list of keywords\" width=\"1484\" height=\"1244\" srcset=\"https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Subject-Terms.png 1484w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Subject-Terms-300x251.png 300w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Subject-Terms-768x644.png 768w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Subject-Terms-1024x858.png 1024w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-Subject-Terms-358x300.png 358w\" sizes=\"auto, (max-width: 1484px) 100vw, 1484px\" \/><p id=\"caption-attachment-125\" class=\"wp-caption-text\">Figure 3: Searching EmbARK, the museum&#8217;s collections management system, by subject terms<\/p><\/div>\n<h2><strong>Applying the Getty Vocabularies to Snite Museum collections<\/strong><\/h2>\n<p><span style=\"font-weight: 400\">Now, back to the messy data part.&nbsp;<\/span><\/p>\n<p><a href=\"https:\/\/sniteartmuseum.nd.edu\/\"><span style=\"font-weight: 400\">The Snite Museum of Art<\/span><\/a><span style=\"font-weight: 400\"> has been thinking about subject retrieval for quite a while. Perhaps as early as 2010, Snite curators began cataloging works of art with subject terms or keywords. This metadata allows curators to search for things about an object, such as who or what is depicted. Unfortunately, these terms were used without standardization.&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">Building on the curators\u2019 previous work, my goal as Collections Database Coordinator was to transform these subject terms or keywords so that they fit within the Getty Vocabularies. On a superficial level, this process helped clean up spelling errors and inconsistencies in formatting and style. 
On a deeper level, the process created relationships among objects and increased discoverability.<\/span><\/p>\n<div id=\"attachment_134\" style=\"width: 4161px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-134\" class=\"wp-image-134 size-full\" src=\"http:\/\/sites.nd.edu\/marble\/files\/2019\/10\/2008_054_002-v0001.jpg\" alt=\"Photograph of man pulling woman in a rickshaw \" width=\"4151\" height=\"3165\" srcset=\"https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/2008_054_002-v0001.jpg 4151w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/2008_054_002-v0001-300x229.jpg 300w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/2008_054_002-v0001-768x586.jpg 768w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/2008_054_002-v0001-1024x781.jpg 1024w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/2008_054_002-v0001-393x300.jpg 393w\" sizes=\"auto, (max-width: 4151px) 100vw, 4151px\" \/><p id=\"caption-attachment-134\" class=\"wp-caption-text\">Figure 4: Unidentified photographer, Geisha in a Ricksha, Japan, ca. 1880-1890, albumen silver print with applied color. Snite Museum of Art, University of Notre Dame. Acquired with funds provided by Robert E. (ND \u201963) and Beverly (SMC \u201963) O\u2019Grady, 2008.054.002.<\/p><\/div>\n<p><span style=\"font-weight: 400\">Such immediate improvements in discoverability are a result of the way Getty Vocabularies are structured: as hierarchical thesauri.&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">Getty Vocabularies are <\/span><b>hierarchical <\/b><span style=\"font-weight: 400\">to allow similar objects to be grouped together. 
For example, you want to discover all of the works in your collection that depict transportation.&nbsp;<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">You would search the term \u201cvehicles\u201d and return all objects that use the word \u201cvehicles\u201d&nbsp;<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">You would also return any objects with more specific terms that are organized underneath the term \u201cvehicles\u201d such as \u201cmilk carts,\u201d \u201cfire engines,\u201d \u201cwheelbarrows,\u201d etc.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400\">With the hierarchical organization, you would not have to think of or know all the various subsets of vehicles.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Getty Vocabularies are also <\/span><b>thesauri<\/b><span style=\"font-weight: 400\"> because they connect variable terms that express the same concept. It doesn\u2019t matter if you search for \u201cricksha,\u201d \u201crickshaws,\u201d or the French <\/span><i><span style=\"font-weight: 400\">pousse-pousse <\/span><\/i><span style=\"font-weight: 400\">\u2014 the computer will always know what you mean.&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">By leveraging the research that Getty has done, I can make the Snite collections more usable to a diverse audience.<\/span><\/p>\n<h2><strong>Contributing to Getty Vocabularies to pave the way for MARBLE&nbsp;<\/strong><\/h2>\n<p><span style=\"font-weight: 400\">Creating a thesaurus for the entirety of human knowledge is a big task, so Getty relies on contributions from the user community to expand their vocabularies.&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">I\u2019ve come across many terms that weren\u2019t part of the Getty Vocabularies in the Snite Museum\u2019s database. These terms could have stayed \u201clocal,\u201d which would have made the information only available to our staff. 
Our purpose, however, is to make the Museum\u2019s collections searchable alongside the University Archives, Rare Books &amp; Special Collections, and other cultural heritage materials on Notre Dame\u2019s campus.&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">To make sure our collections are cataloged in a way that computers can understand, I needed to add these missing terms to the Getty Vocabularies.&nbsp;<\/span><\/p>\n<div id=\"attachment_127\" style=\"width: 1118px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-127\" class=\"size-full wp-image-127\" src=\"http:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-GettyContributions.png\" alt=\"Screenshot of Getty vocabulary entry for carrots \" width=\"1108\" height=\"1600\" srcset=\"https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-GettyContributions.png 1108w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-GettyContributions-208x300.png 208w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-GettyContributions-768x1109.png 768w, https:\/\/sites.nd.edu\/marble\/files\/2019\/10\/HB-GettyContributions-709x1024.png 709w\" sizes=\"auto, (max-width: 1108px) 100vw, 1108px\" \/><p id=\"caption-attachment-127\" class=\"wp-caption-text\">Figure 5: Record for carrots created from my submission<\/p><\/div>\n<p><span style=\"font-weight: 400\">Using the Snite\u2019s collections, I have contributed over 80 terms to the Getty Vocabularies within the past year. The terms are <\/span><a href=\"http:\/\/blogs.getty.edu\/iris\/getty-profiles-patricia-harpring-managing-editor-of-the-getty-vocabulary-program\/\"><span style=\"font-weight: 400\">reviewed and published<\/span><\/a><span style=\"font-weight: 400\"> so that they can be used by others. Terms like these will be a core piece of the MARBLE website so that users can search across different kinds of collections through a single search portal. 
<\/span><\/p>\n<h2><strong>MARBLE: Combining controlled vocabularies<\/strong><\/h2>\n<p><span style=\"font-weight: 400\">The Getty Vocabularies are just one example of a controlled vocabulary. Other vocabularies are appropriate for cataloging different kinds of materials, such as the <\/span><a href=\"http:\/\/id.loc.gov\/authorities\/subjects.html\"><span style=\"font-weight: 400\">Library of Congress subject terms<\/span><\/a><span style=\"font-weight: 400\"> for print materials or the <\/span><a href=\"https:\/\/www.nlm.nih.gov\/mesh\/meshhome.html\"><span style=\"font-weight: 400\">National Library of Medicine\u2019s Medical Subject Headings<\/span><\/a><span style=\"font-weight: 400\"> for biomedical information. Although we don\u2019t have many biomedical items slated for inclusion in the MARBLE site, we will need to accommodate the different controlled vocabularies used by art museum, library, and archival catalogers.&nbsp;<\/span><\/p>\n<p><span style=\"font-weight: 400\">Our <\/span><a href=\"https:\/\/innovation.library.nd.edu\/marble\/people.shtml\"><span style=\"font-weight: 400\">Metadata Team<\/span><\/a><span style=\"font-weight: 400\"> is currently working on a solution for combining the authoritative vocabularies used by the Hesburgh Libraries and the Snite Museum. We\u2019ve considered everything from enforcing a single controlled vocabulary across collections to experimenting with<\/span><a href=\"https:\/\/www.w3.org\/standards\/semanticweb\/data\"><span style=\"font-weight: 400\"> Linked Open Data<\/span><\/a><span style=\"font-weight: 400\"> solutions that would allow us to combine different vocabularies using emerging web technologies. Cleaning up the Snite\u2019s subject terms is just the first step towards revealing metadata\u2019s best self. 
Stay tuned for future posts on our progress.<\/span><\/p>\n<p><span style=\"font-weight: 400\">While our immediate aim is to improve discoverability, our future aim is to provide some of the same services as Spotify. Users may not be aware that controlled vocabularies are working to suggest artworks or texts that might be of interest to them. My hope is that specific metadata clean up will seamlessly improve searching and browsing on a large scale.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Hanna Bertoldi&nbsp; What is metadata? I am constantly thinking about metadata, or the data that describes other data.&nbsp; While most people may not often think about metadata, everyone who uses search tools benefits from clean, well-structured metadata.&nbsp; Think about &hellip; <a href=\"https:\/\/sites.nd.edu\/marble\/metadata-connections\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":3367,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ngg_post_thumbnail":0,"footnotes":""},"categories":[361578],"tags":[],"class_list":["post-122","post","type-post","status-publish","format-standard","hentry","category-metadata"],"_links":{"self":[{"href":"https:\/\/sites.nd.edu\/marble\/wp-json\/wp\/v2\/posts\/122","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.nd.edu\/marble\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.nd.edu\/marble\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/marble\/wp-json\/wp\/v2\/users\/3367"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.nd.edu\/marble\/wp-json\/wp\/v2\/comments?post=122"}],"version-history":[{"count":7,"href":"https:\/\/sites.nd.edu\/marble\/wp-json\/wp\/v2\/posts\/122\/revisions"}],"predecessor-version":[{"id":135,"href":"https:\/\/sites.nd.edu\/marble\/wp-json\/wp\/v2\/posts\/122\/revisions\/135"}
],"wp:attachment":[{"href":"https:\/\/sites.nd.edu\/marble\/wp-json\/wp\/v2\/media?parent=122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.nd.edu\/marble\/wp-json\/wp\/v2\/categories?post=122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.nd.edu\/marble\/wp-json\/wp\/v2\/tags?post=122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}