Issues in Science and Technology Librarianship
Posted on April 1, 2022 in Distant Reader by Eric Lease Morgan
I asked a colleague (Parker Ladwig) if there was a blog he thought might be worthy of archiving, and he mentioned Issues in Science and Technology Librarianship (ISTL). I took it upon myself to see what I could do.
After looking more closely at the site, I guessed the underlying technology was not blog technology but rooted in the venerable OJS journal publishing system, and OJS robustly supports a protocol called OAI-PMH. Luckily I had previously written a suite of software used to harvest all the bibliographic information and content from (OJS) OAI-PHM sites. Consequently, in a matter of about 30 minutes, I was able to create a CSV file listing all the articles along with their authors, titles, abstracts, URLs, etc.
I then ran a program that looped through the CSV file and downloaded (cached) the content. Thus, all the articles in their original form are found in the (temporarily) linked .zip file. There are about 900 of them.
I then ran the whole thing through my Distant Reader Toolbox, and I am now able to characterize the journal as a whole. For example, after removing bogus files, there are about 850 articles, and the whole corpus is 2.5 million words long. (The Bible is about .8 million words long.) I was then able to create a rudimentary bibliography, which is really only half a step better than the original CSV file.
Do you know what ISTL is about? Science and technology librarianship would be a good guess, but can you elaborate? I can, in a number of ways. For example, if I compute statistically significant keywords against the text, I can visualize the result as a word cloud. Now you know more, and in what proportions.
Rudimentary clustering of the data returns two possible themes, but the clustering process (Principle Component Analysis) does not articulate what those themes may be. Still, such an analysis points to what a good topic model might be.
Topic modeling with only two topics, returns two possible, over-arching themes: 1) students, and 2) search. Notice how the students theme is really about people, and the search theme seems to be about searching stuff:
topic weights features students 0.54019 students data research librarians faculty univ... search 0.44664 search journals web access research articles d...
In terms of proportions, the pie chart of the weights mirrors the clustering analysis.
After a bit more modeling, the idea of search is still evident, but the theme of students has broken down into different types of people and different things being searched:
topic weights features use 0.34272 use faculty survey services new staff university search 0.33438 search web database users results databases also book 0.27892 book technology work new internet example many librarians 0.25108 librarians research technology university educ... access 0.20613 access journals electronic open research publi... citation 0.19648 citation study journals research analysis arti... students 0.17608 students literacy research instruction course ... food 0.10850 food site resources environmental research agr... data 0.10367 data research management researchers gis servi... site 0.09101 site resources links web provides history rese... patent 0.06577 patent yes patents titles databases journals d... chemistry 0.05041 chemistry chemical structure molecular data bi...
Again, a pie chart of the whole.
One of our esteemed colleagues — Roy Tennant — once said, “Librarians like to search. Everybody else likes to find.” Consequently, I was not surprised to see search as a theme in a library-related journal.
Upon closer inspection of the keywords, the word soil piqued my interest, so I created a full text index and searched for “title:soil OR keyword:soil”. I got three records:
Your search (title:soil OR keyword:soil) against the study carrel named "istl" returned 3 record(s): id: 2495 author: Pellack, Lorraine J. title: Soil Surveys — They're Not Just for Farmers. date: 2009-09-01 summary: Soil surveys do contain inventories of the soils of an area; however, they also contain a wealth of tabular data that help interpret whether a location is suitable for a given use, such as a playground, a golf course, or a highway. This guide will describe soil surveys, their uses, and uniqueness. keyword(s): library; soil; surveys; u.s words: 3003 sentence: 151 flesch: 59 cache: /Users/eric/Documents/reader-library/istl/cache/2495.htm txt: /Users/eric/Documents/reader-library/istl/txt/2495.txt id: 2420 author: Bracke, Marianne Stowell title: Agronomy: Selected Resources date: 2007-07-01 summary: This web bibliography, or webliography, contains links and descriptions of agronomy web sites that cover general and background information, crop science, soil science, resources for K-12 teachers, databases, freely-available and subscription-based journals, and organizations. Only a select number of sites that focused on crop science, soil science, or a particular sub-area (e.g., corn) were included due to the large number of sites in existence. keyword(s): agronomy; crop; information; plant; science; site; soil words: 5135 sentence: 265 flesch: 39 cache: /Users/eric/Documents/reader-library/istl/cache/2420.htm txt: /Users/eric/Documents/reader-library/istl/txt/2420.txt id: 1984 author: Harnly, Caroline D. title: Sustainable Agriculture and Sustainable Forestry: A Bibliographic Essay: Theme: All Topics date: 2004-08-14 summary: The authors found that there is no clear preference in the marketplace to the many approaches to achieving sustainable forest management. Peter F. Ffolliott, et al.'s book, Dryland Forestry, details how to manage both the biophysical and socioeconomic aspects of environmentally sound, sustainable forest management in dryland environments. keyword(s): agricultural; book; edited; farming; food; forest; forest management; management; new; papers; press; soil; sustainability; sustainable; sustainable agriculture; sustainable forestry; systems; topics words: 14403 sentence: 990 flesch: 43 cache: /Users/eric/Documents/reader-library/istl/cache/1984.htm txt: /Users/eric/Documents/reader-library/istl/txt/1984.txt
Well, that’s enough for now, but the point is this:
As librarians we collect, organize, preserve, and disseminate data, information, and knowledge. These are the whats of librarianship, and they change very slowly. On the other hand, the hows of librarianship — card catalogs versus OPAC, MARC versus linked data, licensing versus purchasing, reference desking versus zooming, just-in-time collection versus just-in-case collection, etc. — change much faster with changes in the political environment and technology. Harvesting things from the Web and adding value to the resulting collection may be things we ought to do more actively. The things outlined above are possible examples.
Fun with librarianship?
P.S. Ironically, I just noticed the linked blog posting about Web archiving. From the concluding paragraph:
There is clearly value in identifying the parts of “the Web” that aren’t being collected, preserved, and disseminated to scholars. But in an era when real resources are limited, and likely shrinking, proposals to address these deficiencies need to be realistic about what can be achieved with the available resources. They should be specific about what current tasks should be eliminated to free up resources for these additional efforts, or the sources of (sustainable, not one-off) additional funding for them.
Food for thought.