Haunted Spaces: library classification


Monday, September 10, 2012

Aboutness and Data-about-Data

In another post, the idea was brought up that there is a disconnect between the information that humans make, produce or understand (think) and the information (data) that computers are structured to use as they communicate with other parts of the machine (or between machines). This might have as much to do with tagging posts in a blog or adding labels to items posted on publication platforms such as Google's Blogger as it does with writing descriptions while cataloging items in Millennium or Ex Libris Voyager. These last two software options are interacted with via a library's search catalog, through its OPAC or a publicly available URL. The former interfaces (the blogging platforms) are different.

There are some similarities between each of these. But basically, the similarities revolve around structures coded into the systems because these are assumed to be how knowledge is categorized. The above article highlighted as "tagging" suggests platforms such as WordPress have categories and tags. The blend of these features creates a general "box" for the knowledge in said post, while the tags allow for a little added nuance that supposedly helps make "aboutness" clearer for readers. This knowledge organization structure is assumed with the use of the technology, and nothing more is available to the user at any given time than what the designers assumed to be more correct (or justified) at the time. Every piece of machinery has this arrangement, but the ubiquitous use of these technologies currently means that these set modes of knowledge organization are foisted upon more and more people.

     Millennium and Ex Libris Voyager have their own sets of built-in assumptions about knowledge organization and their own ways of applying metadata to items - in this case, surrogate records that stand in for items and are not the items themselves. The distinction between the post and the surrogate record means that even though there are still many machine-specific assumptions in every technology mentioned thus far, the surrogate is STILL a very different interaction because it is not necessarily read for its own sake in most cases. Both of these technologies have certain set fields within their interfaces that cannot be changed - even if they can be fine-tuned to a far greater degree than any of the web-publishing technologies mentioned above.

     Today, however, I was in a conversation with a polyglot cataloger with the Library of Congress who works on serials in many languages (and is currently working with a collection of items from Harry Houdini's library donated to special collections). The conversation was specifically on data-about-data (metadata) and the ways in which technologies do and do not accomplish certain jobs which they could accomplish if certain arrangements were different. She told us that even with the encoding standard used in cataloging (MARC - Machine-Readable Cataloging), the detailed sets of rules for each field and sub-field (including the formatting of those sub-fields) and all the facets of information able to be added to the surrogate record made in the cataloging module, the technology is still quite limited. By this she meant at least one important point: even though there are so many ways within this technology to describe artifacts, the human mind understands, and is frustrated by, the singular method offered to accomplish the cataloger's goals.
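To make the "field and sub-field" arrangement a little more concrete, here is a minimal sketch in Python. It is not how Millennium, Voyager or any cataloging module stores records; the class and function names are my own, and the record values are invented for illustration. Only the general shape is taken from MARC: a three-character field tag, two indicator characters, and coded sub-fields (245 for the title statement, 100 for a personal-name main entry, 650 for a topical subject).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Field:
    """One variable field in a MARC-like record (hypothetical helper class)."""
    tag: str                          # three-character field tag, e.g. "245"
    indicators: str                   # two indicator characters
    subfields: List[Tuple[str, str]]  # (sub-field code, value) pairs, in order


def find(record: List[Field], tag: str, code: str) -> List[str]:
    """Return every value stored under the given tag and sub-field code."""
    return [
        value
        for fld in record
        if fld.tag == tag
        for sub_code, value in fld.subfields
        if sub_code == code
    ]


# An invented surrogate record; none of these values describe a real item.
record = [
    Field("100", "1 ", [("a", "Doe, Jane.")]),
    Field("245", "10", [("a", "An example title /"), ("c", "Jane Doe.")]),
    Field("650", " 0", [("a", "Example topic"), ("v", "Periodicals.")]),
]

print(find(record, "245", "a"))  # the title proper
print(find(record, "650", "a"))  # the topical subject term
```

Even in this toy form the point of the conversation shows through: the description can be very detailed, but only in the slots the format has already decided exist.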

     The same conversation included a man, also from the Library of Congress, but from the Preservation Directorate - Re-formatting Division, who has written on the modes of expression that are possible in describing any given work but are not used because someone has already decided what kinds of information count as data. There are a great number of factors in these decisions, but many of them have to do with socio-economics. These decisions do not revolve only around issues about people or writing. Rather, they are also tied to "truths" about physical and mathematical sciences from positions of power. For a good read on this topic, I heartily recommend "Cataloging Theory in Search of Graph Theory and Other Ivory Towers," a paper that has this post's topic as one facet. The paper is available in a pre-print format from the American Library Association here. And again, both of these library-minded people recognize that even though computers and IT-minded groups/companies have done a lot in the world, they may not have set the world up for a multitude of knowledge organization structures, even though most technologies in use today are capable of so much more than what is being taken advantage of at the present time. Machines do certain things really well. But they only do what they do. Humans do the rest (and built those machines).

Thank you.

As always, dialogue is welcome here or @ Twitter.

Sunday, August 19, 2012

Metadata and "Aboutness" - JOT and Tagging

     Currently, The Neighborhood Writing Alliance is working on a project in which interns are adding sets of non-hierarchical keywords (sort of like a tag cloud in social media) to an internally accessible bibliographic database of the journal it has published for more than 20 years, The Journal of Ordinary Thought - or JOT, as it is called. The database is being created in a log-in controlled environment called CiteULike. This application works like other reference-management software. Users, who are given log-in controlled, web-based profiles, build collections and can add multiple levels of information to each bibliographic record. One of the types of information users can apply to records is tags.
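As a rough illustration of what "non-hierarchical keywords" on bibliographic records can look like as data, here is a small Python sketch. It does not use CiteULike or its interface in any way; the record identifiers, titles, tags and function names are all invented for the example. It shows a flat set of tags per record, a light normalization step (the kind of thing editing tag-sets for better search potential involves), and a search that returns the records carrying every requested tag.

```python
from collections import defaultdict

# Invented records; the identifiers, titles and tags are hypothetical.
records = {
    "jot-001": {"title": "Hypothetical piece A", "tags": {"Family", "memory"}},
    "jot-002": {"title": "Hypothetical piece B", "tags": {"family", "work "}},
    "jot-003": {"title": "Hypothetical piece C", "tags": {"work", "memory"}},
}


def normalize(tag):
    """Trim whitespace and lowercase so 'Family' and 'family' match."""
    return tag.strip().lower()


def build_index(recs):
    """Map each normalized tag to the set of record ids that carry it."""
    index = defaultdict(set)
    for record_id, record in recs.items():
        for tag in record["tags"]:
            index[normalize(tag)].add(record_id)
    return index


def search(index, *tags):
    """Return the ids of records that carry all of the given tags."""
    matches = [index.get(normalize(tag), set()) for tag in tags]
    return set.intersection(*matches) if matches else set()


index = build_index(records)
print(sorted(search(index, "family")))
print(sorted(search(index, "work", "memory")))
```

Because the tags sit side by side with no parent or child terms, nothing in the structure says that one keyword is broader or narrower than another. That judgment stays with the people writing the tags.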

     I am working as Metadata Specialist on this project, overseeing the work of the interns, editing tag-sets for better search potential and presenting, through assorted instructional techniques, examples of best practices and policy for adding tags to the collection. One of the questions facing anyone who considers subject access in library catalogs is this notion of "aboutness" - that of determining what a piece of writing or other cultural artifact is about. In other words, if we could attach a subject to some cultural artifact, what would it be, how many subjects can one artifact have and how do we decide? To make matters more abstract, "subjects" themselves are also cultural products based on factors such as who might be in charge at the time, who is most likely to be the common users or viewers of said artifacts and whether there are requisite resources (money and other factors) at the time of creation of the bibliographic record to add or attach all possible permutations. For example, here is a link to a search for manuscript papers connected with Abraham Lincoln in the Library of Congress' holdings. If one clicks on Andrew Johnson Papers, 1783-1947, it is apparent the record contains a summary of that collection's contents. This summary works to tell what the collection is about. On one level, this bibliographic record contains pure data referring to the collection. But on another level, the writing of the summary is a human-decided process that involves processing (thinking) and writing (also a human experience). It is not obvious what a piece of writing is "about" - even if the writer or bibliographic record creator states so - nor how it will affect the reader or viewer.

     These are the fun challenges in front of us on this project. It is underway and progress is being made. The document which holds the tagging Best Practices [which Merriam-Webster's Collegiate Dictionary defines as performances or forms which excel all others]* is being written along with some other helpful guidelines by way of examples with specific explanations. I think we will each learn a little something along the way.

Thank you.

- Jesse.

PS: As always, dialogue is welcome.

* Merriam-Webster's Collegiate Dictionary, 10th Ed. Springfield: Merriam-Webster, Incorporated, 2001. 108, 912

Thursday, March 22, 2012

A Thought on Linguistic Diversity and Classification

I have a tendency to believe linguistic diversity is also a sign of knowledge diversity and am very frustrated with attempts to globalize knowledge into one vast pot. I point to the impact of global mass communication content and technologies, the refusal to allow the "other" to truly be and the impact of the world's most widely used library classification system, Library of Congress Subject Headings. I am not taking a stance against The Library of Congress. I live in America and make use of their diverse resources regularly. Also, their main building is a work of architectural art. No, I question standardization of "knowledge" at the expense of diversity and questions. It seems to me that if we classify all the world's knowledge under one system (which is not the mission statement of the Library of Congress), then we have declared globally what everything in the world is "about." This action is accomplished by all kinds of groups around the world who write indexes to be LC compatible. But if those local knowledge resources and populations have to use an "aboutness" structure other than their own, have they not committed a kind of murder of their own knowledge system? Believe me, this is a bit scary. I am not sure that we can separate "knowledge" from "questions." I note this point because it seems to me that to state up-front what something is about is already to annihilate many potential questions - and thus knowledge types. How can this tendency sit well with ongoing questioning? Somehow, I feel this happens because we are afraid of uncertainty. This is not an overshadowing fear in this context, but a fear nonetheless. Surely it is different for different people. But why should we be afraid of conflicting and disagreeable classifications in information organization?