Archive for the ‘classification’ Category
When two things are like each other
In the October 22nd issue of The Economist there’s an article about urban pulses (“Listen to the music of the traffic in the city,” p. 70). It reports on research (Miranda et al. 2016) that measures activities as diverse as Flickr posts and traffic volume; together these measures give an impression of the ebbs and flows of activity in a place over time, and they also identify other similarities. The hook for the article is the notion that Alcatraz and Rockefeller Center turn out to have the same pulse.
It’s just one more example of the kind of situation that I wish we in knowledge organization (KO) were more concerned with. This is the notion that when two things are like each other it might be meaningful, whether the relationship between them is semantic or not. I think in KO we are oriented too much toward systems of semantic similarity, to the exclusion of almost everything else. A good place to start might be to look for more research like this and subject it to meta-analysis from the KO domain-analytical point of view. What sort of domain is urban pulse, or social-pulse taking (which apparently is a broader term; see the end of the article)? I don’t mean, who are its authors and what are its keywords, although that would be interesting too; I mean, what are the heuristics that lead to classes and how are the classes ordered?
I have been very interested in this approach to KO for a long time. It is one of the reasons I am so enthusiastic about the CIDOC-Conceptual Reference Model (CRM), a meta-level ontology for cultural heritage information sharing (http://new.cidoc-crm.org/). Apart from all of the other virtues of the CRM, it is obvious to me that metadata that conform to it can have a footprint made up of the particular combinations of entities, properties, and relationships expressed in the ontology. This was the subject of research undertaken in my last years at LIU (“Mining Maps of Information Objects” and “Classifying Information Objects” 2008). It also is the theoretical basis for my work on classification interaction (Smiraglia 2013; 2014a; 2014b), for my work with knowledge maps (Scharnhorst et al. forthcoming), and for my work with Korean open government data (Park and Smiraglia 2014; Smiraglia and Park 2016).
The point is to use empirical research to discover instances when things that don’t seem to be the same actually are like each other, to generate classifications from those observations, and then to create pathways for navigating similarity discovery.
References
“Classifying Information Objects: An Exploratory Ontological Excursion.” Sergey Zherebchevsky, Nicolette Ceo, Michiko Tanaka, David Jank, Richard Smiraglia and Stephen Stead. Poster at 10th International ISKO Conference, Montréal, 5-8 August 2008.
Miranda, Fabio, Harish Doraiswamy, Marcos Lage, Kai Zhao, Bruno Goncalves, Luc Wilson, Mondrian Hsieh and Claudio Silva. 2016. “Urban Pulse: Capturing the Rhythm of Cities.” IEEE Transactions on Visualization and Computer Graphics PP, 99:1-1. doi: 10.1109/TVCG.2016.2598585
“Mining Maps of Information Objects: An Exploratory Ontological Excursion.” Sergey Zherebchevsky, Nicolette Ceo, Michiko Tanaka, David Jank, Richard Smiraglia and Stephen Stead. Poster at American Society for Information Science and Technology Annual Meeting, Columbus Ohio, October 24, 2008.
Park, Hyoungjoo and Richard P. Smiraglia. 2014. “Enhancing Data Curation of Cultural Heritage for Information Sharing: A Case Study using Open Government Data.” In Metadata and Semantics Research: 8th Research Conference, MTSR 2014, Karlsruhe, Germany, November 27‐29, 2014. Proceedings, ed. Sissi Closs, Rudi Studer, Emmanouel Garoufallou and Miguel-Angel Sicilia. Communications in Computer and Information Science 478: 95‐106.
Scharnhorst, Andrea, Richard P. Smiraglia, Alkim Almila Akdag Salah and Christophe Guéret. 2016. “Knowledge Maps of the UDC: Uses and Use Cases.” Knowledge Organization 43 forthcoming.
Smiraglia, Richard P. 2014a. “Classification Interaction Demonstrated Empirically.” In Knowledge organization in the 21st century: Between Historical Patterns and Future Prospects, Proceedings of the 13th International ISKO Conference, Krakow, Poland, May 19‐22, 2014, ed. Wiesław Babik. Advances in Knowledge Organization v. 14. Würzburg: Ergon‐Verlag, pp. 176‐83.
Smiraglia, Richard P. 2014b. “Extending the Visualization of Classification Interaction with Semantic Associations.” In Proceedings of the ASIST SIG/CR Classification Workshop, Seattle 1 November 2014.
Smiraglia, Richard P. 2013. “Big Classification: Using the Empirical Power of Classification Interaction.” In Proceedings of the ASIST SIG/CR Classification Workshop, Montréal, 2 November 2013, ed. D. Grant Campbell, pp. 21‐29. doi: 10.7152/acro.v24i1.14673
Smiraglia, Richard P. and Hyoungjoo Park. 2016. “Using Korean Open Government Data for Data Curation and Data Integration.” DCMI 2016 OCS447 http://dcevents.dublincore.org/IntConf/index/pages/view/abstracts16#Smiraglia
Doubly thrilled
Classification interaction is empirically demonstrated, and I’m thrilled about that. For the “Big Data” workshop at SIG/CR I proposed a preliminary survey research project in which a sample of the nine million UDC numbers in the WorldCat would be used to match deconstructed components of the UDC expressions to content-designated components of the respective bibliographic records. The purpose was to learn about the interrelationship between a faceted classification and the artifacts it represents. All of the variables (except age of work) were nominal-level, so I used Chi-squared to look for statistically-significant correlations. It was thrilling to find correlations all through the study. Results (and definitions of all of these terms!) are in the paper “Big Classification: Using the Empirical Power of Classification Interaction” in the 2013 SIG/CR Proceedings (or will be). The outcome is preliminary but exciting nonetheless.
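For readers who want to see what a chi-squared test on nominal-level variables involves, here is a minimal sketch in plain Python. It is not the analysis from the paper: the function name, the variable labels in the comments, and the counts are all hypothetical, chosen only to show how observed counts are compared with the counts expected if the two variables were independent.

```python
def chi_squared(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts.

    `table` is a list of rows; each cell is how often one category of the
    row variable co-occurs with one category of the column variable.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts (assumption, not data from the study):
# rows    = a UDC component is present / absent in the class number
# columns = a content-designated field is present / absent in the record
associated = [[30, 10],
              [10, 30]]
independent = [[10, 20],
               [20, 40]]

stat_a, dof_a = chi_squared(associated)
stat_i, dof_i = chi_squared(independent)

# 3.841 is the standard critical value for 1 degree of freedom at p = 0.05.
CRITICAL_1DOF_05 = 3.841
print(stat_a, dof_a, stat_a > CRITICAL_1DOF_05)
print(stat_i, dof_i, stat_i > CRITICAL_1DOF_05)
```

A statistic above the critical value suggests the two variables are associated in the sample; a statistic near zero suggests no evidence of association. A statistics package would also report a p-value, which this sketch leaves out.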
But just when I thought it couldn’t get any better I took one more look at the largest results table and realized it was revealing a network among the correlations. I was therefore doubly thrilled (with some coaching from Laura Ridenour) to be able to create a visualization of that network structure using Gephi 0.8.2. Here is an early version (not the one that appears in the paper):
Flimsy Fabric and Epistemic Presumptions
In two papers on authorship, my colleagues Hur-Li Lee and Hope Olson and I have been working out the intricate relationship between the iconic concept of attribution of responsibility, which seems to be the western notion, and the classificatory pillar, which seems to be the (what shall I say here?) non-western notion (derived in our papers from classic Chinese practice, and from what we can discern from the record of Callimachus).
What we see, however, is that “author,” even (or especially) in Anglo-American cataloging practice, is not about attribution, but about creating an alphabetico-classed arrangement of works. It gives a whole new twist to the concept of classification, but also to the comprehension of what often is called “bibliographic” description, which, it turns out, isn’t.
Here are the citations and the abstracts:
“The Flimsy Fabric of Authorship,” by Richard P. Smiraglia, Hur-Li Lee and Hope Olson. In Ménard, Elaine and Nesset, Valerie, eds., Information Science: Synergy through Diversity, Proceedings of the 38th Annual CAIS/ACSI Conference, Concordia University, Montreal, Quebec. June 2-4 2010. http://www.cais-acsi.ca/proceedings/2010/CAIS089_OlsonLeeSmiraglia_Final.pdf
This paper is about authorship, its influence on bibliography and how that influence is reflected in cataloging across cultures. Beginning with Foucault’s question “what is an author”, it proceeds to demonstrate, through an examination of cataloging standards, that it is the role that is represented rather than true intellectual responsibility.
“Epistemic Presumptions of Authorship,” Richard P. Smiraglia, Hur-Li Lee, Hope A. Olson. iConference’11, February 8–11, 2011, Seattle, Washington, U.S.A. ACM 1-58113-000-0/00/0010
The major concern of this paper is the cultural ramification of the bibliographic conception of “authorship.” Beginning with Foucault’s question “what is an author” and his notion of an author as a cultural phenomenon, the paper proceeds to examine the treatment of authorship in cataloging practices of two ancient cultures, the Greek and the Chinese, as well as in the modern Anglo-American cataloging standards from Panizzi’s 91 rules to the draft of Resource Description and Access (RDA). An author, as the study shows, is constructed as part of the recognition of “a work” as an essential communicative social entity. All cataloging practices and standards examined, east or west, ancient or modern, exhibit a similar obsessive attitude toward the imposition of an author, be it only a name or a culturally identified entity responsible for the work. In fact, the study demonstrates that as far as cataloging is concerned authorship is the role that is represented rather than any true intellectual responsibility.
A third paper has been accepted for an issue of Library Trends.
Bandwagons (originally posted 7-10-2010)
I suspect that in the year 2010 the bandwagon is so old a metaphor that few remember what bandwagons actually were like. Originally a bandwagon was a wagon that carried a band in a parade. All of the recent meanings of the term derive from this, because the music is the fun thing in the parade that makes spectators want to follow along or climb aboard. Oh well.
As I read study after study of social tagging I began to wonder about the behavior of taggers. Most studies have demonstrated various properties of the tags themselves, and several studies have suggested tagging is some sort of egalitarian indexing-for-the-masses that would be ever so much more useful if the taggers would just stick to a thesaurus. But I considered both of those assumptions unlikely. For one thing, if you inhabit a social networking site just enough to watch the tags go by out of the corner of your eye each day you see a surprising number of them that are self-centered expressions (not just “todo” though there is plenty of that, but also “wtf” and so forth). Also, again watching out of the corner of your eye, the really fascinating thing about the tags is the network of associations among them–in other words, what happens if you click on one, and then when you get to that destination click on the first one there, and so on–you’ll not be following any road that a thesaurus would have led you along (stay tuned for a blog entry about my work at VKS with Wikipedia). There was a lot of discussion about the difference between the main tags and the little ones populating the outer corners of those tag clouds as well, and that reminded me of the problem of noesis, which is the ego-act of perceiving through one’s own experience–this is a hallmark of Husserl’s phenomenology.
I designed an exploratory study of tags, with the purpose of surveying the tags assigned to a random sample of sites in Delicious.com. I wanted to compare what I would find to prior studies to see whether there was any theoretical potential (there was), and then subsequently to analyze the behavior of the taggers to look for noetic behavior. I submitted an abstract to this effect to the 11th International ISKO Conference in Rome, and I also drew my sample all in one day. I based a sample-size calculation on prior studies’ figures about the proportion of affective tags, and then in my enthusiasm drew twice as many cases (sites) as I needed for 95% confidence. I was excited to get my feet wet with this kind of research. I’m glad I drew the sample manually so I could watch the data as I downloaded the sites and their taggers and their tags. But now I know why people use crawlers for this! My abstract was accepted, and along with it came some helpful referee comments, which sent me to the literature of cognitive linguistics. Bear with me, I was on a learning curve here.
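As an illustration of the kind of sample-size calculation mentioned above, here is a small sketch using a common textbook formula for estimating a proportion, n = z²·p(1−p)/e². This is an assumption about the general approach, not a record of the calculation actually used in the study; the proportion and margin of error below are hypothetical placeholders.

```python
import math


def sample_size_for_proportion(p, margin, z=1.96):
    """Cases needed to estimate a proportion `p` within +/- `margin`.

    z = 1.96 corresponds to 95% confidence. The result is rounded up,
    since a fraction of a case cannot be sampled.
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Hypothetical inputs (assumptions, not figures from the study):
# prior studies suggest about 10% of tags are affective, and we
# accept a margin of error of 5 percentage points.
needed = sample_size_for_proportion(p=0.10, margin=0.05)

# The most conservative case, when nothing is known about p.
worst_case = sample_size_for_proportion(p=0.50, margin=0.05)

# Drawing twice as many cases as needed, as described in the post.
drawn = 2 * needed

print(needed, worst_case, drawn)
```

The smaller the expected proportion (or the larger the tolerated margin), the fewer cases are needed, which is why figures from prior studies are a useful starting point.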
For the conference in Rome I wrote a summary paper about the behavior I observed among the taggers. I discovered plenty of noetic behavior, and interestingly enough, although I was able to affirm the proportion of affective tags–the figure from my study fell within the confidence interval of the prediction from prior studies–the surprise was that the noetic tagging was not affective tagging. I also analyzed the entire sample to see what I could learn about co-tagging–in other words, which taggers were tagging together, and here was my first surprise. A substantial core of the taggers were, in fact, all focused on work on the same sites, and their co-tagging was nested in two clusters, which I was able to identify roughly as web designers and programmers (remember, we’re talking about Delicious.com); the web designers’ tags were descriptive and the programmers’ tags were slightly more likely to be affective.
All of this convinced me I had figured it backwards–the noetic behavior was not the weird stuff in the long tail, but rather was the common ego-act perceptions of the tightly-knit group of co-taggers. In other words, here was a group of taggers all leaping on a bandwagon and in so doing classifying their commonly tagged sites with some very specific and (for taggers) relatively precise terminology. Here is a slide from the PowerPoint presentation of that paper. On the left you see the clusters of taggers, and on the right their tags. The point was that most of those tags could be seen as semantically related to two conceptual clusters–noesis as bandwagon effect. The paper is available in the ISKO Proceedings of the conference at Rome (Richard P. Smiraglia “Perception, Knowledge Organization, and Noetic Affective Social Tagging” pp. 64-7) but here is the abstract:
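To make the idea of co-tagging concrete, here is a minimal sketch of one way to count which taggers tag the same sites. It is only an illustration under stated assumptions: the function name, the tagger names, and the sites are invented, and the study itself may have measured co-tagging differently.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_tagging_counts(assignments):
    """Count, for each pair of taggers, how many sites both have tagged.

    `assignments` is an iterable of (tagger, site) pairs.
    """
    taggers_by_site = defaultdict(set)
    for tagger, site in assignments:
        taggers_by_site[site].add(tagger)

    pair_counts = Counter()
    for taggers in taggers_by_site.values():
        # Sort so each pair is always recorded in the same order.
        for pair in combinations(sorted(taggers), 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical data (assumption): four taggers, three sites.
sample = [
    ("ann", "site1"), ("bob", "site1"),
    ("ann", "site2"), ("bob", "site2"), ("cy", "site2"),
    ("dee", "site3"),
]

counts = co_tagging_counts(sample)
for pair, shared in counts.most_common():
    print(pair, shared)
```

Pairs with high counts are candidates for a shared cluster; a tagger such as "dee", who shares no site with anyone, never appears in the counts at all.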
Knowledge organization can be postulated as existing on a continuum between classificatory activity and perception. Studying perception and its role in the identification of concepts is critical for the advancement of knowledge organization. The purpose of this research is to advance our understanding of the role of perception in knowledge organization systems. We briefly review the role of perception in knowledge organization and some preliminary evidence about affective social tagging, which is seen as a form of everyday classification. We consider how Husserlian phenomenology might be useful for analyzing the role of perception in affective social tagging. Finally, preliminary results of an empirical study are reported.
Because this was for ISKO I was intentionally focussed on the KO issue, which I here stated as a continuum between classificatory activity and perception. I gave a paper on noesis at ISKO in Montréal as well (scroll down, it’s the mailbox paper). I think that we think too often that classification is about putting things in little boxes, and therefore that we think too little about how fuzzy are the boundaries of those boxes. So here is just a glimpse at that issue.
As I said, the referees had sent me to cognitive linguistics, and I found particular resonance in the writing of Ronald Langacker (Langacker, Ronald W. 2005. Dynamicity, fictivity, and scanning: the imaginative basis of logic and linguistic meaning. In Pecher, Diane and Rolf A. Zwaan eds., Grounding cognition : the role of perception and action in memory, language, and thinking. Cambridge : Cambridge Univ. Pr., pp. 164-97). Scanning is the linguistic activity in which a kind of shorthand is used to project a landscape on which perceived activity is taking place; it results in “fictive” or at least unfactual language, but common understanding allows and even encourages this. Here’s a PowerPoint slide from my presentation at CAIS in Montreal in June.
The example is the phrase “my teacher’s books keep getting longer.” What is meant is that each time the teacher writes a book (or, one supposes, even buys a book) it is longer than the last. But that isn’t what was said at all, and obviously the idea that the teacher has a stack of books that is somehow stretching is absurd. It seemed likely that some of the variation in tagging might be due to scanning.
I wanted to complete the statistical analysis of the data and to present a fuller account of the study apart from the philosophical issue of noesis, so I submitted an abstract to CAIS for this year (2010); that abstract was accepted. To my chagrin, instead of the typical complete CAIS-paper, this year someone had decided to allow only what they called “extended abstracts,” which gave one precious little space. Nevertheless, I gave a presentation during the conference, and the “extended abstract” (Smiraglia, “Self-Reflection, Perception, Cognitive Semantics: How Social is Social Tagging?”) is in the proceedings, here: http://www.cais-acsi.ca/proceedings/2010/CAIS055_Smiraglia_Final.pdf.
This was pretty exciting for a couple of reasons. One was that the Globe and Mail got wind of it and kept asking for more text. Unfortunately all I had was the extended abstract, which must not have been enough because I never saw myself quoted. Still, for a moment there I was flirting with the thrill of being reported in the press. As I say often, oh well.
The research itself was exciting enough however. The fictive scanning was there, although once again in small proportions–less than 1% of the total. But more important was the extension of this notion of social classification. It turned out that all of the sites in the study had clusters like those we saw above. In fact, most of the tags were somehow or other associated with the bandwagon effect. There were typically 4 or 5 clusters per site, 2/3 of the tags fell into the clusters, and 1/2 of the tags fell into the two largest clusters. Voila, classification that is social.
I really want to go make a cup of tea but I suppose I should finish with the conclusions, which were:
The taggers collectively are generating a classification with a social basis.
Also, the clusters are not mutually exclusive, demonstrating that a natural classification is not necessarily either hierarchical, or mutually exclusive. But it does remain collectively, potentially, exhaustive.
Warrant becomes a new issue in such a classification, because there is no accountable literary warrant—rather warrant is cultural (as Beghtol predicted).
Those look like some interesting hypotheses for future research to me.
I suppose I should write this up for a journal. But what I really want to do now is look for the same effect on more social social-networking sites.
UDC (originally posted 1-2-2010)
This post is courtesy of Aida Slavic (“Aida Slavic”<aida@acorweb.net>)
Hi,
The UDC Summary of around 2,000 classes has been online since October 2009 and can now be browsed in ten languages at
http://www.udcc.org/udcsummary/php/index.php (English, German, Dutch, French, Spanish, Russian, Swedish, Croatian, Slovenian, Finnish)
The UDC summary is fully aligned with the UDC MRF 2009 which is going to be released in the following months. This set is made available for free use under the Creative Commons Attribution Share Alike 3.0 license (CC-BY-SA).
The work is very much ‘in progress’. We are adding language data and updates as we speak and changes will be visible on a daily basis.
Captions in all languages appear first and then scope notes, application notes and example of combinations are added as updates progress.
The effort put into the UDC Summary is entirely voluntary including the programming support, the work of our language editors and translators for which we are most grateful. Read more at the UDC blog <http://universaldecimalclassification.blogspot.com> or at the UDC Summary webpage.
Contributions and feedback are invited
Kind regards
Aida
**************************************************
Of course, Otlet saw the UDC as the classification that would underpin all of his other ideas. Where some utopians saw brilliant cities shining on hilltops Otlet saw the interweaving of the structure of knowledge and this mechanism that could approach its explanation and yield further insight. And here it comes now.