Adrian Salas

MIAS 230, Winter 2015

Tagging and Controlled Vocabularies: Friends Forever


Cataloging systems have traditionally been the domain of trained information professionals
and/or subject specialists, who wielded knowledge of metadata schemas and specialized
vocabularies to fashion entryways through which researchers could access information within
systems.1,2 With the growth of the user-editable Web 2.0 world, in the form of wikis, user
tagging (folksonomies), and crowdsourcing, the keepers of the information gateways are no longer
as clearly defined as before. The solution for adapting information retrieval models and systems
to this shifting world is not an either/or approach, but rather a blending of user and
professional information description. Information specialists of some form will always be
needed to ensure at least a baseline level of form and structure for the data in a system, but
the contributions of a knowledgeable user base are valuable in expanding or exposing new
entryways to discovery that catalogers and other information professionals may not be able to
create on their own, due to lack of time or resources. Discovery systems could benefit
immensely from the inclusion of a mediated tagging option that reflects the internet use habits
of the public, alongside the more traditional structured data fields of the metadata universe.3
In the service of thinking ahead,
many of these discovery systems must also begin to formulate ways to incorporate the
possibilities offered by the interconnected semantic web, and to develop a more advanced form
of tagging in which users provide data that could potentially be massaged into forms that
strengthen the connections localized content can draw with external systems.

1 Harpring, Patricia. Introduction to Controlled Vocabularies [electronic Resource]: Terminology for Art, Architecture, and Other Cultural Works / Patricia Harpring. 1st ed. Getty Publications Virtual Library. Los Angeles, CA: Getty Research Institute, 2010.

2 Baca, Murtha, and Getty Research Institute. Introduction to Metadata [electronic Resource] / Edited by Murtha Baca. 2nd ed. Getty Publications Virtual Library. Los Angeles, CA: Getty Research Institute, 2008.

Salas 1

Tagging as a practice is laid out in simple form by Jonathan Furner in his article on
folksonomies: "Tagging is the activity of assigning descriptive labels to useful (or
potentially useful) resources."4 Furner further elaborates that the descriptive labels assigned
to resources can be thought of as index terms, which are the main component of indexing. One
can therefore regard user-assigned resource labeling as a rougher version of the subject
cataloging performed by trained librarians using subject headings from controlled vocabularies.
For the purposes discussed herein, however, social tagging refers to metadata created by the end
users of web and electronic resources in the service of description, not to trained
professional indexing and cataloging work or to automated processes that harvest and input data
according to an algorithm or script. As Furner goes on to note, the description of resources
ultimately serves the purpose of resource discovery.5 While this may seem self-evident, it is
important to keep in mind that all the various schemas and metadata models involved in
cataloging items for researchers must be constructed with this goal in mind, or the managed data
will be rendered invisible by a lack of discoverability and accessibility.
3 Elings, Mary W., and Günter Waibel. "Metadata for All: Descriptive Standards and Metadata Sharing across Libraries, Archives and Museums." VRA Bulletin. 34(1): 7. First Monday, 2007.
4 Furner, Jonathan. "Folksonomies." In Encyclopedia of Library and Information Sciences, 3rd ed. Taylor and Francis. New York; Published online: 09 Dec 2009; 1858-1866. 1858.
5 Ibid. 1859.


Relying on social tagging by itself for resource discovery is a risky proposition outside the
context of small, specialized communities of interest, for instance online photo albums already
categorized by general theme, or online videos of a similar form or genre. Researchers in
localized, limited-scale contexts like these can reasonably be assumed to have already arrived
at content that approximates their research interests and goals. In such situations, user
tagging can be of great use, as the resource labels assigned by motivated social taggers can
provide further granular refinements to already aggregated collections. Refinements and
description provided by active resource users can save other researchers time in searching, or
even enhance discovery by illustrating relationships and commonalities among the digital content
represented in a collection. As researcher Corinne Jörgensen notes, "there are other semantic
gaps that exist, between groups with different needs, goals, and interests, between providers
and users."6 These semantic gaps can be understood as the difference between the precise,
controlled language used by cataloging and information professionals for description and the
less refined natural language employed by everyday users. Socially generated tagging can help
bridge the semantic gap, because its terminology could theoretically come out of community usage
rather than being rigidly beholden to specialized vocabularies such as the Library of Congress
Subject Headings. Librarians and catalogers are trained to catalog materials as impartially as
they can, which is good for constructing clean and uniform data. Social taggers are not
necessarily bound by the same standards or training, which creates a danger of sloppy or even
wholly inappropriate metadata; but it could also mean that items are tagged with a more
extensive natural language vocabulary, and therefore offer researchers a larger number of access
points to information that conform to less formalized patterns of search.

6 Jörgensen, Corinne. 2007. "Image Access, the Semantic Gap, and Social Tagging as a Paradigm Shift." Florida State University, Classification Research Workshop. 3.
While user-supplied tags may evoke utopian visions of a "people's librarianship," in which the
shared language of the masses guides the vocabulary that provides entry into data discovery,
Furner points out that "retrieval systems based on tagging tend to suffer from the effects of a
particular kind of 'tyranny of the majority' or 'rich get richer' phenomenon, by which resources
that have received most tags in the past are most likely to receive tags in the future, and by
which tags that have been assigned to most resources in the past are most likely to be assigned
in the future."7 A consumer-based tagging system relies on users actively offering their
services, in this case their interpretive and descriptive skills. Discovery tools that hope to
incorporate tagging must therefore do more than provide a search portal; they must also find
ways to actively engage users. Failing to do so results in a tagging feature that exists on the
page but has largely atrophied as a useful resource.
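The "rich get richer" dynamic Furner describes can be illustrated with a toy simulation. The sketch below is a hypothetical illustration, not a model drawn from Furner's article: it assumes that each new tag lands on a resource with probability proportional to the number of tags that resource already has, plus one, so that early leads tend to compound. The function name and parameters are invented for this example.

```python
import random
from collections import Counter


def simulate_preferential_tagging(resource_ids: list[str], steps: int, seed: int = 0) -> Counter:
    """Toy model of "rich get richer" tagging (an assumption, not Furner's model).

    Each of `steps` new tags is assigned to one resource, chosen with
    probability proportional to (tags it already has + 1). The +1 gives
    untagged resources a chance of being chosen at all.
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced
    counts = Counter({rid: 0 for rid in resource_ids})
    for _ in range(steps):
        weights = [counts[rid] + 1 for rid in resource_ids]
        chosen = rng.choices(resource_ids, weights=weights, k=1)[0]
        counts[chosen] += 1
    return counts


if __name__ == "__main__":
    items = [f"item{i:02d}" for i in range(20)]
    counts = simulate_preferential_tagging(items, steps=500)
    # Print the five most-tagged items and their tag counts.
    print(counts.most_common(5))
```

Running this with different seeds tends to produce noticeably uneven counts, with a few items holding a disproportionate share of the tags; confirming that pattern in a real catalog would require actual usage data rather than a simulation.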
A useful example of underutilized tagging in a resource discovery system is the OCLC WorldCat
online catalog. The site functions as a massive aggregated catalog of library holdings from
institutions around the world. Users can search across a variety of fields, including title,
author, subject, keyword, and assigned identification numbers such as OCLC numbers and ISBNs.
A search brings up a variety of expression records, which a user can refine further through
facets or by choosing records; these can then be viewed as aggregates of multiple editions or
separated further into individual manifestations. OCLC itself holds no collections; instead, its
search directs users to a list of the local catalogs of individual institutions that hold the
desired item. When a researcher views the bibliographic record of a resource on the WorldCat site
itself, the record functions as a sort of content portal where different types of user-supplied
information can be contributed in addition to the bibliographic details. The user-submitted
fields include reviews, reading lists, and a tags field. On most records viewed, the tags field
is noticeably bereft of user contributions. Even the records for popular contemporary book and
video titles often show only one or two tags, frequently from a single user.

7 Furner, 1860.
While the tags field may be rather thin on WorldCat, some records do contain user tags, and
these function as useful bridges between multiple items, allowing users to find works of a
similar nature. For instance, a search for "Superman archives" retrieves several separate volumes
of the Superman Archives reprint books from DC Comics. Clicking into the detailed records of
these volumes shows that one user has gone through and entered consistent tags for each volume:
"dc archives," "superman," and the relevant volume, such as "vol 6." Because of the consistency
of this user's tagging structure across items, and the lack of in-depth tags on other WorldCat
resources, these tags provide a useful way to explore and navigate the DC Comics Archives series
holdings across WorldCat. According to one report, "[o]ne of the main problems with tags in
folksonomies is the absence of context."8 Tags can be troublesome because each variant of a tag
is often indexed separately in a system, and the variants are not linked together in search
results. As can be seen on sites that use tags as search entryways, small variations on the same
phrase, such as "Spiderman" and "Spider-man," are often indexed differently. The report's author
goes on to suggest that, for social tagging to be effective going forward, integrating a
thesaurus or taxonomy that links related concepts or phrases would be an ideal way to clean up
social tagging entries while remaining open to the variety of information that can be gleaned
from a wide pool of knowledge.

8 Wichowski, Alexis. "Survival of the fittest tag: Folksonomies, findability, and the evolution of information organization." First Monday [Online], Volume 14 Number 5 (8 April 2009).
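Wichowski's suggestion of linking variant tags through a thesaurus can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the `THESAURUS` mapping and its entries are hypothetical, normalization is limited to case, hyphens, underscores, and whitespace, and tags the thesaurus does not know simply pass through in normalized form. A real system would draw its preferred terms from an authority file rather than a hand-built dictionary.

```python
import re

# Hypothetical thesaurus: normalized variant -> preferred (authorized) term.
THESAURUS = {
    "spiderman": "Spider-Man",
    "spider man": "Spider-Man",
}


def normalize(tag: str) -> str:
    """Lowercase a tag, treat hyphens/underscores as spaces, collapse whitespace."""
    tag = tag.lower().replace("-", " ").replace("_", " ")
    return re.sub(r"\s+", " ", tag).strip()


def preferred_term(tag: str) -> str:
    """Return the thesaurus's preferred term, or the normalized tag if unknown."""
    key = normalize(tag)
    return THESAURUS.get(key, key)


def group_tags(raw_tags: list[str]) -> dict[str, list[str]]:
    """Group raw user tags under a shared preferred term, keeping the originals."""
    groups: dict[str, list[str]] = {}
    for raw in raw_tags:
        groups.setdefault(preferred_term(raw), []).append(raw)
    return groups


if __name__ == "__main__":
    print(group_tags(["Spiderman", "Spider-man", "spider_man", "superman"]))
    # {'Spider-Man': ['Spiderman', 'Spider-man', 'spider_man'], 'superman': ['superman']}
```

Keeping the original variants alongside the preferred term preserves users' natural language for display while letting search treat the variants as a single access point.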
YouTube has become nearly synonymous with online video, and as such is often a primary entryway
for those seeking online access to moving image materials. Noticeably absent from its video pages
is a space for viewers to contribute descriptive tags. A digital marketing blog offering guidance
on meta tag creation reveals that the absence of a tagging field for viewers is a relatively
recent development in the evolution of YouTube's user interface: the viewer tagging feature was
present for a time until it was disabled in August 2012.9 A Google employee posting to a YouTube
help forum gave a couple of reasons for doing away with social tagging: "Having them on the watch
page, in some cases, gave users an opportunity to abuse tags by copying them from other videos.
We also didn't see much usage of tags by the average viewer."10 Tags are still used on the
YouTube back end for discovery purposes, but these are not socially generated or crowdsourced
tags; they are additional metadata contributed by content creators when they upload materials to
YouTube. That YouTube disabled social tagging because of what amounts to plagiarism illustrates
the different priorities of a commercially aligned platform like YouTube and of the many
library- and archive-based cataloging enterprises that are focused more on access and
discoverability.

9 "How to Find the Meta Tags for Any YouTube Video [Creator's Tip #125]." ReelSEO RSS. January 9, 2014. Accessed March 10, 2015. http://www.reelseo.com/meta-tags-youtubecreators-tip-125/.
10 "Change to Tags on the Video Page." Google Products Forum. August 16, 2012. Accessed March 9, 2015. https://productforums.google.com/forum/#!topic/youtube/1bfUDA_iFu8/discussion.


The Internet Movie Database (IMDB) offers a rich selection of descriptive tags for many of its
listed movies in the form of user-contributed "Plot Keywords." While not a catalog meant to guide
users through the holdings of a specific institution, IMDB is a useful tool for developing a sense
of how users interact with moving image information sources. Structured as a straightforward
database, albeit one with commercial interests in mind since it is owned by Amazon, IMDB presents
filmographic information to users in a fairly direct manner. Once a user looks past the ads and
paid links, the site is essentially a presentation of technical metadata on a moving image blended
with user-editable descriptive fields, such as plot summaries, reviews, and the de facto keyword
tagging system of the plot keywords field. The variety and granularity of plot keywords highlight
the possibilities for creating detailed and comprehensive indexes of films' visual content. A
quick look at the plot keywords sections of several films shows that the terms listed by users
include not just major plot points but also notations of objects and imagery that appear in the
films. In short, the terms can function as a wide-ranging visual index to the contents of
particular works. The variety of terms, many of which are simply variant forms of the same
phrase, does point to the desirability of incorporating some form of authority control to maintain
clean records and ensure uniformity of data. Still, IMDB has somehow overcome the contribution
deficiency present in bibliographic catalogs such as WorldCat, and in catalogs more focused on
archival and visual resources, such as the Smithsonian Institution's Collection Search Center.
In the realm of visual resources, both moving and still, tags hold great value for creating
discoverability in systems. Two methods have traditionally been employed for the creation of tags,
each with strengths and deficiencies: "Standard text-based methods of indexing or classifying
images (controlled vocabulary and thesauri) have emphasized the importance of authority and
consistency in description, while automated systems are also constrained by explicit and implicit
rule-based methods."11 Unlike social tagging, both of these methods are valued for their
rule-based precision in tag creation. Writing about web-based multimedia services, one group of
researchers summarizes the role of the information professional in assigning the descriptive
information used for tagging as follows:
Experts can produce these textual features via manual inspection, a practice still used
today. This solution, however, is manpower intensive and limits the scale at which content
managers can operate. More importantly, even for well curated online videos, there is
evidence that their textual features can be further improved to attract more search traffic
(Santos-Neto, et al., 2014). Therefore, mechanisms to support this process (e.g.,
automating tag/title suggestions) are desirable.12
The issue with expert and professional textual description here is not the quality of the terms
(although we can once again bring the argument back to the semantic gap and its tension between
precise terminology and the natural language in which users are presumed more likely to search),
but the time and labor required to produce adequate description while still keeping pace with
processing so as not to create backlogs. The drain on time and resources caused by manual
cataloging is far from negligible, given the rapid proliferation of digital multimedia material in
both commercial and cultural institutions.
Automated indexing tools are useful for taking some of the burden of data creation off information
personnel, but these systems have their limits. In a short New York Times article about a new
social tagging project called Madison, Alexis Lloyd, creative director of the research and
development lab at the Times, lays out the trouble the Times has had with digitally indexing the
advertisement content of its archival newspapers: "The digitization of our archives has primarily
focused on news, leaving the ads with little data associated with them making them very hard to
find and impossible to search for. Complicating the process further is that these ads often have
complex layouts and elaborate typefaces, making them difficult to differentiate algorithmically
from photographic content, and much more difficult to scan for text."13 Faced with elements that
fall outside the realm of standardized text, automated systems such as optical character
recognition (OCR) are still unable to cope with the content well enough to provide useful indexing
data. It is here that a crowdsourced option becomes a more desirable alternative, especially when
the other option is diverting professional staff and resources from projects that are of greater
immediate concern to the organization.

11 Jörgensen, 2.
12 Pontes, Tatiana, Santos-Neto, Elizeu, Almeida, Jussara, and Ripeanu, Matei. "Where are the key words? Optimizing multimedia textual attributes to improve viewership." First Monday [Online], Volume 20 Number 3 (20 February 2015).
One interesting approach to enhancing the discoverability of multimedia objects is a hybrid
approach to information creation and gathering. In an examination of keyword optimization for
searchability, researchers observe that "[t]he literature is rich in automatic tag recommendation
strategies, which exploit various data sources to extract candidate keywords for a target content
(including, for example, other pieces of textual attributes associated with the target content,
such as its title and description)."14 The data sources proposed for harvesting by automated
processes are not the primary content itself but rather a variety of ancillary sources of varying
provenance: "For movies, for example, there is a plethora of sources from which an automated tag
recommendation method could extract keyword candidates: Wikipedia (a peer-produced encyclopedia),
Movie Lens and Rotten Tomatoes (social networks where movie enthusiasts collaboratively catalog,
rate, and annotate movies), New York Times movie review section, or even YouTube comments are
potential sources of candidate keywords to annotate multimedia content."15 These choices are
interesting because they show that the researchers are open to using resources built on socially
sourced labor, such as Wikipedia and Rotten Tomatoes, to generate usable tags. The user-contributed
plot keywords on IMDB could plausibly also serve as a valid data mining source, in addition to more
traditionally structured information fields such as title, cast, and crew. One must recognize,
though, that data mining, particularly from commercial sites, could raise intellectual property
issues.

13 Allen, Erika. "Viewing Old Times Ads With a New Tool Called Madison." New York Times, October 16, 2014. Accessed March 10, 2015. http://www.nytimes.com/timesinsider/2014/10/16/viewing-old-times-ads-with-a-new-tool-called-madison/?_r=0.
14 Pontes, Tatiana, Santos-Neto, Elizeu, Almeida, Jussara, and Ripeanu, Matei. "Where are the key words? Optimizing multimedia textual attributes to improve viewership." First Monday [Online], Volume 20 Number 3 (20 February 2015).
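One simple way to gather candidate keywords from several ancillary sources, in the spirit of the strategies Pontes et al. survey, is to keep terms that appear in more than one independent source. The sketch below is an assumed heuristic for illustration, not the method from their paper: the stopword list is deliberately tiny, the example texts are invented, and a term counts once per source so that one verbose source cannot dominate.

```python
import re
from collections import Counter

# Deliberately tiny stopword list for illustration; real systems use fuller lists.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "for", "with"}


def candidate_keywords(sources: dict[str, str], min_sources: int = 2) -> list[tuple[str, int]]:
    """Rank terms by how many distinct sources mention them.

    `sources` maps a source name (e.g. "review") to its text. Returns
    (term, number_of_sources) pairs for terms found in at least
    `min_sources` sources, most widely shared first, ties alphabetical.
    """
    source_counts: Counter = Counter()
    for text in sources.values():
        words = re.findall(r"[a-z]+", text.lower())
        # Use a set so each term is counted at most once per source.
        source_counts.update({w for w in words if w not in STOPWORDS and len(w) > 2})
    ranked = [(term, n) for term, n in source_counts.items() if n >= min_sources]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Invented example texts standing in for an encyclopedia entry, a review, and comments.
    sources = {
        "encyclopedia": "A detective investigates a heist in Paris.",
        "review": "The heist sequence in Paris is the highlight.",
        "comments": "Best heist movie, the detective is great.",
    }
    print(candidate_keywords(sources))
    # [('heist', 3), ('detective', 2), ('paris', 2)]
```

Agreement between sources is only a crude proxy for quality; candidates produced this way would still need some form of qualitative screening before being attached to an asset as tags.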
The report authors' main concern with a keyword harvesting strategy is the need to develop a
system that can make qualitative judgments about the tag material an automated feature would take
from sources such as those mentioned above. Data from an involved community can be of particular
use to moving image collections, because sourced descriptive data could contribute to enhanced
item-level description, in which individual scenes or visual moments could be cataloged and
indexed for searchability. While scene-by-scene logging of footage is possible, it is very
time- and labor-intensive, and if left solely to cataloging staff, it could slow workflows
unacceptably in institutions with moving image collections of any significant size. On the other
hand, a system that automates the assignment of tags to assets from existing crowdsourced datasets
could go a long way toward making content searches much more comprehensive and valuable for
researchers and other interested parties. If a way could be found to efficiently assign tags that
correspond to specific points in a moving image's timecode, through automation or crowdsourcing,
it would enhance the usability of assets even further. One should keep in mind, though, that a
hybrid method combining automation and social tagging would still require at least cursory
inspection by information or data professionals to ensure accuracy, and thus would not completely
unburden workflows that require a certain degree of accuracy.

15 Ibid.
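Tags tied to specific points in a moving image's timecode could be stored as spans of time. The sketch below assumes a hypothetical `TimedTag` record measured in seconds, a `reviewed` flag standing in for the professional inspection described above, and a simple non-drop-frame timecode display at a whole-number frame rate; none of these names come from an existing cataloging standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedTag:
    """A hypothetical tag attached to a span of footage, in seconds from the start."""
    term: str
    start: float
    end: float
    contributor: str        # e.g. "cataloger" or a crowd contributor's username
    reviewed: bool = False  # True once staff have inspected the tag


def tags_at(tags: list[TimedTag], t: float, reviewed_only: bool = False) -> list[str]:
    """Return terms whose span covers time t (start inclusive, end exclusive)."""
    return [
        tag.term
        for tag in tags
        if tag.start <= t < tag.end and (tag.reviewed or not reviewed_only)
    ]


def to_timecode(seconds: float, fps: int = 24) -> str:
    """Format seconds as HH:MM:SS:FF at a whole-number frame rate (non-drop-frame)."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_secs = total_frames // fps
    hours, minutes, secs = total_secs // 3600, total_secs // 60 % 60, total_secs % 60
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"


if __name__ == "__main__":
    log = [
        TimedTag("car chase", 60.0, 120.0, "crowd_user_42"),
        TimedTag("Golden Gate Bridge", 90.0, 95.0, "cataloger", reviewed=True),
    ]
    print(tags_at(log, 92.0))                      # ['car chase', 'Golden Gate Bridge']
    print(tags_at(log, 92.0, reviewed_only=True))  # ['Golden Gate Bridge']
    print(to_timecode(92.0))                       # 00:01:32:00
```

Filtering on `reviewed` would let a catalog show only inspected tags by default while still collecting crowd contributions in the background.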
Linked data will be important for bringing tagging into the future. Catalog designers need to take
developing semantic web and linked data standards into account during the design phase, to ensure
interoperability with the wider web beyond the local institutional contexts that many catalogs are
designed for. Moving image cataloger Martha Yee, writing in 2009, outlines the goal of the semantic
web as "the idea that we might be able to replace the existing HTML-based Web consisting of
marked-up documents--or pages--with a new RDF based Web consisting of data encoded as classes,
class properties, and class relationships (semantic linkages), allowing the Web to become a huge
shared database."16 By incorporating new data models such as RDF and FRBR, along with markup
languages such as XML, which is noted for its interoperability across systems, catalogs are laying
the groundwork for the possibilities suggested by the web as a "huge shared database."
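Yee's picture of data "encoded as classes, class properties, and class relationships" can be made concrete by expressing a single user tag as RDF-style triples. This dependency-free sketch assumes hypothetical `example.org` URIs for the item and the concept; the Dublin Core `subject` and SKOS `prefLabel`/`altLabel` property URIs are real vocabulary terms, but linking them this way is only an illustration, and a production system would more likely use an RDF library and an established authority file.

```python
# Real vocabulary terms: Dublin Core Terms and SKOS Core.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"
LITERAL_PREDICATES = {SKOS_PREF_LABEL, SKOS_ALT_LABEL}

Triple = tuple[str, str, str]


def tag_triples(item_uri: str, concept_uri: str, preferred: str, variants: list[str]) -> list[Triple]:
    """Link an item to a concept and record the concept's preferred and variant labels."""
    triples = [
        (item_uri, DCTERMS_SUBJECT, concept_uri),
        (concept_uri, SKOS_PREF_LABEL, preferred),
    ]
    triples += [(concept_uri, SKOS_ALT_LABEL, v) for v in variants]
    return triples


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes, and newlines for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(triples: list[Triple]) -> str:
    """Serialize as N-Triples lines: <s> <p> <o> . or <s> <p> "literal" ."""
    lines = []
    for s, p, o in triples:
        obj = f'"{escape_literal(o)}"' if p in LITERAL_PREDICATES else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical URIs; a real catalog would point at its own records and an authority file.
    triples = tag_triples(
        "http://example.org/item/1",
        "http://example.org/concept/spider-man",
        "Spider-Man",
        ["Spiderman", "Spider-man"],
    )
    print(to_ntriples(triples))
```

Because the variant labels hang off one concept URI, any system that follows the link can treat "Spiderman" and "Spider-man" as the same subject, which is the kind of connection a shared, linked web of data is meant to enable.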

16 Yee, Martha M. "Can Bibliographic Data Be Put Directly onto the Semantic Web?"
Information Technology and Libraries 28, no. 2 (2009): 55-80. Accessed March 10, 2015.
http://escholarship.org/uc/item/91b1830k. 55.


Hashtagging is a related form of tagging in which users create links within a comment or tweet by
adding a subject word or phrase marked with a # symbol, which delimits it from the surrounding
text. Hashtagging suits how people use the internet in casual situations and can be incorporated
into activities users are already engaged in, such as commenting or tweeting. One report on the
social media photo-sharing application Instagram links the use of tags and hashtags to folksonomies
and their user-generated descriptive vocabularies. Taking descriptive hashtags a step beyond their
role as a descriptive device for searching, the report's authors state that "[t]agging and
folksonomies can be harnessed not just for the basic organization of media and information, but
via the tools and access provided by many platforms can lead of [sic] new social and technical ways
of manipulating and harnessing data."17 This statement anticipates the role tags could play in
contributing to richer linked data environments, should they be correctly incorporated into
semantic information structures going forward.
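The # delimiter makes hashtags easy to pull out of ordinary text, which is part of what makes them attractive as lightweight tags. This sketch assumes a simplified rule (a "#" not preceded by a letter, digit, or underscore, followed by one or more word characters); actual platforms such as Twitter and Instagram apply their own, more detailed rules, which are not reproduced here.

```python
import re

# Simplified assumption: "#" not preceded by a word character, then word characters.
# The lookbehind skips cases like "C#" or "a#b" where "#" sits inside a word.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(text: str) -> list[str]:
    """Return hashtags in order of first appearance, lowercased and de-duplicated."""
    seen: list[str] = []
    for match in HASHTAG.finditer(text):
        tag = match.group(1).lower()
        if tag not in seen:
            seen.append(tag)
    return seen


if __name__ == "__main__":
    print(extract_hashtags("Sunset at the pier #LosAngeles #sunset #Sunset"))
    # ['losangeles', 'sunset']
```

Lowercasing merges "#Sunset" and "#sunset," a small instance of the variant problem raised earlier with "Spiderman" and "Spider-man"; it does nothing for spelling or hyphenation variants, which would still need a thesaurus.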
Standalone tagging features, on the other hand, are often relegated to areas outside the normal
browsing flow of most users. Catalogs need to find ways to incentivize tagging so that users feel
invested in contributing knowledge and develop a sense of ownership of the represented content.
An alternative tack that catalog designers could take is the creation of guided tagging fields, in
which the contributed information is less unstructured than in a free-text keyword tag field. One
example is a Facebook-like model in which users fill in very specific profile fields, such as name,
date of birth, workplace, and hometown, in a controlled and structured manner. The data
contributed may be subject to some general parameters, such as date formatting for birthdates, but
otherwise no professional authority judges the correctness of what a user contributes. In the
interest of keeping data reasonably clean and standardized, entries in appropriate fields, such as
those dealing with geographical locations, could be conformed to a thesaurus or taxonomy. With the
growing implementation of semantic web data structures, the more information a user contributes,
the more searchability increases for that individual, owing both to the variety of information
that can be searched and to the refinements and connections that can be made, as evidenced by
features such as Facebook's ability to draw relationships between information concepts in its
Graph Search function. The challenge is to create catalogs that feel to users more like
interactive content portals than like staid, somewhat opaque search tools that require specialized
skills to navigate successfully in the search for desired material.

17 Highfield, Tim, and Leaver, Tama. "A methodology for mapping Instagram hashtags." First Monday [Online], Volume 20 Number 1 (26 December 2014).
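A guided tagging field of the kind described above might check some entries against simple rules and others against a controlled list. The sketch below assumes hypothetical field names (`date`, `place`, and free-text fields), uses Python's `date.fromisoformat` for the date rule, and maps places through a small, hypothetical authority list, flagging unknown places for staff review rather than rejecting them.

```python
from datetime import date

# Hypothetical authority list: lowercase user entry -> authorized place name.
PLACE_AUTHORITY = {
    "la": "Los Angeles (Calif.)",
    "los angeles": "Los Angeles (Calif.)",
    "nyc": "New York (N.Y.)",
    "new york city": "New York (N.Y.)",
}


def validate_contribution(fields: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Clean a guided-tagging submission; return (cleaned_fields, problems).

    - "date" must parse with date.fromisoformat (e.g. YYYY-MM-DD); bad dates are dropped.
    - "place" is mapped to its authorized form when known; unknown places are
      kept as entered but flagged for review.
    - Any other field is treated as free text and only trimmed.
    """
    cleaned: dict[str, str] = {}
    problems: list[str] = []
    for name, value in fields.items():
        value = value.strip()
        if name == "date":
            try:
                cleaned[name] = date.fromisoformat(value).isoformat()
            except ValueError:
                problems.append(f"date: expected YYYY-MM-DD, got {value!r}")
        elif name == "place":
            authorized = PLACE_AUTHORITY.get(value.lower())
            if authorized is None:
                cleaned[name] = value
                problems.append(f"place: {value!r} not in authority list; flagged for review")
            else:
                cleaned[name] = authorized
        else:
            cleaned[name] = value
    return cleaned, problems


if __name__ == "__main__":
    print(validate_contribution({"date": "2015-03-10", "place": "LA", "note": " premiere "}))
    print(validate_contribution({"date": "03/10/2015", "place": "Burbank"}))
```

Flagging rather than rejecting unknown places keeps the contributor's knowledge while leaving the final judgment to an information professional, in line with the hybrid approach of user and professional description.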
The concept of information tagging holds great promise for data creation that can be used to
enhance the function of the institutional catalog. By harnessing the efforts of interested
individuals through the connective power of the internet, institutions have many opportunities to
create and enhance the descriptive data that is the foundation of searchability, discoverability,
and access. For moving images in particular, there are many possibilities for supplementing the
cataloging efforts of professionally trained individuals with the knowledge of involved and
invested outside contributors, and in the process creating more robust and descriptive indexes to
time-based media. One large hurdle that librarians, system designers, and content managers will
have to overcome to gather useful information is finding a way to engage end users in making
substantial contributions to tagging, particularly on catalogs that lack the appeal of social
media content portals in encouraging participation. As the internet and digital librarianship move
forward, if institutions can integrate the structures of the semantic web into their workflows in
a way that includes tagging information supplied through crowdsourcing, there will be many
possibilities to create a storehouse of useful descriptive information that can supplement
professional work, increase the discoverability of materials, and thereby enhance access to them
and invigorate their use.

Works Cited
Allen, Erika. "Viewing Old Times Ads With a New Tool Called Madison." New York Times, October 16, 2014. Accessed March 10, 2015. http://www.nytimes.com/timesinsider/2014/10/16/viewing-old-times-ads-with-a-new-tool-called-madison/?_r=0.
Baca, Murtha, and Getty Research Institute. Introduction to Metadata [electronic Resource] / Edited by Murtha Baca. 2nd ed. Getty Publications Virtual Library. Los Angeles, CA: Getty Research Institute, 2008.
"Change to Tags on the Video Page." Google Products Forum. August 16, 2012. Accessed March 9, 2015. https://productforums.google.com/forum/#!topic/youtube/1bfUDA_iFu8/discussion.
Elings, Mary W., and Günter Waibel. "Metadata for All: Descriptive Standards and Metadata Sharing across Libraries, Archives and Museums." VRA Bulletin. 34(1): 7. First Monday, 2007.
Furner, Jonathan. "Folksonomies." In Encyclopedia of Library and Information Sciences, 3rd ed. Taylor and Francis. New York; Published online: 09 Dec 2009; 1858-1866.
Harpring, Patricia. Introduction to Controlled Vocabularies [electronic Resource]: Terminology for Art, Architecture, and Other Cultural Works / Patricia Harpring. 1st ed. Getty Publications Virtual Library. Los Angeles, CA: Getty Research Institute, 2010.
Highfield, Tim, and Leaver, Tama. "A methodology for mapping Instagram hashtags." First Monday [Online], Volume 20 Number 1 (26 December 2014).
"How to Find the Meta Tags for Any YouTube Video [Creator's Tip #125]." ReelSEO RSS. January 9, 2014. Accessed March 10, 2015. http://www.reelseo.com/meta-tags-youtubecreators-tip-125/.
Jörgensen, Corinne. 2007. "Image Access, the Semantic Gap, and Social Tagging as a Paradigm Shift." Florida State University, Classification Research Workshop.
Pontes, Tatiana, Santos-Neto, Elizeu, Almeida, Jussara, and Ripeanu, Matei. "Where are the key words? Optimizing multimedia textual attributes to improve viewership." First Monday [Online], Volume 20 Number 3 (20 February 2015).
Wichowski, Alexis. "Survival of the fittest tag: Folksonomies, findability, and the evolution of information organization." First Monday [Online], Volume 14 Number 5 (8 April 2009).
Yee, Martha M. "Can Bibliographic Data Be Put Directly onto the Semantic Web?" Information Technology and Libraries 28, no. 2 (2009): 55-80. Accessed March 10, 2015. http://escholarship.org/uc/item/91b1830k.
