the world, enabling communication among users and collecting massive amounts of data for
social media companies” (Lomborg & Bechmann, 2014), web archivists and researchers have
discovered a valuable resource for retrospective analysis across many domains of study. As a
blossoming “object of analysis for audience studies,” web archiving is “a particularly useful
method in studies of users’ communicative practices” (Lomborg, 2012). While the “web
encourages the constant creation and distribution of large amounts of information” (Dougherty &
Meyer, 2014), social media conglomerates, like Facebook and Twitter, have become an
influential means of “understanding human behavior and communication.” In order to “take full
advantage of the web as a research resource that extends beyond the consideration of snapshots
of the present,” social media websites have become a necessity in efforts to “take web archiving
much more seriously as an important element of any research program involving web resources.”
The aim of this paper is to evaluate the importance of web archiving for research purposes with
special attention to the preservation of social media content for retrospective analysis in
process of gathering up (harvesting) data that have been published on the World Wide Web,
storing it, ensuring the data are preserved in an archive, and making the collected data available
for future research” (Littman et al., 2016). Web archiving is paramount to scholars of cultural
heritage and recorded human knowledge, which includes “an increasingly comprehensive record
of information production and social interaction over time as more activities become web-
1 STEPHENSON
enabled or web-based” (Dougherty & Meyer, 2014). Just as web preservation has been deemed a
cultural and historical necessity, so, too, has the focus increasingly shifted to social media
analytics across a diverse range of disciplines. Archivists recognize the importance of capturing
online content and communication patterns for the study of politics, debate, linguistics, cultural
diversity, examination of public reaction to world events, and health and social sciences. Social
media has emerged as an area “of interest to researchers in fields from computer science and
medicine to business, economics, and the humanities” (Littman et al., 2016). All of these
essential points of “inquiry have contributed to shaping the descriptive, methodological, and
theoretical bases of scholarship centered on web archiving” (Dougherty & Meyer, 2014). As we
head steadily toward a future in which physical artifacts and the written word are shifting to
digital presence and online publication, it “will be very hard for future scholars even in 5 years,
10 years to understand what kinds of political and social and cultural moments or phenomena
retrospectively without key aspects of the web.” According to Julien Masanès (2010), the web is
a “pervasive and ephemeral media where modern culture in a large sense finds a natural form of
expression.” Nowhere has this been more glaringly evident than in the wake of social
social networking systems” (McCown & Nelson, 2009), web archivists have taken note,
including national libraries and non-profit institutions like the Internet Archive, which “have been
working for years on archiving the Web for posterity.” Services like the Internet Archive’s Wayback Machine,
launched in 2001 to proactively archive the web, as well as on-demand services like archive.is,
have paved the way for web archivists to build “time capsules” of web content to be shared and
studied. These services play a vital role in “today’s information ecosystem, by ensuring the
continuing availability of information, or by deliberately caching content that might get deleted
or removed” (Zannettou et al., 2018). But as the web has “begun transitioning into a Web 2.0
world, archivists” (McCown & Nelson, 2009) have worked to adapt by developing new
techniques to archive social media content, because a “growing amount of personal
(and what will be historically significant) information is locked behind the walled garden of
“enabled by the web itself, and an assessment of its utility in the context of qualitative social
media research, and, more broadly audience research.” The practice of web archiving affords
researchers the opportunity to harvest texts and relevant metadata, such as timestamps, profile
information, and tags, from the Internet. In social media research, the method has been
used for the quantitative analysis of patterns and norms across networks of
communication. But web archiving “is also well suited for qualitative studies addressing key
involves the generation of a coherent corpus of communications in a given time frame, and in a
given networked context online” (Lomborg, 2012). Therefore, social media archives are “highly
useful data corpuses for the systematic, fine-grained study of naturally occurring, textualized
interactions on social media.” Social media platforms like Facebook collect massive amounts of
data about their user base and usage patterns, which has proven ethically fraught for
the company because of concerns about privacy, consent, and potential abuse of information. While
these relevant questions and issues remain up for debate, the rich data and analytics generated by
social media engagement are a goldmine for “data-driven researchers in the social sciences”
(Thomson & Kilbride, 2015), and much “of the personal artifacts stored in Facebook accounts
will likely prove valuable to users’ surviving family, children and grandchildren, and certainly to
challenging as these platforms are presented in a different way to ‘traditional’ websites” (Webber,
2017), which usually constitutes the retrieval and storage of web pages via HTTP and is
accomplished through the use of web crawlers. Because technical difficulties and restrictions
prevent access by web crawlers, social media platforms offer Application
Programming Interfaces (APIs) as a way to enable controlled access to lucrative data. While
harvesting and obstacles of password protection, the use of efficient tools like APIs “provides
access to nonpublic Internet environments, such as those requiring authentication through login
and password, because the data collection runs directly through the back-end of the social media
service to which the data belong” (Thomson & Kilbride, 2015). Furthermore, compared with
direct transfer and other methods that require formalized collaboration or affiliation, APIs are
publicly available. Although access is limited, the large amounts of behavioral data still available
for analysis through APIs have been beneficial to social media research and future
scholars alike. The quality of capture of sites like Facebook can vary, according to the UK Web
Archive, but they contend that, “a lot of valuable information can still be gathered from these
instances” (Webber, 2017). Littman et al. (2016) suggest “that aligning social media collecting
with web archiving practices and tools addresses many of the most pressing needs of current and
to collecting social media data have included the “purchasing from data resellers; using a
commercial, third-party service; using a platform self-archiving service; and harvesting with web
crawlers” (Littman et al., 2016). APIs, however, have the advantage of returning structured
data, usually JSON or XML, “which is essential for research that involves applying
computational techniques.” APIs also remain relatively stable, tend to provide more metadata
than the platforms expose through their websites, and generally allow data to be collected more efficiently. Some
caveats, on the other hand, are that not all social media platforms have complete, public APIs,
nor is the data readily human-viewable. It is also important to note that every API is different,
and “there is no standard accepted format for the archival storage of social media data” (Littman
et al., 2016). Another disadvantage is that some social media platforms place limitations on the
amount of data that can be harvested via their API, making it difficult or nearly impossible to
capture older content. Yet another challenge to be addressed in the coming years involves the
existing field of social media harvesting tools, which has thus far been “largely unaligned with
web archiving technology – in terms of capture, storage, format standards, access and reuse, and
legal restrictions” (Littman et al., 2016). But scholars do acknowledge “great promise in aligning
social media collecting with web archiving” in the near future as an “opportunity to jointly
engage in a conversation with social media researchers and robust archives.” As historians and
other researchers increasingly refer to and use web archives, their experiences will likely shape
continue to evolve in the development of tools, strategies, and techniques to capture dynamic
online content, but “the increasingly validated value of social media as digital heritage”
(Thomson & Kilbride, 2015) necessitates “immediate action to capture and preserve in order to
ensure access in the future.” The use of APIs, while not flawless, has given web archivists an
opportunity to collect valuable data from social networking systems from which to glean
insight useful to researchers in behavioral and communication studies, among other disciplines.
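
As a minimal illustration of the API-based collection discussed above, the sketch below parses a structured JSON response, keeps the text together with the metadata the literature highlights (timestamps, profile information, and tags), and then selects the posts that fall within a given time frame. The response body, its field names, and the helper functions `extract_records` and `in_time_frame` are hypothetical and do not correspond to any particular platform's API; a real harvest would also have to handle authentication, rate limits, and platform terms of use.

```python
import json
from datetime import datetime, timezone

# Hypothetical API response. The field names are illustrative assumptions
# and do not match any specific social media platform's schema.
SAMPLE_RESPONSE = """
{
  "data": [
    {"id": "1", "text": "Sample post A", "created_at": "2016-11-08T23:15:00Z",
     "author": {"username": "user_a"}, "tags": ["election", "politics"]},
    {"id": "2", "text": "Sample post B", "created_at": "2016-11-09T07:30:00Z",
     "author": {"username": "user_b"}, "tags": []}
  ]
}
"""

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def extract_records(raw_json):
    """Parse a JSON response and keep each post's text and basic metadata."""
    payload = json.loads(raw_json)
    records = []
    for post in payload.get("data", []):
        created = datetime.strptime(post["created_at"], TIMESTAMP_FORMAT)
        records.append({
            "id": post["id"],
            "text": post["text"],
            # Store the timestamp as timezone-aware UTC so it can be compared.
            "created_at": created.replace(tzinfo=timezone.utc),
            "username": post["author"]["username"],
            "tags": post.get("tags", []),
        })
    return records


def in_time_frame(records, start, end):
    """Return the records created at or after start and before end."""
    return [r for r in records if start <= r["created_at"] < end]


records = extract_records(SAMPLE_RESPONSE)
election_day = in_time_frame(
    records,
    datetime(2016, 11, 8, tzinfo=timezone.utc),
    datetime(2016, 11, 9, tzinfo=timezone.utc),
)
print([r["id"] for r in election_day])
```

Because the data arrive already structured, selecting a corpus by period or by tag takes only a few lines, which is the practical advantage over parsing pages gathered by a web crawler.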
Much literature has been published highlighting the importance of web archiving in practice, but
only in recent years has social media come to the forefront of discussion and major efforts been
made to preserve its content. “Social media services are primarily designed with immediate use
in mind and, because the content is forever changing and being deleted, it is at a high risk of
being lost forever” (Storrar, 2014). While web archiving tools develop in the shadow of a
fast-paced, ever-changing web, it is of great urgency to researchers that such important information
References
Dougherty, M., & Meyer, E. T. (2014). Community, Tools, and Practices in Web Archiving: The
State-of-the-Art in Relation to Social Science and Humanities Research Needs. Journal
EBSCOhost, doi:10.1080/15228835.2012.744719.
Lomborg, S., & Bechmann, A. (2014). Using APIs for Data Collection on Social Media.
Information Society, 30(4), 256-265.
McCown, F., & Nelson, M. L. (2009). What Happens When Facebook is Gone? 9th ACM/IEEE-CS
Joint Conference on Digital Libraries (pp. 251-254). Austin, USA: ACM. Retrieved
from http://www.cs.odu.edu/~mln/pubs/jcdl09/archiving-facebook-jcdl2009.pdf
Storrar, T. (2014). Archiving Social Media. The National Archives. Retrieved from
https://blog.nationalarchives.gov.uk/blog/archiving-social-media/
Thomson, S. D., & Kilbride, W. (2015). Preserving Social Media: The Problem of Access. New
Review of Information Networking, 20(1/2), 261–275. Retrieved from
https://doi-org.ezproxy.lib.usf.edu/10.1080/13614576.2015.1114842
Webber, J. (2017). The Challenges of Archiving Social Media. British Library. Retrieved from
http://blogs.bl.uk/webarchive/2017/04/the-challenges-of-web-archiving-social-media.html
Zannettou, S., Blackburn, J., De Cristofaro, E., Sirivianos, M., & Stringhini, G. (2018).
Understanding Web Archiving Services and Their (Mis)Use on Social Media. Retrieved
from http://ezproxy.lib.usf.edu/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=edsarx&AN=edsarx.1801.10396&site=eds-live