
Name: Crystal Stephenson

Assignment: Research Paper


Course: LIS6515 Web Archiving
Due Date: November 25, 2018
“The Importance of Web Archiving in the Age of Facebook”
As social media platforms have come to pervade “everyday life in the developed parts of

the world, enabling communication among users and collecting massive amounts of data for

social media companies” (Lomborg & Bechmann, 2014), web archivists and researchers have

discovered a valuable resource for retrospective analysis across many domains of study. As a

blossoming “object of analysis for audience studies,” web archiving is “a particularly useful

method in studies of users’ communicative practices” (Lomborg, 2012). While the “web

encourages the constant creation and distribution of large amounts of information” (Dougherty &

Meyer, 2014), social media conglomerates, like Facebook and Twitter, have become an

influential means of “understanding human behavior and communication.” In order to “take full

advantage of the web as a research resource that extends beyond the consideration of snapshots

of the present,” social media websites have become a necessity in efforts to “take web archiving

much more seriously as an important element of any research program involving web resources.”

The aim of this paper is to evaluate the importance of web archiving for research purposes, with

particular attention to the preservation of social media content for retrospective analysis in

behavioral and communication studies.


Web archiving, according to the International Internet Preservation Consortium, is “the

process of gathering up (harvesting) data that have been published on the World Wide Web,

storing it, ensuring the data are preserved in an archive, and making the collected data available

for future research” (Littman et al., 2016). Web archiving is paramount to scholars of cultural

heritage and recorded human knowledge, which includes “an increasingly comprehensive record

of information production and social interaction over time as more activities become web-

1 STEPHENSON
enabled or web-based” (Dougherty & Meyer, 2014). Just as web preservation has been deemed a

cultural and historical necessity, so, too, has the focus increasingly shifted to social media

analytics across a diverse range of disciplines. Archivists recognize the importance of capturing

online content and communication patterns for the study of politics, debate, linguistics, cultural

diversity, examination of public reaction to world events, and health and social sciences. Social

media has emerged as an area “of interest to researchers in fields from computer science and

medicine to business, economics, and the humanities” (Littman et al., 2016). All of these

essential points of “inquiry have contributed to shaping the descriptive, methodological, and

theoretical bases of scholarship centered on web archiving” (Dougherty & Meyer, 2014). As we

move steadily toward a future where physical artifacts and the written word are shifting to

digital presence and online publication, it “will be very hard for future scholars even in 5 years,

10 years to understand what kinds of political and social and cultural moments or phenomena

retrospectively without key aspects of the web.” According to Julien Masanès (2010), the web is

a “pervasive and ephemeral media where modern culture in a large sense finds a natural form of

expression.” Nowhere has this been more evident than in the rise of social

media and the growing popularity of such platforms in recent years.


As web users increasingly spend “more of their time and creative energies within online

social networking systems” (McCown & Nelson, 2009), web archivists have taken note,

including national libraries and non-profit institutions like the Internet Archive, which “have been

working for years on archiving the Web for posterity.” Services like the Internet Archive’s Wayback

Machine, launched in 2001 to proactively archive the web, and on-demand services like archive.is

have paved the way for web archivists to build “time capsules” of web content to be shared and

studied. These services play a vital role in “today’s information ecosystem, by ensuring the

continuing availability of information, or by deliberately caching content that might get deleted

or removed” (Zannettou et al., 2018). But as the web has “begun transitioning into a Web 2.0

world, archivists” (McCown & Nelson, 2009) have worked to adapt by developing new

techniques to archive social media content, because a “growing amount of personal

(and what will be historically significant) information is locked behind the walled garden of

Facebook” and Twitter.


Stine Lomborg (2012) contends that web archiving has become a promising method

“enabled by the web itself, and an assessment of its utility in the context of qualitative social

media research, and, more broadly audience research.” The practice of web archiving affords

researchers the opportunity to harvest texts and relevant metadata, such as time stamps, profile

information and tags from the Internet, while in social media research, the method has been

incorporated for quantitative purposes of analyzing patterns and norms across networks of

communication. But web archiving “is also well suited for qualitative studies addressing key

research questions of audience studies, for instance, concerning the organization of

communication and practices of meaning-making of social media use in context, because it

involves the generation of a coherent corpus of communications in a given time frame, and in a

given networked context online” (Lomborg, 2012). Therefore, social media archives are “highly

useful data corpuses for the systematic, fine-grained study of naturally occurring, textualized

interactions on social media.” Social media platforms like Facebook collect massive amounts of

data about their user base and usage patterns, which has proven to be an ethical slippery slope for

the company over concerns about privacy, consent, and potential abuse of information. While

these relevant questions and issues remain up for debate, the rich data and analytics generated by

social media engagement are a goldmine for “data-driven researchers in the social sciences”

(Thomson & Kilbride, 2015), and much “of the personal artifacts stored in Facebook accounts

will likely prove valuable to users’ surviving family, children and grandchildren, and certainly to

historians and sociologists” (McCown & Nelson, 2009).
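
The idea of a “coherent corpus of communications in a given time frame” can be illustrated with a short sketch. It is only an illustration under stated assumptions: the Post record, its field names, and the sample data are hypothetical and do not come from Lomborg (2012) or from any platform. It shows how harvested texts and their metadata (time stamps, profile information, tags) might be filtered into a corpus for one time window and one topic.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Post:
    """One harvested post plus metadata. All field names are hypothetical."""
    author: str            # profile information
    text: str              # the communication itself
    created_at: datetime   # time stamp
    tags: tuple = ()       # tags attached to the post


def build_corpus(posts, start, end, tag=None):
    """Return posts with start <= created_at < end, oldest first.

    If tag is given, only posts carrying that tag are kept.
    """
    selected = [
        p for p in posts
        if start <= p.created_at < end and (tag is None or tag in p.tags)
    ]
    return sorted(selected, key=lambda p: p.created_at)


def utc(year, month, day):
    """Small helper for timezone-aware sample dates."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Invented sample data, for illustration only.
SAMPLE_POSTS = [
    Post("user_a", "Polls open today", utc(2018, 11, 6), ("election",)),
    Post("user_b", "Lunch photo", utc(2018, 11, 6), ("food",)),
    Post("user_c", "Results are in", utc(2018, 11, 7), ("election",)),
    Post("user_a", "Looking back", utc(2018, 12, 1), ("election",)),
]

# Corpus for November 2018, restricted to the "election" tag.
corpus = build_corpus(SAMPLE_POSTS, utc(2018, 11, 1), utc(2018, 12, 1), tag="election")
```

Treating the end of the window as exclusive is a design choice of this sketch; it lets adjacent time frames be collected without counting any post twice.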


Not without its obstacles, web archiving of social media content remains “technically

challenging as these platforms are presented in a different way to ‘traditional’ websites” (Webber,

2017). Archiving a traditional website usually consists of retrieving and storing its pages via HTTP

with web crawlers. Because technical difficulties and access restrictions keep crawlers out, social

media platforms provide Application Programming Interfaces (APIs) as a way to enable controlled

access to lucrative data. While

Facebook notably presents a number of challenges in acquisition, such as restricted remote

harvesting and obstacles of password protection, the use of efficient tools like APIs “provides

access to nonpublic Internet environments, such as those requiring authentication through login

and password, because the data collection runs directly through the back-end of the social media

service to which the data belong” (Thomson & Kilbride, 2015). Furthermore, compared with

direct transfer and other methods that require formalized collaboration or affiliation, APIs are

publicly available. Although access is limited, the large amounts of behavioral data still available

for analysis through APIs have been beneficial to social media research and future

scholars alike. The quality of capture of sites like Facebook can vary, according to the UK Web

Archive, but they contend that, “a lot of valuable information can still be gathered from these

instances” (Webber, 2017). Littman et al. (2016) suggest “that aligning social media collecting

with web archiving practices and tools addresses many of the most pressing needs of current and

future scholars conducting quality social media research.”
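
To make the contrast with “traditional” web archiving concrete, the following is a minimal sketch of what a web crawler does: it fetches a page, stores it, extracts the links, and repeats. It is not the method of any archive discussed here. The crawl function, the fake_fetch stand-in, and the two-page FAKE_SITE are assumptions invented for illustration so that the example runs without network access; a real crawler would issue HTTP requests and respect robots.txt. The page behind the login link is deliberately unavailable, which mirrors the way password protection keeps crawlers out of social media content.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=50):
    """Breadth-first crawl that stays on the seed's host.

    fetch(url) must return the page's HTML, or None if the page cannot
    be retrieved. Returns a dict mapping each captured URL to its HTML.
    """
    host = urlparse(seed).netloc
    archive = {}
    queue = deque([seed])
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        if url in archive:
            continue
        html = fetch(url)
        if html is None:
            continue  # inaccessible page: nothing to store
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)
            if urlparse(target).netloc == host and target not in archive:
                queue.append(target)
    return archive


# Hypothetical two-page site; "/login" has no entry, like a password wall.
FAKE_SITE = {
    "http://example.org/": '<a href="/about">About</a> <a href="/login">Members</a>',
    "http://example.org/about": '<a href="/">Home</a> <a href="http://other.example/">Out</a>',
}


def fake_fetch(url):
    """Offline stand-in for an HTTP GET request."""
    return FAKE_SITE.get(url)


archived = crawl("http://example.org/", fake_fetch)
```

Only the two public pages end up in the archive; the members-only page and the external site are never captured.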


While harvesting from an API is the most common method of retrieval, other approaches

to collecting social media data have included “purchasing from data resellers; using a

commercial, third-party service; using a platform self-archiving service; and harvesting with web

crawlers” (Littman et al., 2016). APIs, however, have the advantage of returning structured

data, usually JSON or XML, “which is essential for research that involves applying

computational techniques.” APIs also remain relatively stable, tend to provide more metadata

than the platforms’ websites expose, and generally allow data to be collected more efficiently. Some

caveats, on the other hand, are that not all social media platforms have complete, public APIs,

nor is the data readily human-viewable. It is also important to note that every API is different,

and “there is no standard accepted format for the archival storage of social media data” (Littman

et al., 2016). Another disadvantage is that some social media platforms place limitations on the

amount of data that can be harvested via their API, making it difficult or nearly impossible to

capture older content. Yet another challenge to be addressed in the coming years involves the

existing field of social media harvesting tools, which has thus far been “largely unaligned with

web archiving technology – in terms of capture, storage, format standards, access and reuse, and

legal restrictions” (Littman et al., 2016). But scholars do acknowledge “great promise in aligning

social media collecting with web archiving” in the near future as an “opportunity to jointly

engage in a conversation with social media researchers and robust archives.” As historians and

other researchers increasingly refer to and use web archives, their experiences will likely shape

and contribute to future requirements and developments in the field.
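
The trade-offs described above, structured JSON on the one hand and harvesting limits on the other, can be sketched in a few lines. The sketch rests on several assumptions: FakeApi is an invented stand-in and not the interface of Facebook, Twitter, or any real platform; the field names (data, next_cursor, created_at) are hypothetical; and JSON Lines is used for storage only as one plausible choice, since, as Littman et al. (2016) note, no standard archival format exists.

```python
import json


class FakeApi:
    """Invented stand-in for a platform API with a request limit."""

    def __init__(self, posts, page_size=2, max_requests=2):
        self._posts = posts            # newest first, as many feeds are
        self._page_size = page_size
        self._remaining = max_requests

    def get_page(self, cursor=0):
        """Return one page of posts as a JSON string."""
        if self._remaining == 0:
            raise RuntimeError("rate limit reached")
        self._remaining -= 1
        chunk = self._posts[cursor:cursor + self._page_size]
        next_cursor = cursor + self._page_size
        if next_cursor >= len(self._posts):
            next_cursor = None         # no further pages
        return json.dumps({"data": chunk, "next_cursor": next_cursor})


def harvest(api):
    """Page through the API until it is exhausted or the limit is hit.

    Returns (records, complete); complete is False when the limit cut
    the harvest short.
    """
    records = []
    cursor = 0
    while cursor is not None:
        try:
            payload = json.loads(api.get_page(cursor))
        except RuntimeError:
            break
        records.extend(payload["data"])
        cursor = payload["next_cursor"]
    return records, cursor is None


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


# Invented sample data: five posts, newest first.
POSTS = [
    {"id": i, "created_at": f"2018-11-0{i}T09:00:00Z", "text": f"post {i}"}
    for i in range(5, 0, -1)
]

records, complete = harvest(FakeApi(POSTS, page_size=2, max_requests=2))
stored = to_jsonl(records)
```

With two requests of two posts each, the harvest stops after four records and the oldest post is never reached, which is the kind of gap in older content that request limits can produce.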


The ephemerality of the web has driven archivists and researchers to a critical juncture at

which preservation is essential to future scholarship through retrospective analysis. Efforts

continue to evolve in the development of tools, strategies, and techniques to capture dynamic

online content, but “the increasingly validated value of social media as digital heritage”

(Thomson & Kilbride, 2015) necessitates “immediate action to capture and preserve in order to

ensure access in the future.” The use of APIs, while not flawless, has provided web archivists an

opportunity to collect valuable data from social media networking systems by which to glean

insight useful to researchers in behavioral and communication studies, among other disciplines.

Much literature has been published highlighting the importance of web archiving in practice, but

only in recent years has social media come to the forefront of discussion and major efforts been

made to preserve its content. “Social media services are primarily designed with immediate use

in mind and, because the content is forever changing and being deleted, it is at a high risk of

being lost forever” (Storrar, 2014). While web archiving tools develop in the shadow of a

fast-paced, ever-changing web, it is a matter of great urgency for researchers that such information

be preserved for future analysis and study.

References
Dougherty, M., & Meyer, E. T. (2014). Community, Tools, and Practices in Web Archiving: The
State-of-the-Art in Relation to Social Science and Humanities Research Needs. Journal
of the Association for Information Science & Technology, 65(11), 2195-2209.

Littman, J., Chudnov, D., Kerchner, D., Peterson, C., Tan, Y., Trent, R., … Wrubel, L. (2016).
API-based social media collecting as a form of web archiving. International Journal on
Digital Libraries, 19(1), 21-38.

Lomborg, S. (2012). Researching Communicative Practice: Web Archiving in Qualitative Social
Media Research. Journal of Technology in Human Services, 30(3/4), 219-231.
EBSCOhost, doi:10.1080/15228835.2012.744719.

Lomborg, S., & Bechmann, A. (2014). Using APIs for Data Collection on Social Media.
Information Society, 30(4), 256-265.

Masanès, J. (2010). Web Archiving. Berlin: Springer-Verlag.

McCown, F., & Nelson, M. L. (2009). What Happens When Facebook is Gone? 9th ACM/IEEE-CS
Joint Conference on Digital Libraries (pp. 251-254). Austin, USA: ACM. Retrieved from
http://www.cs.odu.edu/~mln/pubs/jcdl09/archiving-facebook-jcdl2009.pdf

Storrar, T. (2014). Archiving Social Media. The National Archives. Retrieved from
https://blog.nationalarchives.gov.uk/blog/archiving-social-media/

Thomson, S. D., & Kilbride, W. (2015). Preserving Social Media: The Problem of Access. New
Review of Information Networking, 20(1/2), 261-275. Retrieved from
https://doi-org.ezproxy.lib.usf.edu/10.1080/13614576.2015.1114842

Webber, J. (2017). The Challenges of Archiving Social Media. British Library. Retrieved from
http://blogs.bl.uk/webarchive/2017/04/the-challenges-of-web-archiving-social-media.html

Zannettou, S., Blackburn, J., De Cristofaro, E., Sirivianos, M., & Stringhini, G. (2018).
Understanding Web Archiving Services and Their (Mis)Use on Social Media. Retrieved from
http://ezproxy.lib.usf.edu/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=edsarx&AN=edsarx.1801.10396&site=eds-live
