
PLOS ONE

Detecting Framing Changes in Topical News Publishing


--Manuscript Draft--

Manuscript Number: PONE-D-19-15324R2

Article Type: Research Article

Full Title: Detecting Framing Changes in Topical News Publishing

Short Title: Detecting Framing Changes in Topical News Publishing

Corresponding Author: Karthik Sheshadri


North Carolina State University
Raleigh, UNITED STATES

Keywords: Framing; framing change detection; public reaction; legislation

Abstract: Changes in the framing of topical news have been shown to foreshadow significant public, legislative, and commercial events. Automated detection of framing changes is therefore an important problem, which existing research has not considered. Previous approaches are manual surveys, which rely on human effort and are consequently limited in scope. We make the following contributions. We systematize discovery of framing changes through a fully unsupervised computational method that seeks to isolate framing change trends over several years. We demonstrate our approach by isolating framing change periods that correlate with previously known framing changes. We have prepared a new dataset, consisting of over 12,000 articles from seven news topics or domains in which earlier surveys have found framing changes. Finally, our work highlights the predictive utility of framing change detection, by identifying two domains in which framing changes foreshadowed substantial legislative activity, or preceded judicial interest.

Order of Authors: Karthik Sheshadri

Chaitanya Shivade

Munindar Singh

Opposed Reviewers:

Response to Reviewers: We thank the editors for their valuable feedback. We address the main points below.

Editor Comment:

My main concern with this paper deals with the evaluation of the approach. More
precisely, the experimental section illustrates a series of case studies or scenarios
where the frame change is identified through a sudden polarity drift (Figures 10–16) that
is shown to correlate with some well-known fact or event studied in the literature. The
point is: how much is this evaluation anecdotal, and to what extent can it be
quantitatively measured? In all Figures from 10 through 16, several peaks and sudden
changes can be observed in the polarity distribution (e.g., Figure 12, class 5, years
2000 through 2003, or Figure 13, class 1, year 2006, to mention just a few): do they all
correspond to frame changes? If not, how can they be detected/studied? The paper
states that the dataset was annotated by experts: how? Can such annotation be used
for quantitative evaluation of the approach?

Response:

This comment concerns two primary aspects of the paper: (i) defining a framing
change, and in particular, isolating framing changes, and filtering out polarity drifts that
do not correspond to framing changes (ii) quantitative evaluation of the approach. We
address each aspect below.

(i) Defining framing changes and filtering out isolated drifts

We have added a section entitled Defining Framing Changes. We summarize the main
points here.

Since language and human behavior are not strictly deterministic, the measurement of
any temporally disparate pair of news corpora using adjective polarity (or any other
numerical metric) would result in different representative values of the two corpora.
Therefore, in this sense, any pair of news corpora can be said to have undergone a
framing change.

Further, individual metrics are susceptible to noisy readings due to imprecise data and
measurement. In particular, such an effect may cause sudden isolated spikes between
successive measurements.

This motivates the question of how a framing change is defined, in the context of our
computational measurements. The usual social science definition is that a framing
change is a shift in the way that a specific topic is presented to an audience. To isolate
such changes computationally, we use the following key observations from ground
truth framing changes: (i) framing changes take place as trends that are consistent
over at least $k$ years, and (ii) framing changes must be consistent across multiple
measurements.

Our aim in this paper is to begin from a set of time series such as the ones in figures
10 to 16, and isolate such trends. The requirement motivated by our first condition,
namely, that framing changes must last at least k years, is easy to satisfy by imposing
such a numerical threshold.

To satisfy the requirement motivated by our second observation, we rely on correlations between different measurements, as described in the Detecting Framing Changes using Periods of Maximum Correlation section.

Our approach thus identifies polarity drifts that are both correlated (quantitatively
measured by correlations between different measures of polarity) and sustained (by
the imposition of a threshold of duration). We point out that our approach filters out
isolated drifts in individual polarity measures, since such drifts are uncorrelated across
multiple measures. Further, we note that the magnitude of individual drifts matters only
indirectly to our approach, to the extent that a larger drift, if consistent across multiple
polarity measures, may have higher correlation than a smaller drift that is also
correlated.

(ii) Quantitative Evaluation

We provide a partial quantitative evaluation of our approach using a Precision-Recall analysis. We label a ground truth for each domain, marking years corresponding to
framing changes as positives, and other years as negatives. We primarily obtain our
positives using the findings of large-scale surveys from earlier research.

In order to do so, we study the literature pertaining to framing changes in the domains
we examine. We identify large-scale studies conducted by reputed organizations such
as the National Cancer Institute, the Columbia Journalism Review, Pew Research, and
so on. These studies examine news and media publishing in a particular domain over a
period of time, as we do, and manually identify changes in the framing of domain news
during these periods.

The studies we rely on for ground truth sometimes provide quantitative justification for their findings. These studies therefore provide an expert annotation of framing changes in our domains, for the periods we examine.
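To make the evaluation procedure concrete, the following minimal Python sketch computes precision and recall over per-year labels; the year values and the helper function are illustrative assumptions, not taken from our data.

```python
# Minimal sketch of the per-year precision-recall evaluation described above.
# The ground-truth and detected years below are hypothetical placeholders.

def precision_recall(ground_truth_years, detected_years):
    """Each year is one instance: positive if a framing change occurred."""
    gt, det = set(ground_truth_years), set(detected_years)
    true_positives = len(gt & det)
    precision = true_positives / len(det) if det else 0.0
    recall = true_positives / len(gt) if gt else 0.0
    return precision, recall

# Hypothetical example: surveys mark 2001-2003 as a framing change period.
p, r = precision_recall([2001, 2002, 2003], [2002, 2003, 2004])
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.67, recall=0.67
```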

By demonstrating substantial agreement between the results of our approach and those of earlier ground truth surveys, we establish our claim that our approach may be
used to automatically identify framing changes in domain news publishing.

Given that the data sources and coverage of our analysis usually differ substantially from those of prior surveys, the correlations we obtain appear quite substantial.
However, quantitative evaluation remains challenging for the reasons we point out.

This paper follows the spirit of recent work in seeking to develop the study of framing
into a computational science. We acknowledge that our methods may undergo
refinement to tackle broader ground truth data, of a wider temporal and geographical
scope. Nonetheless, we posit that our methods and results have scientific value, and
hope that future work will provide greater coverage of ground truth.

Please note that the underlying data preparation requires social science expertise and
cannot be effectively crowdsourced via a platform such as Mechanical Turk. We
therefore hope that our approach piques the interest of social scientists and leads them
to pursue more comprehensive studies of framing in news media that would enable
improvements in computational methods.

Editor Comment:

* What do the annotators tag in the dataset? The paper just states that two raters code a
random sample of articles from each domain, reporting Cohen's kappa. But what do
they code? Frames, or frame changes? If so, how is a frame change defined?

Response:

To ensure that the articles returned by our term search procedure are indeed relevant
to each domain, a random sample of articles from each domain dataset was coded for
relevance by two raters.

Specifically, an article belongs to a domain if at least a component of the article discusses a topic that is directly relevant to the domain. We term articles that are
relevant to a domain, domain positives, and irrelevant articles, domain negatives. As
an example, consider the following article from the domain smoking: ``The dirty gray
snow was ankle deep on West 18th Street the other day, and on the block between
Ninth and Tenth Avenues, a cold wind blew in off the Hudson River. On the south side
of the street, a mechanic stood in front of his garage smoking a ...'' We consider this
article a domain negative since, although it contains the keyword 'smoking', it does not discuss any aspect pertaining to the prevalence or control of smoking. In contrast, the
article ``An ordinance in Bangor, Maine, allows the police to stop cars if an adult is
smoking while a child under 18 is a passenger'' is directly relevant to the domain, and
is therefore considered a domain positive. We define dataset accuracy as the fraction
of articles in a dataset that are domain positive.

Please refer to the section on Defining Framing Changes for a discussion on how we
treat the problem of identifying changes in framing.

Editor Comment:

* With regards to Figures 1 and 2, the authors state that the peak in the LGBT domain
immediately precedes a frame change. But, does this hold also for other peaks of other
domains? Such as, drones in 2004, obesity in 2005, or smoking in 2005?

Response:

Whereas we do not claim that this correlation is true for all domains, we posit that it
motivates the utility of adjective polarity in the study of framing changes.

Editor Comment:

* What are the differences of the proposed approach with respect to an approach that
detects just frames (instead of frame changes), but then look at changes in the
detected frames...? See, e.g., the following references:
** Alashri et al., "Climate Change" Frames Detection and Categorization Based on
Generalized Concepts", International Journal of Semantic Computing, 2016
** Tsur et al., "A Frame of Mind: Using Statistical Models for Detection of Framing and
Agenda Setting Campaigns", ACL 2015

Response:

We have added paragraphs to the Related Work section, detailing the novel
contributions of our work and drawing distinctions between this paper and earlier
approaches. We summarize this discussion here.

We note that our approach is similar in spirit to Tsur et al.'s [9] work, in that both that
work and this paper apply a topic modeling strategy to analyze framing as a time
series. However, we highlight the following key differences and contributions of our
work. Firstly, as both Sheshadri and Singh [8] and Tsur et al point out, framing is a
subjective aspect of communication. Therefore, a computational analysis of framing
should ideally differentiate subjective aspects from fact-based and objective
components of communication. Since adjectives in and of themselves are incapable of
communicating factual information, we take them to be artifacts of how an event or
topic is framed. In contrast, generic n-grams (as used by Tsur et al) do not provide this
distinction.

Further, Tsur et al rely upon estimating ``changes in framing'' using changes in the
relative frequencies of n-grams associated with various topics or frames. Whereas
such an approach is useful in evaluating which of a set of frames may be dominant at
any given time, it does not measure ``framing changes'' in the sense originally
described in [5]. In contrast, our work estimates changes in framing using consistent
polarity drifts of adjectives associated with individual frames. Our approach may also
be applied to each of a number of frames independently of the others, as opposed to
Tsur et al.

Editor Comment:

I have consulted with another academic editor, Dr. Marco Lippi, and we agree that a
desk rejection was premature. Nevertheless, experience tells us that all reviewers
provide a perspective similar to at least some other readers. For this reason I am
requesting that you revise the manuscript to address issues raised in the desk
rejection. Perhaps you made some revisions in your appeal. However, these were not
apparent to me with track changes. Please revise the manuscript itself, including a
rebuttal that identifies the specific location of the modifications you made in response
to the original decision.

Response:

We have submitted a revised manuscript in which we highlight our responses to each point made in the original decision. We summarize our responses to each point in the
original decision below.

Editor Comment:

1. The related work should be discussed in detail highlighting the advantages/limitations of existing approaches.

Response:

Our updated related work section discusses alternative approaches in detail, and describes our novel contributions over these approaches.

Editor Comment:

2. The dataset and codes are not available online.

Response:

The dataset and code are available online at the following link:
https://drive.google.com/open?id=1zAH__Y1lcdriuwUcjZsKmvaqYtzAjyZ9

All our results are reproducible from the data and code in the above mentioned
repository. We will provide a guide to run our code.

Editor Comment:

4. An overview diagram for the proposed approach would help the reader understand
the flow of the proposed approach.

Response:

We have added an overview diagram illustrating our approach in Fig. 3.

Editor Comment:

5. The results are presented but not discussed. The section should be renamed to
"Results and Discussion" and appropriate discussion should be added with each pair of
graphs.

Response:

We have expanded our analysis of the results for each domain, including adding a
quantitative precision-recall analysis based on ground truth data.

Response to Original Decision:

In addition to these responses, we have appended a copy of our original response to reviews below.

Editor Comments:

This research is focused on detecting framing changes in topical news. The authors
argue that the public opinion varies with the way the news is framed. The research
lacks motivation as it is not clear what benefits can be achieved if frame changes are
detected. Moreover, the problem is already discussed and presented in articles [4,5].
This paper seems to provide more empirical evidence in support to the existing
research [4,5]. Hence, the research contribution is unclear.
Furthermore, following points are worth considering:-
1. The related work should be discussed in detail highlighting the
advantages/limitations of existing approaches.
2. The dataset and codes are not available online.
3. The comparison of research with state of the art approaches and manual techniques
has not been conducted.
4. An overview diagram for the proposed approach would help the reader understand
the flow of the proposed approach.
5. The results are presented but not discussed. The section should be renamed to
"Results and Discussion" and appropriate discussion should be added with each pair of
graphs.

Responses:

We provide a point-by-point response below.

Comment on Motivation: “The research lacks motivation as it is not clear what benefits
can be achieved if frame changes are detected.”

Framing changes have been shown to have commercial and legislative consequences,
and have also been shown to foreshadow public attention changes. We cite five
example articles here [1-5] and can readily provide more as necessary. A large body of
literature in the fields of Political Science and Communication addresses the manual
identification of framing changes in specific domains. Whereas we cite two examples
here [6-7], additional examples are available – please let us know. However, existing
work does not attempt to address the problem of computationally detecting framing
changes. Our work is the first attempt at this problem, which has significant
commercial, public, and legislative import. Our results substantially agree with the
results of earlier human surveys, and further have shown predictive utility for legislative
and public response. Our work therefore has significant scientific and potential
commercial value.

Comment on Contribution: “Moreover, the problem is already discussed and presented in articles [4,5]. This paper seems to provide more empirical evidence in support to the existing research [4,5]. Hence, the research contribution is unclear.”

References 4 and 5 are articles discussing framing theory in the Communications literature. Neither article makes any attempt to computationally (or even manually)
measure framing in any real-world domain, let alone attempt to detect changes in
framing. Therefore, the problem addressed by our work is fundamentally novel and
different from the work presented in [4] and [5].

We present a fully unsupervised approach that is the first method to computationally detect framing changes. Further, we contribute a dataset of over 12,000 news articles
from seven domains. Our work will provide a strong baseline to foster new research in
this influential area of Political Science and Communications research.

Revised sections:

“The related work should be discussed in detail highlighting the advantages/limitations of existing approaches.”

We emphasize that our work is the first attempt at computationally modeling changes
in framing. The closest previous efforts in this area are those of [10] and [11]. We
describe our novel contributions over these efforts in detail in the Related Work
section. We are unaware of any other relevant related work and would be happy to
learn of any such work from the Editor.

“The dataset and codes are not available online.”

This statement is incorrect. We have clearly stated in our submission that all data and
code will be made available, and are available online at the following link:
https://drive.google.com/open?id=1zAH__Y1lcdriuwUcjZsKmvaqYtzAjyZ9

All our results are reproducible from the data and code in the above mentioned
repository. We will provide a guide to run our code.

“The comparison of research with state of the art approaches and manual techniques
has not been conducted.”

Please refer to our responses above to the comment on motivation and comment #1.

4) “An overview diagram for the proposed approach would help the reader
understand the flow of the proposed approach.”

We are grateful for this suggestion and will incorporate an overview diagram illustrating
our approach. However, this is a simple suggestion for presentation that may easily be
addressed in a revision.

5) “The results are presented but not discussed. The section should be renamed to
"Results and Discussion" and appropriate discussion should be added with each pair of
graphs.”

The Results section discusses our results for each domain, using both a qualitative
comparison with manual surveys (by other authors) and by highlighting the predictive
utility of the returned result. We show that our results both agree with previous manual
surveys, and are also able to predict significant public and legislative response in each
domain. We will rename this section to “Results and Discussion”.

References:
1. A. C. Gunther, The persuasive press inference: Effects of mass media on perceived public opinion. Commun. Res. 25, 486–504 (1998).

2. D. C. Mutz, J. Soss, Reading public opinion: The influence of news coverage on perceptions of public sentiment. Public Opin. Q. 61, 431–451 (1997).

3. G. King, B. Schneer, A. White, How the news media activate public expression and influence national agendas. Science 358, 776–780 (2017).

4. F. R. Baumgartner, B. D. Jones, P. B. Mortensen, Punctuated equilibrium theory: Explaining stability and change in public policymaking, in Theories of the Policy Process, P. A. Sabatier, C. M. Weible, Eds. (Westview Press, 2014), pp. 59–103.

5. R. M. Entman, Framing: Toward clarification of a fractured paradigm. J. Commun. 43, 51–58 (1993).

6. K. M. Cummings, R. N. Proctor, The changing public image of smoking in the United States: 1964–2014. Cancer Epidemiol. Biomarkers Prev. 23, 32–36 (2014).

7. S. M. Engel, Frame spillover: Media framing and public opinion of a multifaceted LGBT rights agenda. Law Soc. Inq. 38, 403–441 (2013).

8. K. Sheshadri, M. P. Singh, The public and legislative impact of hyperconcentrated topic news. Science Advances 5 (8) (2019).

9. O. Tsur, D. Calacci, D. Lazer, A frame of mind: Using statistical models for detection of framing and agenda setting campaigns. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (2015).

Additional Information:

Financial Disclosure: CS has a commercial affiliation to Amazon. The funder provided support in the form of salaries for this author, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.


Competing Interests: CS has a commercial affiliation to Amazon. The above commercial affiliation does not alter our adherence to PLOS ONE policies on sharing data and materials.


Ethics Statement: Our study involved no human or animal subjects.

Data Availability: Yes, all data are fully available without restriction.

Where the data may be found: All data and most or all code will be made available in a GitHub/Google Drive repository.

Cover letter

Karthik Sheshadri
Department of Computer Science
North Carolina State University
Raleigh, NC 27695-8206, USA
Phone: (919) 798-0203
E-mail: ksheshah@ncsu.edu

May 28, 2019

Dear PLOS One editor:


On behalf of Dr. Chaitanya Shivade, Dr. Munindar P. Singh, and myself, I would like to submit our
manuscript “Detecting Framing Changes in Topical News Publishing” as a research article to PLOS One.
Changes in the framing of topical news have been shown to have public, legislative, and commercial
consequences. Automated detection of framing changes is therefore an important problem, which existing
research has not considered. Previous approaches are manual surveys, which rely on human effort and
are consequently limited in scope.
This paper makes the following major contributions in the study of framing change detection.
First, we systematize discovery of framing changes through a fully unsupervised computational
method that isolates framing change trends over several years. We demonstrate our approach by identifying
previously known framing changes.
Second, we have prepared a new dataset, consisting of over 12,000 articles from six news topics or
domains in which earlier surveys have found framing changes. We release the dataset with our paper.
Finally, our work highlights the predictive utility of framing change detection, by identifying two
domains in which framing changes foreshadowed substantial legislative activity.
Taken together, these contributions establish the predictive utility of automated news monitoring, as
a means to foreshadow events of commercial and legislative import.
Our contribution will be of interest to scholars across a broad range of disciplines, including political
science, communications, and computer science, as well as to private organizations, policymakers, and
governments locally and abroad.
All authors have read the manuscript, agree with the data deposition requirements and to our
submission in PLOS One, and have no conflicts of interest.
Please feel free to contact us for any questions.
Sincerely,
Karthik Sheshadri
Cover Letter

Karthik Sheshadri
Department of Computer Science
North Carolina State University
Raleigh, NC 27695-8206, USA
Phone: (919) 798-0203
E-mail: ksheshah@ncsu.edu

March 3, 2020

Dear PLOS One editor:


On behalf of Dr. Chaitanya Shivade, Dr. Munindar P. Singh, and myself, I would like to submit a
revised version of our manuscript “Detecting Framing Changes in Topical News Publishing” as a research
article to PLOS One.
This revision makes the following major improvements:
First, we have added a quantitative evaluation of our approach using a precision-recall analysis,
which demonstrates that we successfully identify 90% of true positives, whereas only 10% of the positives
we identify are false positives.
Second, we clarify our conception of framing changes, and show that our approach successfully
filters out large drifts in individual polarity measures, while isolating sustained and correlated drifts.
Third, we have expanded our Related Work, Dataset Collection, and Approach sections to clarify
the novel contribution of our work over previous approaches.
We describe these and other improvements in our Response to Reviews document.
We have added an amended Funding Statement and Competing Interests Statement to the
manuscript. CS has a commercial affiliation to Amazon. The funder provided support in the form
of salaries for this author, but did not have any additional role in the study design, data collection and
analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are
articulated in the ‘author contributions’ section.
The above commercial affiliation does not alter our adherence to PLOS ONE policies on sharing
data and materials.
Please feel free to contact us for any questions.
Sincerely,
Karthik Sheshadri
Manuscript

Detecting Framing Changes in Topical News Publishing


Karthik Sheshadri1*, Chaitanya Shivade2, Munindar P. Singh1

1 Department of Computer Science, North Carolina State University


2 Amazon

* kshesha@ncsu.edu

Abstract
Changes in the framing of topical news have been shown to foreshadow significant
public, legislative, and commercial events. Automated detection of framing changes is
therefore an important problem, which existing research has not considered. Previous
approaches are manual surveys, which rely on human effort and are consequently
limited in scope. We make the following contributions. We systematize discovery of
framing changes through a fully unsupervised computational method that seeks to
isolate framing change trends over several years. We demonstrate our approach by
isolating framing change periods that correlate with previously known framing changes.
We have prepared a new dataset, consisting of over 12,000 articles from seven news
topics or domains in which earlier surveys have found framing changes. Finally, our
work highlights the predictive utility of framing change detection, by identifying two
domains in which framing changes foreshadowed substantial legislative activity, or
preceded judicial interest.

“For nearly four decades, health and fitness experts have prodded and cajoled and used other powers of persuasion in a futile attempt to whip America’s youngsters into shape.”

The New York Times, 1995

“The New York City Health Department has embarked on a new campaign to persuade processed food brands to decrease sugar content in a bid to curb obesity.”

The New York Times, 2015

Introduction and Contributions


To motivate the problem and approach of this paper, let us investigate the primary
causes of obesity in America. Public opinion and behavior on the subject have changed
measurably since the late 1990s. As an example, Gunnars [1] compiled a list in 2015 of
ten leading causes, six of which suggest that the processed food industry may be
primarily responsible. By contrast, in the 1990s and early 2000s, popular opinion
appeared to hold [2, 3] that obesity was primarily caused by individual behavior and
lifestyle choices. What led to this change in public opinion?



We posit that news publishing on the subject of obesity contributed to the change in
the public’s opinion. The above quotes from the New York Times (NYT) are
representative snippets from news articles on obesity published in 1995 and 2015,
respectively. Whereas both address the same topic, the 1995 snippet implies
responsibility on part of individuals, and the 2015 snippet implies responsibility on part
of the processed food industry. These subjective biases in news are collectively referred
to as framing.
Framing theory [4, 5] suggests that how a topic is presented to the audience (called
“the frame”) influences the choices people make about how to process that information.
The central premise of the theory is that since an issue can be viewed from a variety of
perspectives and be construed as having varying implications, the manner in which it is
presented influences public reaction.
In general, understanding news framing may be a crucial component of
decision-support in a corporate and regulatory setting. To illustrate this fact, we
present a real-life example of the influence of framing on public perception and
legislation. According to [6, 7], in late 2010, security vulnerabilities associated with
Facebook and Zynga allowed personal data from millions of users to be compromised.
The framing of news on this topic appeared to change from a neutral narrative to one
focusing on personal privacy.
Facebook and Zynga were sued over privacy breaches [8]. Further, in 2013, the
Personal Data Protection and Breach Accountability Act was promulgated in
Congress [9]. These examples motivate the problem of framing change detection, which
involves identifying when the dominant frame (or frames) [10] of a topic undergoes a
change.

Related Work
The Media Frames Corpus, compiled by Card et al. [11], studies three topics
(Immigration, Smoking, and same-sex marriages), and identifies fifteen framing
dimensions in each. We identify two major limitations of their work. Firstly, Card et al.
study framing as a static detection problem, identifying which dimensions appear in a
given news article. However, research in sociology [10] shows that most news topics
feature a dominant frame (or dominant dimension in the terminology of [11]). Further,
for a generic news topic, the dominant frame is not necessarily one of fifteen previously
chosen dimensions, but can instead be an unknown arbitrary frame specific to the topic
under consideration. For example, in the example given in the Introduction and
Contributions section, the dominant frame related to the privacy of individuals, which is
not one of the fifteen dimensions described in Card et al. [11].
Secondly, Sheshadri and Singh [12] showed that public and legislative reaction tend
to occur only after changes in the dominant frame. That finding motivates an approach
to framing that focuses on identifying and detecting changes in the dominant frame of a
news domain.
Sheshadri and Singh further propose two simple metrics that they motivate as
measures of domain framing: framing polarity and density. They define framing polarity
as the average frequency of occurrence in a domain corpus of terms from a benchmark
sentiment lexicon. Framing density is measured using an entropic approach that counts
the number of terms per article required to distinguish a current corpus from an earlier
one.
We identify the following limitations of the aforementioned measures (introduced
in [12]). Firstly, both measures make no effort to associate a given news article with a
particular frame. Prior work does not support the inherent assumption that all articles
in a given domain belong to a particular frame [10, 11]. We enhance understanding by
analyzing each domain using several distinct frames.



Secondly, framing density does not distinguish between a subjective choice made by
a news outlet to frame a domain differently, and events that necessitate media coverage.
Our work provides this distinction by analyzing framing using patterns of change in the
adjectives that describe co-occurring nouns. Since adjectives are artifacts of subjective
framing, they are not affected by events, as framing density is.
It is worthwhile to note that our approach is similar in spirit to Tsur et al.’s
work [13], in that both that work and this paper apply a topic modeling strategy to
analyze framing as a time series. However, we highlight the following key differences
and contributions of our work. Firstly, as both Sheshadri and Singh [12] and Tsur et
al. [13] point out, framing is a subjective aspect of communication. Therefore, a
computational analysis of framing should ideally differentiate subjective aspects from
fact-based and objective components of communication. Since adjectives in and of
themselves are incapable of communicating factual information, we take them to be
artifacts of how an event or topic is framed. In contrast, generic n-grams (as used by
Tsur et al. [13]) do not provide this distinction.
Further, Tsur et al. rely upon estimating “changes in framing” using changes in the
relative frequencies of n-grams associated with various topics or frames. Whereas such
an approach is useful in evaluating which of a set of frames may be dominant at any
given time, it does not measure “framing changes” in the sense originally described
in [14]. In contrast, our work estimates changes in framing using consistent polarity
drifts of adjectives associated with individual frames. Our approach may also be applied
to each of a number of frames independently of the others, as opposed to Tsur et al. [13].
We also distinguish our work from Alashri et al. [15], which uses standard machine
learning tools to classify sentences represented by linguistic features into one of four
frames. Such an approach is limited by the need to predefine a frame set, as Card et al.’s
approach [11] is. Further, the paper does not discuss the problem of how to examine
changes within specific frames. To the best of our knowledge, our work is the first to
address the problem of detecting meaningful patterns of change within individual
frames.

Contributions
This paper contributes a fully unsupervised and data-driven natural language based
approach to detecting framing change trends over several years in domain news
publishing. To the best of our knowledge, this paper is the first to address framing
change detection, a problem of significant public and legislative import. Our approach
agrees with and extends the results of earlier manual surveys, which required human
data collection and were consequently limited in scope. Our approach removes this
restriction by being fully automated. Our method can thus be run simultaneously over
all news domains, limited only by the availability of real-time news data. Further, we
show that our approach yields results that foreshadow periods of legislative activity.
This motivates the predictive utility of our method for legislative activity, a problem of
significant import.
Further, we contribute a Framing Changes Dataset, which is a collection of over
12,000 news articles from seven news topics or domains. In four of these domains,
surveys carried out in earlier research have shown framing to change. In two domains,
periods with significant legislative activity are considered. Our individual domain
datasets within the framing changes dataset cover the years in which earlier research
found framing changes, as well as periods ranging up to ten years before and after the
change. Our dataset is the first to enable computational modeling of framing change
trends. We plan to release the dataset with our paper. We note that a fraction of the
articles in this dataset were used earlier for the analysis in [12].



Materials and Methods
This section describes our datasets, data sources, and inter-annotator agreement. All
data were collected in an anonymous and aggregated manner. All APIs and data used
are publicly available, and our data collection complies with the terms and conditions of
each API.

Data Sources
We use two Application Programming Interfaces (APIs) to create our datasets.

The New York Times API:


The New York Times (NYT) Developer’s API [16] provides access to news data from
the NYT, Reuters, and Associated Press (AP) newspapers—both print and online
versions—beginning in 1985. The NYT has the second largest circulation of any
newspaper in the United States [17].
The data object returned by the API includes fields such as the article type (news,
reviews, summaries, and so on), the news source (NYT, Reuters, or AP), the article’s
word count, the date of its publication, article text (in the form of the abstract, the lead
(first) paragraph, and a summary).

The Guardian API:


The Guardian Xplore API [18] provides access to news data from The Guardian, a
prominent UK newspaper that reaches 23 million UK adults per month [19].
The Guardian API returns full-length articles along with such metadata as the
article type (similar to the NYT API) and a general section name (such as sports,
politics, and so on). Although these section names are manually annotated by humans,
we do not use them in our analysis, but rely instead on a simple term search procedure
(see the Domain Dataset Generation section) to annotate our datasets.

Domain Dataset Generation


As in earlier work [12, 20, 21], we use a standard term search procedure to create our
datasets. Specifically, an article belongs to a domain if at least a component of the
article discusses a topic that is directly relevant to the domain [12]. We term articles
that are relevant to a domain domain positives, and irrelevant articles domain negatives.
As an example, consider the following article from the domain smoking: “The dirty gray
snow was ankle deep on West 18th Street the other day, and on the block between Ninth
and Tenth Avenues, a cold wind blew in off the Hudson River. On the south side of the
street, a mechanic stood in front of his garage smoking a . . . .” We consider this article
a domain negative since, although it contains the keyword ’smoking’, it does not discuss any aspect pertaining to the prevalence or control of smoking. In contrast, the article
“An ordinance in Bangor, Maine, allows the police to stop cars if an adult is smoking
while a child under 18 is a passenger” is directly relevant to the domain, and is therefore
considered a domain positive. We define dataset accuracy as the fraction of articles in a
dataset that are domain positive. For each domain, our APIs were used to extract news
data during the time period b (denoting the beginning) to e (denoting the end), of the
period of interest.
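As a rough illustration of the term search procedure, the sketch below filters articles by domain keywords; `fetch_articles` and the keyword lists are hypothetical stand-ins, since the exact NYT and Guardian API request formats are not reproduced here.

```python
# Sketch of the term-search procedure used to assemble a domain dataset.
# `fetch_articles` is a hypothetical stand-in for the NYT / Guardian API clients
# and is assumed to yield dicts with at least a 'text' field.

DOMAIN_TERMS = {
    "smoking": ["smoking", "tobacco", "cigarette"],  # illustrative terms only
    "obesity": ["obesity", "obese"],
}

def build_domain_corpus(domain, begin_year, end_year, fetch_articles):
    terms = [t.lower() for t in DOMAIN_TERMS[domain]]
    corpus = []
    for article in fetch_articles(begin_year, end_year):
        text = article["text"].lower()
        if any(term in text for term in terms):  # keep articles matching a term
            corpus.append(article)
    return corpus
```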



Inter-Annotator Agreement
To ensure that the articles returned by our term search procedure are indeed relevant to
each domain, a random sample of articles from each domain dataset was coded by two
raters. We supply the per-domain accuracy and inter-annotator agreement as Cohen’s
Kappa for sample domains in Table 1.

Table 1. Inter-annotator agreement as Cohen’s Kappa, with per-coder dataset accuracy.

Domain         Accuracy (Coder 1)   Accuracy (Coder 2)   Kappa
Surveillance   0.80                 0.75                 0.79
Smoking        0.84                 0.82                 0.93
Obesity        0.78                 0.74                 0.67
LGBT Rights    0.83                 0.74                 0.64
Abortion       0.80                 0.80                 0.50

Probability Distribution over Adjectives


Our approach relies on the key intuition that during a framing change, the valence of
the adjectives describing co-occurring nouns changes significantly.
To measure this change, we create a reference probability distribution of adjectives
based on the frequency of their occurrence in benchmark sentiment datasets.

Benchmark Datasets
We identified three open source benchmark review datasets from which to create our
adjective probability distribution. Together, these datasets provide about 150 million
reviews of various restaurants, services and products, with each review rated from one
to five. Given the large volume of reviews from different sources made available by these
datasets, we assume that they provide a sufficiently realistic representation of all
adjectives in the English language.
We rely primarily on the Trip Advisor dataset to create our adjective probability
distribution. We identified two other benchmark datasets, namely, the Yelp Challenge
dataset and the Amazon review dataset. Due to the fact that these datasets together
comprise about 150 million reviews, it is computationally infeasible for us to include
them in our learning procedure. Instead, we learned distributions from these datasets
for sample adjectives, to serve as a comparison with and as verification of our overall
learned distribution. The resulting distributions for these adjectives appeared
substantially similar to those of the corresponding adjectives in our learned distribution.
We therefore conclude that our learned distribution provides a valid representation of all
adjectives in the English language. We describe each dataset below.

Trip Advisor
The Trip Advisor dataset consists of 236,000 hotel reviews. Each review provides text,
an overall rating, and aspect specific ratings for the following seven aspects: Rooms,
Cleanliness, Value, Service, Location, Checkin, and Business. We limit ourselves to
using the overall rating of each review.



Yelp
The Yelp challenge dataset consists of approximately six million restaurant reviews.
Each entry is stored as a JSON string with a unique user ID, check-in data, review text,
and rating.

Amazon
The Amazon dataset provides approximately 143 million reviews from 24 product
categories such as Books, Electronics, Movies, and so on. The dataset uses the JSON
format and includes reviews comprising a rating, review text, and helpfulness votes.
Additionally, the JSON string encodes product metadata such as a product description,
category information, price, brand, and image features.

Polarity of Adjectives
For each adjective in the English language, we are interested in producing a probability
distribution that describes the relative likelihood of the adjective appearing in a review
whose rating is r. For our data, r ranges from one to five.
We began by compiling a set of reviews from the Trip Advisor dataset for each
rating from one to five. We used the Stanford CoreNLP parser [22] to parse each of the
five sets of reviews so obtained. We thus obtained sets of parses corresponding to each
review set. From the set of resultant parses, we extracted all words that were assigned a
part-of-speech of ‘JJ’ (adjective). Our search identified 454,281 unique adjectives.
For each unique adjective a, we counted the number of times it occurred in our set of
parses corresponding to review ratings one to five. We denote this by $N_a^i$, with $1 \leq i \leq 5$. Our probability vector for adjective $a$ is then $\{\frac{N_a^1}{S_a}, \frac{N_a^2}{S_a}, \ldots, \frac{N_a^5}{S_a}\}$, where $S_a = N_a^1 + N_a^2 + N_a^3 + N_a^4 + N_a^5$. Additionally, we recorded the rarity of each adjective as $\frac{1}{S_a}$. This estimates a probability distribution $P$, with 454,281 rows and six columns.
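The counting procedure can be sketched as follows, assuming a collection of (review text, rating) pairs; NLTK's part-of-speech tagger is used here merely as a stand-in for the Stanford CoreNLP parser named above.

```python
# Sketch: estimate each adjective's probability vector over ratings 1..5 and its rarity.
# NLTK's tagger substitutes here for the Stanford CoreNLP parser used in the paper.
from collections import defaultdict
import nltk  # assumes the 'punkt' and tagger models have been downloaded

def adjective_distribution(reviews):
    """reviews: iterable of (text, rating) pairs with rating in 1..5."""
    counts = defaultdict(lambda: [0, 0, 0, 0, 0])  # adjective -> counts per rating
    for text, rating in reviews:
        for word, tag in nltk.pos_tag(nltk.word_tokenize(text)):
            if tag.startswith("JJ"):  # adjectives (JJ, JJR, JJS)
                counts[word.lower()][rating - 1] += 1
    table = {}
    for adj, c in counts.items():
        total = sum(c)  # S_a
        table[adj] = {"probs": [n / total for n in c],  # N_a^i / S_a
                      "rarity": 1.0 / total}            # 1 / S_a
    return table
```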
Table 2 shows example entries from our learned probability distribution. As can be
seen from the table, our learned distribution not only correctly encodes probabilities
(the adjective ‘great’ has nearly 80% of its probability mass in the classes four and five,
whereas the adjective ‘horrible’ has nearly 80% of its mass in classes one and two), but
also implicitly learns an adjective ranking such as the one described in De Melo et
al. [23]. To illustrate this ranking, consider that the adjective ‘excellent’ has 60% of its
probability mass in class five, whereas the corresponding mass for the adjective ‘good’ is
only 38%.
For visual illustration, we depict our learned probability distribution as a heatmap in
Table 3.
Motivated by our learned probability distribution, we posit that class 1 represents negativity, classes 2 to 4 represent neutrality, and class 5 represents positivity.

Incorporating Adjective Rarity


Our measure of adjective rarity serves as a method by which uncommon adjectives,
which rarely occur in our benchmark dataset, and whose learned probability
distributions may therefore be unreliable, can be excluded.
However, in doing so, we run the risk of excluding relevant adjectives from the
analysis. We manually inspect the set of adjectives that describe the nouns in each
domain to arrive at a domain specific threshold.



Table 2. Sample entries from our learned probability distribution for positive and
negative sentiment adjectives.
Adjective Class 1 Class 2 Class 3 Class 4 Class 5 Rarity (Inverse Scale)
Great 0.039 0.048 0.093 0.274 0.545 4.495e-07
Excellent 0.019 0.028 0.070 0.269 0.612 2.739e-06
Attractive 0.095 0.125 0.192 0.296 0.292 0.0001
Cute 0.039 0.068 0.155 0.330 0.407 1.499e-05
Compassionate 0.068 0.020 0.010 0.038 0.864 0.0004
Good 0.076 0.095 0.185 0.336 0.308 3.459e-07
Horrible 0.682 0.143 0.076 0.042 0.057 7.453e-06
Ridiculous 0.461 0.180 0.125 0.116 0.118 2.033e-05
Angry 0.546 0.138 0.092 0.098 0.126 6.955e-05
Stupid 0.484 0.136 0.099 0.117 0.164 5.364e-05
Beautiful 0.043 0.049 0.085 0.222 0.599 6.233e-06

For a majority of our domains (five out of seven), we use a threshold of $q > -\infty$, that is, no adjectives are excluded. For the remaining two domains (drones and LGBT rights), we employ a threshold of $q > 10^{-4}$.
The trends in our results appeared to be fairly consistent across a reasonable range
of threshold values.

Domain Period of Interest


We define a period of interest for each domain. Let tf be a year in which a documented
framing change took place in the domain under consideration. Then, our period of
interest for this domain is b = min(tf − 10, tf − l) to e = max(tf + 10, tf + r), where the
API provides data up to l years before, and r years after tf . All units are in years.

Corpus-Specific Representations
A domain corpus is a set of news articles from a given domain. Let a given domain have
m years in its period of interest with annual domain corpora T1 , T2 , . . . , Tm .

Corpus Clustering
An overall domain corpus is therefore T = T1 ∪ T2 ∪ . . . ∪ Tm .
We assume that a corpus has k unique frames. We adopt a standard topic modeling
approach to estimate frames. We use the benchmark Latent Dirichlet Allocation
(LDA) [24] approach to model k = 5 topics (that is, frames) in each domain corpus. We
extract the top l = 20 terms v from each frame. We also extract the set of all unique
nouns in T . We define a cluster as the set of nouns v ∩ T . We thus generate k clusters,
each representing a unique frame.
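A minimal sketch of this clustering step is given below, using scikit-learn's LDA implementation (the text does not name a particular implementation); the set of unique nouns in T is assumed to be precomputed.

```python
# Sketch: model k = 5 frames with LDA and intersect each frame's top terms
# with the domain's nouns to form clusters. scikit-learn is an illustrative choice.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def frame_clusters(articles, domain_nouns, k=5, top_l=20):
    """articles: list of article texts; domain_nouns: set of unique nouns in T."""
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(articles)
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(counts)
    vocab = vectorizer.get_feature_names_out()
    clusters = []
    for topic_weights in lda.components_:
        top_terms = {vocab[i] for i in topic_weights.argsort()[-top_l:]}
        clusters.append(top_terms & domain_nouns)  # keep only nouns of the corpus
    return clusters
```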

Annual Cluster Polarity


For each cluster c, we are interested in arriving at a vector of m annual polarities, i.e.,
for each year i, 1 ≤ i ≤ m in the domain period of interest.
Let xc be the set of all nouns in c. For each noun v ∈ xc , we use the Stanford
dependency parser [22] to identify all adjectives (without removing duplicates) that
describe v in Ti . We extract the polarity vectors for each of these adjectives from P as
the matrix Ai. Ai therefore has n rows, one for each adjective so identified, and five columns (see the Polarity of Adjectives section). We estimate the annual cluster polarity of c as the vector of column-wise averages of Ai. Let Pc = {P1, P2, . . . , Pm} be the set of annual cluster polarities so obtained.

Annual polarities for representative clusters from each of our domains are shown in figures 11 to 15.

Table 3. The entries of Table 2 depicted as a heatmap for visual illustration.
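As an illustration of the averaging step described in this subsection, the sketch below computes one year's cluster polarity; spaCy's dependency labels are used in place of the Stanford dependency parser, and `P` is assumed to map adjectives to the five-class vectors learned earlier.

```python
# Sketch: annual polarity of a cluster as the column-wise average of the polarity
# vectors of adjectives that modify the cluster's nouns in that year's articles.
# spaCy stands in here for the Stanford dependency parser used in the paper.
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")

def annual_cluster_polarity(cluster_nouns, year_articles, P):
    """P: dict mapping adjective -> five-element polarity vector."""
    rows = []
    for text in year_articles:
        for token in nlp(text):
            # Adjectival modifiers whose head noun belongs to the cluster.
            if token.dep_ == "amod" and token.head.lemma_.lower() in cluster_nouns:
                vec = P.get(token.lemma_.lower())
                if vec is not None:
                    rows.append(vec)
    return np.mean(np.array(rows), axis=0) if rows else np.zeros(5)
```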

Defining Framing Changes


Since language and human behavior are not strictly deterministic, the measurement of
any temporally disparate pair of news corpora using adjective polarity (or any other
numerical metric) would result in different representative values of the two corpora.
Therefore, in this sense, any pair of news corpora can be said to have undergone a
framing change.
Further, individual metrics are susceptible to noisy readings due to imprecise data
and measurement. In particular, such an effect may cause sudden isolated spikes
between successive measurements. For example, in figure 11, during the period between
2005 and 2006, whereas classes 1, 3, 4, and 5 changed little, class 2 showed a substantial
change.
This motivates the question of how a framing change is defined, in the context of our
computational measurements. The usual social science definition [14] is that a framing
change is a shift in the way that a specific topic is presented to an audience. To isolate
such changes computationally, we use the following key observations from ground truth
framing changes: (i) framing changes take place as trends that are consistent over at
least k years, and (ii) framing changes must be consistent across multiple measurements.
Our aim in this paper is to begin from a set of time series such as the ones in
figure 11, and isolate such trends. The requirement motivated by our first condition,
namely, that framing changes must last at least k years, is easy to satisfy by imposing
such a numerical threshold.
To satisfy the requirement motivated by our second observation, we rely on
correlations between different measurements, as described in the section below.

Detecting Framing Changes using Periods of Maximum Correlation

Our five polarity classes serve as measures of framing within a domain. We conceive of
a framing change as a trend, consistent across our five polarity classes, over a period of
some years.
We describe our intuition and approach to detecting framing changes below. Firstly,
we show that the frequency with which adjectives occur in articles varies both by
domain, and in different years within a domain.
Figures 1 and 2 depict the average number of adjectives per article for each of our domains
over the years in their respective periods of interest. We note that this count serves also
as a measure of how subjective news publishing in a domain is, since adjectives are
indicative of how events are framed.
Notice that in the domain LGBT rights, the peak in this measure immediately
precedes a framing change identified in an earlier study [25]. Whereas we do not claim that this
correlation is true for all domains, we posit that it motivates the utility of adjective
polarity in the study of framing changes.
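A rough sketch of how this per-article adjective count could be computed is given below, using NLTK's part-of-speech tagger; corpus_by_year, mapping each year to its list of article texts, is a hypothetical structure used only for illustration.

import nltk

def adjectives_per_article(corpus_by_year):
    # corpus_by_year: dict mapping a year to the list of article texts for that year
    averages = {}
    for year, articles in corpus_by_year.items():
        counts = [sum(1 for _, tag in nltk.pos_tag(nltk.word_tokenize(text))
                      if tag.startswith("JJ"))
                  for text in articles]
        averages[year] = sum(counts) / len(counts) if counts else 0.0
    return averages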
Although the volume of adjectives used per article varies dramatically (by up to 30%), we find that the variation in our annual cluster polarity between successive years is generally less than 1%. However, through a consistent trend lasting multiple years, our measure of annual polarity can change (increase or decrease) cumulatively by up to 5% (see figure 11 for an example). We identify a framing change based on such a cumulative trend.



Fig 1. The average number of adjectives per article, shown for the domains Drones (2003 to 2012), Immigration (2000 to 2017), and LGBT Rights (1996 to 2015) over their respective periods of interest. This metric serves as a measure of the subjectivity of news in a domain. Notice that in the domain LGBT rights, the peak in this measure immediately precedes a framing change identified in an earlier study [25].



Fig 2. The average number of adjectives per article, shown for the domains Obesity (1990 to 2009), Smoking (1990 to 2007), and Surveillance (2010 to 2016) over their respective periods of interest. This metric serves as a measure of the subjectivity of news in a domain.



We now consider the problem of fusing estimates from our five measures of annual
cluster polarity. Consider the change in polarity of classes 1, 3, 4, and 5 between 2005
and 2006 in figure 11, as against the change in class 2. As mentioned earlier, classes 1,
3, 4, and 5 changed little, whereas class 2 showed a substantial change.
In contrast, we note that in the period 2001 to 2013, a consistent trend was
observable across all five classes, with substantial reductions in classes 2 and 3, and a
notable corresponding increase in class 5. We exploit correlations between the changes
in our five classes to identify framing changes.
Accordingly, we use Pearson correlation [26] between our classes as a measure of
trend consistency. Let a given domain have m years in its period of interest. We
generate all possible contiguous subsets of Tm , namely, Ti–j , where i ≤ j ≤ m, and Ti–j
denotes the domain corpus from year i to year j.
Let C =  {C1 , . . . , C5 } be the set of class vectors for this domain subset, where
C1i
C1i+1  i
C1 =   . . . , C1 is the value of class 1 for year i, and similarly for C2 , . . . , C5 .

C1j
To measure the correlation of subset Ti–j , we compute its matrix of correlation
coefficients [27] K. We reshape K into a vector of size f × 1 where f = i ∗ j, and
evaluate its median, l. We find the maximum value of l, lmax , over all possible values of
i and j. We denote the values of i and j corresponding to lmax as imax and jmax . We
return Timax –jmax as our period of maximum correlation (PMC).
We note that the smaller the duration of a PMC, the greater the possibility that our
class vectors may have a high correlation in the period due to random chance. To
compensate for this effect, we employ a threshold whereby a period is not considered as
a candidate for the domain PMC unless it lasts at least y years. We uniformly employ a
value of y = 3 in this paper.
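For concreteness, the following is a minimal NumPy sketch of this search, assuming the five class vectors of a domain are stacked into an m × 5 array; it scores each contiguous span by the median over all entries of its correlation matrix K and enforces the minimum duration threshold y (here min_years).

import numpy as np

def period_of_maximum_correlation(class_vectors, min_years=3):
    # class_vectors: array of shape (m, 5); column k holds the annual values of
    # polarity class k+1 over the m years of the domain period of interest.
    m = class_vectors.shape[0]
    best_i, best_j, best_score = None, None, -np.inf
    for i in range(m):
        for j in range(i + min_years - 1, m):      # only spans of at least min_years years
            window = class_vectors[i:j + 1]        # years i..j inclusive
            K = np.corrcoef(window, rowvar=False)  # 5 x 5 matrix of correlation coefficients
            score = np.median(K)                   # median over all entries of K
            if score > best_score:
                best_i, best_j, best_score = i, j, score
    return best_i, best_j, best_score

Restricting the search to spans of at least min_years = 3 years mirrors the uniform threshold y = 3 used throughout this paper.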
Our approach thus identifies polarity drifts that are both correlated (quantitatively
measured by correlations between different measures of polarity) and sustained (by the
imposition of a threshold of duration). We point out that our approach filters out
isolated drifts in individual polarity measures, since such drifts are uncorrelated across
multiple measures. Further, we note that the magnitude of individual drifts matters
only indirectly to our approach, to the extent that a larger drift, if consistent across
multiple polarity measures, may have higher correlation than a smaller drift that is also
correlated.
A block diagram depicting our overall approach is shown in figure 3.

Quantitative Evaluation
We now discuss a partial quantitative evaluation of our approach using a
Precision-Recall analysis. Our analysis relies on ground truth annotation of framing
changes, as detailed in the section below.
We are unable to conduct a full precision-recall analysis over all domains due to the
limitations we discuss in the following sections, as well as in the Qualitative Analysis
and Discussion section. However, we expect that our partial analysis is representative of
the general performance of the approach.



Fig 3. A block diagram illustrating our approach (with nodes for the seed datasets, the Stanford parser, the extracted adjectives and their distribution, the domain corpora, LDA, frames, annual polarities, correlations, and the PMC). Our adjective distribution is computed using per-class frequencies of occurrence for each adjective in the seed dataset(s). We use this distribution to compute annual cluster polarities of frames obtained using LDA from our domain corpora.

Ground Truth Annotation


We label a ground truth for each domain, marking years corresponding to framing
changes as positives, and other years as negatives. We primarily obtain our positives
using the findings of large-scale surveys from earlier research.
In order to do so, we study the literature pertaining to framing changes in the
domains we examine. We identify large-scale studies conducted by reputable organizations such as the National Cancer Institute (NCI) [28], the Columbia Journalism Review (CJR) [29], and Pew Research [30]. These studies examine
news and media publishing in a particular domain over a period of time, as we do, and
manually identify changes in the framing of domain news during these periods.
The studies we rely on for ground truth sometimes provide quantitative justification
for their findings. For example, the NCI monograph on the framing of smoking news
identifies the number of pro- and anti-tobacco-control frames before and after a framing
change [28]. These studies therefore provide an expert annotation of framing changes in
our domains, for the periods we examine. Details of each study we used and their
findings are reported in the Results section.
By demonstrating substantial agreement between the results of our approach and
those of earlier ground truth surveys, we establish our claim that our approach may be
used to automatically identify framing changes in domain news publishing.

Precision-Recall Analysis
To gain confidence that our approach successfully identifies framing changes, we
conduct a precision-recall analysis on our data. We consider each year in each domain
as a data point in our analysis. We calculate overall precision and recall over all data
points in our domains. We consider a data point a true positive if both a ground truth study and our approach label it as corresponding to a framing change, and a true negative if both label it as not corresponding to one. We refer to a data point that is labeled as a positive (or negative) by our approach, but which is a negative (or positive) according to the relevant ground truth survey, as a false positive (or false negative, respectively).



We calculate precision as P = tp/(tp + fp) and recall as R = tp/(tp + fn), where tp, fp, and fn are the numbers of true positives, false positives, and false negatives, respectively.
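For illustration, a minimal sketch of this per-year bookkeeping follows; predicted and ground_truth are assumed to be sets of (domain, year) pairs labeled as framing-change years by our approach and by the surveys, respectively.

def precision_recall(predicted, ground_truth):
    # predicted, ground_truth: sets of (domain, year) pairs labeled as framing-change years
    tp = len(predicted & ground_truth)   # years both label as a framing change
    fp = len(predicted - ground_truth)   # years only our approach labels positive
    fn = len(ground_truth - predicted)   # years only the ground truth labels positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall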
For some domains, we were unable to identify an earlier survey studying the framing
of news publishing in the domain. We exclude these domains from our precision-recall
analysis. However, we show that in these cases, our estimated PMCs foreshadow events
of substantial public and legislative import. Since framing changes have been shown to
be associated with such public and legislative events [12], we argue that this provides
some measure of validation of our estimated PMCs.

Fig 4. Our estimated clusters for the domain abortion. Each cluster is said to
represent a unique frame. The frame discussed in cluster 1 (characterized by the terms
‘abortion’ and ‘ban’) concerns a proposed ban on abortion. We analyze this cluster, and
find that our estimated PMC (Figure 15) coincides with the period immediately preceding the Partial-Birth Abortion Ban Act of 2003.

Fig 5. Our estimated clusters for the domain drones. Each cluster is said to represent
a unique frame. The frame discussed in cluster 1 concerns the use of drones against
terrorist targets. Our analysis of this cluster returns a PMC of 2009 to 2011 (Figure 17).
Our PMC immediately foreshadows the Federal Aviation Administration’s
Modernization and Reform Act of 2012.



Fig 6. Our estimated clusters for the domain LGBT Rights. Each cluster is said to represent a unique frame. Cluster 3 discusses the subject of same-sex marriage, and in particular, judicial interest in this topic. We analyze this cluster and estimate two PMCs of nearly identical correlation score (2006 to 2008 and 2013 to 2015; Figure 14). The PMC of 2013 to 2015 coincides exactly with the Supreme Court judgment of 2015 that legalized same-sex marriage in the entire US.

Fig 7. Our estimated clusters for the domain obesity. Each cluster is said to represent
a unique frame. We posit that cluster 2 (characterized by the terms ‘food’, ‘diet’, and
‘make’) represents societal causes of obesity (see the Obesity section). We analyze this
cluster and estimate a PMC of 2005 to 2007 (Figure 13). Our PMC agrees with the
findings of an earlier human survey [2].



Fig 8. Our estimated clusters for the domain smoking. Each cluster is said to represent
a unique frame. The frame of cluster 3, characterized by the terms ‘cancer’ and ‘smoke’,
discusses the health risks associated with smoking. We analyze this cluster and estimate
a PMC of 2001 to 2003 (Figure 11). Our PMC coincides exactly with an earlier
monograph from the National Cancer Institute (NCI) that describes a progression
towards tobacco control frames in American media between 2000 and 2003.

Fig 9. Our estimated clusters for the domain surveillance. Each cluster is said to
represent a unique frame. The frame of cluster 3, characterized by the terms ‘national’,
‘security’, and ‘agency’, discusses the Snowden revelations of 2013. We analyze this
cluster and estimate a PMC of 2013 to 2014 (Figure 12). Our PMC coincides exactly
with the period following the Snowden revelations. Additionally, we note that the
Columbia Journalism Review [29] found that following the Snowden revelations, news
coverage of Surveillance changed to a narrative focusing on individual rights and digital
privacy [12].



Fig 10. Our estimated clusters for the domain Immigration. Each cluster is said to
represent a unique frame. The frame of cluster 2 discusses the waning of asylum grants,
increased border refusals and the final 2002 white paper on “Secure Borders, Safe
Haven.” We analyze this cluster and estimate a PMC of 2000 to 2002 (Figure 16). Our
PMC coincides exactly with the period immediately preceding the government white paper.

Results
We find that our periods of maximum correlation agree substantially with framing changes described in earlier surveys [2, 29, 31, 32], and also foreshadow legislation. Our computed class vectors are depicted in figures 11 to 17. We discuss each domain below.

Smoking
The NCI published a monograph discussing the influence of the news media on tobacco
use [28]. On page 337, the monograph describes how, during the period 2001 to 2003,
American news media had progressed towards tobacco control frames. It states that
55% of articles in this period reported progress on tobacco control, whereas only 23%
reported setbacks.
In contrast, the monograph finds (also on page 337) that between 1985 and 1996,
tobacco control frames (11) were fairly well balanced with pro-tobacco frames (10). We
extracted a dataset of over 2,000 articles from 1990 to 2007.
Our approach returns a PMC of 2001 to 2003 (see figure 11) for this domain. Since
no studies cover the period 1997 to 2000 [28], we interpret the findings described in the
monograph to imply that the change towards tobacco control frames predominantly
began in 2000, and ended in 2003. This domain therefore contributes three true
positives (2001 to 2003) and one false negative (2000), with no false positives, to our
precision-recall analysis.

Surveillance
The CJR [29] found that following the Snowden revelations, news coverage of
Surveillance in the US changed to a narrative focusing on individual rights and digital
privacy [12]. We compiled a dataset consisting of approximately 2,000 surveillance
articles from the New York Times for the period 2010 to 2016.



Fig 11. Annual polarities over the period 1990 to 2007 for cluster 3 of Figure 8 (characterized by the terms ‘cancer’ and ‘smoke’) from the domain smoking, for classes 1 to 5. The PMC is shown with solid lines and square markers, and coincides exactly with a framing change described in an earlier NCI monograph.



Our class vectors for this domain are shown in figure 12. We obtain a PMC of 2013
to 2014 for this period, corresponding closely with the ground truth framing change.
The trends in our class vectors are indicative of the change. As can be seen from the
figure, positivity (measured by class 5) drops markedly, together with a simultaneous
increase in negativity (class 1) and neutrality (classes 2 and 3). Class 4 remains close to
constant during this period and thus does not affect our hypothesis.
We interpret the findings of [29] to refer primarily to 2013, the year in which the
revelations were made, and the following year, 2014. Whereas other interpretations may
conclude a longer framing change, they must necessarily include this period. This
domain therefore contributes two true positives (2013 and 2014) with no false positives
or negatives to our quantitative evaluation.

Obesity
Kim and Willis [2] found that the framing of obesity news underwent changes between
the years 1997 and 2004. During this period, Kim and Willis found that the fraction of
news frames attributing responsibility for obesity to social causes increased significantly.
Prior to this period, obesity tended to be framed as an issue of individual responsibility.
For example, obesity news after the year 2000 has often criticized food chains for their
excessive use of sugar in fast food, as shown in the NYT snippet in the Introduction and
Contributions section. We compiled a dataset of over 3,000 articles from the New York
Times (since Kim and Willis [2] restrict their study to Americans) from 1990 to 2009.
The clusters we estimate for this domain are shown in Figure 7. Cluster 2 addresses
possible causes of obesity, with a particular focus on dietary habits. We posit that this
cluster represents societal causes more than individual ones (since individual causes, as
shown in the NYT snippet of the Introduction and Contributions section, tend to discuss
topics such as fitness and sedentary lifestyles, as opposed to food content). We observe
that the PMC for this domain (2005 to 2007) is characterized by increased positivity,
shown by classes 4 and 5, and decreased negativity (class 1). Our results for this
domain thus agree with the findings of Kim and Willis [2].
We were unable to use this domain in our precision-recall analysis, since Kim and
Willis, to the best of our knowledge, do not specify a precise period during which the
framing change took place.
However, since Figures 2 and 3 of Kim and Willis [2] show a dramatic increase of
social causes in 2004, and a corresponding marked decline of individual causes, we
conclude a substantial agreement between their findings and our results.

LGBT Rights
We compiled a dataset of over 3,000 articles from the period 1996 to 2015 in this domain.
Figure 6 depicts our estimated clusters. Cluster 3 represents a frame that discusses the
subject of same-sex marriage and its legality. We note that the Supreme Court ruled to
legalize same-sex marriages in the US in the year 2015. Our class vectors for this domain
are shown in figure 14. We obtained two PMCs with nearly identical correlation scores
(0.999 for the period 2006 to 2008, and 0.989 for the period 2013 to 2015). Figure 14
highlights the period 2013 to 2015 immediately preceding the judicial interest of 2015.
We were unable to identify a prior study that discusses the framing of LGBT news
over our entire period of interest. However, we use the findings reported in Gainous et
al. [32] as our ground truth for this domain. Gainous et al. studied the framing of
LGBT related publishing in the New York Times over the period 1988 to 2012, and
found a dramatic increase in equality frames, from approximately 25 in 2008 to approximately 110 in 2012. Correspondingly, Figure 14 shows that between 2008 and 2012 there was a dramatic increase in the measures of classes 4 and 5 (representing positivity), and a marked reduction in the measures of classes 1 and 2 (representing negativity).



Fig 12. Annual polarities over the period 2010 to 2016 for a representative cluster (characterized by the terms ‘national’, ‘security’, and ‘agency’) from the domain surveillance, for classes 1 to 5. The PMC is shown with solid lines and square markers.



Fig 13. Annual polarities over the period 1990 to 2009 for cluster 2 of Figure 7 (characterized by the terms ‘diet’, ‘food’, and ‘make’) from the domain obesity, for classes 1 to 5. The PMC is shown with solid lines and square markers. We posit that this cluster represents societal causes of obesity (see the Obesity section). We observe that the PMC for this cluster (2005 to 2007) agrees with the findings of Kim and Willis [2].



For uniformity, we imposed a threshold of y = 3 years to identify a PMC. As we
explain in the section on Detecting Framing Changes using Periods of Maximum
Correlation, high correlations are less likely for more extended periods. Therefore 2008 to 2012 is unsurprisingly not our PMC at the chosen uniform threshold. However,
given the marked increase in positivity and corresponding decrease in negativity in this
period, we posit that the period has high correlation. Also, whereas we were unable to
find a study covering the period 2013 to 2015 to use as ground truth, we note that our
PMC preceded major judicial interest in the domain.
We therefore rely on Gainous and Rhodebeck [32] for ground truth in this domain,
and note that the trend towards increased positivity (and reduced negativity) in
Figure 14 began in 2008 and ended in 2013. We conclude that our measures return four true positives and one false positive for this domain.

Abortion
The Partial-Birth Abortion Ban Act was enacted in 2003. We obtained 248 articles for this domain, covering the period 2000 to 2003, and obtain a PMC of 2001 to 2003, as shown in figure 15.

Immigration
We study the framing of immigration news in the United Kingdom. We obtained about
3,600 articles on the subject of Immigration from the Guardian API for the period 2000
to 2017. For this domain, we carried out our analysis on the article titles (rather than
the full text). Since the Guardian returns full-length articles, we found that this design choice produces a more focused domain corpus than the one generated by the full article text. We depict our estimated class vectors and PMC in figure 16.
We analyze the frame of cluster 2 in Figure 10. This cluster deals with the issue of
asylum seekers to the United Kingdom. In the period immediately before the year 2000, asylum claims to the United Kingdom had reached a new peak of 76,040 [33]. This event coincided with a high-profile terrorist act by a group of Afghan asylum seekers [33].
These events resulted in increased border refusals and the final 2002 white paper on
“Secure Borders, Safe Haven.” We estimate a PMC of 2000 to 2002 (Figure 16). Our
PMC coincides exactly with the period immediately preceding the government
white paper.

Drones
We obtained nearly 4,000 articles for this domain, covering the period 2003 to 2012, and obtain a PMC of 2009 to 2011, as shown in Figure 17.
Our PMC immediately foreshadows the Federal Aviation Administration’s
Modernization and Reform Act of 2012.

Predictive Utility
The aforementioned two domains (immigration and drones) highlight the predictive
utility of news framing. Whereas we did not find earlier surveys that coincide with our
PMCs for these domains, we note that these PMCs foreshadowed substantial legislative
activity. This observation suggests that PMCs estimated through real-time monitoring
of domain news may yield predictive utility for legislative and commercial activity.



Fig 14. Annual polarities over the period 1996 to 2015 for cluster 3 of Figure 6 (characterized by the terms ‘gay’, ‘rights’, and ‘marriage’) from the domain LGBT Rights, for classes 1 to 5. We obtain two PMCs with nearly identical correlation scores, namely, 2006 to 2008 and 2013 to 2015. The PMC of 2013 to 2015 is shown with solid lines and square markers, immediately preceding the judicial interest of 2015.



Fig 15. Annual polarities over the period 2000 to 2003 for cluster 1 of Figure 4 (characterized by the terms ‘abortion’ and ‘ban’) from the domain abortion, for classes 1 to 5. The PMC (2001 to 2003) is shown with solid lines and square markers, and foreshadows the Partial-Birth Abortion Ban Act of 2003.



Fig 16. Annual polarities over the period 2000 to 2017 for cluster 2 of Figure 10 (discussing asylum grants) from the domain immigration, for classes 1 to 5. The PMC is shown with solid lines and square markers, and foreshadows the “Secure Borders, Safe Haven” white paper of 2002.



Fig 17. Annual polarities over the period 2003 to 2012 for cluster 1 of Figure 5 (discussing drone strikes) from the domain drones, for classes 1 to 5. The PMC is shown with solid lines and square markers, and immediately foreshadows the Federal Aviation Administration’s Modernization and Reform Act of 2012. This finding suggests the predictive utility of framing change detection for legislative activity.



Overall Precision and Recall
We obtain an overall precision of 0.90 as well as a recall of 0.90. That is, we successfully identify 90% of the ground truth positives, and only 10% of the positives we identify are false positives.
Further, we point out that our false positives generally either precede or succeed a
ground truth framing change. Therefore, we posit that such false positives may be due
to imprecision in measurement rather than any considerable failure of our approach.
Our results demonstrate substantial agreement with ground truth in domains for
which prior surveys have studied framing changes. In domains for which we did not find
such surveys, we demonstrate that our PMCs foreshadow periods of substantial public
and legislative import. We posit, therefore, that our approach successfully identifies
framing changes.

Qualitative Analysis and Discussion


Whereas we provide a partial quantitative evaluation of our approach using precision
and recall in the preceding sections, we confront substantial difficulties in uniformly
conducting such evaluations across all of our domains. In this section, we qualitatively
evaluate our results in the context of these limitations.
Framing, and framing changes, have in general been studied through the lens of identifying a general trend from a particular data source. Studies often describe such
trends without stating a hard beginning or end of the trend period [28, 29].
The periods we analyze here are relatively long (often lasting almost two decades).
The available human surveys, not surprisingly, are limited in scope and do not cover the same periods in their entirety. As a result, we observe missing years for which our computational methods produce an estimate that cannot be verified against a human survey, since no survey covers such years. Therefore, a quantitative precision-recall style
analysis becomes difficult or impossible to conduct.
Further, the language used in existing studies is not always sufficiently precise to support a fixed set of positives and negatives. For example, the Columbia Journalism Review uses the phrase “after the Snowden revelation” to describe the change in media attitude. The Snowden revelations were made in June 2013. Our estimated PMC is 2013 to 2014.
The NCI monograph on smoking states that between 1985 and 1996, framing was
balanced between pro and anti tobacco control, and in 2001 to 2003, framing favored
tobacco control. Our PMC is 2001 to 2003. However, there are no studies that we know
of that cover the period 1996 to 2000.
Likewise, the Kim and Willis study on Obesity [2] that was published in 2007 states
that the change happened “in recent times,” but shows quantitative measures only until
2004. The measures that Kim and Willis compute show that the frequency of societal
causes increases sharply in 2004. Our PMC over the period 1990 to 2009 (with our
uniformly imposed threshold of y = 3) is 2005 to 2007. The correlation for the period
2004 to 2007 is high but it is not the PMC at a threshold of y = 3.
These examples show that performing a precision-recall analysis based on prior work
is problematic, since obtaining a set of positives and negatives from prior studies in the
sociological and communications literature involves interpretation.
Given that the data sources and coverage between our analysis and that of prior
surveys are usually quite different, the correlations we obtain appear quite substantial.
However, quantitative evaluation remains challenging for the reasons we point out.
This paper follows the spirit of recent work [11, 13, 15] in seeking to develop the
study of framing into a computational science. We acknowledge that our dataset
collection and methods may undergo refinement to tackle broader ground truth data of a wider temporal and geographical scope. Nonetheless, we posit that our methods and
results have scientific value, and hope that future work will provide greater coverage of
ground truth.
Please note that the underlying data preparation requires social science expertise
and cannot be effectively crowdsourced via a platform such as Mechanical Turk. We
therefore hope that our work catches the interest of social scientists and leads them to
pursue more comprehensive studies of framing in news media that would enable
improvements in computational methods.

Conclusion
We highlight a problem of significant public and legislative importance, framing change
detection. We contribute an unsupervised natural language based approach that detects
framing change trends over several years in domain news publishing. We identify a key
characteristic of such changes, namely, that during frame changes, the polarity of
adjectives describing co-occurring nouns changes cumulatively over multiple years. Our
approach agrees with and extends the results of earlier manual surveys. Whereas such
surveys depend on human effort and are therefore limited in scope, our approach is fully
automated and can simultaneously run over all news domains. We contribute the
Framing Changes Dataset, a collection of over 12,000 news articles from seven domains
in which framing has been shown to change by earlier surveys. We will release the
dataset with our paper. Our work suggests the predictive utility of automated news
monitoring, as a means to foreshadow events of commercial and legislative import.
Our work represents one of the first attempts at a computational modeling of
framing and framing changes. We therefore claim that our approach produces promising
results, and that it will serve as a baseline for more sophisticated analysis over wider
temporal and geographical data.

Appendix: Sample Correlations


Correlations for all subsets are shown in Table 4. The PMC is shown in bold.

Table 4. All correlations for cluster 2 of the domain Immigration.


Correlation Start Index End Index
1 1 2
1 2 3
1 3 4
1 4 5
1 5 6
1 6 7
1 7 8
1 8 9
1 9 10
1 10 11
1 11 12
1 12 13
1 13 14
1 14 15
1 15 16
1 16 17
1 17 18
0.99 1 3

0.98 10 12
0.98 12 14
0.96 12 15
0.93 12 16
0.93 9 12
0.88 11 13
0.88 9 11
0.87 11 14
0.86 11 16
0.85 11 15
0.85 8 12
0.84 8 10
0.83 7 9
0.82 15 17
0.82 14 16
0.82 10 13
0.81 7 12
0.81 9 13
0.79 10 14
0.79 10 16
0.79 9 14
0.78 9 16
0.78 10 15
0.77 9 15
0.77 8 16
0.77 8 13
0.76 8 15
0.76 8 14
0.75 12 17
0.75 5 7
0.75 12 18
0.74 11 18
0.74 11 17
0.72 6 8
0.72 10 17
0.72 10 18
0.72 7 11
0.72 13 15
0.71 7 10
0.70 8 11
0.69 2 4
0.68 7 13
0.68 4 6
0.67 3 5
0.67 16 18
0.66 7 14
0.66 9 17
0.66 9 18
0.66 14 17
0.65 8 17

0.65 7 16
0.65 8 18
0.64 7 15
0.63 6 11
0.62 6 10
0.62 6 12
0.62 6 9
0.61 6 14
0.61 6 13
0.61 5 13
0.61 5 12
0.60 5 8
0.60 5 14
0.60 7 17
0.60 6 16
0.60 4 12
0.59 7 18
0.59 4 13
0.59 5 16
0.59 2 5
0.58 6 15
0.58 5 11
0.58 5 15
0.58 5 10
0.57 5 9
0.57 4 16
0.57 4 14
0.56 2 13
0.56 2 12
0.56 4 17
0.56 15 18
0.56 4 18
0.56 3 13
0.56 3 6
0.56 3 12
0.56 1 5
0.55 4 15
0.55 5 17
0.55 5 18
0.55 3 18
0.55 2 18
0.55 3 17
0.55 2 17
0.54 2 16
0.54 3 16
0.54 6 17
0.54 6 18
0.53 2 14
0.53 3 14
0.53 2 15

0.53 3 15
0.52 1 12
0.52 1 13
0.51 1 18
0.51 1 17
0.50 13 17
0.50 4 7
0.49 2 6
0.49 1 16
0.49 1 6
0.48 1 14
0.48 1 15
0.47 13 18
0.46 13 16
0.44 4 8
0.43 1 4
0.42 4 9
0.40 4 10
0.40 4 11
0.39 14 18
0.36 3 7
0.36 2 11
0.36 1 7
0.36 3 11
0.35 3 8
0.36 2 7
0.35 2 8
0.35 2 9
0.34 1 8
0.34 2 10
0.34 3 9
0.34 3 10
0.34 1 11
0.34 1 9
0.32 1 10

Ethics Statement
Our study involved no human or animal subjects.

Funding Statement
CS has a commercial affiliation to Amazon. The funder provided support in the form of
salaries for this author, but did not have any additional role in the study design, data
collection and analysis, decision to publish, or preparation of the manuscript. The
specific roles of these authors are articulated in the ‘author contributions’ section.



Competing Interests Statement
The above commercial affiliation does not alter our adherence to PLOS ONE policies on
sharing data and materials.

Author Contributions
KS and CS conceived the research and designed the method. KS prepared the datasets
and performed the analysis. KS and MPS designed the evaluation approach. KS, CS,
and MPS wrote the paper.

References
1. Gunnars K. Ten Causes of Weight Gain in America; 2015.
https://www.healthline.com/nutrition/10-causes-of-weight-gain#section12.

2. Kim SH, Willis A. Talking about Obesity: News Framing of Who Is Responsible
for Causing and Fixing the Problem. Journal of Health Communication.
2007;12(4):359–376.
3. Flegal K, Carroll M, Kit B, Ogden C. Prevalence of Obesity and Trends in the
Distribution of Body Mass Index Among US Adults, 1999–2010. Journal of the
American Medical Association. 2012;307(5):491–497.

4. Chong D, Druckman J. Framing Theory. Annual Review of Political Science. 2007;10:103–126.
5. de Vreese C. News framing: Theory and typology. Information Design Journal.
2005;13(1):51–62.

6. Constantin L. Facebook ID Leak hits Millions of Zynga Users; 2010. http://tinyurl.com/2bqwoxq.
7. Crossley R. Facebook ID Leak hits Millions of Zynga Users; 2011. http://www.develop-online.net/news/facebook-id-leak-hits-millions-of-zynga-users/0107956.
8. Fitzsimmons C. Facebook And Zynga Sued Over Privacy; 2014. http://www.adweek.com/digital/facebook-zynga-sued/.
9. US Congress. Personal Data Protection and Breach Accountability Act of 2014;
2014. https://www.congress.gov/bill/113th-congress/senate-bill/1995.
10. Benford R, Snow D. Framing Processes and Social Movements: An Overview and
Assessment. Annual Review of Sociology. 2000;26(1):611–639.
11. Card D, Boydstun A, Gross J, Resnik P, Smith N. The Media Frames
Corpus: Annotations of Frames across Issues. In: Proceedings of the 53rd Annual
Meeting of the Association for Computational Linguistics and the 7th
International Joint Conference on Natural Language Processing (Volume 2: Short
Papers). vol. 2; 2015. p. 438–444.
12. Sheshadri K, Singh MP. The Public and Legislative Impact of
Hyper-Concentrated Topic News. Science Advances. 2019;5(8).



13. Tsur O, Calacci D, Lazer D. A Frame of Mind: Using Statistical Models for
Detection of Framing and Agenda Setting Campaigns. In: Proceedings of the
53rd Annual Meeting of the Association for Computational Linguistics and the
7th International Joint Conference on Natural Language Processing (Volume 1:
Long Papers); 2015. p. 1629–1638.
14. Entman R. Framing: Toward Clarification of a Fractured Paradigm. Journal of
Communication. 1993;43(4):51–58.
15. Alashri S, Tsai JY, Alzahrani S, Corman S, Davulcu H. “Climate Change”
Frames Detection and Categorization Based on Generalized Concepts. In:
Proceedings of the 10th IEEE International Conference on Semantic Computing
(ICSC); 2016. p. 277–284.
16. NYT. Developer APIs; 2016. http://developer.nytimes.com/.
17. Wikipedia. The New York Times; 2001. https://en.wikipedia.org/wiki/The_New_York_Times.
18. The Guardian. Guardian Open Platform; 2016. http://open-platform.theguardian.com/.
19. Wikipedia. The Guardian; 2002. https://en.wikipedia.org/wiki/The_Guardian.
20. King G, Schneer B, White A. How the News Media Activate Public Expression
and Influence National Agendas. Science. 2017;358(6364):776–780.
21. Sheshadri K, Ajmeri N, Staddon J. No Privacy News is Good News: An Analysis
of New York Times and Guardian Privacy News from 2010—2016. In:
Proceedings of the 15th Privacy, Security and Trust Conference. Calgary, Alberta,
Canada; 2017. p. 159–167.
22. Socher R, Perelygin A, Wu J, Chuang J, Manning C, Ng A, et al. Recursive Deep
Models for Semantic Compositionality Over a Sentiment Treebank. In:
Proceedings of the 2013 Empirical Methods in Natural Language Processing
Conference (EMNLP). Seattle, WA: Association for Computational Linguistics;
2013. p. 1631–1642.
23. Melo GD, Bansal M. Good, Great, Excellent: Global Inference of Semantic
Intensities. Transactions of the Association for Computational Linguistics.
2013;1:279–290.
24. Blei D, Ng A, Jordan M. Latent Dirichlet Allocation. Journal of Machine
Learning Research. 2003;3:993–1022.
25. Engel S. Frame Spillover: Media Framing and Public Opinion of a Multifaceted
LGBT Rights Agenda. Law and Social Inquiry. 2013;38:403–441.
26. Benesty J, Chen J, Huang Y, Cohen I. Pearson Correlation Coefficient. In: Noise Reduction in Speech Processing. Springer; 2009. p. 1–4.
27. Mathworks. Correlation Coefficients; 2019.
https://www.mathworks.com/help/matlab/ref/corrcoef.html.
28. National Cancer Institute. How the News Media Influence Tobacco Use; 2019. https://cancercontrol.cancer.gov/brp/tcrb/monographs/19/m19_9.pdf.
29. Vernon P. Five Years Ago, Edward Snowden Changed Journalism; 2018. https://www.cjr.org/the_media_today/snowden-5-years.php.



30. Pew Research. The State of Privacy in post-Snowden America; 2016.
31. Cummings M, Proctor R. The Changing Public Image of Smoking in the United
States: 1964–2014. Cancer Epidemiology and Prevention Biomarkers.
2014;23:32–36.
32. Gainous J, Rhodebeck L. Is Same-Sex Marriage an Equality Issue? Framing
Effects Among African Americans. Journal of Black Studies. 2016;47(7):682–700.
33. Wikipedia. History of UK immigration control; 2019. https://en.wikipedia.org/wiki/History_of_UK_immigration_control.



Manuscript Click here to
access/download;Manuscript;FramingChanges____PLOS_One_

Detecting Framing Changes in Topical News Publishing


Karthik Sheshadri1* , Chaitanya Shivade2 , Munindar P. Singh1 ,

1 Department of Computer Science, North Carolina State University


2 Amazon

* kshesha@ncsu.edu

Abstract
Changes in the framing of topical news have been shown to foreshadow significant
public, legislative, and commercial events. Automated detection of framing changes is
therefore an important problem, which existing research has not considered. Previous
approaches are manual surveys, which rely on human effort and are consequently
limited in scope. We make the following contributions. We systematize discovery of
framing changes through a fully unsupervised computational method that seeks to
isolate framing change trends over several years. We demonstrate our approach by
isolating framing change periods that correlate with previously known framing changes.
We have prepared a new dataset, consisting of over 12,000 articles from seven news
topics or domains in which earlier surveys have found framing changes. Finally, our
work highlights the predictive utility of framing change detection, by identifying two
domains in which framing changes foreshadowed substantial legislative activity, or
preceded judicial interest.

“For nearly four decades, health and


fitness experts have prodded and
cajoled and used other powers of
persuasion in a futile attempt to
whip America’s youngsters into
shape.”

The New York Times, 1995

“The New York City Health


Department has embarked on a new
campaign to persuade processed
food brands to decrease sugar
content in a bid to curb obesity.”

The New York Times, 2015

Introduction and Contributions


To motivate the problem and approach of this paper, let us investigate the primary
causes of obesity in America. Public opinion and behavior on the subject have changed
measurably since the late 1990s. As an example, Gunnars [1] compiled a list in 2015 of
ten leading causes, six of which suggest that the processed food industry may be
primarily responsible. By contrast, in the 1990s and early 2000s, popular opinion
appeared to hold [2, 3] that obesity was primarily caused by individual behavior and
lifestyle choices. What led to this change in public opinion?

March 7, 2020 1/34


We posit that news publishing on the subject of obesity contributed to the change in
the public’s opinion. The above quotes from the New York Times (NYT) are
representative snippets from news articles on obesity published in 1995 and 2015,
respectively. Whereas both address the same topic, the 1995 snippet implies
responsibility on part of individuals, and the 2015 snippet implies responsibility on part
of the processed food industry. These subjective biases in news are collectively referred
to as framing.
Framing theory [4, 5] suggests that how a topic is presented to the audience (called
“the frame”) influences the choices people make about how to process that information.
The central premise of the theory is that since an issue can be viewed from a variety of
perspectives and be construed as having varying implications, the manner in which it is
presented influences public reaction.
In general, understanding news framing may be a crucial component of
decision-support in a corporate and regulatory setting. To illustrate this fact, we
present a real-life example of the influence of framing on public perception and
legislation. According to [6, 7], in late 2010, security vulnerabilities associated with
Facebook and Zynga allowed personal data from millions of users to be compromised.
The framing of news on this topic appeared to change from a neutral narrative to one
focusing on personal privacy.
Facebook and Zynga were sued over privacy breaches [8]. Further, in 2013, the
Personal Data Protection and Breach Accountability Act was promulgated in
Congress [9]. These examples motivate the problem of framing change detection, which
involves identifying when the dominant frame (or frames) [10] of a topic undergoes a
change.

Related Work
The Media Frames Corpus, compiled by Card et al. [11], studies three topics
(Immigration, Smoking, and same-sex marriages), and identifies fifteen framing
dimensions in each. We identify two major limitations of their work. Firstly, Card et al.
study framing as a static detection problem, identifying which dimensions appear in a
given news article. However, research in sociology [10] shows that most news topics
feature a dominant frame (or dominant dimension in the terminology of [11]). Further,
for a generic news topic, the dominant frame is not necessarily one of fifteen previously
chosen dimensions, but can instead be an unknown arbitrary frame specific to the topic
under consideration. For example, in the example given in the Introduction and
Contributions section, the dominant frame related to the privacy of individuals, which is
not one of the fifteen dimensions described in Card et al. [11].
Secondly, Sheshadri and Singh [12] showed that public and legislative reaction tend
to occur only after changes in the dominant frame. That finding motivates an approach
to framing that focuses on identifying and detecting changes in the dominant frame of a
news domain.
Sheshadri and Singh further propose two simple metrics that they motivate as
measures of domain framing: framing polarity and density. They define framing polarity
as the average frequency of occurrence in a domain corpus of terms from a benchmark
sentiment lexicon. Framing density is measured using an entropic approach that counts
the number of terms per article required to distinguish a current corpus from an earlier
one.
We identify the following limitations of the aforementioned measures (introduced
in [12]). Firstly, both measures make no effort to associate a given news article with a
particular frame. Prior work does not support the inherent assumption that all articles
in a given domain belong to a particular frame [10, 11]. We enhance understanding by
analyzing each domain using several distinct frames.

March 7, 2020 2/34


Secondly, framing density does not distinguish between a subjective choice made by
a news outlet to frame a domain differently, and events that necessitate media coverage.
Our work provides this distinction by analyzing framing using patterns of change in the
adjectives that describe co-occurring nouns. Since adjectives are artifacts of subjective
framing, they are not affected by events, as framing density is.
It is worthwhile to note that our approach is similar in spirit to Tsur et al.’s
work [13], in that both that work and this paper apply a topic modeling strategy to
analyze framing as a time series. However, we highlight the following key differences
and contributions of our work. Firstly, as both Sheshadri and Singh [12] and Tsur et
al. [13] point out, framing is a subjective aspect of communication. Therefore, a
computational analysis of framing should ideally differentiate subjective aspects from
fact-based and objective components of communication. Since adjectives in and of
themselves are incapable of communicating factual information, we take them to be
artifacts of how an event or topic is framed. In contrast, generic n-grams (as used by
Tsur et al. [13]) do not provide this distinction.
Further, Tsur et al. rely upon estimating “changes in framing” using changes in the
relative frequencies of n-grams associated with various topics or frames. Whereas such
an approach is useful in evaluating which of a set of frames may be dominant at any
given time, it does not measure “framing changes” in the sense originally described
in [14]. In contrast, our work estimates changes in framing using consistent polarity
drifts of adjectives associated with individual frames. Our approach may also be applied
to each of a number of frames independently of the others, as opposed to Tsur et al. [13].
We also distinguish our work from Alashri et al. [15], which uses standard machine
learning tools to classify sentences represented by linguistic features into one of four
frames. Such an approach is limited by the need to predefine a frame set, as Card etal’s
approach [11] is. Further, the paper does not discuss the problem of how to examine
changes within specific frames. To the best of our knowledge, our work is the first to
address the problem of detecting meaningful patterns of change within individual
frames.

Contributions
This paper contributes a fully unsupervised and data-driven natural language based
approach to detecting framing change trends over several years in domain news
publishing. To the best of our knowledge, this paper is the first to address framing
change detection, a problem of significant public and legislative import. Our approach
agrees with and extends the results of earlier manual surveys, which required human
data collection and were consequently limited in scope. Our approach removes this
restriction by being fully automated. Our method can thus be run simultaneously over
all news domains, limited only by the availability of real-time news data. Further, we
show that our approach yields results that foreshadow periods of legislative activity.
This motivates the predictive utility of our method for legislative activity, a problem of
significant import.
Further, we contribute a Framing Changes Dataset, which is a collection of over
12,000 news articles from seven news topics or domains. In four of these domains,
surveys carried out in earlier research have shown framing to change. In two domains,
periods with significant legislative activity are considered. Our individual domain
datasets within the framing changes dataset cover the years in which earlier research
found framing changes, as well as periods ranging up to ten years before and after the
change. Our dataset is the first to enable computational modeling of framing change
trends. We plan to release the dataset with our paper. We note that a fraction of the
articles in this dataset were used earlier for the analysis in [12].

March 7, 2020 3/34


Materials and Methods
This section describes our datasets, data sources, and inter-annotator agreement. All
data were collected in an anonymous and aggregated manner. All APIs and data used
are publicly available, and our data collection complies with the terms and conditions of
each API.

Data Sources
We use two Application Programming Interfaces (APIs) to create our datasets.

The New York Times API:


The New York Times (NYT) Developer’s API [16] provides access to news data from
the NYT, Reuters, and Associated Press (AP) newspapers—both print and online
versions—beginning in 1985. The NYT has the second largest circulation of any
newspaper in the United States [17].
The data object returned by the API includes fields such as the article type (news,
reviews, summaries, and so on), the news source (NYT, Reuters, or AP), the article’s
word count, the date of its publication, article text (in the form of the abstract, the lead
(first) paragraph, and a summary).

The Guardian API:


The Guardian Xplore API [18] provides access to news data from The Guardian, a
prominent UK newspaper that reaches 23 million UK adults per month [19].
The Guardian API returns full-length articles along with such metadata as the
article type (similar to the NYT API) and a general section name (such as sports,
politics, and so on). Although these section names are manually annotated by humans,
we do not use them in our analysis, but rely instead on a simple term search procedure
(see the Domain Dataset Generation section) to annotate our datasets.

Domain Dataset Generation


As in earlier work [12, 20, 21], we use a standard term search procedure to create our
datasets. Specifically, an article belongs to a domain if at least a component of the
article discusses a topic that is directly relevant to the domain [12]. We term articles
that are relevant to a domain domain positives, and irrelevant articles domain negatives.
As an example, consider the following article from the domain smoking: “The dirty gray
snow was ankle deep on West 18th Street the other day, and on the block between Ninth
and Tenth Avenues, a cold wind blew in off the Hudson River. On the south side of the
street, a mechanic stood in front of his garage smoking a . . . .” We consider this article
a domain negative since whereas it contains the keyword ’smoking’, it does not discuss
any aspect pertaining to the prevalence or control of Smoking. In contrast, the article
“An ordinance in Bangor, Maine, allows the police to stop cars if an adult is smoking
while a child under 18 is a passenger” is directly relevant to the domain, and is therefore
considered a domain positive. We define dataset accuracy as the fraction of articles in a
dataset that are domain positive. For each domain, our APIs were used to extract news
data during the time period b (denoting the beginning) to e (denoting the end), of the
period of interest.

March 7, 2020 4/34


Inter-Annotator Agreement
To ensure that the articles returned by our term search procedure are indeed relevant to
each domain, a random sample of articles from each domain dataset was coded by two
raters. We supply the per-domain accuracy and inter-annotator agreement as Cohen’s
Kappa for sample domains in Table 1.

Table 1. Inter-Annotator agreement as Cohen’s Kappa.


Dataset Accuracy
Domain Coder 1 Coder 2 Kappa
Surveillance 0.80 0.75 0.79
Smoking 0.84 0.82 0.93
Obesity 0.78 0.74 0.67
LGBT Rights 0.83 0.74 0.64
Abortion 0.80 0.80 0.50

Probability Distribution over Adjectives


Our approach relies on the key intuition that during a framing change, the valence of
the adjectives describing co-occurring nouns changes significantly.
To measure this change, we create a reference probability distribution of adjectives
based on the frequency of their occurrence in benchmark sentiment datasets.

Benchmark Datasets
We identified three open source benchmark review datasets from which to create our
adjective probability distribution. Together, these datasets provide about 150 million
reviews of various restaurants, services and products, with each review rated from one
to five. Given the large volume of reviews from different sources made available by these
datasets, we assume that they provide a sufficiently realistic representation of all
adjectives in the English language.
We rely primarily on the Trip Advisor dataset to create our adjective probability
distribution. We identified two other benchmark datasets, namely, the Yelp Challenge
dataset and the Amazon review dataset. Due to the fact that these datasets together
comprise about 150 million reviews, it is computationally infeasible for us to include
them in our learning procedure. Instead, we learned distributions from these datasets
for sample adjectives, to serve as a comparison with and as verification of our overall
learned distribution. The resulting distributions for these adjectives appeared
substantially similar to those of the corresponding adjectives in our learned distribution.
We therefore conclude that our learned distribution provides a valid representation of all
adjectives in the English language. We describe each dataset below.

Trip Advisor
The Trip Advisor dataset consists of 236,000 hotel reviews. Each review provides text,
an overall rating, and aspect specific ratings for the following seven aspects: Rooms,
Cleanliness, Value, Service, Location, Checkin, and Business. We limit ourselves to
using the overall rating of each review.

March 7, 2020 5/34


Yelp
The Yelp challenge dataset consists of approximately six million restaurant reviews.
Each entry is stored as a JSON string with a unique user ID, check-in data, review text,
and rating.

Amazon
The Amazon dataset provides approximately 143 million reviews from 24 product
categories such as Books, Electronics, Movies, and so on. The dataset uses the JSON
format and includes reviews comprising a rating, review text, and helpfulness votes.
Additionally, the JSON string encodes product metadata such as a product description,
category information, price, brand, and image features.

Polarity of Adjectives
For each adjective in the English language, we are interested in producing a probability
distribution that describes the relative likelihood of the adjective appearing in a review
whose rating is r. For our data, r ranges from one to five.
We began by compiling a set of reviews from the Trip Advisor dataset for each
rating from one to five. We used the Stanford CoreNLP parser [22] to parse each of the
five sets of reviews so obtained. We thus obtained sets of parses corresponding to each
review set. From the set of resultant parses, we extracted all words that were assigned a
part-of-speech of ‘JJ’ (adjective). Our search identified 454,281 unique adjectives.
For each unique adjective a, we counted the number of times it occurred in our set of
parses corresponding to review ratings one to five. We denote this by Ni , with
N1 N2 N5
1 ≤ i ≤ 5. Our probability vector for adjective a is then { Saa , Saa , . . . , Saa } where
Sa = Na1 + Na2 + Na3 + Na4 + Na5 .
Additionally, we recorded the rarity of each adjective as S1a . This estimates a
probability distribution P , with 454,281 rows and six columns.
Table 2 shows example entries from our learned probability distribution. As can be
seen from the table, our learned distribution not only correctly encodes probabilities
(the adjective ‘great’ has nearly 80% of its probability mass in the classes four and five,
whereas the adjective ‘horrible’ has nearly 80% of its mass in classes one and two), but
also implicitly learns an adjective ranking such as the one described in De Melo et
al. [23]. To illustrate this ranking, consider that the adjective ‘excellent’ has 60% of its
probability mass in class five, whereas the corresponding mass for the adjective ‘good’ is
only 38%.
For visual illustration, we depict our learned probability distribution as a heatmap in
Table 3.
Motivated by our learned probability distribution, we posit that classes 1 represents
negativity, class 2 to 4 represent neutrality, and class 5 represents positivity.

Incorporating Adjective Rarity


Our measure of adjective rarity serves as a method by which uncommon adjectives,
which rarely occur in our benchmark dataset, and whose learned probability
distributions may therefore be unreliable, can be excluded.
However, in doing so, we run the risk of excluding relevant adjectives from the
analysis. We manually inspect the set of adjectives that describe the nouns in each
domain to arrive at a domain specific threshold.



Table 2. Sample entries from our learned probability distribution for positive and
negative sentiment adjectives.
Adjective Class 1 Class 2 Class 3 Class 4 Class 5 Rarity (Inverse Scale)
Great 0.039 0.048 0.093 0.274 0.545 4.495e-07
Excellent 0.019 0.028 0.070 0.269 0.612 2.739e-06
Attractive 0.095 0.125 0.192 0.296 0.292 0.0001
Cute 0.039 0.068 0.155 0.330 0.407 1.499e-05
Compassionate 0.068 0.020 0.010 0.038 0.864 0.0004
Good 0.076 0.095 0.185 0.336 0.308 3.459e-07
Horrible 0.682 0.143 0.076 0.042 0.057 7.453e-06
Ridiculous 0.461 0.180 0.125 0.116 0.118 2.033e-05
Angry 0.546 0.138 0.092 0.098 0.126 6.955e-05
Stupid 0.484 0.136 0.099 0.117 0.164 5.364e-05
Beautiful 0.043 0.049 0.085 0.222 0.599 6.233e-06

For a majority of our domains (five out of seven), we use a threshold of q > −∞; that is,
no adjectives are excluded. For the remaining two domains (drones and LGBT rights),
we employ a threshold of q > 10⁻⁴.
The trends in our results appeared to be fairly consistent across a reasonable range
of threshold values.

Domain Period of Interest


We define a period of interest for each domain. Let tf be a year in which a documented
framing change took place in the domain under consideration. Then, our period of
interest for this domain is b = min(tf − 10, tf − l) to e = max(tf + 10, tf + r), where the
API provides data up to l years before, and r years after tf . All units are in years.
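The period of interest can be computed directly from these expressions; the following minimal sketch simply transcribes them (the function and argument names are ours).

```python
def period_of_interest(t_f, l, r):
    """Transcribe the expressions above: t_f is a year with a documented
    framing change; the API provides data up to l years before and r years
    after t_f. Returns (b, e) in years."""
    b = min(t_f - 10, t_f - l)   # beginning of the period of interest
    e = max(t_f + 10, t_f + r)   # end of the period of interest
    return b, e
```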

Corpus-Specific Representations
A domain corpus is a set of news articles from a given domain. Let a given domain have
m years in its period of interest with annual domain corpora T1 , T2 , . . . , Tm .

Corpus Clustering
An overall domain corpus is therefore T = T1 ∪ T2 ∪ . . . ∪ Tm .
We assume that a corpus has k unique frames. We adopt a standard topic modeling
approach to estimate frames. We use the benchmark Latent Dirichlet Allocation
(LDA) [24] approach to model k = 5 topics (that is, frames) in each domain corpus. We
extract the top l = 20 terms v from each frame. We also extract the set of all unique
nouns in T. We define a cluster as the intersection of v with this set of nouns. We thus generate k clusters,
each representing a unique frame.
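A minimal sketch of this clustering step is shown below. It uses gensim's LDA implementation as one plausible choice (the paper does not name a specific library), and assumes that the tokenized articles and the set of unique nouns in T are produced by upstream preprocessing.

```python
from gensim import corpora, models

def frame_clusters(tokenized_articles, corpus_nouns, k=5, top_l=20):
    """Fit LDA with k topics (frames) over the overall domain corpus, take the
    top-l terms of each topic, and intersect them with the unique nouns of the
    corpus to obtain one cluster per frame."""
    dictionary = corpora.Dictionary(tokenized_articles)
    bow = [dictionary.doc2bow(tokens) for tokens in tokenized_articles]
    lda = models.LdaModel(bow, num_topics=k, id2word=dictionary, random_state=0)
    clusters = []
    for topic_id in range(k):
        top_terms = {term for term, _ in lda.show_topic(topic_id, topn=top_l)}
        clusters.append(top_terms & set(corpus_nouns))   # v intersected with the nouns of T
    return clusters
```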

Annual Cluster Polarity


For each cluster c, we are interested in arriving at a vector of m annual polarities, i.e.,
for each year i, 1 ≤ i ≤ m in the domain period of interest.
Let xc be the set of all nouns in c. For each noun v ∈ xc , we use the Stanford
dependency parser [22] to identify all adjectives (without removing duplicates) that
describe v in Ti . We extract the polarity vectors for each of these adjectives from P as
the matrix Ai. Ai therefore has n rows, one for each adjective so identified, and five
columns (see the Polarity of Adjectives section). We estimate the annual cluster polarity
of c as the vector of column-wise averages of Ai. Let Pc = {P1, P2, . . . , Pm} be the set
of annual cluster polarities so obtained.

Table 3. The entries of Table 2 depicted as a heatmap for visual illustration.
          Great    Excellent Attractive Cute     Compassionate Good     Horrible Ridiculous Angry    Stupid   Beautiful
Class 1   3.9e-2   1.9e-2    9.5e-2     3.9e-2   6.8e-2        7.6e-2   6.82e-1  4.61e-1    5.46e-1  4.84e-1  4.3e-2
Class 2   4.8e-2   2.8e-2    1.25e-1    6.8e-2   2.0e-2        9.5e-2   1.43e-1  1.8e-1     1.38e-1  1.36e-1  4.9e-2
Class 3   9.3e-2   7.0e-2    1.92e-1    1.55e-1  1.0e-2        1.85e-1  7.6e-2   1.25e-1    9.2e-2   9.9e-2   8.5e-2
Class 4   2.74e-1  2.69e-1   2.96e-1    3.3e-1   3.8e-2        3.36e-1  4.2e-2   1.16e-1    9.8e-2   1.17e-1  2.22e-1
Class 5   5.45e-1  6.12e-1   2.92e-1    4.07e-1  8.64e-1       3.08e-1  5.7e-2   1.18e-1    1.26e-1  1.64e-1  5.99e-1
Rarity    4.5e-7   2.74e-6   1.0e-4     1.5e-5   4.0e-4        3.46e-7  7.45e-6  2.03e-5    6.96e-5  5.36e-5  6.23e-6
Annual polarities for representative clusters from each of our domains are shown in
figures 11 to 17.
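A sketch of this computation is given below, reusing the distribution format of the earlier sketch; the dependency-parsing step that finds the adjectives describing each cluster noun is assumed to happen upstream.

```python
import numpy as np

def annual_cluster_polarities(adjectives_per_year, distribution):
    """Compute P_c = {P_1, ..., P_m}: for each year, stack the five-class
    polarity vectors of every adjective occurrence (duplicates kept) that
    describes one of the cluster's nouns, and average column-wise.

    `adjectives_per_year` holds one adjective list per year (from the
    dependency parser); `distribution` maps an adjective to its learned
    five-class probability vector."""
    polarities = []
    for adjectives in adjectives_per_year:
        rows = [distribution[a]["probs"] for a in adjectives if a in distribution]
        if not rows:                              # no describing adjectives found this year
            polarities.append(np.full(5, np.nan))
            continue
        A_i = np.array(rows)                      # n rows, five columns
        polarities.append(A_i.mean(axis=0))       # column-wise averages -> P_i
    return polarities
```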

Defining Framing Changes


Since language and human behavior are not strictly deterministic, the measurement of
any temporally disparate pair of news corpora using adjective polarity (or any other
numerical metric) would result in different representative values of the two corpora.
Therefore, in this sense, any pair of news corpora can be said to have undergone a
framing change.
Further, individual metrics are susceptible to noisy readings due to imprecise data
and measurement. In particular, such an effect may cause sudden isolated spikes
between successive measurements. For example, in figure 11, during the period between
2005 and 2006, whereas classes 1, 3, 4, and 5 changed little, class 2 showed a substantial
change.
This motivates the question of how a framing change is defined, in the context of our
computational measurements. The usual social science definition [14] is that a framing
change is a shift in the way that a specific topic is presented to an audience. To isolate
such changes computationally, we use the following key observations from ground truth
framing changes: (i) framing changes take place as trends that are consistent over at
least k years, and (ii) framing changes must be consistent across multiple measurements.
Our aim in this paper is to begin from a set of time series such as the ones in
figure 11, and isolate such trends. The requirement motivated by our first condition,
namely, that framing changes must last at least k years, is easy to satisfy by imposing
such a numerical threshold.
To satisfy the requirement motivated by our second observation, we rely on
correlations between different measurements, as described in the section below.

Detecting Framing Changes using Periods of Maximum Correlation

Our five polarity classes serve as measures of framing within a domain. We conceive of
a framing change as a trend, consistent across our five polarity classes, over a period of
some years.
We describe our intuition and approach to detecting framing changes below. Firstly,
we show that the frequency with which adjectives occur in articles varies both by
domain, and in different years within a domain.
Figure 1 depicts the average number of adjectives per article for each of our domains
over the years in their respective periods of interest. We note that this count serves also
as a measure of how subjective news publishing in a domain is, since adjectives are
indicative of how events are framed.
Notice that in the domain LGBT rights, the peak in this measure immediately
precedes a framing change from an earlier study [25]. Whereas we do not claim that this
correlation is true for all domains, we posit that it motivates the utility of adjective
polarity in the study of framing changes.
Although the volume of adjectives used per article varies dramatically (by up to 30%),
we find that the variation in our annual cluster polarity between successive years is
generally less than 1%. However, through a consistent trend
lasting multiple years, our measure of annual polarity can change (increase or decrease)



[Figure 1: line plots of the average number of adjectives per article; panels for Drones (2003–2011), Immigration (2000–2017), and LGBT Rights (1996–2014).]
Fig 1. The average number of adjectives per article, shown for our domains over their
respective periods of interest. This metric serves as a measure of the subjectivity of
news in a domain. Notice that in the domain LGBT rights, the peak in this measure
immediately precedes a framing change identified in an earlier study [25].



[Figure 2: line plots of the average number of adjectives per article; panels for Obesity (1990–2008), Smoking (1990–2005), and Surveillance (2010–2016).]

Fig 2. The average number of adjectives per article, shown for our domains over their
respective periods of interest. This metric serves as a measure of the subjectivity of
news in a domain.



cumulatively by up to 5% (see figure 11 for an example). We identify a framing change
based on such a cumulative trend.
We now consider the problem of fusing estimates from our five measures of annual
cluster polarity. Consider the change in polarity of classes 1, 3, 4, and 5 between 2005
and 2006 in figure 11, as against the change in class 2. As mentioned earlier, classes 1,
3, 4, and 5 changed little, whereas class 2 showed a substantial change.
In contrast, we note that in the period 2001 to 2013, a consistent trend was
observable across all five classes, with substantial reductions in classes 2 and 3, and a
notable corresponding increase in class 5. We exploit correlations between the changes
in our five classes to identify framing changes.
Accordingly, we use Pearson correlation [26] between our classes as a measure of
trend consistency. Let a given domain have m years in its period of interest. We
generate all possible contiguous subsets of {T1 , . . . , Tm }, namely, Ti–j , where
1 ≤ i ≤ j ≤ m, and Ti–j denotes the domain corpus from year i to year j.
Let C =  {C1 , . . . , C5 } be the set of class vectors for this domain subset, where
C1i
C1i+1  i
C1 =   . . . , C1 is the value of class 1 for year i, and similarly for C2 , . . . , C5 .

C1j
To measure the correlation of subset Ti–j , we compute its matrix of correlation
coefficients [27] K. We reshape K into a vector of size f × 1 where f = i ∗ j, and
evaluate its median, l. We find the maximum value of l, lmax , over all possible values of
i and j. We denote the values of i and j corresponding to lmax as imax and jmax . We
return Timax –jmax as our period of maximum correlation (PMC).
We note that the smaller the duration of a PMC, the greater the possibility that our
class vectors may have a high correlation in the period due to random chance. To
compensate for this effect, we employ a threshold whereby a period is not considered as
a candidate for the domain PMC unless it lasts at least y years. We uniformly employ a
value of y = 3 in this paper.
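The following sketch (ours, not released author code) implements the search just described: it scores every contiguous span of at least y years by the median entry of the Pearson correlation matrix of the five class series, and returns the best-scoring span.

```python
import numpy as np

def period_of_maximum_correlation(polarities, min_years=3):
    """`polarities` is an m x 5 array of annual cluster polarities (P_1..P_m).
    Returns (l_max, i_max, j_max), where i_max and j_max are 0-based year
    indices of the period of maximum correlation."""
    P = np.asarray(polarities, dtype=float)
    m = P.shape[0]
    best_score, best_i, best_j = -np.inf, None, None
    for i in range(m):
        for j in range(i + min_years - 1, m):   # span must last at least min_years
            K = np.corrcoef(P[i:j + 1].T)       # 5 x 5 correlation matrix of the class series
            score = np.median(K)                # median correlation for the span
            if score > best_score:
                best_score, best_i, best_j = score, i, j
    return best_score, best_i, best_j
```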
Our approach thus identifies polarity drifts that are both correlated (quantitatively
measured by correlations between different measures of polarity) and sustained (by the
imposition of a threshold of duration). We point out that our approach filters out
isolated drifts in individual polarity measures, since such drifts are uncorrelated across
multiple measures. Further, we note that the magnitude of individual drifts matters
only indirectly to our approach, to the extent that a larger drift, if consistent across
multiple polarity measures, may have higher correlation than a smaller drift that is also
correlated.
A block diagram depicting our overall approach is shown in figure 3.

Quantitative Evaluation
We now discuss a partial quantitative evaluation of our approach using a
Precision-Recall analysis. Our analysis relies on ground truth annotation of framing
changes, as detailed in the section below.
We are unable to conduct a full precision-recall analysis over all domains due to the
limitations we discuss in the following sections, as well as in the Qualitative Analysis
and Discussion section. However, we expect that our partial analysis is representative of
the general performance of the approach.



[Figure 3 block diagram: Seed Datasets → Stanford Parser → Adjectives → Compute class frequencies → Adjective Distribution; Domain Corpora → LDA → Frames → Compute class frequencies → Annual Polarities → Compute correlations → PMC.]

Fig 3. A block diagram illustrating our approach. Our adjective distribution is
computed using per class frequencies of occurrence for each adjective in the seed
dataset(s). We use this distribution to compute annual cluster polarities of frames
obtained using LDA from our domain corpora.

Ground Truth Annotation


We label a ground truth for each domain, marking years corresponding to framing
changes as positives, and other years as negatives. We primarily obtain our positives
using the findings of large-scale surveys from earlier research.
In order to do so, we study the literature pertaining to framing changes in the
domains we examine. We identify large-scale studies conducted by reputed
organizations such as the National Cancer Institute (NCI) [28], the Columbia
Journalism Review (CJR) [29], Pew Research [30], and so on. These studies examine
news and media publishing in a particular domain over a period of time, as we do, and
manually identify changes in the framing of domain news during these periods.
The studies we rely on for ground truth sometimes provide quantitative justification
for their findings. For example, the NCI monograph on the framing of smoking news
identifies the number of pro and anti tobacco control frames before and after a framing
change [28]. These studies therefore provide an expert annotation of framing changes in
our domains, for the periods we examine. Details of each study we used and their
findings are reported in the Results section.
By demonstrating substantial agreement between the results of our approach and
those of earlier ground truth surveys, we establish our claim that our approach may be
used to automatically identify framing changes in domain news publishing.

Precision-Recall Analysis
To gain confidence that our approach successfully identifies framing changes, we
conduct a precision-recall analysis on our data. We consider each year in each domain
as a data point in our analysis. We calculate overall precision and recall over all data
points in our domains. We consider a data point a true positive or true negative if both
a ground truth study and our approach labeled it as corresponding to a framing change,
or otherwise, respectively. We refer to a data point that was labeled as a positive (or
negative) by our approach, but which is a negative (or positive) according to the
relevant ground truth survey as a false positive or false negative, respectively.



We calculate precision as P = tp/(tp + fp) and recall as R = tp/(tp + fn), where tp, fp,
and fn are the numbers of true positives, false positives, and false negatives, respectively.
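As a small sanity check, these formulas can be evaluated on the per-domain counts reported later in the Results section (smoking: 3 TP and 1 FN; surveillance: 2 TP; LGBT rights: 4 TP and 1 FP), which reproduce the overall figures of 0.90.

```python
def precision_recall(tp, fp, fn):
    """Precision P = tp / (tp + fp) and recall R = tp / (tp + fn)."""
    return tp / (tp + fp), tp / (tp + fn)

# 9 true positives, 1 false positive, and 1 false negative in total.
print(precision_recall(tp=9, fp=1, fn=1))   # (0.9, 0.9)
```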
For some domains, we were unable to identify an earlier survey studying the framing
of news publishing in the domain. We exclude these domains from our precision-recall
analysis. However, we show that in these cases, our estimated PMCs foreshadow events
of substantial public and legislative import. Since framing changes have been shown to
be associated with such public and legislative events [12], we argue that this provides
some measure of validation of our estimated PMCs.

Fig 4. Our estimated clusters for the domain abortion. Each cluster is said to
represent a unique frame. The frame discussed in cluster 1 (characterized by the terms
‘abortion’ and ‘ban’) concerns a proposed ban on abortion. We analyze this cluster, and
find that our estimated PMC (Figure 15) coincides with the period immediately
preceding the Partial Birth Abortion Act of 2003.

Fig 5. Our estimated clusters for the domain drones. Each cluster is said to represent
a unique frame. The frame discussed in cluster 1 concerns the use of drones against
terrorist targets. Our analysis of this cluster returns a PMC of 2009 to 2011 (Figure 17).
Our PMC immediately foreshadows the Federal Aviation Administration’s
Modernization and Reform Act of 2012.



Fig 6. Our estimated clusters for the domain LGBT Rights. Each cluster is said to
represent a unique frame. The frame of cluster 3 concerns the subject of same-sex
marriage, and in particular, judicial interest in this topic. We analyze this cluster and
estimate two PMCs of nearly identical correlation score (2006 to 2008 and 2013 to 2015;
Figure 14). The PMC of 2013 to 2015 coincides exactly with the Supreme Court
judgment of 2015 that legalized same-sex marriage in the entire US.

Fig 7. Our estimated clusters for the domain obesity. Each cluster is said to represent
a unique frame. We posit that cluster 2 (characterized by the terms ‘food’, ‘diet’, and
‘make’) represents societal causes of obesity (see the Obesity section). We analyze this
cluster and estimate a PMC of 2005 to 2007 (Figure 13). Our PMC agrees with the
findings of an earlier human survey [2].



Fig 8. Our estimated clusters for the domain smoking. Each cluster is said to represent
a unique frame. The frame of cluster 3, characterized by the terms ‘cancer’ and ‘smoke’,
discusses the health risks associated with smoking. We analyze this cluster and estimate
a PMC of 2001 to 2003 (Figure 11). Our PMC coincides exactly with an earlier
monograph from the National Cancer Institute (NCI) that describes a progression
towards tobacco control frames in American media between 2000 and 2003.

Fig 9. Our estimated clusters for the domain surveillance. Each cluster is said to
represent a unique frame. The frame of cluster 3, characterized by the terms ‘national’,
‘security’, and ‘agency’, discusses the Snowden revelations of 2013. We analyze this
cluster and estimate a PMC of 2013 to 2014 (Figure 12). Our PMC coincides exactly
with the period following the Snowden revelations. Additionally, we note that the
Columbia Journalism Review [29] found that following the Snowden revelations, news
coverage of Surveillance changed to a narrative focusing on individual rights and digital
privacy [12].



Fig 10. Our estimated clusters for the domain Immigration. Each cluster is said to
represent a unique frame. The frame of cluster 2 discusses the waning of asylum grants,
increased border refusals and the final 2002 white paper on “Secure Borders, Safe
Haven.” We analyze this cluster and estimate a PMC of 2000 to 2002 (Figure 16). Our
PMC coincides exactly with the period immediately foreshadowing the government
white paper.

Results
We find that our periods of maximum correlation correlate substantially with framing
changes described in earlier surveys [2, 29, 31, 32], and also foreshadow legislation.
Our computed class vectors are depicted in figures 11 to 17. We discuss each domain
below.

Smoking
The NCI published a monograph discussing the influence of the news media on tobacco
use [28]. On page 337, the monograph describes how, during the period 2001 to 2003,
American news media had progressed towards tobacco control frames. It states that
55% of articles in this period reported progress on tobacco control, whereas only 23%
reported setbacks.
In contrast, the monograph finds (also on page 337) that between 1985 and 1996,
tobacco control frames (11) were fairly well balanced with pro-tobacco frames (10). We
extracted a dataset of over 2,000 articles from 1990 to 2007.
Our approach returns a PMC of 2001 to 2003 (see figure 11) for this domain. Since
no studies cover the period 1997 to 2000 [28], we interpret the findings described in the
monograph to imply that the change towards tobacco control frames predominantly
began in 2000, and ended in 2003. This domain therefore contributes three true
positives (2001 to 2003) and one false negative (2000), with no false positives, to our
precision-recall analysis.

Surveillance
The CJR [29] found that following the Snowden revelations, news coverage of
Surveillance in the US changed to a narrative focusing on individual rights and digital
privacy [12]. We compiled a dataset consisting of approximately 2,000 surveillance
articles from the New York Times for the period 2010 to 2016.



[Figure 11: five stacked line plots, one per class (Class 1 to Class 5), over the years 1990–2005.]
Fig 11. Annual polarities for cluster 3 (characterized by the terms ‘cancer’ and ‘smoke’)
of Figure 8, from the domain smoking, for classes 1 to 5. The PMC is shown with solid
lines and square markers, and coincides exactly with a framing change described in an
earlier NCI monograph.



The frame of cluster 3 in figure 9, characterized by the terms ‘national’, ‘security’,
and ‘agency’, discusses the Snowden revelations of 2013. We analyze this cluster. Our
class vectors for this domain are shown in figure 12. We obtain a PMC of 2013 to 2014
for this period, corresponding closely with the ground truth framing change.
The trends in our class vectors are indicative of the change. As can be seen from the
figure, positivity (measured by class 5) drops markedly, together with a simultaneous
increase in negativity (class 1) and neutrality (classes 2 and 3). Class 4 remains close to
constant during this period and thus does not affect our hypothesis.
We interpret the findings of [29] to refer primarily to 2013, the year in which the
revelations were made, and the following year, 2014. Whereas other interpretations may
conclude a longer framing change, they must necessarily include this period. This
domain therefore contributes two true positives (2013 and 2014) with no false positives
or negatives to our quantitative evaluation.

Obesity
Kim and Willis [2] found that the framing of obesity news underwent changes between
the years 1997 and 2004. During this period, Kim and Willis found that the fraction of
news frames attributing responsibility for obesity to social causes increased significantly.
Prior to this period, obesity tended to be framed as an issue of individual responsibility.
For example, obesity news after the year 2000 has often criticized food chains for their
excessive use of sugar in fast food, as shown in the NYT snippet in the Introduction and
Contributions section. We compiled a dataset of over 3,000 articles from the New York
Times (since Kim and Willis [2] restrict their study to Americans) from 1990 to 2009.
The clusters we estimate for this domain are shown in Figure 7. Cluster 2 addresses
possible causes of obesity, with a particular focus on dietary habits. We posit that this
cluster represents societal causes more than individual ones (since individual causes, as
shown in the NYT snippet of the Introduction and Contributions section tend to discuss
topics such as fitness and sedentary lifestyles, as opposed to food content). We observe
that the PMC for this domain (2005 to 2007) is characterized by increased positivity,
shown by classes 4 and 5, and decreased negativity (class 1). Our results for this
domain thus agree with the findings of Kim and Willis [2].
We were unable to use this domain in our precision-recall analysis, since Kim and
Willis, to the best of our knowledge, do not specify a precise period during which the
framing change took place.
However, since Figures 2 and 3 of Kim and Willis [2] show a dramatic increase of
social causes in 2004, and a corresponding marked decline of individual causes, we
conclude a substantial agreement between their findings and our results.

LGBT Rights
We compiled a dataset of over 3,000 articles from the period 1996 to 2015 in this domain.
Figure 6 depicts our estimated clusters. Cluster 3 represents a frame that discusses the
subject of same-sex marriage and its legality. We note that the Supreme Court ruled to
legalize same-sex marriages in the US in the year 2015. Our class vectors for this domain
are shown in figure 14. We obtained two PMCs with nearly identical correlation scores
(0.999 for the period 2006 to 2008, and 0.989 for the period 2013 to 2015). Figure 14
highlights the period 2013 to 2015 immediately preceding the judicial interest of 2015.
We were unable to identify a prior study that discusses the framing of LGBT news
over our entire period of interest. However, we use the findings reported in Gainous et
al. [32] as our ground truth for this domain. Gainous et al. studied the framing of
LGBT related publishing in the New York Times over the period 1988 to 2012, and
found a dramatic increase in equality frames, from approximately 25 in 2008 to



[Figure 12: five stacked line plots, one per class (Class 1 to Class 5), over the years 2010–2016.]
Fig 12. Annual polarities for a representative cluster (characterized by the terms
‘national’, ‘security’, and ‘agency’) from the domain surveillance for classes 1 to 5.
The PMC is shown with solid lines and square markers.



[Figure 13: five stacked line plots, one per class (Class 1 to Class 5), over the years 1990–2008.]
Fig 13. Annual polarities for cluster 2 (characterized by the terms ‘diet’, ‘food’, and
‘make’) of Figure 7, from the domain obesity, for classes 1 to 5. The PMC is shown
with solid lines and square markers. We posit that this cluster represents societal causes
of obesity (see the Obesity section). We observe that the PMC for this cluster (2005 to
2007) agrees with the findings of Kim and Willis [2].



approximately 110 in 2012. Correspondingly, our findings of Figure 14 show that
between 2008 and 2012, there was a dramatic increase in the measures of classes 4 and 5
(representing positivity), and a marked reduction in the measures of classes 1 and 2,
(representing negativity).
For uniformity, we imposed a threshold of y = 3 years to identify a PMC. As we
explain in the section on Detecting Framing Changes using Periods of Maximum
Correlation, high correlations are less likely over more extended periods. Therefore, 2008
to 2012 is, unsurprisingly, not our PMC at the chosen uniform threshold. However,
given the marked increase in positivity and corresponding decrease in negativity in this
period, we posit that the period has high correlation. Also, whereas we were unable to
find a study covering the period 2013 to 2015 to use as ground truth, we note that our
PMC preceded major judicial interest in the domain.
We therefore rely on Gainous and Rhodebeck [32] for ground truth in this domain,
and note that the trend towards increased positivity (and reduced negativity) in
Figure 14 began in 2008 and ended in 2013. We therefore conclude that our measures
return four true positives, and one false positive for this domain.

Abortion
The Partial-Birth Abortion Ban Act was enacted in 2003. We obtained 248 articles for
the period 2000 to 2003, for this domain. We obtain a PMC of 2001 to 2003 for this
domain, as shown in figure 15.

Immigration
We study the framing of immigration news in the United Kingdom. We obtained about
3,600 articles on the subject of Immigration from the Guardian API for the period 2000
to 2017. For this domain, we carried out our analysis on the article titles (rather than
the full text). Since the Guardian returns full length articles, we found that this design
choice allows us to produce a more focused domain corpus than the one generated by
the full article text. We depict our estimated class vectors and PMC in figure 16.
We analyze the frame of cluster 2 in Figure 10. This cluster deals with the issue of
asylum seekers to the United Kingdom. In the period immediately preceding the year
2000, asylum claims to the United Kingdom had reached a new peak of 76,040 [33].
This event coincided with a high-profile terrorist act by a set of Afghan asylum
seekers [33].
These events resulted in increased border refusals and the final 2002 white paper on
“Secure Borders, Safe Haven.” We estimate a PMC of 2000 to 2002 (Figure 16). Our
PMC coincides exactly with the period immediately foreshadowing the government
white paper.

Drones
We obtained nearly 4,000 articles on this domain for the period 2003 to 2012. We
obtain a PMC of 2009 to 2011 for this domain, as shown in Figure 17.
Our PMC immediately foreshadows the Federal Aviation Administration’s
Modernization and Reform Act of 2012.

Predictive Utility
The aforementioned two domains (immigration and drones) highlight the predictive
utility of news framing. Whereas we did not find earlier surveys that coincide with our
PMCs for these domains, we note that these PMCs foreshadowed substantial legislative



[Figure 14: five stacked line plots, one per class (Class 1 to Class 5), over the years 1996–2014.]
Fig 14. Annual polarities for cluster 3 (characterized by the terms ‘gay’, ‘rights’, and
‘marriage’) of Figure 6, from the domain LGBT Rights, for classes 1 to 5. We obtain
two PMCs with nearly identical correlation scores, namely, 2006 to 2008 and 2013 to
2015. The PMC of 2013 to 2015 is shown with solid lines and square markers,
immediately preceding the judicial interest of 2015.



[Figure 15: five stacked line plots, one per class (Class 1 to Class 5), over the years 2000–2003.]


Fig 15. Annual polarities for cluster 1 (characterized by the terms ‘abortion’ and ‘ban’)
of Figure 4, from the domain abortion, for classes 1 to 5. The PMC (2001 to 2003) is
shown with solid lines and square markers, and foreshadows the Partial-Birth Abortion
Ban Act of 2003.



[Figure 16: five stacked line plots, one per class (Class 1 to Class 5), over the years 2000–2017.]


Fig 16. Annual polarities for cluster 2 (discussing asylum grants) of Figure 10, from
the domain immigration, for classes 1 to 5. The PMC is shown with solid lines and
square markers, and foreshadows the “Secure Borders, Safe Haven” white paper of 2002.



activity. This observation suggests that PMCs estimated through real-time monitoring
of domain news may yield predictive utility for legislative and commercial activity.

Overall Precision and Recall


We obtain an overall precision of 0.90 and a recall of 0.90. That is, we successfully
identify 90% of the ground truth positives, and only 10% of the positives we identify
are false positives.
Further, we point out that our false positives generally either precede or succeed a
ground truth framing change. Therefore, we posit that such false positives may be due
to imprecision in measurement rather than any considerable failure of our approach.
Our results demonstrate substantial agreement with ground truth in domains for
which prior surveys have studied framing changes. In domains for which we did not find
such surveys, we demonstrate that our PMCs foreshadow periods of substantial public
and legislative import. We posit, therefore, that our approach successfully identifies
framing changes.

Qualitative Analysis and Discussion


Whereas we provide a partial quantitative evaluation of our approach using precision
and recall in the preceding sections, we confront substantial difficulties in uniformly
conducting such evaluations across all of our domains. In this section, we qualitatively
evaluate our results in the context of these limitations.
Framing, and framing changes, have in general been studied from the lens of
identifying a general trend from a particular data source. Studies often describe such
trends without stating a hard beginning or end of the trend period [28, 29].
The periods we analyze here are relatively long (often lasting almost two decades).
The available human surveys, not surprisingly, are limited in scope and do not cover the
same periods in their entirety. As a result, we observe missing years during which
computational methods produce a belief that cannot be verified using a human survey,
since no human survey covers such years. Therefore, a quantitative precision-recall style
analysis becomes difficult or impossible to conduct.
Further, the language used in existing studies is not always sufficiently precise to
support interpreting a fixed set of positives and negatives. For example, the Columbia Journalism
Review uses the phrase “after the Snowden revelation” to describe the change in media
attitude. The Snowden revelations were made in June 2013. Our estimated PMC is
2013 and 2014.
The NCI monograph on smoking states that between 1985 and 1996, framing was
balanced between pro and anti tobacco control, and in 2001 to 2003, framing favored
tobacco control. Our PMC is 2001 to 2003. However, there are no studies that we know
of that cover the period 1996 to 2000.
Likewise, the Kim and Willis study on obesity [2], published in 2007, states
that the change happened “in recent times,” but shows quantitative measures only until
2004. The measures that Kim and Willis compute show that the frequency of societal
causes increases sharply in 2004. Our PMC over the period 1990 to 2009 (with our
uniformly imposed threshold of y = 3) is 2005 to 2007. The correlation for the period
2004 to 2007 is high but it is not the PMC at a threshold of y = 3.
These examples show that performing a precision-recall analysis based on prior work
is problematic, since obtaining a set of positives and negatives from prior studies in the
sociological and communications literature involves interpretation.
Given that the data sources and coverage between our analysis and that of prior
surveys are usually quite different, the correlations we obtain appear quite substantial.
However, quantitative evaluation remains challenging for the reasons we point out.



[Figure 17: five stacked line plots, one per class (Class 1 to Class 5), over the years 2003–2011.]


Fig 17. Annual polarities for cluster 1 (discussing drone strikes) of Figure 5, from the
domain drones, for classes 1 to 5. The PMC is shown with solid lines and square
markers, and immediately foreshadows the Federal Aviation Administration’s
Modernization and Reform Act of 2012. This finding suggests the predictive utility of
framing change detection for legislative activity.



This paper follows the spirit of recent work [11, 13, 15] in seeking to develop the
study of framing into a computational science. We acknowledge that our dataset
collection and methods may undergo refinement to tackle broader ground truth data, of
a wider temporal and geographical scope. Nonetheless, we posit that our methods and
results have scientific value, and hope that future work will provide greater coverage of
ground truth.
Please note that the underlying data preparation requires social science expertise
and cannot be effectively crowdsourced via a platform such as Mechanical Turk. We
therefore hope that our work catches the interest of social scientists and leads them to
pursue more comprehensive studies of framing in news media that would enable
improvements in computational methods.

Conclusion
We highlight a problem of significant public and legislative importance, framing change
detection. We contribute an unsupervised natural language based approach that detects
framing change trends over several years in domain news publishing. We identify a key
characteristic of such changes, namely, that during frame changes, the polarity of
adjectives describing co-occurring nouns changes cumulatively over multiple years. Our
approach agrees with and extends the results of earlier manual surveys. Whereas such
surveys depend on human effort and are therefore limited in scope, our approach is fully
automated and can simultaneously run over all news domains. We contribute the
Framing Changes Dataset, a collection of over 12,000 news articles from seven domains
in which framing has been shown to change by earlier surveys. We will release the
dataset with our paper. Our work suggests the predictive utility of automated news
monitoring, as a means to foreshadow events of commercial and legislative import.
Our work represents one of the first attempts at a computational modeling of
framing and framing changes. We therefore claim that our approach produces promising
results, and that it will serve as a baseline for more sophisticated analysis over wider
temporal and geographical data.

Appendix: Sample Correlations


Correlations for all subsets are shown in Table 4. The PMC is shown in bold.

Table 4. All correlations for cluster 2 of the domain Immigration.


Correlation Start Index End Index
1 1 2
1 2 3
1 3 4
1 4 5
1 5 6
1 6 7
1 7 8
1 8 9
1 9 10
1 10 11
1 11 12
1 12 13
1 13 14
1 14 15
1 15 16

1 16 17
1 17 18
0.99 1 3
0.98 10 12
0.98 12 14
0.96 12 15
0.93 12 16
0.93 9 12
0.88 11 13
0.88 9 11
0.87 11 14
0.86 11 16
0.85 11 15
0.85 8 12
0.84 8 10
0.83 7 9
0.82 15 17
0.82 14 16
0.82 10 13
0.81 7 12
0.81 9 13
0.79 10 14
0.79 10 16
0.79 9 14
0.78 9 16
0.78 10 15
0.77 9 15
0.77 8 16
0.77 8 13
0.76 8 15
0.76 8 14
0.75 12 17
0.75 5 7
0.75 12 18
0.74 11 18
0.74 11 17
0.72 6 8
0.72 10 17
0.72 10 18
0.72 7 11
0.72 13 15
0.71 7 10
0.70 8 11
0.69 2 4
0.68 7 13
0.68 4 6
0.67 3 5
0.67 16 18
0.66 7 14
0.66 9 17

0.66 9 18
0.66 14 17
0.65 8 17
0.65 7 16
0.65 8 18
0.64 7 15
0.63 6 11
0.62 6 10
0.62 6 12
0.62 6 9
0.61 6 14
0.61 6 13
0.61 5 13
0.61 5 12
0.60 5 8
0.60 5 14
0.60 7 17
0.60 6 16
0.60 4 12
0.59 7 18
0.59 4 13
0.59 5 16
0.59 2 5
0.58 6 15
0.58 5 11
0.58 5 15
0.58 5 10
0.57 5 9
0.57 4 16
0.57 4 14
0.56 2 13
0.56 2 12
0.56 4 17
0.56 15 18
0.56 4 18
0.56 3 13
0.56 3 6
0.56 3 12
0.56 1 5
0.55 4 15
0.55 5 17
0.55 5 18
0.55 3 18
0.55 2 18
0.55 3 17
0.55 2 17
0.54 2 16
0.54 3 16
0.54 6 17
0.54 6 18

0.53 2 14
0.53 3 14
0.53 2 15
0.53 3 15
0.52 1 12
0.52 1 13
0.51 1 18
0.51 1 17
0.50 13 17
0.50 4 7
0.49 2 6
0.49 1 16
0.49 1 6
0.48 1 14
0.48 1 15
0.47 13 18
0.46 13 16
0.44 4 8
0.43 1 4
0.42 4 9
0.40 4 10
0.40 4 11
0.39 14 18
0.36 3 7
0.36 2 11
0.36 1 7
0.36 3 11
0.35 3 8
0.36 2 7
0.35 2 8
0.35 2 9
0.34 1 8
0.34 2 10
0.34 3 9
0.34 3 10
0.34 1 11
0.34 1 9
0.32 1 10

Ethics Statement
Our study involved no human or animal subjects.

Funding Statement
CS has a commercial affiliation to Amazon. The funder provided support in the form of
salaries for this author, but did not have any additional role in the study design, data
collection and analysis, decision to publish, or preparation of the manuscript. The
specific roles of these authors are articulated in the ‘author contributions’ section.



Competing Interests Statement
The above commercial affiliation does not alter our adherence to PLOS ONE policies on
sharing data and materials.

Author Contributions
KS and CS conceived the research and designed the method. KS prepared the datasets
and performed the analysis. KS and MPS designed the evaluation approach. KS, CS,
and MPS wrote the paper.

References
1. Gunnars K. Ten Causes of Weight Gain in America; 2015.
https://www.healthline.com/nutrition/10-causes-of-weight-gain#section12.

2. Kim SH, Willis A. Talking about Obesity: News Framing of Who Is Responsible
for Causing and Fixing the Problem. Journal of Health Communication.
2007;12(4):359–376.
3. Flegal K, Carroll M, Kit B, Ogden C. Prevalence of Obesity and Trends in the
Distribution of Body Mass Index Among US Adults, 1999–2010. Journal of the
American Medical Association. 2012;307(5):491–497.

4. Chong D, Druckman J. Framing Theory. Annual Review of Political Science. 2007;10:103–126.
5. de Vreese C. News framing: Theory and typology. Information Design Journal.
2005;13(1):51–62.

6. Constantin L. Facebook ID Leak hits Millions of Zynga Users; 2010. http://tinyurl.com/2bqwoxq.
7. Crossley R. Facebook ID Leak hits Millions of Zynga Users; 2011. http://www.develop-online.net/news/facebook-id-leak-hits-millions-of-zynga-users/0107956.

8. Fitzsimmons C. Facebook And Zynga Sued Over Privacy; 2014. http://www.adweek.com/digital/facebook-zynga-sued/.
9. US Congress. Personal Data Protection and Breach Accountability Act of 2014;
2014. https://www.congress.gov/bill/113th-congress/senate-bill/1995.
10. Benford R, Snow D. Framing Processes and Social Movements: An Overview and
Assessment. Annual Review of Sociology. 2000;26(1):611–639.
11. Card D, Boydstun A, Justin Gross J, Resnik P, Smith N. The Media Frames
Corpus: Annotations of Frames across Issues. In: Proceedings of the 53rd Annual
Meeting of the Association for Computational Linguistics and the 7th
International Joint Conference on Natural Language Processing (Volume 2: Short
Papers). vol. 2; 2015. p. 438–444.
12. Sheshadri K, Singh MP. The Public and Legislative Impact of
Hyper-Concentrated Topic News. Science advances. 2019;5(8).



13. Tsur O, Calacci D, Lazer D. A Frame of Mind: Using Statistical Models for
Detection of Framing and Agenda Setting Campaigns. In: Proceedings of the
53rd Annual Meeting of the Association for Computational Linguistics and the
7th International Joint Conference on Natural Language Processing (Volume 1:
Long Papers); 2015. p. 1629–1638.
14. Entman R. Framing: Toward Clarification of a Fractured Paradigm. Journal of
Communication. 1993;43(4):51–58.
15. Alashri S, Tsai JY, Alzahrani S, Corman S, Davulcu H. “Climate Change”
Frames Detection and Categorization Based on Generalized Concepts. In:
Proceedings of the 10th IEEE International Conference on Semantic Computing
(ICSC); 2016. p. 277–284.
16. NYT. Developer APIs; 2016. http://developer.nytimes.com/.
17. Wikipedia. The New York Times; 2001. https://en.wikipedia.org/wiki/The_New_York_Times.
18. The Guardian. Guardian Open Platform; 2016.
http://open-platform.theguardian.com/.
19. Wikipedia. The Guardian; 2002. https://en.wikipedia.org/wiki/The_Guardian.
20. King G, Schneer B, White A. How the News Media Activate Public Expression
and Influence National Agendas. Science. 2017;358(6364):776–780.
21. Sheshadri K, Ajmeri N, Staddon J. No Privacy News is Good News: An Analysis
of New York Times and Guardian Privacy News from 2010—2016. In:
Proceedings of the 15th Privacy, Security and Trust Conference. Calgary, Alberta,
Canada; 2017. p. 159–167.
22. Socher R, Perelygin A, Wu J, Chuang J, Manning C, Ng A, et al. Recursive Deep
Models for Semantic Compositionality Over a Sentiment Treebank. In:
Proceedings of the 2013 Empirical Methods in Natural Language Processing
Conference (EMNLP). Seattle, WA: Association for Computational Linguistics;
2013. p. 1631–1642.
23. Melo GD, Bansal M. Good, Great, Excellent: Global Inference of Semantic
Intensities. Transactions of the Association for Computational Linguistics.
2013;1:279–290.
24. Blei D, Ng A, Jordan M. Latent Dirichlet Allocation. Journal of Machine
Learning Research. 2003;3:993–1022.
25. Engel S. Frame Spillover: Media Framing and Public Opinion of a Multifaceted
LGBT Rights Agenda. Law and Social Inquiry. 2013;38:403–441.
26. Benesty J, Chen J, Huang Y, Yiteng C, Cohen I. Pearson Correlation Coefficient.
In: Noise reduction in speech processing. Springer; 2009. p. 1–4.
27. Mathworks. Correlation Coefficients; 2019.
https://www.mathworks.com/help/matlab/ref/corrcoef.html.
28. National Cancer Institute. How the News Media Influence Tobacco Use; 2019. https://cancercontrol.cancer.gov/brp/tcrb/monographs/19/m19_9.pdf.
29. Vernon P. Five Years Ago, Edward Snowden Changed Journalism; 2018. https://www.cjr.org/the_media_today/snowden-5-years.php.



30. Pew Research. The State of Privacy in post-Snowden America; 2016.
31. Cummings M, Proctor R. The Changing Public Image of Smoking in the United
States: 1964–2014. Cancer Epidemiology and Prevention Biomarkers.
2014;23:32–36.
32. Gainous J, Rhodebeck L. Is Same-Sex Marriage an Equality Issue? Framing
Effects Among African Americans. Journal of Black Studies. 2016;47(7):682–700.
33. Wikipedia. History of UK immigration control; 2019. https://en.wikipedia.org/wiki/History_of_UK_immigration_control.



Revised Manuscript with Track Changes

Detecting Framing Changes in Topical News Publishing


Karthik Sheshadri1* , Chaitanya Shivade2 , Munindar P. Singh1 ,

1 Department of Computer Science, North Carolina State University


2 Amazon

* kshesha@ncsu.edu

Abstract
Changes in the framing of topical news have been shown to foreshadow significant
public, legislative, and commercial events. Automated detection of framing changes is
therefore an important problem, which existing research has not considered. Previous
approaches are manual surveys, which rely on human effort and are consequently
limited in scope. We make the following contributions. We systematize discovery of
framing changes through a fully unsupervised computational method that seeks to
isolate framing change trends over several years. We demonstrate our approach by
isolating framing change periods that correlate with previously known framing changes.
We have prepared a new dataset, consisting of over 12,000 articles from seven news
topics or domains in which earlier surveys have found framing changes. Finally, our
work highlights the predictive utility of framing change detection, by identifying two
domains in which framing changes foreshadowed substantial legislative activity, or
preceded judicial interest.

“For nearly four decades, health and


fitness experts have prodded and
cajoled and used other powers of
persuasion in a futile attempt to
whip America’s youngsters into
shape.”

The New York Times, 1995

“The New York City Health


Department has embarked on a new
campaign to persuade processed
food brands to decrease sugar
content in a bid to curb obesity.”

The New York Times, 2015

Introduction and Contributions


To motivate the problem and approach of this paper, let us investigate the primary
causes of obesity in America. Public opinion and behavior on the subject have changed
measurably since the late 1990s. As an example, Gunnars [1] compiled a list in 2015 of
ten leading causes, six of which suggest that the processed food industry may be
primarily responsible. By contrast, in the 1990s and early 2000s, popular opinion
appeared to hold [2, 3] that obesity was primarily caused by individual behavior and
lifestyle choices. What led to this change in public opinion?



We posit that news publishing on the subject of obesity contributed to the change in
the public’s opinion. The above quotes from the New York Times (NYT) are
representative snippets from news articles on obesity published in 1995 and 2015,
respectively. Whereas both address the same topic, the 1995 snippet implies
responsibility on part of individuals, and the 2015 snippet implies responsibility on part
of the processed food industry. These subjective biases in news are collectively referred
to as framing.
Framing theory [4, 5] suggests that how a topic is presented to the audience (called
“the frame”) influences the choices people make about how to process that information.
The central premise of the theory is that since an issue can be viewed from a variety of
perspectives and be construed as having varying implications, the manner in which it is
presented influences public reaction.
In general, understanding news framing may be a crucial component of
decision-support in a corporate and regulatory setting. To illustrate this fact, we
present a real-life example of the influence of framing on public perception and
legislation. According to [6, 7], in late 2010, security vulnerabilities associated with
Facebook and Zynga allowed personal data from millions of users to be compromised.
The framing of news on this topic appeared to change from a neutral narrative to one
focusing on personal privacy.
Facebook and Zynga were sued over privacy breaches [8]. Further, in 2013, the
Personal Data Protection and Breach Accountability Act was promulgated in
Congress [9]. These examples motivate the problem of framing change detection, which
involves identifying when the dominant frame (or frames) [10] of a topic undergoes a
change.

Related Work
The Media Frames Corpus, compiled by Card et al. [11], studies three topics
(Immigration, Smoking, and same-sex marriages), and identifies fifteen framing
dimensions in each. We identify two major limitations of their work. Firstly, Card et al.
study framing as a static detection problem, identifying which dimensions appear in a
given news article. However, research in sociology [10] shows that most news topics
feature a dominant frame (or dominant dimension in the terminology of [11]). Further,
for a generic news topic, the dominant frame is not necessarily one of fifteen previously
chosen dimensions, but can instead be an unknown arbitrary frame specific to the topic
under consideration. For example, in the example given in the Introduction and
Contributions section, the dominant frame related to the privacy of individuals, which is
not one of the fifteen dimensions described in Card et al. [11].
Secondly, Sheshadri and Singh [12] showed that public and legislative reaction tend
to occur only after changes in the dominant frame. That finding motivates an approach
to framing that focuses on identifying and detecting changes in the dominant frame of a
news domain.
Sheshadri and Singh further propose two simple metrics that they motivate as
measures of domain framing: framing polarity and density. They define framing polarity
as the average frequency of occurrence in a domain corpus of terms from a benchmark
sentiment lexicon. Framing density is measured using an entropic approach that counts
the number of terms per article required to distinguish a current corpus from an earlier
one.
We identify the following limitations of the aforementioned measures (introduced
in [12]). Firstly, both measures make no effort to associate a given news article with a
particular frame. Prior work does not support the inherent assumption that all articles
in a given domain belong to a particular frame [10, 11]. We enhance understanding by
analyzing each domain using several distinct frames.



Secondly, framing density does not distinguish between a subjective choice made by
a news outlet to frame a domain differently, and events that necessitate media coverage.
Our work provides this distinction by analyzing framing using patterns of change in the
adjectives that describe co-occurring nouns. Since adjectives are artifacts of subjective
framing, they are not affected by events, as framing density is.
It is worthwhile to note that our approach is similar in spirit to Tsur et al.’s
work [13], in that both that work and this paper apply a topic modeling strategy to
analyze framing as a time series. However, we highlight the following key differences
and contributions of our work. Firstly, as both Sheshadri and Singh [12] and Tsur et
al. [13] point out, framing is a subjective aspect of communication. Therefore, a
computational analysis of framing should ideally differentiate subjective aspects from
fact-based and objective components of communication. Since adjectives in and of
themselves are incapable of communicating factual information, we take them to be
artifacts of how an event or topic is framed. In contrast, generic n-grams (as used by
Tsur et al. [13]) do not provide this distinction.
Further, Tsur et al. rely upon estimating “changes in framing” using changes in the
relative frequencies of n-grams associated with various topics or frames. Whereas such
an approach is useful in evaluating which of a set of frames may be dominant at any
given time, it does not measure “framing changes” in the sense originally described
in [14]. In contrast, our work estimates changes in framing using consistent polarity
drifts of adjectives associated with individual frames. Our approach may also be applied
to each of a number of frames independently of the others, as opposed to Tsur et al. [13].
We also distinguish our work from Alashri et al. [15], which uses standard machine
learning tools to classify sentences represented by linguistic features into one of four
frames. Such an approach is limited by the need to predefine a frame set, as Card et al.’s
approach [11] is. Further, the paper does not discuss the problem of how to examine
changes within specific frames. To the best of our knowledge, our work is the first to
address the problem of detecting meaningful patterns of change within individual
frames.

Contributions
This paper contributes a fully unsupervised and data-driven natural language based
approach to detecting framing change trends over several years in domain news
publishing. To the best of our knowledge, this paper is the first to address framing
change detection, a problem of significant public and legislative import. Our approach
agrees with and extends the results of earlier manual surveys, which required human
data collection and were consequently limited in scope. Our approach removes this
restriction by being fully automated. Our method can thus be run simultaneously over
all news domains, limited only by the availability of real-time news data. Further, we
show that our approach yields results that foreshadow periods of legislative activity.
This motivates the predictive utility of our method for legislative activity, a problem of
significant import.
Further, we contribute a Framing Changes Dataset, which is a collection of over
12,000 news articles from seven news topics or domains. In four of these domains,
surveys carried out in earlier research have shown framing to change. In two domains,
periods with significant legislative activity are considered. Our individual domain
datasets within the framing changes dataset cover the years in which earlier research
found framing changes, as well as periods ranging up to ten years before and after the
change. Our dataset is the first to enable computational modeling of framing change
trends. We plan to release the dataset with our paper. We note that a fraction of the
articles in this dataset were used earlier for the analysis in [12].



Materials and Methods
This section describes our datasets, data sources, and inter-annotator agreement. All
data were collected in an anonymous and aggregated manner. All APIs and data used
are publicly available, and our data collection complies with the terms and conditions of
each API.

Data Sources
We use two Application Programming Interfaces (APIs) to create our datasets.

The New York Times API:


The New York Times (NYT) Developer’s API [16] provides access to news data from
the NYT, Reuters, and Associated Press (AP) newspapers—both print and online
versions—beginning in 1985. The NYT has the second largest circulation of any
newspaper in the United States [17].
The data object returned by the API includes fields such as the article type (news,
reviews, summaries, and so on), the news source (NYT, Reuters, or AP), the article’s
word count, the date of its publication, article text (in the form of the abstract, the lead
(first) paragraph, and a summary).

The Guardian API:


The Guardian Xplore API [18] provides access to news data from The Guardian, a
prominent UK newspaper that reaches 23 million UK adults per month [19].
The Guardian API returns full-length articles along with such metadata as the
article type (similar to the NYT API) and a general section name (such as sports,
politics, and so on). Although these section names are manually annotated by humans,
we do not use them in our analysis, but rely instead on a simple term search procedure
(see the Domain Dataset Generation section) to annotate our datasets.

Domain Dataset Generation


As in earlier work [12, 20, 21], we use a standard term search procedure to create our
datasets. Specifically, an article belongs to a domain if at least a component of the
article discusses a topic that is directly relevant to the domain [12]. We term articles
that are relevant to a domain domain positives, and irrelevant articles domain negatives.
As an example, consider the following article from the domain smoking: “The dirty gray
snow was ankle deep on West 18th Street the other day, and on the block between Ninth
and Tenth Avenues, a cold wind blew in off the Hudson River. On the south side of the
street, a mechanic stood in front of his garage smoking a . . . .” We consider this article a domain negative: although it contains the keyword ‘smoking’, it does not discuss any aspect pertaining to the prevalence or control of smoking. In contrast, the article
“An ordinance in Bangor, Maine, allows the police to stop cars if an adult is smoking
while a child under 18 is a passenger” is directly relevant to the domain, and is therefore
considered a domain positive. We define dataset accuracy as the fraction of articles in a
dataset that are domain positive. For each domain, we used these APIs to extract news data during the time period from b (denoting the beginning) to e (denoting the end) of the period of interest.



Inter-Annotator Agreement
To ensure that the articles returned by our term search procedure are indeed relevant to
each domain, a random sample of articles from each domain dataset was coded by two
raters. We supply the per-domain accuracy and inter-annotator agreement as Cohen’s
Kappa for sample domains in Table 1.

Table 1. Inter-Annotator agreement as Cohen’s Kappa.

Domain        Dataset Accuracy (Coder 1)  Dataset Accuracy (Coder 2)  Kappa
Surveillance  0.80                        0.75                        0.79
Smoking       0.84                        0.82                        0.93
Obesity       0.78                        0.74                        0.67
LGBT Rights   0.83                        0.74                        0.64
Abortion      0.80                        0.80                        0.50
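For reference, the agreement statistic can be reproduced from the two coders' labels with scikit-learn; the sketch below uses made-up binary codes (1 = domain positive, 0 = domain negative) purely for illustration:

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical binary codes assigned by the two raters to the same
    # random sample of articles (1 = domain positive, 0 = domain negative).
    coder1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
    coder2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]

    kappa = cohen_kappa_score(coder1, coder2)
    accuracy1 = sum(coder1) / len(coder1)  # fraction coded domain positive by rater 1
    print(kappa, accuracy1)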

Probability Distribution over Adjectives


Our approach relies on the key intuition that during a framing change, the valence of
the adjectives describing co-occurring nouns changes significantly.
To measure this change, we create a reference probability distribution of adjectives
based on the frequency of their occurrence in benchmark sentiment datasets.

Benchmark Datasets
We identified three open source benchmark review datasets from which to create our
adjective probability distribution. Together, these datasets provide about 150 million
reviews of various restaurants, services and products, with each review rated from one
to five. Given the large volume of reviews from different sources made available by these
datasets, we assume that they provide a sufficiently realistic representation of all
adjectives in the English language.
We rely primarily on the Trip Advisor dataset to create our adjective probability
distribution. We identified two other benchmark datasets, namely, the Yelp Challenge
dataset and the Amazon review dataset. Because these two datasets together comprise about 150 million reviews, it is computationally infeasible for us to include them in our learning procedure. Instead, we learned distributions from these datasets
for sample adjectives, to serve as a comparison with and as verification of our overall
learned distribution. The resulting distributions for these adjectives appeared
substantially similar to those of the corresponding adjectives in our learned distribution.
We therefore conclude that our learned distribution provides a valid representation of all
adjectives in the English language. We describe each dataset below.

Trip Advisor
The Trip Advisor dataset consists of 236,000 hotel reviews. Each review provides text,
an overall rating, and aspect specific ratings for the following seven aspects: Rooms,
Cleanliness, Value, Service, Location, Checkin, and Business. We limit ourselves to
using the overall rating of each review.



Yelp
The Yelp challenge dataset consists of approximately six million restaurant reviews.
Each entry is stored as a JSON string with a unique user ID, check-in data, review text,
and rating.

Amazon
The Amazon dataset provides approximately 143 million reviews from 24 product
categories such as Books, Electronics, Movies, and so on. The dataset uses the JSON
format and includes reviews comprising a rating, review text, and helpfulness votes.
Additionally, the JSON string encodes product metadata such as a product description,
category information, price, brand, and image features.
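Both the Yelp and Amazon datasets are distributed as one JSON record per line. A minimal reading sketch follows; the field names mentioned in the trailing comment (such as 'stars' and 'overall') are assumptions based on common releases of these datasets and may differ by version:

    import json

    def iter_reviews(path, rating_field, text_field):
        # Stream (rating, text) pairs from a JSON-lines review file.
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                record = json.loads(line)
                yield record[rating_field], record[text_field]

    # e.g., Yelp reviews typically use "stars"/"text"; Amazon reviews "overall"/"reviewText".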

Polarity of Adjectives
For each adjective in the English language, we are interested in producing a probability
distribution that describes the relative likelihood of the adjective appearing in a review
whose rating is r. For our data, r ranges from one to five.
We began by compiling a set of reviews from the Trip Advisor dataset for each
rating from one to five. We used the Stanford CoreNLP parser [22] to parse each of the
five sets of reviews so obtained. We thus obtained sets of parses corresponding to each
review set. From the set of resultant parses, we extracted all words that were assigned a
part-of-speech of ‘JJ’ (adjective). Our search identified 454,281 unique adjectives.
For each unique adjective a, we counted the number of times it occurred in our set of
parses corresponding to review ratings one to five. We denote this count by Nai, with 1 ≤ i ≤ 5. Our probability vector for adjective a is then {Na1/Sa, Na2/Sa, . . . , Na5/Sa}, where Sa = Na1 + Na2 + Na3 + Na4 + Na5. Additionally, we recorded the rarity of each adjective as 1/Sa. This procedure estimates a probability distribution P, with 454,281 rows and six columns.
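The counting procedure above can be sketched as follows (a minimal illustration that assumes the adjectives have already been extracted from each rating's review set, e.g., by the parsing step just described):

    from collections import defaultdict

    def adjective_distribution(adjectives_by_rating):
        # adjectives_by_rating: dict mapping rating r in {1,...,5} to the list of
        # adjective tokens extracted from reviews with that rating.
        counts = defaultdict(lambda: [0, 0, 0, 0, 0])
        for rating, adjectives in adjectives_by_rating.items():
            for adj in adjectives:
                counts[adj.lower()][rating - 1] += 1

        distribution = {}
        for adj, n in counts.items():
            s = sum(n)  # s corresponds to Sa = Na1 + ... + Na5
            # Probability vector plus the rarity 1/Sa as a sixth entry.
            distribution[adj] = [c / s for c in n] + [1.0 / s]
        return distribution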
Table 2 shows example entries from our learned probability distribution. As can be
seen from the table, our learned distribution not only correctly encodes probabilities
(the adjective ‘great’ has nearly 80% of its probability mass in the classes four and five,
whereas the adjective ‘horrible’ has nearly 80% of its mass in classes one and two), but
also implicitly learns an adjective ranking such as the one described in De Melo et
al. [23]. To illustrate this ranking, consider that the adjective ‘excellent’ has 60% of its
probability mass in class five, whereas the corresponding mass for the adjective ‘good’ is
only 38%.
For visual illustration, we depict our learned probability distribution as a heatmap in
Table 3.
Motivated by our learned probability distribution, we posit that class 1 represents negativity, classes 2 to 4 represent neutrality, and class 5 represents positivity.

Incorporating Adjective Rarity


Our measure of adjective rarity allows us to exclude uncommon adjectives, which occur rarely in our benchmark dataset and whose learned probability distributions may therefore be unreliable.
However, in doing so, we run the risk of excluding relevant adjectives from the analysis. We manually inspect the set of adjectives that describe the nouns in each domain to arrive at a domain-specific threshold.



Table 2. Sample entries from our learned probability distribution for positive and
negative sentiment adjectives.
Adjective Class 1 Class 2 Class 3 Class 4 Class 5 Rarity (Inverse Scale)
Great 0.039 0.048 0.093 0.274 0.545 4.495e-07
Excellent 0.019 0.028 0.070 0.269 0.612 2.739e-06
Attractive 0.095 0.125 0.192 0.296 0.292 0.0001
Cute 0.039 0.068 0.155 0.330 0.407 1.499e-05
Compassionate 0.068 0.020 0.010 0.038 0.864 0.0004
Good 0.076 0.095 0.185 0.336 0.308 3.459e-07
Horrible 0.682 0.143 0.076 0.042 0.057 7.453e-06
Ridiculous 0.461 0.180 0.125 0.116 0.118 2.033e-05
Angry 0.546 0.138 0.092 0.098 0.126 6.955e-05
Stupid 0.484 0.136 0.099 0.117 0.164 5.364e-05
Beautiful 0.043 0.049 0.085 0.222 0.599 6.233e-06

For a majority of our domains (five out of seven), we use a threshold of q > −∞, that is, no adjectives are excluded. For the remaining two domains (drones and LGBT rights), we employ a threshold of q > 10^−4.
The trends in our results appeared to be fairly consistent across a reasonable range
of threshold values.

Domain Period of Interest


We define a period of interest for each domain. Let tf be a year in which a documented
framing change took place in the domain under consideration. Then, our period of
interest for this domain is b = min(tf − 10, tf − l) to e = max(tf + 10, tf + r), where the
API provides data up to l years before, and r years after tf . All units are in years.

Corpus-Specific Representations
A domain corpus is a set of news articles from a given domain. Let a given domain have
m years in its period of interest with annual domain corpora T1 , T2 , . . . , Tm .

Corpus Clustering
The overall domain corpus is therefore T = T1 ∪ T2 ∪ . . . ∪ Tm.
We assume that a corpus has k unique frames. We adopt a standard topic modeling approach to estimate frames. We use the benchmark Latent Dirichlet Allocation (LDA) [24] approach to model k = 5 topics (that is, frames) in each domain corpus. From each frame, we extract the set v of its top l = 20 terms. We also extract the set of all unique nouns in T. We define a cluster as the intersection of v with the set of unique nouns in T. We thus generate k clusters, each representing a unique frame.
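A sketch of this clustering step using scikit-learn's LDA implementation (one possible realization; the noun set is assumed to have been extracted from T beforehand with a part-of-speech tagger):

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    def frame_clusters(documents, corpus_nouns, k=5, top_terms=20):
        # documents: list of article texts forming the overall domain corpus T.
        # corpus_nouns: set of unique nouns appearing in T.
        vectorizer = CountVectorizer(stop_words="english")
        dtm = vectorizer.fit_transform(documents)
        lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(dtm)

        vocab = vectorizer.get_feature_names_out()
        clusters = []
        for topic in lda.components_:
            top = [vocab[i] for i in topic.argsort()[::-1][:top_terms]]
            # A cluster is the intersection of a frame's top terms with the nouns of T.
            clusters.append(set(top) & corpus_nouns)
        return clusters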

Annual Cluster Polarity


For each cluster c, we are interested in arriving at a vector of m annual polarities, i.e.,
for each year i, 1 ≤ i ≤ m in the domain period of interest.
Let xc be the set of all nouns in c. For each noun v ∈ xc , we use the Stanford
dependency parser [22] to identify all adjectives (without removing duplicates) that
describe v in Ti . We extract the polarity vectors for each of these adjectives from P as
the matrix Ai. Ai therefore has n rows, one for each adjective so identified, and five columns (see the Polarity of Adjectives section). We estimate the annual cluster polarity of c as the vector of column-wise averages of Ai. Let Pc = {P1, P2, . . . , Pm} be the set of annual cluster polarities so obtained.

Table 3. The entries of Table 2 depicted as a heatmap for visual illustration. [Heatmap rendering of the probability vectors and rarities from Table 2; the numerical values are identical to those reported in Table 2.]
Annual polarities for representative clusters from each of our domains are shown in figures 11 to 17.
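The computation of an annual cluster polarity can be sketched as follows. For brevity, the sketch substitutes spaCy's dependency parser (and its 'amod' relation) for the Stanford dependency parser used in our pipeline, and assumes the distribution P of the Polarity of Adjectives section is available as a dictionary of six-element vectors:

    import numpy as np
    import spacy

    nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

    def annual_cluster_polarity(cluster_nouns, yearly_articles, P):
        # yearly_articles: list of article texts for one year Ti.
        # Returns the five-element vector of column-wise averages over Ai.
        rows = []
        for doc in nlp.pipe(yearly_articles):
            for token in doc:
                # Adjectival modifiers whose head noun belongs to the cluster.
                if token.dep_ == "amod" and token.head.lemma_.lower() in cluster_nouns:
                    vec = P.get(token.lemma_.lower())
                    if vec is not None:
                        rows.append(vec[:5])  # drop the rarity entry
        if not rows:
            return np.zeros(5)
        return np.asarray(rows).mean(axis=0)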

Defining Framing Changes


Since language and human behavior are not strictly deterministic, the measurement of
any temporally disparate pair of news corpora using adjective polarity (or any other
numerical metric) would result in different representative values of the two corpora.
Therefore, in this sense, any pair of news corpora can be said to have undergone a
framing change.
Further, individual metrics are susceptible to noisy readings due to imprecise data
and measurement. In particular, such an effect may cause sudden isolated spikes
between successive measurements. For example, in figure 11, during the period between
2005 and 2006, whereas classes 1, 3, 4, and 5 changed little, class 2 showed a substantial
change.
This motivates the question of how a framing change is defined, in the context of our
computational measurements. The usual social science definition [14] is that a framing
change is a shift in the way that a specific topic is presented to an audience. To isolate
such changes computationally, we use the following key observations from ground truth
framing changes: (i) framing changes take place as trends that are consistent over at least k years, and (ii) framing changes must be consistent across multiple measurements.
Our aim in this paper is to begin from a set of time series such as the ones in
figure 11, and isolate such trends. The requirement motivated by our first condition,
namely, that framing changes must last at least k years, is easy to satisfy by imposing
such a numerical threshold.
To satisfy the requirement motivated by our second observation, we rely on
correlations between different measurements, as described in the section below.

Detecting Framing Changes using Periods of Maximum Correlation
Our five polarity classes serve as measures of framing within a domain. We conceive of
a framing change as a trend, consistent across our five polarity classes, over a period of
some years.
We describe our intuition and approach to detecting framing changes below. Firstly,
we show that the frequency with which adjectives occur in articles varies both by
domain, and in different years within a domain.
Figure 1 depicts the average number of adjectives per article for each of our domains
over the years in their respective periods of interest. We note that this count serves also
as a measure of how subjective news publishing in a domain is, since adjectives are
indicative of how events are framed.
Notice that in the domain LGBT rights, the peak in this measure immediately
precedes a framing change from an earlier study [25]. Whereas we do not claim that this
correlation is true for all domains, we posit that it motivates the utility of adjective
polarity in the study of framing changes.
Although the volume of adjectives used per article varies dramatically (by up to 30%), we find that the variation in our annual cluster polarity between successive years is generally less than 1%. However, through a consistent trend
lasting multiple years, our measure of annual polarity can change (increase or decrease)

[Fig 1 panels (top to bottom): Drones (2003 to 2011), Immigration (2000 to 2017), and LGBT Rights (1996 to 2014); y-axes show the average number of adjectives per article.]
Fig 1. The average number of adjectives per article, shown for our domains over their
respective periods of interest. This metric serves as a measure of the subjectivity of
news in a domain. Notice that in the domain LGBT rights, the peak in this measure
immediately precedes a framing change identified in an earlier study [25].

[Fig 2 panels (top to bottom): Obesity (1990 to 2008), Smoking (1990 to 2005), and Surveillance (2010 to 2016); y-axes show the average number of adjectives per article.]


Fig 2. The average number of adjectives per article, shown for our domains over their
respective periods of interest. This metric serves as a measure of the subjectivity of
news in a domain.



cumulatively by up to 5% (see figure 11 for an example). We identify a framing change
based on such a cumulative trend.
We now consider the problem of fusing estimates from our five measures of annual
cluster polarity. Consider the change in polarity of classes 1, 3, 4, and 5 between 2005
and 2006 in figure 11, as against the change in class 2. As mentioned earlier, classes 1,
3, 4, and 5 changed little, whereas class 2 showed a substantial change.
In contrast, we note that in the period 2001 to 2013, a consistent trend was
observable across all five classes, with substantial reductions in classes 2 and 3, and a
notable corresponding increase in class 5. We exploit correlations between the changes
in our five classes to identify framing changes.
Accordingly, we use Pearson correlation [26] between our classes as a measure of
trend consistency. Let a given domain have m years in its period of interest. We
generate all possible contiguous subsets of the period of interest, namely, Ti–j, where 1 ≤ i ≤ j ≤ m, and Ti–j denotes the domain corpus from year i to year j.
Let C =  {C1 , . . . , C5 } be the set of class vectors for this domain subset, where
C1i
C1i+1  i
C1 =   . . . , C1 is the value of class 1 for year i, and similarly for C2 , . . . , C5 .

C1j
To measure the correlation of subset Ti–j , we compute its matrix of correlation
coefficients [27] K. We reshape K into a vector of size f × 1 where f = i ∗ j, and
evaluate its median, l. We find the maximum value of l, lmax , over all possible values of
i and j. We denote the values of i and j corresponding to lmax as imax and jmax . We
return Timax –jmax as our period of maximum correlation (PMC).
We note that the smaller the duration of a PMC, the greater the possibility that our
class vectors may have a high correlation in the period due to random chance. To
compensate for this effect, we employ a threshold whereby a period is not considered as
a candidate for the domain PMC unless it lasts at least y years. We uniformly employ a
value of y = 3 in this paper.
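The search for the period of maximum correlation can be sketched directly from this description (a minimal implementation over the five annual class-polarity series, taking the median of the flattened correlation matrix as the consistency score):

    import numpy as np

    def period_of_maximum_correlation(class_vectors, min_years=3):
        # class_vectors: array of shape (5, m) holding the five annual polarity
        # series C1, ..., C5 over the m years of the period of interest.
        m = class_vectors.shape[1]
        best = (None, None, -np.inf)
        for i in range(m):
            for j in range(i + min_years - 1, m):
                window = class_vectors[:, i:j + 1]
                K = np.corrcoef(window)         # 5 x 5 matrix of correlation coefficients
                score = np.median(K.flatten())  # median of the coefficients
                if score > best[2]:
                    best = (i, j, score)
        return best  # (i_max, j_max, l_max), with zero-based year indices

In practice, a window in which some class vector is constant yields undefined correlation coefficients and would need to be skipped or handled explicitly.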
Our approach thus identifies polarity drifts that are both correlated (quantitatively
measured by correlations between different measures of polarity) and sustained (by the
imposition of a threshold of duration). We point out that our approach filters out
isolated drifts in individual polarity measures, since such drifts are uncorrelated across
multiple measures. Further, we note that the magnitude of individual drifts matters
only indirectly to our approach, to the extent that a larger drift, if consistent across
multiple polarity measures, may have higher correlation than a smaller drift that is also
correlated.
A block diagram depicting our overall approach is shown in figure 3.

Quantitative Evaluation
We now discuss a partial quantitative evaluation of our approach using a
Precision-Recall analysis. Our analysis relies on ground truth annotation of framing
changes, as detailed in the section below.
We are unable to conduct a full precision-recall analysis over all domains due to the
limitations we discuss in the following sections, as well as in the Qualitative Analysis
and Discussion section. However, we expect that our partial analysis is representative of
the general performance of the approach.

[Fig 3 block diagram nodes: Seed Datasets, Stanford Parser, Adjectives, Compute class frequencies, Adjective Distribution; Domain Corpora, LDA, Frames; Annual Polarities, Compute correlations, PMC.]

Fig 3. A block diagram illustrating our approach. Our adjective distribution is computed using per-class frequencies of occurrence for each adjective in the seed dataset(s). We use this distribution to compute annual cluster polarities of frames obtained using LDA from our domain corpora.

Ground Truth Annotation


We label a ground truth for each domain, marking years corresponding to framing
changes as positives, and other years as negatives. We primarily obtain our positives
using the findings of large-scale surveys from earlier research.
In order to do so, we study the literature pertaining to framing changes in the
domains we examine. We identify large-scale studies conducted by reputed
organizations such as the National Cancer Institute (NCI) [28], the Columbia
Journalism Review (CJR) [29], Pew Research [30], and so on. These studies examine
news and media publishing in a particular domain over a period of time, as we do, and
manually identify changes in the framing of domain news during these periods.
The studies we rely on for ground truth sometimes provide quantitative justification
for their findings. For example, the NCI monograph on the framing of smoking news
identifies the number of pro and anti tobacco control frames before and after a framing
change [28]. These studies therefore provide an expert annotation of framing changes in
our domains, for the periods we examine. Details of each study we used and their
findings are reported in the Results section.
By demonstrating substantial agreement between the results of our approach and
those of earlier ground truth surveys, we establish our claim that our approach may be
used to automatically identify framing changes in domain news publishing.

Precision-Recall Analysis
To gain confidence that our approach successfully identifies framing changes, we
conduct a precision-recall analysis on our data. We consider each year in each domain
as a data point in our analysis. We calculate overall precision and recall over all data
points in our domains. We consider a data point a true positive if both a ground truth study and our approach label it as corresponding to a framing change, and a true negative if both label it as not doing so. We refer to a data point that was labeled as a positive (or negative) by our approach, but which is a negative (or positive) according to the relevant ground truth survey, as a false positive (or false negative), respectively.



We calculate precision as P = tp/(tp + fp) and recall as R = tp/(tp + fn), where tp, fp, and fn are the numbers of true positives, false positives, and false negatives, respectively.
For some domains, we were unable to identify an earlier survey studying the framing
of news publishing in the domain. We exclude these domains from our precision-recall
analysis. However, we show that in these cases, our estimated PMCs foreshadow events
of substantial public and legislative import. Since framing changes have been shown to
be associated with such public and legislative events [12], we argue that this provides
some measure of validation of our estimated PMCs.

Fig 4. Our estimated clusters for the domain abortion. Each cluster is said to
represent a unique frame. The frame discussed in cluster 1 (characterized by the terms
‘abortion’ and ‘ban’) concerns a proposed ban on abortion. We analyze this cluster, and
find that our estimated PMC (Figure 15) coincides with the period immediately
preceding the Partial Birth Abortion Act of 2003.

Fig 5. Our estimated clusters for the domain drones. Each cluster is said to represent
a unique frame. The frame discussed in cluster 1 concerns the use of drones against
terrorist targets. Our analysis of this cluster returns a PMC of 2009 to 2011 (Figure 17).
Our PMC immediately foreshadows the Federal Aviation Administration’s
Modernization and Reform Act of 2012.



Fig 6. Our estimated clusters for the domain LGBT Rights. Each cluster is said to
represent a unique frame. The frame discussed in cluster 3 discusses the subject of
same-sex marriage, and in particular, judicial interest in this topic. We analyze this
cluster and estimate two PMCs of nearly identical correlation score (2006 to 2008 and
2013 to 2015 Figure 14). The PMC of 2013 to 2015 coincides exactly with the Supreme
Court judgment of 2015 that legalized same-sex marriage in the entire US.

Fig 7. Our estimated clusters for the domain obesity. Each cluster is said to represent
a unique frame. We posit that cluster 2 (characterized by the terms ‘food’, ‘diet’, and
‘make’) represents societal causes of obesity (see the Obesity section). We analyze this
cluster and estimate a PMC of 2005 to 2007 (Figure 13). Our PMC agrees with the
findings of an earlier human survey [2].



Fig 8. Our estimated clusters for the domain smoking. Each cluster is said to represent
a unique frame. The frame of cluster 3, characterized by the terms ‘cancer’ and ‘smoke’,
discusses the health risks associated with smoking. We analyze this cluster and estimate
a PMC of 2001 to 2003 (Figure 11). Our PMC coincides exactly with an earlier
monograph from the National Cancer Institute (NCI) that describes a progression
towards tobacco control frames in American media between 2000 and 2003.

Fig 9. Our estimated clusters for the domain surveillance. Each cluster is said to
represent a unique frame. The frame of cluster 3, characterized by the terms ‘national’,
‘security’, and ‘agency’, discusses the Snowden revelations of 2013. We analyze this
cluster and estimate a PMC of 2013 to 2014 (Figure 12). Our PMC coincides exactly
with the period following the Snowden revelations. Additionally, we note that the
Columbia Journalism Review [29] found that following the Snowden revelations, news
coverage of Surveillance changed to a narrative focusing on individual rights and digital
privacy [12].



Fig 10. Our estimated clusters for the domain Immigration. Each cluster is said to
represent a unique frame. The frame of cluster 2 discusses the waning of asylum grants,
increased border refusals and the final 2002 white paper on “Secure Borders, Safe
Haven.” We analyze this cluster and estimate a PMC of 2000 to 2002 (Figure 16). Our
PMC coincides exactly with the period immediately preceding the government white paper.

Results
We find that our periods of maximum correlation correlate substantially with framing
changes described in earlier surveys [2, 29, 31, 32], and also foreshadow legislation.
Our computed class vectors are depicted in figures 11 to 17. We discuss each domain
below.

Smoking
The NCI published a monograph discussing the influence of the news media on tobacco
use [28]. On page 337, the monograph describes how, during the period 2001 to 2003,
American news media had progressed towards tobacco control frames. It states that
55% of articles in this period reported progress on tobacco control, whereas only 23%
reported setbacks.
In contrast, the monograph finds (also on page 337) that between 1985 and 1996,
tobacco control frames (11) were fairly well balanced with pro-tobacco frames (10). We
extracted a dataset of over 2,000 articles from 1990 to 2007.
Our approach returns a PMC of 2001 to 2003 (see figure 11) for this domain. Since
no studies cover the period 1997 to 2000 [28], we interpret the findings described in the
monograph to imply that the change towards tobacco control frames predominantly
began in 2000, and ended in 2003. This domain therefore contributes three true
positives (2001 to 2003) and one false negative (2000), with no false positives, to our
precision-recall analysis.

Surveillance
The CJR [29] found that following the Snowden revelations, news coverage of
Surveillance in the US changed to a narrative focusing on individual rights and digital
privacy [12]. We compiled a dataset consisting of approximately 2,000 surveillance
articles from the New York Times for the period 2010 to 2016.

Fig 11. Annual polarities for cluster 3, (characterized by the terms ‘cancer’ and
‘smoke’), from Figure 8 from the domain smoking for the classes 1 to 5. The PMC is
shown with solid lines in square markers, and coincides exactly with a framing change
described in an earlier NCI monograph.



Our class vectors for this domain are shown in figure 12. We obtain a PMC of 2013
to 2014 for this period, corresponding closely with the ground truth framing change.
The trends in our class vectors are indicative of the change. As can be seen from the
figure, positivity (measured by class 5) drops markedly, together with a simultaneous
increase in negativity (class 1) and neutrality (classes 2 and 3). Class 4 remains close to
constant during this period and thus does not affect our hypothesis.
We interpret the findings of [29] to refer primarily to 2013, the year in which the
revelations were made, and the following year, 2014. Whereas other interpretations may
conclude a longer framing change, they must necessarily include this period. This
domain therefore contributes two true positives (2013 and 2014) with no false positives
or negatives to our quantitative evaluation.

Obesity
Kim and Willis [2] found that the framing of obesity news underwent changes between
the years 1997 and 2004. During this period, Kim and Willis found that the fraction of
news frames attributing responsibility for obesity to social causes increased significantly.
Prior to this period, obesity tended to be framed as an issue of individual responsibility.
For example, obesity news after the year 2000 has often criticized food chains for their
excessive use of sugar in fast food, as shown in the NYT snippet in the Introduction and
Contributions section. We compiled a dataset of over 3,000 articles from the New York
Times (since Kim and Willis [2] restrict their study to Americans) from 1990 to 2009.
The clusters we estimate for this domain are shown in Figure 7. Cluster 2 addresses
possible causes of obesity, with a particular focus on dietary habits. We posit that this
cluster represents societal causes more than individual ones (since individual causes, as
shown in the NYT snippet of the Introduction and Contributions section tend to discuss
topics such as fitness and sedentary lifestyles, as opposed to food content). We observe
that the PMC for this domain (2005 to 2007) is characterized by increased positivity,
shown by classes 4 and 5, and decreased negativity (class 1). Our results for this
domain thus agree with the findings of Kim and Willis [2].
We were unable to use this domain in our precision-recall analysis, since Kim and
Willis, to the best of our knowledge, do not specify a precise period during which the
framing change took place.
However, since Figures 2 and 3 of Kim and Willis [2] show a dramatic increase of
social causes in 2004, and a corresponding marked decline of individual causes, we
conclude a substantial agreement between their findings and our results.

LGBT Rights
We compiled a dataset of over 3,000 articles from the period 1996 to 2015 in this domain.
Figure 6 depicts our estimated clusters. Cluster 3 represents a frame that discusses the
subject of same-sex marriage and its legality. We note that the Supreme Court ruled to
legalize same-sex marriages in the US in the year 2015. Our class vectors for this domain
are shown in figure 14. We obtained two PMCs with nearly identical correlation scores
(0.999 for the period 2006 to 2008, and 0.989 for the period 2013 to 2015). Figure 14
highlights the period 2013 to 2015 immediately preceding the judicial interest of 2015.
We were unable to identify a prior study that discusses the framing of LGBT news
over our entire period of interest. However, we use the findings reported in Gainous et
al. [32] as our ground truth for this domain. Gainous et al. studied the framing of
LGBT related publishing in the New York Times over the period 1988 to 2012, and
found a dramatic increase in equality frames, from approximately 25 in 2008 to approximately 110 in 2012. Correspondingly, our findings in Figure 14 show that
between 2008 and 2012, there was a dramatic increase in the measures of classes 4 and 5

Fig 12. Annual polarities for a representative cluster (characterized by the terms
‘national‘, ‘security’, and ‘agency’) from the domain surveillance for the classes 1 to 5.
The PMC is shown with solid lines in square markers.

Fig 13. Annual polarities for cluster 2 (characterized by the terms ‘diet’, ‘food’, and
‘make’) from Figure 7 from the domain obesity for the classes 1 to 5. The PMC is shown
with solid lines in square markers. We posit that this cluster represents societal causes
of obesity (see the Obesity section). We observe that the PMC for this cluster (2005 to
2007) agrees with the findings of Kim and Willis [2].



(representing positivity), and a marked reduction in the measures of classes 1 and 2 (representing negativity).
For uniformity, we imposed a threshold of y = 3 years to identify a PMC. As we
explain in the section on Detecting Framing Changes using Periods of Maximum
Correlation, high correlations are less likely for more extended periods. Therefore, the period 2008 to 2012 is unsurprisingly not our PMC at the chosen uniform threshold. However,
given the marked increase in positivity and corresponding decrease in negativity in this
period, we posit that the period has high correlation. Also, whereas we were unable to
find a study covering the period 2013 to 2015 to use as ground truth, we note that our
PMC preceded major judicial interest in the domain.
We therefore rely on Gainous and Rhodebeck [32] for ground truth in this domain,
and note that the trend towards increased positivity (and reduced negativity) in
Figure 14 began in 2008 and ended in 2013. We therefore conclude that our measures
return four true positives and one false positive for this domain.

Abortion
The Partial-Birth Abortion Ban Act was enacted in 2003. We obtained 248 articles for this domain for the period 2000 to 2003, and obtain a PMC of 2001 to 2003, as shown in figure 15.

Immigration
We study the framing of immigration news in the United Kingdom. We obtained about
3,600 articles on the subject of Immigration from the Guardian API for the period 2000
to 2017. For this domain, we carried out our analysis on the article titles (rather than
the full text). Since the Guardian returns full length articles, we found that this design
choice allows us to produce a more focused domain corpus than the one generated by
the full article text. We depict our estimated class vectors and PMC in figure 16.
We analyze the frame of cluster 2 in Figure 10. This cluster deals with the issue of
asylum seekers to the United Kingdom. In the period immediately before the year 2000, asylum claims to the United Kingdom had reached a new peak of 76,040 [33]. This event coincided with a high-profile terrorist act by a group of Afghan asylum seekers [33].
These events resulted in increased border refusals and the final 2002 white paper on
“Secure Borders, Safe Haven.” We estimate a PMC of 2000 to 2002 (Figure 16). Our
PMC coincides exactly with the period immediately preceding the government white paper.

Drones
We obtained nearly 4,000 articles on this domain for the period 2003 to 2012. We
obtain a PMC of 2009 to 2011 for this domain, as shown in Figure 17.
Our PMC immediately foreshadows the Federal Aviation Administration’s
Modernization and Reform Act of 2012.

Predictive Utility
The aforementioned two domains (immigration and drones) highlight the predictive
utility of news framing. Whereas we did not find earlier surveys that coincide with our
PMCs for these domains, we note that these PMCs foreshadowed substantial legislative
activity. This observation suggests that PMCs estimated through real-time monitoring
of domain news may yield predictive utility for legislative and commercial activity.

Fig 14. Annual polarities for cluster 3, characterized by the terms ‘gay’, ‘rights’, and
‘marriage’, in Figure 6 from the domain LGBT Rights for the classes 1 to 5. We obtain
two PMCs with nearly identical correlation scores, namely, 2006 to 2008 and 2013 to
2015. The PMC of 2013 to 2015 is shown with solid lines in square markers,
immediately preceding the judicial interest of 2015.



Fig 15. Annual polarities for cluster 1 (characterized by the terms ‘abortion’ and ‘ban’)
from Figure 4 from the domain abortion for the classes 1 to 5. The PMC is shown with
solid lines in square markers (2001 to 2003) and foreshadows the partial birth abortion
ban of 2003.



Fig 16. Annual polarities for cluster 2 (discussing asylum grants) from Figure 10 from
the domain immigration for the classes 1 to 5. The PMC is shown with solid lines in
square markers, and foreshadows the “Secure Borders, Safe Haven” white paper of 2002.



Fig 17. Annual polarities for cluster 1 (discussing drone strikes) from Figure 5 from
the domain drones for the classes 1 to 5. The PMC is shown with solid lines in square
markers, and immediately foreshadows the Federal Aviation Administration’s
Modernization and Reform Act of 2012. This finding suggests the predictive utility of
framing change detection for legislative activity.



Overall Precision and Recall
We obtain an overall precision of 0.90 as well as a recall of 0.90. That is, we successfully identify 90% of the ground truth positives, and only 10% of the positives we identify are false positives.
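Concretely, tallying the per-domain counts reported above (smoking: three true positives and one false negative; surveillance: two true positives; LGBT rights: four true positives and one false positive) gives tp = 9, fp = 1, and fn = 1, so that P = 9/(9 + 1) = 0.90 and R = 9/(9 + 1) = 0.90.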
Further, we point out that our false positives generally either precede or succeed a
ground truth framing change. Therefore, we posit that such false positives may be due
to imprecision in measurement rather than any considerable failure of our approach.
Our results demonstrate substantial agreement with ground truth in domains for
which prior surveys have studied framing changes. In domains for which we did not find
such surveys, we demonstrate that our PMCs foreshadow periods of substantial public
and legislative import. We posit, therefore, that our approach successfully identifies
framing changes.

Qualitative Analysis and Discussion


Whereas we provide a partial quantitative evaluation of our approach using precision
and recall in the preceding sections, we confront substantial difficulties in uniformly
conducting such evaluations across all of our domains. In this section, we qualitatively
evaluate our results in the context of these limitations.
Framing, and framing changes, have in general been studied from the lens of
identifying a general trend from a particular data source. Studies often describe such
trends without stating a hard beginning or end of the trend period [28, 29].
The periods we analyze here are relatively long (often lasting almost two decades).
The available human surveys, not surprisingly, are limited in scope and do not cover the
same periods in their entirety. As a result, we observe missing years during which
computational methods produce a belief that cannot be verified using a human survey,
since no human survey covers such years. Therefore, a quantitative precision-recall style
analysis becomes difficult or impossible to conduct.
Further, the language used in existing studies is not always sufficiently precise to support the interpretation of a fixed set of positives and negatives. For example, the Columbia Journalism Review uses the phrase “after the Snowden revelation” to describe the change in media attitude. The Snowden revelations were made in June 2013. Our estimated PMC is 2013 to 2014.
The NCI monograph on smoking states that between 1985 and 1996, framing was
balanced between pro and anti tobacco control, and in 2001 to 2003, framing favored
tobacco control. Our PMC is 2001 to 2003. However, there are no studies that we know
of that cover the period 1996 to 2000.
Likewise, the Kim and Willis study on Obesity [2] that was published in 2007 states
that the change happened “in recent times,” but shows quantitative measures only until
2004. The measures that Kim and Willis compute show that the frequency of societal
causes increases sharply in 2004. Our PMC over the period 1990 to 2009 (with our
uniformly imposed threshold of y = 3) is 2005 to 2007. The correlation for the period
2004 to 2007 is high but it is not the PMC at a threshold of y = 3.
These examples show that performing a precision-recall analysis based on prior work
is problematic, since obtaining a set of positives and negatives from prior studies in the
sociological and communications literature involves interpretation.
Given that the data sources and coverage between our analysis and that of prior
surveys are usually quite different, the correlations we obtain appear quite substantial.
However, quantitative evaluation remains challenging for the reasons we point out.
This paper follows the spirit of recent work [11, 13, 15] in seeking to develop the
study of framing into a computational science. We acknowledge that our dataset
collection and methods may undergo refinement to tackle broader ground truth data, of



a wider temporal and geographical scope. Nonetheless, we posit that our methods and
results have scientific value, and hope that future work will provide greater coverage of
ground truth.
Please note that the underlying data preparation requires social science expertise
and cannot be effectively crowdsourced via a platform such as Mechanical Turk. We
therefore hope that our work catches the interest of social scientists and leads them to
pursue more comprehensive studies of framing in news media that would enable
improvements in computational methods.

Conclusion
We highlight a problem of significant public and legislative importance, framing change
detection. We contribute an unsupervised natural language based approach that detects
framing change trends over several years in domain news publishing. We identify a key
characteristic of such changes, namely, that during frame changes, the polarity of
adjectives describing cooccurring nouns changes cumulatively over multiple years. Our
approach agrees with and extends the results of earlier manual surveys. Whereas such
surveys depend on human effort and are therefore limited in scope, our approach is fully
automated and can simultaneously run over all news domains. We contribute the
Framing Changes Dataset, a collection of over 12,000 news articles from seven domains
in which framing has been shown to change by earlier surveys. We will release the
dataset with our paper. Our work suggests the predictive utility of automated news
monitoring, as a means to foreshadow events of commercial and legislative import.
Our work represents one of the first attempts at a computational modeling of
framing and framing changes. We therefore claim that our approach produces promising
results, and that it will serve as a baseline for more sophisticated analysis over wider
temporal and geographical data.

Appendix: Sample Correlations


Correlations for all subsets are shown in Table 4. The PMC (start index 1, end index 3, correlation 0.99) is shown in bold; two-year subsets are excluded by our y = 3 threshold.

Table 4. All correlations for cluster 2 of the domain Immigration.


Correlation Start Index End Index
1 1 2
1 2 3
1 3 4
1 4 5
1 5 6
1 6 7
1 7 8
1 8 9
1 9 10
1 10 11
1 11 12
1 12 13
1 13 14
1 14 15
1 15 16
1 16 17
1 17 18
0.99 1 3

0.98 10 12
0.98 12 14
0.96 12 15
0.93 12 16
0.93 9 12
0.88 11 13
0.88 9 11
0.87 11 14
0.86 11 16
0.85 11 15
0.85 8 12
0.84 8 10
0.83 7 9
0.82 15 17
0.82 14 16
0.82 10 13
0.81 7 12
0.81 9 13
0.79 10 14
0.79 10 16
0.79 9 14
0.78 9 16
0.78 10 15
0.77 9 15
0.77 8 16
0.77 8 13
0.76 8 15
0.76 8 14
0.75 12 17
0.75 5 7
0.75 12 18
0.74 11 18
0.74 11 17
0.72 6 8
0.72 10 17
0.72 10 18
0.72 7 11
0.72 13 15
0.71 7 10
0.70 8 11
0.69 2 4
0.68 7 13
0.68 4 6
0.67 3 5
0.67 16 18
0.66 7 14
0.66 9 17
0.66 9 18
0.66 14 17
0.65 8 17

0.65 7 16
0.65 8 18
0.64 7 15
0.63 6 11
0.62 6 10
0.62 6 12
0.62 6 9
0.61 6 14
0.61 6 13
0.61 5 13
0.61 5 12
0.60 5 8
0.60 5 14
0.60 7 17
0.60 6 16
0.60 4 12
0.59 7 18
0.59 4 13
0.59 5 16
0.59 2 5
0.58 6 15
0.58 5 11
0.58 5 15
0.58 5 10
0.57 5 9
0.57 4 16
0.57 4 14
0.56 2 13
0.56 2 12
0.56 4 17
0.56 15 18
0.56 4 18
0.56 3 13
0.56 3 6
0.56 3 12
0.56 1 5
0.55 4 15
0.55 5 17
0.55 5 18
0.55 3 18
0.55 2 18
0.55 3 17
0.55 2 17
0.54 2 16
0.54 3 16
0.54 6 17
0.54 6 18
0.53 2 14
0.53 3 14
0.53 2 15

0.53 3 15
0.52 1 12
0.52 1 13
0.51 1 18
0.51 1 17
0.50 13 17
0.50 4 7
0.49 2 6
0.49 1 16
0.49 1 6
0.48 1 14
0.48 1 15
0.47 13 18
0.46 13 16
0.44 4 8
0.43 1 4
0.42 4 9
0.40 4 10
0.40 4 11
0.39 14 18
0.36 3 7
0.36 2 11
0.36 1 7
0.36 3 11
0.35 3 8
0.36 2 7
0.35 2 8
0.35 2 9
0.34 1 8
0.34 2 10
0.34 3 9
0.34 3 10
0.34 1 11
0.34 1 9
0.32 1 10

Ethics Statement
Our study involved no human or animal subjects.

Funding Statement
CS has a commercial affiliation to Amazon. The funder provided support in the form of
salaries for this author, but did not have any additional role in the study design, data
collection and analysis, decision to publish, or preparation of the manuscript. The
specific roles of these authors are articulated in the ‘author contributions’ section.



Competing Interests Statement
The above commercial affiliation does not alter our adherence to PLOS ONE policies on
sharing data and materials.

Author Contributions
KS and CS conceived the research and designed the method. KS prepared the datasets
and performed the analysis. KS and MPS designed the evaluation approach. KS, CS,
and MPS wrote the paper.

References
1. Gunnars K. Ten Causes of Weight Gain in America; 2015.
https://www.healthline.com/nutrition/10-causes-of-weight-gain#section12.

2. Kim SH, Willis A. Talking about Obesity: News Framing of Who Is Responsible
for Causing and Fixing the Problem. Journal of Health Communication.
2007;12(4):359–376.
3. Flegal K, Carroll M, Kit B, Ogden C. Prevalence of Obesity and Trends in the
Distribution of Body Mass Index Among US Adults, 1999–2010. Journal of the
American Medical Association. 2012;307(5):491–497.

4. Chong D, Druckman J. Framing Theory. Annual Review of Political Science. 2007;10:103–126.
5. de Vreese C. News framing: Theory and typology. Information Design Journal.
2005;13(1):51–62.

6. Constantin L. Facebook ID Leak hits Millions of Zynga Users; 2010. http://tinyurl.com/2bqwoxq.
7. Crossley R. Facebook ID Leak hits Millions of Zynga Users; 2011. http://www.develop-online.net/news/facebook-id-leak-hits-millions-of-zynga-users/0107956.
8. Fitzsimmons C. Facebook And Zynga Sued Over Privacy; 2014. http://www.adweek.com/digital/facebook-zynga-sued/.
9. US Congress. Personal Data Protection and Breach Accountability Act of 2014;
2014. https://www.congress.gov/bill/113th-congress/senate-bill/1995.
10. Benford R, Snow D. Framing Processes and Social Movements: An Overview and
Assessment. Annual Review of Sociology. 2000;26(1):611–639.
11. Card D, Boydstun A, Gross J, Resnik P, Smith N. The Media Frames
Corpus: Annotations of Frames across Issues. In: Proceedings of the 53rd Annual
Meeting of the Association for Computational Linguistics and the 7th
International Joint Conference on Natural Language Processing (Volume 2: Short
Papers). vol. 2; 2015. p. 438–444.
12. Sheshadri K, Singh MP. The Public and Legislative Impact of
Hyper-Concentrated Topic News. Science Advances. 2019;5(8).



13. Tsur O, Calacci D, Lazer D. A Frame of Mind: Using Statistical Models for
Detection of Framing and Agenda Setting Campaigns. In: Proceedings of the
53rd Annual Meeting of the Association for Computational Linguistics and the
7th International Joint Conference on Natural Language Processing (Volume 1:
Long Papers); 2015. p. 1629–1638.
14. Entman R. Framing: Toward Clarification of a Fractured Paradigm. Journal of
Communication. 1993;43(4):51–58.
15. Alashri S, Tsai JY, Alzahrani S, Corman S, Davulcu H. “Climate Change”
Frames Detection and Categorization Based on Generalized Concepts. In:
Proceedings of the 10th IEEE International Conference on Semantic Computing
(ICSC); 2016. p. 277–284.
16. NYT. Developer APIs; 2016. http://developer.nytimes.com/.
17. Wikipedia. The New York Times; 2001.
https://en.wikipedia.org/wiki/The_New_York_Times.
18. The Guardian. Guardian Open Platform; 2016.
http://open-platform.theguardian.com/.
19. Wikipedia. The Guardian; 2002. https://en.wikipedia.org/wiki/The_Guardian.
20. King G, Schneer B, White A. How the News Media Activate Public Expression
and Influence National Agendas. Science. 2017;358(6364):776–780.
21. Sheshadri K, Ajmeri N, Staddon J. No Privacy News is Good News: An Analysis
of New York Times and Guardian Privacy News from 2010—2016. In:
Proceedings of the 15th Privacy, Security and Trust Conference. Calgary, Alberta,
Canada; 2017. p. 159–167.
22. Socher R, Perelygin A, Wu J, Chuang J, Manning C, Ng A, et al. Recursive Deep
Models for Semantic Compositionality Over a Sentiment Treebank. In:
Proceedings of the 2013 Empirical Methods in Natural Language Processing
Conference (EMNLP). Seattle, WA: Association for Computational Linguistics;
2013. p. 1631–1642.
23. Melo GD, Bansal M. Good, Great, Excellent: Global Inference of Semantic
Intensities. Transactions of the Association for Computational Linguistics.
2013;1:279–290.
24. Blei D, Ng A, Jordan M. Latent Dirichlet Allocation. Journal of Machine
Learning Research. 2003;3:993–1022.
25. Engel S. Frame Spillover: Media Framing and Public Opinion of a Multifaceted
LGBT Rights Agenda. Law and Social Inquiry. 2013;38:403–441.
26. Benesty J, Chen J, Huang Y, Yiteng C, Cohen I. Pearson Correlation Coefficient.
In: Noise reduction in speech processing. Springer; 2009. p. 1–4.
27. Mathworks. Correlation Coefficients; 2019.
https://www.mathworks.com/help/matlab/ref/corrcoef.html.
28. National Cancer Institute. How the News Media Influence Tobacco Use; 2019.
https://cancercontrol.cancer.gov/brp/tcrb/monographs/19/m19_9.pdf.
29. Vernon P. Five Years Ago, Edward Snowden Changed Journalism; 2018.
https://www.cjr.org/the_media_today/snowden-5-years.php.



30. Pew Research. The State of Privacy in post-Snowden America; 2016.
31. Cummings M, Proctor R. The Changing Public Image of Smoking in the United
States: 1964–2014. Cancer Epidemiology and Prevention Biomarkers.
2014;23:32–36.
32. Gainous J, Rhodebeck L. Is Same-Sex Marriage an Equality Issue? Framing
Effects Among African Americans. Journal of Black Studies. 2016;47(7):682–700.
33. Wikipedia. History of UK immigration control; 2019.
https://en.wikipedia.org/wiki/History_of_UK_immigration_control.

March 3, 2020 34/34


Revised Manuscript with Track Changes

Detecting Framing Changes in Topical News Publishing


Karthik Sheshadri1* , Chaitanya Shivade2 , Munindar P. Singh1 ,

1 Department of Computer Science, North Carolina State University


2 Amazon

* kshesha@ncsu.edu

Abstract
Changes in the framing of topical news have been shown to foreshadow significant
public, legislative, and commercial events. Automated detection of framing changes is
therefore an important problem, which existing research has not considered. Previous
approaches are manual surveys, which rely on human effort and are consequently
limited in scope. We make the following contributions. We systematize discovery of
framing changes through a fully unsupervised computational method that seeks to
isolate framing change trends over several years. We demonstrate our approach by
isolating framing change periods that correlate with previously known framing changes.
We have prepared a new dataset, consisting of over 12,000 articles from seven news
topics or domains in which earlier surveys have found framing changes. Finally, our
work highlights the predictive utility of framing change detection, by identifying two
domains in which framing changes foreshadowed substantial legislative activity, or
preceded judicial interest.

“For nearly four decades, health and


fitness experts have prodded and
cajoled and used other powers of
persuasion in a futile attempt to
whip America’s youngsters into
shape.”

The New York Times, 1995

“The New York City Health


Department has embarked on a new
campaign to persuade processed
food brands to decrease sugar
content in a bid to curb obesity.”

The New York Times, 2015

Introduction and Contributions


To motivate the problem and approach of this paper, let us investigate the primary
causes of obesity in America. Public opinion and behavior on the subject have changed
measurably since the late 1990s. As an example, Gunnars [1] compiled a list in 2015 of
ten leading causes, six of which suggest that the processed food industry may be
primarily responsible. By contrast, in the 1990s and early 2000s, popular opinion
appeared to hold [2, 3] that obesity was primarily caused by individual behavior and
lifestyle choices. What led to this change in public opinion?

March 7, 2020 1/34


We posit that news publishing on the subject of obesity contributed to the change in
the public’s opinion. The above quotes from the New York Times (NYT) are
representative snippets from news articles on obesity published in 1995 and 2015,
respectively. Whereas both address the same topic, the 1995 snippet implies
responsibility on part of individuals, and the 2015 snippet implies responsibility on part
of the processed food industry. These subjective biases in news are collectively referred
to as framing.
Framing theory [4, 5] suggests that how a topic is presented to the audience (called
“the frame”) influences the choices people make about how to process that information.
The central premise of the theory is that since an issue can be viewed from a variety of
perspectives and be construed as having varying implications, the manner in which it is
presented influences public reaction.
In general, understanding news framing may be a crucial component of
decision-support in a corporate and regulatory setting. To illustrate this fact, we
present a real-life example of the influence of framing on public perception and
legislation. According to [6, 7], in late 2010, security vulnerabilities associated with
Facebook and Zynga allowed personal data from millions of users to be compromised.
The framing of news on this topic appeared to change from a neutral narrative to one
focusing on personal privacy.
Facebook and Zynga were sued over privacy breaches [8]. Further, in 2013, the
Personal Data Protection and Breach Accountability Act was promulgated in
Congress [9]. These examples motivate the problem of framing change detection, which
involves identifying when the dominant frame (or frames) [10] of a topic undergoes a
change.

Related Work
The Media Frames Corpus, compiled by Card et al. [11], studies three topics
(Immigration, Smoking, and same-sex marriages), and identifies fifteen framing
dimensions in each. We identify two major limitations of their work. Firstly, Card et al.
study framing as a static detection problem, identifying which dimensions appear in a
given news article. However, research in sociology [10] shows that most news topics
feature a dominant frame (or dominant dimension in the terminology of [11]). Further,
for a generic news topic, the dominant frame is not necessarily one of fifteen previously
chosen dimensions, but can instead be an unknown arbitrary frame specific to the topic
under consideration. For example, in the example given in the Introduction and
Contributions section, the dominant frame related to the privacy of individuals, which is
not one of the fifteen dimensions described in Card et al. [11].
Secondly, Sheshadri and Singh [12] showed that public and legislative reaction tend
to occur only after changes in the dominant frame. That finding motivates an approach
to framing that focuses on identifying and detecting changes in the dominant frame of a
news domain.
Sheshadri and Singh further propose two simple metrics that they motivate as
measures of domain framing: framing polarity and density. They define framing polarity
as the average frequency of occurrence in a domain corpus of terms from a benchmark
sentiment lexicon. Framing density is measured using an entropic approach that counts
the number of terms per article required to distinguish a current corpus from an earlier
one.
We identify the following limitations of the aforementioned measures (introduced
in [12]). Firstly, both measures make no effort to associate a given news article with a
particular frame. Prior work does not support the inherent assumption that all articles
in a given domain belong to a particular frame [10, 11]. We enhance understanding by
analyzing each domain using several distinct frames.

March 7, 2020 2/34


Secondly, framing density does not distinguish between a subjective choice made by
a news outlet to frame a domain differently, and events that necessitate media coverage.
Our work provides this distinction by analyzing framing using patterns of change in the
adjectives that describe co-occurring nouns. Since adjectives are artifacts of subjective
framing, they are not affected by events, as framing density is.
It is worthwhile to note that our approach is similar in spirit to Tsur et al.’s
work [13], in that both that work and this paper apply a topic modeling strategy to
analyze framing as a time series. However, we highlight the following key differences
and contributions of our work. Firstly, as both Sheshadri and Singh [12] and Tsur et
al. [13] point out, framing is a subjective aspect of communication. Therefore, a
computational analysis of framing should ideally differentiate subjective aspects from
fact-based and objective components of communication. Since adjectives in and of
themselves are incapable of communicating factual information, we take them to be
artifacts of how an event or topic is framed. In contrast, generic n-grams (as used by
Tsur et al. [13]) do not provide this distinction.
Further, Tsur et al. rely upon estimating “changes in framing” using changes in the
relative frequencies of n-grams associated with various topics or frames. Whereas such
an approach is useful in evaluating which of a set of frames may be dominant at any
given time, it does not measure “framing changes” in the sense originally described
in [14]. In contrast, our work estimates changes in framing using consistent polarity
drifts of adjectives associated with individual frames. Our approach may also be applied
to each of a number of frames independently of the others, unlike the approach of Tsur et al. [13].
We also distinguish our work from Alashri et al. [15], which uses standard machine
learning tools to classify sentences represented by linguistic features into one of four
frames. Such an approach is limited by the need to predefine a frame set, as Card et al.'s
approach [11] is. Further, the paper does not discuss the problem of how to examine
changes within specific frames. To the best of our knowledge, our work is the first to
address the problem of detecting meaningful patterns of change within individual
frames.

Contributions
This paper contributes a fully unsupervised and data-driven natural language based
approach to detecting framing change trends over several years in domain news
publishing. To the best of our knowledge, this paper is the first to address framing
change detection, a problem of significant public and legislative import. Our approach
agrees with and extends the results of earlier manual surveys, which required human
data collection and were consequently limited in scope. Our approach removes this
restriction by being fully automated. Our method can thus be run simultaneously over
all news domains, limited only by the availability of real-time news data. Further, we
show that our approach yields results that foreshadow periods of legislative activity.
This motivates the predictive utility of our method for legislative activity, a problem of
significant import.
Further, we contribute a Framing Changes Dataset, which is a collection of over
12,000 news articles from seven news topics or domains. In four of these domains,
surveys carried out in earlier research have shown framing to change. In two domains,
periods with significant legislative activity are considered. Our individual domain
datasets within the framing changes dataset cover the years in which earlier research
found framing changes, as well as periods ranging up to ten years before and after the
change. Our dataset is the first to enable computational modeling of framing change
trends. We plan to release the dataset with our paper. We note that a fraction of the
articles in this dataset were used earlier for the analysis in [12].



Materials and Methods
This section describes our datasets, data sources, and inter-annotator agreement. All
data were collected in an anonymous and aggregated manner. All APIs and data used
are publicly available, and our data collection complies with the terms and conditions of
each API.

Data Sources
We use two Application Programming Interfaces (APIs) to create our datasets.

The New York Times API:


The New York Times (NYT) Developer’s API [16] provides access to news data from
the NYT, Reuters, and the Associated Press (AP)—both print and online
versions—beginning in 1985. The NYT has the second largest circulation of any
newspaper in the United States [17].
The data object returned by the API includes fields such as the article type (news,
reviews, summaries, and so on), the news source (NYT, Reuters, or AP), the article’s
word count, the date of its publication, and article text (in the form of the abstract, the lead
(first) paragraph, and a summary).

The Guardian API:


The Guardian Xplore API [18] provides access to news data from The Guardian, a
prominent UK newspaper that reaches 23 million UK adults per month [19].
The Guardian API returns full-length articles along with such metadata as the
article type (similar to the NYT API) and a general section name (such as sports,
politics, and so on). Although these section names are manually annotated by humans,
we do not use them in our analysis, but rely instead on a simple term search procedure
(see the Domain Dataset Generation section) to annotate our datasets.

Domain Dataset Generation


As in earlier work [12, 20, 21], we use a standard term search procedure to create our
datasets. Specifically, an article belongs to a domain if at least one component of the
article discusses a topic that is directly relevant to the domain [12]. We term articles
that are relevant to a domain domain positives, and irrelevant articles domain negatives.
As an example, consider the following article from the domain smoking: “The dirty gray
snow was ankle deep on West 18th Street the other day, and on the block between Ninth
and Tenth Avenues, a cold wind blew in off the Hudson River. On the south side of the
street, a mechanic stood in front of his garage smoking a . . . .” We consider this article
a domain negative because, although it contains the keyword ‘smoking’, it does not discuss
any aspect pertaining to the prevalence or control of smoking. In contrast, the article
“An ordinance in Bangor, Maine, allows the police to stop cars if an adult is smoking
while a child under 18 is a passenger” is directly relevant to the domain, and is therefore
considered a domain positive. We define dataset accuracy as the fraction of articles in a
dataset that are domain positive. For each domain, our APIs were used to extract news
data during the time period from b (denoting the beginning) to e (denoting the end) of the
period of interest.
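As a rough illustration only, the sketch below shows how such a term-based extraction could be scripted against the NYT article search API; the endpoint, parameter names, and the fetch_domain_articles helper are assumptions for illustration, to be checked against the current API documentation, and should not be read as our exact collection pipeline.

import requests

SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"  # assumed endpoint

def fetch_domain_articles(term, begin_date, end_date, api_key, max_pages=5):
    """Collect candidate domain articles for one search term between dates b and e."""
    articles = []
    for page in range(max_pages):
        params = {
            "q": term,                 # domain search term, e.g., "smoking"
            "begin_date": begin_date,  # b, formatted YYYYMMDD
            "end_date": end_date,      # e, formatted YYYYMMDD
            "page": page,
            "api-key": api_key,
        }
        response = requests.get(SEARCH_URL, params=params, timeout=30)
        response.raise_for_status()
        docs = response.json().get("response", {}).get("docs", [])
        if not docs:
            break
        # Keep only the fields used downstream (publication date and article text).
        articles.extend(
            {"date": doc.get("pub_date"),
             "text": doc.get("abstract") or doc.get("lead_paragraph") or ""}
            for doc in docs
        )
    return articles

# Example (hypothetical key): candidate smoking articles for 1990 to 2007.
# smoking_corpus = fetch_domain_articles("smoking", "19900101", "20071231", "MY_API_KEY")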



Inter-Annotator Agreement
To ensure that the articles returned by our term search procedure are indeed relevant to
each domain, a random sample of articles from each domain dataset was coded by two
raters. We supply the per-domain accuracy and inter-annotator agreement as Cohen’s
Kappa for sample domains in Table 1.

Table 1. Per-domain dataset accuracy and inter-annotator agreement as Cohen’s Kappa.


Domain          Accuracy (Coder 1)   Accuracy (Coder 2)   Kappa
Surveillance    0.80                 0.75                 0.79
Smoking         0.84                 0.82                 0.93
Obesity         0.78                 0.74                 0.67
LGBT Rights     0.83                 0.74                 0.64
Abortion        0.80                 0.80                 0.50
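For reference, agreement of this form can be computed with standard tooling; the sketch below uses scikit-learn's cohen_kappa_score on hypothetical coder labels (1 = domain positive, 0 = domain negative) and is illustrative rather than a reproduction of our annotation data.

from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two coders over the same random sample of articles.
coder1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
coder2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]

kappa = cohen_kappa_score(coder1, coder2)
# Dataset accuracy, per coder, is the fraction of sampled articles coded domain positive.
accuracy_coder1 = sum(coder1) / len(coder1)
accuracy_coder2 = sum(coder2) / len(coder2)
print(round(kappa, 2), accuracy_coder1, accuracy_coder2)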

Probability Distribution over Adjectives


Our approach relies on the key intuition that during a framing change, the valence of
the adjectives describing co-occurring nouns changes significantly.
To measure this change, we create a reference probability distribution of adjectives
based on the frequency of their occurrence in benchmark sentiment datasets.

Benchmark Datasets
We identified three open source benchmark review datasets from which to create our
adjective probability distribution. Together, these datasets provide about 150 million
reviews of various restaurants, services and products, with each review rated from one
to five. Given the large volume of reviews from different sources made available by these
datasets, we assume that they provide a sufficiently realistic representation of all
adjectives in the English language.
We rely primarily on the Trip Advisor dataset to create our adjective probability
distribution. We identified two other benchmark datasets, namely, the Yelp Challenge
dataset and the Amazon review dataset. Because these datasets together comprise
about 150 million reviews, it is computationally infeasible for us to include them in our
learning procedure. Instead, we learned distributions from these datasets
for sample adjectives, to serve as a comparison with and as verification of our overall
learned distribution. The resulting distributions for these adjectives appeared
substantially similar to those of the corresponding adjectives in our learned distribution.
We therefore conclude that our learned distribution provides a valid representation of all
adjectives in the English language. We describe each dataset below.

Trip Advisor
The Trip Advisor dataset consists of 236,000 hotel reviews. Each review provides text,
an overall rating, and aspect specific ratings for the following seven aspects: Rooms,
Cleanliness, Value, Service, Location, Checkin, and Business. We limit ourselves to
using the overall rating of each review.



Yelp
The Yelp challenge dataset consists of approximately six million restaurant reviews.
Each entry is stored as a JSON string with a unique user ID, check-in data, review text,
and rating.

Amazon
The Amazon dataset provides approximately 143 million reviews from 24 product
categories such as Books, Electronics, Movies, and so on. The dataset uses the JSON
format and includes reviews comprising a rating, review text, and helpfulness votes.
Additionally, the JSON string encodes product metadata such as a product description,
category information, price, brand, and image features.

Polarity of Adjectives
For each adjective in the English language, we are interested in producing a probability
distribution that describes the relative likelihood of the adjective appearing in a review
whose rating is r. For our data, r ranges from one to five.
We began by compiling a set of reviews from the Trip Advisor dataset for each
rating from one to five. We used the Stanford CoreNLP parser [22] to parse each of the
five sets of reviews so obtained. We thus obtained sets of parses corresponding to each
review set. From the set of resultant parses, we extracted all words that were assigned a
part-of-speech of ‘JJ’ (adjective). Our search identified 454,281 unique adjectives.
For each unique adjective a, we counted the number of times it occurred in our set of
parses corresponding to review ratings one to five. We denote this by N_a^i, with
1 ≤ i ≤ 5. Our probability vector for adjective a is then {N_a^1/S_a, N_a^2/S_a, . . . , N_a^5/S_a},
where S_a = N_a^1 + N_a^2 + N_a^3 + N_a^4 + N_a^5.
Additionally, we recorded the rarity of each adjective as 1/S_a. This estimates a
probability distribution P, with 454,281 rows and six columns.
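A minimal sketch of this construction, assuming the (adjective, rating) pairs have already been extracted from the parsed reviews, is shown below; the function name and the input format are illustrative.

from collections import Counter, defaultdict

def build_adjective_distribution(adjective_rating_pairs):
    """Return {adjective: (probability vector over ratings 1-5, rarity)} where the
    i-th probability is N_a^i / S_a and the rarity is 1 / S_a."""
    counts = defaultdict(Counter)                  # counts[a][i] = N_a^i
    for adjective, rating in adjective_rating_pairs:
        counts[adjective][rating] += 1

    distribution = {}
    for adjective, rating_counts in counts.items():
        total = sum(rating_counts.values())        # S_a
        probs = [rating_counts.get(r, 0) / total for r in range(1, 6)]
        distribution[adjective] = (probs, 1.0 / total)
    return distribution

# Toy example; in our setting the pairs come from the parsed Trip Advisor reviews.
P = build_adjective_distribution([("great", 5), ("great", 4), ("horrible", 1)])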
Table 2 shows example entries from our learned probability distribution. As can be
seen from the table, our learned distribution not only correctly encodes probabilities
(the adjective ‘great’ has nearly 80% of its probability mass in the classes four and five,
whereas the adjective ‘horrible’ has nearly 80% of its mass in classes one and two), but
also implicitly learns an adjective ranking such as the one described in De Melo et
al. [23]. To illustrate this ranking, consider that the adjective ‘excellent’ has 60% of its
probability mass in class five, whereas the corresponding mass for the adjective ‘good’ is
only 38%.
For visual illustration, we depict our learned probability distribution as a heatmap in
Table 3.
Motivated by our learned probability distribution, we posit that class 1 represents
negativity, classes 2 to 4 represent neutrality, and class 5 represents positivity.

Incorporating Adjective Rarity


Our measure of adjective rarity lets us exclude uncommon adjectives, which rarely
occur in our benchmark dataset and whose learned probability distributions may
therefore be unreliable.
However, in doing so, we run the risk of excluding relevant adjectives from the
analysis. We manually inspect the set of adjectives that describe the nouns in each
domain to arrive at a domain-specific threshold.



Table 2. Sample entries from our learned probability distribution for positive and
negative sentiment adjectives.
Adjective Class 1 Class 2 Class 3 Class 4 Class 5 Rarity (Inverse Scale)
Great 0.039 0.048 0.093 0.274 0.545 4.495e-07
Excellent 0.019 0.028 0.070 0.269 0.612 2.739e-06
Attractive 0.095 0.125 0.192 0.296 0.292 0.0001
Cute 0.039 0.068 0.155 0.330 0.407 1.499e-05
Compassionate 0.068 0.020 0.010 0.038 0.864 0.0004
Good 0.076 0.095 0.185 0.336 0.308 3.459e-07
Horrible 0.682 0.143 0.076 0.042 0.057 7.453e-06
Ridiculous 0.461 0.180 0.125 0.116 0.118 2.033e-05
Angry 0.546 0.138 0.092 0.098 0.126 6.955e-05
Stupid 0.484 0.136 0.099 0.117 0.164 5.364e-05
Beautiful 0.043 0.049 0.085 0.222 0.599 6.233e-06

For a majority of our domains (five out of seven), we use a threshold of q > −∞,
that is, no adjectives are excluded. For the remaining two domains (drones and LGBT
rights), we employ a threshold of q > 10^-4.
The trends in our results appeared to be fairly consistent across a reasonable range
of threshold values.

Domain Period of Interest


We define a period of interest for each domain. Let tf be a year in which a documented
framing change took place in the domain under consideration. Then, our period of
interest for this domain is b = min(tf − 10, tf − l) to e = max(tf + 10, tf + r), where the
API provides data up to l years before, and r years after tf . All units are in years.

Corpus-Specific Representations
A domain corpus is a set of news articles from a given domain. Let a given domain have
m years in its period of interest with annual domain corpora T1 , T2 , . . . , Tm .

Corpus Clustering
An overall domain corpus is therefore T = T1 ∪ T2 ∪ . . . ∪ Tm .
We assume that a corpus has k unique frames. We adopt a standard topic modeling
approach to estimate frames. We use the benchmark Latent Dirichlet Allocation
(LDA) [24] approach to model k = 5 topics (that is, frames) in each domain corpus. We
extract the set v of the top l = 20 terms from each frame. We also extract the set of all unique
nouns in T. We define a cluster as the intersection of v with this set of nouns. We thus generate k clusters,
each representing a unique frame.
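A minimal sketch of this clustering step, using scikit-learn's LDA implementation as one possible toolkit, is shown below; the function and argument names are illustrative rather than our exact pipeline.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def estimate_clusters(domain_articles, domain_nouns, k=5, top_l=20):
    """Fit LDA with k topics (frames) on the overall domain corpus T, then intersect
    each topic's top-l terms with the set of unique nouns in T to form clusters."""
    vectorizer = CountVectorizer(stop_words="english")
    doc_term = vectorizer.fit_transform(domain_articles)    # one row per article
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(doc_term)

    vocabulary = vectorizer.get_feature_names_out()
    clusters = []
    for topic_weights in lda.components_:                   # one weight vector per frame
        top_terms = {vocabulary[i] for i in topic_weights.argsort()[-top_l:]}
        clusters.append(top_terms & set(domain_nouns))      # the nouns representing this frame
    return clusters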

Annual Cluster Polarity


For each cluster c, we are interested in arriving at a vector of m annual polarities, i.e.,
for each year i, 1 ≤ i ≤ m in the domain period of interest.
Let xc be the set of all nouns in c. For each noun v ∈ xc , we use the Stanford
dependency parser [22] to identify all adjectives (without removing duplicates) that
describe v in Ti . We extract the polarity vectors for each of these adjectives from P as
the matrix Ai. Ai therefore has n rows, one for each adjective so identified, and five
columns (see the Polarity of Adjectives section). We estimate the annual cluster polarity
of c as the vector of column-wise averages of Ai. Let Pc = {P1, P2, . . . , Pm} be the set
of annual cluster polarities so obtained.
Annual polarities for representative clusters from each of our domains are shown in
figures 11 to 15.

Table 3. The entries of Table 2 depicted as a heatmap for visual illustration. Rows correspond to classes 1 to 5 and rarity; columns correspond to adjectives.
           Great     Excellent  Attractive  Cute      Compassionate  Good      Horrible  Ridiculous  Angry     Stupid    Beautiful
Class 1    0.039     0.019      0.095       0.039     0.068          0.076     0.682     0.461       0.546     0.484     0.043
Class 2    0.048     0.028      0.125       0.068     0.020          0.095     0.143     0.180       0.138     0.136     0.049
Class 3    0.093     0.070      0.192       0.155     0.010          0.185     0.076     0.125       0.092     0.099     0.085
Class 4    0.274     0.269      0.296       0.330     0.038          0.336     0.042     0.116       0.098     0.117     0.222
Class 5    0.545     0.612      0.292       0.407     0.864          0.308     0.057     0.118       0.126     0.164     0.599
Rarity     4.5e-07   2.74e-06   1e-04       1.5e-05   4e-04          3.46e-07  7.45e-06  2.03e-05    6.96e-05  5.36e-05  6.23e-06
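The sketch below illustrates the annual cluster polarity computation, using spaCy's dependency parser as a stand-in for the Stanford dependency parser used in our pipeline; the 'amod' relation marks adjectives modifying a noun, and the function name and inputs are illustrative.

import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")   # stand-in for the Stanford dependency parser

def annual_cluster_polarity(articles_for_year, cluster_nouns, P):
    """Average the five-class probability vectors of all adjectives that modify the
    cluster's nouns in one year's corpus. P maps adjective -> (probability vector, rarity)."""
    rows = []
    for text in articles_for_year:
        for token in nlp(text):
            if token.dep_ == "amod" and token.head.lemma_.lower() in cluster_nouns:
                entry = P.get(token.lemma_.lower())
                if entry is not None:
                    rows.append(entry[0])          # keep duplicates, as in the text
    if not rows:
        return np.zeros(5)
    return np.asarray(rows).mean(axis=0)           # column-wise average, i.e., one P_i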

Defining Framing Changes


Since language and human behavior are not strictly deterministic, the measurement of
any temporally disparate pair of news corpora using adjective polarity (or any other
numerical metric) would result in different representative values of the two corpora.
Therefore, in this sense, any pair of news corpora can be said to have undergone a
framing change.
Further, individual metrics are susceptible to noisy readings due to imprecise data
and measurement. In particular, such an effect may cause sudden isolated spikes
between successive measurements. For example, in figure 11, during the period between
2005 and 2006, whereas classes 1, 3, 4, and 5 changed little, class 2 showed a substantial
change.
This motivates the question of how a framing change is defined, in the context of our
computational measurements. The usual social science definition [14] is that a framing
change is a shift in the way that a specific topic is presented to an audience. To isolate
such changes computationally, we use the following key observations from ground truth
framing changes: (i) framing changes take place as trends that are consistent over at
least k years, and (ii) framing changes must be consistent across multiple measurements.
Our aim in this paper is to begin from a set of time series such as the ones in
figure 11, and isolate such trends. The requirement motivated by our first condition,
namely, that framing changes must last at least k years, is easy to satisfy by imposing
such a numerical threshold.
To satisfy the requirement motivated by our second observation, we rely on
correlations between different measurements, as described in the section below.

Detecting Framing Changes using Periods of Maximum Correlation
Our five polarity classes serve as measures of framing within a domain. We conceive of
a framing change as a trend, consistent across our five polarity classes, over a period of
some years.
We describe our intuition and approach to detecting framing changes below. Firstly,
we show that the frequency with which adjectives occur in articles varies both by
domain, and in different years within a domain.
Figure 1 depicts the average number of adjectives per article for each of our domains
over the years in their respective periods of interest. We note that this count serves also
as a measure of how subjective news publishing in a domain is, since adjectives are
indicative of how events are framed.
Notice that in the domain LGBT rights, the peak in this measure immediately
precedes a framing change from an earlier study [25]. Whereas we do not claim that this
correlation is true for all domains, we posit that it motivates the utility of adjective
polarity in the study of framing changes.
Although the volume of adjectives used per article varies dramatically (by
up to 30%), we find that the variation in our annual cluster polarity between successive
years is generally on the order of less than 1%. However, through a consistent trend
lasting multiple years, our measure of annual polarity can change (increase or decrease)
cumulatively by up to 5% (see figure 11 for an example). We identify a framing change
based on such a cumulative trend.

Fig 1. The average number of adjectives per article, shown for the domains drones, immigration, and LGBT rights over their respective periods of interest. This metric serves as a measure of the subjectivity of news in a domain. Notice that in the domain LGBT rights, the peak in this measure immediately precedes a framing change identified in an earlier study [25].

Fig 2. The average number of adjectives per article, shown for the domains obesity, smoking, and surveillance over their respective periods of interest. This metric serves as a measure of the subjectivity of news in a domain.

We now consider the problem of fusing estimates from our five measures of annual
cluster polarity. Consider the change in polarity of classes 1, 3, 4, and 5 between 2005
and 2006 in figure 11, as against the change in class 2. As mentioned earlier, classes 1,
3, 4, and 5 changed little, whereas class 2 showed a substantial change.
In contrast, we note that in the period 2001 to 2013, a consistent trend was
observable across all five classes, with substantial reductions in classes 2 and 3, and a
notable corresponding increase in class 5. We exploit correlations between the changes
in our five classes to identify framing changes.
Accordingly, we use Pearson correlation [26] between our classes as a measure of
trend consistency. Let a given domain have m years in its period of interest. We
generate all possible contiguous subsets of Tm , namely, Ti–j , where i ≤ j ≤ m, and Ti–j
denotes the domain corpus from year i to year j.
Let C =  {C1 , . . . , C5 } be the set of class vectors for this domain subset, where
C1i
C1i+1  i
C1 =   . . . , C1 is the value of class 1 for year i, and similarly for C2 , . . . , C5 .

C1j
To measure the correlation of subset Ti–j , we compute its matrix of correlation
coefficients [27] K. We reshape K into a vector of size f × 1 where f = i ∗ j, and
evaluate its median, l. We find the maximum value of l, lmax , over all possible values of
i and j. We denote the values of i and j corresponding to lmax as imax and jmax . We
return Timax –jmax as our period of maximum correlation (PMC).
We note that the smaller the duration of a PMC, the greater the possibility that our
class vectors may have a high correlation in the period due to random chance. To
compensate for this effect, we employ a threshold whereby a period is not considered as
a candidate for the domain PMC unless it lasts at least y years. We uniformly employ a
value of y = 3 in this paper.
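A minimal sketch of the PMC search, assuming the five annual class polarities for a domain are stored as an m-by-5 NumPy array, might look as follows; the names and the handling of degenerate windows are illustrative.

import numpy as np

def find_pmc(class_vectors, min_years=3):
    """class_vectors has shape (m, 5): one row per year, one column per polarity class.
    Returns (i_max, j_max, l_max) for the contiguous window with the largest median
    correlation among the five class series, subject to the duration threshold y."""
    m = class_vectors.shape[0]
    i_max, j_max, l_max = None, None, -np.inf
    for i in range(m):
        for j in range(i + min_years - 1, m):       # window must span at least y years
            window = class_vectors[i:j + 1, :]
            K = np.corrcoef(window, rowvar=False)   # 5 x 5 correlation matrix
            l = np.median(K)                        # median of the flattened matrix
            if l > l_max:
                i_max, j_max, l_max = i, j, l
    return i_max, j_max, l_max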
Our approach thus identifies polarity drifts that are both correlated (quantitatively
measured by correlations between different measures of polarity) and sustained (by the
imposition of a threshold of duration). We point out that our approach filters out
isolated drifts in individual polarity measures, since such drifts are uncorrelated across
multiple measures. Further, we note that the magnitude of individual drifts matters
only indirectly to our approach, to the extent that a larger drift, if consistent across
multiple polarity measures, may have higher correlation than a smaller drift that is also
correlated.
A block diagram depicting our overall approach is shown in figure 3.

Quantitative Evaluation
We now discuss a partial quantitative evaluation of our approach using a
Precision-Recall analysis. Our analysis relies on ground truth annotation of framing
changes, as detailed in the section below.
We are unable to conduct a full precision-recall analysis over all domains due to the
limitations we discuss in the following sections, as well as in the Qualitative Analysis
and Discussion section. However, we expect that our partial analysis is representative of
the general performance of the approach.



[Figure 3: block diagram with components Seed Datasets, Stanford Parser, Adjectives, Compute class frequencies, Adjective Distribution, Domain Corpora, LDA, Frames, Annual Polarities, Compute correlations, and PMC.]
Fig 3. A block diagram illustrating our approach. Our adjective distribution is computed using per class frequencies of occurrence for each adjective in the seed dataset(s). We use this distribution to compute annual cluster polarities of frames obtained using LDA from our domain corpora.

Ground Truth Annotation


We label a ground truth for each domain, marking years corresponding to framing
changes as positives, and other years as negatives. We primarily obtain our positives
using the findings of large-scale surveys from earlier research.
In order to do so, we study the literature pertaining to framing changes in the
domains we examine. We identify large-scale studies conducted by reputable
organizations such as the National Cancer Institute (NCI) [28], the Columbia
Journalism Review (CJR) [29], Pew Research [30], and so on. These studies examine
news and media publishing in a particular domain over a period of time, as we do, and
manually identify changes in the framing of domain news during these periods.
The studies we rely on for ground truth sometimes provide quantitative justification
for their findings. For example, the NCI monograph on the framing of smoking news
identifies the number of pro and anti tobacco control frames before and after a framing
change [28]. These studies therefore provide an expert annotation of framing changes in
our domains, for the periods we examine. Details of each study we used and their
findings are reported in the Results section.
By demonstrating substantial agreement between the results of our approach and
those of earlier ground truth surveys, we establish our claim that our approach may be
used to automatically identify framing changes in domain news publishing.

Precision-Recall Analysis
To gain confidence that our approach successfully identifies framing changes, we
conduct a precision-recall analysis on our data. We consider each year in each domain
as a data point in our analysis. We calculate overall precision and recall over all data
points in our domains. We consider a data point a true positive if both a ground truth
study and our approach label it as corresponding to a framing change, and a true negative
if both label it as not doing so. We refer to a data point that was labeled as a positive (or
negative) by our approach, but which is a negative (or positive) according to the
relevant ground truth survey as a false positive or false negative, respectively.



We calculate precision as P = tp/(tp + fp) and recall as R = tp/(tp + fn), where tp, fp, and fn are
the numbers of true positives, false positives, and false negatives, respectively.
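For completeness, the corresponding computation is trivial; the counts in the example below are made up and are not our reported totals.

def precision_recall(tp, fp, fn):
    """Precision P = tp / (tp + fp); recall R = tp / (tp + fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Example with made-up counts: precision_recall(8, 2, 1) returns (0.8, 0.888...).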
For some domains, we were unable to identify an earlier survey studying the framing
of news publishing in the domain. We exclude these domains from our precision-recall
analysis. However, we show that in these cases, our estimated PMCs foreshadow events
of substantial public and legislative import. Since framing changes have been shown to
be associated with such public and legislative events [12], we argue that this provides
some measure of validation of our estimated PMCs.

Fig 4. Our estimated clusters for the domain abortion. Each cluster is said to
represent a unique frame. The frame discussed in cluster 1 (characterized by the terms
‘abortion’ and ‘ban’) concerns a proposed ban on abortion. We analyze this cluster, and
find that our estimated PMC (Figure 15) coincides with the period immediately
preceding the Partial-Birth Abortion Ban Act of 2003.

Fig 5. Our estimated clusters for the domain drones. Each cluster is said to represent
a unique frame. The frame discussed in cluster 1 concerns the use of drones against
terrorist targets. Our analysis of this cluster returns a PMC of 2009 to 2011 (Figure 17).
Our PMC immediately foreshadows the Federal Aviation Administration’s
Modernization and Reform Act of 2012.



Fig 6. Our estimated clusters for the domain LGBT Rights. Each cluster is said to
represent a unique frame. The frame of cluster 3 concerns the subject of
same-sex marriage, and in particular, judicial interest in this topic. We analyze this
cluster and estimate two PMCs of nearly identical correlation score (2006 to 2008 and
2013 to 2015; Figure 14). The PMC of 2013 to 2015 coincides exactly with the Supreme
Court judgment of 2015 that legalized same-sex marriage in the entire US.

Fig 7. Our estimated clusters for the domain obesity. Each cluster is said to represent
a unique frame. We posit that cluster 2 (characterized by the terms ‘food’, ‘diet’, and
‘make’) represents societal causes of obesity (see the Obesity section). We analyze this
cluster and estimate a PMC of 2005 to 2007 (Figure 13). Our PMC agrees with the
findings of an earlier human survey [2].



Fig 8. Our estimated clusters for the domain smoking. Each cluster is said to represent
a unique frame. The frame of cluster 3, characterized by the terms ‘cancer’ and ’smoke’,
discusses the health risks associated with smoking. We analyze this cluster and estimate
a PMC of 2001 to 2003 (Figure 11). Our PMC coincides exactly with an earlier
monograph from the National Cancer Institute (NCI) that describes a progression
towards tobacco control frames in American media between 2000 and 2003.

Fig 9. Our estimated clusters for the domain surveillance. Each cluster is said to
represent a unique frame. The frame of cluster 3, characterized by the terms ‘national‘,
‘security’, and ‘agency’, discusses the Snowden revelations of 2013. We analyze this
cluster and estimate a PMC of 2013 to 2014 (Figure 12). Our PMC coincides exactly
with the period following the Snowden revelations. Additionally, we note that the
Columbia Journalism Review [29] found that following the Snowden revelations, news
coverage of Surveillance changed to a narrative focusing on individual rights and digital
privacy [12].



Fig 10. Our estimated clusters for the domain Immigration. Each cluster is said to
represent a unique frame. The frame of cluster 2 discusses the waning of asylum grants,
increased border refusals and the final 2002 white paper on “Secure Borders, Safe
Haven.” We analyze this cluster and estimate a PMC of 2000 to 2002 (Figure 16). Our
PMC coincides exactly with the period immediately foreshadowing the government
white paper.

Results
We find that our periods of maximum correlation correlate substantially with framing
changes described in earlier surveys [2, 29, 31, 32], and also foreshadow legislation.
Our computed class vectors are depicted in figures 11 to 15. We discuss each domain
below.

Smoking
The NCI published a monograph discussing the influence of the news media on tobacco
use [28]. On page 337, the monograph describes how, during the period 2001 to 2003,
American news media had progressed towards tobacco control frames. It states that
55% of articles in this period reported progress on tobacco control, whereas only 23%
reported setbacks.
In contrast, the monograph finds (also on page 337) that between 1985 and 1996,
tobacco control frames (11) were fairly well balanced with pro-tobacco frames (10). We
extracted a dataset of over 2,000 articles from 1990 to 2007.
Our approach returns a PMC of 2001 to 2003 (see figure 11) for this domain. Since
no studies cover the period 1997 to 2000 [28], we interpret the findings described in the
monograph to imply that the change towards tobacco control frames predominantly
began in 2000, and ended in 2003. This domain therefore contributes three true
positives (2001 to 2003) and one false negative (2000), with no false positives, to our
precision-recall analysis.

Surveillance
The CJR [29] found that following the Snowden revelations, news coverage of
Surveillance in the US changed to a narrative focusing on individual rights and digital
privacy [12]. We compiled a dataset consisting of approximately 2,000 surveillance
articles from the New York Times for the period 2010 to 2016.



Fig 11. Annual polarities for cluster 3 (characterized by the terms ‘cancer’ and ‘smoke’) from Figure 8 from the domain smoking for the classes 1 to 5. The PMC is shown with solid lines in square markers, and coincides exactly with a framing change described in an earlier NCI monograph.



The frame of cluster 3 in figure 9, characterized by the terms ‘national’, ‘security’,
and ‘agency’, discusses the Snowden revelations of 2013. We analyze this cluster. Our
class vectors for this domain are shown in figure 12. We obtain a PMC of 2013 to 2014
for this period, corresponding closely with the ground truth framing change.
The trends in our class vectors are indicative of the change. As can be seen from the
figure, positivity (measured by class 5) drops markedly, together with a simultaneous
increase in negativity (class 1) and neutrality (classes 2 and 3). Class 4 remains close to
constant during this period and thus does not affect our hypothesis.
We interpret the findings of [29] to refer primarily to 2013, the year in which the
revelations were made, and the following year, 2014. Whereas other interpretations may
conclude a longer framing change, they must necessarily include this period. This
domain therefore contributes two true positives (2013 and 2014) with no false positives
or negatives to our quantitative evaluation.

Obesity
Kim and Willis [2] found that the framing of obesity news underwent changes between
the years 1997 and 2004. During this period, Kim and Willis found that the fraction of
news frames attributing responsibility for obesity to social causes increased significantly.
Prior to this period, obesity tended to be framed as an issue of individual responsibility.
For example, obesity news after the year 2000 has often criticized food chains for their
excessive use of sugar in fast food, as shown in the NYT snippet in the Introduction and
Contributions section. We compiled a dataset of over 3,000 articles from the New York
Times (since Kim and Willis [2] restrict their study to Americans) from 1990 to 2009.
The clusters we estimate for this domain are shown in Figure 7. Cluster 2 addresses
possible causes of obesity, with a particular focus on dietary habits. We posit that this
cluster represents societal causes more than individual ones (since individual causes, as
shown in the NYT snippet of the Introduction and Contributions section tend to discuss
topics such as fitness and sedentary lifestyles, as opposed to food content). We observe
that the PMC for this domain (2005 to 2007) is characterized by increased positivity,
shown by classes 4 and 5, and decreased negativity (class 1). Our results for this
domain thus agree with the findings of Kim and Willis [2].
We were unable to use this domain in our precision-recall analysis, since Kim and
Willis, to the best of our knowledge, do not specify a precise period during which the
framing change took place.
However, since Figures 2 and 3 of Kim and Willis [2] show a dramatic increase of
social causes in 2004, and a corresponding marked decline of individual causes, we
conclude a substantial agreement between their findings and our results.

LGBT Rights
We compiled a dataset of over 3,000 articles from the period 1996 to 2015 in this domain.
Figure 6 depicts our estimated clusters. Cluster 3 represents a frame that discusses the
subject of same-sex marriage and its legality. We note that the Supreme Court ruled to
legalize same-sex marriages in the US in the year 2015. Our class vectors for this domain
are shown in figure 14. We obtained two PMCs with nearly identical correlation scores
(0.999 for the period 2006 to 2008, and 0.989 for the period 2013 to 2015). Figure 14
highlights the period 2013 to 2015 immediately preceding the judicial interest of 2015.
We were unable to identify a prior study that discusses the framing of LGBT news
over our entire period of interest. However, we use the findings reported in Gainous et
al. [32] as our ground truth for this domain. Gainous et al. studied the framing of
LGBT related publishing in the New York Times over the period 1988 to 2012, and
found a dramatic increase in equality frames, from approximately 25 in 2008 to
approximately 110 in 2012.

Fig 12. Annual polarities for a representative cluster (characterized by the terms ‘national’, ‘security’, and ‘agency’) from the domain surveillance for the classes 1 to 5. The PMC is shown with solid lines in square markers.

Fig 13. Annual polarities for cluster 2 (characterized by the terms ‘diet’, ‘food’, and ‘make’) from Figure 7 from the domain obesity for the classes 1 to 5. The PMC is shown with solid lines in square markers. We posit that this cluster represents societal causes of obesity (see the Obesity section). We observe that the PMC for this cluster (2005 to 2007) agrees with the findings of Kim and Willis [2].

Correspondingly, our findings of Figure 14 show that
between 2008 and 2012, there was a dramatic increase in the measures of classes 4 and 5
(representing positivity), and a marked reduction in the measures of classes 1 and 2,
(representing negativity).
For uniformity, we imposed a threshold of y = 3 years to identify a PMC. As we
explain in the section on Detecting Framing Changes using Periods of Maximum
Correlation, high correlations are less likely for more extended periods. Therefore, the
period 2008 to 2012 is unsurprisingly not our PMC at the chosen uniform threshold. However,
given the marked increase in positivity and corresponding decrease in negativity in this
period, we posit that the period has high correlation. Also, whereas we were unable to
find a study covering the period 2013 to 2015 to use as ground truth, we note that our
PMC preceded major judicial interest in the domain.
We therefore rely on Gainous and Rhodebeck [32] for ground truth in this domain,
and note that the trend towards increased positivity (and reduced negativity) in
Figure 14 began in 2008 and ended in 2013. We therefore conclude that our measures
return four true positives, and one false positive for this domain.

Abortion
The Partial-Birth Abortion Ban Act was enacted in 2003. We obtained 248 articles for
the period 2000 to 2003, for this domain. We obtain a PMC of 2001 to 2003 for this
domain, as shown in figure 15.

Immigration
We study the framing of immigration news in the United Kingdom. We obtained about
3,600 articles on the subject of Immigration from the Guardian API for the period 2000
to 2017. For this domain, we carried out our analysis on the article titles (rather than
the full text). Since the Guardian returns full length articles, we found that this design
choice allows us to produce a more focused domain corpus than the one generated by
the full article text. We depict our estimated class vectors and PMC in figure 16.
We analyze the frame of cluster 2 in Figure 10. This cluster deals with the issue of
asylum seekers to the United Kingdom. In the period immediately before the
year 2000, asylum claims to the United Kingdom had reached a new peak of
76,040 [33]. This event coincided with a high-profile terrorist act by a set of Afghan
asylum seekers [33].
These events resulted in increased border refusals and the final 2002 white paper on
“Secure Borders, Safe Haven.” We estimate a PMC of 2000 to 2002 (Figure 16). Our
PMC coincides exactly with the period immediately foreshadowing the government
white paper.

Drones
We obtained nearly 4,000 articles on this domain for the period 2003 to 2012. We
obtain a PMC of 2009 to 2011 for this domain, as shown in Figure 17.
Our PMC immediately foreshadows the Federal Aviation Administration’s
Modernization and Reform Act of 2012.

Predictive Utility
The aforementioned two domains (immigration and drones) highlight the predictive
utility of news framing. Whereas we did not find earlier surveys that coincide with our
PMCs for these domains, we note that these PMCs foreshadowed substantial legislative

activity. This observation suggests that PMCs estimated through real-time monitoring
of domain news may yield predictive utility for legislative and commercial activity.

Fig 14. Annual polarities for cluster 3, characterized by the terms ‘gay’, ‘rights’, and ‘marriage’, in Figure 6 from the domain LGBT Rights for the classes 1 to 5. We obtain two PMCs with nearly identical correlation scores, namely, 2006 to 2008 and 2013 to 2015. The PMC of 2013 to 2015 is shown with solid lines in square markers, immediately preceding the judicial interest of 2015.

Fig 15. Annual polarities for cluster 1 (characterized by the terms ‘abortion’ and ‘ban’) from Figure 4 from the domain abortion for the classes 1 to 5. The PMC is shown with solid lines in square markers (2001 to 2003) and foreshadows the partial birth abortion ban of 2003.

Fig 16. Annual polarities for cluster 2 (discussing asylum grants) from Figure 10 from the domain immigration for the classes 1 to 5. The PMC is shown with solid lines in square markers, and foreshadows the “Secure Borders, Safe Haven” white paper of 2002.

Overall Precision and Recall


We obtain an overall precision of 0.90 as well as a recall of 0.90. Our results
demonstrate that we successfully identify 90% of the ground truth positives, and that
only 10% of the positives we identify are false positives.
Further, we point out that our false positives generally either precede or succeed a
ground truth framing change. Therefore, we posit that such false positives may be due
to imprecision in measurement rather than any considerable failure of our approach.
Our results demonstrate substantial agreement with ground truth in domains for
which prior surveys have studied framing changes. In domains for which we did not find
such surveys, we demonstrate that our PMCs foreshadow periods of substantial public
and legislative import. We posit, therefore, that our approach successfully identifies
framing changes.

Qualitative Analysis and Discussion


Whereas we provide a partial quantitative evaluation of our approach using precision
and recall in the preceding sections, we confront substantial difficulties in uniformly
conducting such evaluations across all of our domains. In this section, we qualitatively
evaluate our results in the context of these limitations.
Framing, and framing changes, have in general been studied from the lens of
identifying a general trend from a particular data source. Studies often describe such
trends without stating a hard beginning or end of the trend period [28, 29].
The periods we analyze here are relatively long (often lasting almost two decades).
The available human surveys, not surprisingly, are limited in scope and do not cover the
same periods in their entirety. As a result, we observe missing years during which
computational methods produce a belief that cannot be verified using a human survey,
since no human survey covers such years. Therefore, a quantitative precision-recall style
analysis becomes difficult or impossible to conduct.
Further, the language used in existing studies is not always sufficiently precise to permit
interpreting a fixed set of positives and negatives. For example, the Columbia Journalism
Review uses the phrase “after the Snowden revelation” to describe the change in media
attitude. The Snowden revelations were made in June 2013. Our estimated PMC is
2013 to 2014.
The NCI monograph on smoking states that between 1985 and 1996, framing was
balanced between pro and anti tobacco control, and in 2001 to 2003, framing favored
tobacco control. Our PMC is 2001 to 2003. However, there are no studies that we know
of that cover the period 1996 to 2000.
Likewise, the Kim and Willis study on Obesity [2] that was published in 2007 states
that the change happened “in recent times,” but shows quantitative measures only until
2004. The measures that Kim and Willis compute show that the frequency of societal
causes increases sharply in 2004. Our PMC over the period 1990 to 2009 (with our
uniformly imposed threshold of y = 3) is 2005 to 2007. The correlation for the period
2004 to 2007 is high but it is not the PMC at a threshold of y = 3.
These examples show that performing a precision-recall analysis based on prior work
is problematic, since obtaining a set of positives and negatives from prior studies in the
sociological and communications literature involves interpretation.
Given that the data sources and coverage between our analysis and that of prior
surveys are usually quite different, the correlations we obtain appear quite substantial.
However, quantitative evaluation remains challenging for the reasons we point out.



Fig 17. Annual polarities for cluster 1 (discussing drone strikes) from Figure 5 from the domain drones for the classes 1 to 5. The PMC is shown with solid lines in square markers, and immediately foreshadows the Federal Aviation Administration’s Modernization and Reform Act of 2012. This finding suggests the predictive utility of framing change detection for legislative activity.



This paper follows the spirit of recent work [11, 13, 15] in seeking to develop the
study of framing into a computational science. We acknowledge that our dataset
collection and methods may undergo refinement to tackle broader ground truth data, of
a wider temporal and geographical scope. Nonetheless, we posit that our methods and
results have scientific value, and hope that future work will provide greater coverage of
ground truth.
Please note that the underlying data preparation requires social science expertise
and cannot be effectively crowdsourced via a platform such as Mechanical Turk. We
therefore hope that our work catches the interest of social scientists and leads them to
pursue more comprehensive studies of framing in news media that would enable
improvements in computational methods.

Conclusion
We highlight a problem of significant public and legislative importance, framing change
detection. We contribute an unsupervised natural language based approach that detects
framing change trends over several years in domain news publishing. We identify a key
characteristic of such changes, namely, that during frame changes, the polarity of
adjectives describing co-occurring nouns changes cumulatively over multiple years. Our
approach agrees with and extends the results of earlier manual surveys. Whereas such
surveys depend on human effort and are therefore limited in scope, our approach is fully
automated and can simultaneously run over all news domains. We contribute the
Framing Changes Dataset, a collection of over 12,000 news articles from seven domains
in which framing has been shown to change by earlier surveys. We will release the
dataset with our paper. Our work suggests the predictive utility of automated news
monitoring, as a means to foreshadow events of commercial and legislative import.
Our work represents one of the first attempts at a computational modeling of
framing and framing changes. We therefore claim that our approach produces promising
results, and that it will serve as a baseline for more sophisticated analysis over wider
temporal and geographical data.

Appendix: Sample Correlations


Correlations for all subsets are shown in Table 4. The row corresponding to the PMC (start index 1, end index 3) is marked.

Table 4. All correlations for cluster 2 of the domain Immigration.


Correlation Start Index End Index
1 1 2
1 2 3
1 3 4
1 4 5
1 5 6
1 6 7
1 7 8
1 8 9
1 9 10
1 10 11
1 11 12
1 12 13
1 13 14
1 14 15
1 15 16

1 16 17
1 17 18
0.99 1 3 (PMC)
0.98 10 12
0.98 12 14
0.96 12 15
0.93 12 16
0.93 9 12
0.88 11 13
0.88 9 11
0.87 11 14
0.86 11 16
0.85 11 15
0.85 8 12
0.84 8 10
0.83 7 9
0.82 15 17
0.82 14 16
0.82 10 13
0.81 7 12
0.81 9 13
0.79 10 14
0.79 10 16
0.79 9 14
0.78 9 16
0.78 10 15
0.77 9 15
0.77 8 16
0.77 8 13
0.76 8 15
0.76 8 14
0.75 12 17
0.75 5 7
0.75 12 18
0.74 11 18
0.74 11 17
0.72 6 8
0.72 10 17
0.72 10 18
0.72 7 11
0.72 13 15
0.71 7 10
0.70 8 11
0.69 2 4
0.68 7 13
0.68 4 6
0.67 3 5
0.67 16 18
0.66 7 14
0.66 9 17

0.66 9 18
0.66 14 17
0.65 8 17
0.65 7 16
0.65 8 18
0.64 7 15
0.63 6 11
0.62 6 10
0.62 6 12
0.62 6 9
0.61 6 14
0.61 6 13
0.61 5 13
0.61 5 12
0.60 5 8
0.60 5 14
0.60 7 17
0.60 6 16
0.60 4 12
0.59 7 18
0.59 4 13
0.59 5 16
0.59 2 5
0.58 6 15
0.58 5 11
0.58 5 15
0.58 5 10
0.57 5 9
0.57 4 16
0.57 4 14
0.56 2 13
0.56 2 12
0.56 4 17
0.56 15 18
0.56 4 18
0.56 3 13
0.56 3 6
0.56 3 12
0.56 1 5
0.55 4 15
0.55 5 17
0.55 5 18
0.55 3 18
0.55 2 18
0.55 3 17
0.55 2 17
0.54 2 16
0.54 3 16
0.54 6 17
0.54 6 18

0.53 2 14
0.53 3 14
0.53 2 15
0.53 3 15
0.52 1 12
0.52 1 13
0.51 1 18
0.51 1 17
0.50 13 17
0.50 4 7
0.49 2 6
0.49 1 16
0.49 1 6
0.48 1 14
0.48 1 15
0.47 13 18
0.46 13 16
0.44 4 8
0.43 1 4
0.42 4 9
0.40 4 10
0.40 4 11
0.39 14 18
0.36 3 7
0.36 2 11
0.36 1 7
0.36 3 11
0.35 3 8
0.36 2 7
0.35 2 8
0.35 2 9
0.34 1 8
0.34 2 10
0.34 3 9
0.34 3 10
0.34 1 11
0.34 1 9
0.32 1 10

Ethics Statement
Our study involved no human or animal subjects.

Funding Statement
CS has a commercial affiliation with Amazon. The funder provided support in the form of
salaries for this author, but did not have any additional role in the study design, data
collection and analysis, decision to publish, or preparation of the manuscript. The
specific roles of these authors are articulated in the ‘author contributions’ section.



Competing Interests Statement
The above commercial affiliation does not alter our adherence to PLOS ONE policies on
sharing data and materials.

Author Contributions
KS and CS conceived the research and designed the method. KS prepared the datasets
and performed the analysis. KS and MPS designed the evaluation approach. KS, CS,
and MPS wrote the paper.

References
1. Gunnars K. Ten Causes of Weight Gain in America; 2015.
https://www.healthline.com/nutrition/10-causes-of-weight-gain#section12.

2. Kim SH, Willis A. Talking about Obesity: News Framing of Who Is Responsible
for Causing and Fixing the Problem. Journal of Health Communication.
2007;12(4):359–376.
3. Flegal K, Carroll M, Kit B, Ogden C. Prevalence of Obesity and Trends in the
Distribution of Body Mass Index Among US Adults, 1999–2010. Journal of the
American Medical Association. 2012;307(5):491–497.

4. Chong D, Druckman J. Framing theory. Annual Reviews on Political Science. 2007;10:103–126.
5. de Vreese C. News framing: Theory and typology. Information Design Journal.
2005;13(1):51–62.

6. Constantin L. Facebook ID Leak hits Millions of Zynga Users; 2010. http://tinyurl.com/2bqwoxq.
7. Crossley R. Facebook ID Leak hits Millions of Zynga Users; 2011. http://www.
develop-online.net/news/facebook-id-leak-hits-millions-of-zynga-users/0107956.

8. Fitzsimmons C. Facebook And Zynga Sued Over Privacy; 2014. http://www.adweek.com/digital/facebook-zynga-sued/.
9. US Congress. Personal Data Protection and Breach Accountability Act of 2014;
2014. https://www.congress.gov/bill/113th-congress/senate-bill/1995.
10. Benford R, Snow D. Framing Processes and Social Movements: An Overview and
Assessment. Annual Review of Sociology. 2000;26(1):611–639.
11. Card D, Boydstun A, Justin Gross J, Resnik P, Smith N. The Media Frames
Corpus: Annotations of Frames across Issues. In: Proceedings of the 53rd Annual
Meeting of the Association for Computational Linguistics and the 7th
International Joint Conference on Natural Language Processing (Volume 2: Short
Papers). vol. 2; 2015. p. 438–444.
12. Sheshadri K, Singh MP. The Public and Legislative Impact of
Hyper-Concentrated Topic News. Science advances. 2019;5(8).



13. Tsur O, Calacci D, Lazer D. A Frame of Mind: Using Statistical Models for
Detection of Framing and Agenda Setting Campaigns. In: Proceedings of the
53rd Annual Meeting of the Association for Computational Linguistics and the
7th International Joint Conference on Natural Language Processing (Volume 1:
Long Papers); 2015. p. 1629–1638.
14. Entman R. Framing: Toward Clarification of a Fractured Paradigm. Journal of
Communication. 1993;43(4):51–58.
15. Alashri S, Tsai JY, Alzahrani S, Corman S, Davulcu H. “Climate Change”
Frames Detection and Categorization Based on Generalized Concepts. In:
Proceedings of the 10th IEEE International Conference on Semantic Computing
(ICSC); 2016. p. 277–284.
16. NYT. Developer APIs; 2016. http://developer.nytimes.com/.
17. Wikipedia. The New York Times; 2001.
https://en.wikipedia.org/wiki/The_New_York_Times.
18. The Guardian. Guardian Open Platform; 2016.
http://open-platform.theguardian.com/.
19. Wikipedia. The Guardian; 2002. https://en.wikipedia.org/wiki/The_Guardian.
20. King G, Schneer B, White A. How the News Media Activate Public Expression
and Influence National Agendas. Science. 2017;358(6364):776–780.
21. Sheshadri K, Ajmeri N, Staddon J. No Privacy News is Good News: An Analysis
of New York Times and Guardian Privacy News from 2010—2016. In:
Proceedings of the 15th Privacy, Security and Trust Conference. Calgary, Alberta,
Canada; 2017. p. 159–167.
22. Socher R, Perelygin A, Wu J, Chuang J, Manning C, Ng A, et al. Recursive Deep
Models for Semantic Compositionality Over a Sentiment Treebank. In:
Proceedings of the 2013 Empirical Methods in Natural Language Processing
Conference (EMNLP). Seattle, WA: Association for Computational Linguistics;
2013. p. 1631–1642.
23. Melo GD, Bansal M. Good, Great, Excellent: Global Inference of Semantic
Intensities. Transactions of the Association for Computational Linguistics.
2013;1:279–290.
24. Blei D, Ng A, Jordan M. Latent Dirichlet Allocation. Journal of Machine
Learning Research. 2003;3:993–1022.
25. Engel S. Frame Spillover: Media Framing and Public Opinion of a Multifaceted
LGBT Rights Agenda. Law and Social Inquiry. 2013;38:403–441.
26. Benesty J, Chen J, Huang Y, Yiteng C, Cohen I. Pearson Correlation Coefficient.
In: Noise reduction in speech processing. Springer; 2009. p. 1–4.
27. Mathworks. Correlation Coefficients; 2019.
https://www.mathworks.com/help/matlab/ref/corrcoef.html.
28. National Cancer Institute. How the News Media Influence Tobacco Use; 2019.
https://cancercontrol.cancer.gov/brp/tcrb/monographs/19/m19_9.pdf.
29. Vernon P. Five Years Ago, Edward Snowden Changed Journalism; 2018.
https://www.cjr.org/the_media_today/snowden-5-years.php.



30. Pew Research. The State of Privacy in post-Snowden America; 2016.
31. Cummings M, Proctor R. The Changing Public Image of Smoking in the United
States: 1964–2014. Cancer Epidemiology and Prevention Biomarkers.
2014;23:32–36.
32. Gainous J, Rhodebeck L. Is Same-Sex Marriage an Equality Issue? Framing
Effects Among African Americans. Journal of Black Studies. 2016;47(7):682–700.
33. Wikipedia. History of UK immigration control; 2019.
https://en.wikipedia.org/wiki/History_of_UK_immigration_control.



Response to Reviewers

Dear PLOS ONE Editor-in-Chief,

We would respectfully like to appeal the decision made on our submission PONE-D-19-15324. Our manuscript was not sent for review. We discuss the editorial comments and provide a point-by-point response below.

Editor Comments:
This research is focused on detecting framing changes in topical news. The authors argue that
the public opinion varies with the way the news is framed. The research lacks motivation as it is
not clear what benefits can be achieved if frame changes are detected. Moreover, the problem
is already discussed and presented in articles [4,5]. This paper seems to provide more empirical
evidence in support to the existing research [4,5]. Hence, the research contribution is unclear.
Furthermore, following points are worth considering:-
1. The related work should be discussed in detail highlighting the advantages/limitations of
existing approaches.
2. The dataset and codes are not available online.
3. The comparison of research with state of the art approaches and manual techniques has not
been conducted.
4. An overview diagram for the proposed approach would help the reader understand the flow of
the proposed approach.
5. The results are presented but not discussed. The section should be renamed to "Results and
Discussion" and appropriate discussion should be added with each pair of graphs.

Responses:

We provide a point-by-point response below.

Comment on Motivation: “The research lacks motivation as it is not clear what benefits
can be achieved if frame changes are detected.”

Framing changes have been shown to have commercial and legislative consequences,
and have also been shown to foreshadow public attention changes. We cite five
example articles here [1-5] and can readily provide more as necessary. A large body of
literature in the fields of Political Science and Communication addresses the manual
identification of framing changes in specific domains. We cite two examples here [6-7]; additional examples are available upon request. However, existing
work does not attempt to address the problem of computationally detecting framing
changes. Our work is the first attempt at this problem, which has significant commercial,
public, and legislative import. Our results substantially agree with the results of earlier
human surveys, and further have shown predictive utility for legislative and public
response. Our work therefore has significant scientific and potential commercial value.
Comment on Contribution: “Moreover, the problem is already discussed and presented
in articles [4,5]. This paper seems to provide more empirical evidence in support to the
existing research [4,5]. Hence, the research contribution is unclear.”

References 4 and 5 are articles discussing framing theory in the Communications literature. Neither article makes any attempt to computationally (or even manually)
measure framing in any real-world domain, let alone attempt to detect changes in
framing. Therefore, the problem addressed by our work is fundamentally novel and
different from the work presented in [4] and [5].

We present a fully unsupervised approach that is the first method to computationally detect framing changes. Further, we contribute a dataset of over 12,000 news articles
from seven domains. Our work will provide a strong baseline to foster new research in
this influential area of Political Science and Communications research.

1) “The related work should be discussed in detail highlighting the advantages/limitations of existing approaches.”

We emphasize that our work is the first attempt at computationally modeling changes in
framing. The closest previous efforts in this area are those of [10] and [11]. We describe
our novel contributions over these efforts in detail in the Related Work section. We are
unaware of any other relevant related work and would be happy to learn of any such
work from the Editor.

2) “The dataset and codes are not available online.”

This statement is incorrect. We have clearly stated in our submission that all data and code will be made available; they are available online at the following link:
https://drive.google.com/open?id=1zAH__Y1lcdriuwUcjZsKmvaqYtzAjyZ9

All our results are reproducible from the data and code in the above mentioned
repository. We will provide a guide to run our code.

3) “The comparison of research with state of the art approaches and manual techniques
has not been conducted.”

Please refer to our responses above to the comment on motivation and comment #1.

4) “An overview diagram for the proposed approach would help the reader understand the
flow of the proposed approach.”

We are grateful for this suggestion and will incorporate an overview diagram illustrating
our approach. However, this is a simple suggestion for presentation that may easily be
addressed in a revision.

5) “The results are presented but not discussed. The section should be renamed to "Results
and Discussion" and appropriate discussion should be added with each pair of graphs.”

The Results section discusses our results for each domain, both through a qualitative comparison with manual surveys (by other authors) and by highlighting the predictive utility of
the returned result. We show that our results both agree with previous manual surveys, and are
also able to predict significant public and legislative response in each domain. We will rename
this section to “Results and Discussion”.

References:
1. A. C. Gunther, The persuasive press inference: Effects of mass media on perceived public opinion. Commun. Res. 25, 486–504 (1998).

2. D. C. Mutz, J. Soss, Reading public opinion: The influence of news coverage on perceptions of public sentiment. Public Opin. Q. 61, 431–451 (1997).

3. G. King, B. Schneer, A. White, How the news media activate public expression and influence national agendas. Science 358, 776–780 (2017).

4. F. R. Baumgartner, B. D. Jones, P. B. Mortensen, Punctuated equilibrium theory: Explaining stability and change in public policymaking, in Theories of the Policy Process, P. A. Sabatier, C. M. Weible, Eds. (Westview Press, 2014), pp. 59–103.

5. R. M. Entman, Framing: Toward clarification of a fractured paradigm. J. Commun. 43, 51–58 (1993).

6. K. M. Cummings, R. N. Proctor, The changing public image of smoking in the United States: 1964–2014. Cancer Epidemiol. Biomarkers Prev. 23, 32–36 (2014).

7. S. M. Engel, Frame spillover: Media framing and public opinion of a multifaceted LGBT rights agenda. Law Soc. Inq. 38, 403–441 (2013).

Response to Reviewers

We thank the editors for their valuable feedback. We address the main points below.

Editor Comment:

My main concern with this paper deals with the evaluation of the approach. More precisely, the
experimental section illustrates a series of case studies or scenarios where the frame change is
identified through a sudden polarity drift (Figures 10.16) that is shown to correlate with some
well-known fact or event studied in the literature. The point is: how much is this evaluation
anecdotal, and to what extent can it be quantitatively measured? In all Figures from 10 through
16, several peaks and sudden changes can be observed in the polarity distribution (e.g., Figure
12, class 5, years 2000 through 2003, or Figure 13, class 1, year 2006, to mention just a few):
do they all correspond to frame changes? If not, how can they be detected/studied? The paper
states that the dataset was annotated by experts: how? Can such annotation be used for
quantitative evaluation of the approach?

Response:

This comment concerns two primary aspects of the paper: (i) defining a framing change, and in particular, isolating framing changes and filtering out polarity drifts that do not correspond to framing changes; and (ii) quantitative evaluation of the approach. We address each aspect below.

(i) Defining framing changes and filtering out isolated drifts

We have added a section entitled Defining Framing Changes. We summarize the main points
here.

Since language and human behavior are not strictly deterministic, the measurement of any
temporally disparate pair of news corpora using adjective polarity (or any other numerical
metric) would result in different representative values of the two corpora. Therefore, in this
sense, any pair of news corpora can be said to have undergone a framing change.

Further, individual metrics are susceptible to noisy readings due to imprecise data and
measurement. In particular, such an effect may cause sudden isolated spikes between
successive measurements.

This motivates the question of how a framing change is defined, in the context of our
computational measurements. The usual social science definition is that a framing change is a
shift in the way that a specific topic is presented to an audience. To isolate such changes
computationally, we use the following key observations from ground-truth framing changes: (i) framing changes take place as trends that are consistent over at least k years; and (ii) framing changes must be consistent across multiple measurements.

Our aim in this paper is to begin from a set of time series such as the ones in Figures 10 to 16, and isolate such trends. The requirement motivated by our first condition, namely, that framing changes must last at least k years, is easy to satisfy by imposing a numerical threshold.

To satisfy the requirement motivated by our second observation, we rely on correlations between different measurements, as described in the Detecting Framing Changes using Periods of Maximum Correlation section.

Our approach thus identifies polarity drifts that are both correlated (quantitatively measured by
correlations between different measures of polarity) and sustained (by the imposition of a
threshold of duration). We point out that our approach filters out isolated drifts in individual
polarity measures, since such drifts are uncorrelated across multiple measures. Further, we
note that the magnitude of individual drifts matters only indirectly to our approach, to the extent
that a larger drift, if consistent across multiple polarity measures, may have higher correlation
than a smaller drift that is also correlated.
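
To make this criterion concrete, the following Python sketch (illustrative only, not our exact implementation) flags windows of at least k years in which hypothetical yearly polarity measures drift together, as judged by their pairwise correlation coefficients. The function name correlated_windows, the toy series, and the thresholds k and min_corr are assumptions made for illustration.

import numpy as np

def correlated_windows(series_by_measure, years, k=3, min_corr=0.7):
    # series_by_measure: polarity-measure name -> list of yearly values aligned with years.
    # Returns (start_year, end_year) spans in which every pair of measures is
    # correlated at least min_corr, i.e., the drift is sustained and consistent.
    measures = [np.asarray(v, dtype=float) for v in series_by_measure.values()]
    spans = []
    for start in range(len(years) - k + 1):
        window = [m[start:start + k] for m in measures]
        corrs = [np.corrcoef(a, b)[0, 1]
                 for i, a in enumerate(window) for b in window[i + 1:]]
        if corrs and min(corrs) >= min_corr:
            spans.append((years[start], years[start + k - 1]))
    return spans

# Illustrative input: two invented adjective-polarity measures over 2000-2007.
years = list(range(2000, 2008))
series = {
    "polarity_measure_a": [0.10, 0.10, 0.20, 0.40, 0.60, 0.60, 0.50, 0.50],
    "polarity_measure_b": [0.20, 0.30, 0.30, 0.50, 0.70, 0.60, 0.70, 0.60],
}
print(correlated_windows(series, years, k=3, min_corr=0.7))

Overlapping spans returned in this way can be merged into a single sustained framing-change period.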

(ii) Quantitative Evaluation

We provide a partial quantitative evaluation of our approach using a Precision-Recall analysis.


We label a ground truth for each domain, marking years corresponding to framing changes as
positives, and other years as negatives. We primarily obtain our positives using the findings of
large-scale surveys from earlier research.

In order to do so, we study the literature pertaining to framing changes in the domains we
examine. We identify large-scale studies conducted by reputable organizations such as the
National Cancer Institute, the Columbia Journalism Review, Pew Research, and so on. These
studies examine news and media publishing in a particular domain over a period of time, as we
do, and manually identify changes in the framing of domain news during these periods.

The studies we rely on for ground truth sometimes provide quantitative justification for their
findings. These studies therefore provide an expert annotation of framing changes in our
domains, for the periods we examine.
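
As a concrete illustration of this evaluation, the Python sketch below computes precision and recall over year labels; the detected and ground-truth years shown are made up for illustration and are not our actual annotations.

def precision_recall(detected_years, ground_truth_years):
    # Years flagged as framing changes by the detector vs. years marked by prior surveys.
    detected, truth = set(detected_years), set(ground_truth_years)
    true_positives = len(detected & truth)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Invented example: the detector flags 2004-2006, while a survey marks 2005-2007.
print(precision_recall({2004, 2005, 2006}, {2005, 2006, 2007}))  # both values are 2/3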

By demonstrating substantial agreement between the results of our approach and those of
earlier ground truth surveys, we establish our claim that our approach may be used to
automatically identify framing changes in domain news publishing.

Given that the data sources and coverage of our analysis and of the prior surveys usually differ considerably, the correlations we obtain appear quite substantial. However, quantitative evaluation remains challenging for the reasons we point out.

This paper follows the spirit of recent work in seeking to develop the study of framing into a
computational science. We acknowledge that our methods may undergo refinement to tackle
broader ground truth data, of a wider temporal and geographical scope. Nonetheless, we posit
that our methods and results have scientific value, and hope that future work will provide greater
coverage of ground truth.

Please note that the underlying data preparation requires social science expertise and cannot
be effectively crowdsourced via a platform such as Mechanical Turk. We therefore hope that
our approach piques the interest of social scientists and leads them to pursue more
comprehensive studies of framing in news media that would enable improvements in
computational methods.

Editor Comment:

* What do the annotators tag in the dataset? The paper just state that two raters code a random
sample of articles from each domain, reporting Cohen's kappa. But what do they code? Frames,
or frame changes? If so, how is a frame change defined?

Response:

To ensure that the articles returned by our term search procedure are indeed relevant to each
domain, a random sample of articles from each domain dataset was coded for relevance by two
raters.

Specifically, an article belongs to a domain if at least a component of the article discusses a topic that is directly relevant to the domain. We term articles that are relevant to a domain,
domain positives, and irrelevant articles, domain negatives. As an example, consider the
following article from the domain smoking: “The dirty gray snow was ankle deep on West 18th Street the other day, and on the block between Ninth and Tenth Avenues, a cold wind blew in off the Hudson River. On the south side of the street, a mechanic stood in front of his garage smoking a ...” We consider this article a domain negative since, although it contains the keyword 'smoking', it does not discuss any aspect pertaining to the prevalence or control of smoking. In contrast, the article “An ordinance in Bangor, Maine, allows the police to stop cars if an adult is smoking while a child under 18 is a passenger” is directly relevant to the domain, and is therefore considered a domain positive. We define dataset accuracy as the fraction of articles in
a dataset that are domain positive.
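
To illustrate (with made-up relevance codes rather than our raters' actual labels), dataset accuracy and inter-rater agreement can be computed as in the following Python sketch, which uses scikit-learn's cohen_kappa_score.

from sklearn.metrics import cohen_kappa_score

def dataset_accuracy(labels):
    # labels: 1 for a domain-positive article, 0 for a domain-negative one.
    return sum(labels) / len(labels)

# Invented relevance codes from two raters over the same ten sampled articles.
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]

print("dataset accuracy (rater 1):", dataset_accuracy(rater_1))
print("Cohen's kappa:", round(cohen_kappa_score(rater_1, rater_2), 2))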

Please refer to the section on Defining Framing Changes for a discussion on how we treat the
problem of identifying changes in framing.

Editor Comment:
* With regards to Figures 1 and 2, the authors state that the peak in the LGBT domain
immediately precedes a frame change. But, does this hold also for other peaks of other
domains? Such as, drones in 2004, obesity in 2005, or smoking in 2005?

Response:

Whereas we do not claim that this correlation is true for all domains, we posit that it motivates
the utility of adjective polarity in the study of framing changes.

Editor Comment:

* What are the differences of the proposed approach with respect to an approach that detects
just frames (instead of frame changes), but then look at changes in the detected frames...? See,
e.g., the following references:
** Alashri et al., "Climate Change" Frames Detection and Categorization Based on Generalized
Concepts", International Journal of Semantic Computing, 2016
** Tsur et al., "A Frame of Mind: Using Statistical Models for Detection of Framing and Agenda
Setting Campaigns", ACL 2015

Response:

We have added paragraphs to the Related Work section, detailing the novel contributions of our
work and drawing distinctions between this paper and earlier approaches. We summarize this
discussion here.

We note that our approach is similar in spirit to Tsur et al’s [9] work, in that both that work and
this paper apply a topic modeling strategy to analyze framing as a time series. However, we
highlight the following key differences and contributions of our work. Firstly, as both Sheshadri
and Singh [8] and Tsur et al point out, framing is a subjective aspect of communication.
Therefore, a computational analysis of framing should ideally differentiate subjective aspects
from fact-based and objective components of communication. Since adjectives in and of
themselves are incapable of communicating factual information, we take them to be artifacts of
how an event or topic is framed. In contrast, generic n-grams (as used by Tsur et al) do not
provide this distinction.

Further, Tsur et al rely upon estimating “changes in framing” using changes in the relative frequencies of n-grams associated with various topics or frames. Whereas such an approach is useful in evaluating which of a set of frames may be dominant at any given time, it does not measure “framing changes” in the sense originally described in [5]. In contrast, our work estimates changes in framing using consistent polarity drifts of adjectives associated with individual frames. Our approach may also be applied to each of a number of frames independently of the others, unlike that of Tsur et al.
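
As a small illustration of this distinction, the Python sketch below scores a text by the polarity of its adjectives alone; it is not our actual pipeline, the tiny adjective lexicon and its polarity values are invented, and a real pipeline would identify adjectives with a part-of-speech tagger rather than a hand-listed lexicon.

import re

# Invented lexicon: adjective -> polarity in [-1, 1]; values are illustrative only.
TOY_ADJECTIVE_POLARITY = {"glamorous": 0.7, "popular": 0.4, "harmful": -0.6, "deadly": -0.9}

def adjective_polarity(text):
    # Mean polarity of the adjectives found in the text; 0.0 if none are found.
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_ADJECTIVE_POLARITY[t] for t in tokens if t in TOY_ADJECTIVE_POLARITY]
    return sum(scores) / len(scores) if scores else 0.0

early_article = "Advertisements framed smoking as glamorous and popular."
later_article = "Reports now frame smoking as harmful and deadly."
print(adjective_polarity(early_article), adjective_polarity(later_article))  # 0.55 and -0.75

Even this toy example shows the intended effect: the adjective polarity of the two invented sentences drifts from positive to negative.
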
Editor Comment:

I have consulted with another academic editor, Dr. Marco Lippi, and we agree that a desk
rejection was premature. Nevertheless, experience tells us that all reviewers provide a
perspective similar to at least some other readers. For this reason I am requesting that you
revise the manuscript to address issues raised in the desk rejection. Perhaps you made some
revisions in your appeal. However, these were not apparent to me with track changes. Please
revise the manuscript itself, including a rebuttal that identifies the specific location of the
modifications you made in response to the original decision.

Response:

We have submitted a revised manuscript in which we highlight our responses to each point
made in the original decision. We summarize our responses to each point in the original
decision below.

Editor Comment:

1. The related work should be discussed in detail highlighting the advantages/limitations of existing approaches.

Response:

Our updated related work section discusses alternative approaches in detail, and describes our
novel contributions over these approaches.

Editor Comment:

2. The dataset and codes are not available online.

Response:

The dataset and code are available online at the following link:
https://drive.google.com/open?id=1zAH__Y1lcdriuwUcjZsKmvaqYtzAjyZ9

All our results are reproducible from the data and code in the above mentioned
repository. We will provide a guide to run our code.

Editor Comment:

4. An overview diagram for the proposed approach would help the reader understand the flow of
the proposed approach.

Response:

We have added an overview diagram illustrating our approach in Fig. 3.

Editor Comment:

5. The results are presented but not discussed. The section should be renamed to "Results and
Discussion" and appropriate discussion should be added with each pair of graphs.

Response:

We have expanded our analysis of the results for each domain, including adding a quantitative
precision-recall analysis based on ground truth data.

Response to Original Decision:

In addition to these responses, we have appended a copy of our original response to reviews
below.

Editor Comments:

This research is focused on detecting framing changes in topical news. The authors argue that
the public opinion varies with the way the news is framed. The research lacks motivation as it is
not clear what benefits can be achieved if frame changes are detected. Moreover, the problem
is already discussed and presented in articles [4,5]. This paper seems to provide more empirical
evidence in support to the existing research [4,5]. Hence, the research contribution is unclear.
Furthermore, following points are worth considering:-
1. The related work should be discussed in detail highlighting the advantages/limitations of
existing approaches.
2. The dataset and codes are not available online.
3. The comparison of research with state of the art approaches and manual techniques has not
been conducted.
4. An overview diagram for the proposed approach would help the reader understand the flow of
the proposed approach.
5. The results are presented but not discussed. The section should be renamed to "Results and
Discussion" and appropriate discussion should be added with each pair of graphs.

Responses:

We provide a point-by-point response below.


Comment on Motivation: “The research lacks motivation as it is not clear what benefits
can be achieved if frame changes are detected.”

Framing changes have been shown to have commercial and legislative consequences,
and have also been shown to foreshadow public attention changes. We cite five
example articles here [1-5] and can readily provide more as necessary. A large body of
literature in the fields of Political Science and Communication addresses the manual
identification of framing changes in specific domains. We cite two examples here [6-7]; additional examples are available upon request. However, existing
work does not attempt to address the problem of computationally detecting framing
changes. Our work is the first attempt at this problem, which has significant commercial,
public, and legislative import. Our results substantially agree with the results of earlier
human surveys, and further have shown predictive utility for legislative and public
response. Our work therefore has significant scientific and potential commercial value.

Comment on Contribution: “Moreover, the problem is already discussed and presented in articles [4,5]. This paper seems to provide more empirical evidence in support to the
existing research [4,5]. Hence, the research contribution is unclear.”

References 4 and 5 are articles discussing framing theory in the Communications literature. Neither article makes any attempt to computationally (or even manually)
measure framing in any real-world domain, let alone attempt to detect changes in
framing. Therefore, the problem addressed by our work is fundamentally novel and
different from the work presented in [4] and [5].

We present a fully unsupervised approach that is the first method to computationally detect framing changes. Further, we contribute a dataset of over 12,000 news articles
from seven domains. Our work will provide a strong baseline to foster new research in
this influential area of Political Science and Communications research.

Revised sections:

1) “The related work should be discussed in detail highlighting the advantages/limitations of existing approaches.”

We emphasize that our work is the first attempt at computationally modeling changes in
framing. The closest previous efforts in this area are those of [10] and [11]. We describe
our novel contributions over these efforts in detail in the Related Work section. We are
unaware of any other relevant related work and would be happy to learn of any such
work from the Editor.
2) “The dataset and codes are not available online.”

This statement is incorrect. We have clearly stated in our submission that all data and code will be made available; they are available online at the following link:
https://drive.google.com/open?id=1zAH__Y1lcdriuwUcjZsKmvaqYtzAjyZ9

All our results are reproducible from the data and code in the above mentioned
repository. We will provide a guide to run our code.

3) “The comparison of research with state of the art approaches and manual techniques
has not been conducted.”

Please refer to our responses above to the comment on motivation and comment #1.

4) “An overview diagram for the proposed approach would help the reader understand the
flow of the proposed approach.”

We are grateful for this suggestion and will incorporate an overview diagram illustrating
our approach. However, this is a simple suggestion for presentation that may easily be
addressed in a revision.

5) “The results are presented but not discussed. The section should be renamed to "Results
and Discussion" and appropriate discussion should be added with each pair of graphs.”

The Results section discusses our results for each domain, both through a qualitative comparison with manual surveys (by other authors) and by highlighting the predictive utility of
the returned result. We show that our results both agree with previous manual surveys, and are
also able to predict significant public and legislative response in each domain. We will rename
this section to “Results and Discussion”.

References:
1. A. C. Gunther, The persuasive press inference: Effects of mass media on perceived public opinion. Commun. Res. 25, 486–504 (1998).

2. D. C. Mutz, J. Soss, Reading public opinion: The influence of news coverage on perceptions of public sentiment. Public Opin. Q. 61, 431–451 (1997).

3. G. King, B. Schneer, A. White, How the news media activate public expression and influence national agendas. Science 358, 776–780 (2017).

4. F. R. Baumgartner, B. D. Jones, P. B. Mortensen, Punctuated equilibrium theory: Explaining stability and change in public policymaking, in Theories of the Policy Process, P. A. Sabatier, C. M. Weible, Eds. (Westview Press, 2014), pp. 59–103.

5. R. M. Entman, Framing: Toward clarification of a fractured paradigm. J. Commun. 43, 51–58 (1993).

6. K. M. Cummings, R. N. Proctor, The changing public image of smoking in the United States: 1964–2014. Cancer Epidemiol. Biomarkers Prev. 23, 32–36 (2014).

7. S. M. Engel, Frame spillover: Media framing and public opinion of a multifaceted LGBT rights agenda. Law Soc. Inq. 38, 403–441 (2013).

8. K. Sheshadri and M. P. Singh, The Public and Legislative Impact of Hyperconcentrated Topic News, Science Advances, 2019, 5 (8).

9. O. Tsur, D. Calacci, and D. Lazer, A Frame of Mind: Using Statistical Models for Detection of Framing and Agenda Setting Campaigns. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015.
