July/August 2012, 1541-1672/12/$31.00 © 2012 IEEE, p. 81
Published by the IEEE Computer Society
AI and Health

Editor: Daniel B. Neill, H.J. Heinz III College, Carnegie Mellon University,

How Social Media Will
Change Public Health
Mark Dredze, Johns Hopkins University

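Social media such as Twitter have created platforms for people to broadcast information, thoughts, and feelings about their daily lives. Since Twitter messages (called tweets) often reflect in-the-moment updates, they’re filled with useful observations and information about the larger world. Researchers have examined a range of applications based on tweets, from political polling [1] to earthquake monitoring [2], that have demonstrated Twitter’s ability to deliver fast, cheap, and reliable tools for monitoring real-world events.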

These successes have drawn interest from the public-health community, whose goal is to study the health of a population and develop policies that improve health outcomes. Traditionally, this requires expensive, time-consuming monitoring mechanisms, primarily surveys and data collection from clinical encounters. Even high-priority projects, such as the US Centers for Disease Control and Prevention’s (CDC’s) FluView program that tracks the weekly US influenza rate, are still slow because they require clinical data aggregation. Twitter and other social media could reduce cost and provide real-time statistics about public health.

Recent work in machine learning and natural language processing has studied the health content of tweets and demonstrated the potential for extracting useful public-health information from their aggregation. This article examines the types of health topics discussed on Twitter, and how tweets can both augment existing public-health capabilities and enable new ones. I also discuss key challenges that researchers must address to deliver high-quality tools to the public-health community.

Discovering Health Topics on Twitter
Twitter’s size and breadth make it difficult to determine exactly which types of public-health work it can support. Initial work in my research group [3, 4] explored health-related tweets and topics on Twitter through the development of new computational models. Because many public-health activities are disease-oriented, we developed a model that discovered diseases (ailments) from raw tweets for guided exploration, rather than relying on predefined illnesses. We used supervised learning to filter tweets and find health-related messages, yielding 1.6 million English health tweets from March 2009 to October 2010.
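
The filtering step can be illustrated with a small sketch. The article does not say which supervised learner was used, so the multinomial Naive Bayes classifier below is a stand-in chosen for brevity, and the labeled examples, function names, and whitespace tokenizer are all hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase a tweet and split it on whitespace."""
    return text.lower().split()

def train_naive_bayes(labeled_tweets):
    """Count words per class from (text, is_health) pairs."""
    word_counts = {True: Counter(), False: Counter()}
    class_counts = Counter()
    for text, is_health in labeled_tweets:
        class_counts[is_health] += 1
        word_counts[is_health].update(tokenize(text))
    vocabulary = set(word_counts[True]) | set(word_counts[False])
    return word_counts, class_counts, vocabulary

def is_health_tweet(text, model):
    """Return True when the health class has the higher log posterior."""
    word_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label in (True, False):
        # Add-one smoothing keeps unseen class/word pairs from zeroing a score.
        denom = sum(word_counts[label].values()) + len(vocabulary)
        score = math.log(class_counts[label] / total)
        for word in tokenize(text):
            if word in vocabulary:  # skip words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return scores[True] > scores[False]

# Hypothetical hand-labeled examples; not taken from the study's data.
TRAINING = [
    ("home sick with the flu", True),
    ("bad headache and fever today", True),
    ("took tylenol for my headache", True),
    ("watching the game with friends", False),
    ("bieber fever at the concert", False),
    ("new phone is sick", False),
]

model = train_naive_bayes(TRAINING)
stream = ["fever and headache, took tylenol", "concert with friends"]
health_tweets = [t for t in stream if is_health_tweet(t, model)]
```

A real filter would need far more training data and a tokenizer that handles punctuation, hashtags, and mentions; the point here is only the shape of the pipeline: label a sample, train, then keep the messages the classifier marks as health-related.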

To explore these tweets, we developed the Ailment Topic Aspect Model (ATAM), a probabilistic graphical model for uncovering ailments [3]. ATAM assumes that each message discusses a single ailment, manifested through the message’s words, and associates three types of words (general disease words, symptoms, and treatments) with ailments. For example, the message “fever + headache = flu, home sick with Tylenol” discusses influenza, where “fever” and “headache” are symptoms, “Tylenol” a treatment, and “flu” a general word associated with the ailment.
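
To make the three word types concrete, the sketch below tags the example message using a hand-written lexicon. This is not ATAM: ATAM infers such associations from data, whereas the `INFLUENZA_LEXICON` dictionary and the `tag_words` helper are assumptions introduced purely for illustration.

```python
import re

# Hand-written word types for one ailment. ATAM learns these associations
# from tweets; this toy lexicon is an assumption for illustration only.
INFLUENZA_LEXICON = {
    "fever": "symptom",
    "headache": "symptom",
    "tylenol": "treatment",
    "flu": "general",
}

def tag_words(message, lexicon):
    """Return (word, type) pairs for the words the lexicon knows about."""
    words = re.findall(r"[a-z]+", message.lower())
    return [(word, lexicon[word]) for word in words if word in lexicon]

tagged = tag_words("fever + headache = flu, home sick with Tylenol",
                   INFLUENZA_LEXICON)
```

Words outside the lexicon, such as "home" and "sick", are simply left untagged in this sketch.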

Human annotators labeled 15 ailments discovered by ATAM, including headaches, influenza, insomnia, obesity, dental problems, and seasonal allergies. Examining the words, symptoms, and treatments most associated with each ailment, and the groups of messages that discuss each ailment, can support a variety of public-health

Augmenting Existing
Public-Health Capabilities
A core capability of public-health programs, biosurveillance monitors a population for adverse health events, which include expected seasonal events, such as influenza or environmental allergies; disease outbreaks, such as the H1N1 virus; and other health threats, such as food poisoning or a biochemical contaminant. Surveillance is the key first step in any comprehensive response strategy. Consider the example of the H1N1 virus, which struck the US in 2009. Public-health officials had to direct vaccine supplies to the areas and populations where they were most needed, requiring accurate information about where H1N1 infections were occurring and which demographic groups were most affected.

Traditional biosurveillance relies on information collected from clinical encounters, a time-consuming process. For example, in the case of influenza tracking, the CDC requires two weeks to collect and release statistics about the US flu rate. Web-based approaches can produce faster results, such as Google Flu Trends [5], which analyzes real-time search queries to produce a daily flu rate. When users submit flu-related queries, such as “flu medicine” or “flu symptoms,” Google aggregates these statistics to measure rises in flu traffic. These aggregated measures have been shown to correlate with the CDC’s official estimates, providing more timely influenza

Analyzing Twitter messages could provide a similar surveillance capability. Studies [4, 6] have shown correlations between influenza tweets and CDC data, using supervised learning and unsupervised learning. This idea has been extended to low-resource settings in developing countries, such as surveillance of cholera in Haiti [7].
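
A minimal sketch of how such a comparison might be computed follows. It normalizes weekly flu-tweet counts by total tweet volume and then takes the Pearson correlation with an official weekly rate. All numbers are hypothetical placeholders rather than data from the cited studies, and the helper names are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def weekly_fraction(flu_counts, total_counts):
    """Share of each week's tweets that mention flu, to control for volume."""
    return [flu / total for flu, total in zip(flu_counts, total_counts)]

# Hypothetical weekly values, invented for illustration only.
flu_tweets = [120, 180, 310, 450, 390]
all_tweets = [100_000, 110_000, 120_000, 125_000, 130_000]
official_rate = [1.1, 1.6, 2.4, 3.5, 3.0]

r = pearson_r(weekly_fraction(flu_tweets, all_tweets), official_rate)
```

Dividing by total volume matters because Twitter usage itself grows over time; raw counts could rise even when the share of flu messages is flat.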

Because Twitter provides location information for some tweets, biosurveillance can be geographically localized. For example, Figure 1 visualizes the per capita tweeting rate about seasonal allergies for June 2010, where darker colors indicate more tweets [4]. As expected, the Midwest and Northeast have substantial Twitter traffic compared to other regions of the US, which follows the expected start of allergy season. By contrast, the winter months have few allergy messages.
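
The per capita rate behind a map like this can be sketched as follows. The counts, populations, the `min_tweets` threshold, and the per-100,000 scaling are all assumptions made for illustration; the article does not specify how insufficient data was defined.

```python
def per_capita_rates(tweet_counts, populations, min_tweets=50, per=100_000):
    """Allergy tweets per `per` residents for each state.

    States with fewer than `min_tweets` messages map to None, mirroring
    the idea of marking states that have insufficient data.
    """
    rates = {}
    for state, population in populations.items():
        count = tweet_counts.get(state, 0)
        if count < min_tweets:
            rates[state] = None
        else:
            rates[state] = count / population * per
    return rates

# Hypothetical counts and populations, invented for illustration only.
allergy_tweets = {"OH": 1150, "WY": 12}
state_population = {"OH": 11_500_000, "WY": 560_000}

rates = per_capita_rates(allergy_tweets, state_population)
```

Normalizing by population keeps large states from dominating the map simply because they have more users.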

Beyond surveillance, Twitter can support other public-health tasks, such as health risk assessments. For example, the annual CDC Behavioral Risk Factors Study surveys more than 300,000 people nationwide for several risk factors, such as asthma, smoking, and exercise. The study is both expensive and time-consuming, making it inappropriate for rapid hypothesis generation and testing. Twitter could augment this survey by investigating additional questions or providing faster results. We compared each of the survey questions that had corresponding ailments discovered by ATAM across the 50 states [4], uncovering interesting correlations, such as a positive correlation between states with high smoking rates and those with high Twitter message rates about cancer (r = 0.648), a negative correlation between exercise and obesity messages (r = −0.201), and a negative correlation between good healthcare coverage and messages about ailments in general (r = −0.253).

Creating New Public-
Health Capabilities
The monitoring of Twitter data can also enable the creation of entirely new public-health capabilities, supported by both the expressiveness of tweets and the coverage of topics not normally included in public-health data, particularly those that people are reluctant to discuss with healthcare workers.

The public forum of social media encourages messages that express a range of details, yielding health information such as the illness, symptoms, and treatment strategy; for example, “took some Tylenol for my flu” or “stuck home with flu and 102 fever.” Consider Figure 2, which shows the word cloud for insomnia generated via ATAM output, in which word size corresponds to the word’s likelihood for that ailment, and color indicates word type (red for symptoms, green for treatments, and blue for general words). Although search engine users might turn to Google to look for insomnia remedies, Twitter users provide a variety of details about their sleepless nights.

Figure 1. The rate of Twitter messages about seasonal allergies for June 2010. Messages were automatically coded using a machine-learning method and geolocated based on user-provided location. Overall shading indicates significant allergy messages, showing the heart of allergy season. States in the Northeast and Midwest are particularly active. Dashed states had insufficient data.

Additionally, Twitter covers different topics than those covered by traditional public-health data sources such as clinical encounters and phone surveys. Information that people might be reluctant to share with physicians is on full display on Twitter, including behaviors, opinions, and subpopulations that are otherwise difficult to track through traditional mechanisms, suggesting a whole new area of large-scale public-health research.

Disease self-management can be hard to study, as it doesn’t involve a physician and patients might be reluctant to share unapproved practices with health officials. We studied medication usage from tweets by creating medication usage profiles based on ailment groupings [4]. For pain relievers, for example, we found that Tylenol and Advil have broad profiles (headache, cold relief, and so on) while Vicodin is targeted at dental problems and injuries. For allergy medication, Claritin and Zyrtec were almost exclusively used to treat allergies, while off-label uses of Benadryl included insomnia. Monitoring medication usage on Twitter can discover new trends in self-medication otherwise unreported by
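
One way to picture a medication usage profile is as the distribution of ailments among tweets that mention a given medication. The sketch below assumes each tweet has already been reduced to a (medication, ailment) pair, for example by a model such as ATAM. The observations are invented, and `medication_profiles` is a hypothetical helper rather than the method used in the cited work.

```python
from collections import Counter, defaultdict

def medication_profiles(observations):
    """Map each medication to the share of its mentions per ailment."""
    counts = defaultdict(Counter)
    for medication, ailment in observations:
        counts[medication][ailment] += 1
    profiles = {}
    for medication, ailment_counts in counts.items():
        total = sum(ailment_counts.values())
        profiles[medication] = {
            ailment: count / total
            for ailment, count in ailment_counts.items()
        }
    return profiles

# Hypothetical (medication, ailment) pairs, invented for illustration only.
OBSERVATIONS = [
    ("tylenol", "headache"), ("tylenol", "cold"),
    ("tylenol", "headache"), ("tylenol", "influenza"),
    ("claritin", "allergies"), ("claritin", "allergies"),
    ("benadryl", "allergies"), ("benadryl", "insomnia"),
]

profiles = medication_profiles(OBSERVATIONS)
```

In this toy data, a broad profile shows up as probability mass spread over several ailments, while a narrow one concentrates on a single ailment.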

The information gap in traditional public health is especially prevalent in patient-directed programs such as weight loss and smoking cessation. These depend on a sustained effort from patients outside the clinical setting, making it difficult to track and measure patient efforts. For example, a recent study of 15,000 tweets found that Twitter is commonly used to manage and share information about health-promoting physical activities [8]. Tweets focused on exercise included muscle-strengthening, aerobic, and flexibility-enhancing activities. An analysis of the content revealed that most tweets reported evidence of or plans for exercising. The frequency of such messages suggests that the social supports provided by Twitter could be used as a platform for encouraging exercise and health-promoting behavior. Additionally, roughly 10 percent of the messages posed exercise questions to other users, and many contained advertisements for a product or service. Mining this information could reveal trends of physical activity, as well as new ways of promoting healthy

Although dental pain is a common problem, only a few complaints result in seeing a dentist, and thus clinical evidence covers only a small part of the population. A recent study considered reports of dental pain on Twitter as a means of surveying a larger spectrum of patients [9]. A survey of 772 messages revealed a variety of topics, including reporting dental pain, action taken in response, impact on daily life, and advice sought from the Twitter community. More than 80 percent of messages discussed general pain, 22 percent discussed taking some responsive action, and 15 percent discussed impact on daily activities. Although actions included seeing a dentist (44 percent), nearly as many users self-medicated (43 percent). These findings show that Twitter can broaden the study population and indicate that effective treatment of dental problems relies on providing accurate information about self-management. The prevalence of dental communications suggests that Twitter can be an effective medium for dental professionals to disseminate self-management

Twitter is of special interest in studies of patient health behaviors, such as drug and alcohol use. Kyle Prier and colleagues explored tweets about smoking to learn more about tobacco use [10]. They used an unsupervised clustering algorithm to group smoking tweets, discovering themes that reflected general substance abuse, addiction recovery, tobacco promotion (bars and clubs), and antismoking advertising campaigns. These themes suggest Twitter as a promising data source for tobacco use behaviors and

Figure 2. A word cloud visualization showing the words most associated with the ailment “insomnia” as discovered by a machine-learning model that examined 1.6 million tweets related to health. Larger fonts indicate more related terms, blue indicates general terms, red highlights symptoms, and green represents treatments. General words such as “hours,” “awake,” and “tired” characterize insomnia messages, with symptoms such as “nightmares” and “yawning” and treatments of “Benadryl” and “sleeping pills.”

Each of these studies (on self-medication, exercise, dental pain, and smoking) demonstrates the potential for new areas of public-health research based on Twitter data. The next few years promise detailed clinical studies using data in each of these areas as well as whole new types of questions. How will Twitter data impact the study of community health behaviors, mental health, and biosurveillance customized to specific demographic groups? The development of new technologies coupled with the exploration of these questions has great potential to expand the scope of public-health practice.

Automatically extracting meaning from text with computational tools is difficult, particularly when the text is multilingual and informal. Yet computational algorithms will be required for practical public-health applications of Twitter data. Preliminary research suggests that aggregating millions of messages can resolve difficulties in understanding individual messages: the tweet “headache” is ambiguous, but a corresponding increase in messages of the form “bad headache, home sick with flu” suggests a common cause. However, a deeper analysis of individual tweets, which might be required for some tasks, remains a challenging problem.

In addition, bias pervades all stages of social media analysis: social media users aren’t representative of the entire population, user groups may be more or less inclined to tweet information, and users might report inaccurate or imprecise diagnoses (for example, influenza instead of H1N1). Controlling for bias is a hallmark of clinical research, yet social media biases are little understood. Large-scale aggregation could help obviate biases for common illnesses. However, smaller populations will require bias correction, which may rely on automatically inferring user demographics for sampling adjustments.
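
As one illustration of what a sampling adjustment could look like, the sketch below applies post-stratification: each demographic group's observed rate is reweighted by that group's share of the real population. The article does not name a specific correction method, so this technique is an assumption, as are the group labels, counts, and population shares.

```python
def raw_rate(group_stats):
    """Unadjusted share of sampled users who report the ailment."""
    sampled = sum(n for n, _ in group_stats.values())
    reporting = sum(k for _, k in group_stats.values())
    return reporting / sampled

def poststratified_rate(group_stats, population_shares):
    """Reweight per-group rates by each group's share of the population.

    group_stats maps group -> (users_sampled, users_reporting);
    population_shares maps group -> fraction of the real population.
    """
    if abs(sum(population_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    rate = 0.0
    for group, share in population_shares.items():
        sampled, reporting = group_stats[group]
        if sampled == 0:
            raise ValueError(f"no sampled users in group {group!r}")
        rate += share * (reporting / sampled)
    return rate

# Hypothetical numbers: younger users are over-represented in the sample.
stats = {"18-29": (600, 60), "30+": (400, 20)}
shares = {"18-29": 0.2, "30+": 0.8}

unadjusted = raw_rate(stats)
adjusted = poststratified_rate(stats, shares)
```

The adjustment is only as good as the inferred demographics: if group membership is guessed incorrectly for many users, the reweighted estimate inherits that error.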

Finally, social media research must consider user privacy. Even when data are publicly available, users have privacy expectations, such as concern over algorithms that infer unstated user demographics or diagnoses from public data. Although studies have posed little concern so far, an increase in research complexity could cause a commensurate rise in legal and ethical issues. Social media researchers must remain vigilant regarding privacy issues.

Regardless, with the development of new technologies addressing these challenges, we can expect to see entirely new capabilities for public-health research, policy, and practice.

References
1. B. O’Connor et al., “From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series,” Proc. Int’l Conf. Weblogs and Social Media (ICWSM), AAAI, 2010, pp. 122–129.
2. T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake Shakes Twitter Users: Real-Time Event Detection by Social Sensors,” Proc. World Wide Web Conf. (WWW 10), ACM, 2010, pp. 851–860.
3. M.J. Paul and M. Dredze, “A Model for Mining Public Health Topics from Twitter,” tech. report, Johns Hopkins Univ., 2011.
4. M.J. Paul and M. Dredze, “You Are What You Tweet: Analyzing Twitter for Public Health,” Proc. Int’l Conf. Weblogs and Social Media (ICWSM), AAAI, 2011, pp. 265–272.
5. J. Ginsberg et al., “Detecting Influenza Epidemics Using Search Engine Query Data,” Nature, vol. 457, no. 7232, 2008, pp. 1012–1014.
6. A. Culotta, “Lightweight Methods to Estimate Influenza Rates and Alcohol Sales Volume from Twitter Messages,” Language Resources and Evaluation,
7. R. Chunara, J.R. Andrews, and J.S. Brownstein, “Social and News Media Enable Estimation of Epidemiological Patterns Early in the 2010 Haitian Cholera Outbreak,” Am. J. Tropical Medicine and Hygiene, vol. 86, no. 1, 2012, pp. 39–45.
8. L. Kendall et al., “Descriptive Analysis of Physical Activity Conversations on Twitter,” Proc. Extended Abstracts on Human Factors in Computing Systems (CHI), ACM, 2011, pp. 1555–1560.
9. N. Heaivilin et al., “Public Health Surveillance of Dental Pain via Twitter,” J. Dental Research, vol. 90, no. 9, 2011, pp. 1047–1051.
10. K.W. Prier et al., “Identifying Health-Related Topics on Twitter: An Exploration of Tobacco-Related Tweets as a Test Topic,” Social Computing, Behavioral-Cultural Modeling, and Prediction, LNCS 6589, Springer, 2011, pp. 18–25.

Mark Dredze is an assistant research professor in computer science and a research scientist at the Human Language Technology Center of Excellence at Johns Hopkins University. Dredze has a PhD in computer science from the University of Pennsylvania. Contact him at
