Twitter: Reliable search for flu-tweets

01/30/2013

Photo: Twitter flu map 2013

Prevalence of flu cases in the first week of January 2013, as measured by the volume of flu-related tweets. Dark red means higher flu rates; © Mark Dredze/Johns Hopkins University

Searching Twitter messages has become a popular way to track when and where flu cases occur. One hurdle hampers the process: identifying which tweets actually report a flu infection. Some tweets come from people who are sick, while others only talk about the flu.

Such conversations about the flu in general can skew the results. To address this problem, Johns Hopkins computer scientists and researchers in the university's School of Medicine have developed a new tweet-screening method that not only delivers real-time data on flu cases, but also filters out online chatter that is not linked to actual flu infections. Comparing their method, which is based on analysis of 5,000 publicly available tweets per minute, to other Twitter-based tracking tools, the Johns Hopkins researchers say their real-time results track more closely with government disease data that takes much longer to compile.

"When you look at Twitter posts, you can see people talking about being afraid of catching the flu or asking friends if they should get a flu shot or mentioning a public figure who seems to be ill," said Mark Dredze, an assistant research professor in the Department of Computer Science who uses tweets to monitor public health trends. "But posts like this do not measure how many people have actually contracted the flu. "

Dredze led a team that released one of the first and most comprehensive studies showing that Twitter data can yield useful public health information. The reliability of many computer models can be weakened by too many tweets that point to flu-related news reports and other matters not directly linked to a specific flu case, said David Broniatowski, a School of Medicine postdoctoral fellow in the Department of Emergency Medicine's Center for Advanced Modeling in the Social, Behavioral, and Health Sciences. "For example," he said, "a recent spike in Twitter flu activity was caused by discussions about basketball legend Kobe Bryant's flu-like symptoms during a recent game. Mr. Bryant's health notwithstanding, such tweets do very little to help public health officials prepare our nation for the next big outbreak."

To improve their accuracy when using tweets to track the flu, the Johns Hopkins team developed sophisticated statistical methods based on human language processing technologies. The methods are designed to filter out the chatter. The system can distinguish, for example, between "I have the flu" and "I'm worried about getting the flu." Another advantage of the Johns Hopkins flu projection method is that it can produce real-time results.
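The article does not describe the statistical models themselves, so the following is only a toy illustration of the distinction being drawn, not the Johns Hopkins method. It is a minimal Python sketch that assumes a handful of hand-picked cue phrases; the phrase lists and the `classify_tweet` function are hypothetical, and a real system would learn such signals from labeled data rather than from fixed rules.

```python
import re

# Hypothetical cue phrases chosen for illustration only; the article does not
# say which features the researchers' statistical models actually use.
INFECTION_CUES = [
    r"\bi have the flu\b",
    r"\bi(?:'ve| have) got the flu\b",
    r"\bdown with the flu\b",
]
AWARENESS_CUES = [
    r"\bworried about\b",
    r"\bafraid of\b",
    r"\bflu shot\b",
]


def classify_tweet(text: str) -> str:
    """Return 'infection', 'awareness', or 'unrelated' for one tweet."""
    # Normalize case and curly apostrophes so the patterns match typed text.
    lowered = text.lower().replace("\u2019", "'")
    if "flu" not in lowered:
        return "unrelated"
    # Awareness cues are checked first: a worry about the flu is not a case.
    if any(re.search(pattern, lowered) for pattern in AWARENESS_CUES):
        return "awareness"
    if any(re.search(pattern, lowered) for pattern in INFECTION_CUES):
        return "infection"
    # Anything else that mentions the flu is treated as general chatter.
    return "awareness"
```

With these assumed rules, "I have the flu" is labeled as an infection report, while "I'm worried about getting the flu" and news-style mentions fall into the awareness category and would be left out of the case count.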

To check the reliability of their enhanced system, the Johns Hopkins researchers recently compared their results to data from the Centers for Disease Control and Prevention (CDC) for the same period. The researchers said that during November and December 2012, their system demonstrated a substantial improvement in tracking with CDC figures as compared to previous Twitter-based tracking methods.
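The article does not say which measure the researchers used to judge how closely their results track the CDC figures. One common choice for comparing two time series is the Pearson correlation coefficient, sketched below in Python as an assumption rather than as the team's actual evaluation. The `pearson` function and its argument names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

Given a weekly series of infection-tweet rates and the matching weekly CDC rates, a value near 1 would indicate that the two rise and fall together, and a filtered signal could be compared against an unfiltered one by computing the coefficient for each.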

MEDICA.de; Source: Johns Hopkins University