Visual Analytics and Big Data for Spatiotemporal Sensemaking


JOSHUA STEVENS

Pennsylvania State University

Years in Grad School: 2

Making Sense of the Immense: An Application of Big Data Visual Analytics for Spatiotemporal Sensemaking in Social Media

This work describes SensePlace2, a web-accessible geovisual analytics tool that allows analysts to achieve greater situational awareness about key topics of interest. SensePlace2 collects, analyzes, and visualizes an immense and growing number of tweets that analysts can explore through ad hoc queries and a rich set of user interactions. We pair entity extraction and geocoding algorithms to determine the context and topic of tweets, the locations being referred to by the tweets, and how these topics and the relationships among them vary across time and space. SensePlace2 is designed with multiple coordinated views so that users can query, filter, select, and manipulate tweets across both geographic and temporal domains. During an analysis session, queries are stored in a history that can be returned to at any time. The system also employs feedback mechanisms that allow analysts to report errors directly to the development team. Work is currently underway to enrich the context foraging capabilities of SensePlace2 by connecting the existing database to ancillary data sources and information.

Judges’ Queries and Presenter’s Replies

Sandra Pinel

Judge
Faculty

May 21, 2013 | 09:36 p.m.

This is an interesting application to learn more from TWEETS. I am asking you the “so what” question. What are the applications to social benefits? What might we learn about social behavior from this method that could help us solve a problem such as where people are during disasters or how they adapt or move in such moments?
Joshua Stevens

Lead Presenter

May 22, 2013 | 04:16 a.m.

This is a great question, Sandra!

As an exploratory tool, SensePlace2 is about hypothesis generation and decision-making. This means there isn’t one “this software answers this specific question about a certain topic” answer.

What the software does allow us to do is learn how topics and the conversations around them are spatialized, and how the geography inherent within Tweets unfolds over time. For example, the countries and places mentioned alongside “Earthquake” will have very different spatial and temporal patterns than Tweets discussing “Smallpox.”

By allowing analysts to investigate trends in topics they’re interested in, SensePlace2 helps inform and guide the thinking process. For instance, a faculty member studying infectious diseases wanted to see how mentions of the disease in SensePlace2 coincide with known settlement patterns of the at-risk population. Without identifying which places are mentioned in Tweets also containing mentions of the disease, she wouldn’t be able to do this. After all, every event happens at a place and time, so communicating the spatiotemporal landscape surrounding an event contributes a much needed piece of the analytical puzzle.

We hope that this kind of querying and investigation will enrich the way scientists think about the problems they’re trying to solve.
Jeffrey Lidz

Judge
Faculty: Project Co-PI

May 21, 2013 | 11:26 p.m.

What are the potential impacts of understanding locations being discussed in tweets? What kinds of social science research will your tool enable that would not have been possible before?
Joshua Stevens

Lead Presenter

May 22, 2013 | 04:27 a.m.

This is somewhat difficult to answer, for two reasons. First, the spatial and temporal patterns associated with key terms vary considerably from one term to the next. This means that the impacts of understanding these patterns will be different depending on the phenomena under investigation. Second, the potential results of exploratory analysis are always hard to predict. The knowledge generation and discovery process that is afforded by freely querying and filtering information depends on the analyst, the topic, and the problem one is trying to solve.

The good news is that we know from social science literature that many phenomena behave similarly and are discussed similarly in social media. This means we can reveal key differences between how events like natural hazards and violent conflicts rely on place mentions and social media use. As an example, less than 1% of Tweets from all topics combined make use of the geolocation feature in Twitter. However, when filtered to natural disasters such as earthquakes, that number increases to nearly 15%. So there is a clear relationship between the type of event being Tweeted about and how important “place” as a concept is to users.
Wayde Morse

Judge
Faculty

May 21, 2013 | 11:35 p.m.

Interesting. Though there are likely many applications you have thought of, I am also wondering what we do with this information. The mention of protests brings to mind the rapid response of police or government to violence, or disease outbreaks, or disasters. Please discuss applications.
Joshua Stevens

Lead Presenter

May 22, 2013 | 04:48 a.m.

This is a very valid question! The applications you mention are some of the most common examples (especially government response, as this project is funded by the Department of Homeland Security). Twitter is much larger than outbreaks and protests though!

One thing we are looking at is how mentions of places correspond to people actually moving to/from those places. So in addition to the things I’ve mentioned in my responses to Jeffrey and Sandra, we can also identify when the same users Tweet from different locations and are therefore “Tweeting on the move” or Tweeting about a recent relocation (be it leisurely travel or in a refugee situation). By pairing social media use with human mobility patterns, we can begin to answer all sorts of interesting questions about why people move, when they do it, and how they communicate about these events.
Aurora Sherman

Judge
Faculty

May 21, 2013 | 11:43 p.m.

Hi Joshua,
I also have questions about implications and applications, as raised by the judges before me.
I also wonder what your specific contribution to this project is; do you have a specific research question in mind? Did you/will you contribute something specific to the team effort?
Joshua Stevens

Lead Presenter

May 22, 2013 | 11:12 p.m.

Hi Aurora,

My contribution is in two parts. On one end, I help guide the interface design and use of cartographic technology in the system (two examples are provided in the poster view regarding UI layout and new symbology types). I’m very interested in usability and ensuring that the way the system performs is consistent with the tasks users are trying to accomplish, so I am constantly pushing to make the UI as intuitive and usable as possible. On the other end, as an IGERT trainee in Big Data Social Science, I am working to expand the capabilities of the system to handle larger amounts and different kinds of data. At present I am building a database of flight records that will be brought into the system to identify flights, airports, and airlines that are discussed in Tweets (Twitter is full of both excited and unhappy travelers). Using this database alongside Tweets, we can show the exact aircraft and flight routes being discussed, when they were discussed, and in which context the discussion occurs. The flight data allows us to identify delays, cancellations, and other deviations from the scheduled plan. From this, we can begin asking questions about how delays influence social media activity, what travelers Tweet about before, after, and even during the flight (if WiFi is available). This helps us paint a much more dynamic picture of the patterns that exist between human mobility and social media use.
Gary Kofinas

Judge
Faculty: Project Co-PI

May 22, 2013 | 01:15 a.m.

Interesting work! I would imagine that the DoD is very interested in your work. I’m curious to hear if your group has any direct discussions about how these findings can have national security implications and if and how that interest may shape the focus of the research. Thanks, g
Joshua Stevens

Lead Presenter

May 22, 2013 | 11:21 p.m.

Thanks, Gary!

Indeed, we have many discussions about national security implications. This project is funded by the Department of Homeland Security (DHS) and the US Army Corps of Engineers (USACE) through an initiative called VACCINE (Visual Analytics for Command, Control, and Interoperability Environments). The driver behind the software and its use of social media began with the need these agencies have for determining how social media are used during crisis events and disaster relief efforts.

Due to the generalizability of the system and its functions, we continue to work closely with DHS and USACE to meet their objectives, while at the same time expanding the capabilities of SensePlace2 more broadly within the realm of social science (as in the airline example in my previous response).
Gary Kofinas

Judge
Faculty: Project Co-PI

May 23, 2013 | 07:31 p.m.

Very good. Thanks for your efforts. Much appreciated!

Presentation Discussion

Kathy Hoogeboom-Pot

Graduate Student

May 21, 2013 | 09:58 p.m.

Very nice video and poster, Josh! It looks like a very cool piece of software that would be fun to play around with. I wonder if you know of any interesting projects that have actually used it. Thanks!
Joshua Stevens

Lead Presenter

May 22, 2013 | 04:05 a.m.

Thanks Kathy!

So far the software is not yet open to the public, so there haven’t been any use cases outside of the US Dept. of Homeland Security and US Army Corps of Engineers.

A public release is one of the major milestones we’re shooting for in the near future.

Thanks for checking it out!
Jason Hong

Faculty: Project Co-PI

May 22, 2013 | 09:36 a.m.

Nice work! How do you evaluate the accuracy of your geotagging approach? Do you have some set of ground truth to compare against? Also, what kinds of cases does it work well for, and not? Lastly, what kind of granularity can you get? Is it country, city, street?
Joshua Stevens

Lead Presenter

May 22, 2013 | 10:57 p.m.

Great question! Ground truth is tricky since computationally, context is very difficult to determine (especially with social media being notoriously sarcastic, snarky, and prone to puns). To account for this, we’re beginning to evaluate the performance of our system by comparing it to the success of humans, who are really good at picking out the proper context. We’re having participants read a set of Tweets and decide which places are being discussed, and then seeing how well the software can do the same.

This is an important task because as your question brings up, there are indeed some cases that are more difficult than others. A simple example is “Lansing.” Is the user talking about Lansing, MI or Lansing, IL? We’re likely to guess the former, but it takes previous knowledge (e.g., knowing one is more popular, a state capital, etc) and experience to make that call. A similar example is “Georgia,” which is both a state and a country. Other cases that pose a challenge are “Main St” and place names that are common to many cities.
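
As a rough illustration of how the prior knowledge described above (popularity, capital status) could be encoded, here is a minimal Python sketch. It is not the SensePlace2 implementation: the Candidate class, the ranking rule, and the population figures are all hypothetical and only approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One possible real-world match for an ambiguous place name (hypothetical type)."""
    name: str
    admin: str            # state or country the candidate belongs to
    population: int       # rough illustrative figure, not from a real gazetteer
    is_capital: bool = False


def rank_candidates(candidates):
    """Order candidates so capitals come first, then larger populations."""
    return sorted(candidates, key=lambda c: (c.is_capital, c.population), reverse=True)


def best_guess(candidates):
    """Return the top-ranked candidate, or None if there are no candidates."""
    ranked = rank_candidates(candidates)
    return ranked[0] if ranked else None


lansing = [
    Candidate("Lansing", "IL", 28_000),
    Candidate("Lansing", "MI", 112_000, is_capital=True),
]
print(best_guess(lansing).admin)  # MI
```

A ranking like this only captures the "more popular, state capital" intuition; it says nothing about the surrounding words in the Tweet, which is the part humans are good at and the reason the comparison with human readers matters.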

Our granularity is at the sub-city level (per above, streets are difficult without additional context, which we can determine if it is available). Using the GeoNames database, which works with nested indices (e.g., Central Park is within Manhattan, which is within New York, New York, USA), we are able to determine the places mentioned at several levels; we choose and map the most granular. In cases where only the country is mentioned, the centroid is used instead.
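
To make the "choose the most granular" idea concrete, here is a small Python sketch of a nested place hierarchy. The Place class, the depth-based rule, and the coordinates are assumptions made for illustration; they are not the GeoNames schema or API, and not the actual SensePlace2 code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    """A node in a nested place hierarchy (hypothetical, not the GeoNames schema)."""
    name: str
    lat: float   # approximate coordinates, for illustration only
    lon: float
    parent: Optional["Place"] = None

    def depth(self) -> int:
        """Number of ancestors above this place; deeper means more granular."""
        d = 0
        node = self.parent
        while node is not None:
            d += 1
            node = node.parent
        return d


# For a country, the stored coordinates play the role of the centroid.
usa = Place("USA", 39.8, -98.6)
ny_state = Place("New York", 43.0, -75.5, parent=usa)
ny_city = Place("New York", 40.71, -74.01, parent=ny_state)
manhattan = Place("Manhattan", 40.78, -73.97, parent=ny_city)
central_park = Place("Central Park", 40.78, -73.97, parent=manhattan)


def most_granular(mentions):
    """Pick the deepest mentioned place; None if nothing was mentioned."""
    return max(mentions, key=Place.depth) if mentions else None


print(most_granular([usa, manhattan, central_park]).name)  # Central Park
print(most_granular([usa]).name)                           # USA
```

The sketch assumes all mentions lie on one chain of the hierarchy. When only the country is mentioned, the function simply returns the country node, whose coordinates stand in for the centroid.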
Dr. Ledley

Guest

May 22, 2013 | 11:26 a.m.

Can you explain a little more about the connecting lines in the map view? Are they connecting places that are mentioned in the tweets, or the places where the tweets are being sent and received? I guess I am asking whether they show an association within the content of a tweet or represent the dissemination of a tweet.

In any case – very interesting work
Joshua Stevens

Lead Presenter

May 22, 2013 | 11:02 p.m.

Absolutely! The connecting lines identify two (or more) places that are mentioned in the same Tweet. For example, if I Tweet “Leaving State College for Zurich”, two points would be drawn on the map – one for each location. Hovering the mouse over either point would show a connection to the other, and highlight my Tweet in the Tweet list (inversely, hovering over the Tweet would highlight the points and the connection between them).

So it is making an association within the content of the Tweet. We also have a function that shows “about-from” relationships. This reveals Tweets about one location that were made from another. Since users rarely enable location tracking and less than 1% of all Tweets record the sender’s lat/lon, this doesn’t capture a very large spectrum. Our focus is therefore on establishing the context that is inherent, but previously invisible, in the vast majority of Tweets.
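
One way to picture the connecting lines is as pairs of places that co-occur in a Tweet. The Python sketch below assumes the place names have already been extracted; the function name co_mention_links and the sample data are hypothetical, not part of SensePlace2.

```python
from collections import defaultdict
from itertools import combinations


def co_mention_links(tweets):
    """Map each unordered pair of places to the ids of Tweets mentioning both.

    `tweets` maps a Tweet id to the list of place names already extracted from it.
    """
    links = defaultdict(list)
    for tweet_id, places in tweets.items():
        # Sort and de-duplicate so (A, B) and (B, A) share one key.
        for a, b in combinations(sorted(set(places)), 2):
            links[(a, b)].append(tweet_id)
    return dict(links)


tweets = {
    "t1": ["State College", "Zurich"],   # "Leaving State College for Zurich"
    "t2": ["Zurich"],                    # a single place yields no link
    "t3": ["Zurich", "State College", "Geneva"],
}
links = co_mention_links(tweets)
print(links[("State College", "Zurich")])  # ['t1', 't3']
```

Keeping the Tweet ids on each link is what would let a hover on a point or line highlight the matching Tweets, and the reverse.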
Lee Giles

Faculty: Project Co-PI

May 23, 2013 | 08:55 a.m.

Hi Josh:

Nice work. How does the search actually work?

Also, how does it scale for a very popular term and keep the map from becoming completely overcolored?
Joshua Stevens

Lead Presenter

May 23, 2013 | 09:37 a.m.

Hi Lee,

Thanks – glad you enjoyed it! The search works via Lucene (currently transitioning to an improvement in Solr). The SensePlace2 database is populated by an external search that looks for key terms (at present these reflect terms related to emergency management/crisis response, and more recently, airline traffic). Text from each Tweet is then processed to identify three primary types of entities: (1) named people, (2) named organizations, and (3) locations. The first two are based on the GATE/ANNIE named entity extractor from the University of Sheffield. The third type of entity, locations, is then identified and looked up in the GeoNames database. This step identifies the places mentioned by name, retrieves the GeoNames ID and coordinate information for the mentioned place(s), and uses this to draw points on the map.

For each search, we handle clutter by only showing the 1000 most relevant Tweets (relevancy is determined by the entities that were successfully extracted). When the user filters their query, for example by adjusting the temporal controls to a new time frame, the entire query is re-run so that the relevancy then reflects the new criteria and shows the 1000 Tweets that correspond (thus each query refinement/filter results in a new relevancy calculation).
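
The cap-and-re-rank behavior can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real system searches via Lucene, whereas here a plain substring match stands in for the search, and a hypothetical entity_count field stands in for the relevancy score.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tweet:
    text: str
    created: datetime
    entity_count: int  # hypothetical stand-in for the relevancy score


def run_query(tweets, term, start, end, limit=1000):
    """Re-run the whole query: filter by term and time window, keep the top `limit`."""
    matches = (
        t for t in tweets
        if term.lower() in t.text.lower() and start <= t.created <= end
    )
    # nlargest returns the results sorted from most to least relevant.
    return heapq.nlargest(limit, matches, key=lambda t: t.entity_count)


tweets = [
    Tweet("Earthquake near Tokyo", datetime(2013, 5, 1), 2),
    Tweet("earthquake drill at Penn State today", datetime(2013, 5, 10), 3),
    Tweet("Flight delayed again", datetime(2013, 5, 10), 1),
    Tweet("Strong earthquake felt in Lima, Peru", datetime(2013, 5, 20), 4),
]

whole_month = run_query(tweets, "earthquake", datetime(2013, 5, 1), datetime(2013, 5, 31), limit=2)
first_half = run_query(tweets, "earthquake", datetime(2013, 5, 1), datetime(2013, 5, 15), limit=2)
print([t.entity_count for t in whole_month])  # [4, 3]
print([t.entity_count for t in first_half])   # [3, 2]
```

Narrowing the time window changes which Tweets make the cut, which mirrors the point above that each refinement produces a fresh relevancy calculation instead of trimming the previous result set.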
Ashley Richter

Graduate Student

May 23, 2013 | 05:26 p.m.

Wonderful video (I particularly loved the Twitter logo avalanche) and super pertinent research for today’s big data society. It’s exciting to see projects that are seeking to pull meaning from the proverbial Library of Babel of Borges. Great stuff!
Peter Khooshabeh

IGERT Alum

May 23, 2013 | 05:37 p.m.

nice contribution. good luck.

peter
Joshua Stevens

Lead Presenter

May 24, 2013 | 04:01 a.m.

Thanks Ashley and Peter! I appreciate it.
Further posting is closed as the event has ended.
