Characterization of Earthquake Damage and Effects Using Social Media Data
People near earthquakes publish anecdotal information about the shaking within seconds of its occurrence via social network technologies such as Twitter. In contrast, depending on the size and location of the earthquake, scientific alerts can take between two and twenty minutes to publish. The goal of this project is to assess earthquake damage and effects as impacts unfold, leveraging fast, free, and ubiquitous social media data to enhance our response.
Principal Investigator : Michelle Guy, Paul S Earle
Cooperator/Partner : Scott R Horvath, Douglas Bausch, Gregory M Smoczyk
The project leverages an existing system that acquires earthquake-related tweets from Twitter and geocodes them using a Yahoo service. The acquired data stream is archived in a Postgres database, filtered, and monitored by the detection application (Tedect). The social media data acquisition and distribution application (TED) was enhanced to concurrently feed the focused data stream into an Elasticsearch index. Elasticsearch is an open source tool built on Apache Lucene, a high-performance text search engine library. It stores optimized, searchable data in JSON format, which is in turn visualized by Kibana, a web-based interactive interface. The team was not aware of Elasticsearch and Kibana when the proposal was written, but they are proving to be a valuable, low-cost way of providing rapid indications of earthquake significance and impacts.
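To make the indexing step concrete, here is a minimal sketch of how geocoded tweets could be serialized into the newline-delimited body that the Elasticsearch bulk API accepts. It is not TED's actual code: the index name `ted-tweets`, the field names, and the sample values are all assumptions made for illustration, and older Elasticsearch releases also expected a document type in the action line.

```python
import json
from typing import Iterable


def to_bulk_ndjson(tweets: Iterable[dict], index_name: str) -> str:
    """Serialize tweets into the newline-delimited body used by the Elasticsearch bulk API."""
    lines = []
    for tweet in tweets:
        # Action line: asks Elasticsearch to index the following document under this ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": tweet["id"]}}))
        # Source line: the document itself. These field names are hypothetical.
        lines.append(json.dumps({
            "text": tweet["text"],
            "created_at": tweet["created_at"],
            # A lat/lon object is one of the accepted geo_point representations.
            "location": {"lat": tweet["lat"], "lon": tweet["lon"]},
        }))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


# Made-up sample record, for illustration only.
sample = [
    {"id": "1", "text": "earthquake!", "created_at": "2014-01-01T12:00:30Z",
     "lat": 38.2, "lon": -122.3},
]
body = to_bulk_ndjson(sample, "ted-tweets")
```

The resulting string could be sent to a cluster's `_bulk` endpoint with the `application/x-ndjson` content type. Storing the location as a geo point is what would let a dashboard such as Kibana map and filter tweets by area.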
As a proof of concept, the geographic data associated with the project was upgraded to make it more readily and programmatically available to collaborators. GIS services were judged the easiest data feed to integrate with a variety of GIS end users, including FEMA. Two service types were implemented on a per-event basis, to be manually configured following significant detected events. The first is a point-based tweet dataset that can be mapped and queried. The second is a raster-based heat map of tweet density that covers the region of interest. Once manually configured, these services acquire related data at three distinct time intervals after an event, which supports the series of post-detection summary reports.
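The raster heat map of tweet density can be illustrated by binning tweet coordinates into a regular grid over the region of interest. The sketch below is only an assumption about how such a raster could be derived; the function name, grid size, bounding box, and sample points are hypothetical and do not describe the project's GIS service.

```python
from typing import List, Sequence, Tuple


def density_grid(points: Sequence[Tuple[float, float]],
                 bbox: Tuple[float, float, float, float],
                 n_rows: int, n_cols: int) -> List[List[int]]:
    """Count (lon, lat) points per cell of a regular grid covering bbox.

    bbox is (min_lon, min_lat, max_lon, max_lat). Row 0 is the southernmost row.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    grid = [[0] * n_cols for _ in range(n_rows)]
    for lon, lat in points:
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat):
            continue  # ignore tweets outside the region of interest
        # Scale to a cell index; clamp so points on the max edge land in the last cell.
        col = min(int((lon - min_lon) / (max_lon - min_lon) * n_cols), n_cols - 1)
        row = min(int((lat - min_lat) / (max_lat - min_lat) * n_rows), n_rows - 1)
        grid[row][col] += 1
    return grid


# Made-up coordinates: two tweets in one cell, one in another, one out of bounds.
points = [(-122.3, 38.2), (-122.3, 38.2), (-121.1, 38.9), (-130.0, 10.0)]
grid = density_grid(points, bbox=(-123.0, 38.0, -121.0, 39.0), n_rows=2, n_cols=2)
print(grid)  # [[2, 0], [0, 1]]
```

A production heat map would more likely use a smoothed kernel density than raw cell counts, but the counts show the underlying idea: each raster cell records how many geocoded tweets fall inside it.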
Together, this data acquisition, processing, analysis, and organization leads to the automated production of earthquake significance, damage, and effects information for use by practitioners (scientists, earthquake responders, and collaborators). The rapid availability of this earthquake characterization information allows practitioners to customize the alerts they receive from the system according to event significance.
The proposed project included integration of additional social media data sources. However, further analysis of data from Flickr and Instagram found the signal-to-noise ratio too low to be of benefit within the scope of this project. Future efforts and tool development to filter and assess the data more efficiently may prove fruitful, but with the resources available, focusing on Twitter data and media proved more effective.
The outcomes from this project support USGS goals of advancing our understanding of earthquake effects, establishing data sharing and collaboration, as well as providing additional rapid situational awareness.
Deliverables
- Updated Social Media Earthquake Acquisition and Distribution Application, also known as TED. The enhancements to the system include derivation and distribution of earthquake significance summary reports and archival of tweet-based event detections. The most recent code is available on the USGS stash git repository at: https://my.usgs.gov/stash/projects/NEIC/repos/ted/browse. Efforts are still underway to migrate to a publicly accessible git repository, which is planned for completion in June 2015.
- Updated Social Media Earthquake Detection Application, also known as Tedect. The application now sends tweet-based event detections into TED, and its performance was improved. The most recent code is on the USGS stash git repository at: https://my.usgs.gov/stash/projects/NEIC/repos/tedect/browse. An older reviewed version is publicly available at https://github.com/mguy-usgs/tedect.
- Earthquake Significance Summary Reports are automatically derived for tweet-based event detections at five and ten minutes after events occur. The reports give rapid indications of public interest in the event, for use internally and by collaborators.
- Web Services
- The team established a workflow for using the Kibana visual data analysis tools to determine, in near real time, earthquake significance as well as impacts and effects, as illustrated in Figures 2 and 3. This open source service currently has no free mechanism to secure the data and preserve data integrity, so it remains internal.
- As a proof of concept, in coordination with FEMA, a GIS service feed for detected events was implemented. An example heatmap created from this service is included in Figure 4 and is available at: http://geohazards.usgs.gov/arcgis/rest/services/NapaHeatmap/MapServer (content no longer available).
- Integrated Social Media and Seismic Earthquake Dataset: this integrated dataset provides validation of the system, and its analysis demonstrates the system performance, as described in Figures 5, 6 and 7. The analysis of this dataset was shared publicly at scientific conferences, including the American Geophysical Union (AGU) meeting in December 2014 (https://agu.confex.com/agu/fm14/meetingapp.cgi#Paper/13007) and the Association for Computing Machinery (ACM) Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) Learning about Emergencies from Social Information (LESI) workshop in August 2014 (https://sites.google.com/site/kddlesi2014/program/papers).
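The five- and ten-minute summary reports listed among the deliverables can be pictured as tweet counts over fixed windows after an event. The sketch below assumes that raw tweet volume is the indicator of public interest, which the description does not confirm; the function name, the report structure, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Sequence


def summarize(event_time: datetime, tweet_times: Iterable[datetime],
              windows_min: Sequence[int] = (5, 10)) -> Dict[int, int]:
    """Count tweets posted within each window (in minutes) after the event time."""
    times = list(tweet_times)
    report = {}
    for minutes in windows_min:
        cutoff = event_time + timedelta(minutes=minutes)
        # Windows are cumulative: the ten-minute count includes the first five minutes.
        report[minutes] = sum(1 for t in times if event_time <= t <= cutoff)
    return report


# Made-up event time and tweet offsets (in seconds after the event).
t0 = datetime(2014, 1, 1, 12, 0, 0)
tweets = [t0 + timedelta(seconds=s) for s in (30, 90, 280, 400, 590, 700)]
report = summarize(t0, tweets)
print(report)  # {5: 3, 10: 5}
```

A practitioner could compare these counts against a threshold of their own choosing to decide which events warrant an alert.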
Note: this description is from the CDI FY14 Annual Report
- Source: USGS Sciencebase (id: 53207404e4b0224be0a979e1)