Storm Events Classification

Predicting the damage or injuries caused by a severe event is a key part of risk modeling software, which calculates the (financial) impact of catastrophes before they occur. Typically, the input consists of a set of generated events and exposure data. Damage is then estimated for each affected exposure, and insured loss is evaluated from the policy conditions and the damage estimate.

In general, quantitative risk assessment requires calculating two components of risk: the magnitude of the potential loss, and the probability that the loss will occur. Here I focus on the former and analyze NOAA’s storm events database. The database consists of individual storm observations described by features including event type, timestamp (beginning and end of a storm event), latitude and longitude, state, number of injuries and deaths, property and crop damage, range and azimuth (if applicable), and others. The data comes from the National Weather Service, which receives its information from a variety of sources: county, local, state and federal law enforcement and emergency management officials, SKYWARN spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public, among others.

Event types range from wind (such as strong wind, thunderstorm wind), storm (including blizzard, hail, rain), tornadoes (including waterspout), hurricanes (including tropical storms and depressions) and floods (and landslides) to events such as fires (heat), tsunami (tide) or winter weather (cold, avalanche, snow). Here I group all storm events into these eight categories and apply a classification algorithm to test the accuracy of the chosen categorization scheme. I find an average accuracy of 83% with a random forest classifier. There are overlaps among the categories (such as storm and wind, or tsunami and flood). Still, a categorization like this condenses a vast number of different labels into a more organized labeling scheme.
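To make the grouping concrete, here is a minimal sketch of how raw event labels could be mapped to the eight categories with keyword rules. The keyword lists, the rule order and the function name `categorize` are my assumptions for illustration (built from the examples in the paragraph above), not the exact mapping used in the analysis.

```python
# Hypothetical keyword rules, built from the examples named in the text.
# Order matters: the first matching rule wins, which is how overlapping
# labels such as "Thunderstorm Wind" or "Winter Storm" get resolved.
CATEGORY_KEYWORDS = [
    ("tornado", ("tornado", "waterspout")),
    ("hurricane", ("hurricane", "tropical")),
    ("flood", ("flood", "landslide")),
    ("tsunami", ("tsunami", "tide")),
    ("fire", ("fire", "heat")),
    ("winter", ("winter", "cold", "avalanche", "snow")),
    ("wind", ("wind",)),
    ("storm", ("blizzard", "hail", "rain", "storm")),
]


def categorize(event_type: str) -> str:
    """Map a raw event label to one of the eight categories (or 'other')."""
    label = event_type.lower()
    for category, keywords in CATEGORY_KEYWORDS:
        if any(keyword in label for keyword in keywords):
            return category
    return "other"  # label matched none of the assumed keywords


if __name__ == "__main__":
    for raw in ["Thunderstorm Wind", "Tropical Storm", "Blizzard", "Dense Fog"]:
        print(raw, "->", categorize(raw))
```

Substring matching is crude, and the rule order encodes a judgment call about the overlapping categories; a real mapping would need to be checked against the full list of labels in the database.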

Let’s explore the past twenty years of the dataset. The following plot shows the average annual number of events of a given event type across the US (left), as well as the average annual property damage (inflation adjusted) (center and right), as a function of the year.
Notice a few peaks: the 2005 Atlantic hurricane season, several storm and wind events in 2008, and 2012 (the warmest year in the US).

Although the number of storm events stays relatively unchanged, average property damage (and, similarly, crop damage or the number of injuries/deaths) varies much more widely over time. Catastrophic events, such as Hurricane Katrina in 2005 or the 2008 hurricane season, lead to large insured and uninsured property losses and numbers of injuries/deaths compared to other storm events, even within the same event type.

Here I predict the amount of property damage based on a few storm-related features, including the (beginning) latitude and longitude of the event, event type and season, among others. The most difficult task is to separate low property damage events (say, zero property damage events) from high property damage events (say, nonzero property damage events). I find that the two groups are present with almost equal weight in NOAA’s storm events dataset. By employing a random forest algorithm with an adjusted probability threshold, I classify the nonzero damage events with high precision. Furthermore, a continuous regression on the labeled nonzero property damage events leads to an R-squared score of about 0.22. Note that a similar analysis of the number of injuries or deaths is fundamentally more difficult, mostly because the data is very unbalanced: the majority of reported events result in no injuries/deaths, with a few outliers representing catastrophic events.
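The effect of adjusting the probability threshold can be sketched in a few lines. The probabilities and labels below are made up for illustration; in practice they would come from a fitted classifier (for example the `predict_proba` output of a scikit-learn random forest) and from the observed damage labels. The function names are also mine, not taken from the original code.

```python
def classify(probabilities, threshold=0.5):
    """Label an event as nonzero damage (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (nonzero damage) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: 1 = nonzero property damage, 0 = zero property damage.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
# Made-up predicted probabilities of the nonzero damage class.
p_nonzero = [0.9, 0.6, 0.8, 0.55, 0.4, 0.7, 0.3, 0.2]

if __name__ == "__main__":
    for threshold in (0.5, 0.75):
        precision, recall = precision_recall(y_true, classify(p_nonzero, threshold))
        print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.75 lifts precision from 0.60 to 1.00 while recall drops from 0.75 to 0.50. That is the trade-off behind a high-precision classifier: fewer events are flagged as damaging, but those that are flagged are more likely to be correct.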

I created a simple predictor app that predicts the amount of damage to be the median annual damage per state per event type, based on events between the years 1996 and 2012. This visualization tool allows us to compare the amount of reported damage across states due to the eight different event types. Feel free to try it out!
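As a rough sketch of the idea behind the predictor, the lookup table can be built by summing damage per state, category and year, and then taking the median of those annual totals. This is one possible reading of "median annual damage"; the record layout, the field names and the sample numbers are hypothetical, and years without any reported event are simply left out rather than counted as zero.

```python
from collections import defaultdict
from statistics import median


def median_annual_damage(events):
    """Return {(state, category): median of the annual damage totals}."""
    annual_totals = defaultdict(float)
    for event in events:
        key = (event["state"], event["category"], event["year"])
        annual_totals[key] += event["damage"]

    totals_by_group = defaultdict(list)
    for (state, category, _year), total in annual_totals.items():
        totals_by_group[(state, category)].append(total)

    return {group: median(totals) for group, totals in totals_by_group.items()}


def predict(table, state, category):
    """Predicted damage for a state and category; 0.0 when nothing was recorded."""
    return table.get((state, category), 0.0)


# Made-up records, only to show the shape of the input.
events = [
    {"state": "TX", "category": "flood", "year": 2010, "damage": 100.0},
    {"state": "TX", "category": "flood", "year": 2010, "damage": 50.0},
    {"state": "TX", "category": "flood", "year": 2011, "damage": 300.0},
    {"state": "TX", "category": "flood", "year": 2012, "damage": 200.0},
    {"state": "FL", "category": "hurricane", "year": 2005, "damage": 1000.0},
    {"state": "FL", "category": "hurricane", "year": 2008, "damage": 400.0},
]

if __name__ == "__main__":
    table = median_annual_damage(events)
    print(predict(table, "TX", "flood"))
    print(predict(table, "FL", "hurricane"))
```

The median is a reasonable choice here because a single catastrophic year would dominate a mean.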

Lastly, here is a slide deck on some of my findings, along with the source code.

Written on May 17, 2016