A Shiny app that models a carbon compass for NYC buildings.
Predicting the damage or injuries caused by a severe event is a key part of risk modeling software. Such software estimates the (financial) impact of catastrophes before they occur. Typically, the input consists of generated events and exposure data. Damage is then estimated for each affected exposure, and the insured loss is evaluated from the policy conditions and the damage estimate.
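The flow described above (event and exposure in, damage per exposure, then insured loss from policy conditions) can be sketched roughly as follows. This is only an illustration: the `Exposure` fields, the quadratic `damage_ratio` curve, the deductible/limit policy terms, and all numbers are assumptions made up for the example, not the method of any particular risk model.

```python
from dataclasses import dataclass


@dataclass
class Exposure:
    name: str
    insured_value: float  # value of the asset at risk
    deductible: float     # assumed policy condition: amount retained by the insured
    limit: float          # assumed policy condition: maximum payout


def damage_ratio(intensity: float) -> float:
    """Hypothetical vulnerability curve: event intensity in [0, 1] -> damaged fraction."""
    clamped = min(max(intensity, 0.0), 1.0)
    return clamped ** 2


def ground_up_damage(exposure: Exposure, intensity: float) -> float:
    """Estimated damage to one exposure, before any policy terms are applied."""
    return exposure.insured_value * damage_ratio(intensity)


def insured_loss(exposure: Exposure, damage: float) -> float:
    """Apply the deductible first, then cap the payout at the policy limit."""
    return min(max(damage - exposure.deductible, 0.0), exposure.limit)


# Made-up portfolio and per-exposure intensities for a single simulated event.
portfolio = [
    Exposure("warehouse", 1_000_000.0, 50_000.0, 500_000.0),
    Exposure("office", 400_000.0, 10_000.0, 300_000.0),
]
event_intensity = {"warehouse": 0.5, "office": 0.1}

for exposure in portfolio:
    damage = ground_up_damage(exposure, event_intensity[exposure.name])
    print(exposure.name, damage, insured_loss(exposure, damage))
```

In this toy setup the office suffers some damage but no insured loss, because the damage stays below its deductible.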
Linear regression is a simple tool for modeling the relationship between a scalar dependent variable and one or more explanatory variables, where this relationship is expressed as a linear function of the explanatory variables. If the model assumptions are met, it can predict an increase or decrease in the dependent variable based on changes in the explanatory variables; e.g., for each $1 that I invest in the production budget of my future movie, how much will I earn in the movie’s total gross? Answering a question like this is not an easy task. Naturally, we would ask: (i) What is the predictive power of our model? (ii) Can we trust the model’s linear coefficient(s)? (iii) Which features do we include or omit in our analysis? Could we do any better, and if so, what does the final model look like?
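As a minimal sketch of the single-variable case, the snippet below fits an ordinary least squares line and reports R² as one rough answer to the "predictive power" question. The budget and gross figures are invented toy values (in millions of dollars), and the helper names `fit_simple_ols` and `r_squared` are hypothetical; a real analysis would also check the model assumptions and the uncertainty of the coefficient.

```python
def fit_simple_ols(x, y):
    """Least squares fit of y = intercept + slope * x for one explanatory variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Invented toy data: production budget and total gross, both in $ millions.
budget = [1.0, 2.0, 3.0, 4.0, 5.0]
gross = [2.0, 4.5, 5.5, 8.5, 9.5]

slope, intercept = fit_simple_ols(budget, gross)
r2 = r_squared(budget, gross, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.3f}")
```

Here the slope is the estimated change in gross per additional dollar of budget, which is exactly the quantity the movie question asks about.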
The MTA has publicly available datasets on turnstile activity. The data is recorded weekly. Here I analyze the data collected over a particular time interval (end of April through beginning of June). I identify the busiest subway stations and the busiest times on each day of an average week (Monday, Tuesday, ..., Sunday). As expected, commuter hub stations such as 34St - Penn Station, 42St - Grand Central, 34St - Herald Square and 42St - Times Square show the most turnstile activity. The goal of this brief investigation is to find other busy commuter stations that might not be easily identified based on transit patterns.
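The two aggregations mentioned above (busiest stations, and busiest time per weekday) could look something like the sketch below. It assumes the raw turnstile data has already been reduced to per-interval entry counts; the record layout, the counts, and the "Station X" entry are made up for illustration and do not reflect the actual MTA file format or real ridership.

```python
from collections import defaultdict

# Made-up records: (station, weekday, hour of day, entries in that interval).
records = [
    ("34St - Penn Station", "Monday", 8, 1200),
    ("34St - Penn Station", "Monday", 16, 1500),
    ("42St - Grand Central", "Monday", 8, 1100),
    ("42St - Grand Central", "Tuesday", 16, 900),
    ("Station X", "Monday", 8, 400),  # hypothetical smaller station
]


def total_by_station(rows):
    """Sum entries per station and return (station, total) pairs, busiest first."""
    totals = defaultdict(int)
    for station, _weekday, _hour, entries in rows:
        totals[station] += entries
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def busiest_hour_by_weekday(rows):
    """For each weekday, find the hour with the most entries across all stations."""
    per_slot = defaultdict(int)
    for _station, weekday, hour, entries in rows:
        per_slot[(weekday, hour)] += entries
    best = {}
    for (weekday, hour), entries in per_slot.items():
        if weekday not in best or entries > best[weekday][1]:
            best[weekday] = (hour, entries)
    return best


print(total_by_station(records))
print(busiest_hour_by_weekday(records))
```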