About the dataset:
Hurricane Michael was the third-most intense Atlantic hurricane to make landfall in the United States in terms of pressure. This dataset was collected from Twitter during Hurricane Michael. The dataset was processed and analyzed using the AIDR (http://aidr.qcri.org) platform.
Dataset Description:
This is a Twitter dataset collected during Hurricane Michael (2018). The data was collected, processed, and analyzed by the AIDR (http://aidr.qcri.org) platform using state-of-the-art machine learning techniques. For each hour, the data includes the number of injured and dead people, infrastructure damage reports, reports of missing or found people, urgent needs, and donation offers. Due to the Twitter Terms of Service (TOS), we do not share full tweet content on HDX. Please contact us via HDX or at aidr.qcri@gmail.com to get the tweet IDs of the dataset, along with a tool that can be used to rehydrate tweets from their IDs.
This resource consists of Twitter data collected and processed by the AIDR system during Hurricane Maria in 2017. The data contains information about the number of people affected, injured, and dead, reports of damage, missing people, and so on. Please contact us if you need the full dataset with tweet content.
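Because the figures are reported per hour, it may help to see how classified tweets could be bucketed into hourly counts. The sketch below is illustrative only: the (timestamp, label) record layout, the label names, and the sample tweets are made up and are not the AIDR export format.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Made-up classified tweets: (ISO timestamp, category label).
# The record layout and label names are hypothetical, not AIDR's format.
TWEETS = [
    ("2018-10-10T14:05:00", "injured_or_dead_people"),
    ("2018-10-10T14:40:00", "infrastructure_damage"),
    ("2018-10-10T14:55:00", "infrastructure_damage"),
    ("2018-10-10T15:10:00", "donation_offers"),
]


def hourly_counts(tweets):
    """Count tweets per category within each clock hour, ordered by hour."""
    buckets = defaultdict(Counter)
    for ts, label in tweets:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(ts).replace(minute=0, second=0, microsecond=0)
        buckets[hour][label] += 1
    return dict(sorted(buckets.items()))


for hour, counts in hourly_counts(TWEETS).items():
    print(hour.isoformat(), dict(counts))
```

Each hour then carries one count per category, which matches the "for each hour" framing of the dataset.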
This dataset contains:
wind speeds of Typhoon Nina
rainfall of Typhoon Nina
Priority Index of Typhoon Nina
The predicted priority index of Typhoon Nina is produced by a machine learning algorithm trained on five past typhoons: Haiyan, Melor, Hagupit, Rammasun, and Haima. It uses baseline data for the whole country, combined with impact data on wind speeds and rainfall, and is trained on Philippine government counts of houses damaged and completely destroyed.
The output is a weighted index of partially and completely damaged houses, in which a partially damaged house counts as 25% of a completely damaged one. This weighting has proven to give the highest accuracy.
The absolute numbers of houses damaged and people affected are not yet sufficiently validated and should only be used for further training and ground-truthing.
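As a worked example of the 25% weighting, the sketch below folds partial and complete damage counts into one weighted figure. The function name and the example counts are hypothetical. Any further normalization the model may apply (for example, dividing by the number of houses in an area) is not stated here and is left out.

```python
# Weight of a partially damaged house relative to a completely damaged one,
# as stated in the dataset description (25%).
PARTIAL_WEIGHT = 0.25


def weighted_damage(partially_damaged: float, completely_damaged: float) -> float:
    """Combine partial and complete house damage into a single weighted count."""
    return completely_damaged + PARTIAL_WEIGHT * partially_damaged


# Made-up example: 400 partially and 100 completely damaged houses.
print(weighted_damage(400, 100))  # 200.0
```

Four partially damaged houses therefore weigh the same as one completely destroyed house.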
Scoring
The model has a best r2 score of 0.794933727 and an accuracy of 0.699470899.
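The sketch below shows how both kinds of score could be computed: r2 with the standard formula 1 - SS_res / SS_tot, and a categorical accuracy. For the accuracy it assumes that damage percentages are grouped into ten 10%-wide bins (0-10%, 10-20%, ..., 90-100%), following the val_accuracy column description. The same bins give the shares of under- and overpredicted categories. The binning details, the function names, and the sample values are assumptions, not the 510.global implementation.

```python
def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def damage_category(pct):
    """Hypothetical binning: 0-10% -> 0, 10-20% -> 1, ..., 90-100% -> 9."""
    return min(int(pct // 10), 9)


def category_metrics(observed, predicted):
    """Return (accuracy, share underpredicted, share overpredicted) over damage bins."""
    n = len(observed)
    obs_cat = [damage_category(o) for o in observed]
    pred_cat = [damage_category(p) for p in predicted]
    accuracy = sum(o == p for o, p in zip(obs_cat, pred_cat)) / n
    perc_down = sum(p < o for o, p in zip(obs_cat, pred_cat)) / n
    perc_up = sum(p > o for o, p in zip(obs_cat, pred_cat)) / n
    return accuracy, perc_down, perc_up


# Made-up damage percentages for four areas.
observed = [5, 12, 35, 60]
predicted = [8, 25, 31, 48]
print(round(r2_score(observed, predicted), 3))  # 0.818
print(category_metrics(observed, predicted))    # (0.5, 0.25, 0.25)
```

A model can score well on r2 while missing many bins, which is one reason to report both numbers.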
Data sources:
Administrative boundaries (P_Codes) - Philippines Government; Published by GADM and UN OCHA (HDX)
Census 2015 (population) - Philippine Statistics Authority; received from UN OCHA (HDX)
Avg. wind speed (mph) - University College London
Typhoon path - University College London
Houses damaged - NDRRMC
Rainfall - GPM
Poverty - Pantawid Pamilyang Pilipino Program (aggregated)
Roof and wall materials
New geographical features
All columns prefixed with feat_ give the importance of that feature; if a feat_ column is not present, that feature was not used.
learn_matrix - name of the learning matrix with the 5 typhoons
run_name - unique run name (the pickle and csv files for this model carry this name)
typhoon_to_predict - name of the new typhoon to predict
val_accuracy - accuracy based on 10 categories of damage: 0%, 10%, 20%, …
val_perc_down - percentage of underpredicted categories
val_perc_up - percentage of overpredicted categories
Val_best_score - best r2 score
Val_stdev_best_score - error on the best score, based on cross-validation (CV)
Val_score_test - r2 score on the test set (to avoid overfitting, this should be within about ±5% of the best r2 score)
Val_mean_error_num_houses - average error on the number of houses
val_median_error_num_houses - median error on the number of houses
val_std_error_num_houses - standard deviation of the errors on the number of houses (lower is better)
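To show how the result columns might be read, the sketch below parses one row of a results csv. It lists the features that were used, ranked by importance, and checks the overfitting rule of thumb for Val_score_test. The feature names and values in the sample row are made up. Reading "±5%" as a relative tolerance around Val_best_score is also an interpretation.

```python
import csv
import io

# Made-up results row; the feat_ names and all values are hypothetical.
SAMPLE_CSV = """run_name,typhoon_to_predict,feat_wind_speed,feat_rainfall,feat_poverty,Val_best_score,Val_score_test
run_demo,Nina,0.41,0.22,0.09,0.79,0.77
"""


def used_features(row):
    """Return (feature, importance) pairs from feat_ columns, most important first.

    A missing or empty feat_ column is treated as "feature not used".
    """
    feats = [
        (name[len("feat_"):], float(value))
        for name, value in row.items()
        if name.startswith("feat_") and value != ""
    ]
    return sorted(feats, key=lambda kv: kv[1], reverse=True)


def within_overfit_tolerance(row, tolerance=0.05):
    """Check that the test-set r2 is within +/- tolerance (relative) of the best CV r2."""
    best = float(row["Val_best_score"])
    test = float(row["Val_score_test"])
    return abs(test - best) <= tolerance * abs(best)


for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    print(row["run_name"], used_features(row), within_overfit_tolerance(row))
```

If a run's test score falls outside the tolerance, its cross-validated score is likely to be optimistic.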
Algorithm developed by 510.global, the data innovation initiative of the Netherlands Red Cross.
This dataset contains the locations of affected schools at village, district, and governorate level. It includes the number of students by gender, state of damage, whether the school is affected, cause of damage, and a damage description, among other variables.