Notebook Created by: David Rusho (Github Blog | Tableau | Linkedin)

Tableau Dashboard: US Fireworks Inuries (2016-2021)

Introduction

Fireworks in the US are commonly used during the 4th of July and New Years' celebrations. Besides being nice to look at fireworks also cause a lot of apparent problems.

Fires

"More than one-quarter (28 percent) of fires started by fireworks in 2014–2018 were reported on the Fourth of July."

Personal Injury

"The Consumer Product Safety Commission (CPSC) estimates that 9,100 consumer fireworks-related injuries were seen in US hospital emergency departments in 2018."

Project Goals

To provide an overview of the types of incidents that involve fireworks. This includes understanding which age groups are most affected and the frequency of injury types. Insights from this analysis may help prevent future injuries or at least assist with increasing awareness of the dangers that fireworks can cause.

In addition to the above analysis, we'll take a look into sales data for all US states for the last five years (2016-2021).

Summary of Results

Age group of 0-20 showed the highest rate of injury.

Injuries to the hands, face, and eyes were the most common, while injuries to lower extremities were less common.

Missouri held the record for most spent on fireworks (over 250 million dollars over the past 5 years).

There was no significate correlation between the number of injuries in a year compared to the number of sales.

Import Libraries

# install libraries to save plotly images to disk
%%capture
!pip install kaleido
!pip install plotly>=4.0.0
!wget https://github.com/plotly/orca/releases/download/v1.2.1/orca-1.2.1-x86_64.AppImage -O /usr/local/bin/orca
!chmod +x /usr/local/bin/orca
!apt-get install xvfb libgtk2.0-0 libgconf-2-

!pip install texthero

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import seaborn as sns
import texthero as hero
from texthero import preprocessing
from texthero import stopwords
import warnings
warnings.filterwarnings('ignore')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.

Import Firework Injury Datasets

# Import clean injury dataframe
injury = 'https://github.com/drusho/fireworks_data_exploration/raw/main/data/data_clean/df_injury_clean.csv'
df_injury = pd.read_csv(injury,usecols=[1,2,3,4,5,6,7,8,9,10])
df_injury.head(3)

Time Series Analysis of Injuries (Scatter Plot)

July 4th firework celebrations are the clear winner in number of firework related injuries.

#Time Series Analysis of Injuries (Scatter Plot)
#groupby treatment
injury_dates = df_injury.groupby('Treatment_Date').count().reset_index()
injury_dates = injury_dates.rename(columns={'Age':'Count'})

fig = px.scatter(injury_dates, x="Treatment_Date", y="Count")

fig.update_layout({"plot_bgcolor":"rgba(255,255,255, 0.9)"},
                  title={'text': "Firework Injury Counts by Date",
                  'y':.98,
                  'x':.5,
                  'xanchor': 'center',
                  'yanchor': 'top'},
                  xaxis=dict(title='Date of Injury'),
                  yaxis=dict(title='Injury Counts'))

fig.update_traces(marker_color='#1f77b4')

fig.show()

Narrative Column Word Frequency (Bar Plot)

The Narrative column contains a detailed description of each injury reported.

Notice how the body parts 'hand', 'eye', and 'face' make the list.

#create a custom cleaning pipeline
custom_pipeline = [preprocessing.fillna
                   , preprocessing.lowercase
                   , preprocessing.remove_digits
                   , preprocessing.remove_punctuation
                   , preprocessing.remove_diacritics
                   #, preprocessing.remove_stopwords
                   , preprocessing.remove_whitespace]
                  #  , preprocessing.stem]

#pass the custom_pipeline to the pipeline argument
df_injury['clean_nar'] = hero.clean(df_injury['Narrative'], pipeline = custom_pipeline)

#add a list of stopwords to the stopwords
default_stopwords = stopwords.DEFAULT 

#Call remove_stopwords and pass the custom_stopwords list
custom_stopwords = default_stopwords.union(set(["'","I","r","dx","i","l","yom","yow","pt","type","p","w"]))

df_injury['clean_nar'] = hero.remove_stopwords(df_injury['clean_nar'], custom_stopwords)

tw  = hero.visualization.top_words(df_injury['clean_nar']).head(20).reset_index()

fig = px.bar(tw, 
             x='index', 
             y='clean_nar',
             orientation='v')

fig.update_layout({"plot_bgcolor":"rgba(255,255,255, 0.9)"},
                  title={'text': "Word Frequency for Injury Reports (2016-2021)",
                  'y':.98,
                  'x':.5,
                  'xanchor': 'center',
                  'yanchor': 'top'},
                  xaxis=dict(title=''),
                  yaxis=dict(title='Word Counts'))

fig.update_traces(marker_color='#1f77b4')

fig.show()

Wordcloud of Injury Report (2016-2021)

Words are taken from the Narrative column and are narrowed down to the top 200 words.

Looking at this wordcloud I get the idea that the initial lighting of fireworks is when a lot of injuries occur.

#Wordcloud from Narrative column using hero

hero.wordcloud(df_injury['clean_nar'], max_words=200,contour_color='', 
               background_color='white',colormap='Blues',
               height = 500,width=800)

Counting Drug and Alchol Use

The number of people injuried with positive results for drug or alcohol use was very insignificant compared to overall count of injuries.

Drug and Alcohol Usuage (Bar Plot)

#define figure size
sns.set(rc={"figure.figsize":(15, 6)})

#set background to white
sns.set_style("white")

fig, ax = plt.subplots(1,2)

sns.countplot(df_injury['Drug'], 
              ax=ax[0],
              palette="Oranges_r")

ax[0].set_title('Drug Related Incidents', 
             fontdict = {'fontsize': 15})
ax[0].set(ylabel='Counts',
       xlabel='')

sns.countplot(df_injury['Alcohol'], 
              ax=ax[1],
              palette="Blues_r")

ax[1].set_title('Alcohol Related Incidents', 
             fontdict = {'fontsize': 15})
ax[1].set(ylabel='',
       xlabel='')

# remove spines
sns.despine(left=True)

#save to png
# fig.savefig("Drug/Alcohol Counts.png")

plt.show()
fig.savefig('Drug_and_Alcohol_Counts.png')

plt.show()

Incident Counts by Year (Bar Plot)

The most injuries occured in 2020, which also saw its largest increase in firework sales. This correlation doesn't seem to match prior years.

# Incident Counts by Year BarGraph

#define figure size
sns.set(rc = {"figure.figsize":(12,8)})

#set background to white
sns.set_style("white")

treamentDates = df_injury['Treatment_Date'].dt.year.value_counts().sort_index().reset_index()

ax = sns.barplot(y="Treatment_Date", 
                 x="index", 
                 data=treamentDates, 
                 palette="Blues")

#set x,y labels
ax.set(xlabel='',
       ylabel='Incident Counts')

#set titles
ax.set_title('Firework Injury Counts by Year', 
             fontdict = {'fontsize': 15})

#remove spiens
sns.despine(left=True)

#save to png
ax.figure.savefig("Firework Injury Counts by Year.png")

plt.show()

Incident Counts by Gender (Bar Plot)

# Incident Counts by Sex

incidentSex = df_injury['Sex'].value_counts().reset_index(name='incidents')

#define figure size
sns.set(rc={"figure.figsize":(12, 8)})

#set background to white
sns.set_style("white")

ax = sns.barplot(x="incidents", 
                 y="index", 
                 data=incidentSex, 
                 palette="Blues_r")

#set x,y labels
ax.set(xlabel='',
       ylabel='Injury Counts')

#set titles
ax.set_title('Firework Injury Counts by Gender (2016-2020)', 
             fontdict = {'fontsize':15})

#remove spines
sns.despine(left=True)

#save to png
ax.figure.savefig("Firework Injury Counts by Gender.png")

plt.show()

Incident Counts by Body Part (Bar Plot)

The body parts listed also match the body parts mentioned in word frequency count of narratives.

# Incident Counts by Body Part

#define figure size
sns.set(rc={"figure.figsize":(12,8)})

#set background color
sns.set_style("white")

incidentBp = df_injury['Body_Part'].value_counts().reset_index(name='incidents').head(23)

ax = sns.barplot(x="incidents", 
                 y="index", 
                 data=incidentBp, 
                 palette="Blues_r")

#set x,y labels
ax.set(xlabel='',
       ylabel='')

#set title
ax.set_title('Firework Injury Counts by Body Part (2016-2020)', 
             fontdict = {'fontsize':15})

#remove spines
sns.despine(left=True)

#save to png
ax.figure.savefig("Incident Counts by Body Part.png")

plt.show()

Age Groups (Histogram)

Details

The histogram below shows that the majority of firework related injuries that were reported occured with indivuals under the age of 40. The largest grouping belonged to individuals under the age of 20, which is also below the legal drinking age in the US.

# Histogram of Ages

#set figsize
sns.set(rc={"figure.figsize":(8, 8)})

#set background color
sns.set_style("white")

ax = sns.histplot(data=df_injury, 
                  x='Age_Fix', 
                  bins=5)

#set x,y labels
ax.set(xlabel='Age',
       ylabel='Counts')

#set title
ax.set_title('Firework Injury Counts by Age (2016-2020)', 
             fontdict = {'fontsize':15})

#remove spines
sns.despine(left=True)

#save to png
ax.figure.savefig("Incident Counts by Age_Hist.png")

plt.show()

Incident Counts by Age (Bar Plot)

# Incident Counts by Age (Age_Fix)

#define figure size
sns.set(rc={"figure.figsize":(30, 12)})

#set background color
sns.set_style("white")

incidentAge = df_injury['Age_Fix'].value_counts().reset_index(name='incidents')

ax = sns.barplot(y="incidents", 
                 x="index", 
                 data=incidentAge, 
                 palette="Blues_r")

#set x,y labels
ax.set(xlabel='Age',
       ylabel='Count')

#set title
ax.set_title('Firework Injury Counts by Age (2016-2020)', 
             fontdict = {'fontsize':30})

#remove spines
sns.despine(left=True)

#save to png
ax.figure.savefig("Incident Counts by Age_Bar.png")

plt.show()

Firework Incident Counts by Age, Year, and Gender (Swarm Plot)

# Swarm graph by age, year, and gender

#define figure size
sns.set(rc={"figure.figsize":(14,10)})

#set background color
sns.set_style("white")

ax = sns.swarmplot(data = df_injury, 
                   x = df_injury['Treatment_Date'].dt.year, 
                   y = "Age_Fix",
                   hue = "Sex",
                   palette = "Blues_r")

#set x,y labels
ax.set(xlabel = 'Year',
       ylabel = 'Age')

#set title
ax.set_title('Firework Injury Counts by Age', 
             fontdict = {'fontsize':18})

#remove spines
sns.despine(left=True)

#save to png
ax.figure.savefig("Incident Counts by Age_Swarm.png")

plt.show()

Incident Counts by Diagnosis (Bar Plot)

Most injuries were from burns. Combining this information with the analysis above on injuries to body parts (eyes, face, and hands) provides a clear picture of how severe these burn injuries are to people involved.

# Incident Counts by Diagnosis

incidentDia = df_injury['Diagnosis'].value_counts().reset_index(name='incidents').head(10)

#define figure size
sns.set(rc={"figure.figsize":(14,10)})

#set background color
sns.set_style("white")

ax = sns.barplot(x = "incidents", 
                 y = "index", 
                 data = incidentDia, 
                 palette = "Blues_r")

#set x,y labels
ax.set(xlabel = '',
       ylabel = '')

#set title
ax.set_title('Firework Injury Counts by Diagnosis (2016-2020)', 
             fontdict = {'fontsize':18})

#remove spine
sns.despine(left=True)

# #set y axis labels (shortened longer labels to fit for print out)
# ax.set_yticklabels(['Burns', 'Contusions', 'Abrasions','Other/Not Stated',
#                     'Laceration','Fracture','Amputation','Foreign body',
#                     'Internal organ','Strain or Sprain','Avulsion',
#                     'Anoxia','Puncture','Poisoning','Dermatitis', 'Conjunctivitis',
#                     'Concussions','Hematoma'])

#save to png
ax.figure.savefig("Incident Counts by Diagnosis.png")

plt.show()

Import Firework Sales Data (State)

sales_state = 'https://github.com/drusho/fireworks_data_exploration/raw/main/data/data_raw/State%20Imports%20by%20HS%20Commodities.csv'
df_sales_st = pd.read_csv(sales_state,skiprows=4,usecols=[0,1,2,3])
df_sales_st.head()

Web Scraping for State Abbreviations

The dataframe was missing state abbreviations that are needed to plot data onto a map using Plotly. Used pandas function 'read_hml' to read tables from a website that contained state and state abbreviation data.

#WebScrap State Abbreviations

#scrap state names and abbrev
states_abrev = pd.read_html('https://abbreviations.yourdictionary.com/articles/state-abbrev.html')[0].iloc[1:,:2]

#scrap US territory names and abbrev
territories = pd.read_html('https://abbreviations.yourdictionary.com/articles/state-abbrev.html')[1].iloc[[2,5],:2]

#merge dfs
st_at = states_abrev.merge(territories,how='outer').sort_values(by=0).reset_index(drop=True)

#rename cols
st_at.rename(columns={0:'State',1:'Abbrevation'},inplace=True)

st_at.head()

Merging State Abbreviations with Master Dataframe

#merge abbrevation with state sales data
df_sales_st2 = df_sales_st.merge(st_at,how='inner')

df_sales_st2.head()

Top Sales per State (Bar Plot)

# Visualization State Sales (Bar Plot)
st_sales = df_sales_st2.copy()

st_sales = st_sales.groupby('State')['Total Value ($US)'].sum().reset_index(name='Sales').sort_values(by='Sales',ascending=False).reset_index(drop=True).head(20)

st_sales.sort_values(by='Sales',ascending=True,inplace=True)

fig = px.bar(st_sales, 
             y='State', 
             x='Sales',
             orientation='h',
             color_continuous_scale='Blues',
             color="Sales")

fig.update_layout({"plot_bgcolor":"rgba(255,255,255, 0.9)"},
                  title={'text': "Firework Total Sales ($USD) 2016-2020",
                  'y':.98,
                  'x':.5,
                  'xanchor': 'center',
                  'yanchor': 'top'})

fig.show()

# # save fig to image
# fig.write_image("Total Firework Sales ($USD) 2016-2020.png", width=1980, height=1080)

# fig.write_html("Total Firework Sales ($USD) 2016-2020.html")

State Sales (Scatter Plot)

Note

Plotly currently does not have the ability to position colorscales to horizontal.

# Visualization State Sales (Scatter Plot)

df_sales_st2.sort_values(by='State',ascending=False,inplace=True)

fig = px.scatter(df_sales_st2, 
                 y="State", 
                 x="Time", 
                 color="Total Value ($US)",
                 size='Total Value ($US)', 
                 width=800, height=1100,
                 color_continuous_scale='Blues')

#change background and legend background to white
fig.update_layout({"plot_bgcolor":"rgba(255,255,255, 0.9)"},
                  #  "paper_bgcolor": "rgba(255,255,255, 0.9)"},
                  title={'text': "Firework Sales ($USD)",
                         'y':.98,
                         'x':.5,'xanchor':'center',
                         'yanchor': 'top'},
                  xaxis=dict(title=''),
                  yaxis=dict(title=''))

fig.show()

# save fig to image
fig.write_image("Firework Sales ($USD) (scatter_plot).png", width=800, height=1000)

fig.write_html("Firework Sales ($USD) (scatter_plot).html")

State Sales (Heatmap Plot of US)

fig = px.choropleth(df_sales_st2,  # Input Pandas DataFrame
                    locations="Abbrevation",  # DataFrame column with locations
                    color="Total Value ($US)",  # DataFrame column with color values
                    hover_name="Abbrevation", # DataFrame column hover info
                    locationmode = 'USA-states', # Set to plot as US States
                    color_continuous_scale='Blues')

fig.update_layout(
    title={
        'text': "Firework Total Sales ($USD) 2016-2020",
        'y':.95,
        'x':.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    geo_scope='usa')  # Plot only the USA instead of globe

fig.show()

# save fig to image
fig.write_image("Total State Firework Sales ($USD) 2016-2020 (map).png", 
                width=1980, 
                height=1080)
fig.write_html("Total State Firework Sales ($USD) 2016-2020 (map).html")

Comparing Sales and Injury Reports

Total Sales Groupby Year

# Total Sales groupby Year
sales_year = df_sales_st2.groupby(df_sales_st2['Time'].dt.year).sum().reset_index(drop=False)
sales_year.rename(columns={'Time':'Year','Total Value ($US)':'Sales'},inplace=True)
sales_year

Total Injuries Groupby Year

# df_injury.groupby(['Treatment_Date']
df_injury_count = df_injury.groupby(df_injury['Treatment_Date'].dt.year)['Age'].count().reset_index(name='Count')
df_injury_count.rename(columns={'Treatment_Date':'Year'},inplace=True)
df_injury_count

Merging Sales and Injury DataFrame on Year

# Merge sales and injury dfs on year
df_merged = sales_year.merge(df_injury_count,how='left')
df_merged

Determining Correlation of New DataFrame

# Correlation
df_merged.corr()

Conclusion

Injuries

Age groups of 0-20 showed the highest rate of injury. Injury rates by age decrease with age tend to slowly decrease after age 20. The 60+ age groups showed the lowest rate of injury.

Injuries to the hands, face, and eyes were the most common, while injuries to lower extremities were less common. This is reflected numerous times in the data, such as with word frequency of injury narratives, where an explanation is given for how a person was injured.

Time series analysis showed that the month of July has the highest frequency of firework related injuries.

Sales

Missouri held the record for most spent on fireworks (over 250 million dollars over the past 5 years). For comparison, Alaska spent around 560,000 dollars in the last five years. Sales for fireworks saw a considerable increase in sales during 2020, most likely due to COVID-19.

Correlation Between Sales and Injury Counts

There was no significate correlation between the number of injuries in a year compared to the number of sales.

Ending Remarks

Never hold fireworks while lighting them, and hand, eye, and face protection should be worn at all times when fireworks are nearby fireworks. This is especially true if you are a male below the age of 21.

Tools Used

Matplotlib
Pandas
Plotly
Seaborn

Resources

Consumer Product Safety Commision
USA Trade Census
National Fire Protection Association: Fireworks fires and injuries

Source Details

Source 1 consisted multiple excel incident reports involving fireworks over the past 5 years taken from the U.S. Consumer Product Safety Commission (CPSC). The NEISS injury data are gathered from the emergency departments (ED) of 96 hospitals selected as a probability sample of all U.S. hospitals with 24-hour EDs and at least 6 inpatient beds. Each participating NEISS hospital is hand-selected by CPSC because it provides an important representation of all other hospitals of its size and unique characteristics in the U.S. Source 2 conatins Sales and Trade data for each state regarding fireworks. Reports data range from 2016 to April 2021.

	Treatment_Date	Age	Sex	Alcohol	Drug	Narrative	Incident Locale	Body_Part	Diagnosis	Disposition
0	1/1/16	39	Male	NaN	NaN	39YOM WAS LIGHTING BOTTLE ROCKETS AND ONE FLEW...	Home	Eyeball	Contusions, Abrasions	Treated/Untreated and Released
1	1/1/16	13	Male	NaN	NaN	13YOM SOMEONE POINTED FIREWORKS AT HIM FROM 10...	Home	Eyeball	Contusions, Abrasions	Treated/Untreated and Released
2	7/5/16	31	Female	NaN	NaN	A 31YOF WAS STRUCK TO EYE WITH PIECE OF FIRECR...	Home	Eyeball	Contusions, Abrasions	Treated/Untreated and Released

	State	Commodity	Time	Total Value ($US)
0	Alabama	360410 Fireworks	2016	29,602,090
1	Alabama	360410 Fireworks	2017	19,396,430
2	Alabama	360410 Fireworks	2018	26,399,895
3	Alabama	360410 Fireworks	2019	28,353,392
4	Alabama	360410 Fireworks	2020	23,141,950

	State	Abbrevation
0	Alabama	AL
1	Alaska	AK
2	Arizona	AZ
3	Arkansas	AR
4	California	CA

	State	Commodity	Time	Total Value ($US)	Abbrevation
0	Alabama	360410 Fireworks	2016	29,602,090	AL
1	Alabama	360410 Fireworks	2017	19,396,430	AL
2	Alabama	360410 Fireworks	2018	26,399,895	AL
3	Alabama	360410 Fireworks	2019	28,353,392	AL
4	Alabama	360410 Fireworks	2020	23,141,950	AL

	Year	Sales
0	2016	307825710
1	2017	279962808
2	2018	331072715
3	2019	320021354
4	2020	300987616
5	2021	157186415

	Year	Sales	Count
0	2016	307825710	268.0
1	2017	279962808	329.0
2	2018	331072715	234.0
3	2019	320021354	261.0
4	2020	300987616	440.0
5	2021	157186415	NaN

	Year	Sales	Count
Year	1.000000	-0.585906	0.529818
Sales	-0.585906	1.000000	-0.590087
Count	0.529818	-0.590087	1.000000