Introduction


Goals

Combine and clean firework injury data

About the Data

Source: Firework Injury Reports

The following notebook represents steps that were taken to organize and clean 5 excel files that consist of incident reports involving fireforks over the past 5 years from the CPSC (United States Consumer Product Safety Commision).

The data needed to be converted to csv since reading excel into pandas is extremely slow. CSV did create larger files but the trade off for increased reading speeds was worth it. This was done manually through Excel and then exporting each file as a csv. If there were more files a more "pythonic" method would have been used to save time.

Begin Cleaning Process


Import Libraries

import pandas as pd

Merge csv Files

filelist = ['NEISS_2016.csv','NEISS_2017.csv','NEISS_2018.csv','NEISS_2019.csv','NEISS_2020.csv']
df = pd.concat(map(pd.read_csv, filelist))
df.head(3)

CPSC_Case_Number Treatment_Date Age Sex Race Other_Race Hispanic Body_Part Diagnosis Other_Diagnosis ... Fire_Involvement Alcohol Drug Product_1 Product_2 Product_3 Narrative Stratum PSU Weight
0 160101845 1/1/16 92 1 0 NaN NaN 79 57 NaN ... 0 NaN NaN 1645 1807 0 92YOM TRYINGO TO TAKE OFF PANTS AND LOST BALAN... M 63 103.2251
1 160101847 1/1/16 90 1 0 NaN NaN 79 57 NaN ... 0 NaN NaN 670 0 0 90YOM FELL GETTING OUT OF A RECLINER CHAIR AND... M 63 103.2251
2 160101848 1/1/16 71 2 0 NaN NaN 79 57 NaN ... 0 NaN NaN 1807 0 0 71YOF SLIPPED AND FELL TO HER WET KITCHEN FLOO... M 63 103.2251

3 rows × 25 columns

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1791854 entries, 0 to 309369
Data columns (total 25 columns):
 #   Column             Dtype  
---  ------             -----  
 0   CPSC_Case_Number   int64  
 1   Treatment_Date     object 
 2   Age                int64  
 3   Sex                int64  
 4   Race               int64  
 5   Other_Race         object 
 6   Hispanic           float64
 7   Body_Part          int64  
 8   Diagnosis          int64  
 9   Other_Diagnosis    object 
 10  Body_Part_2        float64
 11  Diagnosis_2        float64
 12  Other_Diagnosis_2  object 
 13  Disposition        int64  
 14  Location           int64  
 15  Fire_Involvement   int64  
 16  Alcohol            float64
 17  Drug               float64
 18  Product_1          int64  
 19  Product_2          int64  
 20  Product_3          int64  
 21  Narrative          object 
 22  Stratum            object 
 23  PSU                int64  
 24  Weight             float64
dtypes: float64(6), int64(13), object(6)
memory usage: 355.4+ MB

Reduce DataFrame Size

Filter results to only diplay firework related incidents

  • Fireworks product code is '1313'
df1 = df.query('(Product_1 == 1313) | (Product_2 == 1313) | (Product_3 == 1313)')
df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1532 entries, 57 to 308881
Data columns (total 25 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   CPSC_Case_Number   1532 non-null   int64  
 1   Treatment_Date     1532 non-null   object 
 2   Age                1532 non-null   int64  
 3   Sex                1532 non-null   int64  
 4   Race               1532 non-null   int64  
 5   Other_Race         88 non-null     object 
 6   Hispanic           701 non-null    float64
 7   Body_Part          1532 non-null   int64  
 8   Diagnosis          1532 non-null   int64  
 9   Other_Diagnosis    146 non-null    object 
 10  Body_Part_2        267 non-null    float64
 11  Diagnosis_2        267 non-null    float64
 12  Other_Diagnosis_2  28 non-null     object 
 13  Disposition        1532 non-null   int64  
 14  Location           1532 non-null   int64  
 15  Fire_Involvement   1532 non-null   int64  
 16  Alcohol            701 non-null    float64
 17  Drug               701 non-null    float64
 18  Product_1          1532 non-null   int64  
 19  Product_2          1532 non-null   int64  
 20  Product_3          1532 non-null   int64  
 21  Narrative          1532 non-null   object 
 22  Stratum            1532 non-null   object 
 23  PSU                1532 non-null   int64  
 24  Weight             1532 non-null   float64
dtypes: float64(6), int64(13), object(6)
memory usage: 311.2+ KB

Reduce Columns

Remove unnecessary columns

# reduce df size and clean na values
df2 = df1[['Treatment_Date','Age','Sex','Body_Part','Diagnosis','Disposition','Location','Alcohol','Drug','Narrative']].fillna('').reset_index(drop=True)
df2.head()

Treatment_Date Age Sex Body_Part Diagnosis Disposition Location Alcohol Drug Narrative
0 1/1/16 39 1 77 53 1 1 39YOM WAS LIGHTING BOTTLE ROCKETS AND ONE FLEW...
1 1/1/16 10 1 82 51 1 1 10YOM SUSTAINED A THERMAL BURN TO HAND AFTER H...
2 1/1/16 35 1 31 62 4 0 35YOM HIT IN THE CHEST WITH A MORTAR TYPE FIRE...
3 1/1/16 13 1 77 53 1 0 13YOM SOMEONE POINTED FIREWORKS AT HIM FROM 10...
4 1/1/16 216 1 31 51 1 0 16MOM FAMILY PLAYING WITH FIREWORKS AND ONE SH...

Fix Sex Data

# Fix Sex columns
df2.Sex = df2.Sex.replace(1,"Male").replace(2,"Female")
df2.head(2)

Treatment_Date Age Sex Body_Part Diagnosis Disposition Location Alcohol Drug Narrative
0 1/1/16 39 Male 77 53 1 1 39YOM WAS LIGHTING BOTTLE ROCKETS AND ONE FLEW...
1 1/1/16 10 Male 82 51 1 1 10YOM SUSTAINED A THERMAL BURN TO HAND AFTER H...

Import Incident Local Dataframe

Dataframe was manual input taken from the 2020 NEISS Coding Manual.pdf

local_df = pd.read_pickle('df_incident_local.pkl')
local_df

Code Incident Locale
0 1 Home
1 2 Farm/Ranch
2 4 Street or highway
3 5 Other public property
4 6 Manufactured (mobile) home
5 7 Industrial place
6 8 School
7 9 Place of recreation or sports
8 0 Not recorded

Fix and Mege Incident Locale

# fix Incident Locale
df3 = pd.merge(df2, local_df, left_on='Disposition', right_on='Code').drop(['Code','Location'],axis=1)
df3.sample(3)

Treatment_Date Age Sex Body_Part Diagnosis Disposition Alcohol Drug Narrative Incident Locale
1221 5/29/16 4 Female 31 51 4 4YOF W/BURNS TO CHEST & FINGER TIPS 2/2 PLAYIN... Street or highway
541 7/4/18 15 Male 36 72 1 15YOM WAS HIT IN BACK OF LEGS WHEN LARGE FIREW... Home
457 8/2/17 25 Male 76 51 1 25YOM PUT FIREWORKS BOX BON FIRE FIREWORKS WEN... Home

Import Body Part Dataframe

Dataframe was manual input taken from the 2020 NEISS Coding Manual.pdf

body_part_df = pd.read_pickle('df_body_part.pkl')
body_part_df

Code Body_Part
0 0 Internal
1 30 Shoulder
2 31 Upper Trunk
3 32 Elbow
4 33 Lower Arm
5 34 Wrist
6 35 Knee
7 36 Lower Leg
8 37 Ankle
9 38 Pubic Region
10 75 Head
11 76 Face
12 77 Eyeball
13 79 Lower Trunk
14 80 Upper Arm
15 81 Upper Leg
16 82 Hand
17 83 Foot
18 84 25-50'%' of Body
19 85 All Parts of Body
20 87 Not Stated
21 88 Mouth
22 89 Neck
23 92 Finger
24 93 Toe
25 94 Ear

Fix and Merge Body Part Data

# fix Body Part
df4 = pd.merge(df3, body_part_df, left_on='Body_Part', right_on='Code').drop(['Code','Body_Part_x'],axis=1)
df4.rename(columns={'Body_Part_y':'Body_Part'},inplace=True)
df4.sample(3)

Treatment_Date Age Sex Diagnosis Disposition Alcohol Drug Narrative Incident Locale Body_Part
169 7/4/20 14 Male 56 1 0.0 0.0 14YOM WAS PLAYING WITH AND WATCHING FIREWORKS ... Home Eyeball
63 7/5/17 32 Female 56 1 32YOF W/FOREIGN BODY IN EYE & CONJ IRRITATION ... Home Eyeball
1153 7/1/20 54 Male 50 2 1.0 0.0 54 YOM PRESENTED TO THE ER WITH A FIREWORK INJ... Farm/Ranch Finger

Import Diagnosis Dataframe

Dataframe was manual input taken from the 2020 NEISS Coding Manual.pdf

# import Diagnosis pkl
diagnosis_df = pd.read_pickle('df_diagnosis.pkl')
diagnosis_df

Diagnosis1 Code
0 Ingested foreign object 41
1 Aspirated foreign object 42
2 Burns, electrical 46 Burns, not specified 47
3 Burns, scald (from hot liquids or steam) 48
4 Burns, chemical (caustics, etc.) 49
5 Amputation 50
6 Burns, thermal (from flames or hot surface) 51
7 Concussions 52
8 Contusions, Abrasions 53
9 Crushing 54
10 Dislocation 55
11 Foreign body 56
12 Fracture 57
13 Hematoma 58
14 Laceration 59
15 Dental injury 60
16 Nerve damage 61
17 Internal organ injury 62
18 Puncture 63
19 Strain or Sprain 64
20 Anoxia 65
21 Hemorrhage 66
22 Electric shock 67
23 Poisoning 68
24 Submersion (including Drowning) 69
25 Other/Not Stated 71
26 Avulsion 72
27 Burns, radiation (includes all cell damage by ... 73
28 Dermatitis, Conjunctivitis 74

Fix and Merge Diagnosis Data

# fix Diagnosis
df5 = pd.merge(df4, diagnosis_df, left_on='Diagnosis', right_on='Code').drop(['Code','Diagnosis'],axis=1)
df5.rename(columns={'Diagnosis1':'Diagnosis'},inplace=True)
df5.sample(3)

Treatment_Date Age Sex Disposition Alcohol Drug Narrative Incident Locale Body_Part Diagnosis
153 6/10/19 58 Female 1 0.0 0.0 58YOF FALL AND C/O R KNEE, SHOULDER WHEN NEIGH... Home Knee Contusions, Abrasions
250 7/2/20 32 Male 2 0.0 0.0 32 YOM PESENTS WITH EYE INJURY. PT WAS LIGHTIN... Farm/Ranch Eyeball Other/Not Stated
783 7/1/16 24 Female 1 24 YO FEMALE HURT FINGER ON A TYPE L SPARKLER.... Home Finger Burns, thermal (from flames or hot surface)

Import Disposition Dataframe

Dataframe was manual input taken from the 2020 NEISS Coding Manual.pdf

# import Disposition pkl
disposition_df = pd.read_pickle('df_disposition.pkl')
disposition_df

Code Disposition
0 1 Treated/Untreated and Released
1 2 Treated and transferred to another hospital
2 3 Treated and admitted for hospitalization
3 4 Held for observation
4 5 Left without being seen
5 6 Left against medical advice
6 7 Left without treatment
7 8 Eloped Fatality/DOA/died in the ED/Died after ...
8 9 Not recorded

Fix and Merge Disposition Data

# fix Disposition
df6 = pd.merge(df5, disposition_df, left_on='Disposition', right_on='Code').drop(['Code','Disposition_x'],axis=1)
df6.rename(columns={'Disposition_y':'Disposition'},inplace=True)
df6.sample(3)

Treatment_Date Age Sex Alcohol Drug Narrative Incident Locale Body_Part Diagnosis Disposition
551 7/4/18 11 Female 11YOF WENT TO LIGHT UNKNOWN FIREWORK (TYPE R),... Home Upper Leg Burns, thermal (from flames or hot surface) Treated/Untreated and Released
374 7/8/17 212 Female 12MOF C/O BURN TO R HAND X1 HOUR PTA S/P GRABB... Home Hand Burns, thermal (from flames or hot surface) Treated/Untreated and Released
159 7/18/16 12 Male 12YOM FOREIGN BODY RT EYE WAS PLAYING W/ "BLAS... Home Eyeball Foreign body Treated/Untreated and Released

Export Cleaned Data to csv

df6.to_csv('injury_clean.csv')