Although by this point some of you may be avoiding the information overload surrounding COVID-19, understanding what is currently happening by getting up-to-date data directly from the source and drawing your own conclusions can be empowering.
By this point in the term you already have an R toolbox broad enough to tackle the massive amount of data in the Johns Hopkins COVID-19 repository. These data have been updated daily since the COVID-19 pandemic started. I have also included links to supplementary data sets so you can explore the effectiveness of measures taken throughout the pandemic, including mask use and vaccination. The datasets included are:
- `covid.usa.ts.confirmed`: daily time series of the cumulative number of confirmed COVID-19 cases at the county level for the US.
- `covid.usa.ts.deaths`: daily time series of the cumulative number of confirmed COVID-19 deaths at the county level for the US.
- `covid.usa.daily`: COVID-19 USA daily report by state, with the number of confirmed cases as of May 12th, 2021.
- `vacc.people`: state-level time series of daily COVID-19 vaccination numbers from the Johns Hopkins University repository.
- `maskuse`: county-level estimates of mask use from the NY Times repository.
- State policy data: one data file per state with the dates and descriptions of policies going into or out of effect. To load data for a particular state, find the name of that state's file in the `data_tables/policy_data/table_data/Current/` folder of the GovEx repository used below.
Here is the data:
```r
library(tidyverse)  # provides read_csv(), select() and the %>% pipe

# Confirmed COVID-19 cases time series, US (trimmed to 11/01/2020 through 05/12/2021).
# Date columns are selected by name: the deaths file has an extra Population
# column, so the same column positions would pick different dates in each file.
covid.usa.ts.confirmed <- read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv') %>%
  select(UID:Combined_Key, `11/1/20`:`5/12/21`)
# Confirmed COVID-19 deaths time series, US (same date range)
covid.usa.ts.deaths <- read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv') %>%
  select(UID:Combined_Key, `11/1/20`:`5/12/21`)
# Daily data summary by state for 05-12-2021
covid.usa.daily <- read_csv("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/05-12-2021.csv")
# US vaccinated people time series
vacc.people <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/vaccine_data/us_data/time_series/people_vaccinated_us_timeline.csv")
# Mask use by county
maskuse <- read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/mask-use/mask-use-by-county.csv")
# State policy trackers
# Texas
policytrackerTX <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/Texas_policy.csv")
# Florida
policytrackerFL <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/Florida_policy.csv")
# Hawaii
policytrackerHI <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/Hawaii_policy.csv")
# Maine
policytrackerME <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/Maine_policy.csv")
# Montana
policytrackerMT <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/Montana_policy.csv")
# California
policytrackerCA <- read_csv("https://raw.githubusercontent.com/govex/COVID-19/govex_data/data_tables/policy_data/table_data/Current/California_policy.csv")
```
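Every state's policy file follows the same URL pattern, so the repeated `read_csv()` calls above could instead be generated from a vector of state names. The sketch below assumes the pattern holds for every state. `policy_url()` and `read_state_policies()` are hypothetical helpers, and multi-word state names (e.g. "New York") may use a different file name, so check the folder first.

```r
# Hypothetical helper: builds the raw GitHub URL for one state's policy file,
# following the same pattern as the URLs above (vectorized over `state`).
policy_url <- function(state) {
  paste0(
    "https://raw.githubusercontent.com/govex/COVID-19/govex_data/",
    "data_tables/policy_data/table_data/Current/",
    state, "_policy.csv"
  )
}

# Hypothetical helper: reads several states into a named list
# (needs readr installed and network access).
read_state_policies <- function(states) {
  policies <- lapply(states, function(s) readr::read_csv(policy_url(s)))
  names(policies) <- states
  policies
}

# Usage (downloads data, so not run here):
# policytracker <- read_state_policies(c("Texas", "Florida", "Hawaii"))
# policytracker$Texas
```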
Using `covid.usa.ts.confirmed` and `covid.usa.ts.deaths`, which are both at the county level and in wide format, reshape them into long format (using either `gather` or `pivot_longer`). Generate a single new data frame with the daily time series BY STATE, including both the number of confirmed cases and deaths, so that you have one row for each combination of state and date. Call this new data frame `covid.usa.states.ts`. Note: after reshaping into long format, your new date column (or whatever you decide to call it) needs to be converted into a `Date` type variable. For example, if your variable is called `my.date.variable`, you can make this conversion using `lubridate::mdy(my.date.variable)`.

```
## # A tibble: 5 x 4
## # Groups: Province_State [1]
## Dates Deaths Province_State Confirmed_Cases
## <date> <dbl> <chr> <dbl>
## 1 2021-01-01 4872 Alabama 365747
## 2 2021-01-10 5334 Alabama 401900
## 3 2021-01-11 5347 Alabama 404000
## 4 2021-01-12 5573 Alabama 407848
## 5 2021-01-13 5760 Alabama 410995
```
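Below is a minimal sketch of one way to do this with `pivot_longer()`. It runs on a tiny made-up stand-in for the two wide files; the real files have more metadata columns and many more dates. The regular expression that picks out the `m/d/yy` date columns and the helper `to_state_long()` are illustrative assumptions, not the required solution.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Made-up stand-ins for the wide county-level files: two counties, two dates.
confirmed <- tibble(
  Admin2 = c("County A", "County B"),
  Province_State = c("State X", "State X"),
  `1/1/21` = c(10, 20),
  `1/2/21` = c(12, 25)
)
deaths <- tibble(
  Admin2 = c("County A", "County B"),
  Province_State = c("State X", "State X"),
  `1/1/21` = c(1, 2),
  `1/2/21` = c(1, 3)
)

# Hypothetical helper: pivot the m/d/yy date columns to long format,
# convert them to Date, and sum the counties within each state and date.
to_state_long <- function(df, value_name) {
  df %>%
    pivot_longer(cols = matches("^[0-9]+/[0-9]+/[0-9]+$"),
                 names_to = "Dates", values_to = value_name) %>%
    mutate(Dates = mdy(Dates)) %>%
    group_by(Province_State, Dates) %>%
    summarise(across(all_of(value_name), ~ sum(.x, na.rm = TRUE)),
              .groups = "drop")
}

# One row per state and date, with both counts side by side.
covid.usa.states.ts <- full_join(
  to_state_long(confirmed, "Confirmed_Cases"),
  to_state_long(deaths, "Deaths"),
  by = c("Province_State", "Dates")
)
covid.usa.states.ts
```

Aggregating each file to the state level before joining keeps the join small and avoids matching on county identifiers that differ slightly between files.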
Add to `covid.usa.states.ts` (created in the previous problem) all of the information from matching rows in `vacc.people`, without repeating columns that contain the same information in the two data sets.

```
## # A tibble: 5 x 6
## # Groups: Province_State [1]
## Dates Province_State Deaths Confirmed_Cases People_Fully_Vaccinated
## <date> <chr> <dbl> <dbl> <dbl>
## 1 2021-01-01 Alabama 4872 365747 NA
## 2 2021-01-10 Alabama 5334 401900 NA
## 3 2021-01-11 Alabama 5347 404000 NA
## 4 2021-01-12 Alabama 5573 407848 NA
## 5 2021-01-13 Alabama 5760 410995 NA
## # … with 1 more variable: People_Partially_Vaccinated <dbl>
```
Note: `People_Fully_Vaccinated` and `People_Partially_Vaccinated` show as NA because `vacc.people` has no matching vaccination figures for the dates displayed.
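The NAs come from how `left_join()` works: it keeps every row of the left table and fills NA wherever the right table has no matching row. The sketch below uses made-up rows. It assumes `vacc.people` names its date column `Date`, which you can check with `names(vacc.people)`. Only the join keys and the vaccination columns are kept before joining, so identifiers such as `Country_Region` are not duplicated.

```r
library(dplyr)

# Made-up stand-ins: the state time series and the vaccination timeline.
states_ts <- tibble(
  Dates = as.Date(c("2021-01-01", "2021-03-01")),
  Province_State = "State X",
  Deaths = c(10, 20),
  Confirmed_Cases = c(100, 200)
)
vacc <- tibble(
  Province_State = "State X",
  Date = as.Date("2021-03-01"),
  People_Fully_Vaccinated = 50,
  People_Partially_Vaccinated = 80,
  Country_Region = "US"
)

# Keep only the join keys and the new vaccination columns, then join;
# dates with no vaccination record get NA.
joined <- states_ts %>%
  left_join(
    vacc %>% select(Province_State, Date,
                    People_Fully_Vaccinated, People_Partially_Vaccinated),
    by = c("Province_State", "Dates" = "Date")
  )
joined
```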
Using `covid.usa.daily`, select 3 states highly impacted by COVID-19 and 3 mildly impacted states, where by ‘highly impacted’ I mean the states with high numbers of confirmed cases.

Highly Impacted States

Mildly Impacted States
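Here is a sketch of one way to rank states by confirmed cases, using made-up numbers. It assumes the daily report's case column is called `Confirmed`. Filtering on base R's `state.name` drops territories and other non-state entries, and also the District of Columbia.

```r
library(dplyr)

# Made-up stand-in for covid.usa.daily (one row per reporting entity)
daily <- tibble(
  Province_State = c("Texas", "Ohio", "Iowa", "Utah", "Maine", "Idaho",
                     "Guam", "Diamond Princess"),
  Confirmed = c(8000, 7000, 6000, 5000, 4000, 3000, 200, 100)
)

# Keep only the 50 states, then take the 3 highest and 3 lowest counts
states_only <- daily %>% filter(Province_State %in% state.name)
highly <- states_only %>% slice_max(Confirmed, n = 3)
mildly <- states_only %>% slice_min(Confirmed, n = 3)
```

Raw counts favor populous states. Dividing by population first would rank states by per-capita impact instead.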
(Use the `covid.usa.states.ts` data frame to create the figure.)

Hawaii

Average of the daily cumulative COVID-19 case and death counts, 90 days before and after the vaccine

| | 90 Days Before Vaccine | 90 Days After Vaccine |
|---|---|---|
| Average Confirmed Cases | 21,738.58 | 30,698.38 |
| Average Deaths | 297.30 | 458.05 |
Maine

Average of the daily cumulative COVID-19 case and death counts, 90 days before and after the vaccine

| | 90 Days Before Vaccine | 90 Days After Vaccine |
|---|---|---|
| Average Confirmed Cases | 14,705.47 | 42,874.47 |
| Average Deaths | 236.53 | 647.44 |
Montana

Average of the daily cumulative COVID-19 case and death counts, 90 days before and after the vaccine

| | 90 Days Before Vaccine | 90 Days After Vaccine |
|---|---|---|
| Average Confirmed Cases | 67,038.40 | 100,255.80 |
| Average Deaths | 769.41 | 1,370.69 |
California

Average of the daily cumulative COVID-19 case and death counts, 90 days before and after the vaccine

| | 90 Days Before Vaccine | 90 Days After Vaccine |
|---|---|---|
| Average Confirmed Cases | 2,121,173.24 | 3,639,255.15 |
| Average Deaths | 26,717.37 | 56,556.90 |
Florida

Average of the daily cumulative COVID-19 case and death counts, 90 days before and after the vaccine

| | 90 Days Before Vaccine | 90 Days After Vaccine |
|---|---|---|
| Average Confirmed Cases | 1,049,231.80 | 1,818,799.32 |
| Average Deaths | 19,097.03 | 28,963.04 |
Texas

Average of the daily cumulative COVID-19 case and death counts, 90 days before and after the vaccine

| | 90 Days Before Vaccine | 90 Days After Vaccine |
|---|---|---|
| Average Confirmed Cases | 1,371,918.26 | 2,515,123.77 |
| Average Deaths | 23,334.68 | 41,117.35 |
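Before/after averages like those above can be computed with a small helper. The sketch below runs on made-up data. `before_after_means()` is hypothetical, and `vaccine.date` is only a placeholder because the reference date used for the tables is not stated here. The JHU time series are cumulative, so averaging them mostly reflects growth over time. Differencing consecutive days (e.g. `Confirmed_Cases - lag(Confirmed_Cases)`) would give daily new counts instead.

```r
library(dplyr)

# Hypothetical helper: per-state means of each series over the `window` days
# before `ref_date` and the `window` days starting on `ref_date`.
before_after_means <- function(df, ref_date, window = 90) {
  df %>%
    mutate(Period = case_when(
      Dates >= ref_date - window & Dates < ref_date ~ "Before",
      Dates >= ref_date & Dates < ref_date + window ~ "After",
      TRUE ~ NA_character_
    )) %>%
    filter(!is.na(Period)) %>%
    group_by(Province_State, Period) %>%
    summarise(Avg_Confirmed = mean(Confirmed_Cases),
              Avg_Deaths = mean(Deaths),
              .groups = "drop")
}

# Made-up series: 3 days before and 3 days after a placeholder date
vaccine.date <- as.Date("2021-01-01")  # placeholder, not the date used above
toy <- tibble(
  Province_State = "State X",
  Dates = vaccine.date + (-3:2),
  Confirmed_Cases = 1:6,
  Deaths = c(0, 0, 3, 3, 3, 6)
)
before_after_means(toy, vaccine.date, window = 3)
```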
Using the `maskuse` data, explore whether mask use seems to correlate with your selected states being mildly or highly impacted.

How do opening and closure days relate to the number of deaths, confirmed cases, and people fully vaccinated in 3 highly affected states (California, Florida, Texas) and 3 mildly affected states (Hawaii, Maine, Montana)?

Note: openings are shown with gold vertical lines and closures with red vertical lines.
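A minimal ggplot2 sketch of this kind of figure follows: a time series with gold lines at openings and red lines at closures. The data are made up. The `events` table and its `date` and `action` columns are hypothetical. The real policy files use their own column names, so map them to these before plotting.

```r
library(ggplot2)

# Made-up cumulative series for one state
series <- data.frame(
  Dates = as.Date("2021-01-01") + 0:59,
  Deaths = cumsum(rep(2, 60))
)
# Hypothetical policy events table (column names are assumptions)
events <- data.frame(
  date = as.Date(c("2021-01-15", "2021-02-10")),
  action = c("open", "close")
)

# Line for the series, gold vertical lines for openings, red for closures
p <- ggplot(series, aes(x = Dates, y = Deaths)) +
  geom_line() +
  geom_vline(data = subset(events, action == "open"),
             aes(xintercept = date), colour = "gold") +
  geom_vline(data = subset(events, action == "close"),
             aes(xintercept = date), colour = "red") +
  labs(x = "Date", y = "Deaths")
# print(p) to display the figure
```

With several states, `facet_wrap(~ Province_State)` on a combined data frame keeps the same colour convention across panels.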