
Influenza is a disease that affects millions of people every year and causes hundreds of thousands of deaths, along with substantial direct and indirect costs.
Influenza epidemics have a characteristic behavior that shapes the statistical methods used for their detection. Seasonal epidemics occur virtually every year in the temperate parts of the globe during the cold months and extend across whole regions, countries and even continents.
Besides the seasonal epidemics, nonseasonal epidemics can be observed at unexpected times, usually caused by strains that jump the barrier between animals and humans, as happened with the well-known Swine Flu epidemic, which caused great alarm in 2009.
Several statistical methods have been proposed for the detection of disease outbreaks and, in particular, of influenza outbreaks. A reduced version of the review presented in this thesis was published in REVSTAT-Statistical Journal by Amorós et al. in 2015.
A useful tool for building statistical methods for the detection of influenza outbreaks is the Markov switching model, in which each observation is paired with a latent variable indicating whether it belongs to the epidemic or the endemic phase. Two different models are applied to the data according to the value of the latent variable. The latent variables are temporally linked through a Markov chain. The observations are also conditionally dependent on their temporal or spatiotemporal neighbors. Models built with this tool can report a probability of being in an epidemic rather than just a 'yes' or 'no'.
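To illustrate how such a model can yield a probability rather than a binary answer, the following is a minimal sketch of forward filtering in a two-state Markov switching model. It is a simplification under stated assumptions, not the models of this thesis: the observations are taken to be conditionally independent Normal variables given the phase, and all names and default parameter values (`epidemic_probabilities`, the phase means and standard deviations, the transition probabilities) are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def epidemic_probabilities(obs, p_stay_endemic, p_stay_epidemic,
                           endemic=(0.0, 1.0), epidemic=(3.0, 2.0),
                           p_initial_epidemic=0.5):
    """Return P(epidemic at week t | observations up to week t) for each t.

    `endemic` and `epidemic` are (mean, sd) pairs; the defaults are
    illustrative values only, not estimates from any data set.
    """
    # Transition matrix: row 0 = from endemic, row 1 = from epidemic.
    trans = [[p_stay_endemic, 1.0 - p_stay_endemic],
             [1.0 - p_stay_epidemic, p_stay_epidemic]]
    params = [endemic, epidemic]
    prior = [1.0 - p_initial_epidemic, p_initial_epidemic]
    filtered = []
    for y in obs:
        # Weight each phase by how well its model explains the observation.
        joint = [prior[s] * normal_pdf(y, *params[s]) for s in (0, 1)]
        total = sum(joint)
        posterior = [j / total for j in joint]
        filtered.append(posterior[1])
        # Propagate one step through the Markov chain for the next week.
        prior = [sum(posterior[i] * trans[i][j] for i in (0, 1))
                 for j in (0, 1)]
    return filtered
```

Calling `epidemic_probabilities([0.1, -0.2, 4.0, 5.0], 0.9, 0.9)` returns one probability per week, low for the first weeks and close to one for the last ones under these illustrative parameters.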
The Bayesian paradigm offers a convenient framework in which the outcomes can be interpreted as probability distributions. It also allows inference on complex hierarchical models, as Markov switching models usually are.
This research offers two extensions of the model proposed by Martinez-Beneito et al. in 2008, published in Statistics in Medicine. The first proposal is a framework of Poisson Markov switching models over the counts. This proposal was published in Statistical Methods in Medical Research by Conesa et al. in 2015. In this proposal, the counts are modeled through a Poisson distribution, and the mean of these counts is related to the rates through the population. The rates are then modeled through a Normal distribution. The mean and variance of the rates depend on whether each week belongs to the epidemic or the nonepidemic phase. The latent variables that determine the epidemic phase are modeled through a hidden Markov chain.
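The hierarchy described above can be sketched in symbols as follows. The notation is an assumption introduced here for illustration and is not taken from the published model: $O_t$ denotes the count in week $t$, $n_t$ the population, $r_t$ the rate, and $Z_t$ the latent phase (0 for nonepidemic, 1 for epidemic). The product link $n_t r_t$ is one natural reading of "related to the rates through the population".

```latex
\begin{align*}
O_t \mid r_t &\sim \mathrm{Poisson}(n_t \, r_t), \\
r_t \mid Z_t = k &\sim \mathrm{N}(\mu_k, \sigma_k^2), \qquad k \in \{0, 1\}, \\
P(Z_{t+1} = j \mid Z_t = i) &= p_{ij}, \qquad i, j \in \{0, 1\}.
\end{align*}
```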
The mean and the variance in the epidemic phase are considered to be larger than those in the endemic phase. Different degrees of temporal dependency of the mean of the data can be defined. A first option is to consider the rates conditionally independent. A second option is to consider that every observation is conditionally dependent on the previous observation through an autoregressive process of order 1. Higher orders of dependency can be defined, but we limited our framework of models to autoregressive processes of order up to 2 to avoid unnecessary complexity, as higher orders of autocorrelation produced no appreciable change in the results.
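The three levels of temporal dependency can be sketched as follows. The symbols are hypothetical: $r_t$ is the rate in week $t$, $Z_t$ the latent phase (0 for endemic, 1 for epidemic), and $\mu_k$, $\sigma_k^2$ the phase-specific mean and variance parameters. The autoregressive coefficients $\rho_1$, $\rho_2$ and the additive way in which the lagged rates enter the mean are assumptions made for illustration, not the exact specification used in this work.

```latex
\begin{align*}
\text{independent:} \quad & E[r_t \mid Z_t = k] = \mu_k, \\
\text{AR(1):} \quad & E[r_t \mid r_{t-1}, Z_t = k] = \mu_k + \rho_1 r_{t-1}, \\
\text{AR(2):} \quad & E[r_t \mid r_{t-1}, r_{t-2}, Z_t = k] = \mu_k + \rho_1 r_{t-1} + \rho_2 r_{t-2}, \\
\text{with} \quad & \mu_1 > \mu_0, \qquad \sigma_1^2 > \sigma_0^2.
\end{align*}
```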
The application of this framework of methods to several databases showed that the proposal outperforms other methodologies present in the literature. It also highlighted several difficulties in the evaluation of statistical methods for the detection of influenza outbreaks.
The second proposal of this research is a spatiotemporal Markov switching model over the differentiated rates, which are assumed to follow a normal distribution with mean and variance parameters that depend on the epidemic state. The latent variables are modeled in the same way as in the temporal proposal, but with one conditionally independent hidden Markov chain for each location. The variance of the endemic phase is again considered to be lower than that of the epidemic phase.
Three components are defined for the mean of the differentiated rates. First, a term common to all regions at each time point is included in both the endemic and the epidemic mean. These terms are defined as two random effects with mean zero and a higher variance for the epidemic phase. The variances of these random effects are linked to those of the likelihood to avoid identifiability problems.
An autoregressive term for each location is also defined for the epidemic mean, as it is expected that from the beginning of the epidemic until the peak we observe similar positive jumps, and from the peak to the end of the epidemic we observe similar negative jumps.
An intrinsic conditional autoregressive (CAR) structure is also defined for the epidemic mean, reflecting that the epidemic can spread to neighboring regions, which will then show similar epidemic increases in their rates.
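As a rough illustration of what an intrinsic CAR structure encodes, the sketch below builds the usual structure matrix Q = D - W from a neighborhood map (W is the adjacency matrix, D the diagonal matrix of neighbor counts) and computes the conditional mean and variance of one region's effect given the others. The function names and the three-region toy map are hypothetical and chosen only for illustration; they are not the map or the code used in this work.

```python
def icar_structure(neighbors):
    """Return (regions, Q) where Q = D - W for the given neighborhood map.

    `neighbors` maps each region to the list of regions adjacent to it.
    """
    regions = sorted(neighbors)
    index = {region: i for i, region in enumerate(regions)}
    n = len(regions)
    q = [[0.0] * n for _ in range(n)]
    for region, adjacent in neighbors.items():
        i = index[region]
        q[i][i] = float(len(adjacent))      # D: number of neighbors
        for other in adjacent:
            q[i][index[other]] = -1.0       # -W: minus the adjacency
    return regions, q


def icar_conditional(values, neighbors, region, variance=1.0):
    """Conditional mean and variance of one region given its neighbors.

    Under an intrinsic CAR prior the conditional mean is the average of
    the neighboring values and the variance shrinks with the number of
    neighbors.
    """
    adjacent = neighbors[region]
    mean = sum(values[a] for a in adjacent) / len(adjacent)
    return mean, variance / len(adjacent)


# Toy map (hypothetical): three regions in a row, A - B - C.
toy_neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
regions, q = icar_structure(toy_neighbors)
```

Each row of Q sums to zero, which is why the prior is called intrinsic: it constrains differences between neighboring regions but not their overall level.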
This proposal has been applied to the United States Google Flu Trends data from 2007 to 2013 for the 48 spatially connected states plus Washington D.C. The comparison of the model with several simplifications and variations has highlighted the need for several of the assumptions made during the modeling process.
