Use of data mining and artificial intelligence to derive public health evidence from large datasets

Abstract: This thesis explores the use of data mining and AI-tailored frameworks for extracting public health evidence from large health datasets. The research presented here demonstrates the potential of these tools to automate and simplify the data mining process and to provide valuable insights into a range of public health issues.

In Paper I, we used data mining and natural language processing to analyze the characteristics of genomic research on non-communicable diseases (NCDs) recorded in the GWAS Catalog (2005 to 2022). We found that most of the research institutions leading the work were US-based and that the majority of first authors, senior authors, and authors overall were male. The vast majority of complex trait GWAS have been performed in populations of European ancestry, with cohorts and scientists predominantly located in countries of medium-to-high socioeconomic rank. This lack of diversity in both the data and the authorship of GWAS research has potential implications for the generalizability of genetic discoveries and the development of future interventions.

In Paper II, we analyzed data collected through the app-based COVID Symptom Study in Sweden. We created a symptom-based model to estimate the individual probability of symptomatic COVID-19 and used it to estimate daily regional COVID-19 prevalence. We also used these data to predict next-week COVID-19 hospital admissions and compared the predictions with those of a model based on case notifications. The symptom-based model had a lower median absolute percentage error than the notification-based model during the first wave of the pandemic, and it was transferable to an English dataset. These findings demonstrate the feasibility of large-scale syndromic surveillance and the potential of population-based participatory surveillance initiatives in future pandemics and epidemics.

In Paper III, we used data from over 500,000 participants in the COVID Symptom Study to investigate the impact of obesity and diabetes on the symptoms and duration of long-COVID. Using data mining techniques, we found that individuals with higher BMI and diabetes had a higher burden of symptoms during the initial COVID-19 infection and a prolonged duration of long-COVID symptoms. We also found that vaccination had a protective effect against both COVID-19 symptoms and long-COVID symptoms in these at-risk groups. Our results demonstrate the disproportionate impact of COVID-19 on certain populations and the utility of app-based syndromic surveillance in providing timely and accurate information on the spread and impact of the virus.
