Mobile Phone Data Analytics to Support Disaster and Disease Outbreak Response
KTH, School of Architecture and the Built Environment (ABE), Urban Planning and Environment, Geoinformatics. ORCID iD: 0000-0001-7218-9082
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Natural disasters result in devastating losses in human life, environmental assets, and personal, regional, and national economies. The availability of different kinds of big data, such as satellite images, Global Positioning System (GPS) traces, mobile Call Detail Records (CDR), and social media posts, in conjunction with advances in data analytic techniques (e.g., data mining and big data processing, machine learning and deep learning), can facilitate the extraction of geospatial information that is critical for rapid and effective disaster response. However, developing a disaster response system usually requires integrating data from different sources (streaming data sources and data sources at rest) with different characteristics and types, which consequently have different processing needs. Deciding which processing framework to use for a specific type of big data to perform a given task is usually a challenge for researchers in the disaster management field. While many tasks can be accomplished with population and movement data, a key and arguably the most important task for disaster management is to analyze the displacement of the population during and after a disaster. Therefore, in this thesis, the knowledge and framework resulting from a literature review were used to select tools and processing strategies for analyzing population displacement (the forced movement or relocation of people from their original homes) after a disaster. This serves as a use case of the framework as well as an illustration of the value and challenges (e.g., gaps in data due to power outages) of using CDR data analysis to support disaster management.

Displaced populations were inferred by analyzing the variation of the home cell tower of each anonymized mobile phone subscriber before and after a disaster using CDR data. The effectiveness of the proposed method was evaluated using remote sensing-based building damage assessment data and Displacement Tracking Matrix (DTM) data from individuals' survey responses at shelters after a severe cyclone in Beira city, central Mozambique, in March 2019. The results show an encouraging correlation coefficient (over 70%) between the number of arrivals in each neighborhood estimated using CDR data and the number reported in the DTM. In addition, CDR-based analysis derives the spatial distribution of displaced populations with high coverage, i.e., including not only people in shelters but everyone who used a mobile phone before and after the disaster. Moreover, the results suggest that if CDR data are available after a disaster, population displacement can be estimated. These estimates can be used for response activities, for example to help reduce waterborne diseases (e.g., diarrheal disease) and diseases associated with crowding (e.g., acute respiratory infections) in shelters and host communities.
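
The idea of comparing each subscriber's home cell tower before and after a disaster can be illustrated with a minimal Python sketch. It is not the thesis implementation: the record layout (`subscriber_id`, `tower_id` pairs), the assumption that records are already restricted to night-time events, and the function names `home_towers`, `displaced`, and `arrivals_per_tower` are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def home_towers(records):
    """Map each subscriber to the tower they used most often.

    `records` is an iterable of (subscriber_id, tower_id) pairs. This sketch
    assumes the records were already filtered to night-time events, a common
    heuristic that is an assumption here, not a detail from the abstract.
    """
    counts = defaultdict(Counter)
    for subscriber, tower in records:
        counts[subscriber][tower] += 1
    # Ties are broken by tower id so the result is deterministic.
    return {s: min(c, key=lambda t: (-c[t], t)) for s, c in counts.items()}


def displaced(before, after):
    """Return {subscriber: (old_home, new_home)} for changed home towers.

    Subscribers without any record after the disaster are skipped, since
    their new location cannot be inferred.
    """
    homes_before = home_towers(before)
    homes_after = home_towers(after)
    moves = {}
    for subscriber, old in homes_before.items():
        new = homes_after.get(subscriber)
        if new is not None and new != old:
            moves[subscriber] = (old, new)
    return moves


def arrivals_per_tower(moves):
    """Count displaced subscribers arriving at each new home tower."""
    return Counter(new for _, new in moves.values())


before = [("a", "T1"), ("a", "T1"), ("a", "T2"), ("b", "T2"), ("c", "T3")]
after = [("a", "T2"), ("a", "T2"), ("b", "T2"), ("d", "T1")]
moves = displaced(before, after)
print(moves)
print(arrivals_per_tower(moves))
```

Aggregating tower-level arrivals to neighborhoods would then allow a comparison with survey-based counts such as the DTM; that aggregation step depends on tower coverage geometry and is left out of this sketch.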

Although COVID-19 is not a post-disaster disease, it is an acute respiratory illness that can be severe. On the assumption that its characteristics can be similar to those of an acute respiratory infection following a disaster, a deep learning approach was tested to predict the spread of COVID-19. The tested approach consists of a multilayer bidirectional long short-term memory (BiLSTM) network. To train the model to predict daily COVID-19 cases in low-income countries, mobility trend data from Google, temperature, and relative humidity were used. The performance of the proposed multilayer BiLSTM was evaluated by comparing its root mean squared error (RMSE) with that of a multilayer LSTM (with the same settings as the BiLSTM) in four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar. The proposed multilayer BiLSTM outperformed the multilayer LSTM in all four countries. It was also evaluated by comparing its RMSE with that of multilayer LSTM models and ARIMA- and stacked LSTM-based models in eight countries, namely Italy, Turkey, Australia, Brazil, Canada, Egypt, Japan, and the UK. Finally, the proposed multilayer BiLSTM model was evaluated at the city level by comparing its average relative error (ARE) with that of four other models, namely the LSTM-based model considering multilayer architecture, Google Cloud Forecasting, the LSTM-based model with mobility data only, and the LSTM-based model with mobility, temperature, and relative humidity data, for seven periods (of 28 days each) in six highly populated regions in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka. The proposed multilayer BiLSTM model outperformed the multilayer LSTM model and other previous models by up to 1.6 and 0.6 times in terms of RMSE and ARE, respectively. The proposed model therefore enables more accurate forecasting of COVID-19 cases, which can support governments and health authorities in their decisions, mainly in developing countries with limited resources.
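
The two error metrics used in these comparisons can be sketched in a few lines of Python. RMSE follows its standard definition. The abstract does not define ARE, so the form used here (the mean of absolute errors divided by the observed value, skipping days with zero observed cases) is an assumption, and the function names are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def average_relative_error(actual, predicted):
    """Mean of |actual - predicted| / |actual|.

    Assumed definition: days with zero observed cases are skipped to avoid
    division by zero. The thesis may define ARE differently.
    """
    if len(actual) != len(predicted):
        raise ValueError("sequences must be of equal length")
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("no non-zero observations to compare against")
    return sum(terms) / len(terms)


observed = [10, 20, 30]   # hypothetical daily case counts
forecast = [12, 18, 33]   # hypothetical model output
print(rmse(observed, forecast))
print(average_relative_error(observed, forecast))
```

Lower values are better for both metrics, so a model is said to outperform another when its RMSE or ARE on the same test period is smaller.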

In addition to understanding the dynamics of disease spread, rapid implementation of control measures is critical to stopping the spread of a disease in a post-disaster outbreak. However, such implementation must be based on informed decisions. Therefore, to support decision-makers, a data-driven approach for estimating the spatio-temporal exposure risk of locations using mobile phone data was tested. The approach used anonymized CDR from one of the biggest mobile network operators in Mozambique to estimate daily origin-destination (OD) matrices. The daily OD matrices were estimated at province level because the available daily COVID-19 cases (validation data) are at that level. COVID-19 was used as a proxy for a post-disaster disease because daily real-world data on a disease following a natural disaster in Mozambique were unavailable. The estimated daily OD matrices were then used to construct daily directed weighted networks, in which the nodes represent provinces and the edges represent the people flowing between each pair of provinces. Then, three centrality measures, namely weighted in-degree centrality, improved in-degree centrality, and weighted PageRank, were used to estimate the daily exposure risk of each province. The results were evaluated by computing Spearman's rank correlation between the risk score estimated using the daily reported COVID-19 cases and the exposure risk estimated using the three measures. The comparison revealed that, overall, weighted PageRank is the best of the three measures for estimating exposure risk. Accordingly, three Poisson regression models were implemented to model the relationship between the COVID-19 cases in each province and the corresponding exposure risk estimated using the three centrality measures. The results showed that the coefficients of the models, estimated using the maximum likelihood method, are statistically significant (p-value < 0.05). This means that the exposure risk does in fact influence the number of COVID-19 cases. Since the sign of the coefficients is positive, we conclude that the number of COVID-19 cases in each province increases with the spatial exposure risk. The analysis was also conducted at district level in the Greater Maputo Area (GMA), which is located in the southern part of Mozambique and consists of all Maputo city districts (except Kanyaka), Matola city, and the Matola-Rio, Boane, and Marracuene districts. However, because daily COVID-19 cases were unavailable at district level, the evaluation was done by comparing the daily exposure risk estimated using the three centrality measures with the distribution of different types of points of interest, namely commercial, education, financial, government, healthcare, public, sport, and transport. The results revealed a good Spearman's rank correlation between education-, financial-, and transport-related points of interest and the three centrality measures. Government-related points of interest showed the lowest correlation with the three centrality measures. The remaining points of interest showed medium-low to medium-high Spearman's correlation coefficients with the three centrality measures. Therefore, anonymized CDR in conjunction with the weighted PageRank algorithm can help decision-makers estimate the exposure risk in case of an outbreak and hence reduce the impact of a disease on human lives by imposing informed interventions to contain and delay its spread.

Abstract [sv] (translated from Swedish)

Natural disasters lead to devastating losses of human life, environmental assets, and personal, regional, and national economies. The availability of various data, such as satellite images, Global Positioning System (GPS) traces, mobile Call Detail Records (CDR), and social media posts, combined with advances in data analysis techniques (e.g., data mining and big data processing, machine learning and deep learning), can facilitate the extraction of geospatial information that is crucial for rapid and effective disaster response. However, developing disaster response systems usually requires integrating data from different sources (streaming data sources and data at rest) with different characteristics and types, which consequently have different processing needs. Deciding which processing framework to use for a specific data type to perform a given task is usually a challenge for researchers in the disaster management field. While many tasks can be carried out with population and movement data, a key task, and arguably the most important one for disaster management, is to analyze the displacement of the population during and after a disaster. In this thesis, the knowledge and the framework were therefore obtained through a literature review. Its results were used to select tools and processing strategies for analyzing population displacement after a disaster. This is a use case of the framework as well as an illustration of the value and challenges (e.g., gaps in data due to power outages) of using CDR data analysis to support disaster management.

The size of the displaced population was derived by analyzing the variation of the home cell tower of each anonymized mobile phone subscriber before and after a disaster using CDR data. The effectiveness of the method is evaluated using remote sensing-based building damage assessment and a Displacement Tracking Matrix (DTM) from individuals' survey responses collected in shelters after a severe cyclone in Beira city, central Mozambique, in March 2019. The results show an encouraging correlation coefficient (over 70%) between the number of arrivals in each neighborhood estimated using CDR data and from the DTM. In addition, CDR-based analysis derives the spatial distribution of displaced populations with high coverage of people, i.e., including not only people in shelters but everyone who used a mobile phone before and after the disaster. The results also suggest that if CDR data are available after a disaster, population displacement can be estimated. This information can be used for response activities, for example to help reduce waterborne diseases (e.g., diarrheal diseases) and diseases associated with crowding (e.g., acute respiratory infections) in shelters and host communities.

Although COVID-19 is not a post-disaster disease, it is an acute respiratory illness that can be severe. By assuming that its characteristics can resemble those of an acute respiratory infection after a disaster, a deep learning method was tested to predict the spread of COVID-19. The tested deep learning method consists of a multilayer BiLSTM. To train the model, mobility data from Google together with temperature and relative humidity were used to predict daily COVID-19 cases in low-income countries. The performance of the proposed multilayer BiLSTM is evaluated by comparing its RMSE with that of a multilayer LSTM (with the same settings as the BiLSTM) in four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar. The proposed multilayer BiLSTM outperformed the multilayer LSTM in all four countries. The proposed multilayer BiLSTM was also evaluated by comparing its root mean squared error (RMSE) with multilayer LSTM models and ARIMA- and stacked LSTM-based models in eight countries, namely Italy, Turkey, Australia, Brazil, Canada, Egypt, Japan, and the UK. Finally, the proposed multilayer BiLSTM model was evaluated at the city level by comparing its average relative error (ARE) with the other four models, namely the LSTM-based model considering multilayer architecture, Google Cloud Forecasting, the LSTM-based model with mobility data only, and the LSTM-based model with mobility, temperature, and relative humidity data, for seven periods (of 28 days each) in six densely populated regions in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka. The proposed multilayer BiLSTM model outperformed the multilayer LSTM model and other previous models by up to 1.6 and 0.6 times in terms of RMSE and ARE, respectively. The proposed model therefore enables more accurate forecasting of COVID-19 cases. This can support governments and health authorities in their decisions, mainly in developing countries with limited resources.

In addition to understanding the dynamics of disease spread, rapid implementation of control measures is crucial in an outbreak after a disaster. This is crucial for stopping the spread of the disease. However, its implementation must be based on informed decisions. To support decision-makers, a data-driven method for estimating the spatio-temporal exposure risk of locations using mobile phone data was therefore tested. The method used anonymized CDRs from one of the largest mobile network operators in Mozambique to estimate the daily origin-destination (OD) matrices. The daily OD matrices are estimated at province level because the available daily COVID-19 cases (validation data) are at that level. COVID-19 was used as a proxy for a post-disaster disease because of the lack of daily real-world data on a disease following a natural disaster in Mozambique. The estimated daily OD matrices are then used to construct the daily directed weighted networks, in which the nodes represent provinces and the edges represent the people flowing between each pair of provinces. Then three centrality measures, namely weighted in-degree centrality, improved in-degree centrality, and weighted PageRank, were used to estimate the daily exposure risk of each province. The results were evaluated by computing Spearman's rank correlation between the risk score estimated using the daily reported COVID-19 cases and the exposure risk estimated using the three measures. The comparison showed that, overall, the weighted PageRank algorithm is the best measure for estimating exposure risk compared with the other two measures. Accordingly, three Poisson regression models were implemented to model the relationship between the COVID-19 cases in each province and the corresponding exposure risk estimated using the three centrality measures. The results showed that the coefficients of the models, estimated using the maximum likelihood method, are statistically significant (p-value < 0.05). This means that the exposure risk does in fact influence the number of COVID-19 cases. Since the sign of the coefficients of the models is positive, we conclude that the number of COVID-19 cases in each province increases with an increase in the spatial exposure risk. The analysis was also carried out at district level, i.e., in the Greater Maputo Area (GMA), which is located in the southern part of Mozambique and consists of all Maputo city districts (except Kanyaka), Matola city, and the Matola-Rio, Boane, and Marracuene districts. However, because of the lack of daily COVID-19 cases at district level, the evaluation was done by comparing the daily exposure risk estimated using the three centrality measures with the distribution of different types of points of interest, namely commercial, education, financial, government, healthcare, public, sport, and transport. The results showed a good Spearman's rank correlation between education-, financial-, and transport-related points of interest and the three centrality measures. Government-related points of interest showed the lowest correlation with the three centrality measures. The remaining points of interest showed medium-low to medium-high Spearman's correlation coefficients with the three centrality measures. Therefore, anonymized CDRs in combination with the weighted PageRank algorithm can help decision-makers estimate the exposure risk in an outbreak and thereby reduce the impact of a disease on human lives by introducing several informed interventions to contain and delay its spread.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024, p. 102
Series
TRITA-ABE-DLT ; 2428
Keywords [en]
Processing frameworks, mobile Call Detail Records (CDR), displaced population, disaster response, COVID-19, deep learning, BiLSTM model, centrality measures
Keywords [sv]
Processing frameworks, mobile Call Detail Records (CDR), displaced population, disaster response, COVID-19, deep learning, BiLSTM model, centrality measures
National Category
Computer Sciences; Other Natural Sciences; Public Health, Global Health, Social Medicine and Epidemiology; Physical Geography
Research subject
Geodesy and Geoinformatics, Geoinformatics
Identifiers
URN: urn:nbn:se:kth:diva-355379
ISBN: 978-91-8106-111-6 (print)
OAI: oai:DiVA.org:kth-355379
DiVA, id: diva2:1909001
Public defence
2024-11-27, Kollegiesalen, Brinellvägen 6, Stockholm, https://kth-se.zoom.us/j/67011152784, Stockholm, 13:00 (English)
Opponent
Supervisors
Available from: 2024-11-04. Created: 2024-10-29. Last updated: 2024-11-05. Bibliographically approved
List of papers
1. Review of Big Data and Processing Frameworks for Disaster Response Applications
2019 (English). In: ISPRS International Journal of Geo-Information, ISSN 2220-9964, Vol. 8, no 9, article id 387. Article in journal (Refereed). Published
Abstract [en]

Natural hazards result in devastating losses in human life, environmental assets, and personal, regional, and national economies. The availability of different kinds of big data, such as satellite imagery, Global Positioning System (GPS) traces, mobile Call Detail Records (CDRs), and social media posts, in conjunction with advances in data analytic techniques (e.g., data mining and big data processing, machine learning and deep learning), can facilitate the extraction of geospatial information that is critical for rapid and effective disaster response. However, the development of disaster response systems usually requires the integration of data from different sources (streaming data sources and data sources at rest) with different characteristics and types, which consequently have different processing needs. Deciding which processing framework to use for a specific type of big data to perform a given task is usually a challenge for researchers in the disaster management field. Therefore, this paper contributes in four aspects. First, potential big data sources are described and characterized. Second, the big data processing frameworks are characterized and grouped based on the sources of data they handle. Third, a short description of each big data processing framework is provided, and the processing frameworks in each group are compared on the main aspects related to specific processing needs: computing cluster architecture, data flow, data processing model, fault tolerance, scalability, latency, back-pressure mechanism, programming languages, and support for machine learning libraries. Finally, a link between big data and processing frameworks is established, based on the processing provisioning for essential tasks in the response phase of disaster management.

Place, publisher, year, edition, pages
MDPI AG, 2019
Keywords
big data, processing frameworks, disaster response
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-257858 (URN); 10.3390/ijgi8090387 (DOI); 000488826400047 (); 2-s2.0-85072551156 (Scopus ID)
Note

QC 20190906. QC 20191028

Available from: 2019-09-05. Created: 2019-09-05. Last updated: 2024-11-07. Bibliographically approved
2. Spatial Distribution of Displaced Population Estimated Using Mobile Phone Data to Support Disaster Response Activities
2021 (English). In: ISPRS International Journal of Geo-Information, ISSN 2220-9964, Vol. 10, no 6, p. 421-, article id 421. Article in journal (Refereed). Published
Abstract [en]

Under normal circumstances, people's homes and work locations are given by their addresses, and this information is used to create a disaster management plan that instructs individuals on how to evacuate. However, when a disaster strikes, some shelters are destroyed, or the distance from affected areas to the closest shelter is unreasonable, or people cannot act rationally, as a natural response to physical danger, and hence the evacuation plan is not followed. In each of these situations, people tend to find alternative places to stay, and the evacuees in shelters do not represent the total displaced population. Knowing the spatial distribution of all displaced people (including people in shelters and other places) is very important for the success of the response activities, which, among other measures, aim to provide for the basic humanitarian needs of affected people. Traditional methods of estimating displacement are based on population surveys in the shelters. However, surveys are infeasible to conduct at scale, provide low coverage (i.e., they can only count the population at the shelters), and cannot deliver information in a timely fashion. Therefore, in this research, anonymized mobile Call Detail Records (CDRs) are proposed as a source of information to infer the spatial distribution of the displaced population by analyzing the variation of the home cell tower of each anonymized mobile phone subscriber before and after a disaster. The effectiveness of the proposed method is evaluated using remote-sensing-based building damage assessment data and the Displacement Tracking Matrix (DTM) from an individual questionnaire survey conducted after a severe cyclone in Beira city, central Mozambique, in March 2019. The results show an encouraging correlation coefficient (over 70%) between the number of arrivals in each neighborhood estimated using CDRs and from the DTM. In addition, CDRs yield the spatial distribution of displaced populations with high coverage, i.e., including not only people in shelters but everyone who used a mobile phone before and after the disaster. Moreover, the results suggest that if CDR data are available right after a disaster, population displacement can be estimated, and this information can be used for response activities and hence contribute to reducing waterborne diseases (e.g., diarrheal disease) and diseases associated with crowding (e.g., acute respiratory infections) in shelters and host communities.

Place, publisher, year, edition, pages
MDPI AG, 2021
Keywords
disaster response, mobile Call Detail Records (CDRs), displaced population
National Category
Human Geography
Identifiers
urn:nbn:se:kth:diva-299049 (URN); 10.3390/ijgi10060421 (DOI); 000666988800001 (); 2-s2.0-85109906266 (Scopus ID)
Note

QC 20210730

Available from: 2021-07-30. Created: 2021-07-30. Last updated: 2024-11-07. Bibliographically approved
3. Deep learning-based approach for COVID-19 spread prediction
2024 (English). In: International Journal of Data Science and Analytics, ISSN 2364-415X, p. 1-17. Article in journal (Refereed). Epub ahead of print
Abstract [en]

Spread prediction models are vital tools to help health authorities and governments fight infectious diseases such as COVID-19. The availability of historical daily COVID-19 cases, in conjunction with other datasets such as temperature and humidity (which are believed to play a key role in the spread of the disease), has opened a window for researchers to investigate the potential of different techniques to model, and thereby expand our understanding of, the factors (e.g., interaction or exposure resulting from mobility) that govern the underlying dynamics of the spread. Traditionally, infectious diseases are modeled using compartmental models such as the SIR model. However, a shortcoming of this model is that it does not account for mobility and the resulting mixing or interactions, which we conjecture are a key factor in the dynamics of the spread. Statistical and deep learning-based approaches such as autoregressive integrated moving average (ARIMA), gated recurrent units, variational autoencoders, long short-term memory (LSTM), convolutional LSTM, stacked LSTM, and bidirectional LSTM (BiLSTM) have been tested with historical COVID-19 data to predict the disease spread, mainly in medium- and high-income countries with good COVID-19 testing capabilities. However, few studies have focused on low-income countries with low access to COVID-19 testing and, hence, highly biased historical datasets. In addition, the arguably best model (BiLSTM) has not been tested with an arguably good set of features (mobility data, temperature, and relative humidity). Therefore, in this study, the multi-layer BiLSTM model is tested with mobility trend data from Google, temperature, and relative humidity to predict daily COVID-19 cases in low-income countries. The performance of the proposed multi-layer BiLSTM is evaluated by comparing its root mean-squared error (RMSE) with that of a multi-layer LSTM (with the same settings as the BiLSTM) in four developing countries, namely Mozambique, Rwanda, Nepal, and Myanmar. The proposed multi-layer BiLSTM outperformed the multi-layer LSTM in all four countries. The proposed multi-layer BiLSTM was also evaluated by comparing its RMSE with that of multi-layer LSTM models and ARIMA- and stacked LSTM-based models in eight countries, namely Italy, Turkey, Australia, Brazil, Canada, Egypt, Japan, and the UK. Finally, the proposed multi-layer BiLSTM model was evaluated at the city level by comparing its average relative error (ARE) with that of four other models, namely the LSTM-based model considering multi-layer architecture, Google Cloud Forecasting, the LSTM-based model with mobility data only, and the LSTM-based model with mobility, temperature, and relative humidity data, for seven periods (of 28 days each) in six highly populated regions in Japan, namely Tokyo, Aichi, Osaka, Hyogo, Kyoto, and Fukuoka. The proposed multi-layer BiLSTM model outperformed the multi-layer LSTM model and other previous models by up to 1.6 and 0.6 times in terms of RMSE and ARE, respectively. Therefore, the proposed model enables more accurate forecasting of COVID-19 cases and can support governments and health authorities in their decisions, mainly in developing countries with limited resources.

Place, publisher, year, edition, pages
Springer Nature, 2024
National Category
Computer Sciences
Research subject
Geodesy and Geoinformatics, Geoinformatics
Identifiers
urn:nbn:se:kth:diva-355376 (URN); 10.1007/s41060-024-00558-1 (DOI); 001242732100002 (); 2-s2.0-85195551094 (Scopus ID)
Note

QC 20241030

Available from: 2024-10-29. Created: 2024-10-29. Last updated: 2024-11-07. Bibliographically approved
4. Spatio-temporal Exposure Risk Estimation for COVID-19 Using Social Network Analysis and Mobile Phone Data
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Spatio-temporal exposure risk to an infectious disease is vital information for helping decision-makers fight an outbreak or spread, e.g., of COVID-19. To estimate the spatio-temporal exposure risk, mobility data in conjunction with social network analysis have been used. However, existing studies have used data covering a narrow range of users to estimate the flow between locations, which in turn is used to estimate the exposure risk. In addition, none of the existing studies has explicitly used a risk score estimated from real-world data to validate the exposure risk obtained from social network analysis. Moreover, no studies have investigated the relationship between the exposure risk estimated using mobility data and the distribution of Points of Interest (PoI). Therefore, in this study, over 240 million anonymized Call Detail Records (CDRs) from one of the biggest mobile network operators in Mozambique are used to estimate the daily origin-destination (OD) matrices. The daily OD matrices are estimated at province level since the available COVID-19 validation data are at that level. The estimated daily OD matrices are then used to construct the daily directed weighted networks, in which the nodes represent provinces and the edges indicate the people flowing between each pair of provinces. Then, three centrality measures, namely weighted in-degree centrality, improved in-degree centrality, and weighted PageRank, were used to estimate the spatio-temporal exposure risk at province level. The results were evaluated by computing Spearman's rank correlation between the risk score estimated using the daily reported COVID-19 cases and the exposure risk estimated using the three measures. The comparison revealed that, in general, weighted PageRank is the best of the three measures for estimating exposure risk. Accordingly, three Poisson regression models were implemented to model the relationship between the COVID-19 cases in each province and the corresponding exposure risk estimated using the three centrality measures. The results showed that the coefficients of the models, estimated using the maximum likelihood method, are statistically significant (p-value < 0.05). This means that the exposure risk does in fact influence the number of COVID-19 cases. Since the sign of the coefficients is positive, we conclude that the number of COVID-19 cases in each province increases with the spatial exposure risk. The analysis was also conducted at district level in the Greater Maputo Area, which is located in the southern part of Mozambique and consists of all Maputo city districts (except Kanyaka), Matola city, and the Matola-Rio, Boane, and Marracuene districts. However, because daily COVID-19 cases were unavailable at district level, the validation was done by comparing the daily exposure risk estimated using the three centrality measures with the distribution of different types of PoI, namely commercial, education, financial, government, healthcare, public, sport, and transport. The results revealed a good Spearman's rank correlation between education-, financial-, and transport-related PoI and the three centrality measures. Government-related PoI showed the lowest correlation with the three centrality measures. The remaining PoI showed medium-low to medium-high correlation coefficients with the three centrality measures. To capture the differences, the average Spearman's rank correlation between the centrality measures and PoI was computed. Weighted PageRank outperformed the other two centrality measures in most PoI classes, namely education, healthcare, public, sport, and transport. Weighted PageRank was outperformed by the improved in-degree centrality measure in only one PoI class (commercial). Although the differences are small, weighted PageRank overall proved to be a good algorithm for estimating the spatio-temporal exposure risk for COVID-19. Therefore, anonymized CDRs in conjunction with the weighted PageRank algorithm can help decision-makers estimate the spatio-temporal exposure risk in case of an outbreak and hence reduce the impact of a disease on human lives by imposing informed interventions to contain and delay its spread.
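
The validation step, Spearman's rank correlation between two score lists (for example, a centrality-based exposure risk and a case-based risk score per province), can be sketched in plain Python. This follows the standard definition (Pearson correlation of average ranks, which handles ties); the input values and function names below are hypothetical and only illustrate the computation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


exposure_risk = [0.49, 0.46, 0.05, 0.20]  # hypothetical centrality scores
case_risk = [120, 95, 10, 40]             # hypothetical case-based scores
print(spearman(exposure_risk, case_risk))
```

Because only ranks enter the computation, the measure is insensitive to the different scales of centrality scores and case counts, which makes it a natural choice for this kind of comparison.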

Keywords
Anonymized Call Detail Records (CDRs); COVID-19 data; Points of Interest (PoI); weighted in-degree centrality; improved in-degree centrality; weighted pagerank; Poisson regression
National Category
Other Natural Sciences; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-355378 (URN)
Note

QC 20241030

Available from: 2024-10-29. Created: 2024-10-29. Last updated: 2024-10-30. Bibliographically approved

Open Access in DiVA

fulltext (24652 kB), 67 downloads
File information
File name: FULLTEXT01.pdf
File size: 24652 kB
Checksum (SHA-512): ab2644d6d8ad2ffa62cfa2a85a00c6584bb37366c02e1e4f0112327b3f2626181697a2393df842cf9d9205088c89f2ed230c2340557d8614a5391035cc24b381
Type: fulltext
Mimetype: application/pdf

Authority records

Cumbane, Silvino Pedro
