Description
Building a dynamic human population density model requires estimating real-time population distribution from multiple data sources with advanced dasymetric mapping methods. This research used the Twitter Streaming Application Programming Interfaces (APIs) to collect real-time geo-tagged tweets and to convert the spatial distribution patterns of tweets into human population density patterns through dasymetric mapping methods. The thesis combines methods and techniques from the knowledge domains of social media and Big Data, spatial data quality, and dasymetric mapping, and builds a dynamic population distribution model from geo-tagged tweets in an urban area (San Diego County). The first part of the thesis examined the spatial data accuracy of the geo-tagged Twitter data in San Diego County by removing possible noise. A semi-manual Twitter content verification procedure was applied to separate tweets created by humans from those created by non-human accounts (bots), because bot-generated tweets do not represent the human population. This study also identified and summarized the patterns of tweet noise to inform the data cleaning procedure. The second part of the thesis examined unique Twitter user density maps for weekdays and weekends, using LandScan grid cells and census block polygons at different time intervals. Geo-tagged Twitter data were aggregated to fishnet grids and census block polygons of San Diego County. A one-hour temporal resolution was selected to represent the dynamic changes in unique Twitter user frequency. Unique users, rather than tweets, were counted so that every Twitter user carries the same weight. The third part of the thesis estimated actual dynamic human population density through a Spatio-Temporal Modeling for Estimating Population (STEP) framework, which transforms the number of unique Twitter users in each census block or grid cell into an estimated population density using spatial and temporal variation factors.
The STEP framework was developed to generate more refined and balanced population density maps than traditional census population maps. The main goal of this research was to examine the feasibility and reliability of using geo-tagged tweets to represent the spatio-temporal pattern of human population distribution and density. The proposed STEP data analysis method is inexpensive (it uses free public Twitter data) compared with traditional census data or expensive cell phone records. The framework can be applied to other urban areas in the U.S. or other countries, provided there are a significant number of Twitter users, comprehensive land use data, and census data.
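
The unique-user counting step described in this abstract (one count per Twitter user per spatial unit per hour) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name, the tuple layout of a tweet, the grid origin, and the 0.1-degree cell size are all hypothetical, and the sample tweets are invented for the example. The sketch covers only the aggregation to a fishnet grid; it does not attempt the STEP transformation from user counts to population density.

```python
import math
from collections import defaultdict
from datetime import datetime


def unique_users_per_cell_hour(tweets, origin_lon, origin_lat, cell_deg):
    """Count distinct users per (column, row, hour-of-day) of a fishnet grid.

    tweets: iterable of (user_id, lon, lat, timestamp) tuples (assumed layout).
    origin_lon, origin_lat: south-west corner of the grid (assumed).
    cell_deg: cell size in decimal degrees (assumed).
    """
    users = defaultdict(set)
    for user_id, lon, lat, ts in tweets:
        col = math.floor((lon - origin_lon) / cell_deg)
        row = math.floor((lat - origin_lat) / cell_deg)
        # A set keeps each user once, however many tweets they post
        # in the same cell during the same hour.
        users[(col, row, ts.hour)].add(user_id)
    return {key: len(ids) for key, ids in users.items()}


# Invented sample data: u1 tweets twice in the same cell and hour.
tweets = [
    ("u1", -117.15, 32.75, datetime(2015, 3, 2, 9, 5)),
    ("u1", -117.15, 32.75, datetime(2015, 3, 2, 9, 40)),
    ("u2", -117.15, 32.75, datetime(2015, 3, 2, 9, 15)),
    ("u2", -117.05, 32.85, datetime(2015, 3, 2, 18, 0)),
]
counts = unique_users_per_cell_hour(tweets, -117.6, 32.5, 0.1)
for key in sorted(counts):
    print(key, counts[key])
```

Counting set members rather than tweets is what gives every user equal weight: the two tweets from `u1` in the 09:00 hour contribute a single user to that cell.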