Description
Tweets posted on the Twitter social networking site constitute a rich dataset for semantic and temporal analysis. This study collected over 14 million tweets using Twitter's Streaming Application Programming Interface (API) from May 5, 2012 to June 1, 2012. The dataset was filtered to 4,404,731 tweets. Three subsets of the filtered dataset were used as case studies based on user profile location: 56,713 tweets for San Diego County, 70,489 tweets for San Francisco County and 78,724 tweets for Dallas County. The GPS-enabled tweets in the filtered dataset and in each case study location were geocoded using their GPS coordinates. The analysis used both the GPS coordinates from GPS-enabled tweets and the user profile location to examine the spatial patterns, keyword semantics and temporal patterns of the tweet data. The first goal of this study was to discover semantic spatial relationships among GPS-enabled tweets located within versus outside the county entered in the Twitter user profile location. Tweet distributions were visualized as kernel density maps generated using the Spatial Analyst extension in ArcGIS. The semantic analysis was conducted by running a Vocab Script developed by Dr. Mark Gawron, a Linguistics Professor at San Diego State University, to generate the 300 most frequently occurring keywords, and then using those keywords and their frequencies to generate word clouds in the R programming language. The second goal of this study was to analyze the temporal patterns of GPS-enabled tweet counts within and outside the county stated in the Twitter user profile location. The temporal analysis was conducted by creating daily, weekly and weekday versus weekend charts from the tweet datasets for the United States and the three county case study locations. The third goal of this study was to compare the different meanings carried by the GPS-enabled tweet location and the associated user profile location.
Meanings regarding Spatial Accuracy, Place vs Space and Privacy were compared between the GPS-enabled location and the associated user profile location and then summarized in a cross-reference table. Semantic analysis showed that GPS-enabled tweets within the case study boundaries included the case study location name as a common keyword in Twitter conversation. Keywords associated with travel (such as "International Airport") commonly occurred in GPS-enabled Twitter conversations located outside the case study location. Semantic spatial patterns of GPS-enabled tweets are therefore related to whether the user is within or outside the location listed in their user profile. Average weekend tweet counts were higher than average weekday tweet counts in the USA, San Diego and San Francisco datasets; Dallas was the exception. Compared with the associated user profile location, the GPS-enabled tweet location was generally higher in Spatial Accuracy, reflected the user's actual spatial location and potentially put user privacy at risk. In conclusion, the conversations of Twitter users were influenced by their location.
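
As an illustration only, the keyword-frequency and weekday versus weekend steps described above could be sketched in Python roughly as follows. This is not the Vocab Script or the charting workflow used in the study, whose details are not given here. The function names, the simple regular-expression tokenizer, and the choice to average only over days that have at least one tweet are all assumptions made for this sketch.

```python
from collections import Counter
from datetime import date
import re


def top_keywords(texts, n=300):
    """Return the n most frequent lowercase word tokens across the texts.

    Hypothetical stand-in for a keyword-frequency step; the tokenizer
    (runs of letters and apostrophes) is an assumption.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)


def weekday_weekend_means(days):
    """Average daily tweet count for weekdays (Mon-Fri) and weekends (Sat-Sun).

    `days` holds one datetime.date per tweet. Days with no tweets are not
    represented in the input and so do not enter the averages (an assumption).
    """
    tallies = Counter(days)  # number of tweets on each calendar day
    groups = {"weekday": [], "weekend": []}
    for day, count in tallies.items():
        key = "weekend" if day.weekday() >= 5 else "weekday"
        groups[key].append(count)
    return {k: (sum(v) / len(v) if v else 0.0) for k, v in groups.items()}
```

The keyword and frequency pairs returned by `top_keywords` are the kind of input a word cloud tool expects, and the two averages from `weekday_weekend_means` correspond to the weekday versus weekend comparison reported for each dataset.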