Description
Anyone who buys or rents a home thinks carefully about its location. Buyers weigh many criteria, such as good schools and proximity to work, but one of the most important is safety. Most people now search for homes on websites, yet none of these sites shows crime-related information for the community in question. They list details such as square footage, HOA fees, and even street views, but not the level of safety. This thesis is motivated by that gap and proposes a solution. Massive data sets are now commonplace, and with terabytes of data stored in systems there is a need to make sense of it analytically. Traditional storage and analysis tools such as an RDBMS do not provide the speed and parallel processing required to store and effectively analyze data sets of this size. One such use case is the crime data originating from local police departments. Statisticians estimate that a crime is committed every 5 seconds. Reporting at that rate produces a large volume of data, and making sense of it requires a framework that can analyze it efficiently. This is where Hadoop, an open source programming framework for distributed computing over massive data sets on a cluster of networked computers, comes in. MapReduce is a Java-based computational framework, provided by Apache Hadoop and by distributions such as Cloudera's, that harnesses distributed computing to analyze files stored in the Hadoop file system. This thesis explores the use of MapReduce to analyze crime data and infer meaningful information from it. That information is then presented in a GIS application, which gives a visual representation rather than a simple list of outputs.
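To make the MapReduce idea concrete, the sketch below counts crimes per community using the map, shuffle, and reduce steps. It is a plain-Python illustration of the pattern only, not the thesis's implementation and not the Hadoop Java API. The record layout, the community names, and every function name are assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical sample records: (community, crime_type).
# The field layout is an assumption; the abstract does not give the input schema.
RECORDS = [
    ("Community A", "burglary"),
    ("Community A", "assault"),
    ("Community B", "burglary"),
    ("Community A", "burglary"),
]


def map_phase(record):
    """Emit one (community, 1) pair for a single crime record."""
    community, _crime_type = record
    yield community, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the emitted counts for one community."""
    return key, sum(values)


def count_crimes(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_crimes(RECORDS))
```

In a real Hadoop job the map and reduce functions would run in parallel across the cluster, and the framework would handle the shuffle. Per-community totals of this kind are the sort of output a GIS layer could then display.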