Visual-Based Web Page Analysis
viii, 38 pages : illustrations (some colored).
This research investigates how to identify the different content areas that appear on a webpage by comparing the visual features and relative characteristics of each content area, called a visual block in this study. The process uses image segmentation to extract and parse a webpage's visual features, then analyzes them to identify the function of each content area based on its layout and position. To accomplish this, the study reviews several techniques that have been used in related fields and discusses their strengths and weaknesses. The main weakness of past techniques is that they rely heavily on HTML; in other words, they are language-dependent. This paper proposes a visual-based technique that relies on visual features rather than HTML and is therefore more language-independent. To determine the function of each visual block, the technique uses an algorithm that parses a webpage into a tree structure and applies a rule modeled on how humans determine the relationship between two objects on a 2D monitor. The goal of this research is to design an automated visual-based algorithm that examines each visual block shown on a webpage and applies human cognitive processes to decide the role of each block. For example, one might wish to identify the main content, the sub content, the navigation menu, and the advertisements. Chapter 1 describes the motivation, the problem, and a possible solution. Chapter 2 reviews several technologies that can be used to solve the problem and points to possible future research. Chapter 3 explains how the test environment was prepared and which techniques were used. Chapter 4 describes the results: what was accomplished, what is missing, and what further research is needed. Chapter 5 concludes with the possibilities this research opens up and how future work might help accomplish its final goal.
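The abstract does not give the thesis's actual rules, so the following is only a minimal sketch of the general idea, assuming visual blocks have already been extracted as bounding boxes. The `Block` class, the `relation`, `build_tree`, and `guess_role` functions, and every threshold are hypothetical illustrations, not the author's algorithm.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """A rectangular visual block in page pixel coordinates (hypothetical)."""
    name: str
    x: float
    y: float
    w: float
    h: float
    children: List["Block"] = field(default_factory=list)

    @property
    def right(self) -> float:
        return self.x + self.w

    @property
    def bottom(self) -> float:
        return self.y + self.h

    def contains(self, other: "Block") -> bool:
        """True if other's bounding box lies entirely within this one."""
        return (self.x <= other.x and self.y <= other.y
                and self.right >= other.right and self.bottom >= other.bottom)


def relation(a: Block, b: Block) -> str:
    """Describe where b sits relative to a, using bounding boxes only."""
    if a.contains(b):
        return "inside"
    if b.bottom <= a.y:
        return "above"
    if b.y >= a.bottom:
        return "below"
    if b.right <= a.x:
        return "left-of"
    if b.x >= a.right:
        return "right-of"
    return "overlapping"


def build_tree(page: Block, blocks: List[Block]) -> Block:
    """Nest each block under the smallest already-placed block containing it."""
    # Larger blocks are placed first so they can become parents of smaller ones.
    ordered = sorted(blocks, key=lambda blk: blk.w * blk.h, reverse=True)
    placed = [page]
    for blk in ordered:
        parents = [p for p in placed if p.contains(blk)]
        parent = min(parents, key=lambda p: p.w * p.h) if parents else page
        parent.children.append(blk)
        placed.append(blk)
    return page


def guess_role(page: Block, blk: Block) -> str:
    """Toy role rules; all thresholds are illustrative assumptions."""
    area_share = (blk.w * blk.h) / (page.w * page.h)
    if blk.w >= 0.8 * page.w and blk.bottom <= 0.2 * page.h:
        return "navigation menu"   # wide strip near the top of the page
    if area_share >= 0.4:
        return "main content"      # dominant block by area
    if blk.x >= 0.7 * page.w:
        return "advertisement"     # narrow block in the right-hand column
    return "sub content"
```

For a 1000 by 1000 pixel page, a full-width strip at the top would be labeled as the navigation menu and a block covering more than 40% of the page as the main content. A real system would need rules validated against actual pages rather than these fixed cutoffs.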
Includes bibliographical references (pages 32-33).
Master of Science (M.S.) San Diego State University, 2014