Luka Šarc (2014) Design and Development of Web Crawler for Dynamic Web Application Analysis. EngD thesis.
Abstract
Dynamic web applications make up the largest share of the web application ecosystem. They integrate with each other inside the web browser. Users are not aware of connections to third-party service providers and may unknowingly reveal their browsing data. In this thesis, a web crawler for dynamic web application analysis was designed and implemented to address this problem. Traditional crawlers are not sufficient for this area, since they focus on the semantics of web applications. Our crawler executes each web application in a sandbox within a virtual web browser. This allows the crawler to track the resources needed for execution and to detect the integration of web applications. We conducted a crawl of 100,000 web applications. The results revealed a high level of web application integration: on average, a web application integrates with six third-party providers. The results confirmed that the proposed solution provides effective analysis for the described problem domain.
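As a rough illustration of the detection step described in the abstract, the sketch below takes the URLs of resources requested while a page executes and returns the hosts that do not belong to the page itself. It is not the thesis implementation: the function name `third_party_hosts`, the example URLs, and the rule that subdomains of the page host count as first-party are all assumptions made for this example.

```python
from urllib.parse import urlsplit


def third_party_hosts(page_url, resource_urls):
    """Return the set of hosts contacted by a page that differ from its own host.

    Hypothetical helper: hosts equal to the page host, or subdomains of it,
    are treated as first-party. This is a simplifying assumption.
    """
    page_host = urlsplit(page_url).hostname or ""
    third_party = set()
    for url in resource_urls:
        host = urlsplit(url).hostname
        if host is None:
            # Relative URLs and data: URIs carry no host, so they are skipped.
            continue
        if host == page_host or host.endswith("." + page_host):
            # Same host or a subdomain of it: counted as first-party here.
            continue
        third_party.add(host)
    return third_party


if __name__ == "__main__":
    # Made-up resource log for a single page load.
    requests_seen = [
        "https://example.com/app.js",
        "https://cdn.example.com/style.css",
        "https://analytics.example.net/track.js",
        "https://ads.example.org/banner.js",
        "/images/logo.png",
    ]
    print(sorted(third_party_hosts("https://example.com/", requests_seen)))
```

Averaging the size of this set over all crawled applications would give a figure comparable in kind to the per-application provider count reported above, although the thesis may define a provider differently.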