Marko Balažic (2015) Extraction and processing data from the web. EngD thesis.
Abstract
Web users search for information on the internet every day. The easiest way to acquire data from the internet is through search engines, which return many different results. Although these results are well selected, a great deal of filtering is still needed to find the information that matters. In my thesis I addressed this problem myself by writing a web application that merges information from several domains. The application's dashboard gives the user a complete overview of snippets from various websites. Users can add and edit the snippets themselves, and they can set up alarms through which the system notifies them of changes. I implemented the system with web crawlers that scrape data and save it to a database.
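The abstract does not state which language, database, or change-detection method the application uses. One way the alarm feature could work is sketched below. It assumes the crawler has already extracted a snippet's text and that an alarm means "the newly scraped snippet differs from the stored one". Under that assumption, each snippet is hashed and the hash is compared with the previous digest kept in a database. The helper names `open_store` and `save_snippet`, the `snippets` table, SQLite, and SHA-256 are all hypothetical choices for illustration, not the thesis's implementation.

```python
import hashlib
import sqlite3


# Hypothetical helpers: an illustration only, not the code from the thesis.
def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (or open) a table holding the latest content and hash of each snippet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snippets ("
        " url TEXT PRIMARY KEY,"
        " content TEXT NOT NULL,"
        " digest TEXT NOT NULL)"
    )
    return conn


def save_snippet(conn: sqlite3.Connection, url: str, content: str) -> bool:
    """Store a scraped snippet and return True if it changed since the last crawl.

    A snippet seen for the first time is stored but does not trigger an alarm.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT digest FROM snippets WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        # First crawl of this URL: remember it, nothing to compare against yet.
        conn.execute(
            "INSERT INTO snippets (url, content, digest) VALUES (?, ?, ?)",
            (url, content, digest),
        )
        conn.commit()
        return False
    if row[0] == digest:
        # Same hash as last time: the snippet is unchanged.
        return False
    # Hash differs: overwrite the stored snippet and report the change.
    conn.execute(
        "UPDATE snippets SET content = ?, digest = ? WHERE url = ?",
        (content, digest, url),
    )
    conn.commit()
    return True


if __name__ == "__main__":
    store = open_store()
    url = "https://example.com/news"  # placeholder URL
    for text in ["Price: 10 EUR", "Price: 10 EUR", "Price: 12 EUR"]:
        if save_snippet(store, url, text):
            print(f"ALARM: snippet at {url} changed")
```

Running the example prints a single alarm, for the third crawl, because only that crawl returns different text. Fetching and parsing the pages and delivering the alarm to the user are left out of this sketch.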