Web Crawler

Example run of the web crawler, started at google.de

After a long break, it was time to develop something new. This time I wanted to move towards the web and bots, so I designed this tool. It serves a “sort of useful” purpose: it surfs the web automatically and looks at the text it finds.

For password cracking and for something like speech analysis, the vocabulary people currently use might be of high interest. This tool therefore starts crawling the web from a starting page and collects all the words and hyperlinks on that page. It follows the hyperlinks until it has visited the desired number of websites or, in infinite mode, until the user requests a quit.
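The crawl loop described above could be sketched roughly like this. It is not the tool's actual code, only a minimal illustration using the Python standard library. The names `PageParser`, `fetch` and `crawl`, the breadth-first order, and the choice to skip script and style content are all assumptions made for the example. `max_pages=None` stands in for infinite mode, and Ctrl+C stands in for the user's quit request.

```python
import re
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class PageParser(HTMLParser):
    """Collects words and link targets from a single HTML page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            # Runs of letters only; digits and punctuation split words.
            self.words.extend(re.findall(r"[^\W\d_]+", data.lower()))


def fetch(url):
    """Download a page as text (hypothetical helper, no robots.txt handling)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=None, fetch=fetch):
    """Follow links breadth-first and count every word seen on the way."""
    counts = Counter()
    queue = deque([start_url])
    seen = {start_url}
    visited = 0
    try:
        while queue and (max_pages is None or visited < max_pages):
            url = queue.popleft()
            try:
                html = fetch(url)
            except Exception:
                continue  # unreachable or broken page: skip it
            visited += 1
            parser = PageParser()
            parser.feed(html)
            counts.update(parser.words)
            for href in parser.links:
                link = urljoin(url, href)  # resolve relative links
                if link.startswith(("http://", "https://")) and link not in seen:
                    seen.add(link)
                    queue.append(link)
    except KeyboardInterrupt:
        pass  # infinite mode: the user asked to quit
    return counts
```

Passing `fetch` as a parameter makes it possible to try the loop on a handful of in-memory pages before letting it loose on real websites.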

Once the application is shut down, it sorts the collected list of words and discards every word that occurred fewer than 3 times. The remaining list is then exported to a text file.
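As a rough illustration of this last step, the filtering and export might look like the sketch below. The function name `export_words`, the sort order (most frequent first, then alphabetical) and the one-word-per-line file format are assumptions for the example. The original text only says that the list is sorted, filtered and written to a text file.

```python
from collections import Counter


def export_words(counts, path, min_count=3):
    """Write all words seen at least min_count times to a text file."""
    # Discard rare words (fewer than min_count occurrences).
    kept = [(word, n) for word, n in counts.items() if n >= min_count]
    # Assumed order: most frequent first, ties broken alphabetically.
    kept.sort(key=lambda item: (-item[1], item[0]))
    with open(path, "w", encoding="utf-8") as f:
        for word, _ in kept:
            f.write(word + "\n")
    return [word for word, _ in kept]
```

With a `Counter` of word counts collected during a crawl, a call such as `export_words(counts, "words.txt")` would produce the final word list.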

With this tool, some kind of speech analysis like the “Duden Jugendwort des Jahres” (word of the year used by teenagers, mostly funny) could be based on real occurrences on websites.
