Have you ever wondered how search engines like Google or Bing gather so much information on every possible topic or word you can think of? And from where?
A search engine crawler (also known as a spider, robot, or simply a bot) helps them do so by discovering new information on the internet. Google’s web crawler is known as ‘Googlebot’.
Web crawlers apparently got this unique name because, like spiders, they actually crawl (in this case, the internet) and collect documents to build a searchable index for the search engines. They crawl through a website one page at a time, following the links to other pages on the site until all pages have been read.
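That crawl loop can be sketched in a few lines of Python. This is only a toy illustration, not a real crawler: it “fetches” pages from an in-memory dictionary instead of over HTTP, and the page names are made up for the example.

```python
from collections import deque

# A toy "website": each page maps to the links found on it.
# (A real crawler would fetch these over HTTP and parse the HTML.)
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": ["/blog"],
    "/blog/post-2": ["/blog", "/about"],
}

def crawl(start):
    """Visit every page reachable from `start`, one page at a time,
    following links until all pages have been seen."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)            # "index" the page here
        for link in SITE.get(page, []):
            if link not in seen:      # never crawl the same page twice
                seen.add(link)
                queue.append(link)
    return order

print(crawl("/"))  # → ['/', '/about', '/blog', '/blog/post-1', '/blog/post-2']
```

The `seen` set is what keeps the spider from going in circles when pages link back to each other, which on the real web they constantly do.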
How does Google return millions of search results in less than a second?
Different search engines run thousands of instances of their web-crawling programs at the same time, on multiple servers. As these spiders crawl 24×7 from one website to another, everything on the web eventually gets discovered and indexed by them. Remember that there are some rules that all crawlers are supposed to follow, and search engine giants like Google and Bing do follow most of these rules. They are also working together on standards.
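The best known of these rules is the robots.txt convention, a plain-text file where a site tells crawlers which paths they may visit. Python ships a parser for it in the standard library, so we can show a well-behaved crawler’s check directly; the robots.txt contents and URLs below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A sample robots.txt (invented for this example): everything is open
# to every crawler except paths under /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A polite crawler asks before fetching each URL.
print(rp.can_fetch("Googlebot", "https://example.com/blog"))       # → True
print(rp.can_fetch("Googlebot", "https://example.com/private/x"))  # → False
```

In a real crawler you would load the file from `https://the-site/robots.txt` (the parser’s `set_url()` and `read()` methods do this) and skip any URL for which `can_fetch()` returns `False`.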
Easy explanation: Remember Bollywood films of the ’80s and ’90s where, in every other film, you saw a police informer or ‘tip-off guy’ who passed information to the police. How did he get all that information? For ease of explanation, imagine the police department as Google, these informers as crawlers, and the gangs and their members as websites and their pages. Starting from individual gang members, these informers would work through different gangs one by one until they had all the information. For another example, think of the Amitabh Bachchan starrer Hindi action film Don, where his duplicate is forced by the police to pose as Don in order to get all the inside information about the gang.
Of course, unlike these reel-life informers, search engine crawlers do not operate in a cautious or surreptitious manner so as not to be seen or heard. Rather, they have to follow set standards. These examples were just to explain the process of crawling in a simple way.