Indexing
Leading search engines such as Google and Yahoo! use crawlers to find pages for their algorithmic search results. Pages that are linked from pages already in a search engine's index generally do not need to be submitted, because crawlers find them automatically. Some search engines, notably Yahoo!, operate paid submission programs that guarantee crawling for either a set fee or a cost per click, an arrangement intended to benefit both the site owner and the search engine. These programs usually guarantee inclusion in the database but never guarantee a specific ranking. The two major directories, the Yahoo Directory and the Open Directory Project, both require manual submission and human editorial review. Google offers Google Webmaster Tools, through which an XML Sitemap feed can be created and submitted for free to ensure that all pages are found. The greatest advantage of a Sitemap is that it surfaces even pages that cannot be discovered by automatically following links. Search engines do not index every page. The distance of a page from the root directory of a site can also be a factor in whether it gets crawled.
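As a rough illustration of the XML Sitemap feed mentioned above, the following Python sketch builds a minimal Sitemap document from a list of page addresses. The function name build_sitemap and the example.com URLs are hypothetical and chosen only for this example; the sketch emits just the required loc element for each page and leaves out optional fields such as the last-modified date.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps XML protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML Sitemap (as a string) listing the given URLs.

    build_sitemap is a hypothetical helper written for this example.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> holds the absolute address of one page.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Example pages (assumed); the second is one a crawler might not
    # reach by following links alone.
    print(build_sitemap([
        "https://example.com/",
        "https://example.com/archive/2009/orphan-page.html",
    ]))
```

The resulting file would typically be saved as sitemap.xml on the site and then submitted through the search engine's webmaster tools.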
Crawling Prevention
Webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain, which keeps undesirable content out of the search indexes. Alternatively, an individual page can be excluded from the index by using a meta tag specific to robots. As a rule, robots.txt is the first file crawled when a search engine visits a site. The file is then parsed, and its directives tell the robot which of the site's other pages are not to be crawled. Because a search engine crawler may keep a cached copy of this file, it may occasionally crawl pages that the webmaster does not want crawled. Pages typically blocked from crawling include login-specific pages such as shopping carts and user-specific content such as results from internal searches. Internal search results pages should be excluded from indexing because they are considered search spam.
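To make the robots.txt mechanism concrete, here is a small Python sketch that parses a sample robots.txt and asks whether a given URL may be crawled, using the standard library's urllib.robotparser module. The rules in ROBOTS_TXT, the example.com URLs, and the helper names make_parser and is_crawlable are assumptions made for illustration; a real crawler would fetch the file from the site's root directory rather than use a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Sample rules (assumed for this example): block the shopping cart
# and internal search results for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""


def make_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser, url, user_agent="*"):
    """Return True if the parsed rules allow user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = make_parser(ROBOTS_TXT)
    for page in (
        "https://example.com/products/shoes",
        "https://example.com/cart/checkout",
        "https://example.com/search?q=shoes",
    ):
        print(page, is_crawlable(rules, page))
```

The per-page alternative is a robots meta tag in the page's head section, for example <meta name="robots" content="noindex">.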
Increasing Prominence
A webpage can be made to show up more prominently in the search results by a number of other methods. These include:
Cross-linking between pages of the same website, and adding more links to the site's main pages, increases the PageRank that search engines assign to those pages. Links from other websites can have the same effect, including links obtained through comment spamming and link farming, although search engines treat both practices as spam. One of the best methods is to write content that uses frequently searched keywords and phrases, which makes the page relevant to more search queries. Other methods include URL normalization of web pages that are accessible via multiple URLs, using the "canonical" link element (often called the canonical tag), and keyword stuffing.
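URL normalization and the "canonical" tag can be sketched in a few lines of Python. The example below collapses several spellings of the same address into one form and then emits the matching canonical link for the page's head section. The specific rules applied here (lowercasing the scheme and host, dropping default ports, fragments, and any user credentials, and using "/" for an empty path) are assumptions chosen for illustration and are not the rules of any particular search engine; normalize_url and canonical_link are hypothetical helper names.

```python
import html
from urllib.parse import urlsplit, urlunsplit

# Ports that are implied by the scheme and can be dropped.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url):
    """Return one normalized spelling of url (illustrative rules only)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it is not the default for the scheme.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    # The fragment is dropped: it never reaches the server.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def canonical_link(url):
    """Return the canonical link element for the normalized URL."""
    href = html.escape(normalize_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


if __name__ == "__main__":
    print(normalize_url("HTTP://Example.COM:80/shoes?color=red#reviews"))
    print(canonical_link("https://example.com"))
```

Placing the same canonical link on every variant of a page tells the search engine which address should receive the combined link credit.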