Most webmasters have a basic idea of how crawling and indexing work and how they affect a website’s ranking on search engine results pages (SERPs).
Reduced to the basics, crawling and indexing are how Google discovers relevant web pages before ranking them against its roughly 200 ranking factors. Think of it as a talent manager discovering a singer and then introducing them to record companies.
What is Google Spider?
Google has its own crawling bot, Googlebot, which it sends out to crawl billions of web pages every day.
Because this bot crawls many websites at once, much as a spider works with its many legs, it is also called a spider. The basic SEO point to remember is that unless your website is crawler friendly, Google won’t index it.
So how does Google spider work?
Spider-friendly sites are the ones that carry relevant, good-quality links. Googlebot only follows links, so don’t expect it to enter login details. If a page cannot be reached through a link, the bot will not see it, let alone crawl it. There is no fixed schedule for the spider to crawl your website, and it does not crawl in real time. Keep in mind that the details of Google’s algorithm are private and available only to Google’s own team.
Whatever information is available is mostly inferred from Google’s own statements and from research and analysis. If we had to put a number on it, the most that can be said is that, by Google’s own account, Googlebot should not access most sites more than once every few seconds on average.
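To make the point about links concrete, here is a minimal sketch in Python of how a crawler might discover pages from a single HTML document. It is not Google’s actual code, which is not public. The class name LinkCollector, the function discover_links and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are treated as crawlable links in this sketch.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def discover_links(html, base_url):
    """Return the absolute URLs a link-following bot could find in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page: two ordinary links and a login form.
sample_html = """
<a href="/blog/">Blog</a>
<a href="https://example.com/about">About</a>
<form action="/login"><input name="user"></form>
"""

print(discover_links(sample_html, "https://example.com/"))
```

Only the two anchor links are returned. The login form is ignored, which mirrors the idea that a page reachable only by submitting credentials stays invisible to a bot that simply follows links.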
After Google crawls your site, it creates a cache. You have probably heard the term and have very likely cleared a cache yourself.
Without getting too technical, the cache is Google’s snapshot of your pages, which it keeps on its servers and refers to whenever someone runs a relevant search. Your website’s ranking is therefore based on the cached content rather than the live content, so you can’t expect rankings to move the moment you make a change.
It follows that a new blog post will take some time to affect your ranking.
You may notice that Googlebot downloads only one copy of each web page at a time. If you see the same page downloaded more than once, it is probably because the crawler was interrupted, for example by a network error, and had to restart.
Is my website crawlable?
There are many ways to test whether Googlebot is crawling your website. If you find that your website isn’t being crawled, refer to the Crawl Errors page in Search Console.
The Crawl Errors report gives detailed information about what is not working in your favor.
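Another rough check, alongside the report, is to look for Googlebot in your own server logs. The Python sketch below assumes your server writes the common "combined" access log format. The function name googlebot_hits and the sample log lines are invented for illustration. User agent strings can be faked, so treat a match as a hint rather than proof.

```python
import re
from collections import Counter

# Assumption: logs use the "combined" access log layout
# (request line, status, size, referrer, user agent).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(log_lines):
    """Count requests per (path, status) whose user agent mentions Googlebot."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[(match.group("path"), int(match.group("status")))] += 1
    return hits


# Made-up sample lines, not real traffic.
sample_log = [
    '66.249.66.1 - - [10/Oct/2023:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2023:13:56:02 +0000] "GET /old-page HTTP/1.1" 404 312 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Oct/2023:13:57:10 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

for (path, status), count in googlebot_hits(sample_log).items():
    print(path, status, count)
```

In this sample the bot fetched /blog/ successfully and hit a 404 on /old-page, which is the kind of problem the Crawl Errors report would also surface. The third line is an ordinary browser visit and is skipped.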
Perhaps you are running an AJAX application. These have a history of being difficult to crawl and therefore to index.
For AJAX applications you can resort to manual maintenance and updating as required.
From Google’s Webmaster blog:
If you’re running an AJAX application with content that you’d like to appear in search results, we have a new process that, when implemented, can help Google (and potentially other search engines) crawl and index your content. Historically, AJAX applications have been difficult for search engines to process because AJAX content is produced dynamically by the browser and thus not visible to crawlers.
There are other ways to make AJAX applications crawlable and therefore indexable. You can browse Google’s blog on the subject to learn more.
Can you block your website from getting crawled?
You cannot keep your entire website hidden from crawlers simply by not publishing links to it. When someone follows a link from your website to another site, your URL is passed along as the referrer and can end up stored or published by that site. Since the spider crawls links, it can reach your website that way.
Using robots.txt is one way to block Google bots from crawling certain directories.
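As a rough illustration, the Python sketch below uses the standard library’s urllib.robotparser to check what a set of robots.txt rules would allow. The robots.txt content, the directory names and the example.com URLs are hypothetical. Python’s parser applies its own reading of the rules, which may not match Googlebot’s handling in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the directory names are made up.
robots_txt = """\
User-agent: Googlebot
Disallow: /private/
Disallow: /drafts/

User-agent: *
Disallow: /tmp/
"""

parser = RobotFileParser()
# parse() accepts the file's lines, so no network request is needed here.
parser.parse(robots_txt.splitlines())

for url in (
    "https://example.com/blog/post",
    "https://example.com/private/report",
):
    # can_fetch() reports whether the named user agent may crawl the URL.
    print(url, parser.can_fetch("Googlebot", url))
```

The first URL is allowed and the second is blocked, because /private/ is listed under the Googlebot rules. Other bots fall under the "*" group and are only kept out of /tmp/.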
Google’s webmaster tools tell you a lot about how Google views your website and why it isn’t performing optimally. A detailed look at these reports will help you gauge whether the cause is a lack of crawling, duplicate content, or site hierarchy. We will discuss this and many other Google spider facts in the next part.