Crawl budget is a combination of the crawl rate limit and crawl demand. The crawl rate limit is the maximum rate at which Googlebot fetches a site; it is determined by the number of parallel connections Googlebot uses to crawl the site and the time it waits between fetches.
Factors that influence the crawl rate limit:
- Sites that respond quickly get a higher crawl rate limit: Googlebot opens more connections and crawls more. Slow or error-prone sites get a lower limit and are crawled less.
- If Google's crawling consumes too much of your server's resources, you can reduce the crawl rate yourself.
Crawl demand is how much Google wants to crawl and index a site's URLs, and it drives Googlebot activity: low indexing demand means less crawling even when the rate limit is not reached. Factors that influence crawl demand:
- URLs that are popular on the internet have higher crawl demand.
- Sites with low-quality, obsolete, or stale content have little or no crawl demand.
- Site moves raise crawl demand because the content has to be re-indexed under its new URLs.
Taken together, these define the crawl budget: the set of URLs Googlebot can and wants to crawl.
Factors that influence the crawl budget, and how to handle them:
- Faceted navigation (URLs generated by filters) is not search friendly because it creates many URLs with duplicate content, and indexing signals get split across the duplicate URL versions.
- Don't use non-standard URL encoding for parameters.
- Don't use directories to encode values that don't contribute to page content; use URL parameters instead.
- Don't let user-generated values end up in crawlable, indexable URLs; they rarely lead to useful search results.
- Don't append URL parameters that serve no purpose.
- Don't offer filter links when there are no results behind them.
- Determine which URL parameters are necessary for Googlebot to reach every content page individually.
- Determine which URLs are valuable and make sense to searchers.
- Apply clear logic to when and how URL parameters are displayed.
- Improve the indexing of your content using the methods described in Google's webmaster documentation.
- Implement values that don't change page content as standard key=value pairs.
- Maintain a consistent parameter order in URLs; the normalization sketch after this list shows one way to enforce this.
- Duplicate URLs and search-unfriendly URLs can lead to lower crawl rate limits; avoid them.
- Google's algorithms group duplicate URLs into a cluster and use a single URL to represent that cluster in search results.
- Use 301 redirects to point duplicate or moved URLs at the preferred version.
- To track visitor details, use cookies where needed instead of URL parameters.
- Check Google's webmaster documentation for further ways to remove duplicates.
- Don't serve soft 404s (also called crypto 404s); design proper 404 pages instead. Check the soft 404s listed in webmaster tools, work out what content those URLs return and where they redirect, configure the HTTP status codes correctly, and customize the 404 page so it is meaningful to users. The server sketch after this list shows both the 301 consolidation above and a real 404 response.
- Don't leave infinite spaces in your site: large numbers of links pointing to pages with little or no new content, such as endless calendar or filter combinations. Keep search bots out of them by marking such links nofollow or blocking the paths, and watch webmaster tools and messages from Google for infinite-space alerts.
- Don't leave hacked pages or thin, low-quality pages on the site.
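As a rough illustration of the URL-parameter advice above, here is a minimal Python sketch of one way to normalize URLs before they are linked or served. The parameter names (sessionid, the utm_* tags) and the example.com URLs are assumptions made for this example, not anything Google prescribes; the point is simply standard key=value encoding, dropping parameters that don't change page content, and keeping a consistent parameter order.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters assumed (for this example only) not to change page content.
NON_CONTENT_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign"}

def normalize_url(url: str) -> str:
    """Return a canonical form of `url`: standard key=value encoding,
    non-content parameters dropped, remaining parameters in a fixed order."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    params.sort()  # same parameter order everywhere on the site
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )

if __name__ == "__main__":
    # Both variants collapse to the same canonical URL.
    print(normalize_url("https://example.com/shoes?size=9&color=red&sessionid=42"))
    print(normalize_url("https://example.com/shoes?color=red&size=9"))
```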
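To make the 301-redirect and 404 advice concrete, here is a small sketch built on Python's standard http.server. The paths and page text are hypothetical; what matters is that duplicate URLs answer with a permanent 301 redirect to the preferred URL, while missing pages answer with a real 404 status and a page that helps the user, never a "not found" message served with a 200 status (a soft 404).

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: preferred URL -> page body.
PAGES = {
    "/shoes": "<h1>Shoes</h1><p>Our full shoe catalogue.</p>",
    "/contact": "<h1>Contact</h1><p>Get in touch with us.</p>",
}

# Hypothetical duplicate URLs that should be consolidated, not indexed separately.
DUPLICATES = {
    "/shoes/index.html": "/shoes",
    "/Shoes": "/shoes",
}

NOT_FOUND_PAGE = (
    "<h1>Page not found</h1>"
    "<p>The page you asked for does not exist. "
    "Try the <a href='/shoes'>shoe catalogue</a> or <a href='/contact'>contact us</a>.</p>"
)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore the query string in this sketch
        if path in PAGES:
            self._send(200, PAGES[path])
        elif path in DUPLICATES:
            # Permanent redirect consolidates the duplicate onto the preferred URL.
            self.send_response(301)
            self.send_header("Location", DUPLICATES[path])
            self.end_headers()
        else:
            # Real 404 status with a helpful page, not a soft 404 served with 200.
            self._send(404, NOT_FOUND_PAGE)

    def _send(self, status: int, body: str) -> None:
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), Handler).serve_forever()
```

The same idea applies to whatever server or CMS you actually run: consolidate duplicates with 301s at the HTTP level and make sure error pages genuinely return a 404 status.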
Hopefully you now have a good idea of what crawl budget is and how it affects your site's performance in search. Keep in mind that crawl budget is only one of many factors Google takes into account. If you need further assistance with this or any other web services, ask us now.