Google has updated two of its help documents to explain the limits Googlebot applies when it crawls, specifically how much content it will consume by file type and format.

The limits. These limits, some of which were already documented and are not new, include:

  • 15MB for web pages: Google wrote, “By default, Google’s crawlers and fetchers only crawl the first 15MB of a file.”
  • 2MB for supported file types and 64MB for PDF files: Google wrote, “When crawling for Google Search, Googlebot crawls the first 2MB of a supported file type, and the first 64MB of a PDF file.”

Note that these limits are quite large, and the vast majority of websites will never need to be concerned with them.

Full text. Here is the full text Google posted in its help documents:

  • “By default, Google’s crawlers and fetchers only crawl the first 15MB of a file. Any content beyond this limit is ignored. Individual projects may set different limits for their crawlers and fetchers, and also for different file types. For example, a Google crawler may set a larger file size limit for a PDF than for HTML.”
  • “When crawling for Google Search, Googlebot crawls the first 2MB of a supported file type, and the first 64MB of a PDF file. From a rendering perspective, each resource referenced in the HTML (such as CSS and JavaScript) is fetched separately, and each resource fetch is bound by the same file size limit that applies to other files (except PDF files). Once the cutoff limit is reached, Googlebot stops the fetch and only sends the already downloaded part of the file for indexing consideration. The file size limit is applied on the uncompressed data. Other Google crawlers, for example Googlebot Video and Googlebot Image, may have different limits.”
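
Since the limit is applied to uncompressed data, the relevant measurement is the decompressed body size, not the Content-Length of a compressed response. For publishers who want to sanity-check a page against the 15MB default, a minimal sketch might look like the following (assuming Python with the requests library; the function name and example URL are illustrative, and the 15MB figure is the one quoted from Google’s documentation above):

    import requests

    # Googlebot's documented default: only the first 15MB of a file is crawled,
    # and the limit is applied to the uncompressed data (per the help doc above).
    GOOGLEBOT_LIMIT_BYTES = 15 * 1024 * 1024

    def exceeds_googlebot_limit(url: str) -> bool:
        """Report whether a URL's uncompressed body exceeds the 15MB default.

        Hypothetical helper for illustration; not an official Google tool.
        """
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        # requests transparently decompresses gzip-encoded responses, so
        # len(response.content) approximates the uncompressed size Googlebot sees.
        size = len(response.content)
        print(f"{url}: {size / (1024 * 1024):.2f} MB uncompressed")
        return size > GOOGLEBOT_LIMIT_BYTES

    if __name__ == "__main__":
        exceeds_googlebot_limit("https://example.com/")  # substitute your own page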

Why we care. It is important to know these limits exist, but again, most sites will likely never come close to them. That said, these are the documented limits of Googlebot’s crawling.



