Robots.txt
Case-1
User-agent: *
Disallow: /
- Comment: Tells all search engine bots not to crawl any page on the website.
Case-2
User-agent: *
Disallow:
- Comment: Search engine bots would be able to crawl & index everything on the website.
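Cases 1 and 2 can be checked locally with Python's standard-library urllib.robotparser. This is a minimal sketch: the helper name can_crawl and the example.com URL are assumptions made for illustration, not part of the original notes.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_lines, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # accepts an iterable of robots.txt lines
    return parser.can_fetch(agent, url)

block_all = ["User-agent: *", "Disallow: /"]   # Case-1
allow_all = ["User-agent: *", "Disallow:"]     # Case-2

url = "https://example.com/courier.html"  # hypothetical URL
print(can_crawl(block_all, url))  # False
print(can_crawl(allow_all, url))  # True
```

Note that urllib.robotparser does plain prefix matching and does not interpret the * and $ wildcards used in later cases.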
Case-3
User-agent: *
Disallow: /courier.html
- Comment: Blocks a specific page that you do not want crawled. The path must start with a /.
Case-4
User-agent: Googlebot-Image
Disallow: /images/usa-shipping.jpg
- Comment: Blocks a specific image from appearing in Google Images.
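A rule group only applies to the crawler named in its User-agent line. The sketch below checks Case-4 with the standard-library urllib.robotparser; the example.com URL and the second crawler name ("Bingbot") are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

robots_lines = [
    "User-agent: Googlebot-Image",
    "Disallow: /images/usa-shipping.jpg",
]

parser = RobotFileParser()
parser.parse(robots_lines)

image_url = "https://example.com/images/usa-shipping.jpg"  # hypothetical URL

# The named crawler is blocked from this image...
print(parser.can_fetch("Googlebot-Image", image_url))  # False
# ...while a crawler with no matching group is unaffected.
print(parser.can_fetch("Bingbot", image_url))          # True
```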
Case-5
User-agent: *
Disallow: /ebooks/*.pdf
Disallow: /staging/
- The line Disallow: /ebooks/*.pdf, in conjunction with the first line, tells all web crawlers not to crawl any PDF files in the ebooks folder of this website. Search engines won’t include these direct PDF links in search results.
- The line Disallow: /staging/, in conjunction with the first line, asks all crawlers not to crawl anything in the staging folder of the website. This is helpful if you’re running a test and don’t want the staged content to appear in search results.
Case-6
User-agent: *
Disallow: /*?utm=*
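- The line Disallow: /*?utm=*, in conjunction with the first line, tells all web crawlers not to crawl any URL whose query string starts with utm=. The trailing * is redundant because rules already match as prefixes. Standard UTM parameters are named utm_source, utm_medium, and so on, so a pattern such as /*?utm_ would be needed to match them.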
Case-7
User-agent: *
Disallow: /*?
- Blocks all crawlers from accessing any URL that has a ? in it, that is, any URL with a query string.
Case-8
User-agent: *
Disallow: /*.php$
- The $ character matches the end of the URL. This example blocks all crawlers from URLs that end with “.php”; a URL such as /index.php?id=2 is not matched.
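The wildcard rules in Cases 5 to 8 can be tested with a small matcher. This is a sketch only, assuming the common interpretation in which * matches any run of characters, a trailing $ anchors the end of the URL, and every rule matches from the start of the path. The function names rule_to_regex and is_blocked and the sample paths are hypothetical.

```python
import re

def rule_to_regex(rule_path):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end
    of the URL. Everything else is treated as a literal.
    """
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    body = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(rule_path, url_path):
    """True if url_path (path plus query string) matches the Disallow pattern."""
    # re.match anchors at the start, which gives the prefix behaviour.
    return rule_to_regex(rule_path).match(url_path) is not None

print(is_blocked("/ebooks/*.pdf", "/ebooks/guide.pdf"))  # True
print(is_blocked("/*?", "/shop?item=1"))                 # True
print(is_blocked("/*.php$", "/index.php"))               # True
print(is_blocked("/*.php$", "/index.php?id=2"))          # False
```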
Case-9
User-agent: *
Disallow: /search?s=*
- Stop any crawler from crawling search parameter pages.
Case-10
User-agent: *
Disallow: /search?s=*
Disallow: /query?kw=*
- Stop any crawler from crawling search parameter pages. Each search path gets its own Disallow line.
Case-11
User-agent: Googlebot-Image
Disallow: /*.gif$
- By specifying Googlebot-Image as the User-agent, all images whose URLs end in .gif will be excluded from Google Image Search.
Case-13
User-agent: Googlebot-Image
Disallow: /*?color
Allow: /*?color=blue
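- Blocks the named crawler (here Googlebot-Image) from crawling any URL whose query string starts with the color parameter, except for ?color=blue. The longer, more specific Allow rule takes precedence over the shorter Disallow rule.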
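How Allow and Disallow interact in Case-13 can be sketched as follows, assuming the longest-match precedence described in RFC 9309 and in Google's documentation: the matching rule with the longest pattern wins, and Allow wins a tie. The helper names matches and is_allowed and the sample paths are hypothetical.

```python
import re

def matches(pattern, url_path):
    """Prefix-match url_path against a pattern ('*' wildcard, optional trailing '$')."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(p) for p in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), url_path) is not None

def is_allowed(rules, url_path):
    """rules: list of (directive, pattern). Longest matching pattern wins; Allow wins ties."""
    best_len, allowed = -1, True  # no matching rule means the URL is allowed
    for directive, pattern in rules:
        if not matches(pattern, url_path):
            continue
        allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and allow):
            best_len, allowed = len(pattern), allow
    return allowed

rules = [("Disallow", "/*?color"), ("Allow", "/*?color=blue")]
print(is_allowed(rules, "/shirts?color=blue"))  # True
print(is_allowed(rules, "/shirts?color=red"))   # False
print(is_allowed(rules, "/shirts"))             # True
```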
Case-14
User-agent: Googlebot-Image
Disallow: /blog/*/page/
- This means that URI paths such as /blog/category-name/page/3 will be blocked from crawling, without having to specify each category and each pagination.