1. Robots Meta Tag #
- Used to control page-level indexing and crawling.
- Placed inside the <head> section of HTML.
- Applies to all crawlers if name="robots", or can target specific Google crawlers using:
- googlebot (for all Google Search text results)
- googlebot-news (for Google News)
Common directives you can set: #
| Directive | Purpose |
| --- | --- |
| noindex | Do not index this page |
| nofollow | Do not follow links on this page |
| nosnippet | Do not show a snippet in search results |
| notranslate | Do not offer automatic translation |
| noarchive | Do not show a cached version link |
Example blocking all indexing: #
<meta name="robots" content="noindex, nofollow">
Example blocking snippet only on Google Search: #
<meta name="googlebot" content="nosnippet">
Example different rules for Google Search and News: #
<meta name="googlebot" content="notranslate">
<meta name="googlebot-news" content="nosnippet">
2. data-nosnippet Attribute #
- Use this on specific HTML elements (div, span, section) inside the page.
- Prevents Google from showing that part of the page’s content in the search snippet.
- Does not block indexing of the content, only excludes it from the snippet.
Example: #
<p>This paragraph is visible in snippets.</p>
<p data-nosnippet>This paragraph will NOT appear in search snippets.</p>
3. X-Robots-Tag HTTP Header #
- Controls crawling and indexing for non-HTML resources like PDFs, images, videos, or any files served by your server.
- Sent as an HTTP header in server responses.
- Same directives as the robots meta tag (noindex, nofollow, etc.).
Example HTTP header to block PDF from indexing: #
X-Robots-Tag: noindex, nofollow
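As a sketch of how a server might attach this header automatically, the following maps a request path to an X-Robots-Tag value the way a small middleware could. The extension list and the `noindex, nofollow` rule here are illustrative assumptions, not a prescribed configuration:

```python
# Hypothetical rule: file types that should never appear in search results.
NOINDEX_EXTENSIONS = (".pdf", ".doc", ".docx")

def x_robots_header(path):
    """Return an X-Robots-Tag value for the given path, or None for no header."""
    if path.lower().endswith(NOINDEX_EXTENSIONS):
        return "noindex, nofollow"
    return None
```

A real deployment would usually do this in the web server config (see the Apache and NGINX examples later in these notes) rather than in application code.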
Important Notes #
- These settings only apply if Googlebot can access the page/file (not blocked by robots.txt).
- The values of the name and content attributes in meta tags are case-insensitive.
- Use multiple <meta> tags to set directives for different crawlers.
- If your site uses a CMS (like WordPress, Wix), use its SEO or advanced settings to add meta tags safely.
- For non-HTML files, always use the X-Robots-Tag header instead of meta tags.
Using X-Robots-Tag HTTP Header for Google Search Control #
The X-Robots-Tag is an HTTP header you can send with your server responses to control how search engines crawl and index your pages or non-HTML resources like PDFs, images, and videos.
Basic Usage Example #
To prevent a page from being indexed, your HTTP response can include:
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
X-Robots-Tag: noindex
Multiple Directives and User Agents #
You can combine directives in a comma-separated list:
X-Robots-Tag: noindex, nofollow
Or specify multiple X-Robots-Tag headers separately:
X-Robots-Tag: noimageindex
X-Robots-Tag: unavailable_after: 2024-12-31T23:59:59Z
You can target specific crawlers by prefixing the user-agent:
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
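To make the user-agent-prefix format concrete, here is a rough sketch of how a crawler might decide which directives apply to it. It is a simplification: valued directives such as max-snippet:N and unavailable_after are out of scope, and the set of known directive names is assumed:

```python
# Simple (valueless) directive names; used to tell a directive list
# apart from a "useragent:" prefix. Valued directives like
# "unavailable_after: <date>" are not handled in this sketch.
KNOWN = {"all", "noindex", "nofollow", "none", "nosnippet", "notranslate",
         "noimageindex", "noarchive", "indexifembedded"}

def parse_x_robots(values, bot):
    """Return the set of simple directives from X-Robots-Tag header
    values that apply to `bot`. Unprefixed values apply to all crawlers;
    a value like 'otherbot: noindex' targets only that crawler."""
    applicable = set()
    for value in values:
        agent, colon, rest = value.partition(":")
        agent = agent.strip().lower()
        if colon and agent not in KNOWN:
            # Prefixed with a user-agent name: applies only to that crawler.
            if agent != bot:
                continue
            directive_list = rest
        else:
            directive_list = value
        for token in directive_list.split(","):
            token = token.strip().lower()
            if token:
                applicable.add(token)
    return applicable
```

For example, with the three header values shown above, Googlebot would see `{"nofollow", "nosnippet"}` while otherbot would see `{"noindex", "nofollow", "nosnippet"}`.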
Important Notes #
- The HTTP header name, user-agent names, and directives are case-insensitive.
- When conflicting directives exist, the most restrictive rule applies (e.g., nosnippet overrides max-snippet:50).
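The "most restrictive wins" resolution for snippet length can be sketched as a small function. This is an illustrative model, not Google's actual implementation; it treats -1 as "unlimited", as in the max-video-preview convention:

```python
def snippet_limit(directives):
    """Resolve the effective snippet length from a list of directives.
    nosnippet (or max-snippet:0) beats any larger max-snippet value;
    -1 means no limit was set."""
    limit = -1
    for d in directives:
        d = d.strip().lower()
        if d == "nosnippet":
            return 0  # most restrictive possible: no snippet at all
        if d.startswith("max-snippet:"):
            n = int(d.split(":", 1)[1])
            limit = n if limit == -1 else min(limit, n)
    return limit
```

So `["nosnippet", "max-snippet:50"]` resolves to 0, matching the note above that nosnippet overrides max-snippet:50.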
Supported Rules/Directives #
| Directive | What It Does |
| --- | --- |
| all | Default; no restrictions on indexing or serving. |
| noindex | Do not show the page/resource in search results. |
| nofollow | Do not follow links on this page. |
| none | Equivalent to noindex, nofollow. |
| nosnippet | Prevents text or video snippets in search results (image thumbnails may still appear). |
| indexifembedded | Allows indexing of the page's content when it is embedded in another page (e.g., via iframe); only takes effect in combination with noindex. |
| max-snippet:[number] | Limits snippet length to a maximum number of characters. Use 0 to disallow snippets (equivalent to nosnippet). |
| max-image-preview:[setting] | Controls max image preview size in search: none, standard, or large. |
| max-video-preview:[number] | Limits video snippet length in seconds. Use 0 for no preview, -1 for unlimited. |
| notranslate | Prevents Google from offering translation of page content in search results. |
| noimageindex | Prevents images on the page from being indexed and shown in image search. |
| unavailable_after:[date/time] | Removes page from search results after the specified date/time (ISO 8601, RFC 822, etc.). |
Examples #
Prevent snippet and indexing for all crawlers #
X-Robots-Tag: noindex, nosnippet
Allow Googlebot to follow links but block others #
X-Robots-Tag: googlebot: follow
X-Robots-Tag: otherbot: noindex, nofollow
Set page to expire from search after a specific date #
X-Robots-Tag: unavailable_after: 2025-07-30T23:59:59Z
Limit snippet length to 50 characters #
X-Robots-Tag: max-snippet:50
Historical and Unused Robots Rules (Ignored by Google) #
| Rule | Status & Notes |
| --- | --- |
| noarchive | No longer used; Google removed the cached links feature. |
| nocache | Not used by Google Search. |
| nositelinkssearchbox | Deprecated; Google no longer shows the sitelinks search box. |
Combining Robots Meta Tag Rules #
Combine multiple rules with commas in a single meta tag:
<meta name="robots" content="noindex, nofollow">
- Or use multiple meta tags:
<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">
- When combining general and crawler-specific rules, the most restrictive rules apply for that crawler.
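The merge of general and crawler-specific meta tags can be modeled as a simple union of rule sets, since every applicable restriction takes effect. A minimal sketch, assuming meta tags are given as (name, content) pairs:

```python
def robots_rules(meta_tags, crawler="googlebot"):
    """Merge the content of meta tags named 'robots' or matching `crawler`.
    All applicable restrictions combine, so the most restrictive
    overall outcome applies for that crawler."""
    rules = set()
    for name, content in meta_tags:
        if name.lower() in ("robots", crawler):
            rules.update(t.strip().lower() for t in content.split(","))
    return rules
```

With the two meta tags above, Googlebot ends up with both nofollow (general) and noindex (crawler-specific).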
Using data-nosnippet HTML Attribute #
- Use on div, span, or section elements to exclude content from search snippets.
- Treated as a boolean attribute; any value it is given is ignored:
<p>This text can be shown in a snippet
<span data-nosnippet>but this part won’t be shown</span>.
</p>
<div data-nosnippet>
<p>This whole block is excluded from snippets.</p>
</div>
- Use valid, well-formed HTML and close all tags properly.
- Avoid dynamically adding or removing data-nosnippet attributes with JavaScript after page load.
- If you use custom elements, wrap content with valid elements (div, span, section) before adding data-nosnippet.
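To preview roughly what text would remain eligible for a snippet, you can strip data-nosnippet elements with the standard-library HTML parser. This is an illustrative approximation of the behavior, and it assumes well-formed HTML where every opened tag is explicitly closed:

```python
from html.parser import HTMLParser

class SnippetText(HTMLParser):
    """Collect page text, skipping anything inside a data-nosnippet element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside a data-nosnippet subtree
        self.parts = []   # text fragments eligible for a snippet

    def handle_starttag(self, tag, attrs):
        # Enter (or go deeper into) an excluded subtree. Assumes all tags
        # are explicitly closed; void tags like <br> would unbalance depth.
        if self.depth or any(name == "data-nosnippet" for name, _ in attrs):
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.parts.append(data)
```

Feeding it the example above yields the visible sentence without the data-nosnippet span's text.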
Structured Data & Robots Meta Tags #
- Snippet-related robots rules do not apply to structured data (schema.org), except for article.description and the description values of other creative works.
- You can control snippet length of structured data content with max-snippet.
- Structured data remains usable even inside data-nosnippet elements.
- Modify structured data itself to control what information you want to expose in rich results.
Practical Implementation of X-Robots-Tag (Server-side) #
- Use web server config files to add X-Robots-Tag HTTP headers globally or by file type.
Apache examples: #
Add noindex, nofollow for all PDFs:
<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>
Add noindex for all image files (.png, .jpg, .gif):
<Files ~ "\.(png|jpe?g|gif)$">
  Header set X-Robots-Tag "noindex"
</Files>
Add X-Robots-Tag for a single file (place .htaccess in that file’s directory):
<Files "unicorn.pdf">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>
NGINX examples: #
For PDFs:
location ~* \.pdf$ {
  add_header X-Robots-Tag "noindex, nofollow";
}
For images:
location ~* \.(png|jpe?g|gif)$ {
  add_header X-Robots-Tag "noindex";
}
For a single file:
location = /path/to/unicorn.pdf {
  add_header X-Robots-Tag "noindex, nofollow";
}
Important Note: Combining robots.txt and Robots Meta Tags/X-Robots-Tag #
- If a URL is disallowed in robots.txt, Google won’t crawl it, so it won’t see any robots meta tags or X-Robots-Tag headers on that URL.
- To enforce indexing or serving rules on a page, do not disallow crawling in robots.txt for that URL.
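You can sanity-check this interaction with the standard-library robots.txt parser: if Googlebot is not allowed to fetch a URL, any noindex on that URL goes unseen. A minimal sketch (the robots.txt content and URLs here are examples):

```python
from urllib.robotparser import RobotFileParser

def can_google_see_noindex(robots_txt, url):
    """Return True if Googlebot may fetch the URL under this robots.txt.
    If this is False, a noindex rule on the page will never be seen."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("Googlebot", url)
```

For example, with `Disallow: /private/`, a noindex meta tag on a page under /private/ would have no effect, because the page is never crawled.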