1. Robots Meta Tag #
- Used to control page-level indexing and crawling.
- Placed inside the <head> section of HTML.
- Applies to all crawlers if name="robots", or can target specific Google crawlers using:
- googlebot (for all Google Search text results)
- googlebot-news (for Google News)
Common directives you can set: #
| Directive | Purpose |
| --- | --- |
| noindex | Do not index this page |
| nofollow | Do not follow links on this page |
| nosnippet | Do not show a snippet in search results |
| notranslate | Do not offer automatic translation |
| noarchive | Do not show a cached version link |
Example blocking all indexing: #
<meta name="robots" content="noindex, nofollow">
Example blocking snippet only on Google Search: #
<meta name="googlebot" content="nosnippet">
Example different rules for Google Search and News: #
<meta name="googlebot" content="notranslate">
<meta name="googlebot-news" content="nosnippet">
2. data-nosnippet Attribute #
- Use this on specific HTML elements (div, span, section) inside the page.
- Prevents Google from showing that part of the page’s content in the search snippet.
- Does not block indexing of the content, only excludes it from the snippet.
Example: #
<p>This paragraph is visible in snippets.</p>
<p data-nosnippet>This paragraph will NOT appear in search snippets.</p>
3. X-Robots-Tag HTTP Header #
- Controls crawling and indexing for non-HTML resources like PDFs, images, videos, or any files served by your server.
- Sent as an HTTP header in server responses.
- Same directives as the robots meta tag (noindex, nofollow, etc.).
Example HTTP header to block PDF from indexing: #
X-Robots-Tag: noindex, nofollow
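As a sketch of how a server might attach this header automatically, the following maps a request path to an X-Robots-Tag value the way a small middleware could. The extension list and the `noindex, nofollow` rule here are illustrative assumptions, not a prescribed configuration:

```python
# Hypothetical rule: file types that should never appear in search results.
NOINDEX_EXTENSIONS = (".pdf", ".doc", ".docx")

def x_robots_header(path):
    """Return an X-Robots-Tag value for the given path, or None for no header."""
    if path.lower().endswith(NOINDEX_EXTENSIONS):
        return "noindex, nofollow"
    return None
```

A real deployment would usually do this in the web server config (see the Apache and NGINX examples later in these notes) rather than in application code.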
Important Notes #
- These settings only apply if Googlebot can access the page/file (not blocked by robots.txt).
- The values of the name and content attributes in meta tags are case-insensitive.
- Use multiple <meta> tags to set directives for different crawlers.
- If your site uses a CMS (like WordPress, Wix), use its SEO or advanced settings to add meta tags safely.
- For non-HTML files, always use the X-Robots-Tag header instead of meta tags.
Using X-Robots-Tag HTTP Header for Google Search Control #
The X-Robots-Tag is an HTTP header you can send with your server responses to control how search engines crawl and index your pages or non-HTML resources like PDFs, images, and videos.
Basic Usage Example #
To prevent a page from being indexed, your HTTP response can include:
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
X-Robots-Tag: noindex
Multiple Directives and User Agents #
You can combine directives in a comma-separated list:
X-Robots-Tag: noindex, nofollow
Or specify multiple X-Robots-Tag headers separately:
X-Robots-Tag: noimageindex
X-Robots-Tag: unavailable_after: 2024-12-31T23:59:59Z
You can target specific crawlers by prefixing the user-agent:
X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow
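To make the user-agent-prefix format concrete, here is a rough sketch of how a crawler might decide which directives apply to it. It is a simplification: valued directives such as max-snippet:N and unavailable_after are out of scope, and the set of known directive names is assumed:

```python
# Simple (valueless) directive names; used to tell a directive list
# apart from a "useragent:" prefix. Valued directives like
# "unavailable_after: <date>" are not handled in this sketch.
KNOWN = {"all", "noindex", "nofollow", "none", "nosnippet", "notranslate",
         "noimageindex", "noarchive", "indexifembedded"}

def parse_x_robots(values, bot):
    """Return the set of simple directives from X-Robots-Tag header
    values that apply to `bot`. Unprefixed values apply to all crawlers;
    a value like 'otherbot: noindex' targets only that crawler."""
    applicable = set()
    for value in values:
        agent, colon, rest = value.partition(":")
        agent = agent.strip().lower()
        if colon and agent not in KNOWN:
            # Prefixed with a user-agent name: applies only to that crawler.
            if agent != bot:
                continue
            directive_list = rest
        else:
            directive_list = value
        for token in directive_list.split(","):
            token = token.strip().lower()
            if token:
                applicable.add(token)
    return applicable
```

For example, with the three header values shown above, Googlebot would see `{"nofollow", "nosnippet"}` while otherbot would see `{"noindex", "nofollow", "nosnippet"}`.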
Important Notes #
- The HTTP header name, user-agent names, and directives are case-insensitive.
- When conflicting directives exist, the most restrictive rule applies (e.g., nosnippet overrides max-snippet:50).
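The "most restrictive wins" resolution for snippet length can be sketched as a small function. This is an illustrative model, not Google's actual implementation; it treats -1 as "unlimited", as in the max-video-preview convention:

```python
def snippet_limit(directives):
    """Resolve the effective snippet length from a list of directives.
    nosnippet (or max-snippet:0) beats any larger max-snippet value;
    -1 means no limit was set."""
    limit = -1
    for d in directives:
        d = d.strip().lower()
        if d == "nosnippet":
            return 0  # most restrictive possible: no snippet at all
        if d.startswith("max-snippet:"):
            n = int(d.split(":", 1)[1])
            limit = n if limit == -1 else min(limit, n)
    return limit
```

So `["nosnippet", "max-snippet:50"]` resolves to 0, matching the note above that nosnippet overrides max-snippet:50.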
Supported Rules/Directives #
| Directive | What It Does |
| --- | --- |
| all | Default; no restrictions on indexing or serving. |
| noindex | Do not show the page/resource in search results. |
| nofollow | Do not follow links on this page. |
| none | Equivalent to noindex, nofollow. |
| nosnippet | Prevents text or video snippets in search results (image thumbnails may still appear). |
| indexifembedded | Allows indexing of the page's content when it is embedded in another page (e.g., via iframe); only takes effect in combination with noindex. |
| max-snippet:[number] | Limits snippet length to a maximum number of characters. Use 0 to disallow snippets (equivalent to nosnippet). |
| max-image-preview:[setting] | Controls max image preview size in search: none, standard, or large. |
| max-video-preview:[number] | Limits video snippet length in seconds. Use 0 for no preview, -1 for unlimited. |
| notranslate | Prevents Google from offering translation of page content in search results. |
| noimageindex | Prevents images on the page from being indexed and shown in image search. |
| unavailable_after:[date/time] | Removes page from search results after the specified date/time (ISO 8601, RFC 822, etc.). |
Examples #
Prevent snippet and indexing for all crawlers #
X-Robots-Tag: noindex, nosnippet
Allow Googlebot to follow links but block others #
X-Robots-Tag: googlebot: follow
X-Robots-Tag: otherbot: noindex, nofollow
Set page to expire from search after a specific date #
X-Robots-Tag: unavailable_after: 2025-07-30T23:59:59Z
Limit snippet length to 50 characters #
X-Robots-Tag: max-snippet:50
Historical and Unused Robots Rules (Ignored by Google) #
| Rule | Status & Notes |
| --- | --- |
| noarchive | No longer used; Google removed the cached links feature. |
| nocache | Not used by Google Search. |
| nositelinkssearchbox | Deprecated; Google no longer shows the sitelinks search box. |
Combining Robots Meta Tag Rules #
Combine multiple rules with commas in a single meta tag:
<meta name="robots" content="noindex, nofollow">
- Or use multiple meta tags:
<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex">
- When combining general and crawler-specific rules, the most restrictive rules apply for that crawler.
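The merge of general and crawler-specific meta tags can be modeled as a simple union of rule sets, since every applicable restriction takes effect. A minimal sketch, assuming meta tags are given as (name, content) pairs:

```python
def robots_rules(meta_tags, crawler="googlebot"):
    """Merge the content of meta tags named 'robots' or matching `crawler`.
    All applicable restrictions combine, so the most restrictive
    overall outcome applies for that crawler."""
    rules = set()
    for name, content in meta_tags:
        if name.lower() in ("robots", crawler):
            rules.update(t.strip().lower() for t in content.split(","))
    return rules
```

With the two meta tags above, Googlebot ends up with both nofollow (general) and noindex (crawler-specific).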
Using data-nosnippet HTML Attribute #
- Use on div, span, or section elements to exclude content from search snippets.
- Treated as a boolean attribute; any value it is given is ignored:
<p>This text can be shown in a snippet
<span data-nosnippet>but this part won’t be shown</span>.
</p>
<div data-nosnippet>
<p>This whole block is excluded from snippets.</p>
</div>
- Use valid, well-formed HTML and close all tags properly.
- Avoid dynamically adding or removing data-nosnippet attributes with JavaScript after page load.
- If you use custom elements, wrap content with valid elements (div, span, section) before adding data-nosnippet.
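To preview roughly what text would remain eligible for a snippet, you can strip data-nosnippet elements with the standard-library HTML parser. This is an illustrative approximation of the behavior, and it assumes well-formed HTML where every opened tag is explicitly closed:

```python
from html.parser import HTMLParser

class SnippetText(HTMLParser):
    """Collect page text, skipping anything inside a data-nosnippet element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # nesting depth inside a data-nosnippet subtree
        self.parts = []   # text fragments eligible for a snippet

    def handle_starttag(self, tag, attrs):
        # Enter (or go deeper into) an excluded subtree. Assumes all tags
        # are explicitly closed; void tags like <br> would unbalance depth.
        if self.depth or any(name == "data-nosnippet" for name, _ in attrs):
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth:
            self.parts.append(data)
```

Feeding it the example above yields the visible sentence without the data-nosnippet span's text.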
Structured Data & Robots Meta Tags #
- Snippet-related robots rules do not apply to structured data (schema.org), except for article.description and the description values of other creative works.
- You can control snippet length of structured data content with max-snippet.
- Structured data remains usable even inside data-nosnippet elements.
- Modify structured data itself to control what information you want to expose in rich results.
Practical Implementation of X-Robots-Tag (Server-side) #
- Use web server config files to add X-Robots-Tag HTTP headers globally or by file type.
Apache examples: #
Add noindex, nofollow for all PDFs:
<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>
Add noindex for all image files (.png, .jpg, .gif):
<Files ~ "\.(png|jpe?g|gif)$">
  Header set X-Robots-Tag "noindex"
</Files>
Add X-Robots-Tag for a single file (place .htaccess in that file’s directory):
<Files "unicorn.pdf">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>
NGINX examples: #
For PDFs:
location ~* \.pdf$ {
  add_header X-Robots-Tag "noindex, nofollow";
}
For images:
location ~* \.(png|jpe?g|gif)$ {
  add_header X-Robots-Tag "noindex";
}
For a single file:
location = /path/to/unicorn.pdf {
  add_header X-Robots-Tag "noindex, nofollow";
}
Important Note: Combining robots.txt and Robots Meta Tags/X-Robots-Tag #
- If a URL is disallowed in robots.txt, Google won’t crawl it, so it won’t see any robots meta tags or X-Robots-Tag headers on that URL.
- To enforce indexing or serving rules on a page, do not disallow crawling in robots.txt for that URL.
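You can sanity-check this interaction with the standard-library robots.txt parser: if Googlebot is not allowed to fetch a URL, any noindex on that URL goes unseen. A minimal sketch (the robots.txt content and URLs here are examples):

```python
from urllib.robotparser import RobotFileParser

def can_google_see_noindex(robots_txt, url):
    """Return True if Googlebot may fetch the URL under this robots.txt.
    If this is False, a noindex rule on the page will never be seen."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch("Googlebot", url)
```

For example, with `Disallow: /private/`, a noindex meta tag on a page under /private/ would have no effect, because the page is never crawled.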