Complete Guide to Robots Meta Tag, X-Robots-Tag, and data-nosnippet for Google Search Control
Last Updated: August 14, 2025

1. Robots Meta Tag

Common directives you can set:

noindex - Do not index this page
nofollow - Do not follow links on this page
nosnippet - Do not show a snippet in search results
notranslate - Do not offer automatic translation
noarchive - Do not show a cached link

Example blocking all indexing:

<meta name="robots" content="noindex, nofollow">

Example blocking the snippet only on Google Search:

<meta name="googlebot" content="nosnippet">

Example of different rules for Google Search and Google News:

<meta name="googlebot" content="notranslate">
<meta name="googlebot-news" content="nosnippet">

2. data-nosnippet Attribute

Example:

<p>This paragraph is visible in snippets.</p>
<p data-nosnippet>This paragraph will NOT appear in search snippets.</p>

3. X-Robots-Tag HTTP Header

Example HTTP header to block a PDF from indexing:

X-Robots-Tag: noindex, nofollow

Important Notes

Using the X-Robots-Tag HTTP Header for Google Search Control

The X-Robots-Tag is an HTTP header you can send with your server responses to control how search engines crawl and index your pages or non-HTML resources such as PDFs, images, and videos.

Basic Usage Example

To prevent a page from being indexed, your HTTP response can include:

HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
X-Robots-Tag: noindex

Multiple Directives and User Agents

You can combine directives in a comma-separated list:

X-Robots-Tag: noindex, nofollow

Or send multiple X-Robots-Tag headers separately:

X-Robots-Tag: noimageindex
X-Robots-Tag: unavailable_after: 2024-12-31T23:59:59Z

You can target specific crawlers by prefixing the user agent:

X-Robots-Tag: googlebot: nofollow
X-Robots-Tag: otherbot: noindex, nofollow

Supported Rules/Directives

all - Default; no restrictions on indexing or serving.
noindex - Do not show the page/resource in search results.
nofollow - Do not follow links on this page.
none - Equivalent to noindex, nofollow.
nosnippet - Prevents text or video snippets in search results (image thumbnails may still appear).
indexifembedded - Allows indexing when the content is embedded (e.g., in an iframe); only takes effect in combination with noindex.
max-snippet:[number] - Limits snippet length to a maximum number of characters. Use 0 to disallow snippets (equivalent to nosnippet).
max-image-preview:[setting] - Controls the maximum image preview size in search: none, standard, or large.
max-video-preview:[number] - Limits video snippet length in seconds. Use 0 for no preview, -1 for unlimited.
notranslate - Prevents Google from offering translation of the page content in search results.
noimageindex - Prevents images on the page from being indexed and shown in image search.
unavailable_after:[date/time] - Removes the page from search results after the specified date/time (ISO 8601, RFC 822, etc.).

Examples

Prevent snippet and indexing for all crawlers:

X-Robots-Tag: noindex, nosnippet

Allow Googlebot to follow links but block others:

X-Robots-Tag: googlebot: follow
X-Robots-Tag: otherbot: noindex, nofollow

Set a page to expire from search after a specific date:

X-Robots-Tag: unavailable_after: 2025-07-30T23:59:59Z

Limit snippet length to 50 characters:

X-Robots-Tag: max-snippet:50

Historical and Unused Robots Rules (Ignored by Google)

noarchive - No longer used; Google removed the cached-links feature.
nocache - Not used by Google Search.
nositelinkssearchbox - Deprecated; Google no longer shows the sitelinks search box.

Combining Robots Meta Tag Rules

Combine multiple rules with commas in a single meta tag:

<meta name="robots" content="noindex, nofollow">
<meta name="googlebot" content="noindex">

Using the data-nosnippet HTML Attribute

It is treated as a boolean attribute; any value it is given is ignored:

<p>This text can be shown in a snippet <span data-nosnippet>but this part won't be shown</span>.</p>

<div data-nosnippet>
<p>This whole block is excluded from snippets.</p>
</div>

Structured Data & Robots Meta Tags

Practical Implementation of X-Robots-Tag (Server-side)

Apache examples:

Add noindex, nofollow for all PDFs:

<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>

Add noindex for all image files (.png, .jpg, .gif):

<Files ~ "\.(png|jpe?g|gif)$">
Header set X-Robots-Tag "noindex"
</Files>

Add X-Robots-Tag for a single file (place the .htaccess in that file's directory):

<Files "unicorn.pdf">
Header set X-Robots-Tag "noindex, nofollow"
</Files>

NGINX examples:

For PDFs:

location ~* \.pdf$ {
  add_header X-Robots-Tag "noindex, nofollow";
}

For images:

location ~* \.(png|jpe?g|gif)$ {
  add_header X-Robots-Tag "noindex";
}

For a single file:

location = /path/to/unicorn.pdf {
  add_header X-Robots-Tag "noindex, nofollow";
}

Important Note: Combining robots.txt and Robots Meta Tags/X-Robots-Tag

A page blocked by robots.txt is never crawled, so Google will never see a robots meta tag or X-Robots-Tag header on it. For a noindex rule to be honored, the page must remain crawlable.
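If your server runs Node rather than Apache or NGINX, the same per-file-type rules can be applied from application code. A minimal sketch assuming an Express-style server; robotsHeaderFor and the rule table are hypothetical names for illustration, mirroring the configs above:

```javascript
// Sketch: derive an X-Robots-Tag value from a request path.
// The rules mirror the Apache/NGINX examples above; names are illustrative.
const RULES = [
  { pattern: /\.pdf$/i, value: 'noindex, nofollow' },
  { pattern: /\.(png|jpe?g|gif)$/i, value: 'noindex' },
];

function robotsHeaderFor(path) {
  const rule = RULES.find((r) => r.pattern.test(path));
  return rule ? rule.value : null; // null: send no X-Robots-Tag header
}

// In Express middleware this might be used as:
// app.use((req, res, next) => {
//   const value = robotsHeaderFor(req.path);
//   if (value) res.set('X-Robots-Tag', value);
//   next();
// });
```

Keeping the rules in one table makes it easy to audit which resource types are excluded from indexing.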
Meta Tags Supported by Google
Last Updated: August 14, 2025

description - Provides a short summary of the page, sometimes used in search snippets.
<meta name="description" content="Brief page summary here">

robots - Controls crawling and indexing by all search engines.
<meta name="robots" content="noindex, nofollow">

googlebot - Controls crawling and indexing specifically by Googlebot (Google's crawler).
<meta name="googlebot" content="noindex">

notranslate - Prevents Google from offering automatic translation for this page.
<meta name="googlebot" content="notranslate">

nopagereadaloud - Prevents Google's text-to-speech services from reading the page aloud.
<meta name="google" content="nopagereadaloud">

google-site-verification - Verifies site ownership for Google Search Console.
<meta name="google-site-verification" content="verification_code">

Content-Type / charset - Defines character encoding and content type. UTF-8 is recommended.
<meta charset="UTF-8"> or <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

refresh - Redirects users after a specified time (not recommended; prefer server-side 301 redirects).
<meta http-equiv="refresh" content="5;url=https://example.com/">

viewport - Controls page layout on mobile devices; essential for mobile-friendliness.
<meta name="viewport" content="width=device-width, initial-scale=1">

rating - Labels adult content for SafeSearch filtering.
<meta name="rating" content="adult">

Supported HTML Tag Attributes for Indexing & Search

Important Notes

Example Meta Tag Block in <head>

<head>
<meta charset="UTF-8">
<meta name="description" content="High-quality used books for children.">
<meta name="robots" content="index, follow">
<meta name="google-site-verification" content="your_verification_code_here">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Example Books</title>
</head>
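When meta tags like the ones above are generated from dynamic data (for example, a CMS-supplied description), attribute values should be escaped so a stray quote doesn't break the tag. A small sketch; buildMetaTag is a hypothetical helper, not part of any library:

```javascript
// Sketch: serialize a <meta> tag, escaping characters that would
// break out of the double-quoted attribute values.
function buildMetaTag(name, content) {
  const esc = (s) => String(s).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  return `<meta name="${esc(name)}" content="${esc(content)}">`;
}
```

For example, buildMetaTag('robots', 'noindex, nofollow') yields the robots tag shown above, while a description containing double quotes is escaped to &quot; instead of truncating the attribute.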
Why Use Valid HTML for Page Metadata?
Last Updated: August 14, 2025

What Is Allowed Inside <head>?

According to the HTML standard, only these elements are valid inside <head>:

What to Avoid Inside <head>

Quick Tips

Summary

Allowed in <head>: title, meta, link, script, style, base, noscript, template
Not allowed in <head>: iframe, img, other invalid tags
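A build step or template linter can enforce the allowed list above mechanically. A minimal sketch; isAllowedInHead is a hypothetical helper name:

```javascript
// Sketch: check whether a tag is valid inside <head>,
// per the allowed list in the summary above.
const HEAD_ALLOWED = new Set([
  'title', 'meta', 'link', 'script', 'style', 'base', 'noscript', 'template',
]);

function isAllowedInHead(tagName) {
  return HEAD_ALLOWED.has(String(tagName).toLowerCase());
}
```

This matters for SEO because when a browser (or Googlebot) encounters an invalid element such as <img> inside <head>, it implicitly closes the head, and any metadata after that point may be ignored.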
What Is Dynamic Rendering?
Last Updated: August 14, 2025

Dynamic rendering is a workaround for these issues:

When Should You Use Dynamic Rendering?

Note: Dynamic rendering adds extra server and maintenance overhead, so it's not the preferred or long-term solution.

How Does Dynamic Rendering Work?

Is Dynamic Rendering Cloaking?

Summary

What it solves: JavaScript content not seen properly by crawlers
How it works: Serve static HTML to crawlers, JavaScript to users
When to use: Complex JavaScript, fast-changing content, crawler limitations
Downsides: Additional complexity and resources
Cloaking concerns: Only if the served content differs drastically
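The "how it works" row above, serving prerendered HTML to crawlers and the normal JavaScript app to users, usually hinges on user-agent detection. A minimal sketch assuming an Express-style server; isCrawler and the pattern list are illustrative and deliberately not exhaustive:

```javascript
// Sketch: route known crawlers to prerendered HTML, everyone else
// to the client-side JavaScript app. Patterns are illustrative only.
const CRAWLER_PATTERNS = [/googlebot/i, /bingbot/i, /duckduckbot/i];

function isCrawler(userAgent) {
  return CRAWLER_PATTERNS.some((p) => p.test(userAgent || ''));
}

// In an Express handler this might look like:
// app.get('*', (req, res) => {
//   if (isCrawler(req.get('User-Agent'))) {
//     res.send(prerenderedHtmlFor(req.path)); // static HTML snapshot
//   } else {
//     res.sendFile('index.html');             // normal client-side app
//   }
// });
```

Because both branches should render the same content, this is not cloaking; it only becomes a problem if the crawler-facing HTML differs drastically from what users see.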
Fixing JavaScript Issues That Block Google Search Visibility
Last Updated: August 14, 2025

Googlebot can execute JavaScript, but with some differences and limitations. Follow these steps to ensure your JavaScript-powered pages work well for Search:

1. Diagnose with Google Tools

Audit JavaScript errors on your site, including those Googlebot encounters. Example: log errors globally for debugging:

window.addEventListener('error', function(e) {
  console.log(`JS Error: ${e.message} at ${e.filename}:${e.lineno}:${e.colno}`);
  // Optionally send the error to a remote logging service
});

2. Prevent Soft 404s in Single-Page Apps (SPAs)

Redirect to a real 404 page with an HTTP 404 status:

fetch(`/api/items/${id}`)
  .then(res => res.json())
  .then(item => {
    if (!item.exists) {
      window.location.href = '/not-found'; // Server returns 404 status here
    }
  });

Or inject a noindex robots meta tag dynamically:

const meta = document.createElement('meta');
meta.name = 'robots';
meta.content = 'noindex';
document.head.appendChild(meta);

3. Avoid User Permission Requests Blocking Content
4. Avoid Using URL Fragments for Routing
5. Do Not Rely on Client-side Persistent Storage for Content
6. Use Content Fingerprinting to Avoid Stale Cached Resources
7. Feature Detection & Polyfills
8. Support HTTP Connections
9. Ensure Web Components Render Correctly
10. Test & Iterate

Summary Checklist:
Avoid Soft 404 Errors in Client-Side Rendered SPAs
Last Updated: August 14, 2025

Problem

In SPAs, the server can't always send meaningful HTTP status codes (like 404), which causes Google to treat an error page as a valid page, leading to soft 404 errors and ranking issues.

Solutions

JavaScript Redirect to a Server 404 Page

If the content doesn't exist, redirect the user to a real 404 page on your server that returns a proper 404 status code.

fetch(`/api/products/${productId}`)
  .then(response => response.json())
  .then(product => {
    if (product.exists) {
      showProductDetails(product);
    } else {
      window.location.href = '/not-found'; // 404 page from server
    }
  });

Inject <meta name="robots" content="noindex"> on Error Pages

If a redirect isn't feasible, add a noindex meta tag dynamically to prevent Google from indexing the error page.

fetch(`/api/products/${productId}`)
  .then(response => response.json())
  .then(product => {
    if (product.exists) {
      showProductDetails(product);
    } else {
      const metaRobots = document.createElement('meta');
      metaRobots.name = 'robots';
      metaRobots.content = 'noindex';
      document.head.appendChild(metaRobots);
    }
  });

Use the History API Instead of URL Fragments (#)

Bad practice (fragments):

<a href="#/products">Products</a>

Better practice (History API):

<a href="/products">Products</a>

Properly Inject rel="canonical" with JavaScript (If Needed)

Dynamically add one correct canonical tag only:

fetch('/api/cats/' + id)
  .then(res => res.json())
  .then(cat => {
    const linkTag = document.createElement('link');
    linkTag.setAttribute('rel', 'canonical');
    linkTag.href = `https://example.com/cats/${cat.urlFriendlyName}`;
    document.head.appendChild(linkTag);
  });

Avoid multiple or conflicting canonical tags.

Use Robots Meta Tags Carefully

Additional Best Practices for JavaScript SEO in SPAs
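The History API links above still need client-side code that intercepts navigation and renders the right view for each clean path. A minimal sketch assuming a browser environment; routeFor and the route table are hypothetical:

```javascript
// Sketch: map clean History API paths to views (no # fragments).
// routeFor and the route names are illustrative.
function routeFor(path) {
  const routes = {
    '/': 'home',
    '/products': 'product-list',
  };
  // Unknown paths fall through to a view that should be
  // noindexed or redirected to a real server 404, as described above.
  return routes[path] || 'not-found';
}

// In the browser, link clicks would be intercepted like this:
// document.addEventListener('click', (e) => {
//   const a = e.target.closest('a[href^="/"]');
//   if (!a) return;
//   e.preventDefault();
//   history.pushState({}, '', a.getAttribute('href'));
//   render(routeFor(location.pathname));
// });
// window.addEventListener('popstate', () => render(routeFor(location.pathname)));
```

Because each view lives at a real path instead of a fragment, Googlebot can crawl every route as a distinct URL.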
JavaScript SEO Basics: How Google Processes Your JavaScript Site
Last Updated: August 14, 2025

1. How Google Processes JavaScript: 3 Phases
2. Key Points About Crawling JavaScript Sites
3. Best Practices to Optimize JavaScript for SEO
4. Additional Tips
How to Remove AMP Pages from Google Search
Last Updated: August 14, 2025

Page Types Recap:

1. Remove All Versions (AMP + non-AMP)

Use this if you want to remove the entire page, including both the AMP and canonical versions:

⚠️ Caution: Users might see errors temporarily during removal.

2. Remove Only AMP Pages, Keep the Canonical Non-AMP Page Live

Use this if you want to remove AMP pages but keep your regular site live:

Tip: If you want to keep the AMP URL live, use an HTTP 301 redirect to the canonical non-AMP URL.

3. Remove AMP Content Using a CMS

Delete a Single Page (AMP + non-AMP)

Disable AMP Site-Wide

Additional Notes
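The 301-redirect tip above can be sketched as a tiny mapping in a Node/Express-style server. This is an assumption for illustration: the /amp/ URL prefix and ampToCanonical are hypothetical and should be adjusted to your site's actual URL scheme:

```javascript
// Sketch: map an AMP URL to its canonical non-AMP URL for a 301 redirect.
// Assumes AMP pages live under /amp/; adjust to your own URL scheme.
function ampToCanonical(path) {
  return path.startsWith('/amp/') ? path.slice('/amp'.length) : null;
}

// Express-style usage:
// app.get('/amp/*', (req, res) => {
//   res.redirect(301, ampToCanonical(req.path));
// });
```

A permanent (301) redirect signals that the AMP URL has moved for good, so Google consolidates signals onto the canonical page.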
Validate AMP Content for Google Search
Last Updated: August 14, 2025

1. Use the AMP Test Tool
2. Use the Rich Results Test (for structured data)
3. Monitor AMP Pages via Search Console

Fix Common AMP Errors if Your AMP Page Doesn't Appear in Google Search

Ensure Proper Linking & Canonical Tags

Make AMP Content Crawlable

Follow Structured Data Guidelines

Additional Troubleshooting
How to Enhance AMP Content for Google Search
Last Updated: August 14, 2025

1. Create a Basic AMP Page
2. Create AMP Pages Using a CMS
3. Optimize for Rich Results
4. Monitor and Improve Your AMP Pages
5. Practice with AMP Codelabs