How AMP Works in Google Search Results

Last Updated: August 14, 2025

1. Fast, Reliable Experience Through AMP
2. AMP Cache and Loading
3. AMP Pages as Rich Results
4. AMP Pages as Web Stories

What Happens After Users Click AMP Content?
Two Display Methods:
a. Google AMP Viewer
b. Signed Exchange

Additional Notes

AMP on Google Search: Key Guidelines

Last Updated: August 14, 2025

1. Follow AMP HTML Specification
2. Content Parity with Canonical Pages
3. AMP URL Structure Should Make Sense
4. Validate Your AMP Pages
5. Structured Data Compliance

Additional Notes

Mobile Site & Mobile-First Indexing Best Practices

Last Updated: August 14, 2025

What is Mobile-First Indexing?
Google primarily uses the mobile version of your site (crawled by the smartphone user-agent) to index and rank your pages, so your mobile site’s content matters most.

1. Create a Mobile-Friendly Site
Choose one of these configurations:
2. Make Content Accessible to Google
3. Structured Data & Metadata
4. Ads & Visual Content
5. Extra Tips for Separate URL Setup (m-dot sites)

Example mobile canonical tag:
<link rel="canonical" href="https://example.com/">

Example hreflang for mobile URLs:
<link rel="alternate" hreflang="es" href="https://m.example.com/es/">

6. Crawl Budget & Robots.txt

Summary

Mobile-First Indexing Troubleshooting Guide

1. Missing Structured Data
Cause: Mobile page lacks the structured data present on desktop. Fix:
2. Noindex Tag on Mobile Pages
Cause: Mobile pages blocked by a noindex meta tag. Fix:
3. Missing or Blocked Images
Cause: Important images missing or blocked by robots.txt on mobile. Fix:
4. Low Quality or Missing Alt Text for Images
Cause: Images are too small, low resolution, or missing alt text on mobile. Fix:
5. Missing Page Title or Meta Description
Cause: Mobile pages lack a title or meta description. Fix:
6. Mobile URL Is an Error Page
Cause: Mobile page returns an error while desktop serves content. Fix:
7. Mobile URL Has Anchor Fragment (#)
Cause: Mobile URLs include fragments, which Google can’t index. Fix:
8. Mobile Page Blocked by Robots.txt
Cause: Mobile pages blocked by robots.txt disallow rules. Fix:
9. Duplicate Mobile Page Target
Cause: Multiple desktop URLs redirect to the same mobile URL. Fix:
10. Desktop Redirects to Mobile Home Page
Cause: Desktop pages redirect broadly to the mobile homepage. Fix:
11. Page Quality Issues on Mobile
Cause: Ads, missing content, or poor titles on mobile. Fix:
12. Video Issues
Cause: Videos on mobile are unsupported, hard to find, or slow to load. Fix:
13. Hostload Issues
Cause: Server can’t handle the increased mobile crawl rate. Fix:
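The canonical and hreflang annotations for a separate-URL (m-dot) setup can be audited programmatically. Below is a minimal sketch using Python’s standard html.parser; the class name LinkAuditor and the sample HTML snippet are illustrative, not from the article:

```python
from html.parser import HTMLParser

class LinkAuditor(HTMLParser):
    """Collect rel="canonical" and rel="alternate" <link> tags so the
    desktop/mobile annotations of an m-dot setup can be checked."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            d = dict(attrs)
            if d.get("rel") in ("canonical", "alternate"):
                self.links.append((d["rel"], d.get("href")))

auditor = LinkAuditor()
auditor.feed(
    '<head><link rel="canonical" href="https://example.com/">'
    '<link rel="alternate" media="only screen and (max-width: 640px)" '
    'href="https://m.example.com/"></head>'
)
# auditor.links now holds both annotations for inspection
```

Running the auditor over both the desktop and mobile versions of a page makes mismatched annotations easy to spot.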

How to Fix Canonicalization Issues

Last Updated: August 13, 2025

1. Use the URL Inspection Tool
2. Common Canonicalization Issues & Fixes
a. Language Variants Without hreflang Annotations
b. Incorrect Canonical Elements
c. Server Misconfigurations
d. Malicious Hacking or Spam Injection
e. Syndicated Content
f. Copycat Websites
3. Best Practices to Prevent Canonical Issues

How to Specify a Canonical URL

Last Updated: August 13, 2025

When you have multiple URLs showing the same or very similar content, you can tell Google which URL is the “main” or canonical one by using several methods, listed from strongest to weakest signal:

1. Redirects (Strongest Signal)

2. rel="canonical" Link Annotation (Strong Signal)

<head>
  <title>Product Page</title>
  <link rel="canonical" href="https://www.example.com/dresses/green-dress" />
</head>

<link rel="alternate" media="only screen and (max-width: 640px)" href="https://m.example.com/dresses/green-dress" />

3. Sitemap Inclusion (Weak Signal)

4. Other Notes & Best Practices

Summary Table

Redirects (301)
  Strength: Strongest
  Use case: When removing duplicates
  Notes: Fastest way to consolidate URLs
rel="canonical" tag
  Strength: Strong
  Use case: Preferred method for HTML pages
  Notes: Must be in <head>; use absolute URLs
rel="canonical" header
  Strength: Strong (for non-HTML files)
  Use case: PDFs, Word docs, etc.
  Notes: Server configuration needed
Sitemap URLs
  Strength: Weak
  Use case: Large sites
  Notes: Helps but doesn’t enforce canonicals

Why Specify a Canonical URL?
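For the rel="canonical" HTTP header row in the summary table, the header a server sends for a non-HTML file looks like the string built below. A small sketch; the helper name and the PDF URL are hypothetical:

```python
def canonical_link_header(canonical_url: str) -> str:
    """Build the HTTP Link header a server can send for non-HTML files
    (PDFs, Word docs) to declare their canonical URL."""
    return f'Link: <{canonical_url}>; rel="canonical"'

# Hypothetical example: a duplicate PDF pointing at its canonical copy.
canonical_link_header("https://www.example.com/downloads/white-paper.pdf")
# 'Link: <https://www.example.com/downloads/white-paper.pdf>; rel="canonical"'
```

In practice this header is configured at the web server (e.g. in the server config), since non-HTML files have no <head> to carry a link element.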

What is Canonicalization?

Last Updated: August 13, 2025

Canonicalization is the process of choosing a single, preferred URL (called the canonical URL) among multiple URLs that show the same or very similar content. Google uses this process to avoid showing duplicate content in search results, ensuring users see the best version of a page.

Why Does Duplicate Content Happen?

Why Is Canonicalization Important?

How Google Chooses the Canonical URL
When Google indexes pages, it looks for duplicate or near-duplicate content and picks the URL that seems:

You can suggest a canonical URL using the rel="canonical" tag or by setting preferred versions in sitemaps and redirects, but Google treats it as a hint, not a rule: it may pick a different canonical URL based on its own assessment.

Example: If your site has:
Google might select https://example.com/page as canonical, but show the mobile version in search results to mobile users.

Quick FSIDM Pro Tip: Always use canonical tags on duplicate or very similar pages to guide Google’s canonicalization. It improves SEO and avoids dilution of page authority.
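To see how duplicate URLs arise in the first place (http vs. https, www vs. bare domain, trailing slashes, tracking parameters), here is a toy normalization sketch in Python. This is only an illustration of the idea of collapsing variants to one preferred form; it is not Google’s actual canonicalization algorithm, and real sites may rely on query parameters that should not be dropped:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url: str) -> str:
    """Collapse common accidental URL variants to one form (toy example).
    Forces https, strips a leading www., trims the trailing slash, and
    drops the query string and fragment entirely."""
    parts = urlsplit(url.lower())
    host = parts.netloc.removeprefix("www.")
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(("https", host, path, "", ""))

# These variants all collapse to the same form:
normalize("http://www.example.com/page/")
normalize("https://example.com/page?utm_source=mail")
# each returns "https://example.com/page"
```

A search engine does far more than this (content comparison, signals from redirects and sitemaps), but the sketch shows why one page can surface under many URLs.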

🔍 How Google Interprets robots.txt (REP)

Last Updated: August 13, 2025

Google’s crawlers follow the Robots Exclusion Protocol (REP) to check which parts of a website they can crawl.

📌 What robots.txt Does
Example:
User-agent: *
Disallow: /private/

User-agent: Googlebot
Allow: /includes/

Sitemap: https://example.com/sitemap.xml

📍 File Location Rules

📏 Google’s Key Interpretations
Example:
Disallow: /includes/
Allow: /includes/css/

⚠️ What robots.txt Does NOT Do

💡 FSIDM Quick Tip for Students & Site Owners: Think of robots.txt as a traffic cop — it directs crawlers, but it doesn’t lock doors. If you need real privacy, use authentication or noindex.

📌 Valid robots.txt URL Rules (Google’s View)
The robots.txt file only applies to the exact protocol, domain/subdomain, and port it’s hosted on.

https://example.com/robots.txt
  Valid for: https://example.com/
  Not valid for: https://other.example.com/, http://example.com/, https://example.com:8181/
https://www.example.com/robots.txt
  Valid for: https://www.example.com/
  Not valid for: https://example.com/, https://shop.www.example.com/
https://example.com/folder/robots.txt
  Not valid for any URL: crawlers don’t check subdirectories for robots.txt
https://www.exämple.com/robots.txt
  Valid for: https://www.exämple.com/, https://xn--exmple-cua.com/
  Not valid for: https://www.example.com/
ftp://example.com/robots.txt
  Valid for: ftp://example.com/
  Not valid for: https://example.com/
https://212.96.82.21/robots.txt
  Valid for: https://212.96.82.21/
  Not valid for: https://example.com/
https://example.com:443/robots.txt
  Valid for: https://example.com/, https://example.com:443/
  Not valid for: https://example.com:444/
https://example.com:8181/robots.txt
  Valid for: https://example.com:8181/
  Not valid for: https://example.com/

💡 FSIDM Tip: Each subdomain needs its own robots.txt if you want to control crawling separately.

⚡ Handling HTTP Status Codes for robots.txt
Google treats robots.txt responses differently based on status code:

2xx (Success): Reads and applies the rules normally.
3xx (Redirects): Follows up to 5 hops, then treats the file as a 404.
4xx (Client Errors): Treated as no robots.txt (full crawl allowed), except 429 (rate limit).
5xx (Server Errors): First 12 hours: stops crawling. Next 30 days: uses the cached version if available. After 30 days: if the site is available, crawls as if there were no robots.txt.
DNS/Network Errors: Treated as 5xx.

⏱ Caching Rules

📏 Format & Size Rules

💡 FSIDM Takeaway for Students & SEO Managers:
👉 Correct location, format, and encoding are just as important as the rules themselves.
👉 Always test robots.txt in Search Console after uploading to avoid indexing issues.

🛠 Robots.txt Syntax Basics
<field>:<value>   # optional comment

📌 Supported Fields (Google)
user-agent: Specifies the crawler the rules apply to. Example: User-agent: Googlebot
disallow: Path not allowed to crawl. Example: Disallow: /private/
allow: Path allowed to crawl (overrides disallow). Example: Allow: /private/public-page.html
sitemap: Location of sitemap(s). Example: Sitemap: https://example.com/sitemap.xml

❌ Fields like crawl-delay are not supported by Google.

🔍 Path Rules

🧠 User-Agent Selection Logic
Google picks the most specific group for the crawler:

User-agent: googlebot-news   # Group 1
User-agent: *                # Group 2
User-agent: googlebot        # Group 3

📌 Order in the file doesn’t matter. Google groups all relevant rules for a user agent internally.

📂 Grouping Rules
Multiple user-agents can share rules:

User-agent: e
User-agent: f
Disallow: /g

→ Both e and f follow the /g restriction.

📜 Example of Correct Syntax

# Block all bots from /private/
User-agent: *
Disallow: /private/

# Allow Googlebot access to /private/reports/
User-agent: Googlebot
Allow: /private/reports/

# Add sitemap location
Sitemap: https://example.com/sitemap.xml

💡 FSIDM Practical Tip for Students

🚦 URL Matching Based on Path Values in robots.txt
Google uses the path part of a URL (after the domain name) to decide whether a robots.txt rule applies. It compares this path to the allow and disallow rules.

🎯 Key Wildcards Supported:
*  Matches 0 or more characters. Example: /fish* matches /fish.html, /fishheads, etc.
$  Matches the end of the URL. Example: /*.php$ matches /index.php but not /index.php?x=1

📌 Examples of Matching Rules
/
  Matches: the root and everything below it (the whole site)
/fish
  Matches: /fish, /fish.html, /fish/salmon.html, /fish.php?id=anything
  Doesn’t match: /Fish.asp (case-sensitive), /catfish, /desert/fish
/fish/
  Matches: anything inside the /fish/ folder, e.g. /fish/salmon.htm, /fish/?id=anything
  Doesn’t match: /fish (without the slash), /fish.html
/*.php
  Matches: any URL containing .php, e.g. /index.php, /folder/filename.php?params
  Doesn’t match: /windows.PHP (case-sensitive)
/*.php$
  Matches: URLs ending exactly with .php, e.g. /file.php, /folder/file.php
  Doesn’t match: /file.php5, /file.php?param
/fish*.php
  Matches: URLs containing /fish followed by .php somewhere, e.g. /fish.php, /fishheads/catfish.php
  Doesn’t match: /Fish.PHP (case-sensitive)

⚖️ Order of Precedence — Which Rule Wins?

🔥 Real-World Examples
https://example.com/page
  Rules: allow: /p, disallow: /
  Applied: allow: /p (/p is more specific than /)
https://example.com/folder/page
  Rules: allow: /folder, disallow: /folder
  Applied: allow: /folder (in a conflict, Google picks the least restrictive rule)
https://example.com/page.htm
  Rules: allow: /page, disallow: /*.htm
  Applied: disallow: /*.htm (the longer, more specific disallow rule applies)
https://example.com/page.php5
  Rules: allow: /page, disallow: /*.ph
  Applied: allow: /page (the least restrictive rule wins)
https://example.com/
  Rules: allow: /$, disallow: /
  Applied: allow: /$ ($ means the exact root, which is more specific)
https://example.com/page.htm
  Rules: allow: /$, disallow: /
  Applied: disallow: / (allow: /$ only matches the root URL, not /page.htm)

💡 FSIDM Pro Tip
When writing rules:
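The precedence behavior described above (the longest matching rule wins, a tie goes to the least restrictive rule, and * and $ act as wildcards) can be sketched in a few lines of Python. This is an illustrative model of Google’s documented behavior, not Google’s actual code:

```python
import re

def _to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt path rule into a regex: * matches any
    run of characters, $ anchors the end of the URL path."""
    out = ""
    for ch in rule:
        if ch == "*":
            out += ".*"
        elif ch == "$":
            out += "$"
        else:
            out += re.escape(ch)
    return re.compile(out)  # .match() anchors at the start of the path

def is_allowed(path: str, allows: list, disallows: list) -> bool:
    """Longest matching rule wins; a tie goes to allow (least restrictive).
    No matching rule at all means the path is crawlable."""
    verdict, best_len = "allow", -1
    for kind, rules in (("allow", allows), ("disallow", disallows)):
        for rule in rules:
            if _to_regex(rule).match(path) and (
                len(rule) > best_len
                or (len(rule) == best_len and kind == "allow")
            ):
                verdict, best_len = kind, len(rule)
    return verdict == "allow"

is_allowed("/page", ["/p"], ["/"])              # True: /p is more specific
is_allowed("/page.htm", ["/page"], ["/*.htm"])  # False: longer disallow wins
is_allowed("/page.php5", ["/page"], ["/*.ph"])  # True: tie goes to allow
is_allowed("/page.htm", ["/$"], ["/"])          # False: /$ matches only the root
```

The comments mirror the real-world examples in the table above, which makes the sketch handy for sanity-checking a rule set before deploying it.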

🔄 How to Update Your robots.txt File

Last Updated: August 13, 2025

Sometimes you need to change your robots.txt — maybe to unblock a page for SEO or block unwanted crawling. Here’s the step-by-step.

1️⃣ Download Your Current robots.txt
You have a few options. Using cURL (technical option):

curl https://yourdomain.com/robots.txt -o robots.txt

2️⃣ Edit Your robots.txt
Make changes using correct syntax:

User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://yourdomain.com/sitemap.xml

3️⃣ Upload the Updated File

4️⃣ Refresh Google’s Cache

⚡ FSIDM Quick Tips:
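Before uploading the edited file (step 3), a quick local syntax check can catch typos. Here is a minimal linter sketch in Python; the function lint_robots and its checks are my own illustration, covering only the four fields Google supports:

```python
def lint_robots(text: str):
    """Flag lines that are not comments, blank lines, or one of the
    four fields Google supports (user-agent, allow, disallow, sitemap).
    Returns a list of (line_number, problem) pairs."""
    supported = {"user-agent", "allow", "disallow", "sitemap"}
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field = line.split(":", 1)[0].strip().lower()
        if field not in supported:
            problems.append((number, f"unsupported field: {field}"))
    return problems

lint_robots("User-agent: *\nDisallow: /private/\nCrawl-delay: 10")
# [(3, 'unsupported field: crawl-delay')]
```

The example output flags crawl-delay, which Google ignores, so it can be removed before upload rather than silently doing nothing.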

🛠 How to Write & Submit a robots.txt File (Simple FSIDM Guide)

Last Updated: August 13, 2025

A robots.txt file tells search engine crawlers which parts of your site they can or cannot access. Think of it like a traffic signal for Googlebot — not a security lock.

📍 Where to Place the robots.txt File

🛠 Basic Structure of a robots.txt File
A robots.txt file is just plain text. Here’s a simple example:

User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml

🔍 What this means:

📝 Step-by-Step: Create a robots.txt File
1️⃣ Create the File
2️⃣ Write Rules
Each “rule set” includes:
3️⃣ Upload the File
4️⃣ Test the File

📌 Common robots.txt Rules

Block the entire site:
User-agent: *
Disallow: /

Allow only the public folder:
User-agent: *
Disallow: /
Allow: /public/

Block a specific file:
User-agent: *
Disallow: /private-file.html

Block all images in Google Images:
User-agent: Googlebot-Image
Disallow: /

Block a specific file type (e.g., .xls):
User-agent: Googlebot
Disallow: /*.xls$

⚠️ Important Notes

💡 FSIDM Tip: A well-optimized robots.txt protects server resources and guides Google to focus on valuable pages. Pair it with a sitemap for best crawling efficiency.
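Rules like “allow only the public folder” can be tested locally with Python’s standard urllib.robotparser before anything is uploaded. One caveat: the stdlib matcher uses simple first-match semantics rather than Google’s longest-match rule, so the Allow line is listed first here to get the intended result:

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# parse() accepts the rules as a list of lines, so no fetching is needed.
rp.parse("""\
User-agent: *
Allow: /public/
Disallow: /
""".splitlines())

rp.can_fetch("*", "https://www.example.com/public/page.html")   # True
rp.can_fetch("*", "https://www.example.com/private/page.html")  # False
```

For wildcard rules such as /*.xls$, which the stdlib parser does not interpret, Search Console’s robots.txt tester is the safer check.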

🛠 Introduction to robots.txt (Simple & Practical Guide)

Last Updated: August 13, 2025

robots.txt is like a “Do’s & Don’ts” sign for search engine crawlers. It tells them which parts of your site they can or cannot access.

⚠️ Important: robots.txt controls crawling, not indexing.

📌 What is robots.txt Used For?
A robots.txt file mainly helps you:

🔍 Effect on Different File Types
Web pages (HTML, PDF): Stops crawling, but the URL may still appear in search if other sites link to it.
Media files (images, videos, audio): Can block them from appearing in Google Search results, but doesn’t stop direct linking.
Resource files (CSS, JS, images): You can block unimportant ones to save bandwidth, but don’t block essential resources needed to render or understand your page.

⚠️ Limitations of robots.txt

📝 Example robots.txt
User-agent: *
Disallow: /private/
Allow: /public/

✅ Best Practices

💡 FSIDM Tip: Think of robots.txt as a “polite request” to crawlers, not a locked door. If you want something truly hidden from search, lock it properly (password protection or noindex).

© 2025 Powered by USSOL DIGIGROWTH (OPC) PRIVATE LIMITED & Partner with Unity Sangam