Knowledge Panel Archive - FSIDM (Full Stack Institute of Digital Marketing)


How to Verify if a Crawler Is Really Googlebot (Or Other Google Crawlers)

Last Updated: July 30, 2025

Spammers often spoof Googlebot’s user-agent to disguise themselves. So, verifying the crawler’s identity by IP is important to avoid fake bots accessing your site.

3 Types of Google Crawlers & Their IP Patterns

| Type | Description | Reverse DNS Mask Examples | IP List Reference Files |
| Common Crawlers | Googlebot and other main Google crawlers. They obey robots.txt rules. | crawl-***-***-***-***.googlebot.com, geo-crawl-***-***-***-***.geo.googlebot.com | googlebot.json |
| Special-Case Crawlers | Used for specific Google products like AdsBot; may or may not obey robots.txt. | rate-limited-proxy-***-***-***-***.google.com | special-crawlers.json |
| User-Triggered Fetchers | Initiated by user actions, e.g., Google Site Verifier, Google Cloud Platform fetches. | ***-***-***-***.gae.googleusercontent.com, google-proxy-***-***-***-***.google.com | user-triggered-fetchers.json, user-triggered-fetchers-google.json |

How to Verify Googlebot Manually (Command Line)

    host <IP-address>
    host <domain-name>

Example Walkthrough

Say crawler IP is 66.249.66.1

    host 66.249.66.1
    Output: 1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

    host crawl-66-249-66-1.googlebot.com
    Output: crawl-66-249-66-1.googlebot.com has address 66.249.66.1

If both checks match, it’s a verified Googlebot.

Why Verify Googlebot?

Pro Tip for FSIDM Students

Automate this check for bigger sites with tools or scripts that compare incoming crawler IPs with official Google IP ranges (from googlebot.json and other published IP lists). For most small sites, manual or occasional checks suffice.
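The two `host` lookups above can be scripted. The following is a minimal Python sketch, not an official tool: the function name `is_google_crawler` and the suffix list are assumptions derived from the reverse DNS masks in the table, and the default resolvers use the standard `socket` module (IPv4 only).

```python
import socket
from typing import Callable, List

# Hostname suffixes taken from the reverse DNS masks listed in this article.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def _reverse(ip: str) -> str:
    """Reverse DNS: IP address -> hostname (like `host <IP-address>`)."""
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    """Forward DNS: hostname -> IPv4 addresses (like `host <domain-name>`)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_google_crawler(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True only if reverse and forward DNS agree and point at Google."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False  # PTR record is not in a Google-owned domain
        return ip in forward(hostname)  # forward lookup must map back to the IP
    except OSError:
        # socket.herror / socket.gaierror (lookup failures) are OSError subclasses
        return False
```

The resolvers are injectable so the logic can be exercised without network access; in production you would call `is_google_crawler(request_ip)` and cache the result, since DNS lookups on every request are slow.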

Google Crawl Rate – What It Is, How to Manage & Emergency Fixes

Last Updated: July 30, 2025

📌 What is Crawl Rate?

⚠️ Common Reasons for Crawl Spikes

👉 Fix: Review site structure and use robots.txt or noindex. (Search Console’s URL Parameters tool was retired in 2022, so parameter handling now has to be solved on the site itself.)

🚑 Emergency Crawl Rate Reduction

If Googlebot is causing server strain, temporarily return an HTTP 500, 503, or 429 status code to crawler requests instead of 200.

✅ Google will slow down crawling automatically.
⚠️ Do NOT keep this for more than 1–2 days (it can hurt indexing & rankings).

📤 Permanent / Special Request

If you cannot return error codes:

📌 FSIDM Tip for Students
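One way to picture the emergency response above: while the server is overloaded, answer crawler requests with a 503 (the status-code table later in this archive notes that 503 and 429 make Googlebot slow down). The sketch below is framework-neutral Python; the function name `crawl_throttle_response`, the `overloaded` flag, and the use of a `Retry-After` header are illustrative assumptions, not something Google prescribes.

```python
from typing import Dict, Tuple


def crawl_throttle_response(
    user_agent: str, overloaded: bool, retry_after_s: int = 3600
) -> Tuple[int, Dict[str, str]]:
    """Pick the status code and extra headers for an incoming request.

    Only requests that claim to be Googlebot are throttled, and only while
    the (hypothetical) `overloaded` flag is set by your monitoring.
    """
    claims_googlebot = "googlebot" in user_agent.lower()
    if overloaded and claims_googlebot:
        # 503 signals temporary overload; Retry-After is a standard HTTP
        # header, included here as a courtesy hint (assumption).
        return 503, {"Retry-After": str(retry_after_s)}
    return 200, {}
```

Wire the flag to something that resets itself (for example a load-average check), so the 503s stop as soon as the server recovers and never run past the 1–2 day window.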

Googlebot & Related Crawlers Explained – Types, Behavior & SEO Control (2025 Guide)

Last Updated: July 30, 2025

📌 Googlebot – Core Crawler for Google Search

🛠 Types of Googlebot

👉 Note: Both types (Googlebot Smartphone and Googlebot Desktop) share the same User-agent: Googlebot in robots.txt, so you cannot block one without blocking the other.

🌐 How Googlebot Crawls Your Site

🚫 Blocking Googlebot (Crawl vs Index)

✅ Verifying Googlebot (Avoid Fake Crawlers)

💡 Quick FSIDM Tip for Students: If your site shows crawling overload or errors in Search Console, the cause is often not an aggressive Googlebot but fake bots spoofing the Googlebot UA. Always verify before blocking.

📖 What is Google Read Aloud?

🔍 Crawl Frequency & Behavior

🚫 How to Block or Control It

Block completely:

    <meta name="google" content="nopagereadaloud">

Paywalled or subscription content: Use structured data to mark restricted content:

    "isAccessibleForFree": false

📜 Old vs New User Agent

💡 FSIDM Pro Tip for Students: If you run a membership site, premium articles, or gated content, always use nopagereadaloud or mark isAccessibleForFree: false in structured data; otherwise, Google’s TTS may read it aloud to users for free.

🌐 What is APIs-Google?

📡 How APIs-Google Accesses Your Site

⚙️ How to Prepare Your Site

🚫 How to Block APIs-Google

Robots.txt:

    User-agent: APIs-Google
    Disallow: /

✅ Verifying APIs-Google Requests

💡 FSIDM Tip for Developers: If APIs-Google traffic is hitting your site too often, it’s usually due to:

📌 What is Feedfetcher?

⚡ How Feedfetcher Works

📈 Frequency of Retrieval

🚫 Blocking Feedfetcher

Since robots.txt doesn’t work:

🔍 Why It Might Fetch “Odd” or “Secret” URLs

📊 Technical Details

💡 FSIDM Tip for Students/Marketers: If you run a podcast, news site, or blog:
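Before deploying a robots.txt rule such as the APIs-Google block above, it can help to check which user agents it actually affects. Here is a small sketch using Python's standard `urllib.robotparser`; the helper name `allowed` and the example URL are illustrative. Note that this parser does plain prefix matching and does not implement Google's `*` and `$` path wildcards, so it is only suitable for simple rules like this one.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: APIs-Google
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network fetch
    return parser.can_fetch(user_agent, url)


# APIs-Google is blocked everywhere; Googlebot has no matching group, so it is allowed.
print(allowed(ROBOTS_TXT, "APIs-Google", "https://example.com/page"))
print(allowed(ROBOTS_TXT, "Googlebot", "https://example.com/page"))
```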

Types of Google Crawlers & Fetchers

Last Updated: July 30, 2025

Google uses three main types:
1. Common Crawlers
2. Special-Case Crawlers
3. User-Triggered Fetchers

2️⃣ Technical Properties

Distributed Crawling
Protocols Supported
Compression Supported (Specified in Accept-Encoding header.)

3️⃣ Crawl Rate & Host Load

4️⃣ HTTP Caching Support

Google crawlers support caching using:

💡 Tip:

5️⃣ Key Best Practices

✅ Use correct robots.txt for controlling crawl.
✅ Implement ETag or Last-Modified for efficient recrawls.
✅ Ensure server handles HTTP/2 (unless opting out).
✅ Compress responses (gzip, br) to save resources.
✅ Monitor crawl activity in Search Console → Crawl Stats.

📌 Google’s Common Crawlers (Reference Table)

| Crawler Name | User Agent (Example) | Robots.txt Token | Affected Products |
| Googlebot Smartphone | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X…) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | Googlebot | Google Search (Mobile), Discover, Images, Video, News |
| Googlebot Desktop | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) | Googlebot | Google Search (Desktop), Discover, Images, Video, News |
| Googlebot Image | Googlebot-Image/1.0 | Googlebot-Image | Google Images, Search features with images/logos/favicons |
| Googlebot Video | Googlebot-Video/1.0 | Googlebot-Video | Video features in Google Search, video indexing |
| Googlebot News | Uses Googlebot UA strings | Googlebot-News | Google News, news.google.com, Google News App |
| Google StoreBot | Mozilla/5.0 (X11; Linux x86_64; Storebot-Google/1.0) Chrome/W.X.Y.Z Safari/537.36 | Storebot-Google | Google Shopping (Shopping tab, Shopping surfaces) |
| Google-InspectionTool | Mozilla/5.0 (compatible; Google-InspectionTool/1.0;) | Google-InspectionTool | Search Console tools (URL Inspection, Rich Result Test) |
| GoogleOther | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X…) (compatible; GoogleOther) | GoogleOther | Generic fetcher for internal research; not used for Search |
| GoogleOther-Image | GoogleOther-Image/1.0 | GoogleOther-Image | Fetching publicly accessible images (non-Search) |
| GoogleOther-Video | GoogleOther-Video/1.0 | GoogleOther-Video | Fetching publicly accessible videos (non-Search) |
| Google-CloudVertexBot | Contains Google-CloudVertexBot in UA | Google-CloudVertexBot | Vertex AI Agents (site-owner requested crawls) |
| Google-Extended | Uses existing Google UA; token used for permissions | Google-Extended | Controls if site content can be used to train Gemini models |

Key Notes:

📌 Google’s Special-Case Crawlers

| Crawler Name | User Agent (Example) | Robots.txt Token | Notes / Products Affected |
| APIs-Google | APIs-Google (+https://developers.google.com/webmasters/APIs-Google.html) | APIs-Google | Push notification delivery via Google APIs (Ignores *) |
| AdsBot Mobile Web | Mozilla/5.0 (… Mobile Safari/537.36) (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html) | AdsBot-Google-Mobile | Google Ads ad quality checks for mobile pages (Ignores *) |
| AdsBot | AdsBot-Google (+http://www.google.com/adsbot.html) | AdsBot-Google | Google Ads ad quality checks (Ignores *) |
| AdSense | Mediapartners-Google | Mediapartners-Google | Google AdSense crawler to deliver relevant ads (Ignores *) |
| Google-Safety | Google-Safety | (Ignores robots.txt) | Malware/abuse discovery for links on Google properties |
| (Retired) AdsBot Mobile Web (iPhone) | Mozilla/5.0 (iPhone; CPU iPhone OS…) (compatible; AdsBot-Google-Mobile…) | AdsBot-Google-Mobile | Used for iPhone ad quality checks (retired) |
| (Retired) Duplex on the Web | Mozilla/5.0 (Linux; Android 11; Pixel 2; DuplexWeb-Google/1.0) | DuplexWeb-Google | Supported “Duplex on the Web” service (retired) |
| (Retired) Google Favicon | Mozilla/5.0 (X11; Linux x86_64) … Google Favicon | Googlebot-Image | Favicon fetching (retired; handled by Googlebot-Image) |
| (Retired) Mobile Apps Android | AdsBot-Google-Mobile-Apps | AdsBot-Google-Mobile-Apps | Checked Android app page ad quality (retired) |
| (Retired) Web Light | Mozilla/5.0 (… googleweblight) Chrome/… Mobile Safari/… | googleweblight | Served lightweight pages under slow network (retired) |

Key Points for FSIDM Students:

📌 Google User-Triggered Fetchers

| Fetcher Name | User Agent (Example) | Purpose / Product |
| Feedfetcher | FeedFetcher-Google; (+http://www.google.com/feedfetcher.html) | Crawls RSS/Atom feeds for Google News & PubSubHubbub |
| Google Publisher Center | GoogleProducer; (+https://developers.google.com/search/docs/crawling-indexing/google-producer) | Fetches publisher-supplied feeds for Google News landing pages |
| Google Read Aloud | Mobile: Mozilla/5.0 (Linux; Android 10; K) … (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943); Desktop: Mozilla/5.0 (X11; Linux x86_64) … (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943); Former: google-speakr | On user request, fetches and reads webpages aloud using TTS |
| Google Site Verifier | Mozilla/5.0 (compatible; Google-Site-Verification/1.0) | Fetches Search Console verification tokens |

Key Notes for FSIDM Students
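When reading server logs, the identifiers in the tables above can be used to label which Google crawler a request claims to be. The sketch below is a simple substring matcher in Python; the function name `claimed_google_crawler` is hypothetical, and the list only covers identifiers that appear in the user-agent examples above. A match says nothing about authenticity, since user agents are easily spoofed, so pair it with the DNS or IP verification described earlier.

```python
from typing import Optional

# Identifiers that appear inside the user-agent strings in the tables above.
# Googlebot-News and Google-Extended are omitted: they have no UA of their own.
IDENTIFIERS = [
    "Googlebot-Image", "Googlebot-Video", "Googlebot", "Storebot-Google",
    "Google-InspectionTool", "GoogleOther-Image", "GoogleOther-Video",
    "GoogleOther", "Google-CloudVertexBot", "APIs-Google",
    "AdsBot-Google-Mobile", "AdsBot-Google", "Mediapartners-Google",
    "Google-Safety", "FeedFetcher-Google", "GoogleProducer",
    "Google-Read-Aloud", "Google-Site-Verification",
]

# Longest first, so "Googlebot-Image" wins over "Googlebot", and so on.
_BY_LENGTH = sorted(IDENTIFIERS, key=len, reverse=True)


def claimed_google_crawler(user_agent: str) -> Optional[str]:
    """Return the Google crawler identifier a UA string claims, or None."""
    ua = user_agent.lower()
    for identifier in _BY_LENGTH:
        if identifier.lower() in ua:
            return identifier
    return None
```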

How to Request Google to Recrawl Your URLs

Last Updated: July 30, 2025

1. Using URL Inspection Tool (For a Few URLs)

2. Submit or Resubmit a Sitemap (For Many URLs)

Important Points

What’s the Issue with Faceted Navigation URLs?

Faceted navigation lets users filter items (products, articles, events) by parameters in the URL query string, like:

    https://example.com/items.shtm?products=fish&color=radioactive_green&size=tiny

The problem: Many filter combinations generate tons of URLs, which can lead to:

How to Manage Faceted Navigation URLs

1. Prevent Crawling of Faceted URLs (If You Don’t Need Them Indexed)

Use robots.txt to block crawling of URLs with specific query parameters:

    User-agent: Googlebot
    Disallow: /*?*products=
    Disallow: /*?*color=
    Disallow: /*?*size=
    Allow: /*?products=all$

Note: rel="canonical" and rel="nofollow" are less effective at saving crawl budget than robots.txt or URL fragments.

2. If You Need Faceted URLs to Be Crawled and Indexed (Use Best Practices)

Summary Tips:

Large Site Owner’s Guide to Managing Crawl Budget: Key Points

Who Should Read This?

If your site is smaller or pages get crawled quickly after publishing, this guide is not essential.

What Is Crawl Budget?

Crawl Budget = How many pages Googlebot can and wants to crawl on your site within a given time.

It’s controlled by two main factors:

How to Increase Your Crawl Budget?

Google allocates crawl budget based on:

Important Tips for Large Sites

What If Your Pages Aren’t Indexed?

If pages have been around but never indexed, check their status with the URL Inspection tool rather than relying on crawl budget changes.

Best Practices to Maximize Google Crawling Efficiency

1. Manage Your URL Inventory
2. Block Unwanted URLs with Robots.txt
3. Handle Removed Pages Properly
4. Keep Sitemaps Fresh and Relevant
5. Avoid Long Redirect Chains
6. Make Pages Fast and Efficient to Load
7. Use HTTP Caching Headers
8. Monitor Crawl Activity and Site Availability
9. Help Google Discover Important Content
10. Avoid Over-Exposing Low-Value URLs
11. Don’ts
12. Handling Overcrawling Emergencies

How HTTP Status Codes, Network, and DNS Errors Affect Google Search

What are HTTP Status Codes?

HTTP Status Code Categories & Impact on Google Search

| Status Code Range | Meaning | Google Search Impact |
| 2xx (Success) | Request succeeded; page delivered | Page content can be indexed (but 2xx doesn’t guarantee indexing). |
| 3xx (Redirects) | Page moved or redirecting | Google follows redirect to new URL; if redirect fails, Search Console shows errors. |
| 4xx (Client errors) | Page not found, forbidden, etc. | Pages with 4xx errors aren’t indexed; Google reports these as errors. |
| 5xx (Server errors) | Server failed to respond properly | Crawling is delayed; Google may reduce crawl rate; pages won’t be indexed until fixed. |

Most Important Status Codes to Know

Network and DNS Errors

Key Takeaways

| Status Code | Meaning | How Google Handles It |
| 2xx (Success) | | Content considered for indexing, but indexing is not guaranteed. |
| 200 | Success | Content passed to indexing pipeline. |
| 201, 202 | Created, Accepted | Googlebot waits briefly for content, then passes what it has to indexing. |
| 204 | No Content | Signals no content; may show soft 404 in Search Console. |
| 3xx (Redirects) | | Googlebot follows up to 10 redirects; final URL content is indexed, intermediate redirect content ignored. |
| 301 | Moved Permanently | Strong signal that target URL is canonical. |
| 302, 307 | Temporary Redirect | Weak signal that target URL is canonical. |
| 303 | See Other | Treated like 302. |
| 304 | Not Modified | Signals content unchanged since last crawl; no impact on indexing. |
| 308 | Permanent Redirect | Treated like 301. |
| 4xx (Client Errors) | | URLs returning 4xx are not indexed; previously indexed URLs are removed from index. Content ignored by Googlebot. |
| 400 | Bad Request | Signals content doesn’t exist; URL removed from index if previously indexed; crawling frequency reduces gradually. |
| 401 | Unauthorized | Treated like other 4xx; no effect on crawl rate. |
| 403 | Forbidden | Same as 401. |
| 404 | Not Found | Same as 400. |
| 410 | Gone | Same as 400; stronger signal to remove URL from index faster. |
| 411 | Length Required | Treated like 400. |
| 429 | Too Many Requests | Treated as a server error; signals server overload; Googlebot slows crawl. |
| 5xx (Server Errors) | | Googlebot slows crawl rate; content ignored; URLs persistently failing are eventually dropped from index. |
| 500 | Internal Server Error | Crawl rate decreased proportionally to number of errors. |
| 502 | Bad Gateway | Same as 500. |
| 503 | Service Unavailable | Same as 500; signals temporary server overload. |

Soft 404 Errors

What is a Soft 404?

A page that shows a “not found” or error message but returns a 200 OK HTTP status code instead of a 404 or 410. Sometimes it’s an empty page or one missing content due to backend issues.

Why it’s bad:

How to Fix Soft 404 Errors:

Network and DNS Errors

Impact on Googlebot:

How to Debug Network Errors

How to Debug DNS Errors

Verify DNS records with dig or similar tools:

    dig +nocmd example.com a +noall +answer
    dig +nocmd www.example.com cname +noall +answer
    dig +nocmd example.com ns +noall +answer
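The status-code tables in this section reduce to a handful of rules, which can be handy when auditing server logs. Below is a Python sketch of that mapping; the function name `google_handling` and the category labels are made up for illustration and simply restate the table, with 429 grouped with server errors as the table describes.

```python
def google_handling(status: int) -> str:
    """Map an HTTP status code to the handling described in the table above."""
    if status == 429:
        return "slow-crawl"        # treated as a server error, not as a 4xx
    if status == 204:
        return "soft-404-risk"     # no content; may be reported as a soft 404
    if 200 <= status < 300:
        return "index-candidate"   # considered for indexing, never guaranteed
    if status == 304:
        return "unchanged"         # content unchanged since the last crawl
    if 300 <= status < 400:
        return "follow-redirect"   # Googlebot follows up to 10 hops
    if 400 <= status < 500:
        return "not-indexed"       # dropped from the index if previously indexed
    if 500 <= status < 600:
        return "slow-crawl"        # crawl rate reduced while errors persist
    return "unknown"


for code in (200, 301, 404, 429, 503):
    print(code, google_handling(code))
```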

Image, News & Video Sitemaps Guide – Structure, Best Practices & XML Examples

Last Updated: July 30, 2025

Image Sitemaps: What & How

What are Image Sitemaps?

Why Use Image Sitemaps?

Basic Structure of an Image Sitemap (XML)

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
      <url>
        <loc>https://example.com/sample1.html</loc>
        <image:image>
          <image:loc>https://example.com/image.jpg</image:loc>
        </image:image>
        <image:image>
          <image:loc>https://example.com/photo.jpg</image:loc>
        </image:image>
      </url>
      <url>
        <loc>https://example.com/sample2.html</loc>
        <image:image>
          <image:loc>https://example.com/picture.jpg</image:loc>
        </image:image>
      </url>
    </urlset>

Required Tags

Important Notes

News Sitemaps: What & How

What are News Sitemaps?

Why Use News Sitemaps?

Best Practices

Basic Structure of a News Sitemap (XML)

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
      <url>
        <loc>http://www.example.org/business/article55.html</loc>
        <news:news>
          <news:publication>
            <news:name>The Example Times</news:name>
            <news:language>en</news:language>
          </news:publication>
          <news:publication_date>2008-12-23</news:publication_date>
          <news:title>Companies A, B in Merger Talks</news:title>
        </news:news>
      </url>
    </urlset>

Required Tags Explained

Extra Tips

Video Sitemaps & Alternatives

What is a Video Sitemap?

A video sitemap is a sitemap that includes extra metadata about videos on your site. It helps Google discover and understand your video content more effectively, especially newly added or hard-to-find videos.

Why Use Video Sitemaps?

Alternatives to Video Sitemaps

Key Video Sitemap Best Practices

Basic Video Sitemap XML Example

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
      <url>
        <loc>https://www.example.com/videos/some_video_landing_page.html</loc>
        <video:video>
          <video:thumbnail_loc>https://www.example.com/thumbs/123.jpg</video:thumbnail_loc>
          <video:title>Grilling steaks for summer</video:title>
          <video:description>Alkis shows you how to get perfectly done steaks every time</video:description>
          <video:content_loc>http://streamserver.example.com/video123.mp4</video:content_loc>
          <video:player_loc>https://www.example.com/videoplayer.php?video=123</video:player_loc>
          <video:duration>600</video:duration>
          <video:publication_date>2007-11-05T19:20:30+08:00</video:publication_date>
          <video:family_friendly>yes</video:family_friendly>
          <video:live>no</video:live>
        </video:video>
      </url>
    </urlset>

Embedding Videos from Platforms

    <video:player_loc>https://player.vimeo.com/video/987654321</video:player_loc>
    <video:player_loc>https://www.youtube.com/embed/1a2b3c4d</video:player_loc>

Video Sitemap Tags Reference (Namespace: http://www.google.com/schemas/sitemap-video/1.1)

Each video on a page must be enclosed in its own <video:video> tag nested inside the page’s <url> tag.

Required Tags for Each Video

Summary: At minimum, each <video:video> must contain:

    <video:video>
      <video:thumbnail_loc>THUMBNAIL_URL</video:thumbnail_loc>
      <video:title><![CDATA[Video Title]]></video:title>
      <video:description><![CDATA[Video description here…]]></video:description>
      <video:content_loc>VIDEO_FILE_URL</video:content_loc>
      <!-- OR -->
      <video:player_loc>VIDEO_PLAYER_URL</video:player_loc>
    </video:video>

Optional Video Sitemap Tags Explained

| Tag Name | Purpose | Notes / Values |
| <video:duration> | Length of the video in seconds | Integer from 1 to 28,800 (8 hours max) |
| <video:expiration_date> | When the video expires and should no longer appear in search | W3C date format: YYYY-MM-DD or full datetime with timezone (e.g., 2022-07-16T19:20:30+08:00) |
| <video:rating> | Video rating (quality/popularity) | Float between 0.0 (lowest) and 5.0 (highest) |
| <video:view_count> | Number of times the video has been viewed | Integer (e.g., 12345) |
| <video:publication_date> | Date the video was first published | W3C date format, same as expiration_date |
| <video:family_friendly> | Whether video is suitable for SafeSearch | yes = visible with SafeSearch on; no = visible only if SafeSearch is off |
| <video:restriction> | Control video visibility by country | Requires relationship attribute: allow or deny. Use ISO country codes (e.g., US CA). Example: <video:restriction relationship="allow">CA MX</video:restriction> |
| <video:platform> | Control video visibility by device/platform | Requires relationship attribute: allow or deny. Values: web, mobile, tv. Example: <video:platform relationship="allow">web tv</video:platform> |
| <video:requires_subscription> | Whether a subscription is needed to view | Values: yes or no |
| <video:uploader> | Name of the video uploader | Max 255 chars; optional info attribute for uploader info URL on the same domain |
| <video:live> | Indicates if the video is a livestream | Values: yes or no |
| <video:tag> | Descriptive tags related to the video content | Multiple tags allowed (max 32); use separate <video:tag> per tag |

Example snippet using some optional tags:

    <video:video>
      <video:thumbnail_loc>https://example.com/thumb.jpg</video:thumbnail_loc>
      <video:title><![CDATA[How to Grill Perfect Steaks]]></video:title>
      <video:description><![CDATA[Step-by-step guide to grilling juicy steaks.]]></video:description>
      <video:content_loc>https://example.com/video.mp4</video:content_loc>
      <video:duration>600</video:duration>
      <video:publication_date>2024-07-20T10:00:00+05:30</video:publication_date>
      <video:family_friendly>yes</video:family_friendly>
      <video:restriction relationship="allow">IN US CA</video:restriction>
      <video:platform relationship="deny">mobile</video:platform>
      <video:requires_subscription>no</video:requires_subscription>
      <video:uploader info="https://example.com/uploader-profile">GrillMaster</video:uploader>
      <video:live>no</video:live>
      <video:tag>grilling</video:tag>
      <video:tag>steak</video:tag>
      <video:tag>outdoor</video:tag>
    </video:video>

Deprecated Video Sitemap Tags and Attributes

Google removed these from their video sitemap specification:

Sitemap Alternative: mRSS Feeds

Google supports mRSS (Media RSS) as an alternative or complement to video sitemaps. mRSS is an extension of RSS 2.0 designed specifically for multimedia content.

Basic Structure of an mRSS Feed with Video Example

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"
         xmlns:dcterms="http://purl.org/dc/terms/">
      <channel>
        <title>Example MRSS</title>
        <link>https://www.example.com/examples/mrss/</link>
        <description>MRSS Example</description>
        <item>
          <link>https://www.example.com/examples/mrss/example.html</link>
          <media:content url="https://www.example.com/examples/mrss/example.flv"
                         fileSize="405321" type="video/x-flv" height="240" width="320"
                         duration="120" medium="video" isDefault="true">
            <media:player url="https://www.example.com/shows/example/video.swf?flash_params" />
            <media:title>Grilling Steaks for Summer</media:title>
            <media:description>Get perfectly done steaks every time</media:description>
            <media:thumbnail url="https://www.example.com/examples/mrss/example.png" height="120" width="160"/>
            <media:price price="19.99" currency="EUR" />
            <media:price type="subscription" />
          </media:content>
          <media:restriction relationship="allow" type="country">us ca</media:restriction>
          <dcterms:valid>end=2020-10-15T00:00+01:00; scheme=W3C-DTF</dcterms:valid>
          <dcterms:type>live-video</dcterms:type>
        </item>
      </channel>
    </rss>

Required mRSS Tags for Google

| Tag | Purpose | Notes |
| <media:content> | Encapsulates video info and URL | medium="video"; direct video URL in url attribute or <media:player> required |
| <media:player> | URL of the video player | Must differ from <link> URL (which is page URL) |
| <media:title> | Video title | Max 100 chars; escape HTML or use CDATA |
| <media:description> | Video description | Max 2048 chars; escape HTML or use CDATA |
| <media:thumbnail> | Video thumbnail URL | Follow thumbnail requirements |

Useful Optional Tags

| Tag | Purpose |
| <dcterms:valid> | Publication and expiration date/time range |
| <media:restriction> | Country-based access restrictions (with relationship and type="country" attributes) |
| <media:price> | Pricing info for purchase, rent, subscription, or package options |

Key Differences: Video Sitemap vs mRSS

| Aspect | Video Sitemap | mRSS Feed |
| Format | XML sitemap with video namespace | RSS 2.0 with media RSS extension |
| Usage | Google’s recommended way to provide video metadata | Supported alternative, especially if syndicating multimedia feeds |
| Detail Level | Focused on video metadata for indexing | More detailed multimedia syndication including pricing, player, etc. |
| Deprecated Tags | Some video sitemap tags removed | Uses different tags, e.g., <media:price> is supported |

How to Combine Sitemap Extensions

1. Declare Multiple Namespaces in <urlset>

Each sitemap extension you want to use needs its namespace declared in the root <urlset> tag using the xmlns attribute. For example, if you want to combine news, video, image, and hreflang (xhtml) extensions, your <urlset> looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
            xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
            xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
            xmlns:xhtml="http://www.w3.org/1999/xhtml">

2. Add Extension Tags Inside Each <url> Tag

Within each <url> entry, you can include any combination of tags from the declared extensions that apply to that URL.

Example structure inside a <url>:

    <url>
      <loc>https://www.example.com/article1.html</loc>
      <!-- News extension -->
      <news:news>
        <news:publication>
          <news:name>Example News</news:name>
          <news:language>en</news:language>
        </news:publication>
        <news:publication_date>2025-07-28</news:publication_date>
        <news:title>Breaking News Headline</news:title>
      </news:news>
      <!-- Video extension -->
      <video:video>
        <video:thumbnail_loc>https://www.example.com/thumb.jpg</video:thumbnail_loc>
        <video:title>How to Combine Sitemaps</video:title>
        <video:description>Quick tutorial on sitemap extensions</video:description>
        <video:content_loc>https://cdn.example.com/videos/tutorial.mp4</video:content_loc>
      </video:video>
      <!-- Image extension -->
      <image:image>
        <image:loc>https://www.example.com/images/image1.jpg</image:loc>
      </image:image>
      <!-- Hreflang extension -->
      <xhtml:link rel="alternate" hreflang="fr" href="https://www.example.com/fr/article1.html"/>
      <xhtml:link rel="alternate" hreflang="es" href="https://www.example.com/es/article1.html"/>
    </url>

3. Follow Individual Extension Rules

4. Final Notes
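Hand-writing combined sitemaps gets error-prone as the namespace list grows, so here is a hedged sketch of generating one with Python's standard `xml.etree.ElementTree`. It covers only the image and hreflang extensions to stay short; the function name `build_sitemap` and the shape of the `pages` dictionaries are assumptions for illustration, not a Google-provided format.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE = "http://www.google.com/schemas/sitemap-image/1.1"
XHTML = "http://www.w3.org/1999/xhtml"


def _q(namespace: str, tag: str) -> str:
    """Build an ElementTree qualified name such as '{namespace}tag'."""
    return "{%s}%s" % (namespace, tag)


def build_sitemap(pages) -> str:
    """Serialize pages (dicts with 'loc', optional 'images' and 'alternates')."""
    # Register prefixes so output uses xmlns, image: and xhtml: instead of ns0:.
    ET.register_namespace("", SM)
    ET.register_namespace("image", IMAGE)
    ET.register_namespace("xhtml", XHTML)

    urlset = ET.Element(_q(SM, "urlset"))
    for page in pages:
        url = ET.SubElement(urlset, _q(SM, "url"))
        ET.SubElement(url, _q(SM, "loc")).text = page["loc"]
        for image_url in page.get("images", []):
            image = ET.SubElement(url, _q(IMAGE, "image"))
            ET.SubElement(image, _q(IMAGE, "loc")).text = image_url
        for lang, href in page.get("alternates", {}).items():
            ET.SubElement(url, _q(XHTML, "link"),
                          {"rel": "alternate", "hreflang": lang, "href": href})
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([{
    "loc": "https://www.example.com/article1.html",
    "images": ["https://www.example.com/images/image1.jpg"],
    "alternates": {"fr": "https://www.example.com/fr/article1.html"},
}]))
```

ElementTree only declares the namespaces that are actually used, so a page list without images produces a `<urlset>` without the image namespace, which is still valid.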

Learn About Sitemaps

Last Updated: July 30, 2025

What is a Sitemap?

A sitemap is a file where you list important pages, videos, images, and other files on your site, along with their relationships. Search engines like Google use this file to crawl your site more efficiently and understand which content you consider important.

What Information Can a Sitemap Include?

Do You Need a Sitemap?

Google usually finds most pages through internal linking, but a sitemap is helpful in these cases:

When You Might Not Need a Sitemap:

Note: If you use a popular CMS platform like WordPress, Wix, or Blogger, your sitemap might already be automatically generated and submitted to search engines.

Build and Submit a Sitemap

What is a Sitemap?

A sitemap is a file that lists all the important URLs on your site and provides extra info about them to help Google crawl your site better.

How to Build a Sitemap for Google

Google supports several sitemap formats, each with pros and cons. Choose the one that fits your website and technical setup best; Google does not prefer one over the other.

| Sitemap Format | Description & Benefits | Pros | Cons |
| XML Sitemap | Most versatile format. Supports detailed info about URLs including images, videos, news, and localized pages. | Extensible and rich with info; CMS plugins often available for auto-generation | Can be complex to maintain for large or frequently changing sites |
| RSS, mRSS, Atom 1.0 | Similar to XML but primarily used for feeds. Often auto-generated by CMS platforms. | Automatically created by many CMS; can provide info about videos | Limited to videos and feeds; cannot include images or news data |
| Text Sitemap | The simplest format, listing only URLs for HTML or indexable pages. | Very easy to create and maintain; good for very large sites | Can only list URLs, no extra info like videos or images |

Submitting Your Sitemap

Tips:

Sitemap Best Practices

1. Sitemap Size Limits

2. Sitemap File Encoding & Location

3. URLs in Sitemap

4. XML Sitemap Specifics

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/foo.html</loc>
        <lastmod>2022-06-04</lastmod>
      </url>
    </urlset>

5. RSS, mRSS, and Atom 1.0 Sitemaps

6. Text Sitemaps

How to Create a Sitemap

Submitting Your Sitemap to Google

Add sitemap location in your robots.txt file like:

    Sitemap: https://example.com/sitemap.xml

Cross-Submitting Sitemaps for Multiple Sites

If you manage multiple websites:

To inform Google:

Example for robots.txt:

    Sitemap: https://sitemaps.example.com/sitemap-example-com.xml

Managing Sitemaps with a Sitemap Index File

Why use a Sitemap Index File?

Sitemap Index Best Practices

Example Sitemap Index XML

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://www.example.com/sitemap1.xml.gz</loc>
        <lastmod>2024-08-15</lastmod>
      </sitemap>
      <sitemap>
        <loc>https://www.example.com/sitemap2.xml.gz</loc>
        <lastmod>2022-06-05</lastmod>
      </sitemap>
    </sitemapindex>
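A sitemap index like the one above is usually generated rather than typed. The Python sketch below splits a URL list into chunks and writes the index; it assumes the commonly documented limit of 50,000 URLs per sitemap file (this archive does not state the figure), and the names `chunk_urls` and `build_sitemap_index` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # assumed per-file limit; adjust if yours differs


def chunk_urls(urls: List[str], size: int = MAX_URLS_PER_SITEMAP) -> List[List[str]]:
    """Split a URL list into consecutive chunks of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locs: List[str], lastmod: str) -> str:
    """Serialize a <sitemapindex> that points at each child sitemap file."""
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element("{%s}sitemapindex" % SITEMAP_NS)
    for loc in sitemap_locs:
        entry = ET.SubElement(root, "{%s}sitemap" % SITEMAP_NS)
        ET.SubElement(entry, "{%s}loc" % SITEMAP_NS).text = loc
        ET.SubElement(entry, "{%s}lastmod" % SITEMAP_NS).text = lastmod
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


urls = ["https://www.example.com/page-%d" % n for n in range(120_001)]
chunks = chunk_urls(urls)
locs = ["https://www.example.com/sitemap%d.xml" % (i + 1) for i in range(len(chunks))]
print(build_sitemap_index(locs, "2024-08-15"))
```

Each chunk would then be written to its own sitemap file at the matching `loc`; in practice `lastmod` should reflect when each child file actually changed rather than one shared date.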

Link Best Practices for Google

Last Updated: July 30, 2025

Make links crawlable

Guidelines:
– Use <a> tags with href attributes.
– Avoid links without href or those relying on JavaScript events only.
– Dynamically inserted links using proper <a href> markup are crawlable.
– URLs in links should be valid, resolvable web addresses (URI format).

Examples:
Good: <a href="https://example.com/products">Products</a>
Bad: <a onclick="goTo('https://example.com')">Link</a>

Anchor text placement

Guidelines:
– Place meaningful text between <a> tags.
– Avoid empty anchor text; if empty, use title attribute.
– For image links, use descriptive alt attributes.
– If using JavaScript to insert anchor text, verify with URL Inspection Tool.

Examples:
Good: <a href="/ghost-peppers">ghost peppers</a>
Bad: <a href="/ghost-peppers"></a>
Image link good alt: <img alt="add enchiladas to your cart"/>

Write good anchor text

Guidelines:
– Be descriptive, concise, relevant.
– Avoid vague texts like “click here” or “read more.”
– Don’t stuff keywords unnaturally.
– Provide context by ensuring the surrounding sentence makes sense.
– Avoid long, rambling anchor text.
– Avoid chaining too many links together without context.

Examples:
Bad: <a href="https://example.com">Click here</a>
Better: <a href="https://example.com/cheese-list">list of cheese types</a>

Internal linking

Guidelines:
– Link between your own pages to help users and Google discover related content.
– Every important page should be linked from at least one other page.
– Don’t overload pages with too many links.

Examples:
Link to related guides, FAQs, or category pages to enhance site structure and user experience.

External linking

Guidelines:
– Link to trustworthy external sources to add credibility.
– Use nofollow if you don’t trust the site or for sponsored links.
– Use ugc for user-generated content links.
– Add context so users know what to expect from the link destination.
– Don’t overuse nofollow unnecessarily.

Examples:
Good example: Citing research study with link.
<a href="https://research.example.com">Study on cheese flavor</a>
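The "make links crawlable" rule above (every link is an `<a>` tag with an `href`) can be checked automatically. The following is a rough Python sketch built on the standard `html.parser`; the class name `LinkAudit` and helper `audit_links` are illustrative, and the check only inspects static HTML, so links injected by JavaScript would need a rendered snapshot.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class LinkAudit(HTMLParser):
    """Collect <a> tags, separating those with an href from those without."""

    def __init__(self) -> None:
        super().__init__()
        self.crawlable: List[str] = []   # href values Google can follow
        self.not_crawlable = 0           # <a> tags with no usable href

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.crawlable.append(href)
        else:
            self.not_crawlable += 1      # e.g. onclick-only links


def audit_links(html: str) -> Tuple[List[str], int]:
    """Return (crawlable hrefs, count of <a> tags lacking an href)."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.crawlable, parser.not_crawlable


sample = (
    '<a href="https://example.com/products">Products</a>'
    "<a onclick=\"goTo('https://example.com')\">Link</a>"
)
print(audit_links(sample))
```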

URL Structure Best Practices for Google Search

Last Updated: July 30, 2025

1️⃣ Technical Requirements 2️⃣ Structure & Readability 3️⃣ Optimization & Consistency Requirement Best Practice (Recommended) Bad Practice (Not Recommended) Follow IETF STD 66 Use percent encoding for reserved characters ✅ Example: /my%20page for space ❌ Using unencoded reserved characters may cause crawling issues Avoid URL Fragments (#) Use History API for dynamic content changes ✅ https://example.com/potatoes ❌https://example.com/#/potatoes (Google ignores fragments for crawling) Parameter Encoding Use = to separate key-value pairs and & for additional parameters ✅ https://example.com/category?category=dresses&sort=low-to-high&sid=789 ❌ Using : or [] for parameters ❌https://example.com/category?[category:dresses][sort:price-low-to-high] Multiple Values for Same Key Use commas within a parameter value ✅ https://example.com/category?category=dresses&color=purple,pink,salmon&sort=low-to-high&sid=789 ❌ Using commas and double commas for separating parameters ❌https://example.com/category?category,dresses,,sort,lowtohigh,,sid,789 Make it easy to understand your URL structure To help Google Search (and your users) better understand your site, we recommend creating a simple URL structure, applying the following best practices when possible. 
| Best Practice | Recommended | Not Recommended |
| --- | --- | --- |
| Use descriptive URLs | https://example.com/wiki/Aviation | https://example.com/index.php?topic=42&area=3a5ebc944f41daa6f849f730f1 |
| Use audience’s language | https://example.com/lebensmittel/pfefferminz (German); https://example.com/ペパーミント (Japanese) | Using unrelated or non-localized terms in URLs |
| Use UTF-8 encoding for non-ASCII characters | https://example.com/%D9%86%D8%B9%D9%86%D8%A7%D8%B9/%D8%A8%D9%82%D8%A7%D9%84%D8%A9; https://example.com/%E6%9D%82%E8%B4%A7/%E8%96%84%E8%8D%B7; https://example.com/gem%C3%BCse | https://example.com/نعناع; https://example.com/杂货/薄荷; https://example.com/gemüse |
| Use hyphens to separate words | https://example.com/summer-clothing/filter?color-profile=dark-grey | https://example.com/summer_clothing/filter?color_profile=dark_grey; https://example.com/greendress |
| Minimize unnecessary parameters | Keep URLs clean, with only the required parameters | URLs with excessive or irrelevant parameters |
| Consistent case usage | Convert all URLs to lowercase; use /apple consistently | /APPLE vs /apple are treated as different URLs |
| Multi-regional targeting | https://example.de (country-specific domain); https://example.com/de/ (subdirectory for locale) | Mixing locales without a clear URL structure |

Avoid Common Issues Related to URLs

Complex URLs with multiple parameters can confuse search engines and waste crawl budget. When too many URL variations point to the same or similar content, Googlebot may spend excessive bandwidth crawling duplicates instead of discovering fresh pages. This can lead to slower indexing and incomplete coverage of your site in Google Search.

👉 Keep URLs clean, consistent, and minimal in parameters to ensure efficient crawling and full indexation of important pages.

| Common Issue | Description | Example URLs | Recommended Fix / Note |
| --- | --- | --- | --- |
| Additive Filtering | Combining multiple filters creates many URL variations that show similar content, causing URL explosion and redundant crawling. Google only needs a few representative pages to find the individual items. | https://example.com/hotel-search-results.jsp?Ne=292&N=461; https://example.com/hotel-search-results.jsp?Ne=292&N=461+4294967240; https://example.com/hotel-search-results.jsp?Ne=292&N=461+4294967240+4294967270 | Limit crawlable filtered URLs; use canonical tags or block crawling of parameter combinations. |
| Irrelevant Parameters | Unnecessary parameters such as referral IDs, sorting options, or session IDs create many duplicate URLs that do not change the main content. | https://example.com/search/noheaders?click=6EE2BF1AF6A3D705D5561B7C3564D9C2&clickPage=OPD+Product+Page&cat=79; https://example.com/discuss/showthread.php?referrerid=249406&threadid=535913; https://example.com/results?search_sort=relevance; https://example.com/search/noheaders?sessionid=6EE2BF1AF6A3D705D5561B7C3564D9C2 | Avoid session IDs in URLs (use cookies). Disallow crawling of such URLs via robots.txt. |
| Calendar Issues | Dynamically generated calendar pages create infinite URLs for past and future dates, wasting crawl budget and causing duplicate content. | https://example.com/calendar.php?d=13&m=8&y=2011 | Use nofollow on links to dynamic future dates or restrict calendar crawling. |
| Broken Relative Links | A parent-relative link placed on the wrong page can create infinite or broken URLs if the server doesn’t respond with a proper 404. | Link: <a href="../../category/stuff"> on https://example.com/category/community/070413/html/FAQ.htm; leads to bogus URLs like https://example.com/category/community/category/stuff | Use root-relative URLs instead of parent-relative URLs to avoid incorrect paths. |
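Two of the issues above, irrelevant parameters and broken relative links, can be illustrated with Python's standard library. This is a minimal sketch, not a method taken from Google's guidance: the `IGNORED_PARAMS` set and the `canonicalize` helper are hypothetical names, the parameter list is an assumption drawn from the example URLs, and lowercasing the path is only safe when the server treats paths as case-insensitive or already serves everything in lowercase.

```python
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit

# Assumption: parameters that never change the main content of a page.
# The names are taken from the example URLs above; adjust them per site.
IGNORED_PARAMS = {"sessionid", "referrerid", "click", "clickpage"}


def canonicalize(url: str) -> str:
    """Return a lowercase URL without ignored parameters or a fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # stable parameter order, so equivalent URLs compare equal
    query = urlencode(kept, safe=",")  # keep commas used for multi-value keys
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.lower(), query, "")
    )


# A session ID variant collapses to the clean URL.
print(canonicalize(
    "https://example.com/search/noheaders?sessionid=6EE2BF1AF6A3D705D5561B7C3564D9C2"
))

# The parent-relative link from the table resolves to the bogus path,
# while a root-relative link resolves the same way from any page.
page = "https://example.com/category/community/070413/html/FAQ.htm"
bogus = urljoin(page, "../../category/stuff")
fixed = urljoin(page, "/category/stuff")
print(bogus)
print(fixed)
```

A helper like this is useful for auditing crawl logs or sitemaps: URLs that canonicalize to the same string are candidates for a canonical tag or a robots.txt rule.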
Fixing crawling-related URL structure problems

| Issue | Recommended Fix | Details |
| --- | --- | --- |
| Problematic dynamic URLs | Use robots.txt to block Googlebot access | Block URLs that generate search results or have dynamic parameters that cause excessive crawling |
| Infinite URL spaces (e.g., calendars) | Use robots.txt or nofollow attributes on problematic links | Prevent crawling of infinite date ranges or dynamically created pages |
| Faceted navigation URLs | Manage crawling carefully | Implement best practices for faceted navigation to avoid crawling duplicate or nearly identical filtered URLs |
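Before deploying robots.txt rules like the ones recommended above, it can help to test them against sample URLs. The sketch below uses Python's standard `urllib.robotparser`; the rules in `ROBOTS_TXT` and the `is_crawlable` helper are hypothetical and only mirror the example URLs in this guide. Note that the standard-library parser matches rules by simple path prefix, which is an approximation of how Googlebot interprets robots.txt, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block internal search results and the calendar script.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search/
Disallow: /calendar.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from memory, no network


def is_crawlable(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the parsed rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


for sample in (
    "https://example.com/calendar.php?d=13&m=8&y=2011",
    "https://example.com/search/noheaders?sessionid=6EE2BF1AF6A3D705D5561B7C3564D9C2",
    "https://example.com/wiki/Aviation",
):
    print(sample, "->", "allowed" if is_crawlable(sample) else "blocked")
```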

File types indexable by Google

Last Updated: July 30, 2025

| Category | File Types / Extensions |
| --- | --- |
| Document Formats | PDF (.pdf), PostScript (.ps), CSV (.csv), EPUB (.epub), Hancom Hanword (.hwp) |
| Google Earth Formats | KML (.kml), KMZ (.kmz) |
| GPS Formats | GPX (.gpx) |
| HTML Formats | HTML (.htm, .html, other variations) |
| Microsoft Office | Word (.doc, .docx), Excel (.xls, .xlsx), PowerPoint (.ppt, .pptx) |
| OpenOffice Formats | Writer (.odt), Calc (.ods), Impress (.odp) |
| Text Formats | RTF (.rtf), TXT (.txt, .text), TeX/LaTeX (.tex) |
| Programming / Source Code | BASIC (.bas), C/C++ (.c, .cc, .cpp, .cxx, .h, .hpp), C# (.cs), Java (.java), Perl (.pl), Python (.py) |
| Wireless / Markup Languages | WML (.wml, .wap), XML (.xml) |
| Image Formats | BMP, GIF, JPEG, PNG, WebP, SVG, AVIF |
| Video Formats | 3GP, 3G2, ASF, AVI, DivX, M2V, M3U, M3U8, M4V, MKV, MOV, MP4, MPEG, OGV, QVT, RAM, RM, VOB, WebM, WMV, XAP |

💡 Pro Tip for FSIDM Students: Use the filetype: search operator to restrict Google results to a single file type. Example: filetype:pdf digital marketing strategy (this shows only PDF results related to “digital marketing strategy”).
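For a quick audit of a URL list, the table above can be turned into a lookup. This is a rough sketch under stated assumptions: `CATEGORY_BY_EXTENSION` is a hypothetical, partial copy of the table, and `category_for` only inspects the extension in the URL path, which says nothing about whether a given file is actually indexed.

```python
from posixpath import splitext
from typing import Optional
from urllib.parse import urlsplit

# Partial extension map copied from the table above (not the full list).
CATEGORY_BY_EXTENSION = {
    ".pdf": "Document Formats",
    ".csv": "Document Formats",
    ".epub": "Document Formats",
    ".kml": "Google Earth Formats",
    ".gpx": "GPS Formats",
    ".htm": "HTML Formats",
    ".html": "HTML Formats",
    ".docx": "Microsoft Office",
    ".xlsx": "Microsoft Office",
    ".odt": "OpenOffice Formats",
    ".txt": "Text Formats",
    ".py": "Programming / Source Code",
    ".xml": "Wireless / Markup Languages",
}


def category_for(url: str) -> Optional[str]:
    """Return the table category for a URL's extension, or None if unlisted."""
    extension = splitext(urlsplit(url).path)[1].lower()  # ignores the query string
    return CATEGORY_BY_EXTENSION.get(extension)


print(category_for("https://example.com/guides/Strategy.PDF?download=1"))
print(category_for("https://example.com/archive.zip"))
```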