Not all pages on your site should be indexed. Common reasons to keep content out of Search:
- 🔒 Confidential or restricted data (e.g., pricing sheets for partners, private reports)
- 🗑 Low-value or spammy content (like thin user-generated posts that may hurt ranking)
- 🎯 Focus on important pages (block unimportant or duplicate pages to save crawl budget)
Main ways to block content from Google
1️⃣ Remove the content from your site
- Best method: If it doesn’t exist online, it can’t appear in Search.
- Use when: The content is not needed at all.
- Example: Old landing pages from expired campaigns.
2️⃣ Password-protect your files
- Works for all content types (HTML, PDFs, images, videos).
- Prevents Googlebot from accessing these files.
- Examples:
  - Internal training material (locked behind a login).
  - Client-specific reports.
- Effect: Google will eventually remove them from search results.
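To make the password-protection idea concrete, here is a minimal sketch of an HTTP Basic Auth check in Python. It only illustrates the principle: a crawler that sends no credentials gets a 401 and never sees the content. The function names and the credentials ("partner" / "s3cret") are hypothetical, and a real site would normally rely on its web server or framework for authentication rather than hand-rolled code.

```python
import base64
import hmac


def is_authorized(auth_header, user, password):
    """Return True if an HTTP Basic Authorization header matches the expected credentials."""
    if not auth_header or not auth_header.startswith("Basic "):
        return False
    try:
        raw = base64.b64decode(auth_header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        # Malformed base64 or non-UTF-8 payload: treat as unauthenticated.
        return False
    supplied_user, sep, supplied_password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(supplied_user.encode(), user.encode())
    pass_ok = hmac.compare_digest(supplied_password.encode(), password.encode())
    return user_ok and pass_ok


def status_for(auth_header):
    """Status code a protected URL would return (hypothetical credentials)."""
    return 200 if is_authorized(auth_header, "partner", "s3cret") else 401
```

A request without an Authorization header, which is how a crawler arrives, maps to 401 here, so the protected file is never served to it.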
3️⃣ Use the noindex rule
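- Method: Add a robots meta tag to the page’s <head>, or send the HTTP response header X-Robots-Tag: noindex (useful for non-HTML files such as PDFs).
<meta name="robots" content="noindex">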
- Effect: Google can crawl the page, but won’t show it in results. The page must stay crawlable (not blocked in robots.txt), otherwise Google never sees the rule.
- Example: Thank-you pages, duplicate category pages.
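One way to check that a page really carries the rule is to look for noindex in both places it can appear. The sketch below uses only the Python standard library; the helper name has_noindex is hypothetical, and the parsing is simplified (for example, it ignores user-agent prefixes such as "googlebot: noindex" in the X-Robots-Tag header).

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html, headers=None):
    """Return True if the HTML or the response headers contain a noindex directive."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    directives = list(parser.directives)
    for name, value in (headers or {}).items():
        if name.lower() == "x-robots-tag":
            directives += [d.strip().lower() for d in value.split(",")]
    return "noindex" in directives
```

For instance, has_noindex(page_html, response_headers) could be run over thank-you pages after a deploy to confirm the tag was not dropped by a template change.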
4️⃣ Block crawling via robots.txt
- Best suited to images and videos; a rule can cover a single path or an entire folder.
- Stops Googlebot from crawling the listed paths.
- Example in robots.txt:
User-agent: Googlebot-Image
Disallow: /private-images/
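⚠️ Note: Blocking crawling doesn’t remove content already indexed, and a blocked web page can still appear in results (without a description) if other pages link to it.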
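Before deploying a rule like the one above, it can help to test which URLs it actually blocks. This sketch uses Python’s standard urllib.robotparser; note that it is Python’s own parser, not Google’s, so treat the result as an approximation. The example.com URLs and the is_allowed helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Same rule as in the example above.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /private-images/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given crawler may fetch the URL under these robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot-Image", "https://example.com/private-images/a.jpg"),
        ("Googlebot-Image", "https://example.com/images/a.jpg"),
        ("Googlebot", "https://example.com/private-images/a.jpg"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

Because the rule names only Googlebot-Image, other crawlers are not restricted by it in this parser; add a separate User-agent group if they should be blocked as well.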
5️⃣ Opt out of specific Google properties
- Tell Google to exclude your site from specific services, such as:
  - Google Shopping
  - Google Hotels
  - Vacation Rentals
- Example: A business not ready for Shopping Ads can opt out.
6️⃣ Remove existing content from Google
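- If the content is already indexed, use the Removals tool in Search Console to hide it temporarily (about six months).
- For permanent removal, delete the content or add noindex, then request a recrawl.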
💡 Pro tip for big sites:
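If you have duplicate or low-priority pages, keep them out with noindex or robots.txt so Google spends its crawl budget on your money pages (home, product, service). Use one or the other per URL: a page blocked in robots.txt is never crawled, so Google never sees its noindex.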