robots.txt is like a “Do’s & Don’ts” sign for search engine crawlers. It tells them which parts of your site they can or cannot access.
⚠️ Important: robots.txt controls crawling, not indexing.
- If you want to stop a page from appearing in Google Search, use noindex or password protection.
- robots.txt just stops bots from visiting a page — it doesn’t guarantee it won’t appear in search results.
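To make the noindex option concrete: it is usually expressed either as a `<meta name="robots" content="noindex">` tag in the page or as an `X-Robots-Tag: noindex` HTTP response header (neither form is spelled out above). The Python sketch below checks a page for both. The `has_noindex` helper and its arguments are hypothetical names chosen for illustration, and the sketch assumes you already have the HTML and the response headers in hand. It also ignores crawler-specific forms such as `googlebot: noindex`.

```python
from html.parser import HTMLParser
from typing import Dict, List


def _split_directives(value: str) -> List[str]:
    """Turn 'noindex, nofollow' into ['noindex', 'nofollow']."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            self.directives += _split_directives(attr.get("content", ""))


def has_noindex(html: str, headers: Dict[str, str]) -> bool:
    """Return True if the page asks search engines not to index it."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    header_directives = _split_directives(lowered.get("x-robots-tag", ""))

    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(page, {}))                                  # meta tag form
    print(has_noindex("<p>hello</p>", {"X-Robots-Tag": "noindex"}))  # header form
    print(has_noindex("<p>hello</p>", {}))                        # no noindex at all
```

Keep in mind that a crawler can only see either signal if it is allowed to fetch the page, so a URL that is disallowed in robots.txt never gets its noindex read.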
📌 What Is robots.txt Used For?
A robots.txt file mainly helps you:
- Manage crawler traffic so search bots don’t overload your server.
- Control crawling of unimportant, duplicate, or private sections of your site.
🔍 Effect on Different File Types
| File Type | robots.txt Impact |
| --- | --- |
| Web Pages (HTML, PDF) | Stops crawling, but the URL may still appear in search results if other sites link to it. |
| Media Files (Images, Videos, Audio) | Can keep them out of Google Search results, but doesn’t stop other pages or users from linking to them directly. |
| Resource Files (CSS, JS, Images) | You can block unimportant ones to save bandwidth, but don’t block essential resources needed to render or understand your page. |
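One way to guard against accidentally blocking essential resources is to check a list of must-load URLs against your rules. The sketch below uses Python's standard `urllib.robotparser`. The `blocked_resources` helper, the sample rules, and the `example.com` URLs are all hypothetical, and the standard-library parser only approximates how Googlebot evaluates rules, so treat the result as a first sanity check.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser

# Hypothetical rules that block a whole assets folder.
SAMPLE_RULES = """\
User-agent: *
Disallow: /assets/
"""


def blocked_resources(robots_txt: str, urls: Iterable[str],
                      user_agent: str = "Googlebot") -> List[str]:
    """Return the URLs that the given rules forbid user_agent to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    essential = [
        "https://example.com/assets/site.css",
        "https://example.com/assets/app.js",
        "https://example.com/index.html",
    ]
    for url in blocked_resources(SAMPLE_RULES, essential):
        print("Blocked but needed for rendering:", url)
```

With these sample rules the stylesheet and the script are reported as blocked, which is exactly the situation the table warns about.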
⚠️ Limitations of robots.txt
- Not a security tool: Malicious bots can ignore it. If something is truly private, use a password or block access via server settings.
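- Different crawlers read it differently: Reputable crawlers such as Googlebot follow the rules, but crawlers can interpret the syntax in different ways, and some bots ignore the file altogether.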
- Blocked pages can still be indexed: If other sites link to them, Google might still show the URL without a description.
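One concrete way crawlers can read the same file differently is rule precedence. Google documents that the most specific (longest) matching rule wins, whereas a simpler parser might stop at the first rule that matches. The Python sketch below contrasts the two strategies. Both functions are deliberately simplified illustrations (plain prefix matching, no wildcards, hypothetical names), not a reproduction of any particular crawler.

```python
from typing import List, Tuple

# A rule is (directive, path prefix), e.g. ("disallow", "/folder/").
Rule = Tuple[str, str]


def first_match_allows(rules: List[Rule], path: str) -> bool:
    """'First matching rule wins' strategy."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "allow"
    return True  # nothing matched: crawling is allowed by default


def longest_match_allows(rules: List[Rule], path: str) -> bool:
    """'Most specific (longest) rule wins' strategy; ties go to allow."""
    matches = [
        (len(prefix), directive == "allow")
        for directive, prefix in rules
        if path.startswith(prefix)
    ]
    if not matches:
        return True
    # max() compares length first, then prefers True (allow) on a tie.
    return max(matches)[1]


if __name__ == "__main__":
    rules = [("disallow", "/folder/"), ("allow", "/folder/page.html")]
    path = "/folder/page.html"
    print("first match  :", first_match_allows(rules, path))
    print("longest match:", longest_match_allows(rules, path))
```

For the same two rules, the first strategy blocks `/folder/page.html` and the second allows it, so it is worth keeping rules unambiguous rather than relying on one crawler's tie-breaking.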
📝 Example robots.txt
User-agent: *
Disallow: /private/
Allow: /public/
- User-agent: * → applies to all crawlers
- Disallow: /private/ → blocks everything in the /private/ folder
- Allow: /public/ → explicitly lets crawlers access the /public/ folder (anything that isn’t disallowed is crawlable by default)
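To see how a well-behaved crawler would apply these three lines, you can feed them to Python's standard `urllib.robotparser`. This is a minimal sketch: the `example.com` URLs and the `build_parser` helper are assumptions for illustration, and the rules are parsed from a string so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, as a string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text and return a parser ready for can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "https://example.com/private/report.html",
        "https://example.com/public/index.html",
        "https://example.com/blog/post",
    ):
        verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
        print(f"{verdict:8} {url}")
```

Only the `/private/` URL comes back as blocked. The `/blog/post` URL is allowed even though no rule mentions it, because paths without a matching Disallow are crawlable by default.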
✅ Best Practices
- Use robots.txt to reduce unnecessary crawling, not to hide sensitive data.
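- For pages that must stay out of search results, rely on noindex or password protection, and don’t also disallow those pages in robots.txt: a crawler has to be able to fetch the page to see the noindex rule.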
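- Test your robots.txt before relying on it, for example with the robots.txt report in Google Search Console (the successor to Google’s standalone robots.txt Tester).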
💡 FSIDM Tip: Think of robots.txt as a “polite request” to crawlers, not a locked door. If you want something truly hidden from search, lock it properly (password protection or noindex).