A robots.txt file tells search engine crawlers which parts of your site they can or cannot access.
Think of it like a traffic signal for Googlebot — not a security lock.
📍 Where to Place the robots.txt File #
- Must be placed in the root directory of your site. Example: https://www.example.com/robots.txt
- Each domain or subdomain needs its own robots.txt file.
- Only one robots.txt file per host is allowed.
🛠 Basic Structure of a robots.txt File #
A robots.txt file is just plain text.
Here’s a simple example:
User-agent: Googlebot
Disallow: /nogooglebot/
User-agent: *
Allow: /
Sitemap: https://www.example.com/sitemap.xml
🔍 What this means: #
- Googlebot can’t crawl any URL starting with /nogooglebot/.
- All other bots can crawl the entire site (Allow: /).
- The sitemap location is declared so crawlers can discover your pages more efficiently.
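If you want to sanity-check these rules locally, Python’s standard-library urllib.robotparser can evaluate them. A minimal sketch (it handles basic User-agent/Allow/Disallow matching but not Google’s wildcard extensions, so treat it as a quick check rather than the final word):

from urllib.robotparser import RobotFileParser

# Feed the example rules straight to the parser (parse() takes an
# iterable of lines), so nothing is fetched over the network.
rp = RobotFileParser()
rp.parse([
    "User-agent: Googlebot",
    "Disallow: /nogooglebot/",
    "",
    "User-agent: *",
    "Allow: /",
])

# Googlebot is blocked from /nogooglebot/; everyone else may crawl freely.
print(rp.can_fetch("Googlebot", "https://www.example.com/nogooglebot/page.html"))  # False
print(rp.can_fetch("Googlebot", "https://www.example.com/index.html"))             # True
print(rp.can_fetch("OtherBot", "https://www.example.com/nogooglebot/page.html"))   # True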
📝 Step-by-Step: Create a robots.txt File #
1️⃣ Create the File #
- Use a text editor (Notepad, TextEdit, VS Code).
- Save as robots.txt in UTF-8 encoding (plain text only).
2️⃣ Write Rules #
Each “rule set” includes the following directives (a combined example follows this list):
- User-agent → which crawler it applies to (e.g., Googlebot, * for all bots).
- Disallow → which paths crawlers can’t visit.
- Allow → which paths crawlers can visit.
- Sitemap (optional but recommended) → link to your sitemap.
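For instance, Disallow and Allow can be combined in one rule set. Google documents that the most specific (longest) matching rule wins, so the Allow below keeps one subfolder crawlable inside an otherwise blocked directory (the /archive/ paths here are placeholders):

User-agent: *
Disallow: /archive/
Allow: /archive/featured/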
3️⃣ Upload the File #
- Place robots.txt in your site’s root (e.g., /public_html/ in cPanel).
- Example: https://www.example.com/robots.txt
4️⃣ Test the File #
- Open https://yourdomain.com/robots.txt in a browser; the file should load as plain text.
- Check the robots.txt report in Google Search Console (it replaced the old standalone robots.txt Tester).
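The same urllib.robotparser approach shown earlier can also check the live file. A small sketch (swap in your real domain):

from urllib.robotparser import RobotFileParser

# read() fetches and parses the live file; a 404 is treated as
# "allow everything", while 401/403 are treated as "disallow everything".
rp = RobotFileParser("https://yourdomain.com/robots.txt")
rp.read()

print(rp.can_fetch("Googlebot", "https://yourdomain.com/"))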
📌 Common robots.txt Rules #
Block Entire Site
User-agent: *
Disallow: /
Allow Only Public Folder
User-agent: *
Disallow: /
Allow: /public/
Block Specific File
User-agent: *
Disallow: /private-file.html
Block All Images from Google Images
User-agent: Googlebot-Image
Disallow: /
Block Specific File Type (e.g., .xls)
User-agent: Googlebot
Disallow: /*.xls$
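A note on those patterns: * matches any sequence of characters and $ anchors the rule to the end of the URL. Google and Bing support these wildcards, but they are not part of the original robots.txt standard, so not every crawler obeys them. The same technique can block URLs by query string, for example (the parameter name here is a placeholder):

User-agent: Googlebot
Disallow: /*?sessionid=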
⚠️ Important Notes #
- Don’t block essential CSS/JS files: Google needs them to render your pages, and blocking them can hurt rankings.
- Don’t use robots.txt to hide private data; use noindex or password protection instead. robots.txt only controls crawling, not indexing, so a blocked page can still be indexed if other sites link to it.
- Updates take effect once Google re-crawls your robots.txt; Google generally caches the file for up to 24 hours.
💡 FSIDM Tip: A well-optimized robots.txt protects server resources and guides Google to focus on valuable pages. Pair it with a sitemap for best crawling efficiency.