When maintaining multiple Sitecore environments (like test.yourdomain.com, staging.yourdomain.com, and www.yourdomain.com), it’s important to control how search engines interact with each. The robots.txt file defines which parts of your site can be crawled, and you usually don’t want test or staging sites to appear in search results.
This guide explains how robots.txt works in Sitecore, how to update it correctly, and whether those changes automatically apply to subdomain instances.
Understanding How Robots.txt Works in Sitecore
In most Sitecore environments, the robots.txt file is served as a static file from the web root (the Website or WebRoot folder) on each content delivery server. This means:
- The robots.txt file is environment-specific.
- Updating it in one instance (for example, production) will not automatically change it in other subdomains such as test or staging.
- Each Sitecore environment (CD instance) has its own web root and, therefore, its own copy of robots.txt.
In cloud-hosted setups (like Azure App Service or Sitecore Managed Cloud), robots.txt may be managed as part of the deployment package or DevOps pipeline, not through the Sitecore Content Editor.
Step 1: Locate the Robots.txt File
You can usually find it here:
/Website/robots.txt
or in modern setups:
/site/wwwroot/robots.txt
If you can’t find it, check your deployment repository or configuration scripts — some DevOps teams dynamically generate it during deployment.
Step 2: Update the Robots.txt Content
Edit the file directly using your preferred method:
For production, you’ll typically want:
User-agent: *
Disallow:
Sitemap: https://www.yourdomain.com/sitemap.xml
For test or staging, you should block crawlers completely:
User-agent: *
Disallow: /
This ensures that search engines don’t index your test environments.
Step 3: Verify the Environment Deployment
If you update robots.txt in one Sitecore environment (for example, production), those changes will not automatically apply to test or subdomain instances.
Each environment must be updated separately because:
- The file lives in the web root, which is specific to that environment.
- Deployments (CI/CD pipelines) usually push separate packages or configurations per environment.
- Sitecore does not replicate static files between environments automatically.
In most setups, you’ll handle this in your deployment pipeline (for example, Azure DevOps or Octopus Deploy) by including the environment’s robots.txt in the build artifacts.
Example CI/CD Configuration Logic
In your deployment process, you can include logic like:
if environment == "production":
    copy robots-prod.txt to robots.txt
else if environment == "staging":
    copy robots-staging.txt to robots.txt
else if environment == "test":
    copy robots-test.txt to robots.txt
This ensures each environment serves its correct version automatically.
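If your pipeline can run a small console step, a minimal C# sketch of the same idea might look like the following. The environment argument and the robots-*.txt file names follow the convention above; everything else is an assumption you would adapt to your own build.

using System;
using System.IO;

// Hypothetical console helper invoked from the pipeline with the environment name as an argument.
class SelectRobotsFile
{
    static void Main(string[] args)
    {
        var environment = args.Length > 0 ? args[0].ToLowerInvariant() : "test";

        var source = environment == "production" ? "robots-prod.txt"
                   : environment == "staging" ? "robots-staging.txt"
                   : "robots-test.txt";

        // Overwrite the deployed robots.txt with the environment-specific version.
        File.Copy(source, "robots.txt", overwrite: true);
        Console.WriteLine($"Copied {source} to robots.txt for the {environment} environment.");
    }
}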
Step 4: Test Your Changes
After deployment, verify your robots.txt file:
- Open your browser and navigate to https://test.yourdomain.com/robots.txt.
- Confirm the correct content is served for that environment (a small automated check is sketched after this list).
- Use Google Search Console's robots.txt report or an online robots.txt checker to validate the syntax.
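If you want to automate that check, for example as a post-deployment smoke test, a minimal C# sketch could look like this; the hostname and the expected Disallow rule are assumptions you would adjust per environment.

using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical post-deployment smoke test: fetch robots.txt and confirm that a
// non-production environment blocks all crawlers.
class RobotsSmokeTest
{
    static async Task Main()
    {
        using var client = new HttpClient();
        var robots = await client.GetStringAsync("https://test.yourdomain.com/robots.txt");

        Console.WriteLine(robots.Contains("Disallow: /")
            ? "OK: test environment blocks crawlers."
            : "WARNING: test environment robots.txt does not block crawlers.");
    }
}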
Step 5: (Optional) Manage Robots.txt Through Sitecore Content
Some Sitecore setups store robots.txt content in an item within the Content Tree, for example under /sitecore/content/Settings/robotsTxt (the item used in the controller below).
If your project uses a custom route handler to serve robots.txt dynamically (for example, /robots.txt is handled by MVC rather than a physical file), you can easily adjust it per environment by reading from a Sitecore field or setting.
Example controller logic (C#):
public ActionResult RobotsTxt()
{
    // Read the robots.txt content from a Sitecore item so each environment or site manages its own copy.
    var robotsItem = Sitecore.Context.Database.GetItem("/sitecore/content/Settings/robotsTxt");
    var robotsContent = robotsItem?["Content"];

    // Fall back to blocking all crawlers if the item or field is missing.
    return Content(string.IsNullOrWhiteSpace(robotsContent) ? "User-agent: *\nDisallow: /" : robotsContent, "text/plain");
}
Then you can manage different robots.txt content for each site or environment directly in the CMS without touching the file system.
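For this to work, the /robots.txt request has to reach your MVC controller rather than a physical file. A minimal sketch, assuming a controller named RobotsController and a custom initialize-pipeline processor (both names are assumptions; projects differ in where they register routes):

using System.Web.Mvc;
using System.Web.Routing;
using Sitecore.Pipelines;

// Hypothetical processor that maps /robots.txt to RobotsController.RobotsTxt().
public class RegisterRobotsRoute
{
    public void Process(PipelineArgs args)
    {
        RouteTable.Routes.MapRoute(
            name: "RobotsTxt",
            url: "robots.txt",
            defaults: new { controller = "Robots", action = "RobotsTxt" });
    }
}

Note that a physical robots.txt left in the web root will usually be served before this route is evaluated, so remove the static file from environments that switch to the dynamic approach.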
Common Questions
Will updating robots.txt in Sitecore affect all subdomains?
No. Each subdomain has its own root path and configuration. You must deploy or configure it separately.
Can I use one shared robots.txt for all environments?
Not recommended. Search engines treat each subdomain as a separate site. Always serve a restrictive file for non-production environments.
Does Sitecore XM Cloud or JSS handle robots.txt differently?
Yes. In headless or XM Cloud setups, you can host robots.txt at the root of your rendering host (for example, Vercel or Azure Static Web App). The behavior depends on your deployment target, not Sitecore itself.
Best Practices
- Always block crawlers on staging, QA, and test subdomains using Disallow: /.
- Use CI/CD pipelines to manage environment-specific versions.
- Include robots.txt validation in your QA checklist before go-live.
- Maintain version control of each environment’s robots.txt in your Git repository.
- Use canonical tags in Sitecore to ensure search engines favor production pages.
Conclusion
Updating robots.txt in Sitecore is simple, but it’s crucial to remember that each environment and subdomain serves its own version. To ensure accurate and safe indexing behavior:
- Manage environment-specific robots.txt files
- Automate deployment using your CI/CD pipeline
- Confirm test environments are fully blocked from search indexing
This approach keeps your staging and QA environments invisible to crawlers while maintaining full SEO visibility for your live site.