How to Manage and Update Robots.txt in Sitecore for Test and Subdomain Environments

When maintaining multiple Sitecore environments (like test.yourdomain.com, staging.yourdomain.com, and www.yourdomain.com), it’s important to control how search engines interact with each. The robots.txt file defines which parts of your site can be crawled, and you usually don’t want test or staging sites to appear in search results.

This guide explains how robots.txt works in Sitecore, how to update it correctly, and whether those changes automatically apply to subdomain instances.

Understanding How Robots.txt Works in Sitecore

In most Sitecore environments, robots.txt is served as a static file from the web root (the Website or wwwroot folder) of the content delivery (CD) server. This means:

  • The robots.txt file is environment-specific.
  • Updating it in one instance (for example, production) will not automatically change it in other subdomains such as test or staging.
  • Each Sitecore environment (CD instance) has its own web root and, therefore, its own copy of robots.txt.

In cloud-hosted setups (like Azure App Service or Sitecore Managed Cloud), robots.txt may be managed as part of the deployment package or DevOps pipeline, not through the Sitecore Content Editor.

Step 1: Locate the Robots.txt File

You can usually find it here:

/Website/robots.txt

or in modern setups:

/site/wwwroot/robots.txt

If you can’t find it, check your deployment repository or configuration scripts — some DevOps teams dynamically generate it during deployment.

Step 2: Update the Robots.txt Content

Edit the file directly using your preferred method (text editor, deployment script, or a change in your repository).

For production, you'll typically want to allow crawling and point crawlers to your sitemap:

User-agent: *
Disallow:
Sitemap: https://www.yourdomain.com/sitemap.xml

For test or staging, you should block crawlers completely:

User-agent: *
Disallow: /

This ensures that search engines don’t index your test environments.

Step 3: Verify the Environment Deployment

If you update robots.txt in one Sitecore environment (for example, production), those changes will not automatically apply to test or subdomain instances.

Each environment must be updated separately because:

  • The file lives in the web root, which is specific to that environment.
  • Deployments (CI/CD pipelines) usually push separate packages or configurations per environment.
  • Sitecore does not replicate static files between environments automatically.

In most setups, you’ll handle this in your deployment pipeline (for example, Azure DevOps or Octopus Deploy) by including the environment’s robots.txt in the build artifacts.

Example CI/CD Configuration Logic

In your deployment process, you can include logic like the following (a PowerShell sketch; the environment variable name depends on your pipeline):

# Publish the environment-specific file as robots.txt during deployment
switch ($env:DEPLOY_ENVIRONMENT) {
    "production" { Copy-Item "robots-prod.txt"    "robots.txt" -Force }
    "staging"    { Copy-Item "robots-staging.txt" "robots.txt" -Force }
    "test"       { Copy-Item "robots-test.txt"    "robots.txt" -Force }
}

This ensures each environment serves its correct version automatically.

Step 4: Test Your Changes

After deployment, verify your robots.txt file:

  1. Open your browser and navigate to https://test.yourdomain.com/robots.txt
  2. Confirm the correct content is served for that environment.
  3. Use the robots.txt report in Google Search Console (the successor to the robots.txt Tester) or an online checker to validate syntax.
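
To make this check repeatable as part of your QA checklist, a small console snippet can fetch robots.txt from a non-production host and warn if crawlers are not blocked. This is a minimal sketch; the URL and the expected Disallow: / directive are examples for a test environment.

// Minimal sketch: fetch robots.txt from a non-production host (URL is an example)
// and warn if crawlers are not blocked. The substring check is intentionally simple.
using System;
using System.Net.Http;
using System.Threading.Tasks;

class RobotsTxtCheck
{
    static async Task Main()
    {
        using var client = new HttpClient();
        var content = await client.GetStringAsync("https://test.yourdomain.com/robots.txt");

        Console.WriteLine(content.Contains("Disallow: /")
            ? "OK: crawlers are blocked on this environment."
            : "WARNING: crawlers are NOT blocked on this environment!");
    }
}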

Step 5: (Optional) Manage Robots.txt Through Sitecore Content

Some Sitecore setups store the robots.txt content in an item within the Content Tree, often under an item such as /sitecore/content/Settings/robotsTxt.

If your project uses a custom route handler to serve robots.txt dynamically (for example, /robots.txt is handled by MVC rather than a physical file), you can easily adjust it per environment by reading from a Sitecore field or setting.

Example controller logic (C#):

public ActionResult RobotsTxt()
{
    // Read the robots.txt content from a Sitecore item field (the item path is an example);
    // fall back to blocking crawlers if the item or field is missing.
    var item = Sitecore.Context.Database.GetItem("/sitecore/content/Settings/robotsTxt");
    var robotsContent = item?["Content"];
    if (string.IsNullOrEmpty(robotsContent)) robotsContent = "User-agent: *\nDisallow: /";
    return Content(robotsContent, "text/plain");
}

Then you can manage different robots.txt content for each site or environment directly in the CMS without touching the file system.
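
For the /robots.txt URL to reach this controller, you also need a custom MVC route. One common approach is to register it from a processor hooked into the initialize pipeline; the class, controller, and route names below are illustrative, and the exact patch position relative to Sitecore's own route registration depends on your solution.

using System.Web.Mvc;
using System.Web.Routing;

public class RegisterRobotsTxtRoute
{
    // Hook this processor into Sitecore's <initialize> pipeline via a patch config file.
    public void Process(Sitecore.Pipelines.PipelineArgs args)
    {
        RouteTable.Routes.MapRoute(
            name: "RobotsTxt",
            url: "robots.txt",
            defaults: new { controller = "Robots", action = "RobotsTxt" });
    }
}

Note that if a physical robots.txt file still exists in the web root, IIS will typically serve it before MVC routing runs, so remove the static file when switching to this approach.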

Common Questions

Will updating robots.txt in Sitecore affect all subdomains?
No. Each subdomain has its own root path and configuration. You must deploy or configure it separately.

Can I use one shared robots.txt for all environments?
Not recommended. Search engines treat each subdomain as a separate site. Always serve a restrictive file for non-production environments.

Does Sitecore XM Cloud or JSS handle robots.txt differently?
Yes. In headless or XM Cloud setups, you can host robots.txt at the root of your rendering host (for example, Vercel or Azure Static Web App). The behavior depends on your deployment target, not Sitecore itself.

Best Practices

  • Always block crawlers in staging, QA, or test subdomains using Disallow: /.
  • Use CI/CD pipelines to manage environment-specific versions.
  • Include robots.txt validation in your QA checklist before go-live.
  • Maintain version control of each environment’s robots.txt in your Git repository.
  • Use canonical tags in Sitecore to ensure search engines favor production pages.

Conclusion

Updating robots.txt in Sitecore is simple, but it’s crucial to remember that each environment and subdomain serves its own version. To ensure accurate and safe indexing behavior:

  • Manage environment-specific robots.txt files
  • Automate deployment using your CI/CD pipeline
  • Confirm test environments are fully blocked from search indexing

This approach keeps your staging and QA environments invisible to crawlers while maintaining full SEO visibility for your live site.
