Troubleshooting Sitecore in Docker / Blogs / Perficient


I was recently asked to run a Docker container troubleshooting session for my colleagues, as container technology is taking over Sitecore development: both the Headless SDK scaffolding tool and XM Cloud use it for their starter kits.

When you see an error message, which is sometimes not very explanatory, it is worth checking these basics before digging deeper into troubleshooting:

  • Is Docker Desktop running at all? A few times after restarting my machine, I simply forgot to start the Docker Desktop app
  • Is CPU virtualization enabled on your machine? If not, you may need access to BIOS/UEFI to enable it
  • Are all the prerequisites installed? This is easy to miss when you have just cloned the source code and try running it straight away
  • Is Docker Desktop running in Windows containers mode? Sometimes the app defaults to Linux containers upon restart
  • Are the external ports (443 and the Solr port) already occupied?
  • Which version of Docker Compose does your code expect? Ensure yours is compatible or consider upgrading
  • Is running PowerShell scripts restricted on this machine?
  • Is a corporate VPN interfering with container networking?
  • Are there enough resources available? I’d advise a machine with a minimum of 32 GB RAM and a 1 TB SSD
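
The port check from the list is easy to script. As a minimal sketch (Python is my own choice here, not something the starter kits ship), the helper below tests whether something is already listening on a local port. Port 443 comes from the checklist; 8984 is only an assumption for the Solr port, so replace it with whatever your docker-compose file publishes.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 443 is from the checklist; 8984 is an assumed Solr port, adjust as needed
    for port in (443, 8984):
        state = "occupied" if is_port_in_use(port) else "free"
        print(f"Port {port}: {state}")
```

Run it before starting the containers: once they are up, the same ports will be reported as occupied by the containers themselves.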

Assuming all of the above checks out, let’s go ahead.

One of the most common errors occurs after the “Waiting for CM to become available…” timeout, similar to the one below:

[Screenshot: Waiting for CM to become available]

Sometimes the error message hints at what’s going wrong, as in the screenshot above, where one could guess that Sitecore CLI is missing; running the dotnet tool install Sitecore.CLI command fixed it for me. But in many other cases, you should do the following:

  • check the log output of the faulty container; very likely it will contain traces of the failure
  • open a terminal in the context of that container and troubleshoot from there, e.g. curl localhost

If neither of the above is available to you, that might be because the container was created/started but is not yet running/healthy, as per its status. That usually means it depends on another container, as configured in the docker-compose file. For example, this container depends on CM becoming “healthy”, which means responding to health probes:

depends_on:
  cm:
    condition: service_healthy

You should not permanently comment out or remove the dependencies; however, in some cases temporarily commenting them out may help you get one step further in troubleshooting. Use these references wisely to troubleshoot containers down the dependency chain and find the guilty one.
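
To see which container in the chain is holding things up, you can read each container’s health status from docker inspect. The sketch below is a small Python wrapper around that command. The function names are mine, and it assumes the docker CLI is on your PATH; treat it as a starting point rather than a finished tool.

```python
import json
import subprocess


def summarize_health(inspect_entry: dict) -> tuple[str, str]:
    """Extract (health status, last probe output) from one `docker inspect` entry."""
    state = inspect_entry.get("State", {})
    health = state.get("Health")
    if not health:
        # No healthcheck defined: fall back to the plain container status
        return state.get("Status", "unknown"), ""
    log = health.get("Log") or []
    last_output = log[-1].get("Output", "").strip() if log else ""
    return health.get("Status", "unknown"), last_output


def container_health(name: str) -> tuple[str, str]:
    """Run `docker inspect <name>` and summarize the container's health."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True, text=True, check=True,
    )
    # docker inspect prints a JSON array with one entry per inspected object
    return summarize_health(json.loads(result.stdout)[0])
```

Call container_health with the name of each container that others depend on (for example the CM container, whatever it is called in your setup) and start with the first one that is not reported as healthy. The last probe output often says why the health check fails.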

Traefik

Windows container starter kits use Traefik as a reverse proxy for your containers: all requests to your Sitecore cluster pass through it and get distributed according to its rules. It also hides the errors raised by any specific container from the caller, so if you need to reproduce an error, run PowerShell in the context of that container:

docker exec -it <container-name> powershell.exe

That gives you access to the filesystem, including logs, as well as the ability to execute (and reproduce) any commands the container should run. As mentioned previously, you may first use curl or Invoke-WebRequest against the liveness and readiness endpoints to see what error they return, and then act on that.

One of the errors you may see with Traefik is an incorrect/missing certificate:

[Screenshot: Traefik certificate issue in Docker containers]

To identify the cause, click the “Not secure” label (in Chrome; other browsers may word it differently, but the meaning is the same), then look up the certificate hostname and compare it with the one you requested. There are two potential reasons for the error.

The first is pretty obvious: the requested hostname does not match the certificate hostname. You may need to check the Traefik certs and config folders to see which certificate is being used. For example, if a wildcard certificate is in place, it serves any hostname one level down, but not more than that. In my case, a wildcard cert for *.xmcloudcm.localhost is valid for abc.xmcloudcm.localhost and xyz.xmcloudcm.localhost, but not for abc.zyz.xmcloudcm.localhost and not for the xmcloudcm.localhost hostname itself.
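
The one-level rule can be expressed in a few lines. This is a simplified Python sketch of the matching logic, written for illustration only; real TLS clients apply more rules than this (partial wildcards, internationalized names and so on), and the function name is my own.

```python
def wildcard_cert_matches(cert_name: str, hostname: str) -> bool:
    """Check a hostname against a certificate name, allowing one leading '*' label."""
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard stands for exactly one label, so the label counts must be equal
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == "*":
        # The wildcard needs a non-empty label; the remaining labels must match exactly
        return bool(host_labels[0]) and cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels
```

With the cert name *.xmcloudcm.localhost, the function accepts abc.xmcloudcm.localhost and rejects both abc.zyz.xmcloudcm.localhost (two levels down) and xmcloudcm.localhost (no label for the wildcard to stand for).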

The second reason is when you see the certificate hostname as TRAEFIK DEFAULT CERT – that means Traefik failed to pick up a certificate file and is serving its default built-in certificate.

If that happens, you’ve likely misspelled the path or there is an issue with the mapped volumes. Certificates are configured in the certs_config.yaml file, and the paths are local to the Traefik container, not the host machine:

tls:
  certificates:
    - certFile: C:\etc\traefik\certs\xmcloudcm.localhost.pem
      keyFile: C:\etc\traefik\certs\xmcloudcm.localhost-key.pem

Once you fix the paths – it will work as expected.
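
A quick way to catch a misspelled file name is to compare the names referenced in the config with the files that actually exist on the host. The Python sketch below makes one big assumption: that a single host folder is mapped to the container’s certs folder, so only the file names need to match. The function names are hypothetical, and so is any host folder path you pass in; check your docker-compose volume mapping for the real one.

```python
import re
from pathlib import Path, PureWindowsPath

# Matches lines such as "- certFile: C:\etc\traefik\certs\name.pem"
_CERT_LINE = re.compile(r"^\s*-?\s*(certFile|keyFile):\s*(\S.*?)\s*$", re.MULTILINE)


def referenced_cert_files(config_text: str) -> list[str]:
    """Return the file names referenced by certFile/keyFile entries."""
    names = []
    for match in _CERT_LINE.finditer(config_text):
        # Drop optional quotes, then keep only the file name of the Windows path
        container_path = match.group(2).strip("'\"")
        names.append(PureWindowsPath(container_path).name)
    return names


def missing_cert_files(config_text: str, host_certs_dir: Path) -> list[str]:
    """List referenced file names that do not exist in the host certs folder."""
    return [
        name for name in referenced_cert_files(config_text)
        if not (host_certs_dir / name).is_file()
    ]
```

Feed it the contents of certs_config.yaml and the host-side certs folder; an empty result means every referenced file exists, in which case the volume mapping itself is the next thing to check.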

Troubleshooting Walkthrough

I have recorded a video in which I walk through the approaches to troubleshooting your local Docker container setup.

Hope you find this helpful!