I am hosting theobjectivedad.com from my homelab over my residential Internet service. Although service disruptions of a self-hosted personal website won’t cost me my career, I still want to make theobjectivedad.com as available as possible. This means adding an entire universe of scope that, for the most part, begins with implementing health checks and will likely end with multi-site hot-hot availability.
In this article, I will explore how I implemented health checks and monitoring on theobjectivedad.com. I cover the overall monitoring architecture, how I developed the readiness and liveness endpoints, and how I configured internal and external monitoring.
Architecture
As shown in the diagram below, each of the three Nginx pod replicas hosts a copy of the website and provides two health-related endpoints. The first, https://www.theobjectivedad.com/sys/alive, responds with an HTTP 200 when Nginx is running and can respond to requests. The second, https://www.theobjectivedad.com/sys/ready, responds with an HTTP 200 when the pod can serve the actual site.
flowchart LR
    subgraph Cloudflare
        direction TB
        TUNNEL(fab:fa-cloudflare Tunnel)
    end
    subgraph K8sLab
        direction TB
        subgraph Deployment
            direction LR
            NGINX1("<Pod>\nfas:fa-server Nginx")
            NGINX2("<Pod>\nfas:fa-server Nginx")
            NGINX3("<Pod>\nfas:fa-server Nginx")
            READY1("fas:fa-file /sys/ready")
            ALIVE1("fas:fa-file /sys/alive")
            READY2("fas:fa-file /sys/ready")
            ALIVE2("fas:fa-file /sys/alive")
            READY3("fas:fa-file /sys/ready")
            ALIVE3("fas:fa-file /sys/alive")
            NGINX1 -.- ALIVE1
            NGINX1 -.- READY1
            NGINX2 -.- ALIVE2
            NGINX2 -.- READY2
            NGINX3 -.- ALIVE3
            NGINX3 -.- READY3
        end
        CONNECTOR("<Deployment>\nCloudflare Tunnel")
        INGRESS("<Ingress>\nIngress Controller")
        SERVICE("<Service>\nService")
        TUNNEL -- "HTTP2" --> CONNECTOR
        CONNECTOR -- "HTTP2" --> INGRESS
        INGRESS -- "HTTP2" --> SERVICE
        SERVICE -- "HTTP" --> Deployment
        KUBELET("fas:fa-gear Kubelet")
        KUBELET ---> NGINX1
        KUBELET ---> NGINX2
        KUBELET ---> NGINX3
    end
    ROBOT("<SaaS>\nfas:fa-robot Uptime Robot")
    ROBOT -- "HTTPS" --> TUNNEL
Whenever a pod replica starts in the Nginx Kubernetes deployment, a sidecar process synchronizes the published content from a git repository to an emptyDir volume in the pod. It is important to differentiate between the pod being dead (it needs a restart) and the pod not yet being ready to serve content (the sync process is still running). If I only implemented a single health endpoint for both liveness and readiness checks, I would run the risk of infinite restarts if the synchronization process ran longer than the liveness threshold.
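To make the pattern concrete, here is a minimal sketch of how a deployment like this could be wired together. It is illustrative rather than my actual manifest: the image names, labels, and mount paths are placeholders, and the content-sync container stands in for whatever tool performs the git pull.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: website
spec:
  replicas: 3
  selector:
    matchLabels:
      app: website
  template:
    metadata:
      labels:
        app: website
    spec:
      volumes:
        # Scratch space shared by both containers; the sidecar writes the
        # rendered site (including ready.json) into it and Nginx serves from it
        - name: content
          emptyDir: {}
      containers:
        - name: nginx
          image: nginx:stable                     # placeholder image
          ports:
            - containerPort: 8080                 # matches the port used by the probes later in this article
          volumeMounts:
            - name: content
              mountPath: /usr/share/nginx/html
              readOnly: true
        - name: content-sync
          image: example.com/content-sync:latest  # placeholder for the git sync sidecar
          volumeMounts:
            - name: content
              mountPath: /content

The emptyDir is what ties the two containers together: the pod is alive as soon as Nginx starts, but it is only ready once the sidecar has finished populating that volume.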
Internally, the Kubernetes kubelet is configured to periodically check both /sys/alive and /sys/ready for each replica. After a given number of failures on /sys/alive, the kubelet will attempt to restart the pod.
Externally, I am using UptimeRobot, a freemium service that periodically hits my /sys/ready endpoint to verify the site is still running.
Endpoint Setup
The entire health check project started with implementing the actual /sys/alive and /sys/ready endpoints.
The process that renders the actual site content uses Pelican, which builds a static copy of the entire site based on my pre-configured theme and pre-written content. Pelican allows for the use of Jinja templates, which will come in handy when building out the readiness endpoint.
Since I want the readiness check to fail if the git sync process fails, the readiness endpoint must be deployed during the git sync process. This has the side benefit of allowing me to include additional information in /sys/ready to be rendered by Pelican:
{
  "status": "OK",
  "site": {
    "url": "{{SITEURL}}",
    "rendered": "{{last_updated}}",
    "production": {%if PELICAN_PROD%}true{%else%}false{%endif +%}
  }
}
After Pelican renders the page, the final output will look something like this:
{
  "status": "OK",
  "site": {
    "url": "https://www.theobjectivedad.com",
    "rendered": "2023-03-01T13:11:44Z",
    "production": true
  }
}
To configure /sys/ready to point to the rendered readiness page, I needed to make a few configuration changes to nginx.conf:
location = /sys/ready {
    # Serve rendered health information from file
    try_files /ready.json =404;
    add_header Content-Type "application/json";

    # Advise clients to never cache
    add_header Cache-Control "no-cache, no-store, must-revalidate";
    add_header Pragma "no-cache";
    expires 0;

    # Disable logging
    log_not_found off;
    access_log off;
}
The previous configuration binds the /sys/ready location to the ready.json file rendered by Pelican and deployed by the synchronization process, or returns an HTTP 404 status if ready.json is not found. Additionally, I add several response headers that tell clients not to cache old copies of the response, which would defeat the purpose of the health check endpoint. Finally, as a matter of personal preference, I disable logging for the health check endpoint.
Next, I needed to add another location to nginx.conf to enable the /sys/alive endpoint. While it is similar to the previous block, the important difference is that it doesn’t look for an external page and instead uses the return directive to respond with a simple JSON payload. This is sufficient to verify that Nginx has started and is capable of responding to requests.
location = /sys/alive {
    # If the server is up, return a simple JSON response and HTTP 200
    return 200 "{\"status\": \"OK\"}";
    add_header Content-Type "application/json";

    # Disable logging
    log_not_found off;
    access_log off;

    # Advise clients to never cache
    add_header Cache-Control "no-cache, no-store, must-revalidate";
    add_header Pragma "no-cache";
    expires 0;
}
The final step was to add a page rule to the Cloudflare configuration to ensure the health check page is never cached. This is important for both automated external monitoring and anyone else who wants to see a real (not cached) status page via the browser:

Internal Monitoring
Implementing Kubernetes health check probes could be an entire article in itself and requires some knowledge of how a pod behaves as well as how Kubernetes handles liveness and readiness failures. To keep things simple, the table below lists the important minimum and maximum times I’m anticipating for the pods hosting this site.
| Condition | Probe | Formula |
|---|---|---|
| Pre startup, minimum time before pod is ready | startupProbe | \(\text{initialDelaySeconds}\) |
| Pre startup, time before pod is restarted due to not being ready | startupProbe | \(\text{initialDelaySeconds} + (\text{periodSeconds} \times \text{failureThreshold})\) |
| Post startup, minimum time before the pod is restarted due to liveness failure | livenessProbe | \(\text{periodSeconds} \times \text{failureThreshold}\) |
| Post startup, minimum time before traffic is redirected away from the pod | readinessProbe | \(\text{periodSeconds} \times \text{failureThreshold}\) |
Again, there are more considerations than I have mentioned, but this covers the major points. To implement the internal probes, I added the following configuration to the deployment manifest:
Startup probe:
startupProbe:
  httpGet:
    path: /sys/ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 100
  successThreshold: 1
Liveness probe:
livenessProbe:
  httpGet:
    path: /sys/alive
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
  successThreshold: 1
Readiness probe:
readinessProbe:
  httpGet:
    path: /sys/ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
  successThreshold: 1
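Plugging my configured values into the formulas from the table above gives a rough sense of the timing budget (initialDelaySeconds is not set, so it defaults to 0):
- Startup: \(0 + (5 \times 100) = 500\) seconds before a pod that never becomes ready is restarted
- Liveness: \(5 \times 3 = 15\) seconds of failed /sys/alive checks before the kubelet restarts the pod
- Readiness: \(5 \times 3 = 15\) seconds of failed /sys/ready checks before traffic is routed away from the pod
These are approximate lower bounds; probe timeouts and check jitter can stretch them slightly, but they give a new pod plenty of room for the initial content sync while still reacting to a dead or unhealthy replica within seconds.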
External Monitoring
With the health check endpoints in place, I was ready to add external monitoring. This isn’t a space I have much expertise in; however, a quick Google search for “free website monitoring” led me to UptimeRobot, which seemed good enough.
While the free tier has some restrictions, such as a minimum check interval of 5 minutes, it is adequate for a self-hosted personal website. The setup was simple enough, and it even offers a publicly facing dashboard:

Although not everyone would care about it, having a decent mobile app was certainly nice. Best of all, it is available in the free tier:

Final Thoughts
This is a fairly good start and adds a useful capability to my toolbox. I can’t overstate this: the next time I need to design or implement a health check endpoint or monitoring service, I’ll have a working model to start with.
While going through this exercise I did identify a few additional enhancements that I would like to have but didn’t have time to implement, including:
- Add jq to the Nginx container so the kubelet can run assertions on the JSON responses coming back rather than only looking at the status code (see the sketch after this list).
- Add a sidecar that will periodically update ready.json with additional information, such as an instance identifier (which can be helpful for troubleshooting).
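As a rough sketch of the first idea, the httpGet liveness probe could be swapped for an exec probe that asserts on the JSON body. This assumes curl and jq are available inside the container image (which they are not today) and reuses the same port as the existing probes:

livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      # Fail the probe unless the endpoint responds and reports status OK
      - curl -fsS http://localhost:8080/sys/alive | jq -e '.status == "OK"'
  periodSeconds: 5
  failureThreshold: 3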
The next milestone in the monitoring and availability space for me will be to create a secondary “data center” that I can host this site from. Since the site is statically generated, it shouldn’t be that difficult, but it will require some additional changes to my CI/CD process. I also don’t want to get hooked into any monthly costs, which means I’ll need to set up a workflow to start and stop infrastructure as well as explore the routing options available in the Cloudflare free tier.