Custom Load Balancing Endpoints in an Azure Web/Worker/VM Role
27 Apr 2013

Windows Azure Web, Worker and Virtual Machine roles provide an easy built-in way to customise health monitoring for a load balanced endpoint, allowing you to disable a single endpoint for a role without causing the entire role to recycle. This can be achieved through use of the LoadBalancerProbes schema element, which is available in Azure SDK 1.7+.
Background
The Windows Azure Load Balancer, managed by the Azure Fabric Controller, acts as the default controller for determining how to route incoming network traffic to endpoints on your role instances. A default load balancer probe is provided that covers all endpoints for each role instance - this probe is high level and simply returns HTTP 200 OK if the role is in the Ready state (not Busy, Recycling, Stopping, etc.). If the response is not 200 OK, the load balancer stops all traffic being routed to that instance.
Once the role instance starts returning HTTP 200 again, the load balancer resumes traffic flow. When running a standard web role, your code usually runs in the w3wp.exe process, which isn't actually monitored by the load balancer (so failures such as your web application returning HTTP 500 Internal Server Error won't cause the instance to be taken out of rotation).
Overriding the default probe
If you override the default probe for an endpoint, you can provide more complex, lower level logic for each individual endpoint in your service. Your probe is checked regularly (every 15 seconds by default) - if your probe responds with an HTTP 200 or TCP ACK within the timeout period (31 seconds by default), then the associated endpoint will have traffic routed to it as normal. If the probe starts returning any other HTTP status code, or fails to acknowledge the TCP connection, the endpoint will be removed from the load balancing rotation.
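To make the mechanics concrete, here is a minimal sketch of what a custom HTTP probe endpoint looks like from the instance's side. This is not Azure SDK code - the port, path and `healthy` flag are all illustrative assumptions; the real decision about what "healthy" means belongs to your application. The load balancer treats an HTTP 200 as healthy and any other status as a signal to pull the endpoint out of rotation.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import threading

healthy = True  # illustrative flag: flip to False to leave rotation


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer only the probe path; anything else is treated as unhealthy.
        if self.path == "/healthcheck" and healthy:
            self.send_response(200)   # keep this endpoint in rotation
        else:
            self.send_response(503)   # removed from rotation after timeout
        self.end_headers()


def run_probe_server(port: int = 8080) -> HTTPServer:
    """Start the probe listener; `port` should match the probe's port attribute."""
    server = HTTPServer(("", port), ProbeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The probe's `path` attribute in the `.csdef` would point at `/healthcheck`, and its `port` attribute at whatever port this listener binds.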
Usages
You can use this in multiple ways, for example:
- Ensuring only one instance of your role provides a selected endpoint at a time.
- Disabling an instance if one of your websites starts returning an unusually large number of HTTP errors for a specified URI.
- Removing a single endpoint from load balancer rotation if it becomes overloaded - for example, temporarily disabling new requests to port 80 on a web role if that instance becomes overloaded by a small number of unusually heavy requests (this would normally cause problems given the default load balancing is round robin).
- Disabling an endpoint when a custom service becomes unavailable, for example stopping requests to a virtual machine role database if the database is encountering issues (while still allowing requests to all other services).
Gotchas
- Overriding the built-in load balancing probe can mean that your replacement probe still returns 200 OK after a role has had its OnStop() method called. You should ensure your probe does the same as the built-in probe and begins returning a non-200 HTTP status code as soon as OnStop() is called.
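The OnStop gotcha above amounts to a shutdown flag. The real fix lives in your .NET RoleEntryPoint's OnStop override; this Python sketch only illustrates the pattern, with illustrative names throughout. Once the flag is set, the probe answers non-200 so the load balancer drains traffic before the role shuts down.

```python
stopping = False  # illustrative flag mirroring the role's lifecycle


def on_stop() -> None:
    """Mirror of the role's OnStop(): mark this instance as draining."""
    global stopping
    stopping = True


def probe_status() -> int:
    """503 once stopping, so the load balancer removes the endpoint."""
    return 503 if stopping else 200
```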
Example .csdef schema
<ServiceDefinition>
  <LoadBalancerProbes>
    <LoadBalancerProbe name="TestProbe" protocol="{http|tcp}" path="{uri-for-checking-health-status-of-vm}" port="{port-number}" intervalInSeconds="{interval-in-seconds}" timeoutInSeconds="{timeout-in-seconds}" />
  </LoadBalancerProbes>
  <WorkerRole>
    ...
    <Endpoints>
      <InputEndpoint name="HttpIn" protocol="http" port="80" localPort="80" loadBalancerProbe="TestProbe" />
    </Endpoints>
    ...
  </WorkerRole>
</WorkerRole>
For a real world example of when a LoadBalancerProbe might be useful, see this post.
LoadBalancerProbe element attributes
name
- A unique identifier for this probe. Can be referenced by multiple endpoints.

protocol
- HTTP or TCP. A 200 OK response for HTTP, or a TCP ACK for TCP, means the endpoint should be kept available. Any other response indicates to the Fabric Controller that it should take this endpoint out of the load balancing rotation.

path
- Required for the HTTP protocol; specifies the URI used for health checking.

port
- The port number to be used for checking availability. Defaults to the same port number as the endpoint.

intervalInSeconds
- How frequently (in seconds) to make availability checking requests.

timeoutInSeconds
- The number of seconds after which, if no success response is received by the availability checks, the endpoint will be removed from the load balancing rotation. A good recommended value is twice that of intervalInSeconds, allowing two full failed requests before traffic to the associated endpoint is disabled.
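Putting the recommendation into numbers: with the default 15 second interval, a timeout of around twice that (the 31 second default) means roughly two consecutive failed checks occur before the endpoint is pulled from rotation. A concrete probe definition following that guidance might look like this (the `/healthcheck` path is illustrative):

```xml
<LoadBalancerProbe name="TestProbe"
                   protocol="http"
                   path="/healthcheck"
                   port="80"
                   intervalInSeconds="15"
                   timeoutInSeconds="31" />
```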