gRPC & ASP.NET Core 3.1: Resiliency with Polly
What is Polly?
Polly is a .NET resilience and transient-fault-handling library that allows developers to express policies such as Retry, Circuit Breaker, Timeout, Bulkhead Isolation, and Fallback in a fluent and thread-safe manner. From version 6.0.1, Polly targets .NET Standard 1.1 and 2+.
What’s a Retry Policy?
Many faults are transient and may self-correct after a short delay, in which case we want to try again. A Retry policy lets you configure automatic retries.
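To make the idea concrete, here is a minimal sketch of a generic Polly retry policy, not specific to gRPC. The class name RetryExample, the helper names, and the linear back-off are assumptions chosen for illustration; only Policy.Handle, WaitAndRetryAsync and ExecuteAsync come from Polly itself.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

// Hypothetical helper class, for illustration only.
public static class RetryExample
{
    // Retries up to `retryCount` times when the call throws HttpRequestException,
    // waiting baseDelay * attempt before each new attempt (linear back-off).
    public static IAsyncPolicy BuildRetryPolicy(int retryCount, TimeSpan baseDelay) =>
        Policy
            .Handle<HttpRequestException>()
            .WaitAndRetryAsync(
                retryCount,
                attempt => TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * attempt));

    // Example call site: the delegate is invoked again on each retry.
    public static Task<string> GetWithRetryAsync(HttpClient client, string url) =>
        BuildRetryPolicy(3, TimeSpan.FromSeconds(1))
            .ExecuteAsync(() => client.GetStringAsync(url));
}
```

If every attempt fails, the last exception is rethrown to the caller, so the calling code still needs its own error handling.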
What’s a Circuit Breaker?
When a system is seriously struggling, failing fast is better than making users/callers wait. Protecting a faulting system from overload can help it recover. A Circuit Breaker breaks the circuit (blocks executions) for a period, when faults exceed some pre-configured threshold.
Polly and gRPC
Firstly, I won’t detail here how Polly works. Polly has complete documentation that I recommend reading before the rest of this tutorial: https://github.com/App-vNext/Polly
Secondly, this tutorial applies only to unary services.
gRPC clients in .NET Core are fully compatible with Polly.
The IServiceCollection extension method services.AddGrpcClient<TClient>(o => { }) returns an IHttpClientBuilder, which gives access to the .AddPolicyHandler() extension method. That method lets you plug in an IAsyncPolicy<HttpResponseMessage> built with Polly.
Here is what a gRPC client in a .NET Core console application looks like with a Retry Policy. I won’t provide a Circuit Breaker sample here, since I have already described one for a Web API in this post: https://anthonygiretti.com/2019/03/26/best-practices-with-httpclient-and-retry-policies-with-polly-in-net-core-2-part-2/
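As a rough sketch of that setup, the registration could look like the code below. It assumes the Grpc.Net.ClientFactory and Microsoft.Extensions.Http.Polly packages are referenced. The class GrpcClientRegistration, the extension method AddGrpcClientWithRetry, the shouldRetry predicate, and the retry count and delays are all hypothetical choices for illustration, not necessarily what the original sample used.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;

// Hypothetical helper class, for illustration only.
public static class GrpcClientRegistration
{
    // Builds a result-based policy: the predicate inspects each response and
    // returns true when the call should be retried.
    // Assumed settings: 3 retries, waiting 4s, 5s, then 6s.
    public static IAsyncPolicy<HttpResponseMessage> BuildRetryPolicy(
        Func<HttpResponseMessage, bool> shouldRetry) =>
        Policy
            .HandleResult<HttpResponseMessage>(shouldRetry)
            .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(3 + attempt));

    // Registers the generated gRPC client TClient and attaches the policy
    // to the HttpClient pipeline it uses.
    public static IHttpClientBuilder AddGrpcClientWithRetry<TClient>(
        this IServiceCollection services,
        Uri address,
        Func<HttpResponseMessage, bool> shouldRetry)
        where TClient : class =>
        services
            .AddGrpcClient<TClient>(o => o.Address = address)
            .AddPolicyHandler(BuildRetryPolicy(shouldRetry));
}
```

The policy works on HttpResponseMessage rather than on exceptions because the handler sits below the gRPC client, at the HTTP level. The interesting part is therefore the predicate that decides which responses count as failures.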
Some explanations are required:
There are two kinds of status to check. Why? First, we have to handle HTTP status codes. As you might remember, gRPC always returns HTTP OK, but only when gRPC actually handles the request. Sometimes server errors occur independently of ASP.NET Core, and we need to handle them. Second, when gRPC does process the request, it returns a grpc-status that needs to be handled as well, because an error may occur inside the gRPC pipeline. Here are the statuses I chose to handle in my retry policy:
// HTTP errors
var serverErrors = new HttpStatusCode[] {
    HttpStatusCode.BadGateway,
    HttpStatusCode.GatewayTimeout,
    HttpStatusCode.ServiceUnavailable,
    HttpStatusCode.InternalServerError,
    HttpStatusCode.TooManyRequests,
    HttpStatusCode.RequestTimeout
};

// gRPC status
var gRpcErrors = new StatusCode[] {
    StatusCode.DeadlineExceeded,
    StatusCode.Internal,
    StatusCode.NotFound,
    StatusCode.ResourceExhausted,
    StatusCode.Unavailable,
    StatusCode.Unknown
};
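One way to combine the two lists into a single retry decision is sketched below. The class RetryDecision and the method ShouldRetry are hypothetical names, and the convention that a null grpcStatus means "no grpc-status was found on the response" is an assumption of this sketch.

```csharp
using System.Linq;
using System.Net;
using Grpc.Core;

// Hypothetical helper class, for illustration only.
public static class RetryDecision
{
    private static readonly HttpStatusCode[] ServerErrors =
    {
        HttpStatusCode.BadGateway, HttpStatusCode.GatewayTimeout,
        HttpStatusCode.ServiceUnavailable, HttpStatusCode.InternalServerError,
        HttpStatusCode.TooManyRequests, HttpStatusCode.RequestTimeout
    };

    private static readonly StatusCode[] GrpcErrors =
    {
        StatusCode.DeadlineExceeded, StatusCode.Internal, StatusCode.NotFound,
        StatusCode.ResourceExhausted, StatusCode.Unavailable, StatusCode.Unknown
    };

    // grpcStatus == null: gRPC never handled the request, so only the HTTP
    // status code can tell us whether to retry.
    // grpcStatus != null: gRPC handled the request (HTTP OK), so the
    // grpc-status decides.
    public static bool ShouldRetry(HttpStatusCode httpStatus, StatusCode? grpcStatus) =>
        grpcStatus == null
            ? ServerErrors.Contains(httpStatus)
            : httpStatus == HttpStatusCode.OK && GrpcErrors.Contains(grpcStatus.Value);
}
```

For example, ShouldRetry(HttpStatusCode.ServiceUnavailable, null) and ShouldRetry(HttpStatusCode.OK, StatusCode.Unavailable) both ask for a retry, while ShouldRetry(HttpStatusCode.OK, StatusCode.OK) does not.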
Now let’s take a look at the method named StatusManager.GetStatusCode().
This method extracts the grpc-status when it’s required:
- When gRPC handles the request and everything works fine, the grpc-status is not written in the response headers: it is sent in the trailers, after the body (and it always has the value OK in that case)
- When gRPC handles the request and an error occurs, the grpc-status is written in the headers. It has the value you gave it if you handled the error, otherwise it’s Cancelled by default
- When gRPC doesn’t handle the request, there is no grpc-status anywhere and you have to rely on HTTP status codes.
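The three cases above can be sketched as follows. This is one possible shape for such a method, reconstructed from the description rather than copied from the original sample. Returning null for "no grpc-status available" and treating a header-less HTTP OK as StatusCode.OK are assumptions of the sketch.

```csharp
using System.Linq;
using System.Net;
using System.Net.Http;
using Grpc.Core;

public static class StatusManager
{
    private const string GrpcStatusHeader = "grpc-status";

    // Returns the gRPC status for a response, or null when gRPC did not
    // handle the request and only the HTTP status code is meaningful.
    public static StatusCode? GetStatusCode(HttpResponseMessage response)
    {
        // Error inside the gRPC pipeline: grpc-status is in the headers.
        if (response.Headers.TryGetValues(GrpcStatusHeader, out var values) &&
            int.TryParse(values.FirstOrDefault(), out var code))
        {
            return (StatusCode)code;
        }

        // gRPC handled the request successfully: HTTP OK and no grpc-status
        // in the headers, so the status is assumed to be OK.
        if (response.StatusCode == HttpStatusCode.OK)
        {
            return StatusCode.OK;
        }

        // gRPC did not handle the request at all.
        return null;
    }
}
```

The grpc-status header carries the numeric value of the status, which is why the sketch parses it as an integer before casting it to StatusCode.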
The policy is now ready to work (don’t forget to apply it to the client).
Here is a demo when an internal error occurs server side: