Service Governance

Overview

Service governance addresses the challenges of managing distributed services: who provides a service, who consumes it, where to register, how to handle failures, how to ensure quality, and how to degrade gracefully under pressure.

Key areas of microservice governance:

Service Registration & Discovery: After decomposing a monolith into microservices, callers need to locate target service addresses dynamically.
Observability: Call topology, metrics, logging, and distributed tracing.
Traffic Management: Canary releases, blue-green deployments, A/B testing.
Fault Tolerance: Circuit breaking, isolation, rate limiting, fallback, and timeout mechanisms.
Security: Authentication and authorization between services.
Control: Real-time distribution of governance policies.
Node Health: Detect and remove unhealthy instances from the cluster.

Service Registration & Discovery

Silky supports automatic service registration and discovery, with Zookeeper, Nacos, or Consul as the registry center. Instances going online or offline are detected automatically.

When a service instance starts, it registers or updates its service metadata (and endpoint address) in the registry.
With Zookeeper or Nacos, Silky uses a pub-sub model to receive the latest metadata and endpoint changes immediately.
With Consul, Silky polls the registry on a heartbeat interval.
When an IO or communication error occurs during an RPC call, the failing instance is marked unhealthy. It is removed once it has failed Governance:UnHealthAddressTimesAllowedBeforeRemoving times (0 = immediate removal).
Long-lived connections can send heartbeats (Rpc:EnableHeartbeat = true). A failed heartbeat triggers the same unhealthy-removal logic.
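The removal rule can be pictured as a per-address failure counter. The following is a minimal sketch and not Silky's internal implementation: the class name UnhealthyAddressTracker and its methods are hypothetical, and it assumes that an instance is removed on the failure that reaches the configured count and that a successful call resets the counter.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Silky.
public sealed class UnhealthyAddressTracker
{
    private readonly int _timesAllowedBeforeRemoving;
    private readonly Dictionary<string, int> _failures = new Dictionary<string, int>();
    private readonly HashSet<string> _removed = new HashSet<string>();

    public UnhealthyAddressTracker(int timesAllowedBeforeRemoving)
    {
        _timesAllowedBeforeRemoving = timesAllowedBeforeRemoving;
    }

    // Records one IO/communication or heartbeat failure.
    // Returns true when this failure caused the address to be removed.
    public bool ReportFailure(string address)
    {
        if (_removed.Contains(address)) return false;

        _failures.TryGetValue(address, out var count);
        count++;
        _failures[address] = count;

        // Assumption: removal happens on the failure that reaches the threshold,
        // so a threshold of 0 removes the address on its first failure.
        if (count >= _timesAllowedBeforeRemoving)
        {
            _removed.Add(address);
            _failures.Remove(address);
            return true;
        }
        return false;
    }

    // Assumption: a successful call clears the failure count.
    public void ReportSuccess(string address) => _failures.Remove(address);

    public bool IsRemoved(string address) => _removed.Contains(address);
}
```

The sketch is not thread-safe; a real router would guard the counters or use concurrent collections.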

Load Balancing

Silky supports four load balancing strategies:

Strategy	Description
`Polling` (default)	Round-robin across all healthy instances
`Random`	Randomly select a healthy instance
`HashAlgorithm`	Consistent hash: the same [HashKey] value always routes to the same instance while the set of healthy instances is unchanged
`Appoint`	Direct routing to a specific address (framework internal use only)
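To make the first two strategies concrete, here is a minimal sketch of round-robin and random selection over a list of healthy addresses. PollingSelector and RandomSelector are hypothetical names chosen for this example; Silky's own selector types may look different.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical round-robin selector, illustrating the Polling strategy.
public sealed class PollingSelector
{
    private int _next = -1;

    public string Select(IReadOnlyList<string> healthy)
    {
        if (healthy.Count == 0) throw new InvalidOperationException("No healthy instance.");
        // Interlocked keeps the counter consistent under concurrent callers;
        // the uint cast keeps the index non-negative after the counter wraps.
        var n = (uint)Interlocked.Increment(ref _next);
        return healthy[(int)(n % (uint)healthy.Count)];
    }
}

// Hypothetical random selector, illustrating the Random strategy.
public sealed class RandomSelector
{
    private readonly Random _random = new Random();

    public string Select(IReadOnlyList<string> healthy)
    {
        if (healthy.Count == 0) throw new InvalidOperationException("No healthy instance.");
        // System.Random is not thread-safe, so access is serialized.
        lock (_random)
        {
            return healthy[_random.Next(healthy.Count)];
        }
    }
}
```

With three healthy instances, four consecutive PollingSelector calls visit the first, second, and third address and then wrap back to the first.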

Global Configuration

governance:
  shuntStrategy: Random

Per-Method Override

[HttpGet("{name}")]
[Governance(ShuntStrategy = ShuntStrategy.HashAlgorithm)]
Task<TestOut> Get([HashKey] string name);

When using HashAlgorithm, mark the parameter used for hashing with [HashKey]:

Task<OrderOutput> GetOrderAsync([HashKey] long orderId);
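The idea behind hash-based routing can be sketched with a small hash ring. This is an illustration under stated assumptions, not Silky's implementation: HashRing, the FNV-1a hash, and the choice of 100 virtual nodes per address are all picked for the example, and the [HashKey] value is assumed to have been converted to a string.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical consistent-hash ring for illustration.
public sealed class HashRing
{
    private readonly List<KeyValuePair<uint, string>> _points = new List<KeyValuePair<uint, string>>();

    public HashRing(IEnumerable<string> addresses, int virtualNodes = 100)
    {
        // Each address is placed on the ring many times to even out the distribution.
        foreach (var address in addresses)
            for (var i = 0; i < virtualNodes; i++)
                _points.Add(new KeyValuePair<uint, string>(Fnv1a(address + "#" + i), address));
        _points.Sort((x, y) => x.Key.CompareTo(y.Key));
    }

    public string Select(string hashKey)
    {
        if (_points.Count == 0) throw new InvalidOperationException("No healthy instance.");
        var h = Fnv1a(hashKey);
        // Binary search for the first ring point at or after h; wrap to the start if none.
        int lo = 0, hi = _points.Count;
        while (lo < hi)
        {
            var mid = (lo + hi) / 2;
            if (_points[mid].Key < h) lo = mid + 1; else hi = mid;
        }
        return _points[lo == _points.Count ? 0 : lo].Value;
    }

    // FNV-1a is used because string.GetHashCode is not stable across processes.
    private static uint Fnv1a(string text)
    {
        unchecked
        {
            var hash = 2166136261u;
            foreach (var b in Encoding.UTF8.GetBytes(text))
            {
                hash ^= b;
                hash *= 16777619u;
            }
            return hash;
        }
    }
}
```

A ring has the useful property that removing one instance only remaps the keys that were routed to it; keys routed to the remaining instances stay where they were.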

Direct Invocation via IAppointAddressInvoker

public class MyService : IMyService, IScopedDependency
{
    private readonly IAppointAddressInvoker _appointAddressInvoker;

    public MyService(IAppointAddressInvoker appointAddressInvoker)
    {
        _appointAddressInvoker = appointAddressInvoker;
    }

    public async Task<MyOutput> CallSpecificInstance(MyInput input)
    {
        // Arguments: target instance address (host:port), service entry id, call parameters
        return await _appointAddressInvoker.Invoke<MyOutput>(
            "192.168.1.100:2200",
            "Your.Service.Entry.Id_Get",
            new object[] { input });
    }
}

Timeout Control

Configure the RPC timeout globally or per method:

governance:
  timeoutMillSeconds: 5000   # 5 seconds (0 = unlimited)

Per-method override:

[Governance(TimeoutMillSeconds = 3000)]
Task<OrderOutput> CreateAsync(CreateOrderInput input);
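Conceptually, a timeout races the call against a timer. The sketch below shows one common way to do that in .NET with Task.WhenAny; RpcTimeout is a hypothetical helper, and how Silky enforces the limit internally is not shown in this document.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of Silky.
public static class RpcTimeout
{
    public static async Task<T> InvokeAsync<T>(
        Func<CancellationToken, Task<T>> call, int timeoutMillSeconds)
    {
        // 0 means unlimited, mirroring the configuration above.
        if (timeoutMillSeconds <= 0) return await call(CancellationToken.None);

        using var cts = new CancellationTokenSource();
        var work = call(cts.Token);
        var timer = Task.Delay(timeoutMillSeconds, cts.Token);

        var finished = await Task.WhenAny(work, timer);
        // Cancels the timer if the call won, or asks the call to stop if the timer won.
        cts.Cancel();

        if (finished != work)
            throw new TimeoutException($"RPC call exceeded {timeoutMillSeconds} ms.");

        return await work;
    }
}
```

Passing the token into the call lets a cooperative callee stop early once the deadline has passed.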

Failover (Retry)

On IO or communication errors, Silky automatically selects another healthy instance and retries:

governance:
  retryTimes: 3              # number of retries
  retryIntervalMillSeconds: 50  # delay between retries (ms)

Warning

Retries only trigger on infrastructure exceptions (IO errors, connection failures). Business logic exceptions (UserFriendlyException, validation errors) do NOT trigger a retry.
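The retry rule can be sketched as a loop that moves to another instance only when the failure is an infrastructure error. Failover is a hypothetical helper; treating IOException and SocketException as the infrastructure errors, and rotating through the healthy list in order, are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of Silky.
public static class Failover
{
    public static async Task<T> InvokeAsync<T>(
        IReadOnlyList<string> healthy,
        Func<string, Task<T>> call,
        int retryTimes,
        int retryIntervalMillSeconds)
    {
        if (healthy.Count == 0) throw new InvalidOperationException("No healthy instance.");

        // One initial attempt plus up to retryTimes retries.
        for (var attempt = 0; ; attempt++)
        {
            // Each attempt targets the next address in the list.
            var address = healthy[attempt % healthy.Count];
            try
            {
                return await call(address);
            }
            catch (Exception ex) when (IsInfrastructureError(ex) && attempt < retryTimes)
            {
                // Business exceptions do not match the filter and propagate immediately.
                await Task.Delay(retryIntervalMillSeconds);
            }
        }
    }

    // Assumption: these two exception types stand in for IO and communication errors.
    private static bool IsInfrastructureError(Exception ex) =>
        ex is IOException || ex is SocketException;
}
```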

Circuit Breaking

Silky uses Polly to implement circuit breaking:

governance:
  enableCircuitBreaker: true
  exceptionsAllowedBeforeBreaking: 3   # consecutive exceptions to open the circuit
  breakerSeconds: 60                   # seconds the circuit stays open

When the circuit is open, all calls to that service entry immediately fail fast without attempting a network request.
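The open/fail-fast behaviour can be illustrated with a hand-written breaker that counts consecutive exceptions. This sketch deliberately does not use Polly: SimpleCircuitBreaker and CircuitOpenException are hypothetical names, the injectable clock exists only to make the example testable, and the half-open trial state that a full breaker has is left out.

```csharp
using System;

// Hypothetical exception type for illustration.
public sealed class CircuitOpenException : Exception
{
    public CircuitOpenException() : base("Circuit is open; failing fast.") { }
}

// Hypothetical consecutive-failure breaker; not thread-safe.
public sealed class SimpleCircuitBreaker
{
    private readonly int _exceptionsAllowedBeforeBreaking;
    private readonly TimeSpan _breakDuration;
    private readonly Func<DateTime> _utcNow;
    private int _consecutiveFailures;
    private DateTime _openUntil = DateTime.MinValue;

    public SimpleCircuitBreaker(int exceptionsAllowedBeforeBreaking, int breakerSeconds)
        : this(exceptionsAllowedBeforeBreaking, breakerSeconds, () => DateTime.UtcNow) { }

    public SimpleCircuitBreaker(int exceptionsAllowedBeforeBreaking, int breakerSeconds, Func<DateTime> utcNow)
    {
        _exceptionsAllowedBeforeBreaking = exceptionsAllowedBeforeBreaking;
        _breakDuration = TimeSpan.FromSeconds(breakerSeconds);
        _utcNow = utcNow;
    }

    public T Execute<T>(Func<T> call)
    {
        // While open, reject without invoking the call at all.
        if (_utcNow() < _openUntil) throw new CircuitOpenException();

        try
        {
            var result = call();
            _consecutiveFailures = 0; // a success resets the streak
            return result;
        }
        catch
        {
            if (++_consecutiveFailures >= _exceptionsAllowedBeforeBreaking)
            {
                _openUntil = _utcNow() + _breakDuration;
                _consecutiveFailures = 0;
            }
            throw;
        }
    }
}
```

With the values from the configuration above, the third consecutive exception opens the circuit, and calls made during the next 60 seconds are rejected without reaching the network.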

RPC Concurrent Rate Limiting

Limit the number of concurrent RPC requests handled by a single instance:

governance:
  maxConcurrentHandlingCount: 50  # excess requests are routed to other instances
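A concurrency cap of this kind is essentially an atomic in-flight counter. The sketch below uses a hypothetical ConcurrencyGate class; it assumes the caller treats a rejected TryEnter as the signal to route the request to another instance, which is outside the scope of the example.

```csharp
using System.Threading;

// Hypothetical in-flight request counter for illustration; not part of Silky.
public sealed class ConcurrencyGate
{
    private readonly int _maxConcurrentHandlingCount;
    private int _current;

    public ConcurrencyGate(int maxConcurrentHandlingCount)
    {
        _maxConcurrentHandlingCount = maxConcurrentHandlingCount;
    }

    // Returns true if the request may be handled by this instance.
    public bool TryEnter()
    {
        if (Interlocked.Increment(ref _current) <= _maxConcurrentHandlingCount) return true;
        // Over the limit: undo the reservation and reject.
        Interlocked.Decrement(ref _current);
        return false;
    }

    // Must be called exactly once for every successful TryEnter.
    public void Exit() => Interlocked.Decrement(ref _current);
}
```

A handler would call TryEnter before processing and Exit in a finally block, so the slot is released even when processing throws.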

HTTP Rate Limiting

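For HTTP traffic arriving from outside the cluster, integrate AspNetCoreRateLimit at the gateway. The rule below allows each client IP at most 100 requests per second on any endpoint: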

ipRateLimiting:
  enableEndpointRateLimiting: true
  generalRules:
    - endpoint: "*"
      period: 1s
      limit: 100
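What a rule such as period: 1s, limit: 100 means can be shown with a fixed-window counter per client IP. This is an illustrative sketch only and does not describe how AspNetCoreRateLimit is implemented; FixedWindowLimiter is a hypothetical class, and the current time is passed in explicitly to keep the example deterministic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-window limiter for illustration.
public sealed class FixedWindowLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _period;
    private readonly Dictionary<string, (DateTime WindowStart, int Count)> _windows =
        new Dictionary<string, (DateTime WindowStart, int Count)>();

    public FixedWindowLimiter(int limit, TimeSpan period)
    {
        _limit = limit;
        _period = period;
    }

    // Returns true if the request from clientIp is within the limit.
    public bool TryAcquire(string clientIp, DateTime utcNow)
    {
        lock (_windows)
        {
            // Start a fresh window for a new client or once the period has elapsed.
            if (!_windows.TryGetValue(clientIp, out var window) || utcNow - window.WindowStart >= _period)
                window = (utcNow, 0);

            if (window.Count >= _limit) return false;

            _windows[clientIp] = (window.WindowStart, window.Count + 1);
            return true;
        }
    }
}
```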

Service Fallback

Define a fallback class to handle RPC failures gracefully:

[ServiceRoute]
[Fallback(typeof(InventoryAppServiceFallback))]
public interface IInventoryAppService
{
    Task<int> GetStockAsync(long productId);
}

public class InventoryAppServiceFallback : IInventoryAppService
{
    public Task<int> GetStockAsync(long productId)
    {
        // Return a safe default when the remote call fails
        return Task.FromResult(-1);
    }
}
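The effect of a fallback can be sketched as a wrapper that tries the remote call and switches to the fallback when it fails. FallbackInvoker is a hypothetical helper, and the sketch assumes that any exception from the remote call triggers the fallback; which failures Silky actually hands to the fallback class is not specified here.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of Silky.
public static class FallbackInvoker
{
    public static async Task<T> InvokeAsync<T>(Func<Task<T>> remoteCall, Func<Task<T>> fallback)
    {
        try
        {
            return await remoteCall();
        }
        catch (Exception)
        {
            // Assumption: any failure of the remote call is routed to the fallback.
            return await fallback();
        }
    }
}
```

With the classes above, the fallback delegate would call InventoryAppServiceFallback.GetStockAsync, so the caller receives -1 instead of an exception.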

Global Governance Configuration Reference

governance:
  shuntStrategy: Polling           # load balancing strategy
  timeoutMillSeconds: 5000         # RPC timeout (ms); 0 = unlimited
  enableCachingInterceptor: true   # enable cache interception
  retryTimes: 3                    # failover retry count
  retryIntervalMillSeconds: 50     # delay between retries (ms)
  enableCircuitBreaker: true       # enable circuit breaking
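  exceptionsAllowedBeforeBreaking: 3   # consecutive exceptions before the circuit opens
  breakerSeconds: 60               # circuit open duration (s)
  addressFuseSleepDurationSeconds: 60
  unHealthAddressTimesAllowedBeforeRemoving: 3   # failures before an unhealthy instance is removed; 0 = immediate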
  maxConcurrentHandlingCount: 50   # per-instance max concurrent requests