Best practices for horizontal scaling of OpenRemote Manager and Keycloak on Kubernetes?

Hi OpenRemote community,

I’m building a smart agriculture IoT platform using OpenRemote deployed on a K3s Kubernetes cluster. My business model is hardware + SaaS, serving multiple tenants (individual farmers, cooperatives, and agricultural enterprises).

I’m trying to design the right architecture from the beginning to avoid painful migrations later, and I have some questions about horizontal scaling.

Current stack:

  • K3s Kubernetes cluster
  • OpenRemote Manager (single pod currently)
  • Keycloak (bundled)
  • PostgreSQL HA (CloudNativePG)
  • Redis HA
  • Longhorn distributed storage
  • MinIO

Questions:

  1. Multiple Manager pods — Is it possible to run multiple Manager pods serving the same tenant simultaneously? I understand Manager is stateful and keeps asset states in RAM. Is there any clustering or shared-state mechanism (e.g., via Redis or database) that would allow this?

  2. Sticky sessions — If true horizontal scaling is not supported, would sticky sessions via Nginx Ingress be a recommended workaround? What are the limitations?

  3. Multiple Manager instances (multi-tenant) — Is the recommended production approach to run one Manager instance per tenant? If so, are there official Helm charts or automation tools to provision new tenant instances automatically?

  4. Keycloak scaling — What is the recommended way to run multiple Keycloak pods in cluster mode with OpenRemote? Does OpenRemote have any specific configuration requirements for Keycloak clustering?

  5. Roadmap — Is horizontal scaling of a single Manager instance on the roadmap for future releases?

Any guidance, real-world experience, or architectural recommendations would be greatly appreciated. Thank you!

Hello,

The short answer for now is: no, OpenRemote does not scale well horizontally, and there is no real provision for it at this stage.

(1) and (2) I've never tried it, but I'm pretty sure running multiple Manager pods, even with sticky sessions, would not work, since asset state is held in RAM with no shared-state mechanism between instances.
In fact, in all the tests I've performed, I've had the best results with a k8s cluster consisting of a single node on which all required OR pods are deployed, and scaling that node vertically.
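
For anyone who still wants to experiment with (2): cookie-based session affinity on the Nginx Ingress is configured with annotations like the sketch below (host and service names are placeholders, not OpenRemote defaults). Note that this only pins each HTTP session to one pod; it does nothing for the in-RAM asset state, so events processed on one pod remain invisible to the others.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openremote-manager
  annotations:
    # Cookie-based session affinity: each client session sticks to one pod.
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/affinity-mode: "persistent"
    nginx.ingress.kubernetes.io/session-cookie-name: "or-sticky"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"
spec:
  ingressClassName: nginx
  rules:
    - host: manager.example.com          # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: openremote-manager # placeholder service name
                port:
                  number: 8080
```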

(3) You can use a single instance with multiple realms, or multiple instances, to support multi-tenancy; it all depends on your requirements in terms of sizing and isolation.
We have some Helm charts and CloudFormation code available in the main repo, and are also looking into other tools such as Terraform or Pulumi.

(4) I believe we haven’t tried that either. I don’t think we’ve ever seen Keycloak being a bottleneck in terms of performance.
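
For reference, Keycloak itself (the Quarkus distribution) clusters through its embedded Infinispan cache, and on Kubernetes the usual approach is the `kubernetes` JGroups stack with DNS discovery against a headless service. A minimal sketch, assuming a headless service named `keycloak-headless` in an `openremote` namespace (both placeholders), and untested against the OpenRemote setup:

```yaml
# Fragment of a Keycloak pod/container spec; names are placeholders.
containers:
  - name: keycloak
    image: quay.io/keycloak/keycloak:24.0
    args:
      - "start"
      - "--cache=ispn"               # distributed Infinispan cache
      - "--cache-stack=kubernetes"   # JGroups DNS_PING discovery
    env:
      - name: JAVA_OPTS_APPEND
        value: "-Djgroups.dns.query=keycloak-headless.openremote.svc.cluster.local"
```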

(5) Yes, horizontal scaling (not only of the Manager but of the whole stack) is on the roadmap. No promises or timelines, but this is definitely something we're looking into. I'm spending part of my time on load testing, understanding the system's bottlenecks, identifying the optimisations we can bring, and exploring the more profound architectural changes we could make to scale.

As we're always keen to understand the real needs of actual users, may I ask for more information about your goals? Do you have estimates in terms of volumes: number of tenants, connected devices, events/s, data retention volumes…?
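
To make those volume questions concrete, here's a back-of-envelope sizing helper; all numbers in the example are placeholders to plug your own estimates into, not OpenRemote measurements:

```python
def estimate_load(tenants, devices_per_tenant, events_per_device_per_min,
                  bytes_per_event, retention_days):
    """Rough aggregate event rate and raw storage for a retention window."""
    devices = tenants * devices_per_tenant
    events_per_sec = devices * events_per_device_per_min / 60
    bytes_per_day = events_per_sec * 86_400 * bytes_per_event
    retention_gb = bytes_per_day * retention_days / 1e9
    return devices, events_per_sec, retention_gb

# Example: 50 tenants, 200 devices each, 1 event/device/min,
# 200 bytes/event, 90 days retention
devices, eps, gb = estimate_load(50, 200, 1, 200, 90)
print(devices, round(eps, 1), round(gb, 1))  # → 10000 166.7 259.2
```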

Thanks,
Eric

How did you set up the PostgreSQL HA?

We have had some issues getting it to run with both high performance and high availability, as it seems like there is some code in the Manager that makes some changes, or something?

What's your setup?
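
(For context, a minimal CloudNativePG HA cluster manifest looks like the sketch below; instance count, sizes, and names are placeholders. I'm mainly asking what you changed relative to something like this.)

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: openremote-db          # placeholder name
spec:
  instances: 3                 # 1 primary + 2 replicas, automated failover
  storage:
    size: 50Gi
    storageClass: longhorn     # matches the Longhorn storage mentioned above
  bootstrap:
    initdb:
      database: openremote
      owner: openremote
```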