Site Reliability Engineer - Cloud Data Services M/F
OVHcloud offers a wide range of Cloud services to companies, organizations and individuals.
Whether you're looking for Private Cloud, Public Cloud or even Hybrid Cloud solutions, our services are always being improved with the very latest technologies and innovations.
Our organization brings together people with a wide variety of backgrounds, experiences and perspectives.
We encourage them to collaborate, think big and take risks in a blame-free environment.
We promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentoring needed to learn and grow.
Site Reliability Engineer (SRE) in the IO team.
By joining the IO team, you will work on scalable cloud data services to transport and manage data. To fulfill our mission and to be part of the team, you will work on load balancing, streaming and storage systems.
As a Site Reliability Engineer, you're in charge of the operational conditions of the platform, which means designing and implementing the infrastructure to meet security, availability, performance, SLA and capacity expectations.
On a daily basis, this is what the job looks like:
- Be part of a product team that builds its product carefully and with passion
- Ensure the observability of the platform and build actionable monitoring
- Troubleshoot complex issues and coordinate cross-team efforts to mitigate them
- Suggest and help implement best practices
- Ensure the continuity of the service with on-call responsibilities
- Work with the teams to continuously improve performance and quality
- Follow up on issues and incidents to prevent future ones
- Be curious and kind, and share what you learn with the team
Your skills?
What you absolutely need:
- Team player
- Desire to dive in and understand/fix complex problems in large environments
- Open-minded
- Customer-centric
- Quick learner
- Not afraid of change
Some technologies we are familiar with:
- Unix internals and related topics
- CI/CD/CA tools, platforms and associated processes (we use CDS)
- Data pipelines/Messaging/Pub-Sub systems (Kafka, Pulsar)
- Monitoring tools, platforms, and associated processes
- Observability stacks: Warp10, Prometheus
Your background?
This job is for you if you have already worked with:
- Networking
- At least one of Go, Rust or Java
- Scripting skills in Python or Perl are appreciated
- Some knowledge of distributed systems operations is a plus
- HAProxy experience is a plus for the Load Balancing stack
- Knowledge of major distributed systems is appreciated
- Experience with at least one distributed system like HDFS, HBase, FoundationDB, TiKV, or similar
- Moreover, if you have already been part of an SRE team or worked in a DevOps role, we need you!
Even if you haven't, your motivation and aptitude are your best arguments.
Our Public Cloud team is expert in infrastructure and scalability matters. It works on a young and innovative product,…