Open Source SRE and Automation (M/F)
Imagine a platform hosting 1.2 million databases (and growing).
You have no idea what's going on inside them, but you're responsible for keeping them up, 24 hours a day, 7 days a week. You have to share resources to keep prices very low, yet you have to guarantee that one customer's behaviour won't impact their neighbours. You have to be sure that customers' orders will be fulfilled in seconds, however many there are. And what about upgrading all the underlying software of such a huge playground?
Due to the law of large numbers, you'll find there the most improbable things you'll ever see.
Our team
The team consists of five people... for now. We're spread across three locations: Roubaix, Lyon and Montréal. We're constantly on video calls, so we're as close as if we were sitting next to each other.
We believe that service availability, operational management, performance, features, time-to-market, innovation... all of those varied and diverse things have to be handled by the same people. This is why we merged all of those subjects into one team (#SRE), which designs the products and the platforms and keeps them running in production.
The platform is growing, and we're looking for Site Reliability Engineers to support this growth.
Your mission, if you accept it, is to:
- Automate and industrialize the management of the database platforms. Keep them optimal, consistent and sustainable. Be proactive about errors. Innovate and propose improvements to the infrastructure so that it always matches the needs.
- Set up alerts on the platforms' health, and handle them responsively and smartly, during the day or during your on-call period. Identify and fix the root causes so that an error won't happen again.
- Gather feedback from customers and communicate with them through the forum, the mailing lists or the support team. If you're comfortable with that, or if you want to give it a try, you can also take part in meetups or write articles on the OVH blog.
- Autonomous, while working hand in hand with the team and with other teams
- Able to focus on complex technical tasks while handling context switches and interruptions from others, however plentiful
- Humble, and communicative about the team's work
- Able to work in short iterations while keeping a long-term vision
- The perfect platform is 100% available, needs very few human interventions, and always matches changing needs. We need someone who knows how to aim for that, using their advanced production experience and industrialization skills.
- Knowledge and experience in Linux, automation, containerization...
- We use Python, Perl... Some parts of the source code are recent, some parts are... legacy. It's the same for some of the other teams' services we rely on. We're looking for someone who is not afraid of this, who is adaptable, and who will choose between using legacy code unaltered, optimizing it, or rebuilding it from scratch, depending on the cost/benefit balance.
Sounds good? Contact us right away: the position is open now.
Tech - R&D
Our Tech-R&D department designs and develops the services that will shape the future of OVH. It also provides the IT systems…