Scaleway - Site Reliability Engineer 🛠️
About Scaleway
Scaleway, a subsidiary of Iliad, is a leading European cloud provider. Founded in 1999, our mission is to foster a more responsible digital industry by helping developers and businesses create, deploy, and adapt applications to any infrastructure. 🌍
With offices in Paris and Lille, we continuously refine our cloud ecosystem as the first users of our own products. 🏢
Our 25,000+ customers choose us for our multi-AZ redundancy, seamless user experience, carbon-neutral datacenters, and native multi-cloud management tools. Our products include fully managed solutions for bare metal, containerization, and serverless architectures, offering a responsible choice in cloud computing. 🌐
Join our dynamic team of nearly 600 diverse professionals in a stimulating and international environment that combines technical excellence, creativity, and collaboration. 🚀
About the job
We're seeking a Site Reliability Engineer to join our team! 👨‍💻
Reporting to a Lead SRE, you'll ensure the reliable delivery of our products to users worldwide. We expect a strong background in development and system administration. As our systems constantly evolve, so must the tools we use to observe and maintain their resilience. 💪
Minimum qualifications
- Previous experience as a developer in Go, Python, or Rust
- Experience in system programming with common scripting languages (Bash, Python)
- Demonstrated ability to troubleshoot production system failures
- A great attitude and desire to work with a team
- Passion for incremental tooling improvements and enthusiasm for automation
- Experience with Linux systems (Ubuntu/Debian)
- Experience with cloud environment architectures (bare metal, virtual machines, containers, orchestrators)
- Good understanding of computer networks: TCP/IP, DNS, load-balancing, IPv6, BGP, and network virtualization
- Working proficiency in written and spoken English: able to write technical documentation in English and to speak English when needed
Preferred qualifications
- Experience with infrastructure as code and continuous deployment
- Experience dealing with physical hardware automation
- Experience with monitoring & logging systems
- Experience administering relational databases
- Knowledge of at least one cloud platform and its use cases
- Initiative to propose and defend new solutions
- Team player, willing to share knowledge, opinions, and participate in team rituals
- Good communication and coaching skills
Responsibilities
- Create or optimize tools & documentation to identify, diagnose, and remediate production incidents, automating as much as possible
- Troubleshoot high-impact issues in collaboration with multiple engineering teams
- Take part in on-call duty, mitigate production issues, and provide real-time solutions to customers
- Ensure high-quality service for customers using observability and monitoring technologies
- Manage the lifecycle of products in production
- Implement best practices in stability, resiliency, scalability, security, and performance across our systems
Technical Stack
- Python, Go, Rust
- RabbitMQ
- PostgreSQL
- HAProxy, Nginx, REST APIs / Flask
- S3 API
- Sentry, Prometheus, Grafana, Elasticsearch, Fluentd, Kibana
- Ansible, AWX, Foreman, Salt
- GitLab, Nexus
- Ubuntu, Debian, CentOS
- Jira, Confluence, Slack, G Suite
Location
This position is based in our offices in Paris or Lille, France. 🇫🇷
Don't meet all the requirements? Apply anyway!
We encourage you to apply even if you don't meet all the requirements. Don't limit yourself to a job description; you never know what might happen! 🌟