Scaleway - Site Reliability Engineer
About Scaleway
Scaleway, a subsidiary of Iliad, is a leading European cloud provider. Founded in 1999, Scaleway aims to foster a more responsible digital industry by helping developers and businesses create, deploy, and adapt applications to any infrastructure.
With offices in Paris and Lille, we continuously refine our cloud ecosystem as the first users of our own products.
Our 25,000+ customers choose us for our multi-AZ redundancy, seamless user experience, carbon-neutral datacenters, and native multi-cloud management tools. Our products include fully managed solutions for bare metal, containerization, and serverless architectures, offering a responsible choice in cloud computing.
Join our dynamic team of nearly 600 diverse professionals in a stimulating and international environment that combines technical excellence, creativity, and collaboration.
About the job
We're seeking a Site Reliability Engineer to join our team!
Reporting to a Lead SRE, you'll ensure the reliable delivery of our products to users worldwide. We expect a strong background in development and system administration. As our systems constantly evolve, so must the tools we use to observe and maintain their resilience.
Minimum qualifications
- Previous experience as a developer in Go, Python, or Rust
- Experience in systems programming with common scripting languages (Bash, Python)
- Demonstrated ability to troubleshoot production system failures
- A great attitude and a desire to work as part of a team
- Passion for incremental tooling improvements and enthusiasm for automation
- Experience with Linux systems (Ubuntu/Debian)
- Experience with cloud environment architectures (bare metal, virtual machines, containers, orchestrators)
- Good understanding of computer networks: TCP/IP, DNS, load balancing, IPv6, BGP, and network virtualization
- Good command of written and spoken English: able to write technical documentation in English and to speak English when needed
Preferred qualifications
- Experience with infrastructure as code and continuous deployment
- Experience with physical hardware automation
- Experience with monitoring & logging systems
- Experience administering relational databases
- Knowledge of at least one cloud platform and its use cases
- Initiative to propose and defend new solutions
- Team player, willing to share knowledge and opinions and to participate in team rituals
- Good communication and coaching skills
Responsibilities
- Create or optimize tools & documentation to identify, diagnose, and remediate production incidents, automating as much as possible
- Troubleshoot high-impact issues in collaboration with multiple engineering teams
- Take on-call responsibilities, mitigate production issues, and provide real-time solutions to customers
- Ensure high-quality service for customers using observability and monitoring technologies
- Manage the lifecycle of products in production
- Implement best practices in stability, resiliency, scalability, security, and performance across our systems
Technical Stack
- Python, Go, Rust
- RabbitMQ
- PostgreSQL
- HAProxy, Nginx, REST APIs / Flask
- S3 API
- Sentry, Prometheus, Grafana, Elasticsearch, Fluentd, Kibana
- Ansible, AWX, Foreman, Salt
- GitLab, Nexus
- Ubuntu, Debian, CentOS
- Jira, Confluence, Slack, G Suite
Location
This position is based in our offices in Paris or Lille, France.
Don't meet all the requirements? Apply anyway!
We encourage you to apply even if you don't meet all the requirements. Don't limit yourself to a job description; you never know what might happen!