Scaleway | Published 21 days ago
Site Reliability Engineer - SRE

Scaleway - Site Reliability Engineer 🛠️


About Scaleway

Scaleway, a subsidiary of Iliad, is a leading European cloud provider. Founded in 1999, our mission is to foster a more responsible digital industry by helping developers and businesses create, deploy, and adapt applications to any infrastructure. 🌍

With offices in Paris and Lille, we continuously refine our cloud ecosystem and are the first users of our own products. 🏢

Our 25,000+ customers choose us for our multi-AZ redundancy, seamless user experience, carbon-neutral datacenters, and native multi-cloud management tools. Our products include fully managed solutions for bare metal, containerization, and serverless architectures, offering a responsible choice in cloud computing. 🌐

Join our dynamic team of nearly 600 diverse professionals in a stimulating and international environment that combines technical excellence, creativity, and collaboration. 🚀


About the job

We're seeking a Site Reliability Engineer to join our team! 👨‍💻

Reporting to a Lead SRE, you'll ensure the reliable delivery of our products to users worldwide. We expect a strong background in development and system administration. As our systems constantly evolve, so must the tools we use to observe and maintain their resilience. 💪


Minimum qualifications

  • Previous experience as a developer in Go, Python, or Rust
  • Experience in system programming with common scripting languages (bash, Python)
  • Demonstrated ability to troubleshoot production system failures
  • A great attitude and desire to work with a team
  • Passion for incremental tooling improvements and enthusiasm for automation
  • Experience with Linux systems (Ubuntu/Debian)
  • Experience with cloud environment architecture (bare metal, virtual machines, containers, orchestrators)
  • Good understanding of computer networks: TCP/IP, DNS, load balancing, IPv6, BGP, and network virtualization
  • Good understanding of written and spoken English, including the ability to write technical documentation in English and to speak it when needed

Preferred qualifications

  • Experience with infrastructure as code and continuous deployment
  • Experience dealing with physical hardware automation
  • Experience with monitoring & logging systems
  • Experience administering relational databases
  • Knowledge of a cloud platform and its related use cases
  • Initiative to propose and defend new solutions
  • Team player, willing to share knowledge and opinions and to participate in team rituals
  • Good communication and coaching skills

Responsibilities

  • Create or optimize tools & documentation to identify, diagnose, and remediate production incidents, automating as much as possible
  • Troubleshoot high-impact issues in collaboration with multiple engineering teams
  • Take on-call responsibilities, mitigate production issues, and provide real-time solutions to customers
  • Ensure high-quality service for customers using observability and monitoring technologies
  • Manage the lifecycle of products in production
  • Implement best practices in stability, resiliency, scalability, security, and performance across our systems

Technical Stack

  • Python, Go, Rust
  • RabbitMQ
  • PostgreSQL
  • HAProxy, Nginx, REST APIs / Flask
  • S3 API
  • Sentry, Prometheus, Grafana, Elasticsearch, Fluentd, Kibana
  • Ansible, AWX, Foreman, Salt
  • GitLab, Nexus
  • Ubuntu, Debian, CentOS
  • Jira, Confluence, Slack, GSuite

Location

This position is based in our offices in Paris or Lille, France. 🇫🇷


Don't meet all the requirements? Apply anyway!

We encourage you to apply even if you don't meet all the requirements. Don't limit yourself to a job description - you never know what might happen! 🌟


Skills

  • Back-end: Python, Go, Rust, Flask
  • Tooling: Bash, GitLab, RabbitMQ, Sentry
  • Project management: Confluence, Jira
  • Management: Slack
  • Cloud: Cloud Computing, Prometheus, Serverless
  • Data: Elasticsearch, Grafana, PostgreSQL
  • Ops: Ansible, Nginx