Scaleway 🌐
SRE - Virtualization 💻
About Scaleway
- Founded in 1999, Scaleway is the cloud subsidiary of Iliad Group, a leading European telecommunications company.
- Mission: Foster a more responsible digital industry by helping developers and businesses create, deploy, and adapt applications to any infrastructure.
- 25,000+ customers choose Scaleway for its multi-AZ redundancy, seamless user experience, carbon-neutral datacenters, and native multi-cloud management tools.
- Products include fully managed solutions for bare metal, containerization, and serverless architectures, offering a responsible choice in cloud computing.
- Join a dynamic team of nearly 600 diverse professionals in a stimulating international environment that combines technical excellence, creativity, and collaboration.
About the Job
- Ensure reliable delivery of virtual machines and bare metal servers to users worldwide.
- Collaborate with Engineering Manager Emerick Mounoury.
- Requires a strong background in Python development and system administration, along with DevOps experience and familiarity with SRE practices.
- Our systems and monitoring tools evolve constantly, so adaptability is essential.
Minimum Qualifications
- Experience in system programming using Python, Bash, Go, or similar languages.
- Demonstrated ability to troubleshoot production system failures.
- Positive mindset and desire to work collaboratively.
- Passion for automation and incremental tooling improvements.
- Experience with Linux systems (Ubuntu Server) and virtualization (QEMU/KVM).
- Good understanding of computer networks (TCP/IP, DNS, load balancing, IPv6, firewall, BGP, network virtualization).
- Good command of English.
Preferred Qualifications
- Ability to meticulously identify and solve bugs in any codebase.
- Experience with infrastructure-as-code and continuous deployment.
- Experience with physical hardware automation.
- Experience with monitoring & logging systems.
- Experience managing relational databases.
- Knowledge of at least one cloud platform and related use cases.
- Experience as an OSS contributor and/or maintainer.
- Knowledge of HPC (High Performance Computing).
Responsibilities
- Create and optimize tools and documentation to identify, diagnose, and solve production incidents, automating as much as possible.
- Troubleshoot high-impact issues by collaborating with multiple Engineering teams (Storage, Network, Hardware).
- Take on-call responsibilities, mitigate production issues, and respond to customers in real time.
- Ensure high-quality service for customers using observability and monitoring technologies.
- Manage the life cycle of hypervisors in production and participate in the fleet-wide migration plan.
- Empower teammates to swiftly integrate and deploy software components across the virtualization system.
- Implement best practices for stability, resiliency, scalability, security, and performance across the virtualization system.
Our Technical Stack
- Python/Bash
- RabbitMQ + Celery
- PostgreSQL + SQLAlchemy
- HAProxy, Nginx, REST APIs / Flask
- S3 API
- Sentry, Prometheus, Grafana, Elasticsearch, Fluentd, Kibana
- Ansible, AWX, Foreman
- GitLab, Nexus
- Ubuntu, Debian, CentOS
- Jira, Confluence, Slack, G Suite
Location
Recruitment Process
- Screening call (30 mins) with the recruiter
- Manager Interview (45 mins)
- Technical Interviews (90 mins)
- HR Interview (45 mins)
- Offer sent within 48 hours
Don't meet all the criteria? Apply anyway! We're open to exceptional candidates.
🌐 Scaleway | Scaleway Blog | Scaleway on X