Senior Site Reliability Engineer

> 5 years of experience

Permanent contract (CDI)

Site Reliability Engineer (SRE)

Management

Ansible

ArgoCD

💼 We're making the world of digital assets accessible and secure for everyone. Join the mission!

Founded in 2014, Ledger is the global platform for digital assets and Web3. Over 20% of the world's crypto assets are secured through our Ledger Nanos. Headquartered in Paris and Vierzon, with offices in the UK, US, Switzerland, and Singapore, Ledger has a team of more than 900 professionals developing a variety of products and services that enable individuals and companies to securely buy, store, swap, grow, and manage crypto assets. These include the Ledger hardware wallet line, with more than 6 million units already sold in 200 countries.

At Ledger, we embody the values that make us unique: Pragmatism, Audacity, Commitment, Trust, and Transparency. Hear from our employees how they shape the work we do here.

Primary responsibilities:

As part of our SRE team, you'll drive technology transformation by launching new platforms, building tools, automating solutions to complex issues, and integrating the latest technology.
Site Reliability Engineers leverage their experience as software and systems engineers to ensure that applications integrated by SRE are available, have full-stack observability, and improve continuously through code and automation.
We seek a candidate experienced in reliability engineering who enjoys solving complex problems through innovation and driving change at scale.
You'll bring a strong mix of software engineering, operations, and systems engineering experience to the role, with experience in integrating complex systems.

In this role, you will:

Participate in building a DevOps/SRE culture and enable the transition to modern infrastructure management and deployment practices.
Participate in building the SRE team roadmap (vision and delivery accountability). Anticipate stakeholder needs and the emergence of game-changing technologies, and challenge scope and deadlines.
Perform integration of platform software components.
Participate in designing and delivering solutions to improve the availability, scalability, latency, and efficiency of systems.
Influence and create standards & best practices in support of service-level objectives.
Automate key SRE metrics, including SLOs/SLAs and error budgets.
Provide expert support to our level-2/application support team to troubleshoot priority incidents and conduct post-mortems.
Apply analytics on past incidents and usage patterns to predict issues and take proactive actions.
Ensure control of technical debt and promote quality practices.
Apply SRE and chaos engineering approaches across all strategic systems, in coordination with Service Design, to predict and prevent outages and improve solution availability.
Drive adoption of self-healing and resiliency patterns such as circuit breakers, bulkheads, etc.
Design and conduct performance tests, and identify bottlenecks and opportunities for optimization.

What we're looking for:

8+ years in cloud engineering at scale, in organizations operating SaaS solutions.
Proficiency in Unix/Linux environments, Git, Python, Terraform, Kubernetes, AWS cloud solutions and architectures, CI/CD tools, ArgoCD, Ansible, configuration management, etc.
Strong knowledge of observability practices, with experience implementing and managing Logging, Monitoring, and Alerting frameworks with solutions such as Datadog or Prometheus/Grafana/Loki.
Experience in cross-functional work, with a collaborative approach to building key relationships across the organization and defining project scope, goals, plans, and deliverables.
Customer-focused, with the ability to identify and understand the needs of both internal and external customers.
Creative problem-solving and analysis skills with an ability to identify, develop, and implement solutions to meet the needs of the business.
Excellent presentation and written communication skills.
Ability to deal with ambiguity, high levels of pressure, and rapidly changing environments.
Engineering degree.

What's in it for you?

Equity: Employees are the foundation of our success, and we award stock options so you can share in that success as we grow.
Flexibility: A hybrid work policy.
Social: Annual company outing for Ledgerdary Days, plus frequent social events, snacks, and drinks.
Medical: Comprehensive health insurance policy offering extensive medical, dental, and vision care coverage.
Well-being: Personal development, coaching & fitness with our dedicated partners.
Vacation: Five weeks of paid leave per year, in addition to national holidays and rest & relaxation (RTT) days.
High tech: Access to high-performance office equipment and gadgets, including Apple products.
Transport: Ledger reimburses part of your preferred means of transportation.
Discounts: Employee discount on all our products.

We are an equal opportunity employer for all without any distinction of gender, ethnicity, religion, sexual orientation, social status, disability, or age.
