Lead Resilience Engineer at Enova
Are you fully engaged at work?
Think back to your last job. Did your work ever feel...less than challenging? Enova team members don’t have that problem. We’re regularly solving complex problems that directly impact the business and help grow our skills at the same time. When we talk to our team members, they tell us they’ve broken through personal and professional barriers thanks to the mission-driven work they’re tackling and the support from their manager and team. Where else can you level up your skills by working on fresh, interesting challenges every day? We want to celebrate your wins with you. This is the core of Enova.
About the role:
We run a lot of technology here at Enova and, as much as we strive for 100% uptime, that isn't reality. In this role, you will work to improve the resiliency of our technologies through tests, experiments, and post-incident analysis. You will work on optimizing how we deal with unexpected complex failures, including owning our incident response process, facilitating post-incident blameless retrospectives, and learning from failures to minimize their impact. By developing a strong understanding of our systems and applications and how they relate holistically, you'll be able to appropriately respond during outages and make recommendations alongside Subject Matter Experts. You will collect and analyze data around failures, identify risks and vulnerabilities, and drive their resolution. You will regularly collaborate with IT, Software Engineering, and product teams to drive a culture of quality where resilience is woven into our technology stack. You will show what different failure modes look like by running experiments (Chaos Engineering, Disaster Recovery) and share learnings across the organization.
What you'll be doing:
- Own Enova’s Production Incident Process end-to-end.
- Continually test and improve the resiliency of our services; drive Disaster Recovery and Chaos Engineering experiments, balancing tech and business needs.
- Manage initiatives that focus on process improvements, risk mitigation, and improving customer experience.
- Collect data, perform trend analysis, and identify patterns of risks and vulnerabilities.
- Work with partner teams to address vulnerabilities, including making architectural recommendations.
- Socialize lessons learned among technology and business teams.
- Following training, join our MI PIC (Incident Commander) rotation, lead incidents to completion, and drive post-incident analysis (including interviews, contributing factor analysis, incident response analysis, and remediation plans).
We get excited about you if you have:
- 3+ years of professional work experience in a technology role, such as Software Engineering, Systems, Ops, SRE, or Product Management.
- Experience with object-oriented programming languages (such as Ruby, Go, Perl, Python, Java, or PHP).
- Experience with infrastructure as code (Terraform, Chef, etc.).
- Experience with databases and big data stores, especially Postgres or Kafka.
- Ability to handle, analyze, and present data.
- Superior analytical, problem solving, and critical thinking skills.
- Interest in complex distributed systems: how they work, how they can work better, and how to know if they are working correctly.
- Comfortable with ambiguity; able to translate ambiguous problems into strong solutions.
- Demonstrated maturity and good judgment, along with negotiation, leadership, and project management skills.
- Excellent written and verbal communication skills, including the ability to communicate with different levels of an organization (e.g., on a technical vs. non-technical level).
About Resilience Engineering:
The Resilience Engineering team strives to drive a culture of quality where resilience is built into all of our systems. We do this by focusing on our incident response process, incident analysis and learnings, and resilience practices. We work closely with other Tech, Operations, and Business teams to resolve complex failures and to continuously learn.
Enova is a FinTech company dedicated to using technology to help hard-working people get access to fast, trustworthy credit. To date, we’ve helped more than 5 million customers around the world. We were born and raised in Chicago, and our philosophy is simple: “Life’s short. Work someplace awesome.” Want to learn more? Just ask any of our almost 1,500 employees.