Discover. A brighter future.
With us, you’ll do meaningful work from Day 1. Our collaborative culture is built on three core behaviors: We Play to Win, We Get Better Every Day & We Succeed Together. And we mean it — we want you to grow and make a difference at one of the world's leading digital banking and payments companies. We value what makes you unique so that you have an opportunity to shine.
Come build your future, while being the reason millions of people find a brighter financial future with Discover.
Job Description
Site Reliability Engineering (SRE) applies software engineering techniques and discipline to production operations to attack reliability and performance issues and fix them for good. SREs focus on availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their services.
As a Senior Manager, you will lead a team of SREs who are responsible for some of Discover’s most critical applications: our Card and Bank websites and mobile application. Your SREs will work closely with application engineers to ensure the sites and mobile app are designed with resiliency in mind. You’ll improve the availability of our sites and app by focusing on SLOs, monitoring, and alerting. Your team will spend time with application engineers analyzing the performance and efficiency of services, and will spend a significant amount of its time automating operational functions.
Our SREs are responsible for emergency response: ensuring outages are quickly mitigated, post-incident reviews are held, and action is taken to avoid future outages. The SRE practice is fairly new at Discover, and as a Senior Manager you will be able to help mature the SRE culture and practice.
- Lead the newly established SRE team supporting Discover.com and Discover’s mobile application. Discover.com gets 3.5 million logins and Discover Mobile gets 2.5 million logins daily!
- Champion a culture of learning, continuous improvement, and blameless retrospection within your team.
- Mentor and grow your junior engineers, and empower and unblock your senior ones.
- Partner with our Talent Acquisition team as we recruit, interview and hire the best engineering talent to join Discover’s growing SRE practice.
- Partner with Product teams and Solution Architects to help design solutions that achieve the required reliability outcomes for their services.
- Be a leader in the SRE community of practice and evolve the SRE practice for the entire organization.
- Manage a technology team. Lead the team and peers to ensure capacity and performance management is compliant.
- Oversee application release management.
- Manage change control for applications.
- Manage disaster recovery and compliance plans.
- Bachelor’s Degree in Information Technology or a related field
- 6+ years of experience in payments and/or an equivalent technology industry, or related experience
- In lieu of education, 8+ years of experience in payments and/or an equivalent technology industry, or related experience
- 5+ years of SRE experience in a highly customer-focused environment.
- 3+ years’ experience successfully managing a team of 5-8 engineers on large-scale projects that included technical deep-dives and production troubleshooting in the areas of: distributed systems, programming, configuration management, networking, storage, and operating systems
- Strong leadership skills and the ability to motivate teams.
- Ability to drive change, and motivate engineers to develop simple solutions for complex operational challenges.
- Experience collaborating and partnering effectively with several other teams.
- Experience leading discussions with senior leadership, and the ability to tailor the level of technical detail to suit your audience.
- Proficiency in designing resilient app patterns
- Expertise in 24x7 site monitoring and ability to own uptime and performance SLAs for large-scale distributed systems
- Expertise and hands-on experience operating highly available, scalable, and fault-tolerant systems using container platforms
- Familiarity with OS tuning, optimization, and system requirements for vertical scaling
- Proficiency in one or more general purpose programming languages: Python, Go, shell scripting (Unix/Linux), Java
- Expertise in automation tools such as Chef, Puppet, or Ansible.
- Experience in developing monitoring tools and log analysis tools to manage operations
#BI-Remote #Remote #LI-LJ1
What are you waiting for? Apply today!
The same way we treat our employees is how we treat all applicants – with respect. Discover Financial Services is an equal opportunity employer (EEO is the law). We thrive on diversity & inclusion. You will be treated fairly throughout our recruiting process and without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status in consideration for a career at Discover.