Discover
We power a network that helps people achieve a brighter financial future.
Riverwoods, IL
Remote
Hybrid

Lead Service Reliability Engineer

Sorry, this job was removed at 11:07 a.m. (CST) on Friday, July 9, 2021

Discover. A brighter future.

With us, you’ll do meaningful work from Day 1. Our collaborative culture is built on three core behaviors: We Play to Win, We Get Better Every Day & We Succeed Together. And we mean it — we want you to grow and make a difference at one of the world's leading digital banking and payments companies. We value what makes you unique so that you have an opportunity to shine.

Come build your future, while being the reason millions of people find a brighter financial future with Discover.

Job Description  

Service Reliability Engineers (SREs) are a hybrid of systems and software engineers who are responsible for scaling, automation, and production issue support for applications. SREs have an intense passion for finding and improving efficiencies with infrastructure, development and deployment automation. As an SRE, you will lead the efforts of application deployment, reliability, scalability, availability and performance alongside the engineering and infrastructure teams. SREs will work closely with our Software Development & Engineering teams to build mature, production-ready services and applications. As part of the SRE team, you will help define our standards for monitoring, alerting, scalability, and production-readiness. You will monitor and report on the uptime of our systems and services, the performance of our applications, and the capacity of our platform.

The SRE is responsible for provisioning, benchmarking, tuning, and improving the end-to-end customer experience for our Payment Services platforms. In our industry, where millions of dollars move every day and milliseconds count in every transaction, you are always looking for ways to ensure our customers get the best response time. You will also be deeply involved in system roadmap planning and release management activities. Overall, you will become a rock star subject matter expert on the operation of the world-class core systems powering our great Fortune 300 company (which really operates like a startup). You will promote a risk-aware culture and ensure efficient and effective risk and compliance management practices by adhering to required standards and processes.

How You’ll Do It:

Operational stability and performance 

  • Work with other members of your assigned value stream to ensure that in-scope applications/platforms meet performance and stability requirements. This includes managing major incidents to mitigation/resolution.
  • Support and maintain software installations and hardware systems, including lifecycle management, change management, request management and incident management.
  • Design and test vended software and non-vended solutions.
  • Consult with the customer base to gather requirements for the solution set.
  • Control application code deployment servers and code deployment methods.
  • Design and architect operational solutions for managing applications and infrastructure, with the specific goal of increasing the automation, repeatability, and consistency of operational tasks.
  • Self-manage the effort split between operational work and engineering work.

Problem management:

  • Perform post-incident reviews of all major incidents and determine action items required to avoid similar issues/minimize downtime for future incidents.
  • Lead and participate in performance tests; identify bottlenecks, opportunities for optimization, and capacity demands.
  • Analyze and participate in periodic on-call duties to prevent, solve and automate the response to problems in mission-critical services and automated deployments.
  • Partner with security engineers to develop plans and automation that respond aggressively and safely to new risks and vulnerabilities.
  • Work closely with plan, build, run and infrastructure teams to support key business applications.
  • Work closely with senior staff on process improvements and automation opportunities.

Monitors and metrics: 

  • Work with Application Development to ensure that assigned applications/platforms have appropriate monitoring and metrics in place to appropriately measure performance and stability. 
  • Monitor and report on SLA/SLO for a given application's services. Work with business and product owners to establish key performance indicators.
  • Create and maintain monitoring technologies and processes that improve visibility into our applications' performance and business metrics and keep operational workload reasonable.
  • Define and drive adoption of best-in-class monitoring frameworks to accomplish end-to-end application or service monitoring and noiseless alerting with proper telemetry.

Identify functional and non-functional improvements:

  • Act as the Operations representative in value stream planning and prioritization sessions to ensure that the operational needs of assigned applications/platforms are addressed. Hold quarterly operational performance reviews with value stream management.

Release planning and coordination: 

  • Work with other members of your assigned value stream to ensure that the production releases for in-scope applications/platforms are properly planned and coordinated. This includes holding change/release implementation reviews to ensure thorough and appropriate implementation plans.
  • Work with the Release Manager and development teams to deploy software releases.

Review and sign-off/approval of change tickets for the assigned value stream: 

  • Represent the value stream at Change Advisory Board Meetings. 
  • Participate in Program Increment Planning Sessions as a liaison for Operations and Infrastructure support. 
  • Provide information regarding upcoming critical changes to the value stream. 
  • Control application log collection and analysis.
  • Automate processes and systems configuration/deployment.

Operational readiness:

  • Ensure that applications/platforms in the value stream are operationally ready for production. This includes an annual review of all SOPs/knowledge articles.
  • Review monitoring for any new feature launch or other significant change that may impact monitoring.
  • Review SOPs/knowledge articles for any new feature launch or other significant change that may impact support documentation.
  • Train Command Center and Application 1st-level Support on new SOPs, knowledge articles, and any other support-related needs.
  • Perform monthly capacity analysis of applications/platforms within the value stream. Create and maintain operationally focused ELK dashboards for the value stream. 

Qualifications You’ll Need

The Basics

  • Bachelor's degree in business, computer information systems, computer science, MIS, engineering, science, or related field
  • 2+ years of experience in information technology, or related field
  • In lieu of a degree, 4+ years of experience in Information Technology, or related field

Bonus Points If You Have

  • At least 5 years of experience in software engineering
  • 2 years of coding experience using a strongly typed language such as Java or Golang
  • 2 years of experience in SRE, DevOps, or similar role
  • 2 years of experience with scripting languages like Python / Bash
  • Experience with DevOps skills and methodologies, including creating and managing continuous build, integration, test, and deployment systems
  • Proficient in monitoring, alerting, analyzing and troubleshooting large scale distributed systems
  • Experience with clustering technologies - high availability, resiliency and horizontal scaling.
  • Good understanding of defining and executing High Availability, Disaster Recovery, Sustained Resiliency, Chaos Engineering tests
  • Familiar with design principles of monitoring and alerting systems
  • Familiar with OS tuning, optimization and system requirements for vertical scaling
  • Ability to enhance and maintain complex software components and distributed systems.
  • Understanding of networking concepts and experience with the HTTP protocol
  • Deep knowledge of distributed pub-sub message systems
  • Proficiency in one or more general purpose programming languages: Python, Go, shell scripting (Unix/Linux), Java
  • Experience with automation tools such as Chef, Puppet, or Ansible, and with developing monitoring and log analysis tools to manage operations
  • Continued curiosity regarding new technologies and evolving best practices 

What are you waiting for? Apply today!

The same way we treat our employees is how we treat all applicants – with respect. Discover Financial Services is an equal opportunity employer (EEO is the law). We thrive on diversity & inclusion. You will be treated fairly throughout our recruiting process and without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status in consideration for a career at Discover.

What are Discover Perks + Benefits

Discover Benefits Overview

Start enjoying great benefits Day 1 — We support you with the same dedication we bring to all of our customers. Our comprehensive benefits package features first-class insurance, financial planning support and excellent perks designed to help you reach your goals and live a rich, healthy life.

Check out more of our amazing employee benefits at mydiscoverbenefits.com

Culture
Volunteer in local community
Discover’s business is built on helping people, and we invest in the community (Blessing Backpacks, Boys & Girls Clubs, Big Brothers/Sisters) to demonstrate our commitment to a brighter future.
Partners with nonprofits
Open door policy
OKR operational model
Team based strategic planning
Open office floor plan
Flexible work schedule
Remote work program
Diversity
Dedicated diversity and inclusion staff
Mandated unconscious bias training
Diversity manifesto
Diversity employee resource groups
Hiring practices that promote diversity
Health Insurance & Wellness Benefits
Flexible Spending Account (FSA)
You can open a separate Health Care FSA (HCFSA) and contribute up to $2,650 tax-free from your paycheck to reimburse yourself for eligible out-of-pocket expenses.
Disability insurance
Employees receive Short-Term Disability Insurance at no cost.
Dental insurance
Discover offers two dental plan options — Standard and Premier — both are administered by MetLife.
Vision insurance
Discover offers two vision plan options — Standard and Premier through VSP.
Health insurance
Discover offers a variety of medical plans for you and eligible family members, so that you can choose the benefit plan that suits your needs.
Life insurance
As a Discover employee, you receive Basic Life Insurance of one times your HWEE (up to $500,000) at no cost to you.
Pet insurance
Purchase medical coverage at a discounted rate for your beloved family pet. The more pets you insure, the greater the discount.
Wellness programs
Help balance your work and personal life with a wide variety of free and discounted resource and referral services including family and relationship counseling and financial guidance.
Mental health benefits
Financial & Retirement
401(K)
You may elect to contribute 1% to 30% of your eligible base salary, commissions and bonus on a pre-tax basis, up to IRS limits every year.
401(K) matching
Discover matches up to 6% of the pre-tax contributions you make to the 401(k) Plan.
Employee stock purchase plan
The ESPP provides eligible employees with an opportunity to purchase shares of Discover common stock through payroll deductions at a 5% discount.
Performance bonus
Charitable contribution matching
Child Care & Parental Leave Benefits
Childcare benefits
Generous parental leave
Family medical leave
Adoption Assistance
Discover helps eligible employees and their families with the costs of adoption by reimbursing certain expenses.
Company sponsored family events
Vacation & Time Off Benefits
Generous PTO
Discover provides 4 to 5 weeks of paid time off per year.
Paid volunteer time
Paid holidays
Discover provides 7 paid holidays.
Paid sick days
Office Perks
Commuter benefits
When you enroll in the Commuter Benefits Program at WageWorks, you’ll save on taxes on mass-transit passes, parking and other eligible expenses.
Company-sponsored outings
Onsite office parking
Recreational clubs
Relocation assistance
Fitness stipend
Onsite gym
Discover has fitness centers and Weight Watchers® programs at all five major locations.
Professional Development Benefits
Job training & conferences
Tuition reimbursement
Discover provides tuition reimbursement and a full-ride bachelor's degree program for select online degree programs.
Lunch and learns
Promote from within
Mentorship program
Continuing education stipend
Continuing education available during work hours
Online course subscriptions available
Customized development tracks
