Cloud Site Reliability Engineer at Discover
Discover. A brighter future.
With Discover, you’ll have the chance to make a difference at one of the world’s leading digital banking and payments companies. From Day 1, you’ll do meaningful work you’re passionate about, with the support and resources you need for success. We value what makes each employee unique and provide a collaborative, team-based culture that gives everyone an opportunity to shine. Be the reason millions of people find a brighter financial future, while building the future you want, here at Discover.
At Discover, be part of a culture where diversity, teamwork and collaboration reign. Join a company that is as focused on its employees as it is on its customers and is consistently recognized for both. We’re all about people, and our employees are why Discover is a great place to work. Be the reason we help millions of consumers build a brighter financial future, and build your own along the way with a rewarding career.
- Identify process and infrastructure gaps and implement process improvements to increase operational reliability.
- Increase operational efficiency and promote stability through automation.
- Develop dashboards for alerting and monitoring to ensure the reliability and availability of application systems and services.
- Write quality code using SOLID principles in Test Driven Development with languages such as Java, Ruby, Go, Python, and Bash.
- Operational Performance & Stability: Works with other members of their assigned Value Stream to ensure that the in-scope applications/platforms are meeting performance and stability requirements. This includes managing Major Incidents to Mitigation/Resolution.
- Problem Management: Performs Post-Incident Reviews of all Major Incidents and determines the Action Items required to avoid similar issues and minimize downtime in future Incidents.
- Monitors and Metrics: Works with Application Development to ensure that assigned applications/platforms have the appropriate monitoring and metrics in place to appropriately measure performance and stability.
- Identify Functional and Non-Functional Improvements: Acts as the Operations representative in Value Stream planning and prioritization sessions to ensure that the Operational needs of assigned applications/platforms are addressed. Holds quarterly Operational Performance Reviews with Value Stream management.
- Release Planning & Coordination: Works with other members of their assigned Value Stream to ensure that the Production releases for their in-scope applications/platforms are properly planned and coordinated. This includes holding Change/Release implementation reviews to ensure thorough and appropriate implementation plans.
- Provides review and sign-off/approval of change tickets for the assigned Value Stream. Represents the Value Stream in Change Advisory Board Meetings. Participates in Program Increment Planning Sessions as a liaison for Operations and Infrastructure support. Provides information regarding upcoming critical changes to the Value Stream.
- Operational Readiness: Ensures that applications/platforms in the Value Stream are operationally ready for Production. This includes an annual review of all SOPs/Knowledge Articles and a monitoring review for any new Feature launch or other significant change that may impact monitoring.
- Develops or updates SOPs/Knowledge Articles, following review, for any new Feature launch or other significant change that may impact support documentation.
- Trains Command Center and Application first-level Support on new SOPs, Knowledge Articles, and any other support-related needs.
- Performs monthly Capacity Analysis of applications/platforms within the Value Stream. Creates and maintains operationally focused ELK dashboards for the Value Stream.
At a minimum, here’s what we need from you:
- Bachelor's Degree in Business, Computer Information Systems, Computer Science, MIS, Engineering, Science, or related field
- 2+ years of experience in Information Technology, or related field
- In lieu of a degree, 4+ years of experience in Information Technology, or related field
If we had our say, we’d also look for:
- 4+ years of experience in Technology, or related field
- Extensive background working in an Infrastructure as Code (IaC) environment
- Skill with tools and services such as AWS, OpenShift Container Platform, Kubernetes, PCF, Kafka, Jenkins, GitHub, Maven, Chef, Ansible, IntelliJ, and Eclipse, as well as CLS for logging and Grafana/AppDynamics for alerting, monitoring, and troubleshooting
What are you waiting for? Apply today!
The same way we treat our employees is how we treat all applicants – with respect. Discover Financial Services is an equal opportunity employer (EEO is the law). We thrive on diversity & inclusion. You will be treated fairly throughout our recruiting process and without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status in consideration for a career at Discover.