Site Reliability Engineer

Paylocity

Sorry, this job was removed at 2:39 p.m. (CST) on Tuesday, November 19, 2019

The Site Reliability Engineer will be a core part of the Site Reliability & Operations team within our Technology organization. This role will be responsible for defining the future state of our monitoring environment and the key integrations between our monitoring tools, event management, and IT Service Management tool stack. This key position will act as a go-to expert for application performance management and infrastructure monitoring across the product teams in the organization.

This position requires exceptional communication skills, a commitment to exceptional results, and a passion for continuous improvement.

Are you the teammate we are looking for?

Who you are:

A senior technologist with a strong background in enterprise-level monitoring solutions and their deployment in a large-scale environment (~5000 hosts)
Passionate about continuous improvement in performance & availability in an environment through efficient alerting and routing
Proven experience in architecting integrations between different layers of a monitoring tool stack and defining the interaction of data between source monitoring tools, event management, and ITSM
Ability to translate business knowledge embedded in our product teams into transactions and key events that will be tracked as part of our reference architecture

How we work:

Small, self-sufficient product-oriented teams with an entrepreneurial spirit organized into categories
Dedicated Tech Delivery and Enablement experts committed to cutting-edge infrastructure and developer tools
Casual, collaborative, agile environment which embraces and operates under our shared principles
Complete transparency with open, honest discussions about our progress
Close working relationship between executive stakeholders, product teams, and operations-focused teams

What we offer:

Lean enabling process that focuses on putting our product teams in the best position to succeed
A commitment to investing in our products, hiring the best talent, and giving them the chance to meaningfully contribute to a vast market opportunity
A subscription to an Online Training Forum for all technology colleagues

What you bring:

Minimum 5-7 years of experience in deploying or maintaining enterprise monitoring tools for Application Performance Management (e.g. AppDynamics, New Relic) and Infrastructure Monitoring (e.g. SolarWinds, SCOM), and in translating the resulting alerts into notifications and escalations in a mixed SaaS and on-premises environment
Ability to effectively communicate details of complex issues to stakeholders, business and technical users
Analytical skills, with the ability to identify themes within data and make data-driven decisions
Provide leadership around defining our key events / alerts in a reference architecture and deploying this architecture across an environment of diverse technology assets
Implement a standard way of rolling out monitoring agents to a diverse set of target end-point profiles using deployment automation tools (e.g. Octopus)
Maintain agent versioning to ensure stability of the monitoring environment and communicate out any potential gaps in coverage
Demonstrated high-level understanding of enterprise software and networking concepts including SaaS technologies, and SDLC
Create self-service models for consumption of monitoring toolsets where applicable
Knowledge and experience developing and using SOAP, RPC, and REST APIs
Working knowledge of scripting languages such as PowerShell for use in workflow orchestration
Perform high-level assessment of business, functional, and technical requirements to address monitoring gaps and implement new monitoring products

During the last three months, you would have:

Led broad programs in the deployment and configuration of monitoring agents to a variety of application and database hosts
Architected the integration of alerting out of the monitoring toolset into event management and ITIL
Defined and implemented escalation patterns for different severities of alerts into a paging tool for incident management
Identified and documented ongoing monitoring training requirements and created a communication plan
Created easily consumable data around application performance and availability to leadership and external stakeholders
Presented material around future state developments in proactive monitoring to internal stakeholders and leadership
