Cloud Logging and Monitoring Engineer
What We'll Bring
Our quest to modernize the way we do technology is not slowing down anytime soon. We continue to make big strides in our agile atmosphere to bring the latest products and solutions to the cloud infrastructure. Our cloud teams have the potential to shape the future by solving thought-provoking problems and using transformational technology to further enhance our capabilities in this data-driven world. Helping our clients build trust begins with a strong team of innovators ready to pave the way with strategy and optimization in mind. You’ll have the chance to thrive in a culture of ownership and delivery as these efforts continue to expand. As technology evolves, our ecosystem of innovation and modernization creates an unmatched environment, one that can put you at the center of groundbreaking discoveries.
What You'll Bring
4+ years of experience with one or more public or private cloud platforms (e.g., AWS, Azure, GCP)
4+ years of experience with Splunk; certification is a plus
Experience with monitoring and logging for high-volume sites with many users
Understanding of cloud providers, platforms, and configurations, and familiarity with workload usage and consumption patterns in the cloud
Ability to support agile decision making across multiple cloud services, with short-cycle analytics and problem solving
Ability to liaise with multiple stakeholders (across users, IT & ops support groups) and proactively identify opportunities for service improvement
Solid knowledge of AWS logging mechanisms, from CloudWatch and CloudTrail to serverless and EKS/ECS workloads
Advanced Splunk administration skills, including configuring AWS inputs in Splunk using the AWS TA
Solid understanding of AWS account management
We'd Love to See:
Bachelor’s degree in a computer-related field (such as Computer Science, Engineering, or MIS), and at least 7 years of IT experience.
Impact You'll Make
Engage in the end-to-end lifecycle of solutions, from inception and design through deployment, operation, and refinement.
Support pre-go-live services such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
Maintain solutions once they are live by measuring and monitoring availability and overall system health, and by triaging and managing incidents.
React to production deficiencies by continuously implementing automation, self-healing, and monitoring for production solutions.
Evolve solutions by pushing for changes that improve reliability, stability, and velocity.
Practice sustainable incident response and blameless postmortems.
Provide a systematic problem-solving approach and a sense of ownership and drive for the team.
Multitask while ensuring priorities are met in a fast-paced environment.
Translate application requirements and support data solutions in AWS