Site Reliability Engineer

Uptake

Sorry, this job was removed at 6:05 a.m. (CST) on Wednesday, June 13, 2018

What We Do:

Uptake is a Chicago-based predictive analytics SaaS platform provider that empowers major industry leaders to optimize performance, reduce asset failures and enhance safety. At Uptake, we combine our strengths (machine learning, analytics, data visualization and software development) with the expertise of our industrial partners. The result is enormous savings in development time and resources for Uptake’s partners and a proven industrial-grade software platform that delivers value to partners and their end customers.

What You'll Do:

As a Site Reliability Engineer, you will proactively monitor and improve end-to-end system performance and identify deficiencies and potential failures throughout our infrastructure. You will build deep, end-to-end knowledge of the complexity of our platform and continuously create improvements and automation to enhance its durability, performance and maintainability. You are central to the automation of everything at Uptake.

Responsibilities:

Support and perform maintenance across product and data environments/systems
Proactively monitor events, investigate issues, analyze solutions, and drive problems through to resolution, using a wide variety of Ops tools and monitoring platforms to build understanding and enable persistent monitoring of system availability, performance, and capacity
Develop and maintain scalable alerting, ticketing, and logging tools for debugging and monitoring
Maintain our monitoring systems and develop new metrics/monitoring dashboards as additional coverage events become necessary
Be on call for potential downtime problem solving and root cause diagnosis
Provide support with network management and maintain a high availability environment
Mentor engineers as a technology subject matter expert, stretching their knowledge and perspective

Qualifications:

Excellent understanding of Linux, Bash and shell scripting
Knowledge of and experience with network stack, protocols, network management and monitoring tools
Knowledge of AWS technologies - EC2, S3
Experience with group services, including configuration, synchronization, and naming protocols, preferably using Apache ZooKeeper
Experience with large-scale data processing, preferably using Apache Spark
Experience with a distributed log tool such as Kafka
Experience with automation tools: Puppet, Chef, Docker, Jenkins and/or Ansible
Knowledge of Mesos/Marathon and Docker for container orchestration
Experience in Big Data (NoSQL) & standard enterprise databases - including data modeling, testing and deployment support. Proficiency in Cassandra, HBase, or PostgreSQL is strongly preferred.
Familiarity with Java build tools: Maven, Gradle or Ivy (preferred)
Experience with JVM and Java stack: Tomcat, Jetty
Ability to work collaboratively in a fast-paced, entrepreneurial environment
Experience working with Agile methodologies
Excited by Big Data technologies and interested in integrating statistics and analytics to make our systems perform even better

Please provide:

Resume
Cover letter
