Principal Site Reliability Engineer (SRE)

Cribl | Remote

Sorry, this job was removed at 1:20 p.m. (CST) on Tuesday, December 21, 2021

Join our team at Cribl Inc. as a Principal Site Reliability Engineer and be part of our mission to unlock the value of all machine data. In this role you will straddle the entire socio-technical landscape of Cribl Cloud and help lead and mature its SRE practice. You will make and heavily influence planning and decisions around site reliability tooling and the architectural roadmap. Mentorship combined with a proactive, hands-on style of getting work done will be critical to success in this role. You enjoy, and are comfortable with, communicating clearly and collaborating with product engineering squads on topics related to production observability, system design, and incident handling.

Cribl provides users a new level of observability, intelligence, and control over their real-time data. Reporting to the Director of Engineering, you will contribute to our efforts to envision, create, and run the Cribl Cloud offering.

We are a 100% remote-first company. You will work on new development for a product we've built from scratch!

Responsibilities:

You will drive technology direction within the product domain and help set the bar for technical excellence and delivery across the engineering team(s).
You will work closely with other technical leaders and architects to understand and capture product dependencies and to align and prioritize project execution, and you are able to challenge decisions when needed.
You will provide thought leadership for the vision, strategy and architecture for the SRE and SaaS engineering teams.
You will work with engineers to ensure that the designed solution responds to non-functional requirements such as availability, performance, security, and maintainability.
You will improve the reliability of our systems by working with engineers to ensure that the software delivery pipeline is as efficient as possible.
Mentor our engineers to achieve more than they thought possible. You enjoy making other teams successful and are fulfilled through the success of others.
You will write and update documentation, including runbooks and playbooks.
You will automate work, including infrastructure needs, testing, failover solutions, failure mitigation, and much more.
You will debug complex problems across the entire stack and create solid solutions.

Minimum Requirements:

7+ years of experience with software engineering, software development, or system operations
Experience building and operating large-scale production systems
Knowledge of container technologies, JavaScript/TypeScript, and source control
Experience working with container deployment and orchestration technologies with knowledge of fundamentals including service discovery, deployments, monitoring, scheduling, and load balancing.
Understanding of Systems programming (network stack, file system, OS services) and networking (L2 vs. L3, network architecture, VLANs)
Experience identifying performance bottlenecks, identifying anomalous system behavior, and resolving the root causes of service issues.
Skills to work across teams and functions to influence the design, operations, and deployment of available software

Nice to have:

Experience with development and deployment in a hosted cloud environment (AWS).
Experience running containerized environments and an understanding of multi-tenancy and its security implications.
Experience with optimized and scalable software that operates on a large number of nodes.

What we offer:

Competitive Salary
Stock Options
Medical, dental, and vision insurance
Flexible spending account (FSA)
401(k) plan offered
Parental Leave
Professional Development and Career Growth
Generous Vacation and Holiday Policy, including 2 Floating Holidays to use for holidays you observe
Social Responsibility Employee Group that reflects our value-driven company culture

Diversity drives innovation, enables better decisions to support our customers, and inspires change for the better. We’re building a culture where differences are valued and welcomed. We work together to bring out the best in each other. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.
