Lead Site Reliability Engineer (SRE) - Big Data Team - DevOps
GrubHub Holdings Inc. is the nation's leading online and mobile food-ordering company dedicated to connecting hungry diners with local takeout restaurants. The GrubHub Holdings Inc. portfolio of brands includes GrubHub, Seamless, MenuPages and Allmenus. The company's online and mobile ordering platforms allow diners to order directly from thousands of takeout restaurants across the country and London, and every order is supported by the company's 24/7 customer service. GrubHub Holdings Inc. has offices in Chicago, New York City and London.
About the job:
Grubhub site reliability engineers own and run their products and services from conception to continuous operation. SREs play a key role in our software engineering process, and they are embedded within teams to focus on the operational aspects of our services.
Some Challenges You’ll Tackle (Responsibilities):
- Create, maintain, own and operate your team’s services that support fundamental capabilities within Grubhub’s products.
- Tackle some of the most challenging problems you can face developing high-availability services in a distributed cloud environment that needs to scale exponentially.
- Help evaluate and choose emerging technologies: new service protocols and architectures, self-healing capabilities, globally distributed caching, performance and code quality tooling, etc. Determine the right tool for the right task.
- Manage / Lead a team of 2 to 3 direct reports
Tools we work with:
- Java for microservices
- Cassandra
- Docker (in production!)
- Mesos and Marathon for job scheduling
- Combination of AWS and our own hardware
- Python and Fabric for automation and our CD pipeline
- Jenkins for builds and task execution
- Linux (CentOS and Ubuntu)
- DataDog for metrics and alerting
- Puppet
Requirements:
- Experience building complex distributed systems. In this role you are the one gravitating toward operational concerns of the team, focusing on reliability, performance, capacity planning and automation of everything.
- Proficiency in high-level scripting languages such as Python or Ruby (Python preferred)
- Experience developing solutions leveraging Docker
- Experience managing Linux (CentOS, Ubuntu) systems
- Configuration management experience with Puppet, Chef, or Ansible
- Building/implementing monitoring for network, server and application status
- Experience with monitoring tools such as Graphite, Nagios, Datadog, Runscope
- Experience with log aggregation systems using Splunk, Logstash, Loggly, Elasticsearch
- Continuous integration, testing, and deployment using Git, Jenkins
- Experience with relational databases (MySQL)
- Experience with NoSQL databases (Cassandra, Couchbase, Mongo)
- Experience with Hadoop (Cloudera, DataStax), Mahout and other big data platforms
- Exceptional communication and troubleshooting skills.
Benefits:
- Unlimited paid vacation days. Choose how your time is spent.
- Never go hungry! We provide weekly GrubHub/Seamless credit.
- Regular in-office social events, including happy hours, wine tastings, karaoke, bingo with prizes and more.
- Company-Wide Initiatives encouraging innovation, continuous learning and cross-department connections.
Grubhub is an equal opportunity employer. We evaluate qualified applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, and other legally protected characteristics. The "EEO is the Law" poster is available here: DOL Poster. Grubhub is committed to working with and providing reasonable accommodations to individuals with disabilities. If you need a reasonable accommodation because of a disability for any part of the employment process, please send an e-mail to [email protected] and let us know the nature of your request and your contact information.