Harrison Clarke are working with a company that serves schools and students globally, helping it in its search for a Site Reliability Engineer. The company works with millions of users daily, mainly in the U.S. but also beyond, to help students learn and teachers monitor the progress of their pupils.
The Site Reliability Engineer will be responsible for engaging with service owners, helping to improve, drive, and maintain their services from inception to deployment. You will use your experience writing scripts in languages such as Python and your passion for automated code testing to help the company in its ambition to improve the quality and accessibility of education. You relish the opportunity to work at a company that places value on diversity and transparency, and you are comfortable with accountability and problem-sharing alike.
Measure and track availability & system health to help service owners maintain their services
Work with service owners to push for changes that improve reliability and velocity through automation, and evolve systems so that they scale sustainably
Work with service owners to enhance the full service lifecycle
Carry out activities including system design consulting, capacity planning, and launch reviews to ensure service owners push services through the full service lifecycle
Carry out, encourage, and sustain incident response
Strong sense of ownership and drive as well as being a logical problem-solver and clear communicator
Skills and willingness to automate routine tasks and optimize code
Experience writing scripts in Python or Go and Shell, among others
Experience with software complexity, algorithmic thinking, and data structures
Experience using containers to deploy automated code testing
Enough experience with on-call rotations to understand their difficulties, how they can be developed, and how to promote their importance
Experience working to troubleshoot, analyse, and design distributed systems that serve production traffic