Site Reliability Specialist 1

September 26, 2021
Edmonton, AB

University of Alberta
IST Server & Application Host
Site Reliability Specialist 1

Competition No. - S100345986
Posting Date - Aug 27, 2021
Closing Date - Sep 17, 2021

Position Type - Full Time - Operating Funded
Salary range - $62,877.36 to $87,627.24 per year
Grade - 11
Hours - 35 per wk

This competition is open to all applicants; however, internal candidates and applicants who were employees of the University of Alberta in the past 18 months will be given priority consideration before external candidates. Please indicate your internal status using the "Advertisement" drop-down menu when applying.

This posting may be used to fill up to 2 positions.

This position offers a comprehensive benefits package which can be viewed at: Faculty & Staff Benefits.

Reporting to the Manager, Cloud Services, the Site Reliability Specialist, Intermediate (SRS) works with other team members to design, improve and support complex cloud infrastructure environments such as the primary Learning Management Systems for the University of Alberta and other post-secondary institutions. This position requires the incumbent to ensure continuity of service for clients and their users. The SRS applies more extensive design and automation to the entire support lifecycle of server and application environments compared to traditional system administration practices. The successful candidate will be familiar with disaster recovery and business continuity planning.

The SRS will be accountable for their responsibilities on the team, and demonstrate the desire to learn new skills. Also, they will be proactive in setting goals for themselves and for the team, and take the initiative in providing solutions. The SRS should demonstrate strong DevOps and site reliability engineering principles such as comprehensive automation and testing to assist with the transition from traditional IT support models.

All services require off-hours emergency support. The incumbent will participate in a 24-hour, 7-day on-call rotation as well as scheduled off-hours maintenance.

The successful candidate can look forward to an energetic, professional team environment where there is a commitment to personal and professional growth.

- Maintains critical environments by ensuring the application platforms are stable, patched and assessed for agility and performance on a regular basis, using automation techniques when possible.
- Analyzes service usage, user activity, and performance metrics to forecast demand and capacity and to identify opportunities to enhance service delivery.
- Participates in regular planning sessions, contributes to the development and maintenance of road maps and schedules, and monitors vendor release information.
- Develops cloud-based services, code testing, and cloud performance checks.
- Implements configuration improvements to continuous service delivery pipelines.
- Researches new technologies that enhance the efficiency of operations and quality of service.
- Works with more experienced team members and stakeholders of departments or faculties to identify requirements and to design system architectures that meet their technological and business needs.
- Responds to incidents within expected timeframes and conducts in-depth post-mortem analysis of events, including setting up and facilitating meetings with stakeholders.
- Designs solutions that remediate issues and prevent future occurrences.

- Bachelor's degree in Computing Science or a related field, or an equivalent technical diploma combined with relevant experience.
- Minimum 4-6 years of experience working in large and complex IT environments.
- Minimum 4-6 years of working experience with UNIX/Linux/BSD systems and/or Windows server systems as an application or database administrator.
- Minimum 2 years of software development experience.
- Proven experience administering large applications in the cloud (AWS/Azure/GCP).
- Experience with automating builds and releases (Jenkins, TravisCI).
- Experience with infrastructure as code (Terraform, AWS CloudFormation).
- Experience with configuration management (Chef, Ansible, Puppet).
- Programming and scripting experience with Ruby, Python, Bash, PowerShell, and Java.
- Effective problem solving and time management skills.
- Strong written and verbal communication skills.
- Ability to work efficiently on independent tasks and collaboratively within a team environment.

Interested applicants may apply:

The University of Alberta is committed to an equitable, diverse, and inclusive workforce. We welcome applications from all qualified persons. We encourage women; First Nations, Métis and Inuit persons; members of visible minority groups; persons with disabilities; persons of any sexual orientation or gender identity and expression; and all those who may contribute to the further diversification of ideas and the University to apply.
