Manager, Cloud Platform

Telus about 20 hours ago

Calgary, Alberta, Canada

Senior Level

Full-Time

About the role

Description

Our Team and What We’ll Accomplish Together

We are Canada’s largest healthcare IT provider and we’re transforming healthcare. The TELUS Health Cloud Platform team is passionate about solving complex problems to make life simpler for patients, clinicians, and the teams that serve them. We’re building secure, cloud-native platforms at scale across GCP, AWS, and Azure, and we take pride in doing it right.

We are building toward an agentic-first operating model. AI agents, not humans, handle the routine: provisioning infrastructure, responding to requests, enforcing guardrails, and guiding teams through self-service workflows. Security is built into everything we do, not bolted on at the end. Our engineers focus on building and improving those agents and systems, not executing manual tasks. We’re looking for a leader who gets this shift and knows how to drive it.

As Manager, Cloud Platform, you will lead both our platform engineering and cloud operations functions. Platform self-service is our north star, and agentic workflows are how we get there. Your mandate is to build the systems (agents, golden paths, automation frameworks, and security guardrails) that allow product and engineering teams to interact with the cloud platform entirely through AI-driven interfaces, without ever needing to file a ticket or wait for a human handoff.

Security is a first-class concern for this role. You will own the security posture of the platform layer, ensuring identity, access, and compliance controls are enforced automatically through code and agents, not manual review. This is a dual mandate: build the agentic platform that eliminates operational toil, while ensuring the platform remains secure, compliant, and trusted by the organization.

What You’ll Do

Build the Agentic-First Platform

Design and lead the build-out of an agentic platform operating model — where AI agents (Claude, GitHub Copilot, and custom agents) are the primary interface between product teams and cloud infrastructure

Replace manual ticketing workflows with agent-driven request handling: developers describe what they need in natural language or via CLI, and agents generate, validate, and apply the required Terraform or configuration changes

Build agent workflows that guide product teams through infrastructure onboarding, access requests, environment bootstrapping, and compliance checks — without requiring Cloud Platform team intervention

Establish GitHub as the operational backbone: issues, PRs, documentation, and agent interactions all flow through a GitHub-native model

Instrument agents with awareness of platform standards, security guardrails, and organizational context — so they enforce policy automatically rather than escalating to humans

Define and communicate the agentic roadmap to senior leadership, engineering teams, and product stakeholders

Own Platform Security & Compliance

Own the security posture of the cloud platform layer — ensuring identity, access, and network controls are implemented consistently and enforced through automation across GCP, AWS, and Azure

Implement and maintain security guardrails at the organization and pipeline levels, ensuring all infrastructure provisioned through the platform meets baseline security and compliance requirements

Lead IAM governance: role binding, access provisioning, key rotation, service account hygiene, and Workload Identity Federation — with a goal of automating these controls through agents and policy-as-code

Partner with the Security team to ensure platform capabilities align with organizational security standards and support audit requirements (SOC 2, PIPEDA, HIPAA-aligned practices)

Build security into the self-service golden paths — so that teams provisioning infrastructure through approved patterns inherit secure defaults automatically

Treat security findings as engineering problems: prioritize remediation through code, automation, and agent enforcement rather than manual review cycles

Own the Self-Service Platform & Golden Paths

Design opinionated “golden path” frameworks using Terraform, Terragrunt, and GitHub Actions that standardize and secure infrastructure patterns across GCP, AWS, and Azure

Build and maintain a centralized module marketplace and IaC library that teams and agents can consume confidently

Ensure all self-service capabilities are agent-accessible — designed for both human and programmatic consumption from day one

Establish clear support boundaries: teams using the golden path get full support; non-standard configurations are self-supported

Lead Cloud Operations

Ensure operational coverage across the multi-cloud estate: GCP, AWS, and Azure

Lead incident management with a focus on durable remediation — every significant incident produces agent runbooks, automation, or documentation that prevents recurrence

Drive down request volume through agentic self-service, not headcount scaling — treating high ticket volume as an engineering problem to be automated away

Coordinate with the SRE and observability teams to ensure platform services meet reliability expectations and incidents are routed and resolved efficiently

Drive Engineering Excellence

Build and maintain CI/CD pipelines and Infrastructure-as-Code to automate provisioning, configuration management, patching, and compliance enforcement

Contribute to the golden image factory initiative — ensuring CIS-hardened, patched base images are available on-demand across all cloud platforms

Champion a “security as code” mindset across the team — policy enforcement, compliance checks, and access controls are implemented in pipelines and agents, not spreadsheets

Lead, Coach & Develop Your Team

Manage a blended team of platform engineers and cloud operations engineers, with a deliberate focus on growing agent-building, automation, and security engineering skills

Hire engineers who are energized by building AI-driven, security-first systems, not just operating existing ones

Foster a learning culture — create space for the team to grow in agentic development, cloud security, certifications, and IaC alongside day-to-day responsibilities

Help shape and evolve team ceremonies and ways of working, contributing to how the team structures its delivery cadence, retrospectives, and planning without being the sole driver of execution

Collaborate Across the Organization

Partner with Product, Engineering, Security, and Architecture teams to align platform and agentic capabilities with organizational priorities

Serve as the internal champion for agentic workflows — helping product and engineering teams understand how to interact with the platform through agents rather than manual processes

Report on platform adoption, agent utilization, security posture, and toil-reduction progress to senior leadership

Qualifications

What You’ll Need

Leadership & Mindset

5+ years of progressive experience in cloud platform engineering or cloud operations — with at least 2 years in a people management or technical leadership role

A genuine belief in agentic-first, security-first workflows and a track record of building automation that replaces manual processes — not just augments them

Experience leading teams through transformation: from reactive, ticket-driven operations toward proactive, agent-driven platform delivery

Strong communication skills — able to translate platform complexity into clear narratives for executive leadership and business stakeholders

Comfortable operating in ambiguity and driving change in an environment that is still evolving

Technical Depth

Hands-on experience across at least two of GCP, AWS, and Azure — with a solid grasp of identity, networking, compute, and security controls at scale

Deep expertise in Infrastructure-as-Code (Terraform, Terragrunt) and the ability to design secure, reusable, opinionated module libraries

Experience building or working with AI agents and agentic workflows — including prompt engineering, tool use, and integrating agents with CI/CD systems and infrastructure APIs

Strong understanding of cloud security fundamentals: IAM, RBAC, service accounts, Workload Identity Federation, network security, and secrets management

Experience implementing policy-as-code and automated compliance enforcement in multi-cloud environments

Proficiency in at least one scripting/programming language (Python, Go, Bash) — you write code, not just YAML

Experience building developer-facing self-service platforms, including CLI tools, GitHub Actions workflows, and chat-based interfaces

Operational Excellence

Proven track record of reducing operational toil through automation — with concrete examples of what you built and how it measurably reduced burden

Experience managing incident response at scale, including post-mortem facilitation and follow-through on action items

Familiarity with request and workflow management practices — and an instinct for treating high request volume as an engineering problem to be automated away

Understanding of security and compliance requirements in regulated healthcare environments (SOC 2, HIPAA-aligned practices, PIPEDA)

Education & Certifications

Bachelor’s degree in Computer Science, Engineering, or a related technical field — or equivalent practical experience

Cloud Certifications (Required — at least one): AWS Solutions Architect (Associate or Professional), GCP Professional Cloud DevOps Engineer, or Azure Administrator Associate

Cloud Certifications (Preferred — additional): GCP Professional Cloud Architect, AWS DevOps Engineer Professional, Azure DevOps Engineer Expert

DevOps / Platform: CKA (Certified Kubernetes Administrator) or equivalent practitioner-level credential is a strong asset

Nice-to-haves

Experience designing or operating agentic systems in a production engineering context — including LLM tool use, agent orchestration, or AI-driven workflow automation

Familiarity with GitHub Copilot, Claude, or similar AI coding/operations tools in an enterprise setting

Experience with cloud security posture management (CSPM) tooling and integrating security findings into automated remediation workflows

Experience supporting large-scale infrastructure modernization or cloud adoption programs

Experience with identity federation and SSO administration across multi-cloud environments

Background in regulated healthcare IT — understanding of patient-facing or clinical systems

Experience with FinOps principles and cloud cost attribution

Familiarity with enterprise collaboration and development tooling as both a user and administrator

Advanced knowledge of English is required because, given the national scope of this position, you will interact in English most of the time with internal parties (colleagues, internal partners, stakeholders, etc.) and work with IT tools whose interface is available only in English as part of the position's main responsibilities. #LI-REMOTE

About Telus

Telecommunications
