itjobs.ca Logo
HCLTech logo

Cloud Consultant

HCLTechabout 20 hours ago
Calgary, Alberta, Canada
Senior Level
Full-Time

About the role

Job Title: Azure Cloud Site Reliability Engineer (SRE) Job Title: -Calgary ,AB (Onsite )

Must Have ; -Infrastructure deploy(VMs, AKS, networks )CI/CD pipelines setu pMonitoring, scaling, patchin gTroubleshooting production issue sAutomation : Terraform, Scripti g

Role Overview: The SRE will be responsible for the reliability, availability, and performance of Azure/AWS PaaS and IaaS workloads. They bridge the gap between development and operations, focusing on building automated systems that prevent failures, managing incident responses, and optimizing cloud costs . Key Responsibilitie s System Reliability & Monitoring: Design, implement, and maintain comprehensive monitoring and alerting systems such as Azure Monitor, AWS CloudWatch, Application Insights, and Log Analytic s.Automation & Toil Reduction: Automate repetitive manual operations (toil) such as environment provisioning, system patching, and scaling. Use IaC tools like Terraform and Ansible to manage infrastructur e.Incident Response & Management: Actively manage incident responses, root cause analysis (RCA), and post-mortem investigations to improve system reliability and minimize mean time to resolution (MTTR ).Cloud SRE Agent Integration: Deploy and configure Cloud SRE Agent to automate incident investigation, execute remediation steps (restart, scale, rollback), and manage routine task s.Capacity Planning & Scalability: Analyze usage patterns to optimize cloud resources, ensuring high availability and performance while managing costs via Azure Cost Managemen t.CI/CD & DevOps Collaboration: Integrate automation workflows into CI/CD pipelines (e.g., GitHub Actions or Azure Pipelines) to ensure reliable deployment

s. Required Skills & Qualificatio ns Cloud Platforms: Expert knowledge of Microsoft Azure infrastructure services (Compute, Storage, Networking, AK S).Scripting & Programming: Proficiency in Python, Bash, or PowerShell for building automation too ls.Infrastructure as Code (IaC): Extensive experience with Terraform and ARM templates/Bic ep.Observability Tools: Experience with Azure Monitor, Grafana, Prometheus, or Datad og.Containers & Orchestration: Solid understanding of Kubernetes/AKS (Azure Kubernetes Servic e).Operating Systems: Proficient in Windows/Linux environmen ts.Azure Certification is a +Exposure to multi Cloud environment is mu

st. Typical "Day in the Life" Activit ies Reviewing Service Level Objectives (SLOs) and error budg ets.Refining auto-scaling rules for Kubernetes clusters based on traffic tre nds.Working with developers to review service architecture and ensure fault tolera nce.Configuring AI-driven alert suppression to reduce alert fati gue.Creating Azure Dashboards to visualize key performance indicators (KP

Is).

About HCLTech

IT Services and IT Consulting