Lead Site Reliability Engineer

Scalable Systems 1 day ago

Hybrid

Senior Level

CONTRACTOR

About the role

Position: Lead Site Reliability Engineer
Location: Toronto, Ontario (hybrid, 2 days a week in office)
Contract: long-term

Lead SRE

- Deep application and system-level knowledge across complex end-to-end environments, including tightly integrated on-prem and cloud-native services supporting large-scale, multi-tier transaction flows.
- Prior hands-on experience with APM and observability platforms (Dynatrace or comparable enterprise observability tools), with the ability to instrument, analyze, and troubleshoot complex distributed applications.
- Proven deep troubleshooting experience resolving issues across multilayer, end-to-end (E2E) environments spanning application, infrastructure, network, and platform layers across on-prem and cloud services.
- Drive and execute the SRE WCCS roadmap for BMO.
- Hands-on role from day 1.
- Observability experience expectations: see the SRE Observability SME description below.
- Deep knowledge of and experience in implementing SRE practices and guiding complex SRE implementations across the industry.
- Provide assessments of current capability, help identify gaps, and contribute to the SRE WCCS roadmap.
- Able to navigate multi-team SRE and IT Ops environments to drive results.
- Creative workarounds and solutions.

SRE Observability SME

- Hands-on role from day 1.
- Day 1 Dynatrace expertise, i.e. DQL, Gen3 dashboards, traces on Grail, ActiveGate plugins, SRG, workflow development, BizEvents.
- Prior hands-on experience with APM and observability platforms (Dynatrace or comparable enterprise observability tools), with the ability to instrument, analyze, and troubleshoot complex distributed applications.
- Deep troubleshooting expertise leveraging observability signals (metrics, events, logs, and traces) to identify root causes and resolve failures across multilayer E2E environments.
- Deep background in observability fundamentals (MELT).
- Expert-level dashboard skills, including related UI/UX design.
- Experienced in troubleshooting performance and non-functional issues.
- Familiar with SRE concepts as outlined in the Google SRE book, workbook, etc.
- Expertise in AWS observability: CloudWatch (CW), Application Signals, metrics, logs, traces, Lambda, API Gateway.
- Able to come up with creative ways to monitor and observe systems such as IBM DataPower where sufficient observability isn't present.
- Development with Python, AWS Lambda, ECS, Azure Functions.
- Understands the fundamentals of how AI-based systems are built and monitored.
- Background in or knowledge of OpenTelemetry (OTel).
- Experience in financial services or an equivalent domain, i.e. very complex end-to-end transactions (e.g. 50 systems working together to fulfil one customer request).
- Platform engineering experience: shipping platform capabilities (e.g., self-service onboarding pipeline, policy-as-code, golden signals-as-code, standardized instrumentation libraries).

Depth of knowledge for the role

- Programming depth: requires strong programming in Python and Node.js and experience building backend integration components.
- Looking for practical observability experience with multi-system integration and in-depth observability knowledge.

Thanks & Regards,
Ranjeet Kumar | Talent Acquisition Specialist
Tel: 437-292-5839
Email: ranjeet.kumar@scalable-systems.com
LinkedIn: www.linkedin.com/in/ranjeet-k-saini-7a583485
Web: https://www.scalable-systems.com/

About Scalable Systems

IT Services and IT Consulting
