Senior Software Engineer - Confluent (AI Tooling)

IBM
about 6 hours ago
Remote
Senior Level
Full-Time

About the role

Introduction

At IBM Software, we transform client challenges into solutions, building the world’s leading AI-powered, cloud-native products that shape the future of business and society. Our legacy of innovation creates endless opportunities for IBMers to learn, grow, and make an impact on a global scale. Working in Software means joining a team fueled by curiosity and collaboration. You’ll work with diverse technologies, partners, and industries to design, develop, and deliver solutions that power digital transformation. With a culture that values innovation, growth, and continuous learning, IBM Software places you at the heart of IBM’s product and technology landscape. Here, you’ll have the tools and opportunities to advance your career while creating software that changes the world.

With Confluent, data doesn’t sit still. We put information in motion, streaming in near real time so organizations can react faster, build smarter, and deliver experiences as dynamic as the world around them.

Your Role And Responsibilities

The Cloud Reliability team at Confluent builds the infrastructure and tooling that keeps Confluent Cloud reliable, secure, and operable at scale. This role is on the AI side of that mission: building an operations assistant that brings agentic workflows to incident investigation and troubleshooting. The system unifies information and actions from across Confluent's operational systems into an AI-enhanced experience that helps engineers find root causes faster.

We are looking for engineers with a passion for applying AI to large-scale operational problems. This role provides an opportunity to work across multiple domains, including agentic systems and tool use, large language model integration, evaluation and quality, and the distributed infrastructure that makes AI-driven operations reliable and secure.

What You Will Do

Design, build, and operate an AI-powered operations system that engineers across Confluent rely on for investigation and troubleshooting.
Build the tools and integrations that let AI systems securely query operational data and act on Confluent's infrastructure.
Develop agentic workflows and reasoning loops, and improve their accuracy, latency, and reliability over time.
Build evaluation frameworks and feedback loops to measure and continuously improve the quality of AI-generated investigations.
Strengthen the security and access model around AI-driven operations, including least-privilege access and comprehensive auditing.
Partner with cross-functional engineering teams to turn operational expertise into automated, AI-assisted workflows.

Preferred Education

Master's Degree

Required Technical And Professional Expertise

Strong software engineering fundamentals and experience building and operating production systems.
Proficiency in Python.
Experience integrating large language models into real applications.
A practical, product-minded approach to ambiguous problems and the ability to iterate quickly.
A self-starter with strong problem-solving skills and the ability to work in a fast-paced environment.

Preferred Technical And Professional Experience

Experience building agentic systems, LLM tool use, or retrieval-augmented applications.
Familiarity with the Model Context Protocol (MCP) or similar tool-integration frameworks.
Experience designing evaluation and quality measurement for AI systems.
A background in distributed systems, cloud infrastructure (Kubernetes), or operational tooling.
Experience with security and access control for systems that act on production infrastructure.

About IBM

IT Services and IT Consulting