Staff Site Reliability Engineer (Fabric)

MongoDB · about 20 hours ago
Toronto
Staff
Full-Time

Top Benefits

Comprehensive health insurance
Flexible paid time off (PTO)
20 weeks paid gender-neutral parental leave

About the role

  • Platform Engineering is the department within SRE that is responsible for a range of critical infrastructure and operational functions that support the broader engineering organization
  • Among these are our multi-cloud-provider Kubernetes infrastructure, deployment machinery, and observability and alerting systems
  • The Fabric team manages the infrastructure that enables secure communication between systems and from the public internet
  • Their responsibilities encompass network architecture, service mesh, and edge load balancing, ensuring customer data remains safe in transit
  • The team plays a crucial role in developing and maintaining the reliable and globally connected multi-cloud network that supports MongoDB products
  • This role is pivotal in building and maintaining the robust infrastructure necessary for secure and efficient communication between our services
  • As an SRE on the Fabric team, you will leverage your expertise in networking, distributed systems, and automation to ensure our systems are resilient, scalable, and reliable
  • Participate in the development of a reliable, resilient, globally connected multi-cloud network that is crucial for MongoDB’s services
  • Collaborate with service-owning teams to provide internal support, addressing technical issues and offering guidance on best practices for service-to-service connectivity
  • Participate in a 24/7 on-call rotation to swiftly resolve issues related to network architecture and service-to-service connectivity, ensuring minimal disruption and high availability

Benefits

  • Rich health insurance coverage
  • Virtual & on-site fitness classes
  • Health screenings & telemedicine
  • Access to transgender-inclusive health insurance coverage
  • Global and internal mobility opportunities
  • Equity & Employee Stock Purchase Program
  • Pension & retirement programs
  • Income Protection
  • Flexible PTO for every US employee & competitive time-off policies for non-US employees
  • Employee Assistance Program
  • Mental health counseling
  • Free meditation app access
  • Fertility & adoption financial assistance
  • Parental counseling for new parents
  • 4 weeks of emergency care leave
  • 20 weeks of fully paid gender-neutral parental leave & flexible work arrangements

We are seeking a talented Site Reliability Engineer (SRE) with a strong networking background to join the Fabric team.
  • Have 10+ years of experience working on software and operating distributed systems, with deep expertise in networking fundamentals and a good understanding of how the internet works, e.g. TCP/IP (including IPv6), DNS, TLS/mTLS, BGP, tunnels, overlays, and SDN principles
  • Have a strong knowledge of service mesh and load-balancing concepts, and be eager to implement these in a multi-cloud environment
  • Be intimately familiar with modern cloud-based infrastructure and the network design primitives of at least one of AWS, Azure, or GCP, e.g. VPCs, subnetting, routing, VPNs, peering, private link / private service connect, and CDNs
  • Possess a customer-focused mindset, driving improvements that benefit end-users
  • Value efficiency in processes and operations, and display a strong preference for automation over manual processes (“allergic to ops work”)

About MongoDB

Software Development