This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior SRE DevOps Engineer in Canada.
This is a high-impact role at the intersection of software engineering and cloud operations, focused on building and maintaining resilient, large-scale infrastructure for real-time communication systems. You will design, automate, and optimize cloud-native environments that support mission-critical connectivity under strict latency and reliability constraints. The position combines hands-on coding with deep operational ownership, empowering you to shape infrastructure strategy while improving developer productivity. Working in a remote-first, highly technical environment, you’ll collaborate across engineering teams to ensure scalability, security, and performance. If you thrive on solving distributed systems challenges and building production-grade reliability tooling, this role offers both ownership and influence.
Accountabilities:
- Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
- Building and maintaining AWS-based production infrastructure (ECS, EKS/Kubernetes, microservices orchestration) using Infrastructure as Code (Terraform, CloudFormation).
- Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.
- Leading incident response processes, conducting root cause analyses, and creating automated runbooks to reduce MTTR.
- Architecting and maintaining CI/CD pipelines for backend services, mobile applications, and IoT firmware across cloud and on-prem environments.
- Implementing comprehensive observability using OpenTelemetry, distributed tracing, metrics exporters, and alerting systems.
- Managing data services such as PostgreSQL (RDS), Redis/ElastiCache, SQS, and networking components (ALB/NLB, VPC, IAM).
- Enforcing strong security standards, including IAM policies, encryption, secrets management, vulnerability management, and compliance auditing.
Requirements:
The ideal candidate is both a strong software engineer and an experienced platform reliability expert. Key qualifications include:
- 7+ years of experience in SRE, DevOps, or Platform Engineering roles with daily hands-on coding responsibilities.
- Proficiency in at least one backend language (TypeScript/Node.js, Python, or Go) for developing automation tools, internal services, and reliability frameworks.
- Deep expertise in AWS services (ECS, EKS, RDS, ElastiCache, SQS, VPC, IAM, CloudWatch).
- Strong experience with Infrastructure as Code tools (Terraform, CloudFormation, or Pulumi), including modular design and state management.
- Proven experience designing and maintaining CI/CD pipelines in both cloud and on-prem environments.
- Solid understanding of container orchestration (Docker, Kubernetes, Helm) and distributed systems patterns such as circuit breakers, retries, and graceful degradation.
- Experience operating production databases (PostgreSQL, Redis) and message queues.
- Strong security knowledge covering network segmentation, encryption, secrets management, and incident response.
- Preferred: experience with real-time communication infrastructure (SIP, RTP, WebRTC), telecom systems, IoT pipelines, or optimization for satellite/low-bandwidth environments.
Benefits:
- Competitive compensation package
- Flexible remote work environment with autonomy and ownership
- Opportunity to build and scale critical communication infrastructure
- Exposure to cutting-edge technologies across cloud, IoT, telecom, and distributed systems
- High-impact role with direct influence on reliability and platform architecture
- Collaborative, technically advanced engineering culture