
Team Lead - Site Reliability Engineer
- Brampton, ON
- Permanent
- Full-time
- Lead Reliability Engineering Initiatives – Oversee the health and performance of production systems, ensuring high availability, scalability, and fault tolerance across critical services.
- Manage & Grow the SRE Team – Mentor, support, and manage a team of SREs. Drive hiring, onboarding, and performance reviews while fostering a culture of learning and accountability.
- Drive Incident Response & Resolution – Lead incident management processes, guide root cause analysis, and implement long-term fixes to improve system resilience.
- Scale Automation & Tooling – Architect and promote automation strategies to reduce manual work and increase operational efficiency across environments.
- Guide Infrastructure as Code (IaC) Strategy – Ensure best practices in infrastructure provisioning and management using tools like Terraform, Ansible, and Kubernetes.
- Evolve CI/CD & Deployment Practices – Oversee and enhance build and release pipelines for seamless, secure, and reliable deployments.
- Cross-Team Collaboration – Work closely with engineering, product, security, and operations to align reliability goals with business priorities.
- Proven experience as a senior or lead SRE in complex, production-grade environments
- Strong leadership and people management skills, with experience leading technical teams
- Deep understanding of reliability engineering principles, SLAs/SLOs, and system design
- Expertise in scripting (Python, Bash, or Go) and automation practices
- Hands-on experience with cloud platforms (AWS, GCP, Azure) and IaC tools (Terraform, Ansible)
- Solid understanding of observability, monitoring, and alerting (Prometheus, Grafana, etc.)
- Ability to drive strategic decisions, manage priorities, and build cross-functional relationships
Candidates who are 18 years or older are required to complete a criminal background check. Details will be provided through the application process.