
Technical Lead, DevOps/SRE
- Toronto, ON
- Permanent
- Full-time
- Team Leadership & People Management: Leading, mentoring, and developing a high-performing team of DevOps and SRE engineers. Fostering a collaborative, results-oriented culture focused on problem resolution, not blame.
- Operational Excellence & Stability: Taking ownership of the health, performance, and stability of our production CI/CD pipelines and SaaS Operations for our Toronto-based systems.
- Application & Infrastructure Lifecycle Management: Planning, managing, and executing application and infrastructure upgrades to ensure systems remain current, secure, and supportable.
- Process Development & Documentation: Designing, implementing, and rigorously documenting Standard Operating Procedures (SOPs) for all key operational activities, including incident response, deployments, and infrastructure changes. Driving the team's transition from ad-hoc tasks to a procedure-driven workflow.
- Strategic Planning & Execution: Leading the planning, effort estimation, and prioritization of all team activities. Implementing disciplined tracking to distinguish between planned project work and unplanned operational incidents, ensuring clear visibility and resource management.
- Stakeholder Communication: Serving as the primary technical point of contact for DevOps and SRE. Communicating effectively with C-level stakeholders, translating complex technical concepts into clear business impact and running structured, objective-driven meetings.
- Infrastructure & Automation: Overseeing the automation of cloud and virtualization resources (GCP, Azure, vSphere) using Infrastructure as Code (IaC) best practices (Terraform, Packer, SaltStack).
- SRE & Monitoring: Driving the implementation of SRE best practices to meet and exceed SLAs/SLOs. Championing the use of application and infrastructure monitoring tools (e.g., New Relic, Elastic Stack) to ensure reliability and proactive issue detection.
- Security & Compliance: Upholding and strengthening security practices within the DevOps lifecycle and supporting security operations across the organization.
- On-Call & Incident Management: Participating in and managing an on-call rotation to support critical systems and leading the incident management process to ensure swift resolution.
- 5+ years of progressive experience in DevOps, SRE, or Infrastructure roles, with at least 1-2 years in a leadership or senior capacity.
- University degree or College diploma in Computer Science, Engineering, or a related field (or equivalent work experience).
- Proven experience in people management, with a track record of building and leading successful technical teams.
- A self-driven and proactive mindset, with the ability to identify operational needs and architect solutions without direct oversight.
- Strong, disciplined background in project planning, effort estimation, prioritization, and tracking.
- Demonstrated experience developing and implementing SRE principles and robust Standard Operating Procedures (SOPs) in a production environment.
- Exceptional communication, presentation, and interpersonal skills, with a demonstrated ability to effectively communicate with senior leadership (C-level).
- Excellent technical writing and documentation skills, with experience creating clear and concise SOPs, architectural diagrams, and reports.
- Advanced knowledge of cloud and virtualization platforms (GCP, Azure, AWS, vSphere) and expertise with Infrastructure as Code tools (e.g., Terraform, SaltStack, Ansible).
- Deep understanding of CI/CD pipelines and associated tools (e.g., Bitbucket/Git, Bamboo, Nexus, SonarQube).
- Bilingualism (English/Spanish) is a strong asset.
- Experience working in an Agile environment.
- Familiarity with the financial industry, wealth management, investments, FinTech, or other finance-related fields.
- Experience working for a Wealth Management, FinTech or Enterprise software vendor.