
Lead Site Reliability Administrator
- Waterloo, ON
- Permanent
- Full-time
OpenText is a global leader in information management, where innovation, creativity, and collaboration are the key components of our corporate culture. As a member of our team, you will have the opportunity to partner with the most highly regarded companies in the world, tackle complex issues, and contribute to projects that shape the future of digital transformation.
OpenText™ Cybersecurity SMB & Consumer solutions help organizations protect their most valuable and sensitive information with products, services, and training. The OpenText Cybersecurity portfolio offers a broad range of data security and data protection solutions that help companies prevent, detect, respond to, and recover from the latest cybersecurity threats.
As a global leader in secure information management, OpenText empowers businesses to stay ahead of ever-evolving cyber threats. Our Cybersecurity Enterprise portfolio offers innovative solutions that safeguard organizations from malicious attacks, data breaches, and cyber vulnerabilities. By joining our team, you'll be at the forefront of developing and implementing state-of-the-art security technologies, protecting critical assets and sensitive information for clients worldwide.
Your Impact
You will play a pivotal role in implementing, managing, and troubleshooting cloud infrastructure-based solutions for our organization. Your focus will be on enhancing system stability, scalability, and performance. Your expertise in cloud technologies and SRE principles will drive the success of our cloud initiatives and ensure our systems operate seamlessly.
As a Lead Site Reliability Engineer (SRE), you will:
- Provide operations support for cybersecurity products and AWS infrastructure, including product configuration and functionality.
- Maintain uptime, apply patches, and enforce security across all cloud environments while ensuring compliance.
- Configure and manage cloud environments using IaC automation tools to provision, scale, and optimize resources.
- Implement SRE principles in incident response, post-incident analysis, monitoring, capacity planning, and resilience improvements.
- Develop and maintain cloud architecture documentation, technical specifications, and best practices for cross-team communication.
- Mentor junior team members while staying current with emerging cloud technologies, industry trends, and SRE methodologies.
- Bachelor’s degree in Computer Science, Information Technology, or cloud-focused certifications with 4+ years of industry experience.
- 1–5 years supporting Docker/Kubernetes, AWS (or similar cloud), and Linux/Windows system administration.
- Hands-on experience with DevOps tools (Jenkins, Git, Terraform), scripting (Python, PowerShell, shell), and databases (Aurora, Postgres).
- Strong knowledge of cybersecurity concepts and compliance frameworks (ISO 27001, GDPR, FedRAMP/FISMA, SOC 2, PCI DSS).
- Proficiency with monitoring tools (Zabbix, Grafana, CloudWatch); familiarity with GitLab/Packer is a plus.
- 5–10+ years in technical development, QE, or operations, with 5+ years building automated provisioning and deployment pipelines.
- Experience in CI/CD practices, iterative development, and managing microservices-based systems.
- Familiarity with monitoring/alerting tools (OpsGenie, VictorOps, PagerDuty) and external monitoring platforms (Pingdom, ThousandEyes).