Senior Site Reliability Engineer (SRE) - Application Support
Royal Bank of Canada
- Toronto, ON
- Permanent
- Full-time
- Design, implement, and maintain scalable, highly available infrastructure for application support.
- Collaborate with development teams to ensure applications are designed for reliability, observability, and operational excellence.
- Develop and maintain automation tools and scripts to streamline deployment, monitoring, and incident response.
- Monitor system performance, troubleshoot issues, and optimize application workflows.
- Lead incident response and post-mortem analysis to prevent recurring issues.
- Ensure compliance with security and regulatory standards in all operational processes.
- Strong Technical Expertise – Proficiency in cloud platforms (Azure), containerization (Kubernetes), Apache Airflow, Azure Databricks, Dynatrace, and Splunk to design and maintain scalable, reliable systems.
- Automation & Scripting Skills – Ability to develop and maintain automation tools and scripts (e.g., Python, Bash) to streamline deployments, monitoring, and incident response.
- Problem-Solving & Troubleshooting – Proven ability to diagnose and resolve complex system issues efficiently, especially under pressure, while leading incident response and post-mortem analyses.
- Collaboration & Communication – Strong teamwork skills to bridge gaps between development and operations teams, ensuring applications are built for reliability and observability.
- Knowledge and experience in cyber security and technology risk domains, including risk assessment and mitigation strategies.
- Strategic thinker with excellent interpersonal skills to work across technical functions and business units.