
MLOps Engineer
- Ottawa, ON
- Permanent
- Full-time
- Develop infrastructure for distributed systems, such as Spark, Ray, and Kubernetes, using Infrastructure as Code (IaC) tools like Terraform.
- Deploy solutions through containerization with Docker, orchestration with Kubernetes or ECS, and relevant serving frameworks.
- Build and maintain robust data and machine learning pipelines using Databricks Workflows and Argo.
- Integrate machine learning workflows smoothly with CI/CD pipelines to streamline model building, testing, and deployment.
- Set up and manage monitoring and alerting systems to guarantee the health and performance of ML models and data pipelines.
- Document workflows, pipelines, and processes to enable team members to reuse and sustain them.
- A degree in Computer Science, Engineering, or a related field.
- Proficiency in Infrastructure as Code (IaC) tools such as Terraform and workflow orchestration systems like Argo and Databricks Workflows.
- Strong Python skills for scripting, automation, and interfacing with ML APIs and orchestration tools.
- Experience setting up CI/CD pipelines for ML projects using GitLab or similar tools.
- Experience with cloud ML platforms like Azure Databricks.
- Proficiency in distributed systems and big data technologies such as Apache Spark and Ray.
- Experience with containerization and orchestration platforms like Docker and Kubernetes, and with serverless architectures.
- Familiarity with ML lifecycle management tools such as Kubeflow and MLflow for automating pipelines and tracking experiments.
- Good understanding of data security, model governance, and compliance within ML systems.
- Ability to troubleshoot complex issues involving infrastructure, models, and data flow.