Software Engineer, ML Inference
DarkVision
- North Vancouver, BC
- $100,000–$150,000 per year
- Permanent
- Full-time
- Inference Pipeline Engineering: Develop and maintain the software that moves data through preprocessing, model execution, and post-processing. Own throughput, latency, and resource efficiency (CPU/GPU utilization, I/O, batching, and parallelism).
- Performance Engineering: Profile inference and data paths; optimize memory use and bottlenecks; measure and improve end-to-end pipeline performance with clear metrics.
- Production Software Quality: Write clean, modular, reviewed, and tested code. Participate in design discussions and uphold engineering standards for maintainability and reliability.
- Training Infrastructure and Automatic Retraining: Design and implement automated pipelines for model training and retraining. You will build systems that allow for repeatable and scalable training loops.
- Lifecycle Management: Establish and maintain best practices for model and dataset versioning. You will implement the tooling that tracks model lineage, connecting specific model versions to the exact data and hyperparameters used to create them.
- Data Integration: Write the logic required to interface with internal data ingestion systems. You will handle the efficient loading, pre-processing, and movement of data to ensure pipelines are fed correctly.
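To illustrate the pipeline responsibilities above (preprocessing, batched model execution, post-processing), here is a minimal, framework-agnostic sketch. All names (`run_pipeline`, `run_model`, `preprocess`, `postprocess`) are hypothetical; a production version would call into PyTorch, handle device placement, and track latency/throughput metrics:

```python
def batched(items, batch_size):
    """Group an iterable into fixed-size batches (last batch may be smaller)."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_pipeline(items, preprocess, run_model, postprocess, batch_size=32):
    """Move data through preprocessing, batched model execution, and post-processing.

    Batching amortizes per-call overhead: the model is invoked once per batch
    rather than once per item.
    """
    results = []
    for batch in batched(items, batch_size):
        inputs = [preprocess(x) for x in batch]
        outputs = run_model(inputs)  # one model call per batch
        results.extend(postprocess(y) for y in outputs)
    return results
```

A real implementation would add parallelism (e.g., overlapping preprocessing with GPU execution) and resource-utilization metrics, as the role describes.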
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 2+ years of professional software engineering experience, including shipping and operating ML inference or deployment-related systems.
- Strong proficiency in Python, with a focus on writing clean, modular, and tested code.
- Hands-on experience profiling and optimizing Python for performance in data- or compute-heavy workloads (e.g., profiling tools, memory/CPU hotspots, batching, parallelism).
- Deep practical experience with PyTorch for model execution and integration, including data loading, device placement, and efficient inference-oriented use of the framework.
- Demonstrated experience with model optimization and deployment for inference, such as ONNX export/conversion, TensorRT (or similar runtimes), mixed precision, and/or quantization.
- Understanding of high-performance computing concepts, parallel processing, and how inference workloads behave on GPUs and in distributed or multi-process settings.
- Strong communication skills to articulate engineering trade-offs and constraints to diverse technical teams.
- Comfortable reading and debugging C/C++ (e.g., native extensions, bindings, or performance-critical libraries).
- Familiarity with compiler- or graph-based optimization paths (e.g., TorchScript, torch.compile, Triton).
- Experience with CUDA kernel development, custom operators, or GPU-level optimization beyond framework defaults.
- Familiarity with distributed computing frameworks (e.g., Ray, Dask).
- Experience with MLOps tools for experiment tracking and artifact management (e.g., Weights & Biases, DVC, MLflow).
- Experience with workflow orchestration tools (e.g., Prefect, Airflow, or Dagster).
- Working understanding of SQL and databases.
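As a taste of the quantization experience the requirements mention, here is a minimal sketch of 8-bit affine (asymmetric) quantization, the scheme underlying much int8 inference. This is an illustration only, not any specific library's API; real deployments would use the quantization tooling in PyTorch, ONNX Runtime, or TensorRT:

```python
def quantize_affine(values, num_bits=8):
    """Map floats onto the unsigned integer range [0, 2**num_bits - 1]."""
    qmax = (1 << num_bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    # Zero point: the integer that represents float 0.0, clamped into range.
    zero_point = max(0, min(qmax, round(-lo / scale)))
    q = [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_affine(q, scale, zero_point):
    """Recover approximate float values from quantized integers."""
    return [(qi - zero_point) * scale for qi in q]
```

Round-tripping through quantize/dequantize bounds the error by roughly half a quantization step, which is the trade-off between precision and the memory/throughput gains of int8 inference.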