About the Role
Senior AI Engineer — Real-Time Video Systems
Location: Uptown Charlotte, North Carolina
We are building a high-performance, GPU-accelerated video intelligence platform that runs in latency-sensitive production environments. This role is for a senior engineer who can independently architect, optimize, and scale real-time computer vision systems.
You will own inference performance, model optimization, and production reliability end-to-end. This is neither a research-only role nor a “train a model and hand it off” role: you will be responsible for making models fast, stable, and production-ready in live environments.
If you thrive on squeezing maximum throughput from GPUs, designing resilient inference services, and making real systems perform under load, this role is a strong fit.
What You’ll Own
• Architect and optimize GPU-accelerated inference pipelines for high-volume video streams.
• Drive performance tuning initiatives: batching strategy, frame stride, memory allocation, quantization, and hardware-level optimization.
• Implement and refine object detection systems (YOLO-class architectures or equivalent) with temporal filtering and multi-frame logic.
• Reduce false positives through tracking, smoothing, and sequence-aware event logic.
• Own and continuously improve latency, throughput, and VRAM efficiency metrics.
• Integrate inference outputs into distributed, event-driven systems and cloud storage layers.
• Design production observability (metrics, logging, and alerting) and fault-tolerant execution paths.
• Collaborate on dataset refinement and model iteration while maintaining a production-first mindset.
• Contribute to containerized deployment and scalable runtime infrastructure.
What We’re Looking For
• 5+ years building and shipping production ML/computer vision systems.
• Demonstrated ownership of performance-critical GPU inference pipelines.
• Deep proficiency in Python, PyTorch, and OpenCV.
• Strong hands-on experience with:
    • YOLO-class detection frameworks
    • ONNX and TensorRT optimization
    • CUDA-level performance tuning
    • Model quantization and throughput optimization
• Solid understanding of video processing fundamentals:
    • Frame sampling strategies
    • Temporal filtering and tracking
    • Confidence calibration
    • Multi-stream aggregation
• Experience deploying containerized workloads (Docker) in production.
• Ability to diagnose bottlenecks and implement performance improvements independently.
Ideal Profile
• You have shipped production systems that operate continuously under load.
• You are comfortable profiling GPU memory and compute usage.
• You understand the trade-offs between accuracy, latency, and cost.
• You prefer building resilient systems over writing academic experiments.
• You require minimal oversight and are comfortable defining technical direction within your domain.
Core Technology Environment
Python, PyTorch, OpenCV, YOLO-class models, ONNX, TensorRT, CUDA, async I/O frameworks, REST/gRPC APIs, event-driven systems, cloud storage/messaging platforms, Docker, production telemetry tools.