MTS - Inference
- Salary: $300,000
- Location: Bay Area
- Type: Permanent
What We're Building:
We're entering an exciting new phase focused on collaborating with commercial partners to adapt and fine-tune our cutting-edge AI models for specific business needs. Our track record of developing, aligning, and deploying state-of-the-art models for a high-EQ consumer-facing chatbot gives us a strong foundation for this work. Backed by robust infrastructure, efficient fine-tuning pipelines, and ample H100 capacity, this role offers a unique opportunity to contribute to innovation in a collaborative environment.
About Us:
We are a small, interdisciplinary AI studio that has trained several state-of-the-art language models and developed a popular personal assistant chatbot. Our focus is now on fine-tuning and deploying models for enterprise-specific use cases in partnership with commercial clients. As a public benefit corporation, we prioritize the well-being of our partners, users, and the broader community.
About the Role:
Member of Technical Staff, Research Engineer (Inference)
This role is central to deploying high-performance models for enterprise applications. As part of the inference team, research engineers optimize the model serving stack to reduce latency and improve throughput while keeping enterprise deployments robust and reliable.
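To make the latency/throughput trade-off concrete, here is a minimal PyTorch sketch of the kind of measurement this work involves. The `benchmark` and `make_batch` names, the batch sizes, and the step counts are illustrative assumptions, not part of our actual stack.

```python
import time
import torch

@torch.inference_mode()
def benchmark(model, make_batch, batch_sizes=(1, 8, 32), steps=20):
    """Hypothetical helper: per-request latency vs. throughput by batch size."""
    results = {}
    for bs in batch_sizes:
        batch = make_batch(bs)
        for _ in range(3):  # warm-up: lazy init and kernel autotuning
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(steps):
            model(batch)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
        results[bs] = {
            "latency_s": elapsed / steps,             # time per forward pass
            "throughput_rps": bs * steps / elapsed,   # requests served per second
        }
    return results
```

Larger batches typically raise throughput at the cost of per-request latency, and tuning that trade-off per deployment is a core part of the job.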
Ideal Candidate Will Have:
- Experience deploying and optimizing LLMs for inference in both cloud and on-prem environments
- Proficiency with model optimization frameworks like ONNX, TensorRT, or TVM (a minimal export sketch appears after this list)
- Strong problem-solving skills for complex model performance and scaling issues
- Deep understanding of model inference trade-offs, including hardware constraints and real-time processing requirements
- Strong PyTorch skills and familiarity with infrastructure tools like Docker and Kubernetes for inference pipelines
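As a rough illustration of the export path mentioned above, the sketch below converts a stand-in PyTorch module to ONNX with a dynamic batch axis and sanity-checks it under ONNX Runtime. The tiny `torch.nn.Linear` model, the `model.onnx` file name, and the shapes are placeholder assumptions, not our production setup.

```python
import torch
import onnxruntime as ort

# Stand-in module; a real production model would replace this (assumption).
model = torch.nn.Linear(768, 768).eval()
example_input = torch.randn(1, 768)

# Export with a dynamic batch axis so the serving runtime can choose the
# batch size at deploy time rather than at export time.
torch.onnx.export(
    model,
    example_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=17,
)

# Sanity-check the exported graph with ONNX Runtime on CPU.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
(output,) = session.run(["output"], {"input": example_input.numpy()})
print(output.shape)  # (1, 768)
```

From there, a TensorRT or TVM build step would typically consume the same ONNX artifact for hardware-specific optimization.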