MTS: Inference

Job ID: 4090995449
  • $180,000-$250,000
  • Palo Alto, CA
  • Permanent

Elevate AI Performance: Join Us as a Research Engineer in Model Inference!


What We're Building:

As we embark on a new phase of growth, our focus is on collaborating with commercial partners to adapt and fine-tune our state-of-the-art AI models for their unique business needs. With a strong track record in developing and deploying cutting-edge models in consumer-facing applications, we’re now channeling our expertise into optimizing these models for real-world performance. Join our team and be part of an innovative organization dedicated to pushing the boundaries of AI.


About the Role:

As a Member of Technical Staff (Research Engineer, Inference), you will be at the forefront of optimizing our advanced AI models for efficient, effective deployment in enterprise environments. Your work will involve tuning inference pipelines, reducing latency, and increasing throughput, all without compromising model quality. You'll play a critical role in ensuring that our AI solutions run smoothly and reliably in real-world scenarios.
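As a rough flavor of the day-to-day measurement work (not our actual stack), the minimal sketch below times per-call latency and throughput for a placeholder PyTorch model; the model, input shape, and iteration counts are assumptions made purely for the example.

```python
# Illustrative only: a minimal latency/throughput benchmark for a PyTorch model.
# The model, batch size, and feature size are placeholders, not details of any real deployment.
import statistics
import time

import torch


def benchmark(model: torch.nn.Module, example: torch.Tensor, warmup: int = 10, iters: int = 100):
    """Measure per-call latency (ms) and rough throughput (samples/s) under inference mode."""
    model.eval()
    times = []
    with torch.inference_mode():
        for _ in range(warmup):            # warm up kernels and caches before timing
            model(example)
        if example.is_cuda:
            torch.cuda.synchronize()       # flush any asynchronous GPU work
        for _ in range(iters):
            start = time.perf_counter()
            model(example)
            if example.is_cuda:
                torch.cuda.synchronize()   # include GPU execution time, not just launch time
            times.append(time.perf_counter() - start)
    return {
        "p50_ms": statistics.median(times) * 1e3,
        "p95_ms": statistics.quantiles(times, n=20)[18] * 1e3,
        "samples_per_s": example.shape[0] / statistics.mean(times),
    }


if __name__ == "__main__":
    # Placeholder model: a small MLP standing in for a real transformer.
    model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024))
    example = torch.randn(8, 1024)
    print(benchmark(model, example))
```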


This Role is Ideal for You If You:

  • Have hands-on experience deploying and optimizing large language models (LLMs) for inference in both cloud and on-premise environments.
  • Are proficient with model optimization tools and frameworks such as ONNX, TensorRT, or TVM.
  • Enjoy troubleshooting and resolving complex issues related to model performance and scalability.
  • Understand the trade-offs involved in model inference, including hardware constraints and real-time processing needs.
  • Are skilled in PyTorch and comfortable using Docker and Kubernetes to deploy and manage inference pipelines.

Why Join Us?

  • Autonomy & Impact: We believe in empowering individual contributors. Here, you’ll have the space and resources to lead projects, make impactful decisions, and see the direct results of your work.
  • Collaborative Culture: Our team thrives on teamwork, mutual respect, and a shared commitment to excellence. We encourage open dialogue, constructive challenges, and continuous learning.
  • Data-Driven Decisions: User feedback and performance metrics are at the core of our AI development process, guiding our priorities and ensuring we deliver the best solutions.
  • Work-Life Balance: We value your well-being and offer unlimited paid time off, flexible parental leave, and a supportive environment to help you recharge and maintain a healthy work-life balance.


Diversity & Inclusion:

We are dedicated to creating AI solutions that serve everyone. We welcome candidates from all backgrounds and are committed to building a diverse and inclusive team.

If you’re passionate about optimizing AI models for real-world applications and want to be part of a team that values innovation and collaboration, we’d love to hear from you! Apply today and showcase your best work, whether through open source contributions, personal projects, or a cover letter highlighting your proudest achievements.

Sarah Olivieri, Researcher