Research Engineer

GW03010525
  • $180,000-$250,000
  • Palo Alto
  • Permanent

Shape the Future of Conversational AI

About Us

We are a public benefit corporation dedicated to harnessing advanced large language models to create an AI platform tailored for enterprise needs, with a particular focus on conversational AI. Our team is composed of friendly, innovative, and collaborative individuals committed to developing impactful AI solutions.

About the Role: Research Engineer (Inference)

As a member of our inference team, you will play a key role in our commitment to deploying high-performance models for enterprise applications, ensuring that these models operate efficiently and effectively in real-world scenarios. You will optimize model inference processes, minimize latency, and maximize throughput while preserving model quality, all to ensure robust deployment in enterprise settings.

Key Responsibilities

  • Deploy and optimize large language models (LLMs) for inference in both cloud and on-premises environments.
  • Utilize model optimization and acceleration tools and frameworks, such as ONNX, TensorRT, or TVM.
  • Tackle complex challenges related to model performance and scalability.
  • Evaluate the trade-offs involved in model inference, including hardware limitations and real-time processing requirements.
  • Apply proficiency in PyTorch, using infrastructure tools such as Docker and Kubernetes to deploy inference pipelines.

What We Are Looking For

If you have a strong background in deploying and optimizing LLMs, enjoy solving intricate problems, and have a deep understanding of model inference challenges, we would love to hear from you! Join us in building impactful enterprise AI solutions that will shape the future.

Victor Pascoe, ML Research & Engineering Recruiter

Apply for this role