Member of Technical Staff
- $250,000
- Palo Alto, CA
- Permanent
Shape the Future of Conversational AI
About Us
We are a public benefit corporation dedicated to harnessing advanced large language models to create an AI platform tailored for enterprise needs, with a particular focus on conversational AI. Our team is composed of friendly, innovative, and collaborative individuals committed to developing impactful AI solutions.
About the Role: Research Engineer (Inference)
As a key contributor to our effort to deploy high-performance models for enterprise applications, you will join our inference team, which ensures these models run efficiently and reliably in real-world settings. Research engineers optimize model inference, minimize latency, and maximize throughput while preserving model quality, enabling robust deployment for enterprise customers.
Key Responsibilities
- Deploy and optimize large language models (LLMs) for inference in both cloud and on-premises environments.
- Utilize model optimization and acceleration tools and frameworks, such as ONNX, TensorRT, or TVM.
- Tackle complex challenges related to model performance and scalability.
- Understand the trade-offs involved in model inference, including hardware limitations and real-time processing needs.
- Work proficiently in PyTorch and use infrastructure tools such as Docker and Kubernetes to deploy inference pipelines.
What We Are Looking For
If you have a strong background in deploying and optimizing LLMs, enjoy solving intricate problems, and deeply understand the challenges of model inference, we would love to hear from you! Join us in building impactful enterprise AI solutions that will shape the future.