What We're Looking For
Are you a Machine Learning Engineer focused on building systems that power avatars and agents? ALETHIA AI is looking for a Generative AI / LLM Engineer to train, tune, and deploy models across vision, audio, and LLMs, integrating them directly into products.
As our Generative AI / LLM Engineer, you will ship pipelines, optimize latency, and support prompt tuning for LLM behaviors. We need a builder who writes clean Python, understands tradeoffs, and delivers reliable models that improve user experience, engagement, and growth.
Key Responsibilities
- Develop, fine-tune, and adapt LLMs or VLMs for conversational, multi-turn, and real-time avatar interactions, drawing on a strong grasp of how LLMs learn and behave, including in-context learning, prompt sensitivity, function calling, fine-tuning, and multi-turn reasoning patterns.
- Read research papers on LLM or VLM techniques such as personas, multi-turn reasoning, contextual memory, RL agent fine-tuning, and perception models, then implement prototypes that translate to production systems.
- Build context engineering systems including short- and long-term conversational memory, RAG pipelines with vector stores, and real-time data integration for grounded multi-turn conversations.
- Develop methods for verbal and non-verbal communication in avatars, including persona consistency, facial expressions, speech patterns, and real-time behavioral adaptation.
- Build perception pipelines integrating vision, audio, and language modalities for real-time avatar systems while coordinating with Voice and CV Engineers on multimodal interaction flows.
- Deploy LLMs and multimodal systems at scale by building APIs, inference endpoints, and serving infrastructure optimized for latency, throughput, GPU and CPU costs, and reliability for real-time avatar applications.
- Build and maintain pipelines for privacy-preserving insights systems across structured and unstructured datasets, including conversational corpora, avatar data, audio, and multimodal datasets.
- Build evaluation frameworks and monitoring systems to track reasoning quality, consistency, hallucination rates, persona alignment, memory fidelity, and drift while troubleshooting inference issues and iterating rapidly.
- Collaborate with product, design, and creative teams to translate conversational and avatar requirements into prompt pipelines, memory systems, and behavior controls while providing technical guidance on feasibility, capabilities, and tradeoffs.
- Apply prompting techniques for voice synthesis and image-to-video models to achieve natural prosody, pronunciation accuracy, and avatar generation.
Requirements
Technical Skills
- Bachelor's or Master's in CS, ML, AI, or related field, or equivalent hands-on experience.
- Backend Python skills (FastAPI, Flask, Django) for writing clean, scalable APIs and microservices that orchestrate LLM workflows.
- Hands-on with LLMs or VLMs and understanding of LLM agents, including multi-turn reasoning, function calling, tool use, conversational memory, and state management.
- Strong grasp of context engineering, including short- and long-term memory management, RAG pipelines, information retrieval, and vector stores.
- Experience building chat and dialogue agents, including conversation management, contextual memory, and multi-agent coordination.
- Ability to architect end-to-end data and prompt engineering pipelines that deliver a rich user experience aligned with product requirements.
- Experience with LLM serving frameworks such as vLLM or TensorRT-LLM and deployment on serverless GPU platforms such as Modal or RunPod.
- Strong debugging mindset for profiling inference bottlenecks, optimizing latency, and troubleshooting agent behavior.
Soft Skills
- Strong problem-solving with attention to detail.
- Clear communication to cross-functional teams.
- Collaborative mindset in fast-paced environments.
Nice to Have
- Experience with multimodal systems that combine text with vision, audio, or video in production settings.
- Familiarity with model evaluation harnesses, guardrails, or red-teaming workflows for LLM applications.
- Hands-on exposure to observability stacks for AI services, including tracing, latency monitoring, and cost analysis.
- Background in personalization, agent orchestration, or real-time interactive AI systems.
Remuneration
Competitive, with role-aligned incentives and growth opportunities.
Apply Now