Bachelor’s/Master’s/PhD in Computer Science, AI, or a related field.
Experience:
3+ years in ML/AI roles, with 2+ years focused on RAG systems.
Proven experience deploying LLMs in on-prem or hybrid environments.
Proficiency with vLLM, LiteLLM, and open-source LLMs (e.g., Llama, DeepSeek, Mistral).
Experience introducing AI agents/assistants.
Technical Skills:
Strong Python expertise with frameworks like PyTorch, Hugging Face Transformers, and LangChain.
Experience with vector/graph databases (e.g., Neo4j).
Familiarity with Linux-based systems and Red Hat OpenShift.
Soft Skills:
Ability to communicate complex AI concepts to non-technical stakeholders.
Strong problem-solving skills and adaptability in fast-paced environments.
Your responsibilities
RAG System Development:
Architect and deploy end-to-end RAG pipelines, combining retrieval mechanisms (e.g., vector or graph databases such as Neo4j) with generative models (e.g., Llama) for enterprise use cases.
Fine-tune and optimize retrieval models to ensure high accuracy and low latency in on-prem environments.
Model Integration & Deployment:
Implement and customize inference servers using vLLM for efficient LLM serving and LiteLLM for lightweight model routing and orchestration.
Integrate open-source LLMs (e.g., Llama, Mistral) with proprietary data sources and APIs.
On-Prem Infrastructure Management:
Design GPU-optimized, scalable infrastructure for LLM training and inference, ensuring compliance with security and data governance policies.
Collaborate with DevOps teams to containerize workflows using Docker/Kubernetes and automate MLOps pipelines.
Performance Optimization:
Apply techniques such as quantization, pruning, and dynamic batching to maximize efficiency in resource-constrained on-prem setups.
Monitor system performance, troubleshoot bottlenecks, and ensure high availability.
Cross-Functional Collaboration:
Partner with data engineers to curate and preprocess domain-specific datasets for retrieval and generation tasks.
Translate business requirements into technical solutions for stakeholders in telco environments.