WHAT YOU DO AT AMD CHANGES EVERYTHING

At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming, and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity, and a shared passion to create something extraordinary. When you join AMD, you'll discover that the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges, striving for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.

THE ROLE:

The AMD AI Group (AIG) is seeking an experienced MTS/Senior Software Development Engineer to drive high-performance AI inference solutions on AMD Instinct GPUs. This role combines deep expertise in compiler technology, GPU kernel optimization, and modern deep learning frameworks to deliver production-grade inference performance across AMD's current and next-generation accelerator lineup, from the MI300X and MI350/MI355X shipping today to future GPU generations. You will work at the intersection of model optimization, kernel development, and serving infrastructure to ensure AMD GPUs deliver world-class inference throughput and latency.

AMD is looking for a specialized software engineer who is passionate about improving the performance of key applications and benchmarks. You will be a member of a core team of incredibly talented industry specialists and will work with the very latest hardware and software technology.

THE PERSON:

The ideal candidate is passionate about software engineering and has the leadership skills to drive sophisticated issues to resolution, communicating effectively and working well with teams across AMD.

KEY RESPONSIBILITIES:
- Design, optimize, and benchmark AI inference pipelines for large language models (LLMs), vision-language models (VLMs), and transformer architectures on AMD Instinct GPUs using ROCm, HIP, and MLIR.
- Lead compiler-level optimizations for inference workloads, including LLVM instruction scheduling, MLIR dialect development, and performance-critical optimization passes for AMD GPU targets.
- Develop and optimize high-performance GPU kernels (GEMM, attention mechanisms, custom operators) with deep attention to memory hierarchy, VGPR utilization, compute-communication overlap, and warp-level scheduling.
- Drive integration and performance optimization of AMD Instinct GPUs within inference serving frameworks such as vLLM, SGLang, and TorchServe, ensuring day-zero readiness for new GPU launches.
- Build forward-looking inference software for next-generation hardware: optimize for HBM4 memory hierarchies, new FP4/FP6 data types, and scale-up interconnects on MI450 and MI500 series GPUs.
- Architect graph neural network (GNN)-based QoR estimation models for compiler design space exploration and automated performance budgeting across GPU generations.
- Collaborate with silicon architecture teams to provide software-informed feedback on next-generation Instinct GPU designs, ensuring inference workload characteristics are reflected in hardware decisions.
- Develop quantization-aware training and post-training quantization pipelines to maximize model performance on AMD's evolving data type support (FP8, FP6, FP4).
- Contribute to AMD's core compute libraries (BLAS, HPC, Graph) with a focus on inference-critical primitives and cross-generational performance portability.
- Benchmark, profile, and resolve performance bottlenecks in distributed inference systems, including tensor parallelism scaling, RCCL communication patterns, and multi-GPU serving on Helios rack-scale infrastructure.
Required Experience:
- Experience in high-performance computing, AI inference, GPU kernel development, or hardware-software co-design.
- Strong proficiency in C++, Python, and CUDA/HIP, with hands-on experience writing and optimizing GPU kernels for AI workloads.
- Deep understanding of compiler infrastructure (LLVM, MLIR) and experience with compiler optimization passes targeting GPU architectures.
- Experience with deep learning frameworks (PyTorch, TensorFlow) and inference serving systems (vLLM, SGLang, TorchServe, or equivalent).
- Demonstrated ability to analyze and optimize GPU kernel performance: memory coalescing, occupancy tuning, register pressure management, and instruction-level optimization.
- Strong mathematical foundations in numerical computing, linear algebra, and optimization algorithms.
Preferred Experience:
- Experience with the AMD ROCm ecosystem, RCCL, and Instinct MI-series GPU architectures (MI300X, MI350X, MI355X).
- Track record of publications in top-tier venues (FPGA, ICCAD, DAC, NeurIPS, ICML, or equivalent).
- Experience building GNN-based models for EDA or compiler optimization problems.
- Familiarity with quantization techniques (weight-activation quantization, mixed-precision inference, FP4/FP6/FP8) for production deployment.
- Experience with high-performance systolic array and attention kernel design for GPU accelerators.
- Contributions to open-source HPC or AI libraries (BLAS, graph analytics, sparse solvers).
- Experience with distributed inference systems, multi-GPU serving at scale, and rack-level AI infrastructure.
- Background in algorithm-hardware co-design and performance modeling across multiple GPU generations.
Why AMD AI Group:
- Shape the inference stack for AMD's most ambitious GPU lineup ever, from the MI350 series shipping now to future new product introduction (NPI) hardware.
- Work across the full stack from silicon feedback to serving frameworks, with direct impact on products deployed by the world's leading AI companies.
- Be part of AMD's annual GPU launch cadence - every year brings a new architecture, new memory technology, and new performance frontiers to unlock.
- Collaborate with world-class hardware and software teams building the open ROCm ecosystem to challenge the AI accelerator status quo.
- Competitive compensation, comprehensive benefits, and a culture that values deep technical contribution and engineering excellence.
PREFERRED ACADEMIC CREDENTIALS:
- Ph.D. or M.S. in Electrical Engineering, Computer Engineering, Computer Science, or a related field.
This role is not eligible for visa sponsorship.

Benefits offered are described: AMD benefits at a glance.

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

AMD may use Artificial Intelligence to help screen, assess or select applicants for this position. AMD's "Responsible AI Policy" is available here.

This posting is for an existing vacancy.