Connecting World's top Talents with Premier Jobs and Networking.

Agent Development Platform - Training Acceleration Engineer - Beijing


Job Source

Tencent Group

Location

China, Shanghai

Salary

Negotiable

Job Type

Full Time

Job Posted Date

20-06-2025

Job Description

1. Optimize the performance of large-scale language model (LLM) distributed training systems, including the engineering implementation and efficiency improvement of data parallelism, model parallelism, and pipeline parallelism strategies;
2. Design and optimize core modules of distributed training frameworks (e.g., Megatron-LM, Colossal-AI) based on NVIDIA/AMD GPU hardware characteristics (e.g., NVLink, InfiniBand interconnects, memory-bandwidth optimization);
3. Resolve GPU memory bottlenecks, communication latency, and compute load imbalance in large-model training; develop techniques such as efficient memory management, gradient compression, and mixed-precision training;
4. For specific scenarios (e.g., DeepSeek-series models), optimize customized training pipelines such as DualPipe to improve end-to-end training throughput;
5. Track the frontier of LLM training techniques (e.g., 3D parallelism, ZeRO optimization, dynamic compute scheduling) and drive iteration and innovation of the training framework.
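As background for the pipeline-parallelism and DualPipe items above, a minimal sketch of the standard GPipe-style pipeline "bubble" model (an illustrative textbook formula, not Tencent's internal scheduler): with p pipeline stages and m micro-batches per step, the fraction of the step the pipeline sits idle is (p - 1) / (m + p - 1), which is why micro-batch count and schedule restructuring matter for throughput.

```python
# Idle-time ("bubble") fraction of a simple GPipe-style pipeline.
# With p stages and m micro-batches: bubble = (p - 1) / (m + p - 1).
# Illustrative model only; real schedulers (e.g., DualPipe) differ.
def pipeline_bubble_fraction(stages: int, micro_batches: int) -> float:
    """Fraction of each training step the pipeline spends idle."""
    return (stages - 1) / (micro_batches + stages - 1)

if __name__ == "__main__":
    # More micro-batches shrink the bubble toward zero.
    for m in (1, 8, 64):
        print(f"4 stages, {m:>2} micro-batches: "
              f"bubble = {pipeline_bubble_fraction(4, m):.3f}")
```

With 4 stages and a single micro-batch, three quarters of the step is idle; at 64 micro-batches the bubble drops below 5%, which is the basic trade-off that custom schedules try to push further.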

Job Requirements

1. Familiar with NVIDIA CUDA / AMD ROCm programming, with GPU kernel optimization experience (e.g., PTX instruction tuning, memory-bandwidth optimization);
2. Proficient with distributed training frameworks such as Megatron-LM, DeepSpeed, or Colossal-AI, with hands-on experience parallel-training models at the hundred-billion-parameter scale;
3. Familiar with end-to-end optimization of the large-model training workflow (data loading, gradient accumulation, communication compression, etc.), able to locate performance bottlenecks with profiling tools;
4. Preferred qualifications: familiarity with asynchronous reinforcement learning training frameworks (e.g., VeRL, AReaL), experience with Agentic RL training optimization, or experience optimizing DeepSeek-series models (e.g., DualPipe scheduling, MLA attention optimization); contributions to related open-source projects are a plus.
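Requirement 3 above mentions gradient accumulation; as a quick illustration of the underlying identity (a toy 1-D least-squares example, not code from any framework named in this posting), accumulating per-micro-batch gradient contributions and normalizing once reproduces the full-batch mean gradient:

```python
# Toy check that gradient accumulation over micro-batches matches the
# full-batch gradient, for 1-D least squares: d/dw of (w*x - y)^2.
def grad(w, xs, ys):
    """Mean gradient of squared error over the full batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_size):
    """Sum micro-batch gradient contributions, normalize once at the end."""
    total = 0.0
    for i in range(0, len(xs), micro_size):
        mx, my = xs[i:i + micro_size], ys[i:i + micro_size]
        total += sum(2 * (w * x - y) * x for x, y in zip(mx, my))
    return total / len(xs)

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.1, 5.9, 8.2]
    print(abs(grad(0.5, xs, ys) - accumulated_grad(0.5, xs, ys, 2)) < 1e-9)
```

This equivalence is what lets training fit large effective batch sizes in limited GPU memory: only one micro-batch of activations is live at a time, while the optimizer step sees the full-batch gradient.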






