  • Ma Yi (马亿)

    Final Degree: Ph.D.

    Advisor Type:

  • Email:

    Phone: 0351-7010566

  • Research Interests: Reinforcement Learning, Embodied Intelligence

  • Personal Profile

Ma Yi, Ph.D., Lecturer. He studied under Prof. Hao Jianye and received his Ph.D. in 2024 from the Reinforcement Learning Lab of the College of Intelligence and Computing, Tianjin University. His research focuses on reinforcement learning, embodied intelligence, and applications of reinforcement learning. In recent years, he has published more than 20 papers at top international conferences in artificial intelligence and data mining, including NeurIPS, ICML, ICLR, AAAI, IJCAI, KDD, and CIKM, and has served as a reviewer for various international conferences and journals. His honors include a Second Prize of the Huawei 2012 Laboratories Innovation Pioneer Award, dual-track champion of the NeurIPS 2022 autonomous driving competition, a First Prize in the national wargame competition organized by the CMC Science and Technology Commission, the Tianjin University Outstanding Doctoral Dissertation award, and recognition as a NeurIPS Top Reviewer. His research results have been piloted and deployed in a range of real-world scenarios, including Alimama advertising bidding, Huawei logistics and transportation, AITO (问界) autonomous driving, and Alipay negotiation-based recommendation.

For more information, see: https://mayi1996.top/

  • Academic Publications

[1] Ma Y, Hao J, Liang H, Xiao C. Rethinking Decision Transformer via Hierarchical Reinforcement Learning. ICML. 2024. (CCF A conference)

[2] Ma Y, Tang H, Li D, Meng Z. Reining Generalization in Offline Reinforcement Learning via Representation Distinction. NeurIPS. 2023. (CCF A conference)

[3] Ma Y, Wang C, Chen C, Liu J, Meng Z, Zheng Y, Hao J. OSCAR: OOD State-Conservative Offline Reinforcement Learning for Sequential Decision Making. CAAI Artificial Intelligence Research. 2023.

[4] Ma Y, Hao X, Hao J, Lu J, Liu X, Tong X, Yuan M, Li Z, Tang J, Meng Z. A Hierarchical Reinforcement Learning Based Optimization Framework for Large-scale Dynamic Pickup and Delivery Problems. NeurIPS. 2021. (CCF A conference)

[5] Liang H (co-first author), Ma Y (co-first author), Cao Z, Liu T, Ni F, Li Z, Hao J. SplitNet: A Reinforcement Learning Based Sequence Splitting Method for the MinMax Multiple Travelling Salesman Problem. AAAI. 2023. (CCF A conference)

[6] Hao X (co-first author), Peng Z (co-first author), Ma Y (co-first author), Wang G, Jin J, Hao J, Chen S, Bai R, Xie M, Xu M, Zheng Z. Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising. ICML. 2020. (CCF A conference)

[7] Liu L (co-first author), Ma Y (co-first author), Zhu X, Yang Y, Hao X, Wang L, Peng J. Integrating Sequence and Network Information to Enhance Protein-Protein Interaction Prediction Using Graph Convolutional Networks. BIBM. 2019. (CCF B conference)

[8] Liu J, Hao J, Ma Y, Xia S. Imagine Big from Small: Unlock the Cognitive Generalization of Deep Reinforcement Learning from Simple Scenarios. ICML. 2024. (CCF A conference)

[9] Zhao K, Hao J, Ma Y, Liu J, Zheng Y, Meng Z. ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles. IJCAI. 2024. (CCF A conference)

[10] Yuan Y, Hao J, Ma Y, Dong Z, Liang H, Liu J, Feng Z, Zhao K, Zheng Y. Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback. ICLR. 2024. (CAAI & Tsinghua Class A conference)

[11] Liu J, Ma Y, Hao J, Hu Y, Zheng Y, Lv T, Fan C. A Trajectory Perspective on the Role of Data Sampling Techniques in Offline Reinforcement Learning. AAMAS. 2024. (CCF B conference)

[12] Liang H, Dong Z, Ma Y, Hao X, Zheng Y, Hao J. A Hierarchical Imitation Learning-based Decision Framework for Autonomous Driving. CIKM. 2023. (CCF B conference)

[13] Sang T, Tang H, Ma Y, Hao J, Zheng Y, Meng Z, Li B, Wang Z. PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations. IJCAI. 2022. (CCF A conference)

[14] Ni F, Hao J, Lu J, Tong X, Yuan M, Duan J, Ma Y, He K. A Multi-Graph Attributed Reinforcement Learning based Optimization Algorithm for Large-scale Hybrid Flow Shop Scheduling Problem. KDD. 2021. (CCF A conference)

[15] Wang H, Tang H, Hao J, Hao X, Fu Y, Ma Y. Large Scale Deep Reinforcement Learning in War-games. BIBM. 2020. (CCF B conference)

[16] Zhang P, Hao J, Wang W, Tang H, Ma Y, Duan Y, Zheng Y. KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human Suboptimal Knowledge. IJCAI. 2020. (CCF A conference)

  • Research Projects

[1] Key Technologies for General Game-Theoretic Intelligent Decision-Making, subtopic of a Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project, participant

[2] Research on Theories and Methods for Intelligent Decision-Making Driven by Both Data and Knowledge, cultivation project under a Major Research Plan of the National Natural Science Foundation of China, participant

[3] Research on Multi-XXX Cooperative Strategies Based on Multi-Agent Reinforcement Learning, subtopic of a National Defense Science and Technology Innovation Special Zone project, participant

[4] XXXXX Technology for Contingency Scenarios, project with an aerospace research institute, student lead

[5] Intelligent Online Policy Scheduling Technology for XXXXX, project with an aerospace research institute, student lead