[Pinned] Can Self-Play Preference Optimization (SPO) Reward the Model?
posted @ 2025-08-22 11:07 limingqi Views (176) Comments (0) Recommendations (0)