[Parallel-R1 Code Implementation] sft-v1

The data-generation prompt template was changed to:

Solve the following problem step by step.
During the reasoning process, whenever you encounter a step that may benefit from multiple perspectives or independent reasoning, insert a parallel block at that point.

follow this format:
1. Start with <Parallel> to begin parallel reasoning
2. Generate multiple <Path> blocks, each with a different approach
3. Within each <Path>, you can generate Python code
   •  Python code package by: ```python  <your code> ```
   •  Code output  package by: ```output   <python output>```

4. After all paths, generate <Summary> to combine insights
5. Provide your final answer with the format: Final Answer: <answer>

Example:
<Parallel>
<Path>Approach 1: use Python```python  123+456 ``` ```output 579 ```
So the result is  579</Path>
<Path>Approach 2: direct calculation: 123+456=579</Path>
</Parallel>
<Summary>Both methods confirm the answer</Summary>
Final Answer: 579

Remind:
1. You can only use tools once in each path.
2. You can only use one time of parallel reasoning.
3. You can only use one time of tool call in each path.
4. You are encouraged to use tools in each path to solve the problem.
5. You can mock the Python tools response


Problem:{problem}
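The template's rules (a single <Parallel> block, several <Path> blocks, at most one Python call per path, a <Summary>, then "Final Answer:") can also be checked mechanically on the generated responses. The post doesn't say whether such a filter was applied; the following is only a minimal sketch, assuming regex-level checks are good enough. `check_parallel_format` is a hypothetical helper, not part of any library.

```python
import re

# Hypothetical format filter for responses generated with the template above.
# Each check mirrors one rule of the template; regex matching is an assumption.


def check_parallel_format(response: str) -> bool:
    """Return True if `response` roughly follows the Parallel/Path/Summary format."""
    # Exactly one <Parallel> ... </Parallel> block ("only one time of parallel reasoning").
    blocks = re.findall(r"<Parallel>(.*?)</Parallel>", response, flags=re.S)
    if len(blocks) != 1 or response.count("<Parallel>") != 1:
        return False
    # At least two <Path> blocks inside it ("multiple <Path> blocks").
    paths = re.findall(r"<Path>(.*?)</Path>", blocks[0], flags=re.S)
    if len(paths) < 2:
        return False
    # At most one ```python block per path ("only use tools once in each path").
    if any(p.count("```python") > 1 for p in paths):
        return False
    # After the parallel block: a <Summary>, then a "Final Answer:" line.
    tail = response.split("</Parallel>", 1)[1]
    if not re.search(r"<Summary>.*?</Summary>", tail, flags=re.S):
        return False
    return re.search(r"Final Answer:\s*\S+", tail) is not None
```

Applied to the template's own example, the function returns True; dropping the <Summary> or keeping only one <Path> makes it return False.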

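The SFT prompt template was changed to the following (the main difference from the data-generation template is that the line "You can mock the Python tools response" is dropped; the Remind list is also reduced to a single rule):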

Solve the following problem step by step.

During the reasoning process, whenever you encounter a step that may benefit from multiple perspectives or independent reasoning, insert a parallel block at that point.\n\nfollow this format:
1. Start with <Parallel> to begin parallel reasoning
2. Generate multiple <Path> blocks, each with a different approach
3. Within each <Path>, please integrate natural language reasoning with programs to solve the problem.
   •  Python code package by: ```python\n  <your code> \n```
   •  Code output  package by: ```output\n   <python output> \n```
4. After all paths, generate <Summary> to combine insights
5. Provide your final answer with the format: Final Answer: <answer>

Example:
<Parallel>
  <Path>Approach 1: use Python```python\n  123+456\n``` ```output\n 579\n```\nSo the result is  579</Path>
  <Path>Approach 2: direct calculation: 123+456=579</Path>
</Parallel>
<Summary>Both methods confirm the answer</Summary>
Final Answer: 579

Remind:
1. Do Not write code outside of the <Path> block.

Problem:

As before, GPT-4.1 is used to generate responses for the GSM8K dataset, and these responses serve as the SFT data.
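The post doesn't say whether incorrect GPT-4.1 responses are filtered out. If one wanted to keep only responses whose "Final Answer:" matches the GSM8K reference (GSM8K reference solutions end with a line of the form "#### <number>"), a minimal sketch could look like this. The helper names are hypothetical, and the normalisation (stripping "$", commas and a trailing period) is an assumption.

```python
import re
from typing import Optional


def gsm8k_gold(answer_field: str) -> str:
    """GSM8K reference solutions end with '#### <number>'; return that number."""
    return answer_field.split("####")[-1].strip().replace(",", "")


def extract_final_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Final Answer:' in a generated response."""
    matches = re.findall(r"Final Answer:\s*(.+)", response)
    if not matches:
        return None
    # Normalise common decorations: a trailing period, thousands separators, '$'.
    return matches[-1].strip().rstrip(".").replace(",", "").replace("$", "")


def is_correct(response: str, answer_field: str) -> bool:
    """Compare the model's final answer with the GSM8K reference answer."""
    pred, gold = extract_final_answer(response), gsm8k_gold(answer_field)
    if pred is None:
        return False
    try:
        return float(pred) == float(gold)
    except ValueError:
        # Non-numeric prediction: fall back to exact string comparison.
        return pred == gold
```

Comparing as floats lets "72.0" match a reference of "72"; anything non-numeric falls back to an exact string match.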

Changes to the SFTDataset code:

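For code output (wrapped in a ```output block), the loss is computed on the opening ```output tag, because the model generates that tag itself. The loss is not computed on the tool result between ```output and the closing ```, and it is not computed on the closing ``` tag of the tool call either, because that closing tag is inserted externally.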
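The SFTDataset code itself is not shown, so the following is only a sketch of the masking rule above under stated assumptions: the mask is first built per character of the response, then mapped onto tokens through character offsets such as those a Hugging Face fast tokenizer returns with `return_offsets_mapping=True`. Ignored positions get label -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`. `char_loss_mask` and `labels_from_offsets` are hypothetical names, and keeping a token if any of its characters is trainable is a design choice, not something the post specifies.

```python
import re
from typing import List, Sequence, Tuple

# group(1): the model-generated opening tag; group(2): the tool result;
# group(3): the externally inserted closing fence.
OUTPUT_BLOCK = re.compile(r"(```output)(.*?)(```)", re.S)


def char_loss_mask(response: str) -> List[int]:
    """Per-character mask for the response: 1 = compute loss, 0 = ignore."""
    mask = [1] * len(response)
    for m in OUTPUT_BLOCK.finditer(response):
        # The opening ```output tag stays trainable; the tool result and
        # the closing ``` are excluded from the loss.
        for i in range(m.start(2), m.end(3)):
            mask[i] = 0
    return mask


def labels_from_offsets(
    input_ids: Sequence[int],
    offsets: Sequence[Tuple[int, int]],
    mask: Sequence[int],
    ignore_index: int = -100,
) -> List[int]:
    """Turn a character mask into token labels using (start, end) offsets."""
    labels = []
    for tok, (start, end) in zip(input_ids, offsets):
        # Empty spans (e.g. special tokens) are ignored; a token is trained
        # if at least one of its characters is trainable.
        keep = end > start and any(mask[start:end])
        labels.append(tok if keep else ignore_index)
    return labels
```

With this rule, a token that straddles the opening tag and the first character of the tool result is still trained. Requiring every character to be trainable (`all` instead of `any`) would be the stricter alternative.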

Issues

Attention between different paths is masked out, so during actual generation the model has no way of knowing which path is valid and which is not.
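The post doesn't show how the path-isolating mask is built. As a minimal sketch, assume each token carries a segment id: 0 for shared tokens (the prompt, the <Parallel> tags, the <Summary>, the final answer) and k for tokens of the k-th <Path>. A causal mask that additionally blocks attention across paths could then look like this. `path_attention_mask` is a hypothetical helper, and plain nested lists stand in for a tensor.

```python
from typing import List


def path_attention_mask(segments: List[int]) -> List[List[bool]]:
    """mask[i][j] is True if token i may attend to token j.

    segments[t] == 0 marks shared tokens; segments[t] == k > 0 marks
    tokens that belong to the k-th <Path>.
    """
    n = len(segments)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only the current and earlier tokens
            si, sj = segments[i], segments[j]
            # Block attention between tokens of two different paths.
            mask[i][j] = not (si > 0 and sj > 0 and si != sj)
    return mask
```

Under such a mask no path token can attend to another path. Only the shared tokens after </Parallel>, such as the <Summary>, see all paths at once, so any comparison of the paths can only happen from the Summary onward.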

posted @ 2025-12-25 09:59  Brain404