<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Cosine Decay on Answer</title>
    <link>https://answer.freetools.me/tags/cosine-decay/</link>
    <description>Recent content in Cosine Decay on Answer</description>
    <generator>Hugo -- 0.152.2</generator>
    <language>en</language>
    <lastBuildDate>Wed, 11 Mar 2026 15:59:24 +0800</lastBuildDate>
    <atom:link href="https://answer.freetools.me/tags/cosine-decay/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Learning Rate Scheduling in Large Model Training: The Technical Evolution from Linear Warmup to the WSD Strategy</title>
      <link>https://answer.freetools.me/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E8%AE%AD%E7%BB%83%E4%B8%AD%E7%9A%84%E5%AD%A6%E4%B9%A0%E7%8E%87%E8%B0%83%E5%BA%A6%E4%BB%8E%E7%BA%BF%E6%80%A7%E9%A2%84%E7%83%AD%E5%88%B0wsd%E7%AD%96%E7%95%A5%E7%9A%84%E6%8A%80%E6%9C%AF%E6%BC%94%E8%BF%9B/</link>
      <pubDate>Wed, 11 Mar 2026 15:59:24 +0800</pubDate>
      <guid>https://answer.freetools.me/%E5%A4%A7%E6%A8%A1%E5%9E%8B%E8%AE%AD%E7%BB%83%E4%B8%AD%E7%9A%84%E5%AD%A6%E4%B9%A0%E7%8E%87%E8%B0%83%E5%BA%A6%E4%BB%8E%E7%BA%BF%E6%80%A7%E9%A2%84%E7%83%AD%E5%88%B0wsd%E7%AD%96%E7%95%A5%E7%9A%84%E6%8A%80%E6%9C%AF%E6%BC%94%E8%BF%9B/</guid>
      <description>An in-depth analysis of the technical principles and evolution of learning rate scheduling strategies in large language model training, from the Warmup mechanism through Cosine Decay to the WSD strategy, revealing why mainstream models such as GPT and LLaMA adopted their particular learning rate configurations, and the deeper differences between strategies in training dynamics, loss landscape, and convergence.</description>
    </item>
  </channel>
</rss>
