<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Length Extrapolation on Answer</title>
    <link>https://answer.freetools.me/tags/%E9%95%BF%E5%BA%A6%E5%A4%96%E6%8E%A8/</link>
    <description>Recent content in Length Extrapolation on Answer</description>
    <generator>Hugo -- 0.152.2</generator>
    <language>en-us</language>
    <lastBuildDate>Thu, 12 Mar 2026 17:23:09 +0800</lastBuildDate>
    <atom:link href="https://answer.freetools.me/tags/%E9%95%BF%E5%BA%A6%E5%A4%96%E6%8E%A8/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Positional Encoding Extrapolation: Why Transformers Cannot Handle Sequences Longer Than Those Seen in Training</title>
      <link>https://answer.freetools.me/%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81%E5%A4%96%E6%8E%A8%E6%80%A7%E4%B8%BA%E4%BB%80%E4%B9%88transformer%E6%97%A0%E6%B3%95%E5%A4%84%E7%90%86%E6%AF%94%E8%AE%AD%E7%BB%83%E6%97%B6%E6%9B%B4%E9%95%BF%E7%9A%84%E5%BA%8F%E5%88%97/</link>
      <pubDate>Thu, 12 Mar 2026 17:23:09 +0800</pubDate>
      <guid>https://answer.freetools.me/%E4%BD%8D%E7%BD%AE%E7%BC%96%E7%A0%81%E5%A4%96%E6%8E%A8%E6%80%A7%E4%B8%BA%E4%BB%80%E4%B9%88transformer%E6%97%A0%E6%B3%95%E5%A4%84%E7%90%86%E6%AF%94%E8%AE%AD%E7%BB%83%E6%97%B6%E6%9B%B4%E9%95%BF%E7%9A%84%E5%BA%8F%E5%88%97/</guid>
      <description>Positional Encoding Extrapolation: Why Transformers Cannot Handle Sequences Longer Than Those Seen in Training</description>
    </item>
    <item>
      <title>How Relative Position Bias Reshapes Transformer Sequence Understanding: Seven Years of Technical Evolution from Shaw to ALiBi</title>
      <link>https://answer.freetools.me/%E7%9B%B8%E5%AF%B9%E4%BD%8D%E7%BD%AE%E5%81%8F%E7%BD%AE%E5%A6%82%E4%BD%95%E6%94%B9%E5%8F%98transformer%E7%9A%84%E5%BA%8F%E5%88%97%E7%90%86%E8%A7%A3%E8%83%BD%E5%8A%9B%E4%BB%8Eshaw%E5%88%B0alibi%E7%9A%84%E4%B8%83%E5%B9%B4%E6%8A%80%E6%9C%AF%E6%BC%94%E8%BF%9B/</link>
      <pubDate>Thu, 12 Mar 2026 05:34:41 +0800</pubDate>
      <guid>https://answer.freetools.me/%E7%9B%B8%E5%AF%B9%E4%BD%8D%E7%BD%AE%E5%81%8F%E7%BD%AE%E5%A6%82%E4%BD%95%E6%94%B9%E5%8F%98transformer%E7%9A%84%E5%BA%8F%E5%88%97%E7%90%86%E8%A7%A3%E8%83%BD%E5%8A%9B%E4%BB%8Eshaw%E5%88%B0alibi%E7%9A%84%E4%B8%83%E5%B9%B4%E6%8A%80%E6%9C%AF%E6%BC%94%E8%BF%9B/</guid>
      <description>An in-depth analysis of the principles and evolution of relative positional encoding in Transformers. From Shaw's pioneering 2018 paper to T5's bucketing strategy, ALiBi's linear bias, and Swin's 2D relative positional encoding, this article systematically explains why "distance matters more than coordinates" and how relative position information functions in attention computation. Covers mathematical formulas, implementation details, performance comparisons, and engineering trade-offs.</description>
    </item>
    <item>
      <title>The EOS Token: Why This Special Marker Determines Where Large Models Stop Talking</title>
      <link>https://answer.freetools.me/eos-token%E4%B8%BA%E4%BB%80%E4%B9%88%E8%BF%99%E4%B8%AA%E7%89%B9%E6%AE%8A%E6%A0%87%E8%AE%B0%E5%86%B3%E5%AE%9A%E4%BA%86%E5%A4%A7%E6%A8%A1%E5%9E%8B%E7%9A%84%E8%AF%B4%E8%AF%9D%E8%BE%B9%E7%95%8C/</link>
      <pubDate>Thu, 12 Mar 2026 04:29:51 +0800</pubDate>
      <guid>https://answer.freetools.me/eos-token%E4%B8%BA%E4%BB%80%E4%B9%88%E8%BF%99%E4%B8%AA%E7%89%B9%E6%AE%8A%E6%A0%87%E8%AE%B0%E5%86%B3%E5%AE%9A%E4%BA%86%E5%A4%A7%E6%A8%A1%E5%9E%8B%E7%9A%84%E8%AF%B4%E8%AF%9D%E8%BE%B9%E7%95%8C/</guid>
      <description>An in-depth look at how the EOS (End of Sequence) token works in large language models, covering its training mechanism, implementation differences across models, and cutting-edge Stanford research findings on EOS decisions and length extrapolation.</description>
    </item>
  </channel>
</rss>
