<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Data Quality on Answer</title>
    <link>https://answer.freetools.me/tags/%E6%95%B0%E6%8D%AE%E8%B4%A8%E9%87%8F/</link>
    <description>Recent content in Data Quality on Answer</description>
    <generator>Hugo -- 0.152.2</generator>
    <language>zh-cn</language>
    <lastBuildDate>Thu, 12 Mar 2026 02:47:11 +0800</lastBuildDate>
    <atom:link href="https://answer.freetools.me/tags/%E6%95%B0%E6%8D%AE%E8%B4%A8%E9%87%8F/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>How Pretraining Data Sets the Ceiling for Large Models: A Complete Walkthrough from Data Quality to Cleaning Pipelines</title>
      <link>https://answer.freetools.me/%E9%A2%84%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE%E5%A6%82%E4%BD%95%E5%86%B3%E5%AE%9A%E5%A4%A7%E6%A8%A1%E5%9E%8B%E7%9A%84%E4%B8%8A%E9%99%90%E4%BB%8E%E6%95%B0%E6%8D%AE%E8%B4%A8%E9%87%8F%E5%88%B0%E6%B8%85%E6%B4%97%E6%B5%81%E7%A8%8B%E7%9A%84%E5%AE%8C%E6%95%B4%E8%A7%A3%E6%9E%90/</link>
      <pubDate>Thu, 12 Mar 2026 02:47:11 +0800</pubDate>
      <guid>https://answer.freetools.me/%E9%A2%84%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE%E5%A6%82%E4%BD%95%E5%86%B3%E5%AE%9A%E5%A4%A7%E6%A8%A1%E5%9E%8B%E7%9A%84%E4%B8%8A%E9%99%90%E4%BB%8E%E6%95%B0%E6%8D%AE%E8%B4%A8%E9%87%8F%E5%88%B0%E6%B8%85%E6%B4%97%E6%B5%81%E7%A8%8B%E7%9A%84%E5%AE%8C%E6%95%B4%E8%A7%A3%E6%9E%90/</guid>
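      <description>An in-depth look at the full pipeline for processing large-model pretraining data: how raw Common Crawl web pages become a high-quality training corpus. Covers core techniques including URL filtering, exact, near-duplicate, and semantic deduplication, heuristic and model-based quality filtering, PII removal, and data mixing strategies, along with the quantified impact of data quality on model performance.</description>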
    </item>
    <item>
      <title>A Secret Leaked by Logarithm Tables: Why the Digit 1 Always Wins</title>
      <link>https://answer.freetools.me/%E4%B8%80%E4%B8%AA%E8%A2%AB%E5%AF%B9%E6%95%B0%E8%A1%A8%E6%B3%84%E9%9C%B2%E7%9A%84%E7%A7%98%E5%AF%86%E4%B8%BA%E4%BB%80%E4%B9%88%E6%95%B0%E5%AD%971%E6%80%BB%E6%98%AF%E8%B5%A2%E5%AE%B6/</link>
      <pubDate>Sat, 07 Mar 2026 22:25:51 +0800</pubDate>
      <guid>https://answer.freetools.me/%E4%B8%80%E4%B8%AA%E8%A2%AB%E5%AF%B9%E6%95%B0%E8%A1%A8%E6%B3%84%E9%9C%B2%E7%9A%84%E7%A7%98%E5%AF%86%E4%B8%BA%E4%BB%80%E4%B9%88%E6%95%B0%E5%AD%971%E6%80%BB%E6%98%AF%E8%B5%A2%E5%AE%B6/</guid>
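      <description>An in-depth look at the mathematical principles and wide-ranging applications of Benford's law. Traces its history from Simon Newcomb's discovery of the wear pattern in logarithm tables, to Frank Benford's verification with 20,229 data points, to Mark Nigrini's application of the law to fraud detection. Systematically covers the scale-invariance proof and the log-uniform distribution principle, as well as the law's practical applications and limitations in accounting fraud detection, election data analysis, macroeconomic data verification, COVID-19 data quality assessment, AI-generated image detection, and other fields.</description>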
    </item>
  </channel>
</rss>
