<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="rss.xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>createcentury Blog</title>
        <link>https://createcentury.github.io/blog</link>
        <description>createcentury Blog</description>
        <lastBuildDate>Sat, 16 May 2026 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[#4 mamba-metal: Running Mamba on Apple Silicon]]></title>
            <link>https://createcentury.github.io/blog/4</link>
            <guid>https://createcentury.github.io/blog/4</guid>
            <pubDate>Sat, 16 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[The selective scan in Mamba (state-spaces/mamba) is written on the assumption of CUDA kernels and does not run as-is on Apple Silicon. I rewrote it in Metal Shading Language (MSL) and built mamba-metal, a project that loads HuggingFace weights directly and runs inference end to end. This post is a set of notes on its design and validation results.]]></description>
            <content:encoded><![CDATA[<p>The selective scan in Mamba (<a href="https://github.com/state-spaces/mamba" target="_blank" rel="noopener noreferrer" class="">state-spaces/mamba</a>) is written on the assumption of CUDA kernels and does not run as-is on Apple Silicon. I rewrote it in Metal Shading Language (MSL) and built <a href="https://github.com/createcentury/mamba-metal" target="_blank" rel="noopener noreferrer" class="">mamba-metal</a>, a project that loads HuggingFace weights directly and runs inference end to end. This post is a set of notes on its design and validation results.</p>
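<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="動機">Motivation<a href="https://createcentury.github.io/blog/4#%E5%8B%95%E6%A9%9F" class="hash-link" aria-label="Direct link to Motivation" title="Direct link to Motivation" translate="no">​</a></h2>
<p>The core of the official Mamba implementation is the CUDA kernel in <code>csrc/selective_scan/selective_scan_fwd_kernel.cuh</code>. It is the source of Mamba's speed, and the part that turns Mamba from "theory on paper" into "an architecture that runs on real hardware". Because it targets NVIDIA GPUs only, it does not run on Apple Silicon as-is.</p>
<p>A pure-PyTorch reference implementation (<code>selective_scan_ref</code>) also exists, but it evaluates the recurrence naively in a for loop and is impractical for long sequences. It lacks the parallelization that is essential to Mamba: the prefix scan.</p>
<p>So I decided to write an equivalent kernel in MSL. It is JIT-compiled and dispatched through MLX's <code>mx.fast.metal_kernel</code>, and the <code>.metal</code> files are kept as first-class assets.</p>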
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="selective-scan-の本質">The essence of selective scan<a href="https://createcentury.github.io/blog/4#selective-scan-%E3%81%AE%E6%9C%AC%E8%B3%AA" class="hash-link" aria-label="Direct link to The essence of selective scan" title="Direct link to The essence of selective scan" translate="no">​</a></h2>
<p>Mamba's hidden state evolves as</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>h</mi><mi>t</mi></msub><mo>=</mo><msub><mover accent="true"><mi>A</mi><mo>ˉ</mo></mover><mi>t</mi></msub><mtext> </mtext><msub><mi>h</mi><mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>+</mo><msub><mover accent="true"><mi>B</mi><mo>ˉ</mo></mover><mi>t</mi></msub><mtext> </mtext><msub><mi>u</mi><mi>t</mi></msub><mo separator="true">,</mo><mspace width="2em"></mspace><msub><mi>y</mi><mi>t</mi></msub><mo>=</mo><msubsup><mi>C</mi><mi>t</mi><mi mathvariant="normal">⊤</mi></msubsup><msub><mi>h</mi><mi>t</mi></msub></mrow><annotation encoding="application/x-tex">h_t = \bar{A}_t\, h_{t-1} + \bar{B}_t\, u_t,\qquad y_t = C_t^\top h_t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0284em;vertical-align:-0.2083em"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8201em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord 
mathnormal">A</span></span><span style="top:-3.2523em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.1111em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">t</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1.0145em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8201em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathnormal" style="margin-right:0.0502em">B</span></span><span style="top:-3.2523em"><span class="pstrut" 
style="height:3em"></span><span class="accent-body" style="left:-0.1667em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.0502em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">u</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:2em"></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" 
style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.1461em;vertical-align:-0.247em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0715em">C</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991em"><span style="top:-2.453em;margin-left:-0.0715em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.247em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span></span>
<p>This is a recurrence whose coefficients depend on the input. Here <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mover accent="true"><mi>A</mi><mo>ˉ</mo></mover><mi>t</mi></msub><mo>=</mo><mi>exp</mi><mo>⁡</mo><mo stretchy="false">(</mo><msub><mi mathvariant="normal">Δ</mi><mi>t</mi></msub><mi>A</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\bar{A}_t = \exp(\Delta_t A)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9701em;vertical-align:-0.15em"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8201em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathnormal">A</span></span><span style="top:-3.2523em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.1111em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mop">exp</span><span class="mopen">(</span><span class="mord"><span class="mord">Δ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord mathnormal">A</span><span class="mclose">)</span></span></span></span>, and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi mathvariant="normal">Δ</mi><mi>t</mi></msub><mo separator="true">,</mo><msub><mi>B</mi><mi>t</mi></msub><mo separator="true">,</mo><msub><mi>C</mi><mi>t</mi></msub></mrow><annotation encoding="application/x-tex">\Delta_t, B_t, C_t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord">Δ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0502em">B</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.0502em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 
size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0715em">C</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.0715em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> are computed from the input <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>x</mi><mi>t</mi></msub></mrow><annotation encoding="application/x-tex">x_t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> (hence "selective": the gates open and close depending on the input).</p>
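<p>As a point of reference, the recurrence can be evaluated one step at a time, which is roughly what a naive for-loop implementation does. The sketch below is a minimal illustration in plain Python for a single channel, not the code of any particular library. The function name <code>selective_scan_sequential</code> and the argument layout are hypothetical, and the choice <code>b_bar = delta * B</code> is an assumption about the discretization that the text above does not spell out.</p>

```python
import math

def selective_scan_sequential(u, delta, A, B, C):
    """Evaluate h_t = A_bar_t * h_{t-1} + B_bar_t * u_t and y_t = C_t . h_t step by step.

    u, delta: lists of length T (one scalar per time step)
    A:        list of length N (diagonal state matrix, one entry per SSM state)
    B, C:     lists of length T, each entry a list of length N
    Assumption: B_bar_t = delta_t * B_t (not stated in the post itself).
    """
    n_state = len(A)
    h = [0.0] * n_state          # hidden state, starts at zero
    ys = []
    for t in range(len(u)):      # strictly sequential over time
        for s in range(n_state):
            a_bar = math.exp(delta[t] * A[s])
            b_bar = delta[t] * B[t][s]
            h[s] = a_bar * h[s] + b_bar * u[t]
        ys.append(sum(C[t][s] * h[s] for s in range(n_state)))
    return ys

# Two time steps, one state, A = 0 so that A_bar = 1 and h simply accumulates u.
ys = selective_scan_sequential([1.0, 2.0], [1.0, 1.0], [0.0],
                               [[1.0], [1.0]], [[1.0], [1.0]])
```

<p>Every step depends on the previous value of <code>h</code>, so the loop over <code>t</code> cannot be parallelized in this form.</p>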
<p>The recurrence <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>h</mi><mi>t</mi></msub><mo>=</mo><msub><mi>a</mi><mi>t</mi></msub><msub><mi>h</mi><mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>+</mo><msub><mi>b</mi><mi>t</mi></msub></mrow><annotation encoding="application/x-tex">h_t = a_t h_{t-1} + b_t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.9028em;vertical-align:-0.2083em"></span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t 
vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">t</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> can, taken naively, only be solved sequentially. Now consider the following operation on pairs <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">(</mo><mi>a</mi><mo separator="true">,</mo><mi>b</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">(a, b)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord mathnormal">a</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord 
mathnormal">b</span><span class="mclose">)</span></span></span></span>:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mo stretchy="false">(</mo><msub><mi>a</mi><mn>2</mn></msub><mo separator="true">,</mo><msub><mi>b</mi><mn>2</mn></msub><mo stretchy="false">)</mo><mo>∘</mo><mo stretchy="false">(</mo><msub><mi>a</mi><mn>1</mn></msub><mo separator="true">,</mo><msub><mi>b</mi><mn>1</mn></msub><mo stretchy="false">)</mo><mo>=</mo><mo stretchy="false">(</mo><msub><mi>a</mi><mn>2</mn></msub><msub><mi>a</mi><mn>1</mn></msub><mo separator="true">,</mo><mtext>&nbsp;</mtext><msub><mi>a</mi><mn>2</mn></msub><msub><mi>b</mi><mn>1</mn></msub><mo>+</mo><msub><mi>b</mi><mn>2</mn></msub><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">(a_2, b_2) \circ (a_1, b_1) = (a_2 a_1,\ a_2 b_1 + b_2)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 
mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">∘</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord"><span class="mord 
mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace">&nbsp;</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span 
class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span>
<p>This operation is <strong>associative</strong>. A prefix scan can therefore reduce the recurrence to <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mo stretchy="false">(</mo><mi>log</mi><mo>⁡</mo><mi>T</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">O(\log T)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0278em">O</span><span class="mopen">(</span><span class="mop">lo<span style="margin-right:0.0139em">g</span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.1389em">T</span><span class="mclose">)</span></span></span></span> parallel steps (Blelloch 1990 / Martin &amp; Cundy 2017). This is essentially what the Mamba kernel does as well.</p>
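<p>The associativity claim is easy to check numerically. The following is a small sketch in plain Python. The names <code>combine</code> and <code>fold_prefix</code> are made up for this illustration, and the sample values are powers of two chosen so that the floating-point arithmetic is exact.</p>

```python
def combine(later, earlier):
    """(a2, b2) o (a1, b1) = (a2*a1, a2*b1 + b2): apply `earlier` first, then `later`."""
    a2, b2 = later
    a1, b1 = earlier
    return (a2 * a1, a2 * b1 + b2)

def fold_prefix(pairs):
    """Compose pairs[0..t] for every t. With h_{-1} = 0 the b component is h_t."""
    out = []
    acc = (1.0, 0.0)             # identity element of the operator
    for p in pairs:
        acc = combine(p, acc)
        out.append(acc)
    return out

x, y, z = (0.5, 1.0), (0.25, 2.0), (2.0, -1.0)
left = combine(combine(z, y), x)    # (z o y) o x
right = combine(z, combine(y, x))   # z o (y o x)
prefixes = fold_prefix([x, y, z])
```

<p>Both groupings give the same pair, and the <code>b</code> components of <code>prefixes</code> are exactly the values of <code>h</code> that the sequential recurrence would produce. Because the grouping does not matter, the compositions can be arranged as a tree instead of a chain.</p>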
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="msl-での核心">The core in MSL<a href="https://createcentury.github.io/blog/4#msl-%E3%81%A7%E3%81%AE%E6%A0%B8%E5%BF%83" class="hash-link" aria-label="Direct link to The core in MSL" title="Direct link to The core in MSL" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="simd-group-プリミティブ">SIMD-group primitives<a href="https://createcentury.github.io/blog/4#simd-group-%E3%83%97%E3%83%AA%E3%83%9F%E3%83%86%E3%82%A3%E3%83%96" class="hash-link" aria-label="Direct link to SIMD-group primitives" title="Direct link to SIMD-group primitives" translate="no">​</a></h3>
<p>Metal has the <strong>SIMD-group</strong> (32 threads), the counterpart of a CUDA warp, along with built-in functions such as <code>simd_prefix_inclusive_sum</code> and <code>simd_shuffle_up</code>. However, the built-in prefix operations only cover fixed operators (sum, product) on plain numeric values, not a user-defined operator on pairs. The combine operation on <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">(</mo><mi>a</mi><mo separator="true">,</mo><mi>b</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">(a, b)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord mathnormal">a</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal">b</span><span class="mclose">)</span></span></span></span> pairs therefore has to be written by hand:</p>
<div class="language-metal codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-metal codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">for (uint d = 1u; d &lt; 32u; d &lt;&lt;= 1) {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    float a_prev = simd_shuffle_up(a, d);</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    float b_prev = simd_shuffle_up(b, d);</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    if (lane &gt;= d) {</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        b = a * b_prev + b;   // order matters: update b first (it uses the old a)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        a = a * a_prev;</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    }</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">}</span><br></div></code></pre></div></div>
<p>This completes an inclusive scan within a 32-lane SIMD-group.</p>
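<p>To see why five shuffle steps are enough, the loop can be simulated on the CPU. The sketch below models the 32 lanes as Python lists. The Python helper <code>simd_shuffle_up</code> is only a stand-in for the Metal built-in of the same name: lanes below <code>d</code> keep their own value in this model, and the scan never uses them because of the <code>lane &gt;= d</code> guard. The name <code>simd_inclusive_scan</code> is hypothetical.</p>

```python
def simd_shuffle_up(values, d):
    """Model of shuffle-up: lane i reads the value held by lane i - d.
    Lanes below d keep their own value in this model."""
    return [values[i - d] if i >= d else values[i] for i in range(len(values))]

def simd_inclusive_scan(a, b, width=32):
    """Inclusive scan of (a, b) pairs across one simulated SIMD-group."""
    a, b = list(a), list(b)
    d = 1
    while width > d:                       # d = 1, 2, 4, 8, 16
        a_prev = simd_shuffle_up(a, d)     # snapshots taken before any lane updates,
        b_prev = simd_shuffle_up(b, d)     # which models lockstep execution
        for lane in range(d, width):       # same effect as `if (lane >= d)`
            b[lane] = a[lane] * b_prev[lane] + b[lane]   # uses the old a[lane]
            a[lane] = a[lane] * a_prev[lane]
        d *= 2
    return a, b

a_in = [0.5] * 32
b_in = [float(i % 3) for i in range(32)]
a_out, b_out = simd_inclusive_scan(a_in, b_in)
```

<p>After the step with offset <code>d</code>, each lane holds the composition of up to <code>2 * d</code> consecutive elements ending at itself, so after the step with <code>d = 16</code> every lane covers its full prefix. <code>b_out[i]</code> then matches the value of <code>h</code> that the sequential recurrence reaches at position <code>i</code>.</p>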
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="block-level-scantwo-tier">Block-level scan (two-tier)<a href="https://createcentury.github.io/blog/4#block-level-scantwo-tier" class="hash-link" aria-label="Direct link to Block-level scan (two-tier)" title="Direct link to Block-level scan (two-tier)" translate="no">​</a></h3>
<p>To scan across an entire threadgroup of 1024 threads (= 32 SIMD-groups × 32 lanes), each SIMD-group writes its total to threadgroup memory, the first SIMD-group then scans those totals, and each thread combines in its carry. This two-tier structure is the MSL counterpart of CUB's <code>BlockScan + WARP_SCANS</code> strategy.</p>
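<p>The two tiers can be sketched on the CPU as follows. This is an illustration of the data flow only, not the kernel itself: <code>inclusive_scan</code> is a sequential stand-in for the SIMD-group scan, the list <code>totals</code> plays the role of threadgroup memory, and all function names are hypothetical. The sketch assumes a non-empty input.</p>

```python
def combine(later, earlier):
    """(a2, b2) o (a1, b1) = (a2*a1, a2*b1 + b2)."""
    a2, b2 = later
    a1, b1 = earlier
    return (a2 * a1, a2 * b1 + b2)

def inclusive_scan(pairs):
    """Sequential stand-in for the scan performed inside one SIMD-group."""
    out, acc = [], (1.0, 0.0)
    for p in pairs:
        acc = combine(p, acc)
        out.append(acc)
    return out

def block_scan(pairs, simd_width=32):
    # Tier 1: every SIMD-group scans its own lanes independently.
    groups = [inclusive_scan(pairs[i:i + simd_width])
              for i in range(0, len(pairs), simd_width)]
    # The last lane of each group holds that group's total.
    totals = [g[-1] for g in groups]
    # Tier 2: a single SIMD-group scans the per-group totals.
    scanned_totals = inclusive_scan(totals)
    # Group 0 is already final; group g combines with the carry of groups 0..g-1.
    out = list(groups[0])
    for g in range(1, len(groups)):
        carry = scanned_totals[g - 1]
        out.extend(combine(p, carry) for p in groups[g])
    return out

pairs = [(0.9 + 0.001 * (i % 7), float(i % 5) - 2.0) for i in range(1024)]
result = block_scan(pairs)
```

<p>The carry is always the earlier operand of <code>combine</code>, since it represents everything that happened before the group's first lane.</p>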
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="チャンク間-running-prefix">Running prefix across chunks<a href="https://createcentury.github.io/blog/4#%E3%83%81%E3%83%A3%E3%83%B3%E3%82%AF%E9%96%93-running-prefix" class="hash-link" aria-label="Direct link to Running prefix across chunks" title="Direct link to Running prefix across chunks" translate="no">​</a></h3>
<p>When <code>seqlen &gt; 1024</code>, the <code>smem_running_prefix</code> approach is used: for each SSM state <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>s</mi></mrow><annotation encoding="application/x-tex">s</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">s</span></span></span></span>, a pair <code>(carry_a[s], carry_b[s])</code> is carried over from one chunk to the next. At the start of a new chunk, the accumulated result of the previous chunks is combined "from the left" (that is, as the earlier operand) before the scan runs:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mo stretchy="false">(</mo><mi>a</mi><mo separator="true">,</mo><mi>b</mi><msub><mo stretchy="false">)</mo><mtext>new</mtext></msub><mo>=</mo><mo stretchy="false">(</mo><msub><mi>a</mi><mtext>local</mtext></msub><mo separator="true">,</mo><msub><mi>b</mi><mtext>local</mtext></msub><mo stretchy="false">)</mo><mo>∘</mo><mo stretchy="false">(</mo><msub><mtext>carry</mtext><mi>a</mi></msub><mo separator="true">,</mo><msub><mtext>carry</mtext><mi>b</mi></msub><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">(a, b)_\text{new} = (a_\text{local}, b_\text{local}) \circ (\text{carry}_a, \text{carry}_b)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord mathnormal">a</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal">b</span><span class="mclose"><span class="mclose">)</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord text mtight"><span class="mord mtight">new</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span 
class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord text mtight"><span class="mord mtight">local</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3361em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord text mtight"><span class="mord mtight">local</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">∘</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord"><span class="mord text"><span class="mord">carry</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.0573em"><span style="top:-2.4559em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">a</span></span></span></span><span class="vlist-s">​</span></span><span 
class="vlist-r"><span class="vlist" style="height:0.2441em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord text"><span class="mord">carry</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.242em"><span style="top:-2.4559em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">b</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2441em"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span>
<p>This lets a sequence of arbitrary length be processed in a single kernel launch.</p>
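<p>A minimal CPU sketch of the chunked scheme, assuming the ∘ in the formula above reads as function composition (the carry is applied first, then the local step). The names <code>chunked_scan</code> and <code>inclusive_scan</code> are hypothetical, and the inner scan is a sequential stand-in for the block-level scan.</p>

```python
def combine(prev, cur):
    """Compose two affine steps h -> a*h + b: apply prev first, then cur."""
    a_prev, b_prev = prev
    a, b = cur
    return (a * a_prev, a * b_prev + b)

def inclusive_scan(pairs):
    """Sequential stand-in for the block-level scan of one chunk."""
    out, acc = [], (1.0, 0.0)
    for p in pairs:
        acc = combine(acc, p)
        out.append(acc)
    return out

def chunked_scan(pairs, chunk=1024):
    """Scan a long sequence chunk by chunk, carrying (carry_a, carry_b)."""
    carry = (1.0, 0.0)                        # identity step: h -> 1*h + 0
    hidden = []
    for start in range(0, len(pairs), chunk):
        block = list(pairs[start:start + chunk])
        block[0] = combine(carry, block[0])   # fold the carry into the chunk head
        scanned = inclusive_scan(block)
        hidden.extend(b for _, b in scanned)  # b component is the hidden state
        carry = scanned[-1]                   # running prefix for the next chunk
    return hidden

pairs = [(0.99, 0.01 * (i % 11)) for i in range(2500)]
hidden = chunked_scan(pairs, chunk=1024)
```

<p>Folding the carry into the first element means the scan itself stays unchanged; only one extra combine per chunk and per state is needed.</p>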
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="観察tg-memory-は単純な再利用には効かない">Observation: threadgroup memory does not help with plain data reuse<a href="https://createcentury.github.io/blog/4#%E8%A6%B3%E5%AF%9Ftg-memory-%E3%81%AF%E5%8D%98%E7%B4%94%E3%81%AA%E5%86%8D%E5%88%A9%E7%94%A8%E3%81%AB%E3%81%AF%E5%8A%B9%E3%81%8B%E3%81%AA%E3%81%84" class="hash-link" aria-label="Direct link to 観察：tg memory は単純な再利用には効かない" title="Direct link to 観察：tg memory は単純な再利用には効かない" translate="no">​</a></h3>
<p>The naive pattern of "stage the data in threadgroup memory and read it back K times" showed almost no difference from reading global memory directly, because Apple Silicon's System Level Cache (shared between CPU and GPU) silently absorbs the repeated reads. The measured conclusion: threadgroup memory is truly needed for <strong>inter-thread communication</strong> (exchanging intermediate scan values, holding the running prefix), not as a substitute data cache.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="カーネルから推論まで">From kernel to inference<a href="https://createcentury.github.io/blog/4#%E3%82%AB%E3%83%BC%E3%83%8D%E3%83%AB%E3%81%8B%E3%82%89%E6%8E%A8%E8%AB%96%E3%81%BE%E3%81%A7" class="hash-link" aria-label="Direct link to カーネルから推論まで" title="Direct link to カーネルから推論まで" translate="no">​</a></h2>
<p>Once the kernel works, the Python model layers are stacked on top of it.</p>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">selective_scan (Metal kernel)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">MambaBlock           = in_proj → conv1d → SiLU → x_proj/dt_proj → SSM → out_proj</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">MambaResidualBlock   = pre-norm RMSNorm + MambaBlock + residual</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">MambaModel           = embeddings → N × ResidualBlock → norm_f → tied LM head</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  ↓</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">generate / generate_fast</span><br></div></code></pre></div></div>
<p>The HF <code>state-spaces/mamba-*-hf</code> weights load into MLX in just three steps:</p>
<ol>
<li class="">Strip the <code>backbone.</code> prefix</li>
<li class="">Transpose only <code>conv1d.weight</code>, from the PyTorch layout (out, in/g, k) to the MLX layout (out, k, in/g)</li>
<li class="">Convert everything else (Linear, embeddings, A_log, D, norm) to <code>mx.array</code> unchanged</li>
</ol>
<p>The trick is to prefer the standard HF transformers config fields <code>hidden_size</code> / <code>intermediate_size</code> / <code>num_hidden_layers</code>, because the legacy <code>d_model</code> field is broken in some checkpoints such as 790m.</p>
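<p>The key renaming and the conv1d transpose can be sketched without any framework. The code below is an illustration only, not the project's loader: tensors are modelled as nested Python lists, the helper names are hypothetical, the example key names are assumptions about the checkpoint layout, and the final conversion to <code>mx.array</code> is left out so the sketch stays dependency-free.</p>

```python
def strip_prefix(name, prefix="backbone."):
    """Drop the leading 'backbone.' from a parameter name, if present."""
    return name[len(prefix):] if name.startswith(prefix) else name

def transpose_last_two(weight):
    """(out, in/g, k) -> (out, k, in/g) for a tensor stored as nested lists."""
    return [[list(col) for col in zip(*out_channel)] for out_channel in weight]

def convert(state_dict):
    """Rename keys and re-lay-out conv1d.weight; pass everything else through."""
    converted = {}
    for name, value in state_dict.items():
        key = strip_prefix(name)
        if key.endswith("conv1d.weight"):
            value = transpose_last_two(value)
        converted[key] = value
    return converted

# Toy checkpoint: one depthwise conv weight of shape (2, 1, 4) and one embedding.
toy = {
    "backbone.layers.0.mixer.conv1d.weight": [[[1, 2, 3, 4]], [[5, 6, 7, 8]]],
    "backbone.embeddings.weight": [[0.1, 0.2]],
}
loaded = convert(toy)
```

<p>In the toy example the conv weight ends up with shape (2, 4, 1), while the embedding is passed through untouched under its shortened key.</p>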
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="ol-インクリメンタルデコード">O(L) incremental decoding<a href="https://createcentury.github.io/blog/4#ol-%E3%82%A4%E3%83%B3%E3%82%AF%E3%83%AA%E3%83%A1%E3%83%B3%E3%82%BF%E3%83%AB%E3%83%87%E3%82%B3%E3%83%BC%E3%83%89" class="hash-link" aria-label="Direct link to O(L) インクリメンタルデコード" title="Direct link to O(L) インクリメンタルデコード" translate="no">​</a></h2>
<p>Mamba's biggest appeal on paper is constant speed at long context. Realizing that on real hardware requires carrying the SSM hidden state and the conv1d sliding window across calls at inference time.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">conv_states</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ssm_states </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> model</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">init_state</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">batch_size</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> token </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> prompt</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    logits</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> conv_states</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ssm_states </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> model</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">step</span><span class="token punctuation" 
style="color:#393A34">(</span><span class="token plain">token</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> conv_states</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> ssm_states</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token comment" style="color:#999988;font-style:italic"># from here on, O(1) per token</span><br></div></code></pre></div></div>
<p>Each step is elementwise arithmetic only (no SSM scan is needed, because the state is already at hand):</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msubsup><mi>h</mi><mtext>new</mtext><mrow><mo stretchy="false">(</mo><mi>s</mi><mo stretchy="false">)</mo></mrow></msubsup><mo>=</mo><mi>exp</mi><mo>⁡</mo><mo stretchy="false">(</mo><msub><mi mathvariant="normal">Δ</mi><mi>t</mi></msub><msub><mi>A</mi><mi>s</mi></msub><mo stretchy="false">)</mo><mo>⋅</mo><msup><mi>h</mi><mrow><mo stretchy="false">(</mo><mi>s</mi><mo stretchy="false">)</mo></mrow></msup><mo>+</mo><msub><mi mathvariant="normal">Δ</mi><mi>t</mi></msub><mo>⋅</mo><msub><mi>x</mi><mi>t</mi></msub><mo>⋅</mo><msub><mi>B</mi><mrow><mi>s</mi><mo separator="true">,</mo><mi>t</mi></mrow></msub><mo separator="true">,</mo><mspace width="2em"></mspace><msub><mi>y</mi><mi>t</mi></msub><mo>=</mo><munder><mo>∑</mo><mi>s</mi></munder><msubsup><mi>h</mi><mtext>new</mtext><mrow><mo stretchy="false">(</mo><mi>s</mi><mo stretchy="false">)</mo></mrow></msubsup><mo>⋅</mo><msub><mi>C</mi><mrow><mi>s</mi><mo separator="true">,</mo><mi>t</mi></mrow></msub><mo>+</mo><mi>D</mi><mo>⋅</mo><msub><mi>x</mi><mi>t</mi></msub></mrow><annotation encoding="application/x-tex">h_\text{new}^{(s)} = \exp(\Delta_t A_s) \cdot h^{(s)} + \Delta_t \cdot x_t \cdot B_{s,t},\qquad
y_t = \sum_s h_\text{new}^{(s)} \cdot C_{s,t} + D \cdot x_t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.185em;vertical-align:-0.247em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.938em"><span style="top:-2.453em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord text mtight"><span class="mord mtight">new</span></span></span></span><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">s</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.247em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mop">exp</span><span class="mopen">(</span><span class="mord"><span class="mord">Δ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord"><span class="mord 
mathnormal">A</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">s</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1.0213em;vertical-align:-0.0833em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.938em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">s</span><span class="mclose mtight">)</span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord">Δ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span 
class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.5945em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0502em">B</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.0502em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="mpunct mtight">,</span><span class="mord mathnormal mtight">t</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:2em"></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord 
mathnormal" style="margin-right:0.0359em">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:2.3em;vertical-align:-1.25em"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.05em"><span style="top:-1.9em;margin-left:0em"><span class="pstrut" style="height:3.05em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">s</span></span></span><span style="top:-3.05em"><span class="pstrut" style="height:3.05em"></span><span><span class="mop op-symbol large-op">∑</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.25em"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.938em"><span style="top:-2.453em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord text mtight"><span class="mord mtight">new</span></span></span></span><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing 
reset-size6 size3 mtight"><span class="mord mtight"><span class="mopen mtight">(</span><span class="mord mathnormal mtight">s</span><span class="mclose mtight">)</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.247em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.9694em;vertical-align:-0.2861em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0715em">C</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.0715em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">s</span><span class="mpunct mtight">,</span><span class="mord mathnormal mtight">t</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2861em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0278em">D</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span></span>
<p>A z gate and out_proj then finish the block.</p>
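<p>The per-step update above is small enough to write out directly. The sketch below handles a single channel with a list of per-state values; <code>ssm_step</code> and its argument names are hypothetical, and the z gate, out_proj and conv1d window are deliberately left out.</p>

```python
import math

def ssm_step(h, x_t, dt, A, B_t, C_t, D):
    """One decode step for a single channel.

    h, A, B_t, C_t are lists indexed by SSM state s; x_t, dt, D are scalars.
    Returns the updated state list and the scalar output y_t.
    """
    n_state = len(h)
    h_new = [math.exp(dt * A[s]) * h[s] + dt * x_t * B_t[s]
             for s in range(n_state)]
    y_t = sum(h_new[s] * C_t[s] for s in range(n_state)) + D * x_t
    return h_new, y_t

# Two states, starting from a zero hidden state.
A = [-1.0, -2.0]
B_t = [1.0, 3.0]
C_t = [1.0, 1.0]
h1, y1 = ssm_step([0.0, 0.0], 2.0, 0.5, A, B_t, C_t, 0.1)
h2, y2 = ssm_step(h1, 0.0, 0.5, A, B_t, C_t, 0.1)   # zero input: pure decay
```

<p>The cost of one call depends only on the number of states, not on how many tokens came before, which is why carrying <code>h</code> between calls makes decoding O(1) per token.</p>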
<p>Measurements (M4 Max, mamba-130m, greedy decode):</p>
<table><thead><tr><th style="text-align:right">Tokens generated</th><th style="text-align:right">O(L²) full re-forward</th><th style="text-align:right"><strong>O(L) <code>generate_fast</code></strong></th><th style="text-align:right">speedup</th></tr></thead><tbody><tr><td style="text-align:right">10</td><td style="text-align:right">0.24 s</td><td style="text-align:right"><strong>0.06 s</strong></td><td style="text-align:right">4.3×</td></tr><tr><td style="text-align:right">100</td><td style="text-align:right">3.24 s</td><td style="text-align:right"><strong>0.51 s</strong></td><td style="text-align:right">6.3×</td></tr><tr><td style="text-align:right">1000</td><td style="text-align:right">about 32 s (extrapolated)</td><td style="text-align:right"><strong>6.84 s</strong></td><td style="text-align:right">~5×</td></tr><tr><td style="text-align:right">2000</td><td style="text-align:right">about 80 s (extrapolated)</td><td style="text-align:right"><strong>14.08 s</strong></td><td style="text-align:right">~6×</td></tr></tbody></table>
<p><code>generate_fast</code> stays <strong>at or below ~7 ms/token all the way from n=50 to n=2000</strong>. That flat per-token cost is what Mamba's "linear-time decode" amounts to in practice.</p>
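<p>The per-token cost can be rechecked from the table with a few lines of arithmetic. The numbers below are copied from the <code>generate_fast</code> column; the helper name <code>ms_per_token</code> is hypothetical, and the table values are already rounded, so the results are only as precise as the table.</p>

```python
# (tokens generated, seconds for generate_fast), copied from the table above
measurements = [(10, 0.06), (100, 0.51), (1000, 6.84), (2000, 14.08)]

def ms_per_token(n_tokens, seconds):
    """Average decode latency in milliseconds per generated token."""
    return 1000.0 * seconds / n_tokens

rates = {n: round(ms_per_token(n, t), 2) for n, t in measurements}
for n, rate in rates.items():
    print(f"n={n:5d}: {rate:.2f} ms/token")
```

<p>On these figures the rate stays between roughly 5 and 7 ms/token while n grows by a factor of 200, so total time grows about linearly with the number of generated tokens.</p>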
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="モデルサイズ別の結果">Results by model size<a href="https://createcentury.github.io/blog/4#%E3%83%A2%E3%83%87%E3%83%AB%E3%82%B5%E3%82%A4%E3%82%BA%E5%88%A5%E3%81%AE%E7%B5%90%E6%9E%9C" class="hash-link" aria-label="Direct link to モデルサイズ別の結果" title="Direct link to モデルサイズ別の結果" translate="no">​</a></h2>
<p>All five sizes of <code>state-spaces/mamba-*-hf</code> can be loaded and run through generation:</p>
<table><thead><tr><th>model</th><th style="text-align:right">params</th><th style="text-align:right">load (s)</th><th style="text-align:right">tok/s</th><th style="text-align:right">ms/tok</th><th>Sample output (continuation of "The capital of Japan is")</th></tr></thead><tbody><tr><td>130m</td><td style="text-align:right">129 M</td><td style="text-align:right">1.3</td><td style="text-align:right">175</td><td style="text-align:right">5.7</td><td>Tokyo, Japan. The city is located in the northern part of the country...</td></tr><tr><td>370m</td><td style="text-align:right">372 M</td><td style="text-align:right">3.4</td><td style="text-align:right">82</td><td style="text-align:right">12.2</td><td>Tokyo. (repeats)</td></tr><tr><td>790m</td><td style="text-align:right">702 M</td><td style="text-align:right">4.8</td><td style="text-align:right">42</td><td style="text-align:right">23.7</td><td>Tokyo, and the capital of the country is Osaka. (contains errors)</td></tr><tr><td>1.4b</td><td style="text-align:right">1372 M</td><td style="text-align:right">11.6</td><td style="text-align:right">30</td><td style="text-align:right">33.2</td><td>Tokyo. ... Washington, D.C. ... London.</td></tr><tr><td><strong>2.8b</strong></td><td style="text-align:right"><strong>2.7 B</strong></td><td style="text-align:right"><strong>19.6</strong></td><td style="text-align:right"><strong>12</strong></td><td style="text-align:right"><strong>80.6</strong></td><td><strong>"Tokyo, which is also the largest city in the country"</strong> (accurate and natural)</td></tr></tbody></table>
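<p>130m, limited by its size, easily falls into repetition, whereas 2.8b goes as far as volunteering the extra fact that Tokyo is also the country's largest city. That gap shows up with greedy decoding alone.</p>
<p>The selective scan kernel on its own peaks at <strong>~187 GFLOPS</strong> at <code>seqlen=32k</code>, and the effective Unified Memory bandwidth with vec4 loads is <strong>~290 GB/s</strong> (about 70% of the M4 Max's theoretical peak of 410 GB/s).</p>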
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="残り課題">Remaining work<a href="https://createcentury.github.io/blog/4#%E6%AE%8B%E3%82%8A%E8%AA%B2%E9%A1%8C" class="hash-link" aria-label="Direct link to 残り課題" title="Direct link to 残り課題" translate="no">​</a></h2>
<ul>
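<li class=""><strong>Faster prefill</strong>: the prompt is currently fed through step one token at a time, so long-context prompts take on the order of seconds. If the final SSM state could be extracted from the selective_scan kernel, prefill could run as a parallel scan and hand off to decoding at O(1)/token</li>
<li class=""><strong>Transformer vs Mamba benchmark on iPhone</strong>: compare speed and accuracy at the same scale to make Mamba's long-context advantage visible</li>
<li class=""><strong>Backward kernel</strong>: the backward pass for training is not implemented yet</li>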
</ul>
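<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="振り返り">Retrospective<a href="https://createcentury.github.io/blog/4#%E6%8C%AF%E3%82%8A%E8%BF%94%E3%82%8A" class="hash-link" aria-label="Direct link to 振り返り" title="Direct link to 振り返り" translate="no">​</a></h2>
<p>The equations in the Mamba paper are short, but the part that delivers "linear time on real hardware" lives in the kernel. Rewriting it for different hardware is what finally gives a hands-on feel for the details behind the paper's claims:</p>
<ul>
<li class="">What should stay in SRAM and what should go out to HBM (slightly different on Apple Silicon, where the cache absorbs much of it)</li>
<li class="">Why A has to be diagonal (the states are independent, so the per-state work can sit in the outer loop)</li>
<li class="">Why exp(ΔA) is written as <code>exp2f + LOG2E</code> (it is slightly faster)</li>
<li class="">Confirmation of the claim that decoding really becomes O(L) once the state is cached</li>
</ul>
<p>Details that slip past when only reading the paper come into sharp focus once you write the code.</p>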
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="参考文献">References<a href="https://createcentury.github.io/blog/4#%E5%8F%82%E8%80%83%E6%96%87%E7%8C%AE" class="hash-link" aria-label="Direct link to 参考文献" title="Direct link to 参考文献" translate="no">​</a></h2>
<ul>
<li class="">Albert Gu, Tri Dao. "<a href="https://arxiv.org/abs/2312.00752" target="_blank" rel="noopener noreferrer" class="">Mamba: Linear-Time Sequence Modeling with Selective State Spaces</a>" arXiv:2312.00752, 2023.</li>
<li class="">Guy E. Blelloch. "<a href="https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf" target="_blank" rel="noopener noreferrer" class="">Prefix Sums and Their Applications</a>" CMU-CS-90-190, 1993.</li>
<li class="">Eric Martin, Chris Cundy. "<a href="https://arxiv.org/abs/1709.04057" target="_blank" rel="noopener noreferrer" class="">Parallelizing Linear Recurrent Neural Nets Over Sequence Length</a>" arXiv:1709.04057, 2017.</li>
<li class=""><a href="https://github.com/state-spaces/mamba" target="_blank" rel="noopener noreferrer" class="">state-spaces/mamba</a> (official implementation)</li>
<li class=""><a href="https://github.com/createcentury/mamba-metal" target="_blank" rel="noopener noreferrer" class="">createcentury/mamba-metal</a> (the project described in this post)</li>
</ul>
<hr>
<p><em>Created: 2026-05-16 / Last updated: 2026-05-16</em></p>]]></content:encoded>
            <category>Machine Learning</category>
            <category>SSM</category>
            <category>CUDA</category>
            <category>Metal</category>
        </item>
        <item>
            <title><![CDATA[#3 nonlinear dynamic inversion]]></title>
            <link>https://createcentury.github.io/blog/3</link>
            <guid>https://createcentury.github.io/blog/3</guid>
            <pubDate>Thu, 14 May 2026 02:00:00 GMT</pubDate>
            <description><![CDATA[Notes on Nonlinear Dynamic Inversion (NDI).]]></description>
            <content:encoded><![CDATA[<p>Notes on Nonlinear Dynamic Inversion (NDI).</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="参考文献">References<a href="https://createcentury.github.io/blog/3#%E5%8F%82%E8%80%83%E6%96%87%E7%8C%AE" class="hash-link" aria-label="Direct link to 参考文献" title="Direct link to 参考文献" translate="no">​</a></h2>
<ul>
<li class="">"<a href="https://www.aerostudents.com/courses/advanced-flight-control/nonlinearDynamicInversion.pdf" target="_blank" rel="noopener noreferrer" class="">Nonlinear Dynamic Inversion</a>" Advanced Flight Control course notes, Aerostudents.</li>
</ul>
<hr>
<p><em>Created: 2026-05-14 / Last updated: 2026-05-14</em></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[#2 Daily Nutrient Requirements for an Adult Male]]></title>
            <link>https://createcentury.github.io/blog/2</link>
            <guid>https://createcentury.github.io/blog/2</guid>
            <pubDate>Thu, 14 May 2026 01:00:00 GMT</pubDate>
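            <description><![CDATA[Nutrient planning as an investment in the body. Using a 27-year-old adult male (81 kg, 176.5 cm) as the model case, this post lays out the nutrients needed per day.]]></description>
            <content:encoded><![CDATA[<p>Nutrient planning as an investment in the body. Using a 27-year-old adult male (81 kg, 176.5 cm) as the model case, this post lays out the nutrients needed per day.</p>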
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="基礎代謝">Basal metabolic rate<a href="https://createcentury.github.io/blog/2#%E5%9F%BA%E7%A4%8E%E4%BB%A3%E8%AC%9D" class="hash-link" aria-label="Direct link to 基礎代謝" title="Direct link to 基礎代謝" translate="no">​</a></h2>
<p>Plug the values into the formula from the National Institute of Health and Nutrition (the NIHN formula, also known as Ganpule's formula):</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mo stretchy="false">(</mo><mn>0.0481</mn><mo>×</mo><mi>W</mi><mo>+</mo><mn>0.0234</mn><mo>×</mo><mi>H</mi><mo>−</mo><mn>0.0138</mn><mo>×</mo><mi>A</mi><mo>−</mo><mn>0.4235</mn><mo stretchy="false">)</mo><mo>×</mo><mfrac><mn>1000</mn><mn>4.186</mn></mfrac></mrow><annotation encoding="application/x-tex">(0.0481 \times W + 0.0234 \times H - 0.0138 \times A - 0.4235) \times \frac{1000}{4.186}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord">0.0481</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em"></span><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em"></span><span class="mord">0.0234</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em"></span><span class="mord mathnormal" style="margin-right:0.0813em">H</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em"></span><span class="mord">0.0138</span><span class="mspace" 
style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em"></span><span class="mord mathnormal">A</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord">0.4235</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:2.0074em;vertical-align:-0.686em"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3214em"><span style="top:-2.314em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord">4.186</span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.677em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord">1000</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.686em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span>
<p>Substituting the parameters <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>W</mi><mo>=</mo><mn>81.0</mn><mtext>&nbsp;kg</mtext></mrow><annotation encoding="application/x-tex">W = 81.0\text{ kg}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.1389em">W</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord">81.0</span><span class="mord text"><span class="mord">&nbsp;kg</span></span></span></span></span>, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>H</mi><mo>=</mo><mn>176.5</mn><mtext>&nbsp;cm</mtext></mrow><annotation encoding="application/x-tex">H = 176.5\text{ cm}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.0813em">H</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">176.5</span><span class="mord text"><span class="mord">&nbsp;cm</span></span></span></span></span>, and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>A</mi><mo>=</mo><mn>27</mn><mtext>&nbsp;years</mtext></mrow><annotation encoding="application/x-tex">A = 27\text{ years}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">A</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord">27</span><span class="mord text"><span class="mord">&nbsp;years</span></span></span></span></span>:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mo stretchy="false">(</mo><mn>0.0481</mn><mo>×</mo><mn>81.0</mn><mo>+</mo><mn>0.0234</mn><mo>×</mo><mn>176.5</mn><mo>−</mo><mn>0.0138</mn><mo>×</mo><mn>27</mn><mo>−</mo><mn>0.4235</mn><mo stretchy="false">)</mo><mo>×</mo><mfrac><mn>1000</mn><mn>4.186</mn></mfrac><mo>≈</mo><mn>1,727.2</mn><mtext>&nbsp;kcal</mtext></mrow><annotation encoding="application/x-tex">(0.0481 \times 81.0 + 0.0234 \times 176.5 - 0.0138 \times 27 - 0.4235) \times \frac{1000}{4.186} \approx 1{,}727.2\text{ kcal}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord">0.0481</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em"></span><span class="mord">81.0</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em"></span><span class="mord">0.0234</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em"></span><span class="mord">176.5</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em"></span><span class="mord">0.0138</span><span 
class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.7278em;vertical-align:-0.0833em"></span><span class="mord">27</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">−</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord">0.4235</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:2.0074em;vertical-align:-0.686em"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.3214em"><span style="top:-2.314em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord">4.186</span></span></span><span style="top:-3.23em"><span class="pstrut" style="height:3em"></span><span class="frac-line" style="border-bottom-width:0.04em"></span></span><span style="top:-3.677em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord">1000</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.686em"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">≈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord">1</span><span class="mord"><span class="mpunct">,</span></span><span class="mord">727.2</span><span class="mord text"><span 
class="mord">&nbsp;kcal</span></span></span></span></span></span>
<p><strong>Basal metabolic rate per day ≈ 1,727 kcal</strong></p>
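<p>The substitution above can be reproduced with a few lines of Python. This is a minimal sketch using only the coefficients quoted in the equation; the function name <code>bmr_kcal_per_day</code> and its parameter names are hypothetical, chosen for illustration.</p>

```python
def bmr_kcal_per_day(weight_kg: float, height_cm: float, age_years: float) -> float:
    """Basal metabolic rate in kcal/day, using the coefficients quoted above."""
    # The bracketed term is an energy in MJ/day.
    megajoules = (
        0.0481 * weight_kg
        + 0.0234 * height_cm
        - 0.0138 * age_years
        - 0.4235
    )
    # 1000 / 4.186 converts MJ to kcal (1 kcal = 4.186 kJ).
    return megajoules * 1000 / 4.186


bmr = bmr_kcal_per_day(weight_kg=81.0, height_cm=176.5, age_years=27)
print(f"{bmr:.1f} kcal/day")  # prints "1727.2 kcal/day"
```

<p>Keeping the unit conversion as a separate final step makes it easy to see that the bracketed term is an energy in MJ/day and that only the last factor turns it into kcal.</p>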
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="タンパク質">Protein<a href="https://createcentury.github.io/blog/2#%E3%82%BF%E3%83%B3%E3%83%91%E3%82%AF%E8%B3%AA" class="hash-link" aria-label="Direct link to タンパク質" title="Direct link to タンパク質" translate="no">​</a></h2>
<p>The range aimed at improving cognitive and physical performance is <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>1.2</mn><mtext>&nbsp;g</mtext><mo>∼</mo><mn>1.6</mn><mtext>&nbsp;g/kg</mtext></mrow><annotation encoding="application/x-tex">1.2\text{ g} \sim 1.6\text{ g/kg}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8389em;vertical-align:-0.1944em"></span><span class="mord">1.2</span><span class="mord text"><span class="mord">&nbsp;g</span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∼</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord">1.6</span><span class="mord text"><span class="mord">&nbsp;g/kg</span></span></span></span></span> (grams of protein per kilogram of body weight per day).</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mn>81.0</mn><mtext>&nbsp;kg</mtext><mo>×</mo><mn>1.5</mn><mo>≈</mo><mrow><mn mathvariant="bold">120</mn><mtext>&nbsp;g/day</mtext></mrow></mrow><annotation encoding="application/x-tex">81.0\text{ kg} \times 1.5 \approx \mathbf{120\text{ g/day}}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8889em;vertical-align:-0.1944em"></span><span class="mord">81.0</span><span class="mord text"><span class="mord">&nbsp;kg</span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6444em"></span><span class="mord">1.5</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">≈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathbf">120</span><span class="mord text"><span class="mord">&nbsp;g/day</span></span></span></span></span></span></span>
<p>Because the brain's neurotransmitters and immune cells are also built from protein, this range is the "zone of good return on investment" for anyone living a high-load life.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="戦略">Strategy<a href="https://createcentury.github.io/blog/2#%E6%88%A6%E7%95%A5" class="hash-link" aria-label="Direct link to 戦略" title="Direct link to 戦略" translate="no">​</a></h3>
<ul>
<li class=""><strong>Optimize absorption</strong>: The amount that can be absorbed at one time is limited, so spread intake across 3 to 4 meals a day</li>
<li class=""><strong>Choose "high-density" sources</strong>: Use lean red meat, eggs, and protein powder as needed, so that stomach capacity goes to protein first</li>
</ul>
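<p>As a rough sketch of the arithmetic behind the target and the 3 to 4 meal split, the snippet below computes the 1.2 to 1.6 g/kg range for 81.0 kg and divides the 1.5 g/kg figure evenly across meals. The even split and the names <code>protein_target_g</code> and <code>WEIGHT_KG</code> are assumptions made for illustration; the text only says to spread intake over 3 to 4 meals.</p>

```python
WEIGHT_KG = 81.0  # body weight used throughout this post


def protein_target_g(weight_kg: float, g_per_kg: float) -> float:
    """Daily protein in grams for a given intake per kg of body weight."""
    return weight_kg * g_per_kg


low = protein_target_g(WEIGHT_KG, 1.2)
high = protein_target_g(WEIGHT_KG, 1.6)
target = protein_target_g(WEIGHT_KG, 1.5)

print(f"range: {low:.1f} to {high:.1f} g/day, target: {target:.1f} g/day")
# prints "range: 97.2 to 129.6 g/day, target: 121.5 g/day"

# Assumption: meals share the daily target equally.
for meals in (3, 4):
    print(f"{meals} meals: about {target / meals:.1f} g per meal")
# prints "3 meals: about 40.5 g per meal"
# prints "4 meals: about 30.4 g per meal"
```

<p>The unrounded 1.5 g/kg figure is 121.5 g/day, which the text rounds to about 120 g/day.</p>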
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="具体的な食材">Specific foods<a href="https://createcentury.github.io/blog/2#%E5%85%B7%E4%BD%93%E7%9A%84%E3%81%AA%E9%A3%9F%E6%9D%90" class="hash-link" aria-label="Direct link to 具体的な食材" title="Direct link to 具体的な食材" translate="no">​</a></h3>
<table><thead><tr><th>Food</th><th>Amount</th><th>Protein</th><th>Notes</th></tr></thead><tbody><tr><td>Steak (lean)</td><td>300 g</td><td>approx. 60 to 75 g</td><td>approx. 20 to 25 g per 100 g</td></tr><tr><td>Egg (size M)</td><td>1 egg</td><td>approx. 6.2 g</td><td>Amino acid score of 100</td></tr><tr><td>Canned mackerel simmered in miso</td><td>1 can (approx. 190 g)</td><td>approx. 25 to 30 g</td><td>Highly nutritious when the liquid is included</td></tr><tr><td>Yogurt</td><td>1 cup (approx. 100 g)</td><td>approx. 4 to 10 g</td><td>Greek yogurt secures about 10 g</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="食物繊維">Dietary fiber<a href="https://createcentury.github.io/blog/2#%E9%A3%9F%E7%89%A9%E7%B9%8A%E7%B6%AD" class="hash-link" aria-label="Direct link to 食物繊維" title="Direct link to 食物繊維" translate="no">​</a></h2>
<p>Recommended intake: <strong>21 g or more per day</strong></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="具体的な食材-1">Specific foods<a href="https://createcentury.github.io/blog/2#%E5%85%B7%E4%BD%93%E7%9A%84%E3%81%AA%E9%A3%9F%E6%9D%90-1" class="hash-link" aria-label="Direct link to 具体的な食材" title="Direct link to 具体的な食材" translate="no">​</a></h3>
<ul>
<li class=""><strong>Broccoli</strong> (approx. 100 g, about half a head): approx. 4.4 g of dietary fiber. It also contains protein, which makes it a food with a good yield</li>
<li class=""><strong>Carrot</strong> (approx. 100 g, a little under one carrot): approx. 2.8 g of dietary fiber. Well suited to adding color and rounding out the nutritional balance</li>
<li class=""><strong>Mochi barley or brown rice</strong> (1 bowl): approx. 2.0 to 4.0 g of dietary fiber. Simply swapping the staple from white rice raises the baseline</li>
<li class=""><strong>Natto</strong> (1 pack): approx. 3.3 g of dietary fiber. Supplies protein at the same time</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1日のモデル">A one-day model<a href="https://createcentury.github.io/blog/2#1%E6%97%A5%E3%81%AE%E3%83%A2%E3%83%87%E3%83%AB" class="hash-link" aria-label="Direct link to 1日のモデル" title="Direct link to 1日のモデル" translate="no">​</a></h3>
<table><thead><tr><th>Timing</th><th>Contents</th><th>Dietary fiber</th></tr></thead><tbody><tr><td>Morning</td><td>Oatmeal or mochi barley rice (3.0 g) + natto (3.3 g)</td><td>6.3 g</td></tr><tr><td>Lunch</td><td>Convenience-store "root vegetable salad" or "simmered hijiki"</td><td>3.0 g</td></tr><tr><td>Dinner</td><td>Side dish for the steak (broccoli, etc.)</td><td>4.4 g</td></tr><tr><td>Drinks</td><td>Powdered fiber added to coffee or a protein shake (5.0 g × 1 to 2 servings)</td><td>5.0 to 10.0 g</td></tr><tr><td><strong>Total</strong></td><td></td><td><strong>approx. 18.7 to 23.7 g</strong></td></tr></tbody></table>
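<p>The totals in the table can be checked against the 21 g target with a short tally. This is only a sketch: the dictionary layout and the names <code>daily_model</code> and <code>TARGET_G</code> are hypothetical, while the gram values are the ones listed in the table.</p>

```python
# Fiber per slot in grams as (low, high), taken from the table above.
daily_model = {
    "morning": (6.3, 6.3),
    "lunch": (3.0, 3.0),
    "dinner": (4.4, 4.4),
    "drinks": (5.0, 10.0),  # one or two 5.0 g servings of powdered fiber
}
TARGET_G = 21.0

total_low = round(sum(low for low, _ in daily_model.values()), 1)
total_high = round(sum(high for _, high in daily_model.values()), 1)

# Shortfall against the target when only one serving of powdered fiber is used.
gap_at_low = round(max(TARGET_G - total_low, 0.0), 1)

print(total_low, total_high, gap_at_low)  # prints "18.7 23.7 2.3"
```

<p>With one serving of powdered fiber the model falls about 2.3 g short of 21 g; the second serving is what carries it past the target.</p>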
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="ビタミン">Vitamins<a href="https://createcentury.github.io/blog/2#%E3%83%93%E3%82%BF%E3%83%9F%E3%83%B3" class="hash-link" aria-label="Direct link to ビタミン" title="Direct link to ビタミン" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="ビタミンb群b1-b2-b6-b12-ナイアシン等">B vitamins (B1, B2, B6, B12, niacin, etc.)<a href="https://createcentury.github.io/blog/2#%E3%83%93%E3%82%BF%E3%83%9F%E3%83%B3b%E7%BE%A4b1-b2-b6-b12-%E3%83%8A%E3%82%A4%E3%82%A2%E3%82%B7%E3%83%B3%E7%AD%89" class="hash-link" aria-label="Direct link to ビタミンB群（B1, B2, B6, B12, ナイアシン等）" title="Direct link to ビタミンB群（B1, B2, B6, B12, ナイアシン等）" translate="no">​</a></h3>
<ul>
<li class=""><strong>Role</strong>: Essential components when the brain and muscles produce energy. A deficiency leads to a state of "fuel in the tank but no power" (fatigue)</li>
<li class=""><strong>Recommended intake (B1)</strong>: 1.4 mg/day</li>
</ul>
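<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="ビタミンd">Vitamin D<a href="https://createcentury.github.io/blog/2#%E3%83%93%E3%82%BF%E3%83%9F%E3%83%B3d" class="hash-link" aria-label="Direct link to ビタミンD" title="Direct link to ビタミンD" translate="no">​</a></h3>
<ul>
<li class=""><strong>Role</strong>: Involved not only in bone health but also in immune function and mental stability. It tends to run short for people who mostly work or do research indoors</li>
<li class=""><strong>Adequate intake</strong>: 8.5 µg/day</li>
<li class="">Recent studies have reported a tendency for people with higher vitamin D levels to have higher testosterone levels. An essential "boost patch" for modern people who tend to get too little sunlight</li>
</ul>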
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="ビタミンc">Vitamin C<a href="https://createcentury.github.io/blog/2#%E3%83%93%E3%82%BF%E3%83%9F%E3%83%B3c" class="hash-link" aria-label="Direct link to ビタミンC" title="Direct link to ビタミンC" translate="no">​</a></h3>
<ul>
<li class=""><strong>Role</strong>: Antioxidant action ("rust removal" for the body) and collagen synthesis. Protects cells from the oxidative stress caused by hard activity</li>
<li class=""><strong>Recommended intake</strong>: 100 mg/day</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="その他のビタミン">Other vitamins<a href="https://createcentury.github.io/blog/2#%E3%81%9D%E3%81%AE%E4%BB%96%E3%81%AE%E3%83%93%E3%82%BF%E3%83%9F%E3%83%B3" class="hash-link" aria-label="Direct link to その他のビタミン" title="Direct link to その他のビタミン" translate="no">​</a></h3>
<table><thead><tr><th>Nutrient</th><th>Recommended intake</th><th>Role</th></tr></thead><tbody><tr><td>Vitamin A</td><td>900 µgRAE/day</td><td>Maintains vision and mucous membranes. Protects the eyes from long hours of screen time and from dryness (abundant in carrots)</td></tr><tr><td>Vitamin E</td><td>7.0 mg/day</td><td>Strong antioxidant action. Prevents oxidation (aging) of cells and improves blood flow</td></tr><tr><td>Folate</td><td>240 µg/day</td><td>Cell division and red blood cell formation (abundant in broccoli)</td></tr><tr><td>Pantothenic acid</td><td>5 mg/day</td><td>The lubricant of energy metabolism. Involved in synthesizing stress-resistance hormones</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="ミネラル">Minerals<a href="https://createcentury.github.io/blog/2#%E3%83%9F%E3%83%8D%E3%83%A9%E3%83%AB" class="hash-link" aria-label="Direct link to ミネラル" title="Direct link to ミネラル" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="鉄iron">Iron<a href="https://createcentury.github.io/blog/2#%E9%89%84iron" class="hash-link" aria-label="Direct link to 鉄（Iron）" title="Direct link to 鉄（Iron）" translate="no">​</a></h3>
<ul>
<li class=""><strong>Role</strong>: Raw material for hemoglobin, which carries oxygen throughout the body. A deficiency impairs the oxygen supply to the brain and lowers concentration</li>
<li class=""><strong>Recommended intake</strong>: 7.5 mg/day</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="亜鉛zinc">Zinc<a href="https://createcentury.github.io/blog/2#%E4%BA%9C%E9%89%9Bzinc" class="hash-link" aria-label="Direct link to 亜鉛（Zinc）" title="Direct link to 亜鉛（Zinc）" translate="no">​</a></h3>
<ul>
<li class=""><strong>Role</strong>: Involved in activating several hundred enzymes; supports protein synthesis, immune response, and the sense of taste</li>
<li class=""><strong>Recommended intake</strong>: 11 mg/day</li>
<li class="">Also called the "sex mineral", and a top-priority item. Directly involved in sperm production and testosterone metabolism</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="マグネシウム">Magnesium<a href="https://createcentury.github.io/blog/2#%E3%83%9E%E3%82%B0%E3%83%8D%E3%82%B7%E3%82%A6%E3%83%A0" class="hash-link" aria-label="Direct link to マグネシウム" title="Direct link to マグネシウム" translate="no">​</a></h3>
<ul>
<li class=""><strong>Role</strong>: Energy metabolism, nerve transmission, and muscle relaxation. A deficiency makes leg cramps more likely and fatigue harder to shake off</li>
<li class=""><strong>Recommended intake</strong>: 340 mg/day</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="カルシウム">Calcium<a href="https://createcentury.github.io/blog/2#%E3%82%AB%E3%83%AB%E3%82%B7%E3%82%A6%E3%83%A0" class="hash-link" aria-label="Direct link to カルシウム" title="Direct link to カルシウム" translate="no">​</a></h3>
<ul>
<li class=""><strong>Role</strong>: Bone maintenance and nerve stability</li>
<li class=""><strong>Recommended intake</strong>: 750 mg/day or more (tolerable upper intake level: 2,500 mg)</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="微量ミネラル">Trace minerals<a href="https://createcentury.github.io/blog/2#%E5%BE%AE%E9%87%8F%E3%83%9F%E3%83%8D%E3%83%A9%E3%83%AB" class="hash-link" aria-label="Direct link to 微量ミネラル" title="Direct link to 微量ミネラル" translate="no">​</a></h3>
<table><thead><tr><th>Nutrient</th><th>Recommended intake</th><th>Role</th></tr></thead><tbody><tr><td>Selenium</td><td>30 µg/day</td><td>Strong antioxidant action, prevents cell aging</td></tr><tr><td>Chromium</td><td>10 µg/day</td><td>Supports sugar metabolism (the action of insulin)</td></tr><tr><td>Molybdenum</td><td>30 µg/day</td><td>Supports uric acid metabolism and iron utilization</td></tr><tr><td>Manganese</td><td>4.0 mg/day</td><td>Bone formation, activation of many enzymes</td></tr></tbody></table>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="脂質">Fats<a href="https://createcentury.github.io/blog/2#%E8%84%82%E8%B3%AA" class="hash-link" aria-label="Direct link to 脂質" title="Direct link to 脂質" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="n-3系脂肪酸dhaepaα-リノレン酸">n-3 fatty acids (DHA, EPA, α-linolenic acid)<a href="https://createcentury.github.io/blog/2#n-3%E7%B3%BB%E8%84%82%E8%82%AA%E9%85%B8dhaepa%CE%B1-%E3%83%AA%E3%83%8E%E3%83%AC%E3%83%B3%E9%85%B8" class="hash-link" aria-label="Direct link to n-3系脂肪酸（DHA、EPA、α-リノレン酸）" title="Direct link to n-3系脂肪酸（DHA、EPA、α-リノレン酸）" translate="no">​</a></h3>
<ul>
<li class=""><strong>Role</strong>: Keeps the membranes of nerve cells in the brain flexible and contributes to cognitive function and to suppressing inflammation. Abundant in canned mackerel and oily fish</li>
<li class=""><strong>Adequate intake</strong>: 2.0 g/day or more</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="脂質の質についての注意">Notes on fat quality<a href="https://createcentury.github.io/blog/2#%E8%84%82%E8%B3%AA%E3%81%AE%E8%B3%AA%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6%E3%81%AE%E6%B3%A8%E6%84%8F" class="hash-link" aria-label="Direct link to 脂質の質についての注意" title="Direct link to 脂質の質についての注意" translate="no">​</a></h3>
<ul>
<li class=""><strong>Saturated fatty acids</strong>: The target is <strong>10% of total energy or less</strong>. Leaning too heavily on steak (animal fat) pushes this over the limit, so diversifying into fish and plant oils is recommended</li>
<li class=""><strong>n-6 fatty acids</strong> (linoleic acid, etc.): The adequate intake for a 27-year-old male is <strong>11 g/day</strong>. Found mainly in vegetable oils</li>
<li class=""><strong>Cholesterol</strong>: There used to be an intake limit, but the current policy, aimed at "preventing the worsening of dyslipidemia", is to keep intake as low as possible or to pay attention to fat quality (saturated fatty acids)</li>
</ul>
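<p>To make the "10% of total energy" target concrete, it helps to convert it into grams. The sketch below assumes the standard Atwater factor of 9 kcal per gram of fat; the daily energy intake of 2,500 kcal is a hypothetical example value, not a figure from this post, and the function name <code>saturated_fat_cap_g</code> is likewise made up for illustration.</p>

```python
KCAL_PER_G_FAT = 9.0  # Atwater factor: energy per gram of fat


def saturated_fat_cap_g(total_energy_kcal: float, max_share: float = 0.10) -> float:
    """Grams of saturated fat that correspond to max_share of total energy."""
    return total_energy_kcal * max_share / KCAL_PER_G_FAT


# Hypothetical daily intake used only as an example; substitute your own value.
example_energy_kcal = 2500.0
cap = saturated_fat_cap_g(example_energy_kcal)
print(f"{cap:.1f} g/day")  # prints "27.8 g/day"
```

<p>Passing your own estimated intake as <code>total_energy_kcal</code> gives the matching ceiling in grams, which is easier to compare against food labels than a percentage.</p>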
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="1日の食事プラン">One-day meal plan<a href="https://createcentury.github.io/blog/2#1%E6%97%A5%E3%81%AE%E9%A3%9F%E4%BA%8B%E3%83%97%E3%83%A9%E3%83%B3" class="hash-link" aria-label="Direct link to 1日の食事プラン" title="Direct link to 1日の食事プラン" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="朝または昼">Morning (or lunch)<a href="https://createcentury.github.io/blog/2#%E6%9C%9D%E3%81%BE%E3%81%9F%E3%81%AF%E6%98%BC" class="hash-link" aria-label="Direct link to 朝（または昼）" title="Direct link to 朝（または昼）" translate="no">​</a></h3>
<p>Oatmeal (30 g) + natto (1 pack) + protein + powdered dietary fiber (5 g) + canned mackerel simmered in miso (1 can) + mochi barley rice (1 bowl)</p>
<p>→ Immediately deploys 11 g of dietary fiber and 35 g of protein. n-3 fatty acids (DHA/EPA) optimize the brain. Zinc and iron are supplied as well. Carbohydrates become the infrastructure.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="間食">Snack<a href="https://createcentury.github.io/blog/2#%E9%96%93%E9%A3%9F" class="hash-link" aria-label="Direct link to 間食" title="Direct link to 間食" translate="no">​</a></h3>
<p>Greek yogurt + boiled egg (1 egg)</p>
<p>→ With an amino acid score of 100, this keeps blood amino acid levels steady.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="夕食">Dinner<a href="https://createcentury.github.io/blog/2#%E5%A4%95%E9%A3%9F" class="hash-link" aria-label="Direct link to 夕食" title="Direct link to 夕食" translate="no">​</a></h3>
<p>Lean steak (200 to 300 g) + broccoli &amp; carrot (100 g each)</p>
<p>→ A large dose of zinc, iron, vitamin A, and B vitamins. Prepares the body for recovery during sleep.</p>
<hr>
<p><strong>Estimated cost</strong>: <strong>approx. 1,800 to 2,300 yen</strong> per day</p>
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="参考文献">References<a href="https://createcentury.github.io/blog/2#%E5%8F%82%E8%80%83%E6%96%87%E7%8C%AE" class="hash-link" aria-label="Direct link to 参考文献" title="Direct link to 参考文献" translate="no">​</a></h2>
<ul>
<li class="">Ministry of Health, Labour and Welfare, "<a href="https://www.mhlw.go.jp/stf/newpage_44138.html" target="_blank" rel="noopener noreferrer" class="">Report of the Review Committee for the Dietary Reference Intakes for Japanese (2025 edition)</a>" (厚生労働省「日本人の食事摂取基準（2025年版）」策定検討会報告書)</li>
</ul>
<hr>
<p><em>Created: 2026-05-14 / Last updated: 2026-05-14</em></p>]]></content:encoded>
            <category>Health</category>
            <category>Nutrition</category>
        </item>
        <item>
            <title><![CDATA[#1 mamba]]></title>
            <link>https://createcentury.github.io/blog/1</link>
            <guid>https://createcentury.github.io/blog/1</guid>
            <pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[The core of Mamba is selective scan, a linear-time sequence operation. The paper alone glosses over this part as a "hardware-aware implementation", so this post reads the forward kernel of the official implementation, csrc/selective_scan/selective_scan_fwd_kernel.cuh, and breaks it down.]]></description>
            <content:encoded><![CDATA[<p>The core of Mamba is <strong>selective scan</strong>, a linear-time sequence operation. The paper alone glosses over this part as a "hardware-aware implementation", so this post reads the forward kernel of the official implementation, <a href="https://github.com/state-spaces/mamba/blob/main/csrc/selective_scan/selective_scan_fwd_kernel.cuh" target="_blank" rel="noopener noreferrer" class=""><code>csrc/selective_scan/selective_scan_fwd_kernel.cuh</code></a>, and breaks it down.</p>
<!-- -->
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="動かす漸化式">The recurrence being computed<a href="https://createcentury.github.io/blog/1#%E5%8B%95%E3%81%8B%E3%81%99%E6%BC%B8%E5%8C%96%E5%BC%8F" class="hash-link" aria-label="Direct link to 動かす漸化式" title="Direct link to 動かす漸化式" translate="no">​</a></h2>
<p>Mamba's hidden state <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>h</mi><mi>t</mi></msub><mo>∈</mo><msup><mi mathvariant="double-struck">R</mi><mi>N</mi></msup></mrow><annotation encoding="application/x-tex">h_t \in \mathbb{R}^{N}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8413em"></span><span class="mord"><span class="mord mathbb">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.109em">N</span></span></span></span></span></span></span></span></span></span></span></span> evolves under an SSM whose coefficients depend on the input, so it is not linear time-invariant:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mi>h</mi><mi>t</mi></msub><mo>=</mo><msub><mover accent="true"><mi>A</mi><mo>ˉ</mo></mover><mi>t</mi></msub><mtext> </mtext><msub><mi>h</mi><mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>+</mo><msub><mover accent="true"><mi>B</mi><mo>ˉ</mo></mover><mi>t</mi></msub><mtext> </mtext><msub><mi>x</mi><mi>t</mi></msub><mo separator="true">,</mo><mspace width="2em"></mspace><msub><mi>y</mi><mi>t</mi></msub><mo>=</mo><msubsup><mi>C</mi><mi>t</mi><mi mathvariant="normal">⊤</mi></msubsup><msub><mi>h</mi><mi>t</mi></msub><mo>+</mo><mi>D</mi><mtext> </mtext><msub><mi>x</mi><mi>t</mi></msub></mrow><annotation encoding="application/x-tex">h_t = \bar{A}_t\, h_{t-1} + \bar{B}_t\, x_t,\qquad y_t = C_t^\top h_t + D\, x_t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0284em;vertical-align:-0.2083em"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8201em"><span 
style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathnormal">A</span></span><span style="top:-3.2523em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.1111em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">t</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1.0145em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8201em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathnormal" 
style="margin-right:0.0502em">B</span></span><span style="top:-3.2523em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.1667em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.0502em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:2em"></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.0359em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" 
style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.1461em;vertical-align:-0.247em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.0715em">C</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991em"><span style="top:-2.453em;margin-left:-0.0715em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">⊤</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.247em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord mathnormal" style="margin-right:0.0278em">D</span><span class="mspace" 
style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span></span>
<p>With <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi mathvariant="normal">Δ</mi><mi>t</mi></msub></mrow><annotation encoding="application/x-tex">\Delta_t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord">Δ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> as the step size, the textbook way to discretize the continuous system is zero-order hold (ZOH), but the official Mamba implementation uses the simplified discretization</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><msub><mover accent="true"><mi>A</mi><mo>ˉ</mo></mover><mi>t</mi></msub><mo>=</mo><mi>exp</mi><mo>⁡</mo><mo stretchy="false">(</mo><msub><mi mathvariant="normal">Δ</mi><mi>t</mi></msub><mi>A</mi><mo stretchy="false">)</mo><mo separator="true">,</mo><mspace width="2em"></mspace><msub><mover accent="true"><mi>B</mi><mo>ˉ</mo></mover><mi>t</mi></msub><mo>≈</mo><msub><mi mathvariant="normal">Δ</mi><mi>t</mi></msub><mi>B</mi></mrow><annotation encoding="application/x-tex">\bar{A}_t = \exp(\Delta_t A),\qquad \bar{B}_t \approx \Delta_t B</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.9701em;vertical-align:-0.15em"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8201em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathnormal">A</span></span><span style="top:-3.2523em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.1111em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" 
style="height:1.0701em;vertical-align:-0.25em"></span><span class="mop">exp</span><span class="mopen">(</span><span class="mord"><span class="mord">Δ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord mathnormal">A</span><span class="mclose">)</span><span class="mpunct">,</span><span class="mspace" style="margin-right:2em"></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8201em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord mathnormal" style="margin-right:0.0502em">B</span></span><span style="top:-3.2523em"><span class="pstrut" style="height:3em"></span><span class="accent-body" style="left:-0.1667em"><span class="mord">ˉ</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:-0.0502em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">≈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span 
class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord">Δ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord mathnormal" style="margin-right:0.0502em">B</span></span></span></span></span>
<p>Because <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">A</span></span></span></span> is diagonal, the exp is a scalar exp applied to each component. <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi mathvariant="normal">Δ</mi><mi>t</mi></msub><mo separator="true">,</mo><mi>B</mi><mo separator="true">,</mo><mi>C</mi></mrow><annotation encoding="application/x-tex">\Delta_t, B, C</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord"><span class="mord">Δ</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0502em">B</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">C</span></span></span></span> are recomputed at every step from the input <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>x</mi><mi>t</mi></msub></mrow><annotation encoding="application/x-tex">x_t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.5806em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> (a linear projection followed by softplus and similar steps). This is the "<strong>selective</strong>" part: the gate opens and closes depending on the input.</p>
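<p>The recurrence and discretization above can be written as a plain sequential loop. The following is a minimal C++ reference sketch for a single channel, not the Mamba kernel itself. The function name <code>selective_scan_ref</code>, the <code>[T][N]</code> memory layout of <code>B</code> and <code>C</code>, and the zero initial state are assumptions made for illustration.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sequential reference for ONE channel with a diagonal A of size N.
//   h_t = exp(delta_t * A) * h_{t-1} + (delta_t * B_t) * x_t   (elementwise over N)
//   y_t = C_t . h_t + D * x_t
// Assumptions of this sketch (not taken from the Mamba source):
//   - B and C are stored row-major as [T][N]
//   - the initial state h_{-1} is zero
//   - delta has already been passed through softplus
std::vector<float> selective_scan_ref(const std::vector<float>& x,      // [T]
                                      const std::vector<float>& delta,  // [T]
                                      const std::vector<float>& A,      // [N]
                                      const std::vector<float>& B,      // [T * N]
                                      const std::vector<float>& C,      // [T * N]
                                      float D) {
    const std::size_t T = x.size();
    const std::size_t N = A.size();
    assert(delta.size() == T && B.size() == T * N && C.size() == T * N);

    std::vector<float> h(N, 0.0f);  // hidden state, one scalar per state dimension
    std::vector<float> y(T, 0.0f);
    for (std::size_t t = 0; t < T; ++t) {
        float acc = 0.0f;
        for (std::size_t n = 0; n < N; ++n) {
            const float a_bar = std::exp(delta[t] * A[n]);  // exact ZOH for A
            const float b_bar = delta[t] * B[t * N + n];    // first-order approximation for B
            h[n] = a_bar * h[n] + b_bar * x[t];
            acc += C[t * N + n] * h[n];
        }
        y[t] = acc + D * x[t];  // skip connection through D
    }
    return y;
}
```

<p>A loop like this is useful mainly as a ground truth: the output of any parallel kernel (CUDA or Metal) can be compared against it element by element.</p>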
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="なぜscanか">Why "scan"?<a href="https://createcentury.github.io/blog/1#%E3%81%AA%E3%81%9Cscan%E3%81%8B" class="hash-link" aria-label="Direct link to なぜ「scan」か" title="Direct link to なぜ「scan」か" translate="no">​</a></h2>
<p>The recurrence <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>h</mi><mi>t</mi></msub><mo>=</mo><msub><mi>a</mi><mi>t</mi></msub><msub><mi>h</mi><mrow><mi>t</mi><mo>−</mo><mn>1</mn></mrow></msub><mo>+</mo><msub><mi>b</mi><mi>t</mi></msub></mrow><annotation encoding="application/x-tex">h_t = a_t h_{t-1} + b_t</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.9028em;vertical-align:-0.2083em"></span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal">h</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">t</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.8444em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2806em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> is a <strong>left fold</strong>, so it looks inherently sequential. However, consider the following operation on pairs <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo stretchy="false">(</mo><mi>a</mi><mo separator="true">,</mo><mi>b</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">(a, b)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord mathnormal">a</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal">b</span><span class="mclose">)</span></span></span></span>:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mo stretchy="false">(</mo><msub><mi>a</mi><mn>2</mn></msub><mo separator="true">,</mo><msub><mi>b</mi><mn>2</mn></msub><mo stretchy="false">)</mo><mo>∘</mo><mo stretchy="false">(</mo><msub><mi>a</mi><mn>1</mn></msub><mo separator="true">,</mo><msub><mi>b</mi><mn>1</mn></msub><mo stretchy="false">)</mo><mo>=</mo><mo stretchy="false">(</mo><msub><mi>a</mi><mn>2</mn></msub><msub><mi>a</mi><mn>1</mn></msub><mo separator="true">,</mo><mtext>&nbsp;</mtext><msub><mi>a</mi><mn>2</mn></msub><msub><mi>b</mi><mn>1</mn></msub><mo>+</mo><msub><mi>b</mi><mn>2</mn></msub><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">(a_2, b_2) \circ (a_1, b_1) = (a_2 a_1,\ a_2 b_1 + b_2)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 
mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">∘</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord"><span class="mord 
mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace">&nbsp;</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">a</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span 
class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord"><span class="mord mathnormal">b</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span>
<p>This operation is <strong>associative</strong>, so a prefix scan (Blelloch 1990) reduces the recurrence to <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>O</mi><mo stretchy="false">(</mo><mi>log</mi><mo>⁡</mo><mi>T</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">O(\log T)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.0278em">O</span><span class="mopen">(</span><span class="mop">lo<span style="margin-right:0.0139em">g</span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.1389em">T</span><span class="mclose">)</span></span></span></span> parallel steps. Martin &amp; Cundy (2017) and S5 (Smith et al. 2022) brought this observation to linear RNNs and SSMs. The Mamba kernel realizes the same idea on the GPU with <code>cub::BlockScan</code>.</p>
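<p>To make the associativity argument concrete, here is a small CPU sketch that computes the same recurrence in two ways: as a sequential left fold, and as an inclusive scan built only from the pair operation. The scan uses simple offset doubling (Hillis-Steele style) rather than the Blelloch algorithm or <code>cub::BlockScan</code>, because it is the shortest form to read. The names <code>Pair</code>, <code>combine</code>, <code>scan_sequential</code>, <code>scan_doubling</code> and <code>max_abs_diff</code> are hypothetical and do not appear in the Mamba source.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One step of the recurrence, viewed as the affine map h -> a * h + b.
struct Pair { float a; float b; };

// (a2, b2) o (a1, b1) = (a2 * a1, a2 * b1 + b2): apply `earlier` first, then `later`.
inline Pair combine(Pair later, Pair earlier) {
    return {later.a * earlier.a, later.a * earlier.b + later.b};
}

// Sequential left fold with h_{-1} = 0.
std::vector<float> scan_sequential(const std::vector<Pair>& e) {
    std::vector<float> h(e.size());
    float prev = 0.0f;
    for (std::size_t t = 0; t < e.size(); ++t) {
        prev = e[t].a * prev + e[t].b;
        h[t] = prev;
    }
    return h;
}

// Inclusive scan by offset doubling: ceil(log2 T) passes. Inside one pass every t
// is independent, so on a GPU each pass could be executed by T threads in parallel.
std::vector<float> scan_doubling(std::vector<Pair> cur) {
    const std::size_t T = cur.size();
    std::vector<Pair> next(T);
    for (std::size_t offset = 1; offset < T; offset *= 2) {
        for (std::size_t t = 0; t < T; ++t) {
            // cur[t] covers steps (t - offset, t]; cur[t - offset] covers the block before it.
            next[t] = (t >= offset) ? combine(cur[t], cur[t - offset]) : cur[t];
        }
        std::swap(cur, next);
    }
    std::vector<float> h(T);
    for (std::size_t t = 0; t < T; ++t) h[t] = cur[t].b;  // with h_{-1} = 0, h_t is the b component
    return h;
}

// Largest elementwise difference between two equally sized vectors.
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    float m = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) m = std::fmax(m, std::fabs(x[i] - y[i]));
    return m;
}
```

<p>Offset doubling does more total work than a work-efficient scan, but the number of dependent passes is what the argument above is about. In floating point the two orderings generally agree only up to rounding, so comparisons should use a tolerance.</p>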
<p>In the implementation, <code>thread_data[i]</code> holds exactly this <code>(a, b)</code> pair:</p>
<div class="language-cpp codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-cpp codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">// L221-222</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">thread_data</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">make_float2</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token function" style="color:#d73a49">exp2f</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">delta_vals</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">r</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> A_val</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">r</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token 
punctuation" style="color:#393A34">,</span><span class="token plain">                 </span><span class="token comment" style="color:#999988;font-style:italic">// a_i = exp(Δ A)</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token operator" style="color:#393A34">!</span><span class="token plain">kIsVariableB </span><span class="token operator" style="color:#393A34">?</span><span class="token plain"> delta_u_vals</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">r</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> B_vals</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> delta_u_vals</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">r</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">i</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic">// b_i = ΔB · u</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><br></div></code></pre></div></div>
<p><code>exp2f</code> appears here because <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">A</span></span></span></span> is multiplied by <code>LOG2E</code> once when it is loaded (the preprocessing at L174-179). Since <code>exp(x) = exp2(x * LOG2E)</code>, the result is mathematically unchanged, and <code>exp2f</code> is faster than <code>expf</code>.</p>
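<p>The identity behind this trick is easy to check on the CPU. The sketch below uses <code>std::exp2</code> and <code>std::exp</code> from the C++ standard library as stand-ins for the CUDA intrinsics; the helper names <code>prescale_A</code>, <code>a_bar_exp2</code>, <code>a_bar_exp</code> and the constant name <code>kLog2e</code> are hypothetical.</p>

```cpp
#include <cassert>
#include <cmath>

// log2(e). The value is the standard constant; the name is local to this sketch.
constexpr float kLog2e = 1.4426950408889634f;

// Done once per state element, when A is loaded.
inline float prescale_A(float A) { return A * kLog2e; }

// Done for every (timestep, state) pair: 2^(delta * A * log2 e) == e^(delta * A).
// delta is assumed to be non-negative here (for example a softplus output).
inline float a_bar_exp2(float delta, float A_scaled) {
    assert(delta >= 0.0f);
    return std::exp2(delta * A_scaled);
}

// Direct form, kept only for comparison.
inline float a_bar_exp(float delta, float A) { return std::exp(delta * A); }
```

<p>The multiplication by log2(e) is paid once per element of <code>A</code> instead of once per timestep. The two forms agree only up to floating-point rounding, which matters when comparing a port against a reference that uses <code>exp</code> directly.</p>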
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="カーネルの全体構造">Overall kernel structure<a href="https://createcentury.github.io/blog/1#%E3%82%AB%E3%83%BC%E3%83%8D%E3%83%AB%E3%81%AE%E5%85%A8%E4%BD%93%E6%A7%8B%E9%80%A0" class="hash-link" aria-label="Direct link to カーネルの全体構造" title="Direct link to カーネルの全体構造" translate="no">​</a></h2>
<p>The file splits into three main layers:</p>
<table><thead><tr><th>Role</th><th>Symbol</th><th>Lines</th></tr></thead><tbody><tr><td>Types and template constants</td><td><code>Selective_Scan_fwd_kernel_traits</code></td><td>L24-70</td></tr><tr><td>GPU kernel body</td><td><code>selective_scan_fwd_kernel</code></td><td>L72-308</td></tr><tr><td>Host-side launch</td><td><code>selective_scan_fwd_launch</code> / <code>..._cuda</code></td><td>L310-376</td></tr></tbody></table>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="スレッドブロックの並び">Thread/block layout<a href="https://createcentury.github.io/blog/1#%E3%82%B9%E3%83%AC%E3%83%83%E3%83%89%E3%83%96%E3%83%AD%E3%83%83%E3%82%AF%E3%81%AE%E4%B8%A6%E3%81%B3" class="hash-link" aria-label="Direct link to スレッド/ブロックの並び" title="Direct link to スレッド/ブロックの並び" translate="no">​</a></h3>
<div class="language-cpp codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-cpp codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">// L322</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">dim3 </span><span class="token function" style="color:#d73a49">grid</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">params</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">batch</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> params</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">dim </span><span class="token operator" style="color:#393A34">/</span><span class="token plain"> kNRows</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><br></div></code></pre></div></div>
<p>1つの CUDA ブロック = <code>(batch_id, dim_id)</code>。各ブロックは</p>
<ul>
<li class="">入力 <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>u</mi><mo separator="true">,</mo><mi mathvariant="normal">Δ</mi><mo>∈</mo><msup><mi mathvariant="double-struck">R</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">u, \Delta \in \mathbb{R}^{T}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal">u</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord">Δ</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8413em"></span><span class="mord"><span class="mord mathbb">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span></span> (1チャネル分)</li>
<li class="">重み <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>A</mi><mo>∈</mo><msup><mi mathvariant="double-struck">R</mi><mi>N</mi></msup></mrow><annotation encoding="application/x-tex">A \in \mathbb{R}^{N}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7224em;vertical-align:-0.0391em"></span><span class="mord mathnormal">A</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8413em"></span><span class="mord"><span class="mord mathbb">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.109em">N</span></span></span></span></span></span></span></span></span></span></span></span>、入力依存 <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>B</mi><mo separator="true">,</mo><mi>C</mi><mo>∈</mo><msup><mi mathvariant="double-struck">R</mi><mrow><mi>N</mi><mo>×</mo><mi>T</mi></mrow></msup></mrow><annotation encoding="application/x-tex">B, C \in \mathbb{R}^{N \times T}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8778em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0502em">B</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.0715em">C</span><span class="mspace" 
style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8413em"></span><span class="mord"><span class="mord mathbb">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.109em">N</span><span class="mbin mtight">×</span><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span></span></li>
</ul>
<p>The kernel reads these inputs and returns <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>y</mi><mo>∈</mo><msup><mi mathvariant="double-struck">R</mi><mi>T</mi></msup></mrow><annotation encoding="application/x-tex">y \in \mathbb{R}^{T}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7335em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.0359em">y</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8413em"></span><span class="mord"><span class="mord mathbb">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8413em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.1389em">T</span></span></span></span></span></span></span></span></span></span></span></span>. <code>kNThreads</code> switches between 32 and 128 depending on <code>seqlen</code>:</p>
<div class="language-cpp codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-cpp codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">// L353-364</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">params</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">seqlen </span><span class="token operator" style="color:#393A34">&lt;=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">128</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">  </span><span class="token generic-function function" style="color:#d73a49">launch</span><span class="token generic-function generic class-name operator" style="color:#393A34">&lt;</span><span class="token generic-function generic class-name number" style="color:#36acaa">32</span><span class="token generic-function generic class-name punctuation" style="color:#393A34">,</span><span class="token generic-function generic class-name">  </span><span class="token generic-function generic class-name number" style="color:#36acaa">4</span><span class="token generic-function generic class-name operator" style="color:#393A34">&gt;</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div 
class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">else</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">seqlen </span><span class="token operator" style="color:#393A34">&lt;=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">256</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">    </span><span class="token generic-function function" style="color:#d73a49">launch</span><span class="token generic-function generic class-name operator" style="color:#393A34">&lt;</span><span class="token generic-function generic class-name number" style="color:#36acaa">32</span><span class="token generic-function generic class-name punctuation" style="color:#393A34">,</span><span class="token generic-function generic class-name">  </span><span class="token generic-function generic class-name number" style="color:#36acaa">8</span><span class="token generic-function generic class-name operator" style="color:#393A34">&gt;</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">else</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">seqlen </span><span class="token operator" style="color:#393A34">&lt;=</span><span class="token plain"> </span><span class="token number" 
style="color:#36acaa">512</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">    </span><span class="token generic-function function" style="color:#d73a49">launch</span><span class="token generic-function generic class-name operator" style="color:#393A34">&lt;</span><span class="token generic-function generic class-name number" style="color:#36acaa">32</span><span class="token generic-function generic class-name punctuation" style="color:#393A34">,</span><span class="token generic-function generic class-name"> </span><span class="token generic-function generic class-name number" style="color:#36acaa">16</span><span class="token generic-function generic class-name operator" style="color:#393A34">&gt;</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">else</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">if</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">seqlen </span><span class="token operator" style="color:#393A34">&lt;=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1024</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain">   </span><span class="token generic-function function" style="color:#d73a49">launch</span><span class="token generic-function generic class-name operator" style="color:#393A34">&lt;</span><span class="token generic-function generic class-name number" style="color:#36acaa">64</span><span class="token generic-function generic class-name punctuation" style="color:#393A34">,</span><span 
class="token generic-function generic class-name"> </span><span class="token generic-function generic class-name number" style="color:#36acaa">16</span><span class="token generic-function generic class-name operator" style="color:#393A34">&gt;</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">else</span><span class="token plain">                       </span><span class="token generic-function function" style="color:#d73a49">launch</span><span class="token generic-function generic class-name operator" style="color:#393A34">&lt;</span><span class="token generic-function generic class-name number" style="color:#36acaa">128</span><span class="token generic-function generic class-name punctuation" style="color:#393A34">,</span><span class="token generic-function generic class-name"> </span><span class="token generic-function generic class-name number" style="color:#36acaa">16</span><span class="token generic-function generic class-name operator" style="color:#393A34">&gt;</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><br></div></code></pre></div></div>
<p>With a short sequence, launching many threads lets the overhead dominate, so the launch configuration is tuned per sequence-length tier.</p>
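<p>The dispatch above can be restated as a small lookup. This is only a sketch: <code>LaunchConfig</code>, <code>pick_launch_config</code> and <code>elements_per_pass</code> are hypothetical names introduced here, and the real kernel selects these values as compile-time template parameters rather than at runtime.</p>

```cpp
// Hypothetical runtime mirror of the seqlen dispatch shown above.
// The real kernel passes these as template arguments (kNThreads, kNItems).
struct LaunchConfig {
    int n_threads;  // corresponds to kNThreads
    int n_items;    // corresponds to kNItems (time steps handled per thread)
};

inline LaunchConfig pick_launch_config(int seqlen) {
    if (seqlen <= 128)  return {32, 4};
    if (seqlen <= 256)  return {32, 8};
    if (seqlen <= 512)  return {32, 16};
    if (seqlen <= 1024) return {64, 16};
    return {128, 16};
}

// Number of time steps one block covers in a single pass.
inline int elements_per_pass(LaunchConfig cfg) {
    return cfg.n_threads * cfg.n_items;
}
```

<p>For every tier up to 1024 the product <code>n_threads * n_items</code> equals the tier's upper bound (128, 256, 512, 1024), so those sequences fit in a single pass. Only the last tier, which covers 2048 steps per pass, has to iterate.</p>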
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="チャンク化">Chunking<a href="https://createcentury.github.io/blog/1#%E3%83%81%E3%83%A3%E3%83%B3%E3%82%AF%E5%8C%96" class="hash-link" aria-label="Direct link to Chunking" title="Direct link to Chunking" translate="no">​</a></h3>
<p>The sequence length that a block processes in one iteration is</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mtext>kChunkSize</mtext><mo>=</mo><mtext>kNThreads</mtext><mo>×</mo><mtext>kNItems</mtext></mrow><annotation encoding="application/x-tex">\text{kChunkSize} = \text{kNThreads} \times \text{kNItems}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord text"><span class="mord">kChunkSize</span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.7778em;vertical-align:-0.0833em"></span><span class="mord text"><span class="mord">kNThreads</span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord text"><span class="mord">kNItems</span></span></span></span></span></span>
<p>That is at most 2048 tokens (128×16). When <code>seqlen</code> exceeds this, the sequence is split into chunks:</p>
<div class="language-cpp codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-cpp codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">// L137</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token keyword" style="color:#00009f">int</span><span class="token plain"> chunk </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"> chunk </span><span class="token operator" style="color:#393A34">&lt;</span><span class="token plain"> params</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">n_chunks</span><span class="token punctuation" style="color:#393A34">;</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">++</span><span class="token plain">chunk</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>
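<p>This loop walks the chunks in order. The state is carried across chunk boundaries by <code>smem_running_prefix</code> (L100, L244-247, L257-258): the last prefix of the scan is saved to shared memory and read back as the initial prefix of the next chunk. This avoids a round trip of the state through HBM, which is the core of what <strong>hardware-aware</strong> means here.</p>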
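<p>A minimal CPU sketch of the carry across chunks, assuming the scan element is a pair <code>(a, b)</code> that stands for the affine map <code>h → a·h + b</code>. The names <code>ScanElem</code>, <code>combine</code> and <code>chunked_scan</code> are hypothetical, and the inner loop is a sequential stand-in for the block-level parallel scan, not the kernel's implementation.</p>

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One scan element: the affine map h -> a * h + b.
struct ScanElem {
    float a;
    float b;
};

// Compose two maps: apply lhs first, then rhs. The composition is
// associative, which is what permits a parallel scan on the GPU.
inline ScanElem combine(ScanElem lhs, ScanElem rhs) {
    return {rhs.a * lhs.a, rhs.a * lhs.b + rhs.b};
}

// Chunked inclusive scan starting from h = 0. chunk_size must be > 0.
// running_prefix plays the role of smem_running_prefix: it is the only
// value that survives from one chunk iteration to the next.
inline std::vector<float> chunked_scan(const std::vector<ScanElem>& elems,
                                       std::size_t chunk_size) {
    std::vector<float> h(elems.size());
    ScanElem running_prefix{1.0f, 0.0f};  // identity map
    for (std::size_t start = 0; start < elems.size(); start += chunk_size) {
        const std::size_t end = std::min(start + chunk_size, elems.size());
        ScanElem acc = running_prefix;      // seed from the previous chunk
        for (std::size_t t = start; t < end; ++t) {
            acc = combine(acc, elems[t]);   // sequential stand-in for the block scan
            h[t] = acc.b;                   // state after step t
        }
        running_prefix = acc;               // carry into the next chunk
    }
    return h;
}
```

<p>Because only <code>running_prefix</code> crosses a chunk boundary, the result does not depend on the chunk size. Calling <code>chunked_scan(elems, 2)</code> and <code>chunked_scan(elems, 2048)</code> on the same input gives the same states.</p>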
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="1チャンク内の処理フロー">Processing flow within one chunk<a href="https://createcentury.github.io/blog/1#1%E3%83%81%E3%83%A3%E3%83%B3%E3%82%AF%E5%86%85%E3%81%AE%E5%87%A6%E7%90%86%E3%83%95%E3%83%AD%E3%83%BC" class="hash-link" aria-label="Direct link to Processing flow within one chunk" title="Direct link to Processing flow within one chunk" translate="no">​</a></h2>
<div class="language-text codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-text codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token plain">1. load_input: coalesced load of u and delta</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">2. apply delta_softplus → delta_vals</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">3. delta_u_vals = delta * u, out_vals = D * u (skip connection)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">4. for state_idx in [0, dstate):</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">     a. read A_val (already multiplied by LOG2E)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">     b. read B_val, C_val (BlockLoad if selective, direct read if constant)</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">     c. thread_data = (exp2f(Δ A), ΔB u)  ← input tuple for the scan</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">     d. InclusiveScan(SSMScanOp) via cub::BlockScan</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        → carry running_prefix</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">     e. out_vals += scan_output.y * C</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">5. store_output: write y</span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">6. (optional) kHasZ: out *= z * sigmoid(z)  ← SwiGLU-style gate</span><br></div></code></pre></div></div>
<p>Note that the state dimension <code>dstate</code> (N) is the outer <code>for</code> loop. The parallel scan runs along the <strong>time axis</strong>, while the state dimension is processed sequentially. This works because <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>A</mi></mrow><annotation encoding="application/x-tex">A</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal">A</span></span></span></span> is diagonal, so each state component evolves independently of the others (the payoff of a diagonal SSM).</p>
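<p>The loop structure can be written down as a sequential CPU reference for a single channel. This is a sketch under stated assumptions, not the kernel: the function name <code>selective_scan_ref</code> is hypothetical, <code>B</code> and <code>C</code> are assumed to be stored row-major as <code>[N][T]</code>, <code>delta</code> is assumed to have softplus already applied, and plain <code>exp</code> is used instead of the <code>exp2f</code> form.</p>

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sequential reference for one channel.
// u, delta: T entries. A: N entries (the diagonal). B, C: N*T entries,
// row-major [N][T] (layout assumed for this sketch). D: skip weight.
inline std::vector<float> selective_scan_ref(
    const std::vector<float>& u, const std::vector<float>& delta,
    const std::vector<float>& A, const std::vector<float>& B,
    const std::vector<float>& C, float D) {
    const std::size_t T = u.size();
    const std::size_t N = A.size();
    std::vector<float> y(T);
    for (std::size_t t = 0; t < T; ++t) {
        y[t] = D * u[t];  // skip connection
    }
    for (std::size_t n = 0; n < N; ++n) {      // state dimension: sequential
        float h = 0.0f;                        // each state has its own scalar h
        for (std::size_t t = 0; t < T; ++t) {  // time axis: the part the GPU scans in parallel
            const float a = std::exp(delta[t] * A[n]);
            const float b = delta[t] * B[n * T + t] * u[t];
            h = a * h + b;
            y[t] += C[n * T + t] * h;
        }
    }
    return y;
}
```

<p>No iteration of the outer loop reads a value written by another iteration except through the accumulation into <code>y</code>, which is the independence that a diagonal <code>A</code> provides.</p>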
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="共有メモリ設計">Shared memory design<a href="https://createcentury.github.io/blog/1#%E5%85%B1%E6%9C%89%E3%83%A1%E3%83%A2%E3%83%AA%E8%A8%AD%E8%A8%88" class="hash-link" aria-label="Direct link to Shared memory design" title="Direct link to Shared memory design" translate="no">​</a></h2>
<p><code>Selective_Scan_fwd_kernel_traits::kSmemSize</code> (L63-69) is the combined size of</p>
<ul>
<li class="">the TempStorage for BlockLoad/Store (reused union-style)</li>
<li class="">the TempStorage for BlockScan</li>
</ul>
<p>On top of that, the launch code appends <code>MAX_DSTATE * sizeof(scan_t) * kNRows</code> after <code>kSmemSize</code> to reserve the area for the running prefix:</p>
<div class="language-cpp codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-cpp codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token comment" style="color:#999988;font-style:italic">// L321</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">kSmemSize </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Ktraits</span><span class="token double-colon punctuation" style="color:#393A34">::</span><span class="token plain">kSmemSize </span><span class="token operator" style="color:#393A34">+</span><span class="token plain"> kNRows </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> MAX_DSTATE </span><span class="token operator" style="color:#393A34">*</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">sizeof</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">scan_t</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">;</span><br></div></code></pre></div></div>
<p>If the total exceeds 48 KB, <code>cudaFuncSetAttribute</code> is called to raise the dynamic shared memory limit (L331-340).</p>
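<p>The size arithmetic can be checked on the host with a small helper. This is a sketch only: <code>total_smem_bytes</code> and <code>needs_smem_opt_in</code> are hypothetical names, the comparison follows the wording above ("exceeds 48 KB"), and any concrete numbers passed in (for example a <code>MAX_DSTATE</code> of 256 or an 8-byte <code>scan_t</code>) are assumptions rather than values confirmed from the source.</p>

```cpp
#include <cstddef>

// Total dynamic shared memory: the traits' TempStorage plus one running
// prefix slot per (row, state) pair.
inline std::size_t total_smem_bytes(std::size_t traits_smem_bytes,
                                    std::size_t n_rows,
                                    std::size_t max_dstate,
                                    std::size_t scan_elem_bytes) {
    return traits_smem_bytes + n_rows * max_dstate * scan_elem_bytes;
}

// True when the total is above the 48 KB mentioned in the text, i.e. when
// the launch code would have to raise the limit before launching.
inline bool needs_smem_opt_in(std::size_t total_bytes) {
    return total_bytes > 48 * 1024;
}
```

<p>With the assumed values, the running prefix area is 256 × 8 = 2048 bytes per row, which is small next to the 48 KB threshold. Whether the opt-in path triggers is therefore decided mostly by the TempStorage size.</p>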
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="主要な最適化テクニック">Key optimization techniques<a href="https://createcentury.github.io/blog/1#%E4%B8%BB%E8%A6%81%E3%81%AA%E6%9C%80%E9%81%A9%E5%8C%96%E3%83%86%E3%82%AF%E3%83%8B%E3%83%83%E3%82%AF" class="hash-link" aria-label="Direct link to Key optimization techniques" title="Direct link to Key optimization techniques" translate="no">​</a></h2>
<ul>
<li class=""><strong><code>exp2f</code> + LOG2E preprocessing</strong>: floating-point exponentials use <code>exp2f</code> instead of <code>expf</code>. Multiplying <code>A</code> by LOG2E once is enough to cover every step</li>
<li class=""><strong>WARP_TRANSPOSE BlockLoad</strong>: strided accesses are transposed per warp so that the loads coalesce</li>
<li class=""><strong>WARP_SCANS BlockScan</strong>: uses the warp-level parallel scan (faster than RAKING; the comment at L60-61 keeps the other options)</li>
<li class=""><strong>kIsEvenLen branch</strong>: when the sequence length is divisible by the chunk size, the load switches to <code>BLOCK_LOAD_DIRECT</code> (L47-59)</li>
<li class=""><strong>Custom <code>cexp2f</code> for complex numbers</strong>: PyTorch's complex exp, which goes through thrust, is slow, so the kernel uses its own implementation (L229)</li>
<li class=""><strong>Compile-time branching on <code>kIsVariableB/C</code></strong>: removes the unnecessary BlockLoad in the non-selective (LTI) case (L186-212)</li>
<li class=""><strong><code>__launch_bounds__</code></strong>: <code>kMinBlocks=3 or 5</code> states the target occupancy explicitly (L33, L73)</li>
</ul>
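<p>The first item rests on the identity exp(x) = 2^(x·log2 e). A minimal host-side sketch, with hypothetical function names and the standard <code>std::exp</code> / <code>std::exp2</code> standing in for the device intrinsics:</p>

```cpp
#include <cmath>

constexpr float kLog2e = 1.4426950408889634f;  // log2(e)

// Direct form: one natural exponential per time step.
inline float decay_exp(float delta, float A) {
    return std::exp(delta * A);
}

// Pre-scale A once, outside the time loop.
inline float prescale_A(float A) {
    return A * kLog2e;
}

// Per-step form: a base-2 exponential of delta * (A * log2(e)).
inline float decay_exp2(float delta, float A_scaled) {
    return std::exp2(delta * A_scaled);
}
```

<p>The two forms agree up to floating-point rounding. The multiplication by <code>kLog2e</code> is paid once per <code>A</code> entry, while the exponential is evaluated at every time step.</p>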
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="観察と疑問">Observations and open questions<a href="https://createcentury.github.io/blog/1#%E8%A6%B3%E5%AF%9F%E3%81%A8%E7%96%91%E5%95%8F" class="hash-link" aria-label="Direct link to Observations and open questions" title="Direct link to Observations and open questions" translate="no">​</a></h2>
<ul>
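<li class="">Only <code>kNRows == 1</code> has been verified on real hardware (L312-314). There is room to process several dims in one block and reuse the loaded data, but this is unexplored so far</li>
<li class="">The <code>delta_softplus</code> threshold <code>&lt;= 20.f</code> (L160): a shortcut that guards against floating-point overflow</li>
<li class="">The value of <code>MAX_DSTATE</code> should be defined in <code>selective_scan.h</code> (unknown until that file is read). It sets the upper bound on the state dimension</li>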
</ul>
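<p>The thresholded softplus from the second item can be sketched as follows. The function name is hypothetical and the default threshold of 20 is taken from the text above.</p>

```cpp
#include <cmath>

// softplus(x) = log(1 + exp(x)). Above the threshold the function returns
// x itself, so exp is never evaluated on a large argument.
inline float softplus_thresholded(float x, float threshold = 20.0f) {
    return x <= threshold ? std::log1p(std::exp(x)) : x;
}
```

<p>At x = 20 the exact value of log(1 + e^x) differs from x by roughly 2e-9, which is below single-precision resolution at that magnitude, so the shortcut loses nothing. It also keeps <code>exp</code> away from arguments near 88.7, where a single-precision result would overflow to infinity.</p>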
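<p>Mamba's architecture itself is nothing more than SSM + selectivity, but the paper's claim of "practical speed in linear time" only holds because of three properties of this kernel: <strong>chunking × associative scan × state kept in SRAM</strong>.</p>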
<hr>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="参考文献">References<a href="https://createcentury.github.io/blog/1#%E5%8F%82%E8%80%83%E6%96%87%E7%8C%AE" class="hash-link" aria-label="Direct link to References" title="Direct link to References" translate="no">​</a></h2>
<ul>
<li class="">Albert Gu, Tri Dao. "<a href="https://arxiv.org/abs/2312.00752" target="_blank" rel="noopener noreferrer" class="">Mamba: Linear-Time Sequence Modeling with Selective State Spaces</a>" arXiv:2312.00752, 2023.</li>
<li class="">Guy E. Blelloch. "<a href="https://www.cs.cmu.edu/~guyb/papers/Ble93.pdf" target="_blank" rel="noopener noreferrer" class="">Prefix Sums and Their Applications</a>" Technical Report CMU-CS-90-190, 1993.</li>
<li class="">Eric Martin, Chris Cundy. "<a href="https://arxiv.org/abs/1709.04057" target="_blank" rel="noopener noreferrer" class="">Parallelizing Linear Recurrent Neural Nets Over Sequence Length</a>" arXiv:1709.04057, 2017.</li>
<li class="">Jimmy T.H. Smith, Andrew Warrington, Scott W. Linderman. "<a href="https://arxiv.org/abs/2208.04933" target="_blank" rel="noopener noreferrer" class="">Simplified State Space Layers for Sequence Modeling</a>" arXiv:2208.04933, 2022.</li>
<li class="">Wikipedia. "<a href="https://en.wikipedia.org/wiki/Leaky_integrator" target="_blank" rel="noopener noreferrer" class="">Leaky integrator</a>"</li>
<li class="">Wikipedia. "<a href="https://en.wikipedia.org/wiki/Zero-order_hold" target="_blank" rel="noopener noreferrer" class="">Zero-order hold</a>"</li>
</ul>
<hr>
<p><em>Created: 2026-05-11 / Last updated: 2026-05-14</em></p>]]></content:encoded>
            <category>Machine Learning</category>
            <category>SSM</category>
            <category>CUDA</category>
        </item>
    </channel>
</rss>