Field notes on model architecture, inference systems, and agent execution.
A running collection of bilingual writing on transformer principles, GPU inference structure, runtime harnesses, and other implementation-critical topics.
Principles of the Transformer Architecture
A practical reading of self-attention, token mixing, and the structural reasons the Transformer became the foundation of modern language models.
Deeper Research on Transformer Architecture
Look past the headline design and examine scaling, positional schemes, efficiency variants, and the research directions that made the Transformer more useful in practice.
NVIDIA H100 Architecture and Why Inference Needs LPX
A system-level look at H100: compute units, memory hierarchy, and why inference workloads depend on a carefully engineered low-precision execution path.
Research on Harnesses for Agent Systems
Why agent systems need a harness layer: execution control, tool isolation, observability, retries, and the boundary between model reasoning and production software.