Yuning AI
Technical Blog

Field notes on model architecture, inference systems, and agent execution.

A running collection of bilingual writing on transformer principles, GPU inference structure, runtime harnesses, and other implementation-critical topics.

Model Principles

Principles of the Transformer Architecture

A practical reading of self-attention, token mixing, and the structural reasons the Transformer became the foundation of modern language models.

April 22, 2026 · 7 min read
Transformer, Attention, LLM Foundations
Architecture Research

Deeper Research on Transformer Architecture

Look past the headline design and examine scaling, positional schemes, efficiency variants, and the research directions that made the Transformer more useful in practice.

April 21, 2026 · 8 min read
Scaling, Positional Encoding, Architecture Research
Inference Infrastructure

NVIDIA H100 Architecture and Why Inference Needs LPX

A system-level look at H100: compute units, memory hierarchy, and why inference workloads depend on a carefully engineered low-precision execution path.

April 20, 2026 · 8 min read
NVIDIA H100, Inference, LPX
Agent Engineering

Research on Harness for Agent Systems

Why agent systems need a harness layer: execution control, tool isolation, observability, retries, and the boundary between model reasoning and production software.

April 19, 2026 · 7 min read
Harness, Agents, Runtime
Yuning AI

A focused AI engineering company delivering model interfaces, agent systems, and industry-ready software.

京ICP备2026030115号-1 · 京公网安备11010802048803号