perf(format): cache compact row layout per nested slot by stevenschlansker · Pull Request #3717 · apache/fory

stevenschlansker · 2026-05-30T05:34:45Z

Why?

Avoid redundant re-computation of compact codec offsets layout, saving cpu and memory allocations

and

fix(format): honor input offset when copying inline-struct bytes

What does this PR do?

perf(format): cache compact row layout per nested slot

Hoist offsets, widths, and nullability into CompactRowLayout, built
once per encoder. getStruct does no per-call lookup; inline-struct
reads go directly to the parent buffer; the writer reads schema-
derived state from the cache; compact rows skip the unused extData[]
allocation. For List<Struct> fields the parent layout also carries
the element layout so getArray(i) does not rebuild it.

Fix a latent inline-width bug in CompactBinaryRowWriter that was
masked by the prior slicing getStruct: writeUnaligned and
writeAlignedBytes dropped the input offset, so re-emitting a child
row via BinaryWriter.write(int, BinaryRow) copied the wrong bytes.

Related issues

AI Contribution Checklist

Substantial AI assistance was used in this PR: yes / no

AI Usage Disclosure

substantial_ai_assistance: yes
scope: all
affected_files_or_subsystems: Java row format
ai_review: ✔️
ai_review_artifacts:

One commit, perf(format): cache compact row layout per nested slot. Caches fixedOffsets, per-field fixedWidths, allFieldsNotNullable, and bitmapWidthInBytes in a new CompactRowLayout value, computed once per array (element type is fixed) or memoized per parent extData slot.
The getStruct hot path previously rebuilt fixedOffsets and re-walked allNotNullable on every call.
Recommendation: Approve. Numbers in the commit message match a real bottleneck (the int[] + allNotNullable walk per nested decode), the implementation preserves the existing wire behavior and ownership model, no public API regression, all format tests pass, and the new caching follows the existing extData-slot pattern rather than inventing a new mechanism.
— Reviewed by Claude (Opus 4.7), 2026-05-29

Skill used: I ran this through the fory-code-review skill, which delegated the diff inspection to a read-only review subagent loaded with the Fory review checklist, the lesson-derived red flags, and the Java runtime rules.
Verdict
No blockers. Recommend submission. Both commits are coherent and tested:

d5cbdc2 is a real, narrow bugfix: the three put/copyFrom(..., 0, numBytes) sites in CompactBinaryRowWriter were copying from the start of the source buffer instead of from baseOffset. After schema sorting, nested inline structs land at non-zero offsets, so this corrupted writes for any non-trivial schema. CompactCodecTest.testCopyInlineStructViaWriter exercises that exact failure mode. Wire format unchanged — only the source byte range was wrong.

2b1825d precomputes a CompactRowLayout once per writer instead of recomputing field offsets and nested layouts on every write. Cache is per-CompactBinaryRowWriter instance, no map, no growth, no thread-safety surface (writers are already not thread-safe). CompactRowLayoutTest pins identity via assertSame at multiple nesting levels and covers the allFieldsNotNullable bitmap-skip path. CompactBinaryRow.getStruct also drops a per-call buffer.slice() allocation by pointing to the parent buffer at the right offset instead.

human_verification: ✔️
performance_verification: ✔️
provenance_license_confirmation: ✔️

Does this PR introduce any user-facing change?

Compact codec user enjoys more CPU and memory on business logic without even realizing it.
No public API or wire changes.

Benchmark

    RowFormatAllocationProbe bytes/op vs apache/main, compact mode,
    median of five:
      root    100208 -> 74305 (-25.9%)
      array     9032 ->  6629 (-26.6%)
      matrix   72005 -> 52760 (-26.7%)
      map       7744 ->  6401 (-17.3%)
      encode   55423 -> 41762 (-24.6%)
    Standard mode unchanged within noise.

CompactBinaryRowWriter.writeUnaligned(byte[]) / writeUnaligned(MemoryBuffer) / writeAlignedBytes(MemoryBuffer) passed 0 as the source offset on the inline-fixed-width branch instead of the caller's offset/baseOffset. Any copy from a non-zero source offset (e.g. re-emitting a struct read from a sorted-schema row whose inline field sits past byte 0) wrote the wrong bytes. Fall back to the caller's offset in all three methods.

CompactRowLayout precomputes per-schema fixed offsets, fixed widths, bitmap width, and child layouts, and is shared across all CompactBinaryRow instances built from the same schema. The writer holds the cached layout and exposes it via BaseBinaryRowWriter.newRow(), so each BinaryRowEncoder.decode call allocates a fresh BinaryRow over the already-built layout tree rather than rebuilding it. A fresh row per decode keeps lazy interface-backed decoders safe: those wrap and retain the source BinaryRow, so reusing one row across decode() calls would silently alias every previously returned proxy onto the next payload. The unused Encoding.newRow(Schema) entry point is removed along with the now-dead codecFactory field on BinaryRowEncoder. Also drops the per-inline-struct getBuffer().slice(...) allocation in CompactBinaryRow.getStruct by pointing the nested row directly at the parent buffer, since all nested reads already add baseOffset.

stevenschlansker requested a review from chaokunyang as a code owner May 30, 2026 05:34

stevenschlansker force-pushed the fory-perf-compact-row-layout branch from 4727dc7 to 7c138fe Compare May 30, 2026 05:39

stevenschlansker force-pushed the fory-perf-compact-row-layout branch from 7c138fe to 6292e96 Compare May 30, 2026 05:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(format): cache compact row layout per nested slot#3717

perf(format): cache compact row layout per nested slot#3717
stevenschlansker wants to merge 2 commits into
apache:mainfrom
stevenschlansker:fory-perf-compact-row-layout

stevenschlansker commented May 30, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

stevenschlansker commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why?

What does this PR do?

Related issues

AI Contribution Checklist

Does this PR introduce any user-facing change?

Benchmark

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

stevenschlansker commented May 30, 2026 •

edited

Loading