Skip to content

perf(format): cache compact row layout per nested slot#3717

Open
stevenschlansker wants to merge 2 commits into
apache:mainfrom
stevenschlansker:fory-perf-compact-row-layout
Open

perf(format): cache compact row layout per nested slot#3717
stevenschlansker wants to merge 2 commits into
apache:mainfrom
stevenschlansker:fory-perf-compact-row-layout

Conversation

@stevenschlansker
Copy link
Copy Markdown
Contributor

@stevenschlansker stevenschlansker commented May 30, 2026

Why?

Avoid redundant re-computation of compact codec offsets layout, saving cpu and memory allocations

and

fix(format): honor input offset when copying inline-struct bytes

What does this PR do?

perf(format): cache compact row layout per nested slot

Hoist offsets, widths, and nullability into CompactRowLayout, built
once per encoder. getStruct does no per-call lookup; inline-struct
reads go directly to the parent buffer; the writer reads schema-
derived state from the cache; compact rows skip the unused extData[]
allocation. For List<Struct> fields the parent layout also carries
the element layout so getArray(i) does not rebuild it.

Fix a latent inline-width bug in CompactBinaryRowWriter that was
masked by the prior slicing getStruct: writeUnaligned and
writeAlignedBytes dropped the input offset, so re-emitting a child
row via BinaryWriter.write(int, BinaryRow) copied the wrong bytes.

Related issues

AI Contribution Checklist

  • Substantial AI assistance was used in this PR: yes / no

AI Usage Disclosure

  • substantial_ai_assistance: yes
  • scope: all
  • affected_files_or_subsystems: Java row format
  • ai_review: ✔️
  • ai_review_artifacts:

One commit, perf(format): cache compact row layout per nested slot. Caches fixedOffsets, per-field fixedWidths, allFieldsNotNullable, and bitmapWidthInBytes in a new CompactRowLayout value, computed once per array (element type is fixed) or memoized per parent extData slot.
The getStruct hot path previously rebuilt fixedOffsets and re-walked allNotNullable on every call.
Recommendation: Approve. Numbers in the commit message match a real bottleneck (the int[] + allNotNullable walk per nested decode), the implementation preserves the existing wire behavior and ownership model, no public API regression, all format tests pass, and the new caching follows the existing extData-slot pattern rather than inventing a new mechanism.
— Reviewed by Claude (Opus 4.7), 2026-05-29

Skill used: I ran this through the fory-code-review skill, which delegated the diff inspection to a read-only review subagent loaded with the Fory review checklist, the lesson-derived red flags, and the Java runtime rules.
Verdict
No blockers. Recommend submission. Both commits are coherent and tested:

  • d5cbdc2 is a real, narrow bugfix: the three put/copyFrom(..., 0, numBytes) sites in CompactBinaryRowWriter were copying from the start of the source buffer instead of from baseOffset. After schema sorting, nested inline structs land at non-zero offsets, so this corrupted writes for any non-trivial schema. CompactCodecTest.testCopyInlineStructViaWriter exercises that exact failure mode. Wire format unchanged — only the source byte range was wrong.
  • 2b1825d precomputes a CompactRowLayout once per writer instead of recomputing field offsets and nested layouts on every write. Cache is per-CompactBinaryRowWriter instance, no map, no growth, no thread-safety surface (writers are already not thread-safe). CompactRowLayoutTest pins identity via assertSame at multiple nesting levels and covers the allFieldsNotNullable bitmap-skip path. CompactBinaryRow.getStruct also drops a per-call buffer.slice() allocation by pointing to the parent buffer at the right offset instead.
  • human_verification: ✔️
  • performance_verification: ✔️
  • provenance_license_confirmation: ✔️

Does this PR introduce any user-facing change?

Compact codec user enjoys more CPU and memory on business logic without even realizing it.
No public API or wire changes.

Benchmark

    RowFormatAllocationProbe bytes/op vs apache/main, compact mode,
    median of five:
      root    100208 -> 74305 (-25.9%)
      array     9032 ->  6629 (-26.6%)
      matrix   72005 -> 52760 (-26.7%)
      map       7744 ->  6401 (-17.3%)
      encode   55423 -> 41762 (-24.6%)
    Standard mode unchanged within noise.

CompactBinaryRowWriter.writeUnaligned(byte[]) / writeUnaligned(MemoryBuffer)
/ writeAlignedBytes(MemoryBuffer) passed 0 as the source offset on the
inline-fixed-width branch instead of the caller's offset/baseOffset. Any
copy from a non-zero source offset (e.g. re-emitting a struct read from
a sorted-schema row whose inline field sits past byte 0) wrote the wrong
bytes. Fall back to the caller's offset in all three methods.
@stevenschlansker stevenschlansker force-pushed the fory-perf-compact-row-layout branch from 4727dc7 to 7c138fe Compare May 30, 2026 05:39
CompactRowLayout precomputes per-schema fixed offsets, fixed widths,
bitmap width, and child layouts, and is shared across all
CompactBinaryRow instances built from the same schema. The writer holds
the cached layout and exposes it via BaseBinaryRowWriter.newRow(), so
each BinaryRowEncoder.decode call allocates a fresh BinaryRow over the
already-built layout tree rather than rebuilding it. A fresh row per
decode keeps lazy interface-backed decoders safe: those wrap and retain
the source BinaryRow, so reusing one row across decode() calls would
silently alias every previously returned proxy onto the next payload.
The unused Encoding.newRow(Schema) entry point is removed along with
the now-dead codecFactory field on BinaryRowEncoder.

Also drops the per-inline-struct getBuffer().slice(...) allocation in
CompactBinaryRow.getStruct by pointing the nested row directly at the
parent buffer, since all nested reads already add baseOffset.
@stevenschlansker stevenschlansker force-pushed the fory-perf-compact-row-layout branch from 7c138fe to 6292e96 Compare May 30, 2026 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant