
Add automatic latency sensitivity inference#1618

Merged
rapids-bot[bot] merged 16 commits into NVIDIA:develop from dnandakumar-nv:distributed-obs
Feb 20, 2026

Conversation

@dnandakumar-nv
Contributor

@dnandakumar-nv dnandakumar-nv commented Feb 20, 2026

Summary

Extends the prediction trie to automatically compute and assign latency sensitivity scores to each LLM call position in an agentic workflow. During profiling, the system analyzes execution traces to determine which calls are most latency-critical and stores the computed sensitivity alongside existing predictions. At runtime, _DynamoTransport injects the auto-computed sensitivity for calls that lack a manual @latency_sensitive decorator — zero developer annotation required.

Motivation

Today, latency sensitivity is manually assigned via @latency_sensitive(n) decorators. This requires developers to reason about which parts of their workflow matter most for latency — something the profiler already has the data to determine automatically. The prediction trie already stores per-call-position metrics (remaining_calls, interarrival_ms, output_tokens) and feeds them into Dynamo routing hints. Adding latency sensitivity to this pipeline closes the loop: profile once, get intelligent routing automatically.

Approach

Three signals are computed per call position across all profiled traces, normalized to [0, 1], and combined with configurable weights:

  • Critical path weight (default w_critical=0.5): call_duration / workflow_duration — calls that consume a larger fraction of total wall-clock time score higher.
  • Downstream fan-out (default w_fanout=0.3): Remaining LLM calls after this one, normalized by max. A planning call that gates many downstream calls scores higher.
  • User-facing position (default w_position=0.2): U-shaped weighting that boosts the first call (time-to-first-activity) and last call (time-to-final-answer).

The composite score is mapped to an integer in [1, sensitivity_scale] and stored in LLMCallPrediction.latency_sensitivity. At runtime, the auto value is used only when no manual @latency_sensitive decorator is active — manual always wins.
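For reference, here is a minimal sketch of the scoring math described above. The helper names, the exact normalization, and the shape of the U-curve are illustrative assumptions; only the three signals, the default weights, and the final clamped integer mapping come from this PR.

```python
import math


def composite_score(call_duration_s: float,
                    workflow_duration_s: float,
                    remaining_calls: int,
                    max_remaining: int,
                    call_index: int,
                    last_index: int,
                    w_critical: float = 0.5,
                    w_fanout: float = 0.3,
                    w_position: float = 0.2) -> float:
    """Combine the three normalized signals into a score in [0, 1]."""
    # Critical path weight: fraction of total wall-clock time this call consumes.
    critical = call_duration_s / workflow_duration_s if workflow_duration_s > 0 else 0.0
    # Downstream fan-out: remaining LLM calls after this one, normalized by the max.
    fanout = remaining_calls / max_remaining if max_remaining > 0 else 0.0
    # U-shaped position signal (assumed form): 1.0 at the first and last call,
    # dipping to 0.5 at the midpoint of the workflow.
    t = call_index / last_index if last_index > 0 else 0.0
    position = max(1.0 - t, t)
    return w_critical * critical + w_fanout * fanout + w_position * position


def score_to_sensitivity(mean_score: float, scale: int = 5) -> int:
    """Map a mean score in [0, 1] to an integer in [1, scale], rounding half up."""
    return max(1, min(scale, math.floor(mean_score * (scale - 1) + 0.5) + 1))
```

The half-up rounding via `math.floor(x + 0.5)` mirrors a review fix later in this thread: the built-in `round()` uses banker's rounding, which would send .5 midpoints to the nearest even bucket.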

Changes

Data model (nvidia_nat_core)

  • Added latency_sensitivity: int | None field to LLMCallPrediction
  • Backward-compatible serialization/deserialization (old trie JSON files load with None default)
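A rough sketch of the resulting model shape; the sibling fields are inferred from the metrics named in the Motivation section, not copied from the PR diff:

```python
from pydantic import BaseModel


class LLMCallPrediction(BaseModel):
    # Existing per-call-position metrics (field names assumed from the PR description).
    remaining_calls: int
    interarrival_ms: float
    output_tokens: int
    # New in this PR: auto-computed score; None for tries built before this change.
    latency_sensitivity: int | None = None
```

Because the field defaults to None, old prediction_trie.json files deserialize cleanly and simply opt out of auto injection at runtime.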

Profiler config (nvidia_nat_core)

  • Added auto_sensitivity, sensitivity_scale, w_critical, w_fanout, w_position fields to PredictionTrieConfig
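As a sketch, constructing the config programmatically with the new fields (module path taken from the review thread below; the same keys appear one-to-one in profiler.yaml):

```python
from nat.data_models.profiler import PredictionTrieConfig

trie_config = PredictionTrieConfig(
    enable=True,                 # build and emit prediction_trie.json
    auto_sensitivity=True,       # compute latency sensitivity during profiling
    sensitivity_scale=5,         # scores are clamped to [1, 5]
    w_critical=0.5,              # critical-path weight
    w_fanout=0.3,                # downstream fan-out weight
    w_position=0.2,              # U-shaped position weight
)
```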

Trie builder (nvidia_nat_core)

  • Added SensitivityConfig dataclass and _compute_sensitivity_scores() method
  • Extended LLMCallContext with call_duration_s, workflow_duration_s, sensitivity_score
  • Added sensitivity accumulators to _NodeAccumulators for per-index and aggregated tracking
  • Added _score_to_sensitivity() to convert accumulated scores to clamped integers

Runtime injection (nvidia_nat_core)

  • Added has_manual_latency_sensitivity property to Context (checks if decorator stack has been pushed)
  • _DynamoTransport.handle_async_request reads prediction.latency_sensitivity and applies it when no manual override is active
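The precedence logic reduces to something like the following sketch. Context comes from nat.builder.context per the walkthrough below, while the prediction import path and the function wrapper are assumptions for illustration:

```python
from nat.builder.context import Context
from nat.profiler.prediction_trie.data_models import LLMCallPrediction  # assumed path


def effective_sensitivity(prediction: LLMCallPrediction | None) -> int:
    """Resolve the sensitivity the transport should apply to this call."""
    ctx = Context.get()
    if (prediction is not None
            and prediction.latency_sensitivity is not None
            and not ctx.has_manual_latency_sensitivity):
        # No @latency_sensitive decorator active: use the trie's auto value.
        return prediction.latency_sensitivity
    # A manual decorator (or the context default of 2) always wins.
    return ctx.latency_sensitivity
```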

Bug fixes

  • Fixed nvidia_nat_langchain test collection failure: updated 6 files importing from deprecated nat.eval.evaluator.evaluator_model to use nat.data_models.evaluator
  • Fixed nvidia_nat_core test collection failure: renamed test_profiler.py to test_profiler_config.py to avoid basename collision with nvidia_nat_eval's test_profiler.py

Documentation

  • Added prediction trie section to docs/source/improve-workflows/profiler.md covering trie contents, auto sensitivity scoring, override behavior, configuration, and runtime usage

Override Behavior

| Scenario | Effective sensitivity |
| --- | --- |
| No decorator, no trie prediction | Default (2) |
| No decorator, trie says 4 | Auto (4) |
| `@latency_sensitive(5)`, trie says 3 | Manual (5) |
| `@latency_sensitive(1)`, trie says 4 | Manual (1) |
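In practice, pinning a value manually looks like this hypothetical usage (the decorator's import path is not shown in this PR, so it is assumed here):

```python
from nat.builder.context import latency_sensitive  # hypothetical import path


@latency_sensitive(5)  # pins sensitivity to 5 regardless of any trie prediction
async def answer_user(question: str) -> str:
    ...
```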

Test Plan

  • LLMCallPrediction field tests (default None, serialization round-trip, integer storage)
  • PredictionTrieConfig field tests (defaults, custom weights, auto_sensitivity toggle)
  • Context.has_manual_latency_sensitivity tests (default False, True after push, False after pop, nested)
  • Trie builder duration extraction tests (call_duration_s from span timestamps, workflow_duration_s from trace span)
  • Composite sensitivity scoring tests (single call, multi-call with all three signals, disabled config)
  • Serialization backward compatibility (old JSON without latency_sensitivity loads as None)
  • DynamoTransport auto injection test (applies auto value, skips when manual override active, skips when None)
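As a concrete instance of the serialization backward-compatibility item, a minimal test sketch (the JSON entry shape and import path are assumptions):

```python
from nat.profiler.prediction_trie.data_models import LLMCallPrediction  # assumed path


def test_old_trie_entry_loads_with_none_sensitivity():
    # Entries serialized before this PR have no latency_sensitivity key.
    old_entry = {"remaining_calls": 3, "interarrival_ms": 120.0, "output_tokens": 256}
    prediction = LLMCallPrediction(**old_entry)
    assert prediction.latency_sensitivity is None
```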

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • Adds a prediction trie to the profiler with configurable auto-sensitivity scoring; emits prediction_trie.json when enabled. Auto-computed latency-sensitivity influences routing hints, request priority, and is propagated to runtime hints/headers; manual latency annotations override auto values.
  • Documentation

    • New comprehensive guide covering the prediction trie, signals, configuration, enabling, runtime usage, examples, and manual override behavior.
  • Tests

    • Expanded coverage for sensitivity scoring, serialization, trie building, transport behavior, and manual override handling.

dnandakumar-nv and others added 9 commits February 19, 2026 18:47
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add configuration fields (auto_sensitivity, sensitivity_scale, w_critical,
w_fanout, w_position) that control the sensitivity scoring algorithm.
These values will be passed to PredictionTrieBuilder in a later task.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows the runtime (_DynamoTransport) to distinguish between default
latency sensitivity (stack=[2]) and an explicitly set value via
@latency_sensitive decorator (stack length > 1).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ransport

When a prediction from the trie includes a latency_sensitivity value and no
manual @latency_sensitive decorator is active, the transport now uses the
auto-computed sensitivity instead of the context default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reorganized lines to improve readability, making structure clearer and easier to maintain. Adjusted imports, argument splitting, and calculations for better alignment with style guidelines, without changing functionality.

Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
Move evaluator-related models from `nat.eval.evaluator` to `nat.data_models` to improve module organization. Remove outdated `test_profiler.py` and update documentation to include details on the new `prediction_trie` functionality.

Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
@dnandakumar-nv dnandakumar-nv requested a review from a team as a code owner February 20, 2026 00:18
@dnandakumar-nv dnandakumar-nv added feature request New feature or request non-breaking Non-breaking change labels Feb 20, 2026
@copy-pr-bot

copy-pr-bot Bot commented Feb 20, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Feb 20, 2026

Walkthrough

Adds auto-computed per-call latency_sensitivity into the prediction_trie (configurable via profiler.yaml), persists it in serialized trie output, and at runtime Dynamo LLM applies the auto sensitivity unless a manual latency sensitivity override is active in Context.

Changes

  • Documentation (`docs/source/improve-workflows/profiler.md`): New "Prediction Trie and Dynamo Routing Hints" section, examples, YAML keys, and a note that prediction_trie.json is emitted when enabled.
  • Profiler config & data models (`packages/nvidia_nat_core/src/nat/data_models/profiler.py`, `packages/nvidia_nat_core/src/nat/profiler/prediction_trie/data_models.py`): Added PredictionTrieConfig fields auto_sensitivity, sensitivity_scale, w_critical, w_fanout, and w_position; added optional latency_sensitivity to LLMCallPrediction.
  • Trie scoring engine (`packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`): New SensitivityConfig; LLMCallContext extended with durations and sensitivity_score; accumulators track sensitivity; added _compute_sensitivity_scores and _score_to_sensitivity; the builder populates latency_sensitivity when configured.
  • Serialization (`packages/nvidia_nat_core/src/nat/profiler/prediction_trie/serialization.py`): Deserialization updated to preserve and pass latency_sensitivity into LLMCallPrediction for per-index and any-index entries.
  • Context & runtime integration (`packages/nvidia_nat_core/src/nat/builder/context.py`, `packages/nvidia_nat_core/src/nat/llm/dynamo_llm.py`): Added the Context.has_manual_latency_sensitivity property; dynamo_llm uses prediction.latency_sensitivity when present and no manual override is active, and propagates sensitivity into agent hints and the latency-sensitivity HTTP header (guarded).
  • Profiler runner integration (`packages/nvidia_nat_eval/src/nat/plugins/eval/profiler/profile_runner.py`): PredictionTrieBuilder now optionally receives a SensitivityConfig when trie_config.auto_sensitivity is enabled.
  • Tests (`packages/nvidia_nat_core/tests/nat/builder/test_context.py`, `packages/nvidia_nat_core/tests/nat/llm/test_dynamo_llm.py`, `packages/nvidia_nat_core/tests/nat/profiler/prediction_trie/*`): New and expanded tests for the Context manual-sensitivity stack, dynamo_llm auto vs. manual precedence and header/hint propagation, LLMCallPrediction latency_sensitivity defaults and serialization round-trips, and trie builder sensitivity computation, aggregation, and any-index behavior.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Tracer as Execution Tracer
    participant Builder as PredictionTrie Builder
    participant Trie as Prediction Trie (storage)
    participant LLM as Dynamo LLM
    participant Context as Execution Context
    participant Router as Routing Engine

    Tracer->>Builder: add_trace(spans, sensitivity_config?)
    Builder->>Builder: _extract_llm_contexts() -> compute call_duration_s, workflow_duration
    Builder->>Builder: _compute_sensitivity_scores() -> per-call sensitivity_score
    Builder->>Builder: _score_to_sensitivity() -> latency_sensitivity (int | None)
    Builder->>Trie: build() store predictions with latency_sensitivity
    Trie->>Trie: save_prediction_trie() (serialize latency_sensitivity)

    LLM->>Trie: load_prediction_trie()
    LLM->>LLM: retrieve prediction for call index
    LLM->>Context: check has_manual_latency_sensitivity
    alt Manual override active
        LLM->>Router: inject manual latency_sensitivity hint
    else No manual override
        LLM->>Router: inject prediction.latency_sensitivity hint (if present)
    end
    Router->>Router: compute routing priority from sensitivity
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.05%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Add automatic latency sensitivity inference' is concise, descriptive, uses imperative mood, and at 43 characters is well within the 72-character limit. It clearly summarizes the main feature added (automatic latency sensitivity computation). |
| Description check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |


Updated import paths for `LLMCallPrediction` and `PredictionMetrics` to reflect their new location under `nat.plugins.eval.profiler`. This ensures compatibility with the updated module structure.

Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
# Conflicts:
#	packages/nvidia_nat_core/tests/nat/profiler/prediction_trie/test_trie_builder.py
@dnandakumar-nv
Contributor Author

/ok to test 32773c4


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (6)
docs/source/improve-workflows/profiler.md (1)

74-74: Pre-existing: capitalize Toolkit in body text.

"the toolkit will automatically apply""the Toolkit will automatically apply". This is a pre-existing occurrence not introduced by this PR, but since the file is being modified it is worth cleaning up.

✏️ Proposed fix
-Simply specify which NeMo Agent Toolkit supported frameworks you will be using anywhere in your workflow (including tools) upon registration and the toolkit will automatically apply the appropriate profiling decorators at build time.
+Simply specify which NeMo Agent Toolkit supported frameworks you will be using anywhere in your workflow (including tools) upon registration and the Toolkit will automatically apply the appropriate profiling decorators at build time.

Based on learnings: "Starting from commit 37d9f33 (Jan 29, 2026), the lowercase 'toolkit' should no longer be used in body text, titles, or headings."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/improve-workflows/profiler.md` at line 74, The sentence
containing "the toolkit will automatically apply" should use the capitalized
form "Toolkit" per the new style rule; update that exact phrase to "the Toolkit
will automatically apply" (and search the surrounding body text in this document
for other lowercase "toolkit" occurrences and capitalize them) so all body text,
titles, and headings follow the new convention introduced in commit 37d9f337.
packages/nvidia_nat_core/src/nat/builder/context.py (1)

392-399: Add a Returns: section to the docstring to comply with Google-style guidelines.

The sibling latency_sensitivity property (lines 379–390) includes a Returns: block; this new property should match for consistency.

📝 Proposed docstring fix
     @property
     def has_manual_latency_sensitivity(self) -> bool:
-        """True if any `@latency_sensitive` decorator is active in the current scope.
-
-        The default stack is [2] (length 1). Any push_latency_sensitivity call
-        adds to the stack, making length > 1.
-        """
+        """Returns `True` if any `@latency_sensitive` decorator is active in the current scope.
+
+        The default stack is `[2]` (length 1). Any `push_latency_sensitivity` call
+        adds to the stack, making length > 1.
+
+        Returns:
+            bool: `True` when at least one manual latency sensitivity push is active,
+                `False` otherwise.
+        """
         return len(self._context_state.latency_sensitivity_stack.get()) > 1

As per coding guidelines: "Provide Google-style docstrings for every public module, class, function and CLI command."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/builder/context.py` around lines 392 - 399,
The docstring for the property has_manual_latency_sensitivity should include a
Google-style "Returns:" section like its sibling latency_sensitivity; update the
docstring in the has_manual_latency_sensitivity property to add a "Returns:"
block that states it returns a bool indicating whether any `@latency_sensitive`
decorator is active in the current scope, referencing the property name
has_manual_latency_sensitivity and keeping the existing brief description
intact.
packages/nvidia_nat_core/tests/nat/builder/test_context.py (1)

20-62: Extract the repeated ContextState reset into a pytest fixture.

The 3-line setup block (state = ContextState.get(); state._latency_sensitivity_stack.set(None); ctx = Context.get()) is copied into all four tests. Per coding guidelines this should live in a fixture, which also provides proper teardown isolation.

♻️ Proposed fixture refactor
+import pytest
+
 from nat.builder.context import Context
 from nat.builder.context import ContextState
 
 
+@pytest.fixture(name="fresh_ctx")
+def fixture_fresh_ctx():
+    """Yield a Context with a clean latency sensitivity stack."""
+    state = ContextState.get()
+    token = state._latency_sensitivity_stack.set(None)
+    try:
+        yield Context.get()
+    finally:
+        state._latency_sensitivity_stack.reset(token)
+
+
-def test_has_manual_latency_sensitivity_false_by_default():
+def test_has_manual_latency_sensitivity_false_by_default(fresh_ctx):
     """Default stack [2] means no manual decorator is active."""
-    state = ContextState.get()
-    # Reset to ensure fresh state
-    state._latency_sensitivity_stack.set(None)
-
-    ctx = Context.get()
-    assert ctx.has_manual_latency_sensitivity is False
+    assert fresh_ctx.has_manual_latency_sensitivity is False
 
 
-def test_has_manual_latency_sensitivity_true_when_pushed():
+def test_has_manual_latency_sensitivity_true_when_pushed(fresh_ctx):
     """After push_latency_sensitivity, a manual decorator is active."""
-    state = ContextState.get()
-    state._latency_sensitivity_stack.set(None)
-
-    ctx = Context.get()
-    with ctx.push_latency_sensitivity(5):
-        assert ctx.has_manual_latency_sensitivity is True
+    with fresh_ctx.push_latency_sensitivity(5):
+        assert fresh_ctx.has_manual_latency_sensitivity is True

Apply the same pattern to the remaining two tests.

As per coding guidelines: "Any frequently repeated code should be extracted into pytest fixtures" and "Pytest fixtures should define the name argument … fixture_ prefix."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/tests/nat/builder/test_context.py` around lines 20 -
62, Extract the repeated setup into a pytest fixture named fixture_reset_context
that gets the ContextState singleton, resets state._latency_sensitivity_stack to
None, yields a fresh Context via Context.get(), and then resets
state._latency_sensitivity_stack to None again for teardown; update each test
(test_has_manual_latency_sensitivity_false_by_default,
test_has_manual_latency_sensitivity_true_when_pushed,
test_has_manual_latency_sensitivity_false_after_pop,
test_has_manual_latency_sensitivity_nested) to accept the fixture_reset_context
argument and use the yielded Context instead of creating ContextState/Context
inline.
packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py (1)

29-36: SensitivityConfig docstring is missing a Google-style Attributes: section.

The class docstring is present but the three fields (scale, w_critical, w_fanout, w_position) have no documentation, violating the requirement for Google-style docstrings on every public API.

♻️ Suggested docstring update
 @dataclass
 class SensitivityConfig:
-    """Configuration for auto-sensitivity scoring."""
+    """Configuration for auto-sensitivity scoring.
+
+    Attributes:
+        scale: Integer upper bound for sensitivity scores; must be >= 1.
+        w_critical: Weight applied to the critical-path signal (call_duration / workflow_duration).
+        w_fanout: Weight applied to the downstream fan-out signal (remaining_calls / max_remaining).
+        w_position: Weight applied to the U-shaped call-position signal.
+    """

As per coding guidelines, "Provide Google-style docstrings for every public module, class, function and CLI command."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`
around lines 29 - 36, The SensitivityConfig dataclass docstring lacks a
Google-style Attributes: section documenting its public fields; update the
docstring for SensitivityConfig to include an "Attributes:" block that briefly
describes each field (scale, w_critical, w_fanout, w_position), their types and
their default/meaning so it conforms to the project Google-style docstring
requirement and documents the public API.
packages/nvidia_nat_core/src/nat/data_models/profiler.py (1)

43-50: New PredictionTrieConfig fields lack docstring Attributes: section.

None of the five new public fields (auto_sensitivity, sensitivity_scale, w_critical, w_fanout, w_position) are documented in the class docstring or via Field(description=…). The same weight defaults are also silently duplicated from SensitivityConfig, creating a DRY risk if they diverge.

♻️ Suggested improvement
 class PredictionTrieConfig(BaseModel):
+    """Configuration for the prediction trie profiler component.
+
+    Attributes:
+        enable: Whether to enable prediction trie building.
+        output_filename: Filename for the serialized trie JSON.
+        auto_sensitivity: Automatically compute latency sensitivity scores during profiling.
+        sensitivity_scale: Integer scale for sensitivity scores; must be >= 1.
+        w_critical: Weight for the critical-path signal in [0, 1].
+        w_fanout: Weight for the downstream fan-out signal in [0, 1].
+        w_position: Weight for the U-shaped call-position signal in [0, 1].
+    """
     enable: bool = False
     output_filename: str = "prediction_trie.json"
-    auto_sensitivity: bool = True
-    sensitivity_scale: int = 5
-    w_critical: float = 0.5
-    w_fanout: float = 0.3
-    w_position: float = 0.2
+    auto_sensitivity: bool = True
+    sensitivity_scale: int = Field(default=5, ge=1, description="Integer scale for sensitivity scores.")
+    w_critical: float = Field(default=0.5, ge=0.0, description="Weight for critical-path signal.")
+    w_fanout: float = Field(default=0.3, ge=0.0, description="Weight for fan-out signal.")
+    w_position: float = Field(default=0.2, ge=0.0, description="Weight for call-position signal.")

As per coding guidelines, "Provide Google-style docstrings for every public module, class, function and CLI command" and "All public APIs require Python 3.11+ type hints on parameters and return values."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/data_models/profiler.py` around lines 43 -
50, The new PredictionTrieConfig class has five undocumented public fields
(auto_sensitivity, sensitivity_scale, w_critical, w_fanout, w_position) and
duplicates default weight values from SensitivityConfig; update the
PredictionTrieConfig docstring to include a Google-style "Attributes:" section
describing each field (purpose, types, and default), and/or add
Field(description=...) to each Pydantic field; to avoid DRY, reference or import
the canonical defaults from SensitivityConfig (or a shared constant) instead of
hardcoding the same numeric defaults in PredictionTrieConfig so the defaults
remain single-source-of-truth.
packages/nvidia_nat_core/tests/nat/profiler/prediction_trie/test_trie_builder.py (1)

144-162: Tests exercise private method _extract_llm_contexts directly.

Both test_extract_contexts_include_call_duration and test_extract_contexts_include_workflow_duration call builder._extract_llm_contexts(simple_trace). Testing private methods couples the tests to internal implementation details; any rename or signature change breaks the tests without breaking public behavior.

The same assertions can be reached through the public add_trace() + build() path by checking the computed call_duration_s effect on latency_sensitivity values, or by exposing the needed invariants via a thin public helper if they're genuinely important to contract-test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_core/tests/nat/profiler/prediction_trie/test_trie_builder.py`
around lines 144 - 162, Tests currently call the private method
_extract_llm_contexts on PredictionTrieBuilder; replace this with the public API
by feeding the trace through add_trace(simple_trace) and then calling build()
(or otherwise using the builder's public methods) to obtain the resulting
contexts/results and assert on call_duration_s/workflow_duration_s effects
(e.g., via the produced LLMCallContext-like objects or latency_sensitivity
values); alternatively, if the duration values must be directly asserted, add a
small public helper method (e.g., extract_llm_contexts or get_llm_contexts) on
PredictionTrieBuilder that wraps the private implementation, then update tests
to call that public method instead of _extract_llm_contexts to avoid coupling to
the private implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/improve-workflows/profiler.md`:
- Around line 242-247: The table contains an unexplained magic number "Default
(2)"; update the surrounding prose in profiler.md to explicitly state that when
no decorator and no `trie` prediction are present the runtime falls back to a
built-in default sensitivity of 2, and optionally reference the relevant
configuration field (e.g., `sensitivity_scale` or the config entry that defines
defaults) so readers know the source of the value; search for the "Default (2)"
string and edit the paragraph(s) before or after the table to add one clear
sentence like: "When no decorator is active and no trie prediction is available,
the runtime falls back to a built-in default sensitivity of 2 (see
`sensitivity_scale`)."

In `@packages/nvidia_nat_core/src/nat/data_models/profiler.py`:
- Around line 46-50: PredictionTrieConfig currently defines unused sensitivity
fields (auto_sensitivity, sensitivity_scale, w_critical, w_fanout, w_position)
that are never passed into the runtime builder; update ProfilerRunner to either
remove these fields or fully wire them into PredictionTrieBuilder: if removing,
delete the five fields and any references; if integrating, convert
PredictionTrieConfig -> SensitivityConfig (ensure sensitivity_scale maps to
SensitivityConfig.scale) and pass that SensitivityConfig instance into
PredictionTrieBuilder() where ProfilerRunner currently constructs it, add
Pydantic validators on sensitivity_scale (>=1) and non-negative weights, and add
Field(description=...) for each sensitivity field to match docs; also fix the
field-name mismatch between sensitivity_scale and scale.

In `@packages/nvidia_nat_core/src/nat/llm/dynamo_llm.py`:
- Around line 567-575: The code silently swallows exceptions when checking
prediction.latency_sensitivity and redundantly calls Context.get() again; update
the block to reuse the already-defined ctx (from earlier in the scope), avoid a
blind except: pass, and catch only expected errors (e.g., AttributeError or
RuntimeError) while logging the failure with full traceback using the module
logger (logger.exception or similar) instead of passing; ensure the logic still
only assigns latency_sensitivity = prediction.latency_sensitivity when
prediction.latency_sensitivity is not None and not
ctx.has_manual_latency_sensitivity.

In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 147-177: The method _compute_sensitivity_scores assigns cfg =
self._sensitivity_config which is typed Optional and causes pyright
reportOptionalMemberAccess; fix by narrowing or guarding before use: at the
start of _compute_sensitivity_scores check if self._sensitivity_config is None
and return early (or assert/raise) so cfg is non-None for the body, then use
cfg.w_critical / cfg.w_fanout / cfg.w_position safely; reference
self._sensitivity_config and cfg in the change so callers like add_trace remain
consistent.
- Around line 269-275: The _score_to_sensitivity method currently uses Python's
built-in round(), which implements banker's rounding and can map .5 midpoints to
unexpected integers; change the rounding to round-half-up semantics by replacing
round(mean_score * scale) with an explicit half-up computation (e.g., compute
mean_score * scale, add 0.5 and take floor) so that midpoint values always round
up; update the return expression in _score_to_sensitivity (referencing acc,
acc.compute_metrics().mean, and self._sensitivity_config.scale) to use this new
half-up logic and keep the clamping with max(1, min(scale, ...)).

---

Nitpick comments:
In `@docs/source/improve-workflows/profiler.md`:
- Line 74: The sentence containing "the toolkit will automatically apply" should
use the capitalized form "Toolkit" per the new style rule; update that exact
phrase to "the Toolkit will automatically apply" (and search the surrounding
body text in this document for other lowercase "toolkit" occurrences and
capitalize them) so all body text, titles, and headings follow the new
convention introduced in commit 37d9f337.

In `@packages/nvidia_nat_core/src/nat/builder/context.py`:
- Around line 392-399: The docstring for the property
has_manual_latency_sensitivity should include a Google-style "Returns:" section
like its sibling latency_sensitivity; update the docstring in the
has_manual_latency_sensitivity property to add a "Returns:" block that states it
returns a bool indicating whether any `@latency_sensitive` decorator is active in
the current scope, referencing the property name has_manual_latency_sensitivity
and keeping the existing brief description intact.

In `@packages/nvidia_nat_core/src/nat/data_models/profiler.py`:
- Around line 43-50: The new PredictionTrieConfig class has five undocumented
public fields (auto_sensitivity, sensitivity_scale, w_critical, w_fanout,
w_position) and duplicates default weight values from SensitivityConfig; update
the PredictionTrieConfig docstring to include a Google-style "Attributes:"
section describing each field (purpose, types, and default), and/or add
Field(description=...) to each Pydantic field; to avoid DRY, reference or import
the canonical defaults from SensitivityConfig (or a shared constant) instead of
hardcoding the same numeric defaults in PredictionTrieConfig so the defaults
remain single-source-of-truth.

In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 29-36: The SensitivityConfig dataclass docstring lacks a
Google-style Attributes: section documenting its public fields; update the
docstring for SensitivityConfig to include an "Attributes:" block that briefly
describes each field (scale, w_critical, w_fanout, w_position), their types and
their default/meaning so it conforms to the project Google-style docstring
requirement and documents the public API.

In `@packages/nvidia_nat_core/tests/nat/builder/test_context.py`:
- Around line 20-62: Extract the repeated setup into a pytest fixture named
fixture_reset_context that gets the ContextState singleton, resets
state._latency_sensitivity_stack to None, yields a fresh Context via
Context.get(), and then resets state._latency_sensitivity_stack to None again
for teardown; update each test
(test_has_manual_latency_sensitivity_false_by_default,
test_has_manual_latency_sensitivity_true_when_pushed,
test_has_manual_latency_sensitivity_false_after_pop,
test_has_manual_latency_sensitivity_nested) to accept the fixture_reset_context
argument and use the yielded Context instead of creating ContextState/Context
inline.

In
`@packages/nvidia_nat_core/tests/nat/profiler/prediction_trie/test_trie_builder.py`:
- Around line 144-162: Tests currently call the private method
_extract_llm_contexts on PredictionTrieBuilder; replace this with the public API
by feeding the trace through add_trace(simple_trace) and then calling build()
(or otherwise using the builder's public methods) to obtain the resulting
contexts/results and assert on call_duration_s/workflow_duration_s effects
(e.g., via the produced LLMCallContext-like objects or latency_sensitivity
values); alternatively, if the duration values must be directly asserted, add a
small public helper method (e.g., extract_llm_contexts or get_llm_contexts) on
PredictionTrieBuilder that wraps the private implementation, then update tests
to call that public method instead of _extract_llm_contexts to avoid coupling to
the private implementation.

Corrected the formatting of "Interarrival" in the profiler documentation to include backticks for consistency. This ensures uniform styling across the document.

Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
docs/source/improve-workflows/profiler.md (1)

219-222: Inconsistent formatting: `Interarrival` uses backticks while sibling labels do not.

Interarrival is a plain English term in this context, not a code identifier. Wrapping it in backticks is inconsistent with the surrounding items.

✏️ Proposed fix
-- **`Interarrival` time**: Expected time in milliseconds until the next LLM call.
+- **Interarrival time**: Expected time in milliseconds until the next LLM call.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/improve-workflows/profiler.md` around lines 219 - 222, The list
item uses backticks around `Interarrival` which is inconsistent with the other
plain English labels; update the bullet label from "`Interarrival` time" to
"Interarrival time" (leave the surrounding items "Remaining calls", "Output
tokens", and "Latency sensitivity" unchanged) so all sibling labels share the
same plain-text formatting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/improve-workflows/profiler.md`:
- Line 217: The phrase uses an inanimate possessive ("workflow's call graph");
update both occurrences (the line referencing "workflow's call graph" and the
similar occurrence around line 230) to avoid "'s" with an inanimate noun by
rephrasing to e.g. "the call graph of the workflow" or "the workflow call
graph", and ensure the adjacent mentions of 'function path' and 'call index'
remain unchanged.
- Around line 280-284: Reword list item 3 to match the verb-first pattern of
items 1, 2, and 4: start with an imperative verb (e.g., "Use" or "Apply") and
then state the condition, referencing the same symbols as the other items; for
example, "Use the auto-computed latency_sensitivity from profiler predictions
for priority computation when no manual `@latency_sensitive` decorator is active."
Ensure the text mentions `latency_sensitivity` and `@latency_sensitive` and
keeps the concise style consistent with the Dynamo transport and
nvext.agent_hints context.

---

Duplicate comments:
In `@docs/source/improve-workflows/profiler.md`:
- Around line 242-247: The table's "Default (2)" magic number needs an explicit
explanation: state that the default effective sensitivity is 2 and that this
value comes from the system default (e.g., the DEFAULT_SENSITIVITY constant or
the default of the sensitivity_scale parameter), and update the surrounding
prose to mention where that default is defined/controlled (reference
sensitivity_scale and DEFAULT_SENSITIVITY) and the precedence rules with
`@latency_sensitive` and trie predictions so readers can see why rows like
`@latency_sensitive(...)`, `trie`, and `sensitivity_scale` interact as shown.

---

Nitpick comments:
In `@docs/source/improve-workflows/profiler.md`:
- Around line 219-222: The list item uses backticks around `Interarrival` which
is inconsistent with the other plain English labels; update the bullet label
from "`Interarrival` time" to "Interarrival time" (leave the surrounding items
"Remaining calls", "Output tokens", and "Latency sensitivity" unchanged) so all
sibling labels share the same plain-text formatting.

dnandakumar-nv and others added 2 commits February 19, 2026 19:45
This update clarifies the purpose of the `scale` parameter, making its role in sensitivity scoring more explicit. Adjustments were made across relevant modules, including configuration, trie building, and tests, to ensure consistency.

Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
@dnandakumar-nv
Contributor Author

/ok to test 20b7c3b

Member

@willkill07 willkill07 left a comment


Approving with one comment related to formula.

Modified the scaling formula to ensure proper rounding behavior and prevent potential rounding errors. This change provides a more accurate mapping of mean scores to sensitivity scales.

Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py (1)

29-36: Consider adding __post_init__ validation for SensitivityConfig.

There are no guards against invalid inputs (e.g., sensitivity_scale <= 0 or negative weights). A sensitivity_scale of 0 would make _score_to_sensitivity always return 1, and negative weights would invert signal contributions. A lightweight __post_init__ would catch misconfigurations early.

Proposed validation
 @dataclass
 class SensitivityConfig:
     """Configuration for auto-sensitivity scoring."""
 
     sensitivity_scale: int = 5
     w_critical: float = 0.5
     w_fanout: float = 0.3
     w_position: float = 0.2
+
+    def __post_init__(self) -> None:
+        if self.sensitivity_scale < 1:
+            raise ValueError(f"sensitivity_scale must be >= 1, got {self.sensitivity_scale}")
+        if any(w < 0 for w in (self.w_critical, self.w_fanout, self.w_position)):
+            raise ValueError("Sensitivity weights must be non-negative")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`
around lines 29 - 36, Add a __post_init__ to SensitivityConfig that validates
inputs: ensure sensitivity_scale is an int > 0, each weight (w_critical,
w_fanout, w_position) is >= 0, and that the sum of the weights is > 0 (or
normalize the weights to sum to 1 if you prefer consistent scoring). If any
check fails, raise a ValueError with a clear message. Implement this in the
SensitivityConfig dataclass so _score_to_sensitivity and other consumers get
guaranteed valid parameters.
packages/nvidia_nat_eval/src/nat/plugins/eval/profiler/profile_runner.py (1)

282-294: Inconsistent import path for SensitivityConfig.

PredictionTrieBuilder and save_prediction_trie are imported from the public package (nat.profiler.prediction_trie), but SensitivityConfig is imported directly from the internal module (nat.profiler.prediction_trie.trie_builder). For consistency and to reduce coupling to internal module structure, consider re-exporting SensitivityConfig from packages/nvidia_nat_core/src/nat/profiler/prediction_trie/__init__.py and importing it from the public package instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_eval/src/nat/plugins/eval/profiler/profile_runner.py`
around lines 282 - 294, The imports are inconsistent: PredictionTrieBuilder and
save_prediction_trie come from nat.profiler.prediction_trie but
SensitivityConfig is pulled from the internal trie_builder module; re-export
SensitivityConfig from the public package (add SensitivityConfig to
nat.profiler.prediction_trie.__init__.py exports) and then update the import in
profile_runner.py to import SensitivityConfig from nat.profiler.prediction_trie
alongside PredictionTrieBuilder and save_prediction_trie so all three symbols
come from the public API.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 130-132: The calculation of call_duration_s can become negative
when end_step.span_event_timestamp > end_step.event_timestamp; clamp the
duration to zero by replacing the raw subtraction with a non-negative value
(e.g., call_duration_s = max(0.0, end_step.event_timestamp -
end_step.span_event_timestamp if span exists else 0.0)) and ensure subsequent
uses (like computing critical_path_weight and the sensitivity score) use this
clamped call_duration_s so negative weights are never propagated.

---

Duplicate comments:
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 147-177: The variable cfg is assigned from
self._sensitivity_config but still typed as SensitivityConfig | None, so narrow
it immediately in _compute_sensitivity_scores: after assigning cfg =
self._sensitivity_config, add a guard that returns (or raises) if cfg is None so
subsequent accesses like cfg.w_critical, cfg.w_fanout, and cfg.w_position are
safe; this removes the optional-member access warning and preserves existing
behavior of sensitivity_score assignment on ctx.sensitivity_score.
- Around line 269-275: The _score_to_sensitivity function still uses Python's
round() which implements banker's rounding; replace round(mean_score * scale)
with an explicit half-up rounding implementation (e.g., compute scaled =
mean_score * scale and use math.floor(scaled + 0.5) or equivalent) so .5 always
rounds up, then clamp with max(1, min(scale, ...)); update references in
_score_to_sensitivity (and any helper used) and import math if needed.

---

Nitpick comments:
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 29-36: Add a __post_init__ to SensitivityConfig that validates
inputs: ensure sensitivity_scale is an int > 0, each weight (w_critical,
w_fanout, w_position) is >= 0, and that the sum of the weights is > 0 (or
normalize the weights to sum to 1 if you prefer consistent scoring). If any
check fails, raise a ValueError with a clear message. Implement this in the
SensitivityConfig dataclass so _score_to_sensitivity and other consumers get
guaranteed valid parameters.

In `@packages/nvidia_nat_eval/src/nat/plugins/eval/profiler/profile_runner.py`:
- Around line 282-294: The imports are inconsistent: PredictionTrieBuilder and
save_prediction_trie come from nat.profiler.prediction_trie but
SensitivityConfig is pulled from the internal trie_builder module; re-export
SensitivityConfig from the public package (add SensitivityConfig to
nat.profiler.prediction_trie.__init__.py exports) and then update the import in
profile_runner.py to import SensitivityConfig from nat.profiler.prediction_trie
alongside PredictionTrieBuilder and save_prediction_trie so all three symbols
come from the public API.

@dnandakumar-nv
Contributor Author

/ok to test e946159


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 29-36: Add a Google-style docstring to the SensitivityConfig
dataclass that documents the Attributes: sensitivity_scale, w_critical,
w_fanout, and w_position; then implement a __post_init__ method on
SensitivityConfig that (1) imports math (alongside existing imports) and
validates sensitivity_scale is an int >= 1, and (2) validates the three weight
fields form a valid probability distribution (e.g., sum to 1.0 within a small
epsilon like math.isclose) and raise a ValueError with a clear message if checks
fail so callers are alerted rather than silently producing skewed sensitivity
buckets.

---

Duplicate comments:
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 147-154: The method _compute_sensitivity_scores currently reads
self._sensitivity_config into cfg without narrowing, causing optional-member
access when later referencing cfg.w_critical, cfg.w_fanout, and cfg.w_position;
fix by adding an early guard at the top of _compute_sensitivity_scores that
returns immediately if self._sensitivity_config is None (or assign cfg =
self._sensitivity_config after that guard), ensuring subsequent uses of cfg are
on a non-None SensitivityConfig instance.
- Around line 130-132: The subtraction that computes call_duration_s in
trie_builder.py can produce negative values due to clock skew
(span_event_timestamp vs end_step.event_timestamp); update the calculation in
the TrieBuilder code to clamp negative durations to zero (e.g., replace direct
subtraction with a max(0.0, end_step.event_timestamp - span_start) pattern) so
call_duration_s cannot be negative and thus cannot propagate negative
critical_path_weight into the composite sensitivity score; adjust any downstream
assumptions in functions that reference call_duration_s or critical_path_weight
accordingly.
- Around line 269-275: The _score_to_sensitivity function uses Python's round()
which implements bankers rounding, causing midpoint mean_score values to map to
the wrong sensitivity bucket; change the rounding to explicit round-half-up
semantics when transforming mean_score * (scale - 1) (the variable computed from
sensitivity_scale and mean_score) — e.g., compute the product into a float
(e.g., prod = mean_score * (scale - 1)) and replace round(prod) with a
deterministic half-up operation (such as math.floor(prod + 0.5) or using Decimal
with ROUND_HALF_UP) before adding 1 and clamping between 1 and scale, keeping
the rest of the logic in _score_to_sensitivity unchanged.

@dnandakumar-nv
Contributor Author

/merge

@rapids-bot rapids-bot Bot merged commit 9828975 into NVIDIA:develop Feb 20, 2026
15 checks passed
