Add automatic latency sensitivity inference #1618
rapids-bot[bot] merged 16 commits into NVIDIA:develop from
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add configuration fields (auto_sensitivity, sensitivity_scale, w_critical, w_fanout, w_position) that control the sensitivity scoring algorithm. These values will be passed to PredictionTrieBuilder in a later task. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows the runtime (_DynamoTransport) to distinguish between default latency sensitivity (stack=[2]) and an explicitly set value via @latency_sensitive decorator (stack length > 1). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
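The sentinel-stack idea described in this commit can be sketched in isolation. The `LatencyContext` class and method names below are illustrative stand-ins, not the actual `Context` API: the default stack holds only the built-in sensitivity `[2]`, and any decorator push grows it, so a stack length greater than 1 signals a manual override.

```python
# Minimal sketch of the stack-length check, assuming a list-backed stack.
# Names here are hypothetical; the real Context uses contextvars internally.
from contextlib import contextmanager


class LatencyContext:
    def __init__(self) -> None:
        self._stack = [2]  # default sensitivity; length 1 means "no manual override"

    @property
    def latency_sensitivity(self) -> int:
        return self._stack[-1]

    @property
    def has_manual_latency_sensitivity(self) -> bool:
        # Any push beyond the default sentinel makes the stack longer than 1.
        return len(self._stack) > 1

    @contextmanager
    def push_latency_sensitivity(self, value: int):
        self._stack.append(value)
        try:
            yield self
        finally:
            self._stack.pop()


ctx = LatencyContext()
print(ctx.has_manual_latency_sensitivity)  # False: only the default [2]
with ctx.push_latency_sensitivity(5):
    print(ctx.latency_sensitivity, ctx.has_manual_latency_sensitivity)  # 5 True
print(ctx.has_manual_latency_sensitivity)  # False again after the pop
```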
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ransport When a prediction from the trie includes a latency_sensitivity value and no manual @latency_sensitive decorator is active, the transport now uses the auto-computed sensitivity instead of the context default. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reorganized lines to improve readability, making structure clearer and easier to maintain. Adjusted imports, argument splitting, and calculations for better alignment with style guidelines, without changing functionality. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
Move evaluator-related models from `nat.eval.evaluator` to `nat.data_models` to improve module organization. Remove outdated `test_profiler.py` and update documentation to include details on the new `prediction_trie` functionality. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
Walkthrough
Adds auto-computed per-call latency_sensitivity into the prediction_trie (configurable via profiler.yaml), persists it in serialized trie output, and at runtime Dynamo LLM applies the auto sensitivity unless a manual latency sensitivity override is active in Context.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Tracer as Execution Tracer
    participant Builder as PredictionTrie Builder
    participant Trie as Prediction Trie (storage)
    participant LLM as Dynamo LLM
    participant Context as Execution Context
    participant Router as Routing Engine
    Tracer->>Builder: add_trace(spans, sensitivity_config?)
    Builder->>Builder: _extract_llm_contexts() -> compute call_duration_s, workflow_duration
    Builder->>Builder: _compute_sensitivity_scores() -> per-call sensitivity_score
    Builder->>Builder: _score_to_sensitivity() -> latency_sensitivity (int | None)
    Builder->>Trie: build() store predictions with latency_sensitivity
    Trie->>Trie: save_prediction_trie() (serialize latency_sensitivity)
    LLM->>Trie: load_prediction_trie()
    LLM->>LLM: retrieve prediction for call index
    LLM->>Context: check has_manual_latency_sensitivity
    alt Manual override active
        LLM->>Router: inject manual latency_sensitivity hint
    else No manual override
        LLM->>Router: inject prediction.latency_sensitivity hint (if present)
    end
    Router->>Router: compute routing priority from sensitivity
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Updated import paths for `LLMCallPrediction` and `PredictionMetrics` to reflect their new location under `nat.plugins.eval.profiler`. This ensures compatibility with the updated module structure. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
# Conflicts:
#	packages/nvidia_nat_core/tests/nat/profiler/prediction_trie/test_trie_builder.py
/ok to test 32773c4
Actionable comments posted: 5
🧹 Nitpick comments (6)
docs/source/improve-workflows/profiler.md (1)
74-74: Pre-existing: capitalize `Toolkit` in body text.
"the toolkit will automatically apply" → "the Toolkit will automatically apply". This is a pre-existing occurrence not introduced by this PR, but since the file is being modified it is worth cleaning up.
✏️ Proposed fix

```diff
-Simply specify which NeMo Agent Toolkit supported frameworks you will be using anywhere in your workflow (including tools) upon registration and the toolkit will automatically apply the appropriate profiling decorators at build time.
+Simply specify which NeMo Agent Toolkit supported frameworks you will be using anywhere in your workflow (including tools) upon registration and the Toolkit will automatically apply the appropriate profiling decorators at build time.
```

Based on learnings: "Starting from commit 37d9f33 (Jan 29, 2026), the lowercase 'toolkit' should no longer be used in body text, titles, or headings."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/source/improve-workflows/profiler.md` at line 74, the sentence containing "the toolkit will automatically apply" should use the capitalized form "Toolkit" per the new style rule; update that exact phrase to "the Toolkit will automatically apply" (and search the surrounding body text in this document for other lowercase "toolkit" occurrences and capitalize them) so all body text, titles, and headings follow the new convention introduced in commit 37d9f337.

packages/nvidia_nat_core/src/nat/builder/context.py (1)
392-399: Add a `Returns:` section to the docstring to comply with Google-style guidelines.
The sibling `latency_sensitivity` property (lines 379–390) includes a `Returns:` block; this new property should match for consistency.
📝 Proposed docstring fix

```diff
 @property
 def has_manual_latency_sensitivity(self) -> bool:
-    """True if any `@latency_sensitive` decorator is active in the current scope.
-
-    The default stack is [2] (length 1). Any push_latency_sensitivity call
-    adds to the stack, making length > 1.
-    """
+    """Returns `True` if any `@latency_sensitive` decorator is active in the current scope.
+
+    The default stack is `[2]` (length 1). Any `push_latency_sensitivity` call
+    adds to the stack, making length > 1.
+
+    Returns:
+        bool: `True` when at least one manual latency sensitivity push is active,
+        `False` otherwise.
+    """
     return len(self._context_state.latency_sensitivity_stack.get()) > 1
```

As per coding guidelines: "Provide Google-style docstrings for every public module, class, function and CLI command."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_core/src/nat/builder/context.py` around lines 392 - 399, the docstring for the property has_manual_latency_sensitivity should include a Google-style "Returns:" section like its sibling latency_sensitivity; update the docstring in the has_manual_latency_sensitivity property to add a "Returns:" block that states it returns a bool indicating whether any `@latency_sensitive` decorator is active in the current scope, referencing the property name has_manual_latency_sensitivity and keeping the existing brief description intact.

packages/nvidia_nat_core/tests/nat/builder/test_context.py (1)
20-62: Extract the repeated ContextState reset into a pytest fixture.
The 3-line setup block (`state = ContextState.get(); state._latency_sensitivity_stack.set(None); ctx = Context.get()`) is copied into all four tests. Per coding guidelines this should live in a fixture, which also provides proper teardown isolation.
♻️ Proposed fixture refactor

```diff
+import pytest
+
 from nat.builder.context import Context
 from nat.builder.context import ContextState

+@pytest.fixture(name="fresh_ctx")
+def fixture_fresh_ctx():
+    """Yield a Context with a clean latency sensitivity stack."""
+    state = ContextState.get()
+    token = state._latency_sensitivity_stack.set(None)
+    try:
+        yield Context.get()
+    finally:
+        state._latency_sensitivity_stack.reset(token)
+
+
-def test_has_manual_latency_sensitivity_false_by_default():
+def test_has_manual_latency_sensitivity_false_by_default(fresh_ctx):
     """Default stack [2] means no manual decorator is active."""
-    state = ContextState.get()
-    # Reset to ensure fresh state
-    state._latency_sensitivity_stack.set(None)
-
-    ctx = Context.get()
-    assert ctx.has_manual_latency_sensitivity is False
+    assert fresh_ctx.has_manual_latency_sensitivity is False

-def test_has_manual_latency_sensitivity_true_when_pushed():
+def test_has_manual_latency_sensitivity_true_when_pushed(fresh_ctx):
     """After push_latency_sensitivity, a manual decorator is active."""
-    state = ContextState.get()
-    state._latency_sensitivity_stack.set(None)
-
-    ctx = Context.get()
-    with ctx.push_latency_sensitivity(5):
-        assert ctx.has_manual_latency_sensitivity is True
+    with fresh_ctx.push_latency_sensitivity(5):
+        assert fresh_ctx.has_manual_latency_sensitivity is True
```

Apply the same pattern to the remaining two tests.
As per coding guidelines: "Any frequently repeated code should be extracted into pytest fixtures" and "Pytest fixtures should define the `name` argument … `fixture_` prefix."
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_core/tests/nat/builder/test_context.py` around lines 20 - 62, extract the repeated setup into a pytest fixture named fixture_reset_context that gets the ContextState singleton, resets state._latency_sensitivity_stack to None, yields a fresh Context via Context.get(), and then resets state._latency_sensitivity_stack to None again for teardown; update each test (test_has_manual_latency_sensitivity_false_by_default, test_has_manual_latency_sensitivity_true_when_pushed, test_has_manual_latency_sensitivity_false_after_pop, test_has_manual_latency_sensitivity_nested) to accept the fixture_reset_context argument and use the yielded Context instead of creating ContextState/Context inline.

packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py (1)
29-36: `SensitivityConfig` docstring is missing a Google-style `Attributes:` section.
The class docstring is present but the four fields (`scale`, `w_critical`, `w_fanout`, `w_position`) have no documentation, violating the requirement for Google-style docstrings on every public API.
♻️ Suggested docstring update

```diff
 @dataclass
 class SensitivityConfig:
-    """Configuration for auto-sensitivity scoring."""
+    """Configuration for auto-sensitivity scoring.
+
+    Attributes:
+        scale: Integer upper bound for sensitivity scores; must be >= 1.
+        w_critical: Weight applied to the critical-path signal (call_duration / workflow_duration).
+        w_fanout: Weight applied to the downstream fan-out signal (remaining_calls / max_remaining).
+        w_position: Weight applied to the U-shaped call-position signal.
+    """
```

As per coding guidelines, "Provide Google-style docstrings for every public module, class, function and CLI command."
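As a concrete illustration of how the three weighted signals described in those attributes might combine, here is a standalone sketch. The function name, the exact U-shape formula for the position signal, and the zero-guard behavior are assumptions for illustration, not the toolkit's actual implementation:

```python
def sensitivity_score(call_duration_s: float,
                      workflow_duration_s: float,
                      remaining_calls: int,
                      max_remaining: int,
                      call_index: int,
                      total_calls: int,
                      w_critical: float = 0.5,
                      w_fanout: float = 0.3,
                      w_position: float = 0.2) -> float:
    """Combine the three signals into a weighted score in [0, 1]."""
    # Critical-path signal: fraction of the workflow spent in this call.
    critical = call_duration_s / workflow_duration_s if workflow_duration_s > 0 else 0.0
    # Fan-out signal: how much downstream work still depends on this call.
    fanout = remaining_calls / max_remaining if max_remaining > 0 else 0.0
    # U-shaped position signal: first and last calls score high, the middle low.
    if total_calls > 1:
        t = call_index / (total_calls - 1)
        position = abs(2 * t - 1)
    else:
        position = 1.0
    return w_critical * critical + w_fanout * fanout + w_position * position


# First call (index 0 of 5) that took 2s of a 10s workflow with full fan-out:
print(sensitivity_score(2.0, 10.0, 3, 3, 0, 5))  # 0.6
```

With the default weights this yields 0.5·0.2 + 0.3·1.0 + 0.2·1.0 = 0.6, which a `_score_to_sensitivity`-style step would then map onto the integer scale.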
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py` around lines 29 - 36, the SensitivityConfig dataclass docstring lacks a Google-style Attributes: section documenting its public fields; update the docstring for SensitivityConfig to include an "Attributes:" block that briefly describes each field (scale, w_critical, w_fanout, w_position), their types and their default/meaning so it conforms to the project Google-style docstring requirement and documents the public API.

packages/nvidia_nat_core/src/nat/data_models/profiler.py (1)
43-50: New `PredictionTrieConfig` fields lack a docstring `Attributes:` section.
None of the five new public fields (`auto_sensitivity`, `sensitivity_scale`, `w_critical`, `w_fanout`, `w_position`) are documented in the class docstring or via `Field(description=…)`. The same weight defaults are also silently duplicated from `SensitivityConfig`, creating a DRY risk if they diverge.
♻️ Suggested improvement

```diff
 class PredictionTrieConfig(BaseModel):
+    """Configuration for the prediction trie profiler component.
+
+    Attributes:
+        enable: Whether to enable prediction trie building.
+        output_filename: Filename for the serialized trie JSON.
+        auto_sensitivity: Automatically compute latency sensitivity scores during profiling.
+        sensitivity_scale: Integer scale for sensitivity scores; must be >= 1.
+        w_critical: Weight for the critical-path signal in [0, 1].
+        w_fanout: Weight for the downstream fan-out signal in [0, 1].
+        w_position: Weight for the U-shaped call-position signal in [0, 1].
+    """
     enable: bool = False
     output_filename: str = "prediction_trie.json"
-    auto_sensitivity: bool = True
-    sensitivity_scale: int = 5
-    w_critical: float = 0.5
-    w_fanout: float = 0.3
-    w_position: float = 0.2
+    auto_sensitivity: bool = True
+    sensitivity_scale: int = Field(default=5, ge=1, description="Integer scale for sensitivity scores.")
+    w_critical: float = Field(default=0.5, ge=0.0, description="Weight for critical-path signal.")
+    w_fanout: float = Field(default=0.3, ge=0.0, description="Weight for fan-out signal.")
+    w_position: float = Field(default=0.2, ge=0.0, description="Weight for call-position signal.")
```

As per coding guidelines, "Provide Google-style docstrings for every public module, class, function and CLI command" and "All public APIs require Python 3.11+ type hints on parameters and return values."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_core/src/nat/data_models/profiler.py` around lines 43 - 50, the new PredictionTrieConfig class has five undocumented public fields (auto_sensitivity, sensitivity_scale, w_critical, w_fanout, w_position) and duplicates default weight values from SensitivityConfig; update the PredictionTrieConfig docstring to include a Google-style "Attributes:" section describing each field (purpose, types, and default), and/or add Field(description=...) to each Pydantic field; to avoid DRY, reference or import the canonical defaults from SensitivityConfig (or a shared constant) instead of hardcoding the same numeric defaults in PredictionTrieConfig so the defaults remain single-source-of-truth.

packages/nvidia_nat_core/tests/nat/profiler/prediction_trie/test_trie_builder.py (1)
144-162: Tests exercise private method `_extract_llm_contexts` directly.
Both `test_extract_contexts_include_call_duration` and `test_extract_contexts_include_workflow_duration` call `builder._extract_llm_contexts(simple_trace)`. Testing private methods couples the tests to internal implementation details; any rename or signature change breaks the tests without breaking public behavior.
The same assertions can be reached through the public `add_trace` → `build()` path by checking the computed `call_duration_s` effect on `latency_sensitivity` values, or by exposing the needed invariants via a thin public helper if they're genuinely important to contract-test.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_core/tests/nat/profiler/prediction_trie/test_trie_builder.py` around lines 144 - 162, Tests currently call the private method _extract_llm_contexts on PredictionTrieBuilder; replace this with the public API by feeding the trace through add_trace(simple_trace) and then calling build() (or otherwise using the builder's public methods) to obtain the resulting contexts/results and assert on call_duration_s/workflow_duration_s effects (e.g., via the produced LLMCallContext-like objects or latency_sensitivity values); alternatively, if the duration values must be directly asserted, add a small public helper method (e.g., extract_llm_contexts or get_llm_contexts) on PredictionTrieBuilder that wraps the private implementation, then update tests to call that public method instead of _extract_llm_contexts to avoid coupling to the private implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/improve-workflows/profiler.md`:
- Around line 242-247: The table contains an unexplained magic number "Default
(2)"; update the surrounding prose in profiler.md to explicitly state that when
no decorator and no `trie` prediction are present the runtime falls back to a
built-in default sensitivity of 2, and optionally reference the relevant
configuration field (e.g., `sensitivity_scale` or the config entry that defines
defaults) so readers know the source of the value; search for the "Default (2)"
string and edit the paragraph(s) before or after the table to add one clear
sentence like: "When no decorator is active and no trie prediction is available,
the runtime falls back to a built-in default sensitivity of 2 (see
`sensitivity_scale`)."
In `@packages/nvidia_nat_core/src/nat/data_models/profiler.py`:
- Around line 46-50: PredictionTrieConfig currently defines unused sensitivity
fields (auto_sensitivity, sensitivity_scale, w_critical, w_fanout, w_position)
that are never passed into the runtime builder; update ProfilerRunner to either
remove these fields or fully wire them into PredictionTrieBuilder: if removing,
delete the five fields and any references; if integrating, convert
PredictionTrieConfig -> SensitivityConfig (ensure sensitivity_scale maps to
SensitivityConfig.scale) and pass that SensitivityConfig instance into
PredictionTrieBuilder() where ProfilerRunner currently constructs it, add
Pydantic validators on sensitivity_scale (>=1) and non-negative weights, and add
Field(description=...) for each sensitivity field to match docs; also fix the
field-name mismatch between sensitivity_scale and scale.
In `@packages/nvidia_nat_core/src/nat/llm/dynamo_llm.py`:
- Around line 567-575: The code silently swallows exceptions when checking
prediction.latency_sensitivity and redundantly calls Context.get() again; update
the block to reuse the already-defined ctx (from earlier in the scope), avoid a
blind except: pass, and catch only expected errors (e.g., AttributeError or
RuntimeError) while logging the failure with full traceback using the module
logger (logger.exception or similar) instead of passing; ensure the logic still
only assigns latency_sensitivity = prediction.latency_sensitivity when
prediction.latency_sensitivity is not None and not
ctx.has_manual_latency_sensitivity.
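The pattern this comment asks for can be sketched in a few lines. The function name `resolve_latency_sensitivity` and the `prediction`/`ctx` stand-in objects are hypothetical; the point is the narrow `except` with a logged traceback in place of a blind `except: pass`:

```python
import logging

logger = logging.getLogger(__name__)


def resolve_latency_sensitivity(prediction, ctx, default: int = 2) -> int:
    """Return the auto sensitivity when no manual override is active.

    Sketch only: falls back to `default` when a manual override is active
    (the caller would use the manual value) or when the lookup fails.
    """
    try:
        auto = prediction.latency_sensitivity
        if auto is not None and not ctx.has_manual_latency_sensitivity:
            return auto
    except (AttributeError, RuntimeError):
        # Narrow except: unexpected errors propagate; expected ones are
        # logged with a full traceback instead of silently swallowed.
        logger.exception("Failed to apply auto latency sensitivity")
    return default
```

The key difference from the original code is visibility: a malformed prediction now leaves a traceback in the logs rather than disappearing.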
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 147-177: The method _compute_sensitivity_scores assigns cfg =
self._sensitivity_config which is typed Optional and causes pyright
reportOptionalMemberAccess; fix by narrowing or guarding before use: at the
start of _compute_sensitivity_scores check if self._sensitivity_config is None
and return early (or assert/raise) so cfg is non-None for the body, then use
cfg.w_critical / cfg.w_fanout / cfg.w_position safely; reference
self._sensitivity_config and cfg in the change so callers like add_trace remain
consistent.
- Around line 269-275: The _score_to_sensitivity method currently uses Python's
built-in round(), which implements banker's rounding and can map .5 midpoints to
unexpected integers; change the rounding to round-half-up semantics by replacing
round(mean_score * scale) with an explicit half-up computation (e.g., compute
mean_score * scale, add 0.5 and take floor) so that midpoint values always round
up; update the return expression in _score_to_sensitivity (referencing acc,
acc.compute_metrics().mean, and self._sensitivity_config.scale) to use this new
half-up logic and keep the clamping with max(1, min(scale, ...)).
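The distinction matters only at midpoints: Python's built-in `round()` ties to the nearest even integer (banker's rounding), while `floor(x + 0.5)` always rounds halves up. A quick self-contained illustration of the suggested fix:

```python
import math


def half_up(x: float) -> int:
    """Round-half-up: midpoints always go to the larger integer."""
    return math.floor(x + 0.5)


# Banker's rounding ties to the nearest even integer...
print([round(v) for v in (0.5, 1.5, 2.5, 3.5)])    # [0, 2, 2, 4]
# ...while half-up maps every midpoint upward.
print([half_up(v) for v in (0.5, 1.5, 2.5, 3.5)])  # [1, 2, 3, 4]


# Applied to the score mapping with the clamping the review describes:
def score_to_sensitivity(mean_score: float, scale: int) -> int:
    return max(1, min(scale, half_up(mean_score * scale)))


print(score_to_sensitivity(0.5, 5))  # 3 (2.5 rounds up instead of to even 2)
```

With `round()`, a mean score of 0.5 on a scale of 5 would map to 2; half-up makes the midpoint land on 3 deterministically.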
---
Nitpick comments:
In `@docs/source/improve-workflows/profiler.md`:
- Line 74: The sentence containing "the toolkit will automatically apply" should
use the capitalized form "Toolkit" per the new style rule; update that exact
phrase to "the Toolkit will automatically apply" (and search the surrounding
body text in this document for other lowercase "toolkit" occurrences and
capitalize them) so all body text, titles, and headings follow the new
convention introduced in commit 37d9f337.
In `@packages/nvidia_nat_core/src/nat/builder/context.py`:
- Around line 392-399: The docstring for the property
has_manual_latency_sensitivity should include a Google-style "Returns:" section
like its sibling latency_sensitivity; update the docstring in the
has_manual_latency_sensitivity property to add a "Returns:" block that states it
returns a bool indicating whether any `@latency_sensitive` decorator is active in
the current scope, referencing the property name has_manual_latency_sensitivity
and keeping the existing brief description intact.
In `@packages/nvidia_nat_core/src/nat/data_models/profiler.py`:
- Around line 43-50: The new PredictionTrieConfig class has five undocumented
public fields (auto_sensitivity, sensitivity_scale, w_critical, w_fanout,
w_position) and duplicates default weight values from SensitivityConfig; update
the PredictionTrieConfig docstring to include a Google-style "Attributes:"
section describing each field (purpose, types, and default), and/or add
Field(description=...) to each Pydantic field; to avoid DRY, reference or import
the canonical defaults from SensitivityConfig (or a shared constant) instead of
hardcoding the same numeric defaults in PredictionTrieConfig so the defaults
remain single-source-of-truth.
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 29-36: The SensitivityConfig dataclass docstring lacks a
Google-style Attributes: section documenting its public fields; update the
docstring for SensitivityConfig to include an "Attributes:" block that briefly
describes each field (scale, w_critical, w_fanout, w_position), their types and
their default/meaning so it conforms to the project Google-style docstring
requirement and documents the public API.
In `@packages/nvidia_nat_core/tests/nat/builder/test_context.py`:
- Around line 20-62: Extract the repeated setup into a pytest fixture named
fixture_reset_context that gets the ContextState singleton, resets
state._latency_sensitivity_stack to None, yields a fresh Context via
Context.get(), and then resets state._latency_sensitivity_stack to None again
for teardown; update each test
(test_has_manual_latency_sensitivity_false_by_default,
test_has_manual_latency_sensitivity_true_when_pushed,
test_has_manual_latency_sensitivity_false_after_pop,
test_has_manual_latency_sensitivity_nested) to accept the fixture_reset_context
argument and use the yielded Context instead of creating ContextState/Context
inline.
In
`@packages/nvidia_nat_core/tests/nat/profiler/prediction_trie/test_trie_builder.py`:
- Around line 144-162: Tests currently call the private method
_extract_llm_contexts on PredictionTrieBuilder; replace this with the public API
by feeding the trace through add_trace(simple_trace) and then calling build()
(or otherwise using the builder's public methods) to obtain the resulting
contexts/results and assert on call_duration_s/workflow_duration_s effects
(e.g., via the produced LLMCallContext-like objects or latency_sensitivity
values); alternatively, if the duration values must be directly asserted, add a
small public helper method (e.g., extract_llm_contexts or get_llm_contexts) on
PredictionTrieBuilder that wraps the private implementation, then update tests
to call that public method instead of _extract_llm_contexts to avoid coupling to
the private implementation.
Corrected the formatting of "Interarrival" in the profiler documentation to include backticks for consistency. This ensures uniform styling across the document. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
Actionable comments posted: 2
🧹 Nitpick comments (1)
docs/source/improve-workflows/profiler.md (1)
219-222: Inconsistent formatting: `Interarrival` uses backticks while sibling labels do not.
Interarrival is a plain English term in this context, not a code identifier. Wrapping it in backticks is inconsistent with the surrounding items.
✏️ Proposed fix

```diff
-- **`Interarrival` time**: Expected time in milliseconds until the next LLM call.
+- **Interarrival time**: Expected time in milliseconds until the next LLM call.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/source/improve-workflows/profiler.md` around lines 219 - 222, The list item uses backticks around `Interarrival` which is inconsistent with the other plain English labels; update the bullet label from "`Interarrival` time" to "Interarrival time" (leave the surrounding items "Remaining calls", "Output tokens", and "Latency sensitivity" unchanged) so all sibling labels share the same plain-text formatting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/improve-workflows/profiler.md`:
- Line 217: The phrase uses an inanimate possessive ("workflow's call graph");
update both occurrences (the line referencing "workflow's call graph" and the
similar occurrence around line 230) to avoid "'s" with an inanimate noun by
rephrasing to e.g. "the call graph of the workflow" or "the workflow call
graph", and ensure the adjacent mentions of 'function path' and 'call index'
remain unchanged.
- Around line 280-284: Reword list item 3 to match the verb-first pattern of
items 1, 2, and 4: start with an imperative verb (e.g., "Use" or "Apply") and
then state the condition, referencing the same symbols as the other items; for
example, "Use the auto-computed latency_sensitivity from profiler predictions
for priority computation when no manual `@latency_sensitive` decorator is active."
Ensure the text mentions `latency_sensitivity` and `@latency_sensitive` and
keeps the concise style consistent with the Dynamo transport and
nvext.agent_hints context.
---
Duplicate comments:
In `@docs/source/improve-workflows/profiler.md`:
- Around line 242-247: The table's "Default (2)" magic number needs an explicit
explanation: state that the default effective sensitivity is 2 and that this
value comes from the system default (e.g., the DEFAULT_SENSITIVITY constant or
the default of the sensitivity_scale parameter), and update the surrounding
prose to mention where that default is defined/controlled (reference
sensitivity_scale and DEFAULT_SENSITIVITY) and the precedence rules with
`@latency_sensitive` and trie predictions so readers can see why rows like
`@latency_sensitive(...)`, `trie`, and `sensitivity_scale` interact as shown.
---
Nitpick comments:
In `@docs/source/improve-workflows/profiler.md`:
- Around line 219-222: The list item uses backticks around `Interarrival` which
is inconsistent with the other plain English labels; update the bullet label
from "`Interarrival` time" to "Interarrival time" (leave the surrounding items
"Remaining calls", "Output tokens", and "Latency sensitivity" unchanged) so all
sibling labels share the same plain-text formatting.
This update clarifies the purpose of the `scale` parameter, making its role in sensitivity scoring more explicit. Adjustments were made across relevant modules, including configuration, trie building, and tests, to ensure consistency. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
/ok to test 20b7c3b
willkill07 left a comment
Approving with one comment related to formula.
Modified the scaling formula to ensure proper rounding behavior and prevent potential rounding errors. This change provides a more accurate mapping of mean scores to sensitivity scales. Signed-off-by: dnandakumar-nv <dnandakumar@nvidia.com>
Actionable comments posted: 1
🧹 Nitpick comments (2)
packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py (1)
29-36: Consider adding `__post_init__` validation for `SensitivityConfig`.
There are no guards against invalid inputs (e.g., `sensitivity_scale <= 0` or negative weights). A `sensitivity_scale` of 0 would make `_score_to_sensitivity` always return 1, and negative weights would invert signal contributions. A lightweight `__post_init__` would catch misconfigurations early.
Proposed validation

```diff
 @dataclass
 class SensitivityConfig:
     """Configuration for auto-sensitivity scoring."""
     sensitivity_scale: int = 5
     w_critical: float = 0.5
     w_fanout: float = 0.3
     w_position: float = 0.2
+
+    def __post_init__(self) -> None:
+        if self.sensitivity_scale < 1:
+            raise ValueError(f"sensitivity_scale must be >= 1, got {self.sensitivity_scale}")
+        if any(w < 0 for w in (self.w_critical, self.w_fanout, self.w_position)):
+            raise ValueError("Sensitivity weights must be non-negative")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py` around lines 29 - 36, Add a __post_init__ to SensitivityConfig that validates inputs: ensure sensitivity_scale is an int > 0, each weight (w_critical, w_fanout, w_position) is >= 0, and that the sum of the weights is > 0 (or normalize the weights to sum to 1 if you prefer consistent scoring). If any check fails, raise a ValueError with a clear message. Implement this in the SensitivityConfig dataclass so _score_to_sensitivity and other consumers get guaranteed valid parameters.packages/nvidia_nat_eval/src/nat/plugins/eval/profiler/profile_runner.py (1)
282-294: Inconsistent import path for `SensitivityConfig`.
`PredictionTrieBuilder` and `save_prediction_trie` are imported from the public package (`nat.profiler.prediction_trie`), but `SensitivityConfig` is imported directly from the internal module (`nat.profiler.prediction_trie.trie_builder`). For consistency and to reduce coupling to internal module structure, consider re-exporting `SensitivityConfig` from `packages/nvidia_nat_core/src/nat/profiler/prediction_trie/__init__.py` and importing it from the public package instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_eval/src/nat/plugins/eval/profiler/profile_runner.py` around lines 282 - 294, The imports are inconsistent: PredictionTrieBuilder and save_prediction_trie come from nat.profiler.prediction_trie but SensitivityConfig is pulled from the internal trie_builder module; re-export SensitivityConfig from the public package (add SensitivityConfig to nat.profiler.prediction_trie.__init__.py exports) and then update the import in profile_runner.py to import SensitivityConfig from nat.profiler.prediction_trie alongside PredictionTrieBuilder and save_prediction_trie so all three symbols come from the public API.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 130-132: The calculation of call_duration_s can become negative
when end_step.span_event_timestamp > end_step.event_timestamp; clamp the
duration to zero by replacing the raw subtraction with a non-negative value
(e.g., call_duration_s = max(0.0, end_step.event_timestamp -
end_step.span_event_timestamp if span exists else 0.0)) and ensure subsequent
uses (like computing critical_path_weight and the sensitivity score) use this
clamped call_duration_s so negative weights are never propagated.
---
Duplicate comments:
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 147-177: The variable cfg is assigned from
self._sensitivity_config but still typed as SensitivityConfig | None, so narrow
it immediately in _compute_sensitivity_scores: after assigning cfg =
self._sensitivity_config, add a guard that returns (or raises) if cfg is None so
subsequent accesses like cfg.w_critical, cfg.w_fanout, and cfg.w_position are
safe; this removes the optional-member access warning and preserves existing
behavior of sensitivity_score assignment on ctx.sensitivity_score.
- Around line 269-275: The _score_to_sensitivity function still uses Python's
round() which implements banker's rounding; replace round(mean_score * scale)
with an explicit half-up rounding implementation (e.g., compute scaled =
mean_score * scale and use math.floor(scaled + 0.5) or equivalent) so .5 always
rounds up, then clamp with max(1, min(scale, ...)); update references in
_score_to_sensitivity (and any helper used) and import math if needed.
---
Nitpick comments:
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 29-36: Add a __post_init__ to SensitivityConfig that validates
inputs: ensure sensitivity_scale is an int > 0, each weight (w_critical,
w_fanout, w_position) is >= 0, and that the sum of the weights is > 0 (or
normalize the weights to sum to 1 if you prefer consistent scoring). If any
check fails, raise a ValueError with a clear message. Implement this in the
SensitivityConfig dataclass so _score_to_sensitivity and other consumers get
guaranteed valid parameters.
In `@packages/nvidia_nat_eval/src/nat/plugins/eval/profiler/profile_runner.py`:
- Around line 282-294: The imports are inconsistent: PredictionTrieBuilder and
save_prediction_trie come from nat.profiler.prediction_trie but
SensitivityConfig is pulled from the internal trie_builder module; re-export
SensitivityConfig from the public package (add SensitivityConfig to
nat.profiler.prediction_trie.__init__.py exports) and then update the import in
profile_runner.py to import SensitivityConfig from nat.profiler.prediction_trie
alongside PredictionTrieBuilder and save_prediction_trie so all three symbols
come from the public API.
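The suggested re-export would look roughly like this (a sketch; the existing contents of the `__init__.py` are assumed):

```python
# nat/profiler/prediction_trie/__init__.py
from nat.profiler.prediction_trie.trie_builder import SensitivityConfig  # re-export alongside existing symbols
```

and in `profile_runner.py`:

```python
from nat.profiler.prediction_trie import PredictionTrieBuilder, SensitivityConfig, save_prediction_trie
```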
/ok to test e946159
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 29-36: Add a Google-style docstring to the SensitivityConfig
dataclass that documents the Attributes: sensitivity_scale, w_critical,
w_fanout, and w_position; then implement a __post_init__ method on
SensitivityConfig that (1) imports math (alongside existing imports) and
validates sensitivity_scale is an int >= 1, and (2) validates the three weight
fields form a valid probability distribution (e.g., sum to 1.0 within a small
epsilon like math.isclose) and raise a ValueError with a clear message if checks
fail so callers are alerted rather than silently producing skewed sensitivity
buckets.
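A sketch of the suggested docstring and `__post_init__` validation (the default values shown, including `sensitivity_scale=5`, are assumptions mirroring the weights described in this PR):

```python
import math
from dataclasses import dataclass


@dataclass
class SensitivityConfig:
    """Configuration for sensitivity scoring.

    Attributes:
        sensitivity_scale: Number of integer sensitivity buckets (>= 1).
        w_critical: Weight of the critical-path signal.
        w_fanout: Weight of the fan-out signal.
        w_position: Weight of the position signal.
    """

    sensitivity_scale: int = 5  # default assumed for illustration
    w_critical: float = 0.5
    w_fanout: float = 0.3
    w_position: float = 0.2

    def __post_init__(self) -> None:
        if not isinstance(self.sensitivity_scale, int) or self.sensitivity_scale < 1:
            raise ValueError(
                f"sensitivity_scale must be an int >= 1, got {self.sensitivity_scale!r}")
        total = self.w_critical + self.w_fanout + self.w_position
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            raise ValueError(f"weights must sum to 1.0, got {total}")
```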
---
Duplicate comments:
In `@packages/nvidia_nat_core/src/nat/profiler/prediction_trie/trie_builder.py`:
- Around line 147-154: The method _compute_sensitivity_scores currently reads
self._sensitivity_config into cfg without narrowing, causing optional-member
access when later referencing cfg.w_critical, cfg.w_fanout, and cfg.w_position;
fix by adding an early guard at the top of _compute_sensitivity_scores that
returns immediately if self._sensitivity_config is None (or assign cfg =
self._sensitivity_config after that guard), ensuring subsequent uses of cfg are
on a non-None SensitivityConfig instance.
- Around line 130-132: The subtraction that computes call_duration_s in
trie_builder.py can produce negative values due to clock skew
(span_event_timestamp vs end_step.event_timestamp); update the calculation in
the TrieBuilder code to clamp negative durations to zero (e.g., replace direct
subtraction with a max(0.0, end_step.event_timestamp - span_start) pattern) so
call_duration_s cannot be negative and thus cannot propagate negative
critical_path_weight into the composite sensitivity score; adjust any downstream
assumptions in functions that reference call_duration_s or critical_path_weight
accordingly.
- Around line 269-275: The _score_to_sensitivity function uses Python's round()
which implements banker's rounding, causing midpoint mean_score values to map to
the wrong sensitivity bucket; change the rounding to explicit round-half-up
semantics when transforming mean_score * (scale - 1) (the variable computed from
sensitivity_scale and mean_score) — e.g., compute the product into a float
(e.g., prod = mean_score * (scale - 1)) and replace round(prod) with a
deterministic half-up operation (such as math.floor(prod + 0.5) or using Decimal
with ROUND_HALF_UP) before adding 1 and clamping between 1 and scale, keeping
the rest of the logic in _score_to_sensitivity unchanged.
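The half-up fix can be sketched as follows, using the `mean_score * (scale - 1)` formulation from the comment above (note that the built-in `round(2.5)` returns 2 under banker's rounding):

```python
import math


def score_to_sensitivity(mean_score: float, scale: int) -> int:
    """Map a mean score in [0, 1] to an integer bucket in [1, scale]
    using deterministic round-half-up instead of banker's rounding."""
    prod = mean_score * (scale - 1)
    bucket = math.floor(prod + 0.5) + 1  # .5 always rounds up
    return max(1, min(scale, bucket))
```

For example, with `scale=5` a mean score of 0.625 gives `prod = 2.5`, which half-up rounding maps to bucket 4, whereas `round()` would have mapped it to bucket 3.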
/merge
Summary
Extends the prediction trie to automatically compute and assign latency sensitivity scores to each LLM call position in an agentic workflow. During profiling, the system analyzes execution traces to determine which calls are most latency-critical and stores the computed sensitivity alongside existing predictions. At runtime,
`_DynamoTransport` injects the auto-computed sensitivity for calls that lack a manual `@latency_sensitive` decorator — zero developer annotation required.

Motivation
Today, latency sensitivity is manually assigned via
`@latency_sensitive(n)` decorators. This requires developers to reason about which parts of their workflow matter most for latency — something the profiler already has the data to determine automatically. The prediction trie already stores per-call-position metrics (`remaining_calls`, `interarrival_ms`, `output_tokens`) and feeds them into Dynamo routing hints. Adding latency sensitivity to this pipeline closes the loop: profile once, get intelligent routing automatically.

Approach
Three signals are computed per call position across all profiled traces, normalized to `[0, 1]`, and combined with configurable weights:

- Critical path (`w_critical=0.5`): `call_duration / workflow_duration` — calls that consume a larger fraction of total wall-clock time score higher.
- Fan-out (`w_fanout=0.3`): remaining LLM calls after this one, normalized by the maximum. A planning call that gates many downstream calls scores higher.
- Position (`w_position=0.2`): U-shaped weighting that boosts the first call (time-to-first-activity) and the last call (time-to-final-answer).

The composite score is mapped to an integer in `[1, sensitivity_scale]` and stored in `LLMCallPrediction.latency_sensitivity`. At runtime, the auto value is used only when no manual `@latency_sensitive` decorator is active — manual always wins.
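The weighted blend described above can be sketched as follows (a minimal illustration; the actual builder aggregates these signals per call position across traces):

```python
def composite_score(critical_path: float, fanout: float, position: float,
                    w_critical: float = 0.5,
                    w_fanout: float = 0.3,
                    w_position: float = 0.2) -> float:
    """Combine three normalized [0, 1] signals into a single [0, 1] score."""
    return (w_critical * critical_path
            + w_fanout * fanout
            + w_position * position)
```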
Changes

Data model (`nvidia_nat_core`)
- Added `latency_sensitivity: int | None` field to `LLMCallPrediction` (`None` default)

Profiler config (`nvidia_nat_core`)
- Added `auto_sensitivity`, `sensitivity_scale`, `w_critical`, `w_fanout`, `w_position` fields to `PredictionTrieConfig`

Trie builder (`nvidia_nat_core`)
- Added `SensitivityConfig` dataclass and `_compute_sensitivity_scores()` method
- Extended `LLMCallContext` with `call_duration_s`, `workflow_duration_s`, `sensitivity_score`
- Extended `_NodeAccumulators` for per-index and aggregated tracking
- Added `_score_to_sensitivity()` to convert accumulated scores to clamped integers

Runtime injection (`nvidia_nat_core`)
- Added `has_manual_latency_sensitivity` property to `Context` (checks whether the decorator stack has been pushed)
- `_DynamoTransport.handle_async_request` reads `prediction.latency_sensitivity` and applies it when no manual override is active

Bug fixes
- Fixed `nvidia_nat_langchain` test collection failure: updated 6 files importing from the deprecated `nat.eval.evaluator.evaluator_model` to use `nat.data_models.evaluator`
- Fixed `nvidia_nat_core` test collection failure: renamed `test_profiler.py` → `test_profiler_config.py` to avoid a basename collision with `nvidia_nat_eval`'s `test_profiler.py`

Documentation
- Updated `docs/source/improve-workflows/profiler.md`, covering trie contents, auto sensitivity scoring, override behavior, configuration, and runtime usage

Override Behavior
- No decorator, trie prediction says 4 → the auto value 4 is applied
- `@latency_sensitive(5)`, trie says 3 → the manual value 5 wins
- `@latency_sensitive(1)`, trie says 4 → the manual value 1 wins
Test Plan

- `LLMCallPrediction` field tests (default `None`, serialization round-trip, integer storage)
- `PredictionTrieConfig` field tests (defaults, custom weights, `auto_sensitivity` toggle)
- `Context.has_manual_latency_sensitivity` tests (default `False`, `True` after push, `False` after pop, nested)
- Sensitivity computation tests (`call_duration_s` from span timestamps, `workflow_duration_s` from trace span)
- Deserialization test (a trie saved without `latency_sensitivity` loads as `None`)
- `DynamoTransport` auto injection tests (applies auto value, skips when manual override active, skips when `None`)

By Submitting this PR I confirm:
Summary by CodeRabbit
New Features
Documentation
Tests