[Feat] Implement per-user LLM rate limiting and update documentation#5604

Open
Jayanaka-98 wants to merge 10 commits into jaseci-labs:main from Jayanaka-98:rate_limmiting
Conversation

@Jayanaka-98
Collaborator

Per-User LLM Rate Limiting and Budget Enforcement

Adds configurable per-user rate limits and spend caps for all `by llm()` calls in jac-scale. When a user exceeds any configured limit, the call is blocked before it reaches the LLM provider and an exception is raised.


Motivation

Without per-user limits, a single user can exhaust the entire LLM budget of a deployment, either accidentally (runaway loops) or deliberately. This PR makes it possible to set hard guardrails per user at the platform level, without requiring any changes to application code.


What changed

context.jac: Added username field to JScaleExecutionContext. This is deliberately scoped to jac-scale (not the jaclang base ExecutionContext) since username is a deployment concern, not a language runtime concern.

jfast_api.impl.jac: In request_context_middleware, the authenticated username from request.state is written into the execution context after JWT validation. Uses hasattr duck-typing to avoid a cross-module import.
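The middleware step above can be sketched in plain Python. This is a stand-in, not the real jac-scale code: `ExecutionContext`, `inject_username`, and `RequestState` are hypothetical names illustrating the `hasattr` duck-typing described, and the real logic lives in `jfast_api.impl.jac` against the Jac runtime's context.

```python
class ExecutionContext:
    """Stand-in for JScaleExecutionContext with its new `username` field."""
    def __init__(self):
        self.username = None

def inject_username(ctx, request_state):
    # hasattr duck-typing: only write the field if this context type
    # actually declares it, so no cross-module import of the jac-scale
    # context class is needed here.
    if hasattr(ctx, "username"):
        ctx.username = getattr(request_state, "user", None)

class RequestState:
    """Stand-in for request.state after JWT validation."""
    user = "alice"

ctx = ExecutionContext()
inject_username(ctx, RequestState())
print(ctx.username)  # -> alice
```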

llm_telemetry.jac / llm_telemetry.impl.jac: New JacUserRateLimiter class (pure logic, no litellm dependency) with:

  • check_pre_call(username): checks all configured limits, raises on violation
  • record_success(username, tokens, cost_usd, model): increments Redis counters and writes a MongoDB usage record

A thin _LiteLLMRateLimitAdapter(CustomLogger) wires the limiter into LiteLLM's native callback system (log_pre_api_call for blocking, log_success_event for tracking). The adapter reads the username from the Jac execution context at call time.
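A minimal sketch of the limiter's two-method contract, with an in-memory dict standing in for the Redis counters and only the `rpm` dimension modeled (the class and exception names here are illustrative, not the actual implementation):

```python
import time

class RateLimitExceeded(Exception):
    pass

class UserRateLimiter:
    """Pure-logic sketch: check before the call, record after it."""
    def __init__(self, rpm=None):
        self.rpm = rpm
        self._counts = {}  # (username, minute) -> request count

    def check_pre_call(self, username):
        # unlimited dimension or unauthenticated request: skip the check
        if self.rpm is None or username is None:
            return
        minute = int(time.time() // 60)
        if self._counts.get((username, minute), 0) >= self.rpm:
            raise RateLimitExceeded(f"rpm limit {self.rpm} hit for {username}")

    def record_success(self, username, tokens, cost_usd, model):
        # called from the post-call success event with real usage numbers
        minute = int(time.time() // 60)
        key = (username, minute)
        self._counts[key] = self._counts.get(key, 0) + 1

limiter = UserRateLimiter(rpm=2)
for _ in range(2):
    limiter.check_pre_call("alice")
    limiter.record_success("alice", tokens=10, cost_usd=0.001, model="gpt-x")
try:
    limiter.check_pre_call("alice")  # third request in the same minute
except RateLimitExceeded as e:
    print("blocked:", e)
```

In the real adapter, `check_pre_call` would be invoked from `log_pre_api_call` and `record_success` from `log_success_event`.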

Budget persistence: RPM/RPD/TPM/TPD use Redis counters with short TTLs (losing these on restart is acceptable). Daily and monthly budgets use MongoDB as the source of truth with a 60-second Redis cache. On cache miss (e.g. after a Redis restart), the total is rebuilt from a MongoDB aggregation, so budget limits survive restarts.
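The cache-miss rebuild path can be sketched as follows, with a list of dicts standing in for the MongoDB usage records and a plain dict for the Redis cache (all names here are illustrative):

```python
import time

class BudgetTracker:
    CACHE_TTL = 60  # seconds, mirroring the 60-second Redis cache

    def __init__(self):
        self.records = []   # stand-in for MongoDB usage records
        self._cache = {}    # day -> (total, cached_at); stand-in for Redis

    def record(self, day, cost_usd):
        self.records.append({"day": day, "cost_usd": cost_usd})
        if day in self._cache:  # keep a warm cache entry in sync
            total, at = self._cache[day]
            self._cache[day] = (total + cost_usd, at)

    def daily_total(self, day, now=None):
        now = now if now is not None else time.time()
        hit = self._cache.get(day)
        if hit and now - hit[1] < self.CACHE_TTL:
            return hit[0]
        # cache miss (e.g. Redis restart): rebuild from the records,
        # analogous to a MongoDB aggregation over that day's documents
        total = sum(r["cost_usd"] for r in self.records if r["day"] == day)
        self._cache[day] = (total, now)
        return total

t = BudgetTracker()
t.record("2024-06-01", 1.25)
t.record("2024-06-01", 0.75)
t._cache.clear()                     # simulate a Redis flush
print(t.daily_total("2024-06-01"))   # -> 2.0, rebuilt from records
```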

config_loader.jac / config_loader.impl.jac: New [plugins.scale.llm_limits] section with enabled, rpm, rpd, tpm, tpd, daily_budget_usd, monthly_budget_usd. All fields default to null (disabled).

test_llm_rate_limiting.jac: 14 tests across three tiers:

  • Unit tests (mock Redis/MongoDB): each limit type blocks correctly, no-Redis is a no-op, budget cache hit/miss behavior, MongoDB write shape
  • Integration tests (real Redis via testcontainers): RPM enforced + per-user isolation, TPM accumulation, RPD day-rollover reset
  • Persistence tests (real Redis + MongoDB): daily budget rebuilds from MongoDB after Redis flush, monthly budget spans multiple days correctly
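The day-rollover behavior exercised by the integration tests falls out of date-scoped counter keys: a new day means a fresh key, so no explicit reset is needed and the old key simply expires. The key format below is hypothetical, for illustration only:

```python
from datetime import date

def rpd_key(username, day):
    # hypothetical key scheme; the real key layout is an implementation detail
    return f"llm:rpd:{username}:{day.isoformat()}"

counters = {}  # stand-in for Redis INCR counters

def incr(key):
    counters[key] = counters.get(key, 0) + 1
    return counters[key]

k1 = rpd_key("alice", date(2024, 6, 1))
incr(k1)
incr(k1)
k2 = rpd_key("alice", date(2024, 6, 2))   # next day: fresh key, count starts at 0
print(counters.get(k1, 0), counters.get(k2, 0))  # -> 2 0
```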

docs/reference/plugins/jac-scale.md: New "Per-User LLM Rate Limiting" section covering configuration, how it works, unauthenticated request behavior, MongoDB usage record schema, and a minimal daily-budget-only example.


Configuration

```toml
[plugins.scale.llm_limits]
enabled = true
rpm  = 60
rpd  = 1000
tpm  = 100000
tpd  = 500000
daily_budget_usd   = 5.00
monthly_budget_usd = 50.00
```

All fields are optional; omit any field to leave that dimension unlimited. If `enabled = false` or the section is absent, no limiting occurs at all.


Design notes

  • No application code changes required. Limits are enforced at the infrastructure layer via LiteLLM's CustomLogger callback, the same mechanism used by the existing JacLLMLogger telemetry.
  • Unauthenticated requests are not blocked. If no username is present in the execution context, the limiter is skipped. Gate with :priv walkers if you need enforcement on all traffic.
  • Token limits are best-effort pre-call. log_pre_api_call fires before the LLM call; token counts come from the post-call event. The check reads the accumulated counter and blocks if already at or above the limit. The first call that crosses the threshold goes through.
  • Streaming coverage. log_pre_api_call fires for both streaming and non-streaming calls routed through litellm.completion. Calls made via the OpenAI SDK directly (bypassing litellm) are not intercepted.
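The best-effort token-limit semantics from the notes above can be made concrete with a small sketch (illustrative names, `tpm` dimension only): the pre-call check only sees tokens already recorded by earlier post-call events, so a single call may overshoot the limit and only the next call is blocked.

```python
class TokenLimitExceeded(Exception):
    pass

class TpmCheck:
    def __init__(self, tpm):
        self.tpm = tpm
        self.used = 0  # tokens recorded by post-call events this minute

    def pre_call(self):
        # blocks only when the accumulated counter is already at the limit
        if self.used >= self.tpm:
            raise TokenLimitExceeded(f"{self.used} >= {self.tpm}")

    def post_call(self, tokens):
        self.used += tokens  # token counts are only known after the call

c = TpmCheck(tpm=100)
c.pre_call()        # used=0 < 100: allowed
c.post_call(150)    # this single call overshoots the limit...
try:
    c.pre_call()    # ...so only the *next* call is blocked
except TokenLimitExceeded as e:
    print("blocked:", e)
```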
