Nautilus FFI Memory Contract
Low-medium relevance to Cortana. The FFI page documents the C-ABI contract between Rust core and the Cython/PyO3 layer that wraps it - rules for CVec, PyCapsule, Box-backed *_API wrappers, and the abort_on_panic envelope around every extern "C" symbol. End-user Cortana strategy code never crosses this boundary directly: every Strategy.on_quote_tick callback, every cache.position(...) lookup, every submit_order(...) call is a Python method that the framework has already wrapped in PyO3 plumbing. The FFI contract matters to us only as cost-of-call awareness - each Python ↔ Rust hop is a real transition, types like Price/Quantity/Money are precision-aware Rust objects with PyO3-bound accessors, and naive code (e.g., calling .as_double() on the same Price 78 times inside a tight scoring loop) burns measurable cycles per crossing. The page is also where the framework’s “panics never unwind across extern "C"” rule is codified, which explains why the v1 release wheels run panic = abort and why a Rust-side invariant violation kills the process rather than raising a Python exception. Read this page if Cortana ever authors a Rust crate (UW adapter, custom indicator). Otherwise the framework hides the FFI; the relevant takeaway is “minimize hot-loop crossings, cache .as_double() results once.”
Why this page exists
nautilus-rust.md answers “do we have to write Rust?” (no).
nautilus-architecture.md answers “how is Python wired to Rust?” (PyO3 + static link via Cython). This page covers what the official FFI Memory Contract doc actually says - the rules a contributor must follow when adding FFI symbols and, more importantly for Cortana, the cost model implied by those rules so we can reason about hot-path performance without authoring Rust ourselves.
The five FFI rules (verbatim mechanics)
The official page documents exactly five mechanics. Each one is a hard rule for any contributor; the cost implications fall out of them.
1. Fail-fast panics at the FFI boundary
“Rust panics must never unwind across extern "C" functions. Unwinding into C or Python is undefined behaviour and can corrupt the foreign stack or leave partially-dropped resources behind. To enforce the fail-fast architecture we wrap every exported symbol in crate::ffi::abort_on_panic, which executes the body and calls process::abort() if a panic occurs.”
Mechanics: every exported symbol looks like
#[unsafe(no_mangle)]
pub extern "C" fn some_ffi_fn(args) -> ReturnType {
    abort_on_panic(|| {
        // body
    })
}

The panic message is logged before the abort, so debug output is
preserved. This is the framework-level reason panic = abort is the
recommended release config: a panic anywhere in the Rust core takes
the process down cleanly so the orchestrator (launchd / systemd /
docker) can restart from the last persisted state.
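Seen from Python, the practical consequence is that an abort cannot be intercepted with try/except; only the parent process observes it. A minimal sketch, assuming os.abort() as a stand-in for a Rust-side process::abort() (no Nautilus code is involved, and run_child is a hypothetical helper):

```python
import subprocess
import sys

# Child program: os.abort() stands in for a Rust-side process::abort().
# The try/except is deliberately present to show it never gets a chance to run.
CHILD = """
import os
try:
    os.abort()
except BaseException:
    print("recovered")
"""


def run_child() -> subprocess.CompletedProcess:
    # Capture output so abort diagnostics do not clutter the console.
    return subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
    )


result = run_child()
aborted = result.returncode != 0          # the supervisor sees a failed exit
recovered = "recovered" in result.stdout  # the except branch never executed
print(f"aborted={aborted} recovered={recovered}")
```

The non-zero return code is the only signal available, which is why recovery has to live in the orchestrator and in persisted state rather than in strategy-level exception handling.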
2. CVec lifecycle (the Vec<T> exchange protocol)
CVec is the canonical container for variable-length data crossing the
boundary - a repr(C) thin wrapper around Vec<T> passed by value.
| Step | Owner | Action |
|---|---|---|
| 1 | Rust | Build Vec<T>, convert with .into() - leaks the vec, transfers raw allocation to foreign code. |
| 2 | Foreign (Cython / PyO3 / C) | Read while the CVec is in scope. Do not modify ptr, len, cap. |
| 3 | Foreign | Exactly once, call the type-specific drop helper (e.g. vec_drop_book_levels, vec_drop_book_orders, vec_time_event_handlers_drop). |
The drop helper reconstructs the original Vec<T> with
Vec::from_raw_parts and lets it drop normally.
“If step 3 is forgotten the allocation is leaked for the remainder of the process; if it is performed twice the program will double-free and likely crash.”
The framework explicitly removed the old generic cvec_drop because
it always treated the buffer as Vec<u8> - calling it on any other
element type produces a size-mismatch and corrupts the allocator’s
bookkeeping. Use the type-specific helper. If none exists, add
one in crates/core/src/ffi/cvec.rs.
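To see why the element type matters when a drop helper rebuilds the Vec<T>, the sketch below models the struct with ctypes. The (ptr, len, cap) field order and types are an assumption taken from the field names above, and allocation_bytes is a hypothetical helper, not a framework function. The same ptr/len/cap triple describes a different allocation size depending on T.

```python
import ctypes


class CVec(ctypes.Structure):
    # Assumed layout, mirroring the ptr / len / cap fields named above.
    _fields_ = [
        ("ptr", ctypes.c_void_p),
        ("len", ctypes.c_size_t),
        ("cap", ctypes.c_size_t),
    ]


def allocation_bytes(vec: CVec, elem_type) -> int:
    # Byte size a drop helper would hand back to the allocator for this T.
    return vec.cap * ctypes.sizeof(elem_type)


# Stand-in for a Rust Vec<u64> with four elements; `backing` keeps the memory alive.
backing = (ctypes.c_uint64 * 4)(10, 20, 30, 40)
vec = CVec(ctypes.addressof(backing), 4, 4)

# Step 2 of the protocol: read through the pointer without touching the struct.
typed_ptr = ctypes.cast(vec.ptr, ctypes.POINTER(ctypes.c_uint64))
values = typed_ptr[: vec.len]

true_size = allocation_bytes(vec, ctypes.c_uint64)    # element size 8
assumed_size = allocation_bytes(vec, ctypes.c_uint8)  # element size 1
print(values, true_size, assumed_size)
```

A helper that assumes u8 elements would describe this buffer as 4 bytes instead of 32, which is the size mismatch the type-specific helpers exist to avoid.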
3. Capsules created on the Python side
Some Cython helpers allocate buffers with PyMem_Malloc, wrap them
into a CVec, and return the address inside a PyCapsule. Every such
capsule is created with a destructor (capsule_destructor or
capsule_destructor_deltas) that frees both the buffer and the
CVec. The Python caller therefore must NOT free the memory manually - the destructor handles it on collection. Manual free → double-free crash.
4. Capsules created on the Rust side (PyO3 bindings)
When Rust pushes a heap-allocated value into Python, it MUST use
PyCapsule::new_with_destructor:
use pyo3::types::PyCapsule;

Python::attach(|py| {
    let my_data = Box::new(MyStruct::new());
    let ptr = Box::into_raw(my_data);
    let capsule = PyCapsule::new_with_destructor(
        py,
        ptr,
        None,
        |ptr, _| {
            // Reconstruct the Box and let it drop, freeing the alloc
            let _ = unsafe { Box::from_raw(ptr) };
        },
    ).expect("capsule creation failed");
    // ... pass `capsule` back to Python ...
});

“Do not use PyCapsule::new(…, None); that variant registers no destructor and will leak memory unless the recipient manually extracts and frees the pointer (something we never rely on).”
The codebase has been audited so every Rust→Python capsule has a destructor. New FFI modules must follow the same pattern.
5. Box-backed *_API wrappers (owned Rust objects)
For complex objects (OrderBook, SyntheticInstrument,
TimeEventAccumulator), Rust allocates the value with Box::new and
returns a small repr(C) wrapper whose only field is the Box:
#[repr(C)]
pub struct OrderBook_API(Box<OrderBook>);

#[unsafe(no_mangle)]
pub extern "C" fn orderbook_new(id: InstrumentId, book_type: BookType) -> OrderBook_API {
    OrderBook_API(Box::new(OrderBook::new(id, book_type)))
}

#[unsafe(no_mangle)]
pub extern "C" fn orderbook_drop(book: OrderBook_API) {
    drop(book); // frees the heap allocation
}

Mandatory rules:
- Every *_new constructor must have a matching *_drop.
- Validate parameters before heap allocation (fail fast on bad input).
- The Python/Cython binding must guarantee *_drop runs exactly once.
Two acceptable patterns for guaranteeing single-drop:
- Preferred (new code): wrap the pointer in a PyCapsule with destructor (rule 4).
- Legacy (v1 Cython only): call the helper explicitly in __del__/__dealloc__:
cdef class OrderBook:
    cdef OrderBook_API _mem

    def __cinit__(self, ...):
        self._mem = orderbook_new(...)

    def __del__(self):
        if self._mem._0 != NULL:
            orderbook_drop(self._mem)

Forgetting drop → leak. Calling twice → crash.
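The exactly-once requirement can be modelled in pure Python with weakref.finalize, which runs its callback at most once whether it is triggered explicitly or by garbage collection. This is only an analogy for the capsule-destructor pattern: OrderBookHandle and fake_orderbook_drop are hypothetical stand-ins, not the framework's binding code.

```python
import gc
import weakref

drop_log = []  # records every call to the stand-in drop helper


def fake_orderbook_drop(handle_id: int) -> None:
    # Hypothetical stand-in for an FFI *_drop symbol.
    drop_log.append(handle_id)


class OrderBookHandle:
    """Pure-Python model of a wrapper that owns one foreign allocation."""

    def __init__(self, handle_id: int) -> None:
        self.handle_id = handle_id
        # finalize() fires its callback at most once: on an explicit call
        # or when the object is collected, whichever happens first.
        self._finalizer = weakref.finalize(self, fake_orderbook_drop, handle_id)

    def close(self) -> None:
        self._finalizer()  # no-op if the callback has already run


first = OrderBookHandle(1)
first.close()
first.close()  # second explicit close does nothing
del first      # collection does not drop again

second = OrderBookHandle(2)
del second     # never closed explicitly: dropped on collection
gc.collect()

print(drop_log)
```

Each handle id appears in the log exactly once, regardless of whether cleanup was explicit, repeated, or left to the collector.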
What types cross the boundary cleanly (and why)
The page itself doesn’t enumerate this, but the conventions documented
across nautilus-value-types.md, nautilus-architecture.md, and the
FFI page imply a clear hierarchy of “cheap” vs “expensive” crossings:
Cheap (small, repr(C), value-typed):
- Primitives: u64, i64, f64, bool - pass-by-value, no allocation.
- UnixNanos, fixed-precision integer scalars - same.
- repr(C) enums - OrderSide, OrderType, BookType, etc.
- Identifier strings backed by interned IDs (InstrumentId, StrategyId) - equality is a fixed-size hash compare.
Cheap-ish (Box-backed *_API wrappers, single allocation, single crossing):
- Price, Quantity, Money - fixed-precision scalars but with a precision tag; .as_double() is the conversion to plain f64.
- Order, Position, Account - when handed by reference.
Expensive (heap allocation per crossing, possibly per-call):
- String ↔ Python str - UTF-8 validation + new alloc both directions.
- Arbitrary Python dict/list/tuple constructed Rust-side - every element crosses the GIL-locked allocator.
- Custom Python classes accessed from Rust by attribute lookup - each getattr is a full Python interpreter round-trip.
- CVec of complex elements - fine for one bulk hand-off, painful if done every tick.
Expensive and easy to do accidentally:
- Calling .as_double() on the same Price repeatedly inside a hot loop - every call is an FFI crossing.
- Constructing a Python dict or list per tick to pass to a custom data type instead of using @customdataclass.
- Reading attributes off Python user objects from Rust callbacks via getattr instead of through a typed PyO3 binding.
GIL implications
The PyO3 model: the GIL is held whenever Python code runs. The Nautilus pattern - single-threaded kernel, all callbacks dispatched on that thread - means strategy callbacks already run with the GIL held. The implications:
- Inside on_quote_tick, on_bar, on_data, the strategy holds the GIL. Heavy NumPy / pandas / ML inference work blocks the kernel. If a callback takes 50 ms, that’s 50 ms of no other dispatch.
- Background services (Tokio I/O, DataFusion queries) do NOT hold the GIL. They run on separate threads and re-enter the kernel via MPSC channels; the kernel callback then re-acquires the GIL when the Python handler runs.
- Python::attach(|py| { ... }) is the PyO3 incantation for any Rust code that wants to touch Python. It’s effectively with gil: from the Rust side. Any Cortana Rust extension would need this to publish events into the Python message bus.
The takeaway for Cortana: don’t put long synchronous compute inside a strategy callback. Either (a) keep the callback fast and decision-focused, or (b) push the heavy lift to a background actor that emits a result event the strategy subscribes to.
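Option (b) might look roughly like the sketch below. It assumes the heavy work releases the GIL (modelled here with time.sleep) and uses a plain thread pool in place of a Nautilus actor; heavy_inference, on_result, and the results list are hypothetical stand-ins, and on_quote_tick is a bare function rather than a Strategy method.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
results = []  # stand-in for result events delivered back to the strategy


def heavy_inference(mid_price: float) -> float:
    # Hypothetical slow model call; sleep models work that releases the GIL.
    time.sleep(0.2)
    return mid_price * 2.0


def on_result(future: Future) -> None:
    # Stand-in for publishing a result event the strategy subscribes to.
    results.append(future.result())


def on_quote_tick(mid_price: float) -> float:
    # The callback only schedules the work, then returns immediately.
    started = time.perf_counter()
    executor.submit(heavy_inference, mid_price).add_done_callback(on_result)
    return time.perf_counter() - started


callback_seconds = on_quote_tick(100.0)
executor.shutdown(wait=True)  # sketch only: wait so the result can be inspected
print(f"callback took {callback_seconds:.6f}s, results={results}")
```

In this sketch the done-callback runs on the worker thread; in the framework the result would instead be routed back through the message bus so the strategy handler still runs on the kernel thread.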
Async-callback-from-Rust patterns
The framework’s pattern, paraphrased from nautilus-architecture.md +
the FFI doc:
- Rust adapter receives WebSocket frame on a Tokio task (no GIL).
- Tokio task parses the frame and constructs a domain event (QuoteTick, custom Data).
- Tokio task sends DataEvent::Data(...) through an MPSC channel to the kernel thread.
- Kernel thread receives event, writes to Cache, publishes to MessageBus topic.
- MessageBus dispatch invokes Python handler - at this point the thread acquires the GIL via Python::attach, calls the bound Python method, releases the GIL on return.
For Cortana, this means UW WebSocket events ingested by a Rust adapter
would arrive at on_data(UWFlowAlert) already parsed and validated -
no GIL contention with the WebSocket reader, no risk of the strategy
callback blocking the network read.
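The shape of that pipeline (many producers, one channel, a single consumer that updates the cache before dispatching) can be sketched with the standard library. This is a model only: queue.Queue stands in for the MPSC channel, Python threads stand in for Tokio tasks, and the event dicts, the "ws-a"/"ws-b" source names, and the cache dict are hypothetical.

```python
import queue
import threading

channel = queue.Queue()  # stand-in for the MPSC channel
cache = {}               # stand-in for Cache
handled = []             # what the Python handler observed


def on_data(event: dict) -> None:
    # Stand-in for a strategy handler; only the kernel thread calls it.
    handled.append((event["source"], event["seq"]))


def producer(source: str, count: int) -> None:
    # Models a Tokio task: parse a frame, build an event, send it.
    for seq in range(count):
        channel.put({"source": source, "seq": seq})


def kernel_loop(expected: int) -> None:
    # Single consumer: write to the cache first, then dispatch to the handler.
    for _ in range(expected):
        event = channel.get()
        cache[event["source"]] = event["seq"]
        on_data(event)


producers = [
    threading.Thread(target=producer, args=(name, 3)) for name in ("ws-a", "ws-b")
]
kernel = threading.Thread(target=kernel_loop, args=(6,))
kernel.start()
for thread in producers:
    thread.start()
for thread in producers:
    thread.join()
kernel.join()

print(len(handled), cache)
```

Events from different sources may interleave, but each source's events reach the handler in the order they were sent, and the handler never runs concurrently with itself.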
Cortana MK3 implications
The FFI cost model rarely bites Cortana directly because Cortana writes Python, the framework already pays the FFI cost on our behalf, and IBKR + UW + scoring run at human timescales (signal cadence: seconds, not nanoseconds).
That said, three concrete behaviors should be baked into the M1 strategy implementation to avoid accidentally pessimizing the hot path:
- Cache .as_double() results inside callbacks. A scoring feature that references mid-price 30 times shouldn’t call quote.bid_price.as_double() 30 times; pull it once into a local float at the top of the handler. Same for quote.ask_price, position sizes, premium values. (nautilus-value-types.md makes the lossy-but-cheap .as_double() API explicit; the FFI page explains why it’s not free - every call is an FFI crossing.)
- Use @customdataclass for ScoreUpdate/MetaProb/Regime, not raw Python dicts. The @customdataclass decorator (per nautilus-custom-data.md) generates the PyO3 boilerplate so the custom event passes through the message bus the same way native QuoteTick does - single allocation, typed fields, no per-key getattr costs. A naive “publish a dict” approach would cross the FFI boundary once per field per dispatch.
- Never iterate the entire 78-feature scoring vector inside a Rust callback. If we ever author a Rust scoring actor (post-spike, maybe), the right shape is “Rust receives raw quote, computes the 8 meta-selected features in Rust, publishes a typed ScoreUpdate with those 8 floats.” The Python strategy then reads ScoreUpdate fields cheaply because the type is repr(C)-friendly. The wrong shape is “Rust populates a Python dict with 78 entries every tick” - that’s 78 FFI crossings + 78 dict insertions per dispatch.
The single most expensive accidental pattern Cortana could hit:
calling .as_double() (or any other PyO3 accessor) repeatedly on the
same value object inside a tight loop or per-tick handler. Each call
crosses the FFI boundary. Mitigation: at the top of every strategy
handler, pull every value into a local float/int/bool once,
then operate on locals. This pattern is cheap to enforce in code
review and removes nearly all accidental FFI cost.
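A minimal sketch of the mitigation, using a hypothetical FakePrice class that counts accessor calls in place of a real PyO3-bound Price (the counter models crossings; it does not measure real FFI cost):

```python
from typing import Sequence


class FakePrice:
    """Hypothetical stand-in for a PyO3-bound Price; counts accessor calls."""

    def __init__(self, value: float) -> None:
        self._value = value
        self.crossings = 0

    def as_double(self) -> float:
        self.crossings += 1  # each call models one Python -> Rust hop
        return self._value


def score_naive(bid: FakePrice, ask: FakePrice, weights: Sequence[float]) -> float:
    # Re-reads both prices for every feature weight.
    return sum(w * (bid.as_double() + ask.as_double()) / 2.0 for w in weights)


def score_cached(bid: FakePrice, ask: FakePrice, weights: Sequence[float]) -> float:
    # Pull each value into a local float once, then operate on locals.
    mid = (bid.as_double() + ask.as_double()) / 2.0
    return sum(w * mid for w in weights)


weights = [0.5] * 78  # one weight per feature in the 78-feature vector

naive_bid, naive_ask = FakePrice(100.0), FakePrice(101.0)
naive_score = score_naive(naive_bid, naive_ask, weights)

cached_bid, cached_ask = FakePrice(100.0), FakePrice(101.0)
cached_score = score_cached(cached_bid, cached_ask, weights)

print(naive_bid.crossings, cached_bid.crossings)
```

Both functions return the same score; the cached version reads each price once instead of once per feature, which is the property a reviewer can check by inspection.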
When (if ever) Cortana would author Rust to skip FFI cost:
- UW WebSocket parsing under flow bursts. If UW emits 1000+ events/sec during a real flow burst and Python parsing falls behind, a Rust adapter under crates/adapters/unusual_whales/ is the documented pattern. The Rust side owns parsing, rate-limit, retry, and emits typed UWFlowAlert events to the bus. Python strategy subscribes; FFI cost is one crossing per event, not one per JSON field.
- A custom microstructure indicator that needs to inspect every NBBO tick on a deep chain. If profiling shows the per-tick Python callback eating real time, the indicator moves to Rust under crates/indicators/. Defer until measured.
Neither case applies to the M1 scoring strategy. Stick to Python.
Spike DX premise verdict
nautilus-rust.md confirmed “Python-first; Rust core fades into the
background.” This page reinforces the same conclusion from the FFI
angle: the contract is documented, the framework already follows it
internally, and Cortana doesn’t have to read or implement any of it.
The discipline that does matter is the cheap one - don’t churn
typed-value accessors in tight loops, prefer @customdataclass over
ad-hoc dicts. Both are caught by code review, neither requires Rust
fluency.
See Also
- Nautilus Rust - when (if ever) we author Rust.
- Nautilus Architecture - Rust core + Python bindings via PyO3, single-threaded kernel topology.
- Nautilus Dev Rust - Rust contributor guide (parallel batch).
- Nautilus Dev Python - Python contributor guide (parallel batch).
- Nautilus Dev Coding Standards - PyO3 stub annotations and 60/100 commit-format rules referenced here.
- Nautilus Value Types - .as_double(), .as_decimal(), precision discipline; cheap-vs-expensive crossings.
- Nautilus Custom Data - @customdataclass for ScoreUpdate/MetaProb/Regime.
- 2026-05-09 Nautilus Spike Plan: ~/conductor/workspaces/cortanaroi-mk2/belo-horizonte/plans/2026-05-09-nautilus-spike.md
Timeline
- 2026-05-07 | Cody - Filed during pre-spike concept mastery sweep batch 7 (developer guide).