pgrx_pg_sys/submodules/
ffi.rs

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
//LICENSE Portions Copyright 2019-2021 ZomboDB, LLC.
//LICENSE
//LICENSE Portions Copyright 2021-2023 Technology Concepts & Design, Inc.
//LICENSE
//LICENSE Portions Copyright 2023-2023 PgCentral Foundation, Inc. <contact@pgcentral.org>
//LICENSE
//LICENSE All rights reserved.
//LICENSE
//LICENSE Use of this source code is governed by the MIT license that can be found in the LICENSE file.
#![deny(unsafe_op_in_unsafe_fn)]

#[cfg(not(all(
    any(target_os = "linux", target_os = "macos"),
    any(target_arch = "x86_64", target_arch = "aarch64")
)))]
mod cee_scape {
    #[cfg(not(feature = "cshim"))]
    compile_error!("target platform cannot work without feature cshim");

    use libc::{c_int, c_void};
    use std::marker::PhantomData;

    #[repr(C)]
    pub struct SigJmpBufFields {
        _internal: [u8; 0],
        _neither_send_nor_sync: PhantomData<*const u8>,
    }

    pub fn call_with_sigsetjmp<F>(savemask: bool, mut callback: F) -> c_int
    where
        F: for<'a> FnOnce(&'a SigJmpBufFields) -> c_int,
    {
        extern "C" {
            fn call_closure_with_sigsetjmp(
                savemask: c_int,
                closure_env_ptr: *mut c_void,
                closure_code: extern "C" fn(
                    jbuf: *const SigJmpBufFields,
                    env_ptr: *mut c_void,
                ) -> c_int,
            ) -> c_int;
        }

        extern "C" fn call_from_c_to_rust<F>(
            jbuf: *const SigJmpBufFields,
            closure_env_ptr: *mut c_void,
        ) -> c_int
        where
            F: for<'a> FnOnce(&'a SigJmpBufFields) -> c_int,
        {
            let closure_env_ptr: *mut F = closure_env_ptr as *mut F;
            unsafe { (closure_env_ptr.read())(&*jbuf) }
        }

        let savemask: libc::c_int = if savemask { 1 } else { 0 };

        unsafe {
            let closure_env_ptr = core::ptr::addr_of_mut!(callback);
            core::mem::forget(callback);
            call_closure_with_sigsetjmp(
                savemask,
                closure_env_ptr as *mut libc::c_void,
                call_from_c_to_rust::<F>,
            )
        }
    }
}

use cee_scape::{call_with_sigsetjmp, SigJmpBufFields};

/**
Given a closure that is assumed to be a wrapped Postgres `extern "C"` function, [pg_guard_ffi_boundary]
works with the Postgres and C runtimes to create a "barrier" that allows Rust to catch Postgres errors
(`elog(ERROR)`) while running the supplied closure. This is done for the sake of allowing Rust to run
destructors before Postgres destroys the memory contexts that Rust-in-Postgres code may be enmeshed in.

Wrapping the FFI into Postgres enables
- memory safety
- improving error logging
- minimizing resource leaks

But only the first of these is considered paramount.

At all times PGRX reserves the right to choose an implementation that achieves memory safety.
Currently, this function is used to protect **every** bindgen-generated Postgres `extern "C"` function.

Generally, the only time *you'll* need to use this function is when calling a Postgres-provided
function pointer.

# Safety

It is undefined behavior if the function passed to `pg_guard_ffi_boundary` have objects with
destructors on the stack when postgres raises an `ERROR`. For example, the following is
both a resource leak, and undefined behavior (as it needs to be a [trivially-deallocated
stack frame]):

```rust,ignore
// This is UB!
pgrx::pg_sys::ffi::pg_guard_ffi_boundary(|| {
    let data = vec![1, 2, 3, 4, 5];
    // call FFI function that raises an ERROR
});
```
Instead, you should write it like
```rust,ignore
let data = vec![1, 2, 3, 4, 5];
pgrx::pg_sys::ffi::pg_guard_ffi_boundary(|| {
    // call FFI function that raises an ERROR
});
```

Further, it is undefined behavior if the function passed into `pg_guard_ffi_boundary` panics. It
is recommended that you keep the body of the `pg_guard_ffi_boundary` closure very small -- ideally
*only* containing a call to some C function, rather than containing any logic or variables of its
own.

Furthermore, Postgres is a single-threaded runtime.  As such, [`pg_guard_ffi_boundary`] should
**only** be called from the main thread.  In fact, [`pg_guard_ffi_boundary`] will detect this
and immediately panic.

More generally, Rust cannot guarantee destructors are always run, PGRX is written in Rust code, and
the implementation of `pg_guard_ffi_boundary` relies on help from Postgres, the OS, and the C runtime;
thus, relying on the FFI boundary catching an error and propagating it back into Rust to guarantee
Rust's language-level memory safety when calling Postgres is unsound (i.e. there are no promises).
Postgres can and does opt to erase exception and error context stacks in some situations.
The C runtime is beholden to the operating system, which may do as it likes with a thread.
PGRX has many magical powers, some of considerable size, but they are not infinite cosmic power.

Thus, if Postgres gives you a pointer into the database's memory, and you corrupt that memory
in a way technically permitted by Rust, intending to fix it before Postgres or Rust notices,
then you may not call Postgres and expect Postgres to not notice the code crimes in progress.
Postgres and Rust will see you. Whether they choose to ignore such misbehavior is up to them, not PGRX.
If you are manipulating transient "pure Rust" data, however, it is unlikely this is of consequence.

# Implementation Note

The main implementation uses `sigsetjmp`, [`pg_sys::error_context_stack`], and [`pg_sys::PG_exception_stack`].
which, when Postgres enters its exception handling in `elog.c`, will prompt a `siglongjmp` back to it.

This caught error is then converted into a Rust `panic!()` and propagated up the stack, ultimately
being converted into a transaction-aborting Postgres `ERROR` by PGRX.

[trivially-deallocated stack frame]: https://github.com/rust-lang/rfcs/blob/master/text/2945-c-unwind-abi.md#plain-old-frames
**/
use crate as pg_sys;
use crate::panic::{CaughtError, ErrorReport, ErrorReportLocation, ErrorReportWithLevel};
use core::ffi::CStr;
use std::mem::MaybeUninit;

#[inline(always)]
#[track_caller]
pub unsafe fn pg_guard_ffi_boundary<T, F: FnOnce() -> T>(f: F) -> T {
    // SAFETY: Caller promises not to call us from anything but the main thread.
    unsafe { pg_guard_ffi_boundary_impl(f) }
}

#[inline(always)]
#[track_caller]
unsafe fn pg_guard_ffi_boundary_impl<T, F: FnOnce() -> T>(f: F) -> T {
    //! This is the version that uses sigsetjmp and all that, for "normal" Rust/PGRX interfaces.

    // The next code is definitely thread-unsafe (it manipulates statics in an
    // unsynchronized manner), so we may as well check here.
    super::thread_check::check_active_thread();

    // SAFETY: This should really, really not be done in a multithreaded context as it
    // accesses multiple `static mut`. The ultimate caller asserts this is the main thread.
    unsafe {
        let caller_memxct = pg_sys::CurrentMemoryContext;
        let prev_exception_stack = pg_sys::PG_exception_stack;
        let prev_error_context_stack = pg_sys::error_context_stack;
        let mut result: std::mem::MaybeUninit<T> = MaybeUninit::uninit();
        let jump_value = call_with_sigsetjmp(false, |jump_buffer| {
            // Make Postgres' error-handling system aware of our new
            // setjmp/longjmp restore point.
            pg_sys::PG_exception_stack = std::mem::transmute(jump_buffer as *const SigJmpBufFields);

            // execute the closure, which will be a wrapped internal Postgres function
            result.write(f());
            0
        });

        if jump_value == 0 {
            // Flag is 0, so we've taken the successful return path. We're not
            // here as the result of a longjmp.
            // Restore Postgres' understanding of where its next longjmp should go
            pg_sys::PG_exception_stack = prev_exception_stack;
            pg_sys::error_context_stack = prev_error_context_stack;

            result.assume_init()
        } else {
            // We've landed here b/c of a longjmp originating in Postgres

            // the overhead to get the current [ErrorData] from Postgres and convert
            // it into our [ErrorReportWithLevel] seems worth the user benefit
            //
            // Note that this only happens in the case of us trapping an error

            // At this point, we're running within `pg_sys::ErrorContext`, but should be in the
            // memory context the caller was in before we call [CopyErrorData()] and start using it
            pg_sys::CurrentMemoryContext = caller_memxct;

            // SAFETY: `pg_sys::CopyErrorData()` will always give us a valid pointer, so just assume so
            let errdata_ptr = pg_sys::CopyErrorData();
            let errdata = errdata_ptr.as_ref().unwrap_unchecked();

            // copy out the fields we need to support pgrx' error handling
            let level = errdata.elevel.into();
            let sqlerrcode = errdata.sqlerrcode.into();
            let message = errdata
                .message
                .is_null()
                .then(|| String::from("<null error message>"))
                .unwrap_or_else(|| CStr::from_ptr(errdata.message).to_string_lossy().to_string());
            let detail = errdata.detail.is_null().then_some(None).unwrap_or_else(|| {
                Some(CStr::from_ptr(errdata.detail).to_string_lossy().to_string())
            });
            let hint = errdata.hint.is_null().then_some(None).unwrap_or_else(|| {
                Some(CStr::from_ptr(errdata.hint).to_string_lossy().to_string())
            });
            let funcname = errdata.funcname.is_null().then_some(None).unwrap_or_else(|| {
                Some(CStr::from_ptr(errdata.funcname).to_string_lossy().to_string())
            });
            let file =
                errdata.filename.is_null().then(|| String::from("<null filename>")).unwrap_or_else(
                    || CStr::from_ptr(errdata.filename).to_string_lossy().to_string(),
                );
            let line = errdata.lineno as _;

            // clean up after ourselves by freeing the result of [CopyErrorData] and restoring
            // Postgres' understanding of where its next longjmp should go
            pg_sys::FreeErrorData(errdata_ptr);
            pg_sys::PG_exception_stack = prev_exception_stack;
            pg_sys::error_context_stack = prev_error_context_stack;

            // finally, turn this Postgres error into a Rust panic so that we can ensure proper
            // Rust stack unwinding and also defer handling until later
            std::panic::panic_any(CaughtError::PostgresError(ErrorReportWithLevel {
                level,
                inner: ErrorReport {
                    sqlerrcode,
                    message,
                    detail,
                    hint,
                    location: ErrorReportLocation { file, funcname, line, col: 0, backtrace: None },
                },
            }))
        }
    }
}