1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478
//! Support for compiling with Cranelift. // # How does Wasmtime prevent stack overflow? // // A few locations throughout the codebase link to this file to explain // interrupts and stack overflow. To start off, let's take a look at stack // overflow. Wasm code is well-defined to have stack overflow being recoverable // and raising a trap, so we need to handle this somehow! There's also an added // constraint where as an embedder you frequently are running host-provided // code called from wasm. WebAssembly and native code currently share the same // call stack, so you want to make sure that your host-provided code will have // enough call-stack available to it. // // Given all that, the way that stack overflow is handled is by adding a // prologue check to all JIT functions for how much native stack is remaining. // The `VMContext` pointer is the first argument to all functions, and the first // field of this structure is `*const VMInterrupts` and the first field of that // is the stack limit. Note that the stack limit in this case means "if the // stack pointer goes below this, trap". Each JIT function which consumes stack // space or isn't a leaf function starts off by loading the stack limit, // checking it against the stack pointer, and optionally traps. // // This manual check allows the embedder (us) to give wasm a relatively precise // amount of stack allocation. Using this scheme we reserve a chunk of stack // for wasm code relative from where wasm code was called. This ensures that // native code called by wasm should have native stack space to run, and the // numbers of stack spaces here should all be configurable for various // embeddings. // // Note that we do not consider each thread's stack guard page here. It's // considered that if you hit that you still abort the whole program. This // shouldn't happen most of the time because wasm is always stack-bound and // it's up to the embedder to bound its own native stack. // // So all-in-all, that's how we implement stack checks. Note that stack checks // cannot be disabled because it's a feature of core wasm semantics. This means // that all functions almost always have a stack check prologue, and it's up to // us to optimize away that cost as much as we can. // // For more information about the tricky bits of managing the reserved stack // size of wasm, see the implementation in `traphandlers.rs` in the // `update_stack_limit` function. // // # How is Wasmtime interrupted? // // Ok so given all that background of stack checks, the next thing we want to // build on top of this is the ability to *interrupt* executing wasm code. This // is useful to ensure that wasm always executes within a particular time slice // or otherwise doesn't consume all CPU resources on a system. There are two // major ways that interrupts are required: // // * Loops - likely immediately apparent but it's easy to write an infinite // loop in wasm, so we need the ability to interrupt loops. // * Function entries - somewhat more subtle, but imagine a module where each // function calls the next function twice. This creates 2^n calls pretty // quickly, so a pretty small module can export a function with no loops // that takes an extremely long time to call. // // In many cases if an interrupt comes in you want to interrupt host code as // well, but we're explicitly not considering that here. We're hoping that // interrupting host code is largely left to the embedder (e.g. figuring out // how to interrupt blocking syscalls) and they can figure that out. The purpose // of this feature is to basically only give the ability to interrupt // currently-executing wasm code (or triggering an interrupt as soon as wasm // reenters itself). // // To implement interruption of loops we insert code at the head of all loops // which checks the stack limit counter. If the counter matches a magical // sentinel value that's impossible to be the real stack limit, then we // interrupt the loop and trap. To implement interrupts of functions, we // actually do the same thing where the magical sentinel value we use here is // automatically considered as considering all stack pointer values as "you ran // over your stack". This means that with a write of a magical value to one // location we can interrupt both loops and function bodies. // // The "magical value" here is `usize::max_value() - N`. We reserve // `usize::max_value()` for "the stack limit isn't set yet" and so -N is // then used for "you got interrupted". We do a bit of patching afterwards to // translate a stack overflow into an interrupt trap if we see that an // interrupt happened. Note that `N` here is a medium-size-ish nonzero value // chosen in coordination with the cranelift backend. Currently it's 32k. The // value of N is basically a threshold in the backend for "anything less than // this requires only one branch in the prologue, any stack size bigger requires // two branches". Naturally we want most functions to have one branch, but we // also need to actually catch stack overflow, so for now 32k is chosen and it's // assume no valid stack pointer will ever be `usize::max_value() - 32k`. use crate::address_map::{FunctionAddressMap, InstructionAddressMap}; use crate::cache::{ModuleCacheDataTupleType, ModuleCacheEntry}; use crate::compilation::{ Compilation, CompileError, CompiledFunction, Relocation, RelocationTarget, TrapInformation, }; use crate::func_environ::{get_func_name, FuncEnvironment}; use crate::{CacheConfig, FunctionBodyData, ModuleLocal, ModuleTranslation, Tunables}; use cranelift_codegen::ir::{self, ExternalName}; use cranelift_codegen::machinst::buffer::MachSrcLoc; use cranelift_codegen::print_errors::pretty_error; use cranelift_codegen::{binemit, isa, Context}; use cranelift_entity::PrimaryMap; use cranelift_wasm::{DefinedFuncIndex, FuncIndex, FuncTranslator, ModuleTranslationState}; use rayon::prelude::{IntoParallelRefIterator, ParallelIterator}; use std::convert::TryFrom; use std::hash::{Hash, Hasher}; /// Implementation of a relocation sink that just saves all the information for later pub struct RelocSink { /// Current function index. func_index: FuncIndex, /// Relocations recorded for the function. pub func_relocs: Vec<Relocation>, } impl binemit::RelocSink for RelocSink { fn reloc_block( &mut self, _offset: binemit::CodeOffset, _reloc: binemit::Reloc, _block_offset: binemit::CodeOffset, ) { // This should use the `offsets` field of `ir::Function`. panic!("block headers not yet implemented"); } fn reloc_external( &mut self, offset: binemit::CodeOffset, _srcloc: ir::SourceLoc, reloc: binemit::Reloc, name: &ExternalName, addend: binemit::Addend, ) { let reloc_target = if let ExternalName::User { namespace, index } = *name { debug_assert_eq!(namespace, 0); RelocationTarget::UserFunc(FuncIndex::from_u32(index)) } else if let ExternalName::LibCall(libcall) = *name { RelocationTarget::LibCall(libcall) } else { panic!("unrecognized external name") }; self.func_relocs.push(Relocation { reloc, reloc_target, offset, addend, }); } fn reloc_constant( &mut self, _code_offset: binemit::CodeOffset, _reloc: binemit::Reloc, _constant_offset: ir::ConstantOffset, ) { // Do nothing for now: cranelift emits constant data after the function code and also emits // function code with correct relative offsets to the constant data. } fn reloc_jt(&mut self, offset: binemit::CodeOffset, reloc: binemit::Reloc, jt: ir::JumpTable) { self.func_relocs.push(Relocation { reloc, reloc_target: RelocationTarget::JumpTable(self.func_index, jt), offset, addend: 0, }); } } impl RelocSink { /// Return a new `RelocSink` instance. pub fn new(func_index: FuncIndex) -> Self { Self { func_index, func_relocs: Vec::new(), } } } /// Implementation of a trap sink that simply stores all trap info in-memory #[derive(Default)] pub struct TrapSink { /// The in-memory vector of trap info pub traps: Vec<TrapInformation>, } impl TrapSink { /// Create a new `TrapSink` pub fn new() -> Self { Self::default() } } impl binemit::TrapSink for TrapSink { fn trap( &mut self, code_offset: binemit::CodeOffset, source_loc: ir::SourceLoc, trap_code: ir::TrapCode, ) { self.traps.push(TrapInformation { code_offset, source_loc, trap_code, }); } } fn get_function_address_map<'data>( context: &Context, data: &FunctionBodyData<'data>, body_len: usize, isa: &dyn isa::TargetIsa, ) -> FunctionAddressMap { let mut instructions = Vec::new(); if let Some(ref mcr) = &context.mach_compile_result { // New-style backend: we have a `MachCompileResult` that will give us `MachSrcLoc` mapping // tuples. for &MachSrcLoc { start, end, loc } in mcr.buffer.get_srclocs_sorted() { instructions.push(InstructionAddressMap { srcloc: loc, code_offset: start as usize, code_len: (end - start) as usize, }); } } else { // Old-style backend: we need to traverse the instruction/encoding info in the function. let func = &context.func; let mut blocks = func.layout.blocks().collect::<Vec<_>>(); blocks.sort_by_key(|block| func.offsets[*block]); // Ensure inst offsets always increase let encinfo = isa.encoding_info(); for block in blocks { for (offset, inst, size) in func.inst_offsets(block, &encinfo) { let srcloc = func.srclocs[inst]; instructions.push(InstructionAddressMap { srcloc, code_offset: offset as usize, code_len: size as usize, }); } } } // Generate artificial srcloc for function start/end to identify boundary // within module. Similar to FuncTranslator::cur_srcloc(): it will wrap around // if byte code is larger than 4 GB. let start_srcloc = ir::SourceLoc::new(data.module_offset as u32); let end_srcloc = ir::SourceLoc::new((data.module_offset + data.data.len()) as u32); FunctionAddressMap { instructions, start_srcloc, end_srcloc, body_offset: 0, body_len, } } /// A compiler that compiles a WebAssembly module with Cranelift, translating the Wasm to Cranelift IR, /// optimizing it and then translating to assembly. pub struct Cranelift; impl crate::compilation::Compiler for Cranelift { /// Compile the module using Cranelift, producing a compilation result with /// associated relocations. fn compile_module( translation: &ModuleTranslation, isa: &dyn isa::TargetIsa, cache_config: &CacheConfig, ) -> Result<ModuleCacheDataTupleType, CompileError> { let cache_entry = ModuleCacheEntry::new("cranelift", cache_config); let data = cache_entry.get_data( CompileEnv { local: &translation.module.local, module_translation: HashedModuleTranslationState( translation.module_translation.as_ref().unwrap(), ), function_body_inputs: &translation.function_body_inputs, isa: Isa(isa), tunables: &translation.tunables, }, compile, )?; Ok(data.into_tuple()) } } fn compile(env: CompileEnv<'_>) -> Result<ModuleCacheDataTupleType, CompileError> { let Isa(isa) = env.isa; let mut functions = PrimaryMap::with_capacity(env.function_body_inputs.len()); let mut relocations = PrimaryMap::with_capacity(env.function_body_inputs.len()); let mut address_transforms = PrimaryMap::with_capacity(env.function_body_inputs.len()); let mut value_ranges = PrimaryMap::with_capacity(env.function_body_inputs.len()); let mut stack_slots = PrimaryMap::with_capacity(env.function_body_inputs.len()); let mut traps = PrimaryMap::with_capacity(env.function_body_inputs.len()); env.function_body_inputs .into_iter() .collect::<Vec<(DefinedFuncIndex, &FunctionBodyData<'_>)>>() .par_iter() .map_init(FuncTranslator::new, |func_translator, (i, input)| { let func_index = env.local.func_index(*i); let mut context = Context::new(); context.func.name = get_func_name(func_index); context.func.signature = env.local.native_func_signature(func_index).clone(); if env.tunables.debug_info { context.func.collect_debug_info(); } let mut func_env = FuncEnvironment::new(isa.frontend_config(), env.local, env.tunables); // We use these as constant offsets below in // `stack_limit_from_arguments`, so assert their values here. This // allows the closure below to get coerced to a function pointer, as // needed by `ir::Function`. // // Otherwise our stack limit is specially calculated from the vmctx // argument, where we need to load the `*const VMInterrupts` // pointer, and then from that pointer we need to load the stack // limit itself. Note that manual register allocation is needed here // too due to how late in the process this codegen happens. // // For more information about interrupts and stack checks, see the // top of this file. let vmctx = context .func .create_global_value(ir::GlobalValueData::VMContext); let interrupts_ptr = context.func.create_global_value(ir::GlobalValueData::Load { base: vmctx, offset: i32::try_from(func_env.offsets.vmctx_interrupts()) .unwrap() .into(), global_type: isa.pointer_type(), readonly: true, }); let stack_limit = context.func.create_global_value(ir::GlobalValueData::Load { base: interrupts_ptr, offset: i32::try_from(func_env.offsets.vminterrupts_stack_limit()) .unwrap() .into(), global_type: isa.pointer_type(), readonly: false, }); context.func.stack_limit = Some(stack_limit); func_translator.translate( env.module_translation.0, input.data, input.module_offset, &mut context.func, &mut func_env, )?; let mut code_buf: Vec<u8> = Vec::new(); let mut reloc_sink = RelocSink::new(func_index); let mut trap_sink = TrapSink::new(); let mut stackmap_sink = binemit::NullStackmapSink {}; context .compile_and_emit( isa, &mut code_buf, &mut reloc_sink, &mut trap_sink, &mut stackmap_sink, ) .map_err(|error| { CompileError::Codegen(pretty_error(&context.func, Some(isa), error)) })?; let unwind_info = context.create_unwind_info(isa).map_err(|error| { CompileError::Codegen(pretty_error(&context.func, Some(isa), error)) })?; let address_transform = get_function_address_map(&context, input, code_buf.len(), isa); let ranges = if env.tunables.debug_info { let ranges = context.build_value_labels_ranges(isa).map_err(|error| { CompileError::Codegen(pretty_error(&context.func, Some(isa), error)) })?; Some(ranges) } else { None }; Ok(( code_buf, context.func.jt_offsets, reloc_sink.func_relocs, address_transform, ranges, context.func.stack_slots, trap_sink.traps, unwind_info, )) }) .collect::<Result<Vec<_>, CompileError>>()? .into_iter() .for_each( |( function, func_jt_offsets, relocs, address_transform, ranges, sss, function_traps, unwind_info, )| { functions.push(CompiledFunction { body: function, jt_offsets: func_jt_offsets, unwind_info, }); relocations.push(relocs); address_transforms.push(address_transform); value_ranges.push(ranges.unwrap_or_default()); stack_slots.push(sss); traps.push(function_traps); }, ); // TODO: Reorganize where we create the Vec for the resolved imports. Ok(( Compilation::new(functions), relocations, address_transforms, value_ranges, stack_slots, traps, )) } #[derive(Hash)] struct CompileEnv<'a> { local: &'a ModuleLocal, module_translation: HashedModuleTranslationState<'a>, function_body_inputs: &'a PrimaryMap<DefinedFuncIndex, FunctionBodyData<'a>>, isa: Isa<'a, 'a>, tunables: &'a Tunables, } /// This is a wrapper struct to hash the specific bits of `TargetIsa` that /// affect the output we care about. The trait itself can't implement `Hash` /// (it's not object safe) so we have to implement our own hashing. struct Isa<'a, 'b>(&'a (dyn isa::TargetIsa + 'b)); impl Hash for Isa<'_, '_> { fn hash<H: Hasher>(&self, hasher: &mut H) { self.0.triple().hash(hasher); self.0.frontend_config().hash(hasher); // TODO: if this `to_string()` is too expensive then we should upstream // a native hashing ability of flags into cranelift itself, but // compilation and/or cache loading is relatively expensive so seems // unlikely. self.0.flags().to_string().hash(hasher); // TODO: ... and should we hash anything else? There's a lot of stuff in // `TargetIsa`, like registers/encodings/etc. Should we be hashing that // too? It seems like wasmtime doesn't configure it too too much, but // this may become an issue at some point. } } /// A wrapper struct around cranelift's `ModuleTranslationState` to implement /// `Hash` since it's not `Hash` upstream yet. /// /// TODO: we should upstream a `Hash` implementation, it would be very small! At /// this moment though based on the definition it should be fine to not hash it /// since we'll re-hash the signatures later. struct HashedModuleTranslationState<'a>(&'a ModuleTranslationState); impl Hash for HashedModuleTranslationState<'_> { fn hash<H: Hasher>(&self, _hasher: &mut H) { // nothing to hash right now } }