Virtual Machine Comparison: V8, JVM and ERTS

Comparing Architectures, Execution Models, and Real-World Applications

Abstract

In the quest to design a versatile and high-performance general-purpose programming language, the choice of a virtual machine (VM) becomes a critical decision. This study explores three leading contenders—V8, JVM, and ERTS—analyzing their architectures, strengths, and trade-offs. Through an in-depth comparative study, we evaluate their suitability across domains such as web development, embedded systems, and distributed applications. Ultimately, ERTS stands out as the best-fit foundation for our language, offering unparalleled scalability, fault tolerance, and concurrency. Inspired by successful ecosystems like Elixir and Phoenix, we envision a language that thrives in modern software development challenges. Looking ahead, integrating WebAssembly (WASM) with ERTS could redefine browser-based applications, creating a unified backend-to-frontend ecosystem. This journey reflects our commitment to empowering developers with a language built for reliability, resilience, and innovation.

Index Terms: Programming Languages, Virtual Machines, V8 Engine, Java Virtual Machine (JVM), Erlang Runtime System (ERTS), BEAM Virtual Machine, Scalability, Fault Tolerance, Concurrency Models, Distributed Systems, WebAssembly (WASM), Web Development, Embedded Systems, Real-Time Applications, Programming Language Design, Functional Programming, Resilient Software, Dynamic Code Execution, Software Architecture, High-Performance Computing.

Introduction

The Case: Building a General-Purpose Programming Language with Virtual Machines

Designing a general-purpose programming language is an ambitious yet rewarding challenge. Our goal is to create a language that excels across diverse domains: web development, systems programming, and embedded systems. Achieving this requires a solid foundation, and the choice of a Software Virtual Machine (VM) will profoundly shape our language’s capabilities, performance, and ecosystem.

Three Paths for Our Programming Language

As we embark on this journey, three powerful virtual machines stand out, each offering unique opportunities and trade-offs:

V8: V8 is a high-performance JavaScript and WebAssembly engine developed by Google. Known for its fast execution and robust ecosystem, V8 enables server-side JavaScript through Node.js and supports integration with systems programming via WebAssembly and native modules in C, C++, and Rust. Choosing V8 allows us to build a language that supports:

Web Development: V8’s strong browser integration makes it ideal for web-centric applications.
Server-Side Development: Node.js offers a proven environment for scalable server-side programs.
Embedded Systems: V8 itself is usually too resource-intensive for constrained devices, but lightweight JavaScript engines such as JerryScript, Duktape, or Espruino show that the same language model can be tailored for embedded environments.

ERTS: ERTS (Erlang Runtime System) is the runtime environment behind Erlang and Elixir. It incorporates the BEAM virtual machine and is renowned for its fault-tolerant design, lightweight processes, and preemptive concurrency. It serves as a platform for building functional, general-purpose languages inspired by Lisp, Erlang, or Elixir. With ERTS, our language can target:

Distributed Systems: ERTS’s architecture ensures high reliability and fault tolerance.
Concurrent Applications: Its scalability makes it suitable for handling massive concurrency demands.
Embedded Systems: ERTS (with potential customizations) can be adapted for resource-constrained devices due to its relatively small footprint and process isolation.

JVM: The Java Virtual Machine (JVM) is a mature, versatile platform that excels in portability and performance. With GraalVM, it extends its reach into polyglot programming, enabling seamless interactions between multiple languages. The JVM allows us to craft a language that thrives in:

Enterprise Development: Its extensive ecosystem supports seamless integration into Java-based systems.
Big Data Applications: Frameworks like Hadoop and Spark leverage JVM’s performance and scalability.
Embedded Systems: By customizing JVM-based runtimes, we can adapt it for resource-constrained environments.

Defining Our Language Vision

The strength of a general-purpose language lies in its ability to effectively serve multiple domains. Here’s how we envision our language addressing these needs:

For Web Development: Our language should enable building web servers, API backends, and lightweight frameworks while integrating seamlessly with modern web ecosystems.
For Systems Programming: It must offer low-level control over memory and hardware, ensuring safety and performance akin to Rust or C++.
For Embedded Systems: Lightweight execution and minimal resource usage are essential for targeting constrained devices, such as IoT hardware.

Key Factors to Analyze

To choose the right VM, we must evaluate the following aspects:

Performance: How does the VM handle high-throughput applications, memory management, and execution speed?
Concurrency Models: Does the VM support efficient concurrency and parallelism, for example through threads, event loops, or lightweight processes?
Portability: Can the VM be adapted to embedded environments and support cross-platform needs?
Ecosystem and Tooling: Does the VM provide the necessary libraries, frameworks, and tools for our target domains?
Interoperability: Can our language integrate with existing languages and frameworks for web, systems, or embedded development?

By carefully analyzing these factors, we can choose a virtual machine that provides the foundation for a versatile, high-performance language capable of thriving across multiple domains.

V8: Just-In-Time Compilation with Optimized Garbage Collection

This section provides a detailed, technical overview of V8’s Just-In-Time (JIT) compilation and garbage collection process. Using the transformArray function as an example, we illustrate the internal working of the V8 pipeline step-by-step: from parsing JavaScript to generating optimized machine code and managing memory efficiently.

JIT: Machine Code Generation Pipeline

Multi-tiered architecture

The V8 pipeline is a multi-tiered architecture designed for optimal performance:

TurboFan: A code generation architecture for V8

Initial Execution: The JavaScript source code is parsed into an AST, which Ignition compiles to bytecode and then interprets.
Profiling: Ignition collects profiling data during execution to inform further optimizations.
Non-Optimized Compilation: Sparkplug, the baseline compiler, quickly translates bytecode into machine code without optimization, so functions can leave the interpreter early.
Mid-Tier Optimization: Maglev uses profiling data to rapidly optimize frequently executed functions.
Advanced Optimization: TurboFan performs in-depth optimizations on performance-critical code sections, using detailed profiling data (Profile-Based Optimizations).
Final Output: The optimized machine code is executed for efficient performance.

Maglev

A step-by-step illustration of how the V8 pipeline works is provided below.

Compilation and Execution Steps

Step 1: Tokenization

The pipeline starts with tokenization, also known as lexical analysis. The source code is split into meaningful units such as keywords, identifiers, operators, and literals.

Input Code:

function transformArray(arr) {
  return arr.map(x => x * 2);
}

console.log(transformArray([1, 2, 3])); // [2, 4, 6]

Tokens:

[
  { type: 'Keyword', value: 'function' },
  { type: 'Identifier', value: 'transformArray' },
  { type: 'Punctuation', value: '(' },
  { type: 'Identifier', value: 'arr' },
  { type: 'Punctuation', value: ')' },
  { type: 'Punctuation', value: '{' },
  { type: 'Keyword', value: 'return' },
  { type: 'Identifier', value: 'arr' },
  { type: 'Punctuation', value: '.' },
  { type: 'Identifier', value: 'map' },
  { type: 'Punctuation', value: '(' },
  { type: 'Identifier', value: 'x' },
  { type: 'Operator', value: '=>' },
  { type: 'Identifier', value: 'x' },
  { type: 'Operator', value: '*' },
  { type: 'NumericLiteral', value: '2' },
  { type: 'Punctuation', value: ')' },
  { type: 'Punctuation', value: ';' },
  { type: 'Punctuation', value: '}' }
]
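To make the tokenization step concrete, here is a minimal sketch of a tokenizer that produces the token list above. It is not V8's scanner: the `tokenize` function, the keyword set, and the regular expression are our own illustrative assumptions, and they only cover the constructs that appear in transformArray.

```javascript
// Hypothetical mini-tokenizer (illustration only, not V8's scanner).
const KEYWORDS = new Set(["function", "return"]);
// Sticky regex: skip whitespace, then match exactly one token.
const TOKEN_RE = /\s*(=>|[A-Za-z_$][\w$]*|\d+|[(){}.;,]|[*+\-\/])/y;

function classify(value) {
  if (KEYWORDS.has(value)) return "Keyword";
  if (/^[A-Za-z_$]/.test(value)) return "Identifier";
  if (/^\d/.test(value)) return "NumericLiteral";
  if (value === "=>" || /^[*+\-\/]$/.test(value)) return "Operator";
  return "Punctuation";
}

function tokenize(source) {
  const text = source.trim();
  const tokens = [];
  TOKEN_RE.lastIndex = 0;
  while (TOKEN_RE.lastIndex < text.length) {
    const position = TOKEN_RE.lastIndex;
    const match = TOKEN_RE.exec(text);
    if (!match) {
      throw new SyntaxError(`Unsupported character near index ${position}`);
    }
    tokens.push({ type: classify(match[1]), value: match[1] });
  }
  return tokens;
}

const sampleTokens = tokenize(
  "function transformArray(arr) {\n  return arr.map(x => x * 2);\n}"
);
console.log(sampleTokens.length); // 19
```

A production scanner also handles strings, comments, template literals, and regular-expression literals, which this sketch deliberately leaves out.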

Step 2: AST Generation

The parser uses tokens to create an Abstract Syntax Tree (AST), a tree structure representing the syntactic structure of the code:

{
  "type": "FunctionDeclaration",
  "id": { "type": "Identifier", "name": "transformArray" },
  "params": [{ "type": "Identifier", "name": "arr" }],
  "body": {
    "type": "BlockStatement",
    "body": [
      {
        "type": "ReturnStatement",
        "argument": {
          "type": "CallExpression",
          "callee": {
            "type": "MemberExpression",
            "object": { "type": "Identifier", "name": "arr" },
            "property": { "type": "Identifier", "name": "map" }
          },
          "arguments": [
            {
              "type": "ArrowFunctionExpression",
              "params": [{ "type": "Identifier", "name": "x" }],
              "body": {
                "type": "BinaryExpression",
                "operator": "*",
                "left": { "type": "Identifier", "name": "x" },
                "right": { "type": "Literal", "value": 2 }
              }
            }
          ]
        }
      }
    ]
  }
}
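One way to build intuition for the AST is to walk it. The sketch below evaluates the arrow function body (x * 2) directly from its tree. This is only an illustration: V8 does not interpret the AST but compiles it to bytecode, and the `evaluate` helper and `env` object are hypothetical names introduced here.

```javascript
// The arrow function body from the AST above: x * 2
const arrowBody = {
  type: "BinaryExpression",
  operator: "*",
  left: { type: "Identifier", name: "x" },
  right: { type: "Literal", value: 2 },
};

// Hypothetical tree-walking evaluator covering only the node types used here.
function evaluate(node, env) {
  switch (node.type) {
    case "Literal":
      return node.value;
    case "Identifier":
      if (!(node.name in env)) {
        throw new ReferenceError(`${node.name} is not defined`);
      }
      return env[node.name];
    case "BinaryExpression": {
      const left = evaluate(node.left, env);
      const right = evaluate(node.right, env);
      if (node.operator === "*") return left * right;
      if (node.operator === "+") return left + right;
      throw new Error(`Unsupported operator: ${node.operator}`);
    }
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

console.log(evaluate(arrowBody, { x: 3 })); // 6
```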

Step 3: Bytecode Generation

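The AST is converted into Ignition bytecode, a compact, platform-independent intermediate representation optimized for interpretation. The listing below is a simplified illustration rather than actual V8 output; the real bytecode for a function can be inspected with the --print-bytecode flag.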

0 : Ldar a0            // Load parameter arr
1 : PushContext         // Push current context for map callback
2 : LdarClosure         // Load map closure
3 : CallProperty a0, "map" // Call map on arr
4 : Return              // Return result

Step 4: Bytecode Execution (Ignition)

Ignition is a register machine with an accumulator. Registers hold parameters and local values, while the accumulator serves as the implicit input and output of most bytecodes, which keeps the bytecode compact.

Registers: { a0: [1, 2, 3], r0: null }
Accumulator: null

0 : Ldar a0            // Accumulator = [1, 2, 3]
1 : PushContext         // Save the context
2 : LdarClosure         // Accumulator = function(x) { return x * 2; }
3 : CallProperty a0, "map" // Execute map with callback
4 : Return              // Return [2, 4, 6]
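The register and accumulator model can be sketched as a toy interpreter. The opcode names below are borrowed from Ignition, but their semantics are simplified, and `runBytecode` and `doubleBytecode` are hypothetical names used only for this illustration.

```javascript
// Toy accumulator machine. Opcode names are borrowed from Ignition, but the
// semantics here are a simplification chosen for illustration.
function runBytecode(bytecode, registers) {
  let accumulator;
  for (const [opcode, operand] of bytecode) {
    switch (opcode) {
      case "Ldar": // load a register into the accumulator
        accumulator = registers[operand];
        break;
      case "LdaSmi": // load a small integer constant into the accumulator
        accumulator = operand;
        break;
      case "Star": // store the accumulator into a register
        registers[operand] = accumulator;
        break;
      case "Mul": // accumulator = register * accumulator
        accumulator = registers[operand] * accumulator;
        break;
      case "Return":
        return accumulator;
      default:
        throw new Error(`Unknown opcode: ${opcode}`);
    }
  }
  throw new Error("Bytecode ended without Return");
}

// Hypothetical bytecode for the callback x => x * 2, with x in register a0.
const doubleBytecode = [
  ["LdaSmi", 2], // accumulator = 2
  ["Mul", "a0"], // accumulator = a0 * accumulator
  ["Return"],    // return the accumulator
];

const doubled = [1, 2, 3].map((x) => runBytecode(doubleBytecode, { a0: x }));
console.log(doubled); // [ 2, 4, 6 ]
```

Because most instructions read from or write to the accumulator implicitly, they need at most one explicit operand, which is why this style of bytecode stays compact.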

Step 5: Hot Code Detection

Ignition collects runtime profiling data to identify “hot” (frequently executed) functions.

If transformArray is called repeatedly, it is flagged as “hot” and handed to the optimizing tiers, ultimately TurboFan.
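Hot code detection can be pictured as a call counter that switches implementations once a threshold is crossed. The sketch below is a deliberately simplified model: `createTieredFunction`, the threshold of 3, and the two tier names are assumptions made for illustration, and V8's real heuristics rely on richer feedback than a call count.

```javascript
// Simplified model of tier-up: count calls, then switch implementations.
function createTieredFunction(interpreted, optimized, hotThreshold = 3) {
  let callCount = 0;
  let tier = "interpreter";

  function tiered(...args) {
    const result =
      tier === "optimized" ? optimized(...args) : interpreted(...args);
    callCount += 1;
    if (callCount >= hotThreshold) tier = "optimized"; // function is now "hot"
    return result;
  }

  tiered.currentTier = () => tier;
  return tiered;
}

const hotTransform = createTieredFunction(
  (arr) => arr.map((x) => x * 2), // stand-in for interpreted bytecode
  (arr) => {                      // stand-in for optimized machine code
    const out = new Array(arr.length);
    for (let i = 0; i < arr.length; i++) out[i] = arr[i] * 2;
    return out;
  }
);

hotTransform([1, 2, 3]);
hotTransform([1, 2, 3]);
console.log(hotTransform.currentTier()); // interpreter
hotTransform([1, 2, 3]);
console.log(hotTransform.currentTier()); // optimized
```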

Step 6: Sea-of-Nodes Representation

TurboFan converts bytecode into a sea-of-nodes graph, where each operation is a node. This enables optimizations like:

Inlining: Embedding frequently called functions directly.
Constant Folding: Simplifying constant expressions at compile-time.

Sea-of-Nodes for x => x * 2:

   Load x      --->  Multiply by 2  ---> Return

Optimizations:

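Inlining: The map call and its arrow-function callback can be inlined into the caller, eliminating function call overhead.
Constant Folding: The literal 2 is embedded directly in the multiply operation as an immediate operand; with only one constant operand, there is nothing further to fold in this example.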
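The effect of inlining can be shown at the source level. The hand-written `transformArrayInlined` below is a hypothetical equivalent of what the optimizer conceptually produces; it assumes dense arrays of numbers (unlike map, it does not skip holes).

```javascript
// Original: a call to map plus one callback call per element.
function transformArray(arr) {
  return arr.map((x) => x * 2);
}

// Conceptual result of inlining map and the callback (sketch, dense arrays only).
function transformArrayInlined(arr) {
  const result = new Array(arr.length);
  for (let i = 0; i < arr.length; i++) {
    result[i] = arr[i] * 2; // callback body substituted for the call
  }
  return result;
}

console.log(transformArray([1, 2, 3]));        // [ 2, 4, 6 ]
console.log(transformArrayInlined([1, 2, 3])); // [ 2, 4, 6 ]
```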

Step 7: Machine Code Generation

TurboFan produces highly optimized machine code tailored to the CPU from the optimized graph.

// Illustrative x86-64 assembly (simplified; omits type checks and deoptimization exits)
// Assumes rsi = source elements, rdi = result elements, r8 = array length

xor rcx, rcx              ; Initialize index to 0
.loop:
  cmp rcx, r8             ; Check if index < array length
  jge .end                ; If index >= length, exit loop
  mov rdx, [rsi + rcx*8]  ; Load current element into rdx
  imul rdx, rdx, 2        ; Multiply element by the constant 2
  mov [rdi + rcx*8], rdx  ; Store result in the new array (map does not mutate arr)
  inc rcx                 ; Increment index
  jmp .loop               ; Repeat
.end:
ret                       ; Return to caller

Summary of Steps

Step	Purpose
Tokenization	Breaks source code into tokens.
AST Generation	Creates a tree representation of the program.
Bytecode Compilation	Converts AST to Ignition bytecode.
Bytecode Execution	Executes bytecode using Ignition’s register/accumulator model.
Hot Code Detection	Identifies frequently executed code for optimization.
Sea-of-Nodes	Represents the function as an optimized graph for TurboFan.
Machine Code	Generates CPU-specific instructions for maximum efficiency.

Memory Management

V8 manages memory by dividing it into two primary regions: the stack and the heap. These regions serve distinct purposes and are optimized for specific kinds of memory allocation.

The Stack

The stack is a structured, linear region of memory designed for managing function calls, execution contexts, and local variables. It provides fast allocation and deallocation using a Last-In, First-Out (LIFO) principle.

Characteristics:

Execution Contexts: Stores information about currently executing functions, including parameters, local variables, and return addresses.
LIFO Structure: New function calls push frames onto the stack, and returning functions pop them off.
Automatic Memory Management: Memory is reclaimed automatically when functions exit.
Fixed Size: Limited by system constraints; excessive recursion or large allocations can cause stack overflow errors.

Example:

function factorial(n) {
  if (n <= 1) return 1; // Base case (also covers n = 0)
  return n * factorial(n - 1); // Recursive call
}

const result = factorial(5); // Pushes multiple stack frames for each recursive call
console.log(result); // Outputs: 120

Note: Recursive calls can lead to stack overflow if the recursion depth exceeds the stack size.
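The stack limit is easy to observe. In the sketch below, unbounded recursion exhausts the stack and V8 throws a RangeError, while an iterative version of factorial uses a single frame. The function names `recurseForever` and `factorialIterative` are ours.

```javascript
// Unbounded recursion: every call pushes another frame until the stack is full.
function recurseForever(depth) {
  return 1 + recurseForever(depth + 1);
}

let overflowError = null;
try {
  recurseForever(0);
} catch (error) {
  overflowError = error; // V8 reports "Maximum call stack size exceeded"
}
console.log(overflowError instanceof RangeError); // true

// Iterative alternative: constant stack depth regardless of n.
function factorialIterative(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}
console.log(factorialIterative(5)); // 120
```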

The Heap

The heap is an unstructured region of memory used for managing dynamic data such as objects, arrays, and closures. Unlike the stack, the heap allows flexible memory allocation for entities whose size or lifetime cannot be determined at compile time.

Characteristics:

Dynamic Allocation: Stores objects, arrays, and closures with unpredictable sizes or lifetimes.
Managed by Orinoco: V8’s advanced garbage collection system optimizes heap usage and reclaims memory occupied by unreachable objects.
Generational Design: Divided into regions for efficient garbage collection:
- Young Generation: Stores short-lived objects (e.g., temporary data).
- Old Generation: Stores long-lived objects (e.g., configuration data).

Example:

function processData() {
  const shortLivedData = new Array(10000).fill("Temporary Data"); // Young generation
  const config = { theme: "dark", user: "admin" }; // May be promoted to old generation
  return config;
}

// Objects are created and garbage collected as per Orinoco’s algorithms
const appConfig = processData();
console.log(appConfig); // GC may already have cleaned up `shortLivedData`

Orinoco: V8’s Heap Manager

Orinoco is V8’s sophisticated garbage collection system, entirely responsible for managing the heap. Key features include:

Generational Garbage Collection (based on the generational hypothesis that most objects die young):
- Young Generation (Minor GC, Scavenger):
  - Uses a semi-space design, dividing the space into From-Space (active) and To-Space (empty).
  - Copies live objects from From-Space to To-Space during garbage collection and reclaims the rest.
- Old Generation (Major GC):
  - Uses a Mark-Compact algorithm to reclaim memory and reduce fragmentation.
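The semi-space idea can be sketched with plain objects. The toy `scavenge` function below copies everything reachable from the roots into a new To-Space and treats the rest as garbage. It is a model under our own assumptions (objects are `{ name, refs }` records, and copying is recursive), not how V8 lays out or moves memory.

```javascript
// Toy semi-space scavenger: copy reachable objects, drop everything else.
function scavenge(fromSpace, roots) {
  const toSpace = [];
  const forwarding = new Map(); // original object -> its copy in To-Space

  function evacuate(object) {
    if (forwarding.has(object)) return forwarding.get(object);
    const copy = { name: object.name, refs: [] };
    forwarding.set(object, copy); // record before recursing so cycles terminate
    toSpace.push(copy);
    copy.refs = object.refs.map(evacuate);
    return copy;
  }

  const newRoots = roots.map(evacuate);
  return { toSpace, newRoots, reclaimed: fromSpace.length - toSpace.length };
}

const config = { name: "config", refs: [] };
const session = { name: "session", refs: [config] };
config.refs.push(session); // cycle: config <-> session
const temporary = { name: "temporary", refs: [] }; // unreachable

const scavengeResult = scavenge([config, session, temporary], [session]);
console.log(scavengeResult.toSpace.map((o) => o.name)); // [ 'session', 'config' ]
console.log(scavengeResult.reclaimed); // 1
```

The cost of this scheme is proportional to the number of live objects, not the size of the space, which is why it suits a young generation where most objects are already dead.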

The scavenger evacuates live objects to a fresh page

Major GC happens in three phases: marking, sweeping and compacting

Parallel and Concurrent Marking:
- Parallel Processing: Distributes GC tasks across multiple threads to reduce pause times.
- Concurrent Marking: Identifies live objects while JavaScript execution continues, ensuring minimal interruption.

Parallel scavenging distributes scavenging work across multiple helper threads and the main thread

The major GC uses concurrent marking and sweeping, and parallel compaction and pointer updating

Compaction and Fragmentation Handling:
- Compacts memory by moving live objects into contiguous regions.
- Minimizes memory fragmentation for long-lived objects in the old generation.
Idle-Time Garbage Collection:
- Utilizes browser idle time to perform GC tasks, ensuring minimal disruption during active user interactions.

Idle GC makes use of free time on the main thread to perform GC work proactively

Optimizations in V8: Hidden Classes and Inline Caches

JavaScript’s dynamic nature poses challenges for optimizing memory and execution performance. To address this, V8 employs Hidden Classes and Inline Caches (ICs), sophisticated mechanisms that optimize property access and memory usage by leveraging predictable object structures and caching patterns.

Hidden Classes: Dynamic Structure for Static-Like Optimization

Hidden classes, also called shapes or maps, are internal data structures used by V8 to describe the layout (or “shape”) of an object. They reduce the overhead associated with JavaScript’s dynamic property additions and modifications.

How Hidden Classes Work

Initial Assignment:
- When an object is created, V8 assigns it an initial hidden class representing its current shape (e.g., empty object).

Object hidden class (shape)

Transition Chains:
- As properties are added, the object transitions to new hidden classes that reflect its updated structure. Deleting properties can instead push an object into a slower dictionary-style representation.
- If properties are added in the same order across multiple objects, they share the same hidden class, enabling efficient reuse.

Transition chains

Property Offsets:
- Hidden classes map property names to memory offsets. Objects store only property values, while the hidden class holds the structural information, ensuring efficient property lookups.

Example of a source code and its optimized code

Example:

function Point(x, y) {
  this.x = x; // Transition: HiddenClass1 → HiddenClass2
  this.y = y; // Transition: HiddenClass2 → HiddenClass3
}

const p1 = new Point(10, 20);
const p2 = new Point(30, 40); // p1 and p2 share HiddenClass3
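Hidden classes cannot be observed from ordinary JavaScript, so the following sketch models them instead. `ToyShape` and `ToyObject` are hypothetical classes that mimic transition chains and property offsets; they are a teaching model, not V8's implementation.

```javascript
// Toy model of hidden classes (shapes) with transition chains.
class ToyShape {
  constructor(offsets = new Map()) {
    this.offsets = offsets;       // property name -> slot index
    this.transitions = new Map(); // property name -> next shape
  }

  // Follow (or create) the transition for adding `name`.
  add(name) {
    let next = this.transitions.get(name);
    if (!next) {
      next = new ToyShape(new Map(this.offsets).set(name, this.offsets.size));
      this.transitions.set(name, next);
    }
    return next;
  }
}

const EMPTY_SHAPE = new ToyShape();

class ToyObject {
  constructor() {
    this.shape = EMPTY_SHAPE;
    this.values = []; // the object itself stores only values
  }
  set(name, value) {
    if (!this.shape.offsets.has(name)) this.shape = this.shape.add(name);
    this.values[this.shape.offsets.get(name)] = value;
  }
  get(name) {
    return this.values[this.shape.offsets.get(name)];
  }
}

const a = new ToyObject(); a.set("x", 10); a.set("y", 20);
const b = new ToyObject(); b.set("x", 30); b.set("y", 40);
const c = new ToyObject(); c.set("y", 1);  c.set("x", 2);

console.log(a.shape === b.shape); // true: same properties in the same order
console.log(a.shape === c.shape); // false: different insertion order
```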

Optimization Benefits

Fast Property Access:
- V8 uses the hidden class to locate properties directly by their offsets, avoiding dictionary-style lookups.
Memory Efficiency:
- Objects with shared shapes reduce memory usage, as only one hidden class is stored for all similarly structured objects.

Grouping objects with the same property structure into the same hidden class (shape)

Potential Pitfalls

Order Sensitivity:
- Adding properties in a different order results in distinct hidden classes, potentially degrading performance.

let obj1 = {};
obj1.a = 1;
obj1.b = 2; // HiddenClass1 → HiddenClass2 → HiddenClass3

let obj2 = {};
obj2.b = 2;
obj2.a = 1; // Different hidden class due to order

Memory Overhead:
- Each unique hidden class consumes memory, and excessive variation in object shapes increases hidden class creation.

Inline Caches: Accelerating Property Access

Inline Caches (ICs) complement hidden classes by caching property lookups to avoid repeated resolution costs. ICs leverage the stability of object shapes to enable direct and efficient access.

How Inline Caches Work

Cold State:
- On the first access, V8 performs a full property lookup and records the result, associating the property access pattern (e.g., hidden class and property offset) with the object.
Monomorphic State:
- If subsequent accesses involve the same hidden class, the IC directly fetches the property value using the cached offset.

function getName(user) {
  return user.name; // Property access cached after the first call
}

const user1 = { name: "Alice" };
const user2 = { name: "Bob" };
console.log(getName(user1)); // First call: full lookup, then the IC caches the offset
console.log(getName(user2)); // Same hidden class: monomorphic hit reuses the cached offset

Polymorphic State:
- If multiple hidden classes are encountered (e.g., objects with the same properties but different orders), the IC generalizes to handle them efficiently.
Megamorphic State:
- For unpredictable or varied object shapes, ICs revert to slower generic lookups.
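The state transitions above can be modeled with a small cache keyed by shape. In this sketch, `SHAPE_TABLE`, `createLoadIC`, and the limit of four cached shapes are illustrative assumptions, and objects carry an explicit `shape` tag because real hidden classes are not visible to JavaScript code.

```javascript
// Hypothetical shape table: shape id -> property offsets.
const SHAPE_TABLE = {
  user: { name: 0 },
  admin: { role: 0, name: 1 },
};

function createLoadIC(property, maxCachedShapes = 4) {
  const cache = new Map(); // shape id -> cached offset
  let megamorphic = false;
  const stats = { hits: 0, misses: 0 };

  function load(object) {
    if (cache.has(object.shape)) {
      stats.hits += 1; // fast path: use the cached offset
      return object.slots[cache.get(object.shape)];
    }
    stats.misses += 1;
    const offset = SHAPE_TABLE[object.shape][property]; // slow generic lookup
    if (!megamorphic) {
      if (cache.size < maxCachedShapes) {
        cache.set(object.shape, offset);
      } else {
        cache.clear(); // too many shapes: give up caching
        megamorphic = true;
      }
    }
    return object.slots[offset];
  }

  function state() {
    if (megamorphic) return "megamorphic";
    if (cache.size === 0) return "uninitialized";
    return cache.size === 1 ? "monomorphic" : "polymorphic";
  }

  return { load, state, stats };
}

const nameIC = createLoadIC("name");
const alice = { shape: "user", slots: ["Alice"] };
const bob = { shape: "user", slots: ["Bob"] };
const admin = { shape: "admin", slots: ["superuser", "Root"] };

nameIC.load(alice);           // miss: full lookup, offset cached
nameIC.load(bob);             // hit: same shape
console.log(nameIC.state());  // monomorphic
nameIC.load(admin);           // miss: second shape cached
console.log(nameIC.state());  // polymorphic
```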

Optimization Benefits

Reduced Lookup Time:
- ICs bypass repeated full property lookups, accelerating execution.
Support for Polymorphism:
- ICs can handle a limited variety of object shapes without degrading performance.

First time property access

Bypassing the expensive lookup and directly fetching the value

Potential Pitfalls

Shape Changes:
- If an object’s shape changes (e.g., adding or deleting properties), the IC becomes invalid and requires recalibration through a full lookup.
Unstable Patterns:
- Objects with too many variations in shapes (megamorphic behavior) reduce IC effectiveness, reverting to slower property access.

let obj = { a: 1 };
obj.b = 2; // IC invalidated as the shape changes

Key Takeaways

Hidden Classes:
- Provide a structural representation of objects, optimizing property access and reducing memory overhead.
- Encourage consistent property additions for better performance.
Inline Caches:
- Cache property access patterns, significantly speeding up repeated lookups.
- Perform best with stable, predictable object shapes.

Together, hidden classes and inline caches let V8 keep the flexibility of JavaScript while approaching the property-access performance of statically typed languages for code with stable, predictable object shapes.

Speculative optimization

Speculative optimization is a strategy where the JavaScript engine makes educated guesses (based on runtime information) about the types of values used in the code. These guesses are then used to generate highly optimized machine code.

Profiling with Ignition Interpreter

During the initial execution, V8 uses the Ignition interpreter to collect profiling data.
This data includes:
- Frequency of function calls.
- Types of arguments passed to functions.
- Types of values returned by functions.

Initial execution of the JavaScript code

Optimization with TurboFan

Based on the collected profiling data, V8’s TurboFan compiler generates optimized machine code tailored to the observed types.
For example, if a and b in the function below are always numbers, TurboFan will optimize the addition as a fast numerical addition.

// original code
function add(a, b) {
  return a + b;
}

add(1, 2);  // Profiling begins: types of `a` and `b` are recorded as numbers.

// Optimized code
FastAddNumbers:
    LOAD r1, [a]
    LOAD r2, [b]
    ADD r1, r2

Type feedback enhancements

Fast Execution

Optimized machine code is used for subsequent executions, significantly improving performance compared to the generic bytecode used by the Ignition interpreter.

TurboFan generates optimized machine code that is tailored to the specific types it expects

Deoptimization: Ensuring Correctness

Speculative optimization relies on assumptions about types. If these assumptions are violated, V8 triggers deoptimization to ensure the correctness of the code.

Trigger:
- If V8 encounters a value that doesn’t match the type it assumed (e.g., a string instead of a number), the optimized code becomes invalid.
Fallback:
- The engine discards the optimized code and falls back to the unoptimized bytecode, which Ignition interprets while collecting fresh type feedback.

Example:

add(1, 2);      // Optimized as numerical addition.
add("hello", 5); // Deoptimization triggered: `a` is now a string.
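The guard-and-fallback pattern behind speculation and deoptimization can be modeled in plain JavaScript. The sketch below is an analogy only: `createSpeculativeAdd` is a hypothetical helper, and the typeof check stands in for the type guards that real optimized code contains.

```javascript
// Model of speculative optimization: a guarded fast path plus a generic fallback.
function createSpeculativeAdd() {
  let optimized = true; // pretend the function has been optimized for numbers
  let deoptCount = 0;

  const genericAdd = (a, b) => a + b; // stand-in for the bytecode path

  function add(a, b) {
    if (optimized) {
      if (typeof a === "number" && typeof b === "number") {
        return a + b; // fast path: assumption holds
      }
      optimized = false; // guard failed: discard the "optimized code"
      deoptCount += 1;
    }
    return genericAdd(a, b); // correct for any types
  }

  return { add, isOptimized: () => optimized, deoptCount: () => deoptCount };
}

const speculative = createSpeculativeAdd();
console.log(speculative.add(1, 2));        // 3
console.log(speculative.isOptimized());    // true
console.log(speculative.add("hello", 5));  // hello5
console.log(speculative.isOptimized());    // false
```

The key property is that the result is always correct; only the speed of reaching it changes when an assumption fails.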


Deoptimization is expensive!

Mitigating Deoptimization Costs

To minimize the performance impact of deoptimization, V8 employs advanced strategies:

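Lazy Deoptimization:
- When an assumption is invalidated while the optimized function is not the code currently running (for example, because other code changed something it depends on), V8 only marks the optimized code as invalid. The actual deoptimization happens later, when execution next enters or returns to that code. A check that fails inside running optimized code triggers an immediate (eager) deoptimization instead.

function compute(a) {
  return a * 2;
}

compute(10); // Optimized for numbers.
compute("5"); // For contrast: the number check fails inside the running code, so this deoptimization is eager, not lazy.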

Partial Deoptimization:
- Only deoptimizes the affected parts of the code instead of the entire function, limiting the performance impact.

function process(a, b) {
  if (typeof a === "string") {
    return a + b; // Deoptimized if `b` changes type.
  }
  return a * b; // Remains optimized.
}

The effectiveness of hidden classes, inline caches, and speculative optimization, particularly the deoptimization phase, is heavily influenced by the JavaScript code itself and how data types are handled during runtime. That’s why we should:

Use Consistent Data Types: Avoid frequent type changes for the same variable or function arguments.
Predictable Object Structures: Maintain consistent shapes for objects to leverage hidden classes and inline caches.
Avoid Excessive Polymorphism: Minimize the use of functions that handle a wide variety of types.
Leverage WeakMaps/WeakSets: Manage memory efficiently for ephemeral objects.

let x = 10;      // Start as a number.
x = "hello";     // Type change can cause deoptimization.

let obj = { a: 1 };
obj.b = 2; // Avoid adding properties dynamically if possible.

function sum(a, b) {
  return a + b; // Avoid mixing numbers, strings, and objects.
}

Concurrency Model

The JavaScript Runtime Environment (JRE), powered by engines like V8, operates within a single-threaded architecture. This runtime orchestrates the execution of both synchronous and asynchronous operations through key components like the Call Stack, Queues, and the Event Loop. Despite being single-threaded, JavaScript achieves concurrency by efficiently managing tasks across various queues and APIs.

JavaScript Runtime Environment Overview

The JRE is composed of the following:

JavaScript Engine (e.g., V8): Executes JavaScript code, including compiling and optimizing it for performance.
Host Environment:
- In browsers, it includes Web APIs for DOM manipulation, network requests, and timers.
- In Node.js, it includes Node APIs for file systems, streams, and networking.
Event Loop: The core mechanism that manages task execution from various queues.
Task Queues: Handle pending tasks for asynchronous operations.
Microtask Queue: A higher-priority queue for promises, async/await, and other immediate tasks.

JavaScript Runtime Environment

V8’s Execution Environment

JavaScript’s runtime in V8 is built around a single-threaded Event Loop that interleaves execution between:

The Call Stack: For synchronous execution. This is where function calls are placed and executed one by one, in a last-in, first-out (LIFO) manner.
Microtask Queue (Job Queue): Holds tasks that need to be processed with higher priority, such as:
- Promises: When a promise resolves or rejects, its .then() or .catch() callbacks are added to the microtask queue.
- async/await: Behind the scenes, async/await uses promises, so their callbacks also end up in the microtask queue.
- queueMicrotask(): This function allows us to explicitly add a microtask to the queue.
Animation Frames: Holds callbacks registered with requestAnimationFrame(). These callbacks are executed before the next repaint of the browser, ensuring smooth animations.
Macrotask Queue (Task Queue): Holds tasks with lower priority, typically originating from Web APIs, such as:
- setTimeout() and setInterval(): Timers add their callbacks to the macrotask queue.
- Events: User interactions (clicks, mouseovers, etc.) and network events generate tasks that are added to the macrotask queue.
- I/O operations: Operations like reading from a file or making a network request also generate tasks for the macrotask queue.

Event Queue

Event Loop Flow Chart

Event Loop ensures non-blocking execution in JavaScript while maintaining synchronous and asynchronous harmony:

Start Execution
- Initialize the Call Stack with the main() function.
Call Stack
- Execute synchronous code:
  - Functions are pushed to the stack when invoked and popped off after execution.
- If the stack is empty, check the Microtask Queue.
Microtask Queue
- Priority 1:
  - Execute all tasks in the Microtask Queue (Promises, async/await, queueMicrotask()).
  - Tasks added during this phase are processed before moving to the next step.
Macrotask Queue (Task Queue)
- Priority 2:
  - Execute the oldest task in the Macrotask Queue (timers, I/O, setTimeout, setInterval, DOM events), then drain the Microtask Queue again before taking the next task.
Animation Frames (Browser-Specific)
- Execute tasks scheduled with requestAnimationFrame(), synchronized with browser refresh cycles.
Web APIs or Host Environment
- Handle external events (AJAX, DOM, timers, etc.) and enqueue tasks into:
  - Microtask Queue for promises.
  - Macrotask Queue for timers or DOM-related tasks.
Repeat
- The Event Loop repeats the process:
  - Check Call Stack, then Microtask Queue, and finally Macrotask Queue.

Example:

console.log("Script start");
setTimeout(() => console.log("Macrotask"), 0);
Promise.resolve().then(() => console.log("Microtask"));
console.log("Script end");

// Output: "Script start", "Script end", "Microtask", "Macrotask"

Script start and Script end are logged from the Call Stack.
Promise callback goes to the Microtask Queue and is executed next.
setTimeout callback goes to the Macrotask Queue and is executed last.
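A slightly larger example shows two further details: microtasks queued by other microtasks run in the same drain, and the microtask queue is drained again after each macrotask. The sketch records the order in an array; the `order` and `done` names are ours, and the ordering shown assumes a browser or Node.js 11 or later.

```javascript
const order = [];

// `done` resolves once the last timer has fired.
const done = new Promise((resolve) => {
  setTimeout(() => {
    order.push("timeout 1");
    // Queued during a macrotask: runs before the next macrotask.
    Promise.resolve().then(() => order.push("microtask from timeout 1"));
  }, 0);

  setTimeout(() => {
    order.push("timeout 2");
    resolve(order);
  }, 0);
});

Promise.resolve().then(() => {
  order.push("microtask 1");
  // Queued during the microtask drain: still runs before any timer.
  queueMicrotask(() => order.push("microtask 2"));
});

order.push("sync");

done.then((finalOrder) => console.log(finalOrder));
// [ 'sync', 'microtask 1', 'microtask 2',
//   'timeout 1', 'microtask from timeout 1', 'timeout 2' ]
```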

Browser vs. Node.js

While both environments follow the event loop model, they differ in their APIs:

Browser:
- Host Environment: Includes Web APIs for DOM manipulation, AJAX, and animations.
- Task Queues:
  - Animation Frames: Optimized for UI rendering updates.
  - Microtasks: Prioritized for promises and async/await.
  - Macrotasks: Includes timers and DOM-related tasks.
Node.js:
- Host Environment: Includes Node.js APIs for filesystem access and server-side tasks.
- Task Queues:
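  - Microtasks: For promises and async operations; process.nextTick() callbacks live in a separate queue that is drained before promise microtasks.
  - Macrotasks: Processed in libuv phases (timers, I/O callbacks, setImmediate()), covering server tasks like handling HTTP requests.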
- Execution Model: Non-blocking I/O allows high concurrency for server-side operations.

Key Takeaways

Event Loop:
- Central to JavaScript’s concurrency model.
- Processes tasks from the Call Stack, Microtask Queue, and Macrotask Queue.
Microtasks vs. Macrotasks:
- Microtasks (higher priority) are always executed before macrotasks.
Browser-Specific Enhancements:
- Adds requestAnimationFrame() for smooth animations.
Node.js-Specific Enhancements:
- Tailored for server-side operations with efficient I/O management.

This Concurrency Model empowers JavaScript to handle asynchronous operations efficiently while maintaining a single-threaded architecture.

Web Workers and Node.js Workers: Extending JavaScript’s Concurrency

JavaScript is inherently single-threaded, which means all operations run on a single main thread. However, for computationally heavy tasks, this model may cause the application to freeze or become unresponsive. To overcome this limitation, JavaScript environments like browsers and Node.js offer workers—a mechanism for running tasks in separate threads.

Feature	Web Workers (Browser)	Node.js Workers
Environment	Browser environment with limited Web APIs.	Node.js runtime environment with full access to Node.js APIs.
Thread Execution	Runs in a separate thread, isolated from the main thread.	Runs in a separate thread, isolated from the main thread.
API Access	Limited to Web APIs (e.g., `fetch`, `WebCrypto`, `setTimeout`); no access to `window` or `document`.	Full access to Node.js APIs (e.g., `fs`, `http`, `stream`, `crypto`).
Communication	Uses `postMessage` and `onmessage` for passing messages between the main thread and workers.	Uses `postMessage` and `on('message')` for communication via the `worker_threads` module.
Shared Memory	Supports `SharedArrayBuffer` for efficient shared memory operations.	Supports `SharedArrayBuffer` for efficient shared memory operations.
Multithreading	Ideal for offloading computationally intensive tasks to prevent UI blocking in web applications.	Ideal for server-side computational tasks, enabling high concurrency in server environments.
DOM Access	No DOM access; workers run in isolation from the main thread.	No DOM access; workers run in isolation.
Concurrency Model	Thread-based concurrency; each worker runs its own event loop on a separate thread managed by the browser.	Thread-based concurrency; each worker runs its own event loop on a separate thread created through `worker_threads`.
V8 Instance	Each worker runs in its own V8 isolate with a separate heap (in Chromium-based browsers), isolated from the main thread.	Creates a new V8 instance for each worker, fully isolated from the main thread and other workers.
Use Cases	Long-running computations, background processing (e.g., image processing, data parsing, network requests).	Heavy computational tasks (e.g., cryptographic operations, large-scale data processing, parallelized tasks).
Example Syntax	`new Worker()` and `onmessage`.	`new Worker()` from `worker_threads` and `on('message')`.

Portability

Cross-Platform Compatibility

V8 is designed to be highly portable and is integrated into various platforms such as:

Browsers: Google Chrome, Microsoft Edge, and other Chromium-based browsers.
Server-Side: Node.js leverages V8 for server-side JavaScript execution.
Other Embeddings: Tools like Deno and Electron also embed V8 for JavaScript and TypeScript execution.

Integration Challenges

Custom Bindings: Embedding V8 into custom platforms requires creating bindings between the C++ application and JavaScript objects, which can involve significant effort.
Platform-Specific Dependencies: Dependencies like ICU and platform-specific differences (threading, file systems) can complicate integration.

Version-Specific Features

V8 is actively developed, with frequent releases bringing new JavaScript features, performance enhancements, and security updates. This rapid evolution, while beneficial, can introduce compatibility challenges.

V8 is portable across browsers and platforms, allowing JavaScript code to run consistently in different environments, including the browser and Node.js.

Interoperability

Embedding APIs

V8 provides rich APIs for embedding JavaScript into C++ applications, enabling seamless integration of JavaScript into various environments.
Key APIs:
- v8::Isolate: Represents an isolated instance of the V8 engine with its own heap; objects from one isolate cannot be used in another, which keeps scripts separated for security and efficiency.
- v8::Context: Represents a single execution context (global scope) for running scripts.

Data Exchange

V8 supports bidirectional communication between the host environment and the JavaScript runtime through:

Function Bindings: C++ functions can be exposed as JavaScript functions.
Object Wrappers: JavaScript objects can wrap native objects for seamless interaction.

Multi-Language Interoperability

WebAssembly Support: V8 supports WebAssembly, enabling high-performance execution of code written in languages like C, C++, and Rust alongside JavaScript.
Foreign Function Interfaces: Third-party tools like ffi-napi in Node.js enable calling native functions directly from JavaScript.

V8 enables smooth interaction with Web APIs, Node.js modules, native bindings, and external languages, making it a versatile engine for various types of applications.

Tooling

A rich ecosystem of tools complements V8, aiding in development, debugging, and optimization:

Debugging
- Chrome DevTools: Provides advanced debugging tools, leveraging V8’s internal features for breakpoints, memory profiling, and async stack traces.
- Node.js Inspector: Node’s built-in inspector supports debugging using DevTools for server-side applications.
Profiling
- V8 Profiler: Offers detailed insights into CPU and memory usage.
- Heap Snapshot: Helps identify memory leaks by capturing and analyzing memory allocations.
Build Tools
- GN/Ninja: Used for building V8 itself. GN generates the build files and resolves dependencies, and Ninja runs the compilation.

The rich ecosystem and tooling around V8 make it a powerful engine for developing, debugging, and optimizing JavaScript applications.

The complete picture

V8 is a high-performance JavaScript engine that powers many platforms, from web browsers like Chrome to server-side environments like Node.js. Its sophisticated architecture and optimizations enable fast and efficient JavaScript execution:

Feature	Description
JIT Compilation	Compiles JavaScript to optimized machine code using a multi-tiered pipeline (parsing, bytecode generation, profiling, optimization).
Garbage Collection (Orinoco garbage collector)	Manages memory efficiently by dividing it into generations (young and old) and employing different algorithms (scavenger, mark-compact) to reclaim unused objects. Uses parallel and concurrent processing.
Hidden Classes	Internal data structures that optimize property access by describing the layout of objects, enabling fast lookups and reducing memory overhead.
Inline Caches	Cache property access patterns to avoid repeated lookups, further speeding up property access.
Speculative Optimization	Makes assumptions about data types to generate optimized code, but can deoptimize if those assumptions are violated at runtime.
Concurrency Model	Uses an event loop to manage the execution of synchronous and asynchronous tasks, enabling non-blocking behavior. Web Workers and Node.js workers allow for true multi-threading.
Portability	Designed to be portable across different platforms, but integration can present challenges due to custom bindings, platform-specific dependencies, and version-specific features.
Interoperability	Provides APIs for embedding JavaScript into C++ applications and supports WebAssembly for high-performance execution of code written in other languages.
Ecosystem and Tooling	A rich ecosystem of tools, including debuggers (Chrome DevTools, Node.js Inspector), profilers (V8 Profiler, heap snapshots), and build tools (GN/Ninja), supports V8 development and optimization.

Suitability for Our Programming Language

In our search for the ideal virtual machine for our programming language, V8, the engine powering JavaScript, warrants careful consideration. To assess its suitability, we must carefully weigh its strengths and trade-offs against our language’s specific needs and priorities.

Aspect	Strength	Trade-Off
Performance	Highly optimized for JavaScript, combining the Ignition bytecode interpreter with the TurboFan optimizing JIT compiler.	Tailored for JavaScript semantics; optimization may not generalize to languages with different paradigms.
Lightweight Footprint	Efficient and compact, suitable for embedding in browsers and resource-constrained environments.	May lack advanced memory management features available in heavier VMs like JVM.
Startup Speed	Fast startup due to bytecode interpretation with Ignition.	Performance improvements may take time as JIT optimizations occur after startup.
Interoperability	Seamless integration with JavaScript libraries and ecosystem.	Tight coupling with JavaScript may hinder interoperability with languages with non-JS-like semantics.
Garbage Collection	High-performance GC (Orinoco) optimized for low-latency applications like browsers.	GC may not support large heaps or complex allocation patterns as efficiently as JVM.
Embedding Support	Designed for embedding (Node.js, Deno), making it ideal for server-side and hybrid apps.	Embedding still requires some complexity for integrating with non-JavaScript environments.
Concurrency Model	Non-blocking, event-driven architecture supports asynchronous programming.	Lacks native multithreading support for computationally heavy tasks (relies on worker threads).
Tooling and Ecosystem	Rich tools for debugging, profiling, and optimizing JavaScript code (e.g., Chrome DevTools).	Fewer general-purpose tools compared to JVM; heavily JS-focused.
Portability	Runs across multiple platforms (Node.js, Deno, browsers).	Less portable for general-purpose programming languages outside JavaScript or WebAssembly contexts.
Customizability	Open-source and actively developed by Google; can be adapted to specific needs.	Deep customization requires significant expertise in V8 internals and JavaScript engine design.
Memory Management	Optimized for small and medium-sized applications; efficient in managing memory for web apps.	May struggle with extremely large heaps or server applications requiring long-lived memory.
License and Deployment	Open-source (BSD license) with wide adoption.	Integration with non-JS environments may require additional effort compared to some lightweight VMs.

To ensure an optimal choice, we need to move forward and explore other VMs: JVM and ERTS. Each offers unique advantages, and a comparative analysis will guide us towards the most suitable foundation for our programming language.

JVM: A Hybrid Approach—Just-In-Time and Ahead-of-Time Compilation with Garbage Collection

JVM Code Generation: A Dynamic and Adaptive Approach

The Java Virtual Machine (JVM) employs a two-stage code generation strategy: source code is compiled ahead of time (AOT) into bytecode, and that bytecode is then compiled Just-In-Time (JIT) into machine code while the program runs. The first stage provides platform independence; the second provides high performance.

JVM Model

Stage 1: AOT Compilation (Before Runtime)

This stage occurs at build time, before the program runs. The `javac` compiler transforms high-level Java source code into platform-independent bytecode, which is portable and can be executed on any system with a JVM. Note that “AOT” here means compiling to bytecode, not to native machine code: native ahead-of-time compilation (for example, GraalVM Native Image) is a separate, optional step that gives up some opportunities for runtime optimization in exchange for faster startup.

Input: Java source code files (.java)
Output: Java bytecode files (.class), which are platform-independent.
Process:
- Lexical Analysis: The source code is broken down into individual tokens (keywords, identifiers, operators, etc.).
- Parsing: The tokens are organized into an Abstract Syntax Tree (AST), representing the grammatical structure of the code.
- Semantic Analysis: The AST is analyzed for correctness, including type checking, ensuring that the code adheres to the Java language specification.
- Bytecode Generation: The validated AST is translated into bytecode, a compact set of instructions designed for efficient interpretation by the JVM.
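
The source-to-bytecode stage described above can be observed from inside a Java program. The following is a minimal sketch, assuming a full JDK (not a JRE) is installed; the `BytecodeDemo` and `Hello` names are hypothetical and chosen only for illustration. It invokes the same compiler front end as the `javac` command through the standard `javax.tools` API and then reads the first four bytes of the resulting class file.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BytecodeDemo {
    /** Compiles a tiny class and returns the first four bytes of its .class file as an int. */
    static int compileAndReadMagic() {
        try {
            Path dir = Files.createTempDirectory("bytecode-demo");
            Path source = dir.resolve("Hello.java"); // hypothetical sample class
            Files.writeString(source, "public class Hello { int twice(int x) { return x * 2; } }");

            // Runs lexing, parsing, semantic analysis and bytecode generation in-process.
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null when no compiler is available
            if (compiler == null) {
                throw new IllegalStateException("No system Java compiler found (a JDK is required)");
            }
            int status = compiler.run(null, null, null, "-d", dir.toString(), source.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed with status " + status);
            }

            // Every class file begins with the magic number 0xCAFEBABE.
            try (DataInputStream in = new DataInputStream(Files.newInputStream(dir.resolve("Hello.class")))) {
                return in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("magic = 0x%08X%n", compileAndReadMagic()); // prints "magic = 0xCAFEBABE"
    }
}
```

The generated `Hello.class` contains no machine code for any particular CPU, which is why the same file can be loaded by a JVM on any platform.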

Stage 2: JIT Compilation (During Runtime)

JIT compilation is the core of the JVM’s execution strategy. It takes place during program execution, where the JVM dynamically analyzes the running code and selectively compiles frequently used (or “hot”) portions of the bytecode into optimized machine code. This adaptive optimization strategy allows the JVM to tailor the code to the specific CPU architecture and runtime conditions.

Input: Java bytecode (.class files), loaded into the JVM by the Class Loader.
Output:
- Interpreted Code: Used for less frequently executed code paths.
- Optimized Machine Code: Tailored to the specific CPU architecture for improved performance.
Process:
- Bytecode Interpretation: Initially, the JVM interprets the bytecode to execute the program.
- Profiling and Hot Code Detection: The JVM monitors the execution and identifies frequently executed code paths, gathering data about method invocations, loop iterations, and data types.
- Optimized Machine Code Generation:
  - C1 (Client Compiler): Performs a quick compilation of bytecode into machine code, prioritizing fast startup and initial execution. It uses limited profiling information.
  - C2 (Server Compiler): Applies more advanced optimizations to hot code sections, resulting in higher performance but with increased compilation time. It leverages detailed profiling data to make informed optimization decisions.
  - Graal Compiler (Optional): A newer JIT compiler, written in Java, that can take the place of C2 (for example in GraalVM). It applies further optimizations such as aggressive inlining and partial escape analysis, potentially leading to additional performance gains.

Tiered Compilation

Modern JVMs (Java 8 and later) typically use a tiered compilation strategy by default. This means that the JVM uses multiple tiers of compilation, starting with a faster compiler for quick startup (C1) and progressing to more optimizing compilers (C2 or Graal) for frequently executed code. This approach balances the need for fast initial execution with the desire for optimal long-term performance.
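
As a rough illustration of hot-code detection, the sketch below calls a small method many times and then asks the JVM how much time it has spent compiling. It assumes a HotSpot-style JVM that exposes a `CompilationMXBean`; the class and method names (`JitWarmupDemo`, `sumOfSquares`, `runHotLoop`) are hypothetical, and the reported times vary from run to run.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitWarmupDemo {
    // A small method that becomes "hot" once it has been called often enough.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Calls the method repeatedly so the JVM's profiler has something to notice.
    static long runHotLoop(int calls) {
        long checksum = 0;
        for (int c = 0; c < calls; c++) {
            checksum += sumOfSquares(1_000);
        }
        return checksum;
    }

    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean(); // null if the JVM has no compiler
        boolean timed = jit != null && jit.isCompilationTimeMonitoringSupported();

        long before = timed ? jit.getTotalCompilationTime() : 0;
        long checksum = runHotLoop(20_000);
        long after = timed ? jit.getTotalCompilationTime() : 0;

        System.out.println("checksum = " + checksum);
        if (timed) {
            System.out.println("JIT compiler: " + jit.getName());
            System.out.println("approx. compilation time during loop (ms): " + (after - before));
        }
    }
}
```

Running the same program with the HotSpot flag `-XX:+PrintCompilation` lists each method as it is compiled, including the tier at which it was compiled.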

Method compilation life-cycle

The JVM dynamically transitions code between these tiers based on profiling information and runtime behavior. This allows the JVM to adapt to the application’s needs and optimize the code for the most common execution paths.

C1 reaches improved performance sooner, while C2 achieves larger improvements because it has more profiling information about hot spots.

Deoptimization

While JIT compilation optimizes for performance, the JVM also needs to ensure correctness. If the assumptions made during optimization (e.g., about data types) prove invalid at runtime, the JVM triggers deoptimization. This involves discarding the optimized machine code and reverting to interpreting the bytecode, or recompiling with different assumptions. Deoptimization ensures correct behavior but can cause a temporary drop in performance.

public class DeoptExample {
    public static void main(String[] args) {
        Object obj = "Hello"; // While profiling, the JIT only ever sees a String here and may specialize for it
        for (int i = 0; i < 10_000; i++) {
            if (i == 5_000) obj = 42; // An Integer appears: any String-only speculation is now invalid
            System.out.println(obj.toString());
        }
    }
}

Key Benefits

Platform Independence: Compiling to platform-independent bytecode ahead of time ensures “write once, run anywhere” capability.
High Performance: JIT compilation maximizes execution speed by tailoring code to the system.
Adaptability: Continuous optimization based on usage patterns allows the JVM to adapt to changing program behavior.
Robustness: The JVM gracefully handles deoptimization, ensuring correctness even when optimizations fail.

This dynamic and adaptive code generation strategy makes the JVM a versatile platform that effectively balances portability and efficiency.

JVM Memory Management: A Layered and Efficient Approach

The JVM organizes memory into two primary areas: the Heap and the Non-Heap regions. Each area is further subdivided to manage various objects and execution needs.

Heap Memory

The Heap is used for dynamic memory allocation during runtime. Objects and class instances are stored here, and it is the primary area managed by the Garbage Collector (GC).

Subdivisions of the Heap:
- Young Generation:
  - Short-lived objects (e.g., temporary variables) are created here.
  - Subdivided into:
    - Eden Space: Where new objects are allocated.
    - Survivor Spaces (S0, S1): Hold objects that survive minor GC cycles.
  - Garbage Collection: Uses a Minor GC to quickly reclaim memory.
- Old Generation (Tenured Space):
  - Long-lived objects (e.g., application configuration) are promoted here after surviving multiple GC cycles.
  - Garbage Collection: Uses a Major GC or Full GC for reclaiming memory with compacting strategies.
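- Metaspace (Java 8 and later):
  - Stores metadata about classes.
  - Allocated from native memory rather than from the Java heap, so strictly speaking it belongs to the Non-Heap region; it is listed here because it replaced the fixed-size PermGen of older JVM versions.
  - Dynamically resizable.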

Non-Heap Memory

The Non-Heap region is used for internal JVM structures and execution-related memory.

Subdivisions of Non-Heap Memory:
- Code Cache:
  - Stores JIT-compiled machine code for execution.
  - Optimized for frequently executed methods.
- Thread Stacks:
  - Each thread has its own stack for storing method frames, local variables, and return addresses.
- Direct Memory:
  - Used for buffers and native I/O operations.
  - Managed outside the JVM heap, leveraging the host OS’s memory.
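
The split between heap and non-heap memory can be inspected at runtime through the standard `java.lang.management` API. The sketch below is illustrative only: the `MemoryPoolsDemo` name is hypothetical, and the pool names it prints depend on the JVM vendor and the garbage collector in use (on HotSpot with G1, the heap pools typically include Eden, Survivor and Old Gen pools, and the non-heap pools typically include Metaspace and the code cache).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.List;
import java.util.stream.Collectors;

public class MemoryPoolsDemo {
    /** Returns the names of all memory pools of the given type (HEAP or NON_HEAP). */
    static List<String> poolNames(MemoryType type) {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(pool -> pool.getType() == type)
                .map(MemoryPoolMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
    }
}
```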

Memory Space in OpenJDK 17

GC Algorithms

Modern JVMs implement multiple garbage collection algorithms tailored to various application needs:

Serial GC:
- A single-threaded collector.
- Best suited for small applications with limited heap size or for single-core systems where simplicity is prioritized.
Parallel GC:
- A multi-threaded collector for both Minor and Major GCs.
- Suitable for multi-core systems and applications where throughput is a primary concern.
G1 GC (Garbage-First):
- The default GC since Java 9.
- It splits the heap into regions and prioritizes garbage collection in regions with the most garbage.
- It offers a good balance between throughput and pause times.
ZGC (Z Garbage Collector):
- Designed for low-latency applications.
- Handles heaps up to terabytes in size with very short pause times.
- A good choice for applications with large heaps and strict latency requirements.
Shenandoah GC:
- Focuses on reducing pause times with concurrent compaction.
- Similar to ZGC in its goals, but with some differences in implementation.
- Suitable for applications where consistent low pause times are critical.
Epsilon GC:
- A “no-op” garbage collector.
- It handles memory allocation but does not perform any garbage collection.
- Primarily used for performance testing, memory pressure analysis, and specific situations where the application has very predictable memory usage patterns.
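
To see which of these collectors a given JVM is actually running, the garbage collector beans can be queried at runtime. This is a small sketch with a hypothetical `GcInfoDemo` class; the collector names and counts it prints depend on the JVM build and on the selected collector.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcInfoDemo {
    /** Names of the collectors active in this JVM (e.g. young and old generation collectors). */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** Total number of collections so far, skipping collectors that report -1 (undefined). */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long sink = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[1024]; // short-lived garbage that may trigger minor collections
            sink += junk.length;
        }
        System.out.println("allocated bytes: " + sink);
        System.out.println("collectors: " + collectorNames());
        System.out.println("collections so far: " + totalCollections());
    }
}
```

On HotSpot, the collector is selected with a command-line flag such as `-XX:+UseSerialGC`, `-XX:+UseParallelGC`, `-XX:+UseG1GC`, `-XX:+UseZGC` or `-XX:+UseShenandoahGC`; Epsilon additionally requires `-XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC`. Not every JDK build ships every collector.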


Concurrency Model

The Java Virtual Machine (JVM) has a rich history of supporting concurrent programming, enabling developers to write high-performance applications that leverage multithreading and parallelism. With the introduction of virtual threads in Java 21, the JVM’s concurrency model has been significantly enhanced, offering new levels of scalability and efficiency.

Threads and Threading Models

Platform Threads (Traditional)

Operating system (OS) manages the available threads and assigns tasks to them

Characteristics:
- OS-Dependent Scheduling: The JVM relies on the underlying operating system’s scheduler to manage platform threads.
- Priority Hints: Thread priorities (Thread.setPriority()) provide hints to the OS scheduler, but the actual scheduling decisions are influenced by various factors, including OS policies and the characteristics of other running processes.
- Time-Slicing: Most modern OS schedulers employ time-slicing, where each thread gets a small slice of CPU time before the context is switched to another thread. However, the exact time-slice duration and scheduling algorithm can vary.
- Preemptive Scheduling: Higher-priority threads can generally preempt lower-priority threads, but this behavior is not strictly guaranteed.
Use Cases:
- CPU-Bound Tasks: Well-suited for tasks that require intensive CPU computations.
- OS-Level Interactions: Necessary when interacting with OS-specific features or APIs that require native threads.
- Legacy Code: Often used in existing applications and libraries that were designed before virtual threads were available.

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class PlatformThreadBlockingIO {
    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            // Each platform thread is backed by one OS thread, which stays blocked during the request.
            Thread thread = new Thread(() -> {
                try {
                    HttpURLConnection conn = (HttpURLConnection)
                            URI.create("https://jsonplaceholder.typicode.com/posts/1").toURL().openConnection();
                    conn.setRequestMethod("GET");
                    System.out.println(Thread.currentThread().getName() + " response: " + conn.getResponseCode());
                    conn.disconnect();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            threads.add(thread);
            thread.start();
        }

        // Join only after all threads have been started, so the requests run concurrently.
        for (Thread thread : threads) {
            thread.join();
        }
    }
}

Virtual Threads (Java 21 and later)

Java 21 virtual threads

Characteristics:
- JVM-Managed Scheduling: The JVM takes a more active role in scheduling virtual threads. It can efficiently manage a large number of virtual threads, even with a limited number of platform threads.
- Cooperative Scheduling: Virtual threads are primarily scheduled cooperatively. This means that a virtual thread will continue running until it performs a blocking operation (e.g., I/O, waiting on a lock). At that point, the JVM can quickly switch to another virtual thread.
- Work Stealing: By default, virtual threads are scheduled onto a small pool of carrier platform threads by a work-stealing ForkJoinPool, which keeps the carriers evenly loaded.
- Priority Management: Virtual threads have a fixed priority (Thread.NORM_PRIORITY); calling Thread.setPriority() on a virtual thread has no effect.
Use Cases:
- I/O-Bound Tasks: Ideal for tasks that spend a significant amount of time waiting for I/O operations (e.g., network requests, file access, database queries).
- High-Throughput Concurrency: Enables server applications to handle a large number of concurrent requests efficiently.
- Simplified Concurrency: Allows developers to use familiar threading models without worrying about the limitations of OS threads.

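import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class VirtualThreadBlockingIO {
    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            // While a virtual thread blocks on I/O, its carrier thread is free to run other virtual threads.
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    HttpURLConnection conn = (HttpURLConnection)
                            URI.create("https://jsonplaceholder.typicode.com/posts/1").toURL().openConnection();
                    conn.setRequestMethod("GET");
                    // Virtual threads are unnamed by default, so print the Thread object itself.
                    System.out.println(Thread.currentThread() + " response: " + conn.getResponseCode());
                    conn.disconnect();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }));
        }

        // Virtual threads are daemon threads: without join(), main could exit before any request completes.
        for (Thread thread : threads) {
            thread.join();
        }
    }
}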

Virtual Threads vs Platform Threads

Feature	Platform Threads	Virtual Threads
Resource Consumption	Heavyweight	Lightweight
Scalability	Limited	Massive
Management	OS-managed	JVM-managed
Blocking Behavior	Blocks the underlying OS thread	In most cases unmounts from its carrier thread, which is then free to run other virtual threads
Ideal Use Cases	CPU-bound tasks, OS interactions	I/O-bound tasks, high concurrency
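
The scalability difference in the table can be made concrete with a short experiment. The sketch below, which assumes Java 21 or later, starts ten thousand virtual threads that each block for a moment; `Thread.sleep` stands in for blocking I/O, and the `VirtualThreadScaleDemo` name is hypothetical. Creating the same number of platform threads would usually be far more expensive.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadScaleDemo {
    /** Runs taskCount blocking tasks, one virtual thread each, and returns how many completed. */
    static int runBlockingTasks(int taskCount) {
        AtomicInteger completed = new AtomicInteger();

        // close() at the end of the try block waits for all submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(100); // stands in for a blocking network or database call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    completed.incrementAndGet();
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks completed in about " + elapsedMs + " ms");
    }
}
```

Because the blocked virtual threads release their carrier threads, the total run time should stay close to the duration of a single task rather than growing with the number of tasks.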

Platform threads vs Virtual threads

Synchronization: Ensuring Thread Safety and Data Integrity

In concurrent Java programming, where multiple threads operate independently, the need to safeguard shared resources and maintain data consistency becomes paramount. Synchronization provides the essential mechanisms to coordinate thread access, preventing race conditions and ensuring predictable behavior in multi-threaded environments.

Core Principles of Synchronization

Synchronization revolves around the concept of mutual exclusion, guaranteeing that only one thread can access a shared resource at any given time. This controlled access prevents data corruption and maintains the integrity of shared data structures.

Writing concurrent code can be challenging due to:

Race conditions: Multiple threads messing with the same data at the same time.
Deadlocks: Threads getting stuck waiting for each other, like a traffic jam.
Livelocks: Threads not making progress because they keep reacting to each other.

Synchronization Mechanisms in Java

Java offers a rich repertoire of synchronization tools:

synchronized: When a thread enters a synchronized block or method, it acquires the corresponding monitor lock, ensuring exclusive access to the protected code.
Explicit Locks: For more fine-grained control, the java.util.concurrent.locks package provides advanced locking mechanisms, such as ReentrantLock and ReadWriteLock.
Atomic Variables: The java.util.concurrent.atomic package offers lock-free, thread-safe operations on primitive data types and references.
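
The three mechanisms above can be compared side by side on the simplest shared resource, a counter. This is a minimal sketch with a hypothetical `CounterDemo` class: several threads increment three counters, each protected in a different way, and every counter ends up with the exact expected total.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class CounterDemo {
    private int syncCount;                                          // guarded by this object's monitor
    private int lockCount;                                          // guarded by the explicit lock below
    private final ReentrantLock lock = new ReentrantLock();
    private final AtomicInteger atomicCount = new AtomicInteger();  // lock-free

    synchronized void incrementSynchronized() {
        syncCount++;
    }

    void incrementWithLock() {
        lock.lock();
        try {
            lockCount++;
        } finally {
            lock.unlock(); // always release, even if the guarded code throws
        }
    }

    void incrementAtomic() {
        atomicCount.incrementAndGet();
    }

    /** Returns {synchronized total, lock total, atomic total} after all threads finish. */
    static int[] run(int threads, int incrementsPerThread) {
        CounterDemo counter = new CounterDemo();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.incrementSynchronized();
                    counter.incrementWithLock();
                    counter.incrementAtomic();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // join() also makes the workers' writes visible to this thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return new int[] { counter.syncCount, counter.lockCount, counter.atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 10_000);
        System.out.println(totals[0] + " " + totals[1] + " " + totals[2]); // prints "40000 40000 40000"
    }
}
```

Removing the protection from any of the three increments (for example, dropping the `synchronized` keyword) would allow lost updates, and the corresponding total could come out lower than expected.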

Synchronization also ensures that all threads see the most up-to-date data:

Volatile Variables: Declaring a variable as volatile guarantees visibility of changes across threads. A write to a volatile variable happens-before every subsequent read of that variable, so readers never observe a stale value. Note that volatile does not make compound operations such as count++ atomic.
Happens-Before Relationship: The happens-before relationship in the JMM (Java Memory Model) establishes a partial ordering of memory operations. If one operation happens-before another, the JMM guarantees that the first operation’s results are visible to the second, ensuring consistent and predictable behavior in concurrent programs.
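
A common use of these visibility guarantees is safely handing a value from one thread to another. In the sketch below (the `VolatileFlagDemo` class and its field names are hypothetical), the writer stores an ordinary field and then sets a volatile flag; a reader that sees the flag set is guaranteed by the happens-before rule to also see the earlier write.

```java
public class VolatileFlagDemo {
    private int payload;            // ordinary (non-volatile) field
    private volatile boolean ready; // volatile flag that publishes the payload

    /** Publishes 42 from the calling thread and returns the value observed by a reader thread. */
    static int publishAndRead() {
        VolatileFlagDemo shared = new VolatileFlagDemo();
        int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while (!shared.ready) {
                Thread.onSpinWait(); // busy-wait until the volatile write becomes visible
            }
            seen[0] = shared.payload; // guaranteed to read 42, never the default 0
        });
        reader.start();

        shared.payload = 42;  // (1) ordinary write
        shared.ready = true;  // (2) volatile write: (1) happens-before any read that sees true

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for reader", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead()); // prints "42"
    }
}
```

Without the `volatile` modifier on `ready`, the reader would have no guarantee of ever seeing the flag change, and even if it did, it could legally observe a stale `payload`.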

Best Practices for Effective Synchronization

To achieve optimal thread safety and performance in concurrent Java applications, we should adhere to the following best practices:

Minimize synchronized blocks: Large synchronized blocks can increase contention (threads waiting for each other) and reduce performance.
Choose the right tools: Java gives us a variety of synchronization tools (synchronized, locks, atomic variables). Selecting the right tool for the job can make a big difference in efficiency and readability.
Avoid deadlocks: Deadlocks are like traffic jams in our code – threads get stuck waiting for each other. Careful lock ordering and avoiding circular dependencies can prevent these frustrating situations.
Use thread-safe data structures: Java provides concurrent collections (like ConcurrentHashMap and ConcurrentLinkedQueue) that are designed for safe concurrent access. Using these can save us from manual synchronization headaches.
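
As an example of the last practice, the sketch below counts words from several threads using `ConcurrentHashMap.merge`, which performs each update atomically so that no explicit locking is needed. The `ConcurrentCountDemo` class and its parameters are hypothetical and chosen only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentCountDemo {
    /** Each of the given threads counts every word `repeats` times into one shared map. */
    static Map<String, Integer> countWords(String[] words, int threads, int repeats) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int r = 0; r < repeats; r++) {
                    for (String word : words) {
                        counts.merge(word, 1, Integer::sum); // atomic read-modify-write per key
                    }
                }
            });
        }

        pool.shutdown(); // accept no new tasks, let the submitted ones finish
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("Workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(new String[] { "a", "b", "a" }, 4, 1_000);
        System.out.println("a=" + counts.get("a") + " b=" + counts.get("b")); // prints "a=8000 b=4000"
    }
}
```

Replacing the map with a plain `HashMap` would make the same code unsafe: updates could be lost, and the map's internal structure could be corrupted.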

Portability

The Java Virtual Machine (JVM) is a cornerstone of Java’s “Write Once, Run Anywhere” philosophy, enabling Java applications to execute across diverse platforms without modification. This portability is achieved through several key mechanisms:

Java Bytecode: Platform-independent bytecode serves as an intermediate representation, interpreted or compiled by the JVM for native execution.
JVM Abstraction: The JVM bridges Java bytecode and the host system by providing platform-specific implementations, enabling consistent execution across diverse environments.
Standard Class Libraries: Unified APIs abstract platform-specific operations (e.g., file I/O, networking), ensuring consistent application behavior.
Just-In-Time (JIT) Compilation: Runtime compilation optimizes performance while maintaining cross-platform compatibility.
Hardware and OS Independence: The JVM abstracts system details, translating bytecode into platform-specific instructions.

JVM portability

Interoperability

Java interoperability refers to Java’s ability to interact with code written in other languages, and vice-versa.

Key Mechanisms for Java Interoperability

Java Native Interface (JNI): This is the most common and fundamental mechanism for Java interoperability. JNI enables Java code to call native code (typically written in C/C++) and allows native code to call Java code.
- Use Cases: Accessing platform-specific features, integrating with legacy systems, performance-critical code, and using existing C/C++ libraries.
Java Native Access (JNA): JNA simplifies the use of native libraries by providing a higher-level interface compared to JNI. It removes the need to write native glue code and instead invokes native functions dynamically at runtime, at some cost in call overhead.
- Use Cases: Similar to JNI, but with a focus on ease of use and faster development.
Foreign Function & Memory API (FFM API): This is a relatively new feature (first previewed in Java 19 and finalized in Java 22) that aims to modernize and improve Java’s interaction with native code. FFM API provides a more efficient and secure way to access native libraries.
- Features: Memory access API, foreign linker API, and foreign function API.

Challenges and Considerations

Platform Dependence: Native code is inherently platform-dependent, so interoperability solutions often require platform-specific implementations or configurations.
Security Risks: Interacting with native code can introduce security vulnerabilities if not handled carefully.
Performance Overhead: There can be performance overhead associated with crossing the boundary between Java and native code.
Debugging Complexity: Debugging interoperability issues can be challenging due to the involvement of multiple languages and environments.

Best Practices

Minimize Native Code Integration: Native code should be utilized judiciously, as it can introduce complexities in development and hinder portability across platforms.
Select the Appropriate Interoperability Mechanism: The choice between JNI, JNA, and the FFM API should be guided by project requirements, developer expertise, and performance considerations.
Prioritize Security Measures: When interacting with native code, rigorous adherence to security guidelines is essential to mitigate potential vulnerabilities.
Conduct Comprehensive Testing: Thorough testing across all target platforms is imperative to ensure the correctness, stability, and reliability of the interoperability solution.

Ecosystem and Tooling

Java Virtual Machines come with a powerful set of tools for development, debugging, profiling, and monitoring.

Debugging and Troubleshooting
- Java Debugger (jdb): A command-line debugger for stepping through code, setting breakpoints, and inspecting variables.
- JVM Tool Interface (JVMTI): A native interface for building custom profiling and debugging tools that interact deeply with the JVM.
Profiling and Performance Monitoring
- JDK Mission Control (JMC) 8: A production-time monitoring and diagnostic tool with a GUI for detailed JVM analysis (garbage collection, memory, threads). Requires JDK 11 or later to run.
- Java Flight Recorder (JFR): A low-overhead profiling tool built into the JVM, capturing fine-grained runtime information for performance analysis.
- VisualVM: A visual tool for monitoring JVM performance and resource usage. It provides insights into memory allocation, garbage collection, and thread activity.
Bytecode Manipulation and Analysis
- Byte Buddy: A library for dynamically creating and modifying Java classes at runtime.
- Javassist: A higher-level bytecode manipulation library for easier class file modification.
Command-Line Tools (JDK)
- jmap: Generates heap dumps and memory maps of a running JVM.
- jps: Lists the instrumented JVMs running on a system.
- jstat: Monitors JVM statistics such as garbage collection, class loading, and compiler performance.
- jstack: Prints thread dumps of a Java process, aiding in diagnosing deadlocks and thread contention.
- jcmd: A versatile tool for sending diagnostic command requests to the JVM.
- jconsole: A graphical tool for monitoring JVM performance and resource consumption.
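
Much of what these command-line tools report is also available programmatically through the JVM's management beans. As a rough analogue of `jstack`, the sketch below lists every live thread with its state and checks for threads deadlocked on object monitors; the `ThreadDumpDemo` class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class ThreadDumpDemo {
    /** One line per live thread: its name and current state. */
    static List<String> threadSummary() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // false, false: skip details about locked monitors and ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            lines.add(info.getThreadName() + " [" + info.getThreadState() + "]");
        }
        return lines;
    }

    /** True if some threads are deadlocked waiting for each other's object monitors. */
    static boolean hasMonitorDeadlock() {
        long[] deadlocked = ManagementFactory.getThreadMXBean().findMonitorDeadlockedThreads();
        return deadlocked != null; // null means no deadlock was found
    }

    public static void main(String[] args) {
        threadSummary().forEach(System.out::println);
        System.out.println("monitor deadlock detected: " + hasMonitorDeadlock());
    }
}
```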

The complete picture

Aspect	Details	Tools and Techniques
Code Compilation	AOT Compilation: Transforms Java source into platform-independent bytecode. JIT Compilation: Converts bytecode to machine code during runtime.	– AOT: `javac` – JIT: Tiered Compilation (C1, C2, Graal).
Garbage Collection	Handles automatic memory management using algorithms tailored to application needs.	– Default: G1 GC – Low Latency: ZGC, Shenandoah – Legacy: Parallel GC, Serial GC – No-op: Epsilon GC.
Memory Management	Heap Memory: For object storage (Young, Old generations). Non-Heap: Code cache, thread stacks, native I/O.	– `jmap` for heap analysis – VisualVM, JMC, JFR for memory monitoring.
Concurrency Model	Supports multithreading via platform threads and lightweight virtual threads (Java 21+).	– Platform Threads: OS-managed. – Virtual Threads: JVM-managed (uses work stealing, cooperative scheduling).
Deoptimization	Reverts optimized machine code to bytecode execution when runtime assumptions fail.	– Observe with `-XX:+PrintCompilation` (look for “made not entrant” entries).
Synchronization	Ensures thread safety and prevents race conditions in concurrent programming.	– Tools: `synchronized`, `ReentrantLock`, `volatile`. – `java.util.concurrent` package for thread-safe utilities.
Portability	Write once, run anywhere. Java bytecode executes on any JVM across diverse platforms.	– JVM abstracts OS and hardware differences. – Standardized class libraries ensure uniform behavior.
Interoperability	Interacts with native code and other languages via JNI, JNA, and Foreign Function & Memory API.	– JNI for platform-specific libraries. – JNA for simplified native library integration. – FFM API for modern, secure native access.
Performance Tuning	JVM dynamically adapts to workload using profiling, optimization, and garbage collection strategies.	– Tools: `jcmd`, VisualVM, JFR, Async Profiler.
Debugging	Diagnoses runtime issues with thread dumps, heap analysis, and debugging interfaces.	– Tools: `jdb`, `jstack`, `jmap`, `jconsole`.
Profiling	Analyzes memory allocation, thread activity, and performance bottlenecks.	– Tools: JFR, JMC, VisualVM, Async Profiler.
Bytecode Manipulation	Modifies or generates Java bytecode for runtime adaptability.	– Libraries: Byte Buddy, Javassist.
Command-Line Tools	Provides JVM diagnostics and insights into runtime operations.	– Tools: `jmap`, `jstat`, `jstack`, `jps`, `jcmd`, `jconsole`.
Ecosystem Tools	Includes GUI-based and CLI tools for JVM diagnostics and monitoring.	– VisualVM, JMC, JFR for real-time monitoring.
Future Enhancements	Continuous improvements in virtual threads, JIT optimizations, and memory management.	– GraalVM for polyglot capabilities and advanced optimizations.

Suitability for Our Programming Language

The Java Virtual Machine (JVM) is a robust, versatile platform designed for running Java and other JVM-based languages like Kotlin, Scala, and Groovy. Evaluating its suitability for a new programming language involves weighing its strengths against potential trade-offs.

Aspect	Strength	Trade-Off
Portability	Platform independence and cross-platform compatibility.	Larger footprint unsuitable for some embedded or resource-limited systems.
Performance	JIT and tiered compilation provide adaptive optimization and high execution speeds.	Startup performance may lag due to JIT and class loading overhead.
Interoperability	Extensive integration with Java libraries, tools, and ecosystems.	Tight coupling with Java semantics may limit compatibility with unconventional language designs.
Concurrency	Advanced threading models (virtual threads in Java 21+) for scalable applications.	Concurrency models not inherently designed for actor-based paradigms like those in ERTS.
Memory Management	Sophisticated GC algorithms (G1, ZGC) for various workloads.	Relatively high memory usage compared to lightweight VMs.
Tooling and Diagnostics	Rich suite of tools for debugging, profiling, and performance monitoring.	Complex setup and higher development overhead for new language runtimes.
License and Deployment	Robust for enterprise applications with long-term support options.	Licensing considerations and potential legal implications with Oracle.

The JVM is a compelling choice for a programming language focused on cross-platform compatibility, high performance, and integration with Java’s ecosystem. However, if resource constraints, unconventional language semantics, or startup performance are critical, exploring alternative VMs like ERTS or more lightweight solutions may be prudent.

ERTS: Lightweight Processes, Efficient Execution, and Concurrent Garbage Collection

The Erlang Runtime System (ERTS): An Overview

When we start an Erlang or Elixir application, what we are actually starting is an Erlang node. A node is a single operating system process that runs the Erlang Runtime System (ERTS), which in turn hosts the BEAM virtual machine (VM):

ERTS Stack

All execution of Erlang/Elixir code happens within a node, and the performance of our application depends not only on our code but also on the layers of the runtime stack underneath it.

Erlang Runtime System (ERTS)

Role: The core runtime environment for Erlang/Elixir code.
Key Features:
- Process Management: Handles lightweight, isolated processes with preemptive scheduling.
- Garbage Collection: Per-process garbage collection to minimize pauses.
- Fault Tolerance: Enables process isolation, supervision trees, and automatic recovery.
- Distributed Systems: Built-in support for inter-node communication and clustering.

BEAM Virtual Machine

Role: The execution engine within ERTS responsible for running bytecode.
Key Features:
- Executes platform-independent BEAM bytecode (.beam files).
- Supports JIT compilation (BeamAsm, available since OTP 24 on x86-64 and since OTP 25 on AArch64) for performance optimization.
- Provides high-level instructions tailored to concurrency, pattern matching, and tail-call optimization.

OTP Framework

Role: Provides standard libraries and design principles for building robust applications.
Key Features:
- Abstractions like supervision trees and behaviours such as gen_server and gen_statem.
- Libraries for concurrency, fault tolerance, and networking.
- Forms the foundation for applications written in Erlang or Elixir.

This well-structured stack provides:

Scalability: Supports millions of concurrent, lightweight processes with minimal overhead.
Resilience: Process isolation and supervision trees ensure fault-tolerant execution.
Efficiency: JIT compilation and optimized instructions enhance runtime performance.
Ease of Development: OTP libraries simplify building complex, distributed systems.

The Erlang Runtime System and its components provide robust concurrency and fault tolerance, making it a reliable choice for building scalable and distributed applications.

The Compilation Pipeline

Turning Erlang or Elixir source code into running code spans two key phases: before runtime (compilation) and at runtime (loading and execution). Together they ensure that high-level functional code is prepared for concurrent and distributed execution on the BEAM virtual machine (VM).

Before Runtime: Compilation Pipeline

Source Code Parsing:
- Input: Erlang/Elixir source files (.erl, .ex).
- Process: The source code is transformed into an Abstract Syntax Tree (AST), representing the logical structure of the program.
Core Erlang Transformation:
- Input: AST.
- Process: The AST is simplified into Core Erlang, a functional intermediate representation designed for analysis and optimization.
Optimization:
- Input: Core Erlang.
- Process:
  - Inlining: Replacing calls to small functions with the function body.
  - Tail-Call Optimization: Calls in tail position reuse the current stack frame, so recursive loops run in constant stack space.
  - Dead Code Elimination: Removing unused code.
- Output: Optimized Core Erlang.
Kernel Erlang Conversion:
- Input: Optimized Core Erlang.
- Process: Translated into Kernel Erlang, a lower-level representation closer to BEAM instructions.
Bytecode Generation:
- Input: Kernel Erlang.
- Process: Compiled into BEAM bytecode (.beam files), a platform-independent instruction set.
- Output: .beam files, ready for runtime execution.
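
The compilation stages above can be driven from inside a running node. The following is a minimal sketch, not the canonical build workflow: it parses three source forms, compiles them in memory with `compile:forms/2`, and loads the resulting bytecode. The module name `hello`, its function `double/1`, and the file name passed to `code:load_binary/3` are hypothetical and exist only for this illustration.

```erlang
-module(pipeline_demo).
-export([run/0]).

%% Parse one source form (a string ending in a full stop) into abstract format.
parse_form(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

run() ->
    %% Hypothetical module used only for this illustration.
    Forms = [parse_form(S) || S <- ["-module(hello).",
                                    "-export([double/1]).",
                                    "double(X) -> X * 2."]],
    %% compile:forms/2 runs the compiler passes and returns BEAM bytecode in memory.
    {ok, hello, Beam} = compile:forms(Forms, []),
    %% Remove any old version left by an earlier run, then load the bytecode.
    code:purge(hello),
    {module, hello} = code:load_binary(hello, "hello.beam", Beam),
    hello:double(21).
```

Calling `pipeline_demo:run()` returns 42. In a normal project the same passes run through `erlc` or a build tool and write .beam files to disk instead.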

At Runtime: Execution on BEAM VM

Dynamic Code Loading:
- The ERTS runtime loads .beam files into memory.
- Modules and functions become accessible for execution.
Bytecode Execution:
- The BEAM VM either interprets the bytecode or, where BeamAsm (the JIT compiler) is available, translates each module into native machine code when it is loaded.
Process Scheduling and Isolation:
- All code runs inside isolated, lightweight processes; a function call executes in the calling process unless a new process is spawned explicitly.
- BEAM ensures fault tolerance through process isolation and preemptive scheduling.
Hot Code Swapping:
- BEAM supports dynamic code replacement, allowing systems to update or patch modules without downtime.
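
Dynamic loading and code replacement can be observed in a few lines. This is a hedged sketch under simplifying assumptions: the module `greeter` and its function `version/0` are hypothetical, and the two versions are compiled in memory rather than shipped as release upgrades, which is how production systems usually handle this.

```erlang
-module(hot_swap_demo).
-export([run/0]).

%% Compile a one-function module `greeter` (hypothetical name) whose
%% version/0 returns Version, and load it into the running node.
load_version(Version) ->
    Sources = ["-module(greeter).",
               "-export([version/0]).",
               io_lib:format("version() -> ~p.", [Version])],
    Forms = [begin
                 {ok, Tokens, _} = erl_scan:string(lists:flatten(S)),
                 {ok, Form} = erl_parse:parse_form(Tokens),
                 Form
             end || S <- Sources],
    {ok, greeter, Beam} = compile:forms(Forms, []),
    code:purge(greeter),                 % drop any old version first
    {module, greeter} = code:load_binary(greeter, "greeter.beam", Beam),
    ok.

run() ->
    ok = load_version(v1),
    First = greeter:version(),
    ok = load_version(v2),               % replace the code while the node keeps running
    Second = greeter:version(),          % a fully qualified call uses the newest version
    {First, Second}.
```

`hot_swap_demo:run()` returns `{v1, v2}`: the second call picks up the new code without restarting the node.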

Key Features by Phase

Before Runtime:
- Portability: Bytecode is platform-independent, ensuring cross-environment consistency.
- Optimization: The compiler produces bytecode tailored for efficient execution.
At Runtime:
- Flexibility: Dynamic loading and hot code swapping support seamless updates.
- Efficiency: BEAM runs code through its interpreter or, where the JIT is available, as native machine code.
- Reliability: Fault-tolerant process management ensures robust execution.

This pipeline highlights how the Erlang Runtime System ensures code is not only well-prepared before runtime but also executed efficiently and reliably in live environments.

ERTS: Lightweight Processes and Concurrency

The Erlang Runtime System (ERTS) is built around a concurrency model that prioritizes lightweight, isolated processes and efficient scheduling. This architecture is designed to handle millions of concurrent processes with minimal overhead, making it ideal for distributed, fault-tolerant applications.

Key Features of Lightweight Processes

Isolation:
- Each process has its own memory heap, stack, and mailbox.
- Processes do not share memory, eliminating race conditions and simplifying concurrency.
Lightweight Nature:
- Processes are lighter than OS threads:
  - Typical memory usage is only a few kilobytes per process.
  - ERTS can efficiently run millions of processes simultaneously.
Preemptive Scheduling:
- ERTS uses preemptive multitasking, ensuring that long-running processes do not block the system.
- Scheduling is reduction-based: each process runs for a budget of reductions (roughly, function calls) before the scheduler preempts it, which keeps execution fair across all processes.
Messaging:
- Processes communicate via asynchronous message passing.
- Messages are copied into the receiving process’s heap, maintaining isolation; large binaries are the exception and are shared by reference.
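
The cost of a process can be measured directly. The sketch below is illustrative only (the module name `proc_demo` is an assumption): it spawns N idle processes, reads the memory footprint of one of them with `erlang:process_info/2`, and then shuts them all down.

```erlang
-module(proc_demo).
-export([run/1]).

run(N) ->
    Parent = self(),
    %% Each process just waits for a stop message, then reports back.
    Pids = [spawn(fun() ->
                      receive stop -> Parent ! {stopped, self()} end
                  end) || _ <- lists:seq(1, N)],
    %% Memory of one idle process in bytes (heap, stack, and internal structures).
    {memory, Bytes} = erlang:process_info(hd(Pids), memory),
    Count = erlang:system_info(process_count),
    [Pid ! stop || Pid <- Pids],
    [receive {stopped, Pid} -> ok end || Pid <- Pids],
    {N, Bytes, Count >= N}.
```

For example, `proc_demo:run(10000)` returns the number of processes, the per-process footprint in bytes, and whether the node really held at least that many processes at once. The exact byte count depends on the OTP release and the word size.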

Process anatomy

Concurrency Model in ERTS

Process Creation
- Creating a new process is lightweight and efficient:
  - A simple function call (e.g., spawn/3) initializes a new process.
  - Process creation in ERTS is far cheaper than creating an OS thread, typically taking only microseconds.
Process Scheduling
- ERTS uses a scheduler for each CPU core, maximizing multicore system performance.
- The scheduler employs run queues to manage active processes and balances workload across cores.
Process Communication
- Communication is handled through message passing:
  - Processes send messages using the ! operator.
  - Messages are queued in the recipient’s mailbox in order of arrival; receive takes the first message that matches one of its patterns.
  - No locks or shared memory are required.
Process Lifecycle
- Processes follow a lifecycle:
  1. Spawn: A new process is created.
  2. Run: Executes until completion or suspension.
  3. Terminate: Cleans up resources when finished.
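
Mailbox behavior is easy to check in isolation. The following sketch (module name `mailbox_demo` is an assumption) has a helper process send three tagged messages; the receiver first picks one out of order with a selective receive and then reads the rest in the order they were sent.

```erlang
-module(mailbox_demo).
-export([run/0]).

run() ->
    Ref = make_ref(),                 % unique tag so unrelated messages are ignored
    Self = self(),
    %% A helper process sends three tagged messages in a fixed order.
    spawn(fun() ->
              Self ! {Ref, first},
              Self ! {Ref, second},
              Self ! {Ref, third}
          end),
    %% Selective receive: take 'third' out of the mailbox before the others.
    receive {Ref, third} -> ok end,
    %% The remaining messages are still delivered in the order they were sent.
    A = receive {Ref, X} -> X end,
    B = receive {Ref, Y} -> Y end,
    {third, A, B}.
```

`mailbox_demo:run()` returns `{third, first, second}`. No locks are involved at any point; ordering between one sender and one receiver is guaranteed by the runtime.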

Process spawning in Erlang

Example: Concurrent Processes (Ping-Pong)

The example below creates two processes: ping and pong. The ping process sends a message to the pong process, which replies. The exchange continues for a fixed number of rounds.

-module(ping_pong).
-export([start/0, ping/2, pong/0]).

% Start the Ping-Pong game
start() ->
    PongPid = spawn(?MODULE, pong, []), % Spawn the pong process
    PingPid = spawn(?MODULE, ping, [PongPid, 5]), % Spawn the ping process with 5 rounds
    PingPid ! start. % Initiate the game by sending the 'start' message to ping.

% Ping process logic
ping(PongPid, N) ->
    receive
        start -> % Handle the initial start message
            io:format("Ping: Sending message to Pong.~n"),
            PongPid ! {ping, self()}, % Send 'ping' to pong with own PID
            ping(PongPid, N); % Continue for the same number of rounds
        pong when N > 1 -> % Handle 'pong' while rounds remain
            io:format("Ping: Received pong! ~p rounds left.~n", [N - 1]),
            PongPid ! {ping, self()}, % Send another 'ping' to pong
            ping(PongPid, N - 1); % Decrement the round counter
        pong -> % Final 'pong': all rounds are done
            io:format("Ping process: Finished!~n"),
            PongPid ! stop % Notify pong process to stop
    end.

% Pong process logic
pong() ->
    receive
        {ping, From} -> % Handle 'ping' message
            io:format("Pong: Received ping. Sending pong back.~n"),
            From ! pong, % Send 'pong' response back to Ping
            pong(); % Continue receiving messages
        stop -> % Handle stop message
            io:format("Pong process: Finished!~n"),
            ok
    end.

Explanation:

Process Creation:
- The pong process is started first, waiting for messages.
- The ping process is started with a reference to the pong process and a counter for the number of rounds.
Message Exchange:
- start triggers the ping process to send the first {ping, Pid} message to pong.
- Upon receiving a ping, the pong process responds with a pong message to the sender’s PID.
- The ping process decrements the round counter and sends another ping until the counter reaches zero.
Process Termination:
- When the ping process finishes its rounds, it sends a stop message to the pong process, which then exits gracefully.

Ping: Sending message to Pong.
Pong: Received ping. Sending pong back.
Ping: Received pong! 4 rounds left.
Pong: Received ping. Sending pong back.
Ping: Received pong! 3 rounds left.
Pong: Received ping. Sending pong back.
Ping: Received pong! 2 rounds left.
Pong: Received ping. Sending pong back.
Ping: Received pong! 1 rounds left.
Pong: Received ping. Sending pong back.
Ping process: Finished!
Pong process: Finished!

This example demonstrates the core concurrency model in Erlang:

Lightweight processes.
Asynchronous message passing.
Fault isolation (processes operate independently).

Illustration of “Ping-Pong” implementation

ERTS: Error Handling and Fault Tolerance

The Erlang Runtime System (ERTS) is designed to make systems resilient to failures. It achieves this through process linking, supervision trees, and the “let it crash” philosophy. These mechanisms ensure that errors are contained and handled efficiently without affecting the entire system.

Process Linking: Coordinating Failure Responses

What It Is: A mechanism to link processes so they can react to each other’s failures.
How It Works:
- When a process crashes, it sends an exit signal to all linked processes.
- Linked processes either terminate in response (the default for abnormal exits) or trap exits and handle them as messages.

% Link to Pid and report its exit instead of crashing with it
watch(Pid) ->
    process_flag(trap_exit, true),  % Enable trapping exits
    link(Pid),                      % Link with another process
    receive
        {'EXIT', Pid, Reason} ->    % Handle the exit signal as a message
            io:format("Process ~p crashed: ~p~n", [Pid, Reason])
    end.

Benefits:
- Simplifies error detection and response.
- Encourages modular, loosely coupled design by isolating processes.

Cascading chain of error propagation between interlinked processes

Supervision Trees: Structured Recovery

What It Is: A hierarchical framework where supervisors monitor and manage processes (workers).
How It Works:
- Processes are grouped under supervisors.
- If a process fails, the supervisor takes predefined actions to restart it or its group.

Restart Strategies

one_for_one: Only the failed process is restarted.
one_for_all: All processes in the group are restarted.
rest_for_one: The failed process and any started after it are restarted.
simple_one_for_one: A dynamic strategy for managing similar processes, such as workers in a pool.

init([]) ->
    {ok, {{one_for_one, 3, 10},  % Restart strategy, intensity, period
          [
              {worker, {worker_module, start_link, []},
               permanent, 5000, worker, [worker_module]}
          ]}}.

Benefits:
- Isolates failures to specific branches, preventing cascading crashes.
- Ensures that applications remain operational despite failures.
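
To see a restart happen, the sketch below starts a supervisor with one worker, kills the worker, and waits until the supervisor reports a replacement. It is a minimal illustration under stated assumptions: the module name `sup_demo`, the trivial worker, and the polling helper are all hypothetical, and a real worker would normally be a gen_server.

```erlang
-module(sup_demo).
-behaviour(supervisor).
-export([run/0, init/1, start_worker/0]).

%% Supervisor callback: one permanent worker, at most 3 restarts in 10 seconds.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 3, period => 10},
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => permanent,
              shutdown => 5000,
              type => worker},
    {ok, {SupFlags, [Child]}}.

%% A trivial worker (illustrative only): an idle process linked to its supervisor.
start_worker() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.

worker_pid(Sup) ->
    [{worker, Pid, worker, _Modules}] = supervisor:which_children(Sup),
    Pid.

%% Poll until the supervisor reports a worker pid different from OldPid.
wait_for_restart(_Sup, _OldPid, 0) ->
    timeout;
wait_for_restart(Sup, OldPid, Retries) ->
    case worker_pid(Sup) of
        Pid when is_pid(Pid), Pid =/= OldPid -> Pid;
        _ -> timer:sleep(10), wait_for_restart(Sup, OldPid, Retries - 1)
    end.

run() ->
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    OldPid = worker_pid(Sup),
    exit(OldPid, kill),                        % simulate a crash
    NewPid = wait_for_restart(Sup, OldPid, 100),
    Result = {restarted, is_pid(NewPid) andalso is_process_alive(NewPid)},
    %% Stop the supervisor (and with it the worker) before returning.
    unlink(Sup),
    Ref = erlang:monitor(process, Sup),
    exit(Sup, shutdown),
    receive {'DOWN', Ref, process, Sup, _Reason} -> ok end,
    Result.
```

`sup_demo:run()` returns `{restarted, true}` and the node logs a supervisor report for the killed worker. The child specification uses the map syntax, which is equivalent to the tuple form shown above.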

Different restart strategies used by supervisors

“Let it Crash” Philosophy: Simplifying Error Handling

What It Means: Focus on cleanly crashing faulty processes rather than trying to fix them.
How It Works:
- Each process is lightweight and isolated, so a crash doesn’t affect others.
- Supervisors automatically restart crashed processes, ensuring system stability.
Advantages:
- Reduces complexity in error-handling code.
- Allows systems to recover quickly and consistently from unexpected issues.

By integrating process linking, supervision trees, and a “let it crash” philosophy, ERTS provides a powerful framework for building resilient, fault-tolerant systems. This design ensures that failures are handled gracefully while keeping the system operational.

ERTS: Concurrent Garbage Collection

The Erlang Runtime System (ERTS) collects garbage per process, so a collection in one process runs concurrently with the execution of all others. This design lets it handle millions of lightweight processes while keeping pauses short and responsiveness high.

Key Characteristics of ERTS Garbage Collection

Per-Process Garbage Collection:
- Each Erlang process has its own heap and stack.
- Garbage collection is performed independently for each process, meaning one process’s garbage collection does not affect others.
- This design ensures that garbage collection pauses are short and localized, avoiding global system pauses.
Generational Garbage Collection:
- Erlang employs a generational garbage collection model for processes:
  - Young Generation: Short-lived objects are collected quickly and frequently.
  - Old Generation: Objects that survive multiple collections are moved to the old generation and collected less often.
- This approach reduces the overhead of frequently scanning long-lived data.
Concurrent Collection:
- Only the process being collected is paused; all other processes continue running without interruption.
- This limits system-wide performance impact and supports low-latency operation.
Small Memory Footprint:
- Erlang processes are lightweight, starting with small heaps (233 words by default, under 2 KB on 64-bit systems) that grow dynamically as needed.
- This design ensures efficient memory utilization even with millions of processes.

When Garbage Collection is Triggered in ERTS

Heap Exhaustion:
- This is the most common trigger. When a process’s heap is full, garbage collection is initiated to reclaim unused memory.
Explicit Triggers:
- Developers can explicitly request garbage collection for a process using functions like erlang:garbage_collect/1. This can be useful in scenarios where we want to clean up memory proactively, such as after a memory-intensive operation.
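Process Reduction Count:
- Each process has a reduction count (a measure of the work it performs). Exhausting the reduction budget only causes the scheduler to preempt the process; it does not by itself start a collection. The work done by a collection is, however, charged to the process as reductions, so heavy collection counts against its time slice.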
After Process Hibernation:
- When a process is put into hibernation using erlang:hibernate/3, garbage collection is performed to minimize its memory footprint before suspending it.
Off-Heap Binary Pressure:
- Large binaries live outside process heaps and are reference-counted. When the amount of off-heap binary data referenced by a process exceeds its virtual binary heap limit, a collection is triggered so that unreferenced binaries can be released.
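
An explicit collection and its effect on one process can be observed with `erlang:process_info/2`. The sketch below is illustrative (the module name `gc_demo` and the returned map keys are assumptions): a throwaway process builds a large list, lets it become garbage, and reports its heap size in words before and after calling `erlang:garbage_collect/0`.

```erlang
-module(gc_demo).
-export([run/0]).

run() ->
    Parent = self(),
    Ref = make_ref(),
    spawn(fun() ->
              %% Build a large list, use it once, then let it become garbage.
              Sum = lists:sum(lists:seq(1, 100000)),
              {total_heap_size, Before} =
                  erlang:process_info(self(), total_heap_size),
              true = erlang:garbage_collect(),   % collects this process only
              {total_heap_size, After} =
                  erlang:process_info(self(), total_heap_size),
              {garbage_collection, Info} =
                  erlang:process_info(self(), garbage_collection),
              MinHeap = proplists:get_value(min_heap_size, Info),
              Parent ! {Ref, Sum, Before, After, MinHeap}
          end),
    receive
        {Ref, S, B, A, M} ->
            #{sum => S, heap_words_before => B,
              heap_words_after => A, min_heap_words => M}
    after 5000 ->
        timeout
    end.
```

The calling process is never paused by the collection in the spawned process; it simply waits for the result message. The reported sizes vary between OTP releases, so the numbers are best read as relative.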

Comparison with Traditional Garbage Collection

Aspect	ERTS Garbage Collection	Traditional Garbage Collection
Scope	Per-process	System-wide
Pause Time	Minimal, localized	Can cause significant global pauses
Concurrency	Concurrent	Often sequential
Isolation	Independent for each process	Shared across threads/processes
Suitability	Highly concurrent and distributed systems	General-purpose systems

ERTS’s concurrent garbage collection is a cornerstone of its ability to handle massive concurrency. By isolating memory management to individual processes, it ensures low latency, high scalability, and robust fault tolerance.

BEAM: The Execution Engine

The BEAM virtual machine is the core execution engine within the Erlang Runtime System (ERTS). While ERTS provides the overall environment for running Erlang/Elixir applications, BEAM focuses specifically on executing the bytecode and managing the concurrency model.

Key Responsibilities

Bytecode Interpretation and JIT Compilation: BEAM executes the platform-independent bytecode generated from Erlang/Elixir source code. Where the Just-In-Time (JIT) compiler (BeamAsm) is available, it translates the bytecode of each module into native machine code at load time; elsewhere the bytecode is interpreted.
Process Scheduling: BEAM employs a preemptive scheduler to ensure fair execution among the lightweight processes managed by ERTS. This prevents any single process from monopolizing resources and ensures responsiveness.
Optimized Instructions: BEAM offers specialized instructions tailored for functional programming paradigms, such as efficient pattern matching and tail-call optimization, contributing to the execution speed of Erlang/Elixir code.

How BEAM complements ERTS

Think of ERTS as the operating system for Erlang processes, providing the infrastructure (process creation, memory management, inter-process communication) and fault-tolerance mechanisms. BEAM, on the other hand, acts as the CPU within this operating system, executing the code within each process and managing their concurrent execution.

Feature	ERTS	BEAM
Scope	Broader runtime system	Execution engine within ERTS
Responsibilities	Process management, garbage collection, distribution, fault tolerance	Bytecode interpretation, JIT compilation, process scheduling
Abstraction Level	Higher-level, providing system services	Lower-level, focusing on code execution

In essence, ERTS sets the stage, and BEAM runs the show. ERTS provides the environment and resources, while BEAM executes the code within that environment, ensuring efficient and concurrent performance.

Portability

ERTS, along with the BEAM virtual machine, is designed with portability in mind. This means that Erlang/Elixir applications can run on a variety of platforms without requiring significant code modifications.

Platform Independence

ERTS achieves portability by abstracting platform-specific details. It is compatible with major operating systems, including:

Linux (most common for deployment)
macOS
Windows (limited in performance optimizations)
BSD Variants (e.g., FreeBSD, OpenBSD)
Embedded OS (e.g., Raspbian, VxWorks for IoT use cases)

Key Mechanisms for Portability

BEAM Bytecode: Platform-independent .beam files can run on any system with ERTS.
Hardware Abstraction: ERTS abstracts CPU architecture differences, using JIT compilation (BeamAsm) on x86-64 and AArch64 and falling back to the bytecode interpreter on other architectures.
Multi-core Support: ERTS schedulers adapt dynamically to the number of available cores, enabling scalability on modern hardware.

Interoperability

ERTS facilitates interoperability through its support for various communication protocols and mechanisms. This allows Erlang/Elixir applications to interact with systems written in other languages and technologies:

Ports: An external program runs as a separate OS process and exchanges data with ERTS, by default over standard I/O; inside the node, the port is addressed much like a lightweight process.
- Use Cases: Running shell commands, interfacing with non-Erlang programs.
NIFs (Native Implemented Functions): High-performance, native functions written in C or Rust, running within ERTS.
- Use Cases: Compute-heavy tasks like cryptography or image processing.
- Caution: A faulty NIF can crash the entire node, and a long-running NIF can block a scheduler.
C Nodes: Standalone programs in C act as full-fledged Erlang nodes, communicating via distributed protocols.
- Use Cases: Legacy system integrations, custom protocols.
Java and Python Integration: Tools like JInterface and Pyrlang enable connectivity with Java and Python ecosystems.
- Use Cases: Polyglot programming, leveraging Java or Python libraries.
Foreign Function Interfaces (FFI): Libraries like erl_interface bridge Erlang with external APIs or systems.
- Use Cases: Extending system functionality with external libraries.
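
Of these mechanisms, ports are the simplest to try. The following sketch assumes a Unix-like system where `echo` is available on the PATH; the module name `port_demo` is hypothetical. It starts the external program, gathers whatever it writes to standard output, and returns that together with the exit status.

```erlang
-module(port_demo).
-export([run/0]).

run() ->
    %% Assumes a Unix-like system where `echo` is available on the PATH.
    Port = open_port({spawn, "echo hello"}, [binary, exit_status]),
    collect(Port, <<>>).

%% Gather output chunks until the external program exits.
collect(Port, Acc) ->
    receive
        {Port, {data, Data}} ->
            collect(Port, <<Acc/binary, Data/binary>>);
        {Port, {exit_status, Status}} ->
            {Status, Acc}
    after 5000 ->
        timeout
    end.
```

Because the external program is a separate OS process, a crash in it cannot take the node down, which is the main reason to prefer a port over a NIF when raw speed is not critical.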

Ecosystem and Tooling

The Erlang Runtime System (ERTS) and its BEAM virtual machine are supported by a suite of tools designed to monitor, diagnose, and optimize applications during runtime:

eprof: A time profiler that reports how much execution time each function uses per process. Useful for identifying performance bottlenecks.
fprof: A trace-based profiler that produces detailed call-graph timings, at the cost of higher runtime overhead.
lcnt: Counts lock acquisitions and collisions inside ERTS, helping to diagnose lock contention in the runtime.
tprof: A unified profiler (OTP 27 and later) that measures call counts, call time, or memory allocation per function.
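
As a brief illustration of how these tools are driven, the sketch below profiles a small function with eprof. It assumes the `tools` application is installed with the Erlang distribution; the module name `profile_demo` and the workload are hypothetical.

```erlang
-module(profile_demo).
-export([run/0]).

%% Hypothetical workload used only to give the profiler something to measure.
work() ->
    lists:sum([X * X || X <- lists:seq(1, 100000)]).

run() ->
    eprof:start(),
    {ok, Result} = eprof:profile(fun work/0),   % run the fun under the profiler
    eprof:analyze(),                            % print call counts and time per function
    eprof:stop(),
    Result.
```

`profile_demo:run()` prints a table of the functions called by the workload and returns the workload's own result. Profiling adds noticeable overhead, so the timings are best used for comparison rather than as absolute figures.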

The complete picture

The synergy between the Erlang Runtime System (ERTS) and the BEAM virtual machine creates a robust, scalable, and fault-tolerant platform for running concurrent applications. This section summarizes their distinct roles, features, and how they complement each other to deliver unparalleled performance for Erlang and Elixir systems:

Aspect	ERTS (Erlang Runtime System)	BEAM (Bytecode Execution Engine)
Role	Provides the runtime environment for Erlang/Elixir applications.	Executes platform-independent BEAM bytecode and manages concurrency.
Key Features	– Lightweight process management – Per-process garbage collection – Fault tolerance with supervision trees – Distributed system support	– Bytecode interpretation and JIT compilation (BeamAsm) – Preemptive process scheduling – Optimized functional instructions
Scope	High-level runtime system managing processes, memory, and fault tolerance.	Execution engine within ERTS focusing on code execution.
Process Management	Creates and isolates processes with minimal overhead.	Schedules process execution efficiently with fairness.
Garbage Collection	Per-process generational garbage collection ensures minimal global pauses.	Delegates memory management to ERTS; no shared memory or global collection.
Fault Tolerance	Enables error isolation, recovery through supervision trees, and distributed resilience.	Ensures stable execution by isolating processes and managing lightweight concurrency.
Concurrency Model	Asynchronous message passing, process isolation, and lightweight processes enable high scalability.	Preemptive multitasking ensures fair resource allocation among processes.
Portability	Abstracts OS and hardware differences; supports Linux, macOS, Windows, and embedded platforms.	Runs platform-independent BEAM bytecode with hardware-specific JIT optimizations.
Interoperability	Supports Ports, NIFs, and external C Nodes for integration with other technologies.	Works with ERTS to interact seamlessly with external applications and distributed systems.
Tooling and Diagnostics	– Tools like `eprof`, `fprof`, `tprof` for profiling – `lcnt` for lock tracking – `etop` for runtime monitoring	Provides runtime execution metrics and performance insights through diagnostics in conjunction with ERTS.
Development Support	Includes OTP libraries for building robust applications with concurrency, fault tolerance, and networking.	Executes OTP-based abstractions, leveraging BEAM’s optimized execution strategies.

While ERTS acts as the broader runtime framework, BEAM focuses on efficient bytecode execution and concurrency management within the ERTS ecosystem.

Suitability for Our Programming Language

The Erlang Runtime System (ERTS) and the BEAM virtual machine offer a unique foundation for building highly concurrent, fault-tolerant, and distributed applications. Evaluating their suitability for our programming language involves considering their strengths and trade-offs in alignment with specific language goals and requirements.

Aspect	Strength	Trade-Off
Concurrency Model	Lightweight processes and preemptive scheduling enable massive concurrency with minimal overhead.	Not optimized for CPU-bound numeric computation; such work is often delegated to NIFs or external programs.
Fault Tolerance	Built-in process isolation, supervision trees, and “let it crash” philosophy ensure robust error recovery.	The supervision model requires careful design to avoid over-reliance on restarts for error handling.
Scalability	Supports millions of lightweight processes and dynamic workload distribution across multi-core CPUs.	Performance for single-threaded tasks may lag behind specialized single-threaded execution engines.
Garbage Collection	Per-process, generational garbage collection minimizes global pauses and ensures smooth execution.	Higher memory fragmentation due to isolated heaps.
Interoperability	Ports, NIFs, and C Nodes enable interaction with external programs and languages like C, Rust, Java, and Python.	NIFs require careful management to avoid crashes; interoperability layers may add complexity.
Hot Code Swapping	Dynamic code replacement allows updates without downtime, critical for long-running systems.	May introduce challenges in maintaining consistent state during updates.
Portability	Platform-independent BEAM bytecode and hardware abstraction enable deployment across a wide range of operating systems and architectures.	Optimizations and performance may vary across platforms; the runtime footprint can be too large for very constrained embedded targets.
Tooling and Ecosystem	Advanced runtime tools like `eprof`, `fprof`, and `recon` offer deep insights into system performance and behavior.	Tooling is heavily specialized for Erlang/Elixir and may require adaptation for non-Erlang-based languages.
Distributed Systems	Native support for distributed computing with seamless inter-node communication and clustering.	Distributed systems require careful design to handle edge cases like network partitions effectively.
Memory Management	Small per-process memory footprint with dynamic growth, suitable for high-concurrency scenarios.	May not handle extremely large heaps or long-lived data structures as efficiently as JVM.

ERTS and BEAM are exceptional choices if the language prioritizes concurrency, fault tolerance, and scalability, particularly in distributed systems. However, they may not be ideal for use cases requiring tight integration with computationally intensive tasks or very lightweight embedded systems.

To make an informed final decision on the ideal virtual machine for our programming language, let’s explore some real-life applications and use cases of each candidate.

Real-World Applications

V8

The V8 engine powers various applications across desktop, embedded systems, and distributed environments:

Environment	Applications
Desktop	– Electron Framework: Powers apps like VS Code, Slack, and Discord.
	– Web Browsers: Core of Google Chrome and Chromium-based browsers.
Embedded Systems	– Node.js: Enables lightweight server-side applications.
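	– Smart Devices: Chromium-based runtimes appear on smart TVs and IoT gateways; highly constrained devices typically use separate lightweight engines such as JerryScript instead of V8.
Distributed Systems	– Serverless Platforms: Runs the Node.js runtimes of AWS Lambda and Google Cloud Functions, and V8 isolates underpin Cloudflare Workers.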
	– Content Delivery: Utilized in tools like Next.js for global content rendering.

JVM

The Java Virtual Machine is a foundation for applications across diverse environments:

Environment	Applications
Desktop	– Enterprise Software: IDEs like IntelliJ IDEA and Eclipse.
	– Business Tools: ERPs and accounting software.
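Embedded Systems	– Android: Java and Kotlin code runs on the Android Runtime (ART), which executes DEX bytecode rather than JVM bytecode but builds on the Java language ecosystem; it powers billions of smartphones and IoT devices.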
	– Java Card: Secure execution for SIM cards and smartcards.
Distributed Systems	– Big Data: Core of Apache Hadoop and Apache Spark for distributed processing.
	– Enterprise Applications: Frameworks like Spring Boot for microservice architectures.

ERTS

The Erlang Runtime System demonstrates unparalleled fault tolerance and scalability in a variety of domains:

Environment	Applications
Desktop	– Desktop Applications: Wings 3D, a 3D modeler written in Erlang.
	– Interactive Tools: Livebook for real-time Elixir-based notebooks.
Embedded Systems	– Telecommunication Systems: Embedded in routers and network switches.
	– IoT Devices: Fault-tolerant runtime for real-time monitoring and control.
Distributed Systems	– Messaging Platforms: Backend for distributed services like WhatsApp.
	– Databases: Used in distributed solutions like CouchDB and Riak.
	– Telecom Networks: Handles real-time call switching and messaging.

Selecting the Best-Fit virtual machine from V8, JVM, and ERTS

Final comparison

To make an informed decision about the most suitable virtual machine (VM) for our programming language, a side-by-side comparison of the three contenders—V8, JVM, and ERTS—is essential. Each VM excels in specific domains, with distinct trade-offs. Here’s a comprehensive comparison:

Aspect	V8	JVM	ERTS
Performance	High performance from a multi-tier pipeline: the Ignition interpreter plus the Sparkplug, Maglev, and TurboFan JIT compilers.	Adaptive optimization with tiered JIT compilation (C1, C2, Graal).	Optimized for concurrent execution and message-passing, not single-threaded computation.
Concurrency Model	Event-driven, non-blocking architecture; JavaScript runs on a single thread per isolate, with parallelism available only through worker threads.	Supports multithreading with advanced concurrency models, including virtual threads (Java 21+).	Lightweight, isolated processes with preemptive scheduling, ideal for massive concurrency.
Fault Tolerance	Limited fault tolerance; depends on application-level design.	Robust error handling but requires explicit design for fault-tolerant systems.	Native fault tolerance with supervision trees and the “let it crash” philosophy.
Garbage Collection	Generational GC (Orinoco); optimized for low-latency applications like browsers.	Sophisticated GC algorithms (G1, ZGC, Shenandoah); optimized for large heaps and enterprise applications.	Per-process GC ensures minimal pauses and localized collection for lightweight processes.
Portability	Platform-independent but optimized for web and JavaScript-centric applications.	“Write once, run anywhere” philosophy with wide platform support.	Platform-independent BEAM bytecode; supports diverse OS and embedded platforms.
Interoperability	Seamless JavaScript integration; WebAssembly extends compatibility with other languages.	Extensive integration with Java and polyglot programming via GraalVM.	Supports Ports, NIFs, and C Nodes for interaction with C, Rust, Python, and Java.
Tooling and Ecosystem	Rich debugging and profiling tools (e.g., Chrome DevTools, Node.js Inspector).	Comprehensive tooling for development, debugging, and performance analysis (JFR, VisualVM, JMC).	Advanced runtime tools (eprof, fprof, recon) specialized for distributed and concurrent systems.
Scalability	Scales well for web and serverless applications; lacks support for extreme concurrency demands.	Scales well for enterprise and big data applications; may face overhead in extremely lightweight systems.	Scales to millions of concurrent processes with low memory overhead, ideal for distributed systems.
Hot Code Swapping	Not natively supported; requires application-level mechanisms.	Limited; the built-in HotSwap mechanism only replaces method bodies during debugging, so broader changes need tools like JRebel or a redeployment.	Fully supported; dynamic code replacement without downtime is a core feature.
Ease of Development	Web-centric ecosystem simplifies development for web, serverless, and hybrid applications.	Mature ecosystem with extensive libraries and frameworks; steeper learning curve for new language runtimes.	Simplifies distributed and fault-tolerant application design but requires familiarity with OTP principles.
Embedded Systems	Not directly optimized but lightweight derivatives (e.g., JerryScript) can be adapted.	Adaptable for embedded systems; Java-derived runtimes power Java Card and Android, although Android's ART is not a standard JVM.	Adaptable for IoT and telecom systems; small memory footprint enables embedded applications.
Distributed Systems	Widely used in serverless platforms (AWS Lambda, Google Cloud Functions).	Powers big data frameworks (Hadoop, Spark); supports microservices with Spring Boot.	Native support for clustering, messaging, and distributed databases; excels in real-time systems.

In essence:

V8 is ideal for web and serverless applications, with its strong JavaScript and WebAssembly ecosystem. However, its single-threaded JavaScript execution (parallelism requires workers) and limited fault tolerance may restrict its suitability for computationally intensive or fault-tolerant systems.
JVM excels in enterprise, big data, and general-purpose applications, offering a rich ecosystem and robust performance tuning options. However, its relatively high resource usage and complex tooling may not be ideal for lightweight systems.
ERTS is unmatched for high-concurrency, fault-tolerant, and distributed systems, making it the best choice for scenarios requiring scalability, resilience, and real-time processing. Its specialized nature may require significant adaptation for general-purpose programming.

Our Decision: ERTS as the Foundation

After thoroughly evaluating V8, JVM, and ERTS, we have chosen to base our programming language on ERTS. Its versatility across multiple domains—desktop, embedded systems, web, and distributed applications—makes it the ideal platform to meet our goals.

Key Reasons for Choosing ERTS:

Scalability and Concurrency: With lightweight processes and preemptive scheduling, ERTS can handle millions of concurrent tasks efficiently, ideal for high-demand and real-time applications.
Fault Tolerance: ERTS’s “let it crash” philosophy, combined with supervision trees, provides a robust framework for building resilient and self-healing systems.
Portability: ERTS supports platform-independent BEAM bytecode and works seamlessly on diverse platforms, including Linux, macOS, Windows, and embedded systems.
Distributed Systems Support: Native capabilities for clustering, inter-node communication, and fault isolation make ERTS perfect for building modern distributed architectures.
Proven Flexibility Across Domains:
- Web Development: The Elixir language and its Phoenix framework, built on ERTS, demonstrate exceptional performance and developer productivity for web applications.
- Desktop Applications: Livebook, an interactive notebook for Elixir, showcases the potential for desktop tools built on ERTS.
- Embedded Systems: ERTS runs soft real-time IoT applications, bringing fault tolerance and scalability to constrained devices.
- Distributed Systems: WhatsApp and telecom networks rely on ERTS for their backend scalability and fault tolerance.
Inspirations from Other Languages: ERTS has proven to be an excellent foundation for innovative languages like Elixir, which pairs functional programming and Lisp-style macros with the concurrency model of BEAM.
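
To make the "let it crash" idea from the reasons above more concrete, here is a minimal sketch in Python. It is not Erlang/OTP code and not how ERTS is implemented; `CounterWorker` and `supervise` are hypothetical names chosen for illustration. The worker contains no defensive error handling, and a separate supervisor replaces it with a fresh instance when it fails, giving up once a restart limit is exceeded.

```python
class CounterWorker:
    """Hypothetical worker: keeps private state and does no defensive checks."""

    def __init__(self):
        self.total = 0  # state is private to this worker instance

    def handle(self, msg):
        # A non-numeric message raises TypeError here: the worker just crashes.
        self.total += msg
        return self.total


def supervise(worker_factory, messages, max_restarts=3):
    """Deliver messages to a worker, replacing it with a fresh one on failure.

    Loosely mirrors restarting a single crashed child. Assumption: restarts
    are counted over the whole run, not within a time window as in OTP.
    """
    worker = worker_factory()
    restarts = 0
    results = []
    for msg in messages:
        try:
            results.append(worker.handle(msg))
        except Exception as exc:
            restarts += 1
            if restarts > max_restarts:
                # Too many failures: the supervisor itself gives up.
                raise RuntimeError("restart limit exceeded") from exc
            worker = worker_factory()  # fresh worker, clean state
    return results, restarts


if __name__ == "__main__":
    # The bad message crashes the worker; the replacement starts from zero.
    print(supervise(CounterWorker, [1, 2, "boom", 5]))  # ([1, 3, 5], 1)
```

The design point is the separation of concerns: the worker only describes the happy path, while recovery policy lives in the supervisor. In ERTS the workers would be isolated processes with their own heaps and mailboxes, and supervisors can themselves be supervised, forming the supervision trees mentioned above.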

By leveraging the strengths of ERTS, we aim to create a programming language that excels in scalability, resilience, and flexibility, empowering developers across diverse domains to tackle complex, distributed, and high-performance systems.

Conclusion

In the ever-evolving landscape of software development, selecting the right virtual machine is a foundational decision that profoundly impacts a programming language’s capabilities, performance, and adaptability. After a thorough exploration of the leading contenders—V8, JVM, and ERTS—we identified that each VM offers unique strengths tailored to specific domains:

V8 excels in web-centric applications and lightweight server-side use cases, with a strong emphasis on JavaScript and WebAssembly performance.
JVM is a powerhouse for enterprise-grade applications, big data processing, and cross-platform compatibility, thanks to its mature ecosystem and sophisticated tooling.
ERTS shines in distributed systems, real-time applications, and highly concurrent architectures, with fault tolerance and resilience at its core.

Ultimately, ERTS emerged as the ideal choice for our programming language. Its lightweight processes, fault-tolerant design, and scalability make it a robust foundation for modern software, whether in desktop tools, embedded systems, web services, or distributed applications. Inspired by successful ecosystems like Elixir, Phoenix, and even the functional roots of Lisp, we are confident in ERTS’s ability to support our vision.

A promising future lies in the potential integration of WebAssembly (WASM) with Elixir or Erlang, enabling BEAM processes to run directly in the browser. Because WebAssembly still reaches browser APIs such as the DOM through JavaScript glue code, this would not remove JavaScript overnight, but it could sharply reduce reliance on it for application logic and pave the way for a unified ecosystem that spans backend to frontend.

By harnessing the power of ERTS, we aim to deliver a language that not only addresses current challenges in software development but also provides a solid platform for innovation and growth in the years to come. This journey reflects our commitment to building reliable, scalable, and versatile tools for developers worldwide.

2 responses to “Building a Modern Language: Selecting the Best-Fit VM Among V8, JVM, and ERTS”

Hela Ben Khalfallah

December 4, 2024

Presentation:
https://docs.google.com/presentation/d/1mWwq_5BNZgNf2vZoMeRtqepduMbdTcmUo6jVoXNvgp8/edit?usp=sharing
