Understanding the JVM architecture and how Java really works under the hood helps every Java developer make effective use of the Java ecosystem. In this blog post series, we will cover the foundations of JVM internals and dive into its architecture.
Introduction
When Java was created in 1995, it was modeled after C++. The two languages have some similarities and some differences, and of course the syntax is not identical. However, the main difference lies in how the code is executed. C++ is compiled directly into machine code.
When you compile a C++ source file, the compiler converts it directly into machine code for that particular hardware. Once that is done, the computer can execute the program.
In Java, programs are not compiled into native executable files as they are in C or C++. Instead, the Java compiler (javac) compiles your programs into bytecode, which the JVM (Java Virtual Machine) then executes at run time.
What is JVM?
The Java Virtual Machine is an abstract computing machine. Like a real computing machine, it has an instruction set and manipulates various memory areas at run time. The JVM is the cornerstone component responsible for Java’s platform independence.
The Java Virtual Machine provides the run-time environment needed for Java to work on virtually any computer. Once a Java program is compiled into bytecode, the JVM can interpret that bytecode or translate it into machine code for the host.
JVM functions as an operating system to the Java programs written to execute in it. It translates the instructions from its running programs into instructions and commands that run on the local operating system.
The JVM knows nothing about the Java programming language. It only understands a particular binary format, the class file format. A class file contains Java Virtual Machine instructions (or bytecodes) and a symbol table, as well as other ancillary information.
Java Virtual Machine imposes strong syntactic and structural constraints on the code in a class file.
Java source code is compiled into bytecode when we use the javac compiler. Compiled code to be executed by the Java Virtual Machine is represented using a hardware- and operating system-independent binary format known as the class file format. The class file format precisely defines the representation of a class or interface, including details such as byte ordering that might be taken for granted in a platform-specific object file format.
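To make the class file format a little more concrete, here is a minimal sketch that reads the first eight bytes of its own compiled class file: the magic number 0xCAFEBABE followed by the minor and major version. The class name ClassFileHeader and the helper readHeader are made up for this illustration, and the sketch assumes the class was loaded from a directory or jar so that its .class file is visible as a resource.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    // Returns {magic, minorVersion, majorVersion} from the class file of cls.
    static int[] readHeader(Class<?> cls) {
        // Absolute resource path of the class file, e.g. "/pkg/Name.class".
        String resource = "/" + cls.getName().replace('.', '/') + ".class";
        try (InputStream in = cls.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            // DataInputStream reads big-endian values, which matches the
            // byte ordering fixed by the class file format.
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();
            int minor = data.readUnsignedShort();
            int major = data.readUnsignedShort();
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] h = readHeader(ClassFileHeader.class);
        System.out.printf("magic=0x%08X minor=%d major=%d%n", h[0], h[1], h[2]);
    }
}
```

The major version printed depends on the JDK used to compile the class, so it will differ from machine to machine.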
When the program runs, the JVM executes the bytecode, either by interpreting it or by converting frequently executed parts into machine code with the just-in-time (JIT) compiler. The resulting machine code is kept in memory and executed directly.
How Does the JVM Work?
JVM is only a specification, and its implementation is different from vendor to vendor. For now, let’s understand the commonly-accepted architecture of JVM as defined in the specification.
Classloader
The classloader is a subsystem of the JVM that loads class files. Whenever we run a Java program, its classes are first loaded by the classloader. There are three built-in classloaders in Java.
1. Bootstrap ClassLoader
This is the first classloader and the parent of the Extension classloader. It loads the rt.jar file (present in Java 8 and earlier), which contains the class files of Java Standard Edition, such as the java.lang, java.net, java.util, java.io, and java.sql package classes.
2. Extension ClassLoader
This is the child classloader of the Bootstrap classloader and the parent classloader of the System classloader. It loads the jar files located inside the $JAVA_HOME/jre/lib/ext directory.
3. System/Application ClassLoader
This is the child classloader of the Extension classloader, also known as the Application classloader. It loads the class files from the classpath. By default, the classpath is set to the current directory. You can change the classpath using the “-cp” or “-classpath” switch.
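The parent/child chain described above can be inspected at run time. The following is a small sketch, with LoaderChain and chainOf as names invented for this example. It walks getParent() from a given loader until it reaches null, which is how the Java API represents the bootstrap classloader. The concrete loader class names printed vary between Java versions.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Lists the class names of a loader and all its parents, ending with
    // the label "bootstrap" for the loader that the API reports as null.
    static List<String> chainOf(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        names.add("bootstrap");
        return names;
    }

    public static void main(String[] args) {
        // Chain for application classes, starting at the system classloader.
        System.out.println(chainOf(ClassLoader.getSystemClassLoader()));
        // Core classes such as String report a null loader (bootstrap).
        System.out.println(chainOf(String.class.getClassLoader()));
    }
}
```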
Run-Time Data Areas
The Java Virtual Machine defines various run-time data areas that are used during the execution of a program. Some of these data areas are created on Java Virtual Machine start-up and are destroyed only when the Java Virtual Machine exits. Other data areas are per thread. Per-thread data areas are created when a thread is created and destroyed when the thread exits.
1. The pc Register
The Java Virtual Machine can support many threads of execution at once, and each Java Virtual Machine thread has its own program counter (pc) register.
If the method currently being executed by the thread is not native, the pc register contains the address of the Java Virtual Machine instruction currently being executed. If the method is native, the value of the pc register is undefined. (A native method is one whose implementation is written in a language other than Java, such as C or C++, and linked to Java through an interface such as JNI or JNA.)
2. Java Virtual Machine Stacks
Each Java Virtual Machine thread has a private Java Virtual Machine stack, created at the same time as the thread. A Java Virtual Machine stack stores frames. A frame is used to store data and partial results, as well as to perform dynamic linking, return values for methods, and dispatch exceptions. A new frame is created each time a method is invoked. A frame is destroyed when its method invocation completes, whether that completion is normal or abrupt.
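One rough way to see that each invocation gets its own frame is to count stack trace elements. This sketch uses the made-up names FrameDepth, depth, and nested. It assumes the JVM records full stack traces, which is the usual default but is not guaranteed by the API.

```java
class FrameDepth {
    // Number of frames on the current thread's stack, as seen by a
    // freshly created Throwable (includes the frame of depth() itself).
    static int depth() {
        return new Throwable().getStackTrace().length;
    }

    // Adds exactly one extra method invocation, hence one extra frame.
    static int nested() {
        return depth();
    }

    public static void main(String[] args) {
        System.out.println("direct call: " + depth());
        System.out.println("via nested(): " + nested());
    }
}
```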
If the computation in a thread requires a larger Java Virtual Machine stack than is permitted, the Java Virtual Machine throws a StackOverflowError.
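A quick way to observe this limit is unbounded recursion. The sketch below, in which StackDepthProbe is a name invented for this example, recurses until the JVM throws StackOverflowError and reports how many frames it managed to push. The exact number depends on the JVM, its settings, and the frame size, so no particular value should be expected.

```java
class StackDepthProbe {
    private static int depth;

    // Each call pushes one more frame onto the current thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the stack is exhausted and returns the depth reached.
    static int maxDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The stack has been unwound by the time we get here.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + maxDepth());
    }
}
```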
3. Heap
All objects, their instance variables, and arrays are stored in the heap.
The Java Virtual Machine has a heap that is shared among all Java Virtual Machine threads. The heap is the run-time data area from which memory for all class instances and arrays is allocated.
The heap is created on virtual machine start-up. Heap storage for objects is reclaimed by an automatic storage management system (known as a garbage collector); objects are never explicitly deallocated.
The heap may be of a fixed size or may be expanded as required by the computation and may be contracted if a larger heap becomes unnecessary. The memory for the heap does not need to be contiguous.
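The current heap size and its upper bound can be read through java.lang.Runtime. This is a small sketch in which HeapStats and usedBytes are names chosen for the example. The numbers it prints depend on the JVM and on options such as -Xms and -Xmx.

```java
class HeapStats {
    // Approximate number of heap bytes currently in use.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory: the most the JVM will try to use for the heap.
        System.out.println("max heap:     " + rt.maxMemory());
        // totalMemory: heap currently reserved; may grow or shrink.
        System.out.println("current heap: " + rt.totalMemory());
        System.out.println("used:         " + usedBytes());
    }
}
```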
4. Method Area
The method area stores per-class structures such as the run-time constant pool, field and method data, and the code for methods and constructors, including the special methods used in class and instance initialization and interface initialization.
The method area is created on virtual machine start-up. Although the method area is logically part of the heap, simple implementations may choose not to either garbage collect or compact it.
5. Run-Time Constant Pool
The run-time constant pool for a class or interface is constructed when the class or interface is created.
It is a per-class or per-interface run-time representation of the constant_pool table in a class file. Each run-time constant pool is allocated from the Java Virtual Machine’s method area.
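String literals are one visible effect of the run-time constant pool: a literal in a class resolves to a single shared String instance. The sketch below, with ConstantPoolStrings as a name invented for the example, compares references rather than contents to show this.

```java
class ConstantPoolStrings {
    // Two occurrences of the same literal resolve to the same object.
    static boolean sameLiteral() {
        String a = "jvm";
        String b = "jvm";
        return a == b;
    }

    // new String(...) always allocates a distinct object on the heap.
    static boolean newStringIsDistinct() {
        String literal = "jvm";
        String copy = new String("jvm");
        return literal != copy;
    }

    // intern() maps the copy back to the shared literal instance.
    static boolean internRestoresIdentity() {
        return new String("jvm").intern() == "jvm";
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral());
        System.out.println(newStringIsDistinct());
        System.out.println(internRestoresIdentity());
    }
}
```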
6. Native Method Stacks
The Java Virtual Machine may use conventional stacks, colloquially called “C stacks,” to support native methods, that is, methods written in a language other than the Java programming language.
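Whether a method is native can be checked through reflection. In this sketch, NativeCheck and isNative are illustrative names. System.arraycopy is used as the example because it is declared native in OpenJDK; other class library implementations may declare it differently.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class NativeCheck {
    // Reports whether the named method carries the "native" modifier.
    static boolean isNative(Class<?> owner, String name, Class<?>... params) {
        try {
            Method m = owner.getDeclaredMethod(name, params);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: declared native in OpenJDK's java.lang.System.
        System.out.println("System.arraycopy native? " + isNative(
                System.class, "arraycopy",
                Object.class, int.class, Object.class, int.class, int.class));
        // An ordinary method implemented in Java.
        System.out.println("String.length native? "
                + isNative(String.class, "length"));
    }
}
```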
Execution Engine
The execution engine executes the bytecode contained in .class files. It reads the bytecode instruction by instruction, uses data and information present in the various memory areas, and executes the instructions.
1. Interpreter
It interprets the bytecode instruction by instruction and then executes it. The disadvantage is that when a method is called multiple times, it has to be interpreted again on every call.
2. Just-In-Time Compiler(JIT)
The just-in-time (JIT) compiler is a code generator that produces platform-specific instructions only after Java Virtual Machine code has been loaded.
The JIT runs after the program has started and compiles code (usually bytecode or some other kind of VM instructions) on the fly into a form that is usually faster, typically the host CPU’s native instruction set. A JIT has access to dynamic run-time information that a conventional ahead-of-time compiler does not, so it can make better optimizations, such as inlining functions that are called frequently.
Conventional compilers, by contrast, build the whole program into an executable file before the first time you run it.
Purely interpreted code, including Java bytecode without a JIT, is slow. Instead of interpreting the bytecode every time a method is invoked, a JIT compiler compiles it into the machine code instructions of the running machine and then invokes that compiled code. Ideally, the speed of running compiled code outweighs the cost of recompiling the program every time it runs.
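To get a feel for the JIT at work, the sketch below calls a small method many times so that it becomes a likely compilation candidate, then asks the standard CompilationMXBean what the JVM reports about its compiler. JitProbe and sumTo are names invented here, and the loop counts are arbitrary. Whether and when compilation actually happens is up to the JVM.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitProbe {
    // A small, frequently called method: a typical JIT candidate.
    static long sumTo(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 20_000; i++) {
            total += sumTo(10_000);
        }
        System.out.println("total = " + total);

        // May be null if this JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            System.out.println("no JIT compiler reported");
        } else {
            System.out.println("JIT compiler: " + jit.getName());
            if (jit.isCompilationTimeMonitoringSupported()) {
                System.out.println("time spent compiling (ms): "
                        + jit.getTotalCompilationTime());
            }
        }
    }
}
```

On HotSpot, running the same program with the -Xint option disables the JIT, which makes the difference in run time easy to compare.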
3. Garbage Collector
It reclaims the memory of objects that are no longer referenced.
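Reachability can be observed with a WeakReference, which does not by itself keep its target alive. In this sketch, GcReachability is a name made up for the example. Note that System.gc() is only a hint, so the second result printed by main is typical rather than guaranteed.

```java
import java.lang.ref.WeakReference;

class GcReachability {
    // While a strong reference exists, the object must not be collected.
    static boolean survivesWhileReferenced() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc();
        return weak.get() == strong;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("alive while referenced: "
                + survivesWhileReferenced());

        // No strong reference is kept, so the object is eligible for GC.
        WeakReference<Object> weak = new WeakReference<>(new Object());
        for (int i = 0; i < 10 && weak.get() != null; i++) {
            System.gc(); // a request, not a command
            Thread.sleep(50);
        }
        System.out.println("cleared once unreachable: " + (weak.get() == null));
    }
}
```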
Java Virtual Machine Startup
The Java Virtual Machine starts execution by invoking the method main of some specified class or interface, passing it a single argument, which is an array of strings.
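A minimal entry point looks like the following sketch. EntryPoint and describe are illustrative names. Only the signature of main is fixed by the platform.

```java
class EntryPoint {
    // Summarizes the command-line arguments received by main.
    static String describe(String[] args) {
        return args.length + " argument(s): " + String.join(", ", args);
    }

    // The JVM invokes this method and passes the command-line
    // arguments as a single String array.
    public static void main(String[] args) {
        System.out.println(describe(args));
    }
}
```

Running it as "java EntryPoint a b" would pass the two strings a and b in the args array.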
The Java Virtual Machine dynamically loads, links, and initializes classes and interfaces.
Loading is the process of finding the binary representation of a class or interface type with a particular name and creating a class or interface from that binary representation.
Linking is the process of taking a class or interface and combining it into the run-time state of the Java Virtual Machine so that it can be executed.
Initialization of a class or interface consists of executing the class or interface initialization method.
There are two kinds of class loaders: the bootstrap class loader supplied by the Java Virtual Machine, and user-defined class loaders.
Load the Class
The initial attempt to execute the method main of class Test discovers that the class Test is not loaded – that is, that the Java Virtual Machine does not currently contain a binary representation for this class.
The Java Virtual Machine then uses a class loader to attempt to find such a binary representation. If this process fails, then an error is thrown.
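The same lookup can be triggered explicitly from code. This sketch asks a class loader for a class by name and reports whether a binary representation was found. LoadProbe and canLoad are hypothetical names, and the class name no.such.Missing is deliberately nonexistent. When loading is requested through Class.forName, a failure surfaces as a ClassNotFoundException.

```java
class LoadProbe {
    // Tries to load (but not initialize) the class with the given name.
    static boolean canLoad(String binaryName) {
        try {
            Class.forName(binaryName, false, LoadProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            // No class loader in the chain could find the class.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canLoad("java.lang.String"));
        System.out.println(canLoad("no.such.Missing"));
    }
}
```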
Link Test: Verify, Prepare, (Optionally) Resolve
After the class Test is loaded, it must be initialized before main can be invoked, and it must be linked before it is initialized. Linking involves verification, preparation, and (optionally) resolution.
Verification checks that the loaded representation of a class is well-formed, with a proper symbol table. Verification also checks that the code that implements Test obeys the semantic requirements of the Java programming language and the Java Virtual Machine. If a problem is detected during verification, then an error is thrown.
Verification ensures that the binary representation of a class or interface is structurally correct. For example, it checks that every instruction has a valid operation code; that every branch instruction branches to the start of some other instruction, rather than into the middle of an instruction; that every method is provided with a structurally correct signature; and that every instruction obeys the type discipline of the Java Virtual Machine language. If an error occurs during verification, then an instance of VerifyError, a subclass of class LinkageError, will be thrown at the point in the program that caused the class to be verified.
Preparation involves the allocation of static storage and any data structures that are used internally by the implementation of the Java Virtual Machine, such as method tables.
Resolution is the process of checking symbolic references from Test to other classes and interfaces, by loading the other classes and interfaces that are mentioned and checking that the references are correct.
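That loading and linking are separate from initialization can be observed with the three-argument form of Class.forName, whose second parameter controls whether the class is initialized. In this sketch, LinkWithoutInit, Lazy, and observe are names invented for the illustration. The static initializer of Lazy sets a flag so we can tell when it has run.

```java
class LinkWithoutInit {
    static boolean initialized = false;

    static class Lazy {
        // Runs only when Lazy is initialized, not when it is merely loaded.
        static {
            initialized = true;
        }
    }

    // Returns {flag after loading only, flag after initialization}.
    static boolean[] observe() {
        try {
            // A class literal does not trigger initialization.
            String name = Lazy.class.getName();
            ClassLoader loader = LinkWithoutInit.class.getClassLoader();

            Class.forName(name, false, loader); // load, do not initialize
            boolean afterLoad = initialized;

            Class.forName(name, true, loader);  // now initialize
            boolean afterInit = initialized;

            return new boolean[] {afterLoad, afterInit};
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = observe();
        System.out.println("after load: " + r[0] + ", after init: " + r[1]);
    }
}
```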
Initialize: Execute Initializers
In our continuing example, the Java Virtual Machine is still trying to execute the method main of class Test. This is permitted only if the class has been initialized. Initialization consists of the execution of any class variable initializers and static initializers of the class.
Before initializing a class, its direct superclass must be initialized, as well as the direct superclass of its direct superclass, and so on, recursively.
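The superclass-first rule is easy to observe with static initializers that record when they run. In this sketch, InitOrder, Parent, and Child are illustrative names. Calling a static method on Child triggers its initialization, which in turn forces Parent to be initialized first.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrder {
    // Records the order in which static initializers run.
    static final List<String> LOG = new ArrayList<>();

    static class Parent {
        static {
            LOG.add("Parent");
        }
    }

    static class Child extends Parent {
        static {
            LOG.add("Child");
        }

        // Invoking a static method triggers initialization of Child.
        static void touch() {
        }
    }

    static List<String> run() {
        Child.touch();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run()); // Parent is logged before Child
    }
}
```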
Invoke
Finally, after initialization of class Test is complete, the method main is invoked.