Merikanto

一簫一劍平生意,負盡狂名十五年

Java - The Language Design

Today let us briefly talk about the language design and unique features of Java. We will compare Java with other languages, and also go through the JVM and the security design & implementation. We will see that Java is a modern language designed to address three pillars at once:

Portability, Speed, and Security.

Java is a statically typed, late-binding language.


Historically, the speed and portability requirements of a language have been mutually exclusive (for the most part). Fast languages usually get their speed by binding themselves to particular platforms, so portability becomes a problem. Safe languages were generally not portable either.



A Virtual Machine

Java is both a compiled & interpreted language. While C / C++ source code is reduced to native instructions for a particular processor model, Java source is compiled to a universal format — bytecode (instructions for a virtual machine).

Compiled bytecode is then executed by a Java runtime interpreter. The runtime system does the hardware processor’s job, but in a safe, virtual environment:

  • Executes a stack-based instruction set, and manages memory like the OS
  • Creates & manipulates primitive data types, and loads & invokes newly referenced blocks of code

The runtime system abides by a strictly defined open specification that can be implemented by anyone who wants to produce a Java-compliant virtual machine.

Together, the virtual machine and language definition provide a complete specification.

Interpretation:

There are no features of the base Java language left undefined, or implementation-dependent.

E.g. Java specifies the size & properties of all primitive data types, rather than leaving it up to the platform implementation.
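As a small illustration of that point, the sketch below prints the bit widths that the language fixes for a few primitive types, and checks that int overflow wraps around in a defined way. The class name PrimitiveSizes and the helper overflowWraps are made up for this example; the SIZE constants are standard library fields.

```java
class PrimitiveSizes {
    // Adding 1 to the largest int wraps to the smallest int on every platform.
    static boolean overflowWraps() {
        int max = Integer.MAX_VALUE;
        return max + 1 == Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.SIZE + " bits");
        System.out.println("short: " + Short.SIZE + " bits");
        System.out.println("char:  " + Character.SIZE + " bits");
        System.out.println("int:   " + Integer.SIZE + " bits");
        System.out.println("long:  " + Long.SIZE + " bits");
        System.out.println("int overflow wraps: " + overflowWraps());
    }
}
```

The same numbers should come out on any compliant JVM, whatever the underlying processor.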


The graph below shows the Java runtime environment.


The fundamental unit of Java code is the class: an application component that holds executable code & data. Compiled Java classes are in a universal binary format that contains Java bytecode & other class information.

Apart from the platform-specific runtime system, Java also has a number of fundamental classes that contain architecture-dependent native methods. These native methods serve as the gateway between the JVM & the real world. They are implemented separately for each OS, and provide low-level access to resources such as the network & the host filesystem. Still, the vast majority of Java is written in Java itself, including the Java compiler & the GUI libraries.

The Java compiler (javac) compiles source code to portable bytecode, and the JVM improves the performance of that bytecode via JIT (Just-In-Time), or dynamic, compilation. With JIT, Java can execute almost as fast as native code, while maintaining portability and security.

For the sake of security, there is only one intrinsic performance penalty that compiled Java code suffers at runtime: array bounds checking (to prevent overflows). Everything else can be optimized to native code, just as it can with a statically compiled language.
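To make the bounds check concrete, here is a minimal sketch that reads one element past the end of an array and reports whether the runtime rejected it. BoundsCheckDemo and readPastEnd are hypothetical names used only for this illustration.

```java
class BoundsCheckDemo {
    // Returns true if the out-of-range read was rejected by the runtime.
    static boolean readPastEnd(int[] data) {
        try {
            int ignored = data[data.length]; // one past the last valid index
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + readPastEnd(new int[4]));
    }
}
```

In C the same read would silently return whatever happens to sit next to the array in memory.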

The problem with traditional JIT is that optimizing code takes time. A JIT compiler can produce decent results, but may suffer significant latency when the application starts up. This isn’t a problem for long-running server-side applications, but it is problematic for client-side apps with limited capabilities. Hence a Java compiler technology called HotSpot uses adaptive compilation to address this issue.

If we look at what programs spend their time doing, they actually spend almost all their time executing a relatively small part of the code again and again. The chunk of code that is executed repeatedly may be only a small fraction of the total program, but its behavior determines the program’s overall performance.

Also, adaptive compilation allows the Java runtime to take advantage of new kinds of optimizations that simply can’t be done in a statically compiled language, hence Java can run faster than C / C++ in some cases.

HotSpot profiles the code to see which parts are being executed repeatedly. Once it knows, it compiles those sections into optimized native machine code. The rest of the program just gets interpreted, to save memory & time. In fact, the JVM runs in one of two modes:

  • Client: Focus on quick startup time & memory conservation
  • Server: Focus on speeding up performance (but slow startup)

The trade-off: time spent compiling (slower startup) vs. speed at runtime.
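A rough way to watch this happen is to call a small method many times, and then ask the JVM how long its JIT compiler has been busy. The sketch below does that through the standard java.lang.management API. HotLoopSketch and sumOfSquares are made-up names, and whether the loop actually gets compiled is up to the JVM, so the reported time will vary from run to run.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class HotLoopSketch {
    // A deliberately small method that is called many times, so a profiling
    // JIT is likely (not guaranteed) to compile it to native code.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (int round = 0; round < 20_000; round++) {
            checksum += sumOfSquares(1_000);
        }
        System.out.println("checksum = " + checksum);

        // May be null on a VM that has no compiler at all.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit != null && jit.isCompilationTimeMonitoringSupported()) {
            System.out.println(jit.getName() + " spent "
                    + jit.getTotalCompilationTime() + " ms compiling");
        }
    }
}
```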

Since the release of Java 5, the core classes are stored persistently in an optimized form, as a shared & read-only archive (class data sharing), which shortens startup time.



Comparing with Other Languages

Comparison based on portability, speed, security:

  • Java’s basic syntax looks like C / C++, but that’s where the similarities end
  • C# is Microsoft’s answer to Java (C# uses virtual machine, bytecode, sandbox)
  • C trades functionality for portability; Java initially traded speed for portability
  • Smalltalk is compiled to an interpreted bytecode format, and can be dynamically compiled to native code. However, Java improves the design by using a bytecode verifier to ensure the correctness of compiled Java code, which requires fewer runtime checks.

Most scripting languages are not well suited for serious, large-scale programming. Apart from speed, another problem is that:

They are rather casual about program structure and data typing. Also, they have simplified type systems, and generally don’t provide sophisticated scoping of variables & functions.

Fundamental tradeoff:

Scripting languages were born as loose, less structured alternatives to systems programming languages, and are generally not ideal for large / complex projects.



Design Security

Java class loader: The bytecode loading mechanism of the Java interpreter


Simplicity

Java doesn’t allow programmer-defined operator overloading (e.g. redefining the meaning of basic symbols like +, -). Java doesn’t have a source code preprocessor, so it doesn’t have macros, #define statements, or conditional source compilation, since these constructs exist primarily to support platform dependencies.

Java supports only a single-inheritance class hierarchy, but allows multiple inheritance of interfaces (implements). An interface (like a pure abstract class in C++) specifies the behavior of an object, without defining its implementation. Interfaces in Java eliminate the need for multiple inheritance of classes.
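A minimal sketch of that rule: one class with a single superclass and two interfaces. All the names here (Animal, Duck, Swimmer, Flyer) are invented for the example.

```java
class InheritanceSketch {
    public static void main(String[] args) {
        Duck duck = new Duck();
        Swimmer s = duck; // the same object seen through each interface
        Flyer f = duck;
        System.out.println(duck.breathe() + ", " + s.swim() + ", " + f.fly());
    }
}

interface Swimmer { String swim(); }
interface Flyer   { String fly(); }

class Animal {
    String breathe() { return "breathing"; }
}

// One superclass (Animal), but two interfaces.
class Duck extends Animal implements Swimmer, Flyer {
    @Override public String swim() { return "paddling"; }
    @Override public String fly()  { return "flapping"; }
}
```

Writing "extends Animal, Bird" instead would be rejected by the compiler.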


Type Safety & Method Binding

Type Checking

Generally, languages are classified as static or dynamic. For static languages, information about variables is known at compile time, while for dynamic languages, it is only known at runtime.

In a strictly statically typed language like C / C++, data types are firmly established when the source code is compiled. Hence the compiler has enough information to catch many errors before the code is executed. In contrast, dynamic languages only perform type checking at runtime. This allows more complex & powerful behaviors, but is generally slower, less safe and harder to debug.


Method Binding

Static & dynamic languages also differ in the way they bind method calls to their definitions. In C / C++, they’re bound at compile time by default (early binding), while in dynamic languages, method definitions are located dynamically at runtime (late binding).

Early binding can speed up performance: an application can run without the overhead incurred by searching for methods at runtime. Late binding is more flexible, though. It’s also necessary in an object-oriented language (C is not OOP), where new types can be loaded dynamically and only the runtime system can determine which method to run.

Java is a statically typed, late-binding language. Each object in Java has a well-defined type that is known at compile time. This means the Java compiler can do the same kind of static type checking as in C++.

However, Java is also fully runtime-typed:

  • We can inspect an object at runtime to determine what it is
  • Casts from one type of object to another are checked by the runtime system
  • Possible to use new kinds of dynamically loaded objects with type safety
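The first two points can be sketched in a few lines. RuntimeTypeSketch and its methods are hypothetical names; instanceof, getClass and ClassCastException are the standard mechanisms.

```java
import java.util.ArrayList;

class RuntimeTypeSketch {
    // Inspect an object at runtime to find out what it is.
    static String describe(Object o) {
        if (o instanceof String) {
            return "String of length " + ((String) o).length();
        }
        return o.getClass().getSimpleName();
    }

    // A cast that the compiler accepts but the runtime rejects.
    static boolean badCastIsRejected() {
        Object o = Integer.valueOf(42);
        try {
            String s = (String) o;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("hello"));
        System.out.println(describe(new ArrayList<String>()));
        System.out.println("bad cast rejected: " + badCastIsRejected());
    }
}
```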

Also, since Java is late-binding, a subclass can always override methods in the superclass (unless they are declared final).
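Here is a small sketch of late binding at work. The variable's static type is Shape, but the method that runs is picked from the actual object. Shape, Circle and nameOf are names invented for the example.

```java
class LateBindingSketch {
    static class Shape {
        String name() { return "shape"; }
    }

    static class Circle extends Shape {
        @Override String name() { return "circle"; }
    }

    // The static type is Shape; the method that runs depends on the object.
    static String nameOf(Shape s) {
        return s.name();
    }

    public static void main(String[] args) {
        System.out.println(nameOf(new Shape()));
        System.out.println(nameOf(new Circle()));
    }
}
```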


Dynamic Memory Management

One of the most important differences between Java & lower-level languages like C / C++ is how Java manages memory. Java does not have pointers. Instead, Java adds object garbage collection & high-level arrays.

Garbage collection frees programmers from manually allocating & deallocating memory. When an object is no longer in use, Java automatically removes it from memory. Java has a sophisticated garbage collector running in the background, and most garbage collecting happens during idle times, for instance during I/O pauses, or between mouse clicks & keyboard hits. Advanced runtime systems like HotSpot use improved garbage collectors that can differentiate object usage patterns (short-lived & long-lived) and optimize their collection.
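We can observe the collector indirectly with a weak reference, which does not keep its target alive. The sketch below assumes Java 9 or later (for Reference.reachabilityFence), and GcSketch is a made-up name. Note that System.gc() is only a request, so the second printout is not guaranteed.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;

class GcSketch {
    // While a strong reference exists, the collector must keep the object.
    static boolean survivesWhileReferenced() {
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        System.gc(); // a request, not a command
        boolean alive = weak.get() != null;
        Reference.reachabilityFence(payload); // keep payload reachable up to here
        return alive;
    }

    public static void main(String[] args) {
        System.out.println("alive while referenced: " + survivesWhileReferenced());

        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        // Usually null by now, but the timing is up to the runtime.
        System.out.println("after dropping the reference: " + weak.get());
    }
}
```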

Java does not have pointers, but it provides references, which are safe kinds of pointers. A reference is a strongly typed handle for an object. Everything except values of primitive types is accessed via references. We cannot perform pointer arithmetic with references, because a reference is atomic: we cannot manipulate the reference’s value except by assigning it to an object.

Fundamental aspect of Java security: Reference Protection

References are passed by value, and we cannot reference an object via more than a single level of indirection.
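"Passed by value" is easy to misread, so here is a short sketch. Reassigning a parameter only changes the method's own copy of the reference, while mutating through it changes the shared object. ReferenceSketch, reassign and mutate are names made up for the example.

```java
class ReferenceSketch {
    // Reassigning the parameter only changes the local copy of the reference.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("replaced");
    }

    // Mutating through the reference changes the shared object.
    static void mutate(StringBuilder sb) {
        sb.append(" world");
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("hello");
        reassign(text);
        System.out.println(text); // still "hello"
        mutate(text);
        System.out.println(text); // now "hello world"
    }
}
```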

Java references can only point to class types; there are no pointers to methods. Most tasks that call for pointers can be more cleanly done using interfaces & adapter classes.
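As a sketch of that idea, an interface can stand in where C would pass a function pointer, with a small anonymous class acting as the adapter. Transform, mapAll and doubleAll are hypothetical names.

```java
import java.util.Arrays;

class CallbackSketch {
    // Plays the role a function pointer would play in C.
    interface Transform {
        int apply(int value);
    }

    static int[] mapAll(int[] input, Transform t) {
        int[] out = new int[input.length];
        for (int i = 0; i < input.length; i++) {
            out[i] = t.apply(input[i]);
        }
        return out;
    }

    static int[] doubleAll(int[] input) {
        // An anonymous adapter class supplies the behavior.
        Transform doubler = new Transform() {
            @Override public int apply(int value) { return value * 2; }
        };
        return mapAll(input, doubler);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(doubleAll(new int[] {1, 2, 3})));
    }
}
```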

Arrays in Java are truly first-class objects. They can be dynamically allocated & assigned like other objects. Having true arrays alleviates much need for pointer arithmetic.
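A quick sketch of what "first-class" means here: an array is sized at runtime, knows its own length, can be assigned like any other reference, and is an Object. ArrayObjectSketch and makeRamp are made-up names.

```java
class ArrayObjectSketch {
    static int[] makeRamp(int n) {
        int[] ramp = new int[n]; // size chosen at runtime
        for (int i = 0; i < ramp.length; i++) {
            ramp[i] = i;
        }
        return ramp;
    }

    public static void main(String[] args) {
        int[] a = makeRamp(5);
        int[] alias = a;     // assignment copies the reference, not the elements
        alias[0] = 99;
        Object asObject = a; // an array is an Object
        System.out.println(a[0]);
        System.out.println(asObject.getClass().isArray());
        System.out.println(a.length);
    }
}
```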


Other

In most cases, Java threads need to be synchronized. Java supports synchronization based on the monitor and condition model: a lock & key system for accessing resources.
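The lock half of that model can be sketched with the synchronized keyword. Two threads increment a shared counter, and because each increment holds the object's monitor, no update is lost. MonitorSketch and runTwoThreads are names invented for this illustration, and the condition half (wait / notify) is left out.

```java
class MonitorSketch {
    private int count = 0;

    // Only one thread at a time may hold this object's monitor.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    static int runTwoThreads(int perThread) {
        MonitorSketch counter = new MonitorSketch();
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runTwoThreads(100_000));
    }
}
```

Without synchronized on increment(), the final count would often come out lower than expected.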



Implementation Security

Encapsulation hides data & behavior within a class, and it is an important part of object-oriented design (OOD). Arbitrary casting & pointer arithmetic in C / C++ make it easy to violate access permissions on classes without breaking language rules:

    #include <cstdio>

    class Finances {
    private:
        char creditCardNumber[16];
        ...
    };

    int main() {
        Finances finances;

        // Forge a pointer to peek inside the class
        char *cardno = (char *)&finances;
        printf("Card Number = %.16s\n", cardno);
    }

The last two statements in main() (the pointer cast & the printf call) violate the encapsulation of the Finances class, and pull out secret information. In Java, however, the security model wraps 3 layers of protection around imported classes:

  • Security Manager: Application-level security is managed by the security manager & a flexible security policy. A security manager controls access to system resources such as the filesystem & network ports. The security manager relies on the class loader to protect basic system classes.
  • Class Loader: Handles loading classes from local storage / the network.
  • Verifier: At the innermost level, all system security ultimately depends on the Java bytecode verifier.

The Verifier is a fixed part of the Java runtime system, while class loaders & security managers are components that may be implemented differently by different applications.
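For comparison with the C++ snippet, here is a rough Java counterpart that tries the same kind of trick two ways: by casting the object to raw characters, and by reading the private field reflectively from another class. EncapsulationSketch and its methods are hypothetical, and the card digits are placeholder data.

```java
import java.lang.reflect.Field;

class EncapsulationSketch {
    // Attempt 1: pretend the object is raw characters, as the C++ cast does.
    static boolean forgeByCast(Object target) {
        try {
            char[] raw = (char[]) target; // checked by the runtime
            return raw != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    // Attempt 2: read the private field reflectively from another class.
    static String peekByReflection(Finances finances) {
        try {
            Field f = Finances.class.getDeclaredField("creditCardNumber");
            return new String((char[]) f.get(finances)); // access check happens here
        } catch (IllegalAccessException e) {
            return "denied";
        } catch (NoSuchFieldException e) {
            return "no such field";
        }
    }

    public static void main(String[] args) {
        Finances finances = new Finances();
        System.out.println("cast forged: " + forgeByCast(finances));
        System.out.println("reflection:  " + peekByReflection(finances));
    }
}

// Java counterpart of the C++ class; the digits are placeholder data.
class Finances {
    private final char[] creditCardNumber = "0000000000000000".toCharArray();
}
```

Reflection does offer setAccessible(true) to request a bypass, and whether that request is granted depends on the runtime's module & security settings.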


Verifier

The verifier is Java’s first line of defense. It reads bytecode before running it, and ensures its integrity and correctness. For instance, verified code cannot forge references or perform illegal casts.

A fundamental innovation in Java: The Java bytecode is a relatively light & low-level instruction set. The ability to statically verify bytecode before execution lets Java run at full speed with full safety (without expensive runtime checks).

Three rules the bytecode has to follow:

  • Most bytecode instructions operate only on individual data types
  • Object type resulting from any operation is always known in advance
  • All paths to the same point in the bytecode must arrive with exactly the same type state (Feasible to analyze type state of the stack)

Because an operation always produces a known type, it’s possible to determine the types of all items on the stack and in local variables at any point in the future by looking at the starting state. The collection of all this type information at any given time is called the type state of the stack, and this is what Java tries to analyze before it runs an application.

Java doesn’t know anything about the actual values of stack and variable items at this time. It only knows what kind of items they are. However, this is enough information to enforce the security rules and to ensure that objects are not manipulated illegally.


Class Loader

A class loader is responsible for bringing the Java class bytecode into the interpreter.

After a class has been loaded & passed through the verifier, it remains associated with its class loader. Then classes are partitioned into separate namespaces based on their origin. When a loaded class references another class name, the location of the new class is provided by the original class loader.

The search for classes always begins with Java system built-in classes, and they are loaded from the locations specified by the Java interpreter’s classpath.

Class loaders guarantee that an application is using the core Java system classes, and these classes are the only way to access basic system resources.
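A tiny sketch of class loading from the program's point of view: a class is looked up by name at runtime, and each Class object can report the loader it came from. ClassLoaderSketch and loadByName are made-up names. On HotSpot the bootstrap loader that supplies core classes is reported as null, which is an implementation detail rather than a guarantee.

```java
class ClassLoaderSketch {
    // Loads a class by name at runtime through this class's own loader.
    static Class<?> loadByName(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Core classes come from the bootstrap loader.
        System.out.println("String loader: " + String.class.getClassLoader());
        System.out.println("this class's loader: " + ClassLoaderSketch.class.getClassLoader());
        System.out.println("loaded: " + loadByName("java.util.ArrayList"));
    }
}
```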


Security Manager

A security manager is responsible for application-level security. The security manager works with an access controller that lets us implement security policies at a high level by editing a declarative security policy file.

The integrity of a security manager is based on the protection afforded by the lower levels of the Java security model.


Road Map

Java 1.2 / Java 2:

  • Major release in Dec. 1998
  • Include Swing GUI packages

Java 1.5 / Java 5:

  • 2004
  • Generics, typesafe enums
  • Concurrency API

Java 1.7 / Java 7 (2011), with these APIs part of the platform by this release:

  • JDBC
  • JNDI (Java Naming & Directory Interface)
    • General service for looking up resources. JNDI unifies access to directory services, such as LDAP, Novell’s NDS.
  • Java Cryptography & Java Security

Java 1.8 / Java 8:

  • 2014, major release
  • Lambda & Functional programming (Collections)
  • Stream API
  • Default methods (interface)
  • New Date & Time API
  • Compact profiles (provide smaller subsets of Java for headless / server deployment)
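To give a feel for the Java 8 items, the sketch below touches a lambda, a default method, the Stream API and the new java.time package in one place. Java8Sketch, Greeter and shortNamesUpper are names invented for the example.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class Java8Sketch {
    interface Greeter {
        String name();
        // Default method: an interface method with a body.
        default String greet() { return "Hello, " + name(); }
    }

    // Stream pipeline: keep names of at most 4 letters, then upper-case them.
    static List<String> shortNamesUpper(List<String> names) {
        return names.stream()
                .filter(n -> n.length() <= 4)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Greeter g = () -> "Duke"; // lambda for a single-method interface
        System.out.println(g.greet());
        System.out.println(shortNamesUpper(Arrays.asList("Ada", "Grace", "Alan")));
        System.out.println(LocalDate.of(2014, 3, 18).plusDays(1)); // java.time
    }
}
```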