Java/C++ integration
Writing native Java methods in natural C++
Per Bothner
Cygnus Solutions
bothner@cygnus.com
1325 Chesapeake Terrace
Sunnyvale, CA 94089,
USA
November, 1997
Background
Not all the code in a Java application can be written in Java. Some
must be written in a lower-level language, either for efficiency
reasons, or to access low-level facilities not accessible in Java.
For this reason, Java methods may be specified as native.
This means that the method has no method body (implementation)
in the Java source code. Instead, it has a special flag which
tells the Java virtual machine to look for the method using
some unspecified lookup mechanism.
Sun's original Java Development Kit (JDK) version 1.0 defined a
programming interface for writing native methods in C.
This provided rather direct and efficient access to the underlying
VM, but was not officially documented, and was tied to specifics
of the VM implementation. (There was little attempt to make it an
abstract API that could work with any VM.)
This document is a proposal and a work-in-progress.
It is not a specification, and Cygnus makes no commitment to implement
any part of the proposal.
Note also that I use the word Java
(a trademark of
Sun Microsystems) rather casually. This needs to be cleaned up.
(Cygnus has not yet decided what we will call our implementation
of the Java language platform.)
Asymetrix has a SuperCede Java environment that boasts seamless
C++/Java integration.
That needs to be investigated.
The Java Native Interface
In JDK 1.1, Sun defined a Java Native Interface (JNI) that defines the
official portable programming interface for writing such native methods
in C or C++.
This is a binary interface (ABI), allowing someone
to ship a compiled library of JNI-compiled native code,
and have it work with any VM implementation
(for that platform). The downside is that it is a rather heavy-weight
interface, with substantial overheads. For example, for native code to
access a field in an object, it needs to make two function calls
(though the result of the first can be saved for future accesses).
This is cumbersome to write and slow at run-time.
Worse, for some applications, is that the field is specified by a
run-time string, and found by searching run-time reflective
data structures.
Thus the JNI requires the availability at run-time of complete
reflective data (names, types, and positions of all fields, methods,
and classes). The reflective data has other uses (there is a standard
set of Java classes for accessing the reflective data), but when memory
is tight, it is a luxury many applications do not need.
As an example, here is a small Java example of a class
intended for timing purposes. (This could be written in portable
Java, but let us assume for some reason we don't want to do that.)
package timing ;
class Timer {
private long last_time;
private String last_comment;
/** Return time in milliseconds since last call,
* and set last_comment. */
native long sinceLast(String comment);
}
This is how it could be programmed using the JNI:
extern "C" /* specify the C calling convention */
jlong Java_timing_Timer_sinceLast (
  JNIEnv *env, /* interface pointer */
  jobject obj, /* "this" pointer */
  jstring comment) /* argument #1 */
{
  // Note that the results of the first three statements
  // could be saved for future use (though the results
  // have to be made "global" first).
  jclass cls = env->FindClass("timing/Timer");
  jfieldID last_time_id = env->GetFieldID(cls, "last_time", "J");
  jfieldID last_comment_id = env->GetFieldID(cls, "last_comment",
                                             "Ljava/lang/String;");
  jlong old_last_time = env->GetLongField(obj, last_time_id);
  jlong new_last_time = calculate_new_time();
  env->SetLongField(obj, last_time_id, new_last_time);
  env->SetObjectField(obj, last_comment_id, comment);
  return new_last_time - old_last_time;
}
Note the first env parameter, which is a pointer to
a thread-specific area, which also includes a pointer to a table of
functions. The entire JNI is defined in terms of these functions,
which cannot be inlined (since that would make JNI methods no
longer binary compatible across VMs).
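To make the cost of the function table concrete, here is a minimal sketch in plain C++. It does not use the real JNI headers: MockEnv, MockFunctions, and Counter are hypothetical stand-ins that only imitate the shape of the interface (an environment pointer that leads to a table of function pointers, and field lookup by run-time string).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins; this is not the real <jni.h>.
struct Counter { long value; };

struct MockEnv;
struct MockFunctions {
  // Field lookup by run-time string, then access through an opaque id.
  int  (*GetFieldID)(MockEnv *env, const char *name);
  long (*GetLongField)(MockEnv *env, Counter *obj, int field_id);
};
struct MockEnv { const MockFunctions *functions; };

static int mock_get_field_id(MockEnv *, const char *name) {
  return std::strcmp(name, "value") == 0 ? 0 : -1;  // string search at run time
}
static long mock_get_long_field(MockEnv *, Counter *obj, int field_id) {
  assert(field_id == 0);
  return obj->value;
}
static const MockFunctions mock_table = { mock_get_field_id, mock_get_long_field };

// Table-driven access: two indirect calls through function pointers.
long read_via_table(MockEnv *env, Counter *obj) {
  int id = env->functions->GetFieldID(env, "value");
  return env->functions->GetLongField(env, obj, id);
}

// Direct access: a single load that the compiler can inline freely.
long read_direct(Counter *obj) { return obj->value; }
```

Both functions return the same value; the difference is that the first goes through two indirect calls and a string comparison, while the second is an ordinary field access.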
The Cygnus Java product will support the JNI, but we will also offer
a more efficient, lower-level, and more natural native API.
The basic idea is to make GNU Java compatible with GNU C++ (G++), and provide
a few hooks in G++ so C++ code can access Java objects as naturally
as native C++ objects. The rest of this paper goes into details
about this integrated Java/C++ model.
We will go into more detail about this "Kaffe Native Interface"
(KNI) in this paper. However, the key is that the
calling conventions and data accesses for KNI are the same as for
normal nonnative Java methods. Thus there is no extra
JNIEnv parameter, and the C++ programmer gets
direct access to the VM representation. This does require co-ordination
between the C++ and Java implementations.
Here is the earlier example written using KNI:
#include "timing_Timer.h"
jlong
timing::Timer::sinceLast(jstring comment)
{
  jlong old_last_time = this->last_time;
  jlong new_last_time = calculate_new_time();
  this->last_time = new_last_time;
  this->last_comment = comment;
  return new_last_time - old_last_time;
}
This uses the following automatically-generated
timing_Timer.h:
#include <kni.h> // "Kaffe Native Interface"
class timing {
public:
  class Timer : public java::lang::Object {
    jlong last_time;
    jstring last_comment;
  public:
    virtual jlong sinceLast(jstring comment);
  };
};
Utility macros
Whether or not we are using the JNI, we still need a toolkit of utility
functions so C++ code can request various services of the VM.
For operations that have a direct correspondence in C++ (such as accessing
an instance field or throwing an exception), we want to use the C++ facility.
For other features, such as creating a Java string from a nul-terminated
C string, we need utility functions.
In such cases we define a set of interfaces that have similar names
and functionality as the JNI functions, except that they do not
depend on a JNIEnv pointer.
For example, the JNI interface to get a Java string from a C string is
the following in C:
jstring str = (*env)->NewStringUTF(env, "Hello");
and the following in C++:
jstring str = env->NewStringUTF("Hello");
(The C++ interface is just a set of inline methods that wrap the C interface.)
In the KNI, we do not use a JNIEnv pointer, so the
usage is:
jstring str = JvNewStringUTF("Hello");
We use the prefix Jv to indicate the KNI facilities.
It is useful to be able to conditionally compile the same source to
use either the fast KNI or the portable JNI.
That is possible, with some minor inconvenience,
because when USE_JNI is defined, the Jv
features are defined as macros that expand to JNI functions:
#if USE_JNI
#define JNIENV() JvEnv /* Must be available in scope. */
#define JvNewStringUTF(BYTES) \
((JNIENV())->NewStringUTF(BYTES))
#else /* ! USE_JNI */
extern "C" jstring JvNewStringUTF (const char*);
#endif /* ! USE_JNI */
Field access is trickier. When using JNI, we have to use
a jfieldID, but when using KNI we can access the
field directly. We require that the programmer use a convention where
the jfieldID used to access a field named
foo is foo_id.
#if USE_JNI
#define JvGetLongField(OBJ, FIELD) \
(JNIENV()->GetLongField(OBJ, FIELD##_id))
#else
#define JvGetLongField(OBJ, FIELD) ((OBJ)->FIELD)
#endif
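The token-pasting convention can be tried out in isolation. The sketch below is a hypothetical miniature, not the real kni.h or jni.h: Point, FieldId, MockGetLongField, and the USE_MOCK_JNI switch are invented for illustration. It only shows that the same source line compiles to either a direct field access or a lookup through a variable named x_id.

```cpp
#include <cassert>

// Hypothetical miniature of the convention; not the real kni.h or jni.h.
struct Point { long x; };

// Stand-in for a JNI field id and its accessor function.
struct FieldId { long Point::*member; };
inline long MockGetLongField(Point *obj, FieldId id) { return obj->*(id.member); }

#if USE_MOCK_JNI
// FIELD##_id pastes tokens, so JvGetLongField(p, x) refers to a variable x_id.
#define JvGetLongField(OBJ, FIELD) (MockGetLongField((OBJ), FIELD##_id))
#else
// Direct access: JvGetLongField(p, x) becomes ((p)->x).
#define JvGetLongField(OBJ, FIELD) ((OBJ)->FIELD)
#endif

long get_x(Point *p) {
  assert(p != 0);
  FieldId x_id = { &Point::x };  // needed only by the table-style expansion
  (void) x_id;                   // silences "unused" in the direct build
  return JvGetLongField(p, x);
}
```

Compiling with -DUSE_MOCK_JNI=1 selects the indirect expansion; without it the macro collapses to a plain member access. The result is the same either way.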
Here is how we can write the earlier example to support either interface:
#if USE_JNI
extern "C" jlong
Java_timing_Timer_sinceLast (JNIEnv *JvEnv, jobject JvThis,
                             jstring comment)
#else
jlong
timing::Timer::sinceLast(jstring comment)
#endif
{
#if USE_JNI
  jclass cls = JvEnv->FindClass("timing/Timer");
  jfieldID last_time_id = JvEnv->GetFieldID(cls, "last_time", "J");
  jfieldID last_comment_id = JvEnv->GetFieldID(cls, "last_comment",
                                               "Ljava/lang/String;");
#endif
  jlong old_last_time = JvGetLongField(JvThis, last_time);
  jlong new_last_time = calculate_new_time();
  JvSetLongField(JvThis, last_time, new_last_time);
  JvSetObjectField(JvThis, last_comment, comment);
  return new_last_time - old_last_time;
}
Using the C language
Some programmers might prefer to write Java native methods using C.
The main advantages of that are that C is more universally available
and more portable. However, if portability to multiple Java implementations
is important, one should use the JNI. Still, it might be nice to have
Jv-style macros that would allow one to select between
portable JNI-based C, or Kaffe-optimized KNI. The problem is that an
efficient KNI-style interface is much more inconvenient in C than in C++.
In C++, we can have the compiler handle inheritance, exception handling,
name mangling of methods, and so on. In C the programmer would have to
do much more of this by hand. It should be possible to come up with a
set of macros for programmers willing to do that. I am not convinced
that this is a high priority, given that most environments that support
C and Java will also support C++. The main issue is whether it is OK
to require a C++ compiler to build the Kaffe native methods.
If using C++ makes it easier to write core Java libraries more efficiently,
I think the trade-off is worth it.
Packages
The only global names in Java are class names and package names.
A package can contain zero or more classes, and
also zero or more sub-packages.
Every class belongs to either an unnamed package or a package that
has a hierarchical and globally unique name.
A Java package is mapped to a C++ namespace.
The Java class java.lang.String
is in the package java.lang, which is a sub-package
of java. The C++ equivalent is the
class java::lang::String,
which is in the namespace java::lang,
which is in the namespace java.
The suggested way to do that is:
// Declare the class(es), possibly in a header file:
namespace java {
namespace lang {
class Object;
class String;
}
}
class java::lang::String : public java::lang::Object
{
...
};
Leaving out package names
Having to always type the fully-qualified class name is verbose.
It also makes it more difficult to change the package containing a class.
The Java package declaration specifies that the
following class declarations are in the named package, without having
to explicitly name the full package qualifiers.
The package declaration can be followed by zero or
more import declarations, which allows either
a single class or all the classes in a package to be named by a simple
identifier. C++ provides something similar
with the using declaration and directive.
A Java simple-type-import declaration:
import PackageName.TypeName;
allows using TypeName as a shorthand for
PackageName.TypeName.
The C++ (more-or-less) equivalent is a using-declaration:
using PackageName::TypeName;
A Java import-on-demand declaration:
import PackageName.*;
allows using TypeName as a shorthand for
PackageName.TypeName, for any type in that package.
The C++ (more-or-less) equivalent is a using-directive:
using namespace PackageName;
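A small sketch of the two forms, using standard C++ namespaces. The names pkg, util, Stack, and Queue are hypothetical and stand in for a Java package pkg.util with two classes.

```cpp
#include <cassert>

namespace pkg {
  namespace util {
    struct Stack { int depth; };
    struct Queue { int size; };
  }
}

// Like "import pkg.util.Stack;": only Stack becomes visible unqualified.
int stack_depth() {
  using pkg::util::Stack;
  Stack s = { 3 };
  return s.depth;
}

// Like "import pkg.util.*;": every name in pkg::util becomes visible.
int queue_size() {
  using namespace pkg::util;
  Queue q = { 5 };
  return q.size;
}

void check_imports() {
  assert(stack_depth() == 3);
  assert(queue_size() == 5);
}
```

The equivalence is only approximate, as noted above: a C++ using-declaration or using-directive can appear in any scope, while Java imports apply to a whole compilation unit.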
Nested classes as a substitute for namespaces
G++ does not implement namespaces yet.
However, it does implement nested classes, which provide similar
(though less convenient) functionality.
This style seems to work:
class java {
class lang {
class Object { } ;
class String;
};
};
class java::lang::String : public java::lang::Object
{ ... }
Note that the generated code (including name mangling)
using nested classes is the same as that using namespaces.
Object model
From an implementation point of view we can consider Java to be a subset
of C++. Java has a few important extensions, plus a powerful standard
class library, but on the whole that does not change the basic similarity.
Java is a hybrid object-oriented language, with a few native types,
in addition to class types. It is class-based, where a class may have
static as well as per-object fields, and static as well as instance methods.
Non-static methods may be virtual, and may be overloaded. Overloading is
resolved at compile time by matching the actual argument types against
the parameter types. Virtual methods are implemented using indirect calls
through a dispatch table (virtual function table). Objects are
allocated on the heap, and initialized using a constructor method.
Classes are organized in a package hierarchy.
All of the listed attributes are also true of C++, though C++ has
extra features (for example in C++ objects may also be allocated statically
or in a local stack frame in addition to the heap).
So the most important task in integrating Java and C++ is to
remove gratuitous incompatibilities.
Object references
We implement a Java object reference as a pointer to the start
of the referenced object. It maps to a C++ pointer.
(We cannot use C++ references for Java references, since
once a C++ reference has been initialized, you cannot change it to
point to another object.)
The null Java reference maps to the NULL
C++ pointer.
Note that in JDK an object reference is implemented as
a pointer to a two-word handle. One word of the handle
points to the fields of the object, while the other points
to a method table. GNU Java does not use this extra indirection.
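The difference between a pointer and a C++ reference can be seen in a few lines. This is an illustrative sketch only; Object and ObjectRef here are hypothetical local names, not the types from any real header.

```cpp
#include <cassert>

struct Object { int id; };
typedef Object *ObjectRef;   // hypothetical name: "Java reference == C++ pointer"

int demo_reseat() {
  Object a = { 1 }, b = { 2 };

  ObjectRef ref = &a;   // like a Java variable that refers to a
  ref = &b;             // assignment makes it refer to b; a is untouched
  assert(ref == &b && a.id == 1);

  Object &cref = a;     // a C++ reference is bound exactly once...
  cref = b;             // ...so this copies b's contents into a instead
  assert(&cref == &a && a.id == 2);

  ObjectRef none = 0;   // the Java null reference maps to a null pointer
  assert(none == 0);

  return ref->id * 10 + a.id;
}
```

Assigning to the pointer changes which object it designates, which matches Java semantics; assigning through the C++ reference overwrites the object it was bound to.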
Primitive types
Java provides eight primitive types:
byte, short, int,
long, float, double,
char, and boolean.
These are the same as the following C++ typedefs
(which are defined in a standard header file):
jbyte, jshort, jint,
jlong, jfloat,
jdouble,
jchar, and jboolean.
Java type    C/C++ typename    Description
byte         jbyte             8-bit signed integer
short        jshort            16-bit signed integer
int          jint              32-bit signed integer
long         jlong             64-bit signed integer
float        jfloat            32-bit IEEE floating-point number
double       jdouble           64-bit IEEE floating-point number
char         jchar             16-bit Unicode character
boolean      jboolean          logical (Boolean) values
void         void              no value
Object fields
Each object contains an object header, followed by the instance
fields of the class, in order. The object header consists of
a single pointer to a dispatch or virtual function table.
(There may be extra fields in front of the object, for example for
memory management, but this is invisible to the application, and
the reference to the object points to the dispatch table pointer.)
The fields are laid out in the same order, alignment, and size
as in C++. Specifically, 8-bit and 16-bit native types
(byte, short, char, and boolean) are not
widened to 32 bits.
Note that the Java VM does extend 8-bit and 16-bit types to 32 bits
when on the VM stack or temporary registers.
The JDK implementation
and earlier versions of Kaffe also extend 8-bit and 16-bit
object fields to use a full 32 bits. However, GNU Java was recently changed
so that 8-bit and 16-bit fields now only take 8 or 16 bits in an object.
In general Java field sizes and alignment are now the same as C and C++.
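The layout rule can be checked with offsetof. The sketch below uses hypothetical typedefs built on the fixed-width types from <cstdint>, and a plain struct named Sample in place of a real Java object; the offsets asserted are those of common 32-bit and 64-bit ABIs and are an assumption, not a guarantee of the C++ standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical typedefs chosen to match the sizes of the Java primitive types.
typedef std::int8_t  jbyte;
typedef std::int16_t jshort;
typedef std::int32_t jint;

// Stand-in for an object whose Java class declares: byte b; short s; int i;
struct Sample {
  void  *vtable;   // object header: one dispatch-table pointer
  jbyte  b;        // 1 byte, not widened to 32 bits
  jshort s;        // 2 bytes, aligned to 2
  jint   i;        // 4 bytes, aligned to 4
};

void check_layout() {
  assert(sizeof(jbyte) == 1 && sizeof(jshort) == 2 && sizeof(jint) == 4);
  assert(offsetof(Sample, b) == sizeof(void *));
  assert(offsetof(Sample, s) == sizeof(void *) + 2);
  assert(offsetof(Sample, i) == sizeof(void *) + 4);
}
```

Had the small fields been widened to 32 bits each, the three fields would occupy 12 bytes after the header instead of 8.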
Arrays
While in many ways Java is similar to C and C++,
it is quite different in its treatment of arrays.
C arrays are based on the idea of pointer arithmetic,
which would be incompatible with Java's security requirements.
Java arrays are true objects (array types inherit from
java.lang.Object). An array-valued variable
is one that contains a reference (pointer) to an array object.
Referencing a Java array in C++ code is done using the
JArray template, which is defined as follows:
class __JArray : public java::lang::Object
{
public:
int length;
};
template<class T>
class JArray : public __JArray
{
T data[0];
public:
T& operator[](jint i) { return data[i]; }
};
The following convenience typedefs
(matching JNI) are provided.
typedef __JArray *jarray;
typedef JArray<jobject> *jobjectArray;
typedef JArray<jboolean> *jbooleanArray;
typedef JArray<jbyte> *jbyteArray;
typedef JArray<jchar> *jcharArray;
typedef JArray<jshort> *jshortArray;
typedef JArray<jint> *jintArray;
typedef JArray<jlong> *jlongArray;
typedef JArray<jfloat> *jfloatArray;
typedef JArray<jdouble> *jdoubleArray;
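Because the element data follows the length in the same object, allocating an array means requesting the header plus room for the elements in one block. The following is a rough sketch of that idea in portable C++, which has no zero-length arrays; ArrayHeader, SketchArray, and new_array are hypothetical names, calloc stands in for the run-time allocator, and the sketch is only meant for simple element types such as integers.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

struct ArrayHeader { void *vtable; int length; };

template <class T>
struct SketchArray : ArrayHeader {
  // The elements live directly after the header, in the same allocation.
  T *data() { return reinterpret_cast<T *>(this + 1); }
  T &operator[](int i) {
    assert(i >= 0 && i < length);   // Java-style bounds check
    return data()[i];
  }
};

template <class T>
SketchArray<T> *new_array(int n) {
  // One zero-filled block: header followed by n elements.
  void *mem = std::calloc(1, sizeof(SketchArray<T>) + n * sizeof(T));
  if (!mem) throw std::bad_alloc();
  SketchArray<T> *a = static_cast<SketchArray<T> *>(mem);
  a->vtable = 0;    // a real run-time would store the array class's table here
  a->length = n;
  return a;
}
```

A caller would write new_array<int>(4) and index the result with (*a)[i]; since there is no garbage collector in this sketch, the block is released with std::free.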
Overloading
Both Java and C++ provide method overloading, where multiple
methods in a class have the same name, and the correct one is chosen
(at compile time) depending on the argument types.
The rules for choosing the correct method are (as expected) more complicated
in C++ than in Java, but the fundamental idea is the same.
We do have to make sure that all the typedefs for
Java types map to distinct C++ types.
Common assemblers and linkers are not aware of C++ overloading,
so the standard implementation strategy is to encode the
parameter types of a method into its assembly-level name.
This encoding is called mangling,
and the encoded name is the mangled name.
The same mechanism is used to implement Java overloading.
For C++/Java interoperability, it is important to use the
same encoding scheme. (This is already
implemented in jc1, except for some minor
necessary adjustments.)
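The requirement that the typedefs map to distinct C++ types can be tested by overloading on all of them at once. The typedefs below are hypothetical choices for this sketch (a real header may pick different underlying types); the returned letters are the JVM type-descriptor characters for each primitive type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical typedefs for this sketch; a real header may choose differently.
typedef std::int8_t  jbyte;
typedef std::int16_t jshort;
typedef std::int32_t jint;
typedef std::int64_t jlong;
typedef float        jfloat;
typedef double       jdouble;
typedef char16_t     jchar;
typedef bool         jboolean;

// One overload per primitive type. If two typedefs named the same C++ type,
// two of these definitions would collide and the file would not compile.
char signature(jbyte)    { return 'B'; }
char signature(jshort)   { return 'S'; }
char signature(jint)     { return 'I'; }
char signature(jlong)    { return 'J'; }
char signature(jfloat)   { return 'F'; }
char signature(jdouble)  { return 'D'; }
char signature(jchar)    { return 'C'; }
char signature(jboolean) { return 'Z'; }

void check_signatures() {
  assert(signature(jbyte(1)) == 'B');
  assert(signature(jint(1)) == 'I');
  assert(signature(jdouble(1)) == 'D');
}
```

Each overload also receives its own mangled name from the compiler, which is what lets the linker tell the eight functions apart.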
Virtual method calls
Virtual method dispatch is handled essentially the same
in C++ and Java -- i.e. by doing an
indirect call through a function pointer stored in a per-class virtual
function table. C++ is more complicated because it has to support
multiple inheritance. Traditionally, this is implemented
by putting an extra delta integer offset in
each entry in the virtual function table.
This is not needed for Java, which only needs a single function pointer
in each entry of the virtual function table.
There is a more modern C++ implementation technique, which uses
thunks, which does away with the need for the
delta fields in the virtual function tables.
This is now an option in G++, and will soon be the default on Linux.
We need to make sure that Java classes (i.e. those that
inherit from java.lang.Object) are implemented as
if using thunks. (No actual thunks are needed for Java classes,
since Java does not have multiple inheritance.)
The first one or two elements of the virtual function table
are used for special purposes in both GNU Java and C++; in Java,
such an element points to the class that owns the virtual function table.
G++ needs to know that Java is slightly different.
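To illustrate the single-inheritance case, here is a hand-written dispatch table with one plain function pointer per method and a leading slot that identifies the owning class. This is a sketch of the general technique, not the actual G++ or GNU Java table layout; ClassInfo, VTable, Obj, and the method names are all hypothetical.

```cpp
#include <cassert>

struct ClassInfo { const char *name; };
struct Obj;
typedef int (*Method)(Obj *self);

struct VTable {
  const ClassInfo *owner;  // special leading slot: the class owning this table
  Method methods[1];       // one plain function pointer per virtual method
};
struct Obj { const VTable *vtable; int value; };

static int base_get(Obj *self)    { return self->value; }
static int derived_get(Obj *self) { return self->value * 2; }  // "override"

static const ClassInfo base_class    = { "Base" };
static const ClassInfo derived_class = { "Derived" };
static const VTable base_vtable    = { &base_class,    { base_get } };
static const VTable derived_vtable = { &derived_class, { derived_get } };

// A virtual call: load the table, load slot 0, call indirectly.
// No delta adjustment of "self" is needed without multiple inheritance.
int call_get(Obj *o) {
  assert(o != 0 && o->vtable != 0);
  return o->vtable->methods[0](o);
}
```

Two objects with the same field value but different tables give different results from the same call site, which is all that virtual dispatch requires here.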
Allocation
New Java objects are allocated using a
class-instance-creation-expression:
new Type ( arguments )
The same syntax is used in C++. The main difference is that
C++ objects have to be explicitly deleted, whereas in Java they are
automatically deleted by the garbage collector.
For a specific class, we can define operator new in C++:
class CLASS {
  void* operator new (size_t size) { return soft_new(MAGIC); }
};
However, we don't want a user to have to define this
magic operator new for each class. It needs to be done
in java.lang.Object. This is not possible
without some compiler support (because the MAGIC
argument is class-dependent); however, it is straight-forward to
implement such support. Allocating an array is a special case,
since the space needed depends on the run-time length given.
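Standard C++ already gets part of the way: an operator new defined in a root class is inherited, and it receives the size of the derived class being allocated. The sketch below shows that much; Root, Leaf, and sketch_alloc are hypothetical names, and calloc stands in for the run-time allocator. What the inherited operator cannot learn is which class is being created, which is why the class-dependent argument needs compiler support.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

static std::size_t last_request = 0;  // records the size seen by the allocator

// Hypothetical stand-in for the run-time allocator.
void *sketch_alloc(std::size_t size) {
  assert(size > 0);
  last_request = size;
  void *p = std::calloc(1, size);     // zero-filled block
  if (!p) throw std::bad_alloc();
  return p;
}

struct Root {
  // Defined once in the root class and inherited by every subclass.
  void *operator new(std::size_t size) { return sketch_alloc(size); }
  void operator delete(void *p) { std::free(p); }
  virtual ~Root() {}
};

struct Leaf : Root { long a; long b; };
```

Evaluating new Leaf calls Root::operator new with sizeof(Leaf), so the size is known, but nothing in the call tells the allocator that the object is a Leaf.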
Object construction
In both C++ and Java newly created objects are initialized by a
constructor. In both languages, a
constructor is a method that is automatically called when an object is created.
Java has some restrictions on how constructors are called,
but basically the calling convention (and overload resolution)
are as for standard methods. In G++, constructors get passed an
extra magic argument, which is not passed for Java constructors.
G++ also has the constructors set up the vtable pointers.
In Java, the object allocator sets up the vtable pointer,
and the constructor does not change the vtable pointer.
Hence, the G++ compiler needs to know about these differences.
Object finalization
A Java method with the special name finalize
serves some of the same function as a C++ destructor method.
The latter is responsible for freeing up any resources owned
by the object before it is destroyed, including deleting
any sub-objects it points to. In Java, the garbage collector will
take care of deleting no-longer-needed sub-objects, so there
is much less need for finalization, but it is occasionally needed.
It might make sense to consider the C++ syntax for a finalizer:
~ClassName
as being equivalent to the Java finalize method.
That would mean that if a class that inherits from
java.lang.Object defined a C++-style destructor,
it would be equivalent to defining a finalize method.
However, I see no real need that this would address.
Instead: If you want to define or invoke a Java finalizer from C++ code,
you will need to define or invoke a method named finalize.
In this proposed hybrid C++/Java environment, there is no clear
distinction between C++ and Java objects. Java objects inherit
from java.lang.Object, and are garbage collected.
On the other hand, regular C++ objects are not garbage collected,
but must be explicitly deleted.
It may be useful to support C++ objects (that do not
inherit from java.lang.Object) that would be
garbage collected. KNI will probably provide a way to do that,
by overloading operator new.
What happens if you explicitly delete an object
(Java or C++) that is garbage collected? The Ellis/Detlefs garbage
collection proposal for C++ says that should cause the finalizer
to be run, but otherwise whether the object memory is freed
is unpredictable; that seems reasonable to me.
Interfaces
A Java class can implement zero or more
interfaces, in addition to inheriting from
a single base class.
An interface is a collection of constants and method specifications;
it is similar to the signatures available
as a G++ extension. An interface provides a subset of the
functionality of C++ abstract virtual base classes, but are
normally implemented differently. Since the mechanism used to
implement interfaces in GNU Java will change, and since interfaces
are infrequently used by Java native methods, we will not say
anything more about them now.
Exceptions
It is a goal of the Gcc exception handling mechanism that it be as
language independent as possible. The existing support is geared towards C++,
but should be extended for Java. Essentially, the Java features are
a subset of the G++ features, in that C++ allows near-arbitrary values
to be thrown, while Java only allows throwing of references to
objects that inherit from java.lang.Throwable.
So once the Gcc exception handling is more stable, it should be
trivial to add Java support. The main change needed for Java is
how type-matching is done; fixing that would benefit C++ as well.
The main other issue is that we need to make Kaffe's representation
of exception ranges be compatible with Gcc's.
The goal is that C++ code that needs to throw a Java exception would
just use the C++ throw statement. For example:
throw new java::io::IOException(JvNewStringUTF("I/O Error!"));
There is also no difference between catching a Java exception,
and catching a C++ exception.
The following Java fragment:
try {
do_stuff();
} catch (java.io.IOException ex) {
System.out.println("caught I/O Error");
} finally {
cleanup();
}
could be expressed this way in G++:
try {
  try {
    do_stuff();
  } catch (java::io::IOException *ex) {
    printf("caught I/O Error\n");
  }
} catch (...) {
  cleanup();
  throw; // re-throws exception
}
cleanup(); // also runs when no exception escapes
Note that in C++ we need to use two nested try statements.
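Here is a self-contained sketch of the nested-try pattern that can be compiled with any C++ compiler. Throwable and IOException are local stand-ins rather than the real Java classes, and the trace string exists only to record the order of events. Because the sketch has no garbage collector, the handler deletes the exception object itself. Note that cleanup must also be called on the normal path to match the meaning of finally.

```cpp
#include <cassert>
#include <string>

// Local stand-ins for the Java exception classes.
struct Throwable { virtual ~Throwable() {} };
struct IOException : Throwable {};

static std::string trace;   // records the order of events

void do_stuff(bool fail) { if (fail) throw new IOException(); }
void cleanup() { trace += "cleanup;"; }

void run(bool fail) {
  try {
    try {
      do_stuff(fail);
      trace += "ok;";
    } catch (IOException *ex) {   // Java exceptions are thrown as pointers
      assert(ex != 0);
      trace += "caught;";
      delete ex;                  // no garbage collector in this sketch
    }
  } catch (...) {
    cleanup();                    // "finally" when an exception escapes
    throw;                        // re-throws the same exception
  }
  cleanup();                      // "finally" on the normal path
}
```

Calling run(false) leaves "ok;cleanup;" in trace, and run(true) leaves "caught;cleanup;".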
Synchronization
Each Java object has an implicit monitor.
The Java VM uses the instruction monitorenter to acquire
and lock a monitor, and monitorexit to release it.
The JNI has corresponding methods MonitorEnter
and MonitorExit. The corresponding KNI macros
are JvMonitorEnter and JvMonitorExit.
The Java source language does not provide direct access to these primitives.
Instead, there is a synchronized statement that does an
implicit monitorenter before entry to the block,
and does a monitorexit on exit from the block.
Note that the lock has to be released even if the block is abnormally
terminated by an exception, which means there is an implicit
try-finally.
From C++, it makes sense to use a destructor to release a lock.
KNI defines the following utility class.
class JvSynchronize {
  jobject obj;
public:
  JvSynchronize(jobject o) { obj = o; JvMonitorEnter(o); }
  ~JvSynchronize() { JvMonitorExit(obj); }
};
The equivalent of Java's:
synchronized (OBJ) { CODE; }
can be simply expressed:
{ JvSynchronize dummy(OBJ); CODE; }
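The following sketch shows why the destructor approach also covers the abnormal-exit case. It uses mock names throughout (MockObject, MockMonitorEnter, MockMonitorExit, MockSynchronize) and a simple counter in place of a real monitor, so it demonstrates only the scoping behaviour, not actual thread synchronization.

```cpp
#include <cassert>
#include <stdexcept>

// A counter stands in for the object's monitor.
struct MockObject { int lock_depth; };
void MockMonitorEnter(MockObject *o) { ++o->lock_depth; }
void MockMonitorExit(MockObject *o) {
  assert(o->lock_depth > 0);
  --o->lock_depth;
}

class MockSynchronize {
  MockObject *obj;
  // Copying a guard would release the monitor twice, so forbid it.
  MockSynchronize(const MockSynchronize &);
  MockSynchronize &operator=(const MockSynchronize &);
public:
  explicit MockSynchronize(MockObject *o) : obj(o) { MockMonitorEnter(o); }
  ~MockSynchronize() { MockMonitorExit(obj); }
};

int guarded(MockObject *o, bool fail) {
  MockSynchronize guard(o);   // acquire on entry to the block
  if (fail) throw std::runtime_error("abnormal exit");
  return o->lock_depth;       // 1 while the guard is alive
}                             // released here, or during stack unwinding
```

Whether guarded returns normally or throws, lock_depth is back to zero afterwards, which is the implicit try-finally behaviour described above.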
Java also has methods with the synchronized attribute.
This is equivalent to wrapping the entire method body in a
synchronized statement.
Alternatively, the synchronization can be done by the caller
wrapping the method call in a synchronized.
That implementation is not practical for virtual method calls in compiled code,
since it would require the caller to check at run-time for the
synchronized attribute. Hence our implementation of
Java will have the called method do the synchronization inline.
Improved String implementation
The standard Java implementation is a bit inefficient, because
every string requires two objects:
a java.lang.String object, which contains a
reference to an internal char array, which
contains the actual character data.
If we allow the actual java.lang.String object
to have a size that varies depending on how many characters it contains
(just like array objects vary in size), we can save the overhead of
the extra object. This would save space, reduce cache misses,
and reduce garbage collection overhead.
class java::lang::String : public java::lang::Object
{
  jint length; /* In characters. */
  jint offset; /* In bytes, from start of base. */
  Object *base; /* Either this or another String or a char array. */
private:
  jchar& operator[](jint i) { return ((jchar*)((char*)base+offset))[i]; }
public:
  jchar charAt(jint i)
  {
    if ((unsigned32) i >= length)
      throw new IndexOutOfBoundsException(i);
    return (*this)[i];
  }
  String* substring (jint beginIndex, jint endIndex)
  {
    ... check for errors ...;
    String *s = new String();
    s->base = base;
    s->length = endIndex - beginIndex;
    /* Share the character data; advance the byte offset past the prefix. */
    s->offset = offset + beginIndex * sizeof(jchar);
    return s;
  }
  ...
};
The tricky part about variable-sized objects is that we can no longer
cleanly separate object allocation from object construction,
since the size of the object to be allocated depends on the arguments
given to the constructor. We can deal with this fairly straight-forwardly
from C++ or when compiling Java source code. It is more complicated
(though quite doable) when compiling from Java byte-code. We don't
have to worry about that, since in any case we have to support
the less efficient scheme with separate allocation and construction.
(This is needed for JNI and reflection compatibility.)
Changes needed to G++
Here is a list of tweaks needed to G++ before it can provide
the C++/Java interoperability we have discussed:
We need a utility to translate Java class definitions into
equivalent C++ class declarations. Most convenient would be
adding the ability for G++ to directly read class properties
from a .class file. However, a simple
program that reads a .class and generates
a suitable C++ include file is almost as convenient.
We need a way to indicate to G++ that the class
java.lang.Object is magic, in that it, and all classes
that inherit from it should be implemented following Java conventions
instead of C++ conventions. We say that such classes have
the Java property.
(Our goal is that on the whole it should not
matter, but there are a few places where it matters. Hopefully,
these are all listed here.)
Virtual function tables and calls in classes
that have the Java property are different.
A new expression needs to be modified to call the
correct Kaffe function (for classes that have the Java property).
The interface to constructors needs to be changed so magic
vtable pointer initialization and the extra constructor argument
do not happen when constructing a Java object.
The typedefs for the primitive types (such as
jlong) map to concrete implementation types.
G++ needs some minor changes so that the manglings of those
implementation types are all distinct (and preferably that the
manglings are the same on all platforms).
Change representation of exception ranges to be more suitable for Java.