How Reflection Is Implemented at the JVM Level

April 27, 2022 | lwohvye | Base, JVM

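This article starts from the familiar Java reflection API and works down to the JVM source code that supports it. The analysis uses JDK 18. Compared with the JDK 1.7 code that the original article was based on, quite a lot has changed, but the overall design is the same.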

First, the example code:

    import java.lang.reflect.Method;

    public class ReflectionDemo {
        public static void main(String[] args) throws Exception {
            Class<String> stringClass = String.class;

            // Look up String.hashCode() reflectively
            Method hashCodeMethod = stringClass.getDeclaredMethod("hashCode");

            String str = "hello, world";
            Object hashCode = hashCodeMethod.invoke(str);
            System.out.println(hashCode);
        }
    }

The logic is trivial: it uses reflection to get a string's hash code. The example has no practical value, but it is enough for illustration.

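The questions this article sets out to answer are:

How does getDeclaredMethod find the right method?
How is invoke executed?
What actually happens inside the virtual machine when these two methods are called?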

JDK analysis

At the JDK level the reflection source is fairly simple. The source of getDeclaredMethod is:

    @CallerSensitive
    public Method getDeclaredMethod(String name, Class<?>... parameterTypes)
        throws NoSuchMethodException, SecurityException {
        Objects.requireNonNull(name);
        @SuppressWarnings("removal")
        SecurityManager sm = System.getSecurityManager();
        if (sm != null) {
            checkMemberAccess(sm, Member.DECLARED, Reflection.getCallerClass(), true);
        }
        Method method = searchMethods(privateGetDeclaredMethods(false), name, parameterTypes);
        if (method == null) {
            throw new NoSuchMethodException(methodToString(name, parameterTypes));
        }
        return getReflectionFactory().copyMethod(method);
    }

The most important line is:

// name -- the method name, parameterTypes -- the method's parameter types
Method method = searchMethods(privateGetDeclaredMethods(false), name, parameterTypes);

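searchMethods takes the method name and parameter types and looks for the matching method in the Method[] returned by privateGetDeclaredMethods. getDeclaredMethod then returns a copy of the match (via copyMethod), not the cached instance itself.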
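The copy is observable from ordinary Java code. The following is a minimal sketch, not part of the original article; the class name MethodCopyDemo and the helper lookup() are made up for illustration.

```java
import java.lang.reflect.Method;

public class MethodCopyDemo {
    // Hypothetical helper: looks up String.hashCode() and wraps the checked exception
    static Method lookup() {
        try {
            return String.class.getDeclaredMethod("hashCode");
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Method first = lookup();
        Method second = lookup();
        // Each call hands back its own copy of the cached Method...
        System.out.println(first == second);
        // ...but both copies describe the same method, so equals() holds
        System.out.println(first.equals(second));
    }
}
```

Running it prints false and then true: the two lookups share the cached data, but the caller never receives the cached Method instance itself.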
The important logic therefore lives in privateGetDeclaredMethods, whose source is:

    private Method[] privateGetDeclaredMethods(boolean publicOnly) {
        Method[] res;
        ReflectionData<T> rd = reflectionData();
        if (rd != null) {
            res = publicOnly ? rd.declaredPublicMethods : rd.declaredMethods;
            if (res != null) return res;
        }
        // No cached value available; request value from VM
        res = Reflection.filterMethods(this, getDeclaredMethods0(publicOnly));
        if (rd != null) {
            if (publicOnly) {
                rd.declaredPublicMethods = res;
            } else {
                rd.declaredMethods = res;
            }
        }
        return res;
    }

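The method can be roughly divided into two parts:

Get the Method[] from the local cache;
If nothing is cached locally, get it from the virtual machine.

Local cache

Let's look at the local cache first. The key statement is: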

        ReflectionData<T> rd = reflectionData();

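This statement accesses the local cache. ReflectionData is a nested class of Class that stores the various pieces of information a Class contains, such as fields and methods, grouped into categories. Its source is: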

    private static class ReflectionData<T> {
        // ... other fields
        volatile Method[] declaredMethods;
        volatile Method[] publicMethods;

        // Value of classRedefinedCount when we created this ReflectionData instance
        final int redefinedCount;

        ReflectionData(int redefinedCount) {
            this.redefinedCount = redefinedCount;
        }
    }

The thing to note here is the constructor: it sets only redefinedCount, so every other field is null.
The source of reflectionData() is:

    // Lazily create and cache ReflectionData
    private ReflectionData<T> reflectionData() {
        SoftReference<ReflectionData<T>> reflectionData = this.reflectionData;
        int classRedefinedCount = this.classRedefinedCount;
        ReflectionData<T> rd;
        if (reflectionData != null &&
            (rd = reflectionData.get()) != null &&
            rd.redefinedCount == classRedefinedCount) {
            return rd;
        }
        // else no SoftReference or cleared SoftReference or stale ReflectionData
        // -> create and replace new instance
        return newReflectionData(reflectionData, classRedefinedCount);
    }

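There is a key point in this code: the locally cached ReflectionData is held through a SoftReference, which means it can be reclaimed when memory is tight.
If the Method[] is requested again after the cache has been reclaimed, a new SoftReference is created to serve as the cache. That is the logic of newReflectionData: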

    private ReflectionData<T> newReflectionData(SoftReference<ReflectionData<T>> oldReflectionData,
                                                int classRedefinedCount) {
        while (true) {
            ReflectionData<T> rd = new ReflectionData<>(classRedefinedCount);
            // try to CAS it...
            if (Atomic.casReflectionData(this, oldReflectionData, new SoftReference<>(rd))) {
                return rd;
            }
            // else retry
            oldReflectionData = this.reflectionData;
            classRedefinedCount = this.classRedefinedCount;
            if (oldReflectionData != null &&
                (rd = oldReflectionData.get()) != null &&
                rd.redefinedCount == classRedefinedCount) {
                return rd;
            }
        }
    }

This logic looks simple, but a few points are easy to miss. First, note that when execution reaches

ReflectionData<T> rd = new ReflectionData<>(classRedefinedCount);

only the redefinedCount field of rd is set; all other fields are still null. The core of the method is:

Atomic.casReflectionData(this, oldReflectionData, new SoftReference<>(rd))

Underneath, it is implemented with Unsafe's compareAndSetReference (called compareAndSwapObject in older JDKs):

        static <T> boolean casReflectionData(Class<?> clazz,
                                             SoftReference<ReflectionData<T>> oldData,
                                             SoftReference<ReflectionData<T>> newData) {
            return unsafe.compareAndSetReference(clazz, reflectionDataOffset, oldData, newData);
        }

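The meaning of this method is: look at the memory layout of clazz at offset reflectionDataOffset; if the value there is the expected oldData, replace it with newData. The offset reflectionDataOffset is exactly where the reflectionData field of Class is stored. So what the method really does is use an atomic CAS operation to replace the old reflectionData with the new one.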
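The same install-by-CAS pattern can be sketched with public APIs. This is an illustration under assumptions, not the JDK's code: AtomicReference.compareAndSet stands in for Unsafe.compareAndSetReference, and the names SoftCacheSketch, Data, names and currentVersion are made up for the example.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.atomic.AtomicReference;

public class SoftCacheSketch {
    // Hypothetical stand-in for ReflectionData: only the version is set at construction
    static final class Data {
        final int version;
        volatile String[] names; // filled lazily by the caller, null at first

        Data(int version) {
            this.version = version;
        }
    }

    private final AtomicReference<SoftReference<Data>> ref = new AtomicReference<>();
    volatile int currentVersion = 0; // plays the role of classRedefinedCount

    Data data() {
        while (true) {
            SoftReference<Data> old = ref.get();
            int version = currentVersion;
            Data d;
            if (old != null && (d = old.get()) != null && d.version == version) {
                return d; // cache hit: the referent is alive and not stale
            }
            Data fresh = new Data(version);
            // Install the new holder only if nobody replaced `old` in the meantime
            if (ref.compareAndSet(old, new SoftReference<>(fresh))) {
                return fresh;
            }
            // Lost the race: loop and re-read what the winner installed
        }
    }

    public static void main(String[] args) {
        SoftCacheSketch cache = new SoftCacheSketch();
        Data first = cache.data();
        System.out.println(first.names == null);   // the new holder starts empty
        System.out.println(first == cache.data()); // the second call is a cache hit
        cache.currentVersion++;                    // simulate a class redefinition
        System.out.println(first == cache.data()); // the stale holder is replaced
    }
}
```

As in the JDK code, a freshly installed holder carries nothing but its version, so the first caller after a miss still has to fetch the real data and store it into the holder.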

Note, however, that newData at this point, that is,

new SoftReference<>(rd)

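is a soft reference to a newly created ReflectionData instance. As mentioned above, that new instance, rd, has only redefinedCount set; all of its other fields are still null. So if there is no local cache, then after the method returns and control is back in privateGetDeclaredMethods at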

ReflectionData<T> rd = reflectionData();

rd is the ReflectionData that was just created, with only redefinedCount set. The check

if (res != null) return res;

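can therefore never succeed, and execution falls through to the code that gets the Method[] from the JVM. To summarize: in this case the Class instance holds a SoftReference to a ReflectionData whose only populated field is redefinedCount.

Getting it from the JVM

As mentioned above, when the local cache is stale or has been reclaimed, the Method[] has to be obtained from the JVM: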

        // No cached value available; request value from VM
        res = Reflection.filterMethods(this, getDeclaredMethods0(publicOnly));

So in the end a native method is called, and from here on we are in C++ territory:

    private native Method[]      getDeclaredMethods0(boolean publicOnly);

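This is how the Method[] is obtained from the JVM.
The method is declared in a JVM header file:

// src/hotspot/share/include/jvm.h
// env: the JNI context of the native method call
// ofClass: the Class instance on the Java side
// publicOnly: whether to return only public methods; corresponds to the publicOnly parameter of the Java native method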
JNIEXPORT jobjectArray JNICALL
JVM_GetClassDeclaredMethods(JNIEnv *env, jclass ofClass, jboolean publicOnly);

Its implementation is:

// src/hotspot/share/prims/jvm.cpp
JVM_ENTRY(jobjectArray, JVM_GetClassDeclaredMethods(JNIEnv *env, jclass ofClass, jboolean publicOnly))
{
  return get_class_declared_methods_helper(env, ofClass, publicOnly,
                                           /*want_constructor*/ false,
                                           /* vmClasses::reflect_Method_klass() returns the Klass that
                                              represents the Java class java.lang.reflect.Method inside
                                              the JVM; the helper uses it as the element type of the
                                              result array. */
                                           vmClasses::reflect_Method_klass(), THREAD);
}

static jobjectArray get_class_declared_methods_helper(
                                  JNIEnv *env,
                                  jclass ofClass, jboolean publicOnly,
                                  bool want_constructor,
                                  Klass* klass, TRAPS) {

  JvmtiVMObjectAllocEventCollector oam;

  oop ofMirror = JNIHandles::resolve_non_null(ofClass);
  // Exclude primitive types and array types 1
  if (java_lang_Class::is_primitive(ofMirror)
      || java_lang_Class::as_Klass(ofMirror)->is_array_klass()) {
    // Return empty array
    oop res = oopFactory::new_objArray(klass, 0, CHECK_NULL);
    return (jobjectArray) JNIHandles::make_local(THREAD, res);
  }

  InstanceKlass* k = InstanceKlass::cast(java_lang_Class::as_Klass(ofMirror));

  // Ensure class is linked 2
  k->link_class(CHECK_NULL);

  Array<Method*>* methods = k->methods();
  int methods_length = methods->length();

  // Save original method_idnum in case of redefinition, which can change
  // the idnum of obsolete methods.  The new method will have the same idnum
  // but if we refresh the methods array, the counts will be wrong.
  ResourceMark rm(THREAD);
  GrowableArray<int>* idnums = new GrowableArray<int>(methods_length);
  int num_methods = 0;

  for (int i = 0; i < methods_length; i++) {
    methodHandle method(THREAD, methods->at(i));
    if (select_method(method, want_constructor)) {
      if (!publicOnly || method->is_public()) {
        idnums->push(method->method_idnum());
        ++num_methods;
      }
    }
  }

  // Allocate result
  objArrayOop r = oopFactory::new_objArray(klass, num_methods, CHECK_NULL);
  objArrayHandle result (THREAD, r);

  // Now just put the methods that we selected above, but go by their idnum
  // in case of redefinition.  The methods can be redefined at any safepoint,
  // so above when allocating the oop array and below when creating reflect
  // objects. 
  for (int i = 0; i < num_methods; i++) {
    methodHandle method(THREAD, k->method_with_idnum(idnums->at(i)));
    if (method.is_null()) {
      // Method may have been deleted and seems this API can handle null
      // Otherwise should probably put a method that throws NSME
      result->obj_at_put(i, NULL);
    } else {
      oop m;
      if (want_constructor) {
        m = Reflection::new_constructor(method, CHECK_NULL);
      } else {
        m = Reflection::new_method(method, false, CHECK_NULL);
      }
      result->obj_at_put(i, m);
    }
  }

  return (jobjectArray) JNIHandles::make_local(THREAD, result());
}

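As the numbered comments I added suggest, the whole process can be divided into these steps:

Handle primitive types and array types first;
Make sure the class has been linked; here, that means making sure String has been linked;
Count the methods that qualify, and allocate the result according to that count;
Create the Method[] array.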
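The count-then-fill shape of steps 3 and 4 can be sketched in plain Java. This is a simplified model under assumptions, not HotSpot code: MethodInfo, declared, findName and sample are hypothetical names, the filter only approximates select_method, and strings stand in for the reflection objects.

```java
import java.util.ArrayList;
import java.util.List;

public class SelectMethodsSketch {
    // Hypothetical, simplified stand-in for the VM's per-method metadata
    static final class MethodInfo {
        final int idnum;
        final String name;
        final boolean isPublic;
        final boolean isConstructor;

        MethodInfo(int idnum, String name, boolean isPublic, boolean isConstructor) {
            this.idnum = idnum;
            this.name = name;
            this.isPublic = isPublic;
            this.isConstructor = isConstructor;
        }
    }

    static String[] declared(List<MethodInfo> methods, boolean publicOnly, boolean wantConstructor) {
        // Pass 1: record the ids of the methods that qualify
        List<Integer> idnums = new ArrayList<>();
        for (MethodInfo m : methods) {
            if (m.isConstructor == wantConstructor && (!publicOnly || m.isPublic)) {
                idnums.add(m.idnum);
            }
        }
        // Allocate the result once, with exactly the counted length
        String[] result = new String[idnums.size()];
        // Pass 2: look each method up again by id and fill its slot
        for (int i = 0; i < result.length; i++) {
            result[i] = findName(methods, idnums.get(i));
        }
        return result;
    }

    // Returns null when the id no longer resolves, like the NULL slot in the helper
    static String findName(List<MethodInfo> methods, int idnum) {
        for (MethodInfo m : methods) {
            if (m.idnum == idnum) {
                return m.name;
            }
        }
        return null;
    }

    static List<MethodInfo> sample() {
        List<MethodInfo> methods = new ArrayList<>();
        methods.add(new MethodInfo(0, "<init>", true, true));
        methods.add(new MethodInfo(1, "hashCode", true, false));
        methods.add(new MethodInfo(2, "privateHelper", false, false));
        return methods;
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", declared(sample(), false, false)));
        System.out.println(String.join(", ", declared(sample(), true, false)));
    }
}
```

Going through ids instead of keeping direct references between the two passes is what lets the real helper tolerate a class redefinition that happens while the result array is being allocated.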

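Below we look at steps 1 and 4 in detail.
Before that, a brief introduction to the JVM's oop-klass model:

oop: ordinary object pointer, which describes the instance data of an object
klass: the JVM's representation of a Java class; it describes the class, so it is metadata
For what we usually call an object, the oop-klass model is this: a handle refers to the oop, which carries the instance data, and the oop in turn points to the klass that describes its class.

This is how the JVM separates an object's data from its object model. Generally speaking, when we say we hold a reference to an object, we mean the handle, which is a wrapper around the oop.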
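The split between instance data and class description is also visible from the Java side, where the class description shows up as the Class object. A small sketch, with the hypothetical names SharedClassDemo and sameClass:

```java
public class SharedClassDemo {
    // True when both objects are described by the same class metadata
    static boolean sameClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        String a = "hello";
        String b = "world";
        System.out.println(a.equals(b));      // the instance data differs
        System.out.println(sameClass(a, b));  // the class description is shared
        System.out.println(sameClass(a, 42)); // an Integer is described by a different class
    }
}
```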

Handling primitive types and array types

  oop ofMirror = JNIHandles::resolve_non_null(ofClass);
  // Exclude primitive types and array types
  if (java_lang_Class::is_primitive(ofMirror)
      || java_lang_Class::as_Klass(ofMirror)->is_array_klass()) {
    // Return empty array
    oop res = oopFactory::new_objArray(klass, 0, CHECK_NULL);
    return (jobjectArray) JNIHandles::make_local(THREAD, res);
  }

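JNIHandles::resolve_non_null converts ofClass into an oop, the JVM's internal representation, because the JVM operates directly only on oops.
As the name suggests, is_primitive checks whether the type is primitive. Its source is: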

//src/hotspot/share/classfile/javaClasses.inline.hpp
inline bool java_lang_Class::is_primitive(oop java_class) {
  // should assert:
  //assert(java_lang_Class::is_instance(java_class), "must be a Class object");
  bool is_primitive = (java_class->metadata_field(_klass_offset) == NULL);

#ifdef ASSERT
  if (is_primitive) {
    Klass* k = ((Klass*)java_class->metadata_field(_array_klass_offset));
    assert(k == NULL || is_java_primitive(ArrayKlass::cast(k)->element_type()),
        "Should be either the T_VOID primitive or a java primitive");
  }
#endif

  return is_primitive;
}

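The logic is simple: read the klass field of the java_class oop; if it is NULL, the oop is considered to represent a primitive type, because the Class object of any non-primitive type necessarily holds a pointer to its klass instance.

The other condition is: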

java_lang_Class::as_Klass(ofMirror)->is_array_klass()

java_lang_Class::as_Klass returns the Klass that a Class oop refers to. Its logic closely resembles is_primitive above:

//src/hotspot/share/classfile/javaClasses.inline.hpp
inline Klass* java_lang_Class::as_Klass(oop java_class) {
  //%note memory_2
  assert(java_lang_Class::is_instance(java_class), "must be a Class object");
  Klass* k = ((Klass*)java_class->metadata_field(_klass_offset));
  assert(k == NULL || k->is_klass(), "type check");
  return k;
}

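In the JVM, the basic in-memory form of an object is the oop, as described above. In the JDK 1.7 design that the original article describes, the class an object belongs to is also treated as an object inside the JVM, so it is itself organized as an oop, the klassOop. A klassOop in turn has a klass that describes it, klassKlass, which is also a subclass of klass. This design lets the JVM manage allocation and reclamation in a uniform way. Since JDK 8, klassOop and klassKlass no longer exist: a Klass is native metadata rather than an oop, and the java.lang.Class instance visible to Java code is an ordinary oop that holds a pointer to its Klass, which is exactly what as_Klass reads.

Next comes the call to is_array_klass, which checks whether the klass describes a Java array.
When the if condition is true, this is executed: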

oop res = oopFactory::new_objArray(klass, 0, CHECK_NULL);

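The source of new_objArray is:

// src/hotspot/share/memory/oopFactory.cpp
// length -- the array length, i.e. how many elements to allocate room for
// klass -- the description of the element class; it knows how much space each Java object needs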
objArrayOop oopFactory::new_objArray(Klass* klass, int length, TRAPS) {
  assert(klass->is_klass(), "must be instance class");
  if (klass->is_array_klass()) {
    return ArrayKlass::cast(klass)->allocate_arrayArray(1, length, THREAD);
  } else {
    return InstanceKlass::cast(klass)->allocate_objArray(1, length, THREAD);
  }
}

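The key part is the allocation call. Here klass is the InstanceKlass of java.lang.reflect.Method rather than an array klass, so the else branch runs:

allocate_objArray(1, length, THREAD)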

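As mentioned earlier, a klass describes a Java class, so when memory is allocated the klass necessarily knows how much space an instance needs. Recall that the length passed in here is 0. That does not mean no memory is allocated. From the description of oops above, even an array with no elements still occupies some space for its mark word, metadata and so on. This is completely different from

Method[] methods = null;

For now, let's skip the call that follows,

 (jobjectArray) JNIHandles::make_local(THREAD, res);

which is analyzed in depth later.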
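The effect of this early return can be checked from Java: for primitive and array types, getDeclaredMethods returns an empty array rather than null. A minimal sketch, with the hypothetical names EmptyDeclaredMethodsDemo and declaredCount:

```java
import java.lang.reflect.Method;

public class EmptyDeclaredMethodsDemo {
    // Number of entries the JVM reports for a type's declared methods
    static int declaredCount(Class<?> type) {
        return type.getDeclaredMethods().length;
    }

    public static void main(String[] args) {
        Method[] forPrimitive = int.class.getDeclaredMethods();
        // A real, zero-length array object, not a null reference
        System.out.println(forPrimitive != null && forPrimitive.length == 0);
        System.out.println(declaredCount(int[].class));      // array of primitives
        System.out.println(declaredCount(String[].class));   // array of objects
        System.out.println(declaredCount(String.class) > 0); // ordinary class
    }
}
```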

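Creating the Method[]

This part builds the Method[] array on top of the memory allocated earlier. Inside the JVM, a java.lang.reflect.Method object exists only as an oop, so creating the Method[] really means filling an oop array with Method oops. (The VM's own method metadata was a methodOop in JDK 1.7; in JDK 18 it is the C++ class Method, which is what Array<Method*> holds.) What we have not seen yet is how the JVM gets the methods that a Class has. The answer is hidden in: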

InstanceKlass* k = InstanceKlass::cast(java_lang_Class::as_Klass(ofMirror));

The key here is that the Klass behind the Class object is cast to an InstanceKlass pointer, k. Let's first look at the definition of InstanceKlass:

//  src/hotspot/share/oops/instanceKlass.hpp

class InstanceKlass: public Klass {
  // ...
}

Now let's turn our attention back to:

k->methods()

//  src/hotspot/share/oops/instanceKlass.hpp
// methods
  Array<Method*>* methods() const          { return _methods; }

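So this simply returns the _methods field. That field is assigned when the String class is loaded (this article does not discuss that part; loading a class and creating its JVM representation is a complicated process).
With the Array<Method*>* that represents the class's methods in hand, only one step remains: actually creating the Method[] array. First, look at the for loops: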

  for (int i = 0; i < methods_length; i++) {
    methodHandle method(THREAD, methods->at(i));
    if (select_method(method, want_constructor)) {
      if (!publicOnly || method->is_public()) {
        idnums->push(method->method_idnum());
        ++num_methods;
      }
    }
  }

  // Allocate result
  objArrayOop r = oopFactory::new_objArray(klass, num_methods, CHECK_NULL);
  objArrayHandle result (THREAD, r);

  // Now just put the methods that we selected above, but go by their idnum
  // in case of redefinition.  The methods can be redefined at any safepoint,
  // so above when allocating the oop array and below when creating reflect
  // objects.
  for (int i = 0; i < num_methods; i++) {
    methodHandle method(THREAD, k->method_with_idnum(idnums->at(i)));
    if (method.is_null()) {
      // Method may have been deleted and seems this API can handle null
      // Otherwise should probably put a method that throws NSME
      result->obj_at_put(i, NULL);
    } else {
      oop m;
      if (want_constructor) {
        m = Reflection::new_constructor(method, CHECK_NULL);
      } else {
        m = Reflection::new_method(method, false, CHECK_NULL);
      }
      result->obj_at_put(i, m);
    }
  }

From the analysis so far, the reader can probably work out what methodHandle and methods->at(i) do. Let's focus on the key call:

 m = Reflection::new_method(method, false, CHECK_NULL);

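The return value of this call is an oop, and as we know, every object exists in the JVM as an oop. So at this point we can essentially consider a Method object to have been created.
Reflection::new_method is a rather long method. Its source is: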

// src/hotspot/share/runtime/reflection.cpp
oop Reflection::new_method(const methodHandle& method, bool for_constant_pool_access, TRAPS) {
  // Allow sun.reflect.ConstantPool to refer to <clinit> methods as java.lang.reflect.Methods.
  assert(!method()->is_initializer() ||
         (for_constant_pool_access && method()->is_static()),
         "should call new_constructor instead");
  InstanceKlass* holder = method->method_holder();
  int slot = method->method_idnum();

  Symbol*  signature  = method->signature();
  int parameter_count = ArgumentCount(signature).size();
  oop return_type_oop = NULL;
  objArrayHandle parameter_types = get_parameter_types(method, parameter_count, &return_type_oop, CHECK_NULL);
  if (parameter_types.is_null() || return_type_oop == NULL) return NULL;

  Handle return_type(THREAD, return_type_oop);

  objArrayHandle exception_types = get_exception_types(method, CHECK_NULL);
  assert(!exception_types.is_null(), "cannot return null");

  Symbol*  method_name = method->name();
  oop name_oop = StringTable::intern(method_name, CHECK_NULL);
  Handle name = Handle(THREAD, name_oop);
  if (name == NULL) return NULL;

  const int modifiers = method->access_flags().as_int() & JVM_RECOGNIZED_METHOD_MODIFIERS;

  Handle mh = java_lang_reflect_Method::create(CHECK_NULL);
  // Set the fields of the oop
  java_lang_reflect_Method::set_clazz(mh(), holder->java_mirror());
  java_lang_reflect_Method::set_slot(mh(), slot);
  java_lang_reflect_Method::set_name(mh(), name());
  java_lang_reflect_Method::set_return_type(mh(), return_type());
  java_lang_reflect_Method::set_parameter_types(mh(), parameter_types());
  java_lang_reflect_Method::set_exception_types(mh(), exception_types());
  java_lang_reflect_Method::set_modifiers(mh(), modifiers);
  java_lang_reflect_Method::set_override(mh(), false);
  if (method->generic_signature() != NULL) {
    Symbol*  gs = method->generic_signature();
    Handle sig = java_lang_String::create_from_symbol(gs, CHECK_NULL);
    java_lang_reflect_Method::set_signature(mh(), sig());
  }
  typeArrayOop an_oop = Annotations::make_java_array(method->annotations(), CHECK_NULL);
  java_lang_reflect_Method::set_annotations(mh(), an_oop);
  an_oop = Annotations::make_java_array(method->parameter_annotations(), CHECK_NULL);
  java_lang_reflect_Method::set_parameter_annotations(mh(), an_oop);
  an_oop = Annotations::make_java_array(method->annotation_default(), CHECK_NULL);
  java_lang_reflect_Method::set_annotation_default(mh(), an_oop);
  return mh();
}

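The whole method can be seen as two parts:

Obtain the handles for the oop's fields
Set the oop's fields

A Method instance has many fields, such as its parameter types. Clearly, those fields also exist in the JVM as oops, so the Method oop is itself made up of references to other oops: the name, the return type, the parameter types, the exception types and so on.

One more reminder for the reader: the parameter types oop is an array oop, and each of its elements is the oop of a Class instance, because in the Java language each parameter type is an instance of Class.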
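The fields that Reflection::new_method fills in are the ones the Java API reads back. The sketch below inspects String.indexOf(String, int) so that the parameter types are non-empty; the class name MethodFieldsDemo and the helper indexOf() are made up for the example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

public class MethodFieldsDemo {
    // Hypothetical helper: looks up String.indexOf(String, int) and wraps the checked exception
    static Method indexOf() {
        try {
            return String.class.getDeclaredMethod("indexOf", String.class, int.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Method m = indexOf();
        System.out.println(m.getDeclaringClass());                    // set_clazz
        System.out.println(m.getName());                              // set_name
        System.out.println(m.getReturnType());                        // set_return_type
        System.out.println(Arrays.toString(m.getParameterTypes()));   // set_parameter_types
        System.out.println(m.getExceptionTypes().length);             // set_exception_types
        System.out.println(Modifier.toString(m.getModifiers()));      // set_modifiers
    }
}
```

Every element of getParameterTypes() is a Class instance, which matches the description of the parameter types array above.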

At this point most of the work is done, and only one last step remains:

(jobjectArray) JNIHandles::make_local(THREAD, result())

The source of this method is:

//src/hotspot/share/runtime/jniHandles.cpp
// Used by NewLocalRef which requires NULL on out-of-memory
jobject JNIHandles::make_local(JavaThread* thread, oop obj, AllocFailType alloc_failmode) {
  if (obj == NULL) {
    return NULL;                // ignore null handles
  } else {
    assert(oopDesc::is_oop(obj), "not an oop");
    assert(!current_thread_in_native(), "must not be in native");
    return thread->active_handles()->allocate_handle(thread, obj, alloc_failmode);
  }
}

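The meaning of this code is simple: take the thread that is executing the reflection code, allocate memory for the reference, and return a fully set-up jobject that Java code can access.
Now let's look at the whole process from the point of view of memory allocation. It really involves only two allocations. The first is for the oops, and is determined by the result to be returned. The second happens when the oop is wrapped into a jobject, that is, in the call to allocate_handle. The first is easy to understand, but what is the second for? My personal understanding is that it allocates the handle that the Java reference points to; its size is that of a handle pointer, 4 bytes on 32-bit and 8 bytes on 64-bit platforms. Put crudely, it allocates a reference on the stack.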

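Conclusion

That completes the walk through the whole flow. Because so much code is involved, it is hard to keep the overall picture in mind. The whole process can be seen as: getDeclaredMethod calls privateGetDeclaredMethods, which first consults the ReflectionData cache; on a miss it calls the native getDeclaredMethods0, which enters the JVM through JVM_GetClassDeclaredMethods; get_class_declared_methods_helper then selects the methods, builds each Method object with Reflection::new_method, and returns the array through JNIHandles::make_local.

In fact, once you understand the oop-klass model, this source code is easy to follow. This article does leave out many details, though; for example, it does not follow the call chain of the memory allocation. Interested readers can explore that themselves.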
