Dude, Where’s My char[]?

Looking for String.value in Android M

Written by Pierre-Yves Ricau.

This article started as a thread on an internal mailing list and I thought it would also be of interest to people outside of Square.

When Android M preview 2 was released, I started receiving reports of LeakCanary crashing when parsing heap dumps. LeakCanary reached into the char array of a String object to read a thread name, but in Android M that char array wasn’t there anymore.

Let’s dig

Here’s the structure of String.java prior to Android M:

public final class String {
  private final char value[];
  private final int offset;
  private final int count;
  private int hash;
  // ...
}

Here’s String.java in M:

public final class String {
  private final int count;
  private int hash;
  // ...
}

Where did that char[] go? To learn more, let’s see what happens when we concatenate two strings:

String baguette = flour + love;

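Strictly speaking, javac compiles + into StringBuilder calls, but for this discussion let’s look at the closely related concat() call: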

String baguette = flour.concat(love);

String.concat() is now a native method:

public final class String {
  public native String concat(String string);
  // ...
}

Going native

concat() is implemented in String.cc:

static jstring String_concat(JNIEnv* env, jobject java_this, jobject java_string_arg) {
  ScopedFastNativeObjectAccess soa(env);
  if (UNLIKELY(java_string_arg == nullptr)) {
    ThrowNullPointerException("string arg == null");
    return nullptr;
  }
  StackHandleScope<2> hs(soa.Self());
  Handle<mirror::String> string_this(hs.NewHandle(soa.Decode<mirror::String*>(java_this)));
  Handle<mirror::String> string_arg(hs.NewHandle(soa.Decode<mirror::String*>(java_string_arg)));
  int32_t length_this = string_this->GetLength();
  int32_t length_arg = string_arg->GetLength();
  if (length_arg > 0 && length_this > 0) {
    mirror::String* result = mirror::String::AllocFromStrings(soa.Self(), string_this, string_arg);
    return soa.AddLocalReference<jstring>(result);
  }
  jobject string_original = (length_this == 0) ? java_string_arg : java_this;
  return reinterpret_cast<jstring>(string_original);
}

The actual concatenation is done in mirror::String::AllocFromStrings in mirror::String.cc:

String* String::AllocFromStrings(Thread* self, Handle<String> string, Handle<String> string2) {
  int32_t length = string->GetLength();
  int32_t length2 = string2->GetLength();
  gc::AllocatorType allocator_type = Runtime::Current()->GetHeap()->GetCurrentAllocator();
  SetStringCountVisitor visitor(length + length2);
  String* new_string = Alloc<true>(self, length + length2, allocator_type, visitor);
  if (UNLIKELY(new_string == nullptr)) {
    return nullptr;
  }
  uint16_t* new_value = new_string->GetValue();
  memcpy(new_value, string->GetValue(), length * sizeof(uint16_t));
  memcpy(new_value + length, string2->GetValue(), length2 * sizeof(uint16_t));
  return new_string;
}

First, it allocates a new string of the right size using Alloc in string-inl.h:

template <bool kIsInstrumented, typename PreFenceVisitor>
inline String* String::Alloc(Thread* self, int32_t utf16_length, gc::AllocatorType allocator_type,
                             const PreFenceVisitor& pre_fence_visitor) {
  size_t header_size = sizeof(String);
  size_t data_size = sizeof(uint16_t) * utf16_length;
  size_t size = header_size + data_size;
  Class* string_class = GetJavaLangString();
  // Check for overflow and throw OutOfMemoryError if this was an unreasonable request.
  if (UNLIKELY(size < data_size)) {
    self->ThrowOutOfMemoryError(StringPrintf("%s of length %d would overflow",
                                             PrettyDescriptor(string_class).c_str(),
                                             utf16_length).c_str());
    return nullptr;
  }
  gc::Heap* heap = Runtime::Current()->GetHeap();
  return down_cast<String*>(
      heap->AllocObjectWithAllocator<kIsInstrumented, true>(self, string_class, size,
                                                            allocator_type, pre_fence_visitor));
}

  • This allocates an object of size header_size + data_size.

  • header_size is the size of mirror::String in mirror::String.h:

      • int32_t count_

      • uint32_t hash_code_

      • uint16_t value_[0]

  • data_size is essentially the total character count times the size of one char (uint16_t).

So this means that the char array is inlined in the String object.

Also notice the zero length array: uint16_t value_[0].
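To make the inlined layout concrete, here is a minimal sketch in plain C++ rather than ART’s actual code. InlineStringHeader, ValueOf, and AllocInlineString are hypothetical names, and malloc stands in for ART’s heap allocator. The sketch sizes one block as header_size + data_size, applies the same wraparound check as Alloc, and finds the characters at the first byte past the header, which is the address a trailing zero-length array like value_[0] resolves to:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for the fixed fields of mirror::String (not ART's type).
struct InlineStringHeader {
  int32_t count;       // number of UTF-16 code units stored after the header
  uint32_t hash_code;  // cached hash, 0 until computed
};

// The characters start at the first byte past the header.
inline uint16_t* ValueOf(InlineStringHeader* s) {
  assert(s != nullptr);
  return reinterpret_cast<uint16_t*>(
      reinterpret_cast<unsigned char*>(s) + sizeof(InlineStringHeader));
}

// Allocates the header and the characters as a single block.
// Returns nullptr if the size is unreasonable or the allocation fails.
// The caller releases the block with std::free().
inline InlineStringHeader* AllocInlineString(const uint16_t* chars, size_t length) {
  if (length > static_cast<size_t>(INT32_MAX)) return nullptr;
  const size_t header_size = sizeof(InlineStringHeader);
  const size_t data_size = sizeof(uint16_t) * length;
  const size_t size = header_size + data_size;
  // Unsigned arithmetic wraps around on overflow, so a sum that is smaller
  // than one of its operands means the request does not fit in size_t.
  if (size < data_size) return nullptr;
  void* memory = std::malloc(size);
  if (memory == nullptr) return nullptr;
  InlineStringHeader* s =
      new (memory) InlineStringHeader{static_cast<int32_t>(length), 0};
  if (length > 0) std::memcpy(ValueOf(s), chars, data_size);
  return s;
}
```

Because a zero-length array adds nothing to sizeof, the header in this sketch is 8 bytes and the characters start at byte 8 of the block. In ART, mirror::String also inherits fields from its Object base class, so the real header is larger.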

Let’s continue reading mirror::String::AllocFromStrings:

uint16_t* new_value = new_string->GetValue();
memcpy(new_value, string->GetValue(), length * sizeof(uint16_t));
memcpy(new_value + length, string2->GetValue(), length2 * sizeof(uint16_t));

Where GetValue() is defined in mirror::String.h and returns the address of the uint16_t value_[0] that we noticed above:

uint16_t* GetValue() SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
  return &value_[0];
}

This is a straightforward copy: the characters of each source string are copied, one string after the other, into the inline buffer of the new string.
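One detail in those two memcpy calls is easy to miss: the destination offset (new_value + length) is counted in characters, because pointer arithmetic on a uint16_t* advances by whole elements, while the copy size is counted in bytes. A minimal sketch of the same pattern, using std::vector instead of ART’s types (ConcatUtf16 is a hypothetical helper, not part of ART):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: concatenates two UTF-16 buffers with one allocation
// and two memcpy calls, following the pattern used by AllocFromStrings.
inline std::vector<uint16_t> ConcatUtf16(const std::vector<uint16_t>& a,
                                         const std::vector<uint16_t>& b) {
  std::vector<uint16_t> result(a.size() + b.size());
  uint16_t* new_value = result.data();
  if (!a.empty()) {
    // Size argument is in bytes.
    std::memcpy(new_value, a.data(), a.size() * sizeof(uint16_t));
  }
  if (!b.empty()) {
    // Destination offset is in elements: new_value + a.size() skips
    // a.size() characters, that is a.size() * 2 bytes.
    std::memcpy(new_value + a.size(), b.data(), b.size() * sizeof(uint16_t));
  }
  assert(result.size() == a.size() + b.size());
  return result;
}
```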

Offset

You probably noticed that the offset field is now entirely gone. Java strings are immutable, so older versions of the JDK allowed substrings to share the char array of their parent, with a different offset and count. This meant holding onto a small substring could hold onto a larger string in memory and prevent it from being garbage collected.

The char array is now inlined in the String object, so substrings can’t share their parent char array, which is why offset isn’t needed anymore.
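The difference between the two substring strategies can be sketched with a toy model. This is not how the JDK or ART implements strings: SharedString and MakeString are hypothetical names, and std::shared_ptr stands in for garbage collector reachability. The sharing substring keeps the whole parent buffer alive, while the copying substring owns only the characters it needs:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical model of the pre-M layout: a string is a window
// (offset, count) onto a buffer that several strings may share.
struct SharedString {
  std::shared_ptr<const std::u16string> value;
  std::size_t offset;
  std::size_t count;

  // Old-style substring: no copy, so the entire parent buffer stays reachable
  // for as long as the substring does.
  SharedString SubstringSharing(std::size_t begin, std::size_t length) const {
    assert(begin + length <= count);
    return SharedString{value, offset + begin, length};
  }

  // Copying substring: owns a buffer holding only its own characters,
  // so offset is always 0 and is no longer needed.
  SharedString SubstringCopying(std::size_t begin, std::size_t length) const {
    assert(begin + length <= count);
    return SharedString{
        std::make_shared<const std::u16string>(value->substr(offset + begin, length)),
        0, length};
  }

  std::u16string ToString() const { return value->substr(offset, count); }
};

inline SharedString MakeString(std::u16string text) {
  const std::size_t count = text.size();
  return SharedString{std::make_shared<const std::u16string>(std::move(text)), 0, count};
}
```

With a 25-character parent, a 4-character sharing substring still references all 25 characters, whereas the copying substring references a 4-character buffer.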

Advantages

Let’s speculate on why those changes are interesting:

  • Spatial locality of reference: instead of having to follow a reference and risking a CPU cache miss, the char array is available right next to the rest of the String data.

  • Smaller footprint: a Java char array contains a header to store its type and length, which was redundant.

  • Less padding: previously both objects had to be 4-byte aligned with padding, whereas now there’s only one object to pad.

String is one of the most used types in the VM, so these micro-optimizations should add up to significant improvements.
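A rough back-of-the-envelope calculation shows where the savings come from. The sizes below are assumptions chosen for illustration, not measured values: an 8-byte object header, a 4-byte array length field, and 4-byte references. They are picked so that the fixed part of the new String comes out at 16 bytes, the figure this article relies on for heap dumps. The alignment is a parameter, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Assumed sizes in bytes, for illustration only.
constexpr std::size_t kObjectHeader = 8;  // assumed per-object header
constexpr std::size_t kArrayLength = 4;   // assumed length field of an array
constexpr std::size_t kReference = 4;     // assumed size of a reference
constexpr std::size_t kInt = 4;           // a Java int
constexpr std::size_t kChar = 2;          // a Java char

// Rounds size up to the next multiple of alignment.
inline std::size_t AlignUp(std::size_t size, std::size_t alignment) {
  assert(alignment > 0);
  return (size + alignment - 1) / alignment * alignment;
}

// Before M: a String object (value reference, offset, count, hash)
// plus a separate char[] object, each padded on its own.
inline std::size_t OldFootprint(std::size_t chars, std::size_t alignment) {
  return AlignUp(kObjectHeader + kReference + 3 * kInt, alignment) +
         AlignUp(kObjectHeader + kArrayLength + kChar * chars, alignment);
}

// In M: a single object holding count, hash, and the characters inline.
inline std::size_t NewFootprint(std::size_t chars, std::size_t alignment) {
  return AlignUp(kObjectHeader + 2 * kInt + kChar * chars, alignment);
}
```

Under these assumptions and the 4-byte alignment mentioned above, a 5-character string takes OldFootprint(5, 4) = 24 + 24 = 48 bytes before M and NewFootprint(5, 4) = 28 bytes in M.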

Conclusion: Back to Heap Dumps

Because the char[] value field was removed from String.java, LeakCanary could no longer find it when parsing heap dumps. However, in Android M Preview 2 the char buffer is still serialized in the heap dump, 16 bytes after the String address (because the String structure isn’t longer than 16 bytes). This means we can get LeakCanary to work again with Android M:

Object value = fieldValue(values, "value");
ArrayInstance charArray;
if (isCharArray(value)) {
  charArray = (ArrayInstance) value;
} else {
  charArray = (ArrayInstance) heap.getInstance(instance.getId() + 16);
}

This hack will eventually be fixed in Android M by inserting a virtual char[] value field in all String objects when dumping the heap.
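The fallback logic can also be modeled on its own, outside of LeakCanary. The sketch below is a toy: CharArrays, FindStringChars, and the use of 0 as a null ID are assumptions made for illustration, and a real heap dump parser looks nothing like a hash map. It only shows the lookup order: prefer the array referenced by the value field, otherwise look 16 bytes past the String:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical, simplified heap dump: char arrays indexed by object ID.
using CharArrays = std::unordered_map<uint64_t, std::u16string>;

// From the article: in M Preview 2 the chars are dumped 16 bytes past the String.
constexpr uint64_t kInlineCharsOffset = 16;

// Returns the characters of the String with the given ID, or nullptr if none
// are found. value_field_id is the ID stored in the String's "value" field,
// or 0 when the field does not exist (0 plays the role of null in this sketch).
inline const std::u16string* FindStringChars(const CharArrays& arrays,
                                             uint64_t string_id,
                                             uint64_t value_field_id) {
  assert(string_id != 0);
  // Dumps from before M: the value field points at a separate char array.
  if (value_field_id != 0) {
    auto it = arrays.find(value_field_id);
    if (it != arrays.end()) return &it->second;
  }
  // Dumps from M Preview 2: fall back to the array recorded after the String.
  auto it = arrays.find(string_id + kInlineCharsOffset);
  return it == arrays.end() ? nullptr : &it->second;
}
```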

Huge thanks to Chester Hsieh, Romain Guy, Jesse Wilson, and Jake Wharton for their help figuring this out.
