Dude, Where’s My char[]?

Looking for String.value in Android M

Written by Pierre-Yves Ricau.

This article started as a thread on an internal mailing list and I thought it would also be of interest to people outside of Square.

When Android M preview 2 was released, I started receiving reports of LeakCanary crashing when parsing heap dumps. LeakCanary reached into the char array of a String object to read a thread name, but in Android M that char array wasn’t there anymore.

Let’s dig

Here’s the structure of String.java prior to Android M:

public final class String {
  private final char value[];
  private final int offset;
  private final int count;
  private int hash;
  // ...
}

Here’s String.java in M:

public final class String {
  private final int count;
  private int hash;
  // ...
}
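
If you want to check which of the two layouts a given runtime uses, reflection is one option. Here’s a minimal sketch: PreMLayout and MLayout are hypothetical stand-ins I made up to mirror the two listings above, and declaresField is just a helper name, not something from LeakCanary or Android.

```java
import java.lang.reflect.Field;

final class StringLayoutProbe {
    // Hypothetical stand-in mirroring the pre-M field layout.
    static final class PreMLayout {
        char[] value;
        int offset;
        int count;
        int hash;
    }

    // Hypothetical stand-in mirroring the M field layout.
    static final class MLayout {
        int count;
        int hash;
    }

    /** Returns true if the given class itself declares a field with this name. */
    static boolean declaresField(Class<?> type, String name) {
        for (Field field : type.getDeclaredFields()) {
            if (field.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("PreMLayout has value: " + declaresField(PreMLayout.class, "value"));
        System.out.println("MLayout has value: " + declaresField(MLayout.class, "value"));
        // The answer for the real class depends on the runtime this runs on.
        System.out.println("java.lang.String has value: " + declaresField(String.class, "value"));
    }
}
```

The last line is the interesting one: its result is a property of the runtime, which is exactly why code that assumes the field exists can break between releases.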

Where did that char[] go? To learn more, let’s see what happens when we concatenate two strings:

String baguette = flour + love;

For two non-null strings, this gives the same result as:

String baguette = flour.concat(love);

String.concat() is now a native method:

public final class String {
  public native String concat(String string);
  // ...
}

Going native

concat() is implemented in String.cc:

static jstring String_concat(JNIEnv* env, jobject java_this, jobject java_string_arg) {
  ScopedFastNativeObjectAccess soa(env);
  if (UNLIKELY(java_string_arg == nullptr)) {
    ThrowNullPointerException("string arg == null");
    return nullptr;
  }
  StackHandleScope<2> hs(soa.Self());
  Handle<mirror::String> string_this(hs.NewHandle(soa.Decode<mirror::String*>(java_this)));
  Handle<mirror::String> string_arg(hs.NewHandle(soa.Decode<mirror::String*>(java_string_arg)));
  int32_t length_this = string_this->GetLength();
  int32_t length_arg = string_arg->GetLength();
  if (length_arg > 0 && length_this > 0) {
    mirror::String* result = mirror::String::AllocFromStrings(soa.Self(), string_this, string_arg);
    return soa.AddLocalReference<jstring>(result);
  }
  jobject string_original = (length_this == 0) ? java_string_arg : java_this;
  return reinterpret_cast<jstring>(string_original);
}

The actual concatenation is done in mirror::String::AllocFromStrings in mirror::String.cc:

String* String::AllocFromStrings(Thread* self, Handle<String> string, Handle<String> string2) {
  int32_t length = string->GetLength();
  int32_t length2 = string2->GetLength();
  gc::AllocatorType allocator_type = Runtime::Current()->GetHeap()->GetCurrentAllocator();
  SetStringCountVisitor visitor(length + length2);
  String* new_string = Alloc<true>(self, length + length2, allocator_type, visitor);
  if (UNLIKELY(new_string == nullptr)) {
    return nullptr;
  }
  uint16_t* new_value = new_string->GetValue();
  memcpy(new_value, string->GetValue(), length * sizeof(uint16_t));
  memcpy(new_value + length, string2->GetValue(), length2 * sizeof(uint16_t));
  return new_string;
}

First, it allocates a new string of the right size using Alloc in string-inl.h:

template <bool kIsInstrumented, typename PreFenceVisitor>
inline String* String::Alloc(Thread* self, int32_t utf16_length, gc::AllocatorType allocator_type,
                             const PreFenceVisitor& pre_fence_visitor) {
  size_t header_size = sizeof(String);
  size_t data_size = sizeof(uint16_t) * utf16_length;
  size_t size = header_size + data_size;
  Class* string_class = GetJavaLangString();
  // Check for overflow and throw OutOfMemoryError if this was an unreasonable request.
  if (UNLIKELY(size < data_size)) {
    self->ThrowOutOfMemoryError(StringPrintf("%s of length %d would overflow",
                                             PrettyDescriptor(string_class).c_str(),
                                             utf16_length).c_str());
    return nullptr;
  }
  gc::Heap* heap = Runtime::Current()->GetHeap();
  return down_cast<String*>(
      heap->AllocObjectWithAllocator<kIsInstrumented, true>(self, string_class, size,
                                                            allocator_type, pre_fence_visitor));
}

  • This allocates an object of size header_size + data_size.

  • header_size is the size of mirror::String in mirror::String.h:

      • int32_t count_

      • uint32_t hash_code_

      • uint16_t value_[0]

  • data_size is essentially the total character count times the size of one char (uint16_t).

So this means that the char array is inlined in the String object.

Also notice the zero-length array: uint16_t value_[0].
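
The size arithmetic in Alloc is easy to play with outside of ART. Here’s a rough Java sketch of it. The 16-byte header and the 32-bit size limit are assumptions I picked for illustration, and the class and method names are made up; only the header plus data formula and the overflow check come from the code above.

```java
final class InlinedStringSize {
    // Assumption for illustration: the String header occupies 16 bytes.
    static final long HEADER_SIZE = 16;
    // Size of one UTF-16 code unit (uint16_t).
    static final long CHAR_SIZE = 2;
    // Assumption: a 32-bit size_t, so anything above this would wrap around.
    static final long MAX_SIZE = 0xFFFF_FFFFL;

    /** Mirrors header_size + data_size, failing where the native code would overflow. */
    static long allocationSize(int utf16Length) {
        if (utf16Length < 0) {
            throw new IllegalArgumentException("negative length: " + utf16Length);
        }
        long dataSize = CHAR_SIZE * utf16Length;
        long size = HEADER_SIZE + dataSize;
        if (size > MAX_SIZE) {
            throw new OutOfMemoryError("String of length " + utf16Length + " would overflow");
        }
        return size;
    }

    public static void main(String[] args) {
        // "baguette" has 8 chars: 16 + 2 * 8 bytes under the assumptions above.
        System.out.println(allocationSize("baguette".length()));
    }
}
```

The native code detects the wrap-around after the fact with size < data_size; the sketch does the math in a wider type instead, which is simpler to read in Java.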

Let’s continue reading mirror::String::AllocFromStrings:

uint16_t* new_value = new_string->GetValue();
memcpy(new_value, string->GetValue(), length * sizeof(uint16_t));
memcpy(new_value + length, string2->GetValue(), length2 * sizeof(uint16_t));

Where GetValue() is defined in mirror::String.h and returns the address of the uint16_t value_[0] that we noticed above:

uint16_t* GetValue() SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
  return &value_[0];
}

This is a straightforward memory copy: the characters of each source string are copied, one after the other, into the inlined buffer of the new string.
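
For readers more at home in Java than in ART’s C++, the whole concat path can be sketched with plain char arrays. This is only an analogy: a char[] stands in for the inlined buffer, System.arraycopy plays the role of memcpy, and the class name is made up.

```java
final class InlineConcatSketch {
    /**
     * Mirrors the native logic: if either side is empty, hand back the other one
     * as is; otherwise allocate one buffer of the combined length and copy both in.
     */
    static char[] concat(char[] first, char[] second) {
        if (first.length == 0) {
            return second;
        }
        if (second.length == 0) {
            return first;
        }
        char[] result = new char[first.length + second.length];
        // First source goes at the start of the new buffer.
        System.arraycopy(first, 0, result, 0, first.length);
        // Second source goes right after it, like new_value + length.
        System.arraycopy(second, 0, result, first.length, second.length);
        return result;
    }

    public static void main(String[] args) {
        char[] baguette = concat("flour".toCharArray(), "love".toCharArray());
        System.out.println(new String(baguette));
    }
}
```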

Offset

You probably noticed that the offset field is now entirely gone. Java strings are immutable, so older versions of the JDK allowed substrings to share the char array of their parent, with a different offset and count. This meant holding onto a small substring could hold onto a larger string in memory and prevent it from being garbage collected.

The char array is now inlined in the String object, so substrings can’t share their parent char array, which is why offset isn’t needed anymore.
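
To see why sharing could pin memory, here’s a small sketch with two toy classes. Both are hypothetical models I wrote for this post, not the real java.lang.String: SharingString keeps offset and count into a shared array, while InlinedString copies the characters it needs.

```java
import java.util.Arrays;

final class SubstringSharingSketch {
    // Toy model of the old layout: substrings point into the parent's array.
    static final class SharingString {
        final char[] value;
        final int offset;
        final int count;

        SharingString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        SharingString substring(int begin, int end) {
            if (begin < 0 || end > count || begin > end) {
                throw new StringIndexOutOfBoundsException();
            }
            // No copy: same array, different window.
            return new SharingString(value, offset + begin, end - begin);
        }

        @Override public String toString() {
            return new String(value, offset, count);
        }
    }

    // Toy model of the new layout: every string owns exactly its own characters.
    static final class InlinedString {
        final char[] chars;

        InlinedString(char[] chars) {
            this.chars = chars;
        }

        InlinedString substring(int begin, int end) {
            if (begin < 0 || end > chars.length || begin > end) {
                throw new StringIndexOutOfBoundsException();
            }
            return new InlinedString(Arrays.copyOfRange(chars, begin, end));
        }

        @Override public String toString() {
            return new String(chars);
        }
    }

    public static void main(String[] args) {
        char[] big = new char[1000];
        Arrays.fill(big, 'a');
        SharingString shared = new SharingString(big, 0, big.length).substring(0, 3);
        InlinedString copied = new InlinedString(big).substring(0, 3);
        System.out.println("sharing substring keeps alive " + shared.value.length + " chars");
        System.out.println("inlined substring keeps alive " + copied.chars.length + " chars");
    }
}
```

The sharing version is cheaper to create, but the three-character substring keeps the whole parent array reachable. The copying version pays for the copy and lets the parent go.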

Advantages

Let’s speculate on why those changes are interesting:

  • Spatial locality of reference: instead of following a reference and risking a CPU cache miss, the characters are available right next to the rest of the String data.

  • Smaller footprint: a Java char array contains a header to store its type and length, which was redundant.

  • Less padding: both objects had to be 4-byte aligned with padding. Now there’s only one object to pad.

String is one of the most used types of the VM, so these micro optimizations will add up to huge improvements.
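
To get a feel for the footprint argument, here’s a back-of-the-envelope calculator. Every size in it is a parameter or an assumption, not a measurement of ART: I’m assuming 4-byte references and 4-byte ints, and the header sizes used in main are purely illustrative.

```java
final class StringFootprintSketch {
    /** Rounds size up to the next multiple of alignment. */
    static long align(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    /** Old layout: a String object plus a separate char array, each padded on its own. */
    static long separateArrayFootprint(int length, long objectHeader, long arrayHeader, long alignment) {
        // Assumed fields: value reference (4), offset (4), count (4), hash (4).
        long stringObject = align(objectHeader + 4 + 4 + 4 + 4, alignment);
        long charArray = align(arrayHeader + 2L * length, alignment);
        return stringObject + charArray;
    }

    /** New layout: count (4), hash (4) and the characters in a single padded object. */
    static long inlinedFootprint(int length, long objectHeader, long alignment) {
        return align(objectHeader + 4 + 4 + 2L * length, alignment);
    }

    public static void main(String[] args) {
        // Illustrative assumptions only: 8-byte object header, 12-byte array header, 4-byte alignment.
        int length = 5;
        System.out.println("separate: " + separateArrayFootprint(length, 8, 12, 4));
        System.out.println("inlined:  " + inlinedFootprint(length, 8, 4));
    }
}
```

With those assumed numbers, a 5-character string comes out at 48 bytes in the old layout and 28 bytes in the inlined one. Plug in different header sizes and the gap changes, but the shape of the saving stays the same: one header and one round of padding instead of two.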

Conclusion: Back to Heap Dumps

Because the char[] value field was removed from String.java, LeakCanary could no longer find it when parsing heap dumps. However, in Android M Preview 2 the char buffer is still serialized in the heap dump, 16 bytes after the String address (because the String structure isn’t longer than 16 bytes). This means we can get LeakCanary to work again with Android M:

Object value = fieldValue(values, "value");
ArrayInstance charArray;
if (isCharArray(value)) {
  charArray = (ArrayInstance) value;
} else {
  charArray = (ArrayInstance) heap.getInstance(instance.getId() + 16);
}

This hack will eventually be fixed in Android M by inserting a virtual char[] value field in all String objects when dumping the heap.
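
If you want to try the fallback logic without a real heap dump, here’s a self-contained sketch. The Map from object id to object is a hypothetical stand-in for a parsed heap dump, and readStringChars is a name I made up; none of this is the heap dump parser’s actual API. Only the "value field first, else id + 16" rule comes from the snippet above.

```java
import java.util.HashMap;
import java.util.Map;

final class HeapDumpLookupSketch {
    // From the article: the char buffer sits 16 bytes after the String address.
    static final long INLINED_BUFFER_OFFSET = 16;

    /**
     * Returns the characters of a String found in a (fake) heap dump.
     * Uses the value field when the dump has one, otherwise looks for a char
     * array at stringId + 16.
     */
    static char[] readStringChars(Map<Long, Object> heap, long stringId, Object valueField) {
        if (valueField instanceof char[]) {
            return (char[]) valueField;
        }
        Object candidate = heap.get(stringId + INLINED_BUFFER_OFFSET);
        if (candidate instanceof char[]) {
            return (char[]) candidate;
        }
        throw new IllegalStateException("No char buffer found for String at id " + stringId);
    }

    public static void main(String[] args) {
        Map<Long, Object> fakeHeap = new HashMap<>();
        // Pretend a String lives at id 1000 and its inlined buffer was dumped at 1016.
        fakeHeap.put(1016L, "main".toCharArray());
        System.out.println(new String(readStringChars(fakeHeap, 1000L, null)));
    }
}
```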

Huge thanks to Chester Hsieh, Romain Guy, Jesse Wilson, and Jake Wharton for their help figuring this out.