Story of an Android Q Leak: attachment crazy town!

Debugging leaks for Square POS in Android Q

Reddit
LinkedIn

Ticket assignment

Our QA team has been testing Square POS on Android Q, and recently assigned the following ticket to me:

Leak Ticket

We automatically upload memory leaks to Bugsnag in our debug builds, so I took a look. Sure enough, the leaks on Q (Android 10) were off the charts:

Leak Log

I then looked at the top Q leaks, and their leak traces were all a variant of this:

┬
├─ android.view.accessibility.AccessibilityNodeIdManager
│    Leaking: NO (it's a GC root and a class is never leaking)
│    ↓ static AccessibilityNodeIdManager.sIdManager
├─ android.view.accessibility.AccessibilityNodeIdManager
│    Leaking: NO (SparseArray↓ is not leaking)
│    ↓ AccessibilityNodeIdManager.mIdsToViews
├─ android.util.SparseArray
│    Leaking: NO (Object[]↓ is not leaking)
│    ↓ SparseArray.mValues
├─ java.lang.Object[]
│    Leaking: NO (DeterminateProgressView↓ is not leaking)
│    ↓ array Object[].[8]
├─ com.squareup.jail.DeterminateProgressView
│    Leaking: NO (View attached)
│    View#mAttachInfo is not null (view attached)
│    View#mParent is set
│    View.mWindowAttachCount=1
│    ↓ DeterminateProgressView.mContext
│                              ~~~~~~~~
╰→ com.squareup.ui.main.MainActivity
​     Leaking: YES (RefWatcher was watching this and
MainActivity#mDestroyed is true)

That's unusual: DeterminateProgressView is attached, yet it's mContext field is a destroyed activity. How could that be? On all other Android releases, all views in the view hierarchy become detached when the activity is destroyed. Did that change?

Note: you might also notice that the trace has "Leaking: NO (View attached)" for DeterminateProgressView. LeakCanary 2 has heuristics built in, it tries to guess if an instance is leaking or not by determining its lifecycle state. An attached view is still in use, so it's not leaking. These heuristics help LeakCanary highlight the likely causes of the leak, see the "~~~~~~~~" under mContext.

I was able to reproduce this leak, so I started debugging and determined that this view indeed belonged to the destroyed activity. Why was it still in an attached state?

Android Q sources (or lack thereof)

It's always easier to blame the Android Framework, so I decided to learn more about the AccessibilityNodeIdManager at the top of the LeakTrace and read its sources. Unfortunately, the Android Q sources aren't available yet, so I got them the fun way:

  • I started my Android Q emulator
  • Then I pulled the framework JAR: adb pull /system/framework/framework.jar
  • The JAR contains dex files. I cloned and built dex2jar from master (the latest release cannot read recent dex versions), then decompiled it: d2j-dex2jar.sh classes.dex
  • This gave me a jar with bytecode. The easiest to view the decompiled sources as Java from there was to simply add the Jar as a library and open up the classes I wanted to look at.

image7

image1

Look at the AccessibilityNodeIdManager, there was nothing obviously wrong. It's a singleton that keeps all views passed to registerViewWithId, until unregisterViewWithId is called

public final class AccessibilityNodeIdManager {
  private static AccessibilityNodeIdManager sIdManager;
  private SparseArray<View> mIdsToViews = new SparseArray();

  private AccessibilityNodeIdManager() {
  }

  public static AccessibilityNodeIdManager getInstance() {
    synchronized(AccessibilityNodeIdManager.class){}

    AccessibilityNodeIdManager var0;
    try {
      if (sIdManager == null) {
        var0 = new AccessibilityNodeIdManager();
        sIdManager = var0;
      }

      var0 = sIdManager;
    } finally {
    }

    return var0;
  }

  public View findView(int param1) {}

  public void registerViewWithId(View param1, int param2) {}

  public void unregisterViewWithId(int param1) {}
}

I looked into the View class to understand when AccessibilityNodeIdManager is called:

public class View {
  protected void onAttachedToWindow() {
    // ...
    int viewId =  getAccessibilityViewId()
    AccessibilityNodeIdManager.getInstance()
        .registerViewWithId(this, viewId);
  }

  protected void onDetachedFromWindowInternal() {
    // ...
    int viewId =  getAccessibilityViewId()
    AccessibilityNodeIdManager.getInstance()
        .unregisterViewWithId(viewId);
  }
}

Assuming there was a problem with how the view attached / detached, this leak now made sense: the view registered but not unregistered because it wasn't detached.

Crazy Town

I added log statements to the onAttachedToWindow() and onDetachedFromWindow() callbacks of both the leaking view (DeterminateProgressView) and it's parent (JailView). Here was the output:

JailView attached
JailView detached
DeterminateProgressView detached
DeterminateProgressView attached

Wait, what 🤯 ? Why is attached called after detached??

Ok so let's start debugging this thing. I put a break point in the ViewGroup.dispatchAttachedToWindow() method on JailView. That method calls dispatchAttachedToWindow() on self and then the children:

   @Override
    void dispatchAttachedToWindow(AttachInfo info, int visibility) {
        super.dispatchAttachedToWindow(info, visibility);

        final int count = mChildrenCount;
        final View[] children = mChildren;
        for (int i = 0; i < count; i++) {
            final View child = children[i];
            child.dispatchAttachedToWindow(info, visibility);
        }
    }

When I hit the breakpoint, I went up the stack to look at the local variables in ViewGroup.dispatchAttachedToWindow() for the parent of JailView (a FrameLayout). Here's what I saw:

image2

Notice how child is set to JailView, but children is an array of one element containing only OrderEntryView. Based on the code above, how could child be different from children[0] 🤯 ?

image4

Actually, there is one explanation that makes sense here, which is that children[0] was initially referencing JailView, then mChildren[0] was updated, which could only have happened from within child.dispatchAttachedToWindow(). But then, how and why would the child update mChildren from dispatchAttachedToWindow()?

Eureka!

I added a breakpoint to JailView.onDetachedFromWindow(), and all of a sudden it all made sense.

As you can see from the arrows, JailView.onAttachedToWindow() ends up calling ViewGroup.removeAllViews() from its parent, and then inserting a new view in its place, which indeed updates mChildren:

image3

JailView is a loading screen that keeps users in "jail" until the app is ready for them. When JailView gets attached, it wires up its presenter which starts loading data if necessary. When the data is ready, the presenter tells the app to navigate to a new view. If the data is already loaded by the time the presenter checks, then it immediately moves to the next view. Note that this is synchronous – we're still attaching JailView, and it's decided to go away while we're doing that.

When JailView receives ViewGroup.dispatchDetachedFromWindow(), it dispatches the detach call to its children. Then the call stack pops, and we continue on attaching JailView – which means ViewGroup.dispatchAttachedToWindow() continues on to attach JailView's children. JailView's children, which have already received a detach call in the middle of all this, now receive their attach call out of order, and end up being stored in AccessibilityNodeIdManager forever. Leak!

Repro

If you want to try this for yourself, I wrote a simple activity that reproduces the behavior and the leak:

import android.app.Activity
import android.os.Bundle
import android.util.Log
import android.view.ViewGroup
import android.widget.Button
import android.widget.FrameLayout
import leakcanary.LeakSentry

class MainActivity : Activity() {

  override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)

    val container = object : FrameLayout(this) {
      override fun onAttachedToWindow() {
        super.onAttachedToWindow()
        Log.d("MainActivity", "Container attached")
        // Calling removeAllViews() on the parent synchronously 
        // triggers a call to onDetachedFromWindow() on this 
        // FrameLayout and recursively its children (which haven't 
        // been attached yet). 
        // Then as soon as this method (onAttachedToWindow()) 
        // is exited  then ViewGroup.dispatchAttachedToWindow() 
        // dispatches to its children (the button). 
        // As a result, the button is receiving its 
        // onDetachedFromWindow() first and then 
        // its onAttachedToWindow().
        // On Android Q Beta this creates a leak because 
        // android.view.View#onAttachedToWindow calls  
        // AccessibilityNodeIdManager.registerViewWithId(); 
        // and then that view is never detached.
        (parent as ViewGroup).removeAllViews()
      }

      override fun onDetachedFromWindow() {
        super.onDetachedFromWindow()
        Log.d("MainActivity", "Container detached")
      }
    }

    container.addView(object : Button(this) {
      override fun onAttachedToWindow() {
        super.onAttachedToWindow()
        Log.d("MainActivity", "Button attached")
      }

      override fun onDetachedFromWindow() {
        super.onDetachedFromWindow()
        Log.d("MainActivity", "Button detached")
        // This tells LeakCanary 2 to ensure the button gets GCed 
        // within 5 seconds.
        // Which it won't, ever, because at this point it's held 
        // forever by AccessibilityNodeIdManager
        LeakSentry.refWatcher.watch(this)
      }
    })

    setContentView(container)
  }
}

Now what?

Is this an Android bug? Hard to say, it's quite unexpected to have a view remove itself from its parent from within its onAttachedToWindow() parent. That being said, a lot of apps still use an MVP architecture and rely on onAttachedToWindow() to wire up presenters. And don't forget Murphy's law: Anything that can go wrong will go wrong.

I've been informed that AccessibilityNodeIdManager will use weak references in a future Android Q release and the leaks from that edge case will disappear 🎉. As for Square POS, we have deprecated our MVP architecture and are actively migrating to Workflow, which does not rely on onAttachedToWindow().