Using neural networks to predict customers’ needs

Deep learning for browsing and path analysis.

Square provides a powerful analytics tool called Seller Dashboard that gives our sellers insights into, and tracking of, their businesses. Using this tool, Square sellers can view their daily sales summary, manage inventory, confirm deposits, and do much more to untangle complicated tasks they face every day.

The primary goal of the Seller Dashboard team here at Square is to promptly address our sellers’ problems when they arise, while also continuing to enhance and build new features for this product. In order to do this well, we first have to understand whether Seller Dashboard is being used effectively by our sellers.

Because every seller uses their dashboard differently, providing support for all sellers’ specific problems can be time-consuming and resource-intensive for both sellers and Square. So we looked into automating custom, individualized solutions for each seller.


As sellers click through and view Square’s web pages, Square analyzes this behavior. The resulting behavioral data includes important cues that allow us to understand how sellers use their dashboard. Given the vast amount of data involved, the biggest challenge was compressing and aggregating it in a way that could provide important insights for the business.

We looked into various supervised and unsupervised machine learning techniques in order to analyze the web page flow data. Traditional analytic techniques, including n-gram analysis and building Markov chains, provided valuable insights about which pages are most heavily used and which are closely linked together. What turned out to be most fruitful in learning the context of this complicated customer behavior, however, was a neural network model that could detect usage patterns indicative of sellers experiencing a problem.
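To make the traditional techniques concrete, here is a minimal sketch of bigram counting and a first-order Markov chain estimated from page sequences. The page names and the `sessions` data are hypothetical, made up for illustration; this is not the analysis code used at Square.

```python
from collections import Counter, defaultdict


def bigram_counts(sessions):
    """Count how often each ordered pair of consecutive pages occurs."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


def transition_probabilities(sessions):
    """Estimate first-order Markov probabilities P(next page | current page)."""
    pair_counts = bigram_counts(sessions)
    # Total number of observed transitions leaving each page.
    totals = defaultdict(int)
    for (current, _), n in pair_counts.items():
        totals[current] += n
    probs = defaultdict(dict)
    for (current, nxt), n in pair_counts.items():
        probs[current][nxt] = n / totals[current]
    return dict(probs)


# Hypothetical page flows, one list per browsing session.
sessions = [
    ["home", "sales", "deposits"],
    ["home", "sales", "items"],
    ["home", "items", "help"],
]

counts = bigram_counts(sessions)
probs = transition_probabilities(sessions)

# The most common page pairs show which pages are closely linked.
print(counts.most_common(3))
print(probs["home"])
```

Counts like these show which pages are used together, but each transition only looks one page back, which is one reason a sequence model can pick up context that a Markov chain misses.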


Artificial neural networks, which loosely mimic the behavior of biological neurons, are a powerful tool for pattern recognition, and in recent years they have outperformed traditional statistical learning tools on many tasks. Feed-forward neural networks perform well in computer vision and speech recognition by picking up spatial features of the data, while recurrent neural networks can learn dependencies in sequential data such as text and music.

I was inspired by François Chollet, a primary author of Keras, one of the most popular open-source neural network libraries, who used an LSTM (Long Short-Term Memory) network, a type of recurrent neural network, to classify IMDb movie reviews*. In that example, the sentiment of a review could be extracted with high precision from user-entered free-form text.

The problem at hand was similar in that we wanted a binary classifier that could detect signals in a sequence. Think of each page on the website as a word token, and of whether someone needs help as the positive or negative outcome, and we have the same problem the aforementioned example strives to solve. There were some final hurdles to clear, including an imbalanced dataset, since only a small fraction of the population requires help while using our product. We mitigated this by exploring traditional resampling options, such as undersampling the majority class or oversampling the minority class, to create a more balanced sample before training our model.
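The framing above can be sketched in a few lines: pages become integer tokens, sequences are padded to a fixed length so a sequence model can consume them, and the minority class is randomly oversampled. Everything here is an assumption for illustration: the page names, the labels, the `max_len` value, and the choice of simple random oversampling are not taken from our production pipeline.

```python
import random


def build_vocabulary(sessions):
    """Map each distinct page to a positive integer; 0 is reserved for padding."""
    vocab = {}
    for pages in sessions:
        for page in pages:
            if page not in vocab:
                vocab[page] = len(vocab) + 1
    return vocab


def encode(pages, vocab, max_len):
    """Turn a page sequence into a fixed-length list of integer tokens.

    Keeps the last max_len pages and left-pads shorter sequences with 0.
    """
    ids = [vocab[page] for page in pages][-max_len:]
    return [0] * (max_len - len(ids)) + ids


def oversample_minority(examples, labels, seed=0):
    """Randomly duplicate minority-class examples until both classes match in size.

    Assumes binary labels (0 or 1) and at least one example of each class.
    """
    rng = random.Random(seed)
    pos = [x for x, y in zip(examples, labels) if y == 1]
    neg = [x for x, y in zip(examples, labels) if y == 0]
    if len(pos) < len(neg):
        minority, majority, minority_label = pos, neg, 1
    else:
        minority, majority, minority_label = neg, pos, 0
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    balanced = [(x, minority_label) for x in minority + extra]
    balanced += [(x, 1 - minority_label) for x in majority]
    rng.shuffle(balanced)
    xs, ys = zip(*balanced)
    return list(xs), list(ys)


# Hypothetical sessions; label 1 means the seller went on to need help.
sessions = [
    ["home", "sales"],
    ["home", "items", "help", "items"],
    ["home", "deposits"],
    ["sales", "home"],
]
labels = [0, 1, 0, 0]

vocab = build_vocabulary(sessions)
encoded = [encode(pages, vocab, max_len=3) for pages in sessions]
balanced_x, balanced_y = oversample_minority(encoded, labels)
print(encoded)
print(balanced_y)
```

Resampling of this kind is usually applied to the training split only, so that evaluation still reflects the real class balance.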


Predicting human behavior is a difficult problem because people react differently even when they appear to be performing the same set of actions. For example, some people actively seek help through support documents; others reach out to colleagues, or find other ways to resolve their problems. Despite this, we were able to detect signals for support with a high recall of 66%, while keeping our fall-out (the false positive rate) at around 24%. This is a huge improvement on the 5% recall we would otherwise get with random guesses. It also significantly outperforms Markov chain models and heuristic approaches.
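For readers less familiar with these two metrics, recall is the share of sellers who needed help that the model flagged, and fall-out is the share of sellers who did not need help but were flagged anyway. A small sketch with made-up labels (not our data) shows how both are computed:

```python
def recall_and_fallout(y_true, y_pred):
    """Return (recall, fall-out) for binary labels where 1 means 'needs help'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # flagged by mistake
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly left alone
    recall = tp / (tp + fn)    # true positive rate
    fallout = fp / (fp + tn)   # false positive rate
    return recall, fallout


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

recall, fallout = recall_and_fallout(y_true, y_pred)
print(f"recall={recall:.2f}, fall-out={fallout:.2f}")
```

Sweeping the classification threshold and plotting recall against fall-out at each setting is what produces an ROC curve.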

Figure 1. ROC curve for Model Performance


The ability to correctly identify signals from the page flow is highly valuable beyond the scope of this analysis. Besides helping sellers promptly with better support, this methodology could be used to collect additional signals and strengthen Square’s focus on automation to enhance customer experience.

