The amount of data and information on the internet grows rapidly day by day: new websites and new images appear every second. So how can an e-commerce organisation make the best of this huge volume of data? This is where Web Analytics comes into play.

In this blog, we are going to cover the following:

  • What is Web Analytics?
  • Metrics used for Web Analytics
  • What is Predictive Web Analytics?
  • Steps to perform Predictive Web Analytics
  • Case Study: Build a Predictive Model using Web Data
  • Conclusion

So, let’s start…

What is Web Analytics?

Web analytics is the collection, reporting, and analysis of website data. It starts by identifying measures based on your organizational and user goals, then uses the website data to determine whether those goals are being met, and finally derives a strategy to achieve them and improve the user’s experience.

Metrics used for Web Analytics

Some common web metrics that Web Analytics experts track include:

  • The number of visitors a website receives
  • Where the web traffic is coming from
  • The time spent by the user on each page
  • Which links are and are not clicked
  • How well a website performs in search engine results

For more details, please visit:

https://www.cooladata.com/wiki/display/webanalyticsbi/Web+Analytics+Metrics

What is Predictive Web Analytics?

Based on these collected metrics, we can predict certain customer behaviour using predictive modelling, so that we can take corrective action to achieve our targets.

Predictive Analytics is thus a set of methodologies that help us anticipate customer behaviour. Some of the reasons to opt for Predictive Analytics strategies are:

  • Traditional Web Analytics tools generate tons of clickstream data; Predictive Analytics helps filter out the noise and go beyond aggregate-level metrics.
  • Analytical models help you understand the complex patterns between the various data points, which can become the basis of your decision-making process.
  • It helps you prepare a data-driven marketing plan and allocate investments properly.

A few examples of Predictive Web Analytics are given below:

A company can use the predictions of a predictive model to maximize engagement rates for its content by sending personalized messages to target audiences, which helps increase the ROI of its content marketing efforts.

The same principle can be applied to run tighter email campaigns. You can let machine-learning algorithms personalize subject lines according to demographics, time of day, and other factors.

Steps to perform Predictive Web Analytics

Now, in order to perform Predictive Analytics, you will require the following:

Objective: The business problem that we want to solve.

Data: The right data required to solve the business problem. A user-centric business model helps here, because it gives you rich data about your customers’ behaviour.

Methodology: Once you have the data and a clear objective, you can start thinking about the statistical method you will use to build the prediction model. For example, through cluster analysis we can group users with similar behaviour and plan a marketing strategy to acquire those customers, or through logistic regression we can predict whether a customer will buy a plan or not.

Tool: There are a variety of predictive analytics tools available. KNIME, RStudio, Alteryx Platform, MATLAB, IBM SPSS, Python and SAP Analytics Cloud are a few of them. We have to select the tool that best suits our in-house analytics talent pool and our allocated budget.

Here, we are going to build a predictive model using a dataset published in the UCI Machine Learning Repository.

Case Study

We are going to build a predictive model using data on customer visits to a website.

Please refer here to get the dataset and details about it.

Dataset Information:

The dataset consists of data points belonging to 12,330 sessions of customer visits to the website. The dataset was formed so that each session would belong to a different user in a 1-year period, to avoid any tendency towards a specific campaign, special day, user profile, or period.

Attribute Information:

The dataset consists of 10 numerical and 8 categorical attributes.
‘Revenue’: Class label. Possible values: False and True.

“Administrative”, “Administrative Duration”: Represent the number of Administrative pages visited by the visitor in that session and the total time spent in this page category.

“Informational”, “Informational Duration”: Represent the number of Information related pages visited by the visitor in that session and the total time spent in this page category.

“Product Related” and “Product Related Duration”: Represent the number of Product Related pages visited by the visitor in that session and the total time spent in this page category.

“Bounce Rate” refers to the percentage of visitors who enter the site from that page and then leave without triggering any other requests to the analytics server during that session.

“Exit Rate” depicts the percentage of exits on a page.

“Page Value” feature represents the average value for a web page that a user visited before completing an e-commerce transaction.

“Special Day” feature indicates the closeness of the site visiting time to a specific special day.

The dataset also includes some other features such as operating system, browser, region, traffic type, visitor type (returning or new visitor), a Boolean value indicating whether the date of the visit is a weekend, and the month of the year.

Objective: To build a predictive model that decides whether the customer will buy or not. This means the variable Revenue is the Response Variable and the others are the Predictor Variables.

Step 1: Import all the required libraries

Step 2: Load the required dataset

Step 3: Get the size of the dataset

Step 4: Get the first 10 records from the dataset

Step 5: Get the descriptive statistics of the dataset
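
Steps 2 to 5 can be sketched roughly as follows. This is only an illustration: it builds a small synthetic table instead of reading the real file, and the column names are my assumption of how the UCI file names them. With the actual dataset you would call pd.read_csv on the downloaded CSV instead.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real sessions table (assumed column names).
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Administrative_Duration": rng.exponential(80.0, n),
    "BounceRates": rng.uniform(0.0, 0.2, n),
    "VisitorType": rng.choice(["Returning_Visitor", "New_Visitor"], n),
    "Weekend": rng.choice([True, False], n),
    "Revenue": rng.choice([True, False], n, p=[0.15, 0.85]),
})

print(data.shape)       # size of the dataset as (rows, columns)
print(data.head(10))    # first 10 records
print(data.describe())  # descriptive statistics of the numeric columns
```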

Step 6: Count the missing values
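
A minimal sketch of counting missing values per column with pandas. The tiny table and its values are made up for illustration; only the isnull().sum() pattern matters.

```python
import numpy as np
import pandas as pd

# Made-up rows with a few gaps (NaN) to count.
data = pd.DataFrame({
    "BounceRates": [0.0, 0.02, np.nan, 0.05],
    "ExitRates": [0.1, np.nan, np.nan, 0.2],
    "Revenue": [False, True, False, False],
})

# isnull() marks every missing cell; sum() counts them column by column.
missing_counts = data.isnull().sum()
print(missing_counts)
```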

Step 7: Plotting the percentage of customers who generated revenue. ‘True’ means the customer bought the product and ‘False’ means the customer did not buy the product.
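
Before plotting, the percentages themselves can be computed as in the sketch below. The ten Revenue values are invented for illustration, and the plotting call is left out so the example stays self-contained.

```python
import pandas as pd

# Invented sample: 2 buyers out of 10 sessions.
revenue = pd.Series(
    [True, False, False, False, True, False, False, False, False, False],
    name="Revenue",
)

# Share of each class, expressed as a percentage.
revenue_pct = revenue.value_counts(normalize=True) * 100
print(revenue_pct)

# The mean of a Boolean column is the fraction of True values.
pct_buyers = revenue.mean() * 100
print(pct_buyers)
```

The resulting series can be passed to a bar or pie chart function of your plotting library of choice.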

Step 8: Distribution of VisitorType


Step 9: Percentage distribution of ‘VisitorType’ over the ‘Weekend’

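
One way to get this percentage distribution is a normalized cross-tabulation. The sketch below uses six invented sessions; the column names VisitorType and Weekend are taken from the step titles above.

```python
import pandas as pd

# Invented sessions: four on weekdays, two on the weekend.
data = pd.DataFrame({
    "VisitorType": ["Returning_Visitor", "Returning_Visitor", "New_Visitor",
                    "Returning_Visitor", "New_Visitor", "Returning_Visitor"],
    "Weekend": [False, False, False, False, True, True],
})

# normalize="index" makes every Weekend row sum to 1; multiply for percent.
pct = pd.crosstab(data["Weekend"], data["VisitorType"], normalize="index") * 100
print(pct.round(1))
```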

Step 10: Distribution of Revenue (Buy or Not) for different Traffic Types

Step 11: Distribution of Customers based on Different Traffic Type Codes

Step 12: Distribution of Customers based on Region Codes

Step 13: Distribution of Customers over OperatingSystems

Step 14: Distribution of Customers over Months

Step 15: Distribution of PageValues over Revenue. seaborn.stripplot draws a scatter plot in which one variable is categorical.

Step 16: Distribution of Revenue over BounceRates

Step 17: Distribution of TrafficType over Revenue

Step 18: Distribution of Region over Revenue

Step 19: Linear regression plot between Administrative and Informational

Step 20: Multivariate analysis

Step 21: Month vs PageValues with respect to Revenue

Step 22: Month vs BounceRates with respect to Revenue

Step 23: VisitorType vs ExitRates with respect to Revenue

Step 24: The goal of cluster analysis in marketing is to accurately segment customers in order to achieve more effective customer marketing via personalization. A common cluster analysis method is a mathematical algorithm known as k-means cluster analysis, sometimes referred to as scientific segmentation.

Clusters of customers: Administrative Duration vs Bounce Rate. We have considered column 1 as Administrative Duration and column 6 as Bounce Rate. In total, we have built 11 clusters.

WCSS: One measurement is the Within-Cluster Sum of Squares (WCSS), which is the sum of the squared distances between each point in a cluster and that cluster’s centroid, added up over all clusters. To calculate WCSS, you first find the Euclidean distance (see figure below) between a given point and the centroid to which it is assigned, then square these distances and sum them.
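
Written as a formula, the usual definition of WCSS looks like this. The symbols are my own notation, not taken from the original notebook: k is the number of clusters, C_j is the set of points assigned to cluster j, and \mu_j is the centroid of that cluster.

```latex
\mathrm{WCSS} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```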

Here, the Elbow method is a graph of WCSS against the number of clusters.
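
The elbow computation can be sketched as below. This is not the original notebook code: it uses three synthetic blobs of points instead of the two dataset columns, and the k-means settings (k-means++ initialisation, 10 restarts, k from 1 to 10) are assumptions. scikit-learn exposes WCSS as the inertia_ attribute of a fitted KMeans model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated synthetic blobs stand in for the two chosen columns.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    km.fit(X)
    wcss.append(km.inertia_)  # sum of squared distances to the nearest centroid

# Plotting k against wcss gives the elbow graph; here we just print the values.
for k, value in zip(range(1, 11), wcss):
    print(k, round(value, 1))
```

On data like this the values drop sharply up to k = 3 and flatten afterwards, which is the "elbow" we look for.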

Step 25: The maximum bend is at the third index, so the optimal number of clusters for Administrative Duration and Bounce Rate is three. Plotting the clusters:

Step 26: We have considered column 3 as Informational Duration and column 6 as Bounce Rate.

Step 27: Here, we have 2 clusters

Step 28: Where customers come from: Region vs Traffic Type

Step 29: Data preprocessing to build the Random Forest classifier and the Logistic Regression model. Here, we want to predict whether the customer will buy or not, so we have used binary classifiers.
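
A rough sketch of what such preprocessing can look like. The exact encoding used in the original notebook is not shown here, so one-hot encoding of the text columns and a 0/1 target are assumptions, as are the column names and the four invented rows.

```python
import pandas as pd

# Four invented sessions with a mix of numeric, text and Boolean columns.
data = pd.DataFrame({
    "PageValues": [0.0, 12.5, 0.0, 30.1],
    "Month": ["Feb", "Nov", "May", "Nov"],
    "VisitorType": ["Returning_Visitor", "New_Visitor",
                    "Returning_Visitor", "New_Visitor"],
    "Weekend": [False, True, False, False],
    "Revenue": [False, True, False, True],
})

# Response variable: True/False becomes 1/0.
y = data["Revenue"].astype(int)

# Predictors: drop the response and one-hot encode the text columns.
X = pd.get_dummies(data.drop(columns="Revenue"), columns=["Month", "VisitorType"])
X["Weekend"] = X["Weekend"].astype(int)

print(X.columns.tolist())
print(y.tolist())
```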

Step 30: Splitting the data into train and test sets
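
A minimal sketch of the split with scikit-learn. The 70/30 ratio and the stratification on the target are my assumptions, not values taken from the original notebook, and the arrays are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 100 sessions, 2 features, 15% buyers.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 85 + [1] * 15)

# stratify=y keeps the buyer / non-buyer ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())
```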

Step 31: Building the Random Forest classifier model
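
The sketch below shows the general shape of this step on synthetic data generated by make_classification, not on the real sessions. The hyperparameters (100 trees, fixed random seed) are assumptions, so the accuracy it prints will not match the figure reported in this blog.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary problem standing in for Revenue True/False.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    n_clusters_per_class=1, weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)
print("Test accuracy:", rf.score(X_test, y_test))
```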

Step 32: Confusion Matrix. The model accuracy is 89%.
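
To show how a confusion matrix and the accuracy relate, here is a small hand-made example. The ten labels are invented and are unrelated to the 89% reported above.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Invented ground truth and predictions (1 = bought, 0 = did not buy).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()
print(cm)  # [[5 1]
           #  [2 2]]

# Accuracy is the share of correct predictions: (tn + tp) / total = 7 / 10.
acc = accuracy_score(y_true, y_pred)
print(acc)
```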

Step 33: Plotting the ROC curve for Random Forest

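
The points of a ROC curve come from roc_curve in scikit-learn. The sketch uses four invented scores; with a fitted classifier you would typically pass the second column of predict_proba as the scores. The plotting itself is left out.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Invented labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (false positive rate, true positive rate) pair per threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc_value = roc_auc_score(y_true, y_score)

print(fpr, tpr)
print(auc_value)  # 0.75
```

Plotting fpr on the x-axis against tpr on the y-axis gives the ROC curve; the area under it is the AUC.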

Step 34: Saving the predictions of the Random Forest model into a dataframe, which can later be written to a .csv file, so that we can know which customers we will get revenue from.
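
A sketch of collecting predictions in a dataframe and writing them out as CSV. The session ids, labels and column names are hypothetical, and the CSV is written to an in-memory buffer here; passing a file path to to_csv would write a real file.

```python
import io

import pandas as pd

# Hypothetical test labels indexed by session id, plus model predictions.
y_test = pd.Series([0, 1, 0, 1], index=[17, 42, 8, 99], name="Revenue")
y_pred = [0, 1, 1, 1]

predictions = pd.DataFrame(
    {"actual": y_test, "predicted": y_pred}, index=y_test.index
)

# Write to a buffer instead of disk so the example has no side effects.
buffer = io.StringIO()
predictions.to_csv(buffer, index_label="session")
print(buffer.getvalue())
```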

Step 35: Building Logistic Regression model

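
As with the Random Forest, the following is only a sketch on synthetic data. Scaling the features first and raising max_iter are my own choices to help the solver converge; they are not taken from the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary problem standing in for Revenue True/False.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    n_clusters_per_class=1, weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardise the features, then fit the logistic regression.
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
logreg.fit(X_train, y_train)

buy_probability = logreg.predict_proba(X_test)[:, 1]
print("Test accuracy:", logreg.score(X_test, y_test))
```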

Step 36: Printing Confusion Matrix


Step 37: Plotting Confusion Matrix


Step 38: Printing the Classification Report. The accuracy of Logistic Regression is 87%.
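
The classification report adds precision and recall per class to the overall accuracy. The labels below are invented for illustration and the class names are hypothetical; they do not reproduce the 87% above.

```python
from sklearn.metrics import classification_report

# Invented ground truth and predictions (1 = bought, 0 = did not buy).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
names = ["No purchase", "Purchase"]

# Text version for printing.
print(classification_report(y_true, y_pred, target_names=names))

# Dictionary version for picking out single numbers.
report = classification_report(
    y_true, y_pred, target_names=names, output_dict=True
)
print(report["Purchase"]["recall"])  # 2 of the 4 real buyers were found
```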

Step 39: Plotting ROC curve for Logistic Regression


Step 40: Saving the predictions of the Logistic Regression model into a dataframe

Step 41: Plotting ROC curve for both Random Forest and Logistic Regression

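
To compare the two models on the same footing, we can compute the area under the ROC curve for each on the same test set. The sketch uses synthetic data and assumed hyperparameters, so the numbers only illustrate the procedure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary problem standing in for Revenue True/False.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    n_clusters_per_class=1, weights=[0.85, 0.15], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # probability of class 1
    auc_scores[name] = roc_auc_score(y_test, scores)
    print(name, round(auc_scores[name], 3))
```

Passing each model's scores to roc_curve and drawing both curves on one set of axes gives the combined plot.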

Conclusion

In this blog, we learnt about Predictive Web Analytics and the various metrics used for it. We then took a case study, performed data visualizations, made clusters based on customer behaviour, and built two predictive models: a Random Forest classifier and a Logistic Regression classifier. We compared the performance of both models using the Confusion Matrix and the ROC curve, and wrote the predictions from both models into their respective dataframes. By writing those prediction outputs to csv files, business decision makers can see which customers are predicted to generate revenue and which are not.

Hope you enjoyed this article.

So, what’s next for you? Try tuning the performance of both models and let me know the metrics in the comments box…

See you in our next blog…till then, Happy Learning…Stay tuned!