Azure ML Studio - Build a Predictive ML Model

Updated: Feb 25

This post demonstrates how to use Azure ML Studio to build a targeted marketing campaign for Adventure Works, by predicting if a customer is likely to buy a bicycle or not.

Required:


A SQL pool pre-loaded with AdventureWorksDW2017 sample data.

Click here to Download the AdventureWorksDW2017.

I restored the AdventureWorksDW2017.bak to my local instance of SQL.

I then deployed it to my Azure SQL Server by right clicking on the AdventureWorksDW2017 Database > All Tasks > Deploy Database to Microsoft SQL Azure Database. Edit your Azure SQL connection string and deploy the DB.



The data we will use is in the dbo.vTargetMail view in the AdventureWorksDW database. To read this data:

1. Sign into Azure Machine Learning studio and click on my experiments.

2. Click +NEW on the bottom left of the screen and select Blank Experiment.

3. Enter a name for your experiment: Targeted Marketing.

4. Drag the Import data module under Data Input and output from the modules pane into the canvas.

5. Specify the details of your SQL pool in the Properties pane.

6. Specify the database query to read the data of interest.

Use the following query:


SELECT [CustomerKey]

,[GeographyKey]

,[CustomerAlternateKey]

,[MaritalStatus]

,[Gender]

,cast ([YearlyIncome] as int) as SalaryYear

,[TotalChildren]

,[NumberChildrenAtHome]

,[EnglishEducation]

,[EnglishOccupation]

,[HouseOwnerFlag]

,[NumberCarsOwned]

,[CommuteDistance]

,[Region]

,[Age]

,[BikeBuyer]

FROM [dbo].[vTargetMail]


After the experiment finishes running successfully, click the output port at the bottom of the Reader module and select Visualize to see the imported data.


To clean the data, drop some columns that are not relevant for the model. To do this:

1. Drag the Select Columns in Dataset module under Data Transformation < Manipulation into the canvas. Connect this module to the Import Data module.

2. Click Launch column selector in the Properties pane to specify which columns you wish to drop.


Exclude two columns: CustomerAlternateKey and GeographyKey.


We will split the data 80-20: 80% to train a machine learning model and 20% to test the model. We will make use of the "Two-Class Boosted Decision Tree" algorithm for this binary classification problem.


1. Drag the Split module into the canvas.

2. In the properties pane, enter 0.8 for Fraction of rows in the first output dataset.


Drag the Two-Class Boosted Decision Tree module into the canvas.

Drag the Train Model module into the canvas and specify inputs by connecting it to the Two-Class Boosted Decision Tree (ML algorithm) and Split (data to train the algorithm on) modules.


Then, click Launch column selector in the Properties pane. Select the BikeBuyer column as the column to predict.


Drag the Score Model module into the canvas and connect it to Train Model and Split Data modules.


To look at what the model predicted, click on output port of the Score Model and click Visualize.

You will see two more columns added to your test dataset.

· Scored Probabilities: the likelihood that a customer is a bike buyer.

· Scored Labels: the classification done by the model – bike buyer (1) or not (0). This probability threshold for labeling is set to 50% and can be adjusted.

Comparing the column BikeBuyer (actual) with the Scored Labels (prediction), you can see how well the model has performed.


Create a Table to export the Results – Run the following SQL Code on your Azure SQL Server from SQL Management Studio


USE AdentureWorksDW

GO


CREATE TABLE [dbo].[ML_vTargetMail](

[CustomerKey] [int] NOT NULL,

[Scored Labels] [bit] NOT NULL,

[Scored Probabilities] [decimal](7, 6) NOT NULL

) ON [PRIMARY]

GO


Drag the Export Data module into the canvas, create a connection to your AdventureWorksDW2017 Database.


Comma Seperated list of columns - CustomerKey,Scored Labels,Scored Probabilities

Data Table Name - ML_vTargetMail

Comma Seperated List of Datatable Columns - CustomerKey,[Scored Labels],[Scored Probabilities]


Run the Experiment – In your Azure SQL Server, you will see the results written to the table ML_vTargetMail.


Right click on the Train Model module > Trained Model > Save as Trained Model.

You now have a trained Model that can be used for Targeted Mail Campaigns given a new subset of data.



27 views0 comments