Analyze Website logs using Azure Data Lake Analytics

Learn how to analyze website logs using Data Lake Analytics, especially on finding out which referrers ran into errors when they tried to visit the website.

Important

Azure Data Lake Analytics retired on 29 February 2024. Learn more with this announcement.

For data analytics, your organization can use Azure Synapse Analytics or Microsoft Fabric.

Prerequisites

Connect to Azure

Before you can build and test any U-SQL scripts, you must first connect to Azure.

To connect to Data Lake Analytics

  1. Open Visual Studio.
  2. Select Data Lake > Options and Settings.
  3. Select Sign In, or Change User if someone has signed in, and follow the instructions.
  4. Select OK to close the Options and Settings dialog.

To browse your Data Lake Analytics accounts

  1. From Visual Studio, open Server Explorer by press CTRL+ALT+S.
  2. From Server Explorer, expand Azure, and then expand Data Lake Analytics. You shall see a list of your Data Lake Analytics accounts if there are any. You can't create Data Lake Analytics accounts from the studio. To create an account, see Get Started with Azure Data Lake Analytics using Azure portal or Get Started with Azure Data Lake Analytics using Azure PowerShell.

Develop U-SQL application

A U-SQL application is mostly a U-SQL script. To learn more about U-SQL, see Get started with U-SQL.

You can add addition user-defined operators to the application. For more information, see Develop U-SQL user defined operators for Data Lake Analytics jobs.

To create and submit a Data Lake Analytics job

  1. Select the File > New > Project.

  2. Select the U-SQL Project type.

    new U-SQL Visual Studio project

  3. Select OK. Visual studio creates a solution with a Script.usql file.

  4. Enter the following script into the Script.usql file:

    // Create a database for easy reuse, so you don't need to read from a file very time.
    CREATE DATABASE IF NOT EXISTS SampleDBTutorials;
    
    // Create a Table valued function. TVF ensures that your jobs fetch data from he weblog file with the correct schema.
    DROP FUNCTION IF EXISTS SampleDBTutorials.dbo.WeblogsView;
    CREATE FUNCTION SampleDBTutorials.dbo.WeblogsView()
    RETURNS @result TABLE
    (
        s_date DateTime,
        s_time string,
        s_sitename string,
        cs_method string,
        cs_uristem string,
        cs_uriquery string,
        s_port int,
        cs_username string,
        c_ip string,
        cs_useragent string,
        cs_cookie string,
        cs_referer string,
        cs_host string,
        sc_status int,
        sc_substatus int,
        sc_win32status int,
        sc_bytes int,
        cs_bytes int,
        s_timetaken int
    )
    AS
    BEGIN
    
        @result = EXTRACT
            s_date DateTime,
            s_time string,
            s_sitename string,
            cs_method string,
            cs_uristem string,
            cs_uriquery string,
            s_port int,
            cs_username string,
            c_ip string,
            cs_useragent string,
            cs_cookie string,
            cs_referer string,
            cs_host string,
            sc_status int,
            sc_substatus int,
            sc_win32status int,
            sc_bytes int,
            cs_bytes int,
            s_timetaken int
        FROM @"/Samples/Data/WebLog.log"
        USING Extractors.Text(delimiter:' ');
        RETURN;
    END;
    
    // Create a table for storing referrers and status
    DROP TABLE IF EXISTS SampleDBTutorials.dbo.ReferrersPerDay;
    @weblog = SampleDBTutorials.dbo.WeblogsView();
    CREATE TABLE SampleDBTutorials.dbo.ReferrersPerDay
    (
        INDEX idx1
        CLUSTERED(Year ASC)
        DISTRIBUTED BY HASH(Year)
    ) AS
    
    SELECT s_date.Year AS Year,
        s_date.Month AS Month,
        s_date.Day AS Day,
        cs_referer,
        sc_status,
        COUNT(DISTINCT c_ip) AS cnt
    FROM @weblog
    GROUP BY s_date,
            cs_referer,
            sc_status;
    

    To understand the U-SQL, see Get started with Data Lake Analytics U-SQL language.

  5. Add a new U-SQL script to your project and enter the following:

    // Query the referrers that ran into errors
    @content =
        SELECT *
        FROM SampleDBTutorials.dbo.ReferrersPerDay
        WHERE sc_status >=400 AND sc_status < 500;
    
    OUTPUT @content
    TO @"/Samples/Outputs/UnsuccessfulResponses.log"
    USING Outputters.Tsv();
    
  6. Switch back to the first U-SQL script and next to the Submit button, specify your Analytics account.

  7. From Solution Explorer, right select Script.usql, and then select Build Script. Verify the results in the Output pane.

  8. From Solution Explorer, right select Script.usql, and then select Submit Script.

  9. Verify the Analytics Account is the one where you want to run the job, and then select Submit. Submission results and job link are available in the Data Lake Tools for Visual Studio Results window when the submission is completed.

  10. Wait until the job is completed successfully. If the job failed, it's most likely missing the source file. See the Prerequisite section of this tutorial. For more troubleshooting information, see Monitor and troubleshoot Azure Data Lake Analytics jobs.

    When the job is completed, you shall see the following screen:

    data lake analytics analyze weblogs website logs

  11. Now repeat steps 7- 10 for Script1.usql.

To see the job output

  1. From Server Explorer, expand Azure, expand Data Lake Analytics, expand your Data Lake Analytics account, expand Storage Accounts, right-click the default Data Lake Storage account, and then select Explorer.
  2. Double-click Samples to open the folder, and then double-click Outputs.
  3. Double-click UnsuccessfulResponses.log.
  4. You can also double-click the output file inside the graph view of the job in order to navigate directly to the output.

Next steps

To get started with Data Lake Analytics using different tools, see: