Data Lake Analytics: Job History Analysis

PowerShell snippets that help analyze trends in U-SQL job history. The examples assume that $adla holds the name of your Data Lake Analytics account.

Common scenarios for listing jobs

List jobs that were submitted in the last five days and completed successfully.

$d = (Get-Date).AddDays(-5)
Get-AdlJob -Account $adla -SubmittedAfter $d -State Ended -Result Succeeded

List all failed jobs submitted by "joe@contoso.com" within the past seven days.

Get-AdlJob -Account $adla -Submitter "joe@contoso.com" -SubmittedAfter (Get-Date).AddDays(-7) -Result Failed

Filtering a list of jobs

Once you have a list of jobs in your current PowerShell session, you can filter it with standard PowerShell cmdlets such as Where-Object.

Filter a list of jobs to the jobs submitted in the last 24 hours

$upperdate = Get-Date
$lowerdate = $upperdate.AddHours(-24)
$jobs | Where-Object { $_.SubmitTime -ge $lowerdate -and $_.SubmitTime -le $upperdate }

Filter a list of jobs to the jobs that ended in the last 24 hours

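$upperdate = Get-Date
$lowerdate = $upperdate.AddHours(-24)
# Jobs that are still running have no EndTime, so they are excluded here.
$jobs | Where-Object { $_.EndTime -ge $lowerdate -and $_.EndTime -le $upperdate }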
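The two time-window filters above differ only in which timestamp they test, so they can be folded into one reusable function. The following is a minimal sketch; Select-JobInWindow is a hypothetical helper name (not part of the Data Lake Analytics cmdlets), and the sample jobs are invented [pscustomobject] stand-ins that only mimic the SubmitTime and EndTime properties used above.

```powershell
# Hypothetical helper: pass through jobs whose chosen timestamp property lies in
# the half-open window [Lower, Upper). Jobs where that property is $null (for
# example, the EndTime of a job that is still running) are dropped.
function Select-JobInWindow {
    param(
        [Parameter(ValueFromPipeline = $true)] $Job,
        [string] $Property = 'SubmitTime',
        [Parameter(Mandatory = $true)] [datetime] $Lower,
        [datetime] $Upper = (Get-Date)
    )
    process {
        $t = $Job.$Property
        if ($null -ne $t -and $t -ge $Lower -and $t -lt $Upper) { $Job }
    }
}

# Hypothetical sample jobs standing in for Get-AdlJob output.
$now = Get-Date
$jobs = @(
    [pscustomobject]@{ Name = 'a'; SubmitTime = $now.AddHours(-2);  EndTime = $now.AddHours(-1) }
    [pscustomobject]@{ Name = 'b'; SubmitTime = $now.AddHours(-30); EndTime = $now.AddHours(-3) }
    [pscustomobject]@{ Name = 'c'; SubmitTime = $now.AddHours(-5);  EndTime = $null }
)

$lower = $now.AddHours(-24)
$submitted = $jobs | Select-JobInWindow -Property SubmitTime -Lower $lower -Upper $now
$ended     = $jobs | Select-JobInWindow -Property EndTime -Lower $lower -Upper $now
```

With this data, $submitted holds jobs a and c, while $ended holds a and b, because c has not ended yet.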

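Filter a list of jobs to the jobs that started running and then failed

A job can fail at compile time, in which case it never starts running and its StartTime stays $null. To see only the failed jobs that actually started running before they failed, filter on both StartTime and Result:

$jobs | Where-Object { $null -ne $_.StartTime -and $_.Result -eq 'Failed' }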
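The same StartTime test can also split all failures into two buckets and count them in one pass, which shows at a glance whether failures are mostly compile-time or runtime problems. A minimal sketch, assuming the hypothetical sample jobs below (invented stand-ins for Get-AdlJob output) and the labels BeforeStart and AfterStart, which are illustrative names chosen here; Group-Object accepts a script block as the grouping property.

```powershell
# Hypothetical sample jobs; only the properties used below are filled in.
$now = Get-Date
$jobs = @(
    [pscustomobject]@{ Name = 'bad-syntax'; Result = 'Failed';    StartTime = $null }
    [pscustomobject]@{ Name = 'bad-schema'; Result = 'Failed';    StartTime = $null }
    [pscustomobject]@{ Name = 'out-of-mem'; Result = 'Failed';    StartTime = $now.AddHours(-1) }
    [pscustomobject]@{ Name = 'nightly';    Result = 'Succeeded'; StartTime = $now.AddHours(-2) }
)

# Label each failed job by whether it ever started running, then count per label.
# A $null StartTime means the job failed before it started (e.g. at compile time).
$failuresByPhase = $jobs |
    Where-Object { $_.Result -eq 'Failed' } |
    Group-Object -Property { if ($null -eq $_.StartTime) { 'BeforeStart' } else { 'AfterStart' } } -NoElement |
    Select-Object -Property Count, Name
```

For this data the result is two BeforeStart failures and one AfterStart failure.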

Analyzing a list of jobs

Use the Group-Object cmdlet to analyze a list of jobs.

Count the number of jobs by Submitter

$jobs | Group-Object Submitter | Select-Object -Property Count, Name

Count the number of jobs by Result

$jobs | Group-Object Result | Select-Object -Property Count, Name
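Raw counts per Result are easier to compare across time periods when expressed as a share of all jobs. A minimal sketch, assuming the hypothetical sample jobs below; the Percent column is a calculated property added here for illustration, not a property of the job objects.

```powershell
# Hypothetical sample jobs; only Result is needed here.
$jobs = @(
    [pscustomobject]@{ Name = 'j1'; Result = 'Succeeded' }
    [pscustomobject]@{ Name = 'j2'; Result = 'Succeeded' }
    [pscustomobject]@{ Name = 'j3'; Result = 'Failed' }
    [pscustomobject]@{ Name = 'j4'; Result = 'Cancelled' }
)

# @() guarantees an array, so .Count is correct even for a single job.
$total = @($jobs).Count

# Calculated property: each group's share of all jobs, rounded to one decimal place.
$percent = @{ Label = 'Percent'; Expression = { [math]::Round(100 * $_.Count / $total, 1) } }

$resultShare = $jobs | Group-Object -Property Result -NoElement |
    Select-Object -Property Count, Name, $percent
```

Here Succeeded accounts for 50 percent of the jobs, and Failed and Cancelled for 25 percent each.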

Count the number of jobs by State

$jobs | Group-Object State | Select-Object -Property Count, Name

Count the number of jobs by DegreeOfParallelism

$jobs | Group-Object DegreeOfParallelism | Select-Object -Property Count, Name
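Group-Object also accepts several properties at once, which answers combined questions such as "which submitter has the most failures?". A minimal sketch on hypothetical sample jobs; when grouping on more than one property, each group's Name joins the values with ", ", and sorting by Count puts the largest group first.

```powershell
# Hypothetical sample jobs; only Submitter and Result are needed here.
$jobs = @(
    [pscustomobject]@{ Submitter = 'joe@contoso.com'; Result = 'Failed' }
    [pscustomobject]@{ Submitter = 'joe@contoso.com'; Result = 'Failed' }
    [pscustomobject]@{ Submitter = 'joe@contoso.com'; Result = 'Succeeded' }
    [pscustomobject]@{ Submitter = 'ann@contoso.com'; Result = 'Succeeded' }
)

# Group on two properties, then list the largest groups first.
$bySubmitterAndResult = $jobs |
    Group-Object -Property Submitter, Result -NoElement |
    Sort-Object -Property Count -Descending |
    Select-Object -Property Count, Name
```

The first row is "joe@contoso.com, Failed" with a count of 2.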

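When analyzing jobs, it helps to add calculated properties to the job objects so they are easier to filter and group. The following snippet defines a function, annotate_job, that adds three: AUHours (the degree of parallelism, that is, the number of AUs, multiplied by the hours the job spent running), DurationSeconds (running time in seconds), and DidRun (whether the job ever started).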

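function annotate_job( $j )
{
    # AUs (DegreeOfParallelism) multiplied by hours spent running.
    # Jobs that never started, or have not ended yet, get $null instead of an error.
    $auHours = @{
        Label='AUHours';
        Expression={ if ($_.StartTime -and $_.EndTime) { $_.DegreeOfParallelism * ($_.EndTime - $_.StartTime).TotalHours } } }
    # Running time in seconds ($null if the job never started or has not ended).
    $durationSeconds = @{
        Label='DurationSeconds';
        Expression={ if ($_.StartTime -and $_.EndTime) { ($_.EndTime - $_.StartTime).TotalSeconds } } }
    # True if the job got past compilation and started running.
    $didRun = @{
        Label='DidRun';
        Expression={ $null -ne $_.StartTime } }

    # Copy every original property and append the calculated ones.
    $j | Select-Object -Property *, $auHours, $durationSeconds, $didRun
}

$jobs = Get-AdlJob -Account $adla -Top 10
$jobs = $jobs | ForEach-Object { annotate_job $_ }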
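Once jobs carry an AUHours property, Group-Object and Measure-Object can total resource usage per submitter. The sketch below is self-contained, so it recomputes AUHours inline on hypothetical sample jobs instead of calling annotate_job; unlike annotate_job, it assumes jobs that never started count as 0 AU-hours so that the sums stay numeric.

```powershell
# Hypothetical sample jobs standing in for Get-AdlJob output.
$now = Get-Date
$jobs = @(
    [pscustomobject]@{ Submitter = 'joe@contoso.com'; DegreeOfParallelism = 10; StartTime = $now.AddHours(-3); EndTime = $now.AddHours(-1) }
    [pscustomobject]@{ Submitter = 'joe@contoso.com'; DegreeOfParallelism = 5;  StartTime = $now.AddHours(-2); EndTime = $now.AddHours(-1) }
    [pscustomobject]@{ Submitter = 'ann@contoso.com'; DegreeOfParallelism = 4;  StartTime = $null;             EndTime = $now }
)

# Same AU-hours formula as annotate_job, but jobs that never ran count as 0.
$auHours = @{
    Label      = 'AUHours'
    Expression = { if ($_.StartTime -and $_.EndTime) { $_.DegreeOfParallelism * ($_.EndTime - $_.StartTime).TotalHours } else { 0 } }
}
$withAU = $jobs | Select-Object -Property *, $auHours

# One summary row per submitter: job count and total AU-hours, largest first.
$auBySubmitter = $withAU | Group-Object -Property Submitter | ForEach-Object {
    [pscustomobject]@{
        Submitter = $_.Name
        Jobs      = $_.Count
        AUHours   = ($_.Group | Measure-Object -Property AUHours -Sum).Sum
    }
} | Sort-Object -Property AUHours -Descending
```

With this data, joe@contoso.com comes first with 25 AU-hours (10 AUs for 2 hours plus 5 AUs for 1 hour).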