Questions? Feedback? powered by Olark live chat software
Přeskočit navigaci

Working with Dates in Azure DocumentDB

Publikováno dne 19 listopadu, 2014

Director of Program Management, DocumentDB
JSON (www.json.org) is a lightweight data-interchange format that is easy for humans to read and write yet also easy for machines to parse and generate. JSON is at the heart of DocumentDB. We transmit JSON over the wire, we store JSON as JSON, and we index the JSON tree allowing queries on the full JSON document. It is not something we bolt on to the product, it is core to the service. It should therefore come as no surprise that DocumentDB natively supports the same data types as JSON; String, Number, Boolean, Array, Object and Null. 1 A .NET developer might notice the omission of certain types they are used to. Most notable is probably the .NET DateTime type. So, how do we store, retrieve, and query using DateTime if the database doesn’t support these types natively? Keep reading as that is the purpose of this post … For the purposes of this post let us assume we’re working with the following sample POCO objects.
public class Order
{
    [JsonProperty(PropertyName="id")]
    public string OrderNumber { get; set; }
    public DateTime OrderDate { get; set; }      
    public DateTime ShipDate { get; set; }                
    public double Total { get; set; }
}
This is a simple demonstration of how an Order could be represented in .NET. The order has two DateTime properties, OrderDate and ShipDate. For the purposes of this post we will be focusing primarily on these two properties. The default format used by the DocumentDB SDK for handling DateTime objects is to convert them in to the ISO 8601 string format. Therefore, if we do nothing but pass our Order object to DocumentDB, as shown in the code snippet below,
var doc1 = client.CreateDocumentAsync(colLink, new Order { 
      OrderNumber = "09152014101",
      OrderDate = DateTime.UtcNow.AddDays(-30),
      ShipDate = DateTime.UtcNow.AddDays(-14), 
      Total = 113.39
});
The two .NET DateTime properties would be stored as a string similar to;
{
    "OrderDate": "2014-09-15T23:14:25.7251173Z",
    "ShipDate": "2014-09-30T23:14:25.7251173Z"
}
That string looks nice and readable like that, so why is this a problem? DocumentDB has support for range based indexes on numeric fields allowing you to do range queries, (e.g. where field > 10 and field < 20). To avoid doing costly scans when doing range queries (records older than yesterday, or orders placed last week, or orders shipped today) on dates we need to convert the string representation of a Date to a number, so that we can use range indexes on these fields. DocumentDB and JSON are neutral to how they represent DateTimes and you can use this neutrality to powerful effect to best suit your application’s needs. This post introduces two methods of dealing with DateTime properties efficiently, but there are many more implementations that would treat Dates and Times efficiently.   For the rest of the post we will be treating DateTimes as epoch values, or the number of seconds since a particular date. We are using 1 January 1970 00:00 in this post, you can use a different starting point if you like depending on your data. I.e. if you are doing orders and your system only needs to deal with orders from today, then pick today as your starting point. Our fictitious system has lots of historical orders, so we need to go back in time a little. Here is a simple .NET extension method for DateTime that will do this conversion for us;
public static class Extensions
{
    public static int ToEpoch(this DateTime date)
    {
        if (date == null) return int.MinValue;
        DateTime epoch = new DateTime(1970, 1, 1);
        TimeSpan epochTimeSpan = date - epoch;
        return (int)epochTimeSpan.TotalSeconds;
    }
}
Now how do you use this in your application? Well there are two ways to proceed, and you can choose the way that best suits the needs of your application. The first way is store a number field that represents the DateTime instead of the DateTime itself. The easiest way to do this is to implement a custom serializer & deserializer for dealing with JSON. I’m going to show doing this with JSON.NET by implementing a customer JsonConverter that changes the default behavior of a DateTime property. To do this we define a class that extends the JsonConverter abstract class and overrides the ReadJson and WriteJson methods.
public class EpochDateTimeConverter : JsonConverter
{
    ...
}
Below is the WriteJson implementation that takes  in  a .NET DateTime and outputs a number using the same ToEpoch() extension method we created above.
public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
{
    int seconds;
    if (value is DateTime)
    {
        DateTime dt = (DateTime)value;
        if (!dt.Equals(DateTime.MinValue))
            seconds = dt.ToEpoch();
        else
            seconds = int.MinValue;
    }
    else
    {
        throw new Exception("Expected date object value.");
    }

    writer.WriteValue(seconds);
}
We also have the reverse, the ReadJson method, which will be used during serialization from JSON back in to .NET. This method takes in a number representing the number of seconds since Jan 1 1970 UTC and returns the .NET DateTime equivalent.
public override object ReadJson(JsonReader reader, Type type, object value, JsonSerializer serializer)
{
    if (reader.TokenType == JsonToken.None || reader.TokenType == JsonToken.Null) 
        return null;

    if (reader.TokenType != JsonToken.Integer)
    {
        throw new Exception(
            String.Format("Unexpected token parsing date. Expected Integer, got {0}.",
            reader.TokenType));
    }

    int seconds = (int)reader.Value;
    return new DateTime(1970, 1, 1).AddSeconds(seconds);
}
To use this in an application we need to decorate our DateTime properties with the JsonConverter attribute. Now, when these properties are serialized / deserialized Json.NET knows not to work with its default but to use our custom code instead.
    public class Order
    {
        [JsonProperty(PropertyName="id")]
        public string OrderNumber { get; set; }

        [JsonConverter(typeof(EpochDateTimeConverter))]
        public DateTime OrderDate { get; set; }

        [JsonConverter(typeof(EpochDateTimeConverter))]
        public DateTime ShipDate { get; set; }

        public double Total { get; set; }
    }
The result after serialization is now a number in JSON and a number being stored in DocumentDB.
{
    "OrderDate": 1408318702,
    "ShipDate": 1408318702 
}
To do an efficient range queries on a numeric field in DocumentDB we have to define a Range Index on the path containing our numeric field, when we create the DocumentCollection. The example below shows creating a DocumentCollection with a custom IndexPolicy. We use a Numeric Precision of 7 bytes for this range index because we are dealing with numbers in the billions. For a smaller range of numbers it would be sufficient to use smaller precision levels.
var collection = new DocumentCollection
{
    Id = id
};

//set the default IncludePath to set all properties in the document to have a Hash index
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
    IndexType = IndexType.Hash,
    Path = "/",
});

//now define two additional paths in Order that we know we want to do Range based queries on
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
    IndexType = IndexType.Range,
    Path = "/\"OrderDate\"/?",
    NumericPrecision = 7
});

collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
    IndexType = IndexType.Range,
    Path = "/\"ShipDate\"/?",
    NumericPrecision = 7
});

collection = client.CreateDocumentCollectionAsync(dbLink, collection).Result;
Once this is in place, doing a query on a DateTime property is as easy as;
//convert "7 days ago" to epoch number using our ToEpoch extension method for DateTime
int epocDateTime DateTime.UtcNow.AddDays(-7).ToEpoch();

//build up the query string
string sql = string.Format("SELECT * FROM Collection c WHERE c.OrderDate > {0}", epocDateTime);

//execute the query and get the results in a List
var orders = client.CreateDocumentQuery<Order>(col.SelfLink, sql).ToList();
That’s one approach, and it works very efficiently. The downside however is that you lose the human readable date string in your database. Should another application connect to our database the custom deserializer won’t necessarily be run and the value returned is now difficult to deal with because not many humans can do an epoch conversion in their heads (I certainly can’t). The second way to implement this is to preserve the readable DateTime field but add an additional field to your document which stores the numeric representation of the DateTime in addition to the string representation. To do this, we can create a new custom type that looks like this;
public class DateEpoch
{
   public DateTime Date { get; set; }
   public int Epoch
   {
       get
       {
           return (this.Date.Equals(null) || this.Date.Equals(DateTime.MinValue))
               ? int.MinValue
               : this.Date.ToEpoch();
        }
    }
}
Now change the two DateTime properties of the Order object to this new type like this;
public class Order
{
    [JsonProperty(PropertyName = "id")]
    public string OrderNumber { get; set; }

    public DateEpoch OrderDate { get; set; }

    public DateEpoch ShipDate { get; set; }

    public double Total { get; set; }
}
With this in place, when you now pass your object to DocumentDB you will end up with JSON that looks something like this;
{
    "OrderDate": {
        "Date": "2014-09-15T23: 14: 25.7251173Z",
        "Epoch": 1408318702
    },
    "ShipDate": {
        "Date": "2014-09-30T23: 14: 25.7251173Z",
        "Epoch": 1408318702
    }
}
As you did with the first technique you have to define a custom IndexPolicy on your DocumentCollection except this time we add the range path on the Epoch value only. You could exclude the date string path from the index if you like but keeping the default hash index still allows for equality operations if you want to do that.
var collection = new DocumentCollection
{
    Id = id
};

//set the default IncludePath to set all properties in the document to have a Hash index
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
    IndexType = IndexType.Hash,
    Path = "/",
});

//now define two additional paths in Order2 that we know we want to do Range based queries on
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
    IndexType = IndexType.Range,
    Path = "/\"OrderDate\"/\"Epoch\"/?",
    NumericPrecision = 7
});

collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
    IndexType = IndexType.Range,
    Path = "/\"ShipDate\"/\"Epoch\"/?",
    NumericPrecision = 7
});

//could also exclude the Date portion of the dates in Order2 as we're never going to 
//index on these, but will leave these there because they're still indexed with a hash 
//so you could do equality operations on them.

collection = client.CreateDocumentCollectionAsync(dbLink, collection).Result;
Now to query with this approach you could execute the following LINQ query, I chose to demonstrate LINQ here and not SQL as in the previous example, because this is another advantage of this approach.
var orders = from o in client.CreateDocumentQuery<Order2>(col.SelfLink)
    where o.OrderDate.Epoch >= DateTime.Now.AddDays(-7).ToEpoch()
    select o;
Or, you can choose the following LINQ Lambda syntax if you favor that;
var orders2 = client.CreateDocumentQuery<Order2>(col.SelfLink)
    .Where(o => o.OrderDate.Epoch >= DateTime.UtcNow.AddDays(-7).ToEpoch())
    .ToList();
And of course the SQL syntax we used earlier would also be valid and give the same result and is what you would use when LINQ was not available to you;
string sql = String.Format("SELECT * FROM c WHERE c.OrderDate.Epoch >= {0}", 
                 DateTime.UtcNow.AddDays(-7).ToEpoch());
This second approach has two main advantages over the first. Firstly it does not rely on a custom serialization technique for specific tool like JSON.NET so it can be used with other JSON serializers or with other languages beside just .NET. This technique would work equally well in say Node.JS or in Python. Secondly because we keep the human readable DateTime value as a string if another application queries your data, humans will still have a readable version of the date to work with. This approach requires more storage and results in slightly larger documents as a result, but if you are not sensitive about the storage costs or the size of the document being too big then I favor this approach over the first for the benefits listed above. And that’s it, DateTime properties in DocumentDB. Done If you would like the sample code for this blog post you can download it from here To get started with Azure DocumentDB head over to the service documentation page on azure.com  => https://azure.microsoft.com/en-us/documentation/services/documentdb/ where you can find everything you need to get up & running.