
Build richer applications with the new asynchronous Azure Storage SDK for Java

Posted on July 24, 2018

Program Manager

Cloud-scale applications typically require high concurrency to achieve the desired performance when accessing remote data. The new Storage Java SDK simplifies building such applications by offering asynchronous operations, eliminating the need to create and manage a large thread pool. This new SDK uses the RxJava reactive programming model for asynchronous operations and relies on the Netty HTTP client for REST requests. Get started with the Azure Storage SDK for Java now.

Azure Storage SDK v10 for Java adopts the next-generation Storage SDK design providing thread-safe types that were introduced earlier with the Storage Go SDK release. This new SDK is built to effectively move data without any buffering on the client, and provides interfaces close to the ones in the Storage REST APIs. Some of the improvements in the new SDK are:

  • Asynchronous programming model with RxJava
  • Low-level APIs consistent with Storage REST APIs
  • New high-level APIs built for convenience
  • Thread-safe interfaces
  • Consistent versioning across all Storage SDKs

Asynchronous programming model with RxJava

Now that the Storage SDK supports RxJava, it is easier to build event-driven applications, because RxJava lets you compose sequences of events using the observer pattern. The following sample, which uploads a directory of XML files as they are found, shows this pattern:

// Walk the directory and filter for .xml files
Stream<Path> walk = Files.walk(filePath).filter(p -> p.toString().endsWith(".xml"));

// Upload files found asynchronously into Blob storage in 20 concurrent operations
Observable.fromIterable(() -> walk.iterator()).flatMap(path -> {
    BlockBlobURL blobURL = containerURL.createBlockBlobURL(path.getFileName().toString());

    FileChannel fc = FileChannel.open(path);
    return TransferManager.uploadFileToBlockBlob(
            fc, blobURL,
            BlockBlobURL.MAX_PUT_BLOCK_BYTES, null)
        .toObservable()
        .doOnError(throwable -> {
            if (throwable instanceof RestException) {
                System.out.println("Failed to upload " + path + " with error:" + ((RestException) throwable).response().statusCode());
            } else {
                System.out.println(throwable.getMessage());
            }
        })
        .doAfterTerminate(() -> {
            System.out.println("Upload of " + path + " completed");
            fc.close();
        });

}, 20)  // Max concurrency of 20 - this is usually determined based on the number of cores you have in your environment
.subscribe();

The full sample is located at the Azure Storage Java SDK samples repository.

The sample above calls TransferManager.uploadFileToBlockBlob, a high-level API, for each item the Observable emits, in this case a java.nio.file.Path. The second argument to flatMap sets the maximum number of concurrent uploads, which is 20 in this example. To upload these files with the Azure Storage SDK v7, we would have to create and manage up to 20 threads ourselves. In the example above, RxJava manages the thread pool and uploads the same data set concurrently on far fewer threads, which is more resource efficient.
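For contrast, here is a minimal sketch of the manual approach described above: a fixed pool of worker threads, with one thread occupied per in-flight upload. It uses only java.util.concurrent. The class ManualUploadPool, the method uploadAll, and the uploadOne callback are hypothetical names for illustration and are not part of any Storage SDK; uploadOne stands in for a blocking upload call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of any Storage SDK.
final class ManualUploadPool {

    /**
     * Runs uploadOne for every name on a fixed pool of maxConcurrency threads.
     * Returns the peak number of uploads that were running at the same time.
     */
    static int uploadAll(List<String> names, int maxConcurrency, Consumer<String> uploadOne) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrency);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (String name : names) {
                pending.add(CompletableFuture.runAsync(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        uploadOne.accept(name); // stand-in for a blocking upload call
                    } finally {
                        running.decrementAndGet();
                    }
                }, pool));
            }
            for (CompletableFuture<Void> f : pending) {
                f.join(); // wait for completion and surface any failure
            }
        } finally {
            pool.shutdown(); // the caller has to manage the pool's lifecycle
        }
        return peak.get();
    }
}
```

Each blocking upload holds a thread for its full duration, so 20 concurrent uploads need 20 threads. That is the overhead the asynchronous model avoids.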

For more information, read about RxJava and the reactive programming model.

Low-level APIs consistent with storage REST APIs

Low-level APIs exist on the URL types, e.g. BlockBlobURL, and are designed to be simple wrappers around the REST APIs, providing convenience but no hidden behavior. Each call to a low-level API sends exactly one REST request (excluding retries). The method names on these types have also been updated to make their behavior clearer: for example, PutBlob is now Upload, PutBlock is now StageBlock, and PutBlockList is now CommitBlockList.

BlockBlobURL blobURL = containerURL.createBlockBlobURL("mysampledata");
 
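String data = "Hello world!";
// Pass the byte count, not the character count: the two differ for non-ASCII text
byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
blobURL.upload(Flowable.just(ByteBuffer.wrap(bytes)), bytes.length, null, null, null)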
    .subscribe(blockBlobsUploadResponse -> {
        System.out.println("Status code: " + blockBlobsUploadResponse.statusCode());
    }, throwable -> {
        System.out.println("Throwable: " + throwable.getMessage());
    });

New high-level APIs, built for convenience

The TransferManager class provides convenient high-level APIs that internally call the lower-level APIs. For example, the uploadFileToBlockBlob method can upload a 1 GB file by internally making 10 StageBlock calls (with each block configured as 100 MB in size), followed by one call to CommitBlockList to atomically commit the uploaded blocks in the Blob service.

Single<CommonRestResponse> response = TransferManager.uploadFileToBlockBlob(
        FileChannel.open(filePath), blobURL,
        BlockBlobURL.MAX_PUT_BLOCK_BYTES, null)
        .doOnError(throwable -> {
            if (throwable instanceof RestException) {
                System.out.println("Failed to upload " + filePath + " with error:" + ((RestException) throwable).response().statusCode());
            } else {
                System.out.println(throwable.getMessage());
            }
        })
        .doAfterTerminate(() -> System.out.println());

response.subscribe(commonRestResponse -> {System.out.println(commonRestResponse.statusCode());});
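The block arithmetic in the 1 GB example can be sketched as a ceiling division. This is only an illustration of the calculation, not the SDK's internal implementation; BlockPlan, blockCount, and blockLength are hypothetical names.

```java
// Hypothetical helper for illustration; not part of the Storage SDK.
final class BlockPlan {

    /** Number of blocks needed to cover fileSize bytes: ceil(fileSize / blockSize). */
    static long blockCount(long fileSize, long blockSize) {
        if (fileSize < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and blockSize > 0");
        }
        return (fileSize + blockSize - 1) / blockSize;
    }

    /** Length in bytes of the block at a zero-based index; the last block may be shorter. */
    static long blockLength(long fileSize, long blockSize, long index) {
        if (index < 0 || index >= blockCount(fileSize, blockSize)) {
            throw new IndexOutOfBoundsException("no block at index " + index);
        }
        long offset = index * blockSize;
        return Math.min(blockSize, fileSize - offset);
    }
}
```

With decimal units, blockCount(1_000_000_000L, 100_000_000L) is 10, matching the example. With binary units (1 GiB in 100 MiB blocks) the same division gives 11 blocks, the last one 24 MiB long. Either way, a single commit follows the staged blocks.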

Thread-safe interfaces

Earlier Storage SDKs (version 9 or earlier) offered objects such as CloudBlockBlob and CloudBlobContainer, which were not thread-safe and whose mutability could cause problems at runtime. The new Storage SDKs (v10 or above) provide interfaces closer to the Storage REST APIs, and most of the associated objects are immutable, so they can be shared safely across threads.

For instance, when you want to perform an operation on a blob (e.g., http://account.blob.core.windows.net/container/myblob), you construct a BlockBlobURL object with the blob URI, and all the associated REST API operations are methods of that type. You can call Upload, StageBlock and CommitBlockList on that object. These methods all return a Single (io.reactivex.Single) of RestResponse<THeaders, TBody> that wraps the REST API response, which is immutable. Calling them does not modify the BlockBlobURL instance.
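The immutable-handle idea can be sketched with a small stand-in type. BlobAddress below is hypothetical and is not the SDK's BlockBlobURL; it only shows why an object with final fields, whose methods return new values instead of changing state, is safe to share between threads.

```java
// Hypothetical type for illustration; this is not the SDK's BlockBlobURL.
final class BlobAddress {
    private final String container;
    private final String blob;

    BlobAddress(String container, String blob) {
        this.container = container;
        this.blob = blob;
    }

    /** Returns a new address for another blob; this instance is left unchanged. */
    BlobAddress withBlob(String otherBlob) {
        return new BlobAddress(container, otherBlob);
    }

    /** Path portion of the blob URL, e.g. /container/myblob. */
    String path() {
        return "/" + container + "/" + blob;
    }
}
```

Because no method mutates the instance, two threads holding the same BlobAddress cannot observe each other's changes, and no locking is needed.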

New Storage SDK versions

The new SDKs follow a new versioning strategy that is tied to the Storage Service REST API version. Version 10, the current release, will be tied to the Storage REST API version 2018-03-28. All the new Storage SDKs, across all programming languages, will use v10 for the REST API 2018-03-28 release, so it is easy to navigate between versions. When a new SDK is released to support the next REST API release, its major version will be bumped (v10 to v11, for instance) regardless of whether the client has breaking changes. This is mainly because the service's behavior may change from one REST API version to the next.

Versions earlier than 10 are reserved for the older Storage SDK design. Any Storage SDK with version 10 or later adopts the new SDK design.

Get started now

To get started with the Azure Storage SDK v10 for Java, use the following Blob Maven package (File and Queue coming soon).

<dependency>
     <groupId>com.microsoft.azure</groupId>
     <artifactId>azure-storage-blob</artifactId>
     <version>10.0.1-Preview</version>
</dependency>

Here are a few helpful links to help you get started:

Roadmap

Azure Storage SDK v10 for Java is currently in Preview and supports Blob storage only. We'll be releasing a few updates soon adding more functionality based on user feedback. So please check it out and let us know your feedback on GitHub. Here are a few of the major changes that are scheduled for a release soon:

  • Support for REST API version 2018-03-28 coming very soon
  • Support for Queue, File services
  • GA release