• 4 min read

ONNX Runtime is now open source

Today we are announcing we have open sourced Open Neural Network Exchange (ONNX) Runtime on GitHub. ONNX Runtime is a high-performance inference engine for machine learning models in the ONNX format on Linux, Windows, and Mac.

ONNX Runtime LogoToday we are announcing we have open sourced Open Neural Network Exchange (ONNX) Runtime on GitHub. ONNX Runtime is a high-performance inference engine for machine learning models in the ONNX format on Linux, Windows, and Mac.

ONNX is an open format for deep learning and traditional machine learning models that Microsoft co-developed with Facebook and AWS. The ONNX format is the basis of an open ecosystem that makes AI more accessible and valuable to all: developers can choose the right framework for their task, framework authors can focus on innovative enhancements, and hardware vendors can streamline optimizations for neural network computations.

Microsoft has been conducting research in AI for more than two decades and incorporating machine learning and deep neural networks in a plethora of products and services. With teams using many different training frameworks and targeting different deployment options, there was a real need to unify these scattered solutions to make it quick and simple to operationalize models. ONNX Runtime provides that solution. It gives data scientists the flexibility to train and tune models in the framework of their choice and productionize these models with high performance in products spanning both cloud and edge.

Why use ONNX Runtime

ONNX Runtime is the first publicly available inference engine with full support for ONNX 1.2 and higher including the ONNX-ML profile. This means it is advancing directly alongside the ONNX standard to support an evolving set of AI models and technological breakthroughs.

At Microsoft, teams are using ONNX Runtime to improve the scoring latency and efficiency for many of our models used in core scenarios in Bing Search, Bing Ads, Office productivity services, and more. For models we’ve converted to ONNX, we’ve seen average performance improve by 2X compared to scoring in their existing solutions. ONNX Runtime is also incorporated in other Microsoft offerings including Windows ML and ML.net.

ONNX Runtime is lightweight and modular in design, with the CPU build only a few megabytes in size. The extensible architecture enables optimizers and hardware accelerators to provide low latency and high efficiency for computations by registering as “execution providers.” The result is smoother end-to-end user experiences with lower perceived latency, as well as cost savings from decreased machine utilization and higher throughput.

Deep support from industry partners

Leading companies in the ONNX community are actively working or planning to integrate their technology with ONNX Runtime. This enables them to support the full ONNX specification while achieving the best performance.

Microsoft and Intel are working together to integrate the nGraph Compiler as an execution provider for the ONNX Runtime. The nGraph Compiler is capable of accelerating both existing and upcoming hardware targets by applying both non-device specific and device specific optimizations. Using the nGraph Compiler for CPU inference achieves up to 45x performance boost as compared to native frameworks.

NVIDIA is helping integrate TensorRT with ONNX Runtime to offer an easy workflow for deploying a rapidly growing set of models and apps on NVIDIA GPUs while achieving the best performance possible. NVIDIA TensorRT includes a high-performance inference optimizer and runtime that delivers dramatically higher throughput at minimal latency across applications such as recommenders, natural language processing, and image/video processing.

Qualcomm, another early advocate of ONNX, has also expressed support for ONNX Runtime. “The introduction of ONNX Runtime is a positive next step in further driving framework interoperability, standardization, and performance optimization across multiple device categories and we expect developers to welcome support for ONNX Runtime on Snapdragon mobile platforms,” says Gary Brotman, senior director of AI product management at Qualcomm Technologies, Inc.

After joining ONNX recently, leading IoT chip maker NXP also announced support for ONNX Runtime. “When it comes to choosing from among the many machine learning frameworks, we want our customers to have maximum flexibility and freedom,” says Markus Levy, head of the AI Technology Center at NXP. “We’re happy to bring the ONNX benefits to our customer community of ML developers by supporting the ONNX Runtime released by Microsoft in our platform.”

In addition to hardware partners, framework provider Preferred Networks is also leveraging ONNX Runtime. “Preferred Networks, in addition to developing the deep learning framework Chainer, has created Menoh, an ONNX inference engine wrapper library for multiple programming languages,” says Toru Nishikawa, President and CEO of Preferred Networks, Inc. “Menoh will use ONNX Runtime as its main backend, and Chainer currently uses ONNX Runtime to test its ONNX export features. Preferred Networks is delighted that Microsoft has made ONNX Runtime and looks forward to working on ONNX with Microsoft in the future.”

How to use ONNX Runtime

First, you’ll need an ONNX model. Don’t have an ONNX model? No problem. The beauty of ONNX is the framework interoperability enabled through a multitude of tools.

  • You can get pretrained versions of popular models like ResNet and TinyYOLO directly from the ONNX Model Zoo.
  • You can create your own customized computer vision models using Azure Custom Vision Cognitive Service.
  • If you already have models in TensorFlow, Keras, Scikit-Learn, or CoreML format, you can convert them using our open source converters (ONNXMLTools and TF2ONNX).
  • You can train new models using Azure Machine Learning service and save into ONNX format.

To use ONNX Runtime, just install the package for your desired platform and language of choice or create a build from the source. ONNX Runtime supports both CPU and GPU (CUDA) with Python, C#, and C interfaces that are compatible on Linux, Windows, and Mac. Check GitHub for installation instructions.

You can integrate ONNX Runtime into your code directly from source or from precompiled binaries, but an easy way to operationalize it is to use Azure Machine Learning to deploy a service for your application to call.

Get involved

The release of ONNX Runtime marks a significant step in our endeavor towards an open and interoperable ecosystem for AI, and we are extremely excited about the enthusiasm and support from the community thus far. We hope this makes it easier to drive product innovation in AI and strongly encourage the development community to try it out. We are continuously evolving and improving ONNX Runtime, and we look forward to your feedback and contributions to this very exciting area!

Have feedback or questions? File an issue on Github, and follow us on Twitter.