Compute and stream IoT insights with data-driven applications

There is a lot more data in the world than can possibly be captured with even the most robust, cutting-edge technology. Edge computing and the Internet of Things (IoT) are just two examples of technologies increasing the volume of useful data. There is so much data being created that the current telecom infrastructure will struggle to transport it and even the cloud may become strained to store it. Despite the advent of 5G in telecom, and the rapid growth of cloud storage, data growth will continue to outpace the capacities of both infrastructures. One solution is to build stateful, data-driven applications with technology from SWIM.AI.

The Azure platform offers a wealth of services for partners to enhance, extend, and build industry solutions. Here we describe how one Microsoft partner uses Azure to solve a unique problem.

Shared awareness and communications

The increase in volume has other consequences, especially when IoT devices must be aware of each other and communicate shared information. Peer-to-peer (P2P) communications between IoT assets can overwhelm a network and impair performance. Smart grids are an example of how sensors or electric meters are networked across a distribution grid to improve the overall reliability and cost of delivering electricity. Using meters to determine the locality of issues can help improve service to a residence, neighborhood, municipality, sector, or region. The notion of shared awareness extends to vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications. As networked AI spreads to more cars and devices, so do the benefits of knowing the performance or status of other assets. Other use cases include:

  • Traffic lights that react to the flow of vehicles across a neighborhood.
  • Process manufacturing equipment that can determine the impact from previous process steps.
  • Upstream oil/gas equipment performance that reacts to downstream oil/gas sensor validation.

Problem: Excess data means data loss

When dealing with large volumes of data, enterprises often struggle to determine which data to retain, how much to retain, and for how long they must retain it. By default, they may not retain any of it. Or, they may sub-sample data and retain an incomplete data set. That lost data may contain high-value insights. For example, consider traffic information that could be used for efficient vehicle routing, commuter safety, insurance analysis, and government infrastructure reviews. The city of Las Vegas maintains over 1,100 traffic light intersections that can generate more than 45 TB of data every day. As stated before, IoT data will challenge our ability to transport and store data at these volumes.
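To put the Las Vegas figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 TB = 10^12 bytes) and treats "over 1,100" and "more than 45 TB" as exact values, so the results are approximate lower bounds. The function names are illustrative.

```python
# Back-of-the-envelope figures for the Las Vegas example.
# Assumptions: decimal units (1 TB = 10**12 bytes) and the article's
# rounded figures taken as exact.
INTERSECTIONS = 1_100
BYTES_PER_DAY = 45 * 10**12
SECONDS_PER_DAY = 24 * 60 * 60


def per_intersection_gb_per_day(total_bytes=BYTES_PER_DAY, n=INTERSECTIONS):
    """Average daily volume per intersection, in gigabytes."""
    return total_bytes / n / 10**9


def sustained_gbps(total_bytes=BYTES_PER_DAY):
    """Average network rate needed to move one day's data within a day."""
    return total_bytes * 8 / SECONDS_PER_DAY / 10**9


def petabytes_per_year(total_bytes=BYTES_PER_DAY, days=365):
    """Storage needed to keep every raw byte for a year."""
    return total_bytes * days / 10**15


print(round(per_intersection_gb_per_day(), 1))  # about 40.9 GB per intersection per day
print(round(sustained_gbps(), 2))               # about 4.17 Gbit/s sustained
print(petabytes_per_year())                     # about 16.4 PB per year
```

Under these assumptions, a single city's traffic lights would need a multi-gigabit link running around the clock and more than 16 PB of new storage every year just to keep the raw data.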

Data may also become excessive when it’s aggregated. For example, telecom and network equipment typically create snapshots of data and send them every 15 minutes. By normalizing this data into a summary over time, you lose granularity. This means the nature or pattern of data over time, along with any unique, intuitive events, would be missed. The same applies to any equipment capturing fixed-time, window summary data. The loss of data is detrimental to networks where devices share data, either for awareness or communication. The problem is also compounded, as only snapshots are captured and aggregated for an entire network of thousands or millions of devices.
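A small sketch can show how a fixed-window summary hides a short-lived event. The readings below are made up for illustration (one value per minute, with a single spike), and `window_summaries` is a hypothetical helper, not part of any product mentioned here.

```python
from statistics import mean


def window_summaries(samples, window):
    """Average consecutive fixed-size windows, like a periodic snapshot."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical per-minute readings over 30 minutes, with one spike at minute 7.
readings = [10.0] * 30
readings[7] = 100.0

summaries = window_summaries(readings, window=15)

print(max(readings))  # 100.0: the spike is obvious in the raw stream
print(summaries)      # [16.0, 10.0]: the 15-minute averages barely register it
```

The first window's average rises from 10 to 16, which says nothing about when the spike happened, how long it lasted, or how high it went.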

Real-time is the goal

Near real-time is the current standard for stateless application architectures, but “near” real-time is not fast enough anymore. Real-time processing, or processing within milliseconds, is the new standard for V2V or V2I communications and requires a much more performant architecture. Swim does this by leveraging stateful APIs. With stateful connections, it’s possible to have a rapid response between peers in a network. Speed has enormous effects on efficiency and reliability, and it’s essential for systems where safety is paramount, such as those that prevent crashes. Autonomous systems will rely on real-time performance for safety purposes.

An intelligent edge data strategy

SWIM.AI delivers a solution for building scalable streaming applications. According to their site Meet Swim:

“Instead of configuring a separate message broker, app server and database, Swim provides for its own persistence, messaging, scheduling, clustering, replication, introspection, and security. Because everything is integrated, Swim seamlessly scales across edge, cloud, and client, for a fraction of the infrastructure and development cost of traditional cloud application architectures.”

The figure below shows an abstract view of how Swim can simplify IoT architectures:

Diagram display of how Swim can simplify architectures

Harvest data in mid-stream

SWIM.AI uses the lightweight Swim platform, which has a footprint of only 2 MB, to compute and stream IoT insights, building what they call “data-driven applications.” These applications sit in the data stream and generate a unique, intelligent web agent for each data source they see. These intelligent web agents then process the raw data as it streams, only publishing state changes from the data stream. This streamed data can be used by other web agents or stored in a data lake on Azure.
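The idea of one stateful agent per data source that publishes only state changes can be sketched in a few lines of plain Python. This is not Swim's API; the class and function names (`SourceAgent`, `AgentRegistry`, `harvest`) and the sample stream are hypothetical and exist only to illustrate the pattern.

```python
class SourceAgent:
    """Holds the last known state for one data source."""

    def __init__(self, source_id):
        self.source_id = source_id
        self.state = None

    def on_reading(self, value):
        """Return an update if the state changed, otherwise None."""
        if value == self.state:
            return None
        self.state = value
        return (self.source_id, value)


class AgentRegistry:
    """Creates one agent per data source the first time it is seen."""

    def __init__(self):
        self.agents = {}

    def ingest(self, source_id, value):
        if source_id not in self.agents:
            self.agents[source_id] = SourceAgent(source_id)
        return self.agents[source_id].on_reading(value)


def harvest(stream):
    """Pass raw readings through the agents and keep only state changes."""
    registry = AgentRegistry()
    published = []
    for source_id, value in stream:
        update = registry.ingest(source_id, value)
        if update is not None:
            published.append(update)
    return published


# Hypothetical raw stream: six readings from two traffic lights.
raw = [
    ("light-1", "red"), ("light-1", "red"), ("light-2", "green"),
    ("light-1", "green"), ("light-2", "green"), ("light-1", "green"),
]
changes = harvest(raw)
print(len(raw), len(changes))  # 6 3
```

Here six raw readings reduce to three published updates, yet the current state of every source can still be reconstructed from the updates alone.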

Swim uses the “needle in a haystack” metaphor to explain this unique advantage. Swim allows you to apply a metal detector while harvesting the grain to find the needle, without having to bale, transport, or store the grain before searching for the needle. The advantage is in continuously processing data, where intelligent web agents can learn over time or be influenced by domain experts who set thresholds.

Because of the stateful architecture of Swim, only the minimum data necessary is transmitted over the network. Furthermore, application services need not wait for the cloud to establish application context. This results in extremely low latencies, as the stateful connections don’t incur the latency cost of reading and writing to a database or updating based on poll requests.
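To illustrate why poll-based updates add latency, the sketch below compares how long a change waits before a client learns of it under two simplified models: polling at a fixed interval, and a push over an already open connection. The timestamps, the 1,000 ms poll interval, and the 100 ms network latency are hypothetical values chosen for illustration, and the polling model ignores network time, so it understates the real delay.

```python
def polling_delays(change_times, poll_interval):
    """Delay from each change to the next poll tick (same time units)."""
    delays = []
    for t in change_times:
        # Round t up to the next multiple of poll_interval.
        next_poll = -(-t // poll_interval) * poll_interval
        delays.append(next_poll - t)
    return delays


def push_delays(change_times, network_latency):
    """With a push model, each change is delayed only by network latency."""
    return [network_latency for _ in change_times]


# Hypothetical times (in ms) at which a sensor's state changes.
changes_ms = [120, 950, 1010, 2999]

poll = polling_delays(changes_ms, poll_interval=1000)
push = push_delays(changes_ms, network_latency=100)

print(poll)  # [880, 50, 990, 1]
print(push)  # [100, 100, 100, 100]
```

With polling, the delay depends on where a change falls relative to the poll schedule and can approach the full interval. With a push model, it stays near the network latency regardless of timing.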

On SWIM.AI’s website, a Smart City application shows the real-time status of lights and traffic across a hundred intersections with thousands of sensors. The client using the app could be a connected or an autonomous car approaching the intersection. It could be a handheld device next to the intersection, or a browser a thousand miles away in the contiguous US. The latency to real-time is 75-150ms, less than the blink of an eye across the internet.


  • The immediate benefit is saving costs for transporting and storing data.
  • Through Swim’s technology, you can retain the granularity. For example, take the case of tens of terabytes per day generated from every 1,000 traffic light intersections. Winnow that data down to hundreds of gigabytes per day, and the harvested dataset still fully describes the original raw dataset.
  • Create efficient networked apps for various data sources. For example, achieve peer-to-peer awareness and communications between assets such as vehicles, devices, sensors, and other data sources across the internet.
  • Achieve ultra-low latencies in the 75-150 millisecond range. This is the key to creating apps that depend on data for awareness and communications.

Azure services used in the solution

The demonstration of DataFabric from SWIM.AI relies on core Azure services for security, provisioning, management, and storage. DataFabric also uses the Common Data Model to simplify sharing information with other systems, such as Power BI or PowerApps, in Azure. Azure technology enables the customer’s analytics to be integrated with events and native ML and cognitive services.

DataFabric is based on the Microsoft IoT reference architecture and uses the following core components:

  • IoT Hub: Provides a central point in the cloud to manage devices and their data.
  • IoT Edge Field gateway: An on-premises solution for delivering cloud intelligence.
  • Azure Event Hubs: Ingests millions of events per second.
  • Azure Blob storage: Efficient storage that includes options for hot, cool, and archived data.
  • Azure Data Lake Storage: A highly scalable and cost-effective data lake solution for big data analytics.
  • Azure Stream Analytics: For transforming data into actionable insights and predictions in near real-time.

Next steps

To learn more about other industry solutions, go to the Azure for Manufacturing page.

To find out more about this solution, go to DataFabric for Azure IoT and select Get it now.