Seattle Safety Data

Seattle 911 Fire Dispatch E911 SFD Mobile Public Safety

Seattle Fire Department 911 dispatches.

Volume and retention

This dataset is stored in Parquet format. It is updated daily and contains about 800 thousand rows (20 MB) in total as of 2019.

This dataset contains historical records accumulated from 2010 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional information

This dataset is sourced from the city of Seattle government. The source link can be found here. See Licensing and Attribution for the terms of use of this dataset. Send an email if you have any questions about the data source.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES OR GUARANTEES, EXPRESS OR IMPLIED, WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL, OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in | When to use
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.
Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.
Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties
Safety | 911_Fire | 1/21/2021 7:37:00 AM | Aid Response | null | null | 111 Cedar St | 47.615877 | -122.35052 | null |
Safety | 911_Fire | 1/21/2021 7:37:00 AM | MVI - Motor Vehicle Incident | null | null | Rainier Ave S / S Massachusetts St | 47.588411 | -122.305827 | null |
Safety | 911_Fire | 1/21/2021 7:22:00 AM | Illegal Burn | null | null | 888 Western Av | 47.603448 | -122.336632 | null |
Safety | 911_Fire | 1/21/2021 6:59:00 AM | Aid Response | null | null | 3256 Portage Bay Pl E | 47.651445 | -122.320417 | null |
Safety | 911_Fire | 1/21/2021 6:58:00 AM | Aid Response | null | null | 4547 19th Av Ne | 47.661675 | -122.307239 | null |
Safety | 911_Fire | 1/21/2021 6:49:00 AM | Triaged Incident | null | null | Lake City Way Ne / Ne Northgate Way | 47.71042 | -122.300472 | null |
Safety | 911_Fire | 1/21/2021 6:44:00 AM | Aid Response | null | null | 11030 5th Av Ne | 47.709488 | -122.323301 | null |
Safety | 911_Fire | 1/21/2021 6:11:00 AM | Aid Response | null | null | 607 3rd Av | 47.602813 | -122.331449 | null |
Safety | 911_Fire | 1/21/2021 5:58:00 AM | Aid Response | null | null | 2309 20th Ave S | 47.58287 | -122.306868 | null |
Safety | 911_Fire | 1/21/2021 5:48:00 AM | Triaged Incident | null | null | 3641 2nd Av S | 47.571212 | -122.332002 | null |
Name | Data type | Unique | Values (sample) | Description
address | string | 191,633 | 517 3rd Av; 318 2nd Av Et S | Location of the incident.
category | string | 232 | Aid Response; Medic Response | Type of response.
dataSubtype | string | 1 | 911_Fire | "911_Fire"
dataType | string | 1 | Safety | "Safety"
dateTime | timestamp | 1,509,547 | 2020-11-04 06:49:00; 2020-05-21 00:35:00 | The date and time of the call.
latitude | double | 93,772 | 47.602172; 47.600194 | This is the latitude value. Lines of latitude are parallel to the equator.
longitude | double | 79,111 | -122.330863; -122.330541 | This is the longitude value. Lines of longitude run perpendicular to lines of latitude, and all pass through both poles.
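The columns described above can be explored with a few lines of standard-library Python. The sketch below hard-codes four rows copied from the preview table (it downloads nothing) and counts incidents per category. It also parses the month/day/year 12-hour timestamps that the preview shows. The names `rows`, `parse_preview_timestamp`, and `count_by_category` are illustrative and not part of any SDK.

```python
from collections import Counter
from datetime import datetime

# Four rows copied from the preview table; only a subset of columns is kept.
rows = [
    {"dateTime": "1/21/2021 7:37:00 AM", "category": "Aid Response",
     "latitude": 47.615877, "longitude": -122.35052},
    {"dateTime": "1/21/2021 7:37:00 AM", "category": "MVI - Motor Vehicle Incident",
     "latitude": 47.588411, "longitude": -122.305827},
    {"dateTime": "1/21/2021 7:22:00 AM", "category": "Illegal Burn",
     "latitude": 47.603448, "longitude": -122.336632},
    {"dateTime": "1/21/2021 6:59:00 AM", "category": "Aid Response",
     "latitude": 47.651445, "longitude": -122.320417},
]


def parse_preview_timestamp(text):
    """Parse the 'M/D/YYYY H:MM:SS AM' format used in the preview table."""
    return datetime.strptime(text, "%m/%d/%Y %I:%M:%S %p")


def count_by_category(records):
    """Return a Counter mapping each category to its number of incidents."""
    return Counter(record["category"] for record in records)


counts = count_by_category(rows)
earliest = min(parse_preview_timestamp(record["dateTime"]) for record in rows)

print(counts.most_common(1))   # [('Aid Response', 2)]
print(earliest.isoformat())    # 2021-01-21T06:59:00
```

The Parquet files store `dateTime` as a timestamp, so this string parsing is only needed when working from text such as the preview.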

Azure Notebooks

In [1]:
# This is a package in preview.
from azureml.opendatasets import SeattleSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = SeattleSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Looking for parquet files...
Reading them into Pandas dataframe...
Reading Safety/Release/city=Seattle/part-00119-tid-845600952581210110-a4f62588-4996-42d1-bc79-23a9b4635c63-446962.c000.snappy.parquet under container citydatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=6116.21 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=6117.7 [ms]
In [2]:
safety.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 68346 entries, 14 to 1382908
Data columns (total 11 columns):
dataType              68346 non-null object
dataSubtype           68346 non-null object
dateTime              68346 non-null datetime64[ns]
category              68346 non-null object
subcategory           0 non-null object
status                0 non-null object
address               68345 non-null object
latitude              68346 non-null float64
longitude             68346 non-null float64
source                0 non-null object
extendedProperties    68346 non-null object
dtypes: datetime64[ns](1), float64(2), object(8)
memory usage: 6.3+ MB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=Seattle"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    # readinto() writes the downloaded bytes straight into the open file
    blob_client.download_blob().readinto(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your own filter below
print('Loaded as a Pandas data frame: ')
df
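The last cell leaves room for a filter. A minimal sketch of one follows, assuming the loaded frame has the `category` and `dateTime` columns listed in the schema. So that it runs on its own, it uses a tiny hand-built frame named `sample` in place of the downloaded `df`. The rows are illustrative, not real records, and `filter_incidents` is a hypothetical helper.

```python
import pandas as pd

# Illustrative stand-in for the downloaded DataFrame (not real records).
sample = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2020-11-04 06:49:00",
        "2020-05-21 00:35:00",
        "2015-05-04 19:18:42",
    ]),
    "category": ["Aid Response", "Medic Response", "Medic Response"],
})


def filter_incidents(frame, category, start, end):
    """Keep rows of one category whose dateTime falls in [start, end)."""
    mask = (
        (frame["category"] == category)
        & (frame["dateTime"] >= start)
        & (frame["dateTime"] < end)
    )
    return frame.loc[mask]


result = filter_incidents(
    sample,
    "Medic Response",
    pd.Timestamp("2020-01-01"),
    pd.Timestamp("2021-01-01"),
)
print(len(result))  # 1
```

Passing `df` instead of `sample` should apply the same filter to the downloaded data, provided its `dateTime` column is already a datetime type.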

Azure Databricks

In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import SeattleSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = SeattleSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=2751.74 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=2753.86 [ms]
In [2]:
display(safety.limit(5))
dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties
Safety | 911_Fire | 2015-05-04T19:18:42.000+0000 | Medic Response | null | null | 7101 38th Av S | 47.538872 | -122.284744 | null | incident_number:F150047883
Safety | 911_Fire | 2015-12-01T23:29:47.000+0000 | Aid Response | null | null | 1011 S Weller St | 47.597509 | -122.319511 | null | incident_number:F150137603
Safety | 911_Fire | 2015-12-13T20:20:59.000+0000 | Aid Response | null | null | 10049 College Way N | 47.701742 | -122.335029 | null | incident_number:F150142622
Safety | 911_Fire | 2015-11-23T00:19:21.000+0000 | Medic Response | null | null | 9428 58th Av S | 47.518216 | -122.260497 | null | incident_number:F150134268
Safety | 911_Fire | 2015-05-19T16:25:55.000+0000 | Medic Response | null | null | 10011 51st Av S | 47.510803 | -122.27006 | null | incident_number:F150054054
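The `extendedProperties` column in the output above holds values such as `incident_number:F150047883`. Assuming every value follows that `key:value` shape (the five rows shown do, but the page does not document the format), splitting it could look like the sketch below. `parse_extended_properties` is a hypothetical helper, not part of the SDK.

```python
def parse_extended_properties(value):
    """Split a 'key:value' string into a one-item dict.

    Returns an empty dict for empty or missing input. Text without a
    colon is kept under the placeholder key 'raw'.
    """
    if not value:
        return {}
    key, separator, rest = value.partition(":")
    if not separator:
        return {"raw": value}
    return {key.strip(): rest.strip()}


print(parse_extended_properties("incident_number:F150047883"))
# {'incident_number': 'F150047883'}
```

Only the first colon is treated as the separator, so a value that itself contains a colon stays intact.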
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=Seattle"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the Parquet files with Spark; the read is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
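To see what the string formatting in the cells above produces without a cluster, the same two values can be built in plain Python. This is only a sketch. `build_wasbs_settings` is a hypothetical helper that assembles strings and never touches Spark or the network.

```python
def build_wasbs_settings(container, account, relative_path):
    """Return (wasbs_path, sas_config_key) for a blob container.

    Mirrors the '%'-formatting in the notebook cells using f-strings.
    """
    host = f"{account}.blob.core.windows.net"
    wasbs_path = f"wasbs://{container}@{host}/{relative_path}"
    sas_config_key = f"fs.azure.sas.{container}.{host}"
    return wasbs_path, sas_config_key


path, key = build_wasbs_settings(
    "citydatacontainer",
    "azureopendatastorage",
    "Safety/Release/city=Seattle",
)
print(path)
# wasbs://citydatacontainer@azureopendatastorage.blob.core.windows.net/Safety/Release/city=Seattle
print(key)
# fs.azure.sas.citydatacontainer.azureopendatastorage.blob.core.windows.net
```

In a notebook, `key` is the name passed to `spark.conf.set` and `path` is what `spark.read.parquet` receives.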

Azure Synapse

In [21]:
# This is a package in preview.
from azureml.opendatasets import SeattleSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = SeattleSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
In [22]:
# Display top 5 rows
display(safety.limit(5))
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=Seattle"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the Parquet files with Spark; the read is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
-- SQL: read the first 100 rows directly from the Parquet files with OPENROWSET
SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://azureopendatastorage.blob.core.windows.net/citydatacontainer/Safety/Release/city=Seattle/*.parquet',
        FORMAT = 'parquet'
    ) AS [r];

City Safety

From the Urban Innovation Initiative at Microsoft Research, a Databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. The analyses show frequency distributions and geographic clustering of safety issues within cities.