Omitir navegación

New York City Safety Data

New York City Social Services 311 Service Requests City Government Public Safety

Todas las solicitudes de servicio en el número 311 de la ciudad de Nueva York desde 2010 hasta la actualidad.

Volumen y retención

Este conjunto de datos se almacena en formato Parquet. Se actualiza a diario y contiene alrededor de 12 millones de filas (500 MB) en total desde 2019.

Este conjunto de datos contiene registros históricos acumulados desde 2010 hasta la actualidad. Puede usar la configuración de parámetros de nuestro SDK para recuperar los datos de un intervalo de tiempo específico.

Ubicación de almacenamiento

Este conjunto de datos se almacena en la región Este de EE. UU. de Azure. Se recomienda asignar recursos de proceso de la misma región por afinidad.

Información adicional

Este conjunto de datos se alimenta con los datos de la administración pública de la ciudad de Nueva York. Encontrará información más detallada aquí. Consulte los términos de uso de este conjunto de datos aquí.

Avisos

MICROSOFT PROPORCIONA AZURE OPEN DATASETS “TAL CUAL”. MICROSOFT NO OFRECE NINGUNA GARANTÍA, EXPRESA O IMPLÍCITA, NI CONDICIÓN CON RESPECTO AL USO QUE USTED HAGA DE LOS CONJUNTOS DE DATOS. EN LA MEDIDA EN LA QUE LO PERMITA SU LEGISLACIÓN LOCAL, MICROSOFT DECLINA TODA RESPONSABILIDAD POR POSIBLES DAÑOS O PÉRDIDAS, INCLUIDOS LOS DAÑOS DIRECTOS, CONSECUENCIALES, ESPECIALES, INDIRECTOS, INCIDENTALES O PUNITIVOS, QUE RESULTEN DE SU USO DE LOS CONJUNTOS DE DATOS.

Este conjunto de datos se proporciona bajo los términos originales con los que Microsoft recibió los datos de origen. El conjunto de datos puede incluir datos procedentes de Microsoft.

Access

Available inWhen to use
Azure Notebooks

Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Azure Databricks

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Azure Synapse

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

dataType dataSubtype dateTime category subcategory status address latitude longitude source extendedProperties
Safety 311_All 5/13/2020 2:00:55 AM Noise - Residential Loud Music/Party In Progress 181 CLAREMONT AVENUE 40.8144207475676 -73.9605353838577 null
Safety 311_All 5/13/2020 1:59:37 AM Noise - Residential Loud Music/Party In Progress 550 CAULDWELL AVENUE 40.8141267254164 -73.9107570266164 null
Safety 311_All 5/13/2020 1:55:53 AM Noise - Vehicle Car/Truck Music In Progress 4580 BROADWAY 40.8601035501839 -73.9310051621654 null
Safety 311_All 5/13/2020 1:55:22 AM Noise - Residential Loud Music/Party In Progress 620 EAST 178 STREET 40.8467594802674 -73.8924387844491 null
Safety 311_All 5/13/2020 1:55:20 AM Noise - Residential Loud Music/Party In Progress 86 WADSWORTH AVENUE 40.8473439995984 -73.9367978886086 null
Safety 311_All 5/13/2020 1:53:55 AM Noise - Residential Loud Music/Party In Progress 37-33 COLLEGE POINT BOULEVARD 40.7600402705856 -73.8348451659958 null
Safety 311_All 5/13/2020 1:52:50 AM Noise - Residential Loud Music/Party Closed 1501 METROPOLITAN AVENUE 40.8392274708503 -73.8588551889489 null
Safety 311_All 5/13/2020 1:50:16 AM Noise - Residential Banging/Pounding In Progress 2323 BATCHELDER STREET 40.5956257587681 -73.9380587812188 null
Safety 311_All 5/13/2020 1:49:04 AM Noise - Residential Loud Music/Party Closed 218 HIMROD STREET 40.6993157284304 -73.9206083694195 null
Safety 311_All 5/13/2020 1:48:42 AM Noise - Residential Loud Music/Party In Progress 801 NEILL AVENUE 40.8503419997941 -73.8644306318384 null
Name Data type Unique Values (sample) Description
address string 1,471,598 89-21 ELMHURST AVENUE
34 ARDEN STREET

Número de casa de la dirección del incidente proporcionado por el remitente.

category string 440 Noise - Residential
HEAT/HOT WATER

Este es el primer nivel de una jerarquía que identifica el tema del incidente o la condición (tipo de reclamación). Puede tener una subcategoría correspondiente (descriptor) o ser independiente.

dataSubtype string 1 311_All

“311_All”

dataType string 1 Safety

“Safety”

dateTime timestamp 15,305,059 2013-01-24 00:00:00
2015-01-08 00:00:00

Fecha en la que se creó la solicitud de servicio.

latitude double 1,423,530 40.747420328856705
40.86193738190631

Latitud geográfica de la ubicación donde ocurrió el incidente.

longitude double 1,423,552 -73.87685300997468
-73.92714580413967

Longitud geográfica de la ubicación donde ocurrió el incidente.

status string 12 Closed
Pending

Estado de la solicitud de servicio enviada.

subcategory string 1,670 Loud Music/Party
HEAT

Está asociada a la categoría (tipo de reclamación) y proporciona más detalles sobre el incidente o la condición. Los valores dependen del tipo de reclamación y no siempre son necesarios en las solicitudes de servicio.

Select your preferred service:

Azure Notebooks

Azure Databricks

Azure Synapse

Azure Notebooks

Package: Language: Python Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe ActivityStarted, to_pandas_dataframe_in_worker Looking for parquet files... Reading them into Pandas dataframe... Reading Safety/Release/city=NewYorkCity/part-00026-tid-845600952581210110-a4f62588-4996-42d1-bc79-23a9b4635c63-446869.c000.snappy.parquet under container citydatacontainer Done. ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=106593.46 [ms] ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=106687.96 [ms]
In [2]:
safety.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 1204035 entries, 7 to 12307252 Data columns (total 11 columns): dataType 1204035 non-null object dataSubtype 1204035 non-null object dateTime 1204035 non-null datetime64[ns] category 1204035 non-null object subcategory 1203974 non-null object status 1204035 non-null object address 1010833 non-null object latitude 1169358 non-null float64 longitude 1169358 non-null float64 source 0 non-null object extendedProperties 0 non-null object dtypes: datetime64[ns](1), float64(2), object(8) memory usage: 110.2+ MB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=NewYorkCity"
In [3]:
from azure.storage.blob import BlockBlobService

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception("Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' + folder_name + ' in container "' + container_name + '"...')
blob_service = BlockBlobService(account_name = azure_storage_account_name, sas_token = azure_storage_sas_token,)
blobs = blob_service.list_blobs(container_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName=''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
parquet_file=blob_service.get_blob_to_path(container_name, targetBlobName, filename)
In [4]:
# Read the local parquet file into Pandas data frame
import pyarrow.parquet as pq
import pandas as pd

appended_df = []
print('Reading the local parquet file into Pandas data frame')
df = pq.read_table(filename).to_pandas()
In [5]:
# you can add your filter at below
print('Loaded as a Pandas data frame: ')
df
In [6]:
 

Azure Databricks

Package: Language: Python Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
ActivityStarted, to_spark_dataframe ActivityStarted, to_spark_dataframe_in_worker ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=4392.11 [ms] ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=4395.98 [ms]
In [2]:
display(safety.limit(5))
dataTypedataSubtypedateTimecategorysubcategorystatusaddresslatitudelongitudesourceextendedProperties
Safety311_All2015-12-28T13:58:58.000+0000HEAT/HOT WATERENTIRE BUILDINGClosed548 11 STREET40.664924841709606-73.98101480555805nullnull
Safety311_All2015-06-14T01:11:08.000+0000Noise - ResidentialLoud Music/PartyClosednull40.86969422534882-73.86620623861982nullnull
Safety311_All2015-06-14T04:47:37.000+0000Noise - ResidentialLoud TalkingClosednull40.858744389082254-73.93011726711445nullnull
Safety311_All2015-06-16T16:56:00.000+0000SewerCatch Basin Clogged/Flooding (Use Comments) (SC)Closed82 JEWETT AVENUE40.63510898432114-74.12886658384302nullnull
Safety311_All2015-06-22T14:03:05.000+0000ELECTRICLIGHTINGClosed2170 BATHGATE AVENUE40.852335329676464-73.89389734164266nullnull
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

Package: Language: Python
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

City Safety

From the Urban Innovation Initiative at Microsoft Research, databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. Analyses show frequency distributions and geographic clustering of safety issues within cities.