Passer la navigation

New York City Safety Data

New York City Social Services 311 Service Requests City Government Public Safety

Toutes les demandes de service 311 à New York de 2010 à nos jours.

Volume et rétention

Ce jeu de données est stocké au format Parquet. Il est mis à jour quotidiennement et contient environ 12 millions de lignes (500 Mo) en 2019.

Ce jeu de données contient les enregistrements historiques accumulés de 2010 à aujourd’hui. Vous pouvez utiliser les paramètres de paramétrage de notre SDK pour récupérer les données dans un intervalle de temps spécifique.

Emplacement de stockage

Ce jeu de données est stocké dans la région Azure USA Est. L’allocation de ressources de calcul dans la région USA Est est recommandée à des fins d’affinité.

Informations supplémentaires

Ce jeu de données est issu de l’administration de la ville de New York. Pour plus d’informations, cliquez ici. Cliquez ici pour les termes utilisés dans ce jeu de données.

Remarques

MICROSOFT FOURNIT AZURE OPEN DATASETS “EN L’ÉTAT”. MICROSOFT N’OFFRE AUCUNE GARANTIE, EXPRESSE OU IMPLICITE, DE GARANTIE NI DE CONDITIONS RELATIVES À VOTRE UTILISATION DES JEUX DE DONNÉES. DANS LA MESURE AUTORISÉE PAR VOTRE DROIT LOCAL, MICROSOFT DÉCLINE TOUTE RESPONSABILITÉ POUR TOUT DOMMAGE OU PERTES, Y COMPRIS LES DIRECTIVES, CONSEQUENTIELLES, SPÉCIALES, INDIRECTES OU PUNITIVES, RÉSULTANT DE VOTRE UTILISATION DES JEUX DE DONNÉES.

Ce jeu de données est fourni selon les conditions initiales par lesquelles Microsoft a reçu les données sources. Le jeu de données peut inclure des données provenant de Microsoft.

Access

Available inWhen to use
Azure Notebooks

Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Azure Databricks

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Azure Synapse

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

dataType dataSubtype dateTime category subcategory status address latitude longitude source extendedProperties
Safety 311_All 10/14/2020 2:03:28 AM Noise - Residential Loud Television In Progress 1465 WASHINGTON AVENUE 40.8369406629242 -73.903744412981 null
Safety 311_All 10/14/2020 2:02:43 AM Homeless Street Condition N/A In Progress 450 WEST 41 STREET 40.7585940401634 -73.9949609641797 null
Safety 311_All 10/14/2020 2:02:19 AM Noise - Residential Loud Music/Party In Progress 39-50 60 STREET 40.7460931503272 -73.90418965074 null
Safety 311_All 10/14/2020 1:59:18 AM Noise - Residential Loud Television In Progress 225 EAST 57 STREET 40.7599260384691 -73.9667366441501 null
Safety 311_All 10/14/2020 1:57:15 AM Noise - Residential Loud Music/Party In Progress 3920 DURYEA AVENUE 40.890030335786 -73.8349309115965 null
Safety 311_All 10/14/2020 1:56:47 AM Noise - Residential Loud Music/Party In Progress 448 72 STREET 40.6322749092097 -74.02300743057 null
Safety 311_All 10/14/2020 1:54:27 AM Noise - Commercial Loud Talking In Progress 1808 HONE AVENUE 40.8492298659755 -73.8549700129484 null
Safety 311_All 10/14/2020 1:54:12 AM Non-Emergency Police Matter Trespassing In Progress 586 EAST 42 STREET 40.6408071991073 -73.9369610872751 null
Safety 311_All 10/14/2020 1:54:05 AM Noise - Commercial Loud Music/Party In Progress 603 SENECA AVENUE 40.7040534487438 -73.9097754948087 null
Safety 311_All 10/14/2020 1:53:54 AM Noise - Residential Loud Talking In Progress 364 SOUTH 1 STREET 40.7105292133653 -73.9525207711782 null
Name Data type Unique Values (sample) Description
address string 1,464,307 655 EAST 230 STREET
89-21 ELMHURST AVENUE

Numéro de rue de l’adresse de l’incident fourni par le demandeur.

category string 443 Noise - Residential
HEAT/HOT WATER

Il s’agit du premier niveau d’une hiérarchie identifiant le sujet de l’incident ou de la condition (type de plainte). Il peut avoir une sous-catégorie correspondante (descripteur) ou peut être autonome.

dataSubtype string 1 311_All

“311_All”

dataType string 1 Safety

“Safety”

dateTime timestamp 16,331,918 2013-01-24 00:00:00
2015-01-08 00:00:00

Date de création de la demande de service.

latitude double 1,461,638 40.89187241649303
40.1123853

Latitude de la zone géographique du lieu de l’incident.

longitude double 1,483,506 -73.86016845296459
-77.5195844

Longitude de la zone géographique du lieu de l’incident.

status string 12 Closed
Pending

Statut de la demande de service envoyée.

subcategory string 1,691 Loud Music/Party
HEAT

Ceci est associé à la catégorie (type de plainte) et fournit des détails supplémentaires sur l’incident ou la condition. Ses valeurs dépendent du type de plainte et ne sont pas toujours requises dans la demande de service.

Select your preferred service:

Azure Notebooks

Azure Databricks

Azure Synapse

Azure Notebooks

Package: Language: Python Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe ActivityStarted, to_pandas_dataframe_in_worker Looking for parquet files... Reading them into Pandas dataframe... Reading Safety/Release/city=NewYorkCity/part-00026-tid-845600952581210110-a4f62588-4996-42d1-bc79-23a9b4635c63-446869.c000.snappy.parquet under container citydatacontainer Done. ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=106593.46 [ms] ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=106687.96 [ms]
In [2]:
safety.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 1204035 entries, 7 to 12307252 Data columns (total 11 columns): dataType 1204035 non-null object dataSubtype 1204035 non-null object dateTime 1204035 non-null datetime64[ns] category 1204035 non-null object subcategory 1203974 non-null object status 1204035 non-null object address 1010833 non-null object latitude 1169358 non-null float64 longitude 1169358 non-null float64 source 0 non-null object extendedProperties 0 non-null object dtypes: datetime64[ns](1), float64(2), object(8) memory usage: 110.2+ MB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=NewYorkCity"
In [3]:
from azure.storage.blob import BlockBlobServicefrom azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# you can add your filter at below
print('Loaded as a Pandas data frame: ')
df
In [6]:
 

Azure Databricks

Package: Language: Python Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
ActivityStarted, to_spark_dataframe ActivityStarted, to_spark_dataframe_in_worker ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=4392.11 [ms] ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=4395.98 [ms]
In [2]:
display(safety.limit(5))
dataTypedataSubtypedateTimecategorysubcategorystatusaddresslatitudelongitudesourceextendedProperties
Safety311_All2015-12-28T13:58:58.000+0000HEAT/HOT WATERENTIRE BUILDINGClosed548 11 STREET40.664924841709606-73.98101480555805nullnull
Safety311_All2015-06-14T01:11:08.000+0000Noise - ResidentialLoud Music/PartyClosednull40.86969422534882-73.86620623861982nullnull
Safety311_All2015-06-14T04:47:37.000+0000Noise - ResidentialLoud TalkingClosednull40.858744389082254-73.93011726711445nullnull
Safety311_All2015-06-16T16:56:00.000+0000SewerCatch Basin Clogged/Flooding (Use Comments) (SC)Closed82 JEWETT AVENUE40.63510898432114-74.12886658384302nullnull
Safety311_All2015-06-22T14:03:05.000+0000ELECTRICLIGHTINGClosed2170 BATHGATE AVENUE40.852335329676464-73.89389734164266nullnull
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

Package: Language: Python Python
In [15]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
In [16]:
# Display top 5 rows
display(safety.limit(5))
Out[16]:
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

City Safety

From the Urban Innovation Initiative at Microsoft Research, databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. Analyses show frequency distributions and geographic clustering of safety issues within cities.