Ignora esplorazione

New York City Safety Data

New York City Social Services 311 Service Requests City Government Public Safety

Tutte le richieste di assistenza del numero 311 della città di New York dal 2010 a oggi.

Volume e conservazione

Il set di dati è archiviato nel formato Parquet. Viene aggiornato ogni giorno e contiene circa 12 milioni di righe (500 MB) in totale a oggi (2019).

Questo set di dati include record cronologici accumulati dal 2010 a oggi. Puoi usare le impostazioni dei parametri nell’SDK per recuperare i dati entro un intervallo di tempo specifico.

Posizione di archiviazione

Questo set di dati è archiviato nell’area Stati Uniti orientali di Azure. L’allocazione delle risorse di calcolo nell’area Stati Uniti orientali è consigliata per motivi di affinità.

Informazioni aggiuntive

Il set di dati viene generato dalla pubblica amministrazione della città di New York. Altri dettagli sono disponibili qui. Per informazioni sulle condizioni per l’utilizzo del set di dati, vedi qui.

Notifiche

MICROSOFT FORNISCE I SET DI DATI APERTI DI AZURE “COSÌ COME SONO”. MICROSOFT NON OFFRE ALCUNA GARANZIA O CONDIZIONE ESPLICITA O IMPLICITA RELATIVAMENTE ALL’USO DEI SET DI DATI DA PARTE DELL’UTENTE. NELLA MISURA MASSIMA CONSENTITA DALLE LEGGI LOCALI, MICROSOFT NON RICONOSCE ALCUNA RESPONSABILITÀ RELATIVAMENTE A DANNI O PERDITE COMMERCIALI, INCLUSI I DANNI DIRETTI, CONSEQUENZIALI, SPECIALI, INDIRETTI, INCIDENTALI O PUNITIVI DERIVANTI DALL’USO DEI SET DI DATI DA PARTE DELL’UTENTE.

Questo set di dati viene fornito in conformità con le condizioni originali in base alle quali Microsoft ha ricevuto i dati di origine. Il set di dati potrebbe includere dati provenienti da Microsoft.

Access

Available inWhen to use
Azure Notebooks

Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Azure Databricks

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Azure Synapse

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

dataType dataSubtype dateTime category subcategory status address latitude longitude source extendedProperties
Safety 311_All 6/16/2021 2:11:41 AM Street Condition Pothole Open 5 EAST 93 STREET 40.6626086763589 -73.9271039905362 null
Safety 311_All 6/16/2021 2:07:15 AM Street Sign - Missing Construction In Progress 90 MORTON STREET 40.7310393428493 -74.0078188218355 null
Safety 311_All 6/16/2021 2:07:10 AM Illegal Parking Double Parked Blocking Vehicle In Progress 4429 MONTICELLO AVENUE 40.9009861265865 -73.8420660823785 null
Safety 311_All 6/16/2021 2:06:53 AM Homeless Person Assistance Chronic Assigned 1541 WEST 12 STREET 40.6087150125206 -73.9855615036706 null
Safety 311_All 6/16/2021 2:05:06 AM LinkNYC Damaged/Defective In Progress 7 AVENUE 40.7428118020446 -73.9965788865602 null
Safety 311_All 6/16/2021 2:04:45 AM Illegal Fireworks N/A In Progress EAST 51 STREET 40.6538626091028 -73.9295930708907 null
Safety 311_All 6/16/2021 2:03:23 AM Noise - Residential Loud Music/Party In Progress 715 WEST 175 STREET 40.8462829053307 -73.938851866729 null
Safety 311_All 6/16/2021 2:03:18 AM Illegal Parking Blocked Hydrant In Progress 4423 MONTICELLO AVENUE 40.9008927482709 -73.8420228959644 null
Safety 311_All 6/16/2021 2:03:16 AM Traffic Drag Racing In Progress EAST 52 STREET 40.6554568563526 -73.9288273307769 null
Safety 311_All 6/16/2021 2:02:19 AM Noise - Residential Loud Music/Party In Progress 320 BLEECKER STREET 40.7006572310429 -73.9157776829108 null
Name Data type Unique Values (sample) Description
address string 1,544,273 655 EAST 230 STREET
78-15 PARSONS BOULEVARD

Numero civico dell’indirizzo dell’evento fornito dal richiedente.

category string 446 Noise - Residential
HEAT/HOT WATER

Questo è il primo livello di una gerarchia che identifica l’argomento dell’evento o della condizione (tipo di reclamo). Potrebbe avere una sottocategoria corrispondente (descrittore) o potrebbe essere una categoria autonoma.

dataSubtype string 1 311_All

“311_All”

dataType string 1 Safety

“Safety”

dateTime timestamp 17,540,548 2013-01-24 00:00:00
2015-01-08 00:00:00

Data di creazione della richiesta di assistenza.

latitude double 1,533,095 40.89187241649303
40.72195913199264

Latitudine basata su indicazione geografica della posizione dell’evento.

longitude double 1,533,119 -73.86016845296459
-73.80969682426189

Longitudine basata su indicazione geografica della posizione dell’evento.

status string 13 Closed
Pending

Stato della richiesta di assistenza inviata.

subcategory string 1,717 Loud Music/Party
ENTIRE BUILDING

Questa sottocategoria è associata alla categoria (tipo di reclamo) e fornisce altri dettagli sull’evento o sulla condizione. I rispettivi valori dipendono dal tipo di reclamo e non sono sempre necessari nella richiesta di assistenza.

Select your preferred service:

Azure Notebooks

Azure Databricks

Azure Synapse

Azure Notebooks

Package: Language: Python Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe ActivityStarted, to_pandas_dataframe_in_worker Looking for parquet files... Reading them into Pandas dataframe... Reading Safety/Release/city=NewYorkCity/part-00026-tid-845600952581210110-a4f62588-4996-42d1-bc79-23a9b4635c63-446869.c000.snappy.parquet under container citydatacontainer Done. ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=106593.46 [ms] ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=106687.96 [ms]
In [2]:
safety.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 1204035 entries, 7 to 12307252 Data columns (total 11 columns): dataType 1204035 non-null object dataSubtype 1204035 non-null object dateTime 1204035 non-null datetime64[ns] category 1204035 non-null object subcategory 1203974 non-null object status 1204035 non-null object address 1010833 non-null object latitude 1169358 non-null float64 longitude 1169358 non-null float64 source 0 non-null object extendedProperties 0 non-null object dtypes: datetime64[ns](1), float64(2), object(8) memory usage: 110.2+ MB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=NewYorkCity"
In [3]:
from azure.storage.blob import BlockBlobServicefrom azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# you can add your filter at below
print('Loaded as a Pandas data frame: ')
df
In [6]:
 

Azure Databricks

Package: Language: Python Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
ActivityStarted, to_spark_dataframe ActivityStarted, to_spark_dataframe_in_worker ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=4392.11 [ms] ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=4395.98 [ms]
In [2]:
display(safety.limit(5))
dataTypedataSubtypedateTimecategorysubcategorystatusaddresslatitudelongitudesourceextendedProperties
Safety311_All2015-12-28T13:58:58.000+0000HEAT/HOT WATERENTIRE BUILDINGClosed548 11 STREET40.664924841709606-73.98101480555805nullnull
Safety311_All2015-06-14T01:11:08.000+0000Noise - ResidentialLoud Music/PartyClosednull40.86969422534882-73.86620623861982nullnull
Safety311_All2015-06-14T04:47:37.000+0000Noise - ResidentialLoud TalkingClosednull40.858744389082254-73.93011726711445nullnull
Safety311_All2015-06-16T16:56:00.000+0000SewerCatch Basin Clogged/Flooding (Use Comments) (SC)Closed82 JEWETT AVENUE40.63510898432114-74.12886658384302nullnull
Safety311_All2015-06-22T14:03:05.000+0000ELECTRICLIGHTINGClosed2170 BATHGATE AVENUE40.852335329676464-73.89389734164266nullnull
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

Package: Language: Python Python
In [15]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
In [16]:
# Display top 5 rows
display(safety.limit(5))
Out[16]:
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

City Safety

From the Urban Innovation Initiative at Microsoft Research, databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. Analyses show frequency distributions and geographic clustering of safety issues within cities.