
New York City Safety Data

New York City Social Services 311 Service Requests City Government Public Safety

All New York City 311 service requests from 2010 to the present.

Volume and retention

This dataset is stored in Parquet format. It is updated daily and, as of 2019, contains about 12 million rows (500 MB) in total.

This dataset contains historical records accumulated from 2010 to the present. Use parameter settings in the SDK to fetch data within a specific time range.

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional information

This dataset is sourced from the New York City government. More details can be found here. The terms of use for this dataset can be found here.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, INDIRECT, SPECIAL, INCIDENTAL, CONSEQUENTIAL OR PUNITIVE DAMAGES AND RELATED LOSSES.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in | When to use
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.
Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.
Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

dataType dataSubtype dateTime category subcategory status address latitude longitude source extendedProperties
Safety 311_All 2/14/2021 2:03:27 AM Noise - Residential Loud Music/Party In Progress 2111 LAFONTAINE AVENUE 40.8505302389672 -73.8919374843883 null
Safety 311_All 2/14/2021 2:02:58 AM Noise - Residential Loud Music/Party In Progress BRIGHTON BEACH 40.5745850596257 -73.9584560873857 null
Safety 311_All 2/14/2021 2:02:58 AM Noise - Residential Loud Music/Party In Progress 3155 ROCHAMBEAU AVENUE 40.8749192824058 -73.880438925099 null
Safety 311_All 2/14/2021 2:02:53 AM Blocked Driveway No Access In Progress 843 GREENE AVENUE 40.6903353059805 -73.934415352365 null
Safety 311_All 2/14/2021 2:02:42 AM Noise - Residential Loud Music/Party In Progress 146 BEACH 59 STREET 40.5906339004178 -73.78875579204 null
Safety 311_All 2/14/2021 2:02:00 AM Blocked Driveway No Access In Progress 70-01 69 STREET 40.705202592284 -73.8842166083503 null
Safety 311_All 2/14/2021 2:00:48 AM Blocked Driveway No Access In Progress 3767 OLINVILLE AVENUE 40.8834333597552 -73.8655131701992 null
Safety 311_All 2/14/2021 2:00:40 AM Noise - Residential Loud Music/Party In Progress 45-10 111 STREET 40.7486071990041 -73.8543084545447 null
Safety 311_All 2/14/2021 2:00:37 AM Noise - Residential Loud Music/Party In Progress 3020 SURF AVENUE 40.5727562482045 -73.9966379905925 null
Safety 311_All 2/14/2021 2:00:14 AM Noise - Residential Loud Music/Party In Progress 823 MADISON STREET 40.6878963849973 -73.9230270335402 null
Name Data type Unique Values (sample) Description
address string 1,526,822 655 EAST 230 STREET
78-15 PARSONS BOULEVARD

House number of the incident address, as provided by the submitter.

category string 445 Noise - Residential
HEAT/HOT WATER

The first level of a hierarchy identifying the topic of the incident or condition (Complaint Type). It may have a corresponding subcategory (Descriptor) or may stand alone.

dataSubtype string 1 311_All

"311_All"

dataType string 1 Safety

"Safety"

dateTime timestamp 16,982,771 2013-01-24 00:00:00
2015-01-08 00:00:00

Date the service request was created.

latitude double 1,492,892 40.89187241649303
40.72195913199264

Geographic latitude of the incident location.

longitude double 1,492,914 -73.86016845296459
-73.80969682426189

Geographic longitude of the incident location.

status string 13 Closed
Pending

Status of the submitted service request.

subcategory string 1,708 Loud Music/Party
ENTIRE BUILDING

The subcategory is associated with the category (Complaint Type) and provides further detail on the incident or condition. Its values depend on the Complaint Type and are not always required in a service request.
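The two-level category/subcategory hierarchy described in the table can be explored by grouping subcategories under their category. The following is a minimal sketch using only the standard library. Most pairs are copied from sample rows shown on this page; the row with a missing subcategory is made up to illustrate a category that stands alone, and the `STANDALONE` label and `build_hierarchy` helper are hypothetical names, not part of any SDK.

```python
from collections import Counter, defaultdict

# Hypothetical placeholder for categories that carry no subcategory.
STANDALONE = "(no subcategory)"

# (category, subcategory) pairs; all but the last are taken from sample rows
# on this page. The last row is invented to show a standalone category.
SAMPLE_PAIRS = [
    ("HEAT/HOT WATER", "ENTIRE BUILDING"),
    ("Noise - Residential", "Loud Music/Party"),
    ("Noise - Residential", "Loud Talking"),
    ("Blocked Driveway", "No Access"),
    ("Noise - Residential", "Loud Music/Party"),
    ("ELECTRIC", None),
]


def build_hierarchy(pairs):
    """Count subcategories per category; None is filed under STANDALONE."""
    hierarchy = defaultdict(Counter)
    for category, subcategory in pairs:
        key = subcategory if subcategory is not None else STANDALONE
        hierarchy[category][key] += 1
    return dict(hierarchy)


hierarchy = build_hierarchy(SAMPLE_PAIRS)
for category, subcounts in sorted(hierarchy.items()):
    print(category, dict(subcounts))
```

On the full dataset the same grouping would more likely be done with a DataFrame `groupby` on the `category` and `subcategory` columns; the plain-Python version is used here only to keep the sketch dependency-free.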


Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from dateutil import parser

# Fetch only service requests created between start_date and end_date.
start_date = parser.parse('2015-05-01')
end_date = parser.parse('2016-01-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Looking for parquet files...
Reading them into Pandas dataframe...
Reading Safety/Release/city=NewYorkCity/part-00026-tid-845600952581210110-a4f62588-4996-42d1-bc79-23a9b4635c63-446869.c000.snappy.parquet under container citydatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=106593.46 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=106687.96 [ms]
In [2]:
safety.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1204035 entries, 7 to 12307252
Data columns (total 11 columns):
dataType              1204035 non-null object
dataSubtype           1204035 non-null object
dateTime              1204035 non-null datetime64[ns]
category              1204035 non-null object
subcategory           1203974 non-null object
status                1204035 non-null object
address               1010833 non-null object
latitude              1169358 non-null float64
longitude             1169358 non-null float64
source                0 non-null object
extendedProperties    0 non-null object
dtypes: datetime64[ns](1), float64(2), object(8)
memory usage: 110.2+ MB
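The non-null counts in the `safety.info()` output show that several columns are only partly populated. As a small worked example, the sketch below turns those counts into missing-row counts and percentages. The numbers are copied from the output above; the `missing_summary` helper is a hypothetical name introduced for illustration.

```python
# Row total and non-null counts copied from the safety.info() output.
TOTAL_ROWS = 1204035
NON_NULL = {
    "subcategory": 1203974,
    "address": 1010833,
    "latitude": 1169358,
    "longitude": 1169358,
    "source": 0,
    "extendedProperties": 0,
}


def missing_summary(total, non_null):
    """Map each column to (missing row count, missing share in percent)."""
    summary = {}
    for column, count in non_null.items():
        missing = total - count
        summary[column] = (missing, round(100.0 * missing / total, 2))
    return summary


summary = missing_summary(TOTAL_ROWS, NON_NULL)
for column, (missing, share) in summary.items():
    print(f"{column:<20}{missing:>10}{share:>8.2f}%")
```

For this date range, about 16 percent of rows have no address and the `source` and `extendedProperties` columns are entirely empty, so analyses that rely on location may want to filter on `latitude` and `longitude` rather than `address`.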
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=NewYorkCity"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().readinto(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your own filter below
print('Loaded as a Pandas data frame: ')
df
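As one possible filter, the sketch below selects rows of a single category within a time window using a boolean mask. It assumes pandas is installed and runs on a tiny stand-in DataFrame whose rows are copied from the preview table on this page, so it does not need the downloaded file; `filter_requests` is a hypothetical helper, and the half-open range `[start, end)` is a choice made here, not something the dataset prescribes.

```python
import pandas as pd

# Stand-in for the loaded DataFrame `df`: three rows from the preview table,
# trimmed to the columns used by the filter.
sample = pd.DataFrame(
    {
        "dateTime": pd.to_datetime(
            ["2021-02-14 02:03:27", "2021-02-14 02:02:53", "2021-02-14 02:00:14"]
        ),
        "category": ["Noise - Residential", "Blocked Driveway", "Noise - Residential"],
        "status": ["In Progress", "In Progress", "In Progress"],
        "address": ["2111 LAFONTAINE AVENUE", "843 GREENE AVENUE", "823 MADISON STREET"],
    }
)


def filter_requests(frame, category, start, end):
    """Return rows of one category created in the half-open range [start, end)."""
    mask = (
        (frame["category"] == category)
        & (frame["dateTime"] >= start)
        & (frame["dateTime"] < end)
    )
    return frame.loc[mask]


noise = filter_requests(
    sample,
    "Noise - Residential",
    pd.Timestamp("2021-02-14 02:00:00"),
    pd.Timestamp("2021-02-14 02:03:00"),
)
print(noise)
```

The same call should work on the full `df`, since it only relies on the `category` and `dateTime` columns listed in the schema.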

Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycSafety

from dateutil import parser

# Fetch only service requests created between start_date and end_date.
start_date = parser.parse('2015-05-01')
end_date = parser.parse('2016-01-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=4392.11 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=4395.98 [ms]
In [2]:
display(safety.limit(5))
dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties
Safety | 311_All | 2015-12-28T13:58:58.000+0000 | HEAT/HOT WATER | ENTIRE BUILDING | Closed | 548 11 STREET | 40.664924841709606 | -73.98101480555805 | null | null
Safety | 311_All | 2015-06-14T01:11:08.000+0000 | Noise - Residential | Loud Music/Party | Closed | null | 40.86969422534882 | -73.86620623861982 | null | null
Safety | 311_All | 2015-06-14T04:47:37.000+0000 | Noise - Residential | Loud Talking | Closed | null | 40.858744389082254 | -73.93011726711445 | null | null
Safety | 311_All | 2015-06-16T16:56:00.000+0000 | Sewer | Catch Basin Clogged/Flooding (Use Comments) (SC) | Closed | 82 JEWETT AVENUE | 40.63510898432114 | -74.12886658384302 | null | null
Safety | 311_All | 2015-06-22T14:03:05.000+0000 | ELECTRIC | LIGHTING | Closed | 2170 BATHGATE AVENUE | 40.852335329676464 | -73.89389734164266 | null | null
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Spark reads the Parquet files lazily; no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
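The remote path and the Spark configuration key used in the cells above follow a fixed pattern built from the container name, the storage account name, and the relative path. The sketch below isolates that string construction in two small functions so it can be checked without a Spark session. `build_wasbs_path` and `sas_config_key` are hypothetical helper names; the formats themselves are taken from the cells above.

```python
def build_wasbs_path(container, account, relative_path):
    """Build the wasbs:// URL for a folder in an Azure Blob Storage container."""
    # Strip stray slashes so the URL does not end up with a double slash.
    return "wasbs://%s@%s.blob.core.windows.net/%s" % (
        container,
        account,
        relative_path.strip("/"),
    )


def sas_config_key(container, account):
    """Build the Spark configuration key under which the SAS token is stored."""
    return "fs.azure.sas.%s.%s.blob.core.windows.net" % (container, account)


path = build_wasbs_path(
    "citydatacontainer", "azureopendatastorage", "Safety/Release/city=NewYorkCity"
)
key = sas_config_key("citydatacontainer", "azureopendatastorage")
print(path)
print(key)
```

In a notebook, the results would be passed to `spark.conf.set(key, blob_sas_token)` and `spark.read.parquet(path)` exactly as in the cells above.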

Azure Synapse

Language: Python
In [15]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from dateutil import parser

# Fetch only service requests created between start_date and end_date.
start_date = parser.parse('2015-05-01')
end_date = parser.parse('2016-01-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
In [16]:
# Display top 5 rows
display(safety.limit(5))
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Spark reads the Parquet files lazily; no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

City Safety

From the Urban Innovation Initiative at Microsoft Research, a Databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. Analyses show frequency distributions and geographic clustering of safety issues within cities.
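To give a rough idea of what a geographic grouping of requests can look like, the sketch below bins coordinates into a coarse latitude/longitude grid and lists the busiest cells. This is simple grid binning chosen for illustration, not the method used in the notebook mentioned above, which is not shown on this page. The coordinates are copied from the preview table; `grid_cell` and `densest_cells` are hypothetical helpers.

```python
from collections import Counter

# (latitude, longitude) pairs copied from the ten preview rows on this page.
PREVIEW_POINTS = [
    (40.8505302389672, -73.8919374843883),
    (40.5745850596257, -73.9584560873857),
    (40.8749192824058, -73.880438925099),
    (40.6903353059805, -73.934415352365),
    (40.5906339004178, -73.78875579204),
    (40.705202592284, -73.8842166083503),
    (40.8834333597552, -73.8655131701992),
    (40.7486071990041, -73.8543084545447),
    (40.5727562482045, -73.9966379905925),
    (40.6878963849973, -73.9230270335402),
]


def grid_cell(lat, lon, decimals=1):
    """Snap a coordinate to a grid cell by rounding both components."""
    return (round(lat, decimals), round(lon, decimals))


def densest_cells(points, decimals=1, top=3):
    """Return the `top` grid cells with the most points, busiest first."""
    counts = Counter(grid_cell(lat, lon, decimals) for lat, lon in points)
    return counts.most_common(top)


hotspots = densest_cells(PREVIEW_POINTS)
for cell, count in hotspots:
    print(cell, count)
```

With one decimal place, a cell spans roughly 11 km north to south, which is very coarse for a city; increasing `decimals` gives finer cells at the cost of sparser counts.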