
New York City Safety Data

New York City Social Services 311 Service Requests City Government Public Safety

All New York City 311 service requests from 2010 to the present.

Volume and retention

This dataset is stored in Parquet format. It is updated daily and contained about 12 million rows (500 MB) in total as of 2019.

The dataset contains historical records accumulated from 2010 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.
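The notebook cells further down this page pass start_date and end_date to the SDK. As a rough local illustration of the same idea, the sketch below filters an already loaded pandas DataFrame on its dateTime column. The helper name filter_by_time_range and the half-open interval convention are assumptions made for this example and are not part of the SDK; the three sample rows reuse timestamps and categories shown elsewhere on this page.

```python
import pandas as pd


def filter_by_time_range(df, start, end, column="dateTime"):
    """Hypothetical helper: keep rows with start <= df[column] < end."""
    start_ts = pd.Timestamp(start)
    end_ts = pd.Timestamp(end)
    mask = (df[column] >= start_ts) & (df[column] < end_ts)
    return df.loc[mask]


# Tiny hand-built sample; the real dataset has millions of rows.
sample = pd.DataFrame(
    {
        "dateTime": pd.to_datetime(
            ["2015-12-28 13:58:58", "2015-06-14 01:11:08", "2020-10-28 01:59:10"]
        ),
        "category": ["HEAT/HOT WATER", "Noise - Residential", "Illegal Parking"],
    }
)

# Same window as the SDK examples on this page: May 2015 up to January 2016.
in_window = filter_by_time_range(sample, "2015-05-01", "2016-01-01")
print(in_window)
```

Filtering at load time through the SDK is likely to be cheaper than loading everything and filtering afterwards, since less data has to be read; the local filter is mainly useful for narrowing a frame that is already in memory.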

Storage location

This dataset is stored in the East US Azure region. We recommend allocating compute resources in East US for affinity.

Additional information

This dataset is sourced from the New York City government. More information can be found here. The terms of use for this dataset can be found here.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES OR CONDITIONS, EXPRESS OR IMPLIED, WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, OR INCIDENTAL DAMAGES, RESULTING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms that applied when Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

| Available in | When to use |
| --- | --- |
| Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine. |
| Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset. |
| Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset. |

Preview

| dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Safety | 311_All | 10/28/2020 1:59:10 AM | Illegal Parking | Parking Permit Improper Use | In Progress | 88-01 SUTTER AVENUE | 40.6761408840691 | -73.8486912842578 | null | |
| Safety | 311_All | 10/28/2020 1:58:36 AM | Noise - Street/Sidewalk | Loud Music/Party | In Progress | 87 MONUMENT WALK | 40.6941096484195 | -73.9788969071643 | null | |
| Safety | 311_All | 10/28/2020 1:58:17 AM | Blocked Driveway | No Access | In Progress | 307 ALBEMARLE ROAD | 40.6453840981006 | -73.9774420901142 | null | |
| Safety | 311_All | 10/28/2020 1:58:08 AM | Noise - Street/Sidewalk | Loud Music/Party | In Progress | 137-72 NORTHERN BOULEVARD | 40.7638134402157 | -73.828619537758 | null | |
| Safety | 311_All | 10/28/2020 1:57:47 AM | Noise - Residential | Loud Music/Party | In Progress | 1354 LYMAN PLACE | 40.8292903445233 | -73.8964815810803 | null | |
| Safety | 311_All | 10/28/2020 1:57:25 AM | Noise - Street/Sidewalk | Loud Music/Party | In Progress | 204 LEONARD STREET | 40.7110429013718 | -73.9471783641139 | null | |
| Safety | 311_All | 10/28/2020 1:57:13 AM | Noise - Residential | Loud Music/Party | In Progress | 397 SMITH STREET | 40.6784745515883 | -73.9959620774948 | null | |
| Safety | 311_All | 10/28/2020 1:56:54 AM | Illegal Parking | Blocked Hydrant | In Progress | 1281 UNION STREET | 40.6689117777497 | -73.9496305313093 | null | |
| Safety | 311_All | 10/28/2020 1:56:19 AM | Noise - Residential | Loud Music/Party | In Progress | 270 CENTRAL AVENUE | 40.6966904257672 | -73.9227247724356 | null | |
| Safety | 311_All | 10/28/2020 1:56:13 AM | Noise - Commercial | Banging/Pounding | In Progress | 162-05 ROCKAWAY BOULEVARD | 40.662282039925 | -73.7760575899208 | null | |
| Name | Data type | Unique | Values (sample) | Description |
| --- | --- | --- | --- | --- |
| address | string | 1,468,443 | 655 EAST 230 STREET, 89-21 ELMHURST AVENUE | House number of the incident address provided by the submitter. |
| category | string | 444 | Noise - Residential, HEAT/HOT WATER | The first level of a hierarchy identifying the topic of the incident or condition (complaint type). It may have a corresponding subcategory (descriptor) or may stand alone. |
| dataSubtype | string | 1 | 311_All | "311_All" |
| dataType | string | 1 | Safety | "Safety" |
| dateTime | timestamp | 16,418,566 | 2013-01-24 00:00:00, 2015-01-08 00:00:00 | Date the service request was created. |
| latitude | double | 1,476,165 | 40.89187241649303, 40.1123853 | Geo-based latitude of the incident location. |
| longitude | double | 1,498,042 | -73.86016845296459, -77.5195844 | Geo-based longitude of the incident location. |
| status | string | 12 | Closed, Pending | Status of the submitted service request. |
| subcategory | string | 1,692 | Loud Music/Party, HEAT | Associated with the category (complaint type); provides further detail on the incident or condition. Its values depend on the complaint type and are not always required in a service request. |
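To make the category and subcategory hierarchy more concrete, the following sketch counts complaint types in the ten preview rows shown above. It is only an illustration on hand-copied values, assuming pandas is available; the variable names are made up for this example.

```python
import pandas as pd

# (category, subcategory) pairs copied from the ten preview rows above.
preview_rows = [
    ("Illegal Parking", "Parking Permit Improper Use"),
    ("Noise - Street/Sidewalk", "Loud Music/Party"),
    ("Blocked Driveway", "No Access"),
    ("Noise - Street/Sidewalk", "Loud Music/Party"),
    ("Noise - Residential", "Loud Music/Party"),
    ("Noise - Street/Sidewalk", "Loud Music/Party"),
    ("Noise - Residential", "Loud Music/Party"),
    ("Illegal Parking", "Blocked Hydrant"),
    ("Noise - Residential", "Loud Music/Party"),
    ("Noise - Commercial", "Banging/Pounding"),
]
preview = pd.DataFrame(preview_rows, columns=["category", "subcategory"])

# First hierarchy level: number of requests per complaint type.
category_counts = preview["category"].value_counts()

# Second level: number of requests per (complaint type, descriptor) pair.
pair_counts = preview.groupby(["category", "subcategory"]).size()

print(category_counts)
print(pair_counts)
```

The same two calls should work unchanged on the full DataFrame returned by the loading examples on this page, where the null subcategory values are simply left out of the grouped counts.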


Azure Notebooks

In [1]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Looking for parquet files...
Reading them into Pandas dataframe...
Reading Safety/Release/city=NewYorkCity/part-00026-tid-845600952581210110-a4f62588-4996-42d1-bc79-23a9b4635c63-446869.c000.snappy.parquet under container citydatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=106593.46 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=106687.96 [ms]
In [2]:
safety.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1204035 entries, 7 to 12307252
Data columns (total 11 columns):
dataType              1204035 non-null object
dataSubtype           1204035 non-null object
dateTime              1204035 non-null datetime64[ns]
category              1204035 non-null object
subcategory           1203974 non-null object
status                1204035 non-null object
address               1010833 non-null object
latitude              1169358 non-null float64
longitude             1169358 non-null float64
source                0 non-null object
extendedProperties    0 non-null object
dtypes: datetime64[ns](1), float64(2), object(8)
memory usage: 110.2+ MB
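The non-null counts in the safety.info() output above show that several columns are incomplete for this time window. The short calculation below turns those counts into missing-value shares; the numbers are copied from that output, and the variable names are only illustrative.

```python
# Non-null counts copied from the safety.info() output above.
total_rows = 1204035
non_null = {
    "subcategory": 1203974,
    "address": 1010833,
    "latitude": 1169358,
    "longitude": 1169358,
    "source": 0,
    "extendedProperties": 0,
}

# Share of rows where each column is null.
missing_share = {
    column: (total_rows - count) / total_rows for column, count in non_null.items()
}

for column, share in missing_share.items():
    print(f"{column}: {share:.2%} missing")
```

For this window, about 16% of rows have no address and just under 3% have no coordinates, while source and extendedProperties are empty throughout, so any geographic analysis should expect to drop or handle those rows.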
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=NewYorkCity"
In [3]:
# BlobServiceClient is the entry point of the current (v12) azure-storage-blob SDK
from azure.storage.blob import BlobServiceClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
  raise Exception(
    "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
   folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
  container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
  if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
    targetBlobName = blob.name
    break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
  # readinto() streams the blob content into the open file
  blob_client.download_blob().readinto(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your own filter below
print('Loaded as a Pandas data frame: ')
df
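As one possible filter for the data frame loaded above, the sketch below keeps rows whose category starts with a given prefix and that also carry coordinates. The helper select_category_prefix is hypothetical and not part of any SDK. The sample frame stands in for df and reuses values from the preview on this page, except that the coordinates of the third row are blanked on purpose to exercise the null check.

```python
import pandas as pd


def select_category_prefix(df, prefix):
    """Hypothetical helper: keep located rows whose category starts with prefix."""
    has_prefix = df["category"].str.startswith(prefix, na=False)
    has_location = df["latitude"].notna() & df["longitude"].notna()
    return df.loc[has_prefix & has_location]


# Stand-in for the downloaded frame; third row's coordinates are deliberately missing.
df_sample = pd.DataFrame(
    {
        "category": ["Noise - Residential", "Illegal Parking", "Noise - Commercial"],
        "latitude": [40.8292903445233, 40.6689117777497, None],
        "longitude": [-73.8964815810803, -73.9496305313093, None],
    }
)

noise_with_location = select_category_prefix(df_sample, "Noise")
print(noise_with_location)
```

On the real frame the call would be select_category_prefix(df, "Noise"), assuming df has the category, latitude, and longitude columns listed in the schema table.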

Azure Databricks

In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=4392.11 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=4395.98 [ms]
In [2]:
display(safety.limit(5))
| dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Safety | 311_All | 2015-12-28T13:58:58.000+0000 | HEAT/HOT WATER | ENTIRE BUILDING | Closed | 548 11 STREET | 40.664924841709606 | -73.98101480555805 | null | null |
| Safety | 311_All | 2015-06-14T01:11:08.000+0000 | Noise - Residential | Loud Music/Party | Closed | null | 40.86969422534882 | -73.86620623861982 | null | null |
| Safety | 311_All | 2015-06-14T04:47:37.000+0000 | Noise - Residential | Loud Talking | Closed | null | 40.858744389082254 | -73.93011726711445 | null | null |
| Safety | 311_All | 2015-06-16T16:56:00.000+0000 | Sewer | Catch Basin Clogged/Flooding (Use Comments) (SC) | Closed | 82 JEWETT AVENUE | 40.63510898432114 | -74.12886658384302 | null | null |
| Safety | 311_All | 2015-06-22T14:03:05.000+0000 | ELECTRIC | LIGHTING | Closed | 2170 BATHGATE AVENUE | 40.852335329676464 | -73.89389734164266 | null | null |
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
 'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
 blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Spark reads Parquet lazily; note that no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
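The cells above build two strings: the wasbs path that Spark reads from, laid out as wasbs://<container>@<account>.blob.core.windows.net/<relative path>, and the configuration key under which the SAS token is set. The sketch below isolates that string construction in two plain Python functions so it can be checked without a Spark cluster. The function names are hypothetical; the account, container, and path values are the ones used on this page.

```python
def build_wasbs_path(container, account, relative_path):
    """Hypothetical helper mirroring the path formatting in the cell above."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{relative_path}"


def build_sas_conf_key(container, account):
    """Hypothetical helper: Spark conf key that holds the container's SAS token."""
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


wasbs_path = build_wasbs_path(
    "citydatacontainer", "azureopendatastorage", "Safety/Release/city=NewYorkCity"
)
sas_conf_key = build_sas_conf_key("citydatacontainer", "azureopendatastorage")

print(wasbs_path)
print(sas_conf_key)
```

In a notebook with an active Spark session, the two values would then be used as in the cells above: spark.conf.set(sas_conf_key, blob_sas_token) followed by spark.read.parquet(wasbs_path).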

Azure Synapse

In [15]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
In [16]:
# Display top 5 rows
display(safety.limit(5))
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
 'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
 blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Spark reads Parquet lazily; note that no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

City Safety

From the Urban Innovation Initiative at Microsoft Research: a Databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. The analyses show frequency distributions and geographic clustering of safety issues within cities.
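The notebook itself is not reproduced on this page. Purely as an illustration of what a frequency distribution over coarse geographic cells could look like, and not as the notebook's actual method, the sketch below rounds coordinates to one decimal place and counts requests per cell. The coordinates are copied from four preview rows on this page; the cell size is an arbitrary assumption.

```python
import pandas as pd

# Coordinates copied from four of the preview rows on this page.
points = pd.DataFrame(
    {
        "latitude": [
            40.6761408840691,
            40.6941096484195,
            40.6784745515883,
            40.6689117777497,
        ],
        "longitude": [
            -73.8486912842578,
            -73.9788969071643,
            -73.9959620774948,
            -73.9496305313093,
        ],
    }
)

# Assumed cell size: 0.1 degree, obtained by rounding to one decimal place.
cells = points.round(1).rename(
    columns={"latitude": "lat_cell", "longitude": "lon_cell"}
)

# Number of requests falling into each grid cell, busiest cells first.
cell_counts = cells.groupby(["lat_cell", "lon_cell"]).size().sort_values(ascending=False)
print(cell_counts)
```

A real clustering analysis would more likely use a dedicated algorithm and a projection suited to distances; the grid count is only the simplest way to see where requests concentrate.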