略過導覽

New York City Safety Data

New York City Social Services 311 Service Requests City Government Public Safety

從 2010 年至今所有的紐約市 311 服務要求。

磁碟區與保留期

此資料集以 Parquet 格式儲存, 其每天更新一次,到 2019 年為止共包含約 1200 萬個資料列 (500MB)。

此資料集包含從 2010 年累積至今的歷史記錄。 在我們的 SDK 中,您可以使用參數設定來擷取特定時間範圍內的資料。

儲存位置

此資料集儲存於美國東部 Azure 區域。 建議您在美國東部配置計算資源,以確保同質性。

其他資訊

此資料集來源為紐約市政府。 如需詳細資料,請參閱這裡。 如需此資料集的使用條款,請參閱這裡

通知

Microsoft 係依「現況」提供 Azure 開放資料集。 針對 貴用戶對資料集的使用,Microsoft 不提供任何明示或默示的擔保、保證或條件。 在 貴用戶當地法律允許的範圍內,針對因使用資料集而導致的任何直接性、衍生性、特殊性、間接性、附隨性或懲罰性損害或損失,Microsoft 概不承擔任何責任。

此資料集是根據 Microsoft 接收來源資料的原始條款所提供。 資料集可能包含源自 Microsoft 的資料。

Access

Available inWhen to use
Azure Notebooks

Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Azure Databricks

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Azure Synapse

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

dataType dataSubtype dateTime category subcategory status address latitude longitude source extendedProperties
Safety 311_All 2/28/2021 2:04:41 AM Noise - Residential Loud Music/Party In Progress 180 ATKINS AVENUE 40.6747687507063 -73.8790672972994 null
Safety 311_All 2/28/2021 2:04:10 AM Noise - Residential Loud Music/Party In Progress 575 HART STREET 40.6970513297355 -73.9290678249186 null
Safety 311_All 2/28/2021 2:04:09 AM Noise - Residential Loud Music/Party In Progress 2600 DECATUR AVENUE 40.8630283295477 -73.8905722859285 null
Safety 311_All 2/28/2021 2:03:55 AM New Tree Request For One Address In Progress 304 EAST 178 STREET 40.850004456691 -73.9025290800053 null
Safety 311_All 2/28/2021 2:03:47 AM Noise - Residential Loud Music/Party In Progress 35 GREAT JONES STREET 40.7268920595982 -73.9928383038984 null
Safety 311_All 2/28/2021 2:03:41 AM Noise - Residential Loud Music/Party In Progress 53 NASSAU AVENUE 40.7232922541744 -73.9522880002276 null
Safety 311_All 2/28/2021 2:03:25 AM Blocked Driveway No Access In Progress 101-21 92 STREET 40.6832274842353 -73.8483759439409 null
Safety 311_All 2/28/2021 2:03:09 AM Noise - Residential Loud Music/Party In Progress 820 HENDERSON AVENUE 40.6364843721673 -74.1210614889782 null
Safety 311_All 2/28/2021 2:02:20 AM Noise - Residential Loud Music/Party In Progress 49 GREAT JONES STREET 40.7266916662882 -73.9924306331279 null
Safety 311_All 2/28/2021 2:02:13 AM Noise - Residential Loud Music/Party In Progress 40 BOND STREET 40.726148245967 -73.9930909345299 null
Name Data type Unique Values (sample) Description
address string 1,528,477 655 EAST 230 STREET
78-15 PARSONS BOULEVARD

提交者提供的事件地址門牌號碼。

category string 445 Noise - Residential
HEAT/HOT WATER

這是識別事件主題或條件的第一層階層 (投訴類型)。 它可能會有對應的子類別 (描述項),也可能獨立存在。

dataSubtype string 1 311_All

“311_All”

dataType string 1 Safety

“Safety”

dateTime timestamp 17,052,560 2013-01-24 00:00:00
2015-01-08 00:00:00

已建立日期服務要求。

latitude double 1,496,647 40.89187241649303
40.72195913199264

事件位置的地理緯度。

longitude double 1,496,669 -73.86016845296459
-73.80969682426189

事件位置的地理經度。

status string 13 Closed
Pending

已提交的服務要求狀態。

subcategory string 1,710 Loud Music/Party
ENTIRE BUILDING

這與類別 (投訴類型) 相關,並提供與事件或狀況相關的詳細資料。 其值取決於投訴類型,而且在服務要求中並非一律為必要項。

Select your preferred service:

Azure Notebooks

Azure Databricks

Azure Synapse

Azure Notebooks

Package: Language: Python Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe ActivityStarted, to_pandas_dataframe_in_worker Looking for parquet files... Reading them into Pandas dataframe... Reading Safety/Release/city=NewYorkCity/part-00026-tid-845600952581210110-a4f62588-4996-42d1-bc79-23a9b4635c63-446869.c000.snappy.parquet under container citydatacontainer Done. ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=106593.46 [ms] ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=106687.96 [ms]
In [2]:
safety.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 1204035 entries, 7 to 12307252 Data columns (total 11 columns): dataType 1204035 non-null object dataSubtype 1204035 non-null object dateTime 1204035 non-null datetime64[ns] category 1204035 non-null object subcategory 1203974 non-null object status 1204035 non-null object address 1010833 non-null object latitude 1169358 non-null float64 longitude 1169358 non-null float64 source 0 non-null object extendedProperties 0 non-null object dtypes: datetime64[ns](1), float64(2), object(8) memory usage: 110.2+ MB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=NewYorkCity"
In [3]:
from azure.storage.blob import BlockBlobServicefrom azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# you can add your filter at below
print('Loaded as a Pandas data frame: ')
df
In [6]:
 

Azure Databricks

Package: Language: Python Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
ActivityStarted, to_spark_dataframe ActivityStarted, to_spark_dataframe_in_worker ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=4392.11 [ms] ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=4395.98 [ms]
In [2]:
display(safety.limit(5))
dataTypedataSubtypedateTimecategorysubcategorystatusaddresslatitudelongitudesourceextendedProperties
Safety311_All2015-12-28T13:58:58.000+0000HEAT/HOT WATERENTIRE BUILDINGClosed548 11 STREET40.664924841709606-73.98101480555805nullnull
Safety311_All2015-06-14T01:11:08.000+0000Noise - ResidentialLoud Music/PartyClosednull40.86969422534882-73.86620623861982nullnull
Safety311_All2015-06-14T04:47:37.000+0000Noise - ResidentialLoud TalkingClosednull40.858744389082254-73.93011726711445nullnull
Safety311_All2015-06-16T16:56:00.000+0000SewerCatch Basin Clogged/Flooding (Use Comments) (SC)Closed82 JEWETT AVENUE40.63510898432114-74.12886658384302nullnull
Safety311_All2015-06-22T14:03:05.000+0000ELECTRICLIGHTINGClosed2170 BATHGATE AVENUE40.852335329676464-73.89389734164266nullnull
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

Package: Language: Python Python
In [15]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
In [16]:
# Display top 5 rows
display(safety.limit(5))
Out[16]:
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

City Safety

From the Urban Innovation Initiative at Microsoft Research, databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. Analyses show frequency distributions and geographic clustering of safety issues within cities.