New York City Safety Data
All New York City 311 service requests from 2010 to the present.
Volume and retention
This dataset is stored in Parquet format. It is updated daily, and contains about 12 million rows (500 MB) in total as of 2019.
This dataset contains historical records accumulated from 2010 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.
Storage location
This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.
Additional information
This dataset is sourced from New York City government. For more details, see here. See here for the terms of using this dataset.
Notices
Microsoft provides Azure Open Datasets on an "as is" basis. Microsoft makes no warranties, express or implied, guarantees, or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental, or punitive damages, resulting from your use of the datasets.
This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.
Access
Available in | When to use |
---|---|
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine. |
Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset. |
Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset. |
Preview
dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties |
---|---|---|---|---|---|---|---|---|---|---|
Safety | 311_All | 2/28/2021 2:04:41 AM | Noise - Residential | Loud Music/Party | In Progress | 180 ATKINS AVENUE | 40.6747687507063 | -73.8790672972994 | null | |
Safety | 311_All | 2/28/2021 2:04:10 AM | Noise - Residential | Loud Music/Party | In Progress | 575 HART STREET | 40.6970513297355 | -73.9290678249186 | null | |
Safety | 311_All | 2/28/2021 2:04:09 AM | Noise - Residential | Loud Music/Party | In Progress | 2600 DECATUR AVENUE | 40.8630283295477 | -73.8905722859285 | null | |
Safety | 311_All | 2/28/2021 2:03:55 AM | New Tree Request | For One Address | In Progress | 304 EAST 178 STREET | 40.850004456691 | -73.9025290800053 | null | |
Safety | 311_All | 2/28/2021 2:03:47 AM | Noise - Residential | Loud Music/Party | In Progress | 35 GREAT JONES STREET | 40.7268920595982 | -73.9928383038984 | null | |
Safety | 311_All | 2/28/2021 2:03:41 AM | Noise - Residential | Loud Music/Party | In Progress | 53 NASSAU AVENUE | 40.7232922541744 | -73.9522880002276 | null | |
Safety | 311_All | 2/28/2021 2:03:25 AM | Blocked Driveway | No Access | In Progress | 101-21 92 STREET | 40.6832274842353 | -73.8483759439409 | null | |
Safety | 311_All | 2/28/2021 2:03:09 AM | Noise - Residential | Loud Music/Party | In Progress | 820 HENDERSON AVENUE | 40.6364843721673 | -74.1210614889782 | null | |
Safety | 311_All | 2/28/2021 2:02:20 AM | Noise - Residential | Loud Music/Party | In Progress | 49 GREAT JONES STREET | 40.7266916662882 | -73.9924306331279 | null | |
Safety | 311_All | 2/28/2021 2:02:13 AM | Noise - Residential | Loud Music/Party | In Progress | 40 BOND STREET | 40.726148245967 | -73.9930909345299 | null |
Name | Data type | Unique | Values (sample) | Description |
---|---|---|---|---|
address | string | 1,528,477 | 655 EAST 230 STREET 78-15 PARSONS BOULEVARD | House number of the incident address provided by the submitter. |
category | string | 445 | Noise - Residential HEAT/HOT WATER | The first level of a hierarchy identifying the topic of the incident or condition (Complaint Type). It may have a corresponding subcategory (Descriptor), or may stand alone. |
dataSubtype | string | 1 | 311_All | “311_All” |
dataType | string | 1 | Safety | “Safety” |
dateTime | timestamp | 17,052,560 | 2013-01-24 00:00:00 2015-01-08 00:00:00 | Date the service request was created. |
latitude | double | 1,496,647 | 40.89187241649303 40.72195913199264 | Geo-based latitude of the incident location. |
longitude | double | 1,496,669 | -73.86016845296459 -73.80969682426189 | Geo-based longitude of the incident location. |
status | string | 13 | Closed Pending | Status of the submitted service request. |
subcategory | string | 1,710 | Loud Music/Party ENTIRE BUILDING | Associated with the category (Complaint Type) and provides further detail on the incident or condition. Its values depend on the complaint type, and are not always required in a service request. |
Azure Notebooks
# This is a package in preview.
from azureml.opendatasets import NycSafety
from datetime import datetime
from dateutil import parser
end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
safety.info()
# Pip install packages
import os, sys
!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=NewYorkCity"
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
if azure_storage_account_name is None or azure_storage_sas_token is None:
raise Exception(
"Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")
print('Looking for the first parquet under the folder ' +
folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
container_url, azure_storage_sas_token if azure_storage_sas_token else None)
container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
targetBlobName = blob.name
break
print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
blob_client.download_blob().download_to_stream(local_file)
# Read the parquet file into Pandas data frame
import pandas as pd
print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
# you can add your filter at below
print('Loaded as a Pandas data frame: ')
df
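The comment in the snippet above suggests adding your own filter once the frame is loaded. As a minimal sketch of what that might look like (using a tiny hand-built frame with the same column names instead of the downloaded Parquet file), you could keep only open residential-noise complaints:

```python
import pandas as pd

# Hypothetical sample mirroring a few of the dataset's columns.
df = pd.DataFrame({
    "category": ["Noise - Residential", "Blocked Driveway", "Noise - Residential"],
    "status": ["In Progress", "Closed", "Closed"],
    "dateTime": pd.to_datetime(
        ["2021-02-28 02:04:41", "2021-02-28 02:03:25", "2021-02-27 23:10:00"]
    ),
})

# Keep only noise complaints that are still open.
noise_open = df[(df["category"] == "Noise - Residential")
                & (df["status"] == "In Progress")]
print(len(noise_open))  # 1
```

The same boolean-mask filter applies unchanged to the full frame returned by `pd.read_parquet`.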
Azure Databricks
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycSafety
from datetime import datetime
from dateutil import parser
end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
display(safety.limit(5))
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
blob_sas_token)
print('Remote blob path: ' + wasbs_path)
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
Azure Synapse
# This is a package in preview.
from azureml.opendatasets import NycSafety
from datetime import datetime
from dateutil import parser
end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
# Display top 5 rows
display(safety.limit(5))
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
blob_sas_token)
print('Remote blob path: ' + wasbs_path)
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

City Safety
From the Urban Innovation Initiative at Microsoft Research: a Databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. Analyses show frequency distributions and geographic clustering of safety issues within cities.