
New York City Safety Data

311, City Government, New York City, Public Safety, Service Requests, Social Services

Description

All New York City 311 service requests from 2010 to the present.

Volume and Retention

This dataset is stored in Parquet format. It is updated daily, and contains about 12M rows (500MB) in total as of 2019.

This dataset contains historical records accumulated from 2010 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.

Storage Location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN “AS IS” BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft. See below for more information.

This dataset is sourced from the New York City government. More details can be found here. See here for the terms of using this dataset.

Access

Available in / When to use
Azure Notebooks

Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Azure Databricks

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude
Safety | 311_All | 7/19/2019 4:56:34 AM | Illegal Parking | Commercial Overnight Parking | In Progress | 113 ROAD | 40.6978156545844 | -73.7635396148755
Safety | 311_All | 7/19/2019 4:55:57 AM | Blocked Driveway | No Access | In Progress | 95-30 114 STREET | 40.6909246685054 | -73.831230096348
Safety | 311_All | 7/19/2019 4:54:43 AM | Noise - Street/Sidewalk | Loud Talking | In Progress | AMSTERDAM AVENUE | 40.8302700973981 | -73.9439218886152
Safety | 311_All | 7/19/2019 4:54:38 AM | Blocked Driveway | Partial Access | In Progress | 3750 HUDSON MANOR TERRACE | 40.8887122264288 | -73.91559729951
Safety | 311_All | 7/19/2019 4:49:35 AM | Illegal Parking | Unauthorized Bus Layover | In Progress | 950 GRAND STREET | 40.7129560913081 | -73.9364924583659
Safety | 311_All | 7/19/2019 4:46:52 AM | Noise - Street/Sidewalk | Loud Talking | In Progress | 541 WEST 142 STREET | 40.8239740084621 | -73.9505501383344
Safety | 311_All | 7/19/2019 4:44:26 AM | Noise - Vehicle | Car/Truck Music | In Progress | 300 WEST 53 STREET | 40.7643434322916 | -73.9851667724664
Safety | 311_All | 7/19/2019 4:39:16 AM | Consumer Complaint | Retail Store | In Progress | 124 FULTON STREET | 40.710159973498 | -74.0076468436323
Safety | 311_All | 7/19/2019 4:37:13 AM | Illegal Parking | Posted Parking Sign Violation | In Progress | 45-05 ASTORIA BLVD NORTH | 40.7686614840399 | -73.9054351843392
Safety | 311_All | 7/19/2019 4:33:01 AM | Noise - Vehicle | Car/Truck Music | In Progress | 1041 EAST 226 STREET | 40.885410056108 | -73.8503270422492
Name | Data type | Unique | Values (sample) | Description
address | string | 1,347,228 | 203 WEST 115 STREET; 6 CORTLANDT STREET | House number of incident address provided by submitter.
category | string | 430 | Noise - Residential; HEAT/HOT WATER | This is the first level of a hierarchy identifying the topic of the incident or condition (Complaint Type). It may have a corresponding subcategory (Descriptor) or may stand alone.
dataSubtype | string | 1 | 311_All | “311_All”
dataType | string | 1 | Safety | “Safety”
dateTime | timestamp | 12,139,654 | 2012-01-04 00:00:00; 2014-01-23 00:00:00 | Date the Service Request was created.
latitude | double | 1,308,743 | 40.802695363309105; 40.70996214232855 | Geo-based latitude of the incident location.
longitude | double | 1,308,762 | -73.95338331901017; -74.0103159942943 | Geo-based longitude of the incident location.
status | string | 12 | Closed; Pending | Status of the Service Request submitted.
subcategory | string | 1,634 | Loud Music/Party; ENTIRE BUILDING | This is associated with the category (Complaint Type) and provides further detail on the incident or condition. Its values depend on the Complaint Type and are not always required in a Service Request.
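The two-level hierarchy described for category and subcategory can be made concrete with a small sketch. It groups the (category, subcategory) pairs from the preview rows into a nested count. The helper name `build_hierarchy` and the "(none)" placeholder for a missing subcategory are assumptions made for illustration; they are not part of the dataset or the SDK.

```python
from collections import Counter, defaultdict

# (category, subcategory) pairs copied from the preview rows on this page.
rows = [
    ("Illegal Parking", "Commercial Overnight Parking"),
    ("Blocked Driveway", "No Access"),
    ("Noise - Street/Sidewalk", "Loud Talking"),
    ("Blocked Driveway", "Partial Access"),
    ("Illegal Parking", "Unauthorized Bus Layover"),
    ("Noise - Street/Sidewalk", "Loud Talking"),
    ("Noise - Vehicle", "Car/Truck Music"),
    ("Consumer Complaint", "Retail Store"),
    ("Illegal Parking", "Posted Parking Sign Violation"),
    ("Noise - Vehicle", "Car/Truck Music"),
]


def build_hierarchy(pairs):
    """Map each category to a Counter of the subcategories seen under it."""
    hierarchy = defaultdict(Counter)
    for category, subcategory in pairs:
        # A subcategory is not always present; count it under a placeholder.
        hierarchy[category][subcategory or "(none)"] += 1
    return hierarchy


hierarchy = build_hierarchy(rows)
for category, subs in sorted(hierarchy.items()):
    print(category, dict(subs))
```

Keeping the counts nested by category reflects the note that subcategory values depend on the complaint type, so the same descriptor text under two categories is not merged.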

Select your preferred service:

Azure Notebooks

Azure Databricks

Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Looking for parquet files...
Reading them into Pandas dataframe...
Reading Safety/Release/city=NewYorkCity/part-00026-tid-845600952581210110-a4f62588-4996-42d1-bc79-23a9b4635c63-446869.c000.snappy.parquet under container citydatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=106593.46 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=106687.96 [ms]
In [2]:
safety.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1204035 entries, 7 to 12307252
Data columns (total 11 columns):
dataType              1204035 non-null object
dataSubtype           1204035 non-null object
dateTime              1204035 non-null datetime64[ns]
category              1204035 non-null object
subcategory           1203974 non-null object
status                1204035 non-null object
address               1010833 non-null object
latitude              1169358 non-null float64
longitude             1169358 non-null float64
source                0 non-null object
extendedProperties    0 non-null object
dtypes: datetime64[ns](1), float64(2), object(8)
memory usage: 110.2+ MB
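The info() output shows that source and extendedProperties contain no values, and that address, latitude, longitude, and subcategory are missing for some rows. A minimal sketch of one way to handle this is to drop the all-empty columns and keep only rows with both coordinates. The small DataFrame below is a stand-in with illustrative values, not real output from the SDK, and the names `trimmed` and `located` are hypothetical.

```python
import pandas as pd

# Tiny stand-in for the `safety` DataFrame; the values are illustrative.
safety = pd.DataFrame({
    "category": ["HEAT/HOT WATER", "Noise - Residential", "Sewer"],
    "address": ["548 11 STREET", None, "82 JEWETT AVENUE"],
    "latitude": [40.6649, 40.8697, None],
    "longitude": [-73.9810, -73.8662, None],
    "source": [None, None, None],
    "extendedProperties": [None, None, None],
})

# Drop columns that hold no values at all (here: source, extendedProperties).
trimmed = safety.dropna(axis="columns", how="all")

# Keep only rows that have both coordinates, for example before mapping.
located = trimmed.dropna(subset=["latitude", "longitude"])

print(list(trimmed.columns))
print(len(located))
```

Whether to drop or impute rows without coordinates depends on the analysis; dropping is shown here only because it is the simplest option.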
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas

# COMMAND ----------

# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=NewYorkCity"

# COMMAND ----------

from azure.storage.blob import BlockBlobService

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception("Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' + folder_name + ' in container "' + container_name + '"...')
blob_service = BlockBlobService(account_name=azure_storage_account_name, sas_token=azure_storage_sas_token)
blobs = blob_service.list_blobs(container_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName=''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
parquet_file=blob_service.get_blob_to_path(container_name, targetBlobName, filename)

# COMMAND ----------

# Read the local parquet file into Pandas data frame
import pyarrow.parquet as pq
import pandas as pd

print('Reading the local parquet file into Pandas data frame')
df = pq.read_table(filename).to_pandas()

# COMMAND ----------

# Add your own filter below
print('Loaded as a Pandas data frame: ')
df

# COMMAND ----------
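As an example of the kind of filter that could be added at this point, the sketch below selects requests created within a time window and counts them by complaint type. The DataFrame is a small stand-in with illustrative rows so the snippet runs without downloading the Parquet file; the `start` and `end` values are arbitrary choices.

```python
import pandas as pd

# Stand-in for the `df` loaded from the Parquet file; rows are illustrative.
df = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2015-06-14 01:11:08",
        "2015-06-16 16:56:00",
        "2015-12-28 13:58:58",
    ]),
    "category": ["Noise - Residential", "Sewer", "HEAT/HOT WATER"],
    "status": ["Closed", "Closed", "Closed"],
})

start = pd.Timestamp("2015-06-01")
end = pd.Timestamp("2015-07-01")

# Half-open interval [start, end): requests created in June 2015.
in_range = (df["dateTime"] >= start) & (df["dateTime"] < end)
june = df[in_range]

# Frequency of complaint types within the selected window.
counts = june["category"].value_counts()
print(counts)
```

A half-open interval avoids counting a request twice when consecutive windows share a boundary timestamp.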


Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=4392.11 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=4395.98 [ms]
In [2]:
display(safety.limit(5))
dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties
Safety | 311_All | 2015-12-28T13:58:58.000+0000 | HEAT/HOT WATER | ENTIRE BUILDING | Closed | 548 11 STREET | 40.664924841709606 | -73.98101480555805 | null | null
Safety | 311_All | 2015-06-14T01:11:08.000+0000 | Noise - Residential | Loud Music/Party | Closed | null | 40.86969422534882 | -73.86620623861982 | null | null
Safety | 311_All | 2015-06-14T04:47:37.000+0000 | Noise - Residential | Loud Talking | Closed | null | 40.858744389082254 | -73.93011726711445 | null | null
Safety | 311_All | 2015-06-16T16:56:00.000+0000 | Sewer | Catch Basin Clogged/Flooding (Use Comments) (SC) | Closed | 82 JEWETT AVENUE | 40.63510898432114 | -74.12886658384302 | null | null
Safety | 311_All | 2015-06-22T14:03:05.000+0000 | ELECTRIC | LIGHTING | Closed | 2170 BATHGATE AVENUE | 40.852335329676464 | -73.89389734164266 | null | null
# Databricks notebook source
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""

# COMMAND ----------

# Allow Spark to read from Blob storage remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)

# COMMAND ----------

# Spark reads Parquet lazily; no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')

# COMMAND ----------

# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

City Safety

From the Urban Innovation Initiative at Microsoft Research: a Databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. The analyses show frequency distributions and geographic clustering of safety issues within cities.
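To give a rough idea of what a geographic frequency analysis can look like, the sketch below counts incidents per fixed-size latitude/longitude grid cell. This is a simple grid count standing in for clustering, not the method used in the notebook mentioned above. The function name `grid_cell` and the 0.02-degree cell size are assumptions chosen for illustration.

```python
from collections import Counter

# (latitude, longitude) pairs copied from preview rows earlier on this page.
points = [
    (40.6978156545844, -73.7635396148755),
    (40.6909246685054, -73.831230096348),
    (40.8302700973981, -73.9439218886152),
    (40.8239740084621, -73.9505501383344),
    (40.7643434322916, -73.9851667724664),
]


def grid_cell(lat, lon, size=0.02):
    """Return the integer (row, col) index of the grid cell containing a point."""
    # Floor division maps every point in the same size-degree square to one index.
    return (int(lat // size), int(lon // size))


cell_counts = Counter(grid_cell(lat, lon) for lat, lon in points)

# Cells with the most incidents come first.
for cell, count in cell_counts.most_common():
    print(cell, count)
```

With these five points, the two Upper Manhattan incidents fall in the same cell while the others each occupy their own. A real analysis would likely use a proper clustering algorithm and a projected coordinate system rather than raw degrees.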