New York City Safety Data
All New York City 311 service requests from 2010 to the present.
Volume and retention
This dataset is stored in Parquet format. It is updated daily and contains about 12M rows (500 MB) in total as of 2019.
This dataset contains historical records accumulated from 2010 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.
Storage location
This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.
Additional information
This dataset is sourced from the New York City government. Further details are available here. Refer here for the terms of use of this dataset.
Notices
MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN “AS IS” BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.
This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.
Access
Available in | When to use |
---|---|
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine. |
Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset. |
Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset. |
Preview
dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties |
---|---|---|---|---|---|---|---|---|---|---|
Safety | 311_All | 2/21/2021 1:59:24 AM | Noise - Residential | Loud Music/Party | In Progress | 14-74 BEACH CHANNEL DRIVE | 40.6100848832634 | -73.7535605680209 | null | |
Safety | 311_All | 2/21/2021 1:59:04 AM | Noise - Residential | Loud Music/Party | In Progress | 65 AINSLIE STREET | 40.712579294372 | -73.951902491108 | null | |
Safety | 311_All | 2/21/2021 1:58:48 AM | Noise - Residential | Banging/Pounding | In Progress | 271 PROSPECT PARK WEST | 40.6584006795199 | -73.9821267568679 | null | |
Safety | 311_All | 2/21/2021 1:58:47 AM | Blocked Driveway | No Access | In Progress | 101-21 92 STREET | 40.6832274842353 | -73.8483759439409 | null | |
Safety | 311_All | 2/21/2021 1:58:10 AM | Noise - Residential | Loud Music/Party | In Progress | 758 GREENWICH STREET | 40.7359964368603 | -74.0067404918206 | null | |
Safety | 311_All | 2/21/2021 1:57:54 AM | Noise - Residential | Loud Music/Party | In Progress | 346 54 STREET | 40.6447594131065 | -74.0158084063265 | null | |
Safety | 311_All | 2/21/2021 1:57:54 AM | Noise - Commercial | Loud Music/Party | In Progress | 227 EAST 19 STREET | 40.7358033279575 | -73.9835601756515 | null | |
Safety | 311_All | 2/21/2021 1:57:33 AM | Noise - Residential | Loud Music/Party | In Progress | 1010 SOUNDVIEW AVENUE | 40.8254817783779 | -73.8704142498206 | null | |
Safety | 311_All | 2/21/2021 1:57:16 AM | Noise - Residential | Loud Music/Party | In Progress | 272 NAGLE AVENUE | 40.8634229702802 | -73.9199280064557 | null | |
Safety | 311_All | 2/21/2021 1:56:36 AM | Noise - Residential | Loud Music/Party | In Progress | 153 EAST 33 STREET | 40.7455410958093 | -73.9797323203144 | null | |
Name | Data type | Unique | Values (sample) | Description |
---|---|---|---|---|
address | string | 1,526,822 | 655 EAST 230 STREET, 78-15 PARSONS BOULEVARD | House number and incident address provided by the submitter. |
category | string | 445 | Noise - Residential, HEAT/HOT WATER | The first level of a hierarchy identifying the topic of the incident or condition (Complaint Type). It may have a corresponding subcategory (Descriptor) or may stand alone. |
dataSubtype | string | 1 | 311_All | “311_All” |
dataType | string | 1 | Safety | “Safety” |
dateTime | timestamp | 16,982,771 | 2013-01-24 00:00:00, 2015-01-08 00:00:00 | Date the service request was created. |
latitude | double | 1,492,892 | 40.89187241649303, 40.72195913199264 | Geo-based latitude of the incident location. |
longitude | double | 1,492,914 | -73.86016845296459, -73.80969682426189 | Geo-based longitude of the incident location. |
status | string | 13 | Closed, Pending | Status of the submitted service request. |
subcategory | string | 1,708 | Loud Music/Party, ENTIRE BUILDING | Associated with the category (Complaint Type); provides further detail on the incident or condition. Its values depend on the Complaint Type and are not always required in a service request. |
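The schema above can be explored without downloading the full dataset. The following is a minimal sketch, assuming pandas is installed; the `sample` frame is copied by hand from three of the preview rows and is illustrative only, not something the dataset or SDK provides. It parses the preview's text timestamps and counts requests per subcategory.

```python
import pandas as pd

# Hand-copied subset of the preview rows shown earlier; illustrative only.
sample = pd.DataFrame(
    {
        "dateTime": [
            "2/21/2021 1:59:24 AM",
            "2/21/2021 1:58:47 AM",
            "2/21/2021 1:57:54 AM",
        ],
        "category": ["Noise - Residential", "Blocked Driveway", "Noise - Commercial"],
        "subcategory": ["Loud Music/Party", "No Access", "Loud Music/Party"],
        "status": ["In Progress", "In Progress", "In Progress"],
        "latitude": [40.6100848832634, 40.6832274842353, 40.7358033279575],
        "longitude": [-73.7535605680209, -73.8483759439409, -73.9835601756515],
    }
)

# The preview renders dateTime as text; convert it to a real timestamp column.
sample["dateTime"] = pd.to_datetime(sample["dateTime"], format="%m/%d/%Y %I:%M:%S %p")

# Count service requests per subcategory, most frequent first.
subcategory_counts = sample.groupby("subcategory").size().sort_values(ascending=False)
print(subcategory_counts)
```

The same `groupby` call should apply unchanged to a frame loaded from the Parquet files, since it relies only on the column names listed in the table.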
Azure Notebooks
# This is a package in preview.
from azureml.opendatasets import NycSafety
from datetime import datetime
from dateutil import parser
end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
safety.info()
# Pip install packages
import os, sys
!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=NewYorkCity"
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")
print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)
container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break
print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
# Read the parquet file into Pandas data frame
import pandas as pd
print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
# You can add your filter below
print('Loaded as a Pandas data frame: ')
df
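The filter mentioned in the sample's closing comment can be an ordinary pandas boolean mask. Here is a minimal sketch, assuming the `dateTime` and `category` columns from the schema and timezone-naive timestamps; the helper name `filter_requests` and the small stand-in frame `demo` are hypothetical, used so the example runs without the download.

```python
import pandas as pd


def filter_requests(df, start, end, category=None):
    """Return rows with start <= dateTime < end, optionally limited to one category.

    Hypothetical helper for illustration; not part of azureml-opendatasets.
    """
    mask = (df["dateTime"] >= pd.Timestamp(start)) & (df["dateTime"] < pd.Timestamp(end))
    if category is not None:
        mask &= df["category"] == category
    return df[mask]


# Stand-in for the frame returned by pd.read_parquet(filename).
demo = pd.DataFrame(
    {
        "dateTime": pd.to_datetime(
            [
                "2015-04-30 23:59:00",
                "2015-05-01 00:00:00",
                "2015-12-31 12:00:00",
                "2016-01-01 00:00:00",
            ]
        ),
        "category": [
            "HEAT/HOT WATER",
            "Noise - Residential",
            "HEAT/HOT WATER",
            "Noise - Residential",
        ],
    }
)

subset = filter_requests(demo, "2015-05-01", "2016-01-01", category="HEAT/HOT WATER")
print(subset)
```

The half-open interval (start inclusive, end exclusive) is a design choice that keeps adjacent ranges from overlapping; adjust the comparison operators if an inclusive end date is needed.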
Azure Databricks
# This is a package in preview.
# You need to pip install azureml-opendatasets on the Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycSafety
from datetime import datetime
from dateutil import parser
end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
display(safety.limit(5))
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
    'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
    blob_sas_token)
print('Remote blob path: ' + wasbs_path)
# Spark reads Parquet lazily; no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
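The path and configuration key built in the sample above follow a fixed string pattern, so they can be factored into small functions and checked without a cluster. This is a sketch under that assumption; `wasbs_path_for` and `sas_conf_key_for` are hypothetical names, not Spark or Azure APIs.

```python
def wasbs_path_for(container, account, relative_path):
    """Build the wasbs:// URI that the sample passes to spark.read.parquet."""
    return "wasbs://%s@%s.blob.core.windows.net/%s" % (container, account, relative_path)


def sas_conf_key_for(container, account):
    """Build the Spark configuration key under which the sample sets the SAS token."""
    return "fs.azure.sas.%s.%s.blob.core.windows.net" % (container, account)


# Values taken from the storage access info in the sample above.
path = wasbs_path_for(
    "citydatacontainer", "azureopendatastorage", "Safety/Release/city=NewYorkCity"
)
key = sas_conf_key_for("citydatacontainer", "azureopendatastorage")
print(path)
print(key)
```

In a notebook with a Spark session, these values would be used as `spark.conf.set(key, blob_sas_token)` followed by `spark.read.parquet(path)`, matching the sample.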
Azure Synapse
# This is a package in preview.
from azureml.opendatasets import NycSafety
from datetime import datetime
from dateutil import parser
end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
# Display top 5 rows
display(safety.limit(5))
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
    'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
    blob_sas_token)
print('Remote blob path: ' + wasbs_path)
# Spark reads Parquet lazily; no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

City Safety
From the Urban Innovation Initiative at Microsoft Research, a Databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. Analyses show frequency distributions and geographic clustering of safety issues within cities.