
New York City Safety Data

New York City, Social Services, 311 Service Requests, City Government, Public Safety

All New York City 311 service requests from 2010 to the present.

Volume and retention

This dataset is stored in Parquet format. It is updated daily and contains about 12 million rows (500 MB) in total as of 2019.

This dataset contains historical records accumulated from 2010 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.
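For example, the azureml-opendatasets SDK shown in the Access section below accepts start_date and end_date parameters; a minimal sketch (the month chosen here is arbitrary):

from dateutil import parser
from azureml.opendatasets import NycSafety

# Sketch: load one month of 311 requests by bounding the date range.
safety = NycSafety(start_date=parser.parse('2018-06-01'),
                   end_date=parser.parse('2018-06-30'))
safety_df = safety.to_pandas_dataframe()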

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional information

This dataset is sourced from New York City government. Further details can be found here. See here for the terms of using this dataset.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in | When to use
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.
Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.
Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties
Safety | 311_All | 2/21/2021 1:59:24 AM | Noise - Residential | Loud Music/Party | In Progress | 14-74 BEACH CHANNEL DRIVE | 40.6100848832634 | -73.7535605680209 | null |
Safety | 311_All | 2/21/2021 1:59:04 AM | Noise - Residential | Loud Music/Party | In Progress | 65 AINSLIE STREET | 40.712579294372 | -73.951902491108 | null |
Safety | 311_All | 2/21/2021 1:58:48 AM | Noise - Residential | Banging/Pounding | In Progress | 271 PROSPECT PARK WEST | 40.6584006795199 | -73.9821267568679 | null |
Safety | 311_All | 2/21/2021 1:58:47 AM | Blocked Driveway | No Access | In Progress | 101-21 92 STREET | 40.6832274842353 | -73.8483759439409 | null |
Safety | 311_All | 2/21/2021 1:58:10 AM | Noise - Residential | Loud Music/Party | In Progress | 758 GREENWICH STREET | 40.7359964368603 | -74.0067404918206 | null |
Safety | 311_All | 2/21/2021 1:57:54 AM | Noise - Residential | Loud Music/Party | In Progress | 346 54 STREET | 40.6447594131065 | -74.0158084063265 | null |
Safety | 311_All | 2/21/2021 1:57:54 AM | Noise - Commercial | Loud Music/Party | In Progress | 227 EAST 19 STREET | 40.7358033279575 | -73.9835601756515 | null |
Safety | 311_All | 2/21/2021 1:57:33 AM | Noise - Residential | Loud Music/Party | In Progress | 1010 SOUNDVIEW AVENUE | 40.8254817783779 | -73.8704142498206 | null |
Safety | 311_All | 2/21/2021 1:57:16 AM | Noise - Residential | Loud Music/Party | In Progress | 272 NAGLE AVENUE | 40.8634229702802 | -73.9199280064557 | null |
Safety | 311_All | 2/21/2021 1:56:36 AM | Noise - Residential | Loud Music/Party | In Progress | 153 EAST 33 STREET | 40.7455410958093 | -73.9797323203144 | null |
Name | Data type | Unique | Values (sample) | Description
address | string | 1,526,822 | 655 EAST 230 STREET; 78-15 PARSONS BOULEVARD | House number of the incident address provided by the submitter.
category | string | 445 | Noise - Residential; HEAT/HOT WATER | This is the first level of a hierarchy identifying the topic of the incident or condition (Complaint Type). It may have a corresponding subcategory (Descriptor) or may stand alone.
dataSubtype | string | 1 | 311_All | "311_All"
dataType | string | 1 | Safety | "Safety"
dateTime | timestamp | 16,982,771 | 2013-01-24 00:00:00; 2015-01-08 00:00:00 | Date the service request was created.
latitude | double | 1,492,892 | 40.89187241649303; 40.72195913199264 | Latitude of the incident location.
longitude | double | 1,492,914 | -73.86016845296459; -73.80969682426189 | Longitude of the incident location.
status | string | 13 | Closed; Pending | Status of the submitted service request.
subcategory | string | 1,708 | Loud Music/Party; ENTIRE BUILDING | Associated with the category (Complaint Type), this provides further detail on the incident or condition. Its values depend on the complaint type and are not always required in service requests.
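
As an illustration of how these fields are typically used, the sketch below filters by category and status; it assumes a pandas DataFrame named df loaded with this schema, as in the Access samples further down.

# Sketch: open residential-noise requests, using the columns described above
# (assumes a pandas DataFrame `df` with this schema).
noise_open = df[(df['category'] == 'Noise - Residential') & (df['status'] != 'Closed')]
print(noise_open[['dateTime', 'subcategory', 'address']].head())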


Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Looking for parquet files...
Reading them into Pandas dataframe...
Reading Safety/Release/city=NewYorkCity/part-00026-tid-845600952581210110-a4f62588-4996-42d1-bc79-23a9b4635c63-446869.c000.snappy.parquet under container citydatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=106593.46 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=106687.96 [ms]
In [2]:
safety.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1204035 entries, 7 to 12307252
Data columns (total 11 columns):
dataType              1204035 non-null object
dataSubtype           1204035 non-null object
dateTime              1204035 non-null datetime64[ns]
category              1204035 non-null object
subcategory           1203974 non-null object
status                1204035 non-null object
address               1010833 non-null object
latitude              1169358 non-null float64
longitude             1169358 non-null float64
source                0 non-null object
extendedProperties    0 non-null object
dtypes: datetime64[ns](1), float64(2), object(8)
memory usage: 110.2+ MB
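The info() output above shows that address, latitude, and longitude are not always populated and that source and extendedProperties are empty in this slice. A minimal clean-up sketch before any geographic analysis (not part of the original sample) could be:
In [3]:
# Sketch: drop rows without coordinates and remove the empty columns.
safety_geo = (safety
              .dropna(subset=['latitude', 'longitude'])
              .drop(columns=['source', 'extendedProperties']))
print(safety_geo.shape)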
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=NewYorkCity"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your own filter below (see the example in the next cell)
print('Loaded as a Pandas data frame: ')
df
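For example, one possible filter on the downloaded file (the status value comes from the schema table above; this cell is a sketch, not part of the original sample):
In [6]:
# Example filter: keep only requests that are still in progress.
in_progress = df[df['status'] == 'In Progress']
print('In-progress requests: %d' % len(in_progress))
in_progress.head()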

Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=4392.11 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=4395.98 [ms]
In [2]:
display(safety.limit(5))
dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties
Safety | 311_All | 2015-12-28T13:58:58.000+0000 | HEAT/HOT WATER | ENTIRE BUILDING | Closed | 548 11 STREET | 40.664924841709606 | -73.98101480555805 | null | null
Safety | 311_All | 2015-06-14T01:11:08.000+0000 | Noise - Residential | Loud Music/Party | Closed | null | 40.86969422534882 | -73.86620623861982 | null | null
Safety | 311_All | 2015-06-14T04:47:37.000+0000 | Noise - Residential | Loud Talking | Closed | null | 40.858744389082254 | -73.93011726711445 | null | null
Safety | 311_All | 2015-06-16T16:56:00.000+0000 | Sewer | Catch Basin Clogged/Flooding (Use Comments) (SC) | Closed | 82 JEWETT AVENUE | 40.63510898432114 | -74.12886658384302 | null | null
Safety | 311_All | 2015-06-22T14:03:05.000+0000 | ELECTRIC | LIGHTING | Closed | 2170 BATHGATE AVENUE | 40.852335329676464 | -73.89389734164266 | null | null
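From here the Spark DataFrame can be aggregated directly; the cell below is a sketch of a frequency count per complaint category and is not part of the original sample.
In [3]:
# Sketch: request counts per category for the loaded date range.
from pyspark.sql import functions as F

category_counts = (safety
                   .groupBy('category')
                   .agg(F.count('*').alias('requests'))
                   .orderBy(F.desc('requests')))
display(category_counts.limit(10))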
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow Spark to read from the blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet file with Spark; note that no data is loaded yet at this point (lazy evaluation)
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
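With the temporary view registered, arbitrary Spark SQL can be run against it; for instance, a sketch of a monthly breakdown by category (column names from the schema above; not part of the original sample):
In [5]:
# Sketch: monthly request counts by category from the 'source' temp view.
monthly = spark.sql("""
    SELECT date_trunc('month', dateTime) AS month,
           category,
           COUNT(*) AS requests
    FROM source
    GROUP BY date_trunc('month', dateTime), category
    ORDER BY month, requests DESC
""")
display(monthly.limit(20))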

Azure Synapse

Language: Python
In [15]:
# This is a package in preview.
from azureml.opendatasets import NycSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = NycSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
In [16]:
# Display top 5 rows
display(safety.limit(5))
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=NewYorkCity"
blob_sas_token = r""
In [2]:
# Allow Spark to read from the blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet file with Spark; note that no data is loaded yet at this point (lazy evaluation)
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

City Safety

From the Urban Innovation Initiative at Microsoft Research: a Databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. The analyses show frequency distributions and geographic clustering of safety issues within cities.
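
The notebook itself is not reproduced here; as a rough sketch of the geographic-clustering idea it describes, k-means over the latitude/longitude columns of a pandas frame (loaded as in the Access section; the cluster count here is arbitrary) might look like:

# Sketch: coarse geographic clustering of complaint locations with scikit-learn.
from sklearn.cluster import KMeans

coords = df[['latitude', 'longitude']].dropna()
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(coords)
coords = coords.assign(cluster=kmeans.labels_)
print(coords.groupby('cluster').size().sort_values(ascending=False))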