Boston Safety Data

Boston 311 CRM Case Management City Services Public Safety

311 calls reported to the city of Boston.

To learn more about BOS:311, see this link.

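Volume and retention

This dataset is stored in Parquet format. It is updated daily and, as of 2019, contains about 100K rows (10 MB) in total.

This dataset contains historical records accumulated from 2011 to the present. You can use parameter settings in the SDK to fetch data within a specific time range.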
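As a rough illustration of what selecting a time range means for individual records, the sketch below filters a few timestamps taken from this page against a start and end date. The helper `in_range` is hypothetical, and the half-open interval (start inclusive, end exclusive) is an assumption; the SDK's actual boundary handling is not stated here.

```python
from datetime import datetime


def in_range(timestamp, start_date, end_date):
    """Return True when start_date <= timestamp < end_date.

    The half-open interval is an assumption made for this sketch.
    """
    return start_date <= timestamp < end_date


# Same range as the SDK samples on this page.
start_date = datetime(2015, 5, 1)
end_date = datetime(2016, 1, 1)

# Sample dateTime values that appear on this page.
timestamps = [
    datetime(2015, 7, 23, 10, 51, 0),
    datetime(2015, 7, 23, 10, 47, 0),
    datetime(2021, 5, 12, 0, 3, 24),
]

selected = [t for t in timestamps if in_range(t, start_date, end_date)]
print(len(selected), "of", len(timestamps), "timestamps fall inside the range")
```

Only the two 2015 timestamps pass the check; the 2021 one falls outside the requested range.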

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

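Additional information

This dataset is sourced from the city of Boston government. More details can be found here. For the license that applies to using this dataset, see the Open Data Commons Public Domain Dedication and License (ODC PDDL).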

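Notices

Microsoft provides Azure Open Datasets on an "as is" basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.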

Access

Available in | When to use
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.
Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.
Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties
Safety | 311_All | 5/12/2021 12:03:24 AM | Animal Issues | Animal Generic Request | Open | 295 Newbury St Boston MA 02115 | 42.3491 | -71.0851 | Constituent Call |
Safety | 311_All | 5/11/2021 11:54:14 PM | Enforcement & Abandoned Vehicles | Parking Enforcement | Open | 48 Putnam St East Boston MA 02128 | 42.3813 | -71.0333 | Citizens Connect App |
Safety | 311_All | 5/11/2021 11:52:00 PM | Notification | Notification | Open | 76 Bowdoin St Dorchester MA 02121 | 42.3018 | -71.0721 | Constituent Call |
Safety | 311_All | 5/11/2021 11:49:55 PM | Enforcement & Abandoned Vehicles | Parking Enforcement | Open | INTERSECTION of Sheafe St & Snow Hill St Boston MA | 42.3594 | -71.0587 | Citizens Connect App |
Safety | 311_All | 5/11/2021 11:27:47 PM | Enforcement & Abandoned Vehicles | Parking Enforcement | Open | 40 Woolson St Mattapan MA 02126 | 42.2817 | -71.0902 | Citizens Connect App |
Safety | 311_All | 5/11/2021 11:25:44 PM | Enforcement & Abandoned Vehicles | Parking Enforcement | Open | 32 Oak St Charlestown MA 02129 | 42.3809 | -71.0687 | Citizens Connect App |
Safety | 311_All | 5/11/2021 11:19:57 PM | Enforcement & Abandoned Vehicles | Parking Enforcement | Open | 161 Townsend St Dorchester MA 02121 | 42.3173 | -71.0879 | Citizens Connect App |
Safety | 311_All | 5/11/2021 11:14:16 PM | Enforcement & Abandoned Vehicles | Parking Enforcement | Open | 26 Trescott St Dorchester MA 02125 | 42.3157 | -71.061 | Citizens Connect App |
Safety | 311_All | 5/11/2021 11:13:02 PM | Highway Maintenance | Sidewalk Repair (Make Safe) | Open | 217 W Springfield St Roxbury MA 02118 | 42.3407 | -71.0804 | Citizens Connect App |
Safety | 311_All | 5/11/2021 11:12:33 PM | Enforcement & Abandoned Vehicles | Parking Enforcement | Open | 10 Wayne St Dorchester MA 02121 | 42.3069 | -71.0851 | Citizens Connect App |
Name | Data type | Unique | Values (sample) | Description
address | string | 74,495 | " "; 1 City Hall Plz Boston MA 02108 | Location.
category | string | 50 | Street Cleaning; Sanitation | Reason for the service request.
dataSubtype | string | 1 | 311_All | "311_All"
dataType | string | 1 | Safety | "Safety"
dateTime | timestamp | 167,133 | 2015-07-23 10:51:00; 2015-07-23 10:47:00 | Open date and time of the service request.
latitude | double | 1,613 | 42.3594; 42.3603 | Latitude value. Lines of latitude are parallel to the equator.
longitude | double | 1,783 | -71.0587; -71.0583 | Longitude value. Lines of longitude run perpendicular to lines of latitude, and all pass through both poles.
source | string | 7 | Constituent Call; Citizens Connect App | Origin of the case.
status | string | 2 | Closed; Open | Case status.
subcategory | string | 194 | Parking Enforcement; Requests for Street Cleaning | Type of the service request.
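To make the column list concrete, here is a minimal sketch that models one record as a Python dataclass and counts requests per source. The class `ServiceRequest` and the helper `make_request` are hypothetical names introduced only for this illustration; the field names follow the documented columns, and the three rows are copied from the Preview table. The timestamp format string is an assumption based on how the preview renders `dateTime`.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ServiceRequest:
    """One 311 record; fields are named after the documented columns."""
    dataType: str
    dataSubtype: str
    dateTime: datetime
    category: str
    subcategory: str
    status: str
    address: str
    latitude: float
    longitude: float
    source: str
    extendedProperties: Optional[str] = None


def make_request(date_time, category, subcategory, address, lat, lon, source):
    # Hypothetical helper: fills the two constant columns, sets status to
    # "Open" (as in every preview row), and parses the preview's timestamp text.
    parsed = datetime.strptime(date_time, "%m/%d/%Y %I:%M:%S %p")
    return ServiceRequest("Safety", "311_All", parsed, category, subcategory,
                          "Open", address, lat, lon, source)


# Three rows copied from the Preview table.
requests = [
    make_request("5/12/2021 12:03:24 AM", "Animal Issues", "Animal Generic Request",
                 "295 Newbury St Boston MA 02115", 42.3491, -71.0851, "Constituent Call"),
    make_request("5/11/2021 11:54:14 PM", "Enforcement & Abandoned Vehicles", "Parking Enforcement",
                 "48 Putnam St East Boston MA 02128", 42.3813, -71.0333, "Citizens Connect App"),
    make_request("5/11/2021 11:13:02 PM", "Highway Maintenance", "Sidewalk Repair (Make Safe)",
                 "217 W Springfield St Roxbury MA 02118", 42.3407, -71.0804, "Citizens Connect App"),
]

# Count how many requests arrived through each source channel.
by_source = Counter(r.source for r in requests)
print(by_source.most_common())
```

A frozen dataclass is used so that a record cannot be modified by accident once it has been parsed.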

Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import BostonSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = BostonSafety(start_date=start_date, end_date=end_date)
safety = safety.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Looking for parquet files...
Reading them into Pandas dataframe...
Reading Safety/Release/city=Boston/part-00196-tid-845600952581210110-a4f62588-4996-42d1-bc79-23a9b4635c63-447039.c000.snappy.parquet under container citydatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=2213.69 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=2216.01 [ms]
In [2]:
safety.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1 entries, 56262 to 56262
Data columns (total 11 columns):
dataType              1 non-null object
dataSubtype           1 non-null object
dateTime              1 non-null datetime64[ns]
category              1 non-null object
subcategory           1 non-null object
status                1 non-null object
address               1 non-null object
latitude              1 non-null float64
longitude             1 non-null float64
source                1 non-null object
extendedProperties    0 non-null object
dtypes: datetime64[ns](1), float64(2), object(8)
memory usage: 96.0+ bytes
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "citydatacontainer"
folder_name = "Safety/Release/city=Boston"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().readinto(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your own filter below
print('Loaded as a Pandas data frame: ')
df
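One way such a filter might look is sketched below. Because the real `df` depends on the downloaded file, the example builds a small stand-in frame named `sample` (a hypothetical name) from three Preview rows, restricted to a few of the documented columns. The chosen conditions are illustrative only.

```python
import pandas as pd

# Stand-in for the loaded `df`: three rows from the Preview table,
# restricted to a few of the documented columns.
sample = pd.DataFrame({
    "dateTime": pd.to_datetime([
        "2021-05-12 00:03:24", "2021-05-11 23:54:14", "2021-05-11 23:13:02"]),
    "category": ["Animal Issues", "Enforcement & Abandoned Vehicles",
                 "Highway Maintenance"],
    "status": ["Open", "Open", "Open"],
    "source": ["Constituent Call", "Citizens Connect App", "Citizens Connect App"],
})

# Boolean masks combine with & (and) and | (or); wrap each condition in parentheses.
mask = (sample["status"] == "Open") & (sample["source"] == "Citizens Connect App")
filtered = sample[mask]

# Frequency of each category among the filtered rows.
counts = filtered["category"].value_counts()
print(counts)
```

The same mask expression should work unchanged on the full frame, provided the column names match the schema listed on this page.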

Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import BostonSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = BostonSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=2380.02 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=2381.75 [ms]
In [2]:
display(safety)
dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties
Safety | 311_All | 2015-07-24T12:48:24.000+0000 | Call Inquiry | OCR Front Desk Interactions | Closed | | 42.3594 | -71.0587 | Constituent Call | null
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=Boston"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Spark reads the Parquet files lazily; no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
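The string formatting in the cells above can be checked without a Spark cluster. The sketch below wraps the two format strings in small helpers and prints the results for the values used on this page. The function names `wasbs_path` and `sas_conf_key` are hypothetical; the format strings are the ones shown in the cells.

```python
def wasbs_path(container, account, relative_path):
    """Build the wasbs:// URL that Spark reads from (same format string as above)."""
    return "wasbs://%s@%s.blob.core.windows.net/%s" % (
        container, account, relative_path)


def sas_conf_key(container, account):
    """Build the Spark configuration key under which the SAS token is stored."""
    return "fs.azure.sas.%s.%s.blob.core.windows.net" % (container, account)


# Values taken from the storage access cell on this page.
path = wasbs_path("citydatacontainer", "azureopendatastorage",
                  "Safety/Release/city=Boston")
key = sas_conf_key("citydatacontainer", "azureopendatastorage")

print(path)
print(key)
```

Note the argument order: the container name comes before the account name in both strings, which is easy to swap by mistake.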

Azure Synapse

Language: Python
In [1]:
from azureml.opendatasets import BostonSafety

from datetime import datetime
from dateutil import parser


end_date = parser.parse('2016-01-01')
start_date = parser.parse('2015-05-01')
safety = BostonSafety(start_date=start_date, end_date=end_date)
safety = safety.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=2380.02 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=2381.75 [ms]
In [2]:
display(safety)
dataType | dataSubtype | dateTime | category | subcategory | status | address | latitude | longitude | source | extendedProperties
Safety | 311_All | 2015-07-24T12:48:24.000+0000 | Call Inquiry | OCR Front Desk Interactions | Closed | | 42.3594 | -71.0587 | Constituent Call | null
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "citydatacontainer"
blob_relative_path = "Safety/Release/city=Boston"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Spark reads the Parquet files lazily; no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

City Safety

From the Urban Innovation Initiative at Microsoft Research, a Databricks notebook for analytics with safety data (311 and 911 call data) from major U.S. cities. The analyses show frequency distributions and geographic clustering of safety issues within cities.
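As a very rough illustration of geographic clustering, the sketch below bins the ten coordinate pairs from the Preview table into a coarse latitude/longitude grid and counts the points per cell. This is not the notebook's method, which is not described here; the grid approach, the helper `grid_cell`, and the 0.05-degree cell size are all assumptions chosen for illustration.

```python
import math
from collections import Counter

CELL_DEGREES = 0.05  # hypothetical cell size, chosen only for this sketch


def grid_cell(latitude, longitude, size=CELL_DEGREES):
    """Map a coordinate to the integer index of the grid cell containing it."""
    return (math.floor(latitude / size), math.floor(longitude / size))


# (latitude, longitude) pairs copied from the Preview table.
points = [
    (42.3491, -71.0851), (42.3813, -71.0333), (42.3018, -71.0721),
    (42.3594, -71.0587), (42.2817, -71.0902), (42.3809, -71.0687),
    (42.3173, -71.0879), (42.3157, -71.061), (42.3407, -71.0804),
    (42.3069, -71.0851),
]

# Count how many requests fall into each grid cell.
cell_counts = Counter(grid_cell(lat, lon) for lat, lon in points)
busiest_cell, busiest_count = cell_counts.most_common(1)[0]
print("Busiest cell:", busiest_cell, "with", busiest_count, "requests")
```

A fixed grid is the simplest possible binning; a real analysis would more likely use a proper clustering algorithm or a geospatial index.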