US Population by County

US Census Population Decennial County

US population by sex and race for each US county, derived from the 2000 and 2010 decennial censuses.

This dataset is sourced from the United States Census Bureau's Decennial Census Dataset APIs. For the terms and conditions that govern use of this dataset, see the Terms of Service and the Policies and Notices.

Volume and retention

This dataset is stored in Parquet format and contains data for the years 2000 and 2010.

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS “AS IS”. MICROSOFT MAKES NO EXPRESS OR IMPLIED WARRANTIES OR CONDITIONS REGARDING YOUR USE OF THE DATASETS. TO THE MAXIMUM EXTENT PERMITTED BY LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR COMMERCIAL LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL, OR PUNITIVE DAMAGES, ARISING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in | When to use
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.
Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.
Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

decennialTime | stateName | countyName | population | race | sex | minAge | maxAge | year
2010 | Texas | Crockett County | 123 | WHITE ALONE | Male | 5 | 9 | 2010
2010 | Texas | Crockett County | 1 | ASIAN ALONE | Female | 67 | 69 | 2010
2010 | Texas | Crockett County | 111 | WHITE ALONE | Female | 55 | 59 | 2010
2010 | Texas | Crockett County | 64 | TWO OR MORE RACES | null | null | null | 2010
2010 | Texas | Crockett County | 18 | null | Male | 85 | null | 2010
2010 | Texas | Crockett County | 16 | AMERICAN INDIAN AND ALASKA NATIVE ALONE | Female | | | 2010
2010 | Texas | Crockett County | 7 | WHITE ALONE | Male | 21 | 21 | 2010
2010 | Texas | Crockett County | 45 | null | Female | 85 | | 2010
2010 | Texas | Crockett County | 0 | NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE | Female | 67 | 69 | 2010
2010 | Texas | Crockett County | 4 | SOME OTHER RACE ALONE | Male | 67 | 69 | 2010

Name | Data type | Unique | Values (sample) | Description
countyName | string | 1,960 | Washington County, Jefferson County | County name.
decennialTime | string | 2 | 2010, 2000 | The time of the decennial census, for example 2010 or 2000.
maxAge | int | 23 | 61, 20 | Upper bound of the age range. If null, the row covers all ages or the age range has no upper bound, for example age > 85.
minAge | int | 23 | 5, 55 | Lower bound of the age range. If null, the row covers all ages.
population | int | 47,229 | 1, 2 | Population of this segment.
race | string | 8 | ASIAN ALONE, TWO OR MORE RACES | Race category in the census data. If null, the row covers all races.
sex | string | 3 | Female, Male | Male or female. If null, the row covers both sexes.
stateName | string | 52 | Texas, Georgia | Name of the US state.
year | int | 2 | 2010, 2000 | Year (as an integer) of the decennial period.
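
The schema above treats a null in race, sex, minAge, or maxAge as "all" rather than "missing". The following is a minimal sketch of selecting one segment under that reading, using five rows copied from the preview. The helper name `select_segment` is hypothetical and is not part of any Azure package.

```python
import pandas as pd

# Five rows copied from the preview above (Crockett County, Texas, 2010).
rows = [
    ("2010", "Texas", "Crockett County", 123, "WHITE ALONE", "Male", 5, 9),
    ("2010", "Texas", "Crockett County", 1, "ASIAN ALONE", "Female", 67, 69),
    ("2010", "Texas", "Crockett County", 111, "WHITE ALONE", "Female", 55, 59),
    ("2010", "Texas", "Crockett County", 64, "TWO OR MORE RACES", None, None, None),
    ("2010", "Texas", "Crockett County", 18, None, "Male", 85, None),
]
columns = ["decennialTime", "stateName", "countyName", "population",
           "race", "sex", "minAge", "maxAge"]
sample_df = pd.DataFrame(rows, columns=columns)


def select_segment(df, race=None, sex=None, min_age=None, max_age=None):
    """Hypothetical helper: None means "match rows where the column is null"."""
    mask = pd.Series(True, index=df.index)
    for column, wanted in [("race", race), ("sex", sex),
                           ("minAge", min_age), ("maxAge", max_age)]:
        if wanted is None:
            mask &= df[column].isna()
        else:
            mask &= df[column] == wanted
    return df[mask]


# All races, male, age 85 and over (no upper bound).
males_85_plus = select_segment(sample_df, sex="Male", min_age=85)
print(males_85_plus[["race", "sex", "minAge", "maxAge", "population"]])
```

If rows with nulls are indeed roll-ups of the more specific rows, summing `population` over every row would likely count the same people more than once, so it seems safer to pick one level of detail before aggregating.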

Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Looking for parquet files...
Reading them into Pandas dataframe...
Reading release/us_population_county/year=2000/part-00177-tid-926394737839939592-51ecde30-440a-40fd-9b41-831814678ab5-1919150.c000.snappy.parquet under container censusdatacontainer
Reading release/us_population_county/year=2010/part-00178-tid-926394737839939592-51ecde30-440a-40fd-9b41-831814678ab5-1919151.c000.snappy.parquet under container censusdatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=11624.4 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=11659.25 [ms]
In [2]:
population_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3664512 entries, 0 to 1855295
Data columns (total 8 columns):
decennialTime    object
stateName        object
countyName       object
population       int32
race             object
sex              object
minAge           float64
maxAge           float64
dtypes: float64(2), int32(1), object(5)
memory usage: 237.6+ MB
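
The `info()` output lists `minAge` and `maxAge` as float64, which is how pandas stores integer columns that contain nulls. The sketch below converts such columns to the nullable `Int64` dtype. It uses a small stand-in frame rather than the real `population_df`, and it assumes the non-null ages are whole numbers, as the schema's `int` type suggests.

```python
import numpy as np
import pandas as pd

# Stand-in for population_df: ages as float64 with NaN for nulls,
# matching the dtypes reported by info() above.
ages_df = pd.DataFrame({
    "minAge": [5.0, 67.0, np.nan, 85.0],
    "maxAge": [9.0, 69.0, np.nan, np.nan],
})

# Cast to pandas' nullable integer dtype; NaN becomes <NA>.
nullable_ages_df = ages_df.astype({"minAge": "Int64", "maxAge": "Int64"})
print(nullable_ages_df.dtypes)
```

The same `astype` call should apply to `population_df` directly if the assumption about whole-number ages holds.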
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "censusdatacontainer"
folder_name = "release/us_population_county/"
In [3]:
from azure.storage.blob import BlobServiceClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    # readinto() streams the blob into the open file handle
    blob_client.download_blob().readinto(local_file)
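
The selection loop above sorts blob names in descending order and takes the first `.parquet` match, so only one file is downloaded. The following sketch replays that logic on plain strings, using the two blob names that appear in the log output earlier in this section. The function name `pick_target_blob` is hypothetical.

```python
def pick_target_blob(blob_names, folder_name):
    """Return the first .parquet name under folder_name in descending order, or ''."""
    for name in sorted(blob_names, reverse=True):
        if name.startswith(folder_name) and name.endswith(".parquet"):
            return name
    return ""


blob_names = [
    "release/us_population_county/year=2000/part-00177-tid-926394737839939592-51ecde30-440a-40fd-9b41-831814678ab5-1919150.c000.snappy.parquet",
    "release/us_population_county/year=2010/part-00178-tid-926394737839939592-51ecde30-440a-40fd-9b41-831814678ab5-1919151.c000.snappy.parquet",
]
target = pick_target_blob(blob_names, "release/us_population_county/")
print(target)
```

With only these two names, the `year=2010` file sorts first, so a frame built from the downloaded file would hold 2010 data only. The container may hold other blobs, so the actual result could differ.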
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
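
The blob paths use a `year=2000` / `year=2010` folder layout, and the `info()` output shown earlier lists eight columns without `year`. This suggests, though the source does not state it, that `year` is carried by the folder name rather than stored inside each file. Here is a sketch of recovering it from a blob name; `partition_values` is a hypothetical helper.

```python
def partition_values(blob_name):
    """Parse key=value folder segments (hypothetical helper) into a dict."""
    values = {}
    for segment in blob_name.split("/")[:-1]:  # skip the file name itself
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = value
    return values


example_name = (
    "release/us_population_county/year=2010/"
    "part-00178-tid-926394737839939592-51ecde30-440a-40fd-9b41-831814678ab5"
    "-1919151.c000.snappy.parquet"
)
parts = partition_values(example_name)
print(parts)  # {'year': '2010'}
```

If the column is indeed absent from the downloaded file, something like `df["year"] = int(partition_values(targetBlobName)["year"])` would add it back.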
In [5]:
# You can add your own filter below
print('Loaded as a Pandas data frame: ')
df

Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=3770.1 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=3771.78 [ms]
In [2]:
display(population_df.limit(5))
decennialTime | stateName | countyName | population | race | sex | minAge | maxAge | year
2010 | Texas | Crockett County | 123 | WHITE ALONE | Male | 5 | 9 | 2010
2010 | Texas | Crockett County | 1 | ASIAN ALONE | Female | 67 | 69 | 2010
2010 | Texas | Crockett County | 111 | WHITE ALONE | Female | 55 | 59 | 2010
2010 | Texas | Crockett County | 64 | TWO OR MORE RACES | null | null | null | 2010
2010 | Texas | Crockett County | 18 | null | Male | 85 | null | 2010
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_county/"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the Parquet files with Spark; the read is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
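
The path and the configuration key in the cells above follow fixed string patterns. The sketch below wraps both in small helpers so that they can be checked without a Spark session. The function names are hypothetical, and the values are the ones given in the storage access cell.

```python
def build_wasbs_path(container, account, relative_path):
    """Build a wasbs:// URL in the same pattern as the cell above."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{relative_path}"


def build_sas_conf_key(container, account):
    """Build the Spark configuration key under which the SAS token is set."""
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


wasbs_path = build_wasbs_path(
    "censusdatacontainer", "azureopendatastorage", "release/us_population_county/")
sas_conf_key = build_sas_conf_key("censusdatacontainer", "azureopendatastorage")
print(wasbs_path)
print(sas_conf_key)
```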

Azure Synapse

Language: Python
In [39]:
# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_spark_dataframe()
In [40]:
# Display top 5 rows
display(population_df.limit(5))
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_county/"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the Parquet files with Spark; the read is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))