US Population by County

US Census Population Decennial County

US population by sex and race for each US county, sourced from the Decennial Census (the census conducted every 10 years) for the years 2000 and 2010.

This dataset is sourced from the United States Census Bureau's Decennial Census Dataset APIs. Review the Terms of Service and the Policies and Notices for the terms and conditions that apply to the use of this dataset.

Volume and retention

This dataset is stored in Parquet format and contains data from the years 2000 and 2010.

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Related datasets

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER APPLICABLE LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE DAMAGES, RESULTING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in | When to use
Azure Notebooks

Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Azure Databricks

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Azure Synapse

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

decennialTime | stateName | countyName | population | race | sex | year | minAge | maxAge
2000 | Alabama | Autauga County | 7473 | BLACK OR AFRICAN AMERICAN ALONE | null | 2000 | |
2000 | Alabama | Autauga County | 7 | SOME OTHER RACE ALONE | Female | 2000 | 15 | 17
2000 | Alabama | Autauga County | 452 | WHITE ALONE | Male | 2000 | 18 | 19
2000 | Alabama | Autauga County | 2 | ASIAN ALONE | Female | 2000 | 20 | 20
2000 | Alabama | Autauga County | 9 | AMERICAN INDIAN AND ALASKA NATIVE ALONE | Male | 2000 | 35 | 39
2000 | Alabama | Autauga County | 0 | ASIAN ALONE | Female | 2000 | 60 | 61
2000 | Alabama | Autauga County | 1 | ASIAN ALONE | Male | 2000 | 10 | 14
2000 | Alabama | Autauga County | 10 | TWO OR MORE RACES | Male | 2000 | 30 | 34
2000 | Alabama | Autauga County | 781 | WHITE ALONE | Female | 2000 | 15 | 17
2000 | Alabama | Autauga County | 2 | TWO OR MORE RACES | Female | 2000 | 75 | 79
Name | Data type | Unique | Values (sample) | Description
countyName | string | 1,960 | Washington County, Jefferson County | County name.
decennialTime | string | 2 | 2010, 2000 | The time at which the decennial census took place, e.g., 2010, 2000.
maxAge | int | 23 | 14, 61 | Maximum of the age range. If null, either all ages are included or the age range has no upper bound, e.g., > 85.
minAge | int | 23 | 35, 10 | Minimum of the age range. If null, the row covers all ages.
population | int | 47,229 | 1, 2 | Population of this segment.
race | string | 8 | ASIAN ALONE, TWO OR MORE RACES | Race category in the Census data. If null, the row covers all races.
sex | string | 3 | Male, Female | Male or female. If null, the row covers both sexes.
stateName | string | 52 | Texas, Georgia | Name of the US state.
year | int | 2 | 2010, 2000 | Year (as an integer) of the decennial census.
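
Because null in race, sex, minAge, or maxAge marks a row that aggregates across that dimension, summing every row for a county would count people more than once. The sketch below shows one way to keep only the rows where all four breakdown columns are null. It runs on a small hand-made sample with illustrative values, and it assumes that such fully aggregated rows exist for each county in the real data; `county_totals` is a hypothetical helper name, not part of any package.

```python
import pandas as pd

# Hand-made sample that mimics the schema above; the numbers are illustrative only.
sample = pd.DataFrame(
    {
        "stateName": ["Alabama"] * 4,
        "countyName": ["Autauga County"] * 4,
        "year": [2000] * 4,
        "race": [None, "WHITE ALONE", None, "ASIAN ALONE"],
        "sex": [None, None, "Male", "Female"],
        "minAge": [None, None, None, 20.0],
        "maxAge": [None, None, None, 20.0],
        "population": [100, 80, 49, 2],
    }
)


def county_totals(df):
    """Keep only rows in which every breakdown column is null."""
    breakdown = ["race", "sex", "minAge", "maxAge"]
    mask = df[breakdown].isna().all(axis=1)
    columns = ["stateName", "countyName", "year", "population"]
    return df.loc[mask, columns].reset_index(drop=True)


totals = county_totals(sample)
print(totals)
```

The same mask idea extends to partial breakdowns, for example keeping rows where only race is non-null to get per-race totals across both sexes and all ages.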

Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Looking for parquet files...
Reading them into Pandas dataframe...
Reading release/us_population_county/year=2000/part-00177-tid-926394737839939592-51ecde30-440a-40fd-9b41-831814678ab5-1919150.c000.snappy.parquet under container censusdatacontainer
Reading release/us_population_county/year=2010/part-00178-tid-926394737839939592-51ecde30-440a-40fd-9b41-831814678ab5-1919151.c000.snappy.parquet under container censusdatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=11624.4 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=11659.25 [ms]
In [2]:
population_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3664512 entries, 0 to 1855295
Data columns (total 8 columns):
decennialTime    object
stateName        object
countyName       object
population       int32
race             object
sex              object
minAge           float64
maxAge           float64
dtypes: float64(2), int32(1), object(5)
memory usage: 237.6+ MB
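
The `info()` output above shows two things worth handling before analysis: `minAge` and `maxAge` are loaded as `float64` (pandas falls back to floats when an integer column contains nulls), and the index runs from 0 to 1855295 across 3664512 rows, so index labels repeat. A minimal sketch of one possible cleanup follows, using a tiny stand-in frame rather than the real data; `tidy_age_columns` is a hypothetical helper name.

```python
import pandas as pd

# Stand-in frame with float ages, nulls, and a repeated index label.
sample = pd.DataFrame(
    {
        "population": [123, 18, 64],
        "minAge": [5.0, 85.0, None],
        "maxAge": [9.0, None, None],
    },
    index=[0, 1, 0],
)


def tidy_age_columns(df):
    """Give the frame a unique index and nullable integer age columns."""
    return df.reset_index(drop=True).astype({"minAge": "Int64", "maxAge": "Int64"})


tidy = tidy_age_columns(sample)
print(tidy.dtypes)
```

The nullable `Int64` dtype keeps missing ages as `<NA>` instead of `NaN`, so age bounds print as whole numbers while nulls keep their meaning from the schema.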
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "censusdatacontainer"
folder_name = "release/us_population_county/"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
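
Note that the cell above downloads only the first Parquet blob in reverse name order. The log output earlier in this section shows one folder per census year (`year=2000`, `year=2010`), so a single file covers a single year. The sketch below groups blob names by that year folder so that a file per year can be chosen. It works on the two blob names copied from that log; `group_parquet_by_year` is a hypothetical helper, and the assumption is that all data blobs follow the same `year=YYYY` layout.

```python
import re
from collections import defaultdict

# Blob names copied from the log output earlier in this section.
blob_names = [
    "release/us_population_county/year=2000/part-00177-tid-926394737839939592-"
    "51ecde30-440a-40fd-9b41-831814678ab5-1919150.c000.snappy.parquet",
    "release/us_population_county/year=2010/part-00178-tid-926394737839939592-"
    "51ecde30-440a-40fd-9b41-831814678ab5-1919151.c000.snappy.parquet",
]

YEAR_PARTITION = re.compile(r"/year=(\d{4})/")


def group_parquet_by_year(names):
    """Map each partition year to the Parquet blob names found under it."""
    grouped = defaultdict(list)
    for name in names:
        match = YEAR_PARTITION.search(name)
        if match and name.endswith(".parquet"):
            grouped[int(match.group(1))].append(name)
    return dict(grouped)


by_year = group_parquet_by_year(blob_names)
print(sorted(by_year))
```

In the download cell, the same function could be applied to `[b.name for b in sorted_blobs]` to loop over every year instead of stopping at the first match.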
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your own filter below
print('Loaded as a Pandas data frame: ')
df
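
As an illustration of the kind of filter that could be added here, the sketch below selects one sex and age range and sums the population by race. It uses four rows taken from the Preview table instead of the downloaded file, so it runs on its own; `filter_segment` and `df_sample` are hypothetical names.

```python
import pandas as pd

# Four rows copied from the Preview table (Autauga County, Alabama, 2000).
df_sample = pd.DataFrame(
    {
        "countyName": ["Autauga County"] * 4,
        "race": ["WHITE ALONE", "WHITE ALONE", "SOME OTHER RACE ALONE", "ASIAN ALONE"],
        "sex": ["Male", "Female", "Female", "Female"],
        "minAge": [18, 15, 15, 20],
        "maxAge": [19, 17, 17, 20],
        "population": [452, 781, 7, 2],
    }
)


def filter_segment(frame, sex, min_age, max_age):
    """Return the rows for one sex and one exact age range."""
    mask = (
        (frame["sex"] == sex)
        & (frame["minAge"] == min_age)
        & (frame["maxAge"] == max_age)
    )
    return frame[mask]


females_15_17 = filter_segment(df_sample, "Female", 15, 17)
by_race = females_15_17.groupby("race")["population"].sum()
print(by_race)
```

Matching on exact `minAge` and `maxAge` values avoids mixing overlapping ranges; rows with null ages are aggregates and never match an equality test.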

Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=3770.1 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=3771.78 [ms]
In [2]:
display(population_df.limit(5))
decennialTime | stateName | countyName | population | race | sex | minAge | maxAge | year
2010 | Texas | Crockett County | 123 | WHITE ALONE | Male | 5 | 9 | 2010
2010 | Texas | Crockett County | 1 | ASIAN ALONE | Female | 67 | 69 | 2010
2010 | Texas | Crockett County | 111 | WHITE ALONE | Female | 55 | 59 | 2010
2010 | Texas | Crockett County | 64 | TWO OR MORE RACES | null | null | null | 2010
2010 | Texas | Crockett County | 18 | null | Male | 85 | null | 2010
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_county/"
blob_sas_token = r""
In [2]:
# Allow Spark to read from Blob storage remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the Parquet files with Spark; the read is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
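
The path and configuration key built with `%` formatting in the cells above can be wrapped in a small helper, which makes it easier to point Spark at a single year folder. This is a sketch only: `build_wasbs_location` is a hypothetical function, and the optional `year` argument assumes the `year=YYYY` folder layout that appears in the blob names logged in the Azure Notebooks section. It needs no Spark session to run.

```python
def build_wasbs_location(container, account, relative_path, year=None):
    """Return (wasbs_path, sas_conf_key) for a blob container folder."""
    host = f"{account}.blob.core.windows.net"
    path = f"wasbs://{container}@{host}/{relative_path}"
    if year is not None:
        # Assumed layout: one sub-folder per census year, e.g. year=2010/
        path = f"{path}year={year}/"
    sas_conf_key = f"fs.azure.sas.{container}.{host}"
    return path, sas_conf_key


all_years_path, conf_key = build_wasbs_location(
    "censusdatacontainer", "azureopendatastorage", "release/us_population_county/"
)
only_2010_path, _ = build_wasbs_location(
    "censusdatacontainer", "azureopendatastorage", "release/us_population_county/", year=2010
)
print(all_years_path)
print(conf_key)
print(only_2010_path)
```

In a notebook, `conf_key` would be passed to `spark.conf.set` together with the SAS token, and either path to `spark.read.parquet`. Reading a single year folder directly means the resulting DataFrame has no `year` column unless Spark is told the base path.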

Azure Synapse

Language: Python
In [39]:
# This is a package in preview.
from azureml.opendatasets import UsPopulationCounty

population = UsPopulationCounty()
population_df = population.to_spark_dataframe()
In [40]:
# Display top 5 rows
display(population_df.limit(5))
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "censusdatacontainer"
blob_relative_path = "release/us_population_county/"
blob_sas_token = r""
In [2]:
# Allow Spark to read from Blob storage remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the Parquet files with Spark; the read is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))