Microsoft News Recommendation Dataset

News Recommendation MIND

The MIcrosoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It was collected from anonymized behavior logs of the Microsoft News website. MIND is intended to serve as a benchmark dataset for news recommendation and to facilitate research on news recommendation and recommender systems.

MIND contains about 160,000 English news articles and more than 15 million impression logs generated by one million users. Every news article has rich textual content, including title, abstract, body, category, and entities. Each impression log contains the click events, the non-click events, and the user's historical news click behaviors before that impression. To protect user privacy, each user was de-linked from the production system when the user ID was securely hashed into an anonymized ID. For more detailed information about the MIND dataset, see the paper MIND: A Large-scale Dataset for News Recommendation.

Volume

The training data and the validation data are each a zip-compressed folder containing four files:

File name               Description
behaviors.tsv           The click histories and impression logs of users
news.tsv                The information of news articles
entity_embedding.vec    The embeddings of entities in news, extracted from the knowledge graph
relation_embedding.vec  The embeddings of relations between entities, extracted from the knowledge graph

behaviors.tsv

The behaviors.tsv file contains the impression logs and the users' news click histories. It has five columns separated by tab characters:

  • Impression ID. The ID of an impression.
  • User ID. The anonymous ID of a user.
  • Time. The impression time in the format "MM/DD/YYYY HH:MM:SS".
  • History. The news click history (list of IDs of clicked news) of this user before this impression.
  • Impressions. The list of news displayed in this impression and the user's click behavior on each item (1 for click, 0 for non-click).

An example is shown in the table below:

Column          Content
Impression ID   123
User ID         U131
Time            11/13/2019 8:36:57
History         N11 N21 N103
Impressions     N4-1 N34-1 N156-0 N207-0 N198-0
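
The History and Impressions columns are space-separated strings, so they usually need to be split before use. The following is a minimal sketch of one way to do that, using the example row above; the helper names parse_impressions and parse_history are hypothetical and not part of the dataset tooling.

```python
def parse_impressions(impressions):
    """Split an Impressions string into (news_id, clicked) pairs."""
    pairs = []
    for item in impressions.split():
        # Each item looks like 'N4-1': a news ID, a hyphen, then 1 (click) or 0 (non-click).
        news_id, label = item.rsplit('-', 1)
        pairs.append((news_id, label == '1'))
    return pairs


def parse_history(history):
    """Split a History string into news IDs; a missing history gives an empty list."""
    # A missing value may be read as a non-string (for example NaN), so guard against it.
    if not isinstance(history, str):
        return []
    return history.split()


pairs = parse_impressions('N4-1 N34-1 N156-0 N207-0 N198-0')
clicked = [news_id for news_id, was_clicked in pairs if was_clicked]
print(clicked)                        # ['N4', 'N34']
print(parse_history('N11 N21 N103'))  # ['N11', 'N21', 'N103']
```

Splitting on the last hyphen is a defensive choice made here, not something the dataset description requires.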

news.tsv

The news.tsv file contains detailed information about the news articles referenced in the behaviors.tsv file. It has eight columns separated by tab characters:

  • News ID
  • Category
  • Subcategory
  • Title
  • Abstract
  • URL
  • Title Entities (the entities contained in the title of this news article)
  • Abstract Entities (the entities contained in the abstract of this news article)

The full body content of MSN news articles is not available for download because of the licensing structure. For convenience, however, a utility script is provided to help parse the news web pages from the MSN URLs in the dataset. Because of the time that has passed, some URLs have expired and cannot be accessed. Work to resolve this problem is ongoing.

An example is shown in the following table:

Column             Content
News ID            N37378
Category           sports
Subcategory        golf
Title              PGA Tour winners
Abstract           A gallery of recent winners on the PGA Tour.
URL                https://www.msn.com/en-us/sports/golf/pga-tour-winners/ss-AAjnQjj?ocid=chopendata
Title Entities     [{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [0], "SurfaceForms": ["PGA Tour"]}]
Abstract Entities  [{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [35], "SurfaceForms": ["PGA Tour"]}]

The dictionary keys in the "Entities" columns are described below:

Key                Description
Label              The entity name in the Wikidata knowledge graph
Type               The type of this entity in Wikidata
WikidataId         The entity ID in Wikidata
Confidence         The confidence of the entity linking
OccurrenceOffsets  The character-level entity offsets in the text of the title or abstract
SurfaceForms       The raw entity names in the original text
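
The entity columns hold JSON text, so they can be decoded with a standard JSON parser. The sketch below uses the example title entity from the table above; the helper names are hypothetical, and treating a missing cell as an empty list is an assumption made for illustration.

```python
import json


def parse_entities(entities_json):
    """Decode an entities cell into a list of dictionaries."""
    # Assumption: a missing or blank cell is treated as "no entities".
    if not isinstance(entities_json, str) or not entities_json.strip():
        return []
    return json.loads(entities_json)


def wikidata_ids(entities_json):
    """Return the WikidataId of every entity in the cell."""
    return [entity['WikidataId'] for entity in parse_entities(entities_json)]


def mentions(text, entity):
    """Return (offset, surface_form) pairs where a surface form occurs at a listed offset."""
    found = []
    for offset in entity['OccurrenceOffsets']:
        for form in entity['SurfaceForms']:
            if text.startswith(form, offset):
                found.append((offset, form))
                break
    return found


title = 'PGA Tour winners'
title_entities = ('[{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", '
                  '"Confidence": 1.0, "OccurrenceOffsets": [0], '
                  '"SurfaceForms": ["PGA Tour"]}]')

print(wikidata_ids(title_entities))                          # ['Q910409']
print(mentions(title, parse_entities(title_entities)[0]))    # [(0, 'PGA Tour')]
```

The WikidataId values are what link a news article to rows of entity_embedding.vec.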

entity_embedding.vec & relation_embedding.vec

The entity_embedding.vec and relation_embedding.vec files contain the 100-dimensional embeddings of the entities and relations, learned from the subgraph (taken from the Wikidata knowledge graph) with the TransE method. In both files, the first column is the ID of the entity or relation and the remaining columns are the embedding vector values. This data is intended to facilitate research on knowledge-aware news recommendation. An example is shown below:

ID          Embedding values
Q42306013   0.014516 -0.106958 0.024590 … -0.080382

For reasons related to learning the embeddings from the subgraph, some entities may have no embedding in the entity_embedding.vec file.
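
Since some entities have no embedding, lookups need a fallback. The following is a rough sketch of reading embedding lines into a dictionary and looking up an entity. The helper names are hypothetical, the sample line is shortened to three values purely for illustration (real rows have 100), and falling back to a zero vector is an assumption rather than a recommendation from the dataset authors.

```python
def parse_embedding_line(line):
    """Split one .vec line into (id, vector); fields are separated by whitespace."""
    parts = line.split()
    return parts[0], [float(value) for value in parts[1:]]


def load_embeddings(lines):
    """Build an {id: vector} dictionary from an iterable of .vec lines."""
    return dict(parse_embedding_line(line) for line in lines if line.strip())


def lookup(embeddings, entity_id, dim):
    """Return the vector for entity_id, or a zero vector if it is missing (assumption)."""
    return embeddings.get(entity_id, [0.0] * dim)


# Illustrative three-value line; real rows carry 100 values.
sample_lines = ['Q42306013\t0.014516\t-0.106958\t0.024590\n']
embeddings = load_embeddings(sample_lines)

print(lookup(embeddings, 'Q42306013', dim=3))  # [0.014516, -0.106958, 0.02459]
print(lookup(embeddings, 'Q910409', dim=3))    # [0.0, 0.0, 0.0]
```

With a real file, the lines could come from an open file handle, for example load_embeddings(open(path, encoding='utf-8')).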

Storage location

The data are stored in blobs in the West/East US data center, in the following blob container: https://mind201910small.blob.core.windows.net/release/.

Within the container, the training and validation sets are compressed into MINDlarge_train.zip and MINDlarge_dev.zip, respectively.

Additional information

The MIND dataset is free to download for research purposes under the Microsoft Research License Terms. If you have questions about the dataset, contact mind@microsoft.com.

Demo notebook for accessing MIND data on Azure

This notebook provides an example of accessing MIND data from blob storage on Azure.

MIND data are stored in the West/East US data center, so this notebook will run more efficiently on the Azure compute located in West/East US.

Imports and environment

In [1]:
import os
import tempfile
import shutil
import urllib.request  # urlretrieve lives in the urllib.request submodule
import zipfile
import pandas as pd

# Temporary folder for data we need during execution of this notebook (we'll clean up
# at the end, we promise)
temp_dir = os.path.join(tempfile.gettempdir(), 'mind')
os.makedirs(temp_dir, exist_ok=True)

# The dataset is split into a training set and a validation set, each with a large and a small version.
# All four archives share the same file format.
# For demonstration purposes, we will use only the small validation set.
base_url = 'https://mind201910small.blob.core.windows.net/release'
training_small_url = f'{base_url}/MINDsmall_train.zip'
validation_small_url = f'{base_url}/MINDsmall_dev.zip'
training_large_url = f'{base_url}/MINDlarge_train.zip'
validation_large_url = f'{base_url}/MINDlarge_dev.zip'

Functions

In [2]:
def download_url(url,
                 destination_filename=None,
                 progress_updater=None,
                 force_download=False,
                 verbose=True):
    """
    Download a URL to a temporary file
    """
    if not verbose:
        progress_updater = None
    # This is not intended to guarantee uniqueness, we just know it happens to guarantee
    # uniqueness for this application.
    if destination_filename is None:
        url_as_filename = url.replace('://', '_').replace('/', '_')
        destination_filename = \
            os.path.join(temp_dir,url_as_filename)
    if (not force_download) and (os.path.isfile(destination_filename)):
        if verbose:
            print('Bypassing download of already-downloaded file {}'.format(
                os.path.basename(url)))
        return destination_filename
    if verbose:
        print('Downloading file {} to {}'.format(os.path.basename(url),
                                                 destination_filename),
              end='')
    urllib.request.urlretrieve(url, destination_filename, progress_updater)
    assert (os.path.isfile(destination_filename))
    nBytes = os.path.getsize(destination_filename)
    if verbose:
        print('...done, {} bytes.'.format(nBytes))
    return destination_filename
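
The progress_updater argument is passed straight to urllib.request.urlretrieve as its report hook, which is called with the number of blocks transferred so far, the block size in bytes, and the total file size. A minimal sketch of a compatible hook follows; the names progress_fraction and print_progress are hypothetical.

```python
def progress_fraction(block_num, block_size, total_size):
    """Return the downloaded fraction in [0, 1], or None if the total size is unknown."""
    if total_size <= 0:
        return None
    return min(1.0, block_num * block_size / total_size)


def print_progress(block_num, block_size, total_size):
    """Report hook taking the three arguments that urlretrieve supplies."""
    fraction = progress_fraction(block_num, block_size, total_size)
    if fraction is not None:
        # Overwrite the same console line with the current percentage.
        print('\r{:.0%}'.format(fraction), end='')
```

It could then be used as download_url(validation_small_url, progress_updater=print_progress).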

Download and extract the files

In [3]:
# For demonstration purposes, we will use only the small validation set.
# This file is about 30MB.
zip_path = download_url(validation_small_url, verbose=True)
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(temp_dir)

os.listdir(temp_dir)
Downloading file MINDsmall_dev.zip to C:\Users\wutao\AppData\Local\Temp\mind\https_mind201910small.blob.core.windows.net_release_MINDsmall_dev.zip...done, 30945572 bytes.
Out[3]:
['behaviors.tsv',
 'entity_embedding.vec',
 'https_mind201910small.blob.core.windows.net_release_MINDsmall_dev.zip',
 'news.tsv',
 'relation_embedding.vec']

Read the files with pandas

In [4]:
# The behaviors.tsv file contains the impression logs and users' news click histories. 
# It has 5 columns divided by the tab symbol:
# - Impression ID. The ID of an impression.
# - User ID. The anonymous ID of a user.
# - Time. The impression time with format "MM/DD/YYYY HH:MM:SS AM/PM".
# - History. The news click history (ID list of clicked news) of this user before this impression.
# - Impressions. List of news displayed in this impression and user's click behaviors on them (1 for click and 0 for non-click).
behaviors_path = os.path.join(temp_dir, 'behaviors.tsv')
pd.read_table(
    behaviors_path,
    header=None,
    names=['impression_id', 'user_id', 'time', 'history', 'impressions'])
Out[4]:
impression_id user_id time history impressions
0 1 U80234 11/15/2019 12:37:50 PM N55189 N46039 N51741 N53234 N11276 N264 N40716... N28682-0 N48740-0 N31958-1 N34130-0 N6916-0 N5...
1 2 U60458 11/15/2019 7:11:50 AM N58715 N32109 N51180 N33438 N54827 N28488 N611... N20036-0 N23513-1 N32536-0 N46976-0 N35216-0 N...
2 3 U44190 11/15/2019 9:55:12 AM N56253 N1150 N55189 N16233 N61704 N51706 N5303... N36779-0 N62365-0 N58098-0 N5472-0 N13408-0 N5...
3 4 U87380 11/15/2019 3:12:46 PM N63554 N49153 N28678 N23232 N43369 N58518 N444... N6950-0 N60215-0 N6074-0 N11930-0 N6916-0 N248...
4 5 U9444 11/15/2019 8:25:46 AM N51692 N18285 N26015 N22679 N55556 N5940-1 N23513-0 N49285-0 N23355-0 N19990-0 N3...
5 6 U69606 11/15/2019 1:24:44 PM N879 N19591 N63054 N53033 N54088 N34140 N14952... N29862-0 N48740-0 N11390-0 N5472-0 N53572-0 N2...
6 7 U70421 11/15/2019 5:12:12 AM N38118 N55189 N16233 N37942 N23105 N27526 N965... N42767-1 N30290-0 N36779-0 N20036-0 N32536-0 N...
7 8 U38418 11/15/2019 10:12:44 AM N29464 N17952 N19028 N28338 N31631 N35831 N609... N53687-0 N31289-0 N37458-0 N8455-0 N56211-0 N5...
8 9 U9568 11/15/2019 11:59:42 AM N53393 N61857 N17744 N62644 N28274 N63634 N503... N45612-0 N60939-1 N33397-0 N19685-0
9 10 U77860 11/15/2019 3:52:43 PM N55829 N2203 N3909 N18459 N59704 N9146 N33096 ... N29091-0 N60762-0 N29862-0 N512-0 N48740-0 N60...
10 11 U24918 11/15/2019 12:55:33 PM N10792 N63241 N26424 N49745 N23487 N46978 N623... N54125-0 N60799-0 N29862-0 N3159-0 N46749-0 N4...
11 12 U50227 11/15/2019 12:46:28 PM N30974 N7171 N2186 N22561 N51630 N55951 N12098... N49285-0 N38951-0 N62365-0 N28682-0 N5472-0 N2...
12 13 U46210 11/15/2019 7:15:13 AM N28079 N64634 N30867 N40977 N11101 N51597 N586... N32536-0 N13408-0 N5940-0 N35216-0 N30290-0 N2...
13 14 U47134 11/15/2019 6:19:16 AM N35809 N13395 N23284 N64496 N29177 N1150 N2613... N49554-0 N12446-0 N7342-0 N6638-0 N12320-0 N10...
14 15 U27387 11/15/2019 11:41:22 AM N26500 N37363 N7889 N47479 N45146 N60611 N3264... N55237-0 N29862-0 N6400-1 N31958-0 N5472-0 N21...
15 16 U18529 11/15/2019 6:13:34 AM N28850 N17770 N33458 N13724 N12760 N51396 N530... N16118-0 N28072-0 N37204-0 N6645-0 N57515-0 N1...
16 17 U15905 11/15/2019 7:17:27 AM N20633 N59496 N50770 N62058 N50710 N31958-1 N5940-0 N20187-0 N13408-0 N30290-0 N2...
17 18 U67590 11/15/2019 2:48:12 PM N29177 N54540 N54827 N28345-0 N6400-0 N21681-0 N24460-0 N3168-0 N60...
18 19 U8181 11/15/2019 2:41:52 PM N56253 N35009 N54590 N52709 N9808 N53803 N5518... N24802-0 N51470-0 N48740-0 N34130-0 N29091-0 N...
19 20 U35337 11/15/2019 7:48:26 AM N47261 N52097 N38118 N55189 N36128 N41251 N507... N30290-0 N23513-1 N29393-0 N20036-0 N19990-0 N...
20 21 U1987 11/15/2019 10:27:09 AM N6922 N59704 N54179 N30290-0 N5940-0 N31958-0 N36779-0 N13408-0 N5...
21 22 U36076 11/15/2019 6:06:23 AM NaN N20036-0 N5940-0 N30290-0 N31958-0 N36779-0 N3...
22 23 U39839 11/15/2019 8:23:05 AM N16233 N40545 N55326 N20413 N15300 N4866 N2876... N46976-1 N31958-1 N20036-0 N13408-0 N53283-0 N...
23 24 U44035 11/15/2019 6:54:41 PM N47685 N55189 N26633 N33969 N19741 N19293 N341... N37204-1 N48487-0 N59933-0 N512-0 N51776-0 N64...
24 25 U48086 11/15/2019 9:40:20 AM N29177 N2203 N32483 N3863 N32089 N57737 N35920... N37352-0 N30290-0 N19990-0 N58251-0 N50775-0 N...
25 26 U44340 11/15/2019 1:55:43 AM N39374 N54575 N49874 N52022 N30290-0 N32237-0 N41934-0 N42950-0 N22978-0 N...
26 27 U18041 11/15/2019 8:39:11 AM N12907 N18242 N51706 N50455 N59496 N64986 N456... N65145-0 N38498-0 N37233-0 N17513-0 N55712-0 N...
27 28 U12288 11/15/2019 7:07:47 AM N871 N51706 N45794 N28058 N43142 N51705 N31958-0 N23513-1 N46976-0 N35216-0 N32536-0 N...
28 29 U83282 11/15/2019 1:06:18 AM N42299 N15254 N63448 N20028 N12907 N18777 N474... N10616-0 N20729-0 N9284-0 N20036-0 N47612-0 N6...
29 30 U23227 11/15/2019 7:29:10 AM N53234 N55148 N12349 N11784-0 N42478-0 N496-0 N14507-0 N32237-0 N57...
... ... ... ... ... ...
73122 73123 U47451 11/15/2019 6:31:49 AM N26136 N10629 N5663 N35403 N6638-0 N29722-0 N60747-0 N60939-1 N16237-0 N2...
73123 73124 U597 11/15/2019 4:18:58 PM N10340 N54827 N52962 N29091-0 N45057-0 N512-0 N29862-0 N52492-0 N25...
73124 73125 U12371 11/15/2019 1:25:33 PM N28016 N10078 N27911 N43674 N60933 N21336 N112... N53572-0 N29091-0 N6400-0 N24802-0 N29862-1
73125 73126 U67841 11/15/2019 5:51:21 AM N50155 N9304 N13233 N43353 N29438 N51840 N6004... N11930-0 N32536-0 N48622-0 N50055-0 N53242-0 N...
73126 73127 U10526 11/15/2019 12:02:31 PM N548 N34947 N62993 N32421 N4893 N58504 N22561 ... N51470-0 N49285-0 N11150-0 N29862-0 N48740-0 N...
73127 73128 U75216 11/15/2019 11:04:54 AM N30616 N62287 N56447 N51238 N39263 N52227 N338... N50775-0 N51470-0 N62365-0 N14223-0 N5472-0 N4...
73128 73129 U55334 11/15/2019 8:13:16 AM N35877 N59137 N47810 N37509 N30864 N47458 N140... N31958-0 N29393-0 N19990-0 N36779-0 N5940-0 N3...
73129 73130 U9483 11/15/2019 6:10:05 AM N60844 N871 N18030 N34934 N26900 N619 N26250 N... N20036-0 N32536-0 N31958-0 N36779-0 N23513-1
73130 73131 U25339 11/15/2019 9:11:43 AM N11243 N28555 N46955 N33415 N33415 N63624 N972... N53283-0 N6638-0 N58098-0 N55237-0 N46976-0 N3...
73131 73132 U12185 11/15/2019 9:49:32 AM N58078 N46622 N11264 N42581 N29177 N53199 N418... N53283-0 N50775-0 N13408-0 N55036-0 N30290-0 N...
73132 73133 U89284 11/15/2019 2:07:43 AM N60681 N5104 N64273 N20036-1 N36779-0
73133 73134 U90086 11/15/2019 7:57:06 AM N1150 N16215 N51136 N60033 N45794 N46909 N5970... N46162-0 N41946-0 N57327-0 N53283-0 N12320-0 N...
73134 73135 U35810 11/15/2019 1:29:06 AM N36811 N56753 N21178 N38390 N29227 N41197 N156... N42767-0 N23535-0 N2960-0 N32237-0 N46917-1 N4...
73135 73136 U71833 11/15/2019 8:19:02 AM N1150 N55846 N871 N49745 N46978 N32483 N39778 ... N29393-0 N30290-1 N23513-0 N23355-0 N5940-0 N2...
73136 73137 U18219 11/15/2019 6:06:27 AM N51991 N50048 N25113 N20286 N58173 N21977 N562... N53615-0 N23767-0 N22978-0 N17807-0 N9284-0 N6...
73137 73138 U76851 11/15/2019 8:02:07 AM N43357 N62553 N47077 N29597 N16821 N40447 N624... N7556-0 N61829-0 N36786-0 N27289-0 N61697-0 N5...
73138 73139 U58475 11/15/2019 3:18:06 PM N39556 N17933 N7432 N17587 N63875 N23571 N1150... N48740-0 N46749-0 N512-0 N60675-0 N24802-0 N69...
73139 73140 U86046 11/15/2019 9:43:02 AM N36133 N49456 N14629 N5056 N28296 N18777 N1809... N53572-1 N37352-0
73140 73141 U5662 11/15/2019 7:35:08 AM N6956 N37387 N20997 N17102 N1966 N7812 N62703 ... N46162-0 N45912-0 N20036-0 N45289-0 N60939-0 N...
73141 73142 U37385 11/15/2019 5:37:22 AM N61471 N3046 N59173 N19048 N42526 N9226 N9120 ... N38915-0 N23513-0 N62992-0 N13347-0 N10051-0 N...
73142 73143 U35542 11/15/2019 2:39:43 AM N41618 N50454 N20335 N63260 N53538 N20039 N162... N20036-1 N36779-0
73143 73144 U11 11/15/2019 4:43:28 AM N31820 N4647 N5905 N33271 N49023 N18870 N49505-0 N39683-0 N29490-0 N53754-0 N40094-0 N...
73144 73145 U92886 11/15/2019 1:49:17 PM N28741 N35450 N31801 N8201 N14617 N5742 N18656... N59933-0 N1952-1 N60724-0 N55036-0 N44621-0 N2...
73145 73146 U23543 11/15/2019 9:17:23 AM N22570 N19638 N47525 N17210 N22161 N28058 N469... N59602-0 N58251-0 N19990-0 N53572-0 N5472-0 N2...
73146 73147 U67440 11/15/2019 10:27:45 AM N55312 N9897 N55846 N1569 N6956 N28501 N11641 ... N11930-0 N50775-0 N33439-0 N58251-0 N7556-0 N3...
73147 73148 U77536 11/15/2019 8:40:16 PM N28691 N8845 N58434 N37120 N22185 N60033 N4702... N496-0 N35159-0 N59856-0 N13270-0 N47213-0 N26...
73148 73149 U56193 11/15/2019 1:11:26 PM N4705 N58782 N53531 N46492 N26026 N28088 N3109... N49285-0 N31958-0 N55237-0 N42844-0 N29862-0 N...
73149 73150 U16799 11/15/2019 3:37:06 PM N40826 N42078 N15670 N15295 N64536 N46845 N52294 N7043-0 N512-0 N60215-1 N45057-0 N496-0 N37055...
73150 73151 U8786 11/15/2019 8:29:26 AM N3046 N356 N20483 N46107 N44598 N18693 N8254 N... N23692-0 N19990-0 N20187-0 N5940-0 N13408-0 N3...
73151 73152 U68182 11/15/2019 11:54:34 AM N20297 N53568 N4690 N60608 N43709 N43123 N1885... N29862-0 N5472-0 N21679-1 N6400-0 N53572-0 N50...

73152 rows × 5 columns
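
The time column is read as plain text, and sorting those strings would not give chronological order (for example, "12:37:50 PM" sorts before "7:11:50 AM"). The sketch below parses the 12-hour timestamps shown in the output above with the standard library. The helper name and the format string are assumptions based on that output, and the %p directive depends on the locale providing AM/PM markers.

```python
from datetime import datetime

# Assumed format, matching values such as '11/15/2019 12:37:50 PM'.
TIME_FORMAT = '%m/%d/%Y %I:%M:%S %p'


def parse_impression_time(value):
    """Convert an impression time string into a datetime object."""
    return datetime.strptime(value, TIME_FORMAT)


stamps = [
    '11/15/2019 12:37:50 PM',
    '11/15/2019 7:11:50 AM',
    '11/15/2019 9:55:12 AM',
]
ordered = sorted(stamps, key=parse_impression_time)
print(ordered[0])  # 11/15/2019 7:11:50 AM
```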

In [5]:
# The news.tsv file contains detailed information about the news articles referenced in the behaviors.tsv file.
# It has 8 columns, which are separated by the tab symbol:
# - News ID
# - Category
# - Subcategory
# - Title
# - Abstract
# - URL
# - Title Entities (entities contained in the title of this news)
# - Abstract Entities (entities contained in the abstract of this news)
news_path = os.path.join(temp_dir, 'news.tsv')
pd.read_table(news_path,
              header=None,
              names=[
                  'id', 'category', 'subcategory', 'title', 'abstract', 'url',
                  'title_entities', 'abstract_entities'
              ])
Out[5]:
id category subcategory title abstract url title_entities abstract_entities
0 N55528 lifestyle lifestyleroyals The Brands Queen Elizabeth, Prince Charles, an... Shop the notebooks, jackets, and more that the... https://assets.msn.com/labs/mind/AAGH0ET.html [{"Label": "Prince Philip, Duke of Edinburgh",... []
1 N18955 health medical Dispose of unwanted prescription drugs during ... NaN https://assets.msn.com/labs/mind/AAISxPN.html [{"Label": "Drug Enforcement Administration", ... []
2 N61837 news newsworld The Cost of Trump's Aid Freeze in the Trenches... Lt. Ivan Molchanets peeked over a parapet of s... https://assets.msn.com/labs/mind/AAJgNsz.html [] [{"Label": "Ukraine", "Type": "G", "WikidataId...
3 N53526 health voices I Was An NBA Wife. Here's How It Affected My M... I felt like I was a fraud, and being an NBA wi... https://assets.msn.com/labs/mind/AACk2N6.html [] [{"Label": "National Basketball Association", ...
4 N38324 health medical How to Get Rid of Skin Tags, According to a De... They seem harmless, but there's a very good re... https://assets.msn.com/labs/mind/AAAKEkt.html [{"Label": "Skin tag", "Type": "C", "WikidataI... [{"Label": "Skin tag", "Type": "C", "WikidataI...
5 N2073 sports football_nfl Should NFL be able to fine players for critici... Several fines came down against NFL players fo... https://assets.msn.com/labs/mind/AAJ4lap.html [{"Label": "National Football League", "Type":... [{"Label": "National Football League", "Type":...
6 N11429 news newsscienceandtechnology How to record your screen on Windows, macOS, i... The easiest way to record what's happening on ... https://assets.msn.com/labs/mind/AADlomf.html [{"Label": "Microsoft Windows", "Type": "J", "... []
7 N49186 weather weathertopstories It's been Orlando's hottest October ever so fa... There won't be a chill down to your bones this... https://assets.msn.com/labs/mind/AAJwoxD.html [{"Label": "Orlando, Florida", "Type": "G", "W... [{"Label": "Orlando, Florida", "Type": "G", "W...
8 N2131 health weightloss This Guy Altered His Diet and Training to Drop... Take Brandon Reid's advice: "Don't worry what ... https://assets.msn.com/labs/mind/AAGBR44.html [] []
9 N59295 news newsworld Chile: Three die in supermarket fire amid prot... Three people have died in a supermarket fire a... https://assets.msn.com/labs/mind/AAJ43pw.html [{"Label": "Chile", "Type": "G", "WikidataId":... [{"Label": "Santiago", "Type": "G", "WikidataI...
10 N24510 entertainment gaming Best PS5 games: top PlayStation 5 titles to lo... Every confirmed or expected PS5 game we can't ... https://assets.msn.com/labs/mind/AACHUn8.html [{"Label": "PlayStation", "Type": "J", "Wikida... []
11 N59883 foodanddrink recipes This Roasted Squash Panzanella Is the Perfect ... Introducing the perfect way to balance out you... https://assets.msn.com/labs/mind/AAAAPoj.html [{"Label": "Christmas", "Type": "H", "Wikidata... []
12 N9721 health nutrition 50 Foods You Should Never Eat, According to He... This is so depressing. https://assets.msn.com/labs/mind/AABDHTv.html [] []
13 N60905 autos autosenthusiasts Trying to Make a Ram 3500 as Quick as a Viper ... The 2019 Ram 3500's new Cummins diesel has 100... https://assets.msn.com/labs/mind/AADKhPQ.html [{"Label": "Ram Pickup", "Type": "V", "Wikidat... [{"Label": "Ram Pickup", "Type": "V", "Wikidat...
14 N16587 sports football_nfl Rye football wins 2019 rendition of The Game, ... After going into halftime tied, the Garnets re... https://assets.msn.com/labs/mind/AAIGv0N.html [] []
15 N28361 health wellness Instagram Filters with Plastic Surgery-Inspire... In an effort to combat some of the negative me... https://assets.msn.com/labs/mind/AAJaBOM.html [{"Label": "Instagram", "Type": "W", "Wikidata... []
16 N18680 health health-news Michigan apple recall: Nearly 2,300 crates cou... A Michigan produce company has recalled nearly... https://assets.msn.com/labs/mind/AAJwfO8.html [{"Label": "Michigan", "Type": "G", "WikidataI... [{"Label": "Michigan", "Type": "G", "WikidataI...
17 N55610 lifestyle lifestyleroyals Kate Middleton's Best Hairstyles Through the Y... The Duchess of Cambridge knows her way around ... https://assets.msn.com/labs/mind/AAEBVOU.html [{"Label": "Catherine, Duchess of Cambridge", ... [{"Label": "Catherine, Duchess of Cambridge", ...
18 N35621 entertainment celebrity Stars who got fired from major projects Take a look back at the celebs who got the boo... https://assets.msn.com/labs/mind/AABv6WU.html [] []
19 N22850 travel travelarticle Newark Liberty Airport's Terminal One a $2.7 b... The project, which is the bi-state agency's si... https://assets.msn.com/labs/mind/AAJfTqo.html [{"Label": "Newark Liberty International Airpo... []
20 N58173 autos autossuvs Is This The 2021 GMC Yukon Denali? A Motor1.com reader sent this to us, and it su... https://assets.msn.com/labs/mind/AAGZhlc.html [{"Label": "Chevrolet Tahoe", "Type": "V", "Wi... [{"Label": "Motorsport Network", "Type": "O", ...
21 N29120 sports football_nfl John Dorsey admits talks with Washington, but ... Team officials in Washington "emphatically" de... https://assets.msn.com/labs/mind/AAISxPW.html [{"Label": "John Dorsey (American football)", ... [{"Label": "John Dorsey (American football)", ...
22 N9786 news newspolitics Elijah Cummings to lie in state at US Capitol ... Cummings, a Democrat whose district included s... https://assets.msn.com/labs/mind/AAJgNxm.html [{"Label": "Elijah Cummings", "Type": "P", "Wi... [{"Label": "Elijah Cummings", "Type": "P", "Wi...
23 N46481 sports more_sports Michigan finally shows some fight, but can't s... UNIVERSITY PARK, Pa. -- Fans on their way out ... https://assets.msn.com/labs/mind/AAJ4lmA.html [] [{"Label": "Beaver Stadium", "Type": "S", "Wik...
24 N47705 travel traveltripideas 17 Abandoned Theme Parks to Explore for Thrill... Disney, Six Flags, and even the Flintstones ha... https://assets.msn.com/labs/mind/AADlunl.html [{"Label": "Amusement park", "Type": "C", "Wik... [{"Label": "Six Flags", "Type": "O", "Wikidata...
25 N1834 video animals Dog dies protecting Florida children from a de... The Richardson family shares home video and ph... https://assets.msn.com/labs/mind/AAI33em.html [{"Label": "Florida", "Type": "G", "WikidataId... []
26 N3574 autos autosnews Ford Bronco Test Mule Spotted Flexing Its Musc... It still won't be offered in the Land Down Und... https://assets.msn.com/labs/mind/AAGBWeL.html [{"Label": "Ford Bronco", "Type": "V", "Wikida... []
27 N42474 news newsbusiness Trump's Trustbusters Bring Microsoft Lessons t... DOJ's Makan Delrahim and the FTC's Joe Simons ... https://assets.msn.com/labs/mind/AACI1SK.html [{"Label": "Big Four tech companies", "Type": ... [{"Label": "Makan Delrahim", "Type": "P", "Wik...
28 N64498 sports golf PGA Tour winners A gallery of recent winners on the PGA Tour. https://assets.msn.com/labs/mind/AAjnQjj.html [{"Label": "PGA Tour", "Type": "O", "WikidataI... [{"Label": "PGA Tour", "Type": "O", "WikidataI...
29 N59538 foodanddrink newstrends Nashville restaurants: Ms. Cheap rounds up lun... Ms. Cheap rounds up Nashville restaurant lunch... https://assets.msn.com/labs/mind/AABDPUi.html [{"Label": "Nashville, Tennessee", "Type": "G"... [{"Label": "Nashville, Tennessee", "Type": "G"...
... ... ... ... ... ... ... ... ...
42386 N19666 news newsus Ohio BCI investigating officer-involved shooti... Investigators are at the scene of an officer-i... https://assets.msn.com/labs/mind/BBWsaAY.html [] [{"Label": "Ohio Attorney General", "Type": "K...
42387 N40542 news newsscienceandtechnology People puzzled by peculiar texts, and no one c... If you woke up Thursday to a weird text that s... https://assets.msn.com/labs/mind/BBWsaXd.html [] [{"Label": "United States", "Type": "G", "Wiki...
42388 N22213 news newsworld Mexican cartels 'worse than ISIS': massacre vi... Mexican cartels 'worse than ISIS': massacre vi... https://assets.msn.com/labs/mind/BBWsaYy.html [{"Label": "Islamic State of Iraq and the Leva... [{"Label": "Islamic State of Iraq and the Leva...
42389 N10997 news newsus 'If we don't condemn it, we condone it': Victi... ROSELLE, Ill. The head football coach at Lak... https://assets.msn.com/labs/mind/BBWsaer.html [] [{"Label": "Lake Park High School", "Type": "F...
42390 N58014 sports basketball_nba LaMarcus Aldridge's big night lifts Spurs past... In the moments before the Spurs' 121-112 victo... https://assets.msn.com/labs/mind/BBWsakC.html [{"Label": "LaMarcus Aldridge", "Type": "P", "... [{"Label": "LaMarcus Aldridge", "Type": "P", "...
42391 N53002 sports football_nfl Twitter reacts to Chargers' loss to Raiders NaN https://assets.msn.com/labs/mind/BBWsb16.html [{"Label": "Oakland Raiders", "Type": "O", "Wi... []
42392 N47373 weather weathertopstories Amid freezing temperatures, firefighters make ... INDIANAPOLIS, Ind. Capt. Michael Pruitt with... https://assets.msn.com/labs/mind/BBWsb26.html [] [{"Label": "Indianapolis", "Type": "G", "Wikid...
42393 N2292 news newspolitics House investigators release more impeachment t... House Democrats released new transcripts of Tr... https://assets.msn.com/labs/mind/BBWsb2c.html [] [{"Label": "Donald Trump", "Type": "P", "Wikid...
42394 N22044 news newsus Prosecutor's letter details fatal KC police sh... The Jackson County Prosecutor's Office has rel... https://assets.msn.com/labs/mind/BBWsbOB.html [{"Label": "Grand jury", "Type": "C", "Wikidat... [{"Label": "Grand jury", "Type": "C", "Wikidat...
42395 N27291 lifestyle lifestylebuzz Mural in Downtown S.F. Depicts Swedish Teen Cl... The newest addition to the San Francisco skyli... https://assets.msn.com/labs/mind/BBWsbQg.html [{"Label": "Greta Thunberg", "Type": "N", "Wik... [{"Label": "Greta Thunberg", "Type": "N", "Wik...
42396 N36948 news newsworld Australian tourist killed by elephant in Namibia A 59-year-old Australian tourist has been kill... https://assets.msn.com/labs/mind/BBWsbZq.html [{"Label": "Namibia", "Type": "G", "WikidataId... [{"Label": "Namibia", "Type": "G", "WikidataId...
42397 N48840 news newsus Famed Hollywood Boulevard Superman Christopher... He was the Walk of Fame street performer who f... https://assets.msn.com/labs/mind/BBWsbkb.html [{"Label": "Hollywood Boulevard", "Type": "S",... [{"Label": "Hollywood Walk of Fame", "Type": "...
42398 N52871 news newsworld Residents of Mexican town struggle with fear a... A mother and two sons were laid to rest in han... https://assets.msn.com/labs/mind/BBWscU8.html [{"Label": "Mexico, New York", "Type": "G", "W... [{"Label": "Mexico", "Type": "G", "WikidataId"...
42399 N36658 news newsus Apartments for rent in Minneapolis: What will ... Curious just how far your dollar goes in Minne... https://assets.msn.com/labs/mind/BBWscWw.html [{"Label": "Minneapolis", "Type": "G", "Wikida... [{"Label": "Minneapolis", "Type": "G", "Wikida...
42400 N32558 news elections-2020-us Trump campaign launching black outreach effort... WASHINGTON (AP) During the 2016 campaign, ca... https://assets.msn.com/labs/mind/BBWsd7A.html [{"Label": "Donald Trump", "Type": "P", "Wikid... [{"Label": "Donald Trump", "Type": "P", "Wikid...
42401 N12470 news newscrime 3 teens shot in Northeast DC Thursday Thursday was a violent night in the District, ... https://assets.msn.com/labs/mind/BBWse0x.html [{"Label": "Northeast (Washington, D.C.)", "Ty... [{"Label": "Metropolitan Police Department of ...
42402 N25642 news newspolitics Texas custody battle fuels debate over transge... A custody battle between two parents has led t... https://assets.msn.com/labs/mind/BBWseUG.html [{"Label": "Texas", "Type": "G", "WikidataId":... [{"Label": "Ted Cruz", "Type": "P", "WikidataI...
42403 N20845 news newsus Police say 23-year-old man allegedly fled stat... HALL COUNTY, Ga. (CBS46) -- Fourteen-year-old ... https://assets.msn.com/labs/mind/BBWsf0T.html [] [{"Label": "Hall County, Georgia", "Type": "G"...
42404 N16016 health healthnews More than 130,000 pounds of ground beef recall... About 130,464 pounds of raw ground beef produc... https://assets.msn.com/labs/mind/BBWsnTh.html [{"Label": "Ground beef", "Type": "U", "Wikida... [{"Label": "Ground beef", "Type": "U", "Wikida...
42405 N25854 finance finance-companies Billionaire-controlled companies outperform, s... Billionaires tend to be good at making money f... https://assets.msn.com/labs/mind/BBWso4u.html [{"Label": "UBS", "Type": "O", "WikidataId": "... []
42406 N7618 autos autosnews Ford v Ferrari: the real story The film about the epic Le Mans rivalry promis... https://assets.msn.com/labs/mind/BBWylQ7.html [{"Label": "Ford v Ferrari", "Type": "N", "Wik... []
42407 N16804 music music-celebrity Neil Young Says U.S. Citizenship Application D... Neil Young has revealed that his attempts to a... https://assets.msn.com/labs/mind/BBWylZD.html [{"Label": "Neil Young", "Type": "P", "Wikidat... [{"Label": "Neil Young", "Type": "P", "Wikidat...
42408 N19926 lifestyle lifestylebeauty These Haircuts Are Going to be Huge in 2020 Bring on the bobs, lobs, and bangs! It's time ... https://assets.msn.com/labs/mind/BBWymAg.html [{"Label": "Hairstyle", "Type": "C", "Wikidata... []
42409 N42491 movies movies-celebrity Roman Polanski Denies Rape Allegation by Valen... Responding to French actress Valentine Monnier... https://assets.msn.com/labs/mind/BBWysN3.html [{"Label": "Roman Polanski", "Type": "P", "Wik... [{"Label": "Roman Polanski", "Type": "P", "Wik...
42410 N13097 movies movienews Marvel's Kevin Feige Breaks Silence on Scorses... In his first public comments about the debate ... https://assets.msn.com/labs/mind/BBWywC9.html [{"Label": "Martin Scorsese", "Type": "P", "Wi... [{"Label": "Marvel Cinematic Universe", "Type"...
42411 N63550 lifestyle lifestyleroyals Why Kate & Meghan Were on Different Balconies ... There's no scandal here. It's all about the or... https://assets.msn.com/labs/mind/BBWyynu.html [{"Label": "Meghan, Duchess of Sussex", "Type"... []
42412 N30345 entertainment entertainment-celebrity See the stars at the 2019 Baby2Baby gala Stars like Chrissy Teigen and Kate Hudson supp... https://assets.msn.com/labs/mind/BBWyz7N.html [] [{"Label": "Kate Hudson", "Type": "P", "Wikida...
42413 N30135 news newsgoodnews Tennessee judge holds lawyer's baby as he swea... Tennessee Court of Appeals Judge Richard Dinki... https://assets.msn.com/labs/mind/BBWyzI8.html [{"Label": "Tennessee", "Type": "G", "Wikidata... [{"Label": "Tennessee Court of Appeals", "Type...
42414 N44276 autos autossports Best Sports Car Deals for October NaN https://assets.msn.com/labs/mind/BBy5rVe.html [{"Label": "Peugeot RCZ", "Type": "V", "Wikida... []
42415 N39563 sports more_sports Shall we dance: Sports stars shake their leg NaN https://assets.msn.com/labs/mind/BBzMpnG.html [] []

42416 rows × 8 columns
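
The entity columns in the table above hold JSON-encoded lists whose items carry the keys "Label", "Type" and "WikidataId". As a minimal sketch of how those IDs could be pulled out for later lookup, the snippet below parses two toy rows. The column name title_entities, the helper extract_wikidata_ids and the sample rows are assumptions made for illustration, not part of the dataset notebook.

```python
import json

import pandas as pd

# Toy stand-in for two rows of the news table. The column name
# 'title_entities' and the row contents are illustrative assumptions.
news_sample = pd.DataFrame({
    "id": ["N1", "N2"],
    "title_entities": [
        '[{"Label": "United States", "Type": "G", "WikidataId": "Q30"}]',
        "[]",
    ],
})


def extract_wikidata_ids(entities_json):
    """Return the WikidataId of every entity in a JSON-encoded list."""
    # Missing cells may be read as NaN (a float), so treat non-strings as empty.
    if not isinstance(entities_json, str):
        return []
    return [entity["WikidataId"] for entity in json.loads(entities_json)]


news_sample["title_entity_ids"] = news_sample["title_entities"].apply(
    extract_wikidata_ids
)
print(news_sample[["id", "title_entity_ids"]])
```

Keeping the parsed IDs in a separate column leaves the raw JSON untouched, which may be convenient if other fields such as "Type" are needed later.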

In [6]:
# The entity_embedding.vec file contains the 100-dimensional embeddings
# of the entities, learned from the knowledge subgraph with the TransE method.
# The first column is the entity ID; the remaining columns are the embedding values.
entity_embedding_path = os.path.join(temp_dir, 'entity_embedding.vec')
entity_embedding = pd.read_table(entity_embedding_path, header=None)
# Collapse columns 1 to 100 into a single list-valued 'vector' column.
entity_embedding['vector'] = entity_embedding.iloc[:, 1:101].values.tolist()
entity_embedding = entity_embedding[[0, 'vector']].rename(columns={0: 'entity'})
entity_embedding
Out[6]:
entity vector
0 Q34433 [0.017808, -0.07325599999999999, 0.102521, -0....
1 Q41 [-0.063388, -0.181451, 0.057501, -0.091254, -0...
2 Q56037 [0.02155, -0.044888, -0.027872000000000004, -0...
3 Q1860 [0.060958000000000005, 0.06993400000000001, 0....
4 Q39631 [-0.093106, -0.052002, 0.020556, -0.020801, 0....
5 Q30 [-0.11573699999999999, -0.17911300000000002, 0...
6 Q60 [-0.051036, -0.16563699999999998, 0.132802, -0...
7 Q183 [0.052779999999999994, -0.139523, -0.027571, -...
8 Q2736 [-0.091826, -0.021255, -0.049415, -0.167199000...
9 Q21198 [-0.096089, -0.0068379999999999995, -0.0278399...
10 Q131524 [0.0043479999999999994, 0.028879000000000002, ...
11 Q12788174 [-0.080052, 0.11183199999999999, 0.076317, -0....
12 Q142 [0.044556, 0.005308, 0.135917, -0.203567, 0.02...
13 Q298 [0.013886, -0.08859199999999999, -0.004194, -0...
14 Q2887 [0.039070999999999995, -0.076391, 0.015528, 0....
15 Q155 [0.034837, -0.18423399999999998, 0.050972, -0....
16 Q15180 [-0.11381199999999998, 0.003113, -0.0070030000...
17 Q408 [0.030718000000000002, -0.167667, -0.032142000...
18 Q177220 [-0.12244100000000001, 0.115449, 0.04222300000...
19 Q38 [0.070083, -0.18845, 0.141631, -0.040243, 0.08...
20 Q752297 [-0.011161, -0.049322000000000005, 0.035052, -...
21 Q29 [-0.0057729999999999995, -0.166473, 0.05220900...
22 Q2807 [-0.036898, -0.150758, 0.03325, -0.0336, 0.017...
23 Q336286 [0.028158999999999997, 0.136965, -0.0331220000...
24 Q33506 [0.029741000000000004, -0.06346900000000001, 0...
25 Q37226 [-0.180374, 0.009042, -0.075288, -0.094787, 0....
26 Q213 [-0.052516, -0.089791, 0.07205, -0.063529, -0....
27 Q90 [0.028043000000000002, 0.016425, 0.12071099999...
28 Q11424 [-0.166153, -0.213444, -0.035954, -0.061191999...
29 Q25089 [-0.033586000000000005, -0.039048, 0.062833, 0...
... ... ...
22863 Q52424113 [-0.015946000000000002, -0.043258, -0.063225, ...
22864 Q51849371 [-0.015587, -0.029361, -0.005646, 0.016144, 0....
22865 Q52590967 [-0.01426, -0.030755, -0.060229, 0.010758, -0....
22866 Q65923113 [0.016715, 0.013468, -0.03318, -0.041919, 0.00...
22867 Q65935365 [-0.058651999999999996, -0.017408, -0.039342, ...
22868 Q52715530 [-0.039839, -0.037683, -0.069898, -0.052635, 0...
22869 Q52247498 [-0.017968, -0.022007, -0.0462, 0.020104, -0.0...
22870 Q52290003 [0.015207, 0.0025440000000000003, -0.052101999...
22871 Q52290425 [-0.084103, -0.062223, -0.017676, 0.022031, -0...
22872 Q52298665 [-0.019474, -0.056829, -0.078429, 0.0321269999...
22873 Q66363416 [0.047139999999999994, -0.030843000000000002, ...
22874 Q41571596 [0.023031, -0.029602999999999997, 0.01, -0.026...
22875 Q40234791 [0.016026, -0.042275, -0.011025, 0.033408, -0....
22876 Q53567614 [0.008826, -0.008437, -0.071004, 0.007258, -0....
22877 Q53583116 [0.00042300000000000004, -0.01745, 0.009059000...
22878 Q53720737 [0.069862, 0.059307000000000006, 0.02590400000...
22879 Q41788679 [-0.032408, 0.025944, -0.014616999999999998, 0...
22880 Q41790042 [-0.066249, 0.027369, -0.037001, 0.023492, -0....
22881 Q52985750 [-0.065515, -0.014433000000000001, -0.050944, ...
22882 Q53236501 [0.025514, 0.020045, 0.026060000000000003, 0.0...
22883 Q54621943 [-0.014662999999999999, -0.05322999999999999, ...
22884 Q54861465 [-0.002764, -0.021321, -0.020571000000000002, ...
22885 Q54952832 [0.040638, 0.050203, -0.011036, -0.008562, -0....
22886 Q53870565 [0.001374, -0.02647, -0.026286, 0.018, -0.0510...
22887 Q54085113 [0.016056, 0.019562, 0.011992000000000001, -0....
22888 Q278846 [0.042413, 0.021957, 0.07241399999999999, -0.0...
22889 Q54621949 [-0.018299, -0.048378, -0.021644999999999998, ...
22890 Q42225228 [-0.051346, -0.028947000000000004, -0.07587, 0...
22891 Q54862508 [-0.052323, -0.078029, -0.060925, -0.052536, 0...
22892 Q42301562 [-0.00519, -0.047871, 0.009753, -0.0215, -4.9e...

22893 rows × 2 columns
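
One possible way to use a frame shaped like the output above is to turn it into an ID-to-vector dictionary and compare entities by cosine similarity. The sketch below assumes the entity and vector column names shown in the output, uses toy 3-dimensional vectors instead of the real 100-dimensional ones, and introduces a hypothetical helper named cosine_similarity.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the entity_embedding frame (3 dimensions instead of 100).
entity_embedding = pd.DataFrame({
    "entity": ["Q30", "Q60", "Q183"],
    "vector": [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]],
})

# Map each entity ID to a NumPy array for dictionary lookup.
entity_to_vec = {
    entity: np.asarray(vector, dtype=np.float32)
    for entity, vector in zip(entity_embedding["entity"], entity_embedding["vector"])
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


print(cosine_similarity(entity_to_vec["Q30"], entity_to_vec["Q60"]))
print(cosine_similarity(entity_to_vec["Q30"], entity_to_vec["Q183"]))
```

Whether cosine similarity is the right measure for these embeddings depends on the downstream model; it is used here only as a simple illustration.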

In [7]:
# The relation_embedding.vec file contains the 100-dimensional embeddings
# of the relations, learned from the knowledge subgraph with the TransE method.
# The first column is the relation ID; the remaining columns are the embedding values.
relation_embedding_path = os.path.join(temp_dir, 'relation_embedding.vec')
relation_embedding = pd.read_table(relation_embedding_path, header=None)
# Collapse columns 1 to 100 into a single list-valued 'vector' column.
relation_embedding['vector'] = relation_embedding.iloc[:, 1:101].values.tolist()
relation_embedding = relation_embedding[[0, 'vector']].rename(columns={0: 'relation'})
relation_embedding
Out[7]:
relation vector
0 P31 [-0.07346699999999999, -0.132227, 0.034173, -0...
1 P21 [-0.078436, 0.108589, -0.049429, -0.131355, 0....
2 P106 [-0.052137, 0.052444000000000005, -0.019886, -...
3 P735 [-0.051398, 0.056219000000000005, 0.0680289999...
4 P108 [0.09123099999999999, 0.022525999999999997, 0....
5 P101 [-0.03845, 0.053671, -0.063569, -0.150071, 0.0...
6 P69 [0.070871, 0.017891999999999998, 0.071605, -0....
7 P27 [-0.001034, -0.071413, 0.078409, -0.1355869999...
8 P19 [0.00088, -0.047513, 0.055876, -0.066817999999...
9 P1412 [0.030543999999999998, 0.149371, 0.01523600000...
10 P1343 [0.11008299999999999, 0.10230399999999999, -0....
11 P20 [0.013021000000000001, -0.046431, 0.0827, -0.0...
12 P509 [0.141286, 0.030367, 0.109865, -0.124899, 0.10...
13 P1196 [0.081161, -0.0418, 0.09615, -0.10577, -0.0562...
14 P734 [0.039076, -0.021949, 0.016378, -0.02883500000...
15 P17 [-0.066804, -0.157604, 0.013737000000000001, -...
16 P641 [-0.016671000000000002, -0.119618, -0.015854, ...
17 P463 [0.0344, 0.075337, -0.022096, -0.171531, 0.001...
18 P131 [-0.12283, -0.14671199999999998, 0.01054200000...
19 P159 [-0.095321, -0.141079, -0.011192, -0.077814, -...
20 P39 [-0.0519, 0.06034, -0.058713, -0.1645859999999...
21 P3373 [-0.008519, 0.001118, 0.00867, 0.000638, 0.006...
22 P551 [-0.039375, -0.051916, 0.053480999999999994, -...
23 P793 [0.13571, 0.180002, 0.029276, 0.12283699999999...
24 P2094 [0.042525, 0.11483399999999999, -0.009143, -0....
25 P1344 [0.122527, 0.14366600000000002, 0.018272999999...
26 P1303 [-0.034934, 0.072995, -0.010043999999999999, -...
27 P512 [0.112223, 0.078682, 0.06856699999999999, -0.1...
28 P84 [-0.028439999999999997, -0.07585599999999999, ...
29 P466 [0.000472, -0.057613, -0.0081, 0.074849, 0.058...
... ... ...
1061 P3005 [-0.022373, -0.031122000000000004, -0.00252100...
1062 P1777 [-0.025655, -0.021695, -0.022119, 0.021091, -0...
1063 P7406 [0.008973, -0.015265, 0.006358, -0.013365, 0.0...
1064 P2286 [0.04372, 0.010529, -0.004605, -0.024543000000...
1065 P922 [0.014927000000000001, -0.0032990000000000003,...
1066 P4843 [0.003708, 0.015212, -0.020975, -0.010442, -0....
1067 P4424 [-0.06400700000000001, -0.056288, -0.00518, -0...
1068 P2366 [0.016396, 0.046333, -0.022376, 0.00665, -0.02...
1069 P4988 [-0.006942, 0.000916, 0.022667, -0.025558, -0....
1070 P3190 [0.013193999999999999, -0.017598, -0.015018, 0...
1071 P1318 [-0.039105, -0.01598, 0.01339, 0.005304, -0.01...
1072 P1437 [0.028199000000000002, -0.0006900000000000001,...
1073 P5054 [0.024443, 0.018856, 0.037403, -0.025896, 0.02...
1074 P926 [0.011612, 0.010583, -0.0076370000000000006, 0...
1075 P1425 [-0.020383000000000002, 0.065183, -0.015123, 0...
1076 P1704 [-0.017636000000000002, -0.019274, -0.001118, ...
1077 P3357 [0.004563, 0.016324, 0.011803, -0.000982, 0.00...
1078 P3027 [-0.033355, -0.021512, 0.009940000000000001, 0...
1079 P3028 [-0.028638, -0.000949, -0.011845, 0.004555, 0....
1080 P925 [-0.021649, -0.001027, 0.008454999999999999, 0...
1081 P5961 [-0.036083, -0.027208, -0.010634999999999999, ...
1082 P5873 [0.010484, -0.038612, 0.041136, -0.02501, 0.01...
1083 P3969 [-0.016941, -0.010539, -0.027496, -0.014163999...
1084 P7469 [0.011467, -0.067762, 0.020363, 0.008853, -0.0...
1085 P3274 [0.006123, -0.006601999999999999, 0.0277100000...
1086 P1897 [-0.019021, 0.001183, -0.009602, -0.040833, -0...
1087 P3776 [-0.018365, 0.028526, -0.025934, 0.032296, -0....
1088 P1194 [-0.026819, 0.0032310000000000004, -0.011298, ...
1089 P2502 [0.003554, -0.041121, -0.010559, -0.037862, -0...
1090 P6977 [-0.023617, -0.021648, 0.009369, -0.021757, 0....

1091 rows × 2 columns
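
TransE represents a relation as a translation in the embedding space: for a true triple (head, relation, tail), the vector head + relation should lie close to tail. The sketch below illustrates that idea with toy 3-dimensional vectors. The function name transe_score and the choice of the L2 norm are assumptions for illustration; the notebook does not state which norm was used to train the MIND embeddings.

```python
import numpy as np


def transe_score(head, relation, tail):
    """Negative L2 distance between head + relation and tail.

    A higher (less negative) score means the triple fits the translation better.
    """
    return -float(np.linalg.norm(head + relation - tail))


# Toy 3-dimensional vectors standing in for the 100-dimensional embeddings.
head = np.array([0.1, 0.2, 0.0])
relation = np.array([0.3, -0.1, 0.5])
matching_tail = head + relation               # fits the translation exactly
unrelated_tail = np.array([-1.0, 1.0, -1.0])  # arbitrary point far away

print(transe_score(head, relation, matching_tail))   # distance is zero here
print(transe_score(head, relation, unrelated_tail))
```

With real data, head and tail would come from the entity embeddings and relation from the relation embeddings, each looked up by its Wikidata ID.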

Clean up temporary files

In [8]:
shutil.rmtree(temp_dir)
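
As an alternative to removing the folder explicitly, the standard library's tempfile.TemporaryDirectory can handle the cleanup automatically. The following is a minimal sketch under that assumption; the file name sample.tsv and its contents are hypothetical and stand in for the extracted dataset files.

```python
import os
import tempfile

# TemporaryDirectory removes the folder and its contents when the block exits,
# even if an exception is raised while working with the files.
with tempfile.TemporaryDirectory() as temp_dir:
    sample_path = os.path.join(temp_dir, "sample.tsv")  # hypothetical file name
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("N1\tnews\n")
    existed_inside = os.path.exists(sample_path)

removed_after = not os.path.exists(temp_dir)
print(existed_inside, removed_after)
```

The trade-off is that all work on the files has to happen inside the with block, which suits scripts better than notebooks where cells run one at a time.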

Next Steps

Check out several baseline news recommendation models developed on MIND in the Microsoft Recommenders repository.

MIND News Recommendation Challenge

MIND probe: A competition on news recommendations with the world’s biggest news dataset

Microsoft Recommenders Repository

A collection of open-source recommender algorithms, utilities, and best practices, including deployment samples on Azure.

MIND: MIcrosoft News Dataset

A Large-Scale English Dataset for News Recommendation Research.