
Microsoft News Recommendation Dataset

News Recommendation MIND

The MIcrosoft News Dataset (MIND) is a large-scale dataset for news recommendation research. It was collected from anonymized behavior logs of the Microsoft News website. MIND serves as a benchmark dataset for news recommendation and facilitates research on news recommendation and recommender systems.

MIND contains more than 160,000 news articles and more than 15 million impression logs generated by more than 1 million users. Every news article contains rich textual content, including title, abstract, body, category, and entities. Each impression log contains the click events, the non-click events, and the user's historical news click behaviors before that impression. To protect user privacy, each user was delinked from the production system when securely hashed into an anonymized user ID. For more detailed information about the MIND dataset, see the paper MIND: A Large-scale Dataset for News Recommendation.

Volume

Both the training data and the validation data are ZIP-compressed folders, each containing four different files:

File name               Description
behaviors.tsv           The click histories and impression logs of users
news.tsv                The information of news articles
entity_embedding.vec    The embeddings of entities in news, extracted from the knowledge graph
relation_embedding.vec  The embeddings of relations between entities, extracted from the knowledge graph

behaviors.tsv

The behaviors.tsv file contains the impression logs and the users' news click histories. It has 5 columns, separated by tabs:

  • Impression ID. The ID of an impression.
  • User ID. The anonymous ID of a user.
  • Time. The impression time, in the format "MM/DD/YYYY HH:MM:SS AM/PM".
  • History. The news click history (list of IDs of clicked news) of this user before this impression.
  • Impressions. The list of news displayed in this impression and the user's click behavior on each of them (1 for click and 0 for non-click).

An example is shown in the table below:

Column         Content
Impression ID  123
User ID        U131
Time           11/13/2019 8:36:57 AM
History        N11 N21 N103
Impressions    N4-1 N34-1 N156-0 N207-0 N198-0
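The column descriptions above translate fairly directly into parsing code. The following is a minimal sketch, not part of the dataset tooling: the function name parse_behavior_fields is hypothetical, the sample strings are written in the documented formats, and the time is assumed to use a 12-hour clock with an AM/PM marker.

```python
from datetime import datetime


def parse_behavior_fields(time_str, history_str, impressions_str):
    """Parse the Time, History and Impressions fields of one behaviors.tsv row."""
    # Time uses a 12-hour clock with an AM/PM marker (assumed from the documented format).
    timestamp = datetime.strptime(time_str, "%m/%d/%Y %I:%M:%S %p")
    # History is a space-separated list of news IDs; it may be empty.
    history = history_str.split() if history_str else []
    # Each impression item is "<news ID>-<label>": label 1 = click, 0 = non-click.
    impressions = []
    for item in impressions_str.split():
        news_id, label = item.rsplit("-", 1)
        impressions.append((news_id, int(label)))
    return timestamp, history, impressions


timestamp, history, impressions = parse_behavior_fields(
    "11/13/2019 8:36:57 AM",
    "N11 N21 N103",
    "N4-1 N34-1 N156-0 N207-0 N198-0",
)
clicked = [news_id for news_id, label in impressions if label == 1]
print(timestamp.isoformat(), history, clicked)
```

Splitting on the last hyphen with rsplit keeps the parser safe even if a news ID were ever to contain a hyphen itself.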

news.tsv

The news.tsv file contains detailed information about the news articles that appear in the behaviors.tsv file. It has 8 columns, separated by tabs:

  • News ID
  • Category
  • Subcategory
  • Title
  • Abstract
  • URL
  • Title Entities (entities contained in the title of this news article)
  • Abstract Entities (entities contained in the abstract of this news article)

The full body content of the MSN news articles is not made available for download because of the licensing structure. For your convenience, however, a utility script is provided to help parse the news webpages from the MSN URLs in the dataset. Because of time limits, some URLs have expired and can no longer be accessed. We are currently working to resolve this problem.

An example is shown in the following table:

Column             Content
News ID            N37378
Category           sports
Subcategory        golf
Title              PGA Tour winners
Abstract           A gallery of recent winners on the PGA Tour.
URL                https://www.msn.com/en-us/sports/golf/pga-tour-winners/ss-AAjnQjj?ocid=chopendata
Title Entities     [{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [0], "SurfaceForms": ["PGA Tour"]}]
Abstract Entities  [{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", "Confidence": 1.0, "OccurrenceOffsets": [35], "SurfaceForms": ["PGA Tour"]}]

The keys of the dictionaries in the "Entities" columns are described below:

Key                Description
Label              The entity name in the Wikidata knowledge graph
Type               The type of this entity in Wikidata
WikidataId         The entity ID in Wikidata
Confidence         The confidence of entity linking
OccurrenceOffsets  The character-level offset of the entity in the text of the title or abstract
SurfaceForms       The raw entity names in the original text
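Because the entity columns hold JSON lists of dictionaries, they can be decoded with the standard json module. The sketch below uses the title entity from the example row and checks that the character offset really points at the surface form. The helper name extract_mentions is hypothetical, and pairing each offset with the surface form at the same position is an assumption of this sketch rather than something the dataset description states.

```python
import json

title = "PGA Tour winners"
title_entities = (
    '[{"Label": "PGA Tour", "Type": "O", "WikidataId": "Q910409", '
    '"Confidence": 1.0, "OccurrenceOffsets": [0], "SurfaceForms": ["PGA Tour"]}]'
)


def extract_mentions(text, entities_json):
    """Return (WikidataId, surface form, offset, matches_text) for each entity mention."""
    mentions = []
    for entity in json.loads(entities_json):
        # Assumption: offsets and surface forms are aligned one-to-one.
        for offset, surface in zip(entity["OccurrenceOffsets"], entity["SurfaceForms"]):
            matches = text[offset:offset + len(surface)] == surface
            mentions.append((entity["WikidataId"], surface, offset, matches))
    return mentions


mentions = extract_mentions(title, title_entities)
print(mentions)
```

The WikidataId values returned here are the same IDs that appear in the first column of entity_embedding.vec, so they can serve as lookup keys for the embeddings.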

entity_embedding.vec and relation_embedding.vec

The entity_embedding.vec and relation_embedding.vec files contain 100-dimensional embeddings of the entities and relations, learned from the subgraph (of the WikiData knowledge graph) with the TransE method. In both files, the first column is the ID of the entity or relation, and the remaining columns are the embedding vector values. We hope this data can facilitate research on knowledge-aware news recommendation. An example is shown below:

ID         Embedding values
Q42306013  0.014516 -0.106958 0.024590 … -0.080382
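The layout described above (an ID followed by the vector values) can be read without any third-party library. This is a rough sketch under stated assumptions: the function name read_embeddings is hypothetical, the sample rows are shortened to 4 dimensions purely for illustration, and the ID "Q0000" is made up. Trailing separators at the end of a line are assumed to be possible and are tolerated.

```python
import io


def read_embeddings(lines):
    """Map each entity or relation ID to its embedding vector (a list of floats)."""
    embeddings = {}
    for line in lines:
        # split() without arguments tolerates tabs, repeated spaces and trailing separators.
        fields = line.split()
        if not fields:
            continue
        embeddings[fields[0]] = [float(value) for value in fields[1:]]
    return embeddings


# Toy rows shortened to 4 dimensions; real files hold 100 values per row.
# "Q0000" is a hypothetical ID used only for this example.
sample = io.StringIO(
    "Q42306013\t0.014516\t-0.106958\t0.024590\t-0.080382\t\n"
    "Q0000\t0.1\t0.2\t0.3\t0.4\t\n"
)
embeddings = read_embeddings(sample)
print(len(embeddings), embeddings["Q42306013"])
```

For the real files, the same function can be given an open file handle, for example `read_embeddings(open(path, encoding="utf-8"))`.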

For reasons related to learning the embeddings from the subgraph, a few entities may not have an embedding in the entity_embedding.vec file.
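Code that looks up entity embeddings therefore needs a policy for missing IDs. One possible convention, shown here only as a sketch, is to fall back to a zero vector and report whether the entity was found. The dataset does not prescribe this choice, and the function name lookup_embedding and the toy IDs are hypothetical.

```python
def lookup_embedding(embeddings, entity_id, dim=100):
    """Return (vector, found); the vector is all zeros when the entity has no embedding."""
    vector = embeddings.get(entity_id)
    if vector is None:
        # Zero-filling is one possible convention, not something the dataset prescribes.
        return [0.0] * dim, False
    return vector, True


# Toy 3-dimensional table with hypothetical IDs and values.
toy = {"Q1": [0.1, 0.2, 0.3]}
found_vec, found = lookup_embedding(toy, "Q1", dim=3)
missing_vec, missing_found = lookup_embedding(toy, "Q2", dim=3)
print(found, missing_found, missing_vec)
```

Returning the found flag alongside the vector lets downstream code distinguish a genuinely missing entity from one whose learned embedding happens to be near zero.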

Storage location

The data are stored in blobs in the West/East US data center, in the following blob container: https://mind201910small.blob.core.windows.net/release/.

Within the container, the training and validation sets are compressed into MINDlarge_train.zip and MINDlarge_dev.zip, respectively.

Additional information

The MIND dataset is free to download for research purposes under the Microsoft Research License Terms. If you have any questions about the dataset, contact mind@microsoft.com.

Access

Available in     When to use
Azure Notebooks  Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.


Demo notebook for accessing MIND data on Azure

This notebook provides an example of accessing MIND data from blob storage on Azure.

MIND data are stored in the West/East US data center, so this notebook will run more efficiently on the Azure compute located in West/East US.

Imports and environment

In [1]:
import os
import tempfile
import shutil
import urllib.request  # "import urllib" alone does not guarantee urllib.request is loaded
import zipfile
import pandas as pd

# Temporary folder for data we need during execution of this notebook (we'll clean up
# at the end, we promise)
temp_dir = os.path.join(tempfile.gettempdir(), 'mind')
os.makedirs(temp_dir, exist_ok=True)

# The dataset is split into a training set and a validation set, each with a large and a small version.
# All four archives share the same file format.
# For demonstration purposes, we use only the small validation set.
base_url = 'https://mind201910small.blob.core.windows.net/release'
training_small_url = f'{base_url}/MINDsmall_train.zip'
validation_small_url = f'{base_url}/MINDsmall_dev.zip'
training_large_url = f'{base_url}/MINDlarge_train.zip'
validation_large_url = f'{base_url}/MINDlarge_dev.zip'

Functions

In [2]:
def download_url(url,
                 destination_filename=None,
                 progress_updater=None,
                 force_download=False,
                 verbose=True):
    """
    Download a URL to a temporary file
    """
    if not verbose:
        progress_updater = None
    # This naming scheme is not intended to guarantee uniqueness in general; it just
    # happens to produce unique names for the URLs used in this notebook.
    if destination_filename is None:
        url_as_filename = url.replace('://', '_').replace('/', '_')
        destination_filename = \
            os.path.join(temp_dir,url_as_filename)
    if (not force_download) and (os.path.isfile(destination_filename)):
        if verbose:
            print('Bypassing download of already-downloaded file {}'.format(
                os.path.basename(url)))
        return destination_filename
    if verbose:
        print('Downloading file {} to {}'.format(os.path.basename(url),
                                                 destination_filename),
              end='')
    urllib.request.urlretrieve(url, destination_filename, progress_updater)
    assert (os.path.isfile(destination_filename))
    nBytes = os.path.getsize(destination_filename)
    if verbose:
        print('...done, {} bytes.'.format(nBytes))
    return destination_filename
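The progress_updater argument of download_url is passed straight through to urllib.request.urlretrieve as its report hook, which is called with the block count, the block size, and the total size (the total may be -1 when the server does not report one). A hook along the following lines could be supplied; the names format_progress and print_progress are hypothetical and only illustrate the expected signature.

```python
def format_progress(block_count, block_size, total_size):
    """Build a short progress message from the arguments urlretrieve passes to its hook."""
    transferred = block_count * block_size
    if total_size <= 0:
        # The server did not report a size, so only the byte count is known.
        return '{} bytes'.format(transferred)
    # The last block may overshoot the total, so cap the value at the total size.
    done = min(transferred, total_size)
    return '{:.0f}%'.format(100.0 * done / total_size)


def print_progress(block_count, block_size, total_size):
    """Report hook: overwrite the current console line with the latest progress."""
    print('\r' + format_progress(block_count, block_size, total_size),
          end='', flush=True)


# Simulated calls; no network access is needed to try the hook.
for block in (0, 2, 4):
    print_progress(block, 8192, 32768)
print()
```

With the function defined in the notebook, the hook would be used as `download_url(validation_small_url, progress_updater=print_progress)`.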

Download and extract the files

In [3]:
# For demonstration purposes, we use only the small validation set.
# This file is about 30MB.
zip_path = download_url(validation_small_url, verbose=True)
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall(temp_dir)

os.listdir(temp_dir)
Downloading file MINDsmall_dev.zip to C:\Users\wutao\AppData\Local\Temp\mind\https_mind201910small.blob.core.windows.net_release_MINDsmall_dev.zip...done, 30945572 bytes.
Out[3]:
['behaviors.tsv',
 'entity_embedding.vec',
 'https_mind201910small.blob.core.windows.net_release_MINDsmall_dev.zip',
 'news.tsv',
 'relation_embedding.vec']
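Since the training and validation archives are described as containing the same four file names, extracting more than one of them into the same folder would presumably overwrite the earlier files. A small helper such as the one sketched below keeps each split in its own subfolder. The name extract_split and the stand-in archive are hypothetical and exist only so the example runs without downloading anything.

```python
import os
import tempfile
import zipfile


def extract_split(zip_path, root_dir, split_name):
    """Extract one archive into root_dir/split_name and return the sorted file names."""
    target_dir = os.path.join(root_dir, split_name)
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(target_dir)
    return sorted(os.listdir(target_dir))


# Self-contained demonstration with a tiny stand-in archive (not real MIND data).
demo_root = tempfile.mkdtemp()
demo_zip = os.path.join(demo_root, 'demo.zip')
with zipfile.ZipFile(demo_zip, 'w') as zf:
    zf.writestr('behaviors.tsv', '')
    zf.writestr('news.tsv', '')
extracted = extract_split(demo_zip, demo_root, 'dev')
print(extracted)
```

In the notebook this could be called as `extract_split(zip_path, temp_dir, 'small_dev')`, with later paths built from that subfolder instead of temp_dir.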

Read the files with pandas

In [4]:
# The behaviors.tsv file contains the impression logs and users' news click histories. 
# It has 5 columns divided by the tab symbol:
# - Impression ID. The ID of an impression.
# - User ID. The anonymous ID of a user.
# - Time. The impression time with format "MM/DD/YYYY HH:MM:SS AM/PM".
# - History. The news click history (ID list of clicked news) of this user before this impression.
# - Impressions. List of news displayed in this impression and user's click behaviors on them (1 for click and 0 for non-click).
behaviors_path = os.path.join(temp_dir, 'behaviors.tsv')
pd.read_table(
    behaviors_path,
    header=None,
    names=['impression_id', 'user_id', 'time', 'history', 'impressions'])
Out[4]:
impression_id user_id time history impressions
0 1 U80234 11/15/2019 12:37:50 PM N55189 N46039 N51741 N53234 N11276 N264 N40716... N28682-0 N48740-0 N31958-1 N34130-0 N6916-0 N5...
1 2 U60458 11/15/2019 7:11:50 AM N58715 N32109 N51180 N33438 N54827 N28488 N611... N20036-0 N23513-1 N32536-0 N46976-0 N35216-0 N...
2 3 U44190 11/15/2019 9:55:12 AM N56253 N1150 N55189 N16233 N61704 N51706 N5303... N36779-0 N62365-0 N58098-0 N5472-0 N13408-0 N5...
3 4 U87380 11/15/2019 3:12:46 PM N63554 N49153 N28678 N23232 N43369 N58518 N444... N6950-0 N60215-0 N6074-0 N11930-0 N6916-0 N248...
4 5 U9444 11/15/2019 8:25:46 AM N51692 N18285 N26015 N22679 N55556 N5940-1 N23513-0 N49285-0 N23355-0 N19990-0 N3...
5 6 U69606 11/15/2019 1:24:44 PM N879 N19591 N63054 N53033 N54088 N34140 N14952... N29862-0 N48740-0 N11390-0 N5472-0 N53572-0 N2...
6 7 U70421 11/15/2019 5:12:12 AM N38118 N55189 N16233 N37942 N23105 N27526 N965... N42767-1 N30290-0 N36779-0 N20036-0 N32536-0 N...
7 8 U38418 11/15/2019 10:12:44 AM N29464 N17952 N19028 N28338 N31631 N35831 N609... N53687-0 N31289-0 N37458-0 N8455-0 N56211-0 N5...
8 9 U9568 11/15/2019 11:59:42 AM N53393 N61857 N17744 N62644 N28274 N63634 N503... N45612-0 N60939-1 N33397-0 N19685-0
9 10 U77860 11/15/2019 3:52:43 PM N55829 N2203 N3909 N18459 N59704 N9146 N33096 ... N29091-0 N60762-0 N29862-0 N512-0 N48740-0 N60...
10 11 U24918 11/15/2019 12:55:33 PM N10792 N63241 N26424 N49745 N23487 N46978 N623... N54125-0 N60799-0 N29862-0 N3159-0 N46749-0 N4...
11 12 U50227 11/15/2019 12:46:28 PM N30974 N7171 N2186 N22561 N51630 N55951 N12098... N49285-0 N38951-0 N62365-0 N28682-0 N5472-0 N2...
12 13 U46210 11/15/2019 7:15:13 AM N28079 N64634 N30867 N40977 N11101 N51597 N586... N32536-0 N13408-0 N5940-0 N35216-0 N30290-0 N2...
13 14 U47134 11/15/2019 6:19:16 AM N35809 N13395 N23284 N64496 N29177 N1150 N2613... N49554-0 N12446-0 N7342-0 N6638-0 N12320-0 N10...
14 15 U27387 11/15/2019 11:41:22 AM N26500 N37363 N7889 N47479 N45146 N60611 N3264... N55237-0 N29862-0 N6400-1 N31958-0 N5472-0 N21...
15 16 U18529 11/15/2019 6:13:34 AM N28850 N17770 N33458 N13724 N12760 N51396 N530... N16118-0 N28072-0 N37204-0 N6645-0 N57515-0 N1...
16 17 U15905 11/15/2019 7:17:27 AM N20633 N59496 N50770 N62058 N50710 N31958-1 N5940-0 N20187-0 N13408-0 N30290-0 N2...
17 18 U67590 11/15/2019 2:48:12 PM N29177 N54540 N54827 N28345-0 N6400-0 N21681-0 N24460-0 N3168-0 N60...
18 19 U8181 11/15/2019 2:41:52 PM N56253 N35009 N54590 N52709 N9808 N53803 N5518... N24802-0 N51470-0 N48740-0 N34130-0 N29091-0 N...
19 20 U35337 11/15/2019 7:48:26 AM N47261 N52097 N38118 N55189 N36128 N41251 N507... N30290-0 N23513-1 N29393-0 N20036-0 N19990-0 N...
20 21 U1987 11/15/2019 10:27:09 AM N6922 N59704 N54179 N30290-0 N5940-0 N31958-0 N36779-0 N13408-0 N5...
21 22 U36076 11/15/2019 6:06:23 AM NaN N20036-0 N5940-0 N30290-0 N31958-0 N36779-0 N3...
22 23 U39839 11/15/2019 8:23:05 AM N16233 N40545 N55326 N20413 N15300 N4866 N2876... N46976-1 N31958-1 N20036-0 N13408-0 N53283-0 N...
23 24 U44035 11/15/2019 6:54:41 PM N47685 N55189 N26633 N33969 N19741 N19293 N341... N37204-1 N48487-0 N59933-0 N512-0 N51776-0 N64...
24 25 U48086 11/15/2019 9:40:20 AM N29177 N2203 N32483 N3863 N32089 N57737 N35920... N37352-0 N30290-0 N19990-0 N58251-0 N50775-0 N...
25 26 U44340 11/15/2019 1:55:43 AM N39374 N54575 N49874 N52022 N30290-0 N32237-0 N41934-0 N42950-0 N22978-0 N...
26 27 U18041 11/15/2019 8:39:11 AM N12907 N18242 N51706 N50455 N59496 N64986 N456... N65145-0 N38498-0 N37233-0 N17513-0 N55712-0 N...
27 28 U12288 11/15/2019 7:07:47 AM N871 N51706 N45794 N28058 N43142 N51705 N31958-0 N23513-1 N46976-0 N35216-0 N32536-0 N...
28 29 U83282 11/15/2019 1:06:18 AM N42299 N15254 N63448 N20028 N12907 N18777 N474... N10616-0 N20729-0 N9284-0 N20036-0 N47612-0 N6...
29 30 U23227 11/15/2019 7:29:10 AM N53234 N55148 N12349 N11784-0 N42478-0 N496-0 N14507-0 N32237-0 N57...
... ... ... ... ... ...
73122 73123 U47451 11/15/2019 6:31:49 AM N26136 N10629 N5663 N35403 N6638-0 N29722-0 N60747-0 N60939-1 N16237-0 N2...
73123 73124 U597 11/15/2019 4:18:58 PM N10340 N54827 N52962 N29091-0 N45057-0 N512-0 N29862-0 N52492-0 N25...
73124 73125 U12371 11/15/2019 1:25:33 PM N28016 N10078 N27911 N43674 N60933 N21336 N112... N53572-0 N29091-0 N6400-0 N24802-0 N29862-1
73125 73126 U67841 11/15/2019 5:51:21 AM N50155 N9304 N13233 N43353 N29438 N51840 N6004... N11930-0 N32536-0 N48622-0 N50055-0 N53242-0 N...
73126 73127 U10526 11/15/2019 12:02:31 PM N548 N34947 N62993 N32421 N4893 N58504 N22561 ... N51470-0 N49285-0 N11150-0 N29862-0 N48740-0 N...
73127 73128 U75216 11/15/2019 11:04:54 AM N30616 N62287 N56447 N51238 N39263 N52227 N338... N50775-0 N51470-0 N62365-0 N14223-0 N5472-0 N4...
73128 73129 U55334 11/15/2019 8:13:16 AM N35877 N59137 N47810 N37509 N30864 N47458 N140... N31958-0 N29393-0 N19990-0 N36779-0 N5940-0 N3...
73129 73130 U9483 11/15/2019 6:10:05 AM N60844 N871 N18030 N34934 N26900 N619 N26250 N... N20036-0 N32536-0 N31958-0 N36779-0 N23513-1
73130 73131 U25339 11/15/2019 9:11:43 AM N11243 N28555 N46955 N33415 N33415 N63624 N972... N53283-0 N6638-0 N58098-0 N55237-0 N46976-0 N3...
73131 73132 U12185 11/15/2019 9:49:32 AM N58078 N46622 N11264 N42581 N29177 N53199 N418... N53283-0 N50775-0 N13408-0 N55036-0 N30290-0 N...
73132 73133 U89284 11/15/2019 2:07:43 AM N60681 N5104 N64273 N20036-1 N36779-0
73133 73134 U90086 11/15/2019 7:57:06 AM N1150 N16215 N51136 N60033 N45794 N46909 N5970... N46162-0 N41946-0 N57327-0 N53283-0 N12320-0 N...
73134 73135 U35810 11/15/2019 1:29:06 AM N36811 N56753 N21178 N38390 N29227 N41197 N156... N42767-0 N23535-0 N2960-0 N32237-0 N46917-1 N4...
73135 73136 U71833 11/15/2019 8:19:02 AM N1150 N55846 N871 N49745 N46978 N32483 N39778 ... N29393-0 N30290-1 N23513-0 N23355-0 N5940-0 N2...
73136 73137 U18219 11/15/2019 6:06:27 AM N51991 N50048 N25113 N20286 N58173 N21977 N562... N53615-0 N23767-0 N22978-0 N17807-0 N9284-0 N6...
73137 73138 U76851 11/15/2019 8:02:07 AM N43357 N62553 N47077 N29597 N16821 N40447 N624... N7556-0 N61829-0 N36786-0 N27289-0 N61697-0 N5...
73138 73139 U58475 11/15/2019 3:18:06 PM N39556 N17933 N7432 N17587 N63875 N23571 N1150... N48740-0 N46749-0 N512-0 N60675-0 N24802-0 N69...
73139 73140 U86046 11/15/2019 9:43:02 AM N36133 N49456 N14629 N5056 N28296 N18777 N1809... N53572-1 N37352-0
73140 73141 U5662 11/15/2019 7:35:08 AM N6956 N37387 N20997 N17102 N1966 N7812 N62703 ... N46162-0 N45912-0 N20036-0 N45289-0 N60939-0 N...
73141 73142 U37385 11/15/2019 5:37:22 AM N61471 N3046 N59173 N19048 N42526 N9226 N9120 ... N38915-0 N23513-0 N62992-0 N13347-0 N10051-0 N...
73142 73143 U35542 11/15/2019 2:39:43 AM N41618 N50454 N20335 N63260 N53538 N20039 N162... N20036-1 N36779-0
73143 73144 U11 11/15/2019 4:43:28 AM N31820 N4647 N5905 N33271 N49023 N18870 N49505-0 N39683-0 N29490-0 N53754-0 N40094-0 N...
73144 73145 U92886 11/15/2019 1:49:17 PM N28741 N35450 N31801 N8201 N14617 N5742 N18656... N59933-0 N1952-1 N60724-0 N55036-0 N44621-0 N2...
73145 73146 U23543 11/15/2019 9:17:23 AM N22570 N19638 N47525 N17210 N22161 N28058 N469... N59602-0 N58251-0 N19990-0 N53572-0 N5472-0 N2...
73146 73147 U67440 11/15/2019 10:27:45 AM N55312 N9897 N55846 N1569 N6956 N28501 N11641 ... N11930-0 N50775-0 N33439-0 N58251-0 N7556-0 N3...
73147 73148 U77536 11/15/2019 8:40:16 PM N28691 N8845 N58434 N37120 N22185 N60033 N4702... N496-0 N35159-0 N59856-0 N13270-0 N47213-0 N26...
73148 73149 U56193 11/15/2019 1:11:26 PM N4705 N58782 N53531 N46492 N26026 N28088 N3109... N49285-0 N31958-0 N55237-0 N42844-0 N29862-0 N...
73149 73150 U16799 11/15/2019 3:37:06 PM N40826 N42078 N15670 N15295 N64536 N46845 N52294 N7043-0 N512-0 N60215-1 N45057-0 N496-0 N37055...
73150 73151 U8786 11/15/2019 8:29:26 AM N3046 N356 N20483 N46107 N44598 N18693 N8254 N... N23692-0 N19990-0 N20187-0 N5940-0 N13408-0 N3...
73151 73152 U68182 11/15/2019 11:54:34 AM N20297 N53568 N4690 N60608 N43709 N43123 N1885... N29862-0 N5472-0 N21679-1 N6400-0 N53572-0 N50...

73152 rows × 5 columns
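For modelling, it is often convenient to reshape the impressions column into one row per displayed news item with an explicit click label. The following sketch does this with pandas on two made-up rows shaped like the table above; the user IDs and impression strings are hypothetical, and the column names news_id and clicked are choices of this example.

```python
import pandas as pd

# Two hypothetical rows shaped like the behaviors table above.
behaviors = pd.DataFrame({
    'impression_id': [1, 2],
    'user_id': ['U1', 'U2'],
    'impressions': ['N4-1 N34-0', 'N156-0'],
})

# Turn each space-separated impression string into a list, then emit one row per item.
long_form = (behaviors
             .assign(item=behaviors['impressions'].str.split())
             .explode('item')
             .reset_index(drop=True))

# Split "<news ID>-<label>" on the last hyphen into two columns.
parts = long_form['item'].str.rsplit('-', n=1, expand=True)
long_form = long_form.assign(news_id=parts[0], clicked=parts[1].astype(int))
long_form = long_form[['impression_id', 'user_id', 'news_id', 'clicked']]
print(long_form)
```

The history column can be split the same way, but note that it is NaN for users without any click history (as in row 21 above), so it should be filled with an empty string before splitting.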

In [5]:
# The news.tsv file contains the detailed information of news articles involved in the behaviors.tsv file.
# It has 8 columns, which are divided by the tab symbol:
# - News ID
# - Category
# - Subcategory
# - Title
# - Abstract
# - URL
# - Title Entities (entities contained in the title of this news)
# - Abstract Entities (entities contained in the abstract of this news)
news_path = os.path.join(temp_dir, 'news.tsv')
pd.read_table(news_path,
              header=None,
              names=[
                  'id', 'category', 'subcategory', 'title', 'abstract', 'url',
                  'title_entities', 'abstract_entities'
              ])
Out[5]:
id category subcategory title abstract url title_entities abstract_entities
0 N55528 lifestyle lifestyleroyals The Brands Queen Elizabeth, Prince Charles, an... Shop the notebooks, jackets, and more that the... https://assets.msn.com/labs/mind/AAGH0ET.html [{"Label": "Prince Philip, Duke of Edinburgh",... []
1 N18955 health medical Dispose of unwanted prescription drugs during ... NaN https://assets.msn.com/labs/mind/AAISxPN.html [{"Label": "Drug Enforcement Administration", ... []
2 N61837 news newsworld The Cost of Trump's Aid Freeze in the Trenches... Lt. Ivan Molchanets peeked over a parapet of s... https://assets.msn.com/labs/mind/AAJgNsz.html [] [{"Label": "Ukraine", "Type": "G", "WikidataId...
3 N53526 health voices I Was An NBA Wife. Here's How It Affected My M... I felt like I was a fraud, and being an NBA wi... https://assets.msn.com/labs/mind/AACk2N6.html [] [{"Label": "National Basketball Association", ...
4 N38324 health medical How to Get Rid of Skin Tags, According to a De... They seem harmless, but there's a very good re... https://assets.msn.com/labs/mind/AAAKEkt.html [{"Label": "Skin tag", "Type": "C", "WikidataI... [{"Label": "Skin tag", "Type": "C", "WikidataI...
5 N2073 sports football_nfl Should NFL be able to fine players for critici... Several fines came down against NFL players fo... https://assets.msn.com/labs/mind/AAJ4lap.html [{"Label": "National Football League", "Type":... [{"Label": "National Football League", "Type":...
6 N11429 news newsscienceandtechnology How to record your screen on Windows, macOS, i... The easiest way to record what's happening on ... https://assets.msn.com/labs/mind/AADlomf.html [{"Label": "Microsoft Windows", "Type": "J", "... []
7 N49186 weather weathertopstories It's been Orlando's hottest October ever so fa... There won't be a chill down to your bones this... https://assets.msn.com/labs/mind/AAJwoxD.html [{"Label": "Orlando, Florida", "Type": "G", "W... [{"Label": "Orlando, Florida", "Type": "G", "W...
8 N2131 health weightloss This Guy Altered His Diet and Training to Drop... Take Brandon Reid's advice: "Don't worry what ... https://assets.msn.com/labs/mind/AAGBR44.html [] []
9 N59295 news newsworld Chile: Three die in supermarket fire amid prot... Three people have died in a supermarket fire a... https://assets.msn.com/labs/mind/AAJ43pw.html [{"Label": "Chile", "Type": "G", "WikidataId":... [{"Label": "Santiago", "Type": "G", "WikidataI...
10 N24510 entertainment gaming Best PS5 games: top PlayStation 5 titles to lo... Every confirmed or expected PS5 game we can't ... https://assets.msn.com/labs/mind/AACHUn8.html [{"Label": "PlayStation", "Type": "J", "Wikida... []
11 N59883 foodanddrink recipes This Roasted Squash Panzanella Is the Perfect ... Introducing the perfect way to balance out you... https://assets.msn.com/labs/mind/AAAAPoj.html [{"Label": "Christmas", "Type": "H", "Wikidata... []
12 N9721 health nutrition 50 Foods You Should Never Eat, According to He... This is so depressing. https://assets.msn.com/labs/mind/AABDHTv.html [] []
13 N60905 autos autosenthusiasts Trying to Make a Ram 3500 as Quick as a Viper ... The 2019 Ram 3500's new Cummins diesel has 100... https://assets.msn.com/labs/mind/AADKhPQ.html [{"Label": "Ram Pickup", "Type": "V", "Wikidat... [{"Label": "Ram Pickup", "Type": "V", "Wikidat...
14 N16587 sports football_nfl Rye football wins 2019 rendition of The Game, ... After going into halftime tied, the Garnets re... https://assets.msn.com/labs/mind/AAIGv0N.html [] []
15 N28361 health wellness Instagram Filters with Plastic Surgery-Inspire... In an effort to combat some of the negative me... https://assets.msn.com/labs/mind/AAJaBOM.html [{"Label": "Instagram", "Type": "W", "Wikidata... []
16 N18680 health health-news Michigan apple recall: Nearly 2,300 crates cou... A Michigan produce company has recalled nearly... https://assets.msn.com/labs/mind/AAJwfO8.html [{"Label": "Michigan", "Type": "G", "WikidataI... [{"Label": "Michigan", "Type": "G", "WikidataI...
17 N55610 lifestyle lifestyleroyals Kate Middleton's Best Hairstyles Through the Y... The Duchess of Cambridge knows her way around ... https://assets.msn.com/labs/mind/AAEBVOU.html [{"Label": "Catherine, Duchess of Cambridge", ... [{"Label": "Catherine, Duchess of Cambridge", ...
18 N35621 entertainment celebrity Stars who got fired from major projects Take a look back at the celebs who got the boo... https://assets.msn.com/labs/mind/AABv6WU.html [] []
19 N22850 travel travelarticle Newark Liberty Airport's Terminal One a $2.7 b... The project, which is the bi-state agency's si... https://assets.msn.com/labs/mind/AAJfTqo.html [{"Label": "Newark Liberty International Airpo... []
20 N58173 autos autossuvs Is This The 2021 GMC Yukon Denali? A Motor1.com reader sent this to us, and it su... https://assets.msn.com/labs/mind/AAGZhlc.html [{"Label": "Chevrolet Tahoe", "Type": "V", "Wi... [{"Label": "Motorsport Network", "Type": "O", ...
21 N29120 sports football_nfl John Dorsey admits talks with Washington, but ... Team officials in Washington "emphatically" de... https://assets.msn.com/labs/mind/AAISxPW.html [{"Label": "John Dorsey (American football)", ... [{"Label": "John Dorsey (American football)", ...
22 N9786 news newspolitics Elijah Cummings to lie in state at US Capitol ... Cummings, a Democrat whose district included s... https://assets.msn.com/labs/mind/AAJgNxm.html [{"Label": "Elijah Cummings", "Type": "P", "Wi... [{"Label": "Elijah Cummings", "Type": "P", "Wi...
23 N46481 sports more_sports Michigan finally shows some fight, but can't s... UNIVERSITY PARK, Pa. -- Fans on their way out ... https://assets.msn.com/labs/mind/AAJ4lmA.html [] [{"Label": "Beaver Stadium", "Type": "S", "Wik...
24 N47705 travel traveltripideas 17 Abandoned Theme Parks to Explore for Thrill... Disney, Six Flags, and even the Flintstones ha... https://assets.msn.com/labs/mind/AADlunl.html [{"Label": "Amusement park", "Type": "C", "Wik... [{"Label": "Six Flags", "Type": "O", "Wikidata...
25 N1834 video animals Dog dies protecting Florida children from a de... The Richardson family shares home video and ph... https://assets.msn.com/labs/mind/AAI33em.html [{"Label": "Florida", "Type": "G", "WikidataId... []
26 N3574 autos autosnews Ford Bronco Test Mule Spotted Flexing Its Musc... It still won't be offered in the Land Down Und... https://assets.msn.com/labs/mind/AAGBWeL.html [{"Label": "Ford Bronco", "Type": "V", "Wikida... []
27 N42474 news newsbusiness Trump's Trustbusters Bring Microsoft Lessons t... DOJ's Makan Delrahim and the FTC's Joe Simons ... https://assets.msn.com/labs/mind/AACI1SK.html [{"Label": "Big Four tech companies", "Type": ... [{"Label": "Makan Delrahim", "Type": "P", "Wik...
28 N64498 sports golf PGA Tour winners A gallery of recent winners on the PGA Tour. https://assets.msn.com/labs/mind/AAjnQjj.html [{"Label": "PGA Tour", "Type": "O", "WikidataI... [{"Label": "PGA Tour", "Type": "O", "WikidataI...
29 N59538 foodanddrink newstrends Nashville restaurants: Ms. Cheap rounds up lun... Ms. Cheap rounds up Nashville restaurant lunch... https://assets.msn.com/labs/mind/AABDPUi.html [{"Label": "Nashville, Tennessee", "Type": "G"... [{"Label": "Nashville, Tennessee", "Type": "G"...
... ... ... ... ... ... ... ... ...
42386 N19666 news newsus Ohio BCI investigating officer-involved shooti... Investigators are at the scene of an officer-i... https://assets.msn.com/labs/mind/BBWsaAY.html [] [{"Label": "Ohio Attorney General", "Type": "K...
42387 N40542 news newsscienceandtechnology People puzzled by peculiar texts, and no one c... If you woke up Thursday to a weird text that s... https://assets.msn.com/labs/mind/BBWsaXd.html [] [{"Label": "United States", "Type": "G", "Wiki...
42388 N22213 news newsworld Mexican cartels 'worse than ISIS': massacre vi... Mexican cartels 'worse than ISIS': massacre vi... https://assets.msn.com/labs/mind/BBWsaYy.html [{"Label": "Islamic State of Iraq and the Leva... [{"Label": "Islamic State of Iraq and the Leva...
42389 N10997 news newsus 'If we don't condemn it, we condone it': Victi... ROSELLE, Ill. The head football coach at Lak... https://assets.msn.com/labs/mind/BBWsaer.html [] [{"Label": "Lake Park High School", "Type": "F...
42390 N58014 sports basketball_nba LaMarcus Aldridge's big night lifts Spurs past... In the moments before the Spurs' 121-112 victo... https://assets.msn.com/labs/mind/BBWsakC.html [{"Label": "LaMarcus Aldridge", "Type": "P", "... [{"Label": "LaMarcus Aldridge", "Type": "P", "...
42391 N53002 sports football_nfl Twitter reacts to Chargers' loss to Raiders NaN https://assets.msn.com/labs/mind/BBWsb16.html [{"Label": "Oakland Raiders", "Type": "O", "Wi... []
42392 N47373 weather weathertopstories Amid freezing temperatures, firefighters make ... INDIANAPOLIS, Ind. Capt. Michael Pruitt with... https://assets.msn.com/labs/mind/BBWsb26.html [] [{"Label": "Indianapolis", "Type": "G", "Wikid...
42393 N2292 news newspolitics House investigators release more impeachment t... House Democrats released new transcripts of Tr... https://assets.msn.com/labs/mind/BBWsb2c.html [] [{"Label": "Donald Trump", "Type": "P", "Wikid...
42394 N22044 news newsus Prosecutor's letter details fatal KC police sh... The Jackson County Prosecutor's Office has rel... https://assets.msn.com/labs/mind/BBWsbOB.html [{"Label": "Grand jury", "Type": "C", "Wikidat... [{"Label": "Grand jury", "Type": "C", "Wikidat...
42395 N27291 lifestyle lifestylebuzz Mural in Downtown S.F. Depicts Swedish Teen Cl... The newest addition to the San Francisco skyli... https://assets.msn.com/labs/mind/BBWsbQg.html [{"Label": "Greta Thunberg", "Type": "N", "Wik... [{"Label": "Greta Thunberg", "Type": "N", "Wik...
42396 N36948 news newsworld Australian tourist killed by elephant in Namibia A 59-year-old Australian tourist has been kill... https://assets.msn.com/labs/mind/BBWsbZq.html [{"Label": "Namibia", "Type": "G", "WikidataId... [{"Label": "Namibia", "Type": "G", "WikidataId...
42397 N48840 news newsus Famed Hollywood Boulevard Superman Christopher... He was the Walk of Fame street performer who f... https://assets.msn.com/labs/mind/BBWsbkb.html [{"Label": "Hollywood Boulevard", "Type": "S",... [{"Label": "Hollywood Walk of Fame", "Type": "...
42398 N52871 news newsworld Residents of Mexican town struggle with fear a... A mother and two sons were laid to rest in han... https://assets.msn.com/labs/mind/BBWscU8.html [{"Label": "Mexico, New York", "Type": "G", "W... [{"Label": "Mexico", "Type": "G", "WikidataId"...
42399 N36658 news newsus Apartments for rent in Minneapolis: What will ... Curious just how far your dollar goes in Minne... https://assets.msn.com/labs/mind/BBWscWw.html [{"Label": "Minneapolis", "Type": "G", "Wikida... [{"Label": "Minneapolis", "Type": "G", "Wikida...
42400 N32558 news elections-2020-us Trump campaign launching black outreach effort... WASHINGTON (AP) During the 2016 campaign, ca... https://assets.msn.com/labs/mind/BBWsd7A.html [{"Label": "Donald Trump", "Type": "P", "Wikid... [{"Label": "Donald Trump", "Type": "P", "Wikid...
42401 N12470 news newscrime 3 teens shot in Northeast DC Thursday Thursday was a violent night in the District, ... https://assets.msn.com/labs/mind/BBWse0x.html [{"Label": "Northeast (Washington, D.C.)", "Ty... [{"Label": "Metropolitan Police Department of ...
42402 N25642 news newspolitics Texas custody battle fuels debate over transge... A custody battle between two parents has led t... https://assets.msn.com/labs/mind/BBWseUG.html [{"Label": "Texas", "Type": "G", "WikidataId":... [{"Label": "Ted Cruz", "Type": "P", "WikidataI...
42403 N20845 news newsus Police say 23-year-old man allegedly fled stat... HALL COUNTY, Ga. (CBS46) -- Fourteen-year-old ... https://assets.msn.com/labs/mind/BBWsf0T.html [] [{"Label": "Hall County, Georgia", "Type": "G"...
42404 N16016 health healthnews More than 130,000 pounds of ground beef recall... About 130,464 pounds of raw ground beef produc... https://assets.msn.com/labs/mind/BBWsnTh.html [{"Label": "Ground beef", "Type": "U", "Wikida... [{"Label": "Ground beef", "Type": "U", "Wikida...
42405 N25854 finance finance-companies Billionaire-controlled companies outperform, s... Billionaires tend to be good at making money f... https://assets.msn.com/labs/mind/BBWso4u.html [{"Label": "UBS", "Type": "O", "WikidataId": "... []
42406 N7618 autos autosnews Ford v Ferrari: the real story The film about the epic Le Mans rivalry promis... https://assets.msn.com/labs/mind/BBWylQ7.html [{"Label": "Ford v Ferrari", "Type": "N", "Wik... []
42407 N16804 music music-celebrity Neil Young Says U.S. Citizenship Application D... Neil Young has revealed that his attempts to a... https://assets.msn.com/labs/mind/BBWylZD.html [{"Label": "Neil Young", "Type": "P", "Wikidat... [{"Label": "Neil Young", "Type": "P", "Wikidat...
42408 N19926 lifestyle lifestylebeauty These Haircuts Are Going to be Huge in 2020 Bring on the bobs, lobs, and bangs! It's time ... https://assets.msn.com/labs/mind/BBWymAg.html [{"Label": "Hairstyle", "Type": "C", "Wikidata... []
42409 N42491 movies movies-celebrity Roman Polanski Denies Rape Allegation by Valen... Responding to French actress Valentine Monnier... https://assets.msn.com/labs/mind/BBWysN3.html [{"Label": "Roman Polanski", "Type": "P", "Wik... [{"Label": "Roman Polanski", "Type": "P", "Wik...
42410 N13097 movies movienews Marvel's Kevin Feige Breaks Silence on Scorses... In his first public comments about the debate ... https://assets.msn.com/labs/mind/BBWywC9.html [{"Label": "Martin Scorsese", "Type": "P", "Wi... [{"Label": "Marvel Cinematic Universe", "Type"...
42411 N63550 lifestyle lifestyleroyals Why Kate & Meghan Were on Different Balconies ... There's no scandal here. It's all about the or... https://assets.msn.com/labs/mind/BBWyynu.html [{"Label": "Meghan, Duchess of Sussex", "Type"... []
42412 N30345 entertainment entertainment-celebrity See the stars at the 2019 Baby2Baby gala Stars like Chrissy Teigen and Kate Hudson supp... https://assets.msn.com/labs/mind/BBWyz7N.html [] [{"Label": "Kate Hudson", "Type": "P", "Wikida...
42413 N30135 news newsgoodnews Tennessee judge holds lawyer's baby as he swea... Tennessee Court of Appeals Judge Richard Dinki... https://assets.msn.com/labs/mind/BBWyzI8.html [{"Label": "Tennessee", "Type": "G", "Wikidata... [{"Label": "Tennessee Court of Appeals", "Type...
42414 N44276 autos autossports Best Sports Car Deals for October NaN https://assets.msn.com/labs/mind/BBy5rVe.html [{"Label": "Peugeot RCZ", "Type": "V", "Wikida... []
42415 N39563 sports more_sports Shall we dance: Sports stars shake their leg NaN https://assets.msn.com/labs/mind/BBzMpnG.html [] []

42416 rows × 8 columns
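The two entity columns in the output above hold JSON-encoded lists of objects with keys such as `Label`, `Type`, and `WikidataId`. A minimal sketch of pulling the Wikidata IDs out of one such cell is shown below. The helper name `wikidata_ids` and the sample string are hypothetical, and the IDs in the sample are placeholders rather than values from the dataset.

```python
import json


def wikidata_ids(entities_json):
    """Return the WikidataId values from one JSON-encoded entity list."""
    # A missing cell may arrive as NaN (a float) or an empty string.
    if not isinstance(entities_json, str) or not entities_json.strip():
        return []
    return [e["WikidataId"] for e in json.loads(entities_json) if "WikidataId" in e]


# Hypothetical cell content; labels and IDs are placeholders.
sample = (
    '[{"Label": "Example A", "Type": "P", "WikidataId": "Q1"}, '
    '{"Label": "Example B", "Type": "G", "WikidataId": "Q2"}]'
)

ids = wikidata_ids(sample)          # IDs from a populated cell
empty_ids = wikidata_ids("[]")      # an article with no linked entities
nan_ids = wikidata_ids(float("nan"))  # a missing cell
```

The returned IDs can then be matched against the first column of entity_embedding.vec.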

In [6]:
# The entity_embedding.vec file contains the 100-dimensional entity embeddings
# learned from the subgraph with the TransE method.
# The first column is the entity ID; the remaining columns are the embedding vector values.
entity_embedding_path = os.path.join(temp_dir, 'entity_embedding.vec')
entity_embedding = pd.read_table(entity_embedding_path, header=None)
entity_embedding['vector'] = entity_embedding.iloc[:, 1:101].values.tolist()
entity_embedding = entity_embedding[[0, 'vector']].rename(columns={0: "entity"})
entity_embedding
Out[6]:
entity vector
0 Q34433 [0.017808, -0.07325599999999999, 0.102521, -0....
1 Q41 [-0.063388, -0.181451, 0.057501, -0.091254, -0...
2 Q56037 [0.02155, -0.044888, -0.027872000000000004, -0...
3 Q1860 [0.060958000000000005, 0.06993400000000001, 0....
4 Q39631 [-0.093106, -0.052002, 0.020556, -0.020801, 0....
5 Q30 [-0.11573699999999999, -0.17911300000000002, 0...
6 Q60 [-0.051036, -0.16563699999999998, 0.132802, -0...
7 Q183 [0.052779999999999994, -0.139523, -0.027571, -...
8 Q2736 [-0.091826, -0.021255, -0.049415, -0.167199000...
9 Q21198 [-0.096089, -0.0068379999999999995, -0.0278399...
10 Q131524 [0.0043479999999999994, 0.028879000000000002, ...
11 Q12788174 [-0.080052, 0.11183199999999999, 0.076317, -0....
12 Q142 [0.044556, 0.005308, 0.135917, -0.203567, 0.02...
13 Q298 [0.013886, -0.08859199999999999, -0.004194, -0...
14 Q2887 [0.039070999999999995, -0.076391, 0.015528, 0....
15 Q155 [0.034837, -0.18423399999999998, 0.050972, -0....
16 Q15180 [-0.11381199999999998, 0.003113, -0.0070030000...
17 Q408 [0.030718000000000002, -0.167667, -0.032142000...
18 Q177220 [-0.12244100000000001, 0.115449, 0.04222300000...
19 Q38 [0.070083, -0.18845, 0.141631, -0.040243, 0.08...
20 Q752297 [-0.011161, -0.049322000000000005, 0.035052, -...
21 Q29 [-0.0057729999999999995, -0.166473, 0.05220900...
22 Q2807 [-0.036898, -0.150758, 0.03325, -0.0336, 0.017...
23 Q336286 [0.028158999999999997, 0.136965, -0.0331220000...
24 Q33506 [0.029741000000000004, -0.06346900000000001, 0...
25 Q37226 [-0.180374, 0.009042, -0.075288, -0.094787, 0....
26 Q213 [-0.052516, -0.089791, 0.07205, -0.063529, -0....
27 Q90 [0.028043000000000002, 0.016425, 0.12071099999...
28 Q11424 [-0.166153, -0.213444, -0.035954, -0.061191999...
29 Q25089 [-0.033586000000000005, -0.039048, 0.062833, 0...
... ... ...
22863 Q52424113 [-0.015946000000000002, -0.043258, -0.063225, ...
22864 Q51849371 [-0.015587, -0.029361, -0.005646, 0.016144, 0....
22865 Q52590967 [-0.01426, -0.030755, -0.060229, 0.010758, -0....
22866 Q65923113 [0.016715, 0.013468, -0.03318, -0.041919, 0.00...
22867 Q65935365 [-0.058651999999999996, -0.017408, -0.039342, ...
22868 Q52715530 [-0.039839, -0.037683, -0.069898, -0.052635, 0...
22869 Q52247498 [-0.017968, -0.022007, -0.0462, 0.020104, -0.0...
22870 Q52290003 [0.015207, 0.0025440000000000003, -0.052101999...
22871 Q52290425 [-0.084103, -0.062223, -0.017676, 0.022031, -0...
22872 Q52298665 [-0.019474, -0.056829, -0.078429, 0.0321269999...
22873 Q66363416 [0.047139999999999994, -0.030843000000000002, ...
22874 Q41571596 [0.023031, -0.029602999999999997, 0.01, -0.026...
22875 Q40234791 [0.016026, -0.042275, -0.011025, 0.033408, -0....
22876 Q53567614 [0.008826, -0.008437, -0.071004, 0.007258, -0....
22877 Q53583116 [0.00042300000000000004, -0.01745, 0.009059000...
22878 Q53720737 [0.069862, 0.059307000000000006, 0.02590400000...
22879 Q41788679 [-0.032408, 0.025944, -0.014616999999999998, 0...
22880 Q41790042 [-0.066249, 0.027369, -0.037001, 0.023492, -0....
22881 Q52985750 [-0.065515, -0.014433000000000001, -0.050944, ...
22882 Q53236501 [0.025514, 0.020045, 0.026060000000000003, 0.0...
22883 Q54621943 [-0.014662999999999999, -0.05322999999999999, ...
22884 Q54861465 [-0.002764, -0.021321, -0.020571000000000002, ...
22885 Q54952832 [0.040638, 0.050203, -0.011036, -0.008562, -0....
22886 Q53870565 [0.001374, -0.02647, -0.026286, 0.018, -0.0510...
22887 Q54085113 [0.016056, 0.019562, 0.011992000000000001, -0....
22888 Q278846 [0.042413, 0.021957, 0.07241399999999999, -0.0...
22889 Q54621949 [-0.018299, -0.048378, -0.021644999999999998, ...
22890 Q42225228 [-0.051346, -0.028947000000000004, -0.07587, 0...
22891 Q54862508 [-0.052323, -0.078029, -0.060925, -0.052536, 0...
22892 Q42301562 [-0.00519, -0.047871, 0.009753, -0.0215, -4.9e...

22893 rows × 2 columns
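Once the vectors are loaded, a common next step is to look up an entity by ID and compare it with another. The sketch below does this with the standard library only, on a tiny three-dimensional stand-in for the real 100-dimensional file. The names `load_vec` and `cosine` and the IDs `QA`, `QB`, `QC` are hypothetical, and the trailing tab on each sample line is an assumption about the file layout.

```python
import io
import math


def load_vec(handle):
    """Parse tab-separated lines of the form: ID <tab> v1 <tab> v2 ..."""
    table = {}
    for line in handle:
        # strip() drops the newline and any trailing tab before splitting.
        parts = line.strip().split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        table[parts[0]] = [float(x) for x in parts[1:]]
    return table


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy data standing in for entity_embedding.vec.
sample = io.StringIO("QA\t1.0\t0.0\t0.0\t\nQB\t0.0\t1.0\t0.0\t\nQC\t2.0\t0.0\t0.0\t\n")
vectors = load_vec(sample)

same_direction = cosine(vectors["QA"], vectors["QC"])
orthogonal = cosine(vectors["QA"], vectors["QB"])
```

A plain dictionary keeps lookups by entity ID constant-time, which is convenient when joining against the entity IDs extracted from news.tsv.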

In [7]:
# The relation_embedding.vec file contains the 100-dimensional relation embeddings
# learned from the subgraph with the TransE method.
# The first column is the relation ID; the remaining columns are the embedding vector values.
relation_embedding_path = os.path.join(temp_dir, 'relation_embedding.vec')
relation_embedding = pd.read_table(relation_embedding_path, header=None)
relation_embedding['vector'] = relation_embedding.iloc[:, 1:101].values.tolist()
relation_embedding = relation_embedding[[0, 'vector']].rename(columns={0: "relation"})
relation_embedding
Out[7]:
relation vector
0 P31 [-0.07346699999999999, -0.132227, 0.034173, -0...
1 P21 [-0.078436, 0.108589, -0.049429, -0.131355, 0....
2 P106 [-0.052137, 0.052444000000000005, -0.019886, -...
3 P735 [-0.051398, 0.056219000000000005, 0.0680289999...
4 P108 [0.09123099999999999, 0.022525999999999997, 0....
5 P101 [-0.03845, 0.053671, -0.063569, -0.150071, 0.0...
6 P69 [0.070871, 0.017891999999999998, 0.071605, -0....
7 P27 [-0.001034, -0.071413, 0.078409, -0.1355869999...
8 P19 [0.00088, -0.047513, 0.055876, -0.066817999999...
9 P1412 [0.030543999999999998, 0.149371, 0.01523600000...
10 P1343 [0.11008299999999999, 0.10230399999999999, -0....
11 P20 [0.013021000000000001, -0.046431, 0.0827, -0.0...
12 P509 [0.141286, 0.030367, 0.109865, -0.124899, 0.10...
13 P1196 [0.081161, -0.0418, 0.09615, -0.10577, -0.0562...
14 P734 [0.039076, -0.021949, 0.016378, -0.02883500000...
15 P17 [-0.066804, -0.157604, 0.013737000000000001, -...
16 P641 [-0.016671000000000002, -0.119618, -0.015854, ...
17 P463 [0.0344, 0.075337, -0.022096, -0.171531, 0.001...
18 P131 [-0.12283, -0.14671199999999998, 0.01054200000...
19 P159 [-0.095321, -0.141079, -0.011192, -0.077814, -...
20 P39 [-0.0519, 0.06034, -0.058713, -0.1645859999999...
21 P3373 [-0.008519, 0.001118, 0.00867, 0.000638, 0.006...
22 P551 [-0.039375, -0.051916, 0.053480999999999994, -...
23 P793 [0.13571, 0.180002, 0.029276, 0.12283699999999...
24 P2094 [0.042525, 0.11483399999999999, -0.009143, -0....
25 P1344 [0.122527, 0.14366600000000002, 0.018272999999...
26 P1303 [-0.034934, 0.072995, -0.010043999999999999, -...
27 P512 [0.112223, 0.078682, 0.06856699999999999, -0.1...
28 P84 [-0.028439999999999997, -0.07585599999999999, ...
29 P466 [0.000472, -0.057613, -0.0081, 0.074849, 0.058...
... ... ...
1061 P3005 [-0.022373, -0.031122000000000004, -0.00252100...
1062 P1777 [-0.025655, -0.021695, -0.022119, 0.021091, -0...
1063 P7406 [0.008973, -0.015265, 0.006358, -0.013365, 0.0...
1064 P2286 [0.04372, 0.010529, -0.004605, -0.024543000000...
1065 P922 [0.014927000000000001, -0.0032990000000000003,...
1066 P4843 [0.003708, 0.015212, -0.020975, -0.010442, -0....
1067 P4424 [-0.06400700000000001, -0.056288, -0.00518, -0...
1068 P2366 [0.016396, 0.046333, -0.022376, 0.00665, -0.02...
1069 P4988 [-0.006942, 0.000916, 0.022667, -0.025558, -0....
1070 P3190 [0.013193999999999999, -0.017598, -0.015018, 0...
1071 P1318 [-0.039105, -0.01598, 0.01339, 0.005304, -0.01...
1072 P1437 [0.028199000000000002, -0.0006900000000000001,...
1073 P5054 [0.024443, 0.018856, 0.037403, -0.025896, 0.02...
1074 P926 [0.011612, 0.010583, -0.0076370000000000006, 0...
1075 P1425 [-0.020383000000000002, 0.065183, -0.015123, 0...
1076 P1704 [-0.017636000000000002, -0.019274, -0.001118, ...
1077 P3357 [0.004563, 0.016324, 0.011803, -0.000982, 0.00...
1078 P3027 [-0.033355, -0.021512, 0.009940000000000001, 0...
1079 P3028 [-0.028638, -0.000949, -0.011845, 0.004555, 0....
1080 P925 [-0.021649, -0.001027, 0.008454999999999999, 0...
1081 P5961 [-0.036083, -0.027208, -0.010634999999999999, ...
1082 P5873 [0.010484, -0.038612, 0.041136, -0.02501, 0.01...
1083 P3969 [-0.016941, -0.010539, -0.027496, -0.014163999...
1084 P7469 [0.011467, -0.067762, 0.020363, 0.008853, -0.0...
1085 P3274 [0.006123, -0.006601999999999999, 0.0277100000...
1086 P1897 [-0.019021, 0.001183, -0.009602, -0.040833, -0...
1087 P3776 [-0.018365, 0.028526, -0.025934, 0.032296, -0....
1088 P1194 [-0.026819, 0.0032310000000000004, -0.011298, ...
1089 P2502 [0.003554, -0.041121, -0.010559, -0.037862, -0...
1090 P6977 [-0.023617, -0.021648, 0.009369, -0.021757, 0....

1091 rows × 2 columns
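TransE represents a relation as a translation in the embedding space, so for a plausible triple (head, relation, tail) the sum head + relation should lie close to tail. The sketch below illustrates that idea with two-dimensional toy vectors. The function name `transe_distance`, the choice of the Euclidean norm, and all values are assumptions for illustration and are not taken from the dataset.

```python
import math


def transe_distance(head, relation, tail):
    """Euclidean distance between (head + relation) and tail; smaller means more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical toy vectors; real MIND embeddings have 100 dimensions.
head = [0.1, 0.2]
relation = [0.3, -0.1]
tail_close = [0.4, 0.1]   # approximately head + relation
tail_far = [-0.5, 0.9]

d_close = transe_distance(head, relation, tail_close)
d_far = transe_distance(head, relation, tail_far)
```

With the real files, `head` and `tail` would come from entity_embedding.vec and `relation` from relation_embedding.vec.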

Clean up temporary files

In [8]:
shutil.rmtree(temp_dir)
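An explicit `shutil.rmtree` call is skipped if an earlier cell raises an exception. As an alternative sketch, the standard library's `tempfile.TemporaryDirectory` context manager removes the directory when the block exits, even on error. The file name `example.txt` below is a placeholder and not part of the dataset.

```python
import os
import tempfile

# The directory and everything in it are deleted when the with-block exits.
with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "example.txt")  # placeholder file
    with open(path, "w") as f:
        f.write("placeholder")
    existed_inside = os.path.exists(path)

removed_after = not os.path.exists(temp_dir)
```

This fits a script better than a notebook, where the working directory usually has to survive across several cells.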

Next Steps

Check out several baseline news recommendation models developed on MIND in the Microsoft Recommenders Repository.

MIND News Recommendation Challenge

MIND probe: A competition on news recommendations with the world’s biggest news dataset

Microsoft Recommenders Repository

Collection of open source recommender algorithms, utilities, and best practices, including deployment samples on Azure.

MIND: MIcrosoft News Dataset

A Large-Scale English Dataset for News Recommendation Research.