Retrieving News Analytics using the DSS REST API

Introduction

This second article on news concentrates on News Analytics retrieval with DSS (DataScope Select) and the DSS REST API. It highlights differences between the behavior of the DSS GUI and the REST API, and describes the query parameters in more detail than the documentation.

We cover both News Analytics (which is for Equities) and News Analytics Commodities (which is for Commodities and Energy).

Prerequisite reading is the previous companion article that covered News Items.

You need some basic DSS GUI and DSS REST API knowledge to understand this article. If you want to test the code, a valid DSS user account is also required.

 

News Analytics

The purpose of News Analytics is to automate news content analysis, and use the results to drive further automated decisions.

The analysis mechanism converts news stories into quantitative scores that can then be incorporated into trading and investment models, to exploit market opportunities and manage risk.

DSS performs this conversion using sophisticated Natural Language Processing (NLP) techniques. News items are analyzed at high speed for thousands of companies, scoring text across measures related to sentiment, relevance, novelty and volume. News Analytics can also read and process Thomson Reuters News Archive files to obtain and publish a score history.

Because instrument scores are numeric values, they can be consumed directly by your own internal systems to guide investment, surveillance, compliance and trading decisions.

News analytics is a mature technology, used by both the buy side and the sell side for alpha generation, trade execution, risk management, and market surveillance and compliance.

Note: the only supported asset class for news analytics is equity.

 

Scoring

News items are scored separately for each instrument in your instrument list that is mentioned in the news item. For example, an item that mentions 3 companies will generate 3 scores in total. These scores will probably be different, because separate linguistic processing takes place for each instrument.
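To make the one-score-per-instrument rule concrete, here is a minimal Python sketch that only counts the score rows an item would produce. The linguistic scoring itself happens on the DSS side and is not modelled. The helper name and data layout are hypothetical, and the RICs other than CARR.PA are illustrative examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRow:
    """One (news item, instrument) pair that receives its own score."""
    item_id: str
    instrument: str


def score_rows(item_id: str, mentioned: list[str], instrument_list: list[str]) -> list[ScoreRow]:
    """Return one row per instrument of the list that the item mentions (hypothetical helper)."""
    wanted = set(instrument_list)
    # dict.fromkeys drops repeated mentions while keeping their original order.
    return [ScoreRow(item_id, ric) for ric in dict.fromkeys(mentioned) if ric in wanted]


# An item mentioning three companies of the list yields three rows, hence three scores.
rows = score_rows("item-1", ["CARR.PA", "AIRP.PA", "BNPP.PA"],
                  ["CARR.PA", "AIRP.PA", "BNPP.PA", "TOTF.PA"])
print(len(rows))  # 3
```

Instruments that are mentioned but not in your list produce no row, which is why the same news item can yield a different number of scores for different users.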

 

News Analytics Commodities

News Analytics Commodities is similar to News Analytics: it supports the retrieval of sentiment analysis and metadata from global news sources.

There are some differences though. News Analytics Commodities:

  • Covers commodities and energy assets.
  • Uses an additional filtering criterion: commodity codes.
  • Can only be used without an instrument list.

The News Analytics Commodities extraction request builds on the existing News Analytics report template, but without an input list. Instead, the selected commodities/energy assets, along with the other criteria defined in your request or report template, are used as input for retrieving data upon extraction.

Note: the only supported asset class for news analytics commodities is commodities.

 

Permissioning

No additional permissioning is required for retrieving third-party metadata via News Analytics reports. For more information, please contact your local account manager or sales specialist.

 

Code extracts used in this article

The code extracts are based on pure HTTP requests. To try them out you can cut and paste them into a REST client, like Postman. This is explained in the REST API Tutorials Introduction. You can also download the associated Postman environment and collection to avoid copy/pasting. All request headers must contain an authentication token; refer to the REST API Tutorial 1 for details.
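For readers who prefer code to Postman, the following Python sketch (standard library only) shows the token handling. The RequestToken endpoint, its Credentials body, the "value" field of the reply and the `Authorization: Token <token>` header format are taken from the DSS REST API tutorials as we understand them, not from this article; check them against REST API Tutorial 1 before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1"


def token_request(username: str, password: str) -> urllib.request.Request:
    """Build the token request (endpoint and body layout are assumptions, see Tutorial 1)."""
    body = {"Credentials": {"Username": username, "Password": password}}
    return urllib.request.Request(
        BASE_URL + "/Authentication/RequestToken",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(username: str, password: str) -> str:
    """Send the request; the token is assumed to be in the "value" field of the reply."""
    with urllib.request.urlopen(token_request(username, password)) as response:
        return json.load(response)["value"]


def auth_headers(token: str) -> dict[str, str]:
    """Headers to attach to every subsequent request."""
    return {"Authorization": "Token " + token, "Content-Type": "application/json"}
```

Only `fetch_token` touches the network; the other two functions just prepare data, which makes them easy to inspect before sending anything.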

The full workflow for some requests is omitted for clarity, but when required you will find pointers or links to the tutorials or documentation where the details are explained. For more information, refer to the DSS Tutorials set.

 

Use cases

News Analytics, just like News Items and contrary to most other data types available in DSS, can be requested with or without using an instrument list. These are two very different use cases.

When using an instrument list, we focus on retrieving news for a predefined set of instruments, applying a set of criteria (date range, topics, news sources, relevance, sentiment, novelty) to narrow down the result set. A typical use case could be finding the news sentiment for a specific instrument, to use in a trading decision.

With an instrument-less request, we only apply a set of criteria (date range, topics, languages, news sources, relevance, sentiment, novelty) to find a set of news items, without limiting it to a list of instruments. This is a kind of search, where we want to find news that match specific criteria. A typical use case could be to find instruments in a particular business sector where the sentiment is very positive (or negative), to find new trading opportunities. Another use case could be research, for instance to find news on Gold with a positive sentiment.

Apart from the presence or absence of an instrument list, the extraction parameters are the same for both use cases. These are covered in the next section of this article. The specifics of instrument-less requests are covered later, under the heading: Instrument-Less extractions.

News Analytics Commodities can only be requested without using an instrument list. The specifics of this request are covered later, under the heading: News Analytics Commodities.

 

Extractions

This section is common to extractions with or without an instrument list.

Except when mentioned otherwise, what follows applies both to News Analytics and News Analytics Commodities. The specifics of News Analytics Commodities are covered later, under the heading: News Analytics Commodities.

Data extractions can be scheduled or On Demand. If you are not familiar with these two workflows, read the DSS Tutorials Introduction to learn more.

A data extraction can be scheduled in the GUI or with the REST API; this requires a News Analytics (or News Analytics Commodities) report template, a schedule and an optional instrument list.

The template is the element that is specific to the News Analytics extraction, whereas the instrument list and schedule are generic; we will therefore ignore them in what follows.

Here is an example request to create a News Analytics template, in Postman:

Method: POST

Endpoint: https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/NewsAnalyticsReportTemplates

Body:

{
  "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ReportTemplates.NewsAnalyticsReportTemplate",
  "ShowColumnHeaders": false,
  "Name": "myNewsAnalyticsTemplateName",
  "CompressionType": "Zip",
  "Headers": [],
  "Trailers": [],
  "ContentFields": [
    { "FieldName": "Headline" },
    { "FieldName": "Story Body" },
    { "FieldName": "Story Date Time" },
    { "FieldName": "Take Date Time" },
    { "FieldName": "Created Date" },
    { "FieldName": "Novelty Timestamp" },
    { "FieldName": "Attribution" },
    { "FieldName": "Products" },
    { "FieldName": "Topics" },
    { "FieldName": "Language" },
    { "FieldName": "Relevance" },
    { "FieldName": "Sentiment" },
    { "FieldName": "Sentiment - Negative" },
    { "FieldName": "Sentiment - Neutral" },
    { "FieldName": "Sentiment - Positive" }
  ],
  "Condition": {
    "ReportDateRange": "Range",
    "QueryStartDate": "2018-01-01",
    "QueryEndDate": "2018-03-31",
    "NewsAnalyticsRelevanceOperator": "GreaterThanOrEqualTo",
    "NewsRelevanceValue": 0.5,
    "NewsAnalyticsPrevailingSentiment": "Negative",
    "NewsAnalyticsNovelty": "Novelty7Day",
    "NewsFilterNoveltyOperator": "GreaterThanOrEqualTo",
    "NewsNoveltyValue": 10,
    "NewsTopicsCodes":[ "CMPNY","FR","EUROP","TECH" ],
    "IncludeImbalace": true,
    "NewsAnalyticsSource": "ArticlesAndAlerts",
    "NewsItemsSource":"Selected",
    "NewsAttributionsCodes": [ "RTRS", "BSW" ]
  }
}
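The same template can be built programmatically. This Python sketch assembles the body shown above and wraps it in a POST to the NewsAnalyticsReportTemplates endpoint, without sending anything. The helper names are hypothetical, the token is assumed to come from the authentication step, and the `Authorization: Token <token>` header format is the one we assume from the tutorials.

```python
import json
import urllib.request

BASE_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1"

FIELDS = ["Headline", "Story Body", "Story Date Time", "Take Date Time", "Created Date",
          "Novelty Timestamp", "Attribution", "Products", "Topics", "Language",
          "Relevance", "Sentiment", "Sentiment - Negative", "Sentiment - Neutral",
          "Sentiment - Positive"]

CONDITION = {
    "ReportDateRange": "Range",
    "QueryStartDate": "2018-01-01",
    "QueryEndDate": "2018-03-31",
    "NewsAnalyticsRelevanceOperator": "GreaterThanOrEqualTo",
    "NewsRelevanceValue": 0.5,
    "NewsAnalyticsPrevailingSentiment": "Negative",
    "NewsAnalyticsNovelty": "Novelty7Day",
    "NewsFilterNoveltyOperator": "GreaterThanOrEqualTo",
    "NewsNoveltyValue": 10,
    "NewsTopicsCodes": ["CMPNY", "FR", "EUROP", "TECH"],
    "IncludeImbalace": True,  # property name spelled exactly as in the request above
    "NewsAnalyticsSource": "ArticlesAndAlerts",
    "NewsItemsSource": "Selected",
    "NewsAttributionsCodes": ["RTRS", "BSW"],
}


def template_body(name: str, fields: list[str], condition: dict) -> dict:
    """News Analytics report template body, as in the Postman example."""
    return {
        "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ReportTemplates.NewsAnalyticsReportTemplate",
        "ShowColumnHeaders": False,
        "Name": name,
        "CompressionType": "Zip",
        "Headers": [],
        "Trailers": [],
        "ContentFields": [{"FieldName": field} for field in fields],
        "Condition": condition,
    }


def create_template_request(token: str, body: dict) -> urllib.request.Request:
    """POST request creating the template; pass it to urllib.request.urlopen to send it."""
    return urllib.request.Request(
        BASE_URL + "/Extractions/NewsAnalyticsReportTemplates",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": "Token " + token, "Content-Type": "application/json"},
        method="POST",
    )
```

Keeping the field list and the condition in variables makes it easy to reuse them for the On Demand request.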

 

The REST API also allows making an On Demand request for News Analytics (the GUI does not deliver this functionality).

Here is an example in Postman, using the same parameters as the example above to make an On Demand request for News Analytics:

Method: POST

Endpoint: https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotes

Body:

{
  "ExtractionRequest": {
    "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.NewsAnalyticsExtractionRequest",
    "ContentFieldNames":
      ["Headline", "Story Body", "Story Date Time", "Take Date Time", "Created Date", "Novelty Timestamp",
       "Attribution", "Products", "Topics", "Language",
       "Relevance", "Sentiment", "Sentiment - Negative", "Sentiment - Neutral", "Sentiment - Positive"],
    "IdentifierList": {
      "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",  
      "InstrumentIdentifiers": [
      	{ "Identifier": "CARR.PA", "IdentifierType": "Ric" }
      ],
      "ValidationOptions": {"AllowHistoricalInstruments": true},
      "UseUserPreferencesForValidationOptions": false
    },
    "Condition": {
      "ReportDateRange": "Range",
      "QueryStartDate": "2018-01-01",
      "QueryEndDate": "2018-03-31",
      "NewsAnalyticsRelevanceOperator": "GreaterThanOrEqualTo",
      "NewsRelevanceValue": 0.5,
      "NewsAnalyticsPrevailingSentiment": "Negative",
      "NewsAnalyticsNovelty": "Novelty7Day",
      "NewsFilterNoveltyOperator": "GreaterThanOrEqualTo",
      "NewsNoveltyValue": 10,
      "NewsTopicsCodes":[ "CMPNY","FR","EUROP","TECH" ],
      "IncludeImbalace": true,
      "NewsAnalyticsSource": "ArticlesAndAlerts",
      "NewsItemsSource":"Selected",
      "NewsAttributionsCodes": [ "RTRS", "BSW" ]
    }
  }
}
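In code, the On Demand body is simply the field list and the condition, plus an instrument identifier list. A minimal Python sketch, with a hypothetical helper name, that builds it for a list of RICs (the condition is abbreviated here for brevity):

```python
import json

NA_REQUEST = "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.NewsAnalyticsExtractionRequest"
ID_LIST = "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList"


def on_demand_body(fields: list[str], condition: dict, rics: list[str]) -> dict:
    """Build an ExtractWithNotes body for a News Analytics request on a list of RICs."""
    return {
        "ExtractionRequest": {
            "@odata.type": NA_REQUEST,
            "ContentFieldNames": list(fields),
            "IdentifierList": {
                "@odata.type": ID_LIST,
                "InstrumentIdentifiers": [
                    {"Identifier": ric, "IdentifierType": "Ric"} for ric in rics
                ],
                "ValidationOptions": {"AllowHistoricalInstruments": True},
                "UseUserPreferencesForValidationOptions": False,
            },
            "Condition": dict(condition),
        }
    }


body = on_demand_body(["Headline", "Relevance", "Sentiment"],
                      {"ReportDateRange": "Range", "QueryStartDate": "2018-01-01",
                       "QueryEndDate": "2018-03-31"},
                      ["CARR.PA"])
# Paste the printed JSON into Postman, or POST it to the ExtractWithNotes endpoint.
print(json.dumps(body, indent=2))
```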

 

Extraction parameters

Let us look at the details of the extraction request parameters.

Note: in what follows, illustrations from the GUI are from the creation of a News Analytics template, and those from Postman are from the On Demand request.

Report date range type

This defines the time frame, which can be a Range query or a Delta query. This is exactly the same as for a News Items request; for details, please refer to the previous article.

 

Criteria

Relevance

Relevance measures how relevant a news item is to a given instrument. The value is between 0 and 1, 0 meaning the item is totally irrelevant and 1 meaning it is fully relevant.

Choose an operator (>, >=, <, <= or =) and a value between 0.0 and 1.0 to filter on the relevance of the news item to the instruments in your input list.

Relevance is calculated by comparing the relative number of occurrences of the instrument with the number of occurrences of other organizations and commodities within the text of the item. In addition, if the instrument is mentioned in the headline, the relevance is set to 1.0. For stories with multiple instruments, the instrument with the most mentions will have the highest relevance, and an instrument with fewer mentions will have a lower relevance score.

In the GUI:

In the API:

"Condition": {
    …
    "NewsAnalyticsRelevanceOperator": "GreaterThanOrEqualTo",
    "NewsRelevanceValue": 0.5,
    …
}

If the relevance operator and/or value are not set, no relevance filtering will be applied; all results will be returned, whatever their relevance value.
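The relevance filter is applied by DSS, but the same rule is handy if you want to re-filter rows you have already downloaded at a stricter threshold. This Python sketch reproduces the documented semantics locally, using the GUI operator symbols listed above. It is a client-side illustration, not part of the DSS API, and assumes each row is a dict with a numeric "Relevance" entry.

```python
import operator
from typing import Optional

# GUI operator symbols mapped to the equivalent Python comparisons.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "=": operator.eq}


def filter_by_relevance(rows: list[dict], op: Optional[str] = None,
                        value: Optional[float] = None) -> list[dict]:
    """Keep rows whose Relevance satisfies `op value`; no filtering if either is unset."""
    if op is None or value is None:
        return list(rows)
    if not 0.0 <= value <= 1.0:
        raise ValueError("relevance value must be between 0.0 and 1.0")
    compare = OPS[op]
    return [row for row in rows if compare(row["Relevance"], value)]
```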

 

Prevailing Sentiment

The prevailing sentiment can be positive, neutral or negative. This is the type of emotion that the news item conveys.

Choose Positive, Neutral or Negative to denote the manner in which the instrument is addressed in the news item.

In the GUI:

In the API:

"Condition": {
    …
    "NewsAnalyticsPrevailingSentiment": "Negative",
    …
}

If the prevailing sentiment is not set, no prevailing sentiment filtering will be applied; all results will be returned, whatever their prevailing sentiment.

 

Novelty

Novelty is a measure of how recent a news item is, and how much buzz it generated.

Choose one of five linked-count periods preceding the news item’s timestamp (12 hours, 24 hours, 3 days, 5 days or 7 days), an operator, and a value.

For example, if you select 7 Day Linked Count, then select = and a value of 10, DataScope Select will extract stories that have 10 related stories in the past 7 days. If you enter 0, only new stories are extracted. If you specify that the linked count should be > a certain value, your extraction will return stories that have been repeated more than that number of times.
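The linked-count logic of this example can be sketched in Python. This is only an illustration under stated assumptions: it presumes you already know which stories are related (DSS determines that on the server side), it treats the window as starting exactly one period before the item's timestamp and excluding the item itself, and the period keys are hypothetical labels, not API values.

```python
from datetime import datetime, timedelta

# Hypothetical labels for the five linked-count periods.
PERIODS = {
    "12h": timedelta(hours=12),
    "24h": timedelta(hours=24),
    "3d": timedelta(days=3),
    "5d": timedelta(days=5),
    "7d": timedelta(days=7),
}


def linked_count(item_time: datetime, related_times: list[datetime], period: str) -> int:
    """Count related stories published within `period` before the item's timestamp."""
    window_start = item_time - PERIODS[period]
    return sum(1 for t in related_times if window_start <= t < item_time)


item = datetime(2018, 3, 10, 12, 0)
related = [datetime(2018, 3, 4), datetime(2018, 3, 9), datetime(2018, 3, 1)]
print(linked_count(item, related, "7d"))  # 2: the story from 1 March falls outside the window
```

A count of 0 corresponds to a brand-new story, which is what the "= 0" filter selects.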

In the GUI:

In the API:

"Condition": {
    …
    "NewsAnalyticsNovelty": "Novelty7Day",
    "NewsFilterNoveltyOperator": "GreaterThanOrEqualTo",
    "NewsNoveltyValue": 10,
    …
}

If the novelty parameters are not set, no novelty filtering will be applied; all results will be returned, whatever their novelty.

 

Story topics

This defines what the news item talks about, i.e. its subject topics.

There are more than 1,500 topic codes, covering a wide range of subjects such as asset classes, events, business sectors and geography.

This is exactly the same as for a News Items request; for details, please refer to the previous article.

 

Include imbalance

Select this option to include imbalance messages that result from an automatically generated news item.

In the GUI:

In the API:

"Condition": {
    …
    "IncludeImbalace": true,
    …
}

If the imbalance parameter is not set, it defaults to false.

 

News Analytics Source

Select the news items to use in your extractions: articles only, alerts only, or both.

In the GUI:

In the API:

"Condition": {
    …
    "NewsAnalyticsSource": "ArticlesAndAlerts",
    …
}

The value for this parameter can be “ArticlesAndAlerts”, “Articles” or “Alerts”. If the news analytics source is not set, it defaults to “Articles”.
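Since two of these parameters have defaults, it can be useful to make them explicit on the client side, so that a condition stored in your own code states exactly what DSS will apply. The following Python helper is hypothetical and only encodes the two defaults documented above (false for imbalance, “Articles” for the source) and the three accepted source values.

```python
VALID_SOURCES = {"ArticlesAndAlerts", "Articles", "Alerts"}


def effective_condition(condition: dict) -> dict:
    """Return a copy of `condition` with the documented defaults made explicit."""
    result = dict(condition)
    # Property name spelled exactly as in the example requests.
    result.setdefault("IncludeImbalace", False)
    result.setdefault("NewsAnalyticsSource", "Articles")
    source = result["NewsAnalyticsSource"]
    if source not in VALID_SOURCES:
        raise ValueError(f"unsupported NewsAnalyticsSource: {source}")
    return result
```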

 

News Items Source

This is the source of the news, i.e. who published it. There are 45 news sources.

This is exactly the same as for a News Items request; for details, please refer to the previous article.

 

Output data fields

Here we select the data we want to retrieve. The most obvious fields for analytics are Relevance and Sentiment (with several variants), but more than 100 other fields are available, relating to the headline and story, timestamps, language, topic, economic sector, industry code, exchange, instrument codes, etc. Let us highlight a few.

Four timestamps, all in GMT, are available:

  • Story Date Time - Date and time when the first alert or take in the story was filed
  • Take Date Time - Date and time of the news item
  • Created Date - Date and time the record of the event was created
  • Novelty Timestamp - Date and time of the item used in the calculation

The Take Date Time is always later than or equal to the Story Date Time.
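When post-processing an output file, this ordering is a cheap sanity check. A minimal Python sketch, assuming the timestamps arrive as ISO 8601 strings in GMT with a trailing "Z" (an assumption; adapt the parser to the format your output actually contains). The helper names are hypothetical.

```python
from datetime import datetime


def parse_gmt(value: str) -> datetime:
    """Parse an ISO 8601 GMT timestamp such as '2018-03-02T08:15:00.000Z' (assumed format)."""
    # Before Python 3.11, fromisoformat() rejects a trailing 'Z', so map it to +00:00.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def take_not_before_story(row: dict) -> bool:
    """True if the row respects Take Date Time >= Story Date Time."""
    return parse_gmt(row["Take Date Time"]) >= parse_gmt(row["Story Date Time"])
```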

There are several Item Count fields that deliver the number of articles in the same feed, for various time frames. Similarly, there are several Linked Count fields that deliver the number of related articles in the same feed.

Relevance: relevance of the news item to the company. It is calculated by comparing how many times each of the companies is mentioned. For stories with multiple companies mentioned, the company with the most mentions will have the highest relevance.

Four numeric sentiment fields are available:

  • Sentiment

    Indicates the predominant sentiment class for this news item with respect to this company. The indicated class is the one with the highest probability.

    1=Positive, 0=Neutral, -1=Negative
  • Sentiment - Negative

    Probability that the sentiment of the news item was negative for the company.
  • Sentiment - Neutral

    Probability that the sentiment of the news item was neutral for the company.
  • Sentiment - Positive

    Probability that the sentiment of the news item was positive for the company.

The three probabilities (negative, neutral and positive) sum to 1.0.
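The relationship between the Sentiment field and the three probabilities can be sketched in Python: the predominant class is the one with the highest probability. The helper is hypothetical, and how DSS breaks exact ties is not documented here, so this sketch simply favours the higher code.

```python
import math


def predominant_sentiment(neg: float, neu: float, pos: float) -> int:
    """Return 1 (positive), 0 (neutral) or -1 (negative): the most probable class."""
    if not math.isclose(neg + neu + pos, 1.0, abs_tol=1e-6):
        raise ValueError("the three sentiment probabilities should sum to 1.0")
    # Compare (probability, code) pairs; on an exact tie the higher code wins.
    return max((pos, 1), (neu, 0), (neg, -1))[1]
```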

The list of all available fields for a News Analytics extraction can be studied in the GUI by creating a report template, or queried using the API, with a GET to this endpoint:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/GetValidContentFieldTypes(ReportTemplateType=ThomsonReuters.Dss.Api.Extractions.ReportTemplates.ReportTemplateTypes'NewsAnalytics') 
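The endpoint URL follows a fixed pattern, with the template type as the only variable part. This Python sketch builds it for a given template type. The helper names are hypothetical, and `field_names` assumes the response is an OData body of the form `{"value": [{"Name": ...}, ...]}`, which you should confirm against an actual reply.

```python
BASE_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1"
TEMPLATE_TYPES = "ThomsonReuters.Dss.Api.Extractions.ReportTemplates.ReportTemplateTypes"


def valid_fields_url(template_type: str) -> str:
    """GetValidContentFieldTypes URL for a report template type, e.g. 'NewsAnalytics'."""
    return (f"{BASE_URL}/Extractions/GetValidContentFieldTypes"
            f"(ReportTemplateType={TEMPLATE_TYPES}'{template_type}')")


def field_names(response_json: dict) -> list[str]:
    """List field names, assuming a reply shaped like {"value": [{"Name": ...}, ...]}."""
    return [field["Name"] for field in response_json.get("value", [])]
```

The same function also produces the URL for the News Analytics Commodities template type, NewsAnalyticsCommoditiesAndEnergy.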

 

Instrument-Less extractions

Instrument-less extractions are only available for News Analytics, News Analytics Commodities and News Items (and Corporate Actions – IPO Events).

When running an instrument-less News Analytics extraction, instead of using an instrument list, the criteria defined in your request are used to select the relevant news. This is an entirely different use case, where we concentrate on specific topics, languages and news sources.

Important note: instrument-less extractions can return much more data than those that use an instrument list, so it is important to keep the number of topics small to keep the result set size manageable. Remember that all news items matching any of the topics will be returned, so the more topics you set, the more results you will get (within the limits outlined in the Extraction limits section below).

In the GUI:

When creating the schedule, instead of an instrument list, you simply select the option “None”.

In the API:

The following instrument-less On Demand request retrieves News Analytics for Gold with a prevailing positive sentiment for the first 11 days of April 2018, from 2 news sources. If you compare this request to the one that used an instrument list, you will see that it has the same structure, except that the IdentifierList object has simply been removed:

{
  "ExtractionRequest": {
    "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.NewsAnalyticsExtractionRequest",
    "ContentFieldNames":
      ["Headline", "Story Body", "Story Date Time", "Take Date Time", "Created Date", "Novelty Timestamp",
       "Attribution", "Products", "Topics", "Language",
       "Relevance", "Sentiment", "Sentiment - Negative", "Sentiment - Neutral", "Sentiment - Positive"],
    "Condition": {
      "ReportDateRange": "Range",
      "QueryStartDate": "2018-04-01",
      "QueryEndDate": "2018-04-11",
      "NewsAnalyticsRelevanceOperator": "GreaterThanOrEqualTo",
      "NewsRelevanceValue": 0.5,
      "NewsAnalyticsPrevailingSentiment": "Positive",
      "NewsAnalyticsNovelty": "Novelty7Day",
      "NewsFilterNoveltyOperator": "GreaterThanOrEqualTo",
      "NewsNoveltyValue": 10,
      "NewsTopicsCodes":[ "GOL" ],
      "IncludeImbalace": true,
      "NewsAnalyticsSource": "ArticlesAndAlerts",
      "NewsItemsSource":"Selected",
      "NewsAttributionsCodes": [ "RTRS", "BSW" ]
    }
  }
}
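Because the only structural difference is the missing IdentifierList, an instrument-less body can be derived from an existing one. A hypothetical Python helper that does this on a copy, optionally overriding condition parameters (here the topic and the sentiment); the input body is abbreviated for brevity:

```python
import copy


def to_instrument_less(body: dict, **condition_overrides) -> dict:
    """Return a copy of an On Demand body without IdentifierList, with optional condition overrides."""
    result = copy.deepcopy(body)  # leave the caller's body untouched
    request = result["ExtractionRequest"]
    request.pop("IdentifierList", None)
    request["Condition"].update(condition_overrides)
    return result


with_list = {
    "ExtractionRequest": {
        "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.NewsAnalyticsExtractionRequest",
        "ContentFieldNames": ["Headline", "Relevance", "Sentiment"],
        "IdentifierList": {"InstrumentIdentifiers": [{"Identifier": "CARR.PA", "IdentifierType": "Ric"}]},
        "Condition": {"NewsTopicsCodes": ["CMPNY"], "NewsAnalyticsPrevailingSentiment": "Negative"},
    }
}
gold = to_instrument_less(with_list, NewsTopicsCodes=["GOL"],
                          NewsAnalyticsPrevailingSentiment="Positive")
```

With the instrument list gone, the topic list is what keeps the result set small, so choose it carefully.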

 

News Analytics Commodities

This section details the specifics for News Analytics Commodities that were not covered in the preceding section.

News Analytics Commodities can only be requested without using an instrument list.

 

Criteria

All parameters described above apply both to News Analytics and News Analytics Commodities requests.

Commodity Selection

This additional parameter applies only to News Analytics Commodities requests.

There are 42 commodity categories.

Examples: COF for coffee, GOL for gold, RFO for fuel oil, etc.

In the GUI:

In the API:

"Condition": {
    …
    "NewsCommoditiesCodes": [ "CRU", "OPEC" ],
    …
}

At least one commodity code must be set. If several commodities are set, all items matching any one of them will be returned. In other words, the commodities apply with an OR rule, so the more commodities you set, the more results you will get.

For extractions with a specified date range of 32 or more days, only the first 15 commodity codes in your input list will return data.
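Both rules can be sketched in Python. The helper names are hypothetical, and so is the way days are counted: this sketch assumes both ends of the range count, so 2018-01-01 to 2018-01-31 is 31 days (below the threshold) and 2018-01-01 to 2018-02-01 is 32.

```python
from datetime import date


def range_days(start: date, end: date) -> int:
    """Days in the query range; assumes both the start and end dates count."""
    return (end - start).days + 1


def effective_commodity_codes(codes: list[str], start: date, end: date) -> list[str]:
    """Commodity codes that will actually return data for this date range."""
    if not codes:
        raise ValueError("at least one commodity code must be set")
    # For ranges of 32 days or more, only the first 15 codes return data.
    return list(codes[:15]) if range_days(start, end) >= 32 else list(codes)


def item_matches(item_codes: set[str], requested: list[str]) -> bool:
    """OR rule: an item matches if it carries any one of the requested codes."""
    return not item_codes.isdisjoint(requested)
```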

The list of commodities can be queried using the API, with a GET to this endpoint:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/NewsAnalyticsCommoditiesReportTemplateGetNewsCommodityTypes

 

Output data fields

The list of all available fields for a News Analytics Commodities extraction can be studied in the GUI by creating a report template, or queried using the API, with a GET to this endpoint:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/GetValidContentFieldTypes(ReportTemplateType=ThomsonReuters.Dss.Api.Extractions.ReportTemplates.ReportTemplateTypes'NewsAnalyticsCommoditiesAndEnergy')

 

Complete request examples

To conclude this section specific to News Analytics Commodities, let us see a few example requests.

Here is an example request to create a News Analytics Commodities template, in Postman:

Method: POST

Endpoint: https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/NewsAnalyticsCommoditiesReportTemplates

Body:

{
  "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ReportTemplates.NewsAnalyticsCommoditiesReportTemplate",
  "ShowColumnHeaders": false,
  "Name": "myNewsAnalyticsCommoditiesTemplateName",
  "CompressionType": "Zip",
  "Headers": [],
  "Trailers": [],
  "ContentFields": [
    { "FieldName": "Headline" },
    { "FieldName": "Story Body" },
    { "FieldName": "Story Date Time" },
    { "FieldName": "Take Date Time" },
    { "FieldName": "Created Date" },
    { "FieldName": "Novelty Timestamp" },
    { "FieldName": "Attribution" },
    { "FieldName": "Products" },
    { "FieldName": "Topics" },
    { "FieldName": "Language" },
    { "FieldName": "Relevance" },
    { "FieldName": "Sentiment" },
    { "FieldName": "Sentiment - Negative" },
    { "FieldName": "Sentiment - Neutral" },
    { "FieldName": "Sentiment - Positive" }
  ],
  "Condition": {
    "ReportDateRange": "Range",
    "QueryStartDate": "2018-01-01",
    "QueryEndDate": "2018-01-31",
    "NewsAnalyticsRelevanceOperator": "GreaterThanOrEqualTo",
    "NewsRelevanceValue": 0.5,
    "NewsAnalyticsPrevailingSentiment": "Positive",
    "NewsAnalyticsNovelty": "Novelty7Day",
    "NewsFilterNoveltyOperator": "GreaterThanOrEqualTo",
    "NewsNoveltyValue": 20,
    "NewsTopicsCodes":[ "NSEA" ],
    "IncludeImbalace": true,
    "NewsAnalyticsSource": "ArticlesAndAlerts",
    "NewsItemsSource":"Selected",
    "NewsAttributionsCodes": [ "RTRS", "BSW" ],
    "NewsCommoditiesCodes":[ "CRU" ]
  }
}

 

The REST API also allows making an On Demand request for News Analytics Commodities (the GUI does not deliver this functionality).

Here is an example in Postman, using the same parameters as the example above to make an On Demand request for News Analytics Commodities:

Method: POST

Endpoint: https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotes

Body:

{
  "ExtractionRequest": {
    "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.NewsAnalyticsCommoditiesExtractionRequest",
    "ContentFieldNames":
      ["Headline", "Story Body", "Story Date Time", "Take Date Time", "Created Date", "Novelty Timestamp",
       "Attribution", "Products", "Topics", "Language",
       "Relevance", "Sentiment", "Sentiment - Negative", "Sentiment - Neutral", "Sentiment - Positive"],
    "Condition": {
      "ReportDateRange": "Range",
      "QueryStartDate": "2018-01-01",
      "QueryEndDate": "2018-01-31",
      "NewsAnalyticsRelevanceOperator": "GreaterThanOrEqualTo",
      "NewsRelevanceValue": 0.5,
      "NewsAnalyticsPrevailingSentiment": "Positive",
      "NewsAnalyticsNovelty": "Novelty7Day",
      "NewsFilterNoveltyOperator": "GreaterThanOrEqualTo",
      "NewsNoveltyValue": 20,
      "NewsTopicsCodes":[ "NSEA" ],
      "IncludeImbalace": true,
      "NewsAnalyticsSource": "ArticlesAndAlerts",
      "NewsItemsSource":"Selected",
      "NewsAttributionsCodes": [ "RTRS", "BSW" ],
      "NewsCommoditiesCodes": [ "CRU" ]
    }
  }
}
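Comparing this body with the News Analytics one, the differences are the @odata.type, the absence of an IdentifierList and the added NewsCommoditiesCodes. A hypothetical Python helper applying exactly those changes to a copy of a News Analytics On Demand body:

```python
import copy

NAC_REQUEST = ("#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests."
               "NewsAnalyticsCommoditiesExtractionRequest")


def to_commodities_request(body: dict, commodity_codes: list[str]) -> dict:
    """Turn a News Analytics On Demand body into a News Analytics Commodities one."""
    if not commodity_codes:
        raise ValueError("at least one commodity code must be set")
    result = copy.deepcopy(body)
    request = result["ExtractionRequest"]
    request["@odata.type"] = NAC_REQUEST
    request.pop("IdentifierList", None)  # commodities requests never use an instrument list
    request["Condition"]["NewsCommoditiesCodes"] = list(commodity_codes)
    return result
```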

 

Extraction limits

Number of concurrent requests

To support consistent performance and optimize response times for the most users, DataScope Select applies request execution limits and queuing on a per-template basis, which allows for more granular resource balancing. This is applied across all product interfaces: GUI, REST API, SOAP API, and FTP.

The limit is 50 concurrent requests per user per template, for each of the news analytics templates:

  • News Analytics
  • News Analytics Commodities

If you reach the per-user limit on the number of extraction requests for a template, any additional extraction requests that you submit against that template will fail, with the extraction notes explaining why. Requests submitted via the REST or SOAP API will also return HTTP status 429 (Too Many Requests).
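The documentation only states that excess requests fail with HTTP 429; how to react is up to the client. One common approach, sketched here in Python as an assumption rather than DSS guidance, is to retry with exponential backoff. The `send` callable is hypothetical and returns the HTTP status, which keeps the retry logic testable without a network; `make_sender` shows one way to build it with urllib.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def submit_with_retry(send: Callable[[], int], max_attempts: int = 5,
                      base_delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Call `send` until it returns something other than 429, doubling the wait each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # base_delay, 2 * base_delay, 4 * base_delay, ...
    return status  # still 429 after the last attempt


def make_sender(request: urllib.request.Request) -> Callable[[], int]:
    """Wrap a prepared request so that each call sends it and returns the HTTP status."""
    def send() -> int:
        try:
            with urllib.request.urlopen(request) as response:
                return response.status
        except urllib.error.HTTPError as err:
            return err.code
    return send
```

Because the limit is per user and per template, keeping at most 50 News Analytics requests in flight at any time avoids the 429 in the first place.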

Number of instruments

Extraction processing limits for the GUI, FTP, and REST API platforms are the same and are identified by report type. Limits for File Codes and Chains both before and after expansion are shown. File Codes and Chains are expanded upon extraction only. A row of data will be returned for each expanded instrument in the File Code or Chain, up to the expansion limit identified below.

News Analytics

Instrument limits depend on the number of days for which you retrieve news:

  • 0-7 Days – 50,000 Instruments
  • 8-31 Days – 25,000 Instruments
  • 32-366 Days – 10,000 Instruments
  • 367+ Days – 5,000 Instruments

The maximum number of records that can be returned for instrument-less (RIC-less) News Analytics extractions is 400,000.
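The limits above translate directly into a lookup. A Python sketch with hypothetical names; it assumes the number of days is counted inclusively from the start to the end of the query range, which the documentation does not spell out, and it does not account for File Code or Chain expansion.

```python
from datetime import date

# (longest range in days, maximum instruments) for News Analytics, from the list above.
INSTRUMENT_LIMITS = [(7, 50_000), (31, 25_000), (366, 10_000)]
LONGEST_RANGE_LIMIT = 5_000  # 367 days or more


def max_instruments(start: date, end: date) -> int:
    """Instrument limit for a News Analytics extraction over [start, end]."""
    days = (end - start).days + 1  # assumption: both ends of the range count
    for max_days, limit in INSTRUMENT_LIMITS:
        if days <= max_days:
            return limit
    return LONGEST_RANGE_LIMIT


def list_fits(instrument_count: int, start: date, end: date) -> bool:
    """True if an (already expanded) instrument list stays within the limit."""
    return instrument_count <= max_instruments(start, end)
```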

News Analytics Commodities

For extractions with a specified date range of 32 or more days, only the first 15 commodity codes in your input list will return data.

 

Conclusions

In this article we looked at the details of how to retrieve News Analytics, both for equities and commodities, and we analyzed the request parameters in detail.