Retrieving News Items using the DSS REST API

Introduction

This first article on news delves into News Items retrieval with DSS (DataScope Select) and the DSS REST API. It highlights differences between the behavior of the DSS GUI and the REST API, and describes the query parameters in more detail than the documentation.

A second companion article covers News Analytics.

You need some basic knowledge of the DSS GUI and the DSS REST API to understand this article. If you want to test the code, you also need a valid DSS user account.

 

News Items

News items are current and historical story bodies and headers from Reuters and global third-party sources, published in a variety of languages.

A news item can be a single-line alert, full news article, or an update to an existing news article, as determined by the news feed handler.

Note: the only supported asset class for news items is equity.

 

Permissioning

Access to third-party news content via News Items reports requires additional permissions from Thomson Reuters. Please contact your local account manager or sales specialist for more information.

 

Code extracts used in this article

The code extracts are based on pure HTTP requests. To try them out, you can copy and paste them into a REST client such as Postman, as explained in the REST API Tutorials Introduction. You can also download the associated Postman environment and collection to avoid copying and pasting. All request headers must contain an authentication token; refer to REST API Tutorial 1 for details.
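As a rough illustration of the authentication step, here is a minimal Python sketch that only builds the requests and does not send them. The token endpoint path, the `Credentials` body, the `value` field of the response and the `Authorization: Token <token>` header format are assumptions taken from the REST API Tutorial 1 workflow, not from this article; check them against the tutorial before relying on them.

```python
import json
import urllib.request

BASE_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1"


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Build, but do not send, the token request (assumed endpoint and body shape)."""
    body = {"Credentials": {"Username": username, "Password": password}}
    return urllib.request.Request(
        url=f"{BASE_URL}/Authentication/RequestToken",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_body: bytes) -> str:
    """Read the token from the JSON response (assumed to be in "value")."""
    return json.loads(response_body)["value"]


def auth_headers(token: str) -> dict:
    """Headers to attach to every subsequent request (assumed header format)."""
    return {"Authorization": f"Token {token}", "Content-Type": "application/json"}
```

To actually send the token request, pass it to `urllib.request.urlopen` and feed the response body to `extract_token`.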

The full workflow for some requests is omitted for clarity, but when required you will find pointers or links to the tutorials or documentation where the details are explained. For more information, refer to the DSS Tutorials set.

 

Use cases

News Items, unlike most other data types available in DSS, can be requested with or without an instrument list. This caters for two very different use cases.

When using an instrument list, we focus on retrieving news for a predefined set of instruments, applying a set of criteria (date range, topics, languages and news sources) to narrow down the result set. A typical use case could be finding news for a specific instrument, for display inside a portfolio management system.

With an instrument-less request, we only apply a set of criteria (date range, topics, languages and news sources) to find a set of news items, without limiting it to a list of instruments. This is a kind of search, where we want to find news that match specific criteria. A typical use case could be research, for instance finding all news on South Africa for a specific day.

News Analytics, which is the subject of a separate article, goes one step further, using relevance and sentiment analysis obtained through sophisticated Natural Language Processing (NLP) techniques.

Apart from the presence or absence of an instrument list, the extraction parameters of a News Items request are the same for both use cases. These parameters are covered in the next section of this article. The specifics of instrument-less requests are covered later, under the heading: Instrument-Less extractions.

 

Extractions

This section is common to extractions with or without instrument list.

Data extractions can be scheduled or On Demand. If you are not familiar with these two workflows, read the DSS Tutorials Introduction to learn more.

A data extraction can be scheduled in the GUI or with the REST API; this requires a News Items template and a schedule, and an (optional) instrument list.

The template is the element that is specific to the News Items extraction; the instrument list and schedule are generic, so we ignore them in what follows.

Here is an example request to create a News Items template, in Postman:

Method: POST

Endpoint: https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/NewsItemsReportTemplates

Body:

{
  "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ReportTemplates.NewsItemsReportTemplate",
  "ShowColumnHeaders": false,
  "Name": "myNewsItemsTemplateName",
  "CompressionType": "Zip",
  "Headers": [],
  "Trailers": [],
  "ContentFields": [
    { "FieldName": "Headline" },
    { "FieldName": "Story Body" },
    { "FieldName": "Story Date Time" },
    { "FieldName": "Take Date Time" },
    { "FieldName": "Attribution" },
    { "FieldName": "Products" },
    { "FieldName": "Topics" },
    { "FieldName": "Language" }
  ],
  "Condition": {
    "ReportDateRangeType": "Range",
    "QueryStartDate": "2018-04-09",
    "QueryEndDate": "2018-04-09",
    "NewsTopicsCodes":[ "CMPNY","FR","EUROP","TECH" ],
    "NewsItemsLanguage":"Selected",
    "NewsLanguagesCodes": [ "EN","FR" ],
    "NewsItemsSource":"Selected",
    "NewsAttributionsCodes": [ "RTRS", "BSW" ]
  }
}
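The same template creation request can also be assembled programmatically. This minimal Python sketch builds the request with the standard library but does not send it. The `Authorization: Token <token>` header format is an assumption carried over from the tutorials, and `build_template_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1"
TEMPLATE_TYPE = "#ThomsonReuters.Dss.Api.Extractions.ReportTemplates.NewsItemsReportTemplate"


def build_template_request(token, name, fields, condition):
    """Build, but do not send, a POST creating a News Items report template."""
    body = {
        "@odata.type": TEMPLATE_TYPE,
        "ShowColumnHeaders": False,
        "Name": name,
        "CompressionType": "Zip",
        "Headers": [],
        "Trailers": [],
        "ContentFields": [{"FieldName": field} for field in fields],
        "Condition": condition,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/Extractions/NewsItemsReportTemplates",
        data=json.dumps(body).encode("utf-8"),
        # Assumed header format, as in the REST API tutorials.
        headers={"Authorization": f"Token {token}", "Content-Type": "application/json"},
        method="POST",
    )


request = build_template_request(
    token="<your token>",
    name="myNewsItemsTemplateName",
    fields=["Headline", "Story Body", "Story Date Time", "Take Date Time"],
    condition={
        "ReportDateRangeType": "Range",
        "QueryStartDate": "2018-04-09",
        "QueryEndDate": "2018-04-09",
        "NewsTopicsCodes": ["CMPNY", "FR", "EUROP", "TECH"],
        "NewsItemsLanguage": "Selected",
        "NewsLanguagesCodes": ["EN", "FR"],
        "NewsItemsSource": "Selected",
        "NewsAttributionsCodes": ["RTRS", "BSW"],
    },
)
```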

 

The REST API also supports On Demand requests for news items; the GUI does not offer this functionality.

Here is an example in Postman, using the same parameters as the example above to make an On Demand request:

Method: POST

Endpoint: https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/ExtractWithNotes

Body:

{
  "ExtractionRequest": {
    "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.NewsItemsExtractionRequest",
    "ContentFieldNames":
      ["Headline", "Story Body", "Story Date Time", "Take Date Time", "Attribution", "Products", "Topics", "Language"],
    "IdentifierList": {
      "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList",  
      "InstrumentIdentifiers": [
        { "Identifier": "CARR.PA", "IdentifierType": "Ric" }
      ],
      "ValidationOptions": {"AllowHistoricalInstruments": true},
      "UseUserPreferencesForValidationOptions": false
    },
    "Condition": {
      "ReportDateRangeType": "Range",
      "QueryStartDate": "2018-04-09",
      "QueryEndDate": "2018-04-09",
      "NewsTopicsCodes":[ "CMPNY","FR","EUROP","TECH" ],
      "NewsItemsLanguage":"Selected",
      "NewsLanguagesCodes": [ "EN","FR" ],
      "NewsItemsSource": "Selected",
      "NewsAttributionsCodes": [ "RTRS", "BSW" ]
    }
  }
}
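Hand-editing JSON like this makes it easy to drop a comma between two properties, which any JSON parser will reject. The minimal Python sketch below builds the same On Demand body from native data structures and lets `json.dumps` handle serialization; `build_on_demand_body` is a hypothetical helper, and the resulting payload would be posted to the ExtractWithNotes endpoint.

```python
import json

REQUEST_TYPE = "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.NewsItemsExtractionRequest"
LIST_TYPE = "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.InstrumentIdentifierList"


def build_on_demand_body(fields, condition, rics):
    """Body for an On Demand News Items request that uses an instrument list."""
    return {
        "ExtractionRequest": {
            "@odata.type": REQUEST_TYPE,
            "ContentFieldNames": list(fields),
            "IdentifierList": {
                "@odata.type": LIST_TYPE,
                "InstrumentIdentifiers": [
                    {"Identifier": ric, "IdentifierType": "Ric"} for ric in rics
                ],
                "ValidationOptions": {"AllowHistoricalInstruments": True},
                "UseUserPreferencesForValidationOptions": False,
            },
            "Condition": condition,
        }
    }


payload = json.dumps(
    build_on_demand_body(
        fields=["Headline", "Story Body", "Story Date Time", "Take Date Time"],
        condition={
            "ReportDateRangeType": "Range",
            "QueryStartDate": "2018-04-09",
            "QueryEndDate": "2018-04-09",
            "NewsTopicsCodes": ["CMPNY", "FR", "EUROP", "TECH"],
            "NewsItemsLanguage": "Selected",
            "NewsLanguagesCodes": ["EN", "FR"],
            "NewsItemsSource": "Selected",
            "NewsAttributionsCodes": ["RTRS", "BSW"],
        },
        rics=["CARR.PA"],
    ),
    indent=2,
)
```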

 

Extraction parameters

Let us look at the details of the extraction request parameters.

Note: in what follows, illustrations from the GUI are from the creation of a News Items template, and those from Postman are from the On Demand request.

Report date range type

This defines the time frame, which can be a Range query or a Delta query.

Range query

This is defined by a start and an end date.

In the GUI:

In the API:

"Condition": {
      "ReportDateRangeType": "Range",
      "QueryStartDate": "2018-02-28",
      "QueryEndDate": "2018-02-28",
    …
}

Important notes:

The start and end dates are dates only; any time component you add is ignored.

News will be delivered for the entire day(s) set in the request, starting at 00:00:00 and ending at 23:59:59, in the time zone set in the user preferences in the GUI.

If the end date is today, the effective end time is 20 minutes before the extraction run. In other words, results will only contain data up to 20 minutes before the extraction started.
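These rules amount to a small calculation. The Python sketch below computes the effective time window of a Range query; `effective_window` is a hypothetical helper, and the run time must be passed as a timezone-aware datetime in the time zone of your user preferences. It is a client-side illustration of the behavior described above, not DSS code.

```python
from datetime import date, datetime, time, timedelta, timezone


def effective_window(start: date, end: date, run_time: datetime):
    """Return the first and last instants covered by a Range query.

    run_time must be timezone-aware, in the time zone of the user preferences.
    """
    tz = run_time.tzinfo
    first = datetime.combine(start, time(0, 0, 0), tzinfo=tz)
    last = datetime.combine(end, time(23, 59, 59), tzinfo=tz)
    if end == run_time.date():
        # End date is today: data stops 20 minutes before the extraction run.
        last = min(last, run_time - timedelta(minutes=20))
    return first, last


if __name__ == "__main__":
    tz = timezone(timedelta(hours=2))  # example offset only
    run = datetime(2018, 4, 9, 15, 0, tzinfo=tz)
    print(effective_window(date(2018, 4, 9), date(2018, 4, 9), run))
```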

Delta query

This is defined as an offset back in time from the moment the extraction runs: news is retrieved from that starting point until now.

In the GUI:

You must choose either a number of days, or a number of hours and minutes (minutes in multiples of 15).

In the API:

The API is more flexible than the GUI; you can combine any number of days, hours and minutes:

"Condition": {
    "ReportDateRangeType": "Delta",
    "DaysAgo": 1,
    "HoursAgo": 20,
    "MinutesAgo": 40,
    …
}
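A minimal Python sketch of what a Delta condition means in practice: `delta_condition` builds the JSON fragment, `delta_start` computes the start of the window relative to the extraction time, and `gui_allows` encodes our reading of the GUI restriction described above (days alone, or hours and minutes with minutes in multiples of 15). All three are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def delta_condition(days=0, hours=0, minutes=0):
    """Condition fragment for a Delta query."""
    return {
        "ReportDateRangeType": "Delta",
        "DaysAgo": days,
        "HoursAgo": hours,
        "MinutesAgo": minutes,
    }


def delta_start(now, days=0, hours=0, minutes=0):
    """Start of the window: the delta is counted back from the extraction time."""
    return now - timedelta(days=days, hours=hours, minutes=minutes)


def gui_allows(days=0, hours=0, minutes=0):
    """True if the GUI could express this delta (our reading of its options)."""
    if days and (hours or minutes):
        return False  # the GUI does not mix days with hours/minutes
    return minutes % 15 == 0


if __name__ == "__main__":
    print(delta_start(datetime.now(timezone.utc), days=1, hours=20, minutes=40))
```

The API accepts the combination `DaysAgo: 1, HoursAgo: 20, MinutesAgo: 40` used above, while `gui_allows(1, 20, 40)` returns False.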

 

Criteria

Story topics

This defines what the news item talks about, i.e. subject topics.

There are more than 1,500 topic codes, covering a wide range of subjects such as asset classes (ABS for Asset Backed Securities), event types (SL1 for Bond Sales, PRIV for Privatisations), business sectors (CHEM for Chemicals), commodities (PLAT for Platinum), crime (BRIB for Bribery), geography (CH for Switzerland, USCNYC for New York), organisations (G7), sports (TRIA for Triathlon), entertainment (FLM for Film), etc.

A detailed list with description is available in the DSS Data Content Guide, appendix G.

In the GUI:

In the API:

"Condition": {
    …
      "NewsTopicsCodes":[ "CA","US","WOM" ],
    …
}

If NewsTopicsCodes is not set, no topic filtering is applied: news items on all topics are returned.

If topics are set, all news items matching any one of the topics will be returned. Said differently, the topics apply with an OR rule, so the more topics you set, the more results you will get.
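To make the OR rule concrete, this Python sketch filters a few items client-side the way the topic criterion selects them server-side. `matches_topics` is a hypothetical helper; the sample items and their topic lists are invented for illustration and do not reflect the actual format of the Topics output field.

```python
def matches_topics(item_topics, requested_topics):
    """Client-side illustration of the topic criterion (OR rule)."""
    if not requested_topics:
        return True  # NewsTopicsCodes not set: no topic filtering
    # An item matches if it shares at least one topic with the request.
    return not set(item_topics).isdisjoint(requested_topics)


items = [
    {"Headline": "item 1", "Topics": ["CA", "ENER"]},
    {"Headline": "item 2", "Topics": ["US"]},
    {"Headline": "item 3", "Topics": ["JP"]},
]

selected = [i["Headline"] for i in items if matches_topics(i["Topics"], ["CA", "US", "WOM"])]
# selected == ["item 1", "item 2"]
```

Adding "JP" to the requested topics would also select item 3: each extra topic can only widen the result set.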

The list of topics can be queried using the API, with a GET to this endpoint:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/NewsItemsReportTemplateGetNewsTopicTypes

 

Languages

This is the language in which the news item is written. There are 28 available languages.

Examples: FR for French, ZH for Chinese, etc.

In the API:

"Condition": {
    …
      "NewsItemsLanguage":"Selected",
      "NewsLanguagesCodes": [ "AR","EN","JA" ],
    …
}

NewsItemsLanguage can take values “AllLanguages”, “English” or “Selected”.

If NewsItemsLanguage is not set, it defaults to “English”.

The list of languages can be queried using the API, with a GET to this endpoint:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/NewsItemsReportTemplateGetNewsItemsLanguageTypes

Encoding: news stories are encoded in the UTF-8 character set, which allows DataScope Select to output text in a wide variety of languages. Your operating system must support the selected language to display the stories correctly; if it does not, unsupported characters will appear garbled in your extraction results.
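When processing extraction files programmatically, it is safer to decode them explicitly as UTF-8 than to rely on the platform's default encoding. A minimal Python sketch; `decode_story` and `save_stories` are hypothetical helper names.

```python
def decode_story(raw: bytes) -> str:
    """Decode extracted text explicitly as UTF-8, whatever the platform default is."""
    return raw.decode("utf-8")


def save_stories(path: str, stories: list) -> None:
    """Write stories with an explicit UTF-8 encoding so non-Latin scripts survive."""
    with open(path, "w", encoding="utf-8") as out:
        for story in stories:
            out.write(story + "\n")
```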

 

News Items Source

This is the source of the news, i.e. who published it. There are 45 news sources.

Examples: BSW for Business Wire, RNS for Regulatory News Service, etc.

A detailed list with description is available in the DSS Data Content Guide, appendix M.

In the GUI:

In the API:

"Condition": {
    …
      "NewsItemsSource": "Selected",
      "NewsAttributionsCodes": ["DGP", "GNW", "RTRS"]
    …
}

NewsItemsSource can take values “AllNews”, “ReutersNews” or “Selected”.

If NewsItemsSource is not set, it defaults to “AllNews”.
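The three criteria and their defaults can be gathered in one helper. This Python sketch fills in the documented defaults (“English” and “AllNews”) so that the returned fragment shows what is effectively applied. Rejecting a “Selected” mode without codes is our own assumption, not a documented API rule, and `build_criteria` is a hypothetical name.

```python
LANGUAGE_MODES = {"AllLanguages", "English", "Selected"}
SOURCE_MODES = {"AllNews", "ReutersNews", "Selected"}


def build_criteria(topics=None, language_mode=None, languages=None,
                   source_mode=None, attributions=None):
    """Build the criteria part of a News Items Condition, defaults made explicit."""
    language_mode = language_mode or "English"  # documented default
    source_mode = source_mode or "AllNews"      # documented default
    if language_mode not in LANGUAGE_MODES:
        raise ValueError(f"unknown NewsItemsLanguage: {language_mode}")
    if source_mode not in SOURCE_MODES:
        raise ValueError(f"unknown NewsItemsSource: {source_mode}")

    criteria = {"NewsItemsLanguage": language_mode, "NewsItemsSource": source_mode}
    if topics:
        criteria["NewsTopicsCodes"] = list(topics)
    # Assumption: a "Selected" mode is meaningless without a list of codes.
    if language_mode == "Selected":
        if not languages:
            raise ValueError("NewsItemsLanguage 'Selected' needs NewsLanguagesCodes")
        criteria["NewsLanguagesCodes"] = list(languages)
    if source_mode == "Selected":
        if not attributions:
            raise ValueError("NewsItemsSource 'Selected' needs NewsAttributionsCodes")
        criteria["NewsAttributionsCodes"] = list(attributions)
    return criteria
```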

The list of sources can be queried using the API, with a GET to this endpoint:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/NewsItemsReportTemplateGetNewsAttributionTypes
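The three lookup endpoints (topics, languages and sources) follow the same pattern, so one helper can cover them all. A minimal Python sketch that builds the GET requests without sending them; the `Authorization: Token <token>` header and the OData-style `value` array in the response are assumptions based on the tutorials, and no particular field names are assumed for the returned entries.

```python
import json
import urllib.request

BASE_URL = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/"

LOOKUPS = {
    "topics": "NewsItemsReportTemplateGetNewsTopicTypes",
    "languages": "NewsItemsReportTemplateGetNewsItemsLanguageTypes",
    "sources": "NewsItemsReportTemplateGetNewsAttributionTypes",
}


def build_lookup_request(kind: str, token: str) -> urllib.request.Request:
    """Build, but do not send, the GET for one of the three code lists."""
    return urllib.request.Request(
        url=BASE_URL + LOOKUPS[kind],
        headers={"Authorization": f"Token {token}"},  # assumed header format
        method="GET",
    )


def parse_lookup(response_body: bytes) -> list:
    """Assumption: OData-style response with the entries in a "value" array."""
    return json.loads(response_body)["value"]
```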

 

Output data fields

Here we select the data we want to retrieve. The most obvious fields for a news item are the Headline and the Story Body, but more than 50 other fields are available, relating to the timestamp, language, topic, economic sector, industry code, exchange, instrument codes, etc. Let us highlight a few:

Two timestamps, both in GMT, are available:

  • Story Date Time - Date and time when the first alert or take in the story was filed
  • Take Date Time - Date and time of the news item

The Take Date Time is always greater than or equal to the Story Date Time.
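A short Python sketch that parses both timestamps, checks this ordering, and returns the delay between the first alert and the current take. It assumes the values arrive as ISO 8601 strings, which may not match the actual output format; `take_delay` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def take_delay(story_date_time: str, take_date_time: str) -> timedelta:
    """Return how long after the first alert this take was filed.

    Assumes ISO 8601 strings in GMT; adapt the parsing to the real output format.
    """
    story = datetime.fromisoformat(story_date_time)
    take = datetime.fromisoformat(take_date_time)
    if take < story:
        raise ValueError("Take Date Time precedes Story Date Time")
    return take - story
```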

Each news item is assigned identifiers that you can include in your extractions:

  • PNAC – Primary News Access Code.

    A semi-unique story identifier that is often reused. However, before a PNAC is recycled, a "delete" message is issued for it.
  • Unique Story Index – Unique index for all news items on the same story, combining the Story Date Time and PNAC fields.
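Because a PNAC can be recycled, grouping takes on the PNAC alone can merge unrelated stories. This hypothetical Python helper groups items on the pair (Story Date Time, PNAC), the two fields the Unique Story Index combines, rather than parsing the index itself, whose exact format is not described here. The sample items and PNAC values used with it are invented.

```python
from collections import defaultdict


def group_by_story(items):
    """Group news items (takes) by story, keyed on (Story Date Time, PNAC)."""
    stories = defaultdict(list)
    for item in items:
        # A PNAC alone may be reused later, so pair it with the story timestamp.
        stories[(item["Story Date Time"], item["PNAC"])].append(item)
    return dict(stories)
```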

The Products field delivers a space-separated list of product codes. A list of these codes, with their descriptions, is available in appendix H of the DSS Data Content Guide.
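Splitting the Products field is straightforward; this Python sketch also counts how often each product code occurs across a set of items. `product_codes` and `product_frequencies` are hypothetical helpers, and any codes passed to them in testing are placeholders, not real DSS product codes.

```python
from collections import Counter


def product_codes(products: str) -> list:
    """Split the space-separated Products field into individual codes."""
    return products.split()


def product_frequencies(items) -> Counter:
    """Count product codes across items; items without Products contribute nothing."""
    return Counter(
        code for item in items for code in product_codes(item.get("Products", ""))
    )
```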

There is also a set of fields for TRBC codes (Thomson Reuters Business Classification) that cover activity, business and economic sectors, and industry.

The list of all available fields for a News Items extraction can be studied in the GUI by creating a report template, or queried using the API, with a GET to this endpoint:

https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/GetValidContentFieldTypes(ReportTemplateType=ThomsonReuters.Dss.Api.Extractions.ReportTemplates.ReportTemplateTypes'NewsItems')
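A minimal Python sketch that builds this URL for a given report template type and extracts the field names from the response. It assumes the response lists the fields in an OData `value` array whose entries carry a `Name` property; verify this against an actual response. `content_fields_url` and `field_names` are hypothetical helpers.

```python
import json

BASE = "https://hosted.datascopeapi.reuters.com/RestApi/v1/Extractions/"


def content_fields_url(report_type: str = "NewsItems") -> str:
    """URL listing the valid content fields for a report template type."""
    return (BASE + "GetValidContentFieldTypes(ReportTemplateType="
            f"ThomsonReuters.Dss.Api.Extractions.ReportTemplates.ReportTemplateTypes'{report_type}')")


def field_names(response_body: bytes) -> list:
    """Assumption: entries in "value" each carry a "Name" property."""
    return sorted(entry["Name"] for entry in json.loads(response_body)["value"])
```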

 

Instrument-Less extractions

Instrument-less extractions are only available for News Items, News Analytics and News Analytics Commodities (and Corporate Actions – IPO Events).

When running an instrument-less News Items extraction, instead of using an instrument list, the criteria defined in your request are used to select the relevant news. This is an entirely different use case, where we concentrate on specific topics, languages and news sources.

Important note: instrument-less extractions can return much more data than those that use an instrument list, so it is important to select a small number of narrowly defined topics to keep the result set manageable. Remember that all news items matching any of the topics will be returned, so the more topics you set, the more results you will get (within the limits outlined in the Extraction Limits section below).

In the GUI:

When creating the schedule, instead of an instrument list, you simply select the option “None”.

In the API:

The following instrument-less On Demand request, posted to the same ExtractWithNotes endpoint, retrieves all English-language News Items on South Africa for the 9th of April 2018, from all news sources. If you compare it to the request that used an instrument list, you will see that it has the same structure, except that the IdentifierList object has been removed:

{
  "ExtractionRequest": {
    "@odata.type": "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.NewsItemsExtractionRequest",
    "ContentFieldNames":
      ["Headline", "Story Body", "Story Date Time", "Take Date Time", "Attribution", "Products", "Topics", "Language"],
    "Condition": {
      "ReportDateRangeType": "Range",
      "QueryStartDate": "2018-04-09",
      "QueryEndDate": "2018-04-09",
      "NewsTopicsCodes":[ "ZA" ],
      "NewsItemsLanguage":"English",
      "NewsItemsSource":"AllNews"
    }
  }
}
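The same difference can be shown in code. This Python sketch builds the instrument-less body; its refusal to build a request without topics is a hypothetical safeguard reflecting the advice on restricting topics, not a rule enforced by the API, and `build_instrument_less_body` is a hypothetical name.

```python
import json

REQUEST_TYPE = "#ThomsonReuters.Dss.Api.Extractions.ExtractionRequests.NewsItemsExtractionRequest"


def build_instrument_less_body(fields, condition):
    """On Demand News Items body without an IdentifierList (instrument-less)."""
    if not condition.get("NewsTopicsCodes"):
        # Hypothetical safeguard: without topics, all topics would be returned.
        raise ValueError("set at least one topic for an instrument-less request")
    return {
        "ExtractionRequest": {
            "@odata.type": REQUEST_TYPE,
            "ContentFieldNames": list(fields),
            "Condition": condition,
        }
    }


body = build_instrument_less_body(
    fields=["Headline", "Story Body", "Story Date Time", "Take Date Time"],
    condition={
        "ReportDateRangeType": "Range",
        "QueryStartDate": "2018-04-09",
        "QueryEndDate": "2018-04-09",
        "NewsTopicsCodes": ["ZA"],
        "NewsItemsLanguage": "English",
        "NewsItemsSource": "AllNews",
    },
)
print(json.dumps(body, indent=2))
```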

 

Extraction Limits

Number of concurrent requests

To support consistent performance and optimize response times for as many users as possible, DataScope Select applies request execution limits and queuing on a per-template basis, which allows more granular resource balancing. This applies across all product interfaces: GUI, REST API, SOAP API, and FTP.

The limit is 50 concurrent requests per user for the News Items template.

If you reach the per-user limit on the number of extraction requests for that template, any additional extraction request that you submit against it will fail, and the extraction notes will explain why. Requests submitted via the REST or SOAP API also return HTTP status 429 (Too Many Requests).
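Client code can react to HTTP status 429 by waiting before resubmitting. A minimal Python sketch: `with_retry_on_429` is a hypothetical helper, `send` stands for whatever function submits your request with `urllib`, and the backoff delays are arbitrary choices rather than DataScope Select recommendations.

```python
import time
import urllib.error


def with_retry_on_429(send, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call send(); on HTTP 429, wait with exponential backoff and try again.

    Any other HTTP error, or a 429 on the last attempt, is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ...
```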

Number of instruments

Extraction processing limits are the same for the GUI, FTP, and REST API platforms, and depend on the report type. File Codes and Chains are expanded at extraction time only; a row of data is returned for each expanded instrument in the File Code or Chain, up to the applicable expansion limit.

News Items

Instrument limits depend on the number of days for which you retrieve news:

  • 0-7 Days – 50,000 Instruments
  • 8-31 Days – 25,000 Instruments
  • 32-366 Days – 10,000 Instruments
  • 367+ Days – 5,000 Instruments
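These thresholds can be checked before submitting a request. In this Python sketch, counting both the start and end dates of the range is an assumption (news is delivered for whole days), and both helpers are hypothetical.

```python
from datetime import date


def days_in_range(start: date, end: date) -> int:
    # Assumption: both the start and the end date count as full days.
    return (end - start).days + 1


def max_instruments(days: int) -> int:
    """Instrument limit for a News Items extraction covering the given number of days."""
    if days <= 7:
        return 50_000
    if days <= 31:
        return 25_000
    if days <= 366:
        return 10_000
    return 5_000
```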

The maximum number of records that can be returned by an instrument-less (RIC-less) News Items extraction is 400,000.

 

Conclusions

In this article we saw several ways of retrieving News Items, corresponding to different use cases, and we analyzed the request parameters in detail.

If you are also interested in News Analytics, proceed to the Retrieving News Analytics using DSS article.