MRN & N2_UBMS Comparison and Migration Guide

This article is intended for Elektron and TREP API developers who wish to migrate from N2_UBMS news to Machine Readable News (MRN). It highlights the key differences between N2_UBMS and MRN, and gives examples to help Elektron and TREP developers understand how to replace N2_UBMS news with MRN.

N2_UBMS REAL-TIME NEWS

This section describes how N2_UBMS is constructed and implemented. It aims to help new developers who may not have background knowledge of N2_UBMS real-time news. Experienced developers can skip to the next section.

N2_UBMS Overview

N2_UBMS is a legacy RIC for news feed broadcast messages. Every incoming news story results in broadcast messages being transmitted as updates to the N2_UBMS RIC. However, a news story may be made available in many parts. For example, the first part of a story may include one or more Alerts, each of which is a brief news headline containing the most essential information about an emerging story. An Alert may be followed by another headline and a piece of text for the story. This text and its associated headline are called a Take. More than one Take is possible, allowing more story text to be provided as and when it becomes available.

To identify each part of a news story, all transmitted news has a common identifier that enables all parts of the same story to be recognized. This identifier is called the "Primary News Access Code" (PNAC).

Each broadcast message on N2_UBMS contains a set of fields in the Market Price domain, which are listed in the next two sections. These fields contain the "Primary News Access Code", headline, category codes and timestamps, but not the story body.

The body of a story is split into a number of text segments, each of which has a unique identifier that is used as a RIC for retrieval purposes. The RIC for the first story text segment is always the PNAC, and each story text segment has pointers that allow segments to be linked together. A consumer application must use the PNAC from N2_UBMS and the subsequent story text segment RICs to retrieve the full body of a story.

A news story also has a set of category codes. These codes are transmitted with the broadcast messages. Codes represent different aspects of the news item, and include product codes, topic codes, company codes and attribution. Product codes identify whether a user is allowed to receive a news item and also define a broad range for the subject matter, e.g. "M" for Money International News Service. Topic codes describe the story's subject matter, e.g. "INT" for interest rates. Company codes identify a particular company affected by the news item, e.g. "RTR.L" for Reuters. Attribution defines the source of a specific news item.

Field usage in broadcast messages

FID   FID Name     Description
3     DSPLY_NAME   Identifies the message subtype
235   PNAC         Primary News Access Code
255   PROC_DATE    Take date
259   RECORDTYPE   Set to 232 for news
264   BCAST_TEXT   Headline text
715   STORY_ID     Sequential value for message loss detection
720   TAKE_SEQNO   Message sequence number
725   ATTRIBTN     News source
749   PROD_CODE    List of product codes
750   TOPIC_CODE   List of topic codes
751   CO_IDS       List of company codes
752   LANG_IND     Language indicator
1015  TAKE_TIME    Take time
1024  STORY_TIME   Story time
1027  STORY_DATE   Story date

Broadcast messages example

UPDATE Item Name: N2_UBMS
Fid: 3 Name = DSPLY_NAME DataType: Rmtes Value: 2
Fid: 1 Name = PROD_PERM DataType: UInt Value: 7710
Fid: 235 Name = PNAC DataType: Ascii Value: nHKS1w6r3n
Fid: 255 Name = PROC_DATE DataType: Date Value: 7 / 7 / 2017
Fid: 259 Name = RECORDTYPE DataType: UInt Value: 232
Fid: 264 Name = BCAST_TEXT DataType: Rmtes Value: The quick brown fox jumps over the lazy dog
Fid: 715 Name = STORY_ID DataType: Ascii Value: n&1EVEmt01
Fid: 720 Name = TAKE_SEQNO DataType: UInt Value: 1
Fid: 725 Name = ATTRIBTN DataType: Rmtes Value: HIIS
Fid: 749 Name = PROD_CODE DataType: Rmtes Value: SHKI
Fid: 750 Name = TOPIC_CODE DataType: Rmtes Value: ASIA CMPNY CN EMRG ENER EQTY HK LEN RENE RENQ STX HIIS
Fid: 751 Name = CO_IDS DataType: Rmtes Value: 9999.HK
Fid: 752 Name = LANG_IND DataType: Rmtes Value: EN
Fid: 1015 Name = TAKE_TIME DataType: Time Value: 10:11:12:0
Fid: 1024 Name = STORY_TIME DataType: Time Value: 10:11:12:0
Fid: 1027 Name = STORY_DATE DataType: Date Value: 7 / 7 / 2017

Field usage in PNAC and subsequent story text segment messages

FID   FID Name     Description
2     RDNDISPLAY   Set to 136
237   PREV_LR      RIC for previous text segment
238   NEXT_LR      RIC for next text segment
254   UNIQUE_SN    PNAC
255   PROC_DATE    Take date
256   PROC_TIME    Take time
258   SEG_TEXT     Story text segment
259   RECORDTYPE   Set to 232 for news
723   TABTEXT      Tabular text indicator
727   MORE_NEWS    Used to indicate if more news is expected
752   LANG_IND     Language indicator

Subsequent story text example

REFRESH Item Name: nL3N1J5319
Fid: 1 Name = PROD_PERM DataType: UInt Value: 511
Fid: 2 Name = RDNDISPLAY DataType: UInt Value: 136
Fid: 237 Name = PREV_LR DataType: Ascii Value: <BLANK>
Fid: 238 Name = NEXT_LR DataType: Ascii Value: n#1EaGva03
Fid: 254 Name = UNIQUE_SN DataType: Ascii Value: nHKS1w6r3n
Fid: 255 Name = PROC_DATE DataType: Date Value: 7 / 7 / 2017
Fid: 256 Name = PROC_TIME DataType: Time Value: 8:9:0:0
Fid: 258 Name = SEG_TEXT DataType: Rmtes Value: The quick brown fox jumps over the lazy dog
Fid: 259 Name = RECORDTYPE DataType: UInt Value: 232
Fid: 723 Name = TABTEXT DataType: Rmtes Value: X
Fid: 727 Name = MORE_NEWS DataType: Rmtes Value: R
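
The retrieval described earlier (start from the PNAC, then follow NEXT_LR until it is blank) can be sketched as follows. This is only a minimal sketch and not an API call sequence: fetch_segment is a hypothetical callable standing in for whatever request your API issues for a segment RIC, it is assumed to return the decoded fields as a dict keyed by FID name, and joining segments without a separator is also an assumption.

```python
from typing import Callable, Dict, Optional

# Hypothetical shape: decoded fields of one story text segment, keyed by FID name.
Segment = Dict[str, Optional[str]]


def read_story_body(pnac: str, fetch_segment: Callable[[str], Segment],
                    max_segments: int = 1000) -> str:
    """Follow NEXT_LR pointers from the PNAC and join every SEG_TEXT."""
    parts = []
    seen = set()
    ric: Optional[str] = pnac  # the first segment RIC is always the PNAC
    while ric and len(parts) < max_segments:
        if ric in seen:  # guard against a pointer loop
            break
        seen.add(ric)
        fields = fetch_segment(ric)
        parts.append(fields.get("SEG_TEXT") or "")
        ric = fields.get("NEXT_LR")  # blank/None means no further segment
    return "".join(parts)


# Usage with an in-memory stand-in for the feed (made-up RICs and text).
fake_feed = {
    "nPNAC001": {"SEG_TEXT": "The quick brown fox ", "NEXT_LR": "nSEG0002"},
    "nSEG0002": {"SEG_TEXT": "jumps over the lazy dog", "NEXT_LR": None},
}
story = read_story_body("nPNAC001", fake_feed.__getitem__)
print(story)
```

The loop guard and the max_segments limit are defensive choices for the sketch, so that a malformed pointer chain cannot make the consumer request segments forever.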

The message type is stored in FID 3 of broadcast messages and contains a single digit. A brief description of each message type follows.

Message type     FID value  Notes
ALERT            1          Provides a brief summary of an important news item as quickly as possible. An alert can only be used for stories where there is, as yet, no associated story.
FIRST_TAKE       2          Associated with the first part of the story text
SUBSEQUENT_TAKE  3          Associated with additional story text filed after the first take
CORRECTION       4          Modifies a headline or story text by the addition of further text.
CORRECTED        5          Indicates that all of the associated story text may have been completely re-written.
UPDATE           6          Changes the headline and indicates that all of the associated story text may have been completely re-written.
DELETED          7          A story has been removed from the feed and should be deleted immediately, either because it was sent in error or because some other reason requires it to be deleted.
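
As an illustration, the message types listed above could be represented in a consumer as an enumeration. The class and function names here are hypothetical, and the sketch assumes the FID 3 value has already been decoded to a string such as "2".

```python
from enum import IntEnum


class MessageType(IntEnum):
    """Message types carried in FID 3 (DSPLY_NAME) of broadcast messages."""
    ALERT = 1
    FIRST_TAKE = 2
    SUBSEQUENT_TAKE = 3
    CORRECTION = 4
    CORRECTED = 5
    UPDATE = 6
    DELETED = 7


def parse_message_type(fid3_value: str) -> MessageType:
    """Convert the single-digit FID 3 value to a MessageType.

    Raises ValueError for values outside 1..7 or non-numeric text.
    """
    return MessageType(int(fid3_value.strip()))


print(parse_message_type("2").name)
```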

The Primary News Access Code (PNAC, FID 235 for broadcast messages and FID 254 for request/response messages) is an 8-byte string identifier used to identify a story. Story text segment RICs (FIDs 237 PREV_LR and 238 NEXT_LR) are also 8-byte strings, each representing the RIC associated with one part of the story text. Both PNACs and story text segment RICs are unique for a period of 24 hours.

Category codes provide a means of associating additional information with each news item. The codes fall into three distinct sets: product codes on FID 749, topic codes on FID 750, and company codes on FID 751. Attribution (FID 725 ATTRIBTN) indicates the source of the news article. The source of the news is also entered as a product code on FID 749.
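
A consumer typically needs the individual codes rather than the raw field text. The sketch below assumes that each code field is a whitespace-separated list, as the broadcast message example suggests; split_codes and has_topic are hypothetical helper names.

```python
from typing import Dict, List


def split_codes(field_value: str) -> List[str]:
    """Split a code field into individual codes (assumes whitespace separation)."""
    return field_value.split()


# Field values taken from the broadcast message example in this article.
broadcast = {
    "PROD_CODE": "SHKI",
    "TOPIC_CODE": "ASIA CMPNY CN EMRG ENER EQTY HK LEN RENE RENQ STX HIIS",
    "CO_IDS": "9999.HK",
}
codes: Dict[str, List[str]] = {
    name: split_codes(broadcast.get(name, ""))
    for name in ("PROD_CODE", "TOPIC_CODE", "CO_IDS")
}


def has_topic(topic: str) -> bool:
    """Check whether the sample message carries the given topic code."""
    return topic in codes["TOPIC_CODE"]


print(codes["CO_IDS"], has_topic("ENER"))
```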

Two timestamps are included: the Story and Take date/times. The Story Date/Time (FID 1024 STORY_TIME and FID 1027 STORY_DATE) indicates when the news item first appeared, and the Take Date/Time (FID 255 PROC_DATE and FID 1015 TAKE_TIME) indicates the time when this portion of the news item was received.

The headlines and story text (FIDs 264 and 258) are provided using the Reuters implementation of ISO 2022 encoding called "Reuters Multilingual Text Encoding Standard" (RMTES). The text of each headline and story segment is contained within 255 bytes, the maximum length of these FIDs.

MACHINE READABLE NEWS

Currently there are four MRN content sets available over Elektron, but the one that former N2_UBMS users should use is Real-time News. This content set is sourced from news alerts and stories from Reuters and dozens of third-party news sources. It contains the headline, story body text, and associated metadata available at news publication time.

MRN Data model

MRN is published over Elektron using an Open Message Model (OMM) envelope in News Text Analytics domain RSSL messages. The Real-time News content set is made available over the MRN_STORY RIC. The content data is contained in a FRAGMENT (BUFFER type) field that has been compressed, and potentially fragmented across multiple messages, in order to reduce bandwidth and RSSL message size.

The data goes through the following series of transformations:

  1. The core content data is a UTF-8 JSON string
  2. This JSON string is compressed using gzip
  3. The compressed JSON is split into a number of fragments which each fit into a single RSSL update
  4. The data fragments are added to an update message as the FRAGMENT field value in a FieldList envelope

Therefore, in order to parse the core content data, the application will need to reverse this process.
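
To make the round trip concrete, the sketch below simulates the publishing side (steps 1 to 3) and then reverses it. It uses only the Python standard library; fragment_item, rebuild_item and the fragment size are hypothetical, and real fragments would arrive in FRAGMENT fields of RSSL messages rather than in a list.

```python
import gzip
import json
from typing import List


def fragment_item(item: dict, max_fragment_bytes: int) -> List[bytes]:
    """Simulate the publisher: JSON -> UTF-8 -> gzip -> fixed-size fragments."""
    raw = json.dumps(item).encode("utf-8")  # step 1: UTF-8 JSON string
    compressed = gzip.compress(raw)         # step 2: gzip compression
    return [compressed[i:i + max_fragment_bytes]  # step 3: split into fragments
            for i in range(0, len(compressed), max_fragment_bytes)]


def rebuild_item(fragments: List[bytes]) -> dict:
    """Reverse the process: concatenate, decompress, decode, parse."""
    compressed = b"".join(fragments)
    return json.loads(gzip.decompress(compressed).decode("utf-8"))


# Made-up content for illustration only.
story = {"headline": "The quick brown fox jumps over the lazy dog",
         "body": "The quick brown fox jumps over the lazy dog. " * 3}
fragments = fragment_item(story, max_fragment_bytes=16)
restored = rebuild_item(fragments)
print(len(fragments), restored == story)
```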

Five fields, as well as the RIC itself, are necessary to determine whether the entire item has been received in its various fragments and how to concatenate the fragments to reconstruct the item:

  • MRN_SRC: identifier of the scoring/processing system that published the FRAGMENT
  • GUID: globally unique identifier for the data item. All messages for this data item will have the same GUID values.
  • FRAGMENT: compressed data item fragment, itself
  • TOT_SIZE: total size in bytes of the fragmented data
  • FRAG_NUM: sequence number of fragments within a data item. This is set to 1 for the first fragment of each item published and is incremented for each subsequent fragment for the same item.

A single MRN data item publication is uniquely identified by the combination of RIC, MRN_SRC and GUID.

Fragmentation

For a given RIC-MRN_SRC-GUID combination, when a data item requires only a single message, then TOT_SIZE will equal the number of bytes in the FRAGMENT and FRAG_NUM will be 1.

When multiple messages are required, the data item can be deemed fully received once the sum of the number of bytes of each FRAGMENT equals TOT_SIZE. The consumer will also observe that the FRAG_NUM values range from 1 to the number of fragments, with no intermediate integers skipped. In other words, a data item transmitted over three messages will contain FRAG_NUM values of 1, 2 and 3.
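
One way to apply these rules is a small assembler keyed by the RIC-MRN_SRC-GUID combination. The following is a sketch under assumptions rather than a reference implementation: the class name and method signature are hypothetical, TOT_SIZE is assumed to be known by the time the last fragment arrives (here it is passed with the first fragment), and an item with a missing or out-of-order FRAG_NUM is simply discarded.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (RIC, MRN_SRC, GUID)


class FragmentAssembler:
    """Collects FRAGMENT values per (RIC, MRN_SRC, GUID) until TOT_SIZE is reached."""

    def __init__(self) -> None:
        self._pending: Dict[Key, dict] = {}

    def add(self, ric: str, mrn_src: str, guid: str, frag_num: int,
            fragment: bytes, tot_size: Optional[int] = None) -> Optional[bytes]:
        """Return the complete compressed item, or None while it is incomplete."""
        key = (ric, mrn_src, guid)
        if frag_num == 1:
            # The first fragment starts (or restarts) the item.
            self._pending[key] = {"tot_size": tot_size, "expected": 1,
                                  "data": bytearray()}
        state = self._pending.get(key)
        if state is None or frag_num != state["expected"]:
            # A fragment was missed or arrived out of order: discard the item.
            self._pending.pop(key, None)
            return None
        if tot_size is not None:
            state["tot_size"] = tot_size
        state["data"] += fragment
        state["expected"] += 1
        total = state["tot_size"]
        if total is not None and len(state["data"]) >= total:
            del self._pending[key]
            # More bytes than TOT_SIZE means the item is inconsistent.
            return bytes(state["data"]) if len(state["data"]) == total else None
        return None


assembler = FragmentAssembler()
payload = b"abcdefghij"  # stands in for the gzip-compressed bytes
r1 = assembler.add("MRN_STORY", "SRC_A", "guid-1", 1, payload[:4], tot_size=len(payload))
r2 = assembler.add("MRN_STORY", "SRC_A", "guid-1", 2, payload[4:8])
r3 = assembler.add("MRN_STORY", "SRC_A", "guid-1", 3, payload[8:])
print(r1, r2, r3)
```

The bytes returned for the final fragment are still compressed; decompression is a separate step.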

Compression

The FRAGMENT field is compressed with gzip compression, thus requiring the consumer to decompress to reveal the JSON plain-text data in that FID.

When an MRN data item is sent in multiple messages, all the messages must be received and their FRAGMENTs concatenated before being decompressed. In other words, the FRAGMENTs should not be decompressed independently of each other.
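
The reason is that the fragments are slices of one gzip stream, not separate gzip streams. The short sketch below, which uses made-up content and a hypothetical helper try_decompress, shows that each slice fails to decompress on its own while the concatenation succeeds.

```python
import gzip
import zlib
from typing import Optional

compressed = gzip.compress(
    b'{"headline": "The quick brown fox jumps over the lazy dog"}')
middle = len(compressed) // 2
fragments = [compressed[:middle], compressed[middle:]]


def try_decompress(data: bytes) -> Optional[bytes]:
    """Return the decompressed bytes, or None if data is not a complete gzip stream."""
    try:
        return gzip.decompress(data)
    except (EOFError, OSError, zlib.error):
        # Truncated stream, missing gzip header, or corrupt deflate data.
        return None


alone = [try_decompress(fragment) for fragment in fragments]
joined = try_decompress(b"".join(fragments))
print(alone, joined is not None)
```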

The decompressed output is encoded in UTF-8 and formatted as JSON.

N2_UBMS and MRN Real-time News Comparison

As mentioned in the overview, N2_UBMS content appears as field-value pairs in the Market Price domain, while MRN Real-time News content appears as UTF-8 JSON in the FRAGMENT field of the News Text Analytics domain.

The following table lists the differences between the steps a consumer application has to take in order to extract the data.

N2_UBMS                                                                       MRN Real-Time News
Use Market Price domain                                                       Use News Text Analytics domain
Request RIC                                                                   Request RIC
Received update messages contain headline and PNAC                            Received update messages contain fragments of the entire data item
Use the PNAC from the update message to request the first story body segment  (none)
Use the pointer to request subsequent segments                                (none)
Combine the story body from each segment                                      Combine the fragments
(none)                                                                        Decompress the gzipped data
(none)                                                                        The entire content data is in JSON format

Since News Text Analytics is a new domain, TREP API users who use the legacy MD interface will not be able to use it. Legacy TREP API users are recommended to upgrade to the OMM interface.

In N2_UBMS, a consumer application must make multiple requests to retrieve the full body of a story. First, it requests the N2_UBMS RIC for the headline and PNAC, then uses the PNAC to request the first body segment. After that, it follows the pointer to the next segment for each subsequent body part. In MRN, the consumer makes a single RIC request. It receives update messages that contain fragments of data, which it has to combine and decompress. An MRN consumer therefore requires a compression library that can decompress gzip and, since the news story content is in JSON format, a JSON parser as well.

Fields Mapping

Both MRN Real-time News and N2_UBMS contain the same headline, story body text, and associated metadata about the story. The following table maps MRN Real-time News fields to N2_UBMS fields.

Field Categories         General Description       MRN_STORY       N2_UBMS     Notes
Identifying Information  Item ID                   id              -           Has the same value as GUID
Identifying Information  Item ID                   GUID            -           Has the same value as id; carried in the field list envelope, not in the core JSON data item
Identifying Information  Primary News Access Code  altId           PNAC
Identifying Information  Take Sequence Number      takeSequence    TAKE_SEQNO
Identifying Information  News Source               provider        ATTRIBTN    MRN_STORY value is prefixed by “NS:”
News Text                Headline                  headline        BCAST_TEXT
News Text                Story Body                body            SEG_TEXT    MRN_STORY value is escaped to ensure valid JSON
News Text                Language                  language        LANG_IND    N2_UBMS uses upper case; MRN_STORY uses lower case
Timestamps               Story Date                firstCreated    STORY_DATE
Timestamps               Story Time                firstCreated    STORY_TIME
Timestamps               Take Date                 versionCreated  PROC_DATE
Timestamps               Take Time                 versionCreated  TAKE_TIME
Tagging                  Company Codes             subjects        CO_IDS      MRN_STORY value is prefixed by “R:”; MRN_STORY may additionally contain company PermIDs, prefixed with “P:”
Tagging                  Named Item Code           instancesOf     NAMED_ITEM  MRN_STORY value is prefixed by “NI:”
Tagging                  Product Codes             audiences       PROD_CODE   MRN_STORY value is prefixed by “NP:”
Tagging                  Topic Codes               subjects        TOPIC_CODE  MRN_STORY value is prefixed by “N2:”
Item Classification      -                         messageType     DSPLY_NAME  MRN messageType uses the same enum values as N2_UBMS
Item Classification      -                         urgency         DSPLY_NAME  MRN urgency is a less specific version
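
For code that still expects N2_UBMS field names, the mapping can be applied to the decoded MRN JSON. The sketch below is illustrative only: to_n2_ubms_view and strip_prefix are hypothetical helpers, the sample values are made up, and it assumes that subjects and audiences are JSON arrays of prefixed strings.

```python
from typing import Dict, List


def strip_prefix(values: List[str], prefix: str) -> List[str]:
    """Keep the values that start with prefix, with the prefix removed."""
    return [v[len(prefix):] for v in values if v.startswith(prefix)]


def to_n2_ubms_view(story: dict) -> Dict[str, object]:
    """Re-express decoded MRN_STORY JSON under N2_UBMS field names."""
    subjects = story.get("subjects", [])
    provider = story.get("provider", "")
    return {
        "PNAC": story.get("altId"),
        "TAKE_SEQNO": story.get("takeSequence"),
        "ATTRIBTN": provider[len("NS:"):] if provider.startswith("NS:") else provider,
        "BCAST_TEXT": story.get("headline"),
        "SEG_TEXT": story.get("body"),
        "LANG_IND": (story.get("language") or "").upper(),
        "CO_IDS": strip_prefix(subjects, "R:"),      # company codes
        "TOPIC_CODE": strip_prefix(subjects, "N2:"),  # topic codes
        "PROD_CODE": strip_prefix(story.get("audiences", []), "NP:"),
        "DSPLY_NAME": story.get("messageType"),
    }


# Made-up sample, loosely based on the N2_UBMS example in this article.
sample = {
    "altId": "nHKS1w6r3n",
    "takeSequence": 1,
    "provider": "NS:HIIS",
    "headline": "The quick brown fox jumps over the lazy dog",
    "body": "The quick brown fox jumps over the lazy dog",
    "language": "en",
    "subjects": ["R:9999.HK", "N2:ASIA", "N2:HK", "P:0000000000"],
    "audiences": ["NP:SHKI"],
    "messageType": 2,
}
fields = to_n2_ubms_view(sample)
print(fields["CO_IDS"], fields["TOPIC_CODE"], fields["LANG_IND"])
```

The timestamps are left out on purpose: firstCreated and versionCreated each combine a date and a time, so splitting them into separate N2_UBMS-style fields depends on the timestamp format your feed delivers.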

Handling News

When a newsworthy event occurs, the first part of a newsbreak can be an alert, a short sentence in upper case that contains the facts and essential detail. A story is then created a few minutes after any alerts. Stories comprise a headline (often different from the alert) and several paragraphs of body text. They are usually filed in a single take. However, in some cases, further takes are necessary to add text or codes.

In N2_UBMS, the item classification is defined in FID 3.

In MRN, the classification is defined in the following JSON fields.

  1. messageType field, an integer field that uses the same enum values as FID 3 of N2_UBMS.
  2. urgency field, where 1 is an alert and 3 is the newsbreak story article. The take number is then defined in the takeSequence field.

Finally, all parts of a story should contain the same Primary News Access Code.
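
A minimal sketch of how an MRN consumer might apply this classification and keep the parts of a story together is shown below. The function names and the sample items are made up for illustration; only the urgency values 1 and 3 and the use of altId as the PNAC come from this article.

```python
from collections import defaultdict
from typing import Dict, List


def describe_item(story: dict) -> str:
    """Classify a decoded MRN item from its urgency and takeSequence fields."""
    urgency = story.get("urgency")
    if urgency == 1:
        return "alert"
    if urgency == 3:
        return "story, take {}".format(story.get("takeSequence"))
    return "unclassified (urgency={})".format(urgency)


def group_by_pnac(stories: List[dict]) -> Dict[str, List[str]]:
    """Group item descriptions by PNAC (altId), preserving arrival order."""
    grouped = defaultdict(list)
    for story in stories:
        grouped[story["altId"]].append(describe_item(story))
    return dict(grouped)


# Made-up items: an alert followed by two takes of the same story.
items = [
    {"altId": "nPNAC001", "urgency": 1, "takeSequence": 1},
    {"altId": "nPNAC001", "urgency": 3, "takeSequence": 2},
    {"altId": "nPNAC001", "urgency": 3, "takeSequence": 3},
]
grouped = group_by_pnac(items)
print(grouped)
```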

Correction

If there is a substantial error in an alert, the newsbreak is filed again with a new PNAC and CORRECTED- is inserted at the beginning of the alert headline. But if there is a substantial error in a story, the amended story is usually filed with the same PNAC.

Withdrawals

If a story is fundamentally flawed, a delete message is sent.

In N2_UBMS, a delete message has classification 7 in FID 3.

In MRN, a delete message has 7 in the messageType field and "stat:canceled" in the pubStatus field.

The application should remove the story from viewing and any active story caches.
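
The sketch below shows one possible shape for such a cache on the MRN side. StoryCache is a hypothetical class, stories are keyed by altId (the PNAC), and treating either indicator on its own as a delete is a defensive assumption, since a delete message is described as carrying both.

```python
from typing import Dict, Optional

DELETED = 7  # messageType value of a delete message


class StoryCache:
    """Keeps the latest version of each story, keyed by PNAC (altId)."""

    def __init__(self) -> None:
        self._stories: Dict[str, dict] = {}

    def handle(self, story: dict) -> None:
        """Store the story, or remove it if the item is a delete message."""
        pnac = story["altId"]
        is_delete = (story.get("messageType") == DELETED
                     or story.get("pubStatus") == "stat:canceled")
        if is_delete:
            self._stories.pop(pnac, None)  # no error if it was never cached
        else:
            self._stories[pnac] = story

    def get(self, pnac: str) -> Optional[dict]:
        return self._stories.get(pnac)


# Made-up items: a first take, then a delete for the same PNAC.
cache = StoryCache()
cache.handle({"altId": "nPNAC001", "messageType": 2, "headline": "Example headline"})
cached_before = cache.get("nPNAC001") is not None
cache.handle({"altId": "nPNAC001", "messageType": 7, "pubStatus": "stat:canceled"})
cached_after = cache.get("nPNAC001") is not None
print(cached_before, cached_after)
```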

Reference

Real-time News Field Mapping

MRN Data Models & Elektron Guide - Data Model 2.12

Triarch Real-Time News Standard