MRN & N2_UBMS Comparison and Migration Guide
This article is intended for Elektron and TREP API developers who wish to migrate from N2_UBMS news to Machine Readable News (MRN). It highlights the key differences between N2_UBMS and MRN and gives examples to help Elektron and TREP developers understand how to replace N2_UBMS news with MRN.
N2_UBMS REAL-TIME NEWS
This section describes how N2_UBMS news is constructed and delivered. It is aimed at new developers who may not have background knowledge of N2_UBMS real-time news; experienced developers can skip to the next section.
N2_UBMS is a legacy RIC for news feed broadcast messages. Each incoming news story results in broadcast messages being transmitted as updates to the N2_UBMS RIC. However, a news story may be made available in several parts. For example, the first part of a story may include one or more Alerts; an Alert is a brief news headline containing the most essential information about an emerging story. An Alert may be followed by another headline and a piece of story text. This text and its associated headline are called a Take. Multiple Takes are possible, allowing more story text to be provided as it becomes available.
To identify each part of a news story, all transmitted news items carry a common identifier that enables all parts of the same story to be recognized. This identifier is called the Primary News Access Code (PNAC).
Each broadcast message on N2_UBMS contains a set of Market Price domain fields, which are listed in the following section. These fields contain the PNAC, headline, category codes and timestamps, but not the story body.
The body of a story is split into a number of text segments, each of which has a unique identifier that is used as a RIC for retrieval. The RIC of the first story text segment is always the PNAC, and each story text segment has pointers that link it to the previous and next segments. A consumer application must use the PNAC from N2_UBMS and the subsequent story text segment RICs to retrieve the full body of a story.
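The segment chain described above can be walked with a simple loop: start at the PNAC, read SEG_TEXT, and move on to the RIC in NEXT_LR. The sketch below is a minimal illustration only. It assumes a hypothetical `fetch_segment` callable that returns a segment's fields as a dictionary (a real consumer would request each RIC through its TREP/Elektron API), and it assumes that an empty NEXT_LR marks the last segment.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a snapshot request on a segment RIC. It returns the
# segment's fields keyed by field name ("SEG_TEXT", "NEXT_LR", ...). A real
# consumer would request the RIC through its TREP/Elektron API instead.
FetchSegment = Callable[[str], Dict[str, str]]


def assemble_story_body(pnac: str, fetch_segment: FetchSegment,
                        max_segments: int = 1000) -> str:
    """Follow the NEXT_LR chain from the PNAC and join every SEG_TEXT."""
    parts = []
    visited = set()  # guards against a malformed chain that loops back on itself
    ric: Optional[str] = pnac  # the first segment's RIC is always the PNAC
    while ric and ric not in visited and len(parts) < max_segments:
        visited.add(ric)
        fields = fetch_segment(ric)
        parts.append(fields.get("SEG_TEXT", ""))
        # Assumption: an empty NEXT_LR marks the last segment.
        ric = fields.get("NEXT_LR", "").strip() or None
    return "".join(parts)


# Illustrative segments; the RICs are borrowed from the examples in this article.
segments = {
    "nHKS1w6r3n": {"SEG_TEXT": "The quick brown fox ", "NEXT_LR": "n#1EaGva03"},
    "n#1EaGva03": {"SEG_TEXT": "jumps over the lazy dog", "NEXT_LR": ""},
}
print(assemble_story_body("nHKS1w6r3n", segments.__getitem__))
# The quick brown fox jumps over the lazy dog
```

Each hop is a separate request, which is the main cost that MRN removes.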
A news story also has a set of category codes. These codes are transmitted with the broadcast messages. Codes represent different aspects of the news item, and include product codes, topic codes, company codes and attribution. Product codes identify whether a user is allowed to receive a news item and also define a broad range for the subject matter, e.g. "M" for Money International News Service. Topic codes describe the story's subject matter, e.g. "INT" for interest rates. Company codes identify a particular company affected by the news item, e.g. "RTR.L" for Reuters. Attribution defines the source of a specific news item.
Field usage in broadcast messages
|FID||Field name||Description|
|3||DSPLY_NAME||Identifies the message subtype|
|235||PNAC||Primary News Access Code|
|259||RECORDTYPE||Set to 232 for news|
|715||STORY_ID||Sequential value for message loss detection|
|720||TAKE_SEQNO||Message sequence number|
|749||PROD_CODE||List of product codes|
|750||TOPIC_CODE||List of topic codes|
|751||CO_IDS||List of company codes|
Broadcast message example
UPDATE Item Name: N2_UBMS
Fid: 3 Name = DSPLY_NAME DataType: Rmtes Value: 2
Fid: 1 Name = PROD_PERM DataType: UInt Value: 7710
Fid: 235 Name = PNAC DataType: Ascii Value: nHKS1w6r3n
Fid: 255 Name = PROC_DATE DataType: Date Value: 7 / 7 / 2017
Fid: 259 Name = RECORDTYPE DataType: UInt Value: 232
Fid: 264 Name = BCAST_TEXT DataType: Rmtes Value: The quick brown fox jumps over the lazy dog
Fid: 715 Name = STORY_ID DataType: Ascii Value: n&1EVEmt01
Fid: 720 Name = TAKE_SEQNO DataType: UInt Value: 1
Fid: 725 Name = ATTRIBTN DataType: Rmtes Value: HIIS
Fid: 749 Name = PROD_CODE DataType: Rmtes Value: SHKI
Fid: 750 Name = TOPIC_CODE DataType: Rmtes Value: ASIA CMPNY CN EMRG ENER EQTY HK LEN RENE RENQ STX HIIS
Fid: 751 Name = CO_IDS DataType: Rmtes Value: 9999.HK
Fid: 752 Name = LANG_IND DataType: Rmtes Value: EN
Fid: 1015 Name = TAKE_TIME DataType: Time Value: 10:11:12:0
Fid: 1024 Name = STORY_TIME DataType: Time Value: 10:11:12:0
Fid: 1027 Name = STORY_DATE DataType: Date Value: 7 / 7 / 2017
Field usage in PNAC and subsequent story text segment messages
|FID||Field name||Description|
|2||RDNDISPLAY||Set to 136|
|237||PREV_LR||RIC for previous text segment|
|238||NEXT_LR||RIC for next text segment|
|258||SEG_TEXT||Story text segment|
|259||RECORDTYPE||Set to 232 for news|
|723||TABTEXT||Tabular text indicator|
|727||MORE_NEWS||Used to indicate if more news is expected|
Subsequent story text segment example
REFRESH Item Name: nL3N1J5319
Fid: 1 Name = PROD_PERM DataType: UInt Value: 511
Fid: 2 Name = RDNDISPLAY DataType: UInt Value: 136
Fid: 237 Name = PREV_LR DataType: Ascii Value: <BLANK>
Fid: 238 Name = NEXT_LR DataType: Ascii Value: n#1EaGva03
Fid: 254 Name = UNIQUE_SN DataType: Ascii Value: nHKS1w6r3n
Fid: 255 Name = PROC_DATE DataType: Date Value: 7 / 7 / 2017
Fid: 256 Name = PROC_TIME DataType: Time Value: 8:9:0:0
Fid: 258 Name = SEG_TEXT DataType: Rmtes Value: The quick brown fox jumps over the lazy dog
Fid: 259 Name = RECORDTYPE DataType: UInt Value: 232
Fid: 723 Name = TABTEXT DataType: Rmtes Value: X
Fid: 727 Name = MORE_NEWS DataType: Rmtes Value: R
The message type is stored as a single digit in FID 3 (DSPLY_NAME) of broadcast messages. Each message type is briefly described below.
|Message type||FID value||Notes|
|ALERT||1||Provides a brief summary of an important news item as quickly as possible. An alert can only be used for stories where there is, as yet, no associated story.|
|FIRST_TAKE||2||Associated with the first part of the story text|
|SUBSEQUENT_TAKE||3||Associated with additional story text filed after the 1st take|
|CORRECTION||4||Modifies a headline or story text by the addition of further text.|
|CORRECTED||5||Indicates that all of the associated story text may have been completely re-written.|
|UPDATE||6||Changes the headline and indicates that all of the associated story text may have been completely re-written.|
|DELETED||7||A story has been removed from the feed and should be deleted immediately, either because it was sent in error or because some other reason requires it to be deleted.|
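A consumer typically turns the single FID 3 digit into a named type before acting on it. The following minimal sketch mirrors the table above as a Python `IntEnum`; the names `NewsMessageType` and `parse_message_type` are hypothetical, and the input is treated as a string because FID 3 is delivered as RMTES text in the example message.

```python
from enum import IntEnum


class NewsMessageType(IntEnum):
    """Message types carried in FID 3 (DSPLY_NAME) of N2_UBMS broadcast messages."""
    ALERT = 1
    FIRST_TAKE = 2
    SUBSEQUENT_TAKE = 3
    CORRECTION = 4
    CORRECTED = 5
    UPDATE = 6
    DELETED = 7


def parse_message_type(dsply_name: str) -> NewsMessageType:
    """Convert the single-digit FID 3 value (e.g. "2") into an enum member."""
    return NewsMessageType(int(dsply_name.strip()))


print(parse_message_type("2").name)  # FIRST_TAKE
```

An unknown digit raises `ValueError`, which is probably preferable to silently mis-classifying a story.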
The Primary News Access Code (PNAC; FID 235 in broadcast messages and FID 254 UNIQUE_SN in request/response messages) is a short string identifier, for example nHKS1w6r3n, that identifies a story. Story text segment RICs (FIDs 237 PREV_LR and 238 NEXT_LR) are short strings of the same form and identify the specific RIC associated with each part of the story text. Both PNACs and story text segment RICs are unique for a period of 24 hours.
Category codes associate each news item with additional information about it. The codes fall into three distinct sets: product codes in FID 749, topic codes in FID 750, and company codes in FID 751. Attribution (FID 725 ATTRIBTN) indicates the source of the news article; the source is also entered as a product code in FID 749.
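In the broadcast message example, each category field holds several codes in one string. The sketch below splits them into lists. It assumes, based only on that example, that codes are separated by whitespace, and it represents a received message as a hypothetical dictionary keyed by FID rather than any specific API object.

```python
from typing import Dict, List

# Category code FIDs in N2_UBMS broadcast messages.
CATEGORY_FIDS = {749: "PROD_CODE", 750: "TOPIC_CODE", 751: "CO_IDS"}


def category_codes(fields: Dict[int, str]) -> Dict[str, List[str]]:
    """Split each category field into a list of codes (empty list if absent).

    Assumes codes are separated by whitespace, as in the example message.
    """
    return {name: fields.get(fid, "").split() for fid, name in CATEGORY_FIDS.items()}


# Values taken from the broadcast message example above.
fields = {
    749: "SHKI",
    750: "ASIA CMPNY CN EMRG ENER EQTY HK LEN RENE RENQ STX HIIS",
    751: "9999.HK",
}
codes = category_codes(fields)
print(codes["CO_IDS"])            # ['9999.HK']
print(len(codes["TOPIC_CODE"]))   # 12
```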
Two timestamps are included: the Story date/time and the Take date/time. The Story date/time (FID 1024 STORY_TIME and FID 1027 STORY_DATE) indicates when the news item first appeared, and the Take date/time (FID 255 PROC_DATE and FID 1015 TAKE_TIME) indicates when this portion of the news item was received.
The headlines and story text (FIDs 264 BCAST_TEXT and 258 SEG_TEXT) are provided using the Reuters implementation of ISO 2022 encoding, called the Reuters Multilingual Text Encoding Standard (RMTES). Each headline and each story text segment is limited to 255 bytes, the maximum length of these FIDs.
MACHINE READABLE NEWS
Currently, four MRN content sets are available over Elektron; the one that former N2_UBMS users should use is Real-time News. This content set is sourced from news alerts and stories from Reuters and dozens of third-party news sources. It contains the headline, story body text, and associated metadata available at news publication time.
MRN Data model
MRN is published over Elektron using an Open Message Model (OMM) envelope in News Text Analytics domain RSSL messages. The Real-time News content set is made available over MRN_STORY RIC. The content data is contained in a FRAGMENT (BUFFER type) field that has been compressed, and potentially fragmented across multiple messages, in order to reduce bandwidth and RSSL message size.
The data goes through the following series of transformations:
- The core content data is a UTF-8 JSON string
- This JSON string is compressed using gzip
- The compressed JSON is split into a number of fragments which each fit into a single RSSL update
- The data fragments are added to an update message as the FRAGMENT field value in a FieldList envelope
Therefore, in order to parse the core content data, the application will need to reverse this process.
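The four transformations and their reversal can be sketched with Python's standard `gzip` and `json` modules. This is an illustration only: `encode_item` is a hypothetical stand-in for the publisher so that the example is self-contained, the 32-byte fragment size is arbitrary (the real fragment size is not stated here), and the sample item contains only a few assumed fields.

```python
import gzip
import json
from typing import Any, List


def encode_item(item: Any, max_fragment_size: int) -> List[bytes]:
    """Mirror of the publisher steps: JSON string -> UTF-8 -> gzip -> fragments."""
    compressed = gzip.compress(json.dumps(item, ensure_ascii=False).encode("utf-8"))
    return [compressed[i:i + max_fragment_size]
            for i in range(0, len(compressed), max_fragment_size)]


def decode_item(fragments: List[bytes]) -> Any:
    """Consumer side: concatenate fragments (FRAG_NUM order), gunzip, parse JSON."""
    compressed = b"".join(fragments)  # join BEFORE decompressing
    return json.loads(gzip.decompress(compressed).decode("utf-8"))


# Hypothetical item; a real MRN_STORY item carries many more fields.
story = {"altId": "nHKS1w6r3n", "language": "en",
         "body": "The quick brown fox jumps over the lazy dog"}
fragments = encode_item(story, max_fragment_size=32)  # illustrative size only
print(len(fragments) > 1, decode_item(fragments)["altId"])  # True nHKS1w6r3n
```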
Five fields, as well as the RIC itself, are necessary to determine whether the entire item has been received in its various fragments and how to concatenate the fragments to reconstruct the item:
- MRN_SRC: identifier of the scoring/processing system that published the FRAGMENT
- GUID: globally unique identifier for the data item. All messages for this data item will have the same GUID values.
- FRAGMENT: compressed data item fragment, itself
- TOT_SIZE: total size in bytes of the fragmented data
- FRAG_NUM: sequence number of fragments within a data item. This is set to 1 for the first fragment of each item published and is incremented for each subsequent fragment for the same item.
A single MRN data item publication is uniquely identified by the combination of RIC, MRN_SRC and GUID.
For a given RIC-MRN_SRC-GUID combination, when a data item requires only a single message, then TOT_SIZE will equal the number of bytes in the FRAGMENT and FRAG_NUM will be 1.
When multiple messages are required, the data item can be deemed fully received once the sum of the byte lengths of all FRAGMENTs equals TOT_SIZE. The consumer will also observe that the FRAG_NUM values run from 1 to the number of fragments, with no intermediate integers skipped. In other words, a data item transmitted over three messages will contain FRAG_NUM values of 1, 2 and 3.
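The completeness rules above can be sketched as a small buffer keyed by (RIC, MRN_SRC, GUID). In this minimal sketch, `FragmentAssembler` and `PartialItem` are hypothetical names, and the caller is assumed to extract RIC, MRN_SRC, GUID, FRAG_NUM, FRAGMENT and, when present, TOT_SIZE from each update. Since this article does not say whether TOT_SIZE accompanies every fragment, the sketch accepts it whenever it is supplied and tolerates fragments arriving out of order. The returned bytes are still gzip-compressed and must be decompressed afterwards.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (RIC, MRN_SRC, GUID) uniquely identifies an item


@dataclass
class PartialItem:
    tot_size: Optional[int] = None  # TOT_SIZE, once seen
    fragments: Dict[int, bytes] = field(default_factory=dict)  # FRAG_NUM -> FRAGMENT

    def is_complete(self) -> bool:
        """True when byte count equals TOT_SIZE and FRAG_NUMs run 1..n without gaps."""
        if self.tot_size is None:
            return False
        received = sum(len(f) for f in self.fragments.values())
        n = len(self.fragments)
        return received == self.tot_size and set(self.fragments) == set(range(1, n + 1))


class FragmentAssembler:
    """Buffers MRN FRAGMENTs per (RIC, MRN_SRC, GUID) until each item is complete."""

    def __init__(self) -> None:
        self._pending: Dict[Key, PartialItem] = {}

    def add(self, ric: str, mrn_src: str, guid: str, frag_num: int,
            fragment: bytes, tot_size: Optional[int] = None) -> Optional[bytes]:
        """Store one fragment; return the joined, still-gzipped bytes once complete."""
        key = (ric, mrn_src, guid)
        item = self._pending.setdefault(key, PartialItem())
        if tot_size is not None:
            item.tot_size = tot_size
        item.fragments[frag_num] = fragment
        if not item.is_complete():
            return None
        del self._pending[key]
        return b"".join(item.fragments[i] for i in sorted(item.fragments))


asm = FragmentAssembler()
print(asm.add("MRN_STORY", "SRC", "g1", 1, b"abcd", tot_size=10))  # None
print(asm.add("MRN_STORY", "SRC", "g1", 2, b"efgh"))               # None
print(asm.add("MRN_STORY", "SRC", "g1", 3, b"ij"))                 # b'abcdefghij'
```

The "SRC" and "g1" values are placeholders. A production consumer would probably also evict items that never complete, for example after a timeout, which this sketch does not do.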
The FRAGMENT field is compressed with gzip compression, thus requiring the consumer to decompress to reveal the JSON plain-text data in that FID.
When an MRN data item is sent in multiple messages, all the messages must be received and their FRAGMENTs concatenated before being decompressed. In other words, the FRAGMENTs should not be decompressed independently of each other.
The decompressed output is encoded in UTF-8 and formatted as JSON.
N2_UBMS and MRN Real-time News Comparison
As mentioned in the overview, the content of N2_UBMS appears as field-value pairs in the Market Price domain, while the content of MRN Real-time News appears as gzip-compressed UTF-8 JSON in the FRAGMENT field of the News Text Analytics domain.
The following table lists the differences between the steps a consumer application must take to extract the data.
|N2_UBMS||MRN Real-Time News|
|Use Market Price domain||Use News Text Analytics domain|
|Received update messages contain headline and PNAC||Received update messages contain fragments of entire data|
|Use the PNAC from the update message to request the first story body segment||N/A|
|Use the NEXT_LR pointer to request subsequent segments||N/A|
|Combine the story body from each segment||Combine the fragments|
|N/A||Decompress the gzipped data|
|N/A||Parse the entire content data, which is in JSON format|
Because the News Text Analytics domain is a new domain, TREP API users who use the legacy MD interface will not be able to consume it. Legacy TREP API users are recommended to upgrade to the OMM interface.
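In N2_UBMS, a consumer application must make multiple requests to retrieve the full body of a story. First, it has to request the first story text segment using the PNAC from the broadcast message, and then request each subsequent segment using the NEXT_LR pointer until the last segment has been received. In MRN Real-time News, the complete story arrives in the FRAGMENTs of the update messages, so no further requests are needed.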
Both MRN Real-time News and N2_UBMS contain the same headline, story body text, and associated metadata about the story. The following table maps MRN Real-time News’ fields to N2_UBMS fields.
|Field category||General description||MRN_STORY||N2_UBMS||Notes|
|Identifying Information||Item ID||id|| ||The two fields have the same value. GUID is in the field list envelope, but not in the core JSON data item.|
| ||Primary News Access Code||altId||PNAC|| |
| ||Take Sequence Number||takeSequence||TAKE_SEQNO|| |
| ||News Source||provider||ATTRIBTN||MRN_STORY value is prefixed by “NS:”|
| ||Story Body||body||SEG_TEXT||MRN_STORY value is escaped to ensure valid JSON|
| ||Language||language||LANG_IND||N2_UBMS uses upper case; MRN_STORY uses lower case|
|Tagging||Company Codes||subjects||CO_IDS||MRN_STORY values are prefixed by “R:”. MRN_STORY may additionally contain company PermIDs, prefixed by “P:”|
| ||Named Item Code||instancesOf||NAMED_ITEM||MRN_STORY values are prefixed by “NI:”|
| ||Product Codes||audiences||PROD_CODE||MRN_STORY values are prefixed by “NP:”|
| ||Topic Codes||subjects||TOPIC_CODE||MRN_STORY values are prefixed by “N2:”|
| ||Item Classification||messageType||DSPLY_NAME||MRN messageType uses the same enum values as N2_UBMS. MRN urgency is a less specific version.|
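For applications that already work with N2_UBMS field names, the mapping table can be applied directly to a decoded MRN_STORY item. The sketch below is a hypothetical helper (`to_n2_ubms_fields`). It assumes that the prefixed JSON fields hold lists of strings and that `provider` holds a single string, and it simply ignores “P:” PermIDs. The sample item is illustrative and is not real MRN output.

```python
from typing import Any, Dict, List


def strip_prefix(values: List[str], prefix: str) -> List[str]:
    """Keep only codes with the given prefix, removing the prefix itself."""
    return [v[len(prefix):] for v in values if v.startswith(prefix)]


def to_n2_ubms_fields(story: Dict[str, Any]) -> Dict[str, Any]:
    """Map selected MRN_STORY JSON fields to N2_UBMS field names (per the table above)."""
    subjects = story.get("subjects", [])
    provider = story.get("provider", "")
    return {
        "PNAC": story.get("altId"),
        "TAKE_SEQNO": story.get("takeSequence"),
        "DSPLY_NAME": story.get("messageType"),
        "ATTRIBTN": provider[len("NS:"):] if provider.startswith("NS:") else provider,
        "LANG_IND": story.get("language", "").upper(),
        "CO_IDS": strip_prefix(subjects, "R:"),  # "P:" PermIDs are ignored here
        "TOPIC_CODE": strip_prefix(subjects, "N2:"),
        "PROD_CODE": strip_prefix(story.get("audiences", []), "NP:"),
        "NAMED_ITEM": strip_prefix(story.get("instancesOf", []), "NI:"),
        # MRN body holds the whole story; N2_UBMS splits it across SEG_TEXT segments.
        "SEG_TEXT": story.get("body", ""),
    }


story = {  # hypothetical item modelled on the N2_UBMS example above
    "altId": "nHKS1w6r3n", "takeSequence": 1, "messageType": 2,
    "provider": "NS:HIIS", "language": "en",
    "subjects": ["R:9999.HK", "N2:ASIA", "N2:HK"],
    "audiences": ["NP:SHKI"],
    "body": "The quick brown fox jumps over the lazy dog",
}
mapped = to_n2_ubms_fields(story)
print(mapped["ATTRIBTN"], mapped["LANG_IND"], mapped["CO_IDS"])  # HIIS EN ['9999.HK']
```

The JSON escaping mentioned for `body` is undone by the JSON parser, so `mapped["SEG_TEXT"]` already holds plain text.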
When a newsworthy event occurs, the first part of a newsbreak can be an alert: a short, upper-case sentence that contains the facts and essential details. A story is then created a few minutes after any alerts. Stories comprise a headline (often different from the alert) and several paragraphs of body text. They are usually filed in a single take; in some cases, however, further takes are necessary to add text or codes.
In N2_UBMS, the item classification is defined in FID 3.
In MRN, the classification is defined in the following JSON fields:
- messageType: an integer field that uses the same enum values as FID 3 of N2_UBMS.
- urgency: 1 indicates an alert and 3 indicates a newsbreak story article. The take number is then given in the takeSequence field.
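A minimal sketch of reading these classification fields from a decoded MRN item follows. Only the urgency values 1 and 3 are described in this article, so any other value is reported generically; the `describe` helper and its output format are hypothetical.

```python
from typing import Any, Dict

# urgency values described above; other values are reported generically.
URGENCY = {1: "alert", 3: "story"}


def describe(story: Dict[str, Any]) -> str:
    """Summarise an MRN item's classification from urgency, takeSequence and messageType."""
    kind = URGENCY.get(story.get("urgency"), "other")
    if kind == "story":
        kind = f"story take {story.get('takeSequence', 1)}"
    return f"{kind} (messageType={story.get('messageType')})"


print(describe({"urgency": 1, "messageType": 1}))                     # alert (messageType=1)
print(describe({"urgency": 3, "takeSequence": 2, "messageType": 3}))  # story take 2 (messageType=3)
```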
Finally, all parts of a story should contain the same Primary News Access Code.
If there is a substantial error in an alert, the newsbreak is refiled with a new PNAC and "CORRECTED-" is inserted at the beginning of the alert headline. If there is a substantial error in a story, however, the amended story is usually filed with the same PNAC.
If a story is fundamentally flawed, a delete message is sent.
In N2_UBMS, a delete message classification is 7 in FID 3.
In MRN, a delete message will have 7 in messageType field and "stat:canceled" in pubStatus field.
The application should remove the story from display and from any active story caches.
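Delete handling can be sketched as a cache keyed by PNAC (altId). The `StoryCache` class below is hypothetical and deliberately simple: it keeps only the latest item for each PNAC rather than merging takes. As a defensive design choice, it treats either delete indicator (messageType 7 or pubStatus "stat:canceled") as a delete, even though a delete message is described as carrying both.

```python
from typing import Any, Dict

DELETED = 7  # messageType for a delete; same value as FID 3 in N2_UBMS


def is_delete(story: Dict[str, Any]) -> bool:
    """Treat either delete indicator as a delete (a defensive choice)."""
    return story.get("messageType") == DELETED or story.get("pubStatus") == "stat:canceled"


class StoryCache:
    """Hypothetical in-memory cache holding the latest MRN item per PNAC (altId)."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def apply(self, story: Dict[str, Any]) -> None:
        pnac = story["altId"]
        if is_delete(story):
            self._items.pop(pnac, None)  # drop it; the UI should stop showing it too
        else:
            self._items[pnac] = story    # simplification: keep only the latest item

    def __contains__(self, pnac: str) -> bool:
        return pnac in self._items


cache = StoryCache()
cache.apply({"altId": "nHKS1w6r3n", "messageType": 2})
cache.apply({"altId": "nHKS1w6r3n", "messageType": 7, "pubStatus": "stat:canceled"})
print("nHKS1w6r3n" in cache)  # False
```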