Introduction

The purpose of this article is to describe the item and connection recovery mechanisms in EMA C++. It explains how EMA manages item and connection recovery and covers the related configuration parameters. It also gives guidance on the application's responsibilities in scenarios in which EMA does not perform recovery. The behaviors described here are based on EMA C++ version 1.1.0.L1; however, almost all concepts and configuration parameters should also apply to other versions of EMA C++ and to EMA Java.

Please note that connection recovery applies to both Consumer and Non-Interactive Provider applications, while item recovery applies to Consumer applications only.

 

Background

EMA incorporates the ValueAdded Reactor component (called the Elektron Transport API VA Reactor) from the Transport API, which provides the watchlist and transport-level functionality. The EMA recovery features are generally based on the features of the ValueAdded Reactor. Detailed information can be found in the ETA Value Added Components Developers Guide.

 

Connection Recovery/Fail-over

EMA provides transport-level functionality such as heartbeat management and connection and item recovery. When a connection fails, EMA automatically tries to reconnect to a server and then sends the Login request on the application's behalf. Users can configure the reconnection target in the following ways:

  • Reconnect to the same server

Users can configure a single channel name in the “Channel” parameter. When the connection fails, EMA tries to reconnect to that same server.

<ConsumerList>
    <Consumer>
        <Name value="Consumer_1"/>
        <Channel value="Channel_1"/>
    </Consumer>
</ConsumerList>

  • Failover to other servers

Users can configure a comma-separated set of channels in the “ChannelSet” parameter. Channels in the set will be tried with each reconnection attempt from left to right until a successful connection is made.

For example, with the following configuration, EMA first attempts to connect to Channel_1. If the connection to Channel_1 fails, EMA attempts to connect to Channel_2, and so on. If the connection to Channel_3 also fails, EMA wraps around and attempts Channel_1 again.

<ConsumerList>
    <Consumer>
        <Name value="Consumer_1"/>
        <ChannelSet value="Channel_1, Channel_2, Channel_3"/>
    </Consumer>
</ConsumerList>

Note: If both Channel and ChannelSet are configured, then EMA uses the parameter that is configured last in the file. For example, if <Channel> is configured after <ChannelSet> then EMA uses <Channel>, but if <ChannelSet> is configured after <Channel> then EMA uses <ChannelSet>.
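The left-to-right, wrap-around order of a ChannelSet can be pictured with a small C++ helper. This is a model of the documented behavior only, not EMA's internal code, and the function name channelForAttempt is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Models how a ChannelSet is walked: attempts go left to right through the
// list and wrap around to the first channel after the last one.
// attempt is 0-based: 0 is the first connection attempt.
// channelForAttempt is a hypothetical helper, not an EMA function.
std::string channelForAttempt(const std::vector<std::string>& channelSet,
                              std::size_t attempt)
{
    assert(!channelSet.empty());  // a ChannelSet names at least one channel
    return channelSet[attempt % channelSet.size()];
}
```

With the ChannelSet "Channel_1, Channel_2, Channel_3", attempt 3 (the fourth attempt) maps back to Channel_1.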

 

Reconnection wait time

The EMA reconnection wait time is not static: it increases with each reconnection attempt, from ReconnectMinDelay up to ReconnectMaxDelay. These parameters specify the minimum and maximum amount of time (in milliseconds) that the consumer or non-interactive provider waits before attempting to reconnect a failed channel.

The first reconnection wait time is the value configured in ReconnectMinDelay. The wait time doubles with every subsequent reconnection attempt but never exceeds ReconnectMaxDelay. The wait time is reset once a connection is successfully established.

 

Example scenario:

ReconnectMinDelay is set to 3000 milliseconds and ReconnectMaxDelay is set to 20000 milliseconds. The first reconnection attempt is made 3000 milliseconds after the disconnection. EMA then waits 6000 and 12000 milliseconds, respectively, before the next two attempts. Doubling again would give 24000 milliseconds, which exceeds ReconnectMaxDelay, so the wait time is capped and every further attempt waits 20000 milliseconds.
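The back-off rule and the scenario above can be checked with a short C++ sketch. The function reconnectDelays is hypothetical and models a single outage, since the delay starts again from ReconnectMinDelay after a successful connection.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Models the documented back-off: start at ReconnectMinDelay, double on each
// further attempt, never exceed ReconnectMaxDelay. Values in milliseconds.
// reconnectDelays is a hypothetical helper, not an EMA function.
std::vector<long> reconnectDelays(long minDelayMs, long maxDelayMs, int attempts)
{
    assert(minDelayMs > 0 && minDelayMs <= maxDelayMs && attempts >= 0);
    std::vector<long> delays;
    long delay = minDelayMs;
    for (int i = 0; i < attempts; ++i) {
        delays.push_back(delay);
        delay = std::min(delay * 2, maxDelayMs);  // double, capped at the max
    }
    return delays;
}
```

For the scenario above, reconnectDelays(3000, 20000, 5) yields 3000, 6000, 12000, 20000 and 20000 milliseconds.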

 

Reconnect Attempt Limit

Users can limit the number of times an EMA consumer or non-interactive provider attempts to reconnect a failed channel by using the ReconnectAttemptLimit configuration parameter. If the parameter is set to -1, the consumer or non-interactive provider attempts to reconnect indefinitely.

If this parameter is used with a ChannelSet, every reconnection attempt on any channel in the ChannelSet increments the reconnect attempt count.

Once the reconnect attempt limit is reached, EMA stops reconnecting and logs the error message "Received ChannelDown event on channel <channel name>" instead of the warning message "Received ChannelDownReconnecting event on channel <channel name>".
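Putting the ChannelSet order and ReconnectAttemptLimit together, the following C++ sketch models when EMA keeps reconnecting and when it gives up. The reconnect function, the ReconnectResult struct and the connectOk callback are hypothetical stand-ins for the transport. Counting every attempt, starting with the first attempt after the failure, is an assumption of this sketch rather than documented EMA internals.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of one simulated outage.
struct ReconnectResult {
    int attempts;                  // reconnection attempts made
    bool connected;                // true if some channel accepted the connection
    std::vector<std::string> log;  // messages modeled on the EMA log entries above
};

// Hypothetical model of ReconnectAttemptLimit applied to a ChannelSet.
// Assumptions: every attempt on any channel counts toward the limit, and
// attemptLimit == -1 means "retry forever". connectOk stands in for the
// real transport and returns true when a channel connects.
template <typename ConnectFn>
ReconnectResult reconnect(const std::vector<std::string>& channelSet,
                          int attemptLimit, ConnectFn connectOk)
{
    assert(!channelSet.empty());
    assert(attemptLimit == -1 || attemptLimit > 0);
    ReconnectResult r{0, false, {}};
    while (true) {
        // Walk the ChannelSet left to right, wrapping around.
        const std::string& channel = channelSet[r.attempts % channelSet.size()];
        ++r.attempts;
        if (connectOk(channel)) {
            r.connected = true;
            return r;
        }
        if (attemptLimit != -1 && r.attempts >= attemptLimit) {
            // Limit reached: stop reconnecting and report the channel as down.
            r.log.push_back("Received ChannelDown event on channel " + channel);
            return r;
        }
        r.log.push_back("Received ChannelDownReconnecting event on channel " + channel);
    }
}
```

With three channels that all fail and a limit of 5, the sketch tries Channel_1, Channel_2, Channel_3, Channel_1 and Channel_2, and the last failure is reported as ChannelDown on Channel_2.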

 

Item Recovery

EMA uses the ETA Reactor watchlist component to manage item streams. The watchlist automatically recovers data streams in response to failure conditions, such as disconnects and unavailable services, so that applications do not need special handling for these conditions. As conditions are resolved, the watchlist re-requests items on the application's behalf. Applications can also request items before a connection is fully established; the watchlist sends those requests once the connection is fully established.

 

If EMA sends an item request but does not receive any response, it waits for a specified time, closes the request, and then re-requests the item on the application's behalf. The timeout is configured via the RequestTimeout parameter; the default value is 15 seconds. For each re-request attempt, EMA generates a Status message with the state "Open / Suspect / Timeout / 'Request timed out.'".
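The RequestTimeout cycle can be modeled as a sequence of deadlines. This C++ sketch is a toy model, not watchlist code: the function timeoutStatusTimes is hypothetical, and it assumes the item is re-requested immediately after each timeout, so the next deadline falls one RequestTimeout later.

```cpp
#include <cassert>
#include <vector>

// Toy model of the watchlist's RequestTimeout cycle (not EMA source code).
// The first request is sent at t = 0. Each time timeoutMs elapses without a
// response, a timeout status is produced and the item is re-requested,
// assumed here to happen immediately, so the next deadline is timeoutMs later.
// responseAtMs < 0 means the provider never answers. Times in milliseconds.
// timeoutStatusTimes is a hypothetical name.
std::vector<long> timeoutStatusTimes(long timeoutMs, long responseAtMs, long untilMs)
{
    assert(timeoutMs > 0);
    std::vector<long> times;
    for (long deadline = timeoutMs;
         deadline <= untilMs && (responseAtMs < 0 || responseAtMs > deadline);
         deadline += timeoutMs) {
        // Here EMA would deliver a Status message with the state
        // "Open / Suspect / Timeout / 'Request timed out.'" and re-request the item.
        times.push_back(deadline);
    }
    return times;
}
```

With the default 15-second timeout and a provider that never answers, the model produces timeout statuses at 15000 and 30000 milliseconds within the first 40 seconds.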

 

In some situations, applications may want to manage item recovery themselves. To do so, an application can set the SingleOpen element in the Login request attributes to 0 (false). The SingleOpen element indicates whether the consumer application wants the provider to drive stream recovery (1) or will drive stream recovery itself (0). With SingleOpen set to 0, a status with the stream state OmmState::ClosedRecover is received for each affected item and there is no item recovery; the application must re-request items manually to recover data.
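With SingleOpen set to 0, the application needs somewhere to remember which items must be re-requested. The following C++ sketch shows one possible shape for that bookkeeping. The ManualRecovery class and the reRequest callback are hypothetical; in a real EMA application the callback would call OmmConsumer::registerClient with a new ReqMsg for the item.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical application-side recovery list for SingleOpen = 0.
// Items whose stream arrives with a ClosedRecover status are remembered;
// when the application decides to recover (after a timer, when the service
// comes back up, ...), it re-requests them and clears the list.
class ManualRecovery {
public:
    void onClosedRecover(const std::string& itemName)
    {
        assert(!itemName.empty());
        pending_.insert(itemName);  // a set ignores duplicate notifications
    }

    // reRequest stands in for the real re-request (e.g. registerClient).
    template <typename ReRequestFn>
    std::size_t recoverAll(ReRequestFn reRequest)
    {
        const std::size_t count = pending_.size();
        for (const std::string& item : pending_)
            reRequest(item);
        pending_.clear();
        return count;
    }

    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::set<std::string> pending_;
};
```

Using a set means that several ClosedRecover statuses for the same item result in a single re-request.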

 

Non recovery scenarios

In some failure cases, EMA does not automatically recover a data stream because the stream is closed. An EMA application can check the state of a data stream via the OmmState::StreamState of a Refresh or Status message. A StreamState of Closed or ClosedRecover means that the data stream is closed.

For the Closed StreamState, the status means that the data is not available on this service and connection and is not likely to become available (for example, permission denied or RIC not found). Applications may simply log the status information (i.e. OmmState::StreamState, OmmState::DataState and OmmState::StatusText).

For the ClosedRecover StreamState, the status means that the current stream is closed; however, the data may be recoverable on this service and connection at a later time. The application can re-request the item later, for example after a delay or at a specific time.
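The guidance above boils down to a small decision per stream state. This C++ sketch uses a local enum that mirrors a subset of the stream states EMA reports through OmmState::getStreamState(); the enum, the Action values and actionFor are hypothetical and defined here only to keep the example self-contained.

```cpp
#include <cassert>

// Local stand-in for a subset of EMA's stream states.
enum class StreamState { Open, NonStreaming, Closed, ClosedRecover };

enum class Action { None, LogOnly, ReRequestLater };

// Hypothetical application policy following the guidance above.
Action actionFor(StreamState state)
{
    switch (state) {
    case StreamState::Open:
    case StreamState::NonStreaming:
        return Action::None;            // no recovery needed
    case StreamState::Closed:
        return Action::LogOnly;         // e.g. permission denied, RIC not found
    case StreamState::ClosedRecover:
        return Action::ReRequestLater;  // data may be recoverable later
    }
    assert(false && "unhandled stream state");
    return Action::None;
}
```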

 

There are various factors which can affect the StreamState and item recovery. Below are examples:

  • SingleOpen and AllowSuspectData behavior

As mentioned earlier, the SingleOpen element in the Login request changes how EMA handles item recovery. It also affects the StreamState of items. Another element, AllowSuspectData, can change the StreamState as well. AllowSuspectData indicates whether the consumer application accepts suspect data (a DataState of OmmState::Suspect) on an open stream, or prefers that any suspect data result in the stream being closed with a StreamState of OmmState::ClosedRecover.

If a combination of SingleOpen and AllowSuspectData settings is contradictory (e.g., SingleOpen indicates that the provider should handle recovery, which generates a suspect status, but AllowSuspectData indicates that the consumer does not want to receive suspect status), SingleOpen takes precedence.

The following table is from the EMA C++ RDM Usage Guide. It shows how a provider can convert messages to honor the consumer's SingleOpen and AllowSuspectData settings. The first column of the table shows the provider's actual StreamState and DataState. The subsequent columns show the StreamState and DataState produced by each combination of SingleOpen and AllowSuspectData settings.

The table uses the following abbreviations:

• SS for StreamState

• DS for DataState

For example, when data for a RIC is suspect, the feed sends a Status message with StreamState::Open and DataState::Suspect. If the consumer application sends its Login request with SingleOpen=0 and AllowSuspectData=0, the Status message received by the application is changed to StreamState::ClosedRecover and DataState::Suspect. The data stream is closed, and no further data is received on it.
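The conversion rules stated in this section can be summarized in a short C++ sketch. It is derived only from the rules given in the text (SingleOpen precedence and the SingleOpen=0, AllowSuspectData=0 example), not copied from the RDM Usage Guide table; the enums, the State struct and adjustForLogin are hypothetical, and passing Closed streams through unchanged is an assumption.

```cpp
#include <cassert>

enum class StreamState { Open, Closed, ClosedRecover };
enum class DataState { Ok, Suspect };

struct State {
    StreamState stream;
    DataState data;
};

// Sketch of the conversions described in this section:
//  - SingleOpen = 1: the provider drives recovery, so a ClosedRecover/Suspect
//    status is delivered as Open/Suspect. This wins over AllowSuspectData = 0.
//  - SingleOpen = 0 and AllowSuspectData = 0: Open/Suspect becomes
//    ClosedRecover/Suspect.
//  - Assumption: Closed streams and non-suspect states pass through unchanged.
State adjustForLogin(State s, bool singleOpen, bool allowSuspectData)
{
    if (s.stream == StreamState::Closed || s.data != DataState::Suspect)
        return s;
    if (singleOpen)
        return State{StreamState::Open, DataState::Suspect};
    if (!allowSuspectData)
        return State{StreamState::ClosedRecover, DataState::Suspect};
    return s;  // SingleOpen = 0, AllowSuspectData = 1: delivered as sent
}
```

Calling adjustForLogin with Open/Suspect, SingleOpen=0 and AllowSuspectData=0 reproduces the example above: the application sees ClosedRecover/Suspect.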

 

EMA applications can modify the SingleOpen and AllowSuspectData elements via the addAdminMsg interface, which allows an application to override the default administrative domain requests (Login, Directory, and Dictionary). See the 420__MarketPrice__AdminDomainConfig example provided in the EMA package.

    // Modify Administrative domains with ReqMsg to override default configurations.
    // Assumes "Ema.h" is included and the EMA access and rdm namespaces are in scope
    // (the namespace names depend on the EMA release).
    OmmConsumer consumer( OmmConsumerConfig().operationModel( OmmConsumerConfig::UserDispatchEnum )
                .addAdminMsg( ReqMsg().domainType( MMT_LOGIN ).name( "user" ).nameType( USER_NAME )
                .attrib( ElementList().addAscii( ENAME_APP_ID, "127" )
                .addAscii( ENAME_POSITION, "127.0.0.1/net" )
                .addUInt( ENAME_SINGLE_OPEN, 0 )          // 0: application drives item recovery
                .addUInt( ENAME_ALLOW_SUSPECT_DATA, 1 )   // 1: accept suspect data on open streams
                .complete() ) ) );   // closes attrib(), addAdminMsg() and the OmmConsumer constructor

 

  • PrivateStreams

The Private Streams feature provides applications with the ability to establish streams exclusively between two points or users. Data flowing on private streams is not shared with other users. This allows applications to provide, for example, a transactional capability to their users.

EMA does not attempt to recover a private stream that goes down, regardless of the cause of the failure and of the SingleOpen value used in the Login request. Applications receive a status message with the stream state ClosedRecover and are responsible for re-requesting the item streams.

Service goes down

Item State: Closed, Recoverable / Suspect / None / 'A31: the private stream item is closed.'

 

Connection goes down

Item State: Closed, Recoverable / Suspect / None / 'Service for this item was lost.'

Moreover, when the connection is established but the service is down or not accepting requests, the EMA consumer application should not request private streams. If the application requests a private stream before the service is available, EMA sends an internally generated status message with a Closed or ClosedRecover stream state, as in the examples below. In this case, the application should open a Directory stream to monitor the state of the service and request private streams only once the service is available.

 

Item is requested on a service that is currently down

Item State: Closed, Recoverable / Suspect / None / 'Service is down.'

Item is requested on a service that is not available (unknown service name)

Item State: Closed / Suspect / Source unknown / 'Service name of 'DIRECT_FEED' is not found.'
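The advice to check the Directory before opening a private stream can be sketched as a simple gate. The ServiceStatus struct and canRequestPrivateStream function are hypothetical application code; they assume the application has already copied the ServiceState and AcceptingRequests values from its Directory stream into a map keyed by service name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical application-side view of a service, updated from the
// ServiceState and AcceptingRequests values received on the Directory stream.
struct ServiceStatus {
    bool up;
    bool acceptingRequests;
};

// Only open a private stream when the Directory reports that the service
// exists, is up, and is accepting requests. Otherwise EMA answers with a
// Closed or ClosedRecover status, and private streams are never recovered.
bool canRequestPrivateStream(const std::map<std::string, ServiceStatus>& directory,
                             const std::string& serviceName)
{
    assert(!serviceName.empty());
    auto it = directory.find(serviceName);
    return it != directory.end() && it->second.up && it->second.acceptingRequests;
}
```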

 

Administrative domain recovery

EMA normally manages the administrative domains (Login, Directory, and Dictionary) streams on the application’s behalf. The EMA consumer uses default login, directory, and dictionary requests when connecting to a provider or ADS. Responses are handled by EMA as well. On connection failure, EMA will automatically request Login and Directory once connection is re-established. There is no re-request for Dictionary domain. The Dictionary domain will only be requested once on the first connection.

Similar to item stream recovery, the Login stream is not recovered if it is closed (e.g. the user is not permissioned to log in on DACS). However, there is an exception for a Login stream state of OmmState::ClosedRecover: EMA re-requests the Login once it receives a Login response with OmmState::ClosedRecover, because ClosedRecover implies that the user is allowed to attempt another login. For example, if DACS is enabled on the ADS but the connection between the ADS and DACS has not been established, the ADS rejects the Login request with a ClosedRecover status.
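The administrative-domain behavior described above can be modeled in a few lines of C++. Both functions and the enums are hypothetical and only restate the documented rules: Login and Directory are requested on every connection, Dictionary only on the first, and a Login ClosedRecover triggers a new Login request while Closed does not.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Administrative requests sent on a given connection (1 = first connection):
// Login and Directory every time, Dictionary only on the first connection.
std::vector<std::string> adminRequestsFor(int connectionNumber)
{
    assert(connectionNumber >= 1);
    std::vector<std::string> requests{"Login", "Directory"};
    if (connectionNumber == 1)
        requests.push_back("Dictionary");
    return requests;
}

enum class LoginStreamState { Open, Closed, ClosedRecover };
enum class LoginAction { None, GiveUp, ReRequestLogin };

// Closed is final (for example, the user is not permissioned on DACS);
// ClosedRecover means another login attempt is allowed, so EMA re-requests.
LoginAction onLoginStatus(LoginStreamState state)
{
    switch (state) {
    case LoginStreamState::Open:          return LoginAction::None;
    case LoginStreamState::Closed:        return LoginAction::GiveUp;
    case LoginStreamState::ClosedRecover: return LoginAction::ReRequestLogin;
    }
    assert(false && "unhandled login stream state");
    return LoginAction::None;
}
```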

 

References

  • EMA C++ Developers Guide
  • EMA C++ RDM Usage Guide
  • EMA C++ Configuration Guide

All EMA C++ guides can be accessed via the Message API - C++ Development Guides.