Ensemble Identifiers (Focal Theory)
In the Focal theory there are two different identifiers: Identifiers (IDFR) and Secondary Identifiers (SecId).
The IDFR is defined as follows:
“IDFR identifies an unknown Instance of a defined Ensemble and represents the Instance life cycle from cradle to grave”
The SecId is defined as follows:
“Secondary Identifiers are identifiers that the business might use to search for an instance in an Ensemble. The identifier does not need to be unique for one instance. All IDs that are used as IDFRs will also be in the SecId concept”
In the Focal theory the IDFR has one purpose only: to ensure that the integration of instances is executed correctly, without resulting in duplicates or key collisions. The IDFR therefore follows a specific pattern.
The SecId is stored as the source sends it, with no consideration of possible duplicates or key collisions. It supports business searches for specific instances of data and only has to be true at a specific point in time.
This gives a separation of concerns in the implementation: the IDFR focuses on achieving correct integration of instances, and the SecId supports business search. This way they do not interfere with each other’s function, and neither is suboptimized by having one string serve both requirements.
In Focal theory the IDFR follows a specific pattern that ensures the string is unique and that data from different sources is integrated into one instance of the defined entity (Ensemble) whenever the sources produce the same IDFR.
The different parts of the IDFR string are described below.
The idfrtoken is an alphanumeric string that is 8 characters long. Its purpose is to prevent key collisions and to ensure that instances from different sources of data are integrated correctly. The idfrtoken is built from two things, OWNER and IDFRTYPE.
OWNER represents the entity that owns, governs and sets the ID that is used to identify the unknown instance of the defined Ensemble (Focal). The entity can, for example, be a data system or an organization.
IDFRTYPE ensures uniqueness when one OWNER identifies two different kinds of Instances with the same ID series.
ID is the identifier string that the OWNER sets for a specific Instance.
The ID can be a concatenation of multiple attributes.
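The parts described above can be sketched in code. The following Python example is only an illustration under stated assumptions: the text does not say how the 8-character idfrtoken is derived from OWNER and IDFRTYPE, nor which separator is used, so the hash-based token, the `|` separator and the function names `idfr_token` and `build_idfr` are hypothetical.

```python
import hashlib


def idfr_token(owner: str, idfr_type: str = "") -> str:
    """Return an 8-character alphanumeric token for an OWNER/IDFRTYPE pair.

    Assumption: the token is the first 8 hex digits of a SHA-256 hash.
    The source only states that the token is 8 alphanumeric characters.
    """
    raw = f"{owner.strip().upper()}|{idfr_type.strip().upper()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:8].upper()


def build_idfr(owner: str, idfr_type: str, *id_parts: object, sep: str = "|") -> str:
    """Concatenate the idfrtoken and the ID (which may consist of several attributes)."""
    if not id_parts:
        raise ValueError("at least one ID attribute is required")
    ident = sep.join(str(part).strip() for part in id_parts)
    return f"{idfr_token(owner, idfr_type)}{sep}{ident}"


# Hypothetical owner and ID values, for illustration only.
single = build_idfr("CRM", "", "12345")
composite = build_idfr("CRM", "", "SE", "12345")  # ID built from two attributes
```

Because the token depends only on OWNER and IDFRTYPE, two sources that are assigned the same OWNER and send the same ID produce the same string, while a different OWNER or IDFRTYPE changes the token and keeps the strings apart.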
Here are some examples of how these parts of the IDFR string are used to avoid key collisions and achieve integration.
Life Cycle of Ensemble Instance example
To start we will look at the life cycle aspect of the IDFR definition. The idea is that an IDFR should represent the life cycle of the unknown Instance of the defined entity (Ensemble).
In this example we look at the identification of a Mobile Telecom Subscription by using the mobile phone number (MSISDN) and why it does not work very well in an analytical system even when the Business uses it to identify a Subscription/Subscriber.
From an operational business perspective, the MSISDN is a viable identification of a Subscription and should be used as a SecId, since at a specific point in time an MSISDN can only belong to one Subscription. From an analytical perspective it is not good enough, for the simple reason that it is possible to change MSISDN without starting a new Subscription, or to keep the same MSISDN and start a new Subscription. If the MSISDN were used as the IDFR for the Subscription Ensemble and the owner of a Subscription changed MSISDN, the analytical system would count that as a new Subscription entering the system, which is not true in this case.
In addition, “old” MSISDNs that subscribers have changed from will be reused for new subscribers, who might be entering the Telecom company for the first time and therefore sign a new Subscription. In this case the analytical system would recognize the MSISDN and use the existing data instance as the representation of the new Subscription, and we would lose the correct history of the Subscription.
Another scenario is that a subscriber changes Subscription, for example by moving from a company Subscription to a private Subscription when the subscriber stops working at the company but is allowed to keep the MSISDN, so the MSISDN is moved to the new Subscription. The analytical system would keep the same Subscription and update its information, which is wrong, since we would then not see that a new Subscription has entered the analytical system.
As you can see, the MSISDN does not work as an IDFR for a Subscription, since it does not represent the Subscription from cradle to grave. It is important to understand that some identifiers that work for the operational business view do not work as an IDFR in the analytical system, because they fail from a life cycle and time-variant perspective.
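The effect of a number change and a number reuse can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the row layout, the subscription numbers `S1` and `S2` (standing in for an ID that follows the Subscription from cradle to grave), the MSISDN values and the dates.

```python
from collections import defaultdict
from datetime import date

# Hypothetical source rows: (subscription number, MSISDN, valid_from, valid_to).
rows = [
    ("S1", "46700000001", date(2020, 1, 1), date(2021, 6, 1)),
    ("S1", "46700000002", date(2021, 6, 1), date.max),  # S1 changes number
    ("S2", "46700000001", date(2022, 1, 1), date.max),  # old number is reused
]


def instances_by(key_index: int) -> dict:
    """Group the true subscriptions under whichever column is used as the key."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row[key_index]].add(row[0])
    return dict(grouped)


def search_by_msisdn(msisdn: str, as_of: date) -> list:
    """SecId-style lookup: which subscription held this MSISDN on a given date?"""
    return [sub for sub, number, start, end in rows
            if number == msisdn and start <= as_of < end]


by_msisdn = instances_by(1)        # MSISDN used as the IDFR
by_subscription = instances_by(0)  # life-cycle stable ID used as the IDFR
```

Keyed by MSISDN, one instance mixes the history of two Subscriptions and the Subscription that changed number is split over two instances. Keyed by the stable ID, each Subscription is exactly one instance, and the MSISDN still works as a point-in-time search key through `search_by_msisdn`.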
The use of OWNER is important for getting a correct IDFR. In this example we look at a key collision, using an Ensemble that represents Deposit Account. The ID used to identify a Deposit Account is the Account Number. In this case the IT landscape has two different Deposit Account systems that both load the Deposit Account Ensemble. The issue is that both systems use a number series, so the same Account Number can come from both systems without being the same Deposit Account instance. By concatenating the system before the ID as a representation of the OWNER, the IDFR strings will differ and two different Deposit Account instances will be created as the data enters the analytical system.
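A minimal Python sketch of this collision follows. The system names, account numbers and holders are made up, and for readability the OWNER is written as a plain prefix with a `|` separator rather than as part of an 8-character idfrtoken.

```python
# Hypothetical system names and account data, for illustration only.
incoming = [
    {"system": "DEPSYS_A", "account_number": "1001", "holder": "Alice"},
    {"system": "DEPSYS_B", "account_number": "1001", "holder": "Bob"},
]


def load(records, key_func):
    """Load records into a dict keyed by key_func; later records overwrite earlier ones."""
    ensemble = {}
    for record in records:
        ensemble[key_func(record)] = record["holder"]
    return ensemble


# Account Number alone: the second record collides with the first.
without_owner = load(incoming, lambda r: r["account_number"])

# System as OWNER in front of the ID: two separate instances.
with_owner = load(incoming, lambda r: f'{r["system"]}|{r["account_number"]}')
```

Without the OWNER, the second system silently overwrites the first account. With the OWNER in the key, both Deposit Account instances survive.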
It is important to understand how, where and when an ID is created, because the goal is not only to avoid key collisions but also to achieve integration when it is correct to do so. Adding the system in front of every ID will prevent most key collisions, but it will hamper the ability to integrate data between different sources. In this example we look at how to use OWNER to achieve integration.
Bonds and stocks use a global identifier called the International Securities Identification Number (ISIN). Those IDs are issued by the ISIN organization. Suppose we have an Ensemble for Security Paper, which could contain bonds, stocks, obligations etc. The information about a specific Security Paper could then come from multiple different sources, including external ones. In this case it is not good to add the source as the OWNER, since no integration would be achieved and the analytical system would hold multiple duplicates of the same Security Paper. By setting the ISIN organization as the OWNER for all sources of data that send the ISIN as the identification, the IDFR looks the same regardless of the source of the instance, and integration is achieved.
There could be an argument for not using the OWNER with a global identifier like ISIN, but when creating an Ensemble there is always the danger that the same ID series enters without identifying the same instance, for example if some source sends another identifier that for some reason has the same ID string as an ISIN. It is good practice to use OWNER so that the control of integration and key collisions lies in the analytical system design and is not left to luck.
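The OWNER choice for global identifiers can be sketched as a small lookup in Python. The mapping `GLOBAL_OWNERS`, the owner label `ISIN_ORG`, the source system names and the feed rows are all hypothetical, and the OWNER is again written as a plain prefix for readability.

```python
# Hypothetical mapping from the kind of ID a source sends to the OWNER of that ID.
# IDs from a global scheme get the scheme's organization as OWNER; everything
# else falls back to the sending system.
GLOBAL_OWNERS = {"ISIN": "ISIN_ORG"}


def owner_for(source: str, id_scheme: str) -> str:
    """Pick the OWNER: the global organization if known, otherwise the source."""
    return GLOBAL_OWNERS.get(id_scheme, source)


def idfr(source: str, id_scheme: str, value: str) -> str:
    """Build a simplified IDFR string from OWNER and ID."""
    return f"{owner_for(source, id_scheme)}|{value}"


# Made-up feed: two sources send the same ISIN, a third sends a local ID
# that happens to have the same string.
feed = [
    ("TRADING_SYS", "ISIN", "US0378331005"),
    ("EXTERNAL_VENDOR", "ISIN", "US0378331005"),
    ("LEGACY_SYS", "LOCAL", "US0378331005"),
]

idfrs = {idfr(*row) for row in feed}
```

The two ISIN rows collapse into one IDFR and therefore integrate into one Security Paper instance, while the local ID with the same string stays separate because its OWNER differs.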
IDFRTYPE is the most seldom needed part of the IDFR string, but there are some cases where it is needed to prevent key collisions. This is one real-life case. A large financial institution had an Ensemble representing Loan. In the IT landscape there was one large IT system handling all kinds of loans. The loan system was designed so that each type of loan had its own table with its own number series, which was the Loan Number. A Mortgage Loan, a Leasing Loan and a Blanco Loan could all have the same Loan Number. Adding OWNER (the system in this case) to the IDFR string did not differentiate the IDFR strings; they were still the same. By setting a unique IDFRTYPE for each loan type, the IDFR strings became unique.
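The loan case can be reproduced with a short Python sketch. The system name `LOANSYS`, the type labels and the loan number are invented for illustration, and OWNER and IDFRTYPE are written out in clear text instead of being folded into an 8-character idfrtoken.

```python
# Hypothetical system, loan types and loan numbers, for illustration only.
loans = [
    {"system": "LOANSYS", "loan_type": "MORTGAGE", "loan_number": "5001"},
    {"system": "LOANSYS", "loan_type": "LEASING", "loan_number": "5001"},
    {"system": "LOANSYS", "loan_type": "BLANCO", "loan_number": "5001"},
]


def idfr_owner_only(loan: dict) -> str:
    """OWNER + ID: all three loans come from the same system."""
    return f'{loan["system"]}|{loan["loan_number"]}'


def idfr_with_type(loan: dict) -> str:
    """OWNER + IDFRTYPE + ID: the loan type separates the number series."""
    return f'{loan["system"]}.{loan["loan_type"]}|{loan["loan_number"]}'


owner_only = {idfr_owner_only(loan) for loan in loans}
with_type = {idfr_with_type(loan) for loan in loans}
```

With OWNER alone the three loans share a single IDFR string. Adding the IDFRTYPE yields three distinct strings while all of them still load the same Loan Ensemble.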
Core Business Concept Model
A common argument in these situations is to put the data in their own Ensembles: a Leasing Ensemble, a Mortgage Ensemble and so on. From a Focal theory perspective that is not the way to solve it, since it makes the analytical system IT system driven rather than business driven. If the Core Business Concept is Loan for a specific bank, then that is what should be implemented. Key collision and integration issues should not be solved by shaping the data model, since key integration is just one part of the integration logic that an analytical system must handle.
Identifier strings that work from an operational perspective might not always work from an analytical perspective. In the Focal Framework this is handled by IDFR strings that follow a predefined pattern whose sole purpose is to ensure that integration is correctly applied and that key collisions are avoided.