Solving inequality Joiner transformation limitations employing the Lookup transformation using Informatica PowerCenter
1. Introduction
The Informatica PowerCenter Joiner transformation can perform heterogeneous (same DBMS) and non-heterogeneous (multiple DBMS and Flat files) join operations between two or more data conjuncts using a key column, usually the primary key column, or multiple columns. When applying the PushDown optimization, the Joiner transformation is translated to a regular SQL Join Clause, but, with the restriction of performing only standard SQL Equi Joins (joins with an equal operator). If a Non-Equi Join (joins with a less than, greather than, different from operator) is necessary to combine multiple data conjuncts, the Joiner transformation becomes useless. Sometimes it's possible to avoid the Non-Equi Join Joiner using a case statement inside an Expression transformation and combine it with a Filter, but usually, the task becomes absurdly tricky and moreover, the time required to do it is more than the given. For such cases, the Lookup transformation comes handy. The Lookup transformation can avoid the equi joins Joiner restriction, performing Non-Equi joins, which also are translated to a SQL Join Clause when using the PushDown optimization. Unfortunately, the Lookup transformation can perform only Left (Master/Detail in Informatica PowerCenter) joins, therefore, in the case of requiring an Inner Join, a Filter Transformation is needed right after the Lookup. At the time of writing this article, Non-equi joins are still not supported by Informatica PowerCenter.
2. Informatica PowerCenter Joiner transformation
The Joiner transformation can be used mainly in two ways, first, as an expression to join two data conjuncts (tables) inside the same DBMS and using the same ODBC connection, a process known as a heterogeneous join. Second, as an expression to join two data conjuncts which they can reside in several sources, for example, between many DBMS and Flat Files (CSV, TSV, Cobol). Moreover, these sources can use a different ODBC connection; this process is known as a non-heterogeneous join.
The Joiner transformation can perform standard SQL inner/left/right joins. The left/right joins are translated into Master/Detail IPC joins. By default, when you add the first source automatically becomes the Detail and the second source becomes the Master. When the Integration Services process the joins inside the Mapping, always takes the data from the Detail source and starts to compare it with the Master. Therefore, it's cataloged as good practice to set the Detail as the source with fewer records, to improve processing speed.
As every Workflow, each Mapping inside of it can be set to execute or not a PushDown optimization. A PushDown optimization is an Informatica feature (that can be purchased separately, and doesn't come in the free 9.6.1 Informatica PowerCenter version) which allow us to translate all the Mapping logic into a standard ANSI SQL code, and push it (hence the name) directly into the DBMS. In other words, when enabling the PushDown Optimization, IPC acts as a bridge to execute the generated query in the corresponding DBMS and doesn't process any record by himself. This PushDown optimization explanation that came out of nowhere has a reason, when using the Joiner transformation if the PushDown optimization is enabled the Joiner transformation is translated to a standard SQL JOIN clause, if not, the Joiner Transformation can/can't store some information as cache data files to improve processing speed when performing future JOINS.
The following two images show the Master/Detail option inside Joiner transformation and highlight the equi-join limitation in the Condition tab.
This short Joiner transformation explanation tries to give only the big picture of its main possibilities and limitations, furthermore, doesn't try to dig into its full complexity. Finally, to have in mind, the PushDown optimization can't be executed when performing non-heterogeneous joins.
3. Informatica PowerCenter Lookup transformation
The Lookup transformation has some capabilities that are analogous to the Joiner transformation. It can join two data conjuncts with one or more key columns, performing only Left Joins (Master/Detail IPC Joins) but with the capacity to execute both non-equi joins, and equi joins.
The Lookup transformation performs a SQL SELECT clause to a predefined table which is chosen right after adding the transformation to the Mapping Designer. The selected columns inside the SELECT clause correspond to the ones inside the Lookup transformation, so, if it's necessary to obtain the first three columns of a 250 column table, you can delete the rest from the Lookup, and the auto-generated SELECT clause brings only the first three columns. Sometimes we need to pull records from a Volatile view; for such cases, we can add the columns manually inside the Lookup transformation and write every column name that the view has. Therefore, instead of the SELECT clause being generated automatically, we need to override it by the Lookup SQL Override attribute. It's imperative to match the names of the ports from the Lookup transformation to those in the Lookup SQL Override.
Once being inserted the Lookup Transformation into the Mapping Designer, we can drag ports from a second data conjunct and join it together, declaring the restrictions using the Condition tab from the Lookup transformation properties. As you can see in the image shown below, the Lookup is capable of performing both Equi-Joins and Non-Equi Joins.
By definition, the Lookup Transformation search for matches with the second data conjunct, hence by default performs a Left Join. When executing some Workflow with the Pushdown Configuration enabled, the Lookup Transformation is translated to a standard LEFT SQL JOIN clause.
This short Lookup Transformation explanation tries to give a partial knowledge of how it works and doesn't try to dig deeper. However, to have in mind, like the Joiner Transformation, you need to set the correct Lookup Cache configuration according to your necessities unless you enable the PushDown optimization. The Lookup Transformation can be configured to work as connected or unconnected, but the unconnected Lookup can retrieve only one column.
4.- Real-life practical example
The objective of the following example is to solve the Joiner Non-Equi Join limitations using the Lookup transformation instead. To give a sense of real-life application, we are going to suppose that we are a Sales company, and each transaction can occur due to a sell in the Online or Retail Store. Each transaction has an ID_TRANSACTION which works like a key to relating the Customers table with the Online and Retail store table. The following image shown below is an entity-relationship diagram (ERD) of our 4 tables, Customers, Transactions, Online Store, and Retail Store.
Now, let's suppose we want two reports. The first one, a report of the behavior of all customers for all the time there were registered, and the second one, a report of all customers for 3 months after their registration date. The rest of the section solves the logic for the first and second report, but first, let's start with the prerequisites.
4.1 Prerequisites
First, we need to create the four tables, Customers, Online_Store, Retail_Store, and Transactions, plus one table which works as a Target for the Mapping, Report_Table. Second, we are going to populate all the tables with the snippets below, and finally, we are going to import the 4 tables as a source in the Source Analyzer. For this example, all the tables are created in an Oracle 12c server, and I strongly recommend you to change the NLS_DATE_FORMAT session parameter, to the ANSI standard:
ALTER SESSION SET NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';
4.1.1 Creating and populating Tables
The Customers table has the ID primary Key column, which is used to Join the Customers with the Transactions table. Moreover, we are going to populate the Customers table with three customers.
CREATE TABLE "CUSTOMERS"(
"ID" INTEGER GENERATED by default on null as IDENTITY,
"NAME" CHAR(15),
"COUNTRY" CHAR(10),
"REGISTRATION_DATE" DATE,
"DEREGISTRATION_DATE" DATE,
CONSTRAINT customers_pk PRIMARY KEY (ID));
INSERT INTO CUSTOMERS VALUES (1,'AZEEM','MX','2019-01-01',NULL);
INSERT INTO CUSTOMERS VALUES (2,'OSCAR','USA','2019-01-10','2019-04-12');
INSERT INTO CUSTOMERS VALUES (3,'BECERRIL','MX','2019-01-10',NULL);
The Transactions table has the ID_CUSTOMER column to relate to the Customers table, and the ID_TRANSACTION column to relate to the Online_Store and Retail_Store tables.
CREATE TABLE "TRANSACTIONS"(
"ID_CUSTOMER" INTEGER,
"ID_TRANSACTION" INTEGER GENERATED by default on null as IDENTITY,
"STORE" CHAR(20),
"TRANSACTION_DATE" DATE,
CONSTRAINT transactions_pk PRIMARY KEY (ID_TRANSACTION));
For the three customers, we have a total of five transactions, which could be by the Retail or Online store.
INSERT INTO TRANSACTIONS VALUES (1,1,'ONLINE','2019-01-05');
INSERT INTO TRANSACTIONS VALUES (1,2,'ONLINE','2019-02-13');
INSERT INTO TRANSACTIONS VALUES (1,3,'RETAIL','2019-02-14');
INSERT INTO TRANSACTIONS VALUES (2,4,'ONLINE','2019-02-01');
INSERT INTO TRANSACTIONS VALUES(2,5,'ONLINE','2019-05-20');
The Online_Store table has 4 records which are related to the 4 Online transactions on the Transactions table.
CREATE TABLE ONLINE_STORE(
"ID_TRANSACTION" INTEGER GENERATED by default on null as IDENTITY,
"COUNTRY" CHAR(20),
"ITEMS" INTEGER,
"AMOUNT" NUMBER (15,3),
CONSTRAINT online_store_pk PRIMARY KEY (ID_TRANSACTION));
INSERT INTO ONLINE_STORE VALUES (1,'USA',2,2421.87);
INSERT INTO ONLINE_STORE VALUES (2,'MEX',1,118.23);
INSERT INTO ONLINE_STORE VALUES (4,'MEX',1,9.99);
INSERT INTO ONLINE_STORE VALUES(5,'USA',5,168798.43);
The Retail_Store has just one record which is related to the one Retail transaction on the Transactions table.
CREATE TABLE RETAIL_STORE(
"ID_TRANSACTION" INTEGER GENERATED by default on null as IDENTITY,
"COUNTRY" CHAR(20),
"ITEMS" INTEGER,
"AMOUNT" NUMBER (15,3),
CONSTRAINT retail_store_pk PRIMARY KEY (ID_TRANSACTION));
INSERT INTO RETAIL_STORE VALUES (3,'MEX',3,6739.99);
Finally, the Report_Table receives all the records from both reports.
CREATE TABLE REPORT_TABLE(
"ID" INTEGER,
"NAME" INTEGER,
"COUNTRY" CHAR(20),
"STORE" CHAR(15),
"TOS_ITEMS" INTEGER,
"TOS_AMOUNT" NUMBER(15,3),
"TRS_ITEMS" INTEGER,
"TRS_AMOUNT" NUMBER(15,3));
4.1.2 Importing tables definition into the Mapping Designer
After importing all the tables into the Mapping Designer by the Source Analyzer, we can have an environment like the image below.
4.2. All-time transactions for each user, report
We have our four source tables, a total of 3 customers, 5 transactions of which 4 were due to the online store and 1 by the retail store, also in each transaction were bought a different quantity of items creating a certain spent amount. We want all the data universe from the Customers table. Therefore we need to join it with the Transactions table by the ID column, and later join this new data universe with both the Online and Retail store table by the ID_TRANSACTION column. Since we need all Customers data, such join process is realized with a LEFT join, and an equijoin does all the joins. The following snippet shows the all-time transactions for each user report.
SELECT ID, NAME, COUNTRY, STORE,SUM(O_S_ITEMS) TOS_ITEMS, SUM(O_S_AMOUNT) TOS_AMOUNT, SUM(R_S_ITEMS) TRS_ITEMS, SUM(R_S_AMOUNT) TRS_AMOUNT
FROM (
SELECT CU.ID, CU.NAME, CU.COUNTRY, TR.STORE STORE, TR.TRANSACTION_DATE, OS.COUNTRY O_S_COUNTRY, OS.ITEMS O_S_ITEMS, OS.AMOUNT O_S_AMOUNT, RS.COUNTRY R_S_COUNTRY, RS.ITEMS R_S_ITEMS, RS.AMOUNT R_S_AMOUNT
FROM CUSTOMER CU LEFT JOIN TRANSACTIONS TR
ON CU.ID=TR.ID_CUSTOMER
LEFT JOIN ONLINE_STORE OS
ON TR.ID_TRANSACTION=OS.ID_TRANSACTION
LEFT JOIN RETAIL_STORE RS
ON TR.ID_TRANSACTION=RS.ID_TRANSACTION
) AG
GROUP BY ID, NAME, COUNTRY, STORE
ORDER BY ID ASC;
Moreover, the final report is shown in the image below, notice that the report is grouped by all the transactions by ID, NAME, COUNTRY and STORE.
Translating this query to a Mapping doesn't involve any trouble, and even it becomes a routine task. We must use a Joiner transformation, configure it as Master/Detail depending on which source we add first and join the four tables, setting the condition inside the Joiner properties as an equijoin (=).
4.3. All transactions for each user within three months, report
Like the all-time transactions for each user report, we have the same 3 customers, but not all the transactions were due within the first three months after their registration date, and moreover the user Becerril doesn't have any transaction yet. The transaction number 5, realized by the user Oscar, was done in the fourth month, so we don't want this transaction to appear in the final report, but we want to maintain the original Customers table universe. To achieve this, we need to modify our last code snippet and add a second condition to our first LEFT join, specifying that the transaction date should be three months less than the registration date. The less than is the keyword here, because it involves a Non-Equi Join. The SQL code for the second report is described in the snippet below.
SELECT ID, NAME, COUNTRY, STORE,SUM(O_S_ITEMS) TOS_ITEMS, SUM(O_S_AMOUNT) TOS_AMOUNT, SUM(R_S_ITEMS) TRS_ITEMS, SUM(R_S_AMOUNT) TRS_AMOUNT
FROM (
SELECT CU.ID, CU.NAME, CU.COUNTRY, TR.STORE STORE, TR.TRANSACTION_DATE, OS.COUNTRY O_S_COUNTRY, OS.ITEMS O_S_ITEMS, OS.AMOUNT O_S_AMOUNT, RS.COUNTRY R_S_COUNTRY, RS.ITEMS R_S_ITEMS, RS.AMOUNT R_S_AMOUNT FROM
CUSTOMERS CU LEFT JOIN TRANSACTIONS TR
ON
CU.ID=TR.ID_CUSTOMER
--INEQUI JOIN
AND
ADD_MONTHS(CU.REGISTRATION_DATE,3)>=TR.TRANSACTION_DATE
LEFT JOIN ONLINE_STORE OS
ON
TR.ID_TRANSACTION=OS.ID_TRANSACTION
LEFT JOIN RETAIL_STORE RS
ON
TR.ID_TRANSACTION=RS.ID_TRANSACTION
) AG
GROUP BY ID, NAME, COUNTRY, STORE
ORDER BY ID ASC;
The following image is their snippet output. It's possible to see that the Oscar user has spent only 9.99 for one item online sell within the first three months, and the other five items that were sold in the '2019-05-20' date, doesn't appear in the final result. Notice that all users stored in the Customers table appear.
However, what happens if we substitute the non-equi join LEFT JOIN with a WHERE clause? Indeed, it removes the Oscar customer sell from the '2019-05-20' date, but also the Becerril user who hasn't bought anything yet, and our main task is to preserve the original Customers table universe. The following script replaces the non-equi join with the WHERE clause.
SELECT ID, NAME, COUNTRY, STORE,SUM(O_S_ITEMS) TOS_ITEMS, SUM(O_S_AMOUNT) TOS_AMOUNT, SUM(R_S_ITEMS) TRS_ITEMS, SUM(R_S_AMOUNT) TRS_AMOUNT
FROM (
SELECT CU.ID, CU.NAME, CU.COUNTRY, TR.STORE STORE, TR.TRANSACTION_DATE, OS.COUNTRY O_S_COUNTRY, OS.ITEMS O_S_ITEMS, OS.AMOUNT O_S_AMOUNT, RS.COUNTRY R_S_COUNTRY, RS.ITEMS R_S_ITEMS, RS.AMOUNT R_S_AMOUNT FROM
CUSTOMERS CU LEFT JOIN TRANSACTIONS TR
ON
CU.ID=TR.ID_CUSTOMER
/*
--INEQUI JOIN
AND
ADD_MONTHS(CU.REGISTRATION_DATE,3)>=TR.TRANSACTION_DATE
*/
LEFT JOIN ONLINE_STORE OS
ON
TR.ID_TRANSACTION=OS.ID_TRANSACTION
LEFT JOIN HR.RETAIL_STORE RS
ON
TR.ID_TRANSACTION=RS.ID_TRANSACTION
--WHERE CLAUSE
WHERE ADD_MONTHS(CU.REGISTRATION_DATE,3)>=TR.TRANSACTION_DATE
) AG
GROUP BY ID, NAME, COUNTRY, STORE
ORDER BY ID ASC;
The image below shows the snippet output. Notice that the user Becerril doesn't appear.
Translating this Nonequi join SQL query logic into an IPC Workflow-Mapping becomes a tricky task, due to the Nonequi join Joiner limitations, we can't use such transformation. To solve this obstacle, we should use the Lookup transformation instead, because of the possibility to perform nonequi joins.
4.4 Informatica Powercenter Lookup Mapping
To translate the All transactions for each user within three months report SQL query logic into an IPC Workflow mapping; first, we need to create the: m_nonequi_join_lookup Mapping, in which we will add the three Source Tables: Customers, Online_Store and Retail_Store plus the Target table, Report_Table.
Then, we need to create a Transformation expression dragging al the ports from the Customers table and create a new port: THREE_MONTHS, in which we add three months after the registration date; this task is done with the ADD_TO_DATE IPC built-in function.
To be able to join the Transactions table with the Customers and both Sells tables, we need to import it using the Lookup transformation.
Then, we need to drag all the Customer table ports into the Lookup transformation and set the condition to TRANSACTION_DATE<=THREE_MONTHS
Later, we add two Joiner Transformations. The first one JNR_OS, to join the Lookup transformation with the Online_Sells table. The second one JNR_RS to join the JNR_OS with the Retail_Sells table.
Then, we add an Aggregator transformation: AGG_TS, in which we drag all the ports from the JNR_OS, and create inside of it four new ports from which we do the SUM() of the Online_Sells and Retail_Sells tables.
Finally, we drag the ports from the AGG_TS directly to the Report_Table target table and save the m_nonequi_join_lookup mapping. The final mapping: m_nonequi_join_lookup, looks like the image below.
To execute the m_nonequi_join_lookup mapping, we need to create a Workflow: wf_nonequi_join_lookup, add a session, and configure the corresponding ODBC connections.
Running the wf_nonequi_join_lookup workflow, we can see the statistics through the PowerCenter Workflow Monitor, and querying all the rows from the Report_Table we can see that the m_nonequi_join_lookup mapping using the Lookup transformation inserts the same four rows that we could get using the original SQL for the second report.
5. Conclusions
Always it comes to company believes when developing some software or service. It could look like doesn't have any sense for Informatica to create a Joiner transformation which by default can't perform all the standard SQL JOIN clause possibilities. However, the Lookup transformation can offer multiple actions that the Joiner can't, for example, can identify if an incoming record is a new record if so insert it or update it otherwise, thanks to the Lookup cache. Also, it can retrieve rows from a Volatile view with a customized SQL query, without the necessity to create a Source Qualifier transformation. Moreover, as I demonstrated in this article, it can perform all the SQL Join clause instances. Nevertheless, it's crucial to have in mind if the source from the Lookup transformation con lead to duplicated rows, because the Lookup transformation retains only one value, and doesn't maintain all the matches. To perform the exercise, the workflow: wf_nonequi_join_lookup, must be set to operate with the PushDown optimization enabled. Otherwise, the Lookup transformation retains just one encountered value.
Thanks for Reading!
Azeem.