Parameterized SQL and Static Columns in Power Query in Excel =]
I recently had some interesting problems and learned some cool solutions while working in Power Query in Excel =] !
The first problem I had was that I was trying to create a dynamic SQL query based on data in another query. While I could manually change the IN clause, I'd rather find a way to do it dynamically so I could simply hit refresh and go. I had done this before between Excel worksheet objects and VBA, but Power Query was another beast entirely. After the fact, though, I found the same principle applies in Power Query as in VBA, even if the route to arriving there is a little different, haha. What I had to do was essentially re-create the string that would normally go into the SQL query, but within Power Query, then feed that as a parameter to the existing query. While this sounds simple, it was a bit difficult to learn how to do within the Power Query ecosystem, and I found far fewer resources discussing it than I had found for the similar VBA solution previously. That's probably not surprising, because Power Query is much newer and likely has a lower regular usage rate compared to VBA, so fewer people are talking about it. Or perhaps I'm bad at googling, haha!
So, I started by referencing the desired query in another query, then isolating the desired column of data I wanted to use within the IN Clause. From there I was able to drill down within the column to create a list of values. Then I was able to apply
= List.Split(List.Distinct(drilled down column),<split number>)
to remove any duplicates and split the column into usable chunks. This List.Split is important for two reasons. Primarily, Oracle databases only allow 1,000 expressions within a single IN clause (I think this is a general list limitation as well, but I'm not well versed in the technical details as to why), so if I wanted to use these for an Oracle database query, I'd need chunks of no more than 1,000 values. The lesser, secondary reason is that if access to a DSN wasn't available, I had to load this final query-prep table to an Excel worksheet, copy the [Column[#Data]], and paste it into a query tool. This matters because Excel limits a cell to 32,767 characters, truncating anything past that limit (and breaking the query as a result)!
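Putting those steps together, a minimal M sketch might look like this (the query name #"Source Data" and the column Search_Values are hypothetical placeholders for illustration):

```
let
    Source = #"Source Data",            // the referenced query
    ValueList = Source[Search_Values],  // drill down: column as a list
    Deduped = List.Distinct(ValueList), // remove duplicates
    Chunks = List.Split(Deduped, 1000)  // list of lists, 1,000 values max each
in
    Chunks
```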
After successfully de-duping and splitting the list, I could then transform those lists back into a table and add a custom column using Text.Combine() to turn each list chunk into a delimited string, such as 'value1','value2',... etc. Then I could add further custom columns to continue constructing the string, such as "Search_Column IN ('" & [Custom Column] & "') ". Now, the fun part about this step is when I learned how to add "OR" keywords to bypass the 1,000-argument IN clause Oracle limit by phrasing it: "Search_Column IN ('this list') OR Search_Column IN ('that list')". I realize I'm playing this a little fast and loose, but hopefully I'm getting the point across without getting too stuck in the details 😅. In Python, it would be quite simple to zip two lists together to create ['this list', 'OR', 'that list', 'OR', ...etc.], but how to do this in Power Query wasn't so readily clear to me. Then I learned about Table.RowCount(), which, combined with Split Column by Delimiter (by custom character, into rows), allowed me to create the desired interleaving of string arguments and "OR"s. In some cases, the arguments for the IN clause may number fewer than the chunk size defined in List.Split(list_to_be_split, argument), resulting in only one row being created. So, my new custom column went something like
= if Table.RowCount(#'Previous Step") = 1, then "'Search_Column IN ('" & ['Custom Column'] & "') " else "'Search_Column IN ('" & ['Custom Column'] & "') " & "<|>" & "OR"
What this allowed me to do was add the "OR" keyword when there were multiple rows returned, which I could then split into rows on the custom character "<|>" =] ! With that done, I had one last item of business to attend to: the dangling "OR" that would inevitably result whenever more than one row returned. Using similar logic, I would remove the last row, then make that step conditional, such that if the previous step had only one row, return that step's table as-is, else remove the bottom row. Success ^=^ ! Now, at this point, I could remove the extra columns and load it to a table, as previously mentioned, to be copied and pasted into a query tool. If I had access to a DSN and could query within Power Query directly, then my next step would be to add a custom column with a single value, such as 1, and do a Group By on my query column against that new custom column to create a single aggregated text string, a la a modified Group By using Text.Combine(), as Rick De Groot details in his blog.
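The whole chunk-to-clause pipeline above can be sketched end to end in M. This is a toy version with made-up values and the hypothetical column name Search_Column, not the exact query I used:

```
let
    // toy input: chunks of at most 2 values (in practice, 1000)
    Chunks = List.Split({"A", "B", "C", "D", "E"}, 2),
    AsTable = Table.FromList(Chunks, Splitter.SplitByNothing(), {"Chunk"}),
    // 'value1','value2' style delimited string per chunk
    Quoted = Table.AddColumn(AsTable, "InList",
        each "'" & Text.Combine([Chunk], "','") & "'"),
    // wrap each chunk in its own IN clause; append "OR" only when
    // more than one chunk row exists
    WithOr = Table.AddColumn(Quoted, "Clause",
        each if Table.RowCount(Quoted) = 1
             then "Search_Column IN (" & [InList] & ") "
             else "Search_Column IN (" & [InList] & ") " & "<|>" & "OR "),
    // split on the custom delimiter so each clause/OR lands on its own row
    Split = Table.ExpandListColumn(
        Table.TransformColumns(WithOr, {"Clause", each Text.Split(_, "<|>")}),
        "Clause"),
    // drop the dangling trailing "OR" unless there was only one chunk
    Trimmed = if Table.RowCount(Split) = 1
              then Split
              else Table.RemoveLastN(Split, 1)
in
    Trimmed
```

The Clause column then reads IN (...), OR, IN (...), OR, IN (...) down the rows, ready to be aggregated into one string with a Text.Combine() Group By.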
Then, in another query, I could submit a sample query against the DSN. Next, in the Advanced Editor, I could insert a step that defines a parameter pointed at the previous "prep query", specifying the table column to use as that parameter's value. Finally, I could replace the sample query's IN clause with the parameter, such as
let Query_Parameter = #"SQL Prep Query"{0}[Table Column With SQL String], Source = ODBC... WHERE (" & Query_Parameter & ")...continued SQL...etc.,
as similarly detailed elsewhere for worksheet values.
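Fleshed out, the parameterized query might look something like the sketch below. The DSN string, schema, and table names are hypothetical stand-ins; Odbc.Query takes a connection string and a SQL text, so the prep query's aggregated string can simply be concatenated in:

```
let
    // pull the single aggregated WHERE fragment from the prep query
    Query_Parameter = #"SQL Prep Query"{0}[Table Column With SQL String],
    // hypothetical DSN and table; the parameter is spliced into the SQL text
    Source = Odbc.Query(
        "dsn=My_DSN",
        "SELECT * FROM Some_Schema.Some_Table WHERE (" & Query_Parameter & ")")
in
    Source
```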
Voilà, parameterized, custom SQL queries based on your data =] 🙌 ! Of course, this created another problem, haha, this time with pivoting columns in Power Query. However, this one has a shorter explanation, so hang in there for a little longer =D . When I first used this new feature, I ran it over about a year's worth of data, so it's fair to say this was a diverse and possibly representative sample. My corresponding pivoted output had all the columns from the queried data it needed, yes!
However, the following week, when I attempted to refresh this over only a week's worth of recent data, Power Query threw an error because columns were missing. Turns out, unsurprisingly, that this smaller dataset didn't return all the values I had seen over the previous year. Shocker, I know 😱! So, my question was: how do I keep all the columns needed, even if the data isn't there? Simple answer discovered: create an empty placeholder table with the column names and append the data to that. Two catches to this. First, unless one is working directly in the Advanced Editor, the GUI will append the placeholder table to the pivot step, which reveals that only matching columns in the first table will be appended. More on that in a minute 👆. This is easily fixed by reversing the arguments in the step so that the data is appended to (more like filling out!) the placeholder table with all the required columns. Second, what if new data with new columns is returned? Under the current method, they would not be added, since they aren't in the placeholder table 😰. That's no good (╯°□°)╯︵ ┻━┻ ! So, to avoid that, add a check after the pivot step, but before the append step, which uses
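The placeholder-plus-append idea is quick to sketch in M. The column names here are hypothetical examples; #table with an empty row list builds the empty placeholder, and putting it first in Table.Combine keeps its column set in front, with any columns absent from the week's data filled with null:

```
let
    // empty placeholder with every column the report must always have
    Placeholder = #table(
        {"Week", "Category_A", "Category_B", "Category_C"},
        {}),
    // placeholder first, so the data "fills out" the required columns
    Appended = Table.Combine({Placeholder, #"Pivoted Data"})
in
    Appended
```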
= if List.Count(List.Difference(Table.ColumnNames(#"Previous Pivot Step with possibly new data"), Table.ColumnNames(#"Placeholder Table query"))) > 0 then null else #"Previous Pivot Step with possibly new data"
which will return null (and so throw an error downstream) if new columns are present; otherwise, carry on as usual.
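A small variant of that check, again with hypothetical step names, raises a descriptive error instead of a bare null, so the refresh failure tells you exactly which columns are new:

```
let
    // column names present in the new pivot but missing from the placeholder
    NewCols = List.Difference(
        Table.ColumnNames(#"Pivoted Data"),
        Table.ColumnNames(#"Placeholder Table")),
    // fail loudly with the offending names, else pass the table through
    Checked = if List.Count(NewCols) > 0
              then error "New columns found: " & Text.Combine(NewCols, ", ")
              else #"Pivoted Data"
in
    Checked
```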
I'm sure there are better ways to address these issues, but I hope you found this train of thought interesting. I'd love to hear your responses and experiences in the comments below =] . Thanks for your patience and interest in reading this ^=^ !