Inspiring Ingenuity

Alteryx, Bicycles and Teaching Kids Programming.

Alteryx: Processing Groups of Records in a Macro

4 Comments

I have been working on a blog post that is going to integrate a variety of different aspects of Alteryx together to paint a larger story.  As I have been building towards it, I have run into a variety of challenges that have grown into blog posts in their own right.  It started with Weighted Medians and continued on with Downloading from TIGER.  Today I ran into a very common problem – I needed to process records in groups.  The solution I outline is extensible to any macro; it should be a useful technique for any data artisan.

I wanted to change the Weighted Median macro in order to make it work with groups of records.  Instead of a single weighted median value, I want to return a separate value per group.  Sometimes it is possible to add a List control to a macro and have custom actions and make it all work, but it can be challenging.  If you look at the weighted average macro that is built in to Alteryx, you will see it working this way, but if you look a little deeper, you will see how challenging it is.  In that case it required an Update Raw XML action to do it.

In the case of the weighted median, the algorithm is complicated enough that it would be very difficult.  Fortunately Alteryx has a mechanism  to break any input into groups while it is processing – it is called a batch macro.  A batch macro does 2 different things that are fairly different from each other.  The first is the split a stream of records into groups and process them one group at a time.  The second is to reconfigure the macro at runtime based on data coming in instead of configuration time information.  In this case, we only need the former.

imageWe can accomplish our task very quickly just by converting the macro to a batch macro.  The only issue with this is that consuming it now becomes a little more difficult.  Turning it into a batch macro automatically creates a new second input labeled with a ¿ for the control parameters.  In this case we have no control parameters, but that input is required.  That input is what controls how many groups are going to be run.  We can quickly modify our module that consumes this macro to group by (Summarize tool) the records on the grouping field and pass that on to the control input.  That gives one record per group.  It is also important to go to the Group By tab on the properties of the macro and pick the grouping field.

As long as you are not consuming this macro in many places, this is a fine solution, but if you are giving this macro to other users it makes it difficult to consume.  One of the luminaries in the world of computer science, David Wheeler, once said “All problems in computer science can be solved by another level of indirection.”  This is exactly true in this case, we just need another level of macro. The answer is to create a new macro that presents the simple user interface and deals with the complexity of consuming a batch macro.

I started by saving 2 copies of the original macro, called WeightedMedian_Batch and WeightedMedianInner.  The inner one I converted to a batch macro, and I also removed the questions and actions related to the field names.  The only question left is the percentile.  On the outer one, I removed all the tools that did all the work, but I kept the initial select tool.  I also needed to keep the numeric up/down question for the percentile as well as creating an action to pass the value on to the inner macro.   I added a new dropdown for Grouping field.  In this case I check the box for “Include [None]”.  I created a new action using the same technique as the older macro renaming the input field in a formula tool.

As an exercise for the reader: I allow the user to not select a grouping field (via “Include [None]”) but I am still using the same batch macro.  How does it work?

The modules can be downloaded here.

Thanks for reading,

ned.

Advertisements

4 thoughts on “Alteryx: Processing Groups of Records in a Macro

  1. Response to exercise:

    By selecting “None” for the grouping field you’re telling the macro that each value within each field (i.e., NumberOfKids, MilesToNearestSchool, etc.) represents its’ own group. Without giving it all away, the first half of the Update/Change Value expression for the Grouping field -more specifically, this part: “”” -is what lead me to that conclusion.

    Am I close?

    • Correction:
      …each value within each field (i.e., NumberOfKids, MilesToNearestSchool, etc.) are all part of the same group.

  2. Pingback: Month in Review: September | Ned Harding

  3. This post is getting lots of spam, so I closed comments.