Inspiring Ingenuity

Alteryx, Bicycles and Teaching Kids Programming.

Alteryx: Open Source YXDB

A few years back, we mentioned an open source YXDB reader/writer on LinkedIn.  After that, a whole lot of nothing.  It turns out that Alteryx did release the open source YXDB code, but so quietly that no one noticed.  The code is used inside an R plugin, which had to be GPL'd because of R's licence.  Since it was never published as a way to read and write YXDBs, it went unnoticed.

The thread on LinkedIn was recently revived, so I decided it was time to expose it to a bigger audience.


Alteryx: XLSX Wildcard inputs

A few people have been using the macro I wrote about in Alteryx: Wildcard Inputs, but have had an issue with XLSX files.  The first thing to remember is that the macros I post (on my personal blog) are examples only and are not a supported part of the product.  I am happy to give people advice on how they might take what I did and extend it.  However, in this case, I thought it might make a good post about Alteryx macros with optional parameters, so I went ahead and did it anyway.

Alteryx: Aggregate Formulas

One of the really great strengths of Alteryx is that it can handle any amount of data that you throw at it.  If your data is small enough, it might all be in memory, but when Alteryx gets more data than fits, it silently swaps out to disk.  This way people are routinely processing data sets that are 2, 10 or even 100 times bigger than the memory they have available!

Mostly the user never notices this aspect of the Alteryx engine and it just works.  There are times though when we get feature requests that would be much easier to implement if all the data were in memory.  One example of that is aggregate functions in the formula tool.  Since other desktop products that are similarly easy to use, like Tableau and Excel, have simple SUM and AVG type functions in their formulas, people assume that Alteryx would too.


Alteryx: Optimizing Modules for Speed

One of the most common questions I get about Alteryx is: “How can I make my module run faster?”  Although Alteryx can be very fast, since it is such a general tool, it is only as good as the module that you have authored.  There is a very simple guideline that you can follow to make a module faster:  do less work.  The most common example of doing less work is to use the select tool as early as possible to remove fields that you are no longer using.  To show the process I use to make an Alteryx module run faster, I am going to walk through optimizing my Percentile Macro to run as fast as possible.



Alteryx: Percentile Macro

Update – there is an updated version of this macro in the post:  Alteryx: Optimizing Modules for Speed.

There was a recent question on the Alteryx forum: How to use the percentile in summarize.  The question misunderstands the percentile function in the Summarize tool and is looking for something slightly different, although with similar math.  So what does the percentile in the Summarize tool do?  From the help:

Percentile: Calculates the specified percentile value for the group. The percentile is calculated by sorting the data and returning the row value relative to the specified percentile and its position in the sorted array – the largest value is the 100th percentile, lowest value is the 0 percentile, median is the 50th percentile, the 25th percentile is the value in the middle of the median and minimum, etc.
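To make that description concrete, here is a minimal Python sketch of a sort-and-pick percentile.  It is not the Summarize tool's actual code: the function name `percentile` and the round-half-up rule for choosing a position are my own assumptions, since the help text does not say how positions that fall between two rows are handled.

```python
def percentile(values, p):
    """Return the value at percentile p (0 to 100) of a group.

    Sketch only: the data is sorted and the row whose position is closest
    to p percent of the way through the sorted array is returned.
    The rounding rule (round half up) is an assumption.
    """
    if not values:
        raise ValueError("percentile of an empty group is undefined")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")

    ordered = sorted(values)
    # Position 0 is the minimum (0th percentile) and position n - 1 is
    # the maximum (100th percentile).
    position = int(p / 100 * (len(ordered) - 1) + 0.5)
    return ordered[position]


group = [30, 10, 50, 20, 40]
lowest = percentile(group, 0)
median = percentile(group, 50)
highest = percentile(group, 100)
```

With this group, the 0th percentile is the lowest value, the 50th is the median and the 100th is the largest value, which matches the wording of the help.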



Alteryx: In Database Processing (Project LockIn)

The Alteryx engine is known for being fast.  I would like to think that the processing engine is as fast as or faster than any other data engine out there.  I learned how to program on a computer with 16 KB of memory and a 2 MHz 8-bit CPU.  Learning how to program in an environment like that forces you to learn how to count bits and clock cycles.  Taking that mentality and applying it to a modern computer leads to very quick processing.

The difficulty though is that Alteryx is limited by the speed at which it can read and write data from where it is stored.  Having a super fast data processing engine doesn’t help if you have to pull a terabyte of data from a data warehouse only to find a subset of it and produce a report.  It doesn’t help to be fast if it is slow to get the needed data.
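As a rough illustration of the difference, here is a small Python sketch using the standard sqlite3 module in place of a real data warehouse.  The `sales` table, its columns and the function names are all hypothetical, and this says nothing about how Alteryx itself does in-database processing.  It only contrasts pulling every row and filtering locally with letting the database do the filtering.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory stand-in for a data warehouse (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    rows = [(i, "west" if i % 10 == 0 else "east", float(i)) for i in range(1, 1001)]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def total_client_side(conn, region):
    """Pull every row out of the database, then filter and sum locally."""
    pulled = conn.execute("SELECT id, region, amount FROM sales").fetchall()
    total = sum(amount for _, row_region, amount in pulled if row_region == region)
    return total, len(pulled)


def total_in_database(conn, region):
    """Let the database filter and sum, so only the answer is transferred."""
    pulled = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchall()
    return pulled[0][0], len(pulled)


conn = make_demo_db()
local_total, local_rows = total_client_side(conn, "west")
db_total, db_rows = total_in_database(conn, "west")
```

Both functions return the same total, but the first one moves all 1000 rows out of the database to get it, while the second moves a single row.  On a toy table that hardly matters.  On a terabyte, it is the whole story.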
