I was working on a post with some interesting demographics for this blog and came across the need for a weighted median, which is not a statistic the Alteryx Summarize tool provides. A weighted median is a subtle concept; in order to explain it, let me start with a real world, although simplistic, example.
Say you were a local school and wanting to find the distance that students walk to school. You might have a data set of households that looks something like this:
You could take an average of the distance, but that would be misleading. Some families don’t have kids at all. Some have multiple kids. What we really want is a median of how far each kid walks. The family with 6 kids pulls the median more in that direction than it would have if they only have 1 kid. The households with 0 kids have no impact and the result we want.
One way to do this in Alteryx is to use a Generate Rows tool in Alteryx to create a record for each kid (filtering out the 0 records) and then take a median. In this specific case, that is fine because there are only 16 kids. But imagine you were doing this for the whole country, you don’t want to magnify your data set that much! Also, instead of kids and school, you might want a weighted median on stock market returns weighted my market capital. Do you create a record for each dollar of market cap?
So, once again, how do we add a feature to Alteryx that doesn’t exist? We write a macro. Since we are looking for the value which 50% of the weighted records are above and 50% below, we start with a summarize tool to get the total weight. We can then use a Multi-Row Formula tool to create a running total and a filter to find the appropriate record – or pair of records if the median was in between records. After the filter, it’s a simple matter to use a Sample tool to get the last record which has the value to return to the user. Now I have the feature I need and I can continue working on next weeks post.
I hope that I am showing that when you find features you want in Alteryx, you don’t always have to ask Alteryx to add them in the next version. Many times it is a simple thing to write a macro to add that feature.
The macro and a simple module demonstrating it can be downloaded here. Like anything in this blog, it is unsupported by Alteryx, but feel free to add a comment or question here for me if you have any issues or questions.
Thanks for reading,
If you don’t yet have a copy of Alteryx yet, by all means go get one. Its free (for limited use)… http://www.alteryx.com/download
Pingback: Alteryx: Downloading from TIGER | Ned Harding
Pingback: Alteryx: Processing Groups of Records in a Macro | Ned Harding
Pingback: Month in Review: September | Ned Harding
Pingback: Alteryx: Big Data and Current Events | Ned Harding