Radio Shack is on its way out. I was probably about 10 years old when my mom bought me a TRS-80 Model 1 (but level 2!) computer. Although I had previously learned a little programming on my school’s time-shared access to a PDP-11, this was the first computer that was all mine.
One of the really great strengths of Alteryx is that it can handle any amount of data that you throw at it. If your data is small enough, it might all be in memory, but when Alteryx gets more data than fits, it silently swaps out to disk. This way people are routinely processing data sets that are 2, 10 or even 100 times bigger than they have memory for!
Mostly the user never notices this aspect of the Alteryx engine and it just works. There are times though when we get feature requests that would be much easier to implement if all the data was in memory. One example of that is aggregate functions in the formula tool. Since other desktop products that are similarly easy to use, like Tableau and Excel, have simple SUM and AVG type functions in their formulas, it is assumed that Alteryx would too.
One of the most common questions I get about Alteryx is: “How can I make my module run faster?” Although Alteryx can be very fast, since it is such a general tool, it is only as good as the module that you have authored. There is a very simple guideline that you can follow to make a module faster: do less work. The most common example of doing less work is to use the Select tool as early as possible to remove fields that you are no longer using. To show the process I use to make an Alteryx module run faster, I am going to walk through optimizing my Percentile Macro to run as fast as possible.
Update – there is an updated version of this macro in the post: Alteryx: Optimizing Modules for Speed.
There was a recent question on the Alteryx forum: How to use the percentile in summarize. The question misunderstands the percentile function in the Summarize tool and is looking for something slightly different, although with similar math. So what does the percentile in the Summarize tool do? From the help:
Percentile: Calculates the specified percentile value for the group. The percentile is calculated by sorting the data and returning the row value relative to the specified percentile and its position in the sorted array – the largest value is the 100th percentile, lowest value is the 0 percentile, median is the 50th percentile, the 25th percentile is the value in the middle of the median and minimum, etc.
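To make that description concrete, here is a minimal Python sketch of a position-based percentile. It is only an illustration of the help text, not the Summarize tool's actual implementation: the function name `percentile` is made up for this example, and the rule for positions that fall between two rows (rounding to a nearby row) is an assumption, since the help text does not spell it out.

```python
def percentile(values, pct):
    """Return the value at the given percentile (0-100) by position in the sorted data."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    ordered = sorted(values)
    # Map the percentile onto a 0-based position in the sorted list:
    # 0 -> first (lowest) row, 100 -> last (largest) row.
    # Assumption: a position between two rows is rounded with Python's
    # round(), which sends exact ties to the even position.
    position = round(pct / 100 * (len(ordered) - 1))
    return ordered[position]


if __name__ == "__main__":
    data = [50, 10, 40, 20, 30]
    for pct in (0, 25, 50, 100):
        print(pct, percentile(data, pct))
```

With five values, the 0th percentile is the minimum, the 100th is the maximum, and the 50th is the middle row, which matches the wording in the help.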
The Alteryx engine is known for being fast. I would like to think that the processing engine is as fast or faster than any other data engine out there. I learned how to program on a computer with 16 KB of memory and a 2 MHz 8 bit CPU. Learning how to program in an environment like that forces you to learn how to count bits and clock cycles. Taking that mentality and applying it to a modern computer leads to very quick processing.
The difficulty though is that Alteryx is limited to the speed it can read & write data from where it is stored. Having a super fast data processing engine doesn’t help if you have to pull a terabyte of data from a data warehouse only to find a subset of it and produce a report. It doesn’t help to be fast if it is slow to get the needed data.
The Bob Cook Memorial/Mt Evans Hill Climb has long been a favorite race of mine. I am aspirationally a climber on a bike, which is to say that I am not able to compete with the people who are really good, but I love doing it anyway. I first did the race when I was 29 and posted a pretty good time. When I turned 40, I decided that I would beat my time from 11 years before. It turned out to be a slightly more difficult goal than I gave it credit for, but after 4 years of trying I finally did it and bettered that time by 2 minutes.
That was 2 years ago. This year in February I registered for the race with good intentions of getting fit and trying to do well again. Fast forward to a crazy busy summer without much time for training, as well as not being very diligent in keeping the weight down, and I started to think about not doing it. Instead I made the decision to go ride it just for the fun of it, and not worry about my time or anything. I am so glad I did. It is such a different experience riding that mountain without trying to go hard. My time was 36 minutes slower than last time, but at the top I felt relaxed and happy instead of ready to throw up. I had thought I might be done riding in the Mt Evans race; now I just think that I am done racing in it.
If you are a cyclist and have never ridden Mt Evans, I highly recommend it. Just don’t worry about trying to go fast, just slow down and enjoy it.
Sorry for the lack of posts the last few weeks – I have been busy, heads down working on a futures project (code named LockIn.) I went as far as turning off email and IM to get some real focus. It was very productive – I haven’t produced that much code in a while.
Anyway, this week I have a very quick post answering a question from the Alteryx forums. The question asks: How do I skip the last N records from a data stream? Skipping the 1st N is very easy, just use the sample tool, but it doesn’t have a mode to skip the last N.
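The reason this is harder than skipping the first N is that a stream does not know where it ends until it gets there. As a rough illustration outside of Alteryx (this is a Python sketch, not the macro from the post, and `skip_last` is a name invented for this example), one way to do it is to hold back a buffer of N records and only release a record once N newer ones have arrived:

```python
from collections import deque


def skip_last(records, n):
    """Yield every record from an iterable except the final n, in one pass."""
    if n < 0:
        raise ValueError("n must be non-negative")
    held_back = deque()
    for record in records:
        held_back.append(record)
        # Once more than n records are buffered, the oldest one cannot be
        # among the last n, so it is safe to emit.
        if len(held_back) > n:
            yield held_back.popleft()
    # Whatever remains in the buffer is the last n records; drop them.


if __name__ == "__main__":
    print(list(skip_last(range(10), 3)))
```

The buffer never holds more than N + 1 records, so this works without knowing the total record count up front.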
There is now an update to this post at: Alteryx: XSLX Wildcard inputs – read both to see how it comes together.
Wow, what a conference that was last week! I loved meeting all kinds of amazing customers in the Solutions Center and getting all kinds of product feedback, as well as being able to help people solve their problems. In particular, if you haven’t seen it, check out Adam’s Blog Macro Pack. He took a bunch of macros from this blog as well as his and Chris Love’s and packaged them up with a cool installer so they show up in your tool palette.
Much of the feedback is already under consideration by product management and some of it has already been put on the development team’s backlogs for Alteryx 9.1. There was one request in particular though that is actually much easier to implement as a macro than it would be as a native tool. The customer asks:
While I know that the input tool will accept a wildcard, it fails if the schemas are different. How do I read a set of files using a wildcard when the schemas don’t exactly match?
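To illustrate what the customer is asking for, here is a small Python sketch of the general idea: read every file that matches a wildcard and union the rows, filling in fields that a given file does not have. This is not the Alteryx macro, just a sketch under assumptions: it assumes CSV files with header rows, and `read_wildcard` is a hypothetical helper name.

```python
import csv
import glob


def read_wildcard(pattern):
    """Read all CSV files matching pattern and union them on field name."""
    fieldnames = []  # every field seen so far, in first-seen order
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    # Fields missing from a file come back as None rather than causing a failure.
    return fieldnames, [{name: row.get(name) for name in fieldnames} for row in rows]


if __name__ == "__main__":
    fields, records = read_wildcard("*.csv")
    print(fields)
    print(len(records), "records")
```

Matching on field name rather than position is the design choice that lets files with different schemas sit in the same output.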
Before I start, let me say that I am looking forward to seeing lots of my readers next week at Inspire. You will most often find me in the Solutions Center. Please don’t hesitate to come ask questions, give suggestions or just chat. I love the opportunity to teach & learn.
Recently I got a question on our internal support board: How do I parse a file that is Ctrl-A delimited? Normally for reading delimited files you just read the file as a CSV and set the delimiter in the input settings and you are done. The problem with Ctrl-A is that it is a special (unprintable) character and it is not possible to set in the GUI. The normal backup for parsing issues like this – the Text To Columns tool – has the same limitation.
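For anyone unfamiliar with it, Ctrl-A is the character with code 1 (`\x01`). Outside of a GUI it is just another single-character delimiter. The following Python sketch shows what such data looks like when parsed; it is only an illustration, not the Alteryx approach from the post, and `parse_ctrl_a` is a name invented for this example.

```python
import csv
import io

CTRL_A = "\x01"  # Ctrl-A, the unprintable character with code 1


def parse_ctrl_a(text):
    """Split Ctrl-A delimited text into a list of rows (lists of strings)."""
    # newline="" leaves line endings alone so the csv module can handle them.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=CTRL_A))


if __name__ == "__main__":
    sample = "id\x01name\n1\x01Alice\n2\x01Bob\n"
    for row in parse_ctrl_a(sample):
        print(row)
```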
When the Alteryx 9.0 release got near crunch time, I got busy and obviously stopped posting here. Once you get out of the habit, you forget to start back up. Anyway, the time has come to start up again.
I hope that I am going to see many of you in just over a week at Inspire. Inspire is many things to different people, but to me it is a chance to connect to our customers. I will be found in the Solutions Center for as much time as I can possibly manage. I love talking to customers, helping you with issues, listening to product ideas, and just generally understanding how you work. Please do not hesitate to ask me anything – in the Solutions Center, at meals, or any other time you see me.
Inspire really is an amazing opportunity for clients, prospects and us in development to connect. Many times what we learn there talking to people can shape the road map for months or even years to come. So if you are on the fence, it really is worth it, I promise. Again, I hope to see you there.