Tuesday, July 26, 2011

Introducing Transpose 0.1

Back when I was a much bigger Linux geek than I am now, I worked with large data sets for training classifiers. There were two tools I really wanted at the time, either of which would have made my job a lot simpler. The first was a simple command that would take a file and shuffle its lines. The second was a simple command that would take a delimited text file and transpose the data.

Now that I'm getting back into Linux and Open Source geekdom, I have discovered that the first tool does indeed exist, in the form of the shuf command. The second tool did not, so I took the opportunity to open my IDLE editor and write one. The result is a Python script, transpose.py. You can find the source code on GitHub.
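To give a feel for what transposing a delimited file means, here is a minimal sketch. It is not the code from transpose.py: the function name transpose_text, the tab default, and the choice to pad short rows with empty strings are all my assumptions for illustration.

```python
from itertools import zip_longest


def transpose_text(text, delimiter="\t", fill=""):
    """Transpose delimited text so that rows become columns.

    Hypothetical helper for illustration; not taken from transpose.py.
    """
    # Split every line into its fields.
    rows = [line.split(delimiter) for line in text.splitlines()]
    # zip_longest pairs up the i-th field of every row and pads
    # short rows with `fill`, so ragged input does not lose data.
    columns = zip_longest(*rows, fillvalue=fill)
    # Each column of the input becomes one output line.
    return "\n".join(delimiter.join(column) for column in columns)


if __name__ == "__main__":
    sample = "name\tage\nalice\t30\nbob\t25"
    print(transpose_text(sample))
```

Note that this reads the whole input into memory, which is fine for a sketch but may not suit very large files.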

If you look through the source code for this script, you might see that I use arrays (Python lists) quite liberally instead of string logic. This is because it's much faster and much friendlier on memory to store strings in a list and concatenate them all at once with .join than to build the string up piece by piece with the plus operator (or so I'm told); more here. This was also impressed upon us in Raymond Hettinger's class on Advanced Python at OSCON 2011.
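Here is a small sketch of the two approaches side by side. The function names are made up for this example, and I am not claiming any particular speedup: results depend on the interpreter and version, so it is worth measuring with the timeit module on your own data.

```python
def build_with_plus(pieces):
    """Build a string by repeated concatenation with the plus operator."""
    result = ""
    for piece in pieces:
        # Strings are immutable, so each step may copy everything built so far.
        result = result + piece
    return result


def build_with_join(pieces):
    """Collect the pieces in a list and concatenate once with str.join."""
    parts = []
    for piece in pieces:
        parts.append(piece)
    # A single join builds the final string in one pass.
    return "".join(parts)


if __name__ == "__main__":
    words = ["spam", "eggs", "ham"]
    print(build_with_plus(words) == build_with_join(words))
```

Both functions return the same string; the difference is only in how much intermediate copying happens along the way.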

I encourage you to use this script and to tell me what you think, whether it sucks or rocks, and what you need me to fix or would like for me to change about it. I'm not kidding. I can't make this code work better, and grow as a programmer, without your feedback.