Unix sort Performance

November 5, 2008

I noticed that Python’s list.sort() method was much faster than Unix sort on a 100 MB file containing ASCII text. This didn’t make any sense, so I asked my office mate (who happens to be a performance guru) what was going on. He pointed out that sort was probably tied up in iconv, and suggested exporting LC_ALL=C into the environment so sort wouldn’t need to worry about non-ASCII text. This made sort about 20x faster, and should let me release a pretty cool feature on Instant Domain Search soon!
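For anyone who wants to try the same trick from a script, here is a minimal sketch, assuming a Unix-like system with sort on the PATH. The helper name unix_sort and the sample words are made up for illustration. It sets LC_ALL for the child process only, so the rest of the environment is left alone. In the C locale, sort compares raw bytes, which for ASCII text gives the same order as Python's default string comparison.

```python
import os
import subprocess


def unix_sort(lines, locale="C"):
    """Sort ASCII lines with the external `sort` command under the given LC_ALL.

    `unix_sort` is a hypothetical helper for illustration, not part of any library.
    """
    # Copy the current environment and override LC_ALL for the child only.
    env = dict(os.environ, LC_ALL=locale)
    data = "".join(line + "\n" for line in lines).encode("ascii")
    result = subprocess.run(
        ["sort"],
        input=data,
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").splitlines()


words = ["banana", "Apple", "cherry", "apple"]
# With LC_ALL=C, uppercase letters sort before lowercase ones (byte order).
print(unix_sort(words))
```

From a shell, the equivalent is to run sort with LC_ALL=C set in front of the command. Note that byte order puts "Apple" before "apple" and "banana", which may differ from the ordering your usual locale produces.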
