Java IO seems pretty fast, assuming you properly buffer your input and output, but I keep wondering if C++ (or C for that matter) is much faster. I've seen conflicting benchmarks on both sides of that argument. The one thing everyone seems to agree on is that Java is fast enough for general purposes. But to extract maximum performance, it may be necessary to go the native route... get down and dirty with gcc, as it were.
It's clear from experience (not mine, but other people's) that using a fast language does not make a fast program. More often than not, the theoretical performance of a low level language (like C) is lost in the complexity of the application. Because low level languages make programs harder to reason about, things don't get optimized to the point they could be. That's how you get the phenomenon of Ruby on Rails outperforming Java webapps. Ruby the language is 50 or 100 times slower, but Rails the framework makes possible all the performance tuning you simply wouldn't have time for in a Java app. The easier a program is to think about, the easier it is to optimize.
But what is simpler than reading a file and writing a file? The IO that my program is doing doesn't tax the brain. It must read in millions of points, compute some conversions, and write them out. In Out, IO!
Because I don't have to deal with a great deal of complexity, I feel the urge to use a low level language, just to see if I can extract more performance. It takes my Java version (after quite a bit of optimization using the low level NIO library) about 6 seconds to read a 15 megabyte file, compute the conversions, and write it all back out. That's pretty good, and possibly good enough, but I want to see what the limit is. I'm spending about 1.5 seconds reading, 1.5 calculating, and 3 seconds writing. I didn't use NIO for the output part of the program, and I imagine using it will bring me down to 4.5 seconds total. But reading 15 megabytes in 1.5 seconds works out to only 10 megabytes a second, and I know the hard drive can read faster than that, so obviously there's some overhead that is slowing me down.
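One way to check how much of that read time is overhead is to time a bare NIO read with no parsing at all. The following is only a sketch under assumptions: the class name `ReadThroughput`, the helper names, and the 64 KB buffer size are made up for illustration and are not taken from the actual program.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class ReadThroughput {
    /** Reads the whole file through one reusable direct buffer; returns bytes read. */
    static long readAll(Path file, int bufferSize) throws IOException {
        long total = 0;
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
                buf.clear(); // throw the bytes away; a real program would parse points here
            }
        }
        return total;
    }

    /** Converts a byte count and an elapsed time in nanoseconds to megabytes per second. */
    static double megabytesPerSecond(long bytes, long nanos) {
        return (bytes / 1_000_000.0) / (nanos / 1_000_000_000.0);
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args[0]); // path to the input file, passed on the command line
        long start = System.nanoTime();
        long bytes = readAll(file, 64 * 1024); // 64 KB buffer is an arbitrary choice
        long elapsed = System.nanoTime() - start;
        System.out.printf("%d bytes at %.1f MB/s%n", bytes, megabytesPerSecond(bytes, elapsed));
    }
}
```

If the bare read comes out much faster than 10 megabytes a second, the gap is the cost of parsing and of stopping to convert and write between reads. Keep in mind that a second run may be served from the operating system's file cache, so the first run is the more honest number.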
Part of the overhead is clear: each time I read a certain number of points, I stop to convert them and write them to a file. So I keep repeating: Read, Convert, Write. What I should be doing is:
Read Read Read
Convert Convert Convert
Write Write Write
.... all in parallel. So I'm going to try executing the three pieces concurrently and see if that boosts performance. I think it will. Even if it does, it should be interesting to reimplement in C or C++, to see the absolute limit of performance.
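The read, convert, write arrangement described above could be sketched as three threads connected by bounded queues. This is a hedged illustration, not the actual program: `PipelineSketch`, the doubling `convert` step, and the queue capacity of 4 are all hypothetical, and in-memory lists stand in for the input and output files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PipelineSketch {
    // Sentinel batch, compared by identity, that tells the next stage no more data is coming.
    private static final double[] END = new double[0];

    /** Placeholder conversion (doubles each value); the real conversion is not specified. */
    static double[] convert(double[] batch) {
        double[] out = new double[batch.length];
        for (int i = 0; i < batch.length; i++) {
            out[i] = batch[i] * 2.0;
        }
        return out;
    }

    /** Runs reader, converter, and writer concurrently; returns the batches in write order. */
    static List<double[]> run(List<double[]> input) throws InterruptedException {
        // Bounded queues keep a fast stage from running far ahead of a slow one.
        BlockingQueue<double[]> toConvert = new ArrayBlockingQueue<>(4);
        BlockingQueue<double[]> toWrite = new ArrayBlockingQueue<>(4);
        List<double[]> written = new ArrayList<>();

        Thread reader = new Thread(() -> {
            try {
                for (double[] batch : input) {
                    toConvert.put(batch); // stands in for reading a batch from disk
                }
                toConvert.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread converter = new Thread(() -> {
            try {
                for (double[] b = toConvert.take(); b != END; b = toConvert.take()) {
                    toWrite.put(convert(b));
                }
                toWrite.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread writer = new Thread(() -> {
            try {
                for (double[] b = toWrite.take(); b != END; b = toWrite.take()) {
                    written.add(b); // stands in for writing a batch to the output file
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        converter.start();
        writer.start();
        reader.join();
        converter.join();
        writer.join(); // join makes the writer's additions visible to the caller
        return written;
    }
}
```

With one thread per stage and FIFO queues, batches come out in the same order they went in. Whether this actually helps depends on whether reads and writes hit the same disk: if they do, the two IO stages may still end up waiting on each other.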