Wednesday, July 30, 2008

IO (revisited)

Alas, my attempt at multithreading my Corpscon clone (to speed up its conversion of files from one coordinate system to another) produced no appreciable change in speed, only a drop in efficiency: it now uses twice as many CPU cycles. Hurray!

Saturday, July 26, 2008

IO

Java IO seems pretty fast, assuming you properly buffer your input and output, but I keep wondering whether C++ (or C, for that matter) is much faster. I've seen conflicting benchmarks on both sides of that argument. What everyone does seem to agree on is that Java is fast enough for general purposes. But to extract maximum performance, it may be necessary to go the native route... get down and dirty with gcc, as it were.
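For anyone wondering what "properly buffering" means in practice, here is a minimal sketch, assuming a plain-text file with one point per line. The class name `BufferedCopy`, the method `copyLines`, and the 64 KB buffer size are illustrative choices, not taken from my actual program.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

// Hypothetical helper: copies a text file line by line through buffered streams.
class BufferedCopy {
    static long copyLines(Path in, Path out) throws IOException {
        long count = 0;
        // The Buffered* wrappers batch many small reads/writes into large
        // chunks (64 KB here, an arbitrary size), so readLine()/write() calls
        // don't each turn into a trip to the operating system.
        try (BufferedReader r = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(in), StandardCharsets.UTF_8), 1 << 16);
             BufferedWriter w = new BufferedWriter(
                     new OutputStreamWriter(Files.newOutputStream(out), StandardCharsets.UTF_8), 1 << 16)) {
            String line;
            while ((line = r.readLine()) != null) {
                w.write(line);   // a real converter would transform the point here
                w.newLine();
                count++;
            }
        } // closing the writer flushes whatever is still buffered
        return count;
    }
}
```

Without the buffering wrappers, each small read or write can end up as its own system call, which is where unbuffered Java IO gets its slow reputation.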
It's clear from experience (not mine, but other people's) that using a fast language does not make a fast program. More often than not, the theoretical performance of a low-level language (like C) is lost in the complexity of the application. Because low-level languages make programs harder to reason about, things don't get optimized as far as they could be. That's how you get the phenomenon of Ruby on Rails apps outperforming Java webapps. Ruby the language may be 50 or 100 times slower than Java, but Rails the framework makes possible all the performance tuning you simply wouldn't have time for in a Java app. The easier a program is to think about, the easier it is to optimize.
But what is simpler than reading a file and writing a file? The IO that my program is doing doesn't tax the brain. It must read in millions of points, compute some conversions, and write them out. In Out, IO!
Because I don't have to deal with a great deal of complexity, I feel the urge to use a low-level language, just to see if I can extract more performance. It takes my Java version (after quite a bit of optimization using the low-level NIO library) about 6 seconds to read a 15 megabyte file, compute the conversions, and write it all back out. That's pretty good, and possibly good enough, but I want to see what the limit is. I'm spending about 1.5 seconds reading, 1.5 calculating, and 3 seconds writing. I didn't use NIO for the output part of the program, and I imagine using it will bring writing down to about the same 1.5 seconds as reading, for 4.5 seconds total. But reading 15 megabytes in 1.5 seconds works out to only 10 megabytes a second, and I know the hard drive can read faster than that, so obviously there's some overhead slowing me down.
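For the output side that isn't on NIO yet, a sketch like the following streams a file through one reusable direct buffer using `FileChannel` for both reading and writing. It assumes the data can be handled as raw byte chunks; `NioCopy`, `copy`, and the 64 KB buffer size are hypothetical, and the parsing and coordinate conversion that would sit between the read and the write are left out.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.*;

// Hypothetical helper: copies a file through NIO channels and one reusable buffer.
class NioCopy {
    static long copy(Path in, Path out) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 16); // 64 KB, arbitrary
        long total = 0;
        try (FileChannel src = FileChannel.open(in, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(out, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (src.read(buf) != -1) {
                buf.flip();                  // switch from filling the buffer to draining it
                // (a converter would parse and rewrite the points in buf here)
                while (buf.hasRemaining()) {
                    total += dst.write(buf); // write() may not drain the buffer in one call
                }
                buf.clear();                 // make the whole buffer available for the next read
            }
        }
        return total;
    }
}
```

The inner `while` loop matters: a channel write is allowed to write fewer bytes than remain in the buffer, so a single `write` call is not guaranteed to drain it.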
Part of the overhead is clear: each time I read a certain number of points, I stop to convert them and write them to a file. So I keep repeating: Read, Convert, Write. What I should be doing is this, with all three rows running in parallel:
Read Read Read
Convert Convert Convert
Write Write Write
So I'm going to try executing the three pieces concurrently and see if that boosts performance. I think it will. Either way, it should be interesting to reimplement the program in C or C++ to see the absolute limit of performance.
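The Read/Convert/Write pipeline could be sketched in Java as three threads connected by two bounded queues. This is a hypothetical sketch, not my actual program: points are `double[]` pairs, file reading is stood in for by an iterator of chunks, file writing by an in-memory list, and the end of the stream is signalled with a sentinel chunk. There's no error handling, so an exception in one stage would leave the others blocked.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.UnaryOperator;

// Hypothetical three-stage pipeline: read -> convert -> write, each on its own thread.
class Pipeline {
    // Sentinel chunk marking the end of the stream; compared by reference.
    private static final List<double[]> EOF = new ArrayList<>();

    static List<double[]> run(Iterator<List<double[]>> reader,
                              UnaryOperator<double[]> convert) throws InterruptedException {
        // Bounded queues: a fast stage blocks instead of buffering the whole file.
        BlockingQueue<List<double[]>> toConvert = new ArrayBlockingQueue<>(4);
        BlockingQueue<List<double[]>> toWrite = new ArrayBlockingQueue<>(4);
        List<double[]> written = new ArrayList<>(); // stands in for the output file

        Thread readStage = new Thread(() -> {
            try {
                while (reader.hasNext()) toConvert.put(reader.next());
                toConvert.put(EOF);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread convertStage = new Thread(() -> {
            try {
                List<double[]> chunk;
                while ((chunk = toConvert.take()) != EOF) {
                    List<double[]> converted = new ArrayList<>(chunk.size());
                    for (double[] p : chunk) converted.add(convert.apply(p));
                    toWrite.put(converted);
                }
                toWrite.put(EOF);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread writeStage = new Thread(() -> {
            try {
                List<double[]> chunk;
                while ((chunk = toWrite.take()) != EOF) written.addAll(chunk);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        readStage.start(); convertStage.start(); writeStage.start();
        // join() also makes the writer's additions to `written` visible here.
        readStage.join(); convertStage.join(); writeStage.join();
        return written;
    }
}
```

One thing to keep in mind with this design: a pipeline's throughput is bounded by its slowest stage. With the timings above (1.5 s reading, 1.5 s converting, 3 s writing), perfect overlap could at best bring the total down toward roughly 3 seconds, not below it.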

Tuesday, July 15, 2008

Spicy

Reading through my previous posts, as I can't help but do, I notice that I totally lack a fluid writing style. Other people blog so much better. But maybe I'll get better... and since no one is reading this, it's only my problem anyway.
Now for an update: I've decided to join my local programming user groups and maybe start attending meetings, since I have nothing else to do. I'm not sure if hanging out at user groups is a proper substitute for work experience, but at least it should keep me up to date with what's going on in the community. And certainly, we all know it's of utmost importance to keep up to date in the computer world (since things change so much).
In terms of personal projects, I've been working on a port of the Corpscon6 coordinate converter thingamabob from C to Java. Right now I just have a wrapper around their DLL (which contains the algorithms), providing more features and a nicer GUI, plus all the Java goodness. On this project I've learned about JNI and the MigLayout library for Swing (very nice), and because I'm doing the GUI in Clojure, I've learned more about how to write effective macros.
Honestly, the ability to abstract away repetitive boilerplate code is extraordinarily helpful in writing a program, and that's especially obvious when you're trying to hack some GUI code, which is by nature (at least in Java/Swing) horrendously repetitive and gnarly. In Lisp I can write a set of macros that let me write my UI in a declarative manner and handle all the repetitive gnarliness behind the scenes for me.

Sunday, July 6, 2008

Clojure

Everybody knows concurrent programming is the next big thing. Even the GTA guys working on the PS3 are realizing it, if belatedly (damn that Cell processor!). Too bad, really, that nobody actually knows how to do concurrent programming... without causing the machine to blow up.
What's wrong with the locking model, you might ask? Well... as I understand it, the locking model suffers from a lack of composability, which (as far as I can tell) means that two operations that are each correctly locked can't simply be combined into a larger operation that is also correct. The ultimate problem is that the locking model is hard to think about: it's hard to say for certain when a lock is going to be held, and to determine whether your operations are ever going to interleave and cause inconsistent state.
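A small Java sketch of what the lack of composability looks like in practice, using a hypothetical `Account` class whose methods are each `synchronized`. Calling `withdraw` on one account and then `deposit` on another is not atomic, even though each call is, so a transfer has to hold both locks at once, and doing that safely depends on a lock-ordering convention.

```java
// Hypothetical bank account; each method is individually thread-safe.
class Account {
    final int id;        // used only to impose a global lock order
    private long balance;

    Account(int id, long balance) { this.id = id; this.balance = balance; }

    synchronized long balance() { return balance; }
    synchronized void deposit(long amount) { balance += amount; }
    synchronized void withdraw(long amount) { balance -= amount; }
}

class Transfers {
    // Calling from.withdraw() then to.deposit() on their own would let another
    // thread observe the money "in flight", so we hold both locks. If one thread
    // locked (a, b) while another locked (b, a), they could deadlock, so both
    // locks are always taken lower-id first. Java monitors are reentrant, so the
    // synchronized methods can be called while the locks are already held.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.withdraw(amount);
                to.deposit(amount);
            }
        }
    }
}
```

Nothing in `Account` itself enforces the lower-id-first rule: every piece of code that ever locks two accounts has to know about it and follow it.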
The alternatives to the locking model, the actor model and the transactional memory model, are definitely worth looking at. While I was screwing around with the Scala programming language, I used its nice Actor library to do a bit of multithreading, and I kind of like it. The only problem is that actors are inherently asynchronous, which can lead (especially in a language without first-class functions) to some really ugly callback-style spaghetti, IMHO. I actually tried to implement a program with asynchronous IO purely in Java at one point, and it was soooo ugly I just had to stop. Don't be fooled: anonymous inner classes are no substitute for first-class functions.
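To see where the callbacks come from, here is a hypothetical, stripped-down actor in plain Java (not Scala's library): it owns its state, processes one message at a time from a mailbox on its own thread, and can only hand results back asynchronously. Messages are represented as closures for brevity, whereas real actor libraries typically send message objects and pattern-match on them. The sketch uses Java 8 lambdas to stay short; in 2008-era Java, each of those callbacks would be an anonymous inner class.

```java
import java.util.concurrent.*;
import java.util.function.Consumer;

// Hypothetical minimal actor: private state, a mailbox, and one thread.
class CounterActor {
    private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    private long count = 0; // only ever read or written by the actor's own thread
    private final Thread thread;

    CounterActor() {
        thread = new Thread(() -> {
            try {
                while (true) mailbox.take().run(); // handle one message at a time
            } catch (InterruptedException e) {
                // stop() interrupts the thread; fall through and exit
            }
        });
        thread.setDaemon(true);
        thread.start();
    }

    // Asynchronous: returns immediately; the increment happens later.
    void add(long n) { mailbox.add(() -> count += n); }

    // There is no return value; the answer arrives later via a callback
    // that runs on the actor's thread.
    void get(Consumer<Long> reply) { mailbox.add(() -> reply.accept(count)); }

    void stop() { thread.interrupt(); }
}
```

Even reading the counter back requires passing a callback (for example, one that completes a `CompletableFuture`), and a program built from many such exchanges quickly turns into the nested-callback style described above.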
Obviously the inner class ugliness is not an issue with Scala, which has great support for first-class functions, but nevertheless having to think about a program, and design it, according to an asynchronous model is (I think) more of a challenge than the more traditional sequential imperative crap: this happens, then that happens, then that happens, etcetera.
Software Transactional Memory has been called the silver bullet (I heard it on the Java Posse!), because at face value it appears to solve all the concurrency problems by making them (almost) transparent to the programmer. STM is nestable, and it appears chipmakers are starting or planning to build support for it directly into the hardware. But there are apparently some issues it may bring up, which I don't understand well enough to talk about cogently.
Clojure, a strongly functional dialect of Lisp, has a built-in STM facility, and I really look forward to trying it out on a multithreaded program. Truthfully, I have tried it out a little, but without using it in a larger concurrent program I can't say how great it is. Rich Hickey, the creator of Clojure, has an ant simulation that's pretty cool: each ant has its own thread, and they all coexist in a field of food and pheromones, with the STM, concisely applied, keeping them all consistent.
I really like Clojure, mostly because it's a Lisp dialect on the JVM, and I'm really warming up to the power of Lisp. I just finished doing some GUI programming, and it is totally astonishing how much Lisp macros improve the process of GUI coding, which for me is normally the worst kind of coding, something I would never voluntarily engage in. Swing in particular tends to be soooooo repetitive and full of boilerplate, lacking all the declarative goodness and conciseness of HTML and CSS or even something like JSF, that I really have trouble keeping from banging my head against the wall. But with Lisp, I can write macros that let me write my widgets in a declarative style and do all the nasty, gnarly conversions on the backend for me. This is really the extraordinary power of DSLs and metaprogramming made easy. This is why people love Lisp: it can take a tedious boilerplate coding task and strip away all the useless crap, leaving just the ideas.