Friday, December 16, 2005

Inflammatory Ruby Questions

I have some honest questions for Ruby experts to help me understand where I stand in this Ruby debate. Not that any Ruby experts (or anyone else) read this blog, but it makes me feel good, ok?
I honestly want to understand why and how Ruby is going to be a revolution. I absolutely love Java, but I am deathly afraid my love for it will blind me to the next big thing.

I'd like to separate Ruby and Ruby on Rails for clarity.

Ruby:
What is it about this language that separates it from the rest of the dynamically typed crowd? Does it hit the sweet spot between loose typing and "true" OO features? Is it fair to call Ruby a great dynamically typed language, and short-circuit this half of the discussion as the ancient "statically versus dynamically typed languages" debate? My point is that I don't see much of a difference between the "Ruby v. Java" and "static v. dynamic typing" debates. Is there something novel Ruby adds to that debate? Perhaps Ruby does for dynamically typed languages what Java did for statically typed languages?

Ruby On Rails:
What does Ruby on Rails provide that web frameworks in other languages cannot? I'm sure "convention over configuration" is a fantastic innovation for small, simple webapps, but what prevents another framework, such as Trails, from mimicking it? I understand Ruby enthusiasts' frustration with "XML hell", but that's not a function of Java in the least; it's a function of the frameworks they choose.

If these were both true statements:
"Ruby is the best dynamically typed language available"
"Ruby on Rails is an innovative web framework for writing simple webapps with very little work"
I see plenty of room left for Java.

Tuesday, December 13, 2005

Simple logging improvement

No matter how clever you are about logging, you will run into problems in production where you need more information. Oftentimes I'll resort to the access logs to try to understand more. I recently made two changes to the default access log format in Tomcat which have drastically increased the utility of these logs: I added the session ID (%S) and the time taken to process the request (%D). If you take care to log the session ID in your normal logs as well, you'll have a nice "foreign key" into the access logs.
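To make the "foreign key" idea concrete, here is a minimal sketch that pulls the session ID and time taken back out of an access log line. It assumes the valve's pattern is Tomcat's "common" layout with the two extra fields appended, i.e. `%h %l %u %t "%r" %s %b %S %D`; the exact pattern, the class name `AccessLogLine` and the sample line are illustrative assumptions, and the unit of %D depends on your Tomcat version, so check its docs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses one access log line written with the (assumed) pattern: %h %l %u %t "%r" %s %b %S %D */
public final class AccessLogLine {
    // Tomcat's "common" fields, then the session ID (%S) and time taken (%D).
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\S+) (\\S+) (\\d+)$");

    final String request;   // e.g. GET /app/report.do HTTP/1.1
    final int status;       // HTTP status code
    final String sessionId; // "-" when the request had no session
    final long timeTaken;   // %D; its unit depends on the Tomcat version

    private AccessLogLine(String request, int status, String sessionId, long timeTaken) {
        this.request = request;
        this.status = status;
        this.sessionId = sessionId;
        this.timeTaken = timeTaken;
    }

    static AccessLogLine parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized access log line: " + line);
        }
        return new AccessLogLine(m.group(5), Integer.parseInt(m.group(6)),
                m.group(8), Long.parseLong(m.group(9)));
    }

    public static void main(String[] args) {
        AccessLogLine entry = parse("10.0.0.1 - - [13/Dec/2005:10:15:32 -0500] "
                + "\"GET /app/report.do HTTP/1.1\" 200 5120 3F2A9C1B7D 842");
        // Search the application log for this session ID to correlate the two logs.
        System.out.println(entry.sessionId + " " + entry.request + " took " + entry.timeTaken);
    }
}
```

Matching on the whole line keeps the sketch strict: a line written with a different pattern fails loudly instead of silently yielding the wrong field.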

Monday, December 12, 2005

Tiger Beat!

1. The enhanced for loop is pure coding joy, especially in nested loops, but you will be surprised at how often you can't use it: you need the index, you need a special exit condition (I'm not a "break" man), or you need the iterator itself.
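A small sketch of the three situations above, one method each; the class and method names are illustrative. The enhanced for covers plain traversal, while needing the index or calling `Iterator.remove` still calls for the classic forms.

```java
import java.util.Iterator;
import java.util.List;

public class LoopStyles {
    // Plain traversal: the enhanced for loop hides the iterator entirely.
    static int total(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Needs the index, so the classic counting loop stays (and exits via return, not break).
    static int indexOfFirstNegative(List<Integer> values) {
        for (int i = 0; i < values.size(); i++) {
            if (values.get(i) < 0) {
                return i;
            }
        }
        return -1;
    }

    // Needs the Iterator itself to remove elements safely; calling values.remove(...)
    // inside an enhanced for can throw ConcurrentModificationException.
    static void removeNegatives(List<Integer> values) {
        for (Iterator<Integer> it = values.iterator(); it.hasNext();) {
            if (it.next() < 0) {
                it.remove();
            }
        }
    }
}
```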

2. Generics will improve your API dramatically. The rap on them is that creating your own generic classes can get messy, and that's true. But it's not _that_ bad, and 99% of the time you are simply using generic classes, which is completely simple.

Before generics, we often deliberated about what type to return when a method produced a group of objects (not a group of "expensive" objects, like objects read from a database; those pretty much necessitate an Iterator). Arrays were succinct and type-safe. Collections were what the method often used internally and were more flexible, but lacked type safety. For us, in a big system, readability won out, and we chose arrays. Overall, I was happy with the choice, but it didn't sit well how often we'd convert a List to an array at the end of a method. Generics resolved this completely. Now all methods return "generified" collection objects, and we get the best of both worlds.
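A before-and-after sketch of that return-type decision. The `NameQueries` class and its prefix filter are hypothetical; the point is only the shape of the method: the old version converts a raw List to an array at the very end, while the generified version returns the List directly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NameQueries {
    // Pre-Tiger style: collect into a raw List, then convert at the end
    // so the caller gets a type-safe array back.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String[] namesStartingWithOld(String[] all, String prefix) {
        List matches = new ArrayList();
        for (int i = 0; i < all.length; i++) {
            if (all[i].startsWith(prefix)) {
                matches.add(all[i]);
            }
        }
        return (String[]) matches.toArray(new String[matches.size()]);
    }

    // Tiger style: the generified List is already type-safe, so it is returned as is.
    static List<String> namesStartingWith(List<String> all, String prefix) {
        List<String> matches = new ArrayList<String>();
        for (String name : all) {
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        return Collections.unmodifiableList(matches);
    }
}
```

Wrapping the result in `Collections.unmodifiableList` is one design choice that keeps the array's "caller can't mess with my internals" feel; returning the ArrayList directly works too.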

Map.get/containsKey/remove and Set.contains are NOT generic: they take a plain Object. Good discussion:
http://forum.java.sun.com/thread.jspa?threadID=465357&messageID=2139377
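The practical consequence is that the compiler won't catch a lookup with the wrong key type. A minimal illustration (the class name and data are made up): the literal `42` autoboxes to an `Integer`, which never equals a `Long`, so the lookup compiles and quietly misses.

```java
import java.util.HashMap;
import java.util.Map;

public class NonGenericLookup {
    static Map<Long, String> sampleDirectory() {
        Map<Long, String> names = new HashMap<Long, String>();
        names.put(42L, "Alice");
        return names;
    }

    public static void main(String[] args) {
        Map<Long, String> names = sampleDirectory();
        // get(Object) accepts any key, so this compiles even though 42 is an Integer.
        System.out.println(names.get(42));         // null
        System.out.println(names.get(42L));        // Alice
        System.out.println(names.containsKey(42)); // false
    }
}
```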

It takes a while to really get used to the fact that generic type information is not available at runtime. Once you understand that, you'll find yourself more a user of generic classes than an author of them.
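A short sketch of what erasure means in practice (class and method names are illustrative). Two differently parameterized lists share one runtime class, and because `T` is gone at runtime, code that needs the type has to be handed a `Class<T>` token explicitly, a common workaround rather than anything the original post prescribes.

```java
import java.util.ArrayList;
import java.util.List;

public class Erasure {
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        // Type arguments are erased at compile time: both are plain ArrayList at runtime.
        return strings.getClass() == numbers.getClass();
    }

    // "x instanceof T" won't compile because T is erased, so the caller passes the type in.
    static <T> T firstOfType(List<?> items, Class<T> type) {
        for (Object item : items) {
            if (type.isInstance(item)) {
                return type.cast(item);
            }
        }
        return null;
    }
}
```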

3. java.util.concurrent - Great libraries change how you think about problems, and java.util.concurrent is no exception. Simple concurrency problems are now actually simple to solve. I'm able to express amazingly powerful concurrent solutions in simple, safe code. I don't know how I lived before ConcurrentHashMap, Executor, Callable, and Future. I now look forward to, rather than dread, the next concurrent programming problem.
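A minimal sketch tying together the four classes named above: one `Callable` per input, a fixed thread pool, a shared `ConcurrentHashMap` the tasks write into, and a `Future` per task to wait on. The "work" (computing a word's length) and the pool size of 4 are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLengths {
    // Runs one task per word on a pool and collects results in a thread-safe map.
    static Map<String, Integer> lengths(List<String> words) {
        final Map<String, Integer> results = new ConcurrentHashMap<String, Integer>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Integer>> pending = new ArrayList<Future<Integer>>();
            for (final String word : words) {
                pending.add(pool.submit(new Callable<Integer>() {
                    @Override
                    public Integer call() {
                        int length = word.length(); // stand-in for an expensive computation
                        results.put(word, length);  // safe without external locking
                        return length;
                    }
                }));
            }
            for (Future<Integer> f : pending) {
                f.get(); // blocks until done; task failures surface as ExecutionException
            }
            return results;
        } catch (ExecutionException e) {
            throw new IllegalStateException("A task failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("Interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Waiting on every Future before returning is what guarantees the map is complete, and it also makes a failure in any task visible to the caller instead of being lost on a pool thread.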

4. @Override - Love it. Even though I haven't programmed Perl in 8 years, I still get a thrill when the compiler points out a problem I never would have noticed. The errors @Override catches are pretty rare, but it has saved me once, and that's enough for the "Love it" designation. While I understand it would be trickier, an @Implements would be cool too. @SuppressWarnings lets me turn on certain compiler warnings that I avoided in the past because I didn't want warnings for the handful of "special cases".
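The classic bug @Override catches is an `equals` that overloads instead of overrides. In this illustrative `Point` class, the correct `equals(Object)` is annotated; had it been written as `equals(Point other)` with @Override on it, javac would reject it with a compile error, whereas without the annotation it would compile and `HashSet` would silently ignore it.

```java
import java.util.HashSet;
import java.util.Set;

public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Must take Object: equals(Point) would be an overload that collections never call.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Point)) {
            return false;
        }
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    public static void main(String[] args) {
        Set<Point> points = new HashSet<Point>();
        points.add(new Point(1, 2));
        System.out.println(points.contains(new Point(1, 2))); // true
    }
}
```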

5. String.contains - Can you believe it? I feel like a Programmer of the Future every time I use this baby.

6. OutOfMemoryError actually TELLS you what kind of memory you ran out of! Before 1.5, you had to figure out which kind: heap? perm gen? thread stack? They were clearly emboldened by the rejoicing over this fix, and tackled an even MORE amazing problem in Mustang: ClassCastException will tell you what was being cast!

7. Thread.getStackTrace/getAllStackTraces - If you run into race condition or locking problems and don't have the luxury of reproducing them in your IDE, a JSP that displays Thread.getAllStackTraces is pretty useful.
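A sketch of the core of such a page, written as a plain method so a JSP (or a servlet, or a log statement) can call it; the class name and output layout are assumptions. If you render it in HTML, remember to escape the text.

```java
import java.util.Map;

public class ThreadDump {
    // Renders every live thread's name, state and stack, one block per thread.
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Including `Thread.getState()` (also new in 1.5) makes threads stuck in BLOCKED easy to spot when hunting for lock contention.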

8. enums - Typesafe enums are essential for clarity in big APIs, and I hope they get more attention now that they're first class in Java. If you are leery of generics introducing clutter to APIs, you won't like this:
public abstract class Enum<E extends Enum<E>>
I tried writing a utility method for enums, and it actually set my brain on fire.
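For the brave, here is what a small enum utility looks like with that recursive bound. The helper `parseOrDefault` and the nested `Status` enum are hypothetical; the bound `E extends Enum<E>` is what `Enum.valueOf(Class<T>, String)` requires of its type argument.

```java
import java.util.Locale;

public final class Enums {
    private Enums() {
    }

    // Lenient lookup: trims and upper-cases the name, falling back instead of throwing.
    static <E extends Enum<E>> E parseOrDefault(Class<E> type, String name, E fallback) {
        if (name == null) {
            return fallback;
        }
        try {
            // Locale.ROOT avoids surprises such as the Turkish dotless i.
            return Enum.valueOf(type, name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException notAConstant) {
            return fallback;
        }
    }

    enum Status { ACTIVE, SUSPENDED, CLOSED }
}
```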

Eclipse has a neat compiler option to warn or error if a switch statement doesn't cover every member of an enum. This is a wonderful tool, as you often want to take an action based on the value of an enum, and it's imperative to cover every possible value. However, I haven't used switches or breaks since I was a C programmer. I was torn between the joy of a new compiler warning and the mess that is "switch". I tried and tried, but eventually had to drop the switch. Its freaky scoping and flow were just too much to swallow.
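One switch-free way to get a similar "every value is covered" guarantee, offered as an alternative rather than what I ended up doing: give the enum an abstract method and implement it in each constant's body. The `Shipping` enum and its values are hypothetical.

```java
public enum Shipping {
    GROUND {
        @Override
        int days() {
            return 5;
        }
    },
    EXPRESS {
        @Override
        int days() {
            return 2;
        }
    },
    OVERNIGHT {
        @Override
        int days() {
            return 1;
        }
    };

    // Every constant must implement this; adding a constant without a body
    // is a compile error, so no value can be forgotten.
    abstract int days();
}
```

Callers then write `method.days()` instead of switching on `method`, and the check is enforced by javac itself rather than by an IDE setting.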

9. Monitoring - Anybody who's had problems in production will love JConsole and java.lang.management.ManagementFactory. You'll also love System.nanoTime if you've ever tried timing operations with System.currentTimeMillis. On some platforms, currentTimeMillis can't resolve individual milliseconds, so any "fast" operation would show up as 0, 16 or 32 milliseconds. Not a problem with nanoTime!
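A small sketch of both tools: timing a task with `System.nanoTime` (meant for elapsed intervals, not wall-clock time) and reading heap usage from the platform `MemoryMXBean`, the same kind of data JConsole displays. The class name and output format are illustrative.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class Timing {
    // Elapsed time of a task; only differences between nanoTime values are meaningful.
    static long elapsedNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // Current heap usage from the platform MemoryMXBean.
    static String heapSummary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return "heap used=" + (heap.getUsed() / 1024) + "K committed=" + (heap.getCommitted() / 1024) + "K";
    }

    public static void main(String[] args) {
        long nanos = elapsedNanos(new Runnable() {
            @Override
            public void run() {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < 1000; i++) {
                    sb.append(i);
                }
            }
        });
        System.out.println("took " + (nanos / 1000) + " microseconds; " + heapSummary());
    }
}
```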

I'm sure I'm forgetting quite a bit, but in short, I'm in love. Tiger represents a good balance between Sun's traditional conservatism and the pressure they face from their emerging competition. I hope they maintain their discipline as well as their fear.

Sunday, October 23, 2005

i18n zip file woes

Update: This is being addressed in JDK7. It will still be up to you to figure out what encoding your zip file is in - no small feat if you've got zips coming in from various sources and locales.

Our app allows customers to upload zip files of content. Recently, customers in Brazil were having problems with a file they were uploading. We'd get this exception when trying to unzip it:

java.lang.IllegalArgumentException
at java.util.zip.ZipInputStream.getUTF8String (ZipInputStream.java:291)
at java.util.zip.ZipInputStream.readLOC (ZipInputStream.java:230)
at java.util.zip.ZipInputStream.getNextEntry (ZipInputStream.java:75)

Inspecting the file, I found they were including Portuguese characters in the file name. I was pretty surprised to find this issue was related to not one, but two of Java's Top 25 Bugs. The Sun engineers point out the unfortunate fact that the Zip spec doesn't say anything about the encoding of file names. The only thing Java could do better is allow people to pass in their own encoding when instantiating a ZipFile. A commenter even provides a patch for ZipInputStream to allow this. I found this solution didn't even cut it for me. The customers were using some German zip program (a truly international problem!), and I couldn't get a clean unzip with any encoding. When I discovered that both WinZip and the Windows XP "compressed folders" feature were baffled as well, I threw in the towel. People waiting for Sun to fix this are probably in for a nasty surprise.
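For readers on JDK 7 or later, the "pass in your own encoding" idea exists as constructors such as `ZipInputStream(InputStream, Charset)` and `ZipOutputStream(OutputStream, Charset)`. This sketch assumes one of those JDKs; the helper names are made up, and `zipWithName` only exists to fabricate a zip whose entry name is in a non-UTF-8 charset, the way some third-party tools write them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipNames {
    // Lists entry names, decoding them with a caller-chosen charset instead of assuming UTF-8.
    static List<String> entryNames(byte[] zipBytes, Charset nameEncoding) {
        List<String> names = new ArrayList<String>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes), nameEncoding)) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds a one-entry zip whose entry name is encoded with the given charset.
    static byte[] zipWithName(String entryName, Charset nameEncoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes, nameEncoding)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(new byte[] {1, 2, 3});
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Note that this only helps once you know the encoding. ISO-8859-1 maps every byte to some character, so decoding with it never fails, which means a "try charsets until one works" loop can't tell you whether the guess was right.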


For the Brazilian customers, I had them re-create the zip using a nice little graphical jar utility; ZipOutputStream uses UTF-8, which has no trouble with their i18n file names. Another good alternative would be 7-zip, which considered i18n issues from the start. But, surprisingly, its Java support seems pretty lacking.

For the grim details of this problem, I can't say it better than the WinZip tech support person:


>There is unfortunately some ambiguity within Zip files as to the
>character set that the filename is stored in. Whenever a file can be
>stored in the OEM character set (which is the character set originally
>used by MS-DOS), WinZip does so for compatibility with other Zip
>utilities. In this case, it marks the Zip file as made by MS-DOS, since
>the original Zip utilities were DOS utilities that used the OEM
>character set for filenames.
>
>For files whose names can't be stored in OEM, because they contain
>characters not present in the OEM character set, WinZip stores the
>filename in ANSI and marks the file as having been made on an
>NTFS-based system. WinZip is here following the lead of InfoZip, a Zip
>utility that since 1993 used this method of marking filenames as being
>stored in the ANSI character set. (ANSI is the character set used by
>Windows 95, 98, and Me, and by many Windows applications. Windows 2000
>and Windows XP use yet another character set, Unicode, but they also
>provide support for the ANSI character set.)
>
>However, some other Zip utilities handle this differently. In
>particular, Microsoft's Compressed Folders program always marks the
>files it creates as having been made on an NTFS-based system, but
>stores the filename in the OEM character set.
>
>So when WinZip sees a file marked as NTFS-based, it has no way to be
>absolutely sure whether the filename is in the OEM character set (and
>needs to be translated to ANSI), or if it is already in the ANSI
>character set. WinZip uses some complicated heuristics to try to decide
>which character set is involved, and it almost always comes up with the
>"right" answer, but there is no way to be absolutely right in every
>case.
>
>We are looking into how to handle this better in a future version of
>WinZip.



Saturday, October 22, 2005

Crystal ate my log4j!

When solving a problem on a live site, logs are indispensable. So imagine my frustration last spring when I went to solve a problem, and no logs were there. None. Everything had just stopped logging. Solving a problem with access logs alone is not an enviable task. I searched around a bit, and found this

I was indeed using the log4j WatchDog to periodically watch log4j.xml for changes, and we do hot restarts of the context from time to time. I wasn't convinced this was the problem, as I could not reproduce it no matter what I did with the WatchDog and hot restarts. So I set the system property "log4j.debug" to "true", in the hope that the next time it happened I'd get some kind of info. I also wrote code that ensured that any time a Logger object was acquired, at the very least a System.out console appender was attached to it.

The logs continued to disappear, but thankfully output went to stdout, and was captured. Then, log4j.debug paid off. In the middle of a log, I saw this:
log4j: Reading configuration from URL jar:file:/E:/webapps/app/WEB-INF/lib/celib.jar!/META-INF/CrystalEnterprise.Trace/basic.properties
log4j: Parsing for [root] with value=[ERROR, A1].
log4j: Level token is [ERROR].
log4j: Category root set to ERROR
Whoa! Sure enough, all of my categories and appenders were blown away at this point. It didn't happen at startup, but whenever the first Crystal report was launched. After much consternation and trying to decipher the log4j comment in the release notes, my coworker came up with setting the system property "crystal.enterprise.trace" (yes, not even mentioned in the release notes) to "false". Now we can launch Crystal reports AND use log4j - oh, the decadence!

I'm no log4j expert, so I wonder how Crystal could have played more nicely here. Surely there's a way to use PropertyConfigurator to amend the logging configuration rather than obliterate it.

Lessons learned:
1. Abstract your code such that you can intercept acquisition of Logger objects. Use this code to ensure that a Logger is at least guaranteed to have a Console appender, and escalate the problem if it indeed has zero appenders.
2. Set log4j.debug to true at all times
3. When using Crystal, be sure to set crystal.enterprise.trace to false (unless, of course, you care about their logs more than your own).
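Lesson 1 can be sketched as a single choke point for acquiring loggers. To keep the example runnable with nothing but the JDK, it uses java.util.logging as a stand-in for log4j (handlers play the role of appenders); with log4j 1.2 the same idea would check `Logger.getAllAppenders()` and additivity and attach a `ConsoleAppender`. The `Loggers` class is hypothetical, and note that JUL's `ConsoleHandler` writes to System.err rather than System.out.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Logger;

public final class Loggers {
    private Loggers() {
    }

    // Every caller goes through here, so every logger gets the same guarantee.
    public static synchronized Logger get(String name) {
        Logger logger = Logger.getLogger(name);
        if (!hasEffectiveHandler(logger)) {
            logger.addHandler(new ConsoleHandler()); // writes to System.err
            // Escalate: a logger with nowhere to write means configuration was lost.
            logger.warning("Logger '" + name + "' had no handlers; attached a ConsoleHandler");
        }
        return logger;
    }

    // True if this logger, or an ancestor it forwards records to, has a handler.
    static boolean hasEffectiveHandler(Logger logger) {
        for (Logger l = logger; l != null; l = l.getParent()) {
            if (l.getHandlers().length > 0) {
                return true;
            }
            if (!l.getUseParentHandlers()) {
                return false;
            }
        }
        return false;
    }
}
```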





Microsoft to buy Redhat?

I've finally gotten over my fear that I'd create a totally useless, misguided, and confused blog. I've gotten over it - I now know this blog will be useless, misguided, and confused. My only three goals are:
  • Document weird problems I've encountered in the hopes that I can help someone avoid my frustration
  • Have inflammatory titles to attract people to read my blog and respond to it.
  • Learn from responses to my posts
  • Vent.
I'm a Java programmer who has been working on roughly the same product since about 1998. So I won't have many smart things to say about the hot topics of the day: AJAX, RoR, or Web 2.0. I will hopefully be able to share interesting experiences on more mundane topics, like JDBC, i18n, webapp performance, maintainable code, and other such things.