Tuesday, October 30, 2007

A Google Translate API?

I got all excited today when my favorite podcasters, the Java Posse, announced in episode #148 that Google had released an API for their translate service.

I've had my own implementation going for several months now, and I was all set to post that I was discontinuing my Google Translate Scraper. Who wants to use a scraper when you've got a real, robust interface, right?

Then I looked at the code, and there is no way this is an official Google API: it's nothing more than a two-class scraper, and it's far less mature than my own implementation.

There are a couple of key differences between this one and mine. Mine has some basic test coverage; this one doesn't. My implementation will throw a TranslationException if you try to send more data than Google will process (you can't just send them an arbitrary amount of text). My implementation also uses a real XML parser, so if Google radically changes their HTML structure I can still parse it cleanly.

So for now I'll keep my project going; hopefully one day Google really will release an API for their translate service.

Thursday, October 11, 2007

Return of the Coder

It feels good to get some coding done, even if it isn't a whole lot. I've had a long hiatus for the last two months and now I'm trying to pick things up where I left off. The reason for my absence?

Maxim Alexander Hudgins

Maxim Alexander Hudgins (7 lbs 10 oz) was born on August 20th. He's my first and I'm quite proud; in fact, I can hear him cooing in the background as I write this. I know what you're thinking, but no, we did not name him after a men's magazine. His name is a very old Russian name, and it's pronounced "mak seam".

Plenty of photos here: www.flickr.com/photos/jasonhudgins/tags/maxim/

Changing the subject: almost every Java developer I know uses Eclipse as their primary IDE. I listen to the Java Posse podcast pretty religiously, and they have always said good things about NetBeans, so I decided to take a serious look at it.

The big thing for me is support for Maven; I love it and I can't imagine not using it. With Eclipse you have two options: you can either use Maven's internal Eclipse plugin (the mvn eclipse:* commands) to generate the project files for Eclipse, or you can use the Maven plugin for Eclipse. Bad things tend to happen if you try to use both at the same time. Of the two I prefer Maven's internal plugin; I've only experienced grief with the Eclipse plugin.

Getting Maven to integrate cleanly with WTP in Eclipse for developing web apps was a painful experience for me. For one thing, you can't use the newest version of Eclipse, Europa, because Maven's internal plugin doesn't support WTP 2.0 yet. Even using Eclipse 3.2.2, I always seemed to be mucking around with .classpath trying to get things to work.

There is also a Maven plugin for NetBeans, called Mevenide. It also comes from Codehaus, the makers of the Eclipse plugin. After installing the plugin and opening one of my Maven projects, I was immediately aware that Mevenide is very tightly integrated with the NetBeans platform. Its features are very well organized, and I have yet to do any mucking with config files; everything just works, and I'm very happy. The webapp support is really great. NetBeans 6 is looking good too, but its Maven support wasn't quite up to par, so I'm still using 5.5 at the moment. I'm ready to say goodbye to Eclipse for the time being, but I expect both IDEs to keep improving, so swearing fealty to either would be short-sighted.

And the final bit of news: I've released version 0.9.8 of the GoogleTranslateScraper, and it's available on my software page. I've cleaned up the unit tests and put a cap on the amount of text you can submit for a translation job (30000 characters max).
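To make the cap concrete, here is a minimal sketch of a size guard that rejects oversized input before any request goes out. It assumes the limit is measured with String.length() (UTF-16 code units, not bytes). The TranslationException class below is a hypothetical stand-in, and InputGuard and checkLength are made-up names; the library's real classes may look different.

```java
// Hypothetical stand-in for the library's exception class; the real one may differ.
class TranslationException extends Exception {
    TranslationException(String message) {
        super(message);
    }
}

public class InputGuard {
    // Cap stated for GoogleTranslateScraper 0.9.8: 30000 characters.
    public static final int MAX_CHARS = 30000;

    // Throws before any request is made if the text exceeds the cap.
    // String.length() counts UTF-16 code units, not bytes or visible characters.
    public static void checkLength(String text) throws TranslationException {
        if (text == null) {
            throw new IllegalArgumentException("text must not be null");
        }
        if (text.length() > MAX_CHARS) {
            throw new TranslationException("Input has " + text.length()
                    + " characters; the limit is " + MAX_CHARS);
        }
    }

    public static void main(String[] args) {
        char[] big = new char[MAX_CHARS + 1];
        java.util.Arrays.fill(big, 'x');
        for (String text : new String[] {"Hello, world", new String(big)}) {
            try {
                checkLength(text);
                System.out.println(text.length() + " chars: OK");
            } catch (TranslationException e) {
                System.out.println("Rejected: " + e.getMessage());
            }
        }
    }
}
```

Failing fast with a checked exception means the caller has to decide what to do with oversized input instead of getting back a silently truncated or empty translation.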

Somewhere around this point Google stops translating. My first idea was to split the input into smaller chunks and submit multiple jobs concurrently. Sounds easy, but you can't split the data arbitrarily; you have to do it on a sensible boundary, like the end of a sentence, otherwise the translation won't read smoothly across the transitions. Splitting on sentence-ending punctuation would do it for English, but I don't know how to do it for non-Western languages such as Chinese. So I took the easy way out and punted the problem up to the next layer of the application. If you're using my library and want to translate large globs of text, it's your job to split the data into chunks in whatever way you feel is appropriate.
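If you do need to split large inputs yourself, one option in plain Java is java.text.BreakIterator, whose sentence instance is locale-aware. The sketch below is not part of the library. SentenceChunker and chunk are hypothetical names, and how well BreakIterator finds sentence boundaries in a particular non-Western language is something to check on real text rather than assume. Chunks are packed greedily up to maxChars, and a single sentence longer than the limit is cut at the limit as a fallback.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SentenceChunker {

    // Splits text into chunks of at most maxChars UTF-16 units, cutting only at
    // sentence boundaries found by BreakIterator for the given locale. A single
    // sentence longer than maxChars is cut at the limit as a last resort.
    public static List<String> chunk(String text, int maxChars, Locale locale) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> chunks = new ArrayList<>();
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        sentences.setText(text);

        StringBuilder current = new StringBuilder();
        int start = sentences.first();
        for (int end = sentences.next(); end != BreakIterator.DONE;
                start = end, end = sentences.next()) {
            String sentence = text.substring(start, end);
            // Start a new chunk if this sentence would push the current one past the limit.
            if (current.length() > 0 && current.length() + sentence.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            // An oversized sentence is cut into maxChars-sized pieces.
            while (sentence.length() > maxChars) {
                chunks.add(sentence.substring(0, maxChars));
                sentence = sentence.substring(maxChars);
            }
            current.append(sentence);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "The first sentence. A second one follows. And a third.";
        for (String piece : chunk(text, 30, Locale.ENGLISH)) {
            System.out.println("[" + piece + "]");
        }
    }
}
```

Because every sentence is appended in order, joining the chunks gives back the original text exactly. Each chunk could then be submitted as its own translation job, concurrently if you like. If BreakIterator finds no sentence terminators at all, the whole input counts as one sentence, which is why the hard-split fallback is there.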