Friday, March 23, 2007

Google Translate Engine 0.8.0 released

For some of my upcoming projects, I was hoping to find some publicly available translation web service. I managed to find one, but it seemed to be under fairly heavy load and unavailable at times.

Google has a very nice, simple web interface for performing text translations. Unfortunately, they don't offer an API for it. Since I couldn't find a suitable web service, it made sense to write my own implementation by scraping Google's site. Besides, I could always use more experience with web services.

Before I could do any of that, I had to write the core library. It's not a web service, but it's completely usable in a Java application. I haven't had an opportunity to test it very much, but I decided to share it anyway. You can find the Google Translate Engine v0.8.0 here on my software page. It appears to work fine, but I've had a really hard time getting my workstation to properly output Unicode on the console, so it definitely requires more testing.

I played with lots of new stuff while making this: Maven, Javadoc, SAX, and unit testing, to name a few. I'm using John Cowan's nice TagSoup library to do the web scraping. It allows me to use a SAX handler even on badly formed HTML.
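For anyone curious what the SAX approach looks like, here is a minimal sketch. It is not the engine's actual code. The class name ResultScraper and the element id "result" are made up for illustration, and it uses the JDK's built-in parser, so it only handles well-formed markup. TagSoup's org.ccil.cowan.tagsoup.Parser implements the same XMLReader interface, so in principle it could be swapped in to cope with messy real-world HTML.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects all text nested inside the element with a given id.
public class ResultScraper extends DefaultHandler {
    private final String targetId;
    private final StringBuilder text = new StringBuilder();
    private int depth = 0; // greater than 0 while we are inside the target element

    ResultScraper(String targetId) {
        this.targetId = targetId;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth > 0) {
            depth++; // a child element of the target
        } else if (targetId.equals(atts.getValue("id"))) {
            depth = 1; // just entered the target element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 0) {
            depth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            text.append(ch, start, length);
        }
    }

    // Parses the markup and returns the text found inside the element with the given id.
    static String extract(String markup, String id) {
        try {
            // Assumption: JDK parser, well-formed input only. A TagSoup Parser could replace this reader.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ResultScraper handler = new ResultScraper(id);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(markup)));
            return handler.text.toString();
        } catch (Exception e) {
            throw new RuntimeException("Could not parse markup", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><div id=\"result\">Hola <b>mundo</b></div><p>ignored</p></body></html>";
        System.out.println(extract(page, "result"));
    }
}
```

The depth counter is what lets the handler keep collecting text through nested tags like the bold element and stop once the target element closes.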

Now I'm trying to figure out how best to implement an old-school JSR-109 servlet-driven JAX-RPC web service. My target platform at the moment is JBoss. As a prerequisite exercise, I'll probably make a tiny RMI service that hooks into the translate engine. Hopefully I can get that done in the next couple of days.

Monday, March 5, 2007

i18n'ing it up!

I plan on working on some multilingual websites, so what better encoding to use than UTF-8, right? You get dozens of languages and character sets all supported by a single encoding, like on this site.

So what does it take to get all this set up? Let's start with the most obvious thing you can do: sticking this into your HTML head.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
That's not nearly enough, however, because most browsers will also check your HTTP Content-Type header, and if it doesn't agree with your meta tag, the HTTP header value takes precedence. This is fairly easy to correct. In JSP I can do it with a page directive:

<%@ page contentType="text/html; charset=utf-8" %>
What about sending/receiving form data?

Most browsers will send form data in the same encoding that the page is in, but for an added guarantee you can specify an encoding with the accept-charset attribute in your form tag:

<form accept-charset="utf-8" method="post" action="postArticle.do">
...
</form>
Things can get a little trickier on the receiving end, however. A servlet will check the character encoding with request.getCharacterEncoding(), and if it's null, you won't get what you're expecting. I'm not enough of a container expert to understand what's going on behind the scenes, but in my case the form data was apparently being decoded as ISO-8859-1, so it was necessary to tweak things to tell Java to use UTF-8 encoding. I did this with a simple modification to my ActionForm bean.

public void setContent(String content) throws Exception {
    // Recover the raw bytes (the value was decoded as ISO-8859-1), then decode them as UTF-8.
    this.content = new String(content.getBytes("ISO-8859-1"), "UTF-8");
}
Granted, there are probably much better ways to do this (request filters, etc.), and I'd be more than willing to listen to anyone else's expertise on the subject.
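To see why the re-decoding trick in setContent works, here is a small standalone sketch with no servlet container involved. The class name EncodingRepair and the sample word are made up for illustration, and it uses java.nio.charset.StandardCharsets, which assumes Java 7 or later. It simulates UTF-8 bytes being decoded as ISO-8859-1 and then repairs the result the same way the bean does.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows the ISO-8859-1 to UTF-8 re-decoding trick in isolation.
public class EncodingRepair {

    // Repairs text that was decoded as ISO-8859-1 when the underlying bytes were really UTF-8.
    static String repair(String misdecoded) {
        // ISO-8859-1 maps each char in the range 0-255 to exactly one byte,
        // so this step recovers the original bytes unchanged.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Simulates a receiver that wrongly decodes UTF-8 bytes as ISO-8859-1.
    static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an accented e, 4 characters
        String misdecoded = misdecode(original);

        // The accented letter is 2 bytes in UTF-8, so the wrongly decoded string has 5 chars.
        System.out.println(misdecoded.length());
        System.out.println(repair(misdecoded).equals(original));
    }
}
```

The trick only works when the wrong decoding really was ISO-8859-1. Applying it to text that was already decoded correctly would mangle any non-ASCII characters.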

Friday, March 2, 2007

Good books, bad books

I thought I'd share my thoughts on four books I've read lately.

Head First HTML with CSS & XHTML - I wanted to teach my wife XHTML/CSS and this book was fantastic. It's thorough, well organized, and funny. It's about as interesting as you could make the subject. I highly recommend it.

PHP/MySQL Programming for the Absolute Beginner - Now I'm trying to teach my wife PHP development. From the title of this book, and several decent reviews, it seemed like a good choice. Boy, was I wrong! I think most of the reviews written at Amazon.com were by amateur coders who don't know any better. The author makes naive assumptions about your development environment, writes bad HTML, and doesn't use best practices. Some of the examples are so nasty that they made my head hurt trying to explain them to my wife. Avoid this book!

PHP and MySQL for Dynamic Web Sites: Visual QuickPro Guide - So back to Amazon I went, and this time I did a lot more research. This book is very highly recommended, and quite a few of the book's reviewers know what they are talking about. We've just started into it, but the examples use proper XHTML and are fairly concise and well organized. The author doesn't just assume that your environment has register_globals enabled (yuck), and he covers topics that tend to confuse people, like magic quotes, early on.

Head First Servlets and JSP - I just finished this book myself and I'm quite pleased with it. Its narrative follows the same style as the other books in the series. It's funny (sometimes cynical) and informative. The kung fu movie captions are awesome. This is one of the few IT books that I've ever read in its entirety. I highly recommend it!

I've just ordered Maven: A Developer's Notebook. That will be my next book to tackle. Now that I'm doing J2EE development, I'm looking for tools that will take the sting out of packaging and deployment. Maven seems to handle a lot of those issues for you. I'm looking forward to learning more about it.