Limited rollout - OCR in 24 languages

A new version of OCR Terminal was launched today. We turned on a new feature for a small group of people to get some feedback and to closely monitor our performance. Recognition in languages other than English is one of our most-requested features and we are glad to have gotten this done.

The languages we now support are: Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Norwegian, Polish, Romanian, Slovak, Slovenian, Spanish and Swedish.

We even added recognition in a few programming languages - Basic, C/C++, COBOL, Fortran and Java.

This will be made available to everyone in a couple of weeks. If you currently have access to this feature, let us know what you think.


Well ok, we know we aren't the first. It seems 3 days are what each of the Microsoft BizSpark companies get when they are featured on the Startup of the Day section. Still, it makes for a cool headline.

And besides, we must've done something right to get up there. Wanna find out what that is? Here's the link - http://www.microsoftstartupzone.com/Blogs/Microspark-BizSpark-Startup-of-the-Day/Lists/Posts/Post.aspx?ID=104

Demo - OCR Terminal - the online OCR service


OCR Terminal is an online OCR service that allows you to convert scanned images to searchable text. You can convert PDF to Word online, JPEG to Word and many other image formats (PNG, TIFF, GIF) to editable document formats such as TXT, RTF and PDF. See how it works by watching the "OCR Terminal - the online OCR service Demo video"!


Transcript of this video:
"OCR Terminal" is an online OCR service that allows you to convert scanned images or PDF files into editable, searchable documents. Printed text in your image is carefully and automatically converted into text, while formatting, page layout and unreadable text remain in your image. Converting an image file is easy. All it involves is two steps: upload and process.

To begin, click on "Create New User". The process is extremely simple and we will have you set up in minutes. You can now enter our website using your username and password. All users get 20 free pages every month, and you can check the number of available pages on the right-hand panel of the dashboard. I am going to select a TIFF image and then click "upload" to begin.

Once your file has finished uploading, make sure you have uploaded the correct file by looking at the different pages of a multi-paged document or zooming the image. The document I uploaded contains printed text, embedded images and some tabular data. Click on the "Yes" button to start the OCR process.

Your document is now ready for download in a variety of formats. I am going to have it downloaded as a Word document first. The printed text is accurately recognized and you can begin to edit the text immediately. Notice that the format, two-column page layout, figures, tables and font have been preserved. It is also possible to download the results as a PDF document. The PDF document incorporates the actual image itself, but now also includes searchable text, so you can search through large PDFs for exactly the word or phrase you're looking for with a simple Ctrl-F. Remember that you can also download the results as .txt or .rtf.

So how does one get started?

You can check out the FAQ page for answers to the other questions that you might still have about the site. Or you can try it out today, with twenty free pages every month. OCR Terminal provides you with the power of a professional OCR engine right in your browser. So try it right now by creating an OCR Terminal account!

The OCR Terminal blog is back

In August, the American Congress goes on a recess, and Europe goes on a holiday. Our blog also went into a slumber, but unlike policymakers or les Européens, this was no I'm-off-to-Hawaii break.

We were hard at work, constantly trying to make the internet's leading online-OCR service even better, faster, and more stable. And as we celebrate the re-birth of our blog (we've now moved it to Movable Type on Amazon Web Services), there's more that we can pop the champagne corks for -

1. New office: Our new office is a little smaller, but brings us something we've never had before - a window! And one that looks out to a beautiful little hill of trees with all the green we need to keep our mental gears running throughout the day.

2. New version of the Desktop Client: The one-of-its-kind productivity tool is even better. Many bugs are gone, and new features are in. So if you aren't using it yet, ask us for a copy now. It's free!

3. New features: We're almost finished adding a new DropBox to our service, which means no more limits on input file sizes. Recognition support for a number of new languages is also on the way. Both are due anytime now.

And if you have anything you'd like to share with us or complain about, then we'd like to hear from you - click here to fill out our survey. It'll help us improve OCR Terminal further.

And as for the blog, August is over, and so it is back on its feet. We should be updating the blog much more regularly with product updates, interesting links and videos that we have made about how OCR works, and so on.

Stay tuned...

Faster and more Stable OCR Terminal


OCR Terminal has changed: version 2.6 was launched on July 8, 2009. And the difference?

Well, let me first share what you can't spot from the surface:

  • Redesigned server architecture
  • Resolved bugs known to slow down our systems

This means you can now expect a performance upgrade with faster processing time and a more stable and secure OCR conversion!

As for what you can spot, the differences lie in what you see and will experience:

  • New interface and content
  • Improved workflow that allows users to OCR documents of more than 20 pages directly from their Dashboard.
  • Desktop Client: You can now upload multiple files all at once to OCR Terminal directly from your computer's desktop with this downloadable application. The Desktop Client is now in Public Beta and users can contact us to try it out.
  • More competitive and affordable pricing for users with frequent, high monthly OCR usage
  • See-how-it-works: Video demos that guide users through the improved OCR Terminal
  • What's New?: The latest updates and happenings on OCR Terminal

This new release is about the Accurate, Fast and Simple experience. Did you spot the differences? Tell us about it.

Going Paperless?


The Economist did a piece last month on paperless offices, or rather about one that decided to do away with its paper for good and is "now about halfway there."

Hopefully they follow this story until it ends, for I sure would like to be there when Breedlove & Associates crosses the finish line - if only to give them a little hand-written 'Congratulations' note with the hope that they'll respect my regards enough to not throw away that piece of paper immediately.

The author cites plenty of examples of people shunning paper for woodfree technology - demand for office paper is now declining, unlike a decade ago; the world's largest paper maker has closed 5 factories in America in this period; and students now take their laptops instead of printed notes to their lectures.

These are all major developments that bring efficiency and great promise for an easier life, but they've got less to do with saying au revoir to paper forever, and more about technology now pervading territories that were formerly untouched.

Emails, e-tickets, e-returns: these are all important trends, but they are probably asymptotic. For there are things we still do where paper won't budge, at least not just yet - Google recipes are good, but I still wouldn't bring my laptop to my kitchen; keyboards and pasta sauce don't taste very good.

The Economist claims that "information thus appears to be becoming paperless roughly as transport has become horseless." The analogy is certainly a telling reflection of where we're headed. In our homes and offices, however, I still think we'll be riding these horses for some years to come.

Free OCR alternatives


OCR can be useful in unexpected situations. When I was at university, old examination papers had been made available online as PDFs. These had been scanned and converted to the PDF format and were not text-searchable. I managed to convert them to searchable PDFs using ABBYY - however, people who just want a few pages or images OCRed to avoid typing might find commercial OCR software expensive.

There's Tesseract, the free OCR engine that Google is currently developing. We have built some applications using Tesseract as the back-end and have found that the accuracy of its text recognition is not comparable to that of commercially available solutions. Tesseract also does not have a user interface, making it an unattractive choice for someone who just wants a couple of pages OCRed.
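To give a feel for what "no user interface" means in practice, here is a minimal sketch of driving Tesseract's command-line program from Python. It assumes the `tesseract` executable is installed and on your PATH; the function names and file names are our own, purely for illustration, and this is not how OCR Terminal itself works.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Build the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_to_text(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Tesseract appends ".txt" to output_base when it writes plain-text output.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract executable was not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling `ocr_to_text("scan.tif", "scan")` (hypothetical file names) would write `scan.txt` next to your script and return its contents. The result is plain text only, with no layout or formatting, which is a lot to ask of someone who just wants a couple of pages converted.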

Two other open-source OCR programs that we haven't used in-house are OCRAD and GOCR. Both are command-line programs that require a separate front-end installation for users who are not programmers. Their accuracy in character recognition has been reported to be lower than that of Tesseract.

Microsoft Office comes with Microsoft Office Document Imaging, which is included in Office Tools. Instructions on how to scan and digitize your documents using Office are documented here. Though the OCR works reasonably well on screenshots, it fails to preserve page layout information. A document with two columns (e.g. a scientific paper) produced OCR output in which every horizontal row was treated as a single line, running across both columns. As a result, the block of text that came out made very little sense.
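The two-column problem is easy to reproduce with a toy example. The sketch below is our own illustration, not Microsoft's or OCR Terminal's actual algorithm: it assumes each recognized word comes with a position on the page, and that the gap between the columns (`column_boundary`) is already known. Real layout analysis has to work that out for itself.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge of the word on the page
    y: float  # top edge of the word on the page


def row_wise_order(words):
    """Naive order: top to bottom, then left to right, ignoring columns."""
    return [w.text for w in sorted(words, key=lambda w: (w.y, w.x))]


def column_aware_order(words, column_boundary):
    """Read everything left of column_boundary first, then everything right of it."""
    left = [w for w in words if w.x < column_boundary]
    right = [w for w in words if w.x >= column_boundary]
    return row_wise_order(left) + row_wise_order(right)


# A toy two-column page: each column holds one sentence spread over two lines.
page = [
    Word("Paper", 0, 0), Word("jams", 50, 0), Word("Scans", 300, 0), Word("are", 350, 0),
    Word("are", 0, 20), Word("annoying.", 50, 20), Word("searchable.", 300, 20),
]

print(" ".join(row_wise_order(page)))
# Paper jams Scans are are annoying. searchable.
print(" ".join(column_aware_order(page, column_boundary=200)))
# Paper jams are annoying. Scans are searchable.
```

Reading straight across the page interleaves the two columns, which is the kind of jumbled output described above. Splitting the page into columns first restores the sentences.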

OCR Terminal was built to provide users with a free and easy way of performing OCR without having to install new software or pay for it. Automatic layout extraction and segmentation of image areas ensure that the page layout and formatting of the document are preserved accurately. We are currently working on a desktop client that will allow users to upload documents by dragging and dropping all their TIFFs and PDFs into the client and get back perfectly formatted Word files.

And, OCR Terminal is free.

The end of typing?


After decades of research in OCR technologies, machines today are much better at reading text than their ancestors were. A 100% accuracy rate on some printed documents is not surprising today. Yet, there isn't a single piece of software that claims to be completely accurate (including ours!), even within a defined data set, say clean white sheets with black 12pt printed text.
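For the curious, one common way to put a number on accuracy is to count how many single-character edits separate the OCR output from the correct text. The sketch below is only an illustration of that idea, with function names of our own choosing; we are not claiming that this is how any particular OCR product, ours included, reports its figures.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        previous = current
    return previous[-1]


def character_accuracy(truth, ocr_output):
    """Character accuracy relative to the ground truth, floored at 0."""
    if not truth:
        return 1.0 if not ocr_output else 0.0
    return max(0.0, 1.0 - edit_distance(truth, ocr_output) / len(truth))
```

A classic OCR slip is reading "rn" as "m". With this measure, `character_accuracy("modern", "modem")` comes to about 0.67, because two edits are needed across six characters, while a perfect transcription scores 1.0.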

This problem has as much to do with liability as it has with technology, especially in error-sensitive settings. A secretary who makes an error in typing a document can be reprimanded or even fired. Shouting at a computer, however, is never a solution.

So if we may be so adventurous as to venture a prediction, here's one: No matter how good OCR gets, it will never replace manual systems until legal jurisdictions expand to hold machines liable for the errors they make. Until that happens, no matter how intelligent we make our algorithms, we'll never be able to promise you can trust them blindly.

Of course that's no excuse - OCR Terminal remains the easiest and the most accurate OCR solution on the web, and we'll make sure it always does.