The Cache: Technology Expert's Forum
 
Topic: training gocr (Read 5066 times)
arms
Expert
Posts: 235
« on: May 12, 2009, 08:13:47 AM »

anyone familiar with gocr?

i'm using gocr to create a database because it recognizes some characters wrong. it isn't that it has trouble recognizing them; it recognizes them wrong with 100% certainty. for example, it is 100% certain that "k" is "h". so to train it i am forced to use this command:
Code:
gocr -C a-zA-Z0-9 -m 162 -a 100 imagename.pnm
which makes it prompt me for every character in the image. (i don't think the -a 100 arg is doing anything, because i am telling it not to use any recognition algorithm and only check the database.)

so this works, but i would like to do this only for images that contain characters it guesses wrong with 100% certainty. the problem is that it seems i can either tell it to check the database only, or have it prompt me whenever it's less than x% certain. i would like to tell it to check the database first, then guess.

anyone have an idea how to get this to work?
maybe i will have to modify the source, or else just enter new characters manually until i've gone through all possible characters?
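What's being asked for here (database first, training only where needed) can be approximated from the outside with a small wrapper script. This is a sketch, not a gocr feature: it assumes gocr prints its default "_" placeholder for characters it can't recognize (settable with -u; check "man gocr" for your version), and needs_training, GOCR and CHARS are hypothetical names. Because "_" isn't in a-zA-Z0-9, any "_" in the output should be that placeholder.

```sh
#!/bin/sh
# Sketch: database pass first, interactive training only where needed.
# Assumption: gocr prints "_" for each character it can't recognize
# (its default placeholder; check `man gocr`, option -u).
GOCR=${GOCR:-gocr}          # override to point at a different gocr binary
CHARS=${CHARS:-a-zA-Z0-9}   # same character filter as the commands above

# needs_training IMAGE: exit status 0 if the database pass (-m 2)
# still leaves at least one unrecognized character in the output
needs_training() {
  "$GOCR" -C "$CHARS" -m 2 -a 100 "$1" | grep -q '_'
}

# usage: sh train.sh image1.pnm image2.pnm ...
for img in "$@"; do
  if needs_training "$img"; then
    echo "training on $img" >&2
    # the training command from above: prompts for characters to add to the db
    "$GOCR" -C "$CHARS" -m 162 -a 100 "$img"
  else
    echo "$img: nothing unrecognized, skipping" >&2
  fi
done
```

One big caveat: a character that is confidently misread, like the "k" read as "h", never shows up as "_", so this only catches characters gocr admits it doesn't know. Catching the confident mistakes would need the correct text to compare against.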

 
vsloathe
Global Moderator
Lifer
Posts: 1669
« Reply #1 on: May 12, 2009, 08:32:16 AM »

I'll be watching this thread intently.

Sorry I don't have anything to add, but it's been a while since I tooled around with GOCR.

hai
kurdt
Lifer
Posts: 1153
« Reply #2 on: May 12, 2009, 09:45:09 AM »

I'm going to play with GOCR again in the next few weeks, so if nobody steps up before then, I'll keep this in mind and try to remember to give you some pointers...

I met god and he had nothing to say to me.
arms
Expert
Posts: 235
« Reply #3 on: May 15, 2009, 12:36:20 PM »

i tried the latest version (0.47) from http://jocr.sourceforge.net/download.html and it works much better. ubuntu 8.10 has 0.45 in the repos.
i don't know whether its recognition algorithm has improved or the way it uses the db, but as i train it, it prompts for fewer and fewer characters and gets fewer wrong from the outset.

so for reference, to train it i procure an image ;), clean it up, and from the directory the image is in i use the following command:

Code:
gocr -C a-zA-Z0-9 -m 162 -a 100 imagename.pnm

it will prompt for characters it doesn't recognize with 100% certainty (the -a 100 arg sets the certainty level to 100%; the default is 95).
it will then display these characters as ascii images in the console, usually along with the adjoining characters; the character it doesn't recognize is drawn with the "#" character. only enter the one character drawn with #. in the beginning i didn't pay attention, entered all the characters, and had to delete the database.
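The thread doesn't spell out what -m 162 means. Going by the gocr man page as I know it (worth double-checking with "man gocr" for your version), -m takes a bit pattern: 2 = use the database, 32 = no context correction, 128 = extend the database by prompting the user. So 162 = 128 + 32 + 2. If that's right, 162 does not include 256 (switch off the recognition engine), so the regular recognizer still runs alongside the database, which would explain why -a still matters. This hypothetical decode_mode helper just does the arithmetic:

```sh
#!/bin/sh
# decode_mode MODE: print which -m bits are set in MODE.
# Bit meanings follow the gocr man page as I know it; check `man gocr`
# for your version. Other bits are reported by value only.
decode_mode() {
  mode=$1
  for bit in 2 4 8 16 32 64 128 256; do
    if [ $(( mode & bit )) -ne 0 ]; then
      case $bit in
        2)   desc="use the database" ;;
        32)  desc="no context correction" ;;
        128) desc="extend the database (prompt for unknown chars)" ;;
        256) desc="switch off the recognition engine" ;;
        *)   desc="see man gocr" ;;
      esac
      echo "$bit: $desc"
    fi
  done
}

decode_mode 162   # prints three lines: bits 2, 32 and 128
```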

the database will be in the 'db' subdir of your working directory.

to use the database for ocr, use the command:
Code:
gocr -C a-zA-Z0-9 -m 2 -a 100 imagename.pnm

i know it's working because if i omit the "-m 2" arg (use database), it doesn't recognize all the characters.
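The same check can be made a bit more concrete by counting unrecognized characters with and without the database. A sketch, assuming gocr's default "_" placeholder for unknowns and that -m 0 is the default mode; count_unknown and compare_db are made-up helper names:

```sh
#!/bin/sh
# Compare unrecognized-character counts without (-m 0) and with (-m 2) the db.
# Assumes "_" is gocr's default placeholder for unrecognized characters.
GOCR=${GOCR:-gocr}

# count_unknown MODE IMAGE: number of "_" placeholders in gocr's output
count_unknown() {
  "$GOCR" -C a-zA-Z0-9 -m "$1" -a 100 "$2" | tr -cd '_' | wc -c | tr -d ' '
}

# compare_db IMAGE: print both counts
compare_db() {
  echo "without db (-m 0): $(count_unknown 0 "$1") unrecognized"
  echo "with db    (-m 2): $(count_unknown 2 "$1") unrecognized"
}
```

e.g. compare_db imagename.pnm; if the training is paying off, the second count should be the lower one.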

you can use the -p flag to specify the location of the db - it needs a trailing slash to work:
Code:
gocr -C a-zA-Z0-9 -m 2 -a 100 -p "/path/to/db/" imagename.pnm
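Since the trailing slash on -p is easy to forget, a small wrapper can add it. This assumes the trailing-slash behaviour described above; with_slash and ocr_with_db are hypothetical helper names, not part of gocr:

```sh
#!/bin/sh
# with_slash DIR: append the trailing slash -p seems to need, if missing;
# a path that already ends in "/" is printed unchanged
with_slash() {
  case $1 in
    */) printf '%s\n' "$1" ;;
    *)  printf '%s/\n' "$1" ;;
  esac
}

GOCR=${GOCR:-gocr}

# ocr_with_db DBDIR IMAGE: database-assisted recognition from a custom db path
ocr_with_db() {
  "$GOCR" -C a-zA-Z0-9 -m 2 -a 100 -p "$(with_slash "$1")" "$2"
}
```

e.g. ocr_with_db /path/to/db imagename.pnm runs the command above with "/path/to/db/".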

hopefully this might be clearer to some than the man pages.
nutballs
Administrator
Lifer
Posts: 5627
« Reply #4 on: May 15, 2009, 01:05:35 PM »

i will be the genius that asks...

What the fuck is GOCR?

Glorification Of Curmudgeoned Retards?

I could eat a bowl of Alphabet Soup and shit a better argument than that.
arms
Expert
Posts: 235
« Reply #5 on: May 15, 2009, 03:20:44 PM »

it stands for gnu optical character recognition. it is an ocr program:
it reads text from images. its primary application would be digitizing scanned texts.

you can probably guess its other uses. :)
nutballs
Administrator
Lifer
Posts: 5627
« Reply #6 on: May 15, 2009, 03:42:21 PM »

ah duh.
i just didn't see the OCR in that for some reason.

move along, nothing to see here.
perkiset
Olde World Hacker
Administrator
Lifer
Posts: 10096
« Reply #7 on: May 15, 2009, 06:28:41 PM 

>:)

::marks thread::

(popcorn)

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
walrus
Rookie
Posts: 46
« Reply #8 on: June 04, 2009, 03:36:02 PM »

tesseract
EdwardST
n00b
Posts: 2
« Reply #9 on: July 22, 2009, 04:37:20 AM »

Hey, I'm using this for the same thing. What exactly does
Code:
-C a-zA-Z0-9
do?
arms
Expert
Posts: 235
« Reply #10 on: July 22, 2009, 10:08:08 AM »

it sets which characters to recognize, using the same range notation as regular expressions:

a-z (lowercase), A-Z (uppercase), 0-9 (numbers)
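So -C a-zA-Z0-9 is just shorthand for the full list of 62 characters. To see what a filter expands to, here's a hypothetical helper (not part of gocr) that spells out simple x-y ranges, assuming -C treats them the way described above:

```sh
#!/bin/sh
# expand_charset FILTER: hypothetical helper (not part of gocr) that spells
# out a -C style filter such as "a-zA-Z0-9" as an explicit character list.
expand_charset() {
  printf '%s\n' "$1" | awk '
    BEGIN {
      # table of printable ASCII (32..126) used to look up character codes
      for (k = 32; k < 127; k++) tbl = tbl sprintf("%c", k)
    }
    function code(ch) { return index(tbl, ch) + 31 }
    {
      s = $0; n = length(s); out = ""; i = 1
      while (i <= n) {
        c = substr(s, i, 1)
        if (i + 2 <= n && substr(s, i + 1, 1) == "-") {
          # x-y: emit every character from x to y inclusive
          for (k = code(c); k <= code(substr(s, i + 2, 1)); k++)
            out = out sprintf("%c", k)
          i += 3
        } else {
          out = out c   # a plain character stands for itself
          i++
        }
      }
      print out
    }'
}

expand_charset a-z   # prints abcdefghijklmnopqrstuvwxyz
```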
EdwardST
n00b
Posts: 2
« Reply #11 on: July 22, 2009, 08:22:45 PM »

Oh hahaha, I was using -C "abcdefg..." etc.

Anyway, did you solve the problem of it recognizing chars wrong? At -a 100 it won't recognise the character at all, but at something lower, like -a 60, it recognises r as T. Though I think I may have entered it in the database wrong :-[
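One way to narrow this down is to run the database pass at several -a levels and see where the character switches from "_" (unrecognized) to a guess. A sketch, assuming gocr's default "_" placeholder; sweep_certainty and its default levels are made up for illustration:

```sh
#!/bin/sh
# sweep_certainty IMAGE [LEVELS...]: run the database pass at several -a
# certainty levels and print "level<TAB>output" for each one.
GOCR=${GOCR:-gocr}

sweep_certainty() {
  img=$1
  shift
  [ $# -gt 0 ] || set -- 60 70 80 90 95 100   # default levels to try
  for level in "$@"; do
    printf '%s\t%s\n' "$level" "$("$GOCR" -C a-zA-Z0-9 -m 2 -a "$level" "$img")"
  done
}
```

e.g. sweep_certainty imagename.pnm 50 60 70 80 90 100. It won't tell you whether the database entry itself is wrong, though.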
arms
Expert
Posts: 235
« Reply #12 on: July 23, 2009, 06:25:42 AM »

the training works, but it'll never be 100%.