Exactly perk,
OK, I first got interested when I ran into this as a real problem.
I had to parse a file which had the following structure:
data1
data2 (possibly missing)
empty line
This looks considerably easier to parse than it actually is. The problems are the possible absence of the second piece of data and the empty lines in between. Either one taken in isolation is not that bad, but together they gave me a headache.
Now awk can either be run like this: "awk -f cmd.awk target_to_parse", or cmd.awk (an arbitrary name) can be made standalone with a shebang line, and then you just call it like any other executable.
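To make the two ways of running a script concrete, here is a rough sketch. The one-line program and the sample input are just placeholders I made up, and the shebang is filled in with whichever awk is found on the PATH, since its location differs between systems.

```shell
# Sketch: the same awk program run two ways.
# The program body and the sample input are placeholders for illustration.
workdir=$(mktemp -d)

# Write the script with a shebang pointing at whichever awk is on the PATH.
printf '#!%s -f\n{ print NR ": " $0 }\n' "$(command -v awk)" > "$workdir/cmd.awk"
printf 'alpha\nbeta\n' > "$workdir/target_to_parse"

# Style 1: hand the script to awk explicitly with -f.
via_flag=$(awk -f "$workdir/cmd.awk" "$workdir/target_to_parse")

# Style 2: mark it executable and call it like any other program.
chmod +x "$workdir/cmd.awk"
via_shebang=$("$workdir/cmd.awk" "$workdir/target_to_parse")

printf '%s\n' "$via_flag"
printf '%s\n' "$via_shebang"
rm -rf "$workdir"
```

Both calls should print the same numbered lines. The second style only works if the path in the shebang really points at an awk binary on your machine.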
Anyway, the solution to the above example came out looking like this:
#!/bin/awk -f
BEGIN {
FS="\n"
RS=""
}
{
# if( $2 != "" )
if( NF > 1 )
print $1 "\n" $2
}
The first line is the shebang and I'm not going into that here.
Next is the BEGIN section, which is like an init() function where you set up variables etc. for the run. FS is a built-in variable that stands for field separator (in this case a line feed), and RS is the record separator (in this case empty, which tells awk to split records on blank lines). The next set of braces holds the main guts of the script: this is the section that gets applied to each "record". Here I test that there is more than one field and, if there is, print the two fields separated by a newline. This satisfied the requirements for that particular task.
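If you want to see the blank-line record splitting in action without creating any files, something like the following should do it. The sample data is invented for the demo, and I've written the test as a pattern in front of the action, which is equivalent to the if() used in the script.

```shell
# Made-up sample: three records separated by blank lines,
# the middle one missing its second line.
sample='data1-a
data2-a

data1-b


data1-c
data2-c'

complete_records=$(printf '%s\n' "$sample" | awk '
BEGIN { FS = "\n"; RS = "" }   # blank lines split records, each line is a field
NF > 1 { print $1 "\n" $2 }    # keep only records that have the second line
')
printf '%s\n' "$complete_records"
```

The record holding only data1-b is dropped, and the run of two blank lines is treated as a single separator, so no empty records turn up.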
Next I had a task where the numbers involved were very large, as mentioned in the OP. Once again awk came to the rescue. This time I've changed the regex slightly to protect the guilty ;-)
#!/bin/awk -f
BEGIN {
FS="\""
RS="\n"
}
{
# if( NF == 7 )
if( / *?<img name/ )
print $4
}
This time we're testing whether a line matches the regex (in this case looking for a named <img> tag with leading spaces; note that as written the pattern is not anchored and " *" also matches zero spaces, so it fires on any line containing "<img name") and printing the fourth field, using " as the field separator. The lines that begin with # are commented out BTW, except the shebang of course.
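Here is a small sketch of how splitting on the double quote picks out a value. The HTML lines are invented for the demo, and the pattern is cut down to a plain /<img name/ rather than the regex from my script.

```shell
# Invented markup: one line without an image, two with a named <img>.
html='<p class="intro">some text</p>
   <img name="logo" src="images/logo.png" alt="Logo">
<img name="banner" src="images/banner.jpg" alt="Banner">'

img_sources=$(printf '%s\n' "$html" | awk '
BEGIN { FS = "\"" }          # split each line on double quotes
/<img name/ { print $4 }     # field 4 is the second quoted value on the line
')
printf '%s\n' "$img_sources"
```

With a quote as the separator, the even-numbered fields are the quoted values, so $2 is the name and $4 is the src in this sample. Each of the sample <img> lines splits into exactly 7 fields, which is why a test like NF == 7 could also pick them out.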

This barely scratches the surface of what can be done with awk, but I hope it is enough to whet your collective appetites.
Heaps more info here
http://www.gnu.org/software/gawk/manual/html_node/index.html
Cheers,
Brad