The Cache: Technology Expert's Forum
 
Topic: Is it possible to dynamically get the status result that Google sees when it crawls?
tommytx
« on: July 22, 2012, 07:25:26 PM »

I transfer a lot of websites from one place to another, change domain names, and all sorts of related work. One of the things I watch very carefully during the first few months after a transfer is how the crawl is going. So every time Google grabs a page, I record the request in a log, like this:

66.249.73.215
Sunday, 22 July 2012 07:23 pm
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
www.mydomain.com
Page Requested: /real_estate_article.asp?gi=1013&ai=432
Page Sent To:via 301-> mls-2956102-26_lorelei_rd_west_orange_twp_nj_07052_1845
Status Reported: 200 (this line not available yet)

To react quickly when a lot of failures are occurring, though, I need to get the status of the page Google is requesting dynamically. I know AWStats reports this, but I need it at the time of the visit.

Is there any way to get this information with PHP, or even Ajax?

Edit: I suppose I could have the program try to read the file after Google leaves and assume that, if I could read it and get the number of bytes, Google could probably read it too. But this seems very inefficient, to say the least.

« Last Edit: July 22, 2012, 07:28:24 PM by tommytx »
tommytx
« Reply #1 on: July 23, 2012, 08:58:47 AM »

I had not thought about this before, but if Google is forwarded to the 404 page, I could put a snippet in 404.php to log a status of at least 404 into my data at that point. However, I do not believe the WordPress blog redirects a bot to my 404 page; I think the server just tells it 404. Does anyone know how this works? I just need some way to record whether the spider hit a bad or a good page, so I don't have to wait for my toolbox to fill with 404s before I know there is a problem, since it takes Google several days to get around to updating the toolbox.

Bompa
« Reply #2 on: July 23, 2012, 04:25:24 PM »

Quote
Edit: I suppose I could have the program try to read the file after Google leaves and assume that, if I could read it and get the number of bytes, Google could probably read it too. But this seems very inefficient, to say the least.

If you want to know what status code Google is getting, get it yourself.

Sounds efficient to me.
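
For what it's worth, here is a minimal sketch of "getting it yourself", written in Python rather than PHP purely for illustration. It re-requests a URL with the Googlebot User-Agent string from the log in the first post and returns the status code without following redirects, so a 301 shows up as a 301. The function name `fetch_status` and its defaults are made up for this example, and a server may still answer your request differently from the real Googlebot (different IP, caching and so on).

```python
import http.client
from urllib.parse import urlsplit

# User-Agent string copied from the log entry in the first post.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def fetch_status(url, user_agent=GOOGLEBOT_UA, timeout=10):
    """Return (status_code, location_header) for a single GET request.

    http.client never follows redirects on its own, so a 301 or 302
    is reported as-is together with its Location header.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild the request target: path plus query string, if any.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling `fetch_status("http://www.mydomain.com/real_estate_article.asp?gi=1013&ai=432")` right after a logged visit would give a value for the "Status Reported" line, on the assumption that your request is treated the same way Google's was.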


tommytx
« Reply #3 on: July 24, 2012, 08:16:10 PM »

Quote
but if Google is forwarded to the 404 page, I could put a snippet in 404.php

Well, it worked perfectly. All I have to do is hook into 404.php, since in WordPress Google is also directed to that same 404 page.
So it's a simple process to log Google's visit as at least a 404. When I request the page myself following the Google visit, I get a second 404 error if Google got a bad page, so it's working perfectly now.
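
The logging side of this could look roughly like the sketch below. It is not the actual snippet from 404.php, just the same idea in Python: when the 404 handler runs and the User-Agent looks like Googlebot, append an entry in the log layout from the first post with the status filled in. All names here (`is_googlebot`, `format_entry`, `log_404`) and the log path are hypothetical.

```python
from datetime import datetime


def is_googlebot(user_agent):
    """User-Agent check only; the header can be spoofed, so treat it as a hint."""
    return "googlebot" in user_agent.lower()


def format_entry(ip, user_agent, host, path, status, when=None):
    """Build one log entry in the layout used in the first post."""
    when = when or datetime.now()
    # e.g. "Sunday, 22 July 2012 07:23 pm"
    stamp = when.strftime("%A, %d %B %Y %I:%M ") + when.strftime("%p").lower()
    lines = [
        ip,
        stamp,
        user_agent,
        host,
        "Page Requested: " + path,
        "Status Reported: " + str(status),
    ]
    # Blank line after each entry keeps the log readable.
    return "\n".join(lines) + "\n\n"


def log_404(log_path, ip, user_agent, host, path):
    """Append a 404 entry for Googlebot visits; return True if one was written."""
    if not is_googlebot(user_agent):
        return False
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(format_entry(ip, user_agent, host, path, 404))
    return True
```

Since the User-Agent header can be faked, an entry written this way only says that something calling itself Googlebot hit a missing page.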