Title: Is it possible to dynamically get the status result that google sees when crawls
Post by: tommytx on July 22, 2012, 07:25:26 PM

I do a huge amount of transferring websites from one place to another, changing domain names, and all sorts of stuff. One of the things I watch very carefully during the first few months after a transfer is how the crawl is going. So basically, every time Google grabs a page I track the visit in a log, such as:
66.249.73.215
Sunday, 22 July 2012 07:23 pm
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
www.mydomain.com
Page Requested: /real_estate_article.asp?gi=1013&ai=432
Page Sent To:via 301-> mls-2956102-26_lorelei_rd_west_orange_twp_nj_07052_1845
Status Reported: 200 (this line not available yet)

But to take the action I want in a hurry when a lot of failures are occurring, I need to dynamically get the status of the page that Google is looking at. I know that AWStats gets this, but I need it dynamically, at the time of the visit. Is there any way with PHP, or even Ajax, to get this information?

Edit: I suppose I could have the program try to read the file after Google leaves, and assume that if I could read it and get the number of bytes, Google could probably read it also.. but this seems very inefficient, to say the least.

Title: Re: Is it possible to dynamically get the status result that google sees when crawls
Post by: tommytx on July 23, 2012, 08:58:47 AM

I had not thought about this before, but if Google is forwarded to the 404 page, I could put a snippet in 404.php to log a status of at least 404 into my data at that point. However, I do not believe the WordPress blog redirects a bot to my 404 page; I think the server just tells it 404. Does anyone know how this works? I just need some way to record whether the spider hit a bad or a good page, so I don't have to wait for my toolbox to fill with 404s before I know there is a problem, since it takes Google several days to get around to updating the toolbox.
Title: Re: Is it possible to dynamically get the status result that google sees when crawls
Post by: Bompa on July 23, 2012, 04:25:24 PM

Quote
Edit: I suppose I could have the program try to read the file after google leaves, and assume that if I could read it and get the number of bytes that google could probably read it also.. but this seems very inefficient to say the least.

If you want to know what status code Google is getting, get it yourself. Sounds efficient to me.

Title: Re: Is it possible to dynamically get the status result that google sees when crawls
Post by: tommytx on July 24, 2012, 08:16:10 PM

Quote
but if google is forwarded to the 404 page I could put a snippet on the 404.php

Well, it worked perfectly. All I have to do is hook into 404.php, since in WordPress Google is also directed to that same 404 page. So it's a simple process to log Google's visit as at least a 404. When I request the page myself after the Google visit, I get a second 404 error if Google got a bad page, so it's working perfectly now.
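
For anyone finding this thread later, here is a minimal sketch of the "get it yourself" idea: request the same URL right after the crawler's visit and record the status code that comes back. It is written in Python rather than PHP only to keep it short, and the names fetch_status and status_line are made up for this example; they are not part of WordPress, AWStats, or any Google tool. It sends the Googlebot user-agent string from the log above and does not follow redirects, so a 301 shows up as a 301. This is only an approximation of what Google sees, since a server could still answer a real Googlebot request differently.

```python
import http.client
from urllib.parse import urlsplit

# User-agent string copied from the log entry in the first post.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_status(url, timeout=10):
    """Request url once and return (status_code, location_header).

    Redirects are NOT followed, so a 301 is reported as 301 together
    with the Location header it points to (None when there is none).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild path plus query string, e.g. /real_estate_article.asp?gi=1013&ai=432
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": GOOGLEBOT_UA})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def status_line(status):
    """Format the status the same way as the last line of the log entry."""
    return f"Status Reported: {status}"
```

Calling fetch_status on the "Page Requested" URL and appending status_line(status) to the log entry would fill in the line marked "not available yet". A 404 here, following a 404 already logged from 404.php, matches the double confirmation described in the last post.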