The Cache: Technology Expert's Forum
 
Topic: curl multi apache crash when follow redirection.
nattsurfaren
« on: April 04, 2009, 12:19:18 PM »


I'm using WAMP on Windows. When I parse all the links on:
http://www.kaboodle.com
Apache crashes on this link:
http://www.kaboodle.com/my/publicpages
Turning off redirect following avoids the crash, but I don't know why it fails on this specific URL.



I have this function:

Code:

function MultiGetCurl($urls = array())
{
    $agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; sv-SE; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16';

    $mh = curl_multi_init();
    $conn = array();
    foreach ($urls as $i => $url)
    {
        // curl_init($url) already sets CURLOPT_URL, so it is not set again.
        $conn[$i] = curl_init($url);

        curl_setopt($conn[$i], CURLOPT_REFERER, $url);
        curl_setopt($conn[$i], CURLOPT_USERAGENT, $agent);
        curl_setopt($conn[$i], CURLOPT_RETURNTRANSFER, true);
        //curl_setopt($conn[$i], CURLOPT_FOLLOWLOCATION, true); // This will crash Apache.
        curl_setopt($conn[$i], CURLOPT_FOLLOWLOCATION, false);
        curl_setopt($conn[$i], CURLOPT_CONNECTTIMEOUT, 20);
        curl_setopt($conn[$i], CURLOPT_TIMEOUT, 40);

        curl_multi_add_handle($mh, $conn[$i]);
    }

    // Start the transfers.
    $active = 0;
    do
    {
        $mrc = curl_multi_exec($mh, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);

    while ($active and $mrc == CURLM_OK)
    {
        // Wait for network activity. curl_multi_select() can return -1
        // without waiting; the old code then never called curl_multi_exec()
        // again and spun forever. Sleep briefly and carry on instead.
        if (curl_multi_select($mh) == -1)
        {
            usleep(100000);
        }

        // Pull in any new data, or at least handle timeouts.
        do
        {
            $mrc = curl_multi_exec($mh, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
    }

    if ($mrc != CURLM_OK)
    {
        print "Curl multi read error $mrc\n";
    }

    // Collect the results, keyed by URL.
    $res = array();
    foreach ($urls as $i => $url)
    {
        if (($err = curl_error($conn[$i])) == '')
        {
            $res[$url] = curl_multi_getcontent($conn[$i]);
        }
        else
        {
            print "Curl error on handle $i: $err\n";
        }
        curl_multi_remove_handle($mh, $conn[$i]);
        curl_close($conn[$i]);
    }

    curl_multi_close($mh); // The crash will occur here.
    return $res;
}




Following redirects crashes my Apache. As a workaround, set CURLOPT_FOLLOWLOCATION to false:
curl_setopt($conn[$i], CURLOPT_FOLLOWLOCATION, false);

With that turned off, redirects have to be followed by custom code.

/Natt
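
One way to write that custom redirection code is to leave CURLOPT_FOLLOWLOCATION off, ask curl to include the response headers, and chase the Location header by hand with a hop limit. This is only a rough sketch: the helper names (extractLocation, resolveRedirect, fetchFollowingRedirects) are made up for illustration, it uses a single easy handle rather than the multi handle, and relative Location values are resolved in a simplified way.

```php
<?php
// Hypothetical helpers for following redirects by hand.

/** Return the value of the Location header in a raw header block, or null. */
function extractLocation(string $rawHeaders): ?string
{
    if (preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

/**
 * Turn a Location value into an absolute URL, relative to $baseUrl.
 * Handles absolute URLs and root-relative paths ("/foo"); anything else is
 * resolved against the directory of the base URL (a simplification).
 */
function resolveRedirect(string $baseUrl, string $location): string
{
    if (preg_match('#^https?://#i', $location)) {
        return $location;
    }
    $p = parse_url($baseUrl);
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if ($location !== '' && $location[0] === '/') {
        return $origin . $location;
    }
    $dir = isset($p['path']) ? preg_replace('#/[^/]*$#', '/', $p['path']) : '/';
    return $origin . $dir . $location;
}

/** Fetch one URL with FOLLOWLOCATION off, following at most $maxHops redirects. */
function fetchFollowingRedirects(string $url, int $maxHops = 5): ?string
{
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, true);           // headers come back in front of the body
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
        curl_setopt($ch, CURLOPT_TIMEOUT, 40);
        $response = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
        curl_close($ch);
        if ($response === false) {
            return null;                                  // transfer failed
        }
        $location = extractLocation(substr($response, 0, $headerSize));
        if ($status >= 300 && $status < 400 && $location !== null) {
            $url = resolveRedirect($url, $location);      // go round again with the new URL
            continue;
        }
        return (string) substr($response, $headerSize);   // body only
    }
    return null;                                          // too many redirects
}
```

With the multi version the idea would be the same: when a handle finishes with a 3xx status, resolve the new URL and add a fresh handle for it instead of looping.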



nattsurfaren
« Reply #1 on: April 04, 2009, 02:11:17 PM »

Scratch that, now it also crashes when I scan another site. Does anybody see a problem with the function?

perkiset
« Reply #2 on: April 04, 2009, 04:05:40 PM »

Crashes Apache?

I'm not a cURL guy, but that sounds like something deeper than your script here. I notice that you are using curl_multi, and again, although I'm no cURL guy I know that multithreading under a process under Apache is (for the most part) a no-no. Does it behave differently when you execute it from a command line?

nattsurfaren
« Reply #3 on: April 05, 2009, 02:32:11 AM »

Quote from perkiset:
"Crashes Apache? I'm not a cURL guy, but that sounds like something deeper than your script here. I notice that you are using curl_multi, and again, although I'm no cURL guy I know that multithreading under a process under Apache is (for the most part) a no-no. Does it behave differently when you execute it from a command line?"

Yes that could work. I have to rewrite the code to make it work from the command line. 
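
A rough sketch of what that rewrite could look like: check PHP_SAPI and take the URLs from the command-line arguments instead of the request, so the same function can be tried both under Apache and from a shell. The helper name urlsFromInput and the "urls" query parameter are assumptions for illustration; MultiGetCurl is the function from the first post.

```php
<?php
/**
 * Hypothetical helper: pick the URL list from CLI arguments or a query string.
 * $sapi is passed in (rather than read from PHP_SAPI directly) so it can be tested.
 */
function urlsFromInput(string $sapi, array $argv, array $query): array
{
    if ($sapi === 'cli') {
        // $argv[0] is the script name; the rest are treated as URLs.
        $candidates = array_slice($argv, 1);
    } else {
        // Assumed convention: ?urls=http://a/,http://b/
        $candidates = isset($query['urls']) ? explode(',', $query['urls']) : array();
    }
    $urls = array();
    foreach ($candidates as $candidate) {
        $candidate = trim($candidate);
        if (filter_var($candidate, FILTER_VALIDATE_URL) !== false) {
            $urls[] = $candidate;   // keep only things that look like URLs
        }
    }
    return $urls;
}

// Only runs when MultiGetCurl() from the first post has been loaded.
if (function_exists('MultiGetCurl')) {
    $args = isset($_SERVER['argv']) ? $_SERVER['argv'] : array();
    $pages = MultiGetCurl(urlsFromInput(PHP_SAPI, $args, $_GET));
    foreach ($pages as $url => $html) {
        echo $url, ': ', strlen($html), " bytes\n";
    }
}
```

Saved next to the function as, say, scan.php (a made-up name), it could be run as: php scan.php http://www.kaboodle.com/my/publicpages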

nop_90
« Reply #4 on: May 05, 2009, 05:56:24 PM »

I am not an Apache/PHP guy, but I have used curl.
The multi interface is not multi-threaded, so this is not a threading problem.
But multi is very hard to use, and it makes the logic of your code very complex.
(The only time it is easy to use is if you use continuations :) which PHP does not have.)

Piece of advice.
There are two viable options for multitasking. Either use select (which multi curl does behind the scenes), or
do what perks calls "poor man's threading", or learn how to use fork.
Never use threads (unless you are writing a 3D game or something where you need nanosecond access to the video card).

In the past the select option was the best (and it still might be on single-CPU machines).
Processor power is cheap. Multiple processes will spread the work across multiple CPUs.
Each process is isolated, so when one process crashes it will not bring down the entire mess.
Use some sort of database etc. to share state between processes.
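
The fork route could be sketched in PHP roughly as below. Assumptions to be clear about: it needs the pcntl extension, which only exists for the CLI on Unix-like systems (so not on WAMP or inside Apache); the function names are invented; and each child writes its results to a file, which stands in for the "some sort of database" used to share state.

```php
<?php
/** Split $urls into at most $workers roughly equal batches. */
function partitionUrls(array $urls, int $workers): array
{
    if (count($urls) === 0 || $workers < 1) {
        return array();
    }
    return array_chunk($urls, (int) ceil(count($urls) / $workers));
}

/**
 * Run $fetch over each batch in its own child process.
 * Each child serializes its results to its own file in $dir.
 */
function fetchInProcesses(array $urls, int $workers, callable $fetch, string $dir): array
{
    if (!function_exists('pcntl_fork')) {
        throw new RuntimeException('pcntl is not available (e.g. on Windows or under Apache)');
    }
    $pids = array();
    foreach (partitionUrls($urls, $workers) as $n => $batch) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // Child: a crash in here cannot take the parent down with it.
            $out = array();
            foreach ($batch as $url) {
                $out[$url] = $fetch($url);
            }
            file_put_contents("$dir/batch-$n.ser", serialize($out));
            exit(0);
        }
        $pids[$n] = $pid;   // Parent: remember the child and keep forking.
    }
    $results = array();
    foreach ($pids as $n => $pid) {
        pcntl_waitpid($pid, $status);   // wait for this child to finish
        $file = "$dir/batch-$n.ser";
        if (is_file($file)) {
            $batchResults = unserialize(file_get_contents($file));
            if (is_array($batchResults)) {
                $results += $batchResults;
            }
            unlink($file);
        }
    }
    return $results;
}
```

$fetch can be any callable that takes a URL and returns the page, for example a thin wrapper around a single curl handle.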






vsloathe
« Reply #5 on: May 06, 2009, 11:13:37 AM »

curl_multi is easy to use, but I have no idea why you're doing some of the things you're doing in that code.

nop_90
« Reply #6 on: May 06, 2009, 04:02:04 PM »

curl multi is not "hard" to use, but it requires a different approach.
Basically it is event-driven programming. You have to set up something very similar to Twisted in Python.
vsloathe is right, your code is a mess.
99% of good coding is not knowing what some stupid function does but "organization".

In this case you have to set up your own state machine,
as in each operation on the curl multi is in its own method.
The simplest way in PHP/Python is to set up a class (the language really does not matter, you could do the same shit in C++).
This is just pseudocode,
so a Google SERP scraper would look like this:

class MyScraper:

    def __init__(self):
        # Do your initialization here: set up your curl handle, etc.
        pass

    def state_start(self):
        # Set up curl to open the first page of Google, "http://www.google.com",
        # then add the curl handle to the multi.
        pass

    def state_start_search(self):
        # Fill in the search form in curl with whatever you want,
        # then add the curl handle to the multi.
        pass

    def state_next_search(self):
        # Scrape the results,
        # tell curl to click on the "next" button,
        # then add the curl handle to the multi.
        pass

    def state_done(self):
        # state_next_search calls this when it is done.
        pass

So now you have a class which represents a curl operation.

If you are smart, you will notice that at the end of each state you either add the curl handle to the multi (multi mode) or just call curl_easy_perform (curl_exec in PHP) in easy mode,
so with a proper parent class you could make the same code usable from either multi or easy ;)

For multi you make another class.
It basically runs continuously: it loops over the multi, and when a curl handle is ready it calls the proper method on the object that owns it, according to that object's current state.

I'm too lazy to find a reference, but learn how to make a state machine.
Ages ago I had a really good book on this in C++ (the language does not matter).

There are tons of books that show you how to use some stupid function,
and very few that tell you how to set shit up when you code.
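
The outline above might translate to PHP along these lines. Treat it as a sketch under assumptions, not a finished scraper: the class and function names (ScrapeJob, PagedLinkJob, runJobs) are invented, the example walks a generic paginated listing rather than Google, and the HTML patterns it looks for (class="result", rel="next") are placeholders.

```php
<?php
/**
 * Hypothetical base class: one job is one state machine.
 * Each state_* method receives the body of the last response (null at the
 * start) and returns the next URL to fetch, or null when nothing is left.
 */
abstract class ScrapeJob
{
    protected $state = 'start';

    public function isDone(): bool
    {
        return $this->state === 'done';
    }

    /** Dispatch to the method for the current state. */
    public function advance(?string $body): ?string
    {
        $method = 'state_' . $this->state;
        return $this->$method($body);
    }
}

/** Example job: walk a paginated listing, collecting links from each page. */
class PagedLinkJob extends ScrapeJob
{
    public $links = array();
    private $firstUrl;

    public function __construct(string $firstUrl)
    {
        $this->firstUrl = $firstUrl;
    }

    public function state_start(?string $body): ?string
    {
        $this->state = 'next_page';
        return $this->firstUrl;
    }

    public function state_next_page(?string $body): ?string
    {
        preg_match_all('/<a class="result" href="([^"]+)"/', (string) $body, $m);
        $this->links = array_merge($this->links, $m[1]);
        if (preg_match('/<a rel="next" href="([^"]+)"/', (string) $body, $next)) {
            return $next[1];   // stay in this state for the next page
        }
        $this->state = 'done';
        return null;
    }
}

/** Driver: owns the multi handle and feeds each finished transfer to its job. */
function runJobs(array $jobs): void
{
    $mh = curl_multi_init();
    $owners = array();   // pairs of (easy handle, job) still in flight
    $queue = function (ScrapeJob $job, ?string $body) use ($mh, &$owners) {
        $url = $job->advance($body);
        if ($url === null) {
            return;      // job has nothing more to fetch
        }
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $owners[] = array($ch, $job);
    };
    foreach ($jobs as $job) {
        $queue($job, null);
    }
    do {
        curl_multi_exec($mh, $active);
        while ($info = curl_multi_info_read($mh)) {
            foreach ($owners as $k => list($ch, $job)) {
                if ($ch === $info['handle']) {
                    $body = curl_multi_getcontent($ch);
                    curl_multi_remove_handle($mh, $ch);
                    unset($owners[$k]);
                    $queue($job, $body);   // run the job's next state
                    break;
                }
            }
        }
        if ($active && curl_multi_select($mh, 1.0) === -1) {
            usleep(100000);   // select failed; avoid spinning
        }
    } while ($active || count($owners) > 0);
    curl_multi_close($mh);
}
```

The job classes never touch the multi handle themselves, so the same PagedLinkJob could also be driven in easy mode by a plain loop that calls advance() and curl_exec() in turn.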








vsloathe
« Reply #7 on: May 07, 2009, 06:53:15 AM »

I do the same thing in my own parallel processing endeavors; I create a state machine for forking.

perkiset
« Reply #8 on: May 07, 2009, 01:46:37 PM »

ROFLMAO Yeah man. But can the n00bs follow you?

vsloathe
« Reply #9 on: May 07, 2009, 01:53:54 PM »

Apparently not, since I open the source of everything I write and I have yet to see any takers edging in on my cheddar.

klaus
« Reply #10 on: May 10, 2009, 01:48:34 PM »

Check here:

http://www.somacon.com/p537.php

nop_90
« Reply #11 on: May 10, 2009, 04:40:51 PM »

Quote from perkiset:
"ROFLMAO Yeah man. But can the n00bs follow you?"

I can't follow whatever he wrote.
The output of a disassembler makes more sense.

Noobs: get a copy of the Gang of Four book (Design Patterns).