gnarlyhat
As many of you guys know, I just bought a new PHP brain and have been going through the manual, and it's time to put it to good use. I'm gonna build a little command center for myself: figure out each piece of the puzzle, then tie them all together to complete the project. I'm no programmer, so the terms I use may be alien to you. Please feel free to correct me if you spot anything that should be corrected.
Here's what this module is going to do:
1. Scrape new keywords from Wordze's API, clean the results and save them as txt files in a keywords directory.
2. Read txt files from the keywords directory. I can FTP existing keyword lists into the directory.
3. Create a category system to keep keyword lists tidy. What's the best way to do it?
4. Display the keyword lists in a ListBox and show the selected lists in a text box like below. http://www.imagebam.com/image/e02009663789
Now here's where I bug you guys for advice:
1. Is there anything I should be aware of first, considering my development platform is going to be WAMP? Any advice to make sure my code runs nicely on my server once I upload it?
2. I'm kinda done with the Wordze scraper except for the cleaning part. For it to be usable in my content generator, it can only be alphanumeric. Is this good enough to do the job?
$keywords = ereg_replace("[^A-Za-z0-9[:space:]]", "", $keywords);
I ripped this from SEO Book's keyword stripper.
3. I've figured out step 2, so the question for step 3 is: what is the best way to categorise my keywords? Put them in separate dirs? I have different sets of keyword lists for, say, Adult, MFA, Pharma, blah blah blah.
4. Is it possible to do this from PHP itself as I select each list, or do I have to rely on JavaScript?
Thanks so much for your help in advance
thedarkness
quote author=gnarlyhat link=topic=598.msg3988#msg3988 date=1193903245
4. Is it possible to do this function from PHP itself as I select each list or I have to rely on javascript ?
http://php.net/preg_replace Cheers, td
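For the keyword cleaning question, here is a minimal sketch of the same filter written with preg_replace, which is what the link above points to. ereg_replace was deprecated in PHP 5.3 and removed in PHP 7, so the POSIX class [:space:] becomes \s. The function name cleanKeyword and the whitespace-collapsing step are assumptions added for illustration; they are not part of the SEO Book script.

```php
<?php
// Hypothetical helper: keep only letters, digits and whitespace.
function cleanKeyword($keyword)
{
    // Drop every character that is not A-Z, a-z, 0-9 or whitespace.
    $clean = preg_replace('/[^A-Za-z0-9\s]/', '', $keyword);
    // Assumed extra step: collapse the gaps left behind by removed characters.
    $clean = preg_replace('/\s+/', ' ', $clean);
    return trim($clean);
}

echo cleanKeyword("credit-repair: tips & tricks!"), "\n";
```

Note that the hyphen is stripped too, so "credit-repair" comes out as "creditrepair". If hyphens should become spaces instead, replace them before running the filter.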
taky
like i told you in IM man, i would really recommend just starting off with a database for this kind of project. it will make things oodles easier to manage once you get past the syntax. it really isn't that difficult, and it's def worth it to learn how to interact with MySQL, especially for blackhat. def invaluable.
freedom1972
I am going to agree with Taky. Dive into MySQL with this one. It may up the learning curve, but it will definitely up your development skill and your ideas for projects. Plus, the DB will become a tool that you use for many different types of applications as your programming creativity kicks off.
gnarlyhat
MySQL .... hmmmm. Looks like I have to rethink and recode what I've done. Would it slow down everything when I have lists near 10K words? I've been thinking too that it's a good idea, so I can assign a key to each list to put them into categories. Can anyone point me to a good hands-on MySQL tut? TIA
quote author=thedarkness link=topic=598.msg3989#msg3989 date=1193910664
http://php.net/preg_replace Cheers, td
td: Yes, I understand preg_replace, but is it possible to control HTML form values? What I meant is that initially, if no item is selected, the right textbox will be empty. When I select "credit-repair.txt", only credit-repair will appear in the textbox. I will use this to copy to another program for my subdomain creation.
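One way to do the list box and text box in plain PHP, as a rough sketch: the page posts back to itself when a list is chosen, and PHP fills the text box from the selected file. JavaScript would only be needed to avoid the page reload. The directory name keywords, the field name list and all three function names are assumptions for illustration.

```php
<?php
// Hypothetical helper: names of the .txt keyword files in a directory.
function listKeywordFiles($dir)
{
    $files = glob(rtrim($dir, '/') . '/*.txt');
    if ($files === false) {
        return array();
    }
    $names = array_map('basename', $files);
    sort($names);
    return $names;
}

// Hypothetical helper: contents of the selected list, or '' if nothing valid is selected.
function loadSelectedList($dir, $selected)
{
    // Only accept names that really exist in the directory, so a crafted
    // value such as "../../etc/passwd" is never opened.
    if (!in_array($selected, listKeywordFiles($dir), true)) {
        return '';
    }
    return file_get_contents(rtrim($dir, '/') . '/' . $selected);
}

// Hypothetical helper: render the list box and the text box as one form.
function renderListForm($dir, $selected)
{
    $html = '<form method="post"><select name="list" size="10">';
    foreach (listKeywordFiles($dir) as $name) {
        $attr = ($name === $selected) ? ' selected="selected"' : '';
        $html .= '<option' . $attr . '>' . htmlspecialchars($name) . '</option>';
    }
    $html .= '</select><input type="submit" value="Show" />';
    $html .= '<textarea cols="40" rows="20">'
           . htmlspecialchars(loadSelectedList($dir, $selected))
           . '</textarea></form>';
    return $html;
}

// Usage in a page: nothing is selected on first load, so the text box starts empty.
// echo renderListForm('keywords', isset($_POST['list']) ? $_POST['list'] : '');
```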
freedom1972
Will the MySQL queries slow your code down? That depends. If you are using them excessively, then hell yeah! This is really a question of your code relative to your server environment. As a rule of thumb, I try to make as few queries as necessary and load the results into arrays that I can process. PHP is a great tool when it comes to array and text manipulation. The balance between DB and speed is relative. Personally, I normally go through a multi-stage coding process:
1) prototyping / scripting to get the initial results
2) rethinking the design (OO or functional, how can I generalize the classes/functions)
3) code optimization (could include writing an extension in C/C++ if necessary, etc.)
That being said, I think you can learn A LOT in stages 1 and 2. Normally, I seek outside help with #3 as it is not my top skill.
perkiset
From the lowest-level standpoint, disk/file access is almost 10x as slow as database access. This is because the directory lookup, file open, stream and close steps all take time. With MySQL, for example, the files are open, indexed and ready to go: you talk to them via TCP/IP, which is hugely fast, the lookups are optimized (because that's what a database does) and the return is precisely what you are looking for.
I'd turn the argument around, GG... if you'll NEVER have more than, say, 5 items in a list, a file is fine. In fact, I'd probably make it a file and cache it for speed. But if you're going to have 10K items, you are just starting to get to where a database makes the most sense of all.
This will be a bit of a learning curve, as Freedom mentions, but as taky notes indirectly, the benefit of you learning this tool will FAR FAR FAR outstrip the time investment to get you there. Most folks on this board are LAMP, so there are lots of resources here to get you going. Since you're on WAMP, you can get pre-compiled executables for MySQL. Then you can either download the native tools from the MySQL site (they are OK) or you can download phpMyAdmin and go that route, which is what I recommend, because the exercise of getting a new PHP site working against a MySQL database will in itself be invaluable experience. Additionally, phpMyAdmin (not the best of the best when it comes to tools, but certainly adequate) is written in PHP (cleverly, they added that to the name) so it will work regardless of your platform of choice in the future.
Go man! Don't stop now, you're on a roll... you're really only doing WAP not WAMP yet... add the M!
nutballs
quote author=gnarlyhat link=topic=598.msg3993#msg3993 date=1193930939 Would it slow down everything when I have lists near 10K words? I've been thinking too it's a good idea so I can assign a key to the list to put them into categories.
no, it will speed things up. plus it will increase your flexibility for future features. there is almost no circumstance under which a database would not be justified when you are talking about scaling projects. Sure, a contact form for a website that gets 1 form submission a week doesn't need a database, but the minute you start mentioning 10k of something, you should go database. Flat file / directory methods are good for portability, and technically can run anywhere, but they are slow. that being said, if you have never done database work in any language, and are learning php, which i would assume is your first language, then you have two choices: slow down and learn it all, which will result in many rewrites as you move forward, or just continue the way you have, get it solid, until you run into a problem where you NEED a database, then recode at that point.
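To make the database route concrete, here is a rough sketch of one table that holds every keyword together with its category and list name, so categories become a column rather than a directory. It uses PDO with prepared statements; the table layout, column names, function names and the connection details in the usage comment are all assumptions, not something from the posts above.

```php
<?php
// Assumed schema: one row per keyword, tagged with a category and a list name.
function createKeywordTable(PDO $db)
{
    $db->exec('CREATE TABLE IF NOT EXISTS keywords (
        category  VARCHAR(64)  NOT NULL,
        list_name VARCHAR(128) NOT NULL,
        keyword   VARCHAR(255) NOT NULL
    )');
}

// Insert a whole list inside one transaction, which is much faster than
// committing each row separately.
function saveKeywords(PDO $db, $category, $listName, array $keywords)
{
    $stmt = $db->prepare(
        'INSERT INTO keywords (category, list_name, keyword) VALUES (?, ?, ?)'
    );
    $db->beginTransaction();
    foreach ($keywords as $keyword) {
        $stmt->execute(array($category, $listName, $keyword));
    }
    $db->commit();
}

// One query loads the whole list into a PHP array for further processing.
function loadKeywords(PDO $db, $category, $listName)
{
    $stmt = $db->prepare(
        'SELECT keyword FROM keywords
         WHERE category = ? AND list_name = ?
         ORDER BY keyword'
    );
    $stmt->execute(array($category, $listName));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Usage (hypothetical database name and credentials):
// $db = new PDO('mysql:host=localhost;dbname=commandcenter', 'user', 'pass');
// $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// createKeywordTable($db);
// saveKeywords($db, 'Pharma', 'credit-repair', array('credit repair', 'fix credit'));
// print_r(loadKeywords($db, 'Pharma', 'credit-repair'));
```

Once the lists grow, an index on (category, list_name) is probably worth adding so lookups do not scan the whole table.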
nutballs
LOL perk and I share a brain apparently.
perkiset
perkiset ... nutballs ... separated at birth? Enquiring minds want to know! (If this is the case, then I must also be the smartest human on earth, and you must have an awe-inspiring shwanstooker.)
gnarlyhat
Database then I should go. Since I'm learning from scratch, might as well learn it right. Got any good MySQL tutorials to point me to? Ones with real examples instead of long-ass winded syntax yadda yadda, which would make me quit learning after 30 minutes.
Also would appreciate recommendations on IDEs to use. I am currently using Notepad++ and switching to my browser pointing to local WAMP. Just before this post, I tried Dreamweaver with FTP set up to connect directly to my Linux server. Have yet to fully test it out. An IDE with autocomplete would definitely help out a lot here.
Here's a problem I have now. Code that used to run fine on my Linux server running PHP 4.xx is not running on my WAMP, which is on PHP 5. I was under the impression that older-version code should run on a higher-version platform. But obviously I'm wrong. Would you guys suggest I code right on the server itself rather than on my notebook with WAMP? What are the pitfalls I should avoid in order to make sure my code works on both 4 and 5?
From my questions, you all must know I'm a complete noob, so please bear with me. Thanks so much for your help
nutballs
assuming you're on a PC, i use ultraedit, actually UEstudio. it has ftp and localcopy capabilities. i have looked at a bunch of IDEs and decided that if I have to read a manual to figure out how to use the software to write code that I already know how to write.... i ain't gonna use it. Aptana was one of those, as was eclipse. (though they are more for java technically).
perkiset
I just posted this list for somebody else a couple days ago: http://www.perkiset.org/forum/database_discussion_mysql_oracle_and_such/help_out_a_n00b_trying_to_learn_sql-t585.0.html;msg3906#msg3906... that'll get you started.
Regarding code that worked in 4 but not in 5 - that is exceedingly rare. In fact, I have never personally experienced it. There are very few things that changed such that code is not backward compatible. It is more likely that you have different modules compiled into the PHP instance on the WAMP box than on the LAMP box. PHP is compiled with a series of switches and dependencies based on what you want to do with it... you can create an incredibly light and tight instance or an overly bloated huge memory hog of an instance if that's your thing. The way to check what the differences are is to do a phpinfo() on both boxes and print what you get. This screen will tell you all the modules, the compile string etc. that were used to make up <that> instance of PHP. I'm about 98% certain that this is your issue... or that there's a version difference between instances. /p
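phpinfo() prints a lot, so as a rough sketch, a shorter report that is easier to compare between two boxes could look like this. The function name and the choice of settings are assumptions. short_open_tag is included because a file that opens with <? rather than <?php is sent to the browser as plain text when that setting is off; whether that applies to any particular script here is only a guess.

```php
<?php
// Hypothetical helper: collect settings that commonly differ between two PHP installs.
function describePhpEnvironment()
{
    $extensions = get_loaded_extensions();
    sort($extensions);
    return array(
        'version'        => phpversion(),
        'sapi'           => php_sapi_name(),
        'short_open_tag' => ini_get('short_open_tag'),
        'include_path'   => get_include_path(),
        'extensions'     => $extensions,
    );
}

// Run this on both boxes and compare the two outputs line by line.
print_r(describePhpEnvironment());
```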
gnarlyhat
Ok ... here's the problem. I don't know why it's dumping out the source. I have an include("WordzeApiClass.php"); in the wordzescraper.php file, and when I run it, it dumps out the source of WordzeApiClass.php and this error below:
Fatal error: Class 'WordzeApi' not found in C:\wamp\www\mi\wordzescraper.php on line 28
Here is line 28 of wordzescraper.php:
$tmp = new WordzeApi('myapicode', 1, 100, $SearchStyle, $CharLen, $CountLim, $FilterType);
perkiset
Change the include() to require() or require_once() and then it should fatally bomb before that line. Include is rather a suggestion, and if PHP can't find the file it will go on anyway. Since the class cannot be found, I am about 99.9% sure that the include was not found. Couple options:
Add error_reporting(E_ALL) to the front of your code. It will dump even minor notifications, and this will tell you a lot.
Never use include() unless you really don't care if PHP finds the file.
I personally never use relative files either: in the beginning of my routines I'll create a couple vars with a line like this:
$rootPath = $GLOBALS['rootPath'] = '/www/sites/aDir';
(Note here that you can actually assign two variables at once, which I have done: a global version of this string and a local version.) Then later when I need something I'll do this:
require("$rootPath/theFile.php");
or
require("{$GLOBALS['rootPath']}/theFile.php");
That'll fix you right up. /p
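As a small sketch of that advice: build the path from the directory of the running script instead of hard-coding it, and stop with a clear message if the file is not there. The helper name requireFrom is an assumption; dirname(__FILE__) works on both PHP 4 and PHP 5, while the exception class needs PHP 5.1 or later.

```php
<?php
// Hypothetical helper: require a file from a known directory, failing loudly if it is missing.
function requireFrom($baseDir, $file)
{
    // Trim a trailing slash or backslash so both Linux and Windows paths work.
    $path = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $file;
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read required file: $path");
    }
    require_once $path;
    return $path;
}

error_reporting(E_ALL);          // surface notices and warnings while developing
ini_set('display_errors', '1');  // assumption: fine on a dev box, not on a live site

// Usage: dirname(__FILE__) is the directory of the current script.
// requireFrom(dirname(__FILE__), 'WordzeApiClass.php');
```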
gnarlyhat
Thanks perk .. I will try it IF I get some time off from my weekend work. Entertaining the Mrs and my son. Sounds like what you said ... but what's weird is that it's on the same dir and function in another file calling a file from a subdir works just fine. thanks again
gnarlyhat
perk: I went ahead to check it out and it produced the same results. I'm on WAMP BTW. Here's what I put in. The script works fine on the Linux server.
$rootPath = $GLOBALS['rootPath'] = 'c:\wamp\www\mi';
require("{$GLOBALS['rootPath']}\WordzeApiClass.php");
gnarlyhat
I'm just gonna ignore this error cos I'm now working on the Linux server and all is well regarding this issue.
I'm facing one issue with my scraper. I don't know if it's my server's problem or Wordze's problem. If I enter one seed keyword, it works out fine. When I enter more than one seed in the form, it seems to time out or something. I notice also that unpopular seeds seem to work out fine, so this must be a network issue. Take for example now, I enter these seeds below.
Wrongful death
Legal Advice
Taxes
On my FTP side, I only see Wrongful-death.txt created with keywords inside and Legal-Advice.txt with 0 filesize. This means that it's done with wrongful-death and is processing legal-advice, but in my browser it spits out only half of the wrongful-death keywords. On the Wordze site, I refresh to check my remaining API calls and the count stays the same, meaning the script is not calling OR Wordze is blocking, possibly due to too-fast API calls? I tried to add a "sleep(10);" in wordzescraper.php but the script just stopped working ... maybe I put it in the wrong place. I've commented it out. Would appreciate some help if possible. TIA
Here's my simple form
<html><body>
<h4>Wordze API Scraper</h4>
<form action="wordzescraper.php" method="post">
Filter Type:<br/>
<select name="FilterType">
<option>None</option>
<option>Adult</option>
<option>Drugs</option>
<option>Gambling</option>
<option>Hacking</option>
<option selected="selected">All</option>
</select><br/><br/>
Search Style:<br/>
<select name="SearchStyle">
<option>exact</option>
<option>any</option>
<option selected="selected">broad</option>
</select><br/><br/>
Enter Keywords Below:<br/>
<textarea name="keywords" cols="40" rows="20" wrap="hard"></textarea><br/>
<input type="submit" />
</form>
</body></html>
Here's wordzescraper.php
<?php
require("WordzeApiClass.php");
set_time_limit(0);

$FilterType = $_POST['FilterType'];
$SearchStyle = $_POST['SearchStyle'];
$seeds = explode("\n", $_POST['keywords']);
$num = count($seeds);
print "Total seeds: ".count($seeds)."<p />";

// Translate the filter label from the form into its numeric code.
// == compares; a single = would assign and make every branch run.
if ($FilterType == "None") { $FilterType = 1; }
elseif ($FilterType == "Adult") { $FilterType = 2; }
elseif ($FilterType == "Drugs") { $FilterType = 3; }
elseif ($FilterType == "Gambling") { $FilterType = 4; }
elseif ($FilterType == "Hacking") { $FilterType = 5; }
elseif ($FilterType == "All") { $FilterType = 6; }

for ($i = 0; $i < $num; $i++) {
    // Lines from a textarea arrive with a trailing "\r", so trim each seed first.
    $seed = trim($seeds[$i]);
    $fh = fopen("keywords/".str_replace(" ", "-", $seed).".txt", 'w') or die("can't open file");
    $tmp = new WordzeApi('6d974ba2d7e5a0_notmyapikey_982734', $FilterType, 100, $SearchStyle);
    $tmp->getResults(1, $seed);
    while ($tmp->CurrentPage < $tmp->TotalPages) {
        $tmp->getNextPage(1);
    }
    foreach ($tmp->showResults() as $kwd => $sdat) {
        // Keep only letters, digits and whitespace.
        $kwd = preg_replace("/[^0-9a-zA-Z\s]/", '', $kwd);
        print $kwd."<br />";
        $stringData = $kwd;
        fwrite($fh, $stringData."\n");
    }
    fclose($fh);
    // print "Pausing 10 seconds before going to the next seed.";
    // sleep(10);
}
?>
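A chain of ifs like the one in the script is easy to get wrong, because writing = where == is meant assigns instead of comparing, and then every branch runs. A lookup array sidesteps that. This is only a sketch: the function name is an assumption, and the codes 1 to 6 are taken from the script, not from any Wordze documentation.

```php
<?php
// Hypothetical helper: translate the form's filter label into the numeric code
// the script passes on (1..6, as in the if chain).
function filterTypeCode($label)
{
    $codes = array(
        'None'     => 1,
        'Adult'    => 2,
        'Drugs'    => 3,
        'Gambling' => 4,
        'Hacking'  => 5,
        'All'      => 6,
    );
    // Fall back to "All" (the form's default selection) for unknown input.
    return isset($codes[$label]) ? $codes[$label] : 6;
}

// Usage: $FilterType = filterTypeCode($_POST['FilterType']);
```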
gnarlyhat
Just what I predicted. <results version="1.0.1"> <Error>Query speed too fast</Error> </results> Can someone help me and tell me where to put my sleep line?
thedarkness
while($tmp->CurrentPage < $tmp->TotalPages) { sleep(10); $tmp->getNextPage(1); } Just a guess. Cheers, td
gnarlyhat
Thanks TD. I had to play around with the sleep value though ... too fast and I get the same results. Too slow it stops grabbing keywords .. yeah weird.
gnarlyhat
I'm now trying to separate my keywords into folders and wonder if fopen is able to create folders automatically using this command. I'm getting an error though.
can't open file
$Folder = $_POST['Folder'];
$fh = fopen("keywords/".$Folder."/".trim(str_replace(" ", "-",$seeds[$i])).".txt", 'w') or die("can't open file");
Do I have to create the directory myself first? Permissions are 777. TIA
deregular
I wouldn't think that would work, as you're trying to access a directory that isn't there. I know fopen('','w') will create a file that isn't there, but not a directory. You'll have to use mkdir(), I guess. Untested of course.
$Folder = $_POST['Folder'];
// Create the target directory only if it does not exist yet.
if (!is_dir("keywords/{$Folder}")) {
    mkdir("keywords/{$Folder}");
}
$fh = fopen("keywords/".$Folder."/".trim(str_replace(" ", "-",$seeds[$i])).".txt", 'w') or die("can't open file");
perkiset
quote author=deregular link=topic=598.msg4144#msg4144 date=1194660358 I wouldnt think that would work, as you're trying to access a directory that isnt there. I know fopen('','w') will create a file that isnt there, but not a directory. You'll have to use mkdir() i guess.
On the dot. Dirs have to be there and you have to have the right access to make that work.
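Since the folder name comes straight from the form, a slightly more defensive sketch of the same idea cleans the name before creating anything. The function name and the allowed character set are assumptions; the third argument to mkdir (recursive) is available from PHP 5.

```php
<?php
// Hypothetical helper: make sure <baseDir>/<folder> exists and return its path.
function ensureKeywordDir($baseDir, $folder)
{
    // Keep only letters, digits, dash and underscore so a posted value
    // cannot climb out of the base directory (e.g. "../../etc").
    $safe = preg_replace('/[^A-Za-z0-9_-]/', '', $folder);
    if ($safe === '') {
        throw new InvalidArgumentException('Folder name is empty after cleaning');
    }
    $path = rtrim($baseDir, '/\\') . '/' . $safe;
    // The recursive flag also creates $baseDir itself if it is missing.
    if (!is_dir($path) && !mkdir($path, 0777, true)) {
        throw new RuntimeException("Could not create directory: $path");
    }
    return $path;
}

// Usage:
// $dir = ensureKeywordDir('keywords', $_POST['Folder']);
// $fh  = fopen($dir . '/' . $fileName . '.txt', 'w');
```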
Indica
quote author=gnarlyhat link=topic=598.msg4012#msg4012 date=1193972204 An IDE with autocomplete would definitely help out a lot here.
i recommend komodo. it works for a good amount of languages plus it has a regex tool built in which sold me. it's available for all 3 platforms also.
gnarlyhat
Thanks Thanks Thanks
gnarlyhat
quote author=deregular link=topic=598.msg4144#msg4144 date=1194660358
I wouldn't think that would work, as you're trying to access a directory that isn't there. I know fopen('','w') will create a file that isn't there, but not a directory. You'll have to use mkdir(), I guess. Untested of course.
$Folder = $_POST['Folder'];
if (!is_dir("keywords/{$Folder}")) {
    mkdir("keywords/{$Folder}");
}
$fh = fopen("keywords/".$Folder."/".trim(str_replace(" ", "-",$seeds[$i])).".txt", 'w') or die("can't open file");
I'm stumped. I tried the above with no results. Then I researched a bit and thought I'd use file_exists instead.
$Folder = $_POST['Folder'];
if(!file_exists($Folder){ mkdir($Folder); }
I just get a blank page. Dir is 777. How do I debug?
gnarlyhat
Found out it was missing a fishin bracket. Thanks
gnarlyhat
Just pointing out. mkdir("$Folder", 0777); When I check the permissions via FTP ... it shows 755 and any file created inside it would give me permission problems when I try to delete it from FTP. So I have to write a php file to remove them. To solve that problem, I did this mkdir("$Folder"); chmod("$Folder", 0777); Funny though since mkdir("$Folder", 0777); should be setting the permission I need. Maybe I'm missing the whole point.
deregular
quote author=gnarlyhat link=topic=598.msg4187#msg4187 date=1194926535 To solve that problem, I did this
mkdir("$Folder"); chmod("$Folder", 0777);
Funny though since mkdir("$Folder", 0777); should be setting the permission I need. Maybe I'm missing the whole point.
Ya, i used to have problems with page generators and troubles deleting, and did the same thing you did: just wrote a php file that would delete them for me. I always presumed it was because PHP created the file, so only PHP had permission to alter it. So does your solution ^^ above work? If so, thanks for the tip! I'll keep it in mind next time i have the problem.
gnarlyhat
Yes it works. chmod 0777 makes it really 777 so you can easily delete with any permission. Not too sure about security though
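The 755 is most likely the umask at work. PHP's mkdir applies the process umask to the requested mode, so with the common umask of 022 a request for 0777 ends up as 0755, while chmod sets the mode exactly as given. A sketch of doing it in one step by clearing the umask around the call (the function name is an assumption, and on Windows the mode is ignored):

```php
<?php
// Hypothetical helper: create a directory with exactly the requested mode.
function mkdirWithMode($path, $mode)
{
    $oldMask = umask(0);          // temporarily stop the umask from masking bits
    $ok = mkdir($path, $mode);
    umask($oldMask);              // always restore the previous umask
    return $ok;
}

// What a plain mkdir($path, 0777) would end up with under the current umask:
printf("umask %04o turns 0777 into %04o\n", umask(), 0777 & ~umask());
```

On the security worry: 0777 lets every user on the server write to the directory. Where the FTP user and the web server user share a group, 0775 might be enough, but that depends on how the host is set up.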
gnarlyhat
Ok, next in line would be fixing the problem with Wordze ...
<?php
require("WordzeApiClass.php");
set_time_limit(0);

$Folder = $_POST['Folder'];
$FilterType = $_POST['FilterType'];
$SearchStyle = $_POST['SearchStyle'];
$seeds = explode("\n", $_POST['keywords']);
$num = count($seeds);

print "Folder: ".$Folder."<br />";
print "Total seeds: ".count($seeds)."<br />";
print "Filter Type: ".$FilterType."<br />";
print "Search Type: ".$SearchStyle."<p />";

// Create the output folder on first use and open up its permissions.
if (!file_exists($Folder)) {
    mkdir("$Folder");
    chmod("$Folder", 0777);
}

for ($i = 0; $i < $num; $i++) {
    // Lines from a textarea arrive with a trailing "\r", so trim each seed first.
    $seed = trim($seeds[$i]);
    $fh = fopen($Folder."/".str_replace(" ", "-", $seed).".txt", 'w') or die("can't open file");
    $tmp = new WordzeApi('6d974bnonotmyapikey', $FilterType, 100, $SearchStyle);
    $tmp->getResults(1, $seed);
    while ($tmp->CurrentPage < $tmp->TotalPages) {
        sleep(2); // pause between page requests
        $tmp->getNextPage(1);
    }
    foreach ($tmp->showResults() as $kwd => $sdat) {
        // Keep only letters, digits and whitespace.
        $kwd = preg_replace("/[^0-9a-zA-Z\s]/", '', $kwd);
        //print $kwd."<br />";
        $stringData = $kwd;
        fwrite($fh, $stringData."\n");
    }
    fclose($fh);
}
?>
I've added a 2 second delay to ensure things go smoothly. However, I am not able to do this with large numbers of seeds. Anything over 10 seeds gives me unreliable results, meaning that it will not complete everything for me. I've contacted Wordze and Levi tells me that I should only be executing one call at a time and make sure they don't overlap. From my code above, can someone tell me what I'm doing wrong? From my understanding, it should not overlap. How do I add a function to ensure that my call is complete before looping to the next one?
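Within one request PHP runs the loop one step at a time, so as long as the WordzeApi methods block until the response arrives (an assumption, since the class is not shown here), calls from a single run should not overlap. Overlap would more plausibly come from two runs at once, for example a form submitted twice. A rough sketch of two guards: retry with a growing pause when the response carries the "Query speed too fast" error, and a lock file so only one run is active. Both function names and the lock file idea are assumptions, not anything Wordze prescribes.

```php
<?php
// Hypothetical helper: call $fetch until it stops reporting the rate-limit error.
// $fetch is any callable that returns the raw response string.
function fetchWithBackoff($fetch, $maxTries = 5, $delaySeconds = 2)
{
    for ($try = 1; $try <= $maxTries; $try++) {
        $response = call_user_func($fetch);
        if (strpos($response, 'Query speed too fast') === false) {
            return $response;            // no rate-limit error: done
        }
        if ($try < $maxTries) {
            sleep($delaySeconds);
            $delaySeconds *= 2;          // wait longer after each refusal
        }
    }
    throw new RuntimeException("Still rate limited after $maxTries tries");
}

// Hypothetical helper: run $job only if no other request holds the lock file.
function runExclusively($lockFile, $job)
{
    $handle = fopen($lockFile, 'c');     // 'c' creates the file without truncating it
    if ($handle === false) {
        throw new RuntimeException("Cannot open lock file: $lockFile");
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;                    // another run is already in progress
    }
    try {
        call_user_func($job);
    } finally {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
    return true;
}
```

The seed loop would go inside the job passed to runExclusively, with each API call wrapped in fetchWithBackoff. This needs PHP 5.5 or later because of the finally block.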