gnarlyhat

As many of you guys know, I just bought a new PHP brain and have been going thru the manual and it's time to put it to good use. I'm gonna be building a little command center for myself and I'm gonna figure out each piece of the puzzle and hopefully have the whole thing done by tying them all together to complete the project.

I'm no programmer so the terms I use may be alien to you. Please feel free to correct me if you spot anything that should be corrected.

Here's what this module is going to do.

1. Scrape new keywords from Wordze's API and save and clean results as txt in keywords directory.
2. Read txt files from a keyword directory. I can FTP existing keyword lists into the directory.
3. Create a category system to keep keyword lists tidy. The best way to do it?
4. Display the keyword lists on a ListBox and display the selected lists on a text box like below.
http://www.imagebam.com/image/e02009663789

Now here's where I bug you guys for advice.

1. Is there anything I should be aware of first, considering my development platform is going to be WAMP? Any advice to make sure my code will run nicely on my server once I upload it?

2. I'm kinda done with the Wordze scraper except for the cleaning part. For it to be usable in my content generator, it can only be alphanumeric. Is this good enough to do the job?
$keywords = ereg_replace("[^A-Za-z0-9[:space:]]", "", $keywords);
I ripped this from SEO book's keyword stripper.

3. I've figured out step 2 so question for step 3 is "What is the best way to categorise my keywords? Put them in separate dirs? I have different sets of keyword lists for say ... Adult, MFA, Pharma, blah blah blah"

4. Is it possible to do this function from PHP itself as I select each list, or do I have to rely on javascript?

Thanks so much for your help in advance.

thedarkness

quote author=gnarlyhat link=topic=598.msg3988#msg3988 date=1193903245


4. Is it possible to do this function from PHP itself as I select each list, or do I have to rely on javascript?


http://php.net/preg_replace

Cheers,
td
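
For the cleaning step in question 2, here is a minimal sketch of the same filter written with preg_replace, the function linked above. ereg_replace was deprecated in PHP 5.3 and removed in PHP 7, so the preg version is the one that keeps working. The function name cleanKeyword is made up for illustration, and collapsing repeated spaces is an extra assumption, not something the original line did.

```php
<?php
// Hypothetical helper: keep only letters, digits and whitespace.
function cleanKeyword($keyword)
{
    // Drop every character that is not A-Z, a-z, 0-9 or whitespace.
    $clean = preg_replace('/[^A-Za-z0-9\s]/', '', $keyword);
    // Assumption: collapse runs of whitespace left behind by removed characters.
    $clean = preg_replace('/\s+/', ' ', $clean);
    return trim($clean);
}

echo cleanKeyword("credit-repair & debt!"), "\n";
```

Note that a hyphen is removed outright, so credit-repair becomes creditrepair. If the words should stay separate, replace hyphens with spaces before cleaning.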

taky

like i told you in im man, i would really recommend just starting off with a database for this kind of project. it will make things oodles easier to manage once you get past the syntax.

it really isnt that difficult, and its def worth it to learn how to interact with mysql, especially for blackhat. def invaluable.

freedom1972

I am going to agree with Taky. Dive into MySQL with this one. It may up the learning curve, but will definitely up your development skill and your ideas for projects. Plus, the DB will become a tool that you use for many different types of applications as your programming creativity kicks off.

gnarlyhat

MySQL .... hmmmm. Looks like I have to rethink and recode what I've done.

Would it slow down everything when I have lists near 10K words? I've been thinking too it's a good idea so I can assign a key to the list to put them into categories.

Can anyone point me to a good hands on MySQL tut? TIA

quote author=thedarkness link=topic=598.msg3989#msg3989 date=1193910664

http://php.net/preg_replace
Cheers,
td


td: Yes I understand preg_replace but is it possible to control html form values? What I meant is initially, if no item is selected ... the right textbox will be empty. When I select "credit-repair.txt" ... only credit-repair will appear on the textbox. I will use this to copy to another program for my subdomain creation.
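
Plain PHP can do this without javascript, as long as a page reload per selection is acceptable: the list box submits the chosen file name and the page is rebuilt with that file's contents in the textarea. A rough sketch, assuming the lists live in a keywords directory next to the script; renderKeywordForm and the field names list and contents are invented for this example. Updating the box without a reload would need javascript.

```php
<?php
// Hypothetical helper: build a form with a list box of .txt files and a
// textarea holding the contents of the selected one (empty if none selected).
function renderKeywordForm($dir, $selected)
{
    $files = glob($dir . '/*.txt');
    if ($files === false) {
        $files = array();
    }
    $selected = basename((string) $selected); // strip any path to block ../ tricks
    $contents = '';
    $options = '';
    foreach ($files as $path) {
        $name = basename($path);
        $isSelected = ($name === $selected);
        if ($isSelected) {
            $contents = (string) file_get_contents($path);
        }
        $options .= '<option' . ($isSelected ? ' selected="selected"' : '') . '>'
            . htmlspecialchars($name) . "</option>\n";
    }
    return "<form method=\"post\">\n"
        . "<select name=\"list\" size=\"10\">\n" . $options . "</select>\n"
        . "<input type=\"submit\" value=\"Show\" />\n"
        . '<textarea name="contents" cols="40" rows="20">'
        . htmlspecialchars($contents) . "</textarea>\n"
        . "</form>\n";
}

// Usage inside a page: nothing is selected on the first visit.
$selected = isset($_POST['list']) ? $_POST['list'] : '';
echo renderKeywordForm(dirname(__FILE__) . '/keywords', $selected);
```

The basename() call matters because the selected name comes from the browser and is used to pick a file to read.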

freedom1972

will the MySQL queries slow your code down? that depends. If you are using them excessively, then hell yeah! This is really a question of your code relative to your server environment. As a rule of thumb, I try to make as few queries as necessary and load the results into arrays that I can process. PHP is a great tool when it comes to array and text manipulation.

The balance between DB and speed is relative. Personally, I normally go through a multi stage coding process:

1) prototyping / scripting to get the initial results
2) rethinking the design (OO or functional, how can I generalize the classes/functions)
3) code optimization (could include writing extension in C/C++ if necessary, etc.)

that being said, i think you can learn A LOT in stage 1 and 2. Normally, I seek outside help with #3 as it is not my top skill.

perkiset

From the lowest level standpoint, disk/file access is almost 10x as slow as database access. This is because the directory lookup, file open, stream and close systems take time. With MySQL for example, the files are open, indexed and ready to go - you talk to them via TCP/IP which is hugely fast, the lookups are optimized (because that's what a database does) and the return is precisely what you are looking for.

I'd turn the argument around GG... if you'll NEVER have more than say, 5 items in a list, a file is fine - in fact, I'd probably make it a file and cache it for speed.

But if you're going to have 10K items, you are just starting to get to where a database makes the most sense of all. This will be a bit of a learning curve as Freedom mentions, but as taky notes indirectly, the benefit of you learning this tool will FAR FAR FAR outstrip the time investment to get you there.

Most folks on this board are LAMP, so there's lots of resources here to get you going. Since you're WAMP, you can get pre-compiled executables for MySQL - then you can either download the native tools from the MySQL site (they are OK) or you can download phpMyAdmin and go that route - which is what I recommend, because the exercise of getting a new PHP site working against a MySQL database will in itself be invaluable experience. Additionally, phpMyAdmin (although not the best of the best when it comes to tools, but certainly adequate) is written in PHP (cleverly, they added that to the name) so it will work regardless of your platform of choice in the future.

Go man! Don't stop now, you're on a roll...you're really only doing WAP not WAMP yet... add the M!

nutballs

quote author=gnarlyhat link=topic=598.msg3993#msg3993 date=1193930939

Would it slow down everything when I have lists near 10K words? I've been thinking too it's a good idea so I can assign a key to the list to put them into categories.


no, it will speed things up. plus it will increase your flexibility for future features. there is almost no circumstance under which a database would not be justified, when you are talking about scaling projects. Sure a contact form for a website that gets 1 form submission a week doesnt need a database, but the minute you start mentioning 10k of something, you should go database. Flat file / directory methods are good for portability, and technically can run anywhere, but, slow.

that being said, if you have never done Database in any language, and are learning php, which i would assume is your first language, then you have two choices: slow down and learn it all, which will result in many rewrites as you move forward, or just continue the way you have, get it solid, until you run into a problem where you NEED a database, then recode at that point.

nutballs

LOL perk and I share a brain apparently.

perkiset

perkiset ... nutballs ... separated at birth?

Enquiring minds want to know!

(If this is the case, then I must also be the smartest human on earth, and you must have an awe-inspiring shwanstooker.)

nutballs

lol

gnarlyhat

Database then I should go. Since I'm learning from scratch might as well learn it right. Got any good MySQL tutorials to point me to? Ones with real examples instead of long ass winded syntax yadda yadda which would make me quit learning after 30 minutes. Also would appreciate recommendations on IDEs to use. I am currently using Notepad++ and switching to my browser pointing to local WAMP. Just before this post, I tried Dreamweaver with ftp setup to connect directly to my Linux server. Have yet to fully test it out. An IDE with autocomplete would definitely help out a lot here.

Here's a problem I have now. Code that used to run fine on my Linux server running PHP 4.xx does not run on my WAMP which is running PHP 5. I was under the impression that older version code should run on a higher version platform. But obviously I'm wrong. Would you guys suggest I code right on the server itself rather than on my notebook with WAMP?

What are the pitfalls I should avoid in order to make sure my code works on both 4 and 5?

From my questions, you all must know I'm a complete noob so please bear with me. Thanks so much for your help

nutballs

assuming you're on PC, i use ultraedit, actually UEstudio.
has ftp and localcopy capabilities.

i have looked at a bunch of IDEs and decided that if I have to read a manual to figure out how to use the software to write code that I already know how to write.... i aint gonna use it. Aptana was one of those, as was eclipse. (though they are more for java technically).

perkiset

I just posted this list for somebody else a couple days ago:

http://www.perkiset.org/forum/database_discussion_mysql_oracle_and_such/help_out_a_n00b_trying_to_learn_sql-t585.0.html;msg3906#msg3906


... that'll get you started.

Regarding code that worked in 4 but not in 5 - that is exceedingly rare. In fact, I have never personally experienced it. There are very few things that changed such that code is not backwardly compatible. It is more likely that you have different modules compiled into the PHP instance on the WAMP box than on the LAMP box. PHP is compiled with a series of switches and dependencies based on what you want to do with it... you can create an incredibly light and tight instance or an overly bloated huge memory hog of an instance if that's your thing. The way to check what the differences are is to do a phpinfo() on both boxes and print what you get. This screen will tell you all the modules, the compile string etc that was used to make up <that> instance of php. I'm about 98% certain that this is your issue... or that there's a version difference between instances.
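
One way to make that comparison less of an eyeball job is to print the same short summary on each box and diff the two outputs. A small sketch using phpversion() and get_loaded_extensions(); the function name environmentSummary is made up here, and the choice of which settings to include is only a suggestion.

```php
<?php
// Hypothetical helper: summarise the PHP build in a form that is easy to diff.
function environmentSummary()
{
    $extensions = get_loaded_extensions();
    sort($extensions); // stable order so two boxes can be compared line by line
    return array(
        'version'        => phpversion(),
        'os'             => PHP_OS,
        'short_open_tag' => ini_get('short_open_tag'),
        'extensions'     => $extensions,
    );
}

$summary = environmentSummary();
echo 'PHP ', $summary['version'], ' on ', $summary['os'], "\n";
echo 'short_open_tag = ', var_export($summary['short_open_tag'], true), "\n";
echo implode("\n", $summary['extensions']), "\n";
```

Run it on both servers, save each output to a text file, and compare the files.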

/p

gnarlyhat

Ok ... here's the problem. I don't know why it's dumping out the source.

I have an
include("WordzeApiClass.php"); in the wordzescraper.php file and when I run it ...

it dumps out the source of WordzeApiClass.php and this error below

Fatal error: Class 'WordzeApi' not found in C:\wamp\www\mi\wordzescraper.php on line 28

Here is line 28 of wordzescraper.php

$tmp = new WordzeApi('myapicode', 1, 100, $SearchStyle, $CharLen, $CountLim, $FilterType);
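
A hedged guess at the cause, which the thread itself never confirms: when an included file is printed as plain text and its class then turns out to be missing, the file often opens with the short tag <? while the short_open_tag setting is off on that box. With the setting off, PHP treats the whole file as literal output. The check below is only a diagnostic sketch; usesShortOpenTag is a made-up helper name.

```php
<?php
// Hypothetical helper: true if the source opens a PHP block with "<?" that is
// not the long "<?php" form, the echo form "<?=", or an XML declaration.
function usesShortOpenTag($source)
{
    return preg_match('/<\?(?!php|=|xml)/i', $source) === 1;
}

$file = dirname(__FILE__) . '/WordzeApiClass.php'; // file name taken from the post
if (is_file($file) && usesShortOpenTag(file_get_contents($file))) {
    echo "Short open tag found; short_open_tag is ",
        var_export(ini_get('short_open_tag'), true), "\n";
}
```

If that turns out to be the case, either change the opening tag in the class file to the long form or enable short_open_tag in php.ini. The long form works on every install.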

perkiset

Change the include() to require() or require_once() and then it should fatally bomb before that line. Include is rather a suggestion, and if PHP can't find it it will go on anyway. Since the class cannot be found, I am about 99.9% sure that the include was not found.

Couple options:
add error_reporting(E_ALL) to the front of your code - it will dump even minor notifications, this will tell you a lot
Never use include() unless you really don't care if PHP finds the file

I personally never use relative files either: in the beginning of my routines I'll create a couple vars with a line like this:

$rootPath = $GLOBALS['rootPath'] = '/www/sites/aDir';

(Note here, that you can actually assign two variables at once which I have done: a global version of this string and a local version).

Then later when I need something I'll do this:

require("$rootPath/theFile.php"); or
require("{$GLOBALS['rootPath']}/theFile.php");

That'll fix you right up.

/p
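
A variation on the same idea, sketched under the assumption that the included file sits in the same directory as the calling script: dirname(__FILE__) yields the directory of the current file on both PHP 4 and PHP 5, so nothing has to be hard-coded per server. The file name theFile.php is the placeholder from the post above.

```php
<?php
// Directory of this script; works the same on Windows and Linux.
$rootPath = dirname(__FILE__);

$target = $rootPath . '/theFile.php'; // placeholder name from the post

// Report the problem clearly instead of letting a later "class not found" hide it.
if (is_readable($target)) {
    require_once $target;
} else {
    echo "Cannot read $target\n";
}
```

Forward slashes are accepted in paths on Windows too, which avoids escaping backslashes in strings.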

gnarlyhat

Thanks perk .. I will try it IF I get some time off from my weekend work. Entertaining the Mrs and my son.

Sounds like what you said ... but what's weird is that it's in the same dir, and a function in another file calling a file from a subdir works just fine. thanks again

gnarlyhat

perk: I went ahead to check it out and it produced the same results. I'm on WAMP BTW. Here's what I put in. The script works fine on Linux server.

$rootPath = $GLOBALS['rootPath'] = 'c:\wamp\www\mi';
require("{$GLOBALS['rootPath']}/WordzeApiClass.php");

gnarlyhat

I'm just gonna ignore this error cos I'm now currently working on the Linux server and all is well regarding this issue.

I'm facing 1 issue with my scraper.

I don't know if it's my server's problem or wordze's problem ... If I enter one seed keyword, it works out fine. When I enter more than 1 seed in the form ... it seems to time out or something. I notice also using unpopular seeds, they seem to work out fine so this must be a network issue. Take for example now .. I enter these seeds below.
Wrongful death
Legal Advice
Taxes

On my FTP side, I only see Wrongful-death.txt created with keywords inside and Legal-Advice.txt with 0 filesize. This means that it's done with wrongful-death and processing legal-advice but from my browser, it spits out only half of the wrongful-death keywords. On Wordze site, I refresh to check my remaining API calls and it stays the same meaning the script is not calling OR Wordze is blocking possibly due to too fast API calls? I tried to add a "sleep(10);" in wordzescraper.php but the script just stopped working ... maybe I put it the wrong place. I've commented it out. Would appreciate some help if possible. TIA

Here's my simple form
<html><body>
<h4>Wordze API Scraper </h4>
<form action="wordzescraper.php" method="post">
Filter Type:<br/>
<select name="FilterType">
<option>None</option>
<option>Adult</option>
<option>Drugs</option>
<option>Gambling</option>
<option>Hacking</option>
<option selected="yes">All</option>
</select><br><br>
Search Style:<br/>
<select name="SearchStyle">
<option>exact</option>
<option>any</option>
<option selected="yes">broad</option>
</select><br/><br/>
Enter Keywords Below:<br/>
<textarea name = "keywords" cols="40" rows="20" wrap="hard"></textarea><br/>
<input type="submit" />
</form>
</body></html>


Here's wordzescraper.php

<?php
require("WordzeApiClass.php");
set_time_limit(0);
$FilterType = $_POST['FilterType'];
$SearchStyle = $_POST['SearchStyle'];
// one seed per line from the textarea
$seeds = explode("\n",$_POST['keywords']);
$num = count($seeds);
print "Total seeds: ".count( $seeds )."<p />";
// map the filter name to its number; == compares, a single = would assign
if($FilterType == "None")
{ $FilterType = 1; }
elseif($FilterType == "Adult")
{ $FilterType = 2; }
elseif($FilterType == "Drugs")
{ $FilterType = 3; }
elseif($FilterType == "Gambling")
{ $FilterType = 4; }
elseif($FilterType == "Hacking")
{ $FilterType = 5; }
elseif($FilterType == "All")
{ $FilterType = 6; }
for ($i = 0; $i < $num; $i++) {

    $fh = fopen("keywords/".trim(str_replace(" ", "-",$seeds[$i])).".txt", 'w') or die("can't open file");

    $tmp = new WordzeApi('6d974ba2d7e5a0_notmyapikey_982734', $FilterType, 100, $SearchStyle);
    $tmp->getResults(1, $seeds[$i]);

    while($tmp->CurrentPage < $tmp->TotalPages) {
        $tmp->getNextPage(1);
    }

    foreach($tmp->showResults() as $kwd=>$sdat) {
        // keep only letters, digits and whitespace
        $kwd = preg_replace( "/[^0-9a-zA-Z\s]/", '', $kwd );
        print $kwd."<br />";
        $stringData = $kwd;
        fwrite($fh, $stringData."\n");
    }
    fclose($fh);
    // print "Pausing 10 seconds before going to the next seed.";
    // sleep(10);
}
?>
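
Two things in the script above are easy to get wrong, so here is a small sketch of them in isolation: turning the textarea into a clean list of seeds, and mapping the filter name to its number with a lookup table instead of a chain of ifs. The numeric codes are copied from the script; parseSeeds and filterCode are invented names, and falling back to 6 (All) for an unknown value is an assumption.

```php
<?php
// Hypothetical helper: one seed per line, blank lines and stray "\r" removed.
function parseSeeds($raw)
{
    $seeds = array();
    foreach (preg_split('/\r\n|\r|\n/', $raw) as $line) {
        $line = trim($line);
        if ($line !== '') {
            $seeds[] = $line;
        }
    }
    return $seeds;
}

// Hypothetical helper: filter name from the form to the numeric code.
function filterCode($name)
{
    $codes = array(
        'None' => 1, 'Adult' => 2, 'Drugs' => 3,
        'Gambling' => 4, 'Hacking' => 5, 'All' => 6,
    );
    // Assumption: unknown values fall back to 6 (All), the form's default.
    return isset($codes[$name]) ? $codes[$name] : 6;
}

print_r(parseSeeds("Wrongful death\r\nLegal Advice\r\n\r\nTaxes"));
echo filterCode('Gambling'), "\n";
```

Browsers send textarea line breaks as "\r\n", so trimming each line keeps a stray carriage return out of the seed that gets sent to the API.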

gnarlyhat

Just what I predicted.

<results version="1.0.1">
<Error>Query speed too fast</Error>
</results>

Can someone help me and tell me where to put my sleep line?

thedarkness

    while($tmp->CurrentPage < $tmp->TotalPages) {
        sleep(10);
        $tmp->getNextPage(1);
    }


Just a guess.

Cheers,
td
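
Since the right sleep value depends on how long each call itself takes, another option is to enforce a minimum gap between the starts of consecutive calls rather than a fixed pause. A sketch of that idea; throttle is a made-up helper, and the 2 second gap in the usage comment is a guess, since the thread does not say what rate Wordze allows.

```php
<?php
// Hypothetical helper: block until at least $minGap seconds have passed since
// the previous call to throttle(), then record the new call time.
function throttle($minGap)
{
    static $last = 0.0;
    $wait = $minGap - (microtime(true) - $last);
    if ($wait > 0) {
        usleep((int) round($wait * 1000000));
    }
    $last = microtime(true);
}

// Usage sketch with the names from the thread:
// while ($tmp->CurrentPage < $tmp->TotalPages) {
//     throttle(2.0); // guessed gap
//     $tmp->getNextPage(1);
// }
```

Calling throttle() before the first getResults() of each seed as well would space out the calls between seeds, not only between pages.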

gnarlyhat

Thanks TD.

I had to play around with the sleep value though ... too fast and I get the same results. Too slow it stops grabbing keywords .. yeah weird.

gnarlyhat

I'm now trying to separate my keywords into folders and wonder if fopen is able to create folders automatically using this command. I'm getting an error though: can't open file

$Folder = $_POST['Folder'];
$fh = fopen("keywords/".$Folder."/".trim(str_replace(" ", "-",$seeds[$i])).".txt", 'w') or die("can't open file");


Do I have to create the directory myself first? Permissions are 777. TIA

deregular

I wouldnt think that would work, as you're trying to access a directory that isnt there.
I know fopen('','w') will create a file that isnt there, but not a directory.
You'll have to use mkdir() i guess.

Untested of course.


$Folder = $_POST['Folder'];
if(!is_dir(keywords/{$Folder}){
  mkdir(keywords/{$Folder});
}
$fh = fopen("keywords/".$Folder."/".trim(str_replace(" ", "-",$seeds[$i])).".txt", 'w') or die("can't open file");


perkiset

quote author=deregular link=topic=598.msg4144#msg4144 date=1194660358

I wouldnt think that would work, as you're trying to access a directory that isnt there.
I know fopen('','w') will create a file that isnt there, but not a directory.
You'll have to use mkdir() i guess.

On the dot. Dirs have to be there and you have to have the right access to make that work.
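
The untested snippet above is missing the quotes around the path and a closing parenthesis, so it will not parse as posted. A corrected sketch follows; ensureKeywordDir is an invented name, and stripping the folder name down to letters, digits, dashes and underscores is an added precaution because the value comes straight from the form.

```php
<?php
// Hypothetical helper: make sure $base/$folder exists and return its path,
// or return false if the name is unusable or the directory cannot be created.
function ensureKeywordDir($base, $folder)
{
    // Precaution: a raw form value such as "../../etc" must not escape $base.
    $folder = preg_replace('/[^A-Za-z0-9_-]/', '', $folder);
    if ($folder === '') {
        return false;
    }
    $path = $base . '/' . $folder;
    // The third argument (PHP 5+) also creates missing parent directories.
    if (!is_dir($path) && !mkdir($path, 0777, true)) {
        return false;
    }
    return $path;
}

$dir = ensureKeywordDir(sys_get_temp_dir() . '/keywords', 'Pharma');
echo $dir === false ? "could not create directory\n" : "ready: $dir\n";
```

In the scraper the base would be the keywords directory and the folder would be $_POST['Folder'], with fopen() called only after the helper returns a path.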

Indica

quote author=gnarlyhat link=topic=598.msg4012#msg4012 date=1193972204

An IDE with autocomplete would definitely help out a lot here.


i recommend komodo. it works for a good amount of languages plus it has a regex tool built in which sold me. it's available for all 3 platforms also.

gnarlyhat

Thanks
Thanks
Thanks


gnarlyhat

quote author=deregular link=topic=598.msg4144#msg4144 date=1194660358

I wouldnt think that would work, as you're trying to access a directory that isnt there.
I know fopen('','w') will create a file that isnt there, but not a directory.
You'll have to use mkdir() i guess.

Untested of course.


$Folder = $_POST['Folder'];
if(!is_dir(keywords/{$Folder}){
  mkdir(keywords/{$Folder});
}
$fh = fopen("keywords/".$Folder."/".trim(str_replace(" ", "-",$seeds[$i])).".txt", 'w') or die("can't open file");



I'm stumped. I tried the above with no results.

Then I researched a bit and thought I'd use file_exists instead.

$Folder = $_POST['Folder'];
if(!file_exists($Folder){
  mkdir($Folder);
}


I just get a blank page. Dir is 777. How do I debug?

gnarlyhat

Found out it was missing a fishin bracket. Thanks

gnarlyhat

Just pointing out.

mkdir("$Folder", 0777);

When I check the permissions via FTP ... it shows 755 and any file created inside it would give me permission problems when I try to delete it from FTP. So I have to write a php file to remove them.

To solve that problem, I did this

mkdir("$Folder");
chmod("$Folder", 0777);

Funny though since mkdir("$Folder", 0777); should be setting the permission I need. Maybe I'm missing the whole point.

deregular

quote author=gnarlyhat link=topic=598.msg4187#msg4187 date=1194926535

To solve that problem, I did this

mkdir("$Folder");
chmod("$Folder", 0777);

Funny though since mkdir("$Folder", 0777); should be setting the permission I need. Maybe I'm missing the whole point.


Ya i used to have problems with page generators and troubles deleting, and did the same thing you did, just wrote a php file that would delete them for me. I always presumed it was because PHP created the file, so only PHP had permissions to alter it.
So does your solution ^^ above work? If so thanks for the tip! Ill keep it in mind next time i have the problem.

gnarlyhat

Yes it works. chmod 0777 makes it really 777 so you can easily delete with any permission. Not too sure about security though
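
The 755 result is expected behaviour rather than a bug: the mode passed to mkdir() is filtered through the process umask, and with the common umask of 022 a requested 0777 comes out as 0755. chmod() is not subject to the umask, which is why the two-step version works. A short demonstration for a Linux box (Windows ignores these modes); the directory names are throwaway examples.

```php
<?php
$base = sys_get_temp_dir() . '/umask_demo_' . uniqid();
mkdir($base);

$old = umask(022);              // a common default mask
mkdir($base . '/a', 0777);      // requested 0777, masked down by the umask
clearstatcache();
printf("mkdir only:    %o\n", fileperms($base . '/a') & 0777);

chmod($base . '/a', 0777);      // chmod ignores the umask
clearstatcache();
printf("after chmod:   %o\n", fileperms($base . '/a') & 0777);

umask(0);                       // alternative: clear the mask before mkdir
mkdir($base . '/b', 0777);
clearstatcache();
printf("with umask(0): %o\n", fileperms($base . '/b') & 0777);

umask($old);                    // restore whatever the mask was before
```

On the security worry: 0777 lets every account on the server write to the directory, which matters on shared hosting. The FTP delete problem comes from the files being owned by the web server's user, so a tighter mode plus a small PHP cleanup script is the more cautious route.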

gnarlyhat

Ok next in line would be fixing the problem with Wordze ...

<?php
require("WordzeApiClass.php");
set_time_limit(0);
$Folder = $_POST['Folder'];
$FilterType = $_POST['FilterType'];
$SearchStyle = $_POST['SearchStyle'];
// one seed per line from the textarea
$seeds = explode("\n",$_POST['keywords']);
$num = count($seeds);
print "Folder: ".$Folder."<br />";
print "Total seeds: ".count( $seeds )."<br />";
print "Filter Type: ".$FilterType."<br />";
print "Search Type: ".$SearchStyle."<p />";
if(!file_exists($Folder)){
    mkdir("$Folder");
    chmod("$Folder", 0777);
}
for ($i = 0; $i < $num; $i++) {
    $fh = fopen($Folder."/".trim(str_replace(" ", "-",$seeds[$i])).".txt", 'w') or die("can't open file");

    $tmp = new WordzeApi('6d974bnonotmyapikey', $FilterType, 100, $SearchStyle);
    $tmp->getResults(1, $seeds[$i]);

    while($tmp->CurrentPage < $tmp->TotalPages) {
        sleep(2); // pause between page requests
        $tmp->getNextPage(1);
    }

    foreach($tmp->showResults() as $kwd=>$sdat) {
        // keep only letters, digits and whitespace
        $kwd = preg_replace( "/[^0-9a-zA-Z\s]/", '', $kwd );
        //print $kwd."<br />";
        $stringData = $kwd;
        fwrite($fh, $stringData."\n");
    }
    fclose($fh);
}

?>


I've added a 2 second delay to ensure things go smooth. However ... I am not able to do this with large amounts of seeds. Anything over 10 seeds would give me unreliable results. Meaning that it will not complete everything for me. I've contacted Wordze and Levi tells me that I should only be executing one call at a time and make sure they don't overlap. From my code above, can someone tell me what I'm doing wrong? From my understanding, it should not overlap. How do I add a function to ensure that my call is complete before looping for the next one?
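
Within a single run the calls cannot overlap, because PHP executes the loop one statement at a time and getNextPage() has to return before the next call starts. Overlap is more likely to come from two copies of the script running at once, for instance after a browser retry or a second form submission while the first run is still going. One way to rule that out is a lock file, sketched below; the lock file name and the helper names are assumptions for this example.

```php
<?php
// Hypothetical helper: try to take an exclusive, non-blocking lock.
// Returns the open handle on success, or null if the lock could not be taken.
function acquireRunLock($lockFile)
{
    $handle = fopen($lockFile, 'c'); // create if missing, never truncate
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

// Hypothetical helper: give the lock back so the next run can start.
function releaseRunLock($handle)
{
    flock($handle, LOCK_UN);
    fclose($handle);
}

$lock = acquireRunLock(sys_get_temp_dir() . '/wordzescraper.lock');
if ($lock === null) {
    echo "Another scrape is still running, try again later.\n";
} else {
    // ... loop over the seeds and call the API here ...
    releaseRunLock($lock);
}
```

Note that the seed loop in the script above calls getResults() for the next seed right after the previous one finishes, with no pause in between, so a delay at the top of the for loop may also be worth trying.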

