The Cache: Technology Expert's Forum
 
*
Welcome, Guest. Please login or register. September 16, 2019, 10:04:05 PM

Login with username, password and session length


Pages: [1]
  Print  
Author Topic: Project 0.5: Converting file based to MySQL  (Read 6690 times)
gnarlyhat
Journeyman
***
Offline Offline

Posts: 51



View Profile WWW
« on: November 02, 2007, 02:28:11 AM »

Hehe ... Before I can start Project 1.0 I need to learn how to use MySQL instead of a file.

I will be doing a simple Wordze API script and then store it into MySQL. First I would need to build me a table to store the list.

Being a complete noob at this, after reading thru a tutorial ... I figured ... I need to specify the design the database first.

I would want to store my keyword lists into separate categories so it's easy to manage them. Say ... MFA keywords, Adult Keywords, etc.

In those of course I have my seed keyword and a huge list of keywords ... may reach 20K lines at times and I reckon 100K when I go crazy into typos and shit.

Again ... please correct me if you think there's a better way of doing this. I will create a table each for the category.

TABLE MFA
id INT
seed VARCHAR
keywords TEXT

TABLE ADULT
id INT
seed VARCHAR
keywords TEXT

My only worry is ... will the TEXT field be able to hold that much keywords for me. As I mentioned ... somewhere between 20K to 100K lines.

TIA  Praise
Logged
leadegr00t
n00b
*
Offline Offline

Posts: 8



View Profile WWW
« Reply #1 on: November 02, 2007, 06:16:12 AM »

Hey there!

Not sure you are approaching the subject correctly,.
Yes, design the db first (its a 'measure twice cut once' thing Smiley) but I would have expected that each keyword would be one row in the db.
Perhaps a structure like this:
id (primary key, auto_increment)
keyword
category
seednum

so each keyword would appear once (ignoring a keyword appearing in more than one cat)
categories would repeat frequently
and seed would be only a few variations across the rows.

does that help?

(Possibly first post, not sure - hi guys!)
Logged

No links in signatures please
DangerMouse
Expert
****
Offline Offline

Posts: 244



View Profile
« Reply #2 on: November 02, 2007, 07:36:36 AM »

Hi gnarly,

Personally I'm a firm believer in standards driven code and structures, as such I'd suggest reading up on database normalisation before defining your structure - the rules may seem quite complex at first, but they will hep create an efficient model thats flexible and extensible. Similarly the SQL required may at first seem to be more complex, but using a full relational structure will pay dividends when you come to do more complex queries.

I would consider a structure like this:

tbl_category
- id [Autoincrement]
- category

tbl_keyword
- id [Autoincrement]
- keyword
- daily_searches
- other info directly related to keyword

tjx_category_keyword
- id [Autoincrement]
- fk_Category
- fk_Keyword

Basically this illustrates the many-to-many relationship between keywords and categories, a keyword can be in many categories, and a category can have multiple keywords. The junction table tjx_category_keyword stores this relationship based on the id numbers from the category and keyword tables (foreign keys).

Having never created a keyword DB I dont really know whether this serves as the most pragmatic solution for what your trying to achieve, plus it is probably a little too simplistic when you consider the relationship between keywords and keyword phrases but hopefully it stresses the importance of solid relational modelling.

Although the SQL to do this will seem like a baptism of fire at first there are tons of good MySQL syntax examples on the web so it shouldnt take too long to pick up.

Forgive me if i've miss read this and your previous post (assuming your new to DB design rather than just MySQL). Hope this helps a little.

DM
Logged
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #3 on: November 02, 2007, 08:21:11 AM »

go with DMs design, if you think you understand the concepts.

what he has outlined is a 'normalized' database design. You store only data in 'datatables' and relationships in 'lookup' or 'reference' tables. You store in each table information only directly related to the subject of that table.
Addresses are an example.
street,city,state. In and normalized database structure, the belief is that it would be silly to store all 4 of those in 1 table, since many streets exist in the same city. and many cities exist in the same state, resulting in a lot of repetition, such as the same city being repeated over and over. This makes the db unnecessarily bigger.

Fully normalize puts each type of data into its own table, and doesnt store the references in that table. So you would have a table for streets, a table for cities, and a table for states. Then you would create all the relationships in a relationship table by referencing the ID columns of each of those tables. Thus, fk_streetID, fk_cityID, fk_stateID. (fk stands for foriegn key, as opposed to PK, which is the primary key of a table like the ID column). So in the lookuptable you would store in each row, lookupID-fk_streetID-fk_cityID-fk_stateID. Personally, i feel this is overnormalized, and I would store the city and state FKids in the street table, but its easier to understand all broken apart.

Normalizing does not mean breaking EVERYTHING up though. Username and Password would not make sense to break up, since you could assume everyone will have a different password, and everyone will definitely have a different username.
Firstname,lastname, date of birth is the same thing, breaking it up doesnt get you anywhere.

keywords and categories, break it up. 1 keyword may have many categories AND 1 category will have many keywords. So normalize.

if you dont want to have to deal with JOINs, which is what you would need to do for a normalized database, you can just go flat.
keyword-category-whateverelseyouwanttostore.
but that results in a lot of redundancy, and bigger problems potentially, later on when you want to add columns.
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #4 on: November 02, 2007, 09:01:31 AM »

leadegr00t
Ditto

dangermouse
Ditto

nutballs
Ditto

and welcome to the Cache leadegr00t!

This is really the way of things. You should really stop typing, read this stuff and get into a beginners MySQL book immediately. The small hiccup time in this beginning will pay off BIG BIG time in the very near future.

You should listen to these boys, they're all pre-med.

/p
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
gnarlyhat
Journeyman
***
Offline Offline

Posts: 51



View Profile WWW
« Reply #5 on: November 02, 2007, 09:32:44 AM »

That went a little way over my head. I should have paid a lot more attention in Database class. I really hated that part. Normalization ... didn't realize it would come back haunting me. To make it simple ... let me explain how I was doing it with flat files.

/MFA(category)/bank-loans(seed).txt <- 5000 bank loan keywords
/MFA/bad-credit.txt <- 4700 bad credit keywords

/Adult/free-porn.txt <- 8000 free porn keywords
/Adult/gay-porn.txt <- 5000 gay porn keywords

You get my drift. As far as keywords go ... for now I don't see any of them overlapping into another category but I can't say for sure in the future. I might want to categorize them for different affiliates so there is a possibility of them overlapping.

I would of course want to do things right but it might really hinder my learning process as I find that I need to learn more and more as I take each step. I can't say I'm a complete programming noob as I've done dumb old pascal back in the school days but can't even remember how pascal code look like now. So ... I'm a noob who forgot everything I learnt or rather didn't pay much attention.

My main goal is to learn PHP so I can automate stuff which I doing manual now as I really see a limit to what I can do purely by hand. This is why I came here and set myself to learn PHP by hook or by crook. I've attempted a few times before this and quite frankly gave up halfway.

This is also why I'm documenting everything here so I can look back at my progress and it would probably help other newbies in return. As I've started Project 1.0 and learnt that I need to use MySQL to store my data, I had to backtrack and create Project 0.5 (this one) and after asking questions about database design, I might have to backtrack again and start Project 0.25.

I have to put a stop to this or else I would run out of decimals for my project and I'd never get 1.0 done(which is only 20% of what I plan). For now all I can see is the possibility of the same keyword on different categories. I won't be storing any traffic data linked to my keywords .... YET (damn ... I'm never gonna get started)

I might want to store keyword data but I'm not sure if it would be within this same database. I want to keep track of performing keywords and reuse them again and again since they've been proven but I would have to leave this thought out of this picture first as I really need to start.

Is this below logical or I should just create one table with id, keyword and category? Since I only have 2 real fields, I think the reference table is redundant. Please correct me if I'm wrong.

tbl_category
- id [Autoincrement]
- category

tbl_keyword
- id [Autoincrement]
- keyword

tjx_category_keyword
- id [Autoincrement]
- fk_Category
- fk_Keyword
Logged
gnarlyhat
Journeyman
***
Offline Offline

Posts: 51



View Profile WWW
« Reply #6 on: November 02, 2007, 09:36:47 AM »

Damn I totally forgot about my seed field. I think leadergr00t kinda hit it with his.

id (primary key, auto_increment)
category
seed
keyword

Is this ok or should I have a reference table?

Or fuck all and go back to normal .txt files for my keyword lists management and then go MySQL for my Domain/Subdomain Creator, Mass Installer, Multi Spammer & Site Stats/Rank Checker ??
« Last Edit: November 02, 2007, 10:09:53 AM by gnarlyhat » Logged
DangerMouse
Expert
****
Offline Offline

Posts: 244



View Profile
« Reply #7 on: November 02, 2007, 01:24:36 PM »

I know where your coming from, the scale does seem quite daunting (thats possibly why I've got round to very little activity and am still learning!) but I would still argue that its easier to get it right now than fix your system in future. Sometimes something is so fundamental (keywords come into this category imo) that a quick fix isnt the way to go, plus it makes sense to not to develop any bad habits to start with. In the long run your actually saving time, while building a more powerful system.

You get my drift. As far as keywords go ... for now I don't see any of them overlapping into another category but I can't say for sure in the future. I might want to categorize them for different affiliates so there is a possibility of them overlapping.

For example, I would say that the two examples you mention do overlap, they are both adult keywords - "free gay porn" falls under both headings. Thats where the "junction" table comes in.

You seem to have a vision for the future of your system, the best approach i can suggest is to grab a big bit of paper and draw your perfect system in a combination of mindmap and flow chart format, this should show how all the different parts rely on each other (try to include everything, data, scripts, servers). Even though you may not want to build all the elements of the system straight away, this should help you design your data structure so that you can implement it when you have time.

I'm still learning so this could be dodgy advice, and everyone learns differently so I guess it could be better for you to chase quicker results and build on that success but personally I'm too lazy to build something and then rebuild it again in future.

DM
Logged
leadegr00t
n00b
*
Offline Offline

Posts: 8



View Profile WWW
« Reply #8 on: November 02, 2007, 07:50:30 PM »

Ideally you have a lookup table for category, and a lookup table for seed.
In practice, in this example, they are only going to hold the category name, so they aren't really adding any value.
*So* to keep it simple for your learning curve, I would use the single table I suggested.

then you can do:
SELECT * FROM table WHERE category = 'bankloans'
and
SELECT * FROM table WHERE seed = 'something'

The other thing to do (when you get to the PHP part Smiley) is to wrap your calls in functions so that you can change your db design easily as it grows more sophisticated (like, you will add counts and dates at some time, and you might eventually pull keywords out to a separate table).
This will mean you only need to change the PHP side in one spot.

Simple but expandable will get you out the door quickly but not be bad to 'upgrade' later Smiley
Logged

No links in signatures please
gnarlyhat
Journeyman
***
Offline Offline

Posts: 51



View Profile WWW
« Reply #9 on: November 03, 2007, 12:10:13 AM »

I'll take that before I change my mind. Don't think I would get any work time till Monday ... I hate doing work and end up interrupted when junior cries and mummy starts to bitch about me not spending enough time with them yadda yadda yadda. Thanks all  I need to roll one now.
Logged
gnarlyhat
Journeyman
***
Offline Offline

Posts: 51



View Profile WWW
« Reply #10 on: November 03, 2007, 06:29:29 PM »

One question. Is there a limit to the number of rows in a table? TIA
Logged
nutballs
Administrator
Lifer
*****
Offline Offline

Posts: 5627


Back in my day we had 9 planets


View Profile
« Reply #11 on: November 03, 2007, 08:08:03 PM »

i think the limit in myISAM is 32 bit, and inno is 64, both unsigned. Not positive.
so 4294967295 and 18446744073709551615 respectively.
but WAY before that, you are most likely running into hardware issues.
Logged

I could eat a bowl of Alphabet Soup and shit a better argument than that.
gnarlyhat
Journeyman
***
Offline Offline

Posts: 51



View Profile WWW
« Reply #12 on: November 03, 2007, 08:51:40 PM »

I don't know what those two are but the lowest number sounds more than enough for me. Thanks nutballs Smiley
Logged
perkiset
Olde World Hacker
Administrator
Lifer
*****
Offline Offline

Posts: 10096



View Profile
« Reply #13 on: November 03, 2007, 10:46:41 PM »

I don't know what those two are ...

That would be a bazillion and a gazillion.
Logged

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
gnarlyhat
Journeyman
***
Offline Offline

Posts: 51



View Profile WWW
« Reply #14 on: November 04, 2007, 06:23:05 PM »

I meant myISAM and inno. But nonetheless now I know what's a bazillion and a gazillion ... would probably forget it after 24 hours.

After thinking about it the whole weekend (yeah i know it's lame), I've decided to revert back to flat file for my keywords but what I learnt from mySQL tutorials will definitely be used in my future modules. The reason being, I do share my keyword lists with other desktop programs I use and they only accept them in .txt files. I know it's possible to export from MySQL to .txt but it's even easier to upload and download a bunch of preformatted txt files via FTP. I can't imagine how I would import my existing lists (about 500+ lists) into mySQL in a easy way rather than upload everything via ftp in a dir and write a script to import everything into mySQL ... and if I need to grab the lists in txt I would have to write another script to export it into txt and then download.

I've stalled this long enough and really need to get moving before I loose interest and end up doing nothing. Thanks for all your help though. I would definitely use mySQL on my next module onwards ... Domain/Sub domain creator. Thanks once again.
Logged
Pages: [1]
  Print  
 
Jump to:  

Perkiset's Place Home   Best of The Cache   phpMyIDE: MySQL Stored Procedures, Functions & Triggers
Politics @ Perkiset's   Pinkhat's Perspective   
cache
mart
coder
programmers
ajax
php
javascript
Powered by MySQL Powered by PHP Powered by SMF 1.1.2 | SMF © 2006-2007, Simple Machines LLC
Seo4Smf v0.2 © Webmaster's Talks


Valid XHTML 1.0! Valid CSS!