Project 0.5: Converting file based to MySQL

To The Live Thread

Thread: Project 0.5: Converting file based to MySQL

Back to category: Database Stuff

gnarlyhat

Hehe ... Before I can start Project 1.0 I need to

learn

how to use MySQL instead of a file.

I will be doing a simple Wordze API script and then store it into MySQL. First I would need to build me a table to store the list.

Being a complete noob at this, after reading thru a

tutor

ial ... I figured ... I need to specify the design the database first.

I would want to store my keyword lists into separate categories so it's easy to manage them. Say ... MFA keywords, Adult Keywords, etc.

In those of course I have my seed keyword and a huge list of keywords ... may reach 20K lines at times and I reckon 100K when I go crazy into

typo

s and shit.

Again ... please correct me if you think there's a better way of doing this. I will create a table each for the category.

TABLE MFA
id INT
seed VARCHAR
keywords TEXT

TABLE ADULT
id INT
seed VARCHAR
keywords TEXT

My only worry is ... will the TEXT field be able to hold that much keywords for me. As I mentioned ... somewhere between 20K to 100K lines.

TIA Applause

leadegr00t

Hey there!

Not sure you are approaching the subject correctly,.
Yes, design the db first (its a 'measure twice cut once' thing

) but I would have expected that each keyword would be one row in the db.
Perhaps a structure like this:
id (primary key, auto_increment)
keyword
category
seednum

so each keyword would ap

pear

once (ignoring a keyword ap

pear

ing in more than one cat)
categories would repeat frequently
and seed would be only a few variations across the rows.

does that help?

(Possibly first post, not sure - hi guys!)

DangerMouse

Hi gnarly,

Personally I'm a firm believer in standards driven code and structures, as such I'd suggest reading up on database normalisation before defining your structure - the rules may seem quite complex at first, but they will hep create an efficient model thats flexible and extensible. Similarly the SQL required may at first seem to be more complex, but using a full relational structure will pay dividends when you come to do more complex queries.

I would consider a structure like this:

<>tbl_category
- id [Autoincrement]
- category

<>tbl_keyword
- id [Autoincrement]
- keyword
- daily_searches
- other info directly related to keyword

<>tjx_category_keyword
- id [Autoincrement]
- fk_Category
- fk_Keyword

Basically this illustrates the many-to-many relationship between keywords and categories, a keyword can be in many categories, and a category can have multiple keywords. The junction table tjx_category_keyword stores this relationship based on the id numbers from the category and keyword tables (foreign keys).

Having never created a keyword DB I dont really know whether this serves as the most pragmatic solution for what your trying to achieve, plus it is probably a little too simplistic when you consider the relationship between keywords and keyword phrases but hopefully it stresses the importance of solid relational modelling.

Although the SQL to do this will seem like a baptism of fire at first there are tons of good MySQL syntax examples on the web so it shouldnt take too long to pick up.

Forgive me if i've miss read this and your previous post (assuming your new to DB design rather than just MySQL). Hope this helps a little.

DM

nutballs

go with DMs design, if you think you understand the concepts.

what he has outlined is a 'normalized' database design. You store only data in 'datatables' and relationships in 'lookup' or 'reference' tables. You store in each table information only directly related to the subject of that table.
Addresses are an example.
street,city,state. In and normalized database structure, the belief is that it would be silly to store all 4 of those in 1 table, since many streets exist in the same city. and many cities exist in the same state, resulting in a lot of repetition, such as the same city being repeated over and over. This makes the db unnecessarily bigger.

Fully normalize puts each type of data into its own table, and doesnt store the references in that table. So you would have a table for streets, a table for cities, and a table for states. Then you would create all the relationships in a relationship table by referencing the ID columns of each of those tables. Thus, fk_streetID, fk_cityID, fk_stateID. (fk stands for foriegn key, as opposed to PK, which is the primary key of a table like the ID column). So in the lookuptable you would store in each row, lookupID-fk_streetID-fk_cityID-fk_stateID. Personally, i feel this is overnormalized, and I would store the city and state FKids in the street table, but its easier to understand all broken apart.

Normalizing does not mean breaking EVERYTHING up though. Username and Password would not make sense to break up, since you could assume everyone will have a different password, and everyone will definitely have a different username.
Firstname,lastname, date of birth is the same thing, breaking it up doesnt get you anywhere.

keywords and categories, break it up. 1 keyword may have many categories AND 1 category will have many keywords. So normalize.

if you dont want to have to deal with JOINs, which is what you would need to do for a normalized database, you can just go flat.
keyword-category-whateverelseyouwanttostore.
but that results in a lot of redundancy, and bigger problems potentially, later on when you want to add columns.

perkiset

leadegr00t
Applause

dangermouse
Applause

nutballs

and welcome to the Cache leadegr00t!

This is really the way of things. You should really stop typing, read this stuff and get into a beginners MySQL book immediately. The small hiccup time in this beginning will pay off BIG BIG time in the very near future.

You should listen to these boys, they're all pre-med.

/p

gnarlyhat

That went a little way over my head. I should have paid a lot more attention in Database class. I really hated that part. Normalization ... didn't realize it would come back haunting me. To make it simple ... let me explain how I was doing it with flat files.

/MFA(category)/bank-loans(seed).txt <- 5000 bank loan keywords
/MFA/bad-credit.txt <- 4700 bad credit keywords

/Adult/free-porn.txt <- 8000 free porn keywords
/Adult/gay-porn.txt <- 5000 gay porn keywords

You get my drift. As far as keywords go ... for now I don't see any of them overlapping into another category but I can't say for sure in the future. I might want to categorize them for different affiliates so there is a possibility of them overlapping.

I would of course want to do things right but it might really hinder my

learn

ing process as I find that I need to

learn

more and more as I take each step. I can't say I'm a complete

programming

noob as I've done dumb old pascal back in the school days but can't even remember how pascal code look like now. So ... I'm a noob who forgot everything I

learn

t or rather didn't pay much attention.

My main goal is to

learn

PHP

so I can automate stuff which I doing manual now as I really see a limit to what I can do purely by hand. This is why I came here and set myself to

learn

PHP

by hook or by crook. I've attempted a few times before this and quite frankly gave up halfway.

This is also why I'm documenting everything here so I can look back at my progress and it would probably help other newbies in return. As I've started Project 1.0 and

learn

t that I need to use MySQL to store my data, I had to backtrack and create Project 0.5 (this one) and after asking questions about database design, I might have to backtrack again and start Project 0.25.

I have to put a stop to this or else I would run out of decimals for my project and I'd never get 1.0 done(which is only 20% of what I plan). For now all I can see is the possibility of the same keyword on different categories. I won't be storing any traffic data linked to my keywords .... YET (damn ... I'm never gonna get started)

I might want to store keyword data but I'm not sure if it would be within this same database. I want to keep track of performing keywords and reuse them again and again since they've been proven but I would have to leave this thought out of this picture first as I really need to start.

Is this below logical or I should just create one table with id, keyword and category? Since I only have 2 real fields, I think the reference table is redundant. Please correct me if I'm wrong.

tbl_category
- id [Autoincrement]
- category

tbl_keyword
- id [Autoincrement]
- keyword

tjx_category_keyword
- id [Autoincrement]
- fk_Category
- fk_Keyword

gnarlyhat

Damn I totally forgot about my seed field. I think leadergr00t kinda hit it with his.

id (primary key, auto_increment)
category
seed
keyword

Is this ok or should I have a reference table?

Or fish all and go back to normal .txt files for my keyword lists management and then go MySQL for my Domain/Subdomain Creator, Mass Installer, Multi Spammer & Site Stats/Rank Checker ??

DangerMouse

I know where your coming from, the scale does seem quite daunting (thats possibly why I've got round to very little activity and am still

learn

ing !) but I would still argue that its easier to get it right now than fix your system in future. Sometimes something is so fundamental (keywords come into this category imo) that a quick fix isnt the way to go, plus it makes sense to not to develop any bad habits to start with. In the long run your actually saving time, while building a more powerful system.

quote author=gnarlyhat link=topic=601.msg4026#msg4026 date=1194021164

You get my drift. As far as keywords go ... for now I don't see any of them overlapping into another category but I can't say for sure in the future. I might want to categorize them for different affiliates so there is a possibility of them overlapping.

For example, I would say that the two examples you mention do overlap, they are both adult keywords - "free gay porn" falls under both headings. Thats where the "junction" table comes in.

You seem to have a vision for the future of your system, the best approach i can suggest is to grab a big bit of paper and draw your perfect system in a combination of mindmap and flow chart format, this should show how all the different parts rely on each other (try to include everything, data, scripts, servers). Even though you may not want to build all the elements of the system straight away, this should help you design your data structure so that you can implement it when you have time.

I'm still

learn

ing so this could be dodgy advice, and everyone

learn

s differently so I guess it could be better for you to chase quicker results and build on that success but personally I'm too lazy to build something and then rebuild it again in future.

DM

leadegr00t

Ideally you have a lookup table for category, and a lookup table for seed.
In practice, in this example, they are only going to hold the category name, so they aren't really adding any value.
*So* to keep it simple for your

learn

ing curve, I would use the single table I suggested.

then you can do:
SELECT * FROM table WHERE category = 'bankloans'
and
SELECT * FROM table WHERE seed = 'something'

The other thing to do (when you get to the

PHP

part

) is to wrap your calls in functions so that you can change your db design easily as it grows more sophisticated (like, you will add counts and dates at some time, and you might eventually pull keywords out to a separate table).
This will mean you only need to change the

PHP

side in one spot.

Simple but expandable will get you out the door quickly but not be bad to 'upgrade' later

gnarlyhat

I'll take that before I change my mind. Don't think I would get any work time till Monday ... I hate doing work and end up interrupted when junior cries and mummy starts to bitch about me not spending enough time with them yadda yadda yadda. Thanks all Applause

I need to roll one now.

gnarlyhat

One question. Is there a limit to the number of rows in a table? TIA

nutballs

i think the limit in myISAM is 32 bit, and inno is 64, both unsigned. Not positive.
so 4294967295 and 18446744073709551615 respectively.
but WAY before that, you are most likely running into hardware issues.

gnarlyhat

I don't know what those two are but the lowest number sounds more than enough for me. Thanks nutballs

perkiset

quote author=gnarlyhat link=topic=601.msg4057#msg4057 date=1194148300

I don't know what those two are ...

That would be a bazillion and a gazillion.

gnarlyhat

I meant myISAM and inno. But n

onet

heless now I know what's a bazillion and a gazillion ... would probably forget it after 24 hours.

After thinking about it the whole weekend (yeah i know it's lame), I've decided to revert back to flat file for my keywords but what I

learn

t from mySQL

tutor

ials will definitely be used in my future modules. The reason being, I do share my keyword lists with other desktop programs I use and they only accept them in .txt files. I know it's possible to export from MySQL to .txt but it's even easier to upload and download a bunch of preformatted txt files via FTP. I can't imagine how I would import my existing lists (about 500+ lists) into mySQL in a easy way rather than upload everything via ftp in a dir and write a script to import everything into mySQL ... and if I need to grab the lists in txt I would have to write another script to export it into txt and then download.

I've stalled this long enough and really need to get moving before I loose interest and end up doing nothing. Thanks for all your help though. I would definitely use mySQL on my next module onwards ... Domain/Sub domain creator. Thanks once again.

Thread Categories

		Best of The Cache Home
		Search The Cache