The Cache: Technology Expert's Forum
 
Author Topic: Keyword Management  (Read 2420 times)
DangerMouse
« on: July 10, 2008, 07:50:31 AM »

Hi all,

I know this has been discussed before, but I couldn't really find exactly what I was looking for.

Broadly, I'm looking for the best way to manage keywords within a database, and it's proving more challenging to work out than I first thought it would!

Assumptions:

Projects
- Independent of sites; each relates to a topic / sector.

Category
- A category of keyword; I want to split my keywords up by a prescribed 'type'. These types will be finite and defined. A keyword may only be of one type.

Tag
- As I'm limiting myself to allowing keywords to fall under only one 'type', I need another method of assigning attributes to them, thus a many-to-many 'tag' relationship. This will be for things like tagging a keyword as appropriate for a specific demographic, for example (old people versus young people!).

Sites - pretty obvious.

I was originally getting confused over the relationship between 'sites', 'keywords' and 'projects', but typing this I think it may have become a little clearer! I'd still appreciate any views though: despite knowing the theory, I'm quite inexperienced at managing the process on a large scale, and I know many of you guys operate on a very large scale! ;)

Does it make sense to have 'projects' as an umbrella for both 'sites' and 'keywords'? - i.e. a keyword belongs to a project, not a site and sites belong to projects. So the relationship between sites and keywords is indirect. Does this make sense from the point of view of maintaining referential integrity?

I should say that this system will be linked to a SERP tracker. As a result of this I only want unique keywords to be stored.
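
One possible reading of the assumptions above, sketched in Python with the standard-library sqlite3 module. All table and column names are hypothetical, and treating the keyword phrase as globally unique (rather than unique per project) is an assumption about what "unique keywords" means here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE projects   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE keywords (
    id          INTEGER PRIMARY KEY,
    project_id  INTEGER NOT NULL REFERENCES projects(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),  -- exactly one type
    phrase      TEXT NOT NULL UNIQUE                         -- assumption: unique overall
);
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE keyword_tags (                                  -- many-to-many link
    keyword_id INTEGER NOT NULL REFERENCES keywords(id),
    tag_id     INTEGER NOT NULL REFERENCES tags(id),
    PRIMARY KEY (keyword_id, tag_id)
);
""")

def add_keyword(conn, project_id, category_id, phrase):
    """Insert a keyword; return False if a constraint (e.g. UNIQUE phrase) rejects it."""
    try:
        conn.execute(
            "INSERT INTO keywords (project_id, category_id, phrase) VALUES (?, ?, ?)",
            (project_id, category_id, phrase),
        )
        return True
    except sqlite3.IntegrityError:
        return False

def tags_for(conn, keyword_id):
    """Return the tag names attached to one keyword, sorted by name."""
    rows = conn.execute(
        "SELECT t.name FROM tags t JOIN keyword_tags kt ON kt.tag_id = t.id "
        "WHERE kt.keyword_id = ? ORDER BY t.name",
        (keyword_id,),
    )
    return [row[0] for row in rows]

# Sample data (made up for illustration).
conn.execute("INSERT INTO projects (name) VALUES ('jewelry')")
conn.execute("INSERT INTO categories (name) VALUES ('buying query')")
add_keyword(conn, 1, 1, "buy silver bracelet")
conn.executemany("INSERT INTO tags (name) VALUES (?)", [("young",), ("old",)])
conn.executemany(
    "INSERT INTO keyword_tags (keyword_id, tag_id) VALUES (?, ?)", [(1, 1), (1, 2)]
)
```

With this layout a SERP tracker could read straight from the keywords table, since each phrase is stored once; the trade-off is that a phrase cannot be shared between projects.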

How would you go about implementing a campaign system, possibly for PPC purposes? This brings in the idea of a keyword 'list', which may be generated using the category and tagging system. It would obviously be beneficial to store conversion data against keywords when they've been used for campaigns. I guess this data is linked not to the keyword but to the keyword plus its 'properties'?

I feel like I'm rambling now so will shut up! I'd really appreciate any opinions, criticisms.

Cheers,

DM

BTW - this isn't for a BH project, I'm just trying to create a robust system.

BTW2 - how would you handle stemming? Well, maybe not stemming, but where you tack on common words, e.g. for localisation. Should these be stored as keywords in their own right? Maybe so if the same list will be used for SERP monitoring... :s
perkiset
« Reply #1 on: July 10, 2008, 09:22:43 AM »

Hey DM -

Man, there's a huge amount of thinking to do regarding how you want to lay this out. I'd like to offer how I marshal my thoughts, rather than a straight answer - perhaps this will help.

My personal way of working through this kind of thing is to build an ownership / "knowledge of" tree, much like building an OO hierarchy. For example, looking at your plan, a project has sites, sites don't have projects - ergo, projects is the master and sites are the detail in that relationship. So from a DB-building perspective I'd have ID as an auto-incrementing integer in projects and then reference project_id in the sites table.
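
A minimal sketch of that master/detail relationship, using Python's standard-library sqlite3 module rather than MySQL. The `name` and `domain` columns and the sample values are assumptions added for illustration; only `id` and `project_id` come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE projects (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- master: auto-incrementing ID
    name TEXT NOT NULL
);
CREATE TABLE sites (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    project_id INTEGER NOT NULL REFERENCES projects(id),  -- detail points at master
    domain     TEXT NOT NULL
);
""")

# The project row is created first; the site then references its generated ID.
project_id = conn.execute(
    "INSERT INTO projects (name) VALUES (?)", ("jewelry",)
).lastrowid
conn.execute(
    "INSERT INTO sites (project_id, domain) VALUES (?, ?)",
    (project_id, "rings.example"),
)

def orphan_site_rejected(conn):
    """Try to insert a site whose project does not exist; True if the DB refuses."""
    try:
        conn.execute(
            "INSERT INTO sites (project_id, domain) VALUES (?, ?)",
            (999, "orphan.example"),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Because the foreign key lives on the detail table, a site cannot exist without a project, while a project can exist with no sites yet.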

Keywords are part of a category, which is an effort in a site. I get the feeling you're worried about duplicating keywords in the DB, so you want to build a many-to-many relationship here, but that will cause you troubles. The amount of overlap or duplicated bytes in keywords showing up in two different categories will be small (relatively speaking) so I'd waste the disk space in the name of clarity.

Sites have traffic, but traffic has its own hierarchy. A surfer IS A human or a spider. A spider is a member of an engine. A surfer has a referrer, but referrers should be grouped up for reporting. So looking so far:

Projects are your root entity
* Sites are part of projects.
* Sites use categories
* Categories have keywords
* Sites see spiders
* Spiders are members of engines
* Sites see surfers
* Surfers can be distilled into hourly/daily traffic integers
* Surfer referrers can be placed into another table and referenced as a foreign key

This is a common challenge and one that many here have fought with a lot, I'm sure. I'm pretty set on my schema, finally, after a lot of trial and error over the last 13 years of web work. Hope this helps.

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
DangerMouse
« Reply #2 on: July 10, 2008, 10:13:12 AM »

Quote from: perkiset
"Man, there's a huge amount of thinking to do regarding how you want to lay this out. I'd like to offer how I marshal my thoughts, rather than a straight answer - perhaps this will help.

My personal way of working through this kind of thing is to build an ownership / "knowledge of" tree, much like building an OO hierarchy."

I'm certainly finding it a lot to think about! The reason it's cropped up now is that I've just changed my job to do eMarketing full time (from project management), so I now have the excuse I've been waiting for to draw all my knowledge into one place. The problem, however, turns out to be almost knowing too much about the different nuances etc.

Great tip there with the "has knowledge of..." thought process, that's got me thinking more about the relationships between my objects. I'm not too worried about the database as I'll factor that out to a different code layer and map the results back and forth.

Quote from: perkiset
"Keywords are part of a category, which is an effort in a site. I get the feeling you're worried about duplicating keywords in the DB, so you want to build a many-to-many relationship here, but that will cause you troubles. The amount of overlap or duplicated bytes in keywords showing up in two different categories will be small (relatively speaking) so I'd waste the disk space in the name of clarity."

I'm not so worried about the overhead of duplicating keywords, but I do want to track the SERPs for most of the keywords, and I don't want to be scraping twice. Maybe some kind of find-distinct query, followed by an update that links the result pages to all relevant keyword rows, would sort this though?

When I was thinking of categories, I was thinking in terms of 'type' of keyword, so I'm not sure if they're necessarily related to site, but more to project. I was thinking in terms of "Information query", "Buying query" etc as categories?

When you model these things in your code do you usually represent each element as an object? Creating a class for a 'project' when it just has a name and acts as a container seems a little overkill, although it is a logical representation.

Thanks for the tips.

DM
perkiset
« Reply #3 on: July 10, 2008, 11:05:53 AM »

Quote from: DangerMouse
"Great tip there with the "has knowledge of..." thought process, that's got me thinking more about the relationships between my objects. I'm not too worried about the database as I'll factor that out to a different code layer and map the results back and forth."

Excellent - the DB mechanics must be secondary to the schema.

Quote from: DangerMouse
"I'm not so worried about the overhead of duplicating keywords, but I do want to track the SERPs for most of the keywords, and I don't want to be scraping twice. Maybe some kind of find-distinct query, followed by an update that links the result pages to all relevant keyword rows, would sort this though?"

Perhaps an aggregation table or view here. If you grab a distinct keyword from the keywords table, you could then go backwards up into the hierarchy to find the sites/projects that make use of <this keyword>. This is a little more convoluted, but once you've got the mechanics of it down you can maintain a proper IS A/Has A hierarchy yet get the answers you are looking for. So if you scrape for keyword <x> and then use this method to look up into the DB for sites that make use of it, you can scrape just once.
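
The scrape-once idea could be sketched roughly as follows, again in Python with sqlite3. The schema is one guess at the "sites use categories, categories have keywords" hierarchy; the table names, the sample rows and the `fetch_serp` callable are all hypothetical stand-ins, not anything from a real tracker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites      (id INTEGER PRIMARY KEY, domain TEXT NOT NULL);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE site_categories (
    site_id     INTEGER NOT NULL REFERENCES sites(id),
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
CREATE TABLE keywords (
    id          INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES categories(id),
    phrase      TEXT NOT NULL            -- duplicates across categories allowed
);
INSERT INTO sites VALUES (1, 'rings.example'), (2, 'gems.example');
INSERT INTO categories VALUES (1, 'jewelry'), (2, 'gemstones');
INSERT INTO site_categories VALUES (1, 1), (2, 2);
INSERT INTO keywords (category_id, phrase)
    VALUES (1, 'amber'), (1, 'silver'), (2, 'amber');
""")

def distinct_phrases(conn):
    """Each phrase once, however many categories repeat it."""
    rows = conn.execute("SELECT DISTINCT phrase FROM keywords ORDER BY phrase")
    return [row[0] for row in rows]

def sites_using(conn, phrase):
    """Walk back up the hierarchy: keyword -> category -> site."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.domain
        FROM keywords k
        JOIN site_categories sc ON sc.category_id = k.category_id
        JOIN sites s ON s.id = sc.site_id
        WHERE k.phrase = ?
        ORDER BY s.domain
        """,
        (phrase,),
    )
    return [row[0] for row in rows]

def scrape_all(conn, fetch_serp):
    """Call fetch_serp once per distinct phrase and fan the result out to sites."""
    results = {}
    for phrase in distinct_phrases(conn):
        serp = fetch_serp(phrase)  # hypothetical scraper, supplied by the caller
        for domain in sites_using(conn, phrase):
            results[(domain, phrase)] = serp
    return results
```

Here 'amber' appears under two categories but would be fetched only once, with the same result attached to both sites.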

Quote from: DangerMouse
"When I was thinking of categories, I was thinking in terms of 'type' of keyword, so I'm not sure if they're necessarily related to site, but more to project. I was thinking in terms of "Information query", "Buying query" etc as categories?"

... which is why, IMO, sites should make use of categories, which contain keywords. However, in this case, "buying query" is not very helpful at first, I think... I thought you meant things like cat:jewelry -> keywords:gold,silver,turquoise,amber,green amber,bracelet,necklace etc. Because why would you categorize "silver" in a buying category and not "amber"?

Quote from: DangerMouse
"When you model these things in your code do you usually represent each element as an object? Creating a class for a 'project' when it just has a name and acts as a container seems a little overkill, although it is a logical representation."

Interesting that you ask this - I did when my frameworks were all compiled systems. It made a lot of sense to do this because I could pull data down, keep it live and pass it around to lots of handling functions and such. But with scripted code (MySQL stored procs, PHP, JavaScript, Ajax) I find that less is more, so I don't do it nearly as often. In some cases I do - for example, my shopping cart is a class that keeps its data in a serialized array in the session; when I need to make use of the data, I unserialize it into an object. But for most jobs, I do more straight-up CRUD database work. As I mentioned in your other thread though, I am starting to use call thisProc() a lot more as I move DB-specific logic to stored procedures, functions, triggers and views. In some ways, I view the DB, PHP Renderer and Client as sort of "macro objects" that each have a pretty large property and method set. This way of thinking helps me decide where logic belongs.
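
The cart-in-session pattern mentioned above might look something like this. It is a Python illustration rather than the PHP described in the post; the `Cart` class, its methods and the plain dict standing in for a session are all assumptions made for the sketch.

```python
import json

class Cart:
    """Hypothetical cart that lives in the session as a serialized string."""

    def __init__(self, items=None):
        self.items = dict(items or {})  # sku -> quantity

    def add(self, sku, qty=1):
        self.items[sku] = self.items.get(sku, 0) + qty

    def save(self, session):
        # Only the serialized data is stored, not the object itself.
        session["cart"] = json.dumps(self.items)

    @classmethod
    def load(cls, session):
        # Rebuild an object from the stored data, or start with an empty cart.
        raw = session.get("cart")
        return cls(json.loads(raw)) if raw else cls()
```

Usage would be along the lines of `cart = Cart.load(session)`, then `cart.add("ring", 2)` and `cart.save(session)` at the end of the request, so the object only exists while it is actually needed.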
Bompa
« Reply #4 on: July 10, 2008, 07:11:52 PM »

3x5 cards

;)


"The most beautiful and profound emotion we can experience is the sensation of the mystical..." - Albert Einstein