Great tip there with the "has knowledge of..." thought process; that's got me thinking more about the relationships between my objects. I'm not too worried about the database, as I'll factor that out to a different code layer and map the results back and forth.
Excellent - the DB mechanics must be secondary to the schema.
I'm not so worried about the overhead of duplicating keywords, but I did want to track the SERPs for most of the keywords, and I don't want to be scraping twice. Maybe some kind of find-distinct query, followed by an update of the relationship to result pages across all relevant keyword rows, would sort this though?
Perhaps an aggregation table or view here. If you grab a distinct keyword from the keywords table, you could then go backwards up the hierarchy to find the sites/projects that make use of <this keyword>. This is a little more convoluted, but once you've got the mechanics of it down you can maintain a proper Is A/Has A hierarchy yet still get the answers you are looking for. So if you scrape for keyword <x>, then use this method to look up the sites in the DB that make use of it, you can scrape just once.
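To make the "scrape once, then look back up the hierarchy" idea concrete, here is a minimal sketch in Python with an in-memory SQLite database. Everything named here is an assumption for illustration only: the `sites` and `keywords` tables, the `distinct_keywords` view, and the `fake_scrape` stand-in are hypothetical, and the hierarchy is flattened to sites -> keywords (no projects or categories) to keep it short.

```python
import sqlite3

# Hypothetical schema: keywords are duplicated per site, as in the thread.
SCHEMA = """
CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE keywords (
    id INTEGER PRIMARY KEY,
    site_id INTEGER NOT NULL REFERENCES sites(id),
    phrase TEXT NOT NULL
);
-- One row per distinct phrase, however many sites repeat it.
CREATE VIEW distinct_keywords AS
    SELECT DISTINCT phrase FROM keywords;
"""


def fake_scrape(phrase):
    """Stand-in for a real SERP scrape; returns a placeholder string."""
    return f"results for {phrase}"


def scrape_once_per_keyword(conn, scrape=fake_scrape):
    """Scrape each distinct phrase once, then fan the result out to its sites."""
    per_site = {}
    calls = 0
    phrases = [
        row[0]
        for row in conn.execute(
            "SELECT phrase FROM distinct_keywords ORDER BY phrase"
        )
    ]
    for phrase in phrases:
        result = scrape(phrase)
        calls += 1
        # Walk back up the hierarchy: which sites use this phrase?
        rows = conn.execute(
            "SELECT s.name FROM sites s "
            "JOIN keywords k ON k.site_id = s.id "
            "WHERE k.phrase = ? ORDER BY s.name",
            (phrase,),
        )
        for (site_name,) in rows:
            per_site.setdefault(site_name, {})[phrase] = result
    return per_site, calls


def demo():
    """Two sites sharing the keyword 'gold'; four keyword rows in total."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sites (id, name) VALUES (?, ?)",
        [(1, "site-a"), (2, "site-b")],
    )
    conn.executemany(
        "INSERT INTO keywords (site_id, phrase) VALUES (?, ?)",
        [(1, "gold"), (1, "silver"), (2, "gold"), (2, "amber")],
    )
    return scrape_once_per_keyword(conn)
```

In this sketch the keyword rows stay duplicated per site, and the view is what collapses them before scraping. In a real setup you would probably persist the scraped result against the phrase rather than return it in a dict.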
When I was thinking of categories, I was thinking in terms of 'type' of keyword, so I'm not sure if they're necessarily related to site, but more to project. I was thinking in terms of "Information query", "Buying query", etc. as categories?
... which is why, IMO, sites should make use of categories, which contain keywords. However, in this case, "buying query" is not very helpful at first, I think. I thought you meant things like cat:jewelry -> keywords:gold,silver,turquoise,amber,green amber,bracelet,necklace etc. Because why would you categorize "silver" in a buying category and not "amber"?
When you model these things in your code, do you usually represent each element as an object? Creating a class for a 'project' when it just has a name and acts as a container seems a little overkill, although it is a logical representation.
Interesting that you ask this - I did when my frameworks were all compiled systems. It made a lot of sense to do this because I could pull data down, keep it live and pass it around to lots of handling functions and such. But with scripted code (MySQL stored procs, PHP, JavaScript, Ajax) I find that less is more, so I don't do it nearly as often. In some cases I do - for example, my shopping cart is a class that keeps its data in a serialized array in the session; when I need to make use of the data, I unserialize it into an object. But for most jobs, I do more straight-up CRUD database work. As I mentioned in your other thread though, I am starting to use call thisProc() a lot more as I move DB-specific logic to stored procedures, functions, triggers and views. In some ways, I view the DB, PHP Renderer and Client as sort of "macro objects" that each have a pretty large property and method set. This way of thinking helps me decide where logic belongs.
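For what it's worth, the cart-in-session pattern can be sketched roughly like this. It is written in Python rather than PHP, so treat it as an analogy only: a plain dict stands in for the session, JSON stands in for serialize/unserialize, and the `Cart` class with its `add`, `save` and `load` methods is hypothetical, not the actual cart code.

```python
import json


class Cart:
    """Cart whose state lives as a serialized string in a session store."""

    SESSION_KEY = "cart"  # hypothetical key name

    def __init__(self, items=None):
        # Maps a product identifier (sku) to a quantity.
        self.items = dict(items or {})

    def add(self, sku, qty=1):
        """Add qty units of sku, accumulating with any existing quantity."""
        self.items[sku] = self.items.get(sku, 0) + qty

    def save(self, session):
        """Serialize the cart contents into the session."""
        session[self.SESSION_KEY] = json.dumps(self.items, sort_keys=True)

    @classmethod
    def load(cls, session):
        """Rebuild a Cart object from the session, or start an empty one."""
        raw = session.get(cls.SESSION_KEY)
        return cls(json.loads(raw)) if raw else cls()


# Usage: the session only ever holds a string; the object is rebuilt on demand.
session = {}
cart = Cart.load(session)
cart.add("amber-necklace", 2)
cart.add("amber-necklace")
cart.save(session)
restored = Cart.load(session)
```

The object only exists while a request needs it; between requests all that persists is the serialized data.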