DangerMouse

Hello there,

I'm in the middle of creating my own spider system, and have taken great inspiration from Perk's spider class. I'm planning on adding a few things to it - simple SEO analysis type things ... Yahoo backlinks, PageRank, that sort of stuff.

I'm trying to model these things in terms of objects to correctly define my classes. I can understand how a "webpage" is an object - it has simple properties like title and meta tags, amongst plenty of other things. I guess my question is: would something like "pagerank", and the associated methods to acquire it, be an object in its own right, or part of maybe an "offsite factors" class?

More generally, I guess I'm asking to what degree it's sensible to break things down into objects - how low-level should you go?

It's a shame that all applications can't be as simple as the "dog" that's "brown" and capable of "barking". Tips appreciated.

Cheers,

DM

perkiset

Hey DM -

That's as broad a question as, "How should we build a building? I'm thinking it should have windows, and walls are good as well."

Sorry, not making fun, just pointing out that in OO thinking there is very little notion of "Right and Wrong." One of the beauties of OO thinking and architecture is that it can help you model your programmatic architecture around the way you perceive life/things around you.

For example: if I were to think about scrapers and an object hierarchy, it'd probably be something like this:
perksBaseClass
    requestBase
        httpRequest
            serpBase
                googleSerps
                yahooSerps
                askSerps
            scrapeBase
                blogScraper
        socketRequest
            secureSocketRequest
                merchantProcessingRequest
                    linkPointRequest

I've added WAY more than you are asking so that you can see the way I'd structure it - bear in mind that this is a quick notion and not well thought out. Another unfortunate truism in OO programming is that the 3rd time you write it is when you'll get it right...

Part of what you'll see in my hierarchies is levels where I MIGHT need to add stuff. Take, for example, the perksBaseClass - I like having the ability to add something to the entire family if I need to. Also, the names of the objects tend to outline what they will add to the family.

Although some might argue that this adds a lot of code to the compile process, you can see where I employ APC here. Here's the juice: if the code for a class is already compiled and ready to become an object, then you can have massively intricate and deep structures that work the way you think, and pay virtually no penalty for them. The VERY FIRST request to the system will pay it, but after that, all the complexity of your class trees is ready to go - and they'll have a huge amount of capability available to them at essentially no processor cost.
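To make that concrete, here's a very rough PHP 5 sketch of the top of such a tree - class names taken from the outline above, but the methods are hypothetical placeholders, not perk's actual code:

<?php
// Very rough sketch: class names from the outline above, methods are
// hypothetical placeholders rather than perk's real code.

class perksBaseClass {
    protected $log = array();
    // Anything added at this level is instantly available to the entire family.
    public function logMsg($msg) { $this->log[] = $msg; }
}

class requestBase extends perksBaseClass {
    protected $host;
    public function setHost($host) { $this->host = $host; }
}

class httpRequest extends requestBase {
    public function fetch($path) {
        $this->logMsg("GET http://{$this->host}{$path}");
        // Simplistic stand-in for a real HTTP layer.
        return file_get_contents('http://' . $this->host . $path);
    }
}

// serpBase, googleSerps etc. keep extending downward from here, each layer
// adding only what its name suggests.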

Just a quick Applause
/p

DangerMouse

That actually helps quite a bit, thanks Perk - and you're right, the question was a little broad, lol!

To a degree I've probably been getting caught up in the semantics of it all. While it's a good idea to code to standards, part of the beauty of PHP is its flexibility; maybe I should think more about how I'll want to use and extend my code (and objects) in future, rather than breaking things down for the sake of it.

What started this off was reading a blog comment about class architecture being used as just a different format for procedural code, rather than providing a true OOP implementation. It got me thinking that to a degree this is what I've been doing with my little projects so far (a Yahoo! Answers API wrapper, for example). I'm trying to start thinking of classes as object blueprints rather than library components - straightforward in theory, but it seems to become a grey area in practice!

The idea of building objects on top of a Web Request object appeals. Currently I've been instantiating an instance of my web request class within the class definition for various scraper-type objects, but like you say, it would be more efficient to just extend the web request, as it's essentially the base element of what's going on.
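Roughly the difference being talked about, with hypothetical class names:

<?php
// Hypothetical classes, just to contrast the two approaches.

class webRequest {
    public function fetch($url) { return file_get_contents($url); }
}

// Composition: a web request instance created within the scraper's class definition.
class answersScraperComposed {
    private $request;
    public function __construct() { $this->request = new webRequest(); }
    public function scrape($url)  { return $this->request->fetch($url); }
}

// Inheritance: the scraper extends the web request and inherits fetch() directly.
class answersScraper extends webRequest {
    public function scrape($url)  { return $this->fetch($url); }
}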

Hmm interesting stuff, thanks for the advice.

DM

nop_90

lol at "the 3rd time you write it is when you'll get it right" - true.

There are lots of books on how to do shit, usually crappy books on APIs etc.
Very few books on how to design shit.

http://en.wikipedia.org/wiki/Design_Patterns
Read the book referenced there.

DangerMouse

Quote from: nop_90
There are lots of books on how to do shit, usually crappy books on APIs etc.
Very few books on how to design shit.


Yeah, I'm starting to notice that, Nop! Don't get me wrong, I've got some decent info from books and tutorials, but I tend to be able to find most of what I need with a simple Google search. However, there's a distinct lack of beginner information on how to structure code - it's all very well knowing how to create a class, how to extend one and how to use magic functions, but there's very little on where the best place to use them is.

perkiset

Quote from: DangerMouse
The idea of building objects on top of a Web Request object appeals. Currently I've been instantiating an instance of my web request class within the class definition for various scraper-type objects, but like you say, it would be more efficient to just extend the web request, as it's essentially the base element of what's going on.

I do that exactly. As you may or may not know, when you have a class hierarchy, the code for each "level" of the tree is compiled only once and referenced from then on. So if you have 10 classes built on a huge base class, you'll only have one instance of the baseclass code in memory and 10 little wrappers for the additions to the class. It's very cost-effective.

/p

nop_90

99% of the problem with good code is proper design structure.
Most people suck at that.

That is why you read the Gang of Four book (or a similar book).
It is like a book of blueprints / design patterns.

Not to say perk's way is wrong (there really is no right or wrong, just as long as it works).
But you can see from his layout that he comes from the "old skool", where for a class to be polymorphic it had to inherit off a common ancestor with the same methods. (This is the way Object Pascal and C++ work; it has to be this way since they are compiled.)

Not sure how PHP works, but with languages like Perl, Ruby and Python it does not have to be this way. And you can harness the true power of the language.

So I would make a webclient class:

TWebclient
  • get
  • post


Then I make a separate class for each search engine.
But inside that class would be the webclient,
and each class would have one function called search,
so it would look like this:

class TSearchGoogle {
    TWebclient webclient;
    TSearchGoogle() {
        webclient = new TWebclient();
    }
    search(query) { return results; }
}
To make a class for another SE, just do exactly the same.

Now, since you are using a scripting language, you can do cool shit like this:

function search_engines(engine_classes, query) {
    results = [];
    for engine_class in engine_classes {
        engine = new engine_class();
        results.append(engine.search(query));
    }
    return results;
}

I would then use it like this:
search_engines([TSearchGoogle, TSearchYahoo, TSearchMsn], query)

So I have the advantage of perk's system - if I need to change my webclient code, it will propagate down.
But on the other hand I need a lot fewer classes.
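For what it's worth, PHP 5 is dynamically typed too, so the same pattern translates fairly directly - a rough sketch with hypothetical class names and stubbed search methods:

<?php
// Rough sketch: hypothetical classes with stubbed-out search bodies.

class TSearchGoogle {
    public function search($query) { return array("google result for $query"); }
}

class TSearchYahoo {
    public function search($query) { return array("yahoo result for $query"); }
}

function search_engines(array $engineClasses, $query) {
    $results = array();
    foreach ($engineClasses as $class) {
        $engine = new $class();                      // instantiate from the class name
        $results[$class] = $engine->search($query);  // any object with a search() method will do
    }
    return $results;
}

print_r(search_engines(array('TSearchGoogle', 'TSearchYahoo'), 'blackhat seo'));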

perkiset

@ OldSkool - it's a fair cop  Applause

I am confused, however, if we look at simply the last layer of the HTTP arm of my hierarchy, would this not be very similar? I was outlining that you'd have an HTTP request class that you would add to it the understanding of serps, and then add to that the understanding of Google, or Yahoo or such. So you have one body of code that understands HTTP, then perhaps the next layer is simply abstract functions like getTop10() or an array for the actual serps - no real implementation code, just the structures that the next layer will implement, specific to the idioms of each engine that you scrape. I dunno, just shitkickin.
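In PHP 5 terms, that middle layer might look something like the sketch below - getTop10() is from the post above, everything else is a placeholder rather than real library code:

<?php
// Minimal sketch of the abstract serp layer - placeholder code only.

class httpRequest {
    // The one body of code that understands HTTP (grossly simplified here).
    protected function fetch($url) { return file_get_contents($url); }
}

abstract class serpBase extends httpRequest {
    protected $serps = array();                 // the array for the actual serps

    abstract public function search($query);    // declared here, implemented per engine

    public function getTop10() {
        return array_slice($this->serps, 0, 10);
    }
}

class googleSerps extends serpBase {
    public function search($query) {
        $html = $this->fetch('http://www.google.com/search?q=' . urlencode($query));
        // Google-specific parsing of $html into $this->serps goes here.
        return $this->serps;
    }
}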

As an interesting point of research, it is good to look at the Borland Delphi/C++ VCL hierarchy, in which *every single* class in the library ultimately descends from TObject. It is an excellent way to see some REALLY good design of a huge library, although it's a fantastic amount of overkill in about 99.9% of situations. But it's fair to say that my design methodology was hugely influenced by that work.

DangerMouse

This is a really interesting discussion, highlighting that there seems to be no right or wrong approach.

I've just been taking a look at this: http://framework.zend.com/manual/en/zend.service.flickr.html - they seem to return results as an object that you iterate over. I've never thought of constructing classes like this (being a noob) and was just wondering what your views on this approach were?
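The pattern being looked at boils down to PHP 5's SPL Iterator interface - a plain sketch (not Zend's actual code) of a result set you can foreach over:

<?php
// Plain PHP 5 sketch of "results as an iterable object" - not Zend's code.

class searchResultSet implements Iterator {
    private $results;
    private $pos = 0;

    public function __construct(array $results) { $this->results = $results; }

    public function current() { return $this->results[$this->pos]; }
    public function key()     { return $this->pos; }
    public function next()    { $this->pos++; }
    public function rewind()  { $this->pos = 0; }
    public function valid()   { return isset($this->results[$this->pos]); }
}

$set = new searchResultSet(array('result one', 'result two'));
foreach ($set as $result) {      // the object itself is what gets iterated
    echo $result, "\n";
}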

DM

stefan

Yes, interesting discussion! Wanna say something as well, even though I don't do PHP but rather C#/C++ etc...

Agree on no rights or wrongs - my basic idea is that no one can tell *me* what's best practice about anything, and that goes for object modelling as well. I'm looking for best practice for ME, and that might be totally wrong for others. I am lucky enough to be alone in my shop and therefore have no boss telling me what's wrong or not.

My best tip is: yes, do read up; yes, listen to others; but code, code, code and you'll find a model that works for YOU. And that's the most important thing.

My second idea, when it comes to object modelling and, in the end, mapping against physical storage (a database in 99% of cases for me), is:

a) DON'T START WITH THE MODEL - start with writing your client. I.e. think about how you want the code that uses the objects to look:

o = new googleSerps();
o.Run();  (or however perk was thinking it should be used - sorry if I am completely wrong...)

or maybe you fancy the notion of nop_90

search_engines([TSearchGoogle,TSearchYahoo,TSearchMsn])

Cause the purpose of the object model is to be USED - not to be the perfect replica of the "real world" objects behind it. You will need to make changes either way: your requirements will change, or the WORLD (i.e. the objects, or even the meaning of the objects) will change, since the world around us does indeed change.

b) I will mess this serp example up even more and say we have a database as a backend. I typically create a class with EXACT mappings to the database fields (or SP result fields), using a code generator.
Then I make a new class which inherits from that - the "business layer" - where I put functions and properties. That's the base I start with, at least.
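In PHP terms (stefan works in C#), a rough sketch of that two-layer idea might be the following - the table and field names are entirely hypothetical:

<?php
// Hypothetical sketch of the two layers stefan describes.

// Layer 1: exact 1:1 mapping to the database fields - the kind of class a
// code generator would emit for, say, a "serps" table.
class serpRecord {
    public $id;
    public $engine;
    public $query;
    public $position;
    public $url;
}

// Layer 2: the hand-written "business layer" inherits the plain mapping
// and adds functions and properties on top of it.
class serp extends serpRecord {
    public function isTop10() {
        return $this->position !== null && $this->position <= 10;
    }
}

$row = new serp();
$row->position = 3;
var_dump($row->isTop10());   // bool(true)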





nop_90

Quote from: perkiset

@ OldSkool - it's a fair cop  Applause

Old skool not bad skool Applause

Quote from: perkiset

I am confused, however, if we look at simply the last layer of the HTTP arm of my hierarchy, would this not be very similar? I was outlining that you'd have an HTTP request class that you would add to it the understanding of serps, and then add to that the understanding of Google, or Yahoo or such. So you have one body of code that understands HTTP, then perhaps the next layer is simply abstract functions like getTop10() or an array for the actual serps - no real implementation code, just the structures that the next layer will implement, specific to the idioms of each engine that you scrape. I dunno, just shitkickin.

The end result will be the same - it is what happens behind the scenes that differs.
We end up with 3 classes, all with one function called search.
My previous example kinda sucked - I hid the webclient object inside the class - but it really does not matter whether you do it perk's way or mine.


class SearchGoogle:
    def search(self, query):
        pass   # Google-specific scraping goes here

class SearchYahoo:
    def search(self, query):
        pass

class SearchMsn:
    def search(self, query):
        pass

Behind the scenes, in perk's case (the Delphi case), they all inherit off serpBase, which has to have a virtual function called search.
In my case they are 3 independent classes; the only thing they have in common is one member function called search.

In my case you can do interesting things like this.
@dangermouse: yes, I do stuff like that all the time - that is the power of a dynamic language.

seConstructors = [SearchGoogle, SearchYahoo, SearchMsn]
query = "blackhat seo"
results = []
for seConstructor in seConstructors:
    se = seConstructor()
    results.append(se.search(query))   # <-- notice: whatever the object is, I am able to call its search() member

The advantage of my method is that it eliminates one class. Smaller code means quicker coding time, and it's also easier to debug.
The disadvantage of dynamic languages is that type errors will not be caught at compile time.

Quote from: perkiset

As an interesting point of research, it is good to look at the Borland Delphi/C++ VCL hierarchy, in which *every single* class in the library ultimately descends from TObject. It is an excellent way to see some REALLY good design of a huge library, although it's a fantastic amount of overkill in about 99.9% of situations. But it's fair to say that my design methodology was hugely influenced by that work.

Always a good idea to look at how others solved the problem.
And I suspect that the design of Python was strongly influenced by Delphi.
He basically took ideas from Lisp and then combined them with Delphi.
Python was the first mainstream language to do this. (Perl objects are a whole different ball of wax - basically you can take any type (hash, array, etc.) and "bless" it into a class.)
All objects in Python inherit off a common object (this occurs automatically).
Inside that Python parent object are functions that allow you to query/manipulate the classes/objects at runtime (a Lisp idea).

This allows you to do extra cool shit like http://pyro.sourceforge.net/
You can make a class that constructs another class that you specify. While the class is being constructed (since you can query it at runtime etc.), you can stick "hooks" in there to intercept the calls. Basically that is what Pyro does.

http://psyco.sourceforge.net/ is even cooler.

Python compiles to a virtual machine. Psyco then gets the VM code at runtime the first time a function is executed and checks to see if it can be compiled into assembler. If possible, it will do so - hence a massive speedup in the code.

With Python, the biggest thing they stole from Lisp is the idea of a console which you can attach to a running process (there are implementations that let you attach an SSL Python console to, say, a webserver). You can then poke and prod around in its insides while the server is running.

I suspect that PHP allows similar things to be done (a casual glance at the PHP manual shows this: http://www.php.net/manual/en/language.oop5.reflection.php). But these things are not being used, or only very little.
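A quick taste of that reflection API - a minimal sketch in which searchGoogle is just a made-up example class:

<?php
// Minimal PHP 5 reflection sketch - searchGoogle is a made-up example class.

class searchGoogle {
    public function search($query) { return array("result for $query"); }
}

$ref = new ReflectionClass('searchGoogle');

// Query the class at runtime: list its methods...
foreach ($ref->getMethods() as $method) {
    echo $method->getName(), "\n";             // prints: search
}

// ...and build an instance from the reflected class.
$engine = $ref->newInstance();
print_r($engine->search('blackhat seo'));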

An example of Python's xmlrpc:
http://docs.python.org/lib/xmlrpc-client-example.html
Because the code uses introspection (not sure if that's the correct term - it makes itself at run time), xmlrpc is a snap to use: basically you set the server, and then you can call xmlrpc functions just like you call regular ones. Magically, at runtime, when a function is called over RPC the arguments are converted into the correct Python types, and vice versa (Perl is a similar story).
It is not really magic. Basically, when the function is called it inspects the arguments and converts them into the proper XML representation. When the call is done it does the exact reverse.

This is not a problem with PHP; it is how people use it.
Part of the problem with PHP is that its major selling point was basically "C++ in a scripting language".
Yeah, it sounds good when you sell the idea to the guys upstairs, but the consequence is that you lose the advantages/safety of Delphi, where the compiler checks for proper types etc., and you do not gain any of the advantages of the language.


