The Cache: Technology Expert's Forum
 
Author Topic: short version of how to do an ubuntu LAMP cluster?  (Read 5000 times)
nutballs
« on: May 12, 2009, 10:08:15 PM »

Anyone have a brief overview and pointers on doing an Ubuntu LAMP cluster?

I really like the idea of being able to just scale with more machines, but I also don't want to add TOO much extra complexity. I am hoping it's a pretty easy affair.

I am going to try to get a virtual cluster set up on my 'doze box in VMware, simulating the potential environment. But before I even bother wasting time I don't really have on it, I am looking for some tips.
vsloathe
« Reply #2 on: May 13, 2009, 06:15:44 AM »

Cluster for redundancy or performance?
perkiset
« Reply #3 on: May 13, 2009, 07:44:12 AM »

Personal cloud? I think Claude @ GetNet has done it, might even still have one running. Might be worth a call there.
nutballs
« Reply #4 on: May 13, 2009, 09:03:00 AM »

Good question, V.

The thing is, I'm actually not sure which is the correct answer... LOL

In all the reading I have done about clustering, there is an enormous amount of conflicting info. Performance, load balancing, high availability, failover, redundancy: they all seem to get used interchangeably.

So...

What I want is N clones. Technically, I want N masters.
I want changes made in the database of node-N to replicate to node-notN.
I want changes made in the directories of node-N to replicate to node-notN.
I want traffic to be distributed among the nodes.

Now, the distribution I have handled. That is simple round-robin, built into IPCop, or load balancing via a plugin for IPCop, or I switch to pfSense, which has it built in.

And I want to be able to plug in a new machine, add it to the cluster, and say "hey, you're now part of the clan, go forth and become a clone!"

So, WTF is that ACTUALLY called?
Distributed Load-Balanced Mirrors? LOL

@perk: thanks. I will possibly give him a call after this thread hits a wall.
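
For reference, the database half of what I'm describing is what the MySQL docs call master-master (circular) replication. A minimal sketch of the MySQL 5.x setup follows - hostnames, the replication user, and the log file/position are placeholders, the my.cnf settings are shown as comments, and directory sync would be a separate tool (rsync or similar) on top:

-- In each node's my.cnf (values swapped on node B):
--   server-id = 1                  (2 on node B)
--   log-bin = mysql-bin
--   auto_increment_increment = 2   (so the two masters never hand out the same key)
--   auto_increment_offset = 1      (2 on node B)

-- On node A, allow node B to replicate (hypothetical user/host/password):
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'node-b' IDENTIFIED BY 'secret';

-- On node B, point at node A (file and position come from SHOW MASTER STATUS on A):
CHANGE MASTER TO
  MASTER_HOST = 'node-a',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS = 98;
START SLAVE;

-- Then repeat the GRANT / CHANGE MASTER in the other direction so A also slaves off B.

Adding a third node the same way is possible, but every extra master adds conflict and lag risk, which seems to be why single-master setups stay popular.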
perkiset
« Reply #5 on: May 13, 2009, 09:37:02 AM »

I think it depends on whether you truly want the exact same thing on every server, or you're OK with the cloud notion of a sort of "striped" processing environment, if you catch my meaning. You're right about the intermixing of terms and ideas; I don't think it's really standardized in anyone's brain yet. I think "Distributed Load-Balanced Mirrors" is as good a description for a processing cluster as I've seen.

I've called my load-balanced processing chain a cluster, because the renderers are all behind a load balancer. They are identical and easily cloned, but they all talk to a central database because that is SO VERY MUCH easier than master-master-master relationships. The database is master->slaved for near-line failover, but there's only one living instance at any given time.

I've given that up though (it still lives but is steadily being deprecated as I move sites off that framework) because I like the velocity of single-machine-single-site-single-database rigs. My newest stuff is turning out pages in 4-6/1000s so I can handle lots of pages per machine pretty effortlessly. Then I back up the machine and I'm good to go. Eliminate the complexity, hone the process and you've got what you want. This is all PHP and stored procedure stuff, so it can be done meng, I assure you. In fact, if I took some of my tracking, cloaking SEO and magic out of the pages, they'd be in the 1-2/1000 per page without me breaking a sweat. At that rate, you'd need to have some WICKED benefit in a cluster for me to implement all of that complexity for a higher cost-per-page ratio.

If the real goal, though, is processing abstraction like a cloud, then I get it, but I'd still be curious why you'd want to implement that, unless you're back to rendering vShrapnel from vExplosions for the university and can job out the bits to multiple processors. I'm not sure I see where you'd benefit from this - unless your cluster is actually geographically abstract, i.e., nodes at different hosts, in which case that's an entirely different discussion.
nutballs
« Reply #6 on: May 13, 2009, 10:30:03 AM »

Part of my reason is to no longer care which machine something is on.

I currently run a DB server, a webserver, and a failover box (which does not run crons, to keep load down, and can be a bit out of sync depending on the time of day).
The problem is my DB is getting POUNDED. I am WAY overusing that box, WAY underusing the webbox, and the failover is just sitting there laughing at me.

So if I went your route - load a machine up with sites until it is utilized, then start a new box - it adds a level of complexity to the daily grind that I could avoid if it were just "1 big-ass machine".
I have a potential scenario lined up that may just cause that problem for me, though it would not use much DB at all.

So in my case, I have 2 types of sites: database-heavy massive-DB munching/crunching/grinding, and lightweight, fast-as-fuck small-DB queries. The small ones obviously don't matter, but they suffer because of the monsters.

Really, a cloud is what's ideal. But I don't think even the big clouds have figured out the distributed database part. I think when you run MySQL, it runs on 1 box, and that's it.
So I am pretty sure I am pipe-dreaming here.

perkiset
« Reply #7 on: May 13, 2009, 11:38:10 AM »

Perhaps an answer is to simply bump up the DB box. Consider a quad-core with hyperthreading and a bunch of RAM, and your MySQL instance will sing. Or even a dual quad-core with hyperthreading, and you've got the equivalent of 16 procs working for you. Add 4 NICs and I think you'll be amazed how much cooler that box runs.

I'd also be curious about the efficiency of your queries, if you're that pounded.
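
A quick, generic way to check that (the table and phrase are just placeholders, not your actual schema):

-- Shows whether the fulltext index is actually used and roughly how many rows get examined:
EXPLAIN SELECT Sentence
FROM sentences
WHERE MATCH(Sentence) AGAINST ('some phrase here');

-- And a rough feel for what the server is doing overall:
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SHOW GLOBAL STATUS LIKE 'Handler_read%';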
nutballs
« Reply #8 on: May 13, 2009, 01:17:29 PM »

My DB box is dual quad-core Xeons with 8GB of RAM, though I am going to boost it to 16GB.

Now, do I HAVE to run the DB that hot? No, of course not. But I do if I want to do the things I want at any speed worth my energy. <evil maniacal laugh>

There is no public-facing blackhat on my rack, however... there is my mothership.

The one thing I do question is whether the database is running multithreaded. How would I verify that? Because even so, it does seem slower than it should be. There are bottlenecking queries and such. But aside from that, a fulltext lookup on 10 million rows is not a lightweight query by any stretch...
perkiset
« Reply #9 on: May 13, 2009, 02:52:06 PM »

Right on - you're blistering THAT machine?
Sounds like the world domination plan proceeds apace. Well done ;)

MySQL is multithreaded by default, I don't know how to make it run in a single thread.
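
If you want to see it with your own eyes, the stock commands will show it (nothing here is specific to your setup):

-- One row per connection thread, with what each is doing right now:
SHOW PROCESSLIST;

-- Thread counters; Threads_running above 1 under load means queries really are executing concurrently:
SHOW GLOBAL STATUS LIKE 'Threads%';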

A fulltext on 10MM rows really shouldn't be that bad, because it's an indexed search. Unless you're doing whole bunches at a time.

Is there an opportunity to put some of the work into stored procedures? That sure has reduced my machine load.
nutballs
« Reply #10 on: May 13, 2009, 03:52:17 PM »

LOL

I have pegged that thing at the top of the scale, to the point that I couldn't even log in. Had to have Jeremy go hit the power button for me.

I figured as much about the threading.

10MM rows of fulltext is actually a surprising hurdle.
The table is only 1 field.
That field is a sentence, varchar(333) (utf8).
Each is unique, so the indexes look like this (key, cardinality, column):
PRIMARY      10988139   Sentence
FULLTEXT      1044376   Sentence

That beast gets inserts and fulltext searches.
My inserts are:  INSERT IGNORE INTO sentences (Sentence) .....
My fulltext queries are:  SELECT Sentence FROM newsentences WHERE MATCH(Sentence) AGAINST ('some phrase here') > 10 ORDER BY RAND() LIMIT 5

Now the problem is of course the RAND()... but I need a random sampling of the sentences that match with a score over 10.
I can't do it in PHP, because some phrases return thousands of rows and it's way slower.

2-3 second execution time.

I am considering stored procs, but I really don't relish the idea of retooling this damn thing... again...
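
For clarity, the table as described is roughly this - a reconstruction from memory, not the actual DDL, and it's MyISAM since fulltext in MySQL 5.x requires it:

CREATE TABLE sentences (
  Sentence VARCHAR(333) NOT NULL,       -- one sentence per row, utf8 (333 * 3 bytes just fits the MyISAM key limit)
  PRIMARY KEY (Sentence),               -- uniqueness, so INSERT IGNORE can skip dupes
  FULLTEXT KEY ft_sentence (Sentence)   -- what MATCH ... AGAINST uses
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

The ORDER BY RAND() at the end of that select then has to score and sort every matching row, which is presumably where most of the 2-3 seconds goes.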
perkiset
« Reply #11 on: May 13, 2009, 11:36:48 PM »

Ugh, the ORDER BY RAND() thing again.

You know, I'm gonna have to apply some time to that problem, because you've been the unwilling participant in that Dirty Sanchez chunk of code for a long time. There's got to be a way to use stored procs to help you without retooling, and to drop that per-query cost down.

Hmmm...
nutballs
« Reply #12 on: May 14, 2009, 08:50:40 AM »

I have tried everything I can think of.
The one with the most promise was caching result sets in a table, but that was pointless. It works in testing but not live, because the queries are so random that the cache table fills faster than the source table does. Plus, repeat queries are quite rare, which makes caching less useful.
perkiset
« Reply #13 on: May 14, 2009, 09:33:38 AM »

The only way I've done this that seemed to make sense was to get the total count of the output set, then build a randomized array of offsets for the number of results I want, then do a SQL multi-query using LIMIT as a record pointer. The queries that went to the DB looked like this:

select * from mytable limit 4201,1;
select * from mytable limit 13,1;
select * from mytable limit 724,1;

Etc., you get the idea. The booger here is that your result set is never fixed, and doing that from PHP would be a lot of thrashing. I'll bet that if you put that sort of code into an SP it would work great because, unlike everything else, the SP has direct access to the cached query. So if your stored proc did the original query and then asked for the count, it would take WAY less time than anything else, because of the way SPs work. Then you could use RAND() right there to get (x) results and select from that table. I wonder if a cursor could be used as well, which would then prepare and compile the query.

More thinking to do...
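
Something along these lines is what I'm picturing - just a sketch, untested, using the sentences table name from your insert and the >10 score from your query (swap in whichever table it really is). Whether it actually beats what you have now would need a head-to-head:

DELIMITER //

CREATE PROCEDURE random_matches(IN p_phrase VARCHAR(255), IN p_count INT)
BEGIN
  DECLARE v_total INT DEFAULT 0;
  DECLARE v_i INT DEFAULT 0;

  -- Materialize the matches once, keyed by a row number we can hit by primary key.
  DROP TEMPORARY TABLE IF EXISTS tmp_matches;
  CREATE TEMPORARY TABLE tmp_matches (
    rownum INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Sentence VARCHAR(333) NOT NULL
  ) ENGINE=MEMORY;

  INSERT INTO tmp_matches (Sentence)
    SELECT Sentence
    FROM sentences
    WHERE MATCH(Sentence) AGAINST (p_phrase) > 10;

  SELECT COUNT(*) INTO v_total FROM tmp_matches;

  -- Pick p_count random row numbers instead of ORDER BY RAND() over the whole set.
  DROP TEMPORARY TABLE IF EXISTS tmp_picks;
  CREATE TEMPORARY TABLE tmp_picks (rownum INT UNSIGNED NOT NULL) ENGINE=MEMORY;

  WHILE v_i < p_count AND v_total > 0 DO
    INSERT INTO tmp_picks VALUES (FLOOR(1 + RAND() * v_total));
    SET v_i = v_i + 1;
  END WHILE;

  -- DISTINCT because two picks can collide, so this can return fewer than p_count rows.
  SELECT DISTINCT m.Sentence
  FROM tmp_matches m
  JOIN tmp_picks p ON p.rownum = m.rownum;
END //

DELIMITER ;

Then from PHP it's just:  CALL random_matches('some phrase here', 5);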
nutballs
« Reply #14 on: May 14, 2009, 11:06:01 AM »

In my experiments, this is the best non-SP way to do it.

Grabbing the MATCHes into a temp table, then doing a RAND() on that as a separate step, is slower.
Grabbing the MATCH IDs into a temp table, then doing a RAND() on that, is slower.
Grabbing the matches into a PHP array is WAY slower.


Normally, if I had a big table of uniques and only needed some random items from it, I would just select random IDs between 0 and MAX(ID). But like you pointed out, my results are never based on a fixed data set, and they are not in sequence. It's a full index scan every time.
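
For reference, the trick I mean is something like this - it assumes a hypothetical AUTO_INCREMENT id column, which this table doesn't even have, so it's only here to show why it doesn't transfer:

-- Only works against a dense, sequential numeric key (hypothetical `id` column):
SELECT s.Sentence
FROM sentences s
JOIN (SELECT FLOOR(1 + RAND() * (SELECT MAX(id) FROM sentences)) AS rid) r
  ON s.id >= r.rid
ORDER BY s.id
LIMIT 1;

Run it 5 times for a sample of 5. The problem is that the rows matching a given phrase aren't a contiguous id range, so this doesn't combine with MATCH ... AGAINST.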