The Cache: Technology Expert's Forum
 
Author Topic: Parallel processing - the right way  (Read 2396 times)
kurdt
« on: September 18, 2009, 04:34:45 AM »

I have a project that requires A LOT of parallel processing. It's going to be written in C++ (finally learning that), and it's going to be a GUI-less database-processing program that should run against MySQL on any Linux distro.

I haven't ever done parallel programming before, so what's the right way to do it? I'm afraid I'll go "too small" with the independent pieces that I send to different processing units. Where should the dividing happen? Is it at the function level, or per executable line like $foo = $a + $b + $c, or what?

The best way, obviously, is to first build my own Grand Central Dispatch (or whatever it was called), but I'm not sure how to hook that into functions. I only have bits and pieces from different systems, so I'm looking for some kind of guidelines on how to glue it all together.

One way I was thinking about it: let's say I have my own GCD class that monitors the task load, and I have a queue() function that queues tasks. Now, do I always have to call queue() explicitly and pass my task function (let's say sum($a,$b)) as input, or is there a way to get this done automatically while the application is running?
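
A minimal sketch of how that queue() idea could look with plain pthreads (C++98-style). Dispatcher, Task, queue() and workerLoop() are made-up names for illustration, not any real API; the point is that every task is handed over explicitly through queue(), and a pool of worker threads pulls them off:

Code:
#include <pthread.h>
#include <queue>
#include <cstddef>

// A task is just a function pointer plus its input, e.g. a wrapper around sum(a, b).
struct Task {
    void (*fn)(void*);
    void* arg;
};

class Dispatcher {
    std::queue<Task> tasks_;
    pthread_mutex_t lock_;
    pthread_cond_t  ready_;
public:
    Dispatcher() {
        pthread_mutex_init(&lock_, NULL);
        pthread_cond_init(&ready_, NULL);
    }
    // Everything goes through queue(); nothing is parallelized "automatically".
    void queue(void (*fn)(void*), void* arg) {
        Task t = { fn, arg };
        pthread_mutex_lock(&lock_);
        tasks_.push(t);
        pthread_cond_signal(&ready_);   // wake one sleeping worker
        pthread_mutex_unlock(&lock_);
    }
    // Each worker thread runs this loop forever: wait for a task, run it, repeat.
    void workerLoop() {
        for (;;) {
            pthread_mutex_lock(&lock_);
            while (tasks_.empty())
                pthread_cond_wait(&ready_, &lock_);
            Task t = tasks_.front();
            tasks_.pop();
            pthread_mutex_unlock(&lock_);
            t.fn(t.arg);                // execute outside the lock
        }
    }
};

// pthread_create wants a plain function, so a trampoline starts each worker:
void* startWorker(void* d) {
    static_cast<Dispatcher*>(d)->workerLoop();
    return NULL;
}

You'd spin up a few workers with pthread_create(&tid, NULL, startWorker, &dispatcher) and then call dispatcher.queue(...) from anywhere. As for the "automatic" part: no, the compiler won't parallelize your calls for you; something has to put each task into the queue.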

Are there any definitive books on the subject?

Btw, sorry if this is a mess; I know I have to do this with parallel computing, but I have no clue even what to ask... off to do my own research, but please do reply if you have anything to say :)

*edit* Wikipedia has an excellent article about parallel computing... read that if you don't understand what I'm saying :)
« Last Edit: September 18, 2009, 04:40:05 AM by kurdt »

vsloathe
« Reply #1 on: September 18, 2009, 06:01:20 AM »

Ew

Doing it "right" requires that you make everyone compile it, since doing it the right way means kernel-level threading.

perkiset
« Reply #2 on: September 18, 2009, 08:55:40 AM »

What exactly do you mean, "a lot" of parallel processing? And what type of app? Internet-based (e.g., a scraper/spider), a graphics renderer... or are we to assume it's a big mining project, since you've brought up Hadoop?

kurdt
« Reply #3 on: September 18, 2009, 09:36:54 AM »

Quote
What exactly do you mean, "a lot" of parallel processing? And what type of app? Internet-based (e.g., a scraper/spider), a graphics renderer... or are we to assume it's a big mining project, since you've brought up Hadoop?
There's a lot of mining, but data analysis is the main problem; you guessed that part right. The application shouldn't matter that much, though, since this is all about design principles. The problem with big tasks and a lot of processing units seems to be synchronization, deadlocks, and similar issues. I think the easiest way that still pays off is to go with data parallelism. Bit-level parallelism is way too difficult and probably can't even be handled very efficiently in C++, or at least that's my perception. Data parallelism seems to be the most flexible of all the choices.

vsloathe, what do you mean about requiring everyone to compile it? It's my program; it's compiled already. Or am I missing your point? :)

 

perkiset
« Reply #4 on: September 18, 2009, 10:03:11 AM »

What he means is for you to write a configure script that generates a makefile, so that others can compile it on their own machines and it will work anywhere.

Deadlocks etc. imply that you have dependencies, which would be an immediate flag for me re: "lots of parallel." Mutexes, semaphores and critical sections are things you'll need to brush up on ... even if you don't use them (properly), you'll need to understand the nature of controlling parallel flow.
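
To make the critical-section point concrete, here's the classic toy case with pthreads; without the mutex, two threads doing ++hits concurrently will lose updates:

Code:
#include <pthread.h>
#include <cstdio>

long hits = 0;
pthread_mutex_t hitsLock = PTHREAD_MUTEX_INITIALIZER;

void* worker(void*) {
    for (int i = 0; i < 1000000; ++i) {
        pthread_mutex_lock(&hitsLock);    // enter the critical section
        ++hits;                           // read-modify-write is not atomic
        pthread_mutex_unlock(&hitsLock);  // leave it
    }
    return NULL;
}

int main() {
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    std::printf("hits = %ld\n", hits);    // exactly 2000000 with the lock; usually less without
    return 0;
}

Build with g++ file.cpp -lpthread on any Linux distro.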

A parallel-processing data analysis tool, hmm? As your first effort? Can you not share more here? Or is this a "just want to do it, because" kind of thing? I am always wary of lots of parallel as the first solution to a problem - usually I figure out the serial way to get an answer, then look to see how that can be converted/munged/divvied/objectified etc. into multiple serial paths of programming that all meet up at the end for the answer.
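
That serial-first approach maps directly onto code: write the loop serially, then split its range into chunks, one per thread, and join at the end. A rough pthreads sketch (the names and sizes here are arbitrary):

Code:
#include <pthread.h>
#include <cstdio>

const int N = 1000000, THREADS = 4;   // N divides evenly by THREADS here
double data[N];

struct Chunk { int lo, hi; double sum; };

// Each thread runs the same serial loop, just over a smaller range.
void* sumChunk(void* p) {
    Chunk* c = static_cast<Chunk*>(p);
    c->sum = 0;
    for (int i = c->lo; i < c->hi; ++i)
        c->sum += data[i];
    return NULL;
}

int main() {
    for (int i = 0; i < N; ++i) data[i] = 1.0;

    pthread_t tid[THREADS];
    Chunk chunk[THREADS];
    for (int t = 0; t < THREADS; ++t) {
        chunk[t].lo = t * (N / THREADS);
        chunk[t].hi = (t + 1) * (N / THREADS);
        pthread_create(&tid[t], NULL, sumChunk, &chunk[t]);
    }

    double total = 0;
    for (int t = 0; t < THREADS; ++t) {   // the paths "meet up at the end"
        pthread_join(tid[t], NULL);
        total += chunk[t].sum;
    }
    std::printf("total = %f\n", total);
    return 0;
}

No mutex is needed here because each thread writes only to its own Chunk - which is the data parallelism kurdt mentioned earlier.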

vsloathe
« Reply #5 on: September 18, 2009, 10:06:55 AM »

It's really awesome to parallelize stuff and all that, when it's required.

But computers can't really do two things at once; they're just fooling us.

kurdt
« Reply #6 on: September 18, 2009, 12:27:11 PM »

Quote
Deadlocks etc. imply that you have dependencies, which would be an immediate flag for me re: "lots of parallel." Mutexes, semaphores and critical sections are things you'll need to brush up on ... even if you don't use them (properly), you'll need to understand the nature of controlling parallel flow.

A parallel-processing data analysis tool, hmm? As your first effort? Can you not share more here? Or is this a "just want to do it, because" kind of thing? I am always wary of lots of parallel as the first solution to a problem - usually I figure out the serial way to get an answer, then look to see how that can be converted/munged/divvied/objectified etc. into multiple serial paths of programming that all meet up at the end for the answer.
It's more like a "just want to do it, because" type of thing. But at the same time, I know that when the project gets to a certain point, planning for parallel processing from the start will pay off. Actually, I was thinking the exact same thing: code it as serial (still keeping the parallel in mind) and then try to mutate it to parallel when certain processing-time limits are hit. I think it's just good to know your end goal when starting, so you can take it into consideration when making important decisions that would be very costly and difficult to change later, like the structure of the whole program.

And about my first effort... well, I have never been the type who writes hello worlds. I believe that nothing is impossible if you want to do it; the only thing that limits you is your own beliefs. I still haven't found anybody who can prove me wrong. So to me this is just another thing I'm going to do, and it really doesn't matter if it seems hard or impossible.

perkiset
« Reply #7 on: September 18, 2009, 01:20:27 PM »

Quote
It's more like a "just want to do it, because" type of thing. But at the same time, I know that when the project gets to a certain point, planning for parallel processing from the start will pay off. Actually, I was thinking the exact same thing: code it as serial (still keeping the parallel in mind) and then try to mutate it to parallel when certain processing-time limits are hit. I think it's just good to know your end goal when starting, so you can take it into consideration when making important decisions that would be very costly and difficult to change later, like the structure of the whole program.
Excellent, because parallel processing is, essentially, lots of serial paths running concurrently. You'll be well prepared to parallelize it when the time is ripe - or not, if you see that parallelizing it will offer no real benefit.


Quote
And about my first effort... well, I have never been the type who writes hello worlds. I believe that nothing is impossible if you want to do it; the only thing that limits you is your own beliefs. I still haven't found anybody who can prove me wrong. So to me this is just another thing I'm going to do, and it really doesn't matter if it seems hard or impossible.
ROFLMAO Story of my life, right on man. Some of my best contracts were the ones where I had no frickin' idea how I would get it done, but I knew computers from the bits up - so I knew it COULD be done. Case in point: ITTO and I had to learn OO methodology, C++, and developing apps for Windows 3.1 with Borland's OWL framework in '91... and deliver a management app for telephone systems we'd never touched before, in just 4 weeks.

kerSNAP, got the tee shirt.

Of course, we had to live in a rented apartment with cots and literally survived on delivered pizza and Motrin, but we done it.
I don't think he's ever forgiven me  ROFLMAO ROFLMAO ROFLMAO


isthisthingon
« Reply #8 on: September 18, 2009, 05:18:35 PM »

Quote
It's more like a "just want to do it, because" type of thing.

Excellent, since going down this path for the first time on any kind of schedule would be asking for trouble.

Since it seems you're at liberty to choose the language, I would recommend diving in with something a little more managed than C++ (C#, Java). That's just my two cents. But when you're spawning multiple threads and dealing with re-entrancy issues (mutexes, semaphores, critical sections, etc.) and other parallel processing concerns, the memory management aspect creates another moving target that can multiply the complexity of debugging deep issues.

As for a queue, I'm less of a fan of that approach, since it still hangs on a single thread of consecutive instructions, even though worker threads can be launched before previous worker threads have finished. What I do like is a completely bare-bones queue that simply registers what new requests have arrived and then lets each chunk of work proceed based on its own:

Priority
Age
System Availability
Table-span (degree of system impact while executing)

So rather than a queue and plain FIFO, I'd weigh other factors, including additional pre-processing for heavy requests, priority (affected by age), and so on. Heavy I/O queries can cripple a system, but one I/O-intensive task and three RAM/calc types may live happily together.

The most important point IMO is to make sure that you can adjust away from a linear FIFO approach - at runtime, and based on real factors in the moment. Something along the lines of the sketch below.
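
A rough sketch of what that could look like; the fields, the scoring formula, and the one-I/O-hog-at-a-time rule are all invented here just to show the shape of a non-FIFO dispatcher:

Code:
#include <vector>
#include <ctime>
#include <cstddef>

struct Job {
    int    priority;   // higher = more urgent
    time_t arrived;    // age feeds the score so old jobs can't starve
    bool   ioHeavy;    // crude stand-in for "table-span" / system impact
};

class Scheduler {
    std::vector<Job> pending_;
    int ioHeavyRunning_;
public:
    Scheduler() : ioHeavyRunning_(0) {}

    // The bare-bones queue: registration only, no ordering decisions yet.
    void registerJob(const Job& j) { pending_.push_back(j); }

    // Pick the best runnable job right now - not simply the oldest one.
    bool next(Job& out) {
        time_t now = std::time(NULL);
        std::size_t best = pending_.size();
        double bestScore = -1;
        for (std::size_t i = 0; i < pending_.size(); ++i) {
            if (pending_[i].ioHeavy && ioHeavyRunning_ > 0)
                continue;   // one I/O hog at a time; RAM/calc jobs can coexist
            double score = pending_[i].priority
                         + 0.1 * std::difftime(now, pending_[i].arrived);
            if (score > bestScore) { bestScore = score; best = i; }
        }
        if (best == pending_.size()) return false;  // nothing runnable right now
        out = pending_[best];
        if (out.ioHeavy) ++ioHeavyRunning_;
        pending_.erase(pending_.begin() + best);
        return true;
    }

    void finished(const Job& j) { if (j.ioHeavy) --ioHeavyRunning_; }
};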


kurdt
« Reply #9 on: September 18, 2009, 10:41:49 PM »

Quote
Since it seems you're at liberty to choose the language, I would recommend diving in with something a little more managed than C++ (C#, Java). That's just my two cents. But when you're spawning multiple threads and dealing with re-entrancy issues (mutexes, semaphores, critical sections, etc.) and other parallel processing concerns, the memory management aspect creates another moving target that can multiply the complexity of debugging deep issues.

As for a queue, I'm less of a fan of that approach, since it still hangs on a single thread of consecutive instructions, even though worker threads can be launched before previous worker threads have finished. What I do like is a completely bare-bones queue that simply registers what new requests have arrived and then lets each chunk of work proceed based on its own:

Priority
Age
System Availability
Table-span (degree of system impact while executing)

So rather than a queue and plain FIFO, I'd weigh other factors, including additional pre-processing for heavy requests, priority (affected by age), and so on. Heavy I/O queries can cripple a system, but one I/O-intensive task and three RAM/calc types may live happily together.

The most important point IMO is to make sure that you can adjust away from a linear FIFO approach - at runtime, and based on real factors in the moment.
That table-based queue seems like a very interesting idea. I think you could get pretty good results just by running the damn thing for a few days with crazy-detailed monitoring turned on and then analyzing the data. After that you can easily assign priorities and categories (I/O-heavy, I/O-light, CPU-high, CPU-low, etc.) to the different tasks that go through the queue. Then, instead of you deciding, it goes back to the computer. When you do this enough, at least in theory, it will optimize itself.
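
A tiny sketch of that feedback loop, just to illustrate: record the measured CPU and I/O time per task type, then derive the category from the data instead of guessing. Everything here (the names, the crude threshold) is hypothetical:

Code:
#include <map>
#include <string>

struct Stats { double cpuSecs, ioSecs; long runs; };

class Profiler {
    std::map<std::string, Stats> byTask_;
public:
    // Called after every run with what the monitoring actually measured.
    void record(const std::string& task, double cpu, double io) {
        Stats& s = byTask_[task];   // value-initialized to zeros on first use
        s.cpuSecs += cpu; s.ioSecs += io; ++s.runs;
    }
    // The scheduler asks this instead of relying on a human-assigned label.
    bool isIoHeavy(const std::string& task) const {
        std::map<std::string, Stats>::const_iterator it = byTask_.find(task);
        if (it == byTask_.end()) return false;           // unknown task: assume cheap
        return it->second.ioSecs > it->second.cpuSecs;   // crude threshold
    }
};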
