The Cache: Technology Expert's Forum
 
Topic: script to pull all thumbs from a page.
nop_90
« on: June 06, 2007, 05:05:02 PM »

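This uses libxml2dom:
http://www.boddie.org.uk/python/libxml2dom.html
You pass it the URL of a gallery page, and it returns a list with the complete (absolute) URL of every thumb on that page.
A thumb here is the first image inside any link that points to a .jpg, .gif, .avi, .mpg or .wmv file.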

Code:
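import os
import urlparse  # Python 2 module; in Python 3 this is urllib.parse

import libxml2dom

# Link targets that count as gallery content (images and videos).
THUMB_EXTS = [".jpg", ".gif", ".avi", ".mpg", ".wmv"]

def get_thumbs(url):
    """Return the absolute URLs of all thumbs on the gallery page at url."""
    tree = libxml2dom.parseURI(url, 1)  # 1 = parse as HTML rather than XML
    result = []
    for anchor in tree.getElementsByTagName("a"):
        href = anchor.getAttribute("href")
        if not href:
            continue  # anchor without a link target
        # Only the path matters for the extension; ignore query and fragment.
        path = urlparse.urlsplit(href).path
        (root, ext) = os.path.splitext(path.lower())
        if ext in THUMB_EXTS:
            imgs = anchor.getElementsByTagName("img")
            if len(imgs) > 0:
                # src may be relative, so resolve it against the page URL.
                img_src = imgs[0].getAttribute("src")
                result.append(urlparse.urljoin(url, img_src))
    if not result:
        # String exceptions (raise "Error") no longer work; raise a real one.
        raise ValueError("no thumbs found on %s" % url)
    return result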
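
If you don't have libxml2dom installed, the same idea can be roughly sketched with only the Python 3 standard library. This is not the code from the post, just an illustration of the technique: the names ThumbCollector and thumbs_from_html are made up for this example, and it takes the page HTML as a string instead of fetching the URL itself, so it is easy to try offline.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Same extension list as the post: link targets that count as gallery content.
THUMB_EXTS = {".jpg", ".gif", ".avi", ".mpg", ".wmv"}


class ThumbCollector(HTMLParser):
    """Hypothetical helper: collect the first <img> src inside each media link."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.thumbs = []
        # True while inside an <a> that links to media and has no thumb yet.
        self._want_img = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            href = attrs.get("href") or ""
            # Look only at the path so query strings do not hide the extension.
            ext = os.path.splitext(urlsplit(href).path.lower())[1]
            self._want_img = ext in THUMB_EXTS
        elif tag == "img" and self._want_img and attrs.get("src"):
            # Resolve relative src values against the gallery page URL.
            self.thumbs.append(urljoin(self.base_url, attrs["src"]))
            self._want_img = False

    def handle_endtag(self, tag):
        if tag == "a":
            self._want_img = False


def thumbs_from_html(html, base_url):
    """Return absolute thumb URLs found in html, resolved against base_url."""
    parser = ThumbCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.thumbs


if __name__ == "__main__":
    page = '<a href="/full/1.JPG"><img src="t/1.jpg"></a>'
    print(thumbs_from_html(page, "http://example.com/gallery/index.html"))
```

Unlike get_thumbs above, this sketch returns an empty list instead of raising when nothing is found, so the caller decides whether that is an error.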