The Cache: Technology Expert's Forum
 
Author Topic: Perk's Spider Source  (Read 12928 times)
perkiset « on: April 26, 2007, 04:30:31 PM »

How To Use It

You'll need to create some MySQL stuff first - the SQL is in the following posts. Then you'll need to "seed" the crawler with the first domain and the first page that you want to crawl. In this case, you'd want the domain to be <your domain> and the starting page to be '/'. The crawler will take it from there.

I run this from the command line. Reading through the code you'll see that it prints a few different status characters as it walks through its job ('.' for each page dispatched, 'W' when all spiderlet slots are busy, 'w' when it's waiting on the last spiderlets to finish). At the end, it will have created a list of records in the crawl_pages table that are either found or not found, a compressed version of their content and the page that pointed to <this page>.
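For reference, kicking off a crawl looks something like this - a sketch, assuming you saved the dispatcher below as crawler.php and the spiderlet as spiderlet.php (per $spiderletExec), and run from the directory holding paths.inc since both scripts require_once('./paths.inc'):

Code:
cd /www/sites/temp/cc/system
chmod +x crawler.php spiderlet.php
./crawler.php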

In this particular case, I keep the crawl_pages table FULLTEXT indexed so that I can use the data for searches on my own sites.
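For example, once a crawl has populated the table, a site search is just a MATCH against that index - a sketch, where 'widgets' stands in for the surfer's (escaped) query:

Code:
SELECT id, url, pagetitle,
       MATCH(pagetitle, searchblob) AGAINST ('widgets') AS score
  FROM crawl_pages
 WHERE MATCH(pagetitle, searchblob) AGAINST ('widgets')
 ORDER BY score DESC
 LIMIT 20;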

Unfortunately, in my last round of playing with the code I jumped pretty much into a PHP5 mentality, so the crawler will no longer work on anything in the 4.x line.

CAVEAT: I have stripped some of the really proprietary stuff out, but there is still stuff here that either will not pertain to you OR will make you wonder WTF is THAT there for... in which case please post and I'll fill you in.

IMPORTANT: I am expecting that if you take this code and grow it, change it or enhance it, you'll let me know and let it grow here. Although there is no license agreement on it, it should all be considered GPL.

/p
perkiset « Reply #1 on: April 26, 2007, 04:31:19 PM »

This is the domains table. Put the domain you want to crawl in here. The id is auto-increment, so you just need to add the domain name (fully qualified, like www.this.com) - you'll be using the domain id <here> in the other tables.

Code:
-- phpMyAdmin SQL Dump
-- version 2.8.2.1
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Apr 26, 2007 at 05:30 PM
-- Server version: 5.0.24
-- PHP Version: 5.2.1
--
-- Database: `temp`
--

-- --------------------------------------------------------

--
-- Table structure for table `crawl_domains`
--

CREATE TABLE `crawl_domains` (
  `id` int(11) NOT NULL auto_increment,
  `domain` varchar(128) NOT NULL default '',
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=2;

--
-- Dumping data for table `crawl_domains`
--

INSERT INTO `crawl_domains` (`id`, `domain`) VALUES (1, 'www.myfirstdomain.com');
perkiset « Reply #2 on: April 26, 2007, 04:34:01 PM »

This is the pages table - you will need to seed it with domain id 1 and the page '/' so the spider starts at your root page (a seed statement follows the table definition below).

Code:
-- phpMyAdmin SQL Dump
-- version 2.8.2.1
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Apr 26, 2007 at 05:32 PM
-- Server version: 5.0.24
-- PHP Version: 5.2.1
--
-- Database: `temp`
--

-- --------------------------------------------------------

--
-- Table structure for table `crawl_pages`
--

CREATE TABLE `crawl_pages` (
  `id` int(11) NOT NULL auto_increment,
  `siteid` int(11) NOT NULL default '0',
  `url` varchar(128) NOT NULL default '',
  `referrer` int(11) NOT NULL,
  `crawlstate` tinyint(4) NOT NULL default '1',
  `crawlfound` tinyint(4) NOT NULL default '0',
  `lastcrawl` datetime NOT NULL default '0000-00-00 00:00:00',
  `pagetitle` varchar(254) NOT NULL default '',
  `avatar` varchar(254) NOT NULL default '',
  `searchblob` text NOT NULL,
  `lastping` datetime NOT NULL,
  `nextping` datetime NOT NULL default '1980-01-01 00:00:00',
  PRIMARY KEY  (`id`),
  UNIQUE KEY `siteid` (`siteid`,`url`),
  KEY `crawlstate` (`crawlstate`,`url`),
  KEY `nextping` (`nextping`),
  FULLTEXT KEY `searchblob` (`pagetitle`,`searchblob`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4471;
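To seed it as described above - a sketch, assuming your domain took id 1 in crawl_domains; referrer 0 just means "no referring page":

Code:
INSERT INTO `crawl_pages` (`siteid`, `url`, `referrer`) VALUES (1, '/', 0);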
perkiset « Reply #3 on: April 26, 2007, 04:37:35 PM »

This is the "spiderlets" table - it lets the dispatcher watch who's done and whether it can fire up some more. It is also used to pass information to each spiderlet so that <it> knows what to do.

Yes, I could've used all sorts of other ways... execution params, pipes... all kinds of shit. I did it this way because it was easy at the time and it still makes too much sense to fight out something more efficient.

Code:
-- phpMyAdmin SQL Dump
-- version 2.8.2.1
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Apr 26, 2007 at 05:35 PM
-- Server version: 5.0.24
-- PHP Version: 5.2.1
--
-- Database: `temp`
--

-- --------------------------------------------------------

--
-- Table structure for table `crawl_spiderlets`
--

CREATE TABLE `crawl_spiderlets` (
  `id` int(11) NOT NULL auto_increment,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 PACK_KEYS=0 AUTO_INCREMENT=1 ;
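The table is really just a semaphore - here are the three touches it gets, pulled from the dispatcher and spiderlet code below (the <slot id> is whatever LAST_INSERT_ID() handed the dispatcher):

Code:
-- dispatcher: claim a slot and learn the new spiderlet's id
INSERT INTO crawl_spiderlets() VALUES();
SELECT LAST_INSERT_ID();

-- dispatcher: how many spiderlets are still running?
SELECT COUNT('x') FROM crawl_spiderlets;

-- spiderlet, in its destructor: release the slot
DELETE FROM crawl_spiderlets WHERE id=<slot id>;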
perkiset « Reply #4 on: April 26, 2007, 04:43:40 PM »

Unfortunately, I need to post several support files as well... here is an example of the "paths.inc" file referenced in the dispatcher - it basically points to all the essential places my classes and code reference:

Code:
<?php

$GLOBALS['rootPath'] = $rootPath = '/www/sites/temp/cc';
$GLOBALS['pagePath'] = $pagePath = '/www/sites/temp/cc/pages';
$GLOBALS['systemPath'] = $systemPath = '/www/sites/temp/cc/system';
$GLOBALS['libPath'] = $libPath = '/www/sites/lib';
$GLOBALS['classPath'] = $classPath = '/www/sites/lib/classes';
$GLOBALS['themePath'] = $themePath = '/www/sites/temp/cc/theme';
$GLOBALS['fontPath'] = $fontPath = '/www/sites/lib/classes/fonts';
$GLOBALS['galleryPath'] = "/www/sites/temp/storage/galleries";
$GLOBALS['transPath'] = "/www/sites/temp/storage";

?>

perkiset « Reply #5 on: April 26, 2007, 04:45:44 PM »

This post is here to stop you: go up to the code repository and grab class.dbconnection.php as well - you'll need it for the spider.

You will also need the webRequest class, located in the repository as well.
perkiset « Reply #6 on: April 26, 2007, 04:49:04 PM »

This is the dispatcher. I call it crawler.php.

Note that this spider was NOT meant as a web crawler - it was meant as a site crawler. You can see a comment I made to myself a short way down the code where I say that it will need to be modified if I ever want to do more than one domain per crawl. I'll leave that up to you.

Note also that I grab a cookie called "sessionid" at the beginning - this is so that my own crawler doesn't beat the shit out of me the way other bots do... at least I can see my own cookie and report that <this bot> is a single user, not a user per page-pull.
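On the receiving side that check is up to your own stats code; a minimal sketch - the cookie name matches the one above, but $knownSpiderSession and the logging call are assumptions:

Code:
<?php
// If the visitor presents the crawler's session cookie, count all of its
// page-pulls as one user instead of a new visitor per request.
$isMySpider = isset($_COOKIE['sessionid']) &&
              ($_COOKIE['sessionid'] == $knownSpiderSession);
if (!$isMySpider) { logVisitor(); } // hypothetical logging call
?>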

Code:
#! /usr/local/bin/php
<?php

require_once('./paths.inc');
require_once("$classPath/class.webrequest.php");
require_once("$classPath/class.dbconnection.php");
$spiderletExec = "$systemPath/spiderlet.php";

$db = new dbConnection('127.0.0.1', 'auser', 'thepassword', 'thedatabase');
$GLOBALS['utilDB'] = &$db;

// THIS WILL HAVE TO BE UPDATED IF I EVER INTEND TO DO MORE THAN ONE DOMAIN PER SPIDER
$http = new webRequest();
$http->Host = $db->singleAnswer("select domain from crawl_domains where id='1'");
$http->URL = '/';
$http->Get();
$sessionID = $http->GetCookie('sessionid');
//print "Setting SessionID=$sessionID\n";

// Set all pages currently in the database to "unfound" and "need to be worked"
$db->query("update crawl_pages set crawlstate=1, crawlfound=0");

$allDone = false;
while (!$allDone)
{
    // Only go forward if there are fewer than 10 spiderlets running
    $spiderlets = $db->singleAnswer("select count('x') from crawl_spiderlets");
    if ($spiderlets >= 10)
    {
        print 'W';
        sleep(1);
        continue;
    }

    // There is a spiderlet slot open - go get a page
    $nextID = $db->singleAnswer("select id from crawl_pages where crawlstate=1 limit 1");

    if (($nextID <= ' ') && ($spiderlets == 0))
    {
        // No page - if there are no spiderlets either then I am done.
        $allDone = true;
        continue;
    }

    if ($nextID <= ' ')
    {
        print 'w';
        // No page, but there were still spiderlets working... give them a chance.
        sleep(1);
        continue;
    }

    // There is a page to do and a free spiderlet slot.
    print '.';
    $db->query("insert into crawl_spiderlets() values()");
    $spiderletID = $db->singleAnswer("select LAST_INSERT_ID()");
    $db->query("update crawl_pages set crawlstate=2 where id=$nextID");
    $execStr = "$spiderletExec $nextID $spiderletID $sessionID > /dev/null &";
    exec($execStr);

    // for testing...
    // $execStr = "$spiderletExec $nextID $spiderletID $sessionID 1";
    // print shell_exec($execStr);
}

print chr(10);

// Reset the ID auto-incrementer to 1...
$db->query("truncate table crawl_spiderlets");

?>
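When the dispatcher falls out of its loop, a couple of quick queries show how the crawl went - found pages, dead links and who pointed at them:

Code:
-- found vs. errored pages (crawlstate=-1 means the fetch or page failed)
SELECT crawlfound, crawlstate, COUNT(*) FROM crawl_pages
 GROUP BY crawlfound, crawlstate;

-- the pages that weren't found, and the page that pointed to each
SELECT p.url, r.url AS pointed_from
  FROM crawl_pages p LEFT JOIN crawl_pages r ON r.id = p.referrer
 WHERE p.crawlfound = 0;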

perkiset « Reply #7 on: April 26, 2007, 04:59:28 PM »

Here's the juice: the actual spiderlet.

This script is executed by the crawler. If you read the crawler script you will see that I currently let 10 run at once. This is an arbitrary number.

There is a little chunk at the beginning of the code that deals with "customWords" - this is a little file I use to "augment" certain words. For example, I have sites that deal with things that might be red, might be a bra, might be new etc. - and unfortunately, MySQL by default does not FULLTEXT index words that are only 3 chars long. You can lower that limit (ft_min_word_len) and rebuild your indexes, but it puts a burden on the search engine and suddenly things don't work quite so nicely. So instead, I have a little file that looks like this:

Code:
<?php

// I will have been called where the arrays customSearchWords and customReplaceWords
// already exist. I just need to add the words I want special-tagged here. Words that
// often need tagging are 3-letter words or incredibly common words.

$customSearchWords[] = 'aid';
$customReplaceWords[] = 'aidxx';

?>


... where 3-letter words are altered in such a way that they no longer look like normal words, but I can still use them when a surfer types one into a search box. For example, someone types in "aid" and in the background I actually search for "aidxx."
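The flip side happens at search time: run the surfer's query through the same arrays before it hits MySQL. A minimal sketch, assuming the same custom.searchWords.php file:

Code:
<?php
// Load the same substitution lists the spiderlet used...
$customSearchWords = array();
$customReplaceWords = array();
include('custom.searchWords.php');

// ...and augment the incoming term the same way, so "aid" becomes "aidxx"
// before it goes into the MATCH ... AGAINST.
$term = str_replace($customSearchWords, $customReplaceWords, trim($_GET['q']));
?>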

Another thing you'll notice about the code is that I exclude any file that doesn't look like it's going to return HTML to me... images, zips - you name it. That makes the handling of weirdness later easier.

<edit> Sorry - looks like the forum flubbed up the tabbing a bit... you'll need to clean yours manually</edit>

Code:
#! /usr/local/bin/php
<?php

//error_reporting(E_ALL);

require_once('./paths.inc');
require_once("$classPath/class.webrequest.php");
require_once("$classPath/class.dbconnection.php");
require_once("$rootPath/localvars.php");
$customWords = "$systemPath/custom.searchWords.php";

if (file_exists($customWords))
{
    $customSearchWords = array();
    $customReplaceWords = array();
    include($customWords);
    $GLOBALS['customsearch']['search'] = &$customSearchWords;
    $GLOBALS['customsearch']['replace'] = &$customReplaceWords;
}

$db = new dbConnection($db_host, $db_user, $db_password, $db_database);
$GLOBALS['utilDB'] = &$db;
$linkID = $_SERVER['argv'][1];
$spiderletID = $_SERVER['argv'][2];
$sessionID = $_SERVER['argv'][3];
$verbose = $_SERVER['argv'][4];
$quiet = ($verbose) ? false : true;

$spiderlet = new Spiderlet($linkID, $spiderletID, $quiet, $sessionID);
$spiderlet->Crawl();

class Spiderlet
{
    var $db;
    var $http;
    var $buffer;
    var $title;
    var $pageAvatar;
    var $content;
    var $linkID;
    var $releaseID;
    var $badLinks = array();
    var $badChars = array();
    var $links = array();
    var $quiet;
    var $sessionID;

    function __construct($myLinkID, $myReleaseID, $qt=true, $sID='')
    {
        $this->quiet = $qt;
        $this->sessionID = $sID;

        $this->linkID = $myLinkID;
        $this->releaseID = $myReleaseID;

        $this->db = &$GLOBALS['utilDB'];
        $this->http = new WebRequest();
        $this->http->Port = 80;
        $this->http->endOnBody = true;
        $this->http->timeout = 10;
        $this->http->succeedOnTimeout = true;

        // Create arrays for later...
        $this->badLinks[] = 'file\:';
        $this->badLinks[] = 'news\:';
        $this->badLinks[] = 'ftp\:';
        $this->badLinks[] = 'mailto\:';
        $this->badLinks[] = 'telnet\:';
        $this->badLinks[] = 'javascript\:';
        $this->badLinks[] = 'https\:';
        $this->badLinks[] = '\.gif';
        $this->badLinks[] = '\.jpg';
        $this->badLinks[] = '\.png';
        $this->badLinks[] = '\.pdf';
        $this->badLinks[] = '\.tar';
        $this->badLinks[] = '\.zip';
        $this->badLinks[] = '\.rpm';
        $this->badLinks[] = '\.mp3';
        $this->badLinks[] = '\.aac';
        $this->badLinks[] = '\.wmf';
        $this->badLinks[] = '\.mov';
        $this->badLinks[] = '\.com';
        $this->badLinks[] = '\.deb';
        $this->badLinks[] = '\.tgz';
        $this->badLinks[] = '\.gz';
        $this->badLinks[] = '\.rtf';
        $this->badLinks[] = '\.doc';
        $this->badLinks[] = '\.aiff';
        $this->badLinks[] = '\.wav';
        $this->badLinks[] = '\.tif';
        $this->badLinkStr = implode('|', $this->badLinks);

        $this->badChars[] = chr(13);
        $this->badChars[] = chr(9);
    }

    // Releasing my slot in crawl_spiderlets tells the dispatcher I am done
    function __destruct() { $this->db->query("delete from crawl_spiderlets where id={$this->releaseID}"); }

    function AcceptableLink($linkStr) { return (!preg_match("/{$this->badLinkStr}/", $linkStr)); }

    function Crawl()
    {
        // Get the job from the database...
        $this->db->query("select crawl_pages.*, domain from crawl_pages, crawl_domains where crawl_pages.id={$this->linkID} and crawl_domains.id=crawl_pages.siteid");
        $this->db->fetchArray();
        $siteID = $this->db->row['siteid'];

        // Set up the requestor...
        $this->http->Host = strtolower($this->db->row['domain']);
        $this->http->URL = $this->db->row['url'];

        // It's possible that the crawler main put a sessionID into me that I need to pass
        // to the page I call as a cookie...
        if ($this->sessionID) { $this->http->SetCookie('sessionid', $this->sessionID); }

        // Get the page
        if (!$buff = $this->http->Get())
        {
            $this->db->query("update crawl_pages set crawlstate=-1 where id={$this->linkID}");
            exit;
        }
        $this->buffer = $this->http->Content();
        $DNI = preg_match('/rfspider\: donotindex/', $this->buffer);

        // If it is an error, get out quick
        if (!(strpos($this->buffer, '404') === false))
        {
            $this->db->query("update crawl_pages set crawlstate=-1, pagetitle='', searchblob='{$this->buffer}' where id={$this->linkID}");
            exit;
        }

        // Distill the page into my content...
        $this->ExtractTitle();
        $this->ExtractPageAvatar();
        $this->GatherLinks();
        $this->FinalCleaning();

        // Update the database now...
        $now = date('Y-m-d H:i:s', time());
        $title = mysql_escape_string($this->title);
        $this->db->query("update crawl_pages set crawlstate=0, crawlfound=1, lastcrawl='$now', " .
            "pagetitle='$title', avatar='{$this->pageAvatar}', searchblob='{$this->content}' where id={$this->linkID}");

        // Now insert links. The table is UNIQUE indexed, so the insert will fail if the page already exists...
        for ($i=0; $i<count($this->links); $i++)
        {
            // Note that "to work" values of crawlstate and crawlfound are set by the DB...
            $newPage = $this->links[$i];
            if (($newPage == $this->http->URL) || ($newPage <= ' ')) { continue; }
            $this->db->query("insert into crawl_pages(siteid, url, referrer) values($siteID, '{$this->links[$i]}', {$this->linkID})", true);
        }

        // Now this looks rather bizarre, but if <this page> didn't want to
        // be in the index, eliminate it. It would, however, have contributed
        // to the links to-do list by now...
        if ($DNI) { $this->db->query("delete from crawl_pages where id={$this->linkID}"); }
    }

    function ExtractPageAvatar()
    {
        // The pageavatar is a graphic that can be used for search results.
        // If it's in the page, it'll be like this: <!-- pageavatar: /graphics/afile.jpg -->
        preg_match('/pageavatar:[ ]*([^ ]*)/i', $this->buffer, $matches);
        if ($matches[1]) { $this->pageAvatar = mysql_escape_string($matches[1]); }
    }

    function ExtractTitle()
    {
        $this->title = '[ No Page Title ]';
        preg_match('/\<title\>([^<]*)/i', $this->buffer, $matches);
        if ($matches[1]) { $this->title = $matches[1]; }

        $searchWords = &$GLOBALS['customsearch']['search'];
        if ($searchWords) { $this->title = str_replace($searchWords, $GLOBALS['customsearch']['replace'], $this->title); }
    }

    function FinalCleaning()
    {
        // Narrow the buffer to the marked content block if the page has one, otherwise to the body...
        $regex = (strpos($this->buffer, '<!-- startcontent')) ? '/\<\!\-\- startcontent \-\-\>(.*)\<\!\-\- endcontent/ismU' : '/\<body(.*)$/ismU';
        if (preg_match($regex, $this->buffer, $matches)) { $this->buffer = $matches[1]; }

        $cleanArr = array();
        $cleanArr[] = '/(\<script.*\<\/script\>)/imsU';
        $cleanArr[] = '/(\<style.*\<\/style\>)/imsU';
        $cleanArr[] = '/(\<\!\-\- hide.*endhide \-\-\>)/imsU';
        $replArr = array(' ', ' ', ' ');
        $this->buffer = preg_replace($cleanArr, $replArr, $this->buffer);
        $this->buffer = strip_tags(str_replace($this->badChars, '', $this->buffer));

        $searchWords = &$GLOBALS['customsearch']['search'];
        if ($searchWords) { $this->buffer = str_replace($searchWords, $GLOBALS['customsearch']['replace'], $this->buffer); }

        $outArr = array();
        $inArr = explode(chr(10), $this->buffer);
        foreach ($inArr as $line)
        {
            if ($line = trim($line)) { $outArr[] = $line; }
        }
        $this->content = implode(' ', $outArr);
        while (strpos($this->content, '  ') > 0) { $this->content = str_replace('  ', ' ', $this->content); }
        $this->content = mysql_escape_string($this->content);
    }

    function GatherLinks()
    {
        $rawBuff = $this->buffer;
        preg_match_all('/href="([^"]*)/ims', $rawBuff, $matches);
        foreach ($matches[1] as $thisURL)
        {
            if (preg_match('/http:/', $thisURL))
            {
                // It MIGHT be an outbound - keep it only if the host is mine...
                preg_match('/http:\/\/([^\/]*)(.*)$/', $thisURL, $parts);
                $thisHost = $parts[1];
                $thisURL = $parts[2];
                if (strtolower($thisHost) != strtolower($this->http->Host)) { continue; }
            }
            if ($this->AcceptableLink($thisURL) && !in_array($thisURL, $this->links))
            {
                array_push($this->links, $thisURL);
            }
        }
    }
}

?>

Caligula « Reply #8 on: April 26, 2007, 05:42:24 PM »

Awesome Perk! That must have taken forever! Great stuff bro.. thanks for sharing!
perkiset « Reply #9 on: April 26, 2007, 05:45:56 PM »

Glad you like it Calig - lemme know how it does for you.

/p
KaptainKrayola « Reply #10 on: April 26, 2007, 08:03:00 PM »

man perk your code is so neat and easy to follow.  9 thumbs up from the Kaptain

perkiset « Reply #11 on: April 26, 2007, 08:05:22 PM »

Thanks Kapn - that's because I'm an idiot and know how quickly I forget.

*6 months pass*

"I never wrote that! What kinda asshole wrote that?!?! It needs to be completely redone."

 ROFLMAO

So I try really hard to minimize self-inflicted Codezheimers.

/p
thedarkness « Reply #12 on: April 26, 2007, 11:23:20 PM »

Quote from: KaptainKrayola
"man perk your code is so neat and easy to follow. 9 thumbs up from the Kaptain"

The Kaptain has nine thumbs? Some sort of serious interbreeding? Was your Dad related to your Mum before they got married? Small town Tortuga huh? Isolated?

Cheers,
td
thedarkness « Reply #13 on: April 26, 2007, 11:24:47 PM »

"I never wrote that! What kinda asshole wrote that?!?! It needs to be completely redone."

Yeah, I often say that just b4 I realise I was the asshole (still am :-) )

td

P.S. Rockin' werk perk

[edit]Forgot to conjugate the verb "to go"[/edit]
KaptainKrayola « Reply #14 on: April 27, 2007, 07:55:32 AM »

What makes you think all the thumbs belong to the Kaptain?  What kind of pirate doesn't have extra thumbs laying around just in case?