The Cache: Technology Expert's Forum
 
Topic: filtering bad characters
dirk
Global Moderator
Expert
« Reply #15 on: July 10, 2007, 05:43:29 AM »

Bompa,

the empty curly braces {} mean that the string in the preceding curly braces
shall be replaced by nothing. So the special characters will be deleted.
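
A minimal sketch of that point, assuming a made-up sample string (the variable names are only for illustration): the same pattern is run once with an empty replacement part and once with a "?" in it, so the effect of the empty braces is visible.

```perl
use strict;
use warnings;

# Hypothetical sample: "caf", one byte above \x7E (0xE9), then " au lait".
my $deleted  = "caf\xE9 au lait";
my $replaced = $deleted;

# Empty second pair of braces: each matched character is replaced by nothing.
$deleted =~ s{[^\x00-\x7E]}{}g;

# Non-empty second pair: each matched character becomes "?" instead.
$replaced =~ s{[^\x00-\x7E]}{?}g;

print "$deleted\n";   # caf au lait
print "$replaced\n";  # caf? au lait
```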

Dirk
Bompa
Administrator
Lifer
« Reply #16 on: January 03, 2008, 05:07:57 AM »

Quote from: dirk
Bompa,

the empty curly braces {} mean that the string in the preceding curly braces
shall be replaced by nothing. So the special characters will be deleted.

Dirk

ahhh, I finally get it. 

Bompa <--- SLOW

You have an extra curly brace cuz YOU HAVE TO in order to have braces in pairs,
whereas if we delimit with slashes, we can use just three.

damn!


Whenever I point my finger, I have three pointing back at me.
dirk
Global Moderator
Expert
« Reply #17 on: January 03, 2008, 10:06:59 AM »

Bompa,

here are some more examples:

Code:
$string =~ s/[^\x00-\x7E]//;
$string =~ s|[^\x00-\x7E]||;
$string =~ s~[^\x00-\x7E]~~;

$string =~ s([^\x00-\x7E])();
$string =~ s[[^\x00-\x7E]][];
$string =~ s{[^\x00-\x7E]}{};

Normally only three delimiters are required.

But if you use brackets you need 4 (two pairs).
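
As a rough check of the above, here is a small sketch (the sample string and variable names are assumptions, not from the thread) that runs the same substitution with several delimiter styles. It adds the g modifier so every match is removed; without it, as in the examples above, only the first match goes.

```perl
use strict;
use warnings;

# Hypothetical sample with two characters above \x7E (0xFC and 0xDF).
my $input = "Gr\xFC\xDFe";

# Same pattern, different delimiters; each copy is modified independently.
(my $slashes = $input) =~ s/[^\x00-\x7E]//g;
(my $pipes   = $input) =~ s|[^\x00-\x7E]||g;
(my $braces  = $input) =~ s{[^\x00-\x7E]}{}g;
(my $squares = $input) =~ s[[^\x00-\x7E]][]g;

# Without /g only the first offending character is removed.
(my $first_only = $input) =~ s/[^\x00-\x7E]//;

print "$slashes $pipes $braces $squares\n";  # Gre Gre Gre Gre
print length($first_only), "\n";             # 4
```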
Bompa
Administrator
Lifer
« Reply #18 on: October 23, 2008, 08:40:24 PM »

Using a regex you could strip all weird characters and keep only ASCII 0 - 126:
Code:
$string =~ s{ ( [^\x00-\x7E] ) }{}xmsg;   # keeps \x00-\x7E (ASCII 0 - 126); DEL (127) would need \x7F


This thread is old, but it may need this bump, or I'm reading the docs wrong (which is very likely).

ASCII 0-1F hex (0-31 decimal) are all control characters, I think.

So, to allow only printable characters and remove all others, shouldn't the basic regex be...

If the character is not 20h to 7Eh, remove it.

Code:
$string =~ s/[^\x20-\x7E]//g;


Bompa
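
A short sketch of the printable-only filter described above, assuming a made-up input string. It also shows tr with the c (complement) and d (delete) modifiers as one possible alternative to the substitution; both variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical input: BEL (\x07), a tab, DEL (\x7F) and one high byte (\xE9)
# mixed in between the letters a..e.
my $input = "a\x07b\tc\x7Fd\xE9e";

# Remove every character that is not in the printable range 20h-7Eh.
(my $printable = $input) =~ s/[^\x20-\x7E]//g;

# Alternative: tr///cd deletes everything outside the listed range.
(my $via_tr = $input) =~ tr/\x20-\x7E//cd;

print "$printable\n";  # abcde
print "$via_tr\n";     # abcde
```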
perkiset
Olde World Hacker
Administrator
Lifer
« Reply #19 on: October 24, 2008, 09:05:03 AM »

Although I'm no good with Perl syntax, you're right about the ASCII values for exclusion, Bomps. 0..31 are indeed unprintable control characters, although it is arguable that you might want to send a chr(7) [beep] to the printer to drive the operator insane. It's been known to happen ;)

It is now believed, that after having lived in one compound with 3 wives and never leaving the house for 5 years, Bin Laden called the U.S. Navy Seals himself.
dirk
Global Moderator
Expert
« Reply #20 on: October 24, 2008, 01:57:17 PM »

If you strip ASCII 0-1F hex (0-31 decimal), you would also filter out tabs, line feeds and carriage returns.

This may be ok, but it depends on the purpose of the filtering.
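
If tabs and line breaks should survive, one possible variant (a sketch under the assumption that only tab, line feed and carriage return are wanted from the control range; the sample string is made up) adds those three code points to the character class:

```perl
use strict;
use warnings;

# Hypothetical input: tab-separated text with CRLF, plus BEL (\x07) and DEL (\x7F).
my $input = "col1\tcol2\r\nbell\x07 del\x7F\n";

# Keep printable ASCII plus tab (\x09), line feed (\x0A) and carriage return (\x0D).
(my $keep_ws = $input) =~ s/[^\x09\x0A\x0D\x20-\x7E]//g;

# Strict printable-only filter for comparison: whitespace controls go too.
(my $strict = $input) =~ s/[^\x20-\x7E]//g;

print length($input), "\n";    # 22
print length($keep_ws), "\n";  # 20
print length($strict), "\n";   # 16
```

Which of the two is right depends, as said above, on what the filtered text is used for.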