Removing HTML tags

Language/C 2012. 1. 5. 08:30
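$search = array ("'<script[^>]*?>.*?</script>'si", // remove JavaScript blocks together with their contents
                    "'<!--.*?-->'s",                // remove comments (non-greedy, and before the generic tag rule)
                    "'<[\/\!]*?[^<>]*?>'si",        // remove HTML tags
                    "'([\r\n])[\s]+'",              // collapse whitespace that follows a line break
                    "'&(quot|#34);'i",              // replace HTML entities
                    "'&(lt|#60);'i", 
                    "'&(gt|#62);'i", 
                    "'&(nbsp|#160);'i", 
                    "'&(iexcl|#161);'i", 
                    "'&(cent|#162);'i", 
                    "'&(pound|#163);'i", 
                    "'&(copy|#169);'i"); 
   $replace = array ("", "", "", "\\1", "\"", "<", ">", " ", chr(161), chr(162), chr(163), chr(169)); 
   $body = preg_replace($search, $replace, $body);
   // Other numeric entities (&#NNN;): the /e modifier ("execute as PHP") was removed in PHP 7,
   // so use a callback instead. chr() yields a single byte (0-255), as in the original.
   $body = preg_replace_callback("'&#(\d+);'", function ($m) { return chr((int)$m[1]); }, $body);
   // Decode &amp; last so that "&amp;lt;" becomes "&lt;" rather than "<".
   $body = str_ireplace("&amp;", "&", $body);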
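The stripping stage of the PHP code above (script blocks, comments, tags, whitespace after line breaks) can be sketched in Java with `java.util.regex`. This is a minimal sketch, not the author's code: the class name `HtmlStripper` and the method `strip` are hypothetical, and the patterns are translated from the PHP arrays with the comment rule made non-greedy and placed before the generic tag rule.

```java
import java.util.regex.Pattern;

public class HtmlStripper {
    // Ordered patterns, mirroring the PHP $search array.
    // Order matters: <script> blocks and comments must go before the generic tag rule.
    private static final Pattern[] PATTERNS = {
        Pattern.compile("<script[^>]*?>.*?</script>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL),
        Pattern.compile("<!--.*?-->", Pattern.DOTALL),
        Pattern.compile("<[/!]*?[^<>]*?>"),
        Pattern.compile("([\r\n])\\s+")
    };
    // Replacement for each pattern; "$1" keeps the captured line break.
    private static final String[] REPLACEMENTS = { "", "", "", "$1" };

    /** Removes script blocks, comments and tags, then collapses whitespace after line breaks. */
    public static String strip(String html) {
        String text = html;
        for (int i = 0; i < PATTERNS.length; i++) {
            text = PATTERNS[i].matcher(text).replaceAll(REPLACEMENTS[i]);
        }
        return text;
    }

    public static void main(String[] args) {
        String html = "<p>Hello <b>world</b></p>\n   <script>alert(1);</script><!-- note -->done";
        System.out.println(strip(html)); // prints "Hello world" and "done" on two lines
    }
}
```

Regular expressions are not an HTML parser: a `>` inside an attribute value or malformed markup can defeat these patterns, so the result is suitable as plain text for display, not as sanitized HTML.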

  1. A regular expression that removes tags of the form <...>
title = title.replaceAll("<(/)?([a-zA-Z][a-zA-Z0-9]*)(\\s[^>]*)?(/)?>", "");  // tag names may contain digits (<h1>); any attributes allowed
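`String.replaceAll` compiles its regex on every call, so when many strings are cleaned it is cheaper to compile the pattern once. A minimal sketch, assuming a generalized tag pattern (tag name starting with a letter and possibly containing digits, any attributes, optional self-closing slash); the class `TagRemover` and method `removeTags` are hypothetical names.

```java
import java.util.regex.Pattern;

public class TagRemover {
    // Opening, closing or self-closing tag whose name starts with a letter.
    // Requiring a letter keeps text such as "1 < 2 and 3 > 2" intact.
    private static final Pattern TAG =
        Pattern.compile("<(/)?([a-zA-Z][a-zA-Z0-9]*)(\\s[^>]*)?(/)?>");

    public static String removeTags(String title) {
        return TAG.matcher(title).replaceAll("");
    }

    public static void main(String[] args) {
        // prints "Title with link"
        System.out.println(removeTags("<h1>Title</h1> with <a href=\"/x\">link</a><br/>"));
    }
}
```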


2. A regular expression that removes escaped tags written with &lt; and &gt;
title = title.replaceAll("&lt;?(/)?([a-zA-Z][a-zA-Z0-9]*)(\\s.*?)?(/)?&gt;?", "");  // lazy .*? stops at the first &gt
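In escaped markup there is no literal `>`, so a greedy `[^>]*` inside the pattern can run on to the last `&gt` in the string and swallow the text between tags. The sketch below uses a lazy `.*?` for the attribute part instead, so each match ends at the first `&gt`. The pattern variant, the class `EscapedTagRemover`, and the method `removeEscapedTags` are assumptions for illustration.

```java
import java.util.regex.Pattern;

public class EscapedTagRemover {
    // Escaped tag such as "&lt;b&gt;" or "&lt;a href=&quot;/x&quot;&gt;".
    // The attribute part is lazy, so the match stops at the first "&gt".
    private static final Pattern ESCAPED_TAG =
        Pattern.compile("&lt;?(/)?([a-zA-Z][a-zA-Z0-9]*)(\\s.*?)?(/)?&gt;?");

    public static String removeEscapedTags(String title) {
        return ESCAPED_TAG.matcher(title).replaceAll("");
    }

    public static void main(String[] args) {
        // prints "link text"
        System.out.println(removeEscapedTags("&lt;a href=&quot;/x&quot;&gt;link&lt;/a&gt; text"));
    }
}
```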


<?php
// $document should contain an HTML document.
// This will remove HTML tags, javascript sections
// and white space. It will also convert some
// common HTML entities to their text equivalent.
$search = array ("'<script[^>]*?>.*?</script>'si",  // Strip out javascript
                 "'<[/!]*?[^<>]*?>'si",            // Strip out HTML tags
                 "'([\r\n])[\s]+'",                // Strip out white space
                 "'&(quot|#34);'i",                // Replace HTML entities
                 "'&(lt|#60);'i",
                 "'&(gt|#62);'i",
                 "'&(nbsp|#160);'i",
                 "'&(iexcl|#161);'i",
                 "'&(cent|#162);'i",
                 "'&(pound|#163);'i",
                 "'&(copy|#169);'i");
$replace = array ("",
                  "",
                  "\\1",
                  "\"",
                  "<",
                  ">",
                  " ",
                  chr(161),
                  chr(162),
                  chr(163),
                  chr(169));
$text = preg_replace($search, $replace, $document);
// Other numeric entities: the /e modifier ("evaluate as php") was removed in PHP 7,
// so a callback does the same job.
$text = preg_replace_callback("'&#(\d+);'", function ($m) { return chr((int)$m[1]); }, $text);
// Decode &amp; last so that "&amp;lt;" is not decoded twice.
$text = str_ireplace("&amp;", "&", $text);
?>
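The PHP code above turns the remaining `&#NNN;` entities into characters. For comparison, here is a minimal Java sketch of that decoding step, assuming Java 9+ for `Matcher.replaceAll(Function)`, which plays the role of `preg_replace_callback`. The class `EntityDecoder` and method `decode` are hypothetical; named entities are matched case-sensitively here, and unlike PHP's `chr()`, which yields one byte (0-255), the number is decoded as a Unicode code point. `&amp;` is decoded last so that already-escaped text is not decoded twice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityDecoder {
    // Decimal numeric entity such as "&#169;"; at most 7 digits so parseInt cannot overflow.
    private static final Pattern NUMERIC = Pattern.compile("&#(\\d{1,7});");

    /** Decodes a few named entities and decimal numeric entities; &amp; is decoded last. */
    public static String decode(String text) {
        String s = text.replace("&quot;", "\"")
                       .replace("&lt;", "<")
                       .replace("&gt;", ">")
                       .replace("&nbsp;", " ");
        // Each replacement is computed from the match, like a PHP callback.
        s = NUMERIC.matcher(s).replaceAll(m -> {
            int cp = Integer.parseInt(m.group(1));
            // Leave invalid code points (e.g. &#9999999;) untouched.
            String out = Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : m.group();
            return Matcher.quoteReplacement(out); // "$" and "\" must not be read as group references
        });
        // Last, so that "&amp;lt;" becomes "&lt;" instead of "<".
        return s.replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(decode("&lt;b&gt; &#169; &#36;5 &amp;lt;"));
    }
}
```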
 
