#3070 closed defect (bug) (wontfix)
use of php's "strip_tags" gives improper/incomplete results
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | Priority: | high | |
Severity: | major | Version: | 2.1 |
Component: | Administration | Keywords: | |
Focuses: | Cc: |
Description
Wordpress uses the PHP strip_tags function at least 25 times.
The built-in function is known to have serious limitations including faults with SCRIPT and STYLE tags which can cause security issues and/or cause non-validation failures (ie. javacript is inserted into RSS feeds)
Solution: replace with the superior html2txt example as shown in the PHP manual
function html2txt($document){ $search = array('@<script[^>]*?>.*?</script>@si', // Strip out javascript '@<[\/\!]*?[^<>]*?>@si', // Strip out HTML tags '@<style[^>]*?>.*?</style>@siU', // Strip style tags properly '@<![\s\S]*?--[ \t\n\r]*>@' // Strip multi-line comments including CDATA ); $text = preg_replace($search, '', $document); return $text; }
Change History (11)
#3
@
17 years ago
echo html2txt("<><script<>>alert('hey');</scr<>ipt>");
I'd say strip_tags is much better.
#4
@
17 years ago
weird, strip_tags is not removing javascript in my posts on my real-world setup for RSS (it's only php 4.4.1 but still it's not that old)
html2txt seems to grab it all?
I'll have to test and compare more carefully and post examples soon.
#5
@
17 years ago
Okay part of the problem with html2txt as it is posted on php.net is the array search order is bad. It will pass your test if you move the HTML strip to the first item:
$search = array('@<[\/\!]*?[^<>]*?>@si', // Strip out HTML tags '@<script[^>]*?>.*?</script>@si', // Strip out javascript '@<style[^>]*?>.*?</style>@siU', // Strip style tags properly '@<![\s\S]*?--[ \t\n\r]*>@' // Strip multi-line comments including CDATA );
#6
@
17 years ago
Okay I've figured out the problem is with desired behavior (and that I am not explaining it enough).
If javascript is used within a post (or possibly a comment if that is allowed) the problem is strip_tags will remove SCRIPT tags ONLY and leave the code inbetween!
So your post via RSS will look like:
blah blah blah document.write("example"); blah blah
html2txt will fix that behavior by stripping the code between SCRIPT first, then processing HTML tags (ignore my suggestion to change the processing order array in the previous comment).
You are correct in that it has a weakness for purposely maligned tags.
There must be a way to harden it, and I am working on that.
Certainly you'd agree that leaving the javascript code behind after removing SCRIPT tags is bad behavior via strip_tags?
#7
@
17 years ago
Okay this code is somewhat nasty but works.
Perhaps some PHP pros can improve upon it but keep it's functionality.
It now strips script/style/embed/object/iframe which are the most serious not just for attacks but for honest cleanup of your posts to RSS viewing.
<? echo html2txt("1. <><script< >>alert('hey <br> html2txt <scr<>ipt>');</scr<>ipt>")."<br>"; echo html2txt("2. before <scr<>ipt>document.write('html2txt <scr<>ipt>');</script> after ")."<br>"; echo strip_tags("3. <><script< >>alert('hey <br> strip_tags <scr<>ipt>');</scr<>ipt> ")."<br>"; echo strip_tags("4. before <scr<>ipt>document.write('strip_tags <scr<>ipt>');</script> after")."<br>"; function html2txt($text){ $search = array('@<>@', '@<script[^>]*?>.*?</script>@siU', // Strip out javascript '@<style[^>]*?>.*?</style>@siU', // Strip style tags properly '@<embed[^>]*?>.*?</embed>@siU', // embed '@<object[^>]*?>.*?</object>@siU', // object '@<iframe[^>]*?>.*?</iframe>@siU', // iframe '@<![\s\S]*?--[ \t\n\r]*>@', // Strip multi-line comments including CDATA '@</?[^>]*>*@' // html tags ); while($text != strip_tags($text)) { $text = preg_replace($search, '', $text); } return $text; } ?>
#8
@
17 years ago
I have been having some problems with some neato add-ons and have suspected some such problem.
Possible examples
- I use Ultimate Tag Warrior. With 2.0.3 I could search ?tag=Christine+Davis and get all posts tagged Christine AND tagged Davis. With 2.0.4 the results are for posts tagged "Christine Davis"
- I use k2 theme and had modified my sidebar using SBM so that my delicious linkroll appeared and Technorati ID. Both are javascript.
Now Sidebar Module seems to remove all javascript from these added modules preventing their function.
#9
@
17 years ago
An update:
http://getk2.com/forum/showthread.php?p=10216
Which is consternation about the Ajax used in Sidebar Module breaking other stuff. In other words, reverting to an earlier pre-SBM k2 theme eliminated problem #1 -- Lost Multi-tag Search Functionailty.
Apologies, this is the correct link to the 25 references
http://www.redalt.com/xref/trunk/_functions/strip_tags.htm