WordPress.org

Make WordPress Core

Opened 4 years ago

Closed 2 years ago

#32257 closed enhancement (fixed)

Patch: add support for multi-line textarea sanitization

Reported by: ottok Owned by: chriscct7
Milestone: 4.7 Priority: normal
Severity: normal Version:
Component: Formatting Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

New function: sanitize_textarea_field

Add new function that can be used to sanitize textarea inputs (or outputs) which unlike the regular sanitize_text_field does not strip line breaks.

This kind of sanitization is demanded by users and WP should definately provide such basic functions in the name of security. Otherwise users will use other less thorough sanitization solutions, like only wp_kses(). See examples at
https://wordpress.org/support/topic/sanitizing-a-text-field-form-but-keep-line-breaks

Also removed trailing spaces from file.

Development done at https://github.com/WordPress/WordPress/compare/master...ottok:feature-sanitize-textarea-field

Attachments (9)

sanitize-textarea-field.svn-format.patch (4.9 KB) - added by ottok 4 years ago.
sanitize-textarea-field.svn-format.2.patch (2.0 KB) - added by ottok 4 years ago.
Reworked patch
sanitize-textarea-field.svn-format.3.patch (2.2 KB) - added by ottok 4 years ago.
version 3
feature-sanitize-textarea-field.git-no-prefix.patch (3.8 KB) - added by ottok 3 years ago.
Refreshed patch for 4.6 and after review by Nikolay Bachiyski
32257.patch (5.3 KB) - added by ottok 2 years ago.
Refreshed patch for 4.7
32257.diff (7.4 KB) - added by pento 2 years ago.
32257.2.diff (7.7 KB) - added by pento 2 years ago.
32257.2.2.diff (7.7 KB) - added by chriscct7 2 years ago.
Corrects issue with docbloc variable not matching function variable name by changing $keep_whitespace to $keep_newlines which more accurately reflects what the parameter does
32257.2.3.diff (7.8 KB) - added by ottok 2 years ago.

Download all attachments as: .zip

Change History (34)

#1 follow-up: @iandunn
4 years ago

Hi ottok, I can't speak to whether this functionality is desired or not, but if it is, I have a few implementation recommendations.

it looks like sanitize_textarea_field() is almost identical to sanitize_text_field(), just minus the bits that strip newlines. Rather than duplicating code -- which creates lots of maintenance issues -- would it be better to add a new parameter to sanitize_text_field() to control whether or not whitespace is stripped?

It would default to stripping whitespace, so that the existing behavior is maintained. Then, a wrapper named sanitize_textarea_field() could optionally be introduced to call sanitize_text_field() with the new parameter set to not strip whitespace.

function sanitize_text_field( $str, $whitespace = 'strip' ) {
    // ...
}

function sanitize_textarea_field( $str ) {
    sanitize_text_field( $str, 'preserve' );
}

Also, it's best to have separate tickets/patches for separate issues, rather than mixing unrelated things together. IIRC, formatting-only changes are generally rejected since they force other patches to be refreshed against the new code. Instead, formatting improvements are made at the same time that those lines need to be changed for a more substantial reason.

#2 in reply to: ↑ 1 @ottok
4 years ago

  • Keywords has-patch added

Replying to iandunn:

Thanks for the feedback, I attached now a reworked patch

#3 follow-up: @iandunn
4 years ago

Looks good :)

A few minor notes, the coding standards ask for self-explanatory flag values for function arguments and spaces before parenthesis.

#4 in reply to: ↑ 3 @ottok
4 years ago

Replying to iandunn:

All of the code I added follows the WP coding standards rules on whitespace, but I now also styled some of the code inherited from the old function. I can format the whole sanitize_text_field() if you think current new patch isn't enough.

(Github has a very good visual diff: https://github.com/WordPress/WordPress/compare/master...ottok:feature-sanitize-textarea-field that shows what spaces where added to what line)

I understand the self-explanatory flag rule, but in this case I think decreases readability and anyways the rest of the same file uses quite a lot of true/false flags, and the name of the variables are long enough to make those cases easy to read and understand:

$ grep -E '^function (.*)=' wp-includes/formatting.php 
function wptexturize($text, $reset = false) {
function wpautop($pee, $br = true) {
function _wp_specialchars( $string, $quote_style = ENT_NOQUOTES, $charset = false, $double_encode = false ) {
function wp_specialchars_decode( $string, $quote_style = ENT_NOQUOTES ) {
function wp_check_invalid_utf8( $string, $strip = false ) {
function utf8_uri_encode( $utf8_string, $length = 0 ) {
function sanitize_user( $username, $strict = false ) {
function sanitize_title( $title, $fallback_title = '', $context = 'save' ) {
function sanitize_title_with_dashes( $title, $raw_title = '', $context = 'display' ) {
function sanitize_html_class( $class, $fallback = '' ) {
function convert_chars($content, $deprecated = '') {
function balanceTags( $text, $force = false ) {
function format_to_edit( $content, $richedit = false ) {
function antispambot( $email_address, $hex_encoding = 0 ) {
function is_email( $email, $deprecated = false ) {
function get_gmt_from_date( $string, $format = 'Y-m-d H:i:s' ) {
function get_date_from_gmt( $string, $format = 'Y-m-d H:i:s' ) {
function iso8601_to_datetime($date_string, $timezone = 'user') {
function human_time_diff( $from, $to = '' ) {
function wp_trim_excerpt($text = '') {
function wp_trim_words( $text, $num_words = 55, $more = null ) {
function esc_url( $url, $protocols = null, $_context = 'display' ) {
function esc_url_raw( $url, $protocols = null ) {
function wp_html_excerpt( $str, $count, $more = null ) {
function links_add_base_url( $content, $base, $attrs = array('src', 'href') ) {
function links_add_target( $content, $target = '_blank', $tags = array('a') ) {
function wp_strip_all_tags($string, $remove_breaks = false) {
function sanitize_text_field( $str, $remove_newlines = true ) {
function wp_basename( $path, $suffix = '' ) {

However, since you took the time to review my code I wanted to humor you and changed it to use a verbose flag :)

#5 @iandunn
4 years ago

Yeah, you'll usually find lots of places in the code that don't follow the standards, usually from before the standards were codified. Any new or modified code should follow them, though.

#6 @tomauger
4 years ago

Wow, I totally hate using a string instead of a boolean flag - unless you take the trouble to create a constant for all valid options to be passed to the string variable, which is definitely overkill in the case where a Boolean is clearly the correct variable type.

Sure, the code is more readable at a glance, but it's less writable - every time you're using a new method, instead of an obviously-named boolean flag, you have to refer to the source code and look up the allowed values. Is it "strip", "remove", "clean", "none"? Darn, better look it up.

As for readability of USAGE, well, in most IDEs you can just insert the cursor and lookup the function signature.

Sorry, I realize this isn't the place to debate already documented styleguide points, but I thought @Ottok's initial approach with the boolean flag was the way to go.

/soapbox_rant

@ottok
3 years ago

Refreshed patch for 4.6 and after review by Nikolay Bachiyski

#7 @ottok
3 years ago

Fixed a typo. To avoid sending same patch over and over, instead I made this branch and this time based on the real development repository (not Github build mirror):

https://github.com/ottok/WordPress-develop/compare/master...feature-sanitize-textarea-field.patch

#8 @ottok
3 years ago

@nbachiyski I would appreciate it very much if you have time to do the last step in the review, thanks!

#9 @rinatkhaziev
3 years ago

@ottok Why not just use wp_filter_nohtml_kses() ? It does exactly that - stripping html but keeping linebreaks

#10 @ottok
3 years ago

  • Version 4.3 deleted

@rinatkhaziev Your suggestion is to rewrite sanitize_textarea_field() to use wp_filter_nohtml_kses()?

I think we need a function with the name sanitize_textarea_field() so that it is easy to discover for developers. Much of security is about makin secure choices easy to make by default. What the implementation of the function is, I like the current best, but I wan rewrite it (again) if you commit to actually accepting the patch then. I've worked on this on multiple occasion for over a year.. I hope it would be accepted now.

#11 @ottok
3 years ago

  • Keywords has-unit-tests added
  • Version set to trunk

Can you please already accept this patch? It has got plenty of review and was basically rewritten when reviewed in-person with @nbachiyski to be perfect, with unit tests and all.

I stuble across all the time live sites where people skip sanitization because they don't like the multi-line form data becomes a one-line data via the usual sanitize_text_field() function.

I'd hate to publish a plugin merely to get a sensible and easy to use sanitize_textarea_field() function out there. This really belongs to the core.

The function wp_filter_nohtml_kses() suggested above is not an equivalent to sanitize_text_field in either function nor name (think developer usability). We don't want to strip away HTML here, but rather convert tags into entities that are more secure to transport and display.

Come on, this is a really minor change and almost impossible to have regressions but with the potential to stop HTML/octet etc injections in an easy whay that developers are much more likely to use. We need to help developers make secure code, and we help them by providing ready-made and well reviewed sanitization functions for all common scenarios. And using textarea instead of just one-line "input type=text" is a scenario that is very common, but which WordPress does not yet have covered.

#12 @ottok
3 years ago

Please read the latest version of the patch https://core.trac.wordpress.org/attachment/ticket/32257/feature-sanitize-textarea-field.git-no-prefix.patch and you'll see it has unit tests, it is well implemented and not likely to have regressions, so it should be a no brainer to apply.

This issue has older iterations of the patch attached too. I don't know how to remove them (there is not button for that) but I hope nobody reads them and gets confused. Bases on the discussion here and questions regarding kses() use I assume people didn't actually read the patch and the accompanying git commit message.

Please run:

git pull https://github.com/ottok/WordPress-develop.git feature-sanitize-textarea-field

#13 @ottok
2 years ago

Hello!

I've been waiting fo 4 months without progress here.. I have the code, matching unit tests and review completed at WordCamp Finland contributor day. All that is missing is for somebody to commit this:
https://github.com/ottok/WordPress-develop/commit/24fdddfdfdb436a18b621f3759e9750ca6a5ed4b

Please, please :)

#14 @chriscct7
2 years ago

  • Component changed from Security to Formatting
  • Milestone changed from Awaiting Review to 4.7
  • Owner set to chriscct7
  • Status changed from new to reviewing
  • Version 4.6 deleted

I'll take a stab at reviewing this and see if I can finish reviewing this + get the required patch files generated in time, but as a reminder the enhancement window is closing for the 4.7 release on Wednesday.

#15 @ottok
2 years ago

@chriscct7 Thanks! I will rebase the patch on the latest master now when I know somebody will review it shortly. The patch is formatted as recommended at https://make.wordpress.org/core/2014/01/15/git-mirrors-for-wordpress/ but I am happy to reformat it if I'd know what the exact requirement is.

#16 @ottok
2 years ago

@chriscct7 I have now rebased the patch to apply cleanly on the latest development trunk of today. I created the patch with svn diff as documented at https://make.wordpress.org/core/handbook/tutorials/working-with-patches/ to make sure the format is OK to you. (After many years on git my brain bleeds every time I need to use svn, but I really want to make sure it is easy for you to apply this patch :))

I also re-tested that the test suite still passes:

$ phpunit tests/phpunit/tests/formatting/SanitizeTextField.php
Installing...
Running as single site... To run multisite, use -c tests/phpunit/multisite.xml
Not running ajax tests. To execute these, use --group ajax.
Not running ms-files tests. To execute these, use --group ms-files.
Not running external-http tests. To execute these, use --group external-http.
PHPUnit 5.6.1 by Sebastian Bergmann and contributors.

.                                                                   1 / 1 (100%)

Time: 3.02 seconds, Memory: 20.00MB

OK (1 test, 26 assertions)

Thanks to having a Travis-CI available in WordPress nowadays, I also activated it and the result is visible here (all passing): https://travis-ci.org/ottok/WordPress-develop/builds/169934794

@ottok
2 years ago

Refreshed patch for 4.7

@pento
2 years ago

@pento
2 years ago

#17 @pento
2 years ago

Thank you for your patience with this ticket, @ottok. Your persistence is greatly appreciated. :-)

A minor workflow note - I appreciate that GitHub is your preferred development environment, but for the purposes of reviewing changes, patches must be uploaded to Trac - it's currently the only way we can ensure that code hasn't changed between a review being requested, and the review happening.

Anyway, on to the change! In 32257.2.diff:

  • Convert the unit tests to a dataProvider model, and adds a few more tests.
  • Minor documentation and code tweaks.

I'm concerned about the tag stripping behaviour, I've included a deliberately failing unit test to demonstrate - a lone < isn't being converted to &gt;.

#18 @ottok
2 years ago

@pento I have uploaded patches to Trac. Latest one is 32257.patch. I am providing git links as it is much easier to review and rebase them as needed compared to the workflow in svn. But of course, you don't need to use git.

I haven't written dataProvider tests before and I think that rewriting the original sanitize_text_field() is out of the scope of the title of this issue, but I am happy you seem to have already done so. For my part this is OK and you could go ahead and merge it.

@chriscct7
2 years ago

Corrects issue with docbloc variable not matching function variable name by changing $keep_whitespace to $keep_newlines which more accurately reflects what the parameter does

#19 @chriscct7
2 years ago

  • Status changed from reviewing to accepted

#20 @chriscct7
2 years ago

  • Keywords commit added

Pending @pento's approval, this appears ready for commit.

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


2 years ago

#22 @pento
2 years ago

  • Keywords commit removed

The failing unit test I included is still failing.

sanitize_textarea_field( "foo <\ndiv\n> bar" ) produces "foo <\ndiv\n> bar", when it should produce "foo &lt;\ndiv\n> bar". The more I think about it, the more I'm concerned that this could be a vector for an XSS attack - if the textarea is sanitised using sanitize_textarea_field(), but then the \n is stripped sometime later when displaying the content, it will start parsing as HTML, bypassing earlier KSES checking.

#23 @ottok
2 years ago

The test fails with the current sanitize_text_field() exactly in the same way. So you want me to fix the original sanitize_text_field() in the same go? OK, I'll take a shot at it.

1) Tests_Formatting_SanitizeTextField::test_sanitize_text_field with data set #5 ('foo <\ndiv\n> bar', array('foo < div > bar', 'foo &lt;\ndiv\n> bar'))
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'foo &lt;
+'foo <
 div
 > bar'

#24 @ottok
2 years ago

The original function allowed some characters to go without being replaced by html entities. Are you now suggesting we should due to security reasons always encode everything as html entities? Or do you want only the special case of '<\n' to be encoded and all other whitespace followed '<' to remain as is?

I assume the latter, that you are concerned only about this special case, an a patch for that is attached.

@ottok
2 years ago

#25 @pento
2 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 38944:

General: Add a sanitize_textarea_field() function.

Like its predecessor (sanitize_text_field()), sanitize_textarea_field() is a helper function to sanitise user input. As the name suggests, this function is for sanitising input from textarea fields - it strips tags and invalid UTF-8 characters, like sanitize_text_field(), but retains newlines and extra inline whitespace.

Props ottok, nbachiyski, chriscct7, pento.
Fixes #32257.

Note: See TracTickets for help on using tickets.