Make WordPress Core

Opened 9 years ago

Closed 7 years ago

#30866 closed defect (bug) (fixed)

wordpress-Importer plugin xml parser unit testing fail.

Reported by: edwardkim's profile edwardkim Owned by:
Milestone: WordPress.org Priority: normal
Severity: normal Version: 4.1
Component: Import Keywords: has-patch needs-testing
Focuses: Cc:

Description

Wordpress-importer unit testing is fail. In plugins/wordpress-importer/parsers.php, XML CDATA handler trim a string even nested CDATA.

1) Tests_Import_Parser::test_escaped_cdata_closing_sequence
Parser WXR_Parser_XML
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'Content with nested <![CDATA[ tags ]]> :)'
+'Content with nested <![CDATA[ tags]]> :)'

tests/phpunit/tests/import/parser.php:159

Attachments (3)

30866.diff (915 bytes) - added by edwardkim 9 years ago.
30866-1.diff (776 bytes) - added by edwardkim 9 years ago.
indent fixed
30866.2.patch (582 bytes) - added by ianmjones 9 years ago.
Refreshed and Yodafied patch.

Download all attachments as: .zip

Change History (11)

@edwardkim
9 years ago

@edwardkim
9 years ago

indent fixed

#1 follow-up: @boonebgorges
9 years ago

  • Keywords reporter-feedback added

Hi edwardkim - I can't reproduce this. Running trunk or stable wordpress-importer, the test is passing for me. Can you share more details about your setup?

#2 in reply to: ↑ 1 @edwardkim
9 years ago

Replying to boonebgorges:

Hi edwardkim - I can't reproduce this. Running trunk or stable wordpress-importer, the test is passing for me. Can you share more details about your setup?

Sure. I'm using MAMP on OSX. This is my environment below:

MAC OSX 10.10.1 (14B25)

PHP 5.6.2 (cli) (built: Oct 20 2014 16:21:27) 
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.6.0, Copyright (c) 1998-2014 Zend Technologies

Apache/2.2.29 (Unix) mod_fastcgi/2.4.6 mod_wsgi/3.4 Python/2.7.8 PHP/5.6.2 mod_ssl/2.2.29 OpenSSL/0.9.8za DAV/2 mod_perl/2.0.8 Perl/v5.20.0
Database client version: libmysql - 5.5.38
PHP extension: mysqli Documentation

MySQL 5.5.38 - Source distribution

Also I tested other version of php.

PHP 5.3.29 (cli) (built: Oct 20 2014 16:12:01) 
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2014 Zend Technologies

PHP 5.4.34 (cli) (built: Oct 20 2014 16:16:28) 
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2014 Zend Technologies

The code from svn. This is svn repo detail.

➜  wordpress-develop  svn info
Path: .
Working Copy Root Path: /Users/edward/Documents/wordpress/wordpress-develop
URL: http://develop.svn.wordpress.org/trunk
Repository Root: http://develop.svn.wordpress.org
Repository UUID: 602fd350-edb4-49c9-b593-d223f7449a82
Revision: 31006
Node Kind: directory
Schedule: normal
Last Changed Author: boonebgorges
Last Changed Rev: 31006
Last Changed Date: 2014-12-31 08:30:16 +1100 (Wed, 31 Dec 2014)

I hope this detail will help you. If you need anything else, please let me know.

Last edited 9 years ago by edwardkim (previous) (diff)

#3 @boonebgorges
9 years ago

Thanks for the environment details. I still can't make this test fail. The fix you've suggested seems too broad to me. It appears that something is happening - maybe in the regex in get_tag()? - that is causing an incorrect value to be passed to cdata(). In my own tests, the value passed to cdata() looks like this - Content with nested <![CDATA[ tags ]].

Perhaps you can get a full stack trace (debug_backtrace()) at the point where the "tags" string is being passed to cdata(), and you can check to see what's happening in get_tag(). (And maybe someone who understands the importer better than I do can chime in :) )

@ianmjones
9 years ago

Refreshed and Yodafied patch.

#4 follow-up: @ianmjones
9 years ago

  • Keywords has-patch added; reporter-feedback removed

@boonebgorges I have the same problem as @edwardkim, 100% when running the unit tests with MAMP Pro.

I've refreshed and slightly updated the patch, it looks good to me and works well.

The reason that the checks for being in the tag are needed is documented in the comments for xml_set_character_data_handler by quite a few people hitting the same or related problems (http://php.net/manual/en/function.xml-set-character-data-handler.php).

The reason why the data() function concatenates the return rather than overwriting is the same reason that every call should not trim the data, the handler for xml_set_character_data_handler may be called multiple times for the same element depending on various circumstances that split the string. Non-ascii characters tend to trigger the breaking up of the contents and multiple iterations, but for some reason MAMP Pro's current build of PHP seems to split the calls at a relatively short string length that causes the issue to arise for me in the unit tests if the data is always trimmed.

This ticket was mentioned in Slack in #core by ianmjones. View the logs.


9 years ago

#6 @ianmjones
8 years ago

The unit tests still fail for me on MAMP unless this patch is applied.

I've tested this patch on other none MAMP setups (e.g. fresh Debian 7 setup) and there were no adverse effects.

#7 @ianmjones
7 years ago

  • Keywords needs-testing added

Patch 30866.2.patch still applies and works on 4.6.

Without it, running unit tests on MAMP fail.

#8 in reply to: ↑ 4 @boonebgorges
7 years ago

  • Milestone changed from Awaiting Review to WordPress.org
  • Resolution set to fixed
  • Status changed from new to closed

Replying to ianmjones:

@boonebgorges I have the same problem as @edwardkim, 100% when running the unit tests with MAMP Pro.

I've refreshed and slightly updated the patch, it looks good to me and works well.

The reason that the checks for being in the tag are needed is documented in the comments for xml_set_character_data_handler by quite a few people hitting the same or related problems (http://php.net/manual/en/function.xml-set-character-data-handler.php).

The reason why the data() function concatenates the return rather than overwriting is the same reason that every call should not trim the data, the handler for xml_set_character_data_handler may be called multiple times for the same element depending on various circumstances that split the string. Non-ascii characters tend to trigger the breaking up of the contents and multiple iterations, but for some reason MAMP Pro's current build of PHP seems to split the calls at a relatively short string length that causes the issue to arise for me in the unit tests if the data is always trimmed.

Sorry for the delay in responding, @ianmjones. This reasoning makes sense to me. Fixed in https://github.com/boonebgorges/wordpress-importer/commit/6a59d32740edbea9e7ec174a47362b985716b139 and it will be part of the next release.

Note: See TracTickets for help on using tickets.