Make WordPress Core

Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#24121 closed defect (bug) (duplicate)

Blank title caused by PHP 5.4 htmlspecialchars() changes

Reported by: trevHCS
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 3.5.1
Component: Formatting
Keywords:
Focuses:
Cc:

Description

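Due to changes to the htmlspecialchars() function in PHP 5.4, non-UTF-8 characters in a post title cause the title to go blank. (As of PHP 5.4, htmlspecialchars() defaults to the UTF-8 charset and returns an empty string when its input is not valid in that charset.)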

This is similar behaviour to ticket ID #23688 except:

  • That ticket affected the body of the post not the title.
  • This may require a slightly different solution.
  • The affected code is in two separate scripts.

Scenario:

  • You add / edit a post and give it a title containing "You’re"
  • You save the post and it appears on the site correctly.
  • However, the post edit screen in the admin loses the title because of the ’.
  • Any further update then saves the blank title, so it disappears from the public blog too.

The offending character in this case is ’, a fancy quote mark, but any non-UTF-8 character will do the same, e.g. the Euro symbol (€).

Problem: This occurs in edit-form-advanced.php around line 331 where it says:

<?php echo esc_attr( htmlspecialchars( $post->post_title ) ); ?>
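The failure can be reproduced outside WordPress. This is a minimal sketch, assuming the title reaches PHP with the apostrophe stored as the single Windows-1252 byte 0x92 (an assumption; the exact bytes depend on the blog's charset). The flags and charset are passed explicitly so the call matches PHP 5.4's defaults (ENT_COMPAT | ENT_HTML401, UTF-8); PHP 8.1 and later add ENT_SUBSTITUTE to the default flags, which hides the problem.

```php
<?php
// "You’re" with the apostrophe as the Windows-1252 byte 0x92
// (assumed encoding; a lone 0x92 byte is not valid UTF-8).
$title = "You\x92re";

// PHP 5.4 defaults: ENT_COMPAT | ENT_HTML401 with the UTF-8 charset.
$escaped = htmlspecialchars($title, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// Invalid UTF-8 input makes htmlspecialchars() return an empty string,
// which is what blanks the title field in the edit form.
var_dump($escaped); // string(0) ""
```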

Suggested solutions: My reading of the code is that esc_attr() does basically the same thing as htmlspecialchars() in this case, so perhaps removing htmlspecialchars() would work?

If not, a solution similar to the one in that other ticket could be used, though it would likely need to look something like the following (see the notes in the other ticket about normalising blog_charset):

<?php echo esc_attr( htmlspecialchars( $post->post_title, ENT_SUBSTITUTE, get_option( 'blog_charset' ) ) ); ?>

I have tested the alternative ENT_DISALLOWED flag, but that seems to cause blank titles too.
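The flag behaviour described above can be checked directly. A minimal sketch, again assuming the title arrives as the Windows-1252 byte 0x92: ENT_SUBSTITUTE replaces the invalid byte with U+FFFD (the Unicode replacement character) instead of returning an empty string; ENT_DISALLOWED only handles valid code points that are disallowed for the document type, so invalid UTF-8 still produces an empty string, which matches the blank titles seen in testing; and passing a single-byte charset such as ISO-8859-1, as the blog_charset suggestion would, accepts every byte.

```php
<?php
$title = "You\x92re"; // assumed Windows-1252 bytes, invalid as UTF-8

// ENT_SUBSTITUTE: the invalid byte becomes U+FFFD (UTF-8 bytes EF BF BD).
$substituted = htmlspecialchars($title, ENT_COMPAT | ENT_SUBSTITUTE, 'UTF-8');

// ENT_DISALLOWED: does not cover invalid byte sequences, so the whole
// result is still an empty string.
$disallowed = htmlspecialchars($title, ENT_COMPAT | ENT_DISALLOWED, 'UTF-8');

// Single-byte charset: every byte is valid, so the title survives unchanged
// (it contains no &, <, > or " that would need escaping).
$latin1 = htmlspecialchars($title, ENT_COMPAT, 'ISO-8859-1');

var_dump(bin2hex($substituted)); // string(16) "596f75efbfbd7265"
var_dump($disallowed === '');    // bool(true)
var_dump($latin1 === $title);    // bool(true)
```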

Finally, I wasn't 100% sure if this should be a new bug or related to the previous ticket, but as that one is old I didn't want this important problem to be missed as it affects the very nature of blog publishing.

Change History (4)

#1 @toscho
11 years ago

  • Cc info@… added

Just for the record: "non UTF-8 character" is misleading. ’ and € are part of UTF-8; they just have to be encoded correctly. Does your blog run with a legacy encoding like ISO-8859-1?
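To illustrate the point: ’ (U+2019) and € (U+20AC) are ordinary UTF-8 characters of three bytes each, and htmlspecialchars() passes them through untouched when the bytes really are UTF-8. The single-byte values used for comparison (0x92 and 0x80) assume Windows-1252 storage on a legacy blog, and the two-entry map below is a hypothetical illustration only; a real migration would use a full converter such as mb_convert_encoding().

```php
<?php
// Correct UTF-8: ’ (U+2019) is E2 80 99 and € (U+20AC) is E2 82 AC.
$utf8Title = "You\xE2\x80\x99re \xE2\x82\xAC";

// Same text as single Windows-1252 bytes (assumed legacy storage).
$legacyTitle = "You\x92re \x80";

$flags = ENT_COMPAT | ENT_HTML401; // PHP 5.4 default flags

// Valid UTF-8 passes through untouched (nothing here needs escaping).
var_dump(htmlspecialchars($utf8Title, $flags, 'UTF-8') === $utf8Title); // bool(true)

// The legacy bytes are invalid UTF-8, so the result is empty.
var_dump(htmlspecialchars($legacyTitle, $flags, 'UTF-8')); // string(0) ""

// Hypothetical two-character map, NOT a full Windows-1252 converter:
// re-encoding the bytes as UTF-8 is what makes the title valid again.
$reencoded = strtr($legacyTitle, ["\x92" => "\xE2\x80\x99", "\x80" => "\xE2\x82\xAC"]);
var_dump($reencoded === $utf8Title); // bool(true)
```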

#2 @trevHCS
11 years ago

  • Cc trevattdp@… added

That does make more sense now; I was never 100% sure about the UTF-8 cause.

After doing more tests, I can confirm that both this and the linked post content problem occur when the database uses something like "latin1_swedish_ci" as the table collation. One of the blogs we run uses "utf8_general_ci", and that one does not suffer from this problem.

As for page encoding, that seems to be set to ISO-8859-1 (the blog_charset option in the database 'options' table) on those non-UTF-8 blogs.

So, at a guess, both problems will affect older blogs created before 'DB_CHARSET' in wp-config defaulted to utf8?
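For comparison, the relevant lines in wp-config-sample.php of that era look like the following. Installs whose wp-config.php predates these lines typically end up with tables in the MySQL server's default character set, often latin1, whose default collation is latin1_swedish_ci as reported above. Adding or changing DB_CHARSET on an existing blog does not convert data that is already stored, so on its own it is not a fix; existing tables would need a separate conversion.

```php
<?php
// Character set WordPress uses when creating tables and talking to MySQL.
define('DB_CHARSET', 'utf8');

// Empty collation lets MySQL use the default collation for DB_CHARSET.
define('DB_COLLATE', '');
```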

As a side note - I've also found the same problem in the comments editor too but not completely sure where in the code yet.


#3 @SergeyBiryukov
11 years ago

  • Keywords needs-patch removed
  • Milestone Awaiting Review deleted
  • Status changed from new to closed

> I wasn't 100% sure if this should be a new bug or related to the previous ticket, but as that one is old I didn't want this important problem to be missed as it affects the very nature of blog publishing.

#23688 appears to be a manifestation of the same bug and is assigned to the 3.6 milestone. Let's continue the discussion there.

#4 @SergeyBiryukov
11 years ago

  • Resolution set to duplicate