WordPress.org

Make WordPress Core

Opened 11 months ago

Closed 8 months ago

Last modified 7 months ago

#37989 closed defect (bug) (fixed)

Unexpected change to media title behavior in WP 4.6.1

Reported by: arhenderson63
Owned by: joemcgill
Milestone: 4.7
Priority: normal
Severity: major
Version: 4.6.1
Component: Media
Keywords: has-patch has-unit-tests
Focuses:
Cc:

Description

I manage a media-heavy site that auto-upgraded to WP 4.6.1 overnight (8 Sept 2016). On new media uploads, WP now changes the title to a hyphenated, lowercased string with percent-encoded characters rather than leaving the title as the original filename. This is a problem because I have relied on clean original filenames as titles for images; now each image title must be edited individually after upload, as the WP-assigned title is decidedly unfriendly to site visitors. This behavior occurs only on new uploads and does not affect media already in the library.

Example:
Upload a file called 'Acer ×freemanii 'Jeffersred' (Autumn Blaze®); 2½ in.jpg'

BEHAVIOR IN WP 4.6.0 and lower
WP Filename: %ROOT%/wp-content/uploads/Acer-×freemanii-Jeffersred-Autumn-Blaze®-2½-in.jpg
WP Title: Acer ×freemanii 'Jeffersred' (Autumn Blaze®); 2½ in

BEHAVIOR IN WP 4.6.1
WP Filename: %ROOT%/wp-content/uploads/Acer-×freemanii-Jeffersred-Autumn-Blaze®-2½-in.jpg
WP Title: acer-xfreemanii-jeffersred-autumn-blaze-2%c2%bd-in

Attachments (4)

37989.diff (702 bytes) - added by joemcgill 10 months ago.
37989.2.diff (663 bytes) - added by joemcgill 10 months ago.
37989.3.diff (1.9 KB) - added by joemcgill 10 months ago.
37989.4.diff (678 bytes) - added by SergeyBiryukov 10 months ago.

Change History (79)

#1 follow-up: @joemcgill
11 months ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to 4.6.2
  • Owner set to joemcgill
  • Status changed from new to accepted

Hi @arhenderson63,

Thanks for the report. This looks like a consequence of [38538], which was an intentional change but may be more strict than is necessary.

#2 @joemcgill
11 months ago

#37989 was marked as a duplicate.

[Edit: Incorrect ticket number.]

Last edited 11 months ago by joemcgill

#3 @SergeyBiryukov
11 months ago

And if the title is in Cyrillic, it gets urlencoded and becomes completely unreadable, e.g. Изображение 1 turns into %d0%98%d0%b7%d0%be%d0%b1%d1%80%d0%b0%d0%b6%d0%b5%d0%bd%d0%b8%d0%b5%201 :( Same for Arabic, Japanese, and other non-ASCII alphabets, I guess.

Confirmed on all releases [38538] was backported to: 3.7.16, 3.8.16, 3.9.14, 4.0.13, 4.1.13, 4.2.10, 4.3.6, 4.4.5, 4.5.4, 4.6.1.
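
For illustration, the encoded strings reported here have the same shape as plain percent-encoding of the UTF-8 bytes followed by lowercasing. The standalone sketch below reproduces that shape; it is not the WordPress code path, and `demo_percent_encode_title()` is a hypothetical helper name used only for this example. It assumes the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not a WordPress function): percent-encode every byte
// outside the unreserved ASCII set, then lowercase the result.
function demo_percent_encode_title( $title ) {
	return strtolower( rawurlencode( $title ) );
}

// Cyrillic title from this comment.
echo demo_percent_encode_title( 'Изображение 1' ), "\n";

// The "2½" fragment from the original report.
echo demo_percent_encode_title( '2½' ), "\n";
```

Each Cyrillic letter occupies two bytes in UTF-8, which is why a short title balloons into a long run of `%xx` pairs.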

#4 @joemcgill
11 months ago

#37987 was marked as a duplicate.

This ticket was mentioned in Slack in #forums by sergey.


11 months ago

#6 in reply to: ↑ 1 @arhenderson63
11 months ago

@joemcgill

In examining the code for [38538], it appears that the sanitization routine is implemented for the title as well. I can understand the logic for sanitizing filenames, but title is overkill; especially as there is (a very tedious) work-around by manually editing the media title in WP Media Manager.

#7 @jeremyfelt
11 months ago

It would be helpful to start building a list of filenames and expected titles that are based on those filenames so that we can build out tests. A list in tests is also being built as part of #22363 that should also be useful here.

One key distinction between these tickets is that here we're worried about sanitizing the filename for use as the attachment's post title. In #22363, we're focused on sanitizing the filename for use as an actual filename. There will likely be some difference in how that should be handled. This is seen in @arhenderson63's report too, where the following is expected:

  • Upload: Acer ×freemanii 'Jeffersred' (Autumn Blaze®); 2½ in.jpg
  • Stored filename: Acer-×freemanii-Jeffersred-Autumn-Blaze®-2½-in.jpg
  • Attachment title: Acer ×freemanii 'Jeffersred' (Autumn Blaze®); 2½ in

Also worth noting is that [38294] has changed the behavior in 4.7 trunk so that the file extension remains as part of the attachment title. We'll want to fix that as well, but it may cause confusion short term with any tests that are written.

This ticket was mentioned in Slack in #core-images by mike.


11 months ago

This ticket was mentioned in Slack in #forums by macmanx.


11 months ago

#10 @ocean90
11 months ago

#38016 was marked as a duplicate.

#11 follow-ups: @SergeyBiryukov
10 months ago

A workaround:

function wp37989_fix_encoded_attachment_titles( $data ) {
	if ( empty( $_FILES ) ) {
		return $data;
	}

	$file = current( $_FILES );
	$ext  = pathinfo( $file['name'], PATHINFO_EXTENSION );
	$name = wp_basename( $file['name'], ".$ext" );

	$data['post_title'] = sanitize_text_field( $name );

	return $data;
}
add_filter( 'wp_insert_attachment_data', 'wp37989_fix_encoded_attachment_titles' );
Last edited 10 months ago by SergeyBiryukov

#12 in reply to: ↑ 11 @folletti
10 months ago

  • Focuses performance removed

Replying to SergeyBiryukov:

A workaround:

function wp37989_fix_encoded_attachment_titles( $data, $postarr ) {
	$basename = pathinfo( $postarr['file'], PATHINFO_BASENAME );

	$data['post_title'] = sanitize_text_field( $basename );

	return $data;
}
add_filter( 'wp_insert_attachment_data', 'wp37989_fix_encoded_attachment_titles', 10, 2 );

I have tested this, but it also includes the file extension in the title, which was not the case with 4.6.0.
For example, for DSC_06060.jpg the title in 4.6.0 is DSC_06060 and not DSC_06060.jpg.

Last edited 10 months ago by folletti

#13 in reply to: ↑ 11 ; follow-up: @richdixon
10 months ago

Replying to SergeyBiryukov:

A workaround:

function wp37989_fix_encoded_attachment_titles( $data, $postarr ) {
	$basename = pathinfo( $postarr['file'], PATHINFO_BASENAME );

	$data['post_title'] = sanitize_text_field( $basename );

	return $data;
}
add_filter( 'wp_insert_attachment_data', 'wp37989_fix_encoded_attachment_titles', 10, 2 );

Can you help me with using this workaround? Do I just paste the code in my child theme's functions.php code? I tried that and it didn't seem to work.

#14 in reply to: ↑ 13 ; follow-up: @SergeyBiryukov
10 months ago

Replying to richdixon:

Do I just paste the code in my child theme's functions.php code?

Correct. It only works for newly uploaded images though, existing ones will have to be renamed manually.

#15 in reply to: ↑ 14 @richdixon
10 months ago

Replying to SergeyBiryukov:

Replying to richdixon:

Do I just paste the code in my child theme's functions.php code?

Correct. It only works for newly uploaded images though, existing ones will have to be renamed manually.

I pasted the code in my child theme's functions.php.

Uploaded an image with filename 2016-frftchurch-bg.png

When I resized the image it was re-named: 2016-frftchurch-bg-e1473807797947.png

What am I doing wrong?

#16 in reply to: ↑ 11 @arhenderson63
10 months ago

@SergeyBiryukov

I've tested the workaround. While it's going in the right direction, it still does not produce clean, usable media titles like those in 4.6.0 and prior.

  • Upload filename: 'Acer ×freemanii 'Jeffersred' (Autumn Blaze®); 2½ in.jpg'
  • Pre-WP 4.6.1 WP Title: Acer ×freemanii 'Jeffersred' (Autumn Blaze®); 2½ in
  • WP 4.6.1 title (prepatch): acer-xfreemanii-jeffersred-autumn-blaze-2%c2%bd-in
  • WP 4.6.1 title (w/ patch): Acer-×freemanii-Jeffersred-Autumn-Blaze®-2½-in.jpg

So capitalisation is now retained, as are certain non-ASCII characters. But spaces are still replaced with dashes, quotes and parentheses are dropped, and the file extension is now appended to the title. It appears that this is just a different sanitised string; the original media filename should be passed to the WP title field as presented, not sanitised.

ETA 0133z 2016-09-14. It also appears that the function is somehow preventing manual edits to the media title from being saved in the WP database.

Replying to SergeyBiryukov:

A workaround:

function wp37989_fix_encoded_attachment_titles( $data, $postarr ) {
	$basename = pathinfo( $postarr['file'], PATHINFO_BASENAME );

	$data['post_title'] = sanitize_text_field( $basename );

	return $data;
}
add_filter( 'wp_insert_attachment_data', 'wp37989_fix_encoded_attachment_titles', 10, 2 );
Last edited 10 months ago by arhenderson63

#17 in reply to: ↑ 11 ; follow-up: @vetalv
10 months ago

Don't work.

Warning: Cannot modify header information - headers already sent by (output started at /var/www/........../functions.php:1) in /var/www/............./wp-admin/async-upload.php on line 35

Replying to SergeyBiryukov:

A workaround:

function wp37989_fix_encoded_attachment_titles( $data, $postarr ) {
	$basename = pathinfo( $postarr['file'], PATHINFO_BASENAME );

	$data['post_title'] = sanitize_text_field( $basename );

	return $data;
}
add_filter( 'wp_insert_attachment_data', 'wp37989_fix_encoded_attachment_titles', 10, 2 );

#18 in reply to: ↑ 17 ; follow-up: @SergeyBiryukov
10 months ago

Replying to vetalv:

Don't work.

Warning: Cannot modify header information - headers already sent by (output started at /var/www/........../functions.php:1) in /var/www/............./wp-admin/async-upload.php on line 35

Please don't use default Windows Notepad to edit PHP files. It's known to cause the "headers already sent" warning due to a byte order mark (BOM) it inserts into UTF-8 files.

The file should be re-saved in UTF-8 without BOM using Notepad++ or other editor.

See How do I solve the Headers already sent warning problem? (English) or Cannot modify header information (Russian) for more info.

#19 in reply to: ↑ 18 @vetalv
10 months ago

Thanks, SergeyBiryukov.
UTF-8 without BOM: OK.
But now the title and ALT of the image are "file-name-nomber-one.jpg".

The file name is "file name nomber one.jpg", so the title and ALT should be "file name nomber one".

#20 @hcarsten
10 months ago

add_filter( 'wp_insert_attachment_data', 'fix_encoded_attachment_titles', 10, 2 );
function fix_encoded_attachment_titles( $data, $postarr ) {
	$basename = pathinfo( $postarr['file'], PATHINFO_FILENAME );
	$data['post_title'] = sanitize_text_field( $basename );
	
	return $data;
}

I changed the workaround a bit to get the same behavior as before, i.e. setting the filename (without extension) as the post title.

best

@joemcgill
10 months ago

#21 @joemcgill
10 months ago

37989.diff should restore the previous behavior by using sanitize_text_field() instead of sanitize_title() to sanitize the post titles that are generated. This also fixes the addition of file extensions to titles, which was introduced in [38294].

@joemcgill
10 months ago

#22 @joemcgill
10 months ago

I should have read through the comments sooner :)

37989.2.diff uses PATHINFO_FILENAME instead of PATHINFO_BASENAME to get the file name without an extension, as proposed by @hcarsten above.
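
The difference between the two pathinfo() flags can be seen with an ASCII-only name, chosen here so that locale does not come into play. This is a small standalone sketch; the upload path is made up for illustration.

```php
<?php
// Example path, assumed for illustration only.
$file = '/var/www/uploads/DSC_06060.jpg';

// PATHINFO_BASENAME keeps the extension; PATHINFO_FILENAME drops it.
$with_ext    = pathinfo( $file, PATHINFO_BASENAME );
$without_ext = pathinfo( $file, PATHINFO_FILENAME );

echo $with_ext, "\n";
echo $without_ext, "\n";
```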

#23 @joemcgill
10 months ago

  • Keywords has-patch needs-unit-tests added; needs-patch removed

@joemcgill
10 months ago

#24 @joemcgill
10 months ago

  • Keywords has-unit-tests added; needs-unit-tests removed

37989.3.diff simplifies slightly from 37989.2.diff and adds a unit test to at least cover the most basic expected behavior.

It would be nice to be able to create unit tests for multiple edge cases, but I would first want to refactor media_handle_upload() so that the logic that creates the attachment parameters is isolated from the process of uploading an image and inserting the data into the database. Currently, the only way to test this behavior is to execute an upload of a file and then check whether the title of the post created for that attachment matches our expected file name. If we were to create a large number of these tests, the upload process would become a pretty big drag on our test suite, which I'd like to avoid.
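
As a rough illustration of that refactoring idea (not the actual patch), the title derivation could live in a small pure function that tests call directly, without performing an upload. `demo_title_from_filename()` is a hypothetical name, and real code would still need sanitization such as sanitize_text_field(), which is left out here so the sketch runs outside WordPress.

```php
<?php
// Hypothetical pure helper: derive an attachment title from an uploaded
// file name by dropping the last extension. It works on bytes and only
// looks for the ASCII dot, so multibyte UTF-8 names pass through intact.
function demo_title_from_filename( $filename ) {
	$dot = strrpos( $filename, '.' );
	if ( false === $dot || 0 === $dot ) {
		return trim( $filename ); // No extension, or a dotfile.
	}
	return trim( substr( $filename, 0, $dot ) );
}

// Table-driven cases built from file names reported on this ticket.
$cases = array(
	"Acer ×freemanii 'Jeffersred' (Autumn Blaze®); 2½ in.jpg" => "Acer ×freemanii 'Jeffersred' (Autumn Blaze®); 2½ in",
	'Dette er en test, æøå ÆØÅ & halløj.jpg'                  => 'Dette er en test, æøå ÆØÅ & halløj',
	'صورة باللغة العربية.jpg'                                  => 'صورة باللغة العربية',
	'DSC_06060.jpg'                                           => 'DSC_06060',
);

foreach ( $cases as $filename => $expected ) {
	$actual = demo_title_from_filename( $filename );
	echo ( $actual === $expected ? 'ok   ' : 'FAIL ' ), $filename, "\n";
}
```

Adding another edge case then means adding one array entry rather than one more file upload.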

This ticket was mentioned in Slack in #forums by clorith.


10 months ago

#26 follow-up: @oldrup
10 months ago

Am experiencing this issue on all my wp 4.6.1 sites as well - same stuff, uploading PDF files, titles gets sanitized. Must edit and manually fix all titles afterward. Hope this gets fixed :)

#27 in reply to: ↑ 26 ; follow-up: @hcarsten
10 months ago

Replying to oldrup:

Am experiencing this issue on all my wp 4.6.1 sites as well - same stuff, uploading PDF files, titles gets sanitized. Must edit and manually fix all titles afterward. Hope this gets fixed :)

Hi,
try my fix comment 20 above. We had the same problem with PDF files also.

best

#28 in reply to: ↑ 27 @oldrup
10 months ago

Oh thanks - not to be ungrateful, but titles are still sanitized, so if I choose to upload media with the name:
"Dette er en test, æøå ÆØÅ & halløj"

I expect the title to be exactly that, and not
"Dette-er-en-test-æøå-ÆØÅ-halløj"

As a previous poster mentioned, sanitizing the FILENAME makes good sense, but the TITLE should be left unchanged in my opinion.

Btw https://wordpress.org/plugins/code-snippets/ is great for testing fixes like that without having to edit the files manually.

Anyone able to create a fix, that leaves the title alone?

#29 @swissspidy
10 months ago

#38075 was marked as a duplicate.

This ticket was mentioned in Slack in #core by janr.


10 months ago

This ticket was mentioned in Slack in #core-images by janr.


10 months ago

#32 @aaroncampbell
10 months ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 38614:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Props joemcgill.
Fixes #37989.

#33 @aaroncampbell
10 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

With that patch:

  • Filename: Acer ×freemanii 'Jeffersred' (Autumn Blaze®); 2½ in.jpg
  • Title: Acer ×freemanii 'Jeffersred' (Autumn Blaze®); 2½ in
  • Filename: Dette er en test, æøå ÆØÅ & halløj.jpg
  • Title: Dette er en test, æøå ÆØÅ & halløj

Also reopening for 4.6 branch

#34 follow-up: @aaroncampbell
10 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 38615:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Merge of [38614] to the 4.6 branch.

Props joemcgill.
Fixes #37989.

#35 @ocean90
10 months ago

#38087 was marked as a duplicate.

#36 @SergeyBiryukov
10 months ago

#38090 was marked as a duplicate.

#37 @Coptic-Treasures
10 months ago

  • Resolution fixed deleted
  • Severity changed from normal to major
  • Status changed from closed to reopened

Arabic (UTF-8) filenames are now shown with percentage encoding too.
The files are uploaded with correct file names to the server, and the slugs are shown with correct Arabic characters, but the file names in the media library are percentage encoded.
This means users have to rename every single file they upload to the media library.

#38 in reply to: ↑ 34 @Coptic-Treasures
10 months ago

Replying to aaroncampbell:

In 38615:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Merge of [38614] to the 4.6 branch.

Props joemcgill.
Fixes #37989.

Hi,
Does this mean that people now have to either live with a media library that looks like this:
https://test.coptic-treasures.com/wp-content/uploads/2016/09/02-media-library-view.jpg
or manually rename every single file after upload?
thanks

#39 follow-up: @joemcgill
10 months ago

@Coptic-Treasures can you make sure you are testing with the latest nightly version of WordPress? In my testing, filenames with Arabic (or other non-English UTF-8 characters) are not being URL encoded after this change.

#40 in reply to: ↑ 39 @Coptic-Treasures
10 months ago

Hi @joemcgill
I installed the latest nightly version, and now the Arabic names are no longer URL-encoded.
However, the first word of the file name gets omitted.
Please check this screenshot (Arabic is RTL so the first word is the right most word).

صورة باللغة العربية

is now saved as:

باللغة-العربية


https://test.coptic-treasures.com/wp-content/uploads/2016/09/new-arabic.jpg

This ticket was mentioned in Slack in #core by joemcgill.


10 months ago

This ticket was mentioned in Slack in #core-images by joemcgill.


10 months ago

This ticket was mentioned in Slack in #core by swissspidy.


10 months ago

#44 follow-up: @SergeyBiryukov
10 months ago

  • Keywords commit added

Finally got a chance to look at this ticket again.

For the 4.6 branch, [38615] is enough, no additional fixes needed. (However, I think the issue is important enough to backport [38615] to other affected branches listed in comment:3, since media management is essentially broken for non-ASCII locales.)

For the 4.7 branch, it's a bit more complicated. The issue with pathinfo() is that it depends on PHP locale and does not always work correctly with UTF-8 characters, leading to an issue with truncated titles mentioned in comment:40.

We've encountered similar issues with basename() before, see #21217 and #23267.

37989.4.diff uses wp_basename(), which is the i18n-friendly version of basename(), designed to work correctly with multibyte file names.

I've also updated my workaround in comment:11 to use the same solution. (It should work in 3.9.14+, as the wp_insert_attachment_data filter does not exist in 3.7.x or 3.8.x).
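
A minimal sketch of the idea behind a locale-independent basename: percent-encode the path so that basename() only ever sees ASCII, then decode the result. This is a standalone illustration that assumes that approach; `demo_basename()` is a hypothetical stand-in, not the core function itself, and the path is made up.

```php
<?php
// Hypothetical stand-in for an i18n-friendly basename(): encode first so the
// locale-dependent basename() only handles ASCII bytes, then decode.
function demo_basename( $path, $suffix = '' ) {
	$encoded = urlencode( $path );
	// Restore the directory separators that urlencode() escaped.
	$encoded = str_replace( array( '%2F', '%5C' ), '/', $encoded );
	return urldecode( basename( $encoded, $suffix ) );
}

// The Arabic file name from comment:40, with the extension stripped.
echo demo_basename( '/uploads/2016/09/صورة باللغة العربية.jpg', '.jpg' ), "\n";
```

Note that the suffix is compared against the encoded name, so this sketch assumes an ASCII extension such as .jpg.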

#45 in reply to: ↑ 44 ; follow-up: @Coptic-Treasures
10 months ago

@SergeyBiryukov :
Thanks a lot for the detailed reply.
Sorry, I am new to the system. Does your reply mean that I need to provide more testing, or will this issue be looked after by the WordPress team?
Thanks.

#46 in reply to: ↑ 45 @SergeyBiryukov
10 months ago

Replying to Coptic-Treasures:

Sorry, I am new to the system. Does your reply mean that I need to provide more testing, or will this issue be looked after by the WordPress team?

Once 37989.4.diff is committed to the 4.7 branch and the nightly build is refreshed (within 24 hours after the commit, I guess), the issue with truncated titles should be resolved. If you could keep an eye on the ticket and confirm the fix, that would be great :)

This ticket was mentioned in Slack in #core by joemcgill.


10 months ago

#48 @joemcgill
10 months ago

Thanks @SergeyBiryukov. 37989.4.diff looks good to me. I'd still like to refactor so we can add additional unit tests without needing to upload actual files each time, but that can be handled separately.

#49 @SergeyBiryukov
10 months ago

For reference, the current code in the 4.7 branch was introduced in [38294] (for #37608).

Left a comment about UTF-8 file names on that ticket as well: comment:14:ticket:37608.

Last edited 10 months ago by SergeyBiryukov

#50 @joemcgill
10 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 38673:

Media: Use wp_basename() to create attachment titles from filenames.

In [38294], pathinfo() was used with the PATHINFO_BASENAME constant to
get the basename of the file to be used as an attachment title, which depends
on PHP locale and can cause issues with UTF-8 characters. This uses
wp_basename() instead, which is a more i18n-friendly version of basename().

Props SergeyBiryukov.
Fixes #37608, #37989.

#51 @joemcgill
10 months ago

  • Keywords fixed-major added; commit removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

As @SergeyBiryukov mentioned, [38615] needs to be backported to previous branches, going back to 3.7.

#52 follow-up: @Coptic-Treasures
10 months ago

@joemcgill @SergeyBiryukov
I updated WordPress to version 4.7-alpha-38672 just now, but the problem still exists.
Please ignore my message if it is not helpful. I am just trying to help by reporting.
Thanks

#53 in reply to: ↑ 52 @joemcgill
10 months ago

Replying to Coptic-Treasures:

@Coptic-Treasures – Thanks for the quick eye. However, you'll have to wait for the next nightly version to have this fix included.

This ticket was mentioned in Slack in #forums by macmanx.


10 months ago

This ticket was mentioned in Slack in #core-images by joemcgill.


10 months ago

#56 follow-up: @oldrup
10 months ago

Sorry if I'm cluttering this ticket with irrelevant feedback, but I really needed to find a fix for this. When uploading many files each day, with proper names, it's quite a nuisance having to re-enter the title each time.

This snippet works for me, leaving the title alone.

/* Don't sanitize title of uploaded files */
add_filter( 'sanitize_title', function( $changed, $raw ) {
	return $raw;
}, PHP_INT_MAX, 2 );

I've got my spaces and my special characters, just as I need them. This is most likely not a proper solution for everyone, but at least it shows what end result is needed by some.

#57 @ihfbib
10 months ago

Same Error in saved file name

http://i.imgur.com/5he82ZX.png

#58 in reply to: ↑ 56 ; follow-up: @SergeyBiryukov
10 months ago

Replying to oldrup:

This snippet works for me, leaving the title alone.

/* Don't sanitize title of uploaded files */
add_filter( 'sanitize_title', function( $changed, $raw ) {
return $raw;
}, PHP_INT_MAX, 2 );

It would also affect post slugs though, as they would no longer be run through sanitize_title_with_dashes() and might contain spaces and other characters: http://example.com/2016/10/My%20new%20post/.

Have you tried the snippet from comment:11?

#59 in reply to: ↑ 58 @oldrup
10 months ago

Dear @SergeyBiryukov

Thank you VERY much for pointing that side-effect out. Obviously I do NOT want spaces and stuff like that in my slugs...

Yes, I just gave the workaround in comment #11 a shot, and it works beautifully, just what I need. The titles are fine again, and slugs not affected. Thanks!

Bjarne

This ticket was mentioned in Slack in #forums by macmanx.


10 months ago

#61 @abda53
9 months ago

I had an issue with the proposed fix. The fix itself worked, but it caused titles to become blank when updating a media item. This was because the fix did not account for an empty $postarr['file'] on updates.

This was my fix

<?php
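function wp37989_fix_encoded_attachment_titles( $data, $postarr )
{
        // Only rewrite the title when a file path is present (i.e. on upload);
        // on plain updates $postarr['file'] may be missing or empty.
        if ( ! empty( $postarr['file'] ) && '' != $data['post_title'] ) {
                $show_ext = true; // Set to false to drop the file extension from the title.
                $basename = pathinfo( $postarr['file'], $show_ext ? PATHINFO_BASENAME : PATHINFO_FILENAME );
                // Turn the dashes from the sanitized filename back into spaces.
                $data['post_title'] = str_replace( '-', ' ', sanitize_text_field( $basename ) );
        }
        return $data;
}
add_filter( 'wp_insert_attachment_data', 'wp37989_fix_encoded_attachment_titles', 10, 2 );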

#62 @swissspidy
9 months ago

#38649 was marked as a duplicate.

#63 @arthurvkuhrmeier
8 months ago

Searching for a solution myself I stumbled upon this thread. Happy to say, I found a way to fix this.

EDIT: My first approach was a bit simpler, but I missed the fact that sanitize_title() is also used for creating slugs for posts etc.

Luckily, WordPress uses two different variables to sanitize the title and the filename. We need to determine if a title is the file we just uploaded. The only way to do this is to browse the $_FILES global.

First, we need to control the way filenames are sanitized. I prefer pure ASCII without spaces.

<?php
//===== Fix sanitized Media Titles - Part I =====
function noobclub_sanitize_filename ($name) {
        //--- If your PHP version does not support transliterator, ---
        //--- you'll have to find a different solution             ---
        $name = transliterator_transliterate('Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove', $name);
        $name = sanitize_file_name(sanitize_title_with_dashes($name));
        return $name;
}

//===== Fix sanitized Media Titles - Part II =====
function noobclub_upload_filter($file) {
        $name = explode('.', $file['name']);
        $ext  = array_pop($name);
        $name = implode('.', $name);
        $file['name'] = noobclub_sanitize_filename($name).'.'.$ext;
        return $file;
}
add_filter('wp_handle_upload_prefilter', 'noobclub_upload_filter');

Next we check if sanitize_title() wants to change our filename. The function also knows the original title, the same we sanitized right after upload. After sanitizing it must match the filename found in the $_FILES global.

<?php
//===== Fix sanitized Media Titles - Part III =====
function noobclub_sanitize_title ($title, $fallback_title, $context) {
        if (count($_FILES)) {
                $fallback = noobclub_sanitize_filename($fallback_title);
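                foreach ($_FILES as $file) {
                        $name_parts = pathinfo($file['name']);
                        //--- Guard against file names without an extension ---
                        $ext  = isset($name_parts['extension']) ? $name_parts['extension'] : null;
                        $name = ($ext === null) ? trim($file['name']) : trim(substr( $file['name'], 0, -(1 + strlen($ext)) ));
                        if  ($name === $fallback && strlen($fallback_title))  return $fallback_title;
                }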
        }
        return $title;
}
add_filter('sanitize_title', 'noobclub_sanitize_title', 10, 3);

Finally, I wanted the slug to be the same as the filename:

<?php
//===== Use media file name as slug =====
function noobclub_attachment_slug ($data, $postarr) {
        if ($postarr['file'] != '' && $data['post_title'] != '') {
                $name = explode('/', $data['guid']);
                $name = array_pop($name);
                $name = explode('.', $name);
                //--- You may add a unique prefix for media slugs:     ---
                //--- e.g. $data['post_name'] = 'media-'.reset($name); ---
                $data['post_name'] = reset($name);
        }
        return $data;
}
add_filter('wp_insert_attachment_data', 'noobclub_attachment_slug', 10, 2);

Now, I put together a fictitious filename and uploaded the file:

Größe? - S’il (vous) plaît. - 神保彰 - Руслана.png

The result is very satisfying! The title is unaltered, taken from the original file. The filename and the attachment page now look nice, too:

http://noobclub.net/wp-content/uploads/grosse-sil-vous-plait-shen-bao-zhang-ruslana.png
http://noobclub.net/grosse-sil-vous-plait-shen-bao-zhang-ruslana/

I simply put the code in the functions.php of my theme. I hope it is useful for you. Enjoy!

Last edited 8 months ago by arthurvkuhrmeier

This ticket was mentioned in Slack in #core-media by helen.


8 months ago

#65 follow-up: @joemcgill
8 months ago

  • Milestone changed from 4.7.1 to 4.7
  • Resolution set to fixed
  • Status changed from reopened to closed

Now that the 4.6.2 milestone got renamed 4.7.1, this can be considered resolved because it was already fixed in trunk for 4.7. I'll backport the fix to the 4.6 branch so the fix goes out with our next security release regardless.

#66 in reply to: ↑ 65 @joemcgill
7 months ago

  • Keywords fixed-major removed

Replying to joemcgill:

Now that the 4.6.2 milestone got renamed 4.7.1, this can be considered resolved because it was already fixed in trunk for 4.7. I'll backport the fix to the 4.6 branch so the fix goes out with our next security release regardless.

Looks like [38615] fixed this issue sufficiently for the 4.6 branch. The remaining work here would be to backport the change to use sanitize_text_field() instead of sanitize_title() in all other branches (down to 3.7) where it was applied.

#67 @joemcgill
7 months ago

In 39711:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Merge of [38615] to the 4.5 branch.

Fixes #37989.

#68 @joemcgill
7 months ago

In 39712:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Merge of [38615] to the 4.4 branch.

Fixes #37989.

#69 @joemcgill
7 months ago

In 39713:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Merge of [38615] to the 4.3 branch.

Fixes #37989.

#70 @joemcgill
7 months ago

In 39714:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Merge of [38615] to the 4.2 branch.

Fixes #37989.

#71 @joemcgill
7 months ago

In 39715:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Merge of [38615] to the 4.1 branch.

Fixes #37989.

#72 @joemcgill
7 months ago

In 39716:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Merge of [38615] to the 4.0 branch.

Fixes #37989.

#73 @joemcgill
7 months ago

In 39717:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Merge of [38615] to the 3.9 branch.

Fixes #37989.

#74 @joemcgill
7 months ago

In 39718:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Merge of [38615] to the 3.8 branch.

Fixes #37989.

#75 @joemcgill
7 months ago

In 39719:

Media: Improved media titles when created from filename.

Preserves spaces and generally creates more accurate, cleaner titles from filenames of uploaded media.

Merge of [38615] to the 3.7 branch.

Fixes #37989.
