Opened 7 weeks ago
Last modified 6 weeks ago
#64457 new enhancement
Early filter invalid hosts in wp_http_validate_url
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | Future Release | Priority: | normal |
| Severity: | normal | Version: | |
| Component: | HTTP API | Keywords: | needs-patch |
| Focuses: | performance | Cc: | |
Description (last modified by )
A small performance improvement in wp_http_validate_url(): return early when the host contains invalid values.
Theoretically, a lookup for a hostname containing underscores will never succeed when calling gethostbyname(), but given how expensive that function is, returning early when $host contains any invalid value would save the call for certain malformed URLs.
Attachments (1)
Change History (23)
#2
@
7 weeks ago
That’s a fair question, and I agree with the underlying point.
The reason the ticket originally mentions underscores specifically is mostly practical: _ is a very common mistake in hostnames, and it’s a case where we can be confident that gethostbyname() will never succeed. So the idea was to avoid that call early in an obviously invalid case, purely as a small performance win.
But you’re right that, once we start thinking about this more generally, singling out underscores feels a bit arbitrary. If we’re going to short-circuit at this stage, it probably makes more sense to do a broader “is this a valid hostname at all?” check and bail early for any disallowed characters, rather than hard-coding one specific case.
I’m open to adjusting the scope of the ticket in that direction if that’s the better approach.
#3
@
7 weeks ago
- Keywords needs-patch added; 2nd-opinion removed
- Milestone changed from Awaiting Review to Future Release
@westonruter true, I was triaging a GB report which happened to not only enable underscores but also test truthy for underscore domains and I was a little TF?
In fact, I was thinking that we should be filtering input with FILTER_VALIDATE_DOMAIN
@manhphucofficial if you want to take this.
#4
@
7 weeks ago
- Description modified (diff)
- Summary changed from Avoid underscores for hosts in wp_http_validate_url to Early filter invalid hosts in wp_http_validate_url
#5
@
7 weeks ago
Thanks for updating the summary and description — that aligns well with the direction discussed.
I’ll work on a small patch using early hostname validation (e.g. FILTER_VALIDATE_DOMAIN) and follow up with tests.
#6
follow-up:
↓ 7
@
7 weeks ago
Unless the situation has changed, I don't believe we can use filter_var(). See https://github.com/WordPress/wordpress-develop/blob/1bd29b14806f471f3ba1df0dc0e86b6aaae27b1e/src/wp-includes/functions.php#L7326-L7327
#7
in reply to:
↑ 6
@
7 weeks ago
Replying to westonruter:
Unless the situation has changed, I don't believe we can use filter_var(). See https://github.com/WordPress/wordpress-develop/blob/1bd29b14806f471f3ba1df0dc0e86b6aaae27b1e/src/wp-includes/functions.php#L7326-L7327
But like iconv, filter is recommended but not forced
https://make.wordpress.org/hosting/handbook/server-environment/
In this case, as we can see in other places, what I would do is a conditional function_exists check: those who have the extension get a small performance upgrade, and those who don't have to rely on gethostbyname.
Meanwhile I'm going to ask in meta to see the % of installations with filter in place. I can't believe it is anything below 99% nowadays (iconv is 99%, but I never disputed it, because nowadays mbstring replaces almost 100% of the iconv usage).
#8
@
7 weeks ago
I would support the function_exists() check and make sure the Filter extension is included among the suggested extensions, like cURL is. I see Filter is included in the Hosting handbook (as you noted already): https://make.wordpress.org/hosting/handbook/server-environment/#php-extensions
So yes, I think we should be safe to use Filter, if we add safeguards for it not being enabled.
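The guarded approach discussed above could be sketched roughly as follows. This is only an illustration, not the patch attached to the ticket: `example_host_looks_invalid()` is a hypothetical helper name, and the use of `FILTER_FLAG_HOSTNAME` is an assumption about which flag the check would use.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress core): report whether a host
 * can be rejected early, before any DNS lookup is attempted.
 */
function example_host_looks_invalid( string $host ): bool {
	// Without the Filter extension nothing can be decided here, so never
	// short-circuit; the caller falls back to its existing behavior.
	if ( ! function_exists( 'filter_var' ) ) {
		return false;
	}

	// filter_var() returns false on failure and the validated string otherwise.
	return false === filter_var( $host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME );
}

// With the Filter extension loaded:
var_dump( example_host_looks_invalid( 'example.org' ) );  // bool(false)
var_dump( example_host_looks_invalid( 'exa_mple.org' ) ); // bool(true)
```

When the extension is missing, the helper always answers "not known to be invalid", so the existing gethostbyname() path stays the only gatekeeper.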
#9
@
7 weeks ago
Patch attached.
Uses early hostname validation via Filter when available, with a fallback to the existing behavior otherwise. Includes a test for underscore hostnames.
#10
@
7 weeks ago
- Priority changed from low to normal
@manhphucofficial please put that patch in a pull request so we can better review and see the tests passing on all environments.
This ticket was mentioned in PR #10669 on WordPress/wordpress-develop by @manhphucofficial.
7 weeks ago
#11
- Keywords has-patch has-unit-tests added; needs-patch removed
Fixes #64457.
Adds early hostname validation using the Filter extension when available, while falling back to the existing behavior when it’s not. Includes a test case for underscore hostnames.
#12
@
7 weeks ago
- Keywords needs-patch added; has-patch has-unit-tests removed
Sure — I’ve opened a PR so the patch can be reviewed with CI:
https://github.com/WordPress/wordpress-develop/pull/10669
@manhphucofficial commented on PR #10669:
7 weeks ago
#13
Thanks for the review!
I’ve updated the patch to address all the points raised:
- hostname validation now only applies when the host is not an IPv4 address
- removed the FILTER_VALIDATE_IP check and related constant assumptions
- added test coverage for underscores in hostnames
Please let me know if anything should be adjusted further. Appreciate you taking a look!
@manhphucofficial commented on PR #10669:
7 weeks ago
#14
Thanks for the feedback!
I’ve updated the patch to address all the points:
- switched the check to `extension_loaded( 'filter' )` as suggested
- kept IPv4 handling separate to avoid affecting IP-based hosts
- added a test case for a valid IP host (`https://1.1.1.1/`)
Happy to adjust further if there’s anything else you’d like me to refine.
@westonruter commented on PR #10669:
7 weeks ago
#15
@SirLouen what do you think?
#16
@
6 weeks ago
Story time...
In the lead-up to and during the early days of the Iraq war in the early 2000s, Salam Pax blogged anonymously about the goings-on in Iraq. Selected entries from his blog were subsequently released in a book titled The Baghdad Blog.
The blog, which is still available online, was hosted on Blogger at dear_raed.blogspot.com. The domain resolved at the time and continues to resolve now.
Arguably, Blogger should never have allowed sub-domains with underscores to be used but they did. I assume the case is true for other services as well.
My point is that the practical often differs from the theoretical, as is the case for DNS resolution. In its current form wp_http_validate_url() handles sub-domains with underscores and that will need to be the case in the future.
Testing the current pull request with the domain used by Salam Pax shows a change in behaviour that will need to be accounted for:
    vagrant@wp-dev:/vagrant/wordpress-develop$ git checkout trunk
    Switched to branch 'trunk'
    Your branch is up to date with 'origin/trunk'.
    vagrant@wp-dev:/vagrant/wordpress-develop$ wp eval "var_dump( wp_http_validate_url( 'https://dear_raed.blogspot.com/' ) );"
    eval()'d code:1:
    string(31) "https://dear_raed.blogspot.com/"
    vagrant@wp-dev:/vagrant/wordpress-develop$ git checkout 64457-early-filter-invalid-hosts
    Switched to branch '64457-early-filter-invalid-hosts'
    vagrant@wp-dev:/vagrant/wordpress-develop$ wp eval "var_dump( wp_http_validate_url( 'https://dear_raed.blogspot.com/' ) );"
    eval()'d code:1:
    bool(false)
@westonruter commented on PR #10669:
6 weeks ago
#17
I asked Gemini to review the changes and it had some helpful feedback:
I have completed the review of the changes.
The changes introduce stricter validation for hostnames in wp_http_validate_url() using filter_var() with FILTER_VALIDATE_DOMAIN, which is a good improvement. However, I identified a critical regression regarding IPv6 support and a minor edge case with numeric hostnames.
### Review Findings
- Critical Issue: IPv6 Support Regression
  - Observation: The new validation block runs when `$is_ipv4` is false. If the URL contains an IPv6 literal (e.g., `http://[::1]/`), `$is_ipv4` will be false. The `filter_var( '[::1]', FILTER_VALIDATE_DOMAIN, ... )` call returns `false` for bracketed IPv6 addresses, causing `wp_http_validate_url` to return `false` immediately.
  - Impact: Valid IPv6 URLs will be rejected.
  - Recommendation: The check should be skipped if the host appears to be an IPv6 literal. Since `parse_url` preserves brackets for IPv6 hosts, checking if `$host` starts with `[` would be sufficient (e.g., `&& strpos( $host, '[' ) === false`).
- Minor Issue: Handling of "0" Hostname
  - Observation: `filter_var( '0', FILTER_VALIDATE_DOMAIN, ... )` returns the string `"0"`. In PHP, `! "0"` evaluates to `true`. This causes the check `! filter_var(...)` to pass (evaluating as "invalid") for the hostname `"0"`, returning `false`.
  - Recommendation: Use strict comparison `=== false` to ensure only actual validation failures trigger the early return.
  - Code: `&& false === filter_var( ... )`
- Tests
  - The added test case `underscore_in_hostname` correctly asserts that underscores are now invalid in domain names, which aligns with the `FILTER_FLAG_HOSTNAME` behavior.
  - Suggestion: It would be beneficial to add a test case for an IPv6 literal (e.g., `http://[::1]/`) to ensure this functionality is preserved and to prevent future regressions.
- Code Style & Compatibility
  - The code adheres to WordPress coding standards (indentation, spacing).
  - PHP 7.2 compatibility is maintained (`FILTER_VALIDATE_DOMAIN` is available since 7.0).
### Summary
The logic improvement is sound but needs to account for IPv6 literals to avoid breaking support for them. I recommend adjusting the condition to exclude IPv6 hosts and using strict comparison for the filter_var result.
I will not modify the code myself but I present these findings for the user to act upon.
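To make the two review findings concrete, here is a minimal sketch of a condition that skips IPv6 literals and uses a strict comparison. It is not the code from the PR: `example_should_reject_host()` is a hypothetical name, and `$is_ipv4` is passed in as a plain boolean on the assumption that the caller has already computed it.

```php
<?php
/**
 * Hypothetical sketch (not the PR's code): decide whether to bail out early.
 */
function example_should_reject_host( string $host, bool $is_ipv4 ): bool {
	// IPv4 hosts are handled separately by the caller.
	if ( $is_ipv4 ) {
		return false;
	}

	// A leading "[" marks a bracketed IPv6 literal such as "[::1]";
	// the domain filter would reject it, so skip the check entirely.
	if ( 0 === strpos( $host, '[' ) ) {
		return false;
	}

	if ( ! function_exists( 'filter_var' ) ) {
		return false;
	}

	// Strict comparison: only an actual validation failure (false) rejects.
	return false === filter_var( $host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME );
}

// Why the strict comparison matters: a falsy string is not a failure.
var_dump( ! '0' );         // bool(true)
var_dump( false === '0' ); // bool(false)

var_dump( example_should_reject_host( '[::1]', false ) ); // bool(false)
```

With the loose `! filter_var( ... )` form, any validated value that PHP treats as falsy would be mistaken for a validation failure; the strict form avoids that.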
#18
@
6 weeks ago
My point is that the practical often differs from the theoretical, as is the case for DNS resolution. In its current form wp_http_validate_url() handles sub-domains with underscores and that will need to be the case in the future.
Trying to create a blogspot account now with an underscore.
It appears that Google has evolved.
Luckily, @peterwilsoncc has a vast memory to recall one of those in ten million cases.
Still, if we want to play by Jurassic Park rules and keep the T-Rex from escaping the enclosure, we could restrict the check to the scheme, domain, and TLD parts, because in reality those are the only parts that stick to the actual RFC rules (beyond that, any subdomain level is technically the jungle).
So (take notes for unit tests):
- `h_ttp://example.org` should be invalid
- `hey_ho_lets_go._example.org` should be invalid
- `omg.c_om` should be invalid
- `peter_is_amazing.example.org` should be VALID
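The host-related rules above could be approximated like this. It is a rough sketch under a loud assumption: the "registrable domain" is taken to be simply the last two labels of the host, which ignores multi-part public suffixes such as `co.uk`. `example_host_passes_underscore_rule()` is a hypothetical name, and the scheme case (`h_ttp://`) is not covered because it is not a host check.

```php
<?php
/**
 * Hypothetical sketch: allow underscores in subdomain labels only.
 * Assumes the registrable domain is the last two labels of the host.
 */
function example_host_passes_underscore_rule( string $host ): bool {
	$labels      = explode( '.', $host );
	$registrable = array_slice( $labels, -2 );

	foreach ( $registrable as $label ) {
		// An underscore in the domain or TLD label is rejected.
		if ( false !== strpos( $label, '_' ) ) {
			return false;
		}
	}

	// Underscores in any remaining (subdomain) labels are tolerated.
	return true;
}

var_dump( example_host_passes_underscore_rule( 'hey_ho_lets_go._example.org' ) );  // bool(false)
var_dump( example_host_passes_underscore_rule( 'omg.c_om' ) );                     // bool(false)
var_dump( example_host_passes_underscore_rule( 'peter_is_amazing.example.org' ) ); // bool(true)
var_dump( example_host_passes_underscore_rule( 'dear_raed.blogspot.com' ) );       // bool(true)
```

A real implementation would still need to validate the other characters in each label; this sketch isolates the underscore rule only.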
@SirLouen commented on PR #10669:
6 weeks ago
#19
@manhphuc check some additional suggestions in the Core Trac thread.
@SirLouen commented on PR #10669:
6 weeks ago
#20
@westonruter I suspected that Filter adoption is at 100% by now, and that has been confirmed.
We can still be conservative, but I believe it's time to update the Core docs and simply add Filter to the set of mandatory extensions (and still no one will notice anything).
@SirLouen commented on PR #10669:
6 weeks ago
#21
@manhphuc check the core ticket, especially because the particular case you used for the unit test turned out to be problematic. You can add a bunch of extra unit tests, as I commented in the reply. I think we can move this forward.
@manhphucofficial commented on PR #10669:
6 weeks ago
#22
Thanks everyone for the detailed feedback and edge-case examples.
I’ve updated the hostname validation logic to avoid regressing legacy hosts that include underscores in subdomains (e.g. Blogspot), while still rejecting underscores in the registrable domain / TLD.
The implementation now:
- Skips FILTER_VALIDATE_DOMAIN for IPv6 literals
- Allows underscores in subdomains, but not in the registrable domain portion
- Preserves existing behavior for valid legacy hosts
I’ve also added unit tests covering the cases discussed in the Trac thread:
- h_ttp://example.org (invalid)
- https://hey_ho_lets_go._example.org (invalid)
- https://omg.c_om (invalid)
- https://peter_is_amazing.example.org (valid)
All HTTP-related PHPUnit tests are passing locally.

Why underscores alone? Shouldn't it short-circuit for any character which isn't allowed in a host name?