#9317 closed defect (bug) (fixed)
Safari 4 + Visual Text Editor
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | 2.8 | Priority: | high |
| Severity: | normal | Version: | 2.7.1 |
| Component: | TinyMCE | Keywords: | tinymce has-patch has-unit-tests |
| Focuses: | | Cc: | |
Description
In WP 2.7.1 with Safari 4, using TinyMCE, clicking on the Link button causes WordPress to "grey screen".
Verified.
Change History (5)
#1
17 years ago
- Milestone changed from Unassigned to 2.8
- Resolution set to fixed
- Status changed from new to closed
This ticket was mentioned in PR #9307 on WordPress/wordpress-develop by @dmsnell.
7 months ago
#2
- Keywords has-patch added
Currently here as an experiment.
Related #9317.
This ticket was mentioned in PR #6883 on WordPress/wordpress-develop by @dmsnell.
6 months ago
#3
- Keywords has-unit-tests added
See also #9317
## Status
This is an exploratory patch so far.
## Tasks
- [ ] Test behavior of new functions.
- [ ] Benchmark performance.
## Motivation
- `_mb_strlen()` attempts to split a string on UTF-8 boundaries, falling back to assumed character patterns if it can't run Unicode-supported PCRE patterns.
- `wp_check_invalid_utf8()` performs similar PCRE-based counting.
- If sending HTML in any encoding other than UTF-8, it's important not to perform basic conversion with a function like `mb_convert_encoding()` or `iconv()` because these can lead to data loss for characters not representable in the target encoding. Although `mb_encode_numericentity()` exists with the multi-byte extension, having a streaming UTF-8 decoder would allow WordPress to handle proper conversion to numeric character references natively and universally. E.g. converting `…` to its numeric character reference.
- URL detection and XML name parsing require detecting sequences of bytes within specific Unicode ranges, like a Unicode-aware `strspn()`, and should stop as soon as any given character is outside of that range.
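
The last two motivations can be illustrated with a minimal sketch. It is written in Python rather than the patch's PHP, and it leans on Python's built-in codecs instead of the proposed decoder. The helper names `to_ascii_with_ncr` and `code_point_span` and the `NAME_LIKE` ranges are hypothetical, and the ranges are a deliberately incomplete subset, not the XML Name grammar.

```python
from typing import Iterable, Tuple


def to_ascii_with_ncr(utf8_bytes: bytes) -> bytes:
    """Re-encode UTF-8 as ASCII, turning each non-ASCII character into a
    numeric character reference instead of dropping or replacing it."""
    text = utf8_bytes.decode("utf-8")
    return text.encode("ascii", errors="xmlcharrefreplace")


# Hypothetical, incomplete set of inclusive code point ranges (Latin letters).
NAME_LIKE = ((0x41, 0x5A), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6))


def code_point_span(text: str,
                    ranges: Iterable[Tuple[int, int]] = NAME_LIKE) -> int:
    """Count the leading code points that fall inside any range, stopping
    at the first one that does not (a code-point-level strspn())."""
    ranges = tuple(ranges)
    count = 0
    for ch in text:
        cp = ord(ch)
        if not any(lo <= cp <= hi for lo, hi in ranges):
            break
        count += 1
    return count


# A basic conversion loses the ellipsis; the reference form keeps it.
lossy = "Wait\u2026".encode("ascii", errors="replace")   # b'Wait?'
safe = to_ascii_with_ncr("Wait\u2026".encode("utf-8"))   # b'Wait&#8230;'

# The span stops at the space after "caf\u00e9".
span = code_point_span("caf\u00e9 au lait")              # 4
```

The span helper works on code points rather than bytes, so a multi-byte character such as `é` counts once and is tested against the ranges as a whole.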
## Description
WordPress relies on various extensions, regular expressions, and basic string operations when working with text potentially encoded as UTF-8.
This patch introduces an efficient UTF-8 decoding pipeline that can remove these dependencies, normalize all decoding behaviors, and open up new kinds of processing opportunities.
The decoder was taken from [Björn Höhrmann]. Other methods, such as those in the multi-byte extension, may be more efficient, but this decoder provides a streamable interface that allows more flexible kinds of processing: for example, choosing whether or not to replace invalid byte sequences, counting code points with zero memory overhead, and partially decoding strings.
[Björn Höhrmann]: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
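
As a rough illustration of what a streamable interface allows, the sketch below decodes one code point at a time. It is a hand-written Python decoder, not the table-driven DFA from Björn Höhrmann and not the patch's PHP code. The names `decode_utf8` and `count_code_points` and the `replace_invalid` flag are hypothetical.

```python
from itertools import islice
from typing import Iterator

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def decode_utf8(data: bytes, replace_invalid: bool = True) -> Iterator[int]:
    """Yield code points from UTF-8 bytes one at a time.

    Invalid sequences yield U+FFFD when replace_invalid is true and
    raise ValueError otherwise.
    """
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        # need: continuation bytes still expected. [lo, hi] bounds the first
        # of them, which rejects overlong forms, surrogates, and > U+10FFFF.
        need, lo, hi, cp = 0, 0x80, 0xBF, lead
        if lead < 0x80:
            pass  # ASCII: the byte is the code point
        elif 0xC2 <= lead <= 0xDF:
            need, cp = 1, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            need, cp = 2, lead & 0x0F
            lo = 0xA0 if lead == 0xE0 else 0x80
            hi = 0x9F if lead == 0xED else 0xBF
        elif 0xF0 <= lead <= 0xF4:
            need, cp = 3, lead & 0x07
            lo = 0x90 if lead == 0xF0 else 0x80
            hi = 0x8F if lead == 0xF4 else 0xBF
        else:
            need = -1  # stray continuation byte or invalid lead byte

        j = i + 1
        while need > 0 and j < n and lo <= data[j] <= hi:
            cp = (cp << 6) | (data[j] & 0x3F)
            lo, hi = 0x80, 0xBF  # later continuation bytes use the full range
            need -= 1
            j += 1

        if need == 0:
            yield cp
        elif replace_invalid:
            yield REPLACEMENT
        else:
            raise ValueError(f"invalid UTF-8 at byte offset {i}")
        i = j  # skip the bytes consumed, valid or not


def count_code_points(data: bytes) -> int:
    """Count code points without building a decoded string."""
    return sum(1 for _ in decode_utf8(data))


data = "na\u00efve \u2026".encode("utf-8")
total = count_code_points(data)               # 7
first_two = list(islice(decode_utf8(data), 2))  # [0x6E, 0x61]
```

Because the decoder is a generator, a caller can stop after any number of code points, and the choice between replacing and rejecting invalid bytes is made per call rather than fixed by the decoder.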
Fixed with [10791]