Make WordPress Core

Opened 17 years ago

Closed 17 years ago

Last modified 5 months ago

#9317 closed defect (bug) (fixed)

Safari 4 + Visual Text Editor

Reported by: technosailor Owned by: azaozz
Milestone: 2.8 Priority: high
Severity: normal Version: 2.7.1
Component: TinyMCE Keywords: tinymce has-patch has-unit-tests
Focuses: Cc:

Description

In WP 2.7.1 with Safari 4, using TinyMCE, clicking on the Link button causes WordPress to "grey screen".

Verified.

Change History (5)

#1 @azaozz
17 years ago

  • Milestone changed from Unassigned to 2.8
  • Resolution set to fixed
  • Status changed from new to closed

Fixed with [10791]

This ticket was mentioned in PR #9307 on WordPress/wordpress-develop by @dmsnell.


7 months ago
#2

  • Keywords has-patch added

Currently here as an experiment.

Related #9317.

This ticket was mentioned in PR #6883 on WordPress/wordpress-develop by @dmsnell.


6 months ago
#3

  • Keywords has-unit-tests added

See also #9317

## Status

This is an exploratory patch so far.

## Tasks

  • [ ] Test behavior of new functions.
  • [ ] Benchmark performance.

## Motivation

  • _mb_strlen() attempts to split a string on UTF-8 boundaries, falling back to assumed character patterns if it can't run Unicode-supported PCRE patterns.
  • wp_check_invalid_utf8() performs similar PCRE-based counting.
  • If sending HTML in any encoding other than UTF-8, it's important not to perform basic conversion with a function like mb_convert_encoding() or iconv() because these can lead to data loss for characters not representable in the target encoding. Although mb_encode_numericentity() exists with the multi-byte extension, having a streaming UTF-8 decoder would allow WordPress to handle proper conversion to numeric character references natively and universally. E.g. converting to …
  • URL detection and XML name parsing require detecting sequences of bytes within specific Unicode ranges, like a Unicode-aware strspn(), and should stop as soon as a character falls outside those ranges.
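
The last two points can be illustrated with a small sketch. It is written in Python rather than the PHP used in the patch, and it leans on Python's standard incremental UTF-8 decoder rather than the decoder proposed here. The names `unicode_strspn` and `to_charset_with_ncr` and the example ranges are hypothetical, chosen only for this illustration.

```python
import codecs
from typing import Sequence, Tuple


def unicode_strspn(data: bytes, ranges: Sequence[Tuple[int, int]]) -> int:
    """Return the byte length of the leading run of `data` whose code
    points all fall inside one of the inclusive (low, high) ranges.

    Scanning stops at the first out-of-range character or the first
    invalid byte, so the rest of the input is never decoded.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    span = 0
    for i in range(len(data)):
        try:
            # Feed one byte; the decoder returns "" until a character completes.
            ch = decoder.decode(data[i:i + 1])
        except UnicodeDecodeError:
            break
        if ch == "":
            continue
        cp = ord(ch)
        if not any(lo <= cp <= hi for lo, hi in ranges):
            break
        span = i + 1  # only advance past fully decoded, accepted characters
    return span


def to_charset_with_ncr(utf8_bytes: bytes, target: str) -> bytes:
    """Convert UTF-8 to `target`, writing characters the target cannot
    represent as numeric character references instead of dropping them."""
    return utf8_bytes.decode("utf-8").encode(target, errors="xmlcharrefreplace")
```

For example, with the hypothetical range list `[(0x41, 0x5A), (0x61, 0x7A)]` (ASCII letters), `unicode_strspn(b"ab\xffcd", ...)` returns `2`, because the scan stops at the invalid byte. `to_charset_with_ncr("café ☃".encode("utf-8"), "latin-1")` keeps `é` as a single Latin-1 byte and writes the snowman as `&#9731;`.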

## Description

WordPress relies on various extensions, regular expressions, and basic string operations when working with text potentially encoded as UTF-8.

This patch introduces an efficient UTF-8 decoding pipeline that can remove these dependencies, normalize decoding behavior across WordPress, and open up new kinds of processing.

The decoder was taken from [Björn Höhrmann]. Other methods, such as those in the multi-byte extension, may be more efficient, but this decoder provides a streamable interface that supports more flexible kinds of processing: for example, choosing whether or not to replace invalid byte sequences, counting code points with zero memory overhead, and partially decoding strings.

[Björn Höhrmann]: http://bjoern.hoehrmann.de/utf-8/decoder/dfa/
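
To make the idea of a streamable interface concrete, here is a minimal sketch in Python. It is not the table-driven DFA from the link above and not the PHP code in this PR. It is a hand-written decoder that yields one code point at a time, and the names `decode_utf8` and `count_code_points` are assumptions made for this example.

```python
from typing import Iterator

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def decode_utf8(data: bytes, replace_invalid: bool = True) -> Iterator[int]:
    """Yield code points from `data` one at a time.

    Invalid sequences become U+FFFD when `replace_invalid` is true;
    otherwise a ValueError is raised at the offending byte offset.
    """
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # Pick the number of continuation bytes and the allowed range of the
        # first one, which rules out overlong forms, surrogates, and > U+10FFFF.
        if b0 < 0x80:
            need, lo, hi, cp = 0, 0x80, 0xBF, b0
        elif 0xC2 <= b0 <= 0xDF:
            need, lo, hi, cp = 1, 0x80, 0xBF, b0 & 0x1F
        elif 0xE0 <= b0 <= 0xEF:
            need, cp = 2, b0 & 0x0F
            lo = 0xA0 if b0 == 0xE0 else 0x80
            hi = 0x9F if b0 == 0xED else 0xBF
        elif 0xF0 <= b0 <= 0xF4:
            need, cp = 3, b0 & 0x07
            lo = 0x90 if b0 == 0xF0 else 0x80
            hi = 0x8F if b0 == 0xF4 else 0xBF
        else:
            need, lo, hi, cp = -1, 0, 0, 0  # not a valid lead byte

        j = i + 1
        valid = need >= 0
        while valid and need > 0:
            if j < n and lo <= data[j] <= hi:
                cp = (cp << 6) | (data[j] & 0x3F)
                j += 1
                need -= 1
                lo, hi = 0x80, 0xBF  # later continuation bytes use the full range
            else:
                valid = False

        if valid:
            yield cp
        elif replace_invalid:
            yield REPLACEMENT
        else:
            raise ValueError(f"invalid UTF-8 at byte offset {i}")
        i = j  # resume at the first byte that was not consumed


def count_code_points(data: bytes) -> int:
    """Count code points without building a decoded copy of the string."""
    return sum(1 for _ in decode_utf8(data))
```

Because the decoder is a generator, a caller can stop consuming it at any point to decode only part of a string, and counting needs no intermediate buffer. Whether this shape performs acceptably in PHP is exactly what the benchmark task above would need to establish.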

@dmsnell commented on PR #9307:


6 months ago
#4

Abandoning in favor of #9317, but the code in this PR contains valuable historical data for performance benchmarks of various UTF-8 validation approaches.

@dmsnell commented on PR #6883:


5 months ago
#5

Closed by c9166919cce1f78fc10de220a92970f9448e03dd
[60768]
