WXR export/import umbrella ticket
Reported by: duck_ | Owned by: duck_
Description (last modified by duck_)
Umbrella ticket for a number of upgrades to the WXR export/import process.
- Bump WXR version to 1.1
- Removed filtering for now (see explanation below)
- Removed wxr_missing_parents (local function); it seems to be a remnant from before get_categories
- Added author information to export (for better import UX) - #11118
- Greater usage of slug-like identifiers, e.g. login instead of name in <dc:creator>
- Don't export auto-drafts
- Filled in docs
- Ignore _edit_lock and _edit_last meta keys
- Only use the 'forward compatible' term tags, <category domain="foo" nicename="bar">, within post items
- Use an XML parser (where available). 3 parser options: SimpleXML (yay!), XML Parser (yay!), or regular expressions (boo!)
- Proper import support for nav menus - #14750
- Menu items for missing content will be skipped. There should be no problems when an associated object is further down the import file than the menu item
- Orphaned menu items (e.g. their parent was skipped due to the above point) will become top-level
- Greater usage of slug-like identifiers, e.g. use <category domain="..." nicename="..."> tags to fix a bunch of category issues
- Either import author as is (i.e. from information stored in WXR file, this allows us to create a user with more data by default) or map to an existing user - #10319
- Less direct feedback, as it is unwieldy for a large import (ignoring errors, there is currently none at all :( )
All accompanied by a number of smaller changes and anything I forgot to write down.
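The three parser options listed above could be chosen with a simple fallback chain. This is only a minimal sketch: the function name and the returned labels are hypothetical, and the actual patch may select its parser differently.

```php
<?php
/**
 * Pick a WXR parser based on which PHP extensions are loaded.
 * The function name and returned labels are hypothetical (sketch only).
 */
function wxr_sketch_choose_parser( array $loaded_extensions ) {
    // Normalise names so 'SimpleXML' and 'simplexml' compare equal.
    $loaded = array_map( 'strtolower', $loaded_extensions );

    if ( in_array( 'simplexml', $loaded, true ) ) {
        return 'simplexml';   // preferred option
    }
    if ( in_array( 'xml', $loaded, true ) ) {
        return 'xml_parser';  // the XML Parser extension
    }
    return 'regex';           // last resort: regular expressions
}

// Usage: feed it the extensions of the running PHP build.
$parser = wxr_sketch_choose_parser( get_loaded_extensions() );
```

Passing the extension list in as an argument, rather than calling extension_loaded() inside, keeps the choice easy to unit test for all three cases.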
The main problem for now is ensuring backwards compatibility with WXR 1.0 files. That said, no major faults should occur when importing a 1.0 file (excluding the problems you will already come across in an export/import in 3.0.1):
- No author import (the current importer takes author data from each post)
SOLVED: if we get an empty author array then loop through posts grabbing unique authors and offering to map them (but not to import)
- All term menu items will be skipped due to missing term_id XML tags
Possible solution: slugs instead of IDs for processed_terms mapping?
In fact, as far as I can see, filling imported menus is actually impossible with WXR 1.0, since the file doesn't contain custom terms for post items (see #13453 and #14306), so we don't know which menu to assign the menu items to
- Probably some indexes and vars which need to be checked with isset and fallback provided (for when the XML tag doesn't exist in 1.0 files)
- ... and possibly more with further testing
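The author fallback marked SOLVED above (looping through posts to find unique authors when the author array is empty) might look roughly like this. It is a sketch under assumptions: the function name is hypothetical, and the 'post_author' and 'author_login' keys are guesses at the shape of the parsed data, not taken from the patch.

```php
<?php
/**
 * Collect unique authors from parsed posts, for offering a user mapping.
 * The 'post_author' and 'author_login' keys are assumed, not confirmed.
 */
function wxr_sketch_authors_from_posts( array $posts ) {
    $authors = array();
    foreach ( $posts as $post ) {
        // Guard with isset: the field may be missing in WXR 1.0 files.
        if ( ! isset( $post['post_author'] ) || '' === $post['post_author'] ) {
            continue;
        }
        $login = $post['post_author'];
        // Keyed by login so each author appears once, in first-seen order.
        if ( ! isset( $authors[ $login ] ) ) {
            $authors[ $login ] = array( 'author_login' => $login );
        }
    }
    return $authors;
}
```

Since only the login is known in this case, the result is enough to offer a mapping to existing users but not to create new users with full data.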
How far should this go back?
Example: forwards compatible category tags, including the slug and taxonomy, were introduced 3 years ago. These are the only category tags the parsers currently read. Is it worth checking the really old style XML tags if no terms are found for a post? This should be easy for SimpleXML and regular expressions, but I think it will be harder for XML Parser.
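For the regular expression parser, such a fallback could be sketched as below. This assumes the old style tag is a bare <category>Name</category> with no attributes, and that it always denotes a plain category; both are assumptions for illustration, and the function names are hypothetical.

```php
<?php
/**
 * Extract terms from the XML of one <item>.
 * Reads the attribute-bearing tags first and only falls back to bare
 * <category> tags (assumed old style) when none are found.
 */
function wxr_sketch_post_terms( $item_xml ) {
    $terms = array();

    // Forward compatible style: <category domain="..." nicename="...">Name</category>
    $new = '#<category domain="([^"]+)" nicename="([^"]+)">(.*?)</category>#s';
    if ( preg_match_all( $new, $item_xml, $matches, PREG_SET_ORDER ) ) {
        foreach ( $matches as $m ) {
            $terms[] = array(
                'domain' => $m[1],
                'slug'   => $m[2],
                'name'   => wxr_sketch_strip_cdata( $m[3] ),
            );
        }
        return $terms;
    }

    // Fallback (assumption): bare tags carry only a category name, no slug.
    if ( preg_match_all( '#<category>(.*?)</category>#s', $item_xml, $matches, PREG_SET_ORDER ) ) {
        foreach ( $matches as $m ) {
            $terms[] = array(
                'domain' => 'category',
                'slug'   => '',
                'name'   => wxr_sketch_strip_cdata( $m[1] ),
            );
        }
    }
    return $terms;
}

/** Remove a surrounding CDATA wrapper, if there is one. */
function wxr_sketch_strip_cdata( $text ) {
    return preg_replace( '#^<!\[CDATA\[(.*)\]\]>$#s', '$1', trim( $text ) );
}
```

With an empty slug the importer would have to match or create the term by name, which is where the old files are likely to be less reliable.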
The problem of filtering
- Potential to export a pretty useless file, e.g. choose Category: Uncategorized and Content Type: Pages
- Makes reliable importing of nav menus harder (worse UX when importer is creating half made menus)
Moving forward, I am currently imagining some sort of grid of post types selectable by checkbox. Each post type lists its taxonomies below it, and these are only activated/recognised if the post type is selected. But what filters to include and how to show them are probably for another ticket.
The feedback from the importer needs to be completed (see above). I was thinking of listing errors (hidden by default, with a JS toggle to show them?) and a table of results showing the number of successes and failures for each of authors, posts, terms, ...
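The results table could be backed by a small tally object that the importer updates as it goes. This is a sketch only; the class and method names are hypothetical and nothing like it is confirmed to be in the patch.

```php
<?php
/**
 * Tally import successes and failures per object type, and keep the
 * error messages separately so they can be hidden by default.
 */
class WXR_Sketch_Import_Report {
    private $counts = array();
    private $errors = array();

    /** Record one import attempt for a type such as 'authors' or 'posts'. */
    public function record( $type, $ok, $message = '' ) {
        if ( ! isset( $this->counts[ $type ] ) ) {
            $this->counts[ $type ] = array( 'success' => 0, 'failure' => 0 );
        }
        $this->counts[ $type ][ $ok ? 'success' : 'failure' ]++;

        // Only failures with a message end up in the error list.
        if ( ! $ok && '' !== $message ) {
            $this->errors[] = $message;
        }
    }

    public function counts() {
        return $this->counts;
    }

    public function errors() {
        return $this->errors;
    }
}
```

Each row of the table would then be one entry of counts(), with errors() feeding the collapsible error list.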
The can_export property of a post type only controls whether it shows up in the Content Types dropdown for export filtering; if "All Content" is selected, all post types are exported, including those with can_export set to false. A fix based on the export patch here could be something like:
$post_types = get_post_types( array( 'public' => true, 'can_export' => true ) );
$where = "post_type IN ('" . implode( "','", $post_types ) . "') AND post_status != 'auto-draft'";
// grab a snapshot of post IDs, just in case it changes during the export
$post_ids = $wpdb->get_col( "SELECT ID FROM $wpdb->posts WHERE $where ORDER BY post_date_gmt ASC" );
(NB: would need to look into exactly which builtin posts are and should be can_export => false)
Docs in the importer.
Currently I have unit tests for the parsers, and more covering the whole process will hopefully follow soon (I need to think up a full checklist of tests for edge and problem cases).
This is still partly a work in progress so feedback and a lot of testing please. Thank you.
This ticket aims to fix the following:
#5447 #5460 #7400 #7973 #8471 #9237 #10319 #11118 #11144 #11354 #11574 #12685 #13364 #13394 #13453 #13454 #13627 #14306 #14442 #14524 #14750 #15055 #15091 #15108
Change History (66)
comment:15 follow-ups: ↓ 17 ↓ 27 @nacin — 5 years ago
- Owner set to duck_
- Status changed from new to assigned
comment:52 follow-up: ↓ 53 @kbiglione — 5 years ago
- Resolution fixed deleted
- Status changed from closed to reopened