WordPress.org

Make WordPress Core

Opened 16 months ago

Last modified 2 months ago

#22981 accepted task (blessed)

Tweets import plugin tracking ticket

Reported by: nacin
Owned by: PeteMall
Milestone: WordPress.org
Priority: high
Severity: normal
Version:
Component: Import
Keywords:
Focuses:
Cc:

Description

This ticket is to track the development of a plugin that can import tweets from a downloaded twitter.com archive. Presumably, such a plugin would be added to the importers list on wp-admin/import.php.

Trac is best when it is used to discuss implementation. If you want to discuss the general idea, please do so on make/core.

Some initial thoughts on implementation:

  • It should use the JSON-formatted data that comes with a downloaded tweet archive. The importer should take the entire zip, extract it, and loop through the monthly files. Anything more is an unnecessary burden on the user.
  • The plugin should import the tweet as actual content. A filter is a good idea, if someone wishes to toggle this to instead insert links to tweets (and thus rely on oEmbed). It should also store the JSON-serialized array of data (directly from 1.1 of Twitter's API) in postmeta.
  • It should import posts as a post format. Status makes the most sense; 'link' could also work for links, then there's also 'aside'. The post format to use should be filterable on a tweet-by-tweet basis. The post type to use should be filterable, as a 'tweet' type may be desired.
  • It should handle importing an archive over an existing archive, by looking for the existing tweet (probably IDs as a meta key). I don't think deleted tweets should be removed in this process, though.
  • Remember that tweet IDs are going to be bigger than 32-bit integers, so they must be treated as strings, and we should not try to set a post ID as we might with other importers. This importer should be tested on a 32-bit environment.
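
A minimal sketch of the string-ID handling described in the last bullet, assuming each archive tweet carries both an `id` and an `id_str` field, as Twitter's v1.1 API does. The sample tweet and the `tweet_id_as_string()` helper are hypothetical and not taken from any attached plugin.

```php
<?php
// Hypothetical sample in the shape of one archive tweet; the "id" value exceeds the 32-bit range.
$json = '[{"id": 280433519057260545, "id_str": "280433519057260545", "text": "Hello"}]';

// JSON_BIGINT_AS_STRING decodes integers too large for the platform as strings
// instead of converting them to floats and losing precision.
$tweets = json_decode( $json, true, 512, JSON_BIGINT_AS_STRING );

/**
 * Return the tweet ID as a string, preferring the API's own string field.
 */
function tweet_id_as_string( array $tweet ) {
	if ( isset( $tweet['id_str'] ) ) {
		return (string) $tweet['id_str'];
	}
	return (string) $tweet['id'];
}

$tweet_id = tweet_id_as_string( $tweets[0] );
```

Keeping the ID as a string end to end means it can be stored as a meta value and compared without ever passing through integer arithmetic.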

Beyond that, there are other "nice to haves" that would likely be left to plugins of this plugin, given they are beyond the standard role of an importer. Beau Lebens, for example, has done some/all of this already:

  • Tagging based on hashtags, and a separate mentions and/or in-reply-to taxonomy.
  • Filtering over raw (no-HTML) content to add things like links to hashtags, links in tweets, etc., on display, rather than doing all of this on save. (Should a hashtag link go to the internal tag, or to twitter.com? Maybe the internal tag's description links to twitter.com?)
  • A cron to import new tweets using the same importing methods.
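
As an illustration of the hashtag-tagging idea above, here is a rough sketch that pulls hashtag text out of a decoded tweet. It assumes the v1.1 `entities.hashtags[].text` structure; the `tweet_hashtags()` helper is hypothetical.

```php
<?php
/**
 * Collect the unique hashtag strings from one decoded tweet.
 * Assumes the Twitter API v1.1 "entities" layout; returns an empty array if absent.
 */
function tweet_hashtags( array $tweet ) {
	$tags = array();
	if ( empty( $tweet['entities']['hashtags'] ) ) {
		return $tags;
	}
	foreach ( $tweet['entities']['hashtags'] as $hashtag ) {
		if ( isset( $hashtag['text'] ) && '' !== $hashtag['text'] ) {
			$tags[] = $hashtag['text'];
		}
	}
	// Drop repeats and reindex so the result is a plain list.
	return array_values( array_unique( $tags ) );
}
```

In a plugin, the returned list could presumably be handed to `wp_set_post_tags()` for the imported post.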

One thing I will suggest: decisions, not options. Note I said "filter" a bunch of times, but never the word "option." Not that there won't be a need for any user decision here, but we should make a plugin that works well for the common use cases, and leave the rest to other enterprising developers.

Side note: I am working on acquiring a namespace for the Twitter importer in the wordpress.org plugin repository.

Attachments (6)

chadhuber-tweets.zip (495.2 KB) - added by nacin 16 months ago.
The downloaded tweet archive from @chadhuber. Thanks!
aaroncampbell-tweets-2012-12-16.zip (534.3 KB) - added by aaroncampbell 16 months ago.
twitter-importer.php (5.4 KB) - added by aaroncampbell 16 months ago.
wp-twitter-importer.zip (18.0 KB) - added by PeteMall 16 months ago.
wp-twitter-importer.2.zip (5.7 KB) - added by aaroncampbell 16 months ago.
wp-twitter-importer.3.php (13.2 KB) - added by Otto42 2 months ago.

Change History (45)

comment:1 nacin 16 months ago

Thanks to Chad Huber (http://twitter.com/chadhuber) for contributing his tweets to the cause. [chadhuber-tweets.zip] is exactly as downloaded from Twitter. Here's a file tree:

index.html (for the HTML application)
README.txt (content below)
css/ (for the HTML application)
data/ (for the HTML application)
     csv/
          (monthly files)
     js/
          tweets/
                (monthly files)
          payload_details.js (number of tweets)
          tweet_index.js (index of the monthly files)
          user_details.js (basic user info)
img/ (for the HTML application)
js/ (for the HTML application)
lib/ (for the HTML application)

The readme:

# How to use your Twitter archive data

The simplest way to use your Twitter archive data is through the archive browser
interface provided in this file. Just double-click `index.html` from the root
folder and you can browse your entire history of Tweets from inside your
browser.

---

In the `data` folder, your Twitter archive is present in two formats: JSON and
CSV exports by month and year.

* CSV is a generic format that can be imported into many data tools, spreadsheet
applications, or consumed simply using a programming language.

## JSON for Developers

* The JSON export contains a full representation of your Tweets as returned by
v1.1 of the Twitter API. See https://dev.twitter.com/docs/api/1.1 for more
information.
* The JSON export is also used to power the archive browser
interface (index.html).
* To consume the export in a generic JSON parser in any
language, strip the first and last lines of each file.

To provide feedback, ask questions, or share ideas with other Twitter
developers, join the discussion forums on https://dev.twitter.com.

comment:2 PeteMall 16 months ago

Aaron and I worked on it yesterday. I'm attaching what we have so far as a patch. The plugin just accepts a zip file and parses the tweets. We are currently working on it and I'll add an updated patch at the end of the day.

comment:3 aaroncampbell 16 months ago

Lol, I guess Pete and I were both commenting here at the same time.

I uploaded what we have so that anyone interested can use it as a starting point (twitter-importer.php). So far, like Pete said, it lets you upload the .zip and it parses the json. It doesn't do any actual importing yet (but we're working on it more today).

I also uploaded my export to use as another reference.

comment:4 nacin 16 months ago

One other thing: We should probably leverage the cron-based importing code used in the current Blogger and Tumblr importers, as primarily architected (AFAIK) by Otto42.

comment:5 beaulebens 16 months ago

  • Cc beau@… added

comment:6 follow-up: norcross 16 months ago

  • Cc andrew@… added

Question: is there any discussion about where these should go? Asides (as a post format), a CPT, give users the option, etc?

comment:7 PeteMall 16 months ago

  • Owner set to PeteMall
  • Status changed from new to accepted

Updated plugin imports the tweets and sets the hashtags as post tags (props beaulebens).

comment:8 in reply to: ↑ 6 ; follow-up: aaroncampbell 16 months ago

Replying to norcross:

Question: is there any discussion about where these should go? Asides (as a post format), a CPT, give users the option, etc?

I think it makes the most sense to make each a post with the format "status" and then add in some filters that could be used to change them to asides or even a CPT.

comment:9 in reply to: ↑ 8 ; follow-up: beaulebens 16 months ago

Replying to aaroncampbell:

I think it makes the most sense to make each a post with the format "status" and then add in some filters that could be used to change them to asides or even a CPT.

Now we get to have the discussion of "status" vs "aside". In the Twitter Importer I wrote/am running on my site, I opted for asides. The problem is that cogent arguments can be made for both, since it largely depends (I think) on how the individual is using Twitter.

comment:10 in reply to: ↑ 9 ; follow-up: aaroncampbell 16 months ago

Replying to beaulebens:

Now we get to have the discussion of "status" vs "aside". In the Twitter Importer I wrote/am running on my site, I opted for asides. The problem is that cogent arguments can be made for both, since it largely depends (I think) on how the individual is using Twitter.

I don't particularly want to have that debate. Mostly because I could easily argue either side. A filter is a given and I don't care which is the default. I chose status because I think more people use Twitter that way, not because it's how I use it. I think the question is: Is there a good enough argument for actually adding an option and allowing the user to choose?

comment:11 Otto42 16 months ago

  • Cc Otto42 added

comment:12 in reply to: ↑ 10 norcross 16 months ago

Replying to aaroncampbell:

Replying to beaulebens:

Now we get to have the discussion of "status" vs "aside".

Is there a good enough argument for actually adding an option and allowing the user to choose?

I can see a few arguments to allowing a choice:

1.) not all themes have post formats enabled
2.) people with larger twitter archives may not want to flood their post area with new content
3.) some people are more particular about content organization than others.

now granted, I fall into all 3 of those scenarios, but I can't be the only person who fits at least one of those criteria.

comment:13 aaroncampbell 16 months ago

The option I was talking about would just be "status" vs "aside" for post format.

  1. Themes that don't support post formats will treat the content as regular posts.
  2. If you have a large Twitter archive you're importing, it seems like you would want a theme that shows asides or status updates separate from main content. If you don't have that, then you probably expect to have the content in the main area.
  3. For those really particular about content organization, there are several filters so they can do whatever they want.

wp-twitter-importer.2.zip has the filters I'm talking about.

I'm leaning toward using status, leaving the filter in, and making decisions not options.

comment:14 norcross 16 months ago

Correct me if I'm wrong, but don't post formats show up in RSS? An import would blow out someone's feed unless they had previously disabled that specific format in their theme.

For me, I was going to put it into a CPT (but that's beside the point).

comment:15 follow-up: jeremyfelt 16 months ago

I haven't tested for functionality yet, but an additional filter for post_type in wp-twitter-importer.2.zip, defaulting to post, would be ideal for maximum extendability.

comment:16 in reply to: ↑ 15 aaroncampbell 16 months ago

Replying to jeremyfelt:

I haven't tested for functionality yet, but an additional filter for post_type in wp-twitter-importer.2.zip, defaulting to post, would be ideal for maximum extendability.

You can set the post type using the twitter-import-post-data filter, which filters the entire array passed to wp_insert_post(). Use something like this:

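function twitter_import_post_data( $post ) {
	// Store imported tweets under a custom 'tweet' post type instead of 'post'.
	$post['post_type'] = 'tweet';
	return $post;
}
// Both hooks are filters, and the callback is a plain function, so it is registered by name.
add_filter( 'twitter-import-post-data', 'twitter_import_post_data' );
// Returning null means no post format is assigned.
add_filter( 'twitter-import-post-format', '__return_null' );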

comment:17 betzster 16 months ago

  • Cc j@… added

comment:18 follow-up: MikeSchinkel 16 months ago

  • Cc mike@… added

Although I fear the ship may have already sailed, I figured I'd post about an alternate approach in case it wasn't considered and might still be.

Instead of posts could we instead consider using comments for tweets? Comments model tweets better than posts, and comments already have meta so they can be extended by plugins and themes as needed.

I'd love to use this feature but the idea of dumping 20,000+ posts into a WordPress site that might have less than 100 total real posts causes me a lot of heartache, and I would expect the same might be true for many other advanced users. Just looking at some of the people on this ticket, here's the number of tweets they have to import at this moment:

  • petemall - 4033
  • otto - 7831
  • nacin - 19,436
  • me - 21,056
  • norcross - 65,583(!)

Having that many extra posts could be a real pain for those who do a lot of debugging at the SQL level, i.e. I could no longer just browse tables but instead would always have to run a query.

We could create a post type of 'twitter_account' and another post type of 'tweet_period' and then have the comments associated with the 'tweet_period' post type where tweet periods could default to maybe 'weekly' but be configurable via a filter to 'daily', 'monthly', etc. Then the posts of type 'tweet_period' could be child posts for 'twitter_account' posts and we could use comment meta to provide a direct link back from the comment to the parent 'twitter_account' post ID (or even use the comment_parent field to point to parent post to reduce joins required, but that approach might be iffy.)

Using comments in this way would allow plugins a lot of built-in flexibility such as theme-able pages for tweet_period posts without any extra infrastructure to build, and it would allow plugins to work with and segment multiple Twitter accounts in a clean and standard way, and it would obviously keep the wp_posts table from being overwhelmed with tweets.

I'd be willing to tackle coding this over the weekend if people think it's a worthy approach to consider (assuming I get my current client project complete before then.) I'd hate to code it up though and find zero interest in the approach.

And even if people don't like the idea can we at least make the post storage engine replaceable via hooks so that a plugin could be developed to use comments instead?

comment:19 in reply to: ↑ 18 ; follow-up: aaroncampbell 16 months ago

Replying to MikeSchinkel:

Instead of posts could we instead consider using comments for tweets? Comments model tweets better than posts, and comments already have meta so they can be extended by plugins and themes as needed.

I definitely don't agree with this one. I understand that you'd prefer to keep your posts table cleaner (and apparently don't care about your comments table), but I definitely think that tweets are better modeled as a post with the format status than by anything else offered by default in WordPress (with a post with the format of aside being a close second).

If you were importing tweet threads instead of a user's timeline, then I could see how the back/forth makes sense as comments (with the original Tweet still being a status post), but when you're importing a timeline I definitely think posts are the way to go.

How's this as a potential solution: We still need to add checks to keep from re-importing Tweets if a user requests another export a month later and imports it. If we do that by using a filter that we hook into, you could use it to short-circuit the post creation:

if ( ! apply_filters( 'twitter-import-tweet-exists', false, $tweet ) ) {
	// Create post here
}

You could use that filter to insert the comment you want and return true. I haven't completely thought through it all, but it seems like something like that would work.
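
The existence check behind such a filter could be sketched in plain PHP as follows. This is only an illustration under stated assumptions: the `filter_new_tweets()` helper is hypothetical, tweets are keyed by the v1.1 `id_str` field, and the list of already-imported IDs stands in for whatever lookup (for example, a postmeta query) a real plugin would use.

```php
<?php
/**
 * Return only the tweets whose IDs have not been imported yet.
 * IDs are compared as strings, and repeats within the same batch are skipped too.
 */
function filter_new_tweets( array $tweets, array $imported_ids ) {
	// Build a lookup table keyed by ID for constant-time membership checks.
	$seen = array_fill_keys( array_map( 'strval', $imported_ids ), true );
	$new  = array();
	foreach ( $tweets as $tweet ) {
		$id = (string) $tweet['id_str'];
		if ( isset( $seen[ $id ] ) ) {
			continue; // Already imported, or a duplicate earlier in this batch.
		}
		$seen[ $id ] = true;
		$new[]       = $tweet;
	}
	return $new;
}
```

Re-importing a newer archive over an older one would then only create posts for the tweets this function returns.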

comment:20 in reply to: ↑ 19 ; follow-up: MikeSchinkel 16 months ago

Replying to aaroncampbell:

I definitely don't agree with this one.

Disappointed, disagree of course, but I respect your position.

but I definitely think that tweets are better modeled as a post with the format status than by anything else offered by default in WordPress (with a post with the format of aside being a close second).

I'd be curious if you could elaborate why? Posts are independent and heavy duty in capability, while tweets are dependent on a Twitter account and are lightweight in requirements. Seems like a mismatch to me.

How's this as a potential solution: ...

If it allows a way to bypass adding to posts then that's minimally what I was requesting.

P.S. Ironic note, I didn't include your tweet count in the list because yours was the lowest at only 1533. Wondering if that colors your opinion any?

P.P.S. Huge apologies to Nacin. As I write this I see your request to discuss how on Make. If after this post you want me to propose it there and make future comments there I will. Again, sorry.

comment:21 mcgaritydotme 15 months ago

  • Cc mcgaritydotme added

comment:22 in reply to: ↑ 20 mcgaritydotme 15 months ago

Replying to MikeSchinkel:

As I write this I see your request to discuss how on Make. If after this post you want me to propose it there and make future comments there I will.

FYI -- comments on the Make post are closed. So I assume that any further idea discussion would have to occur here.

comment:23 quicoto 15 months ago

  • Cc quicoto added

Looking forward to seeing this importer.

Regarding the post format I'd vote for "status".

Cheers.

comment:24 williamsba1 14 months ago

  • Cc brad@… added

comment:25 follow-up: williamsba1 14 months ago

Just imported my Twitter stream using the latest plugin version and it successfully imported 22.5k+ tweets. Awesome!

One suggestion: The plugin states the user should upload the zip; however, if you try to upload the CSV directly, the import completely blows up. If a zip file is not uploaded, the plugin should show a nice error message reminding the user to upload the original zip.

Have there been any discussions around storing the in_reply_to_status_id as metadata? I'm not sure if there's a way to construct the reply URL, but it would probably be good to store that data for future use.
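
One possible way to build the reply URL, sketched under the assumption that the archive JSON includes the v1.1 fields `in_reply_to_status_id_str` and `in_reply_to_screen_name`, and that Twitter's permalink pattern is `twitter.com/{screen_name}/status/{id}`. The `tweet_reply_url()` helper is hypothetical.

```php
<?php
/**
 * Build a permalink to the tweet being replied to, or return null
 * when the tweet is not a reply or the needed fields are missing.
 */
function tweet_reply_url( array $tweet ) {
	if ( empty( $tweet['in_reply_to_status_id_str'] ) || empty( $tweet['in_reply_to_screen_name'] ) ) {
		return null;
	}
	// The status ID is kept as a string throughout, so large IDs are safe on 32-bit systems.
	return sprintf(
		'https://twitter.com/%s/status/%s',
		rawurlencode( $tweet['in_reply_to_screen_name'] ),
		rawurlencode( $tweet['in_reply_to_status_id_str'] )
	);
}
```

Storing the raw ID and screen name as metadata, and deriving the URL on display, would leave room to change the URL pattern later.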

I also agree the post format should be status.

comment:26 in reply to: ↑ 25 ; follow-up: chriswallace 14 months ago

I cannot truly express the depth of my hatred for importing tweets into WordPress. That is all.

comment:27 philiparthurmoore 14 months ago

  • Cc philip@… added

comment:28 wpsmith 14 months ago

  • Cc t@… added

comment:29 in reply to: ↑ 26 travisnorthcutt 14 months ago

  • Cc travis@… added

Replying to chriswallace:

I cannot truly express the depth of my hatred for importing tweets into WordPress. That is all.

Why bother posting that here? People are working on something that lots of people would demonstrably appreciate having available, and you show up to quite literally hate on it? Why?

(Unless of course I'm misunderstanding and what you're actually saying is that you hate importing tweets now, and are looking forward to the importer.)

comment:30 sc0ttkclark 14 months ago

  • Cc lol@… added

comment:31 kpdesign 14 months ago

  • Cc kpdesign3@… added

comment:32 jtsternberg 13 months ago

  • Cc justin@… added

comment:33 bradparbs 11 months ago

  • Cc brad@… added

comment:34 frederick.ding 9 months ago

  • Cc frederick+wordpress@… added

comment:35 rachelbaker 5 months ago

  • Cc rachel@… added

comment:36 nacin 2 months ago

  • Component changed from Plugins to Import

Is there anything in the repo for this yet?

comment:37 Otto42 2 months ago

Don't think so, but I just downloaded my archive and they apparently changed the format of the ZIP file. The CSV version is now in the root in a single large tweets.csv file, instead of being in monthly files in the data folder.

Also note the readme.txt included in the file says the following:

To consume the export in a generic JSON parser in any language, strip the first and last lines of each file.

This is incorrect: only the first line should be stripped if you wish to parse this file with a JSON parser. Also note that the JSON files appear to contain far more useful data than the CSV file does, so parsing them would likely be preferable.
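
A minimal sketch of that parsing step: drop the first line (the JavaScript assignment) and decode the rest as JSON. The `parse_archive_js()` helper is hypothetical, and the exact wording of the first line in the test data is an assumption about the archive format.

```php
<?php
/**
 * Decode one monthly tweets file from the archive.
 * Strips only the first line, then parses the remainder as JSON.
 * Returns the decoded array, or null if the contents cannot be parsed.
 */
function parse_archive_js( $contents ) {
	$newline = strpos( $contents, "\n" );
	if ( false === $newline ) {
		return null; // No second line, so there is no JSON payload.
	}
	$json = substr( $contents, $newline + 1 );
	// Keep oversized integers (tweet IDs) as strings rather than floats.
	$data = json_decode( $json, true, 512, JSON_BIGINT_AS_STRING );
	return is_array( $data ) ? $data : null;
}
```

Returning null on failure lets the caller skip a malformed monthly file and continue with the rest of the archive.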

comment:38 Otto42 2 months ago

wp-twitter-importer.3.php is my modification of the last version uploaded by aaroncampbell.

  • Removed the dependency on the WP_Importer_Cron (for now, the plugin wasn't really using it anyway). May add this back later, along with checking for duplicates on import and smarter parsing.
  • Some start and end functions to speed up the process by turning off the cache invalidation and term/comment counting during the import
  • Speed boost by turning off autocommit and manually committing once every 1000 tweets. This isn't the safest thing in the world, but it's a start. The speed boost is substantial. Importing 11,000 tweets went down from ~6 minutes to ~30 seconds.
  • Added a filter for the post_type used, in case somebody doesn't want them all to be "post".
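
The commit-every-N pattern from the list above can be sketched independently of the database layer. This is an illustration, not the code in wp-twitter-importer.3.php: `import_in_batches()` is a hypothetical helper, and the `$insert` and `$commit` callbacks stand in for the real post insertion and for issuing a COMMIT.

```php
<?php
/**
 * Insert every tweet, committing once per $batch_size inserts
 * and once more at the end for any remainder.
 * Returns the number of commits issued.
 */
function import_in_batches( array $tweets, $batch_size, callable $insert, callable $commit ) {
	$pending = 0;
	$commits = 0;
	foreach ( $tweets as $tweet ) {
		$insert( $tweet );
		$pending++;
		if ( $pending >= $batch_size ) {
			$commit();
			$commits++;
			$pending = 0;
		}
	}
	// Flush the final partial batch so no inserts are left uncommitted.
	if ( $pending > 0 ) {
		$commit();
		$commits++;
	}
	return $commits;
}
```

The trade-off is that a failure in the middle of a batch can lose up to `$batch_size` uncommitted inserts, which is part of why the approach is described as not the safest.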

A few known problems:

  • It's using the WP_Filesystem slightly incorrectly. I'm not sure how to fix this at present because I've never tried to use it with a file upload. Think the order of operations needs to be changed around here. It will work with "direct" mode only at present.
  • It's still a bit slow and will run into PHP timeouts. There are some enhancements that can be made for speed, and it can be put into a continuous cron job.
  • Turning off autocommit with 'set autocommit = 0' strikes me as potentially bad. Probably need to find a better way.
  • Like the other importers, we should probably make a token effort of some kind to download the media "attachments" in the tweets and rejigger the URLs to local ones. Might want to limit this to known domains, like pic.twitter and the like.
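
Limiting media downloads to known domains could look roughly like this. The `is_allowed_media_url()` helper is hypothetical, and the host names in the allowlist are assumptions chosen for illustration.

```php
<?php
/**
 * Check whether a media URL points at one of the allowed hosts.
 * Host comparison is exact and case-insensitive; unparseable URLs are rejected.
 */
function is_allowed_media_url( $url, array $allowed_hosts ) {
	$host = parse_url( $url, PHP_URL_HOST );
	if ( ! is_string( $host ) || '' === $host ) {
		return false;
	}
	return in_array( strtolower( $host ), $allowed_hosts, true );
}

// Assumed allowlist, in lowercase; a real importer would likely make this filterable.
$allowed_media_hosts = array( 'pic.twitter.com', 'pbs.twimg.com' );
```

An exact host match is deliberately strict: it avoids fetching from look-alike domains, at the cost of having to list each media host explicitly.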

comment:39 jeremyfelt 2 months ago

This is mostly irrelevant, but may be useful for someone one day: As a weekly downloader of my Twitter archive, I've noticed that certain past tweets disappear and reappear in the data. Not sure if this has any impact on the importer.
