Ticket #1303: import-blosxom.php

File import-blosxom.php, 13.8 KB (added by anonymousbugger, 10 years ago)
<?php
define('BLOSXOM_RSSFILE', ''); // Set this to the path of your exported RSS file
// Example:
// define('BLOSXOM_RSSFILE', '/home/foobar/rss.xml');
// or, if it's in the same directory as import-blosxom.php:
// define('BLOSXOM_RSSFILE', 'rss.xml');

$post_author       = 1;         // User ID of the author to attribute imported posts to
$post_status       = 'publish'; // Status for imported posts: 'publish', 'draft', 'private'
$comment_status    = 'open';    // Allow comments on imported posts: 'open', 'closed'
$ping_status       = 'open';    // Allow pings on imported posts: 'open', 'closed'
$timezone_offset   = 0;         // GMT offset, in hours, of the posts you're importing (e.g. -8 or 5.5)
$import_writebacks = 1;         // Import writebacks (1 = yes, 0 = no)

function unhtmlentities($string) // From php.net, for PHP < 4.3 (which lacks html_entity_decode())
{
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}

// Split a fractional offset (e.g. 5.5) into whole hours and minutes for MySQL's DATE_ADD()
$add_hours = intval($timezone_offset);
$add_minutes = intval(60 * ($timezone_offset - $add_hours));

if (!file_exists('../wp-config.php'))
{
    die("There doesn't seem to be a wp-config.php file. You must install WordPress before you import any entries.");
}

require('../wp-config.php');

// Default to step 0 when no (or a non-numeric) step is given
$step = isset($_GET['step']) ? intval($_GET['step']) : 0;
?>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>WordPress &rsaquo; Import from Blosxom</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<style media="screen" type="text/css">
  body {
    font-family: Georgia, "Times New Roman", Times, serif;
    margin-left: 20%;
    margin-right: 20%;
  }
  #logo {
    margin: 0;
    padding: 0;
    background-image: url(http://wordpress.org/images/logo.png);
    background-repeat: no-repeat;
    height: 60px;
    border-bottom: 4px solid #333;
  }
  #logo a {
    display: block;
    text-decoration: none;
    text-indent: -100em;
    height: 60px;
  }
  p {
    line-height: 140%;
  }
</style>
</head>
<body>

<h1 id="logo"><a href="http://wordpress.org/">WordPress</a></h1>

<?php
switch($step)
{
    case 0:
?>

<p>
Howdy! This importer allows you to extract posts from a Blosxom-generated RSS 2.0
feed file into your blog.  This tool can also import writebacks from
the Blosxom <code>writeback</code> or <code>writeback_plus</code> plugins.
</p>
<p>
The feed should be generated using the following Blosxom <code>.rss20</code> theme.
If you don't use themes, simply break the theme up into the necessary individual
flavour files.
</p>
<p>
Blosxom RSS 2.0 WordPress import theme file:
</p>
<pre>
  &lt;!-- blosxom theme .rss20 --&gt;

  &lt;!-- blosxom content_type text/xml --&gt;

  &lt;!-- blosxom head --&gt;
  &lt;?xml version="1.0"?&gt;
  &lt;rss version="2.0"&gt;
    &lt;channel&gt;
      &lt;title&gt;$blog_title&lt;/title&gt;
      &lt;link&gt;$url&lt;/link&gt;
      &lt;description&gt;$blog_description&lt;/description&gt;
      &lt;language&gt;$blog_language&lt;/language&gt;
      &lt;copyright&gt;Get Lost!&lt;/copyright&gt;
      &lt;generator&gt;Blosxom&lt;/generator&gt;
      &lt;ttl&gt;180&lt;/ttl&gt;

  &lt;!-- blosxom date --&gt;

  &lt;!-- blosxom story --&gt;

      &lt;item&gt;
        &lt;title&gt;$title&lt;/title&gt;
        &lt;link&gt;$url$path/$fn.html&lt;/link&gt;
        &lt;description&gt;&lt;![CDATA[$body]]&gt;&lt;/description&gt;
        &lt;pubDate&gt;$dw, $da $mo $yr $hr:$min:00 PST&lt;/pubDate&gt;
        &lt;category&gt;&lt;@filesystem.path_basename path="$path" output="yes" /&gt;&lt;/category&gt;
        &lt;guid isPermaLink="true"&gt;$url$path/$fn.html&lt;/guid&gt;
        &lt;author&gt;Administrator&lt;/author&gt;
        $writeback::writebacks
      &lt;/item&gt;

  &lt;!-- blosxom writeback --&gt;

        &lt;wb&gt;
          &lt;wb_name&gt;$writeback::name&lt;/wb_name&gt;
          &lt;wb_url&gt;$writeback::url&lt;/wb_url&gt;
          &lt;wb_date&gt;$writeback::date&lt;/wb_date&gt;
          &lt;wb_ip&gt;$writeback::ip&lt;/wb_ip&gt;
          &lt;wb_title&gt;$writeback::title&lt;/wb_title&gt;
          &lt;wb_comment&gt;&lt;![CDATA[$writeback::comment]]&gt;&lt;/wb_comment&gt;
        &lt;/wb&gt;

  &lt;!-- blosxom foot --&gt;

    &lt;/channel&gt;
  &lt;/rss&gt;
</pre>
<p>
If you look carefully at the above theme you will see that the <i>category</i> field is filled
in using the <code>interpolate_fancy</code> plugin, which calls the <code>path_basename</code>
function within the <code>filesystem</code> plugin.  As a result, the basename of the
post's file path is used as the category name.  For example, if a post exists
at <code>/software/blosxom/hacks.txt</code>, the path is <code>/software/blosxom</code>
and the resulting category will be <i>blosxom</i>.  If this is not what you want,
you have a couple of options.  The first is to remove the <i>category</i> field from the
theme, so that every post is filed in the <i>Uncategorized</i> category; you will then
need to recategorize your posts manually using the WordPress admin pages.  The second
is to write, or politely ask someone else to write, a Blosxom plugin that breaks apart
the post path and creates multiple <i>category</i> fields for the post.  This
essentially cross-posts the post into multiple categories.  Third... any ideas?
</p>
<p>
The <code>interpolate_fancy</code> plugin can be downloaded
<a href="http://www.blosxom.com/plugins/interpolate/interpolate_fancy.htm">here</a> and
the <code>filesystem</code> plugin can be downloaded
<a href="http://www.insanum.com/software">here</a>.
</p>
<p>
If your Blosxom blog does not have comments, or you don't use the <code>writeback</code> or
<code>writeback_plus</code> plugins, remove the <code>$writeback::writebacks</code> line
from the above theme.
</p>
<p>
To get started, modify your Blosxom blog by installing the above theme and
changing your <code>$num_entries</code> configuration item to a very large number (i.e.
more than the number of posts in your blog).  Then visit your blog using the following
URL: <code>http://&lt;your_site&gt;/index.rss20</code>.  Save this output to a file.  This
is your Blosxom-generated RSS 2.0 WordPress import file.
</p>
<p>
Now edit the following line in this file (<code>import-blosxom.php</code>):
</p>
<p>
<code>define('BLOSXOM_RSSFILE', '');</code>
</p>
<p>
Set it to the location of the RSS file you saved above.  For example:
</p>
<p>
<code>define('BLOSXOM_RSSFILE', '/home/foobar/rss.xml');</code>
</p>
<p>
You have to do this manually for security reasons.  When you're done,
<a href="import-blosxom.php">reload this page</a> and we'll take you to the
next step.
</p>
<?php if ('' != BLOSXOM_RSSFILE) : ?>
<h2 style="text-align: right;"><a href="import-blosxom.php?step=1">Begin Blosxom RSS Import &raquo;</a></h2>
<?php endif; ?>

<?php
    break;

    case 1:

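// Bring in the data
if (!is_readable(BLOSXOM_RSSFILE))
{
    die("Couldn't read the Blosxom RSS file. Check the BLOSXOM_RSSFILE setting at the top of import-blosxom.php.");
}
if (function_exists('set_magic_quotes_runtime'))
{
    @set_magic_quotes_runtime(0); // Deprecated in PHP 5.3, removed in PHP 7
}
$datalines = file(BLOSXOM_RSSFILE); // Read the file into an array
$importdata = implode('', $datalines); // Squish it into one string
$importdata = str_replace(array("\r\n", "\r"), "\n", $importdata);

preg_match_all('|<item>(.*?)</item>|is', $importdata, $posts);
$posts = $posts[1];

echo "<ol>";
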
foreach ($posts as $post)
{
    $title = $date = $categories = $content = $guid = $post_id = '';

    echo "<li>Importing post... ";

    preg_match('|<title>(.*?)</title>|is', $post, $title);
    $title = trim($title[1]);
    $post_name = sanitize_title($title);
    $title = addslashes($title);

    preg_match('|<pubDate>(.*?)</pubDate>|is', $post, $date);
    $date = strtotime($date[1]);
    $post_date = gmdate('Y-m-d H:i:s', $date);

    preg_match_all('|<category>(.*?)</category>|is', $post, $categories);
    $categories = $categories[1];

    // <guid> may or may not carry attributes such as isPermaLink
    preg_match('|<guid[^>]*>(.*?)</guid>|is', $post, $guid);
    $guid = addslashes(trim($guid[1]));

    // Strip the CDATA wrapper and decode entities before escaping for SQL
    preg_match('|<description>(.*?)</description>|is', $post, $content);
    $content = str_replace(array('<![CDATA[', ']]>'), '', trim($content[1]));
    $content = unhtmlentities($content);

    // Clean up content: lowercase tag names and use XHTML-style <br /> and <hr />
    $content = preg_replace_callback('|<(/?[A-Z]+)|',
        function ($m) { return '<' . strtolower($m[1]); }, $content);
    $content = str_replace('<br>', '<br />', $content);
    $content = str_replace('<hr>', '<hr />', $content);
    $content = addslashes($content);

    // Check for a duplicate
    $duplicate = $wpdb->get_var("SELECT ID FROM $wpdb->posts WHERE
                                 post_title = '$title' AND post_date = '$post_date'");
    if ($duplicate)
    {
        echo "Post already imported</li>";
        continue;
    }

    // Insert the post into the database
    $wpdb->query("INSERT INTO $wpdb->posts
                  (post_author, post_date,
                   post_date_gmt, post_content,
                   post_title, post_status,
                   comment_status, ping_status,
                   post_name, guid)
                  VALUES
                  ('$post_author', '$post_date',
                  DATE_ADD('$post_date', INTERVAL '$add_hours:$add_minutes' HOUR_MINUTE),
                  '$content', '$title',
                  '$post_status', '$comment_status',
                  '$ping_status', '$post_name', '$guid')");

    $post_id = $wpdb->get_var("SELECT ID FROM $wpdb->posts WHERE
                               post_title = '$title' AND post_date = '$post_date'");
    if (!$post_id)
    {
        die("Couldn't get post ID");
    }

    // Insert and associate the categories with the post
    if (count($categories) != 0)
    {
        foreach ($categories as $post_category)
        {
            $post_category = addslashes(trim(unhtmlentities($post_category)));

            // See if the category exists yet
            $cat_id = $wpdb->get_var("SELECT cat_ID from $wpdb->categories WHERE
                                      cat_name = '$post_category'");

            if (!$cat_id && ($post_category != ''))
            {
                $cat_nicename = sanitize_title(stripslashes($post_category));

                $wpdb->query("INSERT INTO $wpdb->categories (cat_name, category_nicename)
                              VALUES ('$post_category', '$cat_nicename')");

                $cat_id = $wpdb->get_var("SELECT cat_ID from $wpdb->categories WHERE
                                          cat_name = '$post_category'");
            }

            // An empty category falls back to the default category (ID 1)
            if ($post_category == '')
            {
                $cat_id = 1;
            }

            // Double check it's not there already (skip if the category couldn't be created)
            if ($cat_id)
            {
                $exists = $wpdb->get_row("SELECT * FROM $wpdb->post2cat WHERE
                                          post_id = $post_id AND category_id = $cat_id");

                if (!$exists)
                {
                    $wpdb->query("INSERT INTO $wpdb->post2cat (post_id, category_id)
                                  VALUES ($post_id, $cat_id)");
                }
            }
        }
    }
    else
    {
        $exists = $wpdb->get_row("SELECT * FROM $wpdb->post2cat WHERE
                                  post_id = $post_id AND category_id = 1");
        if (!$exists)
        {
            $wpdb->query("INSERT INTO $wpdb->post2cat (post_id, category_id)
                          VALUES ($post_id, 1)");
        }
    }

    // Insert the writebacks for the post
    $wbs = '';
    preg_match_all('|<wb>(.*?)</wb>|is', $post, $wbs);
    $wbs = $wbs[1];

    if (!$import_writebacks || (count($wbs) == 0))
    {
        echo "Done!</li>";
        continue;
    }

    foreach ($wbs as $post_wb)
    {
        // Pull each field out of the writeback; a missing field becomes ''
        $wb_name    = preg_match('|<wb_name>(.*?)</wb_name>|is', $post_wb, $m) ? trim($m[1]) : '';
        $wb_url     = preg_match('|<wb_url>(.*?)</wb_url>|is', $post_wb, $m) ? trim($m[1]) : '';
        $wb_date    = preg_match('|<wb_date>(.*?)</wb_date>|is', $post_wb, $m) ? trim($m[1]) : '';
        $wb_ip      = preg_match('|<wb_ip>(.*?)</wb_ip>|is', $post_wb, $m) ? trim($m[1]) : '';
        $wb_title   = preg_match('|<wb_title>(.*?)</wb_title>|is', $post_wb, $m) ? trim($m[1]) : '';
        $wb_comment = preg_match('|<wb_comment>(.*?)</wb_comment>|is', $post_wb, $m) ? trim($m[1]) : '';

        // The URL field may hold an e-mail address instead of a web address.
        // Copy it into the e-mail field before clearing the URL.
        $wb_email = '';
        if (preg_match('|mailto|i', $wb_url) || preg_match('|.+@.+|s', $wb_url))
        {
            $wb_email = preg_replace('|^mailto:|i', '', $wb_url);
            $wb_url = '';
        }

        if ($wb_date != '')
        {
            $wb_date = date('Y-m-d H:i:s', strtotime($wb_date));
        }

        // Build the comment body: optional title, then the comment text
        $wb_comment = str_replace(array('<![CDATA[', ']]>'), '', $wb_comment);
        if ($wb_title != '')
        {
            $wb_comment = $wb_title . "<br/>" . $wb_comment;
        }
        $wb_comment = unhtmlentities($wb_comment);

        // Escape everything that goes into the SQL below
        $wb_name    = addslashes($wb_name);
        $wb_email   = addslashes($wb_email);
        $wb_url     = addslashes($wb_url);
        $wb_ip      = addslashes($wb_ip);
        $wb_comment = addslashes($wb_comment);

        // Check if it's already there
        if (!$wpdb->get_row("SELECT * FROM $wpdb->comments WHERE
                             comment_date = '$wb_date' AND
                             comment_content = '$wb_comment'"))
        {
            $wpdb->query("INSERT INTO $wpdb->comments
                          (comment_post_ID, comment_author,
                           comment_author_email, comment_author_url,
                           comment_author_IP, comment_date,
                           comment_content, comment_approved)
                          VALUES
                          ($post_id, '$wb_name',
                           '$wb_email', '$wb_url',
                           '$wb_ip', '$wb_date',
                           '$wb_comment', '1')");
        }
    }

    echo "Done!</li>";
}
?>

</ol>

<h3>All done. <a href="../">Have fun!</a></h3>

<?php
    break;
}
?>

</body>
</html>
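Step 1 pulls each field out of the feed with regular expressions, which works for the output of the theme above but is tied to the exact tag layout that theme emits. For comparison, here is a minimal sketch of the same extraction using PHP's SimpleXML extension (PHP 5.1 or later, for `LIBXML_NOCDATA`), assuming the exported file is well-formed XML. The function name `blosxom_parse_feed` and the array keys it returns are hypothetical; this code is not part of the importer, and its output would still need escaping before it reaches the database.

```php
<?php
// Hypothetical helper (not part of import-blosxom.php): parse a Blosxom
// RSS 2.0 export with SimpleXML instead of regular expressions.
// Assumes well-formed XML; returns an empty array if parsing fails.
function blosxom_parse_feed($xml)
{
    // LIBXML_NOCDATA merges CDATA sections into plain text nodes.
    $rss = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
    if ($rss === false) {
        return array();
    }

    $posts = array();
    foreach ($rss->channel->item as $item) {
        // An item may carry zero or more <category> elements.
        $categories = array();
        foreach ($item->category as $category) {
            $categories[] = trim((string) $category);
        }

        // Writebacks emitted by the theme's "writeback" section.
        $writebacks = array();
        foreach ($item->wb as $wb) {
            $writebacks[] = array(
                'name'    => trim((string) $wb->wb_name),
                'url'     => trim((string) $wb->wb_url),
                'date'    => trim((string) $wb->wb_date),
                'comment' => trim((string) $wb->wb_comment),
            );
        }

        $posts[] = array(
            'title'      => trim((string) $item->title),
            // pubDate carries its own zone (PST in the theme), so convert to GMT.
            'date_gmt'   => gmdate('Y-m-d H:i:s', strtotime((string) $item->pubDate)),
            'guid'       => trim((string) $item->guid),
            'content'    => trim((string) $item->description),
            'categories' => $categories,
            'writebacks' => $writebacks,
        );
    }
    return $posts;
}

// Usage with a tiny, made-up feed.
$feed = '<?xml version="1.0"?><rss version="2.0"><channel><title>Blog</title>'
      . '<item><title>Hello</title>'
      . '<description><![CDATA[<p>Hi &amp; bye</p>]]></description>'
      . '<pubDate>Sun, 05 Jun 2005 10:00:00 PST</pubDate>'
      . '<category>blosxom</category>'
      . '<guid isPermaLink="true">http://example.com/software/blosxom/hello.html</guid>'
      . '<wb><wb_name>Ann</wb_name><wb_url>mailto:ann@example.com</wb_url>'
      . '<wb_comment><![CDATA[Nice post]]></wb_comment></wb></item>'
      . '</channel></rss>';

$posts = blosxom_parse_feed($feed);
echo $posts[0]['title'], ' ', $posts[0]['date_gmt'], "\n"; // Hello 2005-06-05 18:00:00
```

Because the parser, not a pattern, decides where each element ends, attributes on `<guid>` or extra whitespace inside tags no longer matter; the trade-off is that a single malformed character anywhere in the export makes the whole file fail to load.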
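The writeback loop in step 1 has to decide whether a `<wb_url>` value is a web address or an e-mail address. Below is a minimal stand-alone sketch of that decision, pulled out into a function so it can be tried without a WordPress install. The name `blosxom_split_wb_url` is hypothetical; the rule (anything mentioning `mailto`, or shaped like `something@something`, counts as e-mail) is the importer's own heuristic rather than anything defined by Blosxom, and stripping a leading `mailto:` is an assumption about what you would want stored.

```php
<?php
// Hypothetical helper (not part of import-blosxom.php): split a writeback's
// <wb_url> value into array(url, email) using the importer's heuristic.
function blosxom_split_wb_url($value)
{
    $value = trim($value);

    // Same test as the importer: "mailto" anywhere, or something@something.
    if (preg_match('|mailto|i', $value) || preg_match('|.+@.+|s', $value)) {
        // Keep only the address, without a leading "mailto:".
        return array('', preg_replace('|^mailto:|i', '', $value));
    }

    // Anything else (including an empty string) is treated as a web address.
    return array($value, '');
}

list($url, $email) = blosxom_split_wb_url('mailto:ann@example.com');
echo "url='$url' email='$email'\n"; // url='' email='ann@example.com'

list($url, $email) = blosxom_split_wb_url(' http://example.com/ ');
echo "url='$url' email='$email'\n"; // url='http://example.com/' email=''
```

The heuristic misfires on web addresses that contain an `@`, such as `http://user@host/`; those end up in the e-mail field.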