Importing Content to Drupal

To get the stripped core content into Drupal, I wrote a PHP script that uses the Drupal API. The script reads a list of filenames from standard input and is run through drush, which bootstraps Drupal before executing it, as follows:

drush -r $D7 -l www.iac.org scr inhale.php

Here's the script itself:

<?php
//
// DJM, 2012-02-08
//
// PHP script that takes a list of filenames as input, and converts those files
// into Legacy nodes on the www.iac.org web site
//
// The first line of each file is assumed to be the text of the HTML <title>
// element, and is stored in the node's title field.
// The remaining lines are stored in the node's body.
//
// The filename itself is the legacy URL with pound signs (#) in place of slashes,
// and at signs (@) in place of spaces. This script restores the slashes and
// spaces, expands the leading public// or members// prefix into the matching
// site URL, and stores the result in the field_old_url field.
//
// The field_status field is always set to 'Not Started'

$home_dir = "/home/djmolny/legacy-import/";

while (($filename = readline("")) !== FALSE) {
if ($filename === '') { continue; } // skip blank input lines instead of stopping

print 'filename="' . $filename . '"' . "\n";

$f = fopen($home_dir . $filename, 'r');
if ($f === FALSE) { print 'Cannot open "' . $home_dir . $filename . "\"\n"; exit(1); }

$title = trim(fgets($f));
print 'title="' . $title . "\"\n";

$body = fread($f, 1024*1024); // 1MB limit is arbitrary, but should suffice
fclose($f); // release the file handle before moving on to the next file

$old_url = str_replace("#", "/", str_replace("@", " ", $filename));
$old_url = str_replace("public//", "http://www.iac.org/", $old_url);
$old_url = str_replace("members//", "http://members.iac.org/", $old_url);
print 'old_url="' . $old_url . "\"\n";

  $node = new stdClass();
  $node->type = 'legacy_page';
  node_object_prepare($node);

  $node->title    = $title;
  $node->language = LANGUAGE_NONE;

  $node->body[$node->language][0]['value']   = $body;
  $node->body[$node->language][0]['summary'] = text_summary($body);
  $node->body[$node->language][0]['format']  = 'full_html';

  $node->field_old_url[$node->language][0]['value'] = $old_url;
  $node->field_status[$node->language][0]['value']  = 'Not Started';

node_save($node);
print("Done!\n\n\n"); // !!!
}
?>
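
The filename decoding is the least obvious step in the script, so here is a minimal standalone sketch of it that can be run without Drupal. The function name decode_legacy_filename and the sample filename are hypothetical, invented for illustration; the replacement rules are copied from the script above.

```php
<?php
// Standalone copy of the filename decoding used in the import script.
// decode_legacy_filename is a hypothetical name, not part of inhale.php.
function decode_legacy_filename($filename) {
  // '@' stands for a space and '#' stands for a slash.
  $url = str_replace("#", "/", str_replace("@", " ", $filename));
  // A leading "public##" or "members##" has now become "public//" or
  // "members//", which maps to the corresponding site.
  $url = str_replace("public//", "http://www.iac.org/", $url);
  $url = str_replace("members//", "http://members.iac.org/", $url);
  return $url;
}

// Hypothetical filename, invented for illustration.
print decode_legacy_filename("public##contests#judges@school.html") . "\n";
// prints: http://www.iac.org/contests/judges school.html
```

Note that the decoded URL keeps the literal space; the script stores it that way in field_old_url.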

Note: Drupal rejected numerous files because they contained bytes that are not valid UTF-8, in text such as "½ loop", "360º roll", or "Fédération". These characters can all be represented in UTF-8, so the files were presumably saved in an older single-byte encoding. I edited each of these files manually, replacing the symbols with their HTML equivalents (&frac12;, &deg;, and &eacute;, respectively). I thought about scripting this process, but since the import is a one-time exercise, I decided it wasn't worth the effort. However, I'm documenting the problem in case it crops up somewhere down the road.
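
If the problem does crop up again, a scripted fix might look something like the sketch below. It is untested against the real files and rests on assumptions: that the rejected files were saved in Windows-1252 (the $source_encoding default is a guess), and that PHP's mbstring extension is available. The function name legacy_to_entities is hypothetical. Unlike the manual edits, it emits numeric entities (for example &#189;) rather than named ones (&frac12;); browsers render both the same way.

```php
<?php
// Hypothetical helper, not part of the original inhale.php.
// Assumes the rejected files use a single-byte legacy encoding
// (Windows-1252 by default); change $source_encoding if that guess is wrong.
function legacy_to_entities($text, $source_encoding = 'Windows-1252') {
  // Text that is already valid UTF-8 needs no repair.
  if (mb_check_encoding($text, 'UTF-8')) {
    return $text;
  }
  // Reinterpret the raw bytes as the legacy encoding and convert to UTF-8.
  $utf8 = mb_convert_encoding($text, 'UTF-8', $source_encoding);
  // Replace every non-ASCII character with a numeric HTML entity,
  // leaving markup characters such as < > & untouched.
  $convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
  return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

// Byte 0xBD is the one-half sign in Windows-1252.
print legacy_to_entities("\xBD loop") . "\n";
// prints: &#189; loop
```

In the import script, this could be applied to $title and $body just before they are assigned to the node.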