
Importing Attachments fails if spaces in filename (11 posts)

  1. maestro4k
    Member
    Posted 14 years ago #

    I'm importing some blogs into a WPMU install and selecting import attachments to make sure images get imported. I noticed a lot of errors like this:

    # Importing attachment http://renegade.animeblogger.net/wp-content/uploads/Nintendo consoles.gif... Remote file error: Remote file returned error response 404 Not Found
    # Importing attachment http://renegade.animeblogger.net/wp-content/uploads/Sony Consoles.gif... Remote file error: Remote file returned error response 404 Not Found
    # Importing attachment http://renegade.animeblogger.net/wp-content/uploads/Sega Consoles.gif... Remote file error: Remote file returned error response 404 Not Found
    # Importing attachment http://renegade.animeblogger.net/wp-content/uploads/MS Consoles.gif... Remote file error: Remote file returned error response 404 Not Found

    But those images do exist:

    [renegade@mei ~]$ cd renegade.animeblogger.net/wp-content/uploads/
    [renegade@mei uploads]$ ls -lsa | grep -i console
    4 -rw-r--r-- 1 renegade bloggers 2766 2006-06-21 15:15 MS Consoles.gif
    4 -rw-r--r-- 1 renegade bloggers 2713 2006-06-21 15:15 MS Consoles.thumbnail.gif
    4 -rw-r--r-- 1 renegade bloggers 3392 2006-06-21 15:14 Nintendo consoles.gif
    4 -rw-r--r-- 1 renegade bloggers 3315 2006-06-21 15:14 Nintendo consoles.thumbnail.gif
    4 -rw-r--r-- 1 renegade bloggers 3193 2006-06-21 15:15 Sega Consoles.gif
    4 -rw-r--r-- 1 renegade bloggers 3309 2006-06-21 15:15 Sega Consoles.thumbnail.gif
    4 -rw-r--r-- 1 renegade bloggers 3179 2006-06-21 15:15 Sony Consoles.gif
    4 -rw-r--r-- 1 renegade bloggers 2946 2006-06-21 15:15 Sony Consoles.thumbnail.gif

    Looking at the error log for that blog I see the following:

    [Fri Apr 24 21:57:03 2009] [error] [client 76.73.8.162] File does not exist: /home/renegade/renegade.animeblogger.net/wp-content/uploads/Nintendo
    [Fri Apr 24 21:57:03 2009] [error] [client 76.73.8.162] File does not exist: /home/renegade/renegade.animeblogger.net/wp-content/uploads/Sony
    [Fri Apr 24 21:57:04 2009] [error] [client 76.73.8.162] File does not exist: /home/renegade/renegade.animeblogger.net/wp-content/uploads/Sega
    [Fri Apr 24 21:57:04 2009] [error] [client 76.73.8.162] File does not exist: /home/renegade/renegade.animeblogger.net/wp-content/uploads/MS

    The importer's chopping off the names at the space apparently. It did this for every attachment that had a space in it. I had planned to import a lot of dead blogs into a WPMU install to preserve content and lower maintenance. This bug has that work more or less stopped now. :(

    I've been trying to figure out where it's occurring but it's beyond me, I'm just not familiar enough with the codebase.
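
    One plausible reading of the log entries above, offered as an assumption rather than a confirmed diagnosis: the URL is sent unencoded, so the space inside the path looks like the separator that ends the request target. The sketch below uses a hypothetical helper, parse_request_target(), that splits a request line on spaces the way a simple server might.

    ```php
    <?php
    // Hypothetical helper (not part of WordPress): split a raw HTTP request
    // line on single spaces into method, target and protocol version.
    function parse_request_target(string $requestLine): string
    {
        $parts = explode(' ', $requestLine);
        // The second token is treated as the request target.
        return $parts[1] ?? '';
    }

    $path = '/wp-content/uploads/Nintendo consoles.gif';

    // Unencoded: the space inside the path ends the target early.
    $raw = parse_request_target("GET $path HTTP/1.0");

    // Encoded: the space travels as %20 and the target stays whole.
    $encoded = parse_request_target('GET ' . str_replace(' ', '%20', $path) . ' HTTP/1.0');

    echo $raw, "\n";      // /wp-content/uploads/Nintendo
    echo $encoded, "\n";  // /wp-content/uploads/Nintendo%20consoles.gif
    ```

    The truncated target matches the "File does not exist: .../uploads/Nintendo" lines in the error log.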

  2. tdjcbe
    Member
    Posted 14 years ago #

    WordPress doesn't like spaces in URLs and filenames. (I actually could have sworn that either HTML 4.01 or XML whatever disallowed them, but it's Saturday and I don't feel like digging through the docs too deeply.)

    Not sure if there really is a workaround for the issue. There's some previous discussion on the topic:

    http://mu.wordpress.org/forums/topic.php?id=4962
    http://wordpress.org/support/topic/171424

    Best bet would probably be to rename the files.

  3. mellow_bunny
    Member
    Posted 14 years ago #

    Renaming the files cannot seriously be the best option here right? What about changing the codebase to take the string as a whole? Surely this is something in the PHP code that can be altered?

  4. SteveAtty
    Member
    Posted 14 years ago #

    Yes, but what is the point of taking in filenames with spaces when they are not really officially recognised as part of a URL?

    http://www.rfc-editor.org/rfc/rfc1738.txt

    "Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL."

    Also spaces in file names are a complete pain for lots of very obvious reasons (try copying them on a command line without using quotes round them for example).

    So really you would need to turn spaces into another character, _ for example, when uploading the file. That could confuse people, and it would basically mean renaming them anyway.

    So I'd suggest you rename them before you attempt to upload them.
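
    For reference, a space can still be carried in a URL in encoded form. The following is a minimal sketch, assuming a hypothetical helper named encode_url_path(); it percent-encodes each path segment with PHP's rawurlencode() and leaves the "/" separators alone.

    ```php
    <?php
    // Hypothetical helper: percent-encode every segment of a URL path
    // while keeping the "/" separators intact.
    function encode_url_path(string $path): string
    {
        return implode('/', array_map('rawurlencode', explode('/', $path)));
    }

    echo encode_url_path('/wp-content/uploads/Sony Consoles.gif'), "\n";
    // /wp-content/uploads/Sony%20Consoles.gif
    ```

    Note that running this on a path that is already encoded would encode the "%" again, so it should only be applied to raw paths.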

  5. andrea_r
    Moderator
    Posted 14 years ago #

    It's way less WP's fault and more the webserver itself and the entire Internet. Windows lets you put spaces in filenames. Everything else doesn't.

  6. kgraeme
    Member
    Posted 14 years ago #

    The easiest way from a developer perspective is to not allow filenames with spaces or other disallowed characters.

    For media files, IMO the best solution for any web app, not just WordPress, is to store the file in the DB with spaces and URL-encode the filename string so that spaces in the name get converted to "%20" when being called by the application for display.

    WordPress takes a middle ground: it generally does allow users to upload files (or create posts/pages) with spaces in the name, but then converts the spaces to hyphens when saving to the database. Unfortunately, this means it can't handle importing files with spaces. The system doesn't automatically urlencode the spaces on display, and if it called the same conversion function to replace the spaces with hyphens, any post/page that referenced the filename would have a broken link, since the link would point to the filename with a space, not the hyphen.

    So it's a design decision with WordPress that's just an unfortunate reality we have to accept.
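
    The broken-link problem described above can be sketched in a few lines. hyphenate_spaces() is a hypothetical stand-in, not WordPress's own sanitizer, and the image path is only an example.

    ```php
    <?php
    // Hypothetical stand-in for a sanitizer that swaps spaces for hyphens.
    // This is an illustration, not WordPress's actual function.
    function hyphenate_spaces(string $fileName): string
    {
        return str_replace(' ', '-', $fileName);
    }

    $original = 'MS Consoles.gif';
    $stored   = hyphenate_spaces($original);   // MS-Consoles.gif
    $postHtml = '<img src="/wp-content/uploads/MS Consoles.gif">';

    // The post still points at the old name, so the link no longer matches
    // the stored file unless the post content is rewritten as well.
    $linkStillValid = strpos($postHtml, $stored) !== false;

    var_dump($stored);
    var_dump($linkStillValid);
    ```

    Under these assumptions $linkStillValid is false: sanitizing the file name on import fixes the file but not the references to it.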

  7. mellow_bunny
    Member
    Posted 14 years ago #

    Right on. Thanks for your replies guys. Too right spaces are straight foolish.

  8. maestro4k
    Member
    Posted 14 years ago #

    Umm andrea_r, Linux fully supports spaces in filenames, it has for many years. It's not just Windows that does, even Mac OS X supports them. You can handle them from the command line by escaping them, or just quoting the entire filename. If you want someone to blame, blame people. We don't write_like_this_all_the_time, it makes more sense to computer neophytes to name files with spaces because that's what they're familiar with. Besides, if you'd actually read my post you'd see I did a directory listing from the Linux command line that showed the files existed, proving pretty handily that Linux does indeed support spaces in filenames.

    And I would like to point out here that WordPress CREATED this problem. These files were uploaded with older versions of WP, and the uploader happily failed to convert the spaces to hyphens. There's no middle ground here (maybe there is now), but lots of people out there will have these issues due to WP's poor design decisions in the past. SteveAtty: I think you're confused about what we're talking about here, this is doing a transfer from a stand-alone WP install into a WPMU install. These files are already uploaded -- with spaces in the file names. WP allowed them, now it's breaking them doing the import. And if you want to try to bring in RFCs, WP allowed the filenames to break said RFC in the past.

    In any case most of the responses haven't been very helpful. I have hundreds of blogs I need to move over to WPMU, and many of those blogs originally used various versions of WP, going all the way back to 1.5. They were all controlled by different people, many of whom did use spaces in filenames. The blog I discovered this on has hundreds, maybe even thousands of images that are impacted. Renaming the files manually is not an option, and even if we wanted to, you seem to be forgetting that renaming alone won't do it. You'd also have to change the info in the database for every single file. Trying that is beyond unrealistic.

    Now don't take this as demanding something be done about it, but this very well may derail our plans to move to WPMU entirely. This is certainly an issue that the developers of both WP and WPMU should take seriously. WP core seems to strive for ease of use for neophytes. It's been bragging about the short (and easy) install as long as I've used it, and that goes back to version 1.5. And what really blows my mind is that the importer prints out the correct URL, but then completely fails to use said URL when trying to import the file. That looks much more like a bug to me, one piece of the code does it right, the other does it differently and wrong.

    We'll continue to try to solve this on our own, but please don't bother replying if you're not going to try to actually help us solve it. And if any WPMU devs think I should bring this topic up on the regular WP support forums, please let me know. I brought it up here to start with because it's an issue that seems more likely to impact WPMU users trying to do exactly what I'm doing now -- importing stand-alone blogs into one WPMU install.

    And thanks to tdjcbe, kgraeme and mellow_bunny for trying to help.
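
    To make the scale of the rename option concrete, here is a dry-run sketch using the file names from the directory listing in the first post. build_rename_map() is a hypothetical helper; nothing is renamed on disk, and the same old-to-new map would still have to be applied to every database reference, which is the part described above as unrealistic to do by hand.

    ```php
    <?php
    // File names taken from the directory listing earlier in the thread.
    $files = ['MS Consoles.gif', 'Nintendo consoles.gif', 'Sega Consoles.gif', 'Sony Consoles.gif'];

    // Hypothetical helper: map each name containing a space to a hyphenated one.
    function build_rename_map(array $files): array
    {
        $map = [];
        foreach ($files as $name) {
            if (strpos($name, ' ') !== false) {
                $map[$name] = str_replace(' ', '-', $name);
            }
        }
        return $map;
    }

    foreach (build_rename_map($files) as $old => $new) {
        // Dry run only: print what would change instead of calling rename().
        echo "$old -> $new\n";
    }
    ```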

  9. mellow_bunny
    Member
    Posted 14 years ago #

    Further to this issue, slight modifications to the wordpress.php import file made it possible for me to successfully request and import posts and their attachments, whether or not the file names contain spaces.

    I altered one function in the file wp-admin/import/wordpress.php. The edited function is shown below.

    function fetch_remote_file($post, $url) {
      $upload = wp_upload_dir($post['post_date']);
    
      // extract the file name and extension from the url
      $file_name = basename($url);
    
      // get placeholder file in the upload dir with a unique sanitized filename
      $upload = wp_upload_bits( $file_name, 0, '', $post['post_date']);
      if ( $upload['error'] ) {
       echo $upload['error'];
       return new WP_Error( 'upload_dir_error', $upload['error'] );
      }
    
      // fetch the remote url and write it to the placeholder file
      //$headers = wp_get_http($url, $upload['file']);
      $headers = wp_get_http(str_replace(' ','%20',$url), $upload['file']);
    
      //Request failed
      if ( ! $headers ) {
       @unlink($upload['file']);
       return new WP_Error( 'import_file_error', __('Remote server did not respond') );
      }
    
      // make sure the fetch was successful
      if ( $headers['response'] != '200' ) {
       @unlink($upload['file']);
       return new WP_Error( 'import_file_error', sprintf(__('Remote file returned error response %1$d %2$s'), $headers['response'], get_status_header_desc($headers['response']) ) );
      }
      elseif ( isset($headers['content-length']) && filesize($upload['file']) != $headers['content-length'] ) {
       @unlink($upload['file']);
       return new WP_Error( 'import_file_error', __('Remote file is incorrect size') );
      }
    
      $max_size = $this->max_attachment_size();
      if ( !empty($max_size) and filesize($upload['file']) > $max_size ) {
       @unlink($upload['file']);
       return new WP_Error( 'import_file_error', sprintf(__('Remote file is too large, limit is %s'), size_format($max_size)) );
      }
    
      // keep track of the old and new urls so we can substitute them later
      $this->url_remap[(str_replace(' ','%20',$url))] = $upload['url'];
      // if the remote url is redirected somewhere else, keep track of the destination too
      if ( $headers['x-final-location'] != (str_replace(' ','%20',$url)) )
       $this->url_remap[$headers['x-final-location']] = $upload['url'];
    
      return $upload;
    
     }

    I have taken the last three uses of the $url variable in this function and altered them like so:

    str_replace(' ','%20',$url)

    I have tested this on a default Ubuntu 8.10-2 LAMP setup with no issues. All files were requested from the remote server and converted by WP into a valid file format (with dashes). The urls were successfully remapped from their file names with spaces to the correct locally hosted files with dashes. If anyone else wishes to test this please do, I would like to know if this hack is portable to other servers or if the result is unique to my install.
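
    The three edits share one transformation, so it could be written once. The sketch below is only an illustration: encode_spaces() is a hypothetical helper, $url_remap is a plain array standing in for the importer's $this->url_remap, and the URLs are made up.

    ```php
    <?php
    // Hypothetical helper so the encoding is written once instead of three times.
    function encode_spaces(string $url): string
    {
        return str_replace(' ', '%20', $url);
    }

    // Stand-in for the importer's $this->url_remap array; URLs are made up.
    $url_remap = [];
    $old = 'http://old.example/wp-content/uploads/Sega Consoles.gif';
    $new = 'http://new.example/files/2006/06/sega-consoles.gif';
    $url_remap[encode_spaces($old)] = $new;

    // Post content that links to the encoded form is rewritten; content that
    // still contains a literal space would not match this key.
    $content = '<img src="http://old.example/wp-content/uploads/Sega%20Consoles.gif">';
    $content = str_replace(array_keys($url_remap), array_values($url_remap), $content);

    echo $content, "\n";
    ```

    Because the remap key is the %20 form, links in post content need to use that form too for the substitution to find them.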

  10. mellow_bunny
    Member
    Posted 14 years ago #

    I don't know enough about the importer to know whether all links to the file will be remapped but I can confirm that the links within the post the file is attached to do get remapped successfully.

  11. mellow_bunny
    Member
    Posted 14 years ago #

    Below is a PHP script that can be used to prepare WP XML files for use with the alterations I made to the WP import script I posted earlier. This is useful to those people like maestro4k who need to be able to import files that have spaces in the file names.

    This script should be run after exporting data from your current WP install. There are three variables in the script that should be altered to suit the export file you wish to process.

    The first variable is called $open and the value of this variable should be the filepath/name of the WP export file you wish to process. The last two variables you should alter are called $current_domain and $new_domain. You should set $current_domain to the domain name of the site you just exported your WP data from. You should then set $new_domain to the domain name you wish to import your WP data to.

    It is important not to include any characters in the domain name variables other than the domain itself. Do not include http:// at the start or a trailing slash.

    Once the variables have been set the easiest way to run the script is to access it through your web browser. The script will generate (in the same folder as the source xml) a fresh WP export file for you. This file has all the appropriate links altered for you. This is especially useful if you have linked to a file multiple times throughout your blog. You can be assured that once the import process is complete all your links will resolve successfully.

    After making sure you have replaced your wp-admin/import/wordpress.php file as posted above, log in to your new WP setup. Go to the Import option, select WordPress and browse to the "clean" WP export file the cleaner script created for you. Make sure you tick the box that asks if you want to import attachments and then press start or go or whatever the button is.

    Your blog will now go through and import remote attachments regardless of whether they contain spaces in their filenames or not. This process has been successfully tested on 2.7 installs.

    Ensure that the folder you have your original WP export file in has global write access as the script will need to generate the new file for you.

    <?php

    // WPXMLCleaner 1.0.1

    // Alter $open to match the filename you wish to run this script on.
    $open = "wordpressblog.2009-04-24.xml";

    // Set the two variables below so that links are read correctly when the
    // import is made into WP. Domain only: no http:// and no trailing slash.
    $current_domain = 'blogging.net';
    $new_domain = 'bloggers.net';

    // Read the whole export file into memory.
    $input = file_get_contents($open);
    if ($input === false) {
      die("Could not read $open");
    }

    // Collect every attachment URL that contains at least one space.
    preg_match_all('/<wp:attachment_url>(http:\/\/.+? .+?\.(jpg|gif|jpeg|png|bmp))<\/wp:attachment_url>/i', $input, $matches);

    print_r($matches[1]);

    $cleaned = $input;

    foreach( $matches[1] as $needle )
    {
      // the %20-encoded form of the URL, which is what gets replaced in the data
      $bad_url = str_replace(' ','%20',$needle);

      // build the sanitized file name (lower case, dashes instead of spaces)
      $base = basename($needle);
      $proper_base = strtolower(str_replace(' ','-',$base));
      $proper_url = str_replace($base, $proper_base, $needle);

      $final_url = str_replace($current_domain, $new_domain, $proper_url);

      print_r($final_url . "\n");

      // do replace on data
      $cleaned = str_replace( $bad_url, $final_url, $cleaned );
    }

    // Write the cleaned export next to the source file.
    if (file_put_contents($open . ".clean", $cleaned) === false) {
      die("Could not write $open.clean");
    }

    ?>

    Thanks go to Ryan Altman@melative.com for helping to create this script.
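
    To try the cleaner's core logic without touching any files, the loop can be wrapped in a function and run on a small string. This is a sketch under assumptions: clean_export() is a hypothetical name, and the sample XML fragment is invented for the test, reusing the example domains from the script.

    ```php
    <?php
    // Hypothetical function form of the cleaner's core loop, for testing on a string.
    function clean_export(string $xml, string $currentDomain, string $newDomain): string
    {
        preg_match_all(
            '/<wp:attachment_url>(http:\/\/.+? .+?\.(jpg|gif|jpeg|png|bmp))<\/wp:attachment_url>/i',
            $xml,
            $matches
        );
        foreach ($matches[1] as $needle) {
            $base = basename($needle);
            $properUrl = str_replace($base, strtolower(str_replace(' ', '-', $base)), $needle);
            $finalUrl = str_replace($currentDomain, $newDomain, $properUrl);
            // Only the %20-encoded form is rewritten, as in the posted script.
            $xml = str_replace(str_replace(' ', '%20', $needle), $finalUrl, $xml);
        }
        return $xml;
    }

    // Invented sample: one attachment entry and one post body linking to it.
    $sample = '<wp:attachment_url>http://blogging.net/wp-content/uploads/MS Consoles.gif</wp:attachment_url>'
            . '<content:encoded><img src="http://blogging.net/wp-content/uploads/MS%20Consoles.gif"></content:encoded>';

    $result = clean_export($sample, 'blogging.net', 'bloggers.net');
    echo $result, "\n";
    ```

    In this sample the encoded link inside the post body is rewritten to the new domain with a lower-case, hyphenated file name, while the attachment URL itself keeps its space so the modified importer can still fetch the original file.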

About this Topic

  • Started 14 years ago by maestro4k
  • Latest reply from mellow_bunny