How to copy someone else's folders from public.me.com with a wget-like tool?

How can I copy a folder from http://public.me.com/ (a service related to iDisk, or MobileMe) to my local filesystem with a Unix tool (like wget, a command-line non-interactive tool)?

The problem is that the web interface is actually a complex JavaScript-based application rather than a simple exposure of the files. (Even w3m can’t browse it, e.g., https://public.me.com/rudchenko.)

My goal is to update the local copy from time to time non-interactively, and to put the download command into a script so that other people can run it and download the files.

A wget-like (rsync-like, git pull-like) tool will suit me, or a combination of mounting a network filesystem via FUSE and then using standard Unix commands to copy the directories will do.

I’ve read in the Wikipedia articles (which I refer to above) that Apple provides WebDAV access to these services, and I’ve also read about cadaver, a wget-like WebDAV client, but I can’t figure out which address I should use to access the folders at http://public.me.com/ read-only (anonymously).

Perhaps Gilles’ comment (that WebDAV isn’t currently used) is true, but there still seems to be some WebDAV machinery behind the scenes: the URL passed to the browser for downloading an archive of a directory (after pressing the “download selected files” button at the top of the web interface) looks like this:

https://public.me.com/ix/rudchenko/SEM%20Sep21%201%20TO%20PRINT.zip?webdav-method=ZIPGET&token=1g3s18hn-363p-13fryl0a20-17ial2zeu00&disposition=download

— note that it mentions “WebDAV”. (If you are curious, I tried to re-use this URL as an argument for wget, but it failed:

$ LC_ALL=C wget 'https://public.me.com/ix/rudchenko/SEM%20Sep21%201%20TO%20PRINT.zip?webdav-method=ZIPGET&token=1g3s18hn-363p-13fryl0a20-17ial2zeu00&disposition=download'
--2011-11-21 01:21:48--  https://public.me.com/ix/rudchenko/SEM%20Sep21%201%20TO%20PRINT.zip?webdav-method=ZIPGET&token=1g3s18hn-363p-13fryl0a20-17ial2zeu00&disposition=download
Resolving public.me.com... 23.32.106.105
Connecting to public.me.com|23.32.106.105|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2011-11-21 01:21:48 ERROR 404: Not Found.
$ 

)

(I’m using a GNU/Linux system.)


Solution 1

That server is clearly running a partial or broken implementation of WebDAV. Note that you need to connect to a URL like https://public.me.com/ix/rudchenko, not the normal URL https://public.me.com/rudchenko. I tried several clients:

  • With a normal HTTP downloader such as wget or curl, I could download a file knowing its name (e.g. wget https://public.me.com/ix/rudchenko/directory/filename), but was not able to obtain a directory listing.
  • FuseDAV, which would have been my first choice, is unable to cope with some missing commands. It apparently manages to list the root directory (visible in the output from fusedav -D) but eventually runs some request that returns “PROPFIND failed: 404 Not Found” and locks up.
  • Nd lacks a list command.
  • Cadaver works well, but lacks a recursive retrieval command. You could use it to obtain listings, then retrieve individual files as above.

    It’s not perfect, and there is a problem specifically in this case: cadaver’s mget fails to handle arguments with wildcards that expand to filenames containing spaces.

  • Davfs2 works very well. I could mount that share and copy files from it. The only downside is that this is not a FUSE filesystem: you need root to mount it, or an entry in /etc/fstab.
  • The FUSE-based wdfs-1.4.2-alt0.M51.1 worked very well in this case, requiring no root (only permissions for /dev/fuse).

    mkdir viewRemote
    wdfs https://public.me.com/ix/rudchenko/ viewRemote
    rsync -a viewRemote/SEM*TO\ PRINT* ./
    fusermount -u viewRemote
    rmdir viewRemote
    

(Of course, a simple cp instead of rsync would work in this example; rsync was chosen merely for the extra diagnostics it provides when updating an existing copy.)

(Apart from wdfs, I tried these commands on a Debian squeeze system. Your mileage may vary.)
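One way around cadaver’s mget problem is to take the file names from a listing and fetch them one at a time with wget, as described above; spaces in names only need to be percent-encoded in the URL. A minimal sketch (the helper name is made up; it only handles spaces, not other special characters):

```shell
# Hypothetical helper: percent-encode spaces so a name like the one in the
# ZIPGET URL above can be turned into a single-file download URL.
encode_spaces() {
  printf '%s\n' "$1" | sed 's/ /%20/g'
}

name='SEM Sep21 1 TO PRINT.zip'
echo "https://public.me.com/ix/rudchenko/$(encode_spaces "$name")"
# Prints: https://public.me.com/ix/rudchenko/SEM%20Sep21%201%20TO%20PRINT.zip
```

The resulting URL can then be handed to wget or curl as in the first bullet point.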

Solution 2

There are also some special scripts and a tool (wget-warc) to download the content of https://public.me.com/ user’s folders — https://github.com/ArchiveTeam/mobileme-grab/blob/master/dld-me-com.sh (and see the containing repo). (Found via http://archiveteam.org/index.php?title=MobileMe#How_to_help_archiving.)

Internally, the script seems to compose WebDAV requests and then use the responses, e.g.:

# step 1: download the list of files

if [[ "$domain" =~ "public.me.com" ]]
then

  # public.me.com has real WebDAV

  # PROPFIND with Depth: infinity lists all files
  echo -n "   - Discovering urls (XML)..."
  curl "https://public.me.com/ix/${username}/" \
       --silent \
       --request PROPFIND \
       --header "Content-Type: text/xml; charset=\"utf-8\"" \
       --header "Depth: infinity" \
       --data '<?xml version="1.0" encoding="utf-8"?><DAV:propfind xmlns:DAV="DAV:"><DAV:allprop/></DAV:propfind>' \
       --user-agent "${USER_AGENT}" \
     > "$userdir/webdav-feed.xml"
  result=$?
  if [ $result -ne 0 ]
  then
    echo " ERROR ($result)."
    exit 1
  fi
  echo " done."

  # grep for href, strip <D:href> and prepend https://public.me.com
  grep -o -E "<D:href>[^<]+" "$userdir/webdav-feed.xml" | cut -c 9- | awk '/[^\/]$/ { print "https://public.me.com" $1 }' > "$userdir/urls.txt"
  count=$( cat "$userdir/urls.txt" | wc -l )

elif 

Yes, they also use “https://public.me.com/ix/${username}/”; note the “/ix/” infix in the URL, not the normal URL: the same thing Gilles discovered in his answer.
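To see what that grep/cut/awk pipeline does, here it is run on a made-up fragment of a PROPFIND response (a real multistatus response carries many more properties per <D:response> element):

```shell
# Hypothetical PROPFIND response fragment, for illustration only.
xml='<D:multistatus xmlns:D="DAV:">
<D:response><D:href>/ix/rudchenko/</D:href></D:response>
<D:response><D:href>/ix/rudchenko/notes.txt</D:href></D:response>
</D:multistatus>'

# Same pipeline as in the script: keep the hrefs, drop the 8-character
# "<D:href>" prefix, skip entries ending in "/" (directories), and
# prepend the host to get full download URLs.
echo "$xml" \
  | grep -o -E "<D:href>[^<]+" \
  | cut -c 9- \
  | awk '/[^\/]$/ { print "https://public.me.com" $1 }'
# Prints: https://public.me.com/ix/rudchenko/notes.txt
```

Each remaining URL can then be fetched individually, which is what the rest of the ArchiveTeam script does with wget-warc.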


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
