cURL returns 404 while the page is found in browser

There are already similar questions on Stack Overflow, but none of their solutions have worked for me. I'm trying to fetch a page on LoveIt.com with cURL, but it returns a 404 error, while the URL works fine in the browser:

        $url = 'http://loveit.com/loves/P0D1jlFaIOzzZfZqj_bY3KV';

        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
        curl_setopt($curl, CURLOPT_HEADER, false);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($curl, CURLOPT_REFERER,'http://loveit.com/');

Here's the transfer info I get back (the output of curl_getinfo()):

        Array
        (
            [url] => http://loveit.com/loves/P0D1jlFaIOzzZfZqj_bY3KV
            [content_type] => text/html; charset=utf-8
            [http_code] => 404
            [header_size] => 667
            [request_size] => 172
            [filetime] => -1
            [ssl_verify_result] => 0
            [redirect_count] => 0
            [total_time] => 0.320466
            [namelookup_time] => 0.000326
            [connect_time] => 0.119046
            [pretransfer_time] => 0.119089
            [size_upload] => 0
            [size_download] => 499
            [speed_download] => 1557
            [speed_upload] => 0
            [download_content_length] => 499
            [upload_content_length] => 0
            [starttransfer_time] => 0.320438
            [redirect_time] => 0
            [certinfo] => Array ( )
            [primary_ip] => —
            [primary_port] => 80
            [local_ip] => —
            [local_port] => 53837
            [redirect_url] =>
        )

I read that some websites have protections against this kind of script, and I tried some of the proposed solutions (CURLOPT_USERAGENT, CURLOPT_REFERER, …), but none of them worked for me.

Any idea what's happening here?

I would like to back up my LoveIt account, which is why I'm doing this (there is no export function, and LoveIt.com has not replied about the health of the website).

Solutions:


Solution 1

I quickly checked that page with LiveHeaders enabled and noticed a bunch of cookies being set. I suspect that, since it's not a "normal" URL, you need to hand those cookies back while being redirected; otherwise you end up being kicked out with a 404. Set CURLOPT_COOKIEJAR on your cURL handle from the start. See: http://php.net/manual/pl/function.curl-setopt.php
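A minimal sketch of that suggestion, assuming the site only needs its own cookies sent back during the redirect chain. The function names and the cookie-file path are illustrative, not part of the original answer. Pointing CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE at the same file lets later requests reuse whatever the first one stored:

```php
<?php
// Sketch: keep cookies across redirects and across requests by using one
// file for both CURLOPT_COOKIEJAR (write) and CURLOPT_COOKIEFILE (read).
// Function names are hypothetical helpers for this example.

/** Build the cURL options for a cookie-aware GET request. */
function cookie_aware_options(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects, carrying cookies along
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0',
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are saved here when the handle is released
        CURLOPT_COOKIEFILE     => $cookieFile, // previously saved cookies are loaded from here
    ];
}

/** Fetch $url and return [HTTP status code, body]. */
function fetch_with_cookies(string $url, string $cookieFile): array
{
    $ch = curl_init();
    curl_setopt_array($ch, cookie_aware_options($url, $cookieFile));
    $body = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    unset($ch); // releasing the handle lets cURL write the cookie jar

    return [$code, $body];
}
```

It could be called as `[$code, $html] = fetch_with_cookies($url, sys_get_temp_dir() . '/cookies.txt');`. If the status is still 404, the site probably expects cookies that only a real browser session obtains.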

Solution 2

I just had a similar issue with a site. In my case they were expecting a User-Agent header to be set (CURLOPT_USERAGENT), so anyone who runs into this in the future should check that as well.

Solution 3

You don't need to export the cookie file from Chrome.

You can write a function that fetches the cookies itself and then reuses them.

Like:

<?php

error_reporting(E_ALL);

class Crawler {

   public $cookie;
   public $http_response;
   public $user_agent;

   function __construct($cookie){
       $this->cookie     = (string) $cookie;
       $this->user_agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0'; 
   }

   function get($url){
       $ch = curl_init();
       curl_setopt($ch, CURLOPT_URL, $url);
       curl_setopt($ch, CURLOPT_NOBODY, 1);
       curl_setopt($ch, CURLOPT_USERAGENT, $this->user_agent);
       // Here we create the file with cookies
       curl_setopt($ch, CURLOPT_COOKIEJAR, $this->cookie);
       $this->http_response = curl_exec($ch);
   }

   function get_with_cookies($url){
       $ch = curl_init();
       curl_setopt($ch, CURLOPT_URL, $url);
       // Fetch the full page this time and return it as a string
       curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
       curl_setopt($ch, CURLOPT_USERAGENT, $this->user_agent);
       curl_setopt($ch, CURLOPT_COOKIEJAR, $this->cookie);

       // Here we re-use the saved cookie file; COOKIEJAR above keeps it up to date
       curl_setopt($ch, CURLOPT_COOKIEFILE, $this->cookie);
       $this->http_response = curl_exec($ch);
   }
}

$crawler = new Crawler('cookie_file_name');
// Creating cookie file
$crawler->get('uri');
// Request with the cookies
$crawler->get_with_cookies('uri');

Regards.

Solution 4

Thanks for your answer. I visited the page in Chrome, saved the cookies to a cookie.txt file (with the Chrome extension "cookie.txt export"), and passed that file to CURLOPT_COOKIEFILE, NOT to CURLOPT_COOKIEJAR.

$cookiefile = './cookie.txt';

curl_setopt($curl, CURLOPT_COOKIEFILE, $cookiefile);

And now it works! Thanks for your feedback, it was really useful.
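For context, here is a sketch of how that line might fit into a complete request. The helper name `fetch_with_exported_cookies()` is hypothetical, and the readability check is an addition for this example: as far as I know, a missing cookie file does not make the request itself fail, so a wrong path is easy to overlook.

```php
<?php
// Sketch: fetch a page while sending cookies exported from the browser.
// fetch_with_exported_cookies() is a hypothetical helper name.

function fetch_with_exported_cookies(string $url, string $cookieFile): array
{
    // Check the path up front so a typo does not go unnoticed.
    if (!is_readable($cookieFile)) {
        throw new InvalidArgumentException("Cookie file not readable: $cookieFile");
    }

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    // COOKIEFILE only reads: the exported cookies are sent, the file is not rewritten.
    curl_setopt($curl, CURLOPT_COOKIEFILE, $cookieFile);

    $body = curl_exec($curl);
    $code = curl_getinfo($curl, CURLINFO_HTTP_CODE);

    return ['http_code' => $code, 'body' => $body];
}
```

Exported browser cookies expire, so a backup script built this way would need a fresh export from time to time.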


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
