file_get_contents is not working for some URLs

I use file_get_contents in PHP. In the code below, the first URL works fine but the second one doesn’t.


$URL = "http://test6473.blogspot.com";
$domain = file_get_contents($URL);
print_r($domain);


$add_url= "http://adfoc.us/1575051";
$add_domain = file_get_contents($add_url);
echo $add_domain;

Any suggestions on why the second one doesn’t work?

Solutions:


Solution 1

Some URLs cannot be retrieved by file_get_contents because their server checks whether the request comes from a browser or from a script. If it detects a script, it simply withholds the page contents.

So I had to make a request that looks like a browser request. I used the following code to get the second URL’s contents. The details may differ from one web server to another, because each may apply different checks.

Still, try the following code. If you are lucky, it might work for you!

function getUrlContent($url) {
    // Start with an empty cookie file and close the handle right away
    fclose(fopen("cookies.txt", "w"));

    // Browser-like headers. cURL builds the request line and the Host header
    // from $url, so they are not hard-coded here.
    $header = array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.8',
        'Cache-Control: max-age=0',
        'Connection: keep-alive',
        'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36',
    );

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);     // no explicit connect timeout; use a positive number of seconds to fail faster
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);

    // Read cookies from, and write them back to, the same file
    curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
    curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $result = curl_exec($ch);  // false on failure
    curl_close($ch);
    return $result;
}

$url = "http://adfoc.us/1575051";
$html = getUrlContent($url);
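
If cURL is not available, the same browser-like request can be attempted with file_get_contents itself by passing a stream context. This is only a minimal sketch: the function names browserContextOptions and fetchWithContext are made up for illustration, the 15-second timeout is an assumed value, and there is no guarantee that a given site will accept the request.

```php
<?php
// Hypothetical helper: builds stream-context options that make
// file_get_contents() send browser-like request headers.
function browserContextOptions(string $userAgent): array {
    return [
        'http' => [
            'method'          => 'GET',
            'user_agent'      => $userAgent,
            'header'          => "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
                               . "Accept-Language: en-US,en;q=0.8\r\n",
            'timeout'         => 15, // seconds; an assumed value
            'follow_location' => 1,  // follow redirects
        ],
    ];
}

// Hypothetical wrapper: returns the response body, or false on failure.
function fetchWithContext(string $url, string $userAgent) {
    $context = stream_context_create(browserContextOptions($userAgent));
    return @file_get_contents($url, false, $context);
}
```

It could be called as fetchWithContext("http://adfoc.us/1575051", "Mozilla/5.0 ..."), with the User-Agent string from the code above in place of the shortened one.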

Thanks everyone for the guidance.

Solution 2

Unfortunately it looks like the second site blocks access from unrecognized browsers. Even using curl from the command line doesn’t work:

curl -I http://adfoc.us/1575051

gives:

HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Fri, 28 Jun 2013 12:15:40 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.5.0
Set-Cookie: __cfduid=d7cd1bf18c136a288cc2b36065a3b31f01372421740; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.adfoc.us
CF-RAY: 85a4dc6829e06d0

but no content (curl -I requests headers only, so repeat the request without -I to confirm that the body is empty). Note that the status is 200, so if you only check whether the returned string is === false, the request will appear to have succeeded.
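
One way to catch a 200 response that carries no content is to test the body and the status code together rather than only comparing against false. The sketch below assumes the cURL extension is loaded; isUsableResponse and fetchChecked are hypothetical names, and treating a whitespace-only body as a failure is an assumption that may not suit every site.

```php
<?php
// Hypothetical helper: decides whether a fetch really produced content.
function isUsableResponse($body, int $statusCode): bool {
    if ($body === false) {
        return false; // transport-level failure
    }
    if ($statusCode < 200 || $statusCode >= 300) {
        return false; // not a 2xx status
    }
    return trim($body) !== ''; // a 2xx status with an empty body still counts as a failure
}

// Hypothetical wrapper: returns the body, or null when the response is unusable.
function fetchChecked(string $url): ?string {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return isUsableResponse($body, $status) ? $body : null;
}
```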

If you need to spoof the User-Agent (and possibly other headers) to get the site to accept your request, you’ll need to take the plunge with the cURL library and try different combinations until one works. Experimenting with the curl command line first is a good way to cut down the development time spent investigating this.
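
Trying several User-Agent strings can be automated along these lines. This is a rough sketch rather than a tested recipe for this site: fetchAs and firstWorkingUserAgent are hypothetical names, the 15-second timeout is assumed, and "works" is taken to mean "returns a non-empty body". On the command line, curl's -A option sets the User-Agent in the same way.

```php
<?php
// Hypothetical helper: fetch $url with cURL while presenting $userAgent.
function fetchAs(string $url, string $userAgent) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15); // seconds; an assumed value
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

// Hypothetical helper: returns the first user agent whose response has a
// non-empty body, or null if none does. $fetch can be replaced so the loop
// can be exercised without network access.
function firstWorkingUserAgent(string $url, array $userAgents, ?callable $fetch = null): ?string {
    $fetch = $fetch ?? 'fetchAs';
    foreach ($userAgents as $userAgent) {
        $body = $fetch($url, $userAgent);
        if (is_string($body) && trim($body) !== '') {
            return $userAgent;
        }
    }
    return null;
}
```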

Here’s someone who has been through this before:

php curl: how can i emulate a get request exactly like a web browser?

Solution 3

It looks like the second URL sometimes responds too slowly, and it may also redirect.
Try using cURL with a larger timeout.
Also, turn error reporting on:

error_reporting(-1);
ini_set('display_errors','On');
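
A cURL request with a larger timeout and redirect following might look like the sketch below. The names slowUrlOptions and fetchSlowUrl are hypothetical, and the 30-second total timeout, 10-second connect timeout, and limit of 5 redirects are assumed values to adjust as needed.

```php
<?php
// Hypothetical helper: cURL options for a slow URL that may redirect.
function slowUrlOptions(string $url, int $timeoutSeconds = 30): array {
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,            // follow 3xx redirects
        CURLOPT_MAXREDIRS      => 5,               // assumed limit
        CURLOPT_CONNECTTIMEOUT => 10,              // seconds to establish the connection
        CURLOPT_TIMEOUT        => $timeoutSeconds, // seconds for the whole transfer
    ];
}

// Hypothetical wrapper: returns the body, or false after raising a warning.
function fetchSlowUrl(string $url, int $timeoutSeconds = 30) {
    $ch = curl_init();
    curl_setopt_array($ch, slowUrlOptions($url, $timeoutSeconds));
    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes what went wrong, for example a timeout
        trigger_error('cURL error: ' . curl_error($ch), E_USER_WARNING);
    }
    curl_close($ch);
    return $body;
}
```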

Solution 4

You can also try this code:

<?php

function getUrlContent($url) {
    // Browser-like headers. cURL builds the request line and the Host header
    // from $url, so they are not hard-coded here.
    $header = array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language: en-US,en;q=0.8',
        'Cache-Control: max-age=0',
        'Connection: keep-alive',
        'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36',
    );

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
    $result = curl_exec($ch);  // false on failure
    curl_close($ch);
    return $result;
}

$url = "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en";
$html = getUrlContent($url);

if ($html === false) {
    exit("Request failed\n");
}

// Convert the RSS XML to an associative array by way of JSON
$xml = simplexml_load_string($html);
$json = json_encode($xml);
$array = json_decode($json, TRUE);

print_r($array);
?>
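
The XML-to-array conversion at the end of Solution 4 can be tried without any network access. In this sketch xmlToArray is a hypothetical helper name and the feed is a made-up sample, not real Google News output.

```php
<?php
// Hypothetical helper: XML string -> nested PHP array, or null if the
// string is not well-formed XML.
function xmlToArray(string $xmlString): ?array {
    $previous = libxml_use_internal_errors(true); // collect parse errors quietly
    $xml = simplexml_load_string($xmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($xml === false) {
        return null;
    }
    return json_decode(json_encode($xml), true);
}

// Made-up sample feed for illustration only.
$rss = '<rss version="2.0"><channel><title>Sample feed</title>'
     . '<item><title>First</title></item>'
     . '<item><title>Second</title></item></channel></rss>';

$feed = xmlToArray($rss);
echo $feed['channel']['title'], "\n";            // Sample feed
echo $feed['channel']['item'][1]['title'], "\n"; // Second
```

Note that repeated elements such as item become a numerically indexed list only when there are two or more of them; a feed with a single item yields one associative array under that key, so code that loops over the items should handle both shapes.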


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
