I am trying to access and then print (or just be able to use) the source code of any website using PHP. I am not very experienced and am now thinking I might need to use JS to accomplish this. So far, the code below accesses the source code of a web page and displays the web page... What I want it to do instead is display the source code. Essentially, and most importantly, I want to be able to store the source code in some sort of variable so I can use it later. And eventually read it line-by-line - but this can be tackled later.
$url = 'http://www.google.com';
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
echo get_data($url); //print and echo do the same thing in this scenario.
Consider using file_get_contents() instead of curl. You can then display the code on your page by replacing every opening bracket (<) with &lt; and then outputting it to the page.
<?php
$code = file_get_contents('http://www.google.com');
$code = str_replace('<', '&lt;', $code);
echo $code;
?>
Edit:
Looks like curl is actually faster than FGC, so ignore that suggestion. The rest of my post still stands. :)
Try printing the result between <pre></pre> tags:
echo '<pre>' . get_data($url) . '</pre>';
I rewrote your function. It can return the source with or without line numbers.
<?php
function get_data($url, $Addlines = false){
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
$content = curl_exec($ch);
$content = htmlspecialchars($content); // Escape the markup so the browser displays it instead of rendering it
curl_close($ch);
if ($Addlines == true){
$content = explode("\n", $content);
$Count = 0;
$lines = ''; // Start with an empty buffer so the append below has something to append to
foreach ($content as $Line){
$lines .= 'Line '.$Count.': '.$Line.'<br />';
$Count++;
}
return $lines;
} else {
$content = nl2br($content);
return $content;
}
}
echo get_data('https://www.google.com/', true); // Source code with lines
echo get_data('https://www.google.com/'); // Source code without lines
?>
Hope it gets you on your way.
Add a Content-Type: text/plain header before any output is sent:
header("Content-Type: text/plain");
Use htmlspecialchars() in php to print the source code.
In your code, use
return htmlspecialchars($data);
instead of
return $data;
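To tie the answers above back to the original goal (keep the source in a variable, then read it line by line), here is a minimal sketch. The helper name source_to_lines() and the sample string are made up for illustration; in practice $source would hold whatever get_data($url) returns.

```php
<?php
// Hypothetical helper: split fetched source into escaped lines keyed by line number.
function source_to_lines(string $source): array
{
    // Normalise Windows and old Mac line endings so every line splits the same way.
    $normalised = str_replace(["\r\n", "\r"], "\n", $source);
    $lines = [];
    foreach (explode("\n", $normalised) as $i => $line) {
        // Escape so the browser shows the markup instead of rendering it.
        $lines[$i + 1] = htmlspecialchars($line, ENT_QUOTES, 'UTF-8');
    }
    return $lines;
}

// Stand-in for the string returned by get_data($url).
$source = "<html>\n<body>Hi</body>\n</html>";
foreach (source_to_lines($source) as $number => $line) {
    echo $number . ': ' . $line . PHP_EOL;
}
```

Keeping the raw string in $source and escaping only at output time means the unescaped source is still available for any later processing.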
Related
I'd like to create a chart with d3.js. Could someone tell me how to work with JSON data of the form [timestamp, price]? I get the data from the CoinGecko API and it looks like this:
{"prices":[[1649667011317,38721.07051511258],[1649667168163,38726.36780848938],[1649667622285,38750.30201896313],[1649667926510,38715.36968177588],[1649668246571,38705.597785934006],[1649668432287,38690.34512542588],[1649668897715,38620.57305041674],[1649669050953,38613.10740572825],[1649669284813,38568.32503183882],[1649669697192,38518.76279413846],[1649669982557,38491.21941297744],[1649670258121,38460.7219359208],[1649670606639,38417.38270710583],[1649670978757,38349.85248699985],[1649671244134,38336.437837571124],
That's my PHP code:
function history($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL, $url);
$resp = curl_exec($ch);
$e = curl_error($ch);
curl_close($ch); // Close the handle before returning; the original never reached this call
if ($e) {
echo $e;
return null;
}
return $resp;
}
and HTML:
<?php echo history("https://api.coingecko.com/api/v3/coins/bitcoin/market_chart?vs_currency=eur&days=1");
?>
Change the HTML line to the following, so the JSON is assigned to a JavaScript variable:
var data = <?php echo history("https://api.coingecko.com/api/v3/coins/bitcoin/market_chart?vs_currency=eur&days=1");?>;
If you would rather go through JSON.parse, note that it expects a string, so wrap the PHP output in json_encode() to emit it as a quoted JavaScript string literal:
var data = JSON.parse(<?php echo json_encode(history("https://api.coingecko.com/api/v3/coins/bitcoin/market_chart?vs_currency=eur&days=1"));?>);
See also: https://www.geeksforgeeks.org/how-to-pass-variables-and-data-from-php-to-javascript/
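For the d3.js side, it can help to reshape the [timestamp, price] pairs on the server before handing them to JavaScript. The following is only a sketch under assumptions: js_assignment() is a hypothetical helper, the 'time' and 'price' key names are my own choice, and the sample string stands in for the value returned by history().

```php
<?php
// Hypothetical helper: turn the raw API response into a JavaScript assignment.
function js_assignment(string $varName, string $json): string
{
    // Decode first so a malformed or empty response never reaches the page.
    $decoded = json_decode($json, true);
    if (!is_array($decoded) || !isset($decoded['prices'])) {
        return "var {$varName} = [];";
    }
    // Reshape [timestamp, price] pairs into objects that are easier to bind in d3.
    $points = array_map(
        fn(array $pair) => ['time' => $pair[0], 'price' => $pair[1]],
        $decoded['prices']
    );
    // JSON_HEX_TAG keeps a stray "</script>" in the data from closing the script block.
    return "var {$varName} = " . json_encode($points, JSON_HEX_TAG) . ';';
}

// Stand-in for the string returned by history($url).
$sample = '{"prices":[[1649667011317,38721.07],[1649667168163,38726.36]]}';
echo js_assignment('data', $sample) . PHP_EOL;
```

In the page this would be echoed inside a script element, after which data is an ordinary JavaScript array of objects.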
I am using charts.js and trying to load data from a JSON API with cURL to pass to the chart, so I am using a PHP variable to pass the data to JavaScript. I did a test with ajax and it worked, but I want to use cURL and cannot figure out the issue.
I added an if statement that prints "nothing" when the variable is empty, and that is what it has been doing, so I believe the issue is with cURL.
<?php
$url = "https://api.coindesk.com/v1/bpi/historical/close.json?currency=btc";
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_HTTPHEADER, array('Accept: application/json'));
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($curl);
if(!empty($data)) {
$data = $btc;
} else {
print ("nothing");
}
curl_close($curl);
?>
<body>
<canvas id="myChart" width="250px" height="250px"></canvas>
<script>
jsonData=<?php echo $btc ?>;
var jsonLabels=[];
var jsonValues=[];
for(x in jsonData['bpi']){
jsonLabels.push(x);
jsonValues.push(jsonData['bpi'][x]);
}
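Two things may be worth checking in the snippet above. The line $data = $btc; assigns in the wrong direction, so $btc is never set and the response is overwritten. And if "nothing" is printed, curl_exec() returned an empty value or false, in which case curl_error($curl) should say why. Below is a minimal sketch of a guard; json_for_script() is a hypothetical helper and the sample values are invented for illustration.

```php
<?php
// Hypothetical helper: return JSON text that can be echoed into a <script> block.
function json_for_script($response): string
{
    // curl_exec() returns false on failure; fall back to an empty object.
    if ($response === false || $response === '') {
        return '{}';
    }
    // Reject anything that is not valid JSON.
    json_decode($response);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return '{}';
    }
    return $response;
}

// Stand-in for the value of curl_exec($curl); the numbers are invented sample data.
$data = '{"bpi":{"2022-04-10":42158.2,"2022-04-11":39530.4}}';
$btc = json_for_script($data); // assign the response to $btc, not the other way round
echo $btc . PHP_EOL;
```

With the empty-object fallback, the JavaScript loop over jsonData['bpi'] simply does nothing instead of hitting a syntax error when the request fails.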
I need to read the screen size while a PHP script is executing. Since that is a client-side matter, I created a small page that runs JavaScript to obtain the screen size, and I call this page from my PHP program with curl. It works great... almost. The value that is returned is correct, but it is not in a form that PHP can use. I also tried setting a cookie, but the cookie value returned is always the value from the previous call.
If you want to see it run, go here: 3wings.com/testScreenSize.php
Thanks for your help.
testScreenSize.php Code:
<?php
define('HTTPS_SERVER','http://3wings.com');
$br='<br>';
$url = HTTPS_SERVER. '/getScreenSize.html';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 600);
$ret = curl_exec($ch);
echo curl_error($ch);
curl_close($ch);
var_dump($ret);echo $br;
echo ' $ret '.$ret.$br;
$scrwidth = $ret*1;
echo ' scrwidth '.$scrwidth.$br;
$cookieval = $_COOKIE['scrwidth'];
echo ' $cookieval '.$cookieval.$br;
getScreenSize.html
<!DOCTYPE HTML>
<html>
<body>
<script>
var width = window.innerWidth;
document.cookie = "scrwidth=" + width + "; path=/";
document.writeln(width);
</script>
</body>
</html>
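A note on why $ret is not usable as a number: cURL only downloads getScreenSize.html as text and never executes the script in it, so $ret holds the HTML itself. The width appears correct on screen because that HTML is echoed into the page and the visitor's browser then runs the script. One way around this is to have the browser send the width back in a request. The sketch below assumes a query parameter named w, which is my own choice, and read_width() is a hypothetical helper.

```php
<?php
// Hypothetical helper: read a width the browser sent back as ?w=1280.
function read_width(array $query): ?int
{
    $width = filter_var($query['w'] ?? null, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 20000],
    ]);
    return $width === false ? null : $width;
}

$width = read_width($_GET);
if ($width === null) {
    // First request: ask the browser to report its width and reload the page.
    echo '<script>location.search = "?w=" + window.innerWidth;</script>';
} else {
    echo 'Width as a PHP integer: ' . $width;
}
```

The same ordering explains the stale cookie: the script sets the cookie after PHP has already read $_COOKIE for that request, so PHP only sees it on the next call.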
I'm trying to get all the news link URLs from a particular div on this website.
When I view the page source, the links are not there.
But the data is displayed on the page.
Could anyone who understands PHP, arrays and JS help me, please?
This is my code to get the content:
$html = file_get_contents("https://qc.yahoo.com/");
if ($html === FALSE) {
die("?");
}
echo $html;
$html = new DOMDocument();
@$html->loadHTMLFile('https://qc.yahoo.com/');
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@id='news_moreTopStories']//a/@href" );
foreach ($nodelist as $n){
echo $n->nodeValue."\n";
}
You can get all the links from the divs you specify; just make sure you put the div id in the [@id='news_moreTopStories'] predicate. You're already using XPath to query the divs, so you don't need a ton of code, just this portion.
http://php.net/manual/en/class.domxpath.php
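Here is a small self-contained sketch of that XPath query running against an HTML string rather than a live URL, so it can be tried offline. links_in_div() is a hypothetical helper and the sample markup is made up. If the real page fills the div with JavaScript after loading, the links will not be in the downloaded source and no query will find them.

```php
<?php
// Hypothetical helper: collect href values of links inside the div with the given id.
function links_in_div(string $html, string $divId): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    // Assumes $divId contains no quote characters.
    foreach ($xpath->query("//div[@id='{$divId}']//a/@href") as $attr) {
        $links[] = $attr->nodeValue;
    }
    return $links;
}

// Made-up markup standing in for the downloaded page.
$html = '<div id="news_moreTopStories"><a href="/a">A</a><p><a href="/b">B</a></p></div>'
      . '<div id="other"><a href="/c">C</a></div>';
print_r(links_in_div($html, 'news_moreTopStories'));
```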
Assuming you want to extract all anchor tags with their hyperlinks from the given page.
There are two problems with calling file_get_contents() on that URL:
Content encoding (compression), i.e. gzip
SSL verification of the URL
To overcome the first problem, we'll use cURL as @gregn3 suggested in his answer. What his answer does not use is cURL's ability to decompress gzipped content automatically.
For the second problem, you can either follow this guide or disable SSL verification through curl_setopt() (disabling it removes protection against man-in-the-middle attacks, so keep that for testing only).
Now the code which will extract all the links from the given page is :
<?php
$url = "https://qc.yahoo.com/";
# download resource
$c = curl_init ($url);
curl_setopt($c, CURLOPT_HTTPHEADER, ["Accept-Encoding:gzip"]);
curl_setopt ($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($c, CURLOPT_ENCODING , "gzip");
curl_setopt($c, CURLOPT_VERBOSE, 1);
curl_setopt($c, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($c, CURLOPT_SSL_VERIFYHOST, 0);
$content = curl_exec ($c);
curl_close ($c);
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);
# output results
echo "url = " . htmlspecialchars ($url) . "<br>";
echo "links found (" . count ($matches[1]) . "):" . "<br>";
$n = 0;
foreach ($matches[1] as $link)
{
$n++;
echo "$n: " . htmlspecialchars ($link) . "<br>";
}
But if you want to do more advanced HTML parsing, you can use PHP Simple HTML DOM Parser. With it you can select the div using jQuery-style selectors and fetch the anchor tags. Here are its documentation & API manual.
To find all links in HTML you could use preg_match_all().
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);
That URL, https://qc.yahoo.com/, uses gzip compression, so you have to detect that and decompress the content with gzdecode(). (The zlib extension must be available in your PHP installation.)
The gzip compression is indicated by the Content-Encoding: gzip HTTP header. You have to check that header, so you must use curl or a similar method to retrieve the headers.
(file_get_contents() will not give you the HTTP headers... it only downloads the gzip compressed content. You need to detect that it is compressed but for that you need to read the headers.)
Here is a complete example:
<?php
$url = "https://qc.yahoo.com/";
# download resource
$c = curl_init ($url);
curl_setopt ($c, CURLOPT_HEADER, true);
curl_setopt ($c, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec ($c);
$hsize = curl_getinfo ($c, CURLINFO_HEADER_SIZE);
curl_close ($c);
# separate headers from content
$headers = substr ($content, 0, $hsize);
$content = substr ($content, $hsize);
# check if content is compressed with gzip
$gzip = 0;
$headers = preg_split ('/\r?\n/', $headers);
foreach ($headers as $h)
{
$pieces = preg_split ("/:/", $h, 2);
$pieces2 = (count ($pieces) > 1);
$enc = $pieces2 && (preg_match ("/content-encoding/i", $pieces[0]) );
$gz = $pieces2 && (preg_match ("/gzip/i", $pieces[1]) );
if ($enc && $gz)
{
$gzip = 1;
break;
}
}
# unzip content if gzipped
if ($gzip)
{
$content = gzdecode ($content);
}
# find links
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);
# output results
echo "url = " . htmlspecialchars ($url) . "<br>";
echo "links found (" . count ($matches[1]) . "):" . "<br>";
$n = 0;
foreach ($matches[1] as $link)
{
$n++;
echo "$n: " . htmlspecialchars ($link) . "<br>";
}
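As a complement to the header check above, a different technique is to look at the body itself: a gzip stream always starts with the two bytes 0x1f 0x8b. This is a swapped-in approach, not what the example above does, and maybe_gunzip() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: decompress a body only if it starts with the gzip magic bytes.
function maybe_gunzip(string $body): string
{
    // Every gzip stream starts with the two bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body;
    }
    $decoded = gzdecode($body);
    // gzdecode() returns false on corrupt data; keep the original in that case.
    return $decoded === false ? $body : $decoded;
}

// Made-up content standing in for a downloaded response body.
$plain = '<a href="https://example.com/">example</a>';
echo maybe_gunzip(gzencode($plain)) . PHP_EOL; // compressed input
echo maybe_gunzip($plain) . PHP_EOL;           // uncompressed input passes through
```

This also works with file_get_contents(), since it needs only the body and not the response headers.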
I'm currently using whateverorigin.org in some JavaScript to retrieve a URL as a JSON object, because a third-party site hasn't made one of their functions available via their JSON API.
I'd like to remove this dependency from my website, as whateverorigin.org breaks the browser's HTTPS/SSL secure-content checks because it is a plain http call.
Has anyone done this? I haven't found an example of it anywhere.
Thanks in advance for a response!
Ok, so since I first typed up this question, I've now already found some examples and cobbled together a working proxy function in php... Feel free to use it for your own purposes!
<?php
// Sourced from: http://stackoverflow.com/questions/2511410/curl-follow-location-error
function curl_exec_follow(/*resource*/ &$ch, /*int*/ $redirects = 20, /*bool*/ $curlopt_header = false) {
if ((!ini_get('open_basedir') && !ini_get('safe_mode')) || $redirects < 1) {
curl_setopt($ch, CURLOPT_HEADER, $curlopt_header);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, $redirects > 0);
curl_setopt($ch, CURLOPT_MAXREDIRS, $redirects);
return curl_exec($ch);
} else {
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FORBID_REUSE, false);
do {
$data = curl_exec($ch);
if (curl_errno($ch))
break;
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($code != 301 && $code != 302)
break;
$header_start = strpos($data, "\r\n")+2;
$headers = substr($data, $header_start, strpos($data,"\r\n\r\n", $header_start)+2-$header_start);
if (!preg_match("!\r\n(?:Location|URI): *(.*?) *\r\n!",$headers, $matches))
break;
curl_setopt($ch, CURLOPT_URL, $matches[1]);
} while (--$redirects);
if (!$redirects)
trigger_error('Too many redirects. When following redirects, libcurl hit the maximum amount.', E_USER_WARNING);
if (!$curlopt_header)
$data = substr($data, strpos($data, "\r\n\r\n")+4);
return $data;
}
}
header('Content-Type: application/json');
$retrieveurl = curl_init(urldecode($_GET['url']));
$callbackname = $_GET['callback'];
$htmldata = curl_exec_follow($retrieveurl);
if (curl_error($retrieveurl))
die(curl_error($retrieveurl));
$status = curl_getinfo($retrieveurl, CURLINFO_HTTP_CODE);
curl_close($retrieveurl);
$data = array('contents' => $htmldata, 'status' => $status);
$jsonresult = json_encode($data);
echo $callbackname . '(' . $jsonresult . ')';
?>
Hope this helps someone!
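One thing that may be worth adding before using the proxy above on a public site: both $_GET['url'] and $_GET['callback'] are used as received. Below is a minimal sketch of two checks; safe_callback() and safe_url() are hypothetical helpers, and the 64-character limit on the callback name is an arbitrary choice of mine.

```php
<?php
// Hypothetical helper: accept only simple JavaScript identifiers as JSONP callback names.
function safe_callback(?string $name): ?string
{
    if ($name !== null && preg_match('/\A[A-Za-z_$][A-Za-z0-9_$.]{0,63}\z/', $name) === 1) {
        return $name;
    }
    return null;
}

// Hypothetical helper: allow only absolute http(s) URLs to be fetched by the proxy.
function safe_url(?string $url): ?string
{
    if ($url === null) {
        return null;
    }
    // parse_url() returns null or false when there is no usable scheme.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true) ? $url : null;
}

var_dump(safe_callback('jQuery123_cb'));
var_dump(safe_callback('alert(1);//'));
var_dump(safe_url('file:///etc/passwd'));
```

In the proxy script these would wrap the two $_GET reads, with the script ending early (for example with a 400 response) when either helper returns null.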