Using cURL to get script content [duplicate] - javascript

This question already has answers here:
How to get javascript-generated content from another website using cURL?
(2 answers)
Closed 7 years ago.
I'm using cURL to access a site. The problem is that the content I need to grab is generated by a script, like this:
function Button(){
...
document.getElementById("out").innerHTML = name;
}
<p id="out"></p>
With cURL, I get the page's source code but not the generated content.
I'm using this config:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_REFERER, $referer);
curl_setopt($curl, CURLOPT_COOKIEFILE, $cookiefile);
$redirects=5000;
$data = curl_redirect_exec($curl,$redirects);
curl_close($curl);
How can I get the content generated by the script?

You cannot get content rendered by JavaScript using PHP cURL, because cURL only downloads the page source and does not execute scripts. What you need is a headless browser, such as PhantomJS or CasperJS, which can run client-side JavaScript.
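
To see why cURL returns the page code but not the content, here is a minimal sketch in plain JavaScript (Node.js). It reads the HTML as a string, the way an HTTP client does. The helper name `staticTextOfElement`, the sample markup, and the value assigned to `name` are assumptions made for illustration, and the regular expression is only adequate for this tiny example, not a general HTML parser.

```javascript
// Raw HTML as an HTTP client such as cURL receives it: the script is present
// as text, but nothing has executed it, so <p id="out"> is still empty.
const rawHtml = `
<script>
function Button(){
  var name = "generated content"; // hypothetical value, for illustration
  document.getElementById("out").innerHTML = name;
}
</script>
<p id="out"></p>`;

// Hypothetical helper: returns the text between the tags of the element with
// the given id, reading the markup as plain text (no JavaScript is run).
function staticTextOfElement(html, id) {
  const pattern = new RegExp(`<(\\w+)[^>]*\\bid="${id}"[^>]*>([\\s\\S]*?)</\\1>`);
  const match = html.match(pattern);
  return match ? match[2] : null;
}

console.log(JSON.stringify(staticTextOfElement(rawHtml, "out"))); // prints ""
```

The element only receives its text once a JavaScript engine with a DOM runs `Button()`, which is what a headless browser provides.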

Related

Web scraping for dynamic content

I am trying to scrape information from a couple of sites (mega.nz, openload.co). The content is loaded dynamically, so the code I am currently using doesn't work:
<?php
require 'simple_html_dom.php';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"https://openload.co/f/41I9Ak_QBxw/DPLA.mp4");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
curl_close($ch);
echo $response;
$html = new simple_html_dom();
$html->load($response);
foreach ($html->find('img[id=imagedisplay]') as $key ) {
echo $key;
}
?>
When I use it on openload (as in the example above), it redirects me to "https://oload.download/scraping/", where "/scraping" is the folder that contains my script.
Is there any JavaScript/jQuery framework (or PHP library) that I can use to scrape the content on the fly?
It's not suitable for a large amount of scraping, but in the past when I've needed to grab some basic data from a dynamic web page I've found that Selenium works pretty well.
Depending on your stack of choice, I'd recommend looking into headless browsers. This way you can render a page in the background and parse the resulting HTML.

PHP: localhost and server give different results [duplicate]

This question already has answers here:
PHP file_get_contents() returns "failed to open stream: HTTP request failed!"
(16 answers)
Closed 4 years ago.
I have created a script for .lk domain search.
This is the code:
<form action="" method="GET">
<input type="text" name="dm" placeholder="tx">
</form>
<?php
if (isset($_GET["dm"])) {
$domain = $_GET["dm"];
$res = file_get_contents("https://www.domains.lk/domainsearch/doDomainSearch?domainname=$domain");
echo $domain;
}
?>
<script type="text/javascript">
var data = '<?php echo $res ?>';
document.write(data);
</script>
The data variable is displayed on localhost, but after I hosted the script on my server the result is no longer shown.
This is the file hosted on the server: http://vishmaloke.com/dm/ser.php
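SOLUTION #1
PHP has a setting named allow_url_fopen. It must be enabled for file_get_contents() to fetch content from a remote URL. You can try enabling it via an .htaccess file.
Put the following line in an .htaccess file in the directory where you want the setting enabled:
php_value allow_url_fopen On
Note: This setting applies only to the directory that contains the .htaccess file (and its subdirectories). The PHP manual lists allow_url_fopen as changeable only at system level (PHP_INI_SYSTEM), so on many hosts this line has no effect and the setting has to be changed in php.ini instead.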
SOLUTION #2
Alternatively, you can update php.ini.
PHP.INI UPDATE
Add the following line to php.ini:
allow_url_fopen = On
SOLUTION #3
It is recommended to use cURL instead of file_get_contents.
CURL UPDATE
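if (isset($_GET["dm"]))
{
$domain = $_GET["dm"];
// Fetch the search result with cURL; the domain is URL-encoded before it goes into the query string
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, "https://www.domains.lk/domainsearch/doDomainSearch?domainname=" . urlencode($domain));
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2); // give up if no connection is made within 2 seconds
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
$res = curl_exec($curl_handle); // false on failure
curl_close($curl_handle);
echo htmlspecialchars($domain);
}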
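
The request URL above is built from user input. As a minimal sketch of the same step in JavaScript, the helper below percent-encodes the parameter before it reaches the query string. The endpoint is the one from the question; the names `SEARCH_ENDPOINT` and `buildSearchUrl` are illustrative assumptions.

```javascript
// Endpoint taken from the question; the helper name is illustrative.
const SEARCH_ENDPOINT = "https://www.domains.lk/domainsearch/doDomainSearch";

// Builds the request URL with the user-supplied value percent-encoded, so
// characters such as "&" or "=" cannot alter the query string.
function buildSearchUrl(domain) {
  const url = new URL(SEARCH_ENDPOINT);
  url.searchParams.set("domainname", domain);
  return url.toString();
}

console.log(buildSearchUrl("example.lk"));
// https://www.domains.lk/domainsearch/doDomainSearch?domainname=example.lk
console.log(buildSearchUrl("a&b=c"));
// https://www.domains.lk/domainsearch/doDomainSearch?domainname=a%26b%3Dc
```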

how can I use curl with javascript

I am using cURL to open a page, and I want to use JavaScript to play the video shown on that page. I have used the following code:
$url = "https://www.example.com/";
$link = "http://www.example.com/oembed?url=" . $url. "&format=json";
$curl = curl_init($link);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$return = curl_exec($curl);
curl_close($curl);
$result = json_decode($return, true);
echo '<pre>'; print_r($result);
echo $result['html'];
play();
function play(){
document.getElementById("play-button").click();
}
My cURL request is working, but the video doesn't play. Where am I going wrong? Do I have to pass the XPath of the button to play the video?
PHP scripts are executed on the server, while JavaScript is executed in the browser (Node.js is an exception). Your PHP code has therefore already finished running by the time the JavaScript tries to trigger the click, and PHP code can never run in the browser, so the click cannot trigger the cURL call.
What you need to do is call the URL using JavaScript asynchronously. You can either use Ajax or Fetch for this.
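
A minimal sketch of the Fetch approach, assuming the oEmbed endpoint from the question (example.com is the question's placeholder host). The function names, the injectable `fetchImpl` parameter, and the "player" element id are assumptions added for illustration.

```javascript
// Builds the oEmbed request URL used in the question.
function buildOembedUrl(pageUrl) {
  const endpoint = new URL("https://www.example.com/oembed");
  endpoint.searchParams.set("url", pageUrl);
  endpoint.searchParams.set("format", "json");
  return endpoint.toString();
}

// Fetches the embed markup asynchronously. `fetchImpl` is injectable so the
// function can be exercised without a network; it defaults to the global fetch.
async function loadEmbedHtml(pageUrl, fetchImpl = fetch) {
  const response = await fetchImpl(buildOembedUrl(pageUrl));
  if (!response.ok) {
    throw new Error(`oEmbed request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.html; // the same field the PHP code echoes as $result['html']
}

// Browser usage (assumes an element with id "player" exists on the page):
// loadEmbedHtml("https://www.example.com/").then((html) => {
//   document.getElementById("player").innerHTML = html;
// });
```

Calling the endpoint directly from the browser only works if it allows cross-origin requests; otherwise the PHP script can stay in place as a same-origin proxy that the JavaScript calls.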

jquery for google tts no voice

I use this code to have Google TTS speak a word. The code works, but there is a problem: I must first open http://translate.google.com/translate_tts?tl=en&q=dog (for the word in question) and listen to it, and only then does the code produce the right result. If I don't listen to that URL first, the code can't speak the word. I have looked at "Google Translate TTS problem", and I want to know the real cause and how to fix it.
In Firefox it works better, but it has the problem described above.
In IE there is an audio error: file type not supported...
In Chrome nothing happens; even //translate.google.com/translate_tts?tl=en&q=dog produces no sound.
I want to know how to make this run successfully in IE and Firefox. Thanks a lot.
HTML
<form id="say-form">
<button id="say-button">Say!</button>
<audio id="audio" preload controls>
<source id="s1" />
</audio>
</form>
jQuery
$('#say-form').submit(function(){
var ar = new Array("dog","egg","what","big")
var i=0,file = $("#audio")
console.log(ar[0])
$("#s1").attr("src", "http://translate.google.com/translate_tts?tl=en&q="+ar[0]).detach().appendTo("#audio");
file[0].load();
file[0].play();
i++;
// when it play end, play next word until ar array it's finish
file.on( "ended", function(){
if(i!=ar.length)
{
$("#s1").attr("src", "http://translate.google.com/translate_tts?tl=en&q="+ar[i]).detach().appendTo("#audio");
$(this)[0].load();
$(this)[0].play();
i++;
}
});
return false;
});
Why don't you use PHP?
<?php
$text = urlencode('my text');
$url = "http://translate.google.com/translate_tts?ie=utf-8&tl=en&q=".$text;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)");
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
$return = curl_exec($ch);
curl_close($ch);
header('Content-Type: audio/mpeg'); // assumes the endpoint returns MP3 audio
echo $return;
?>
See, for example, "Google tts api giving me blank mp3"
or http://ctrlq.org/code/19147-text-to-speech-php
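
On the browser side, the audio element can then point at such a PHP script instead of at Google directly. The sketch below plays a list of words one after another. The proxy path "tts.php" and its "q" parameter are hypothetical (the PHP above uses a fixed text), and `playWords` is an illustrative helper, not part of jQuery.

```javascript
// Builds the URL of a same-origin PHP proxy for one word. The path "tts.php"
// and the "q" parameter are hypothetical names for a script like the one above.
function buildTtsUrl(word, proxyPath = "tts.php") {
  return `${proxyPath}?q=${encodeURIComponent(word)}`;
}

// Plays the words one after another. `audio` is anything with a src property,
// a play() method and an "ended" event (an <audio> element in the browser).
function playWords(audio, words) {
  let index = 0;
  const playNext = () => {
    if (index >= words.length) {
      // All words played: detach the handler so it is not registered twice
      // the next time playWords is called.
      audio.removeEventListener("ended", playNext);
      return;
    }
    audio.src = buildTtsUrl(words[index]);
    index += 1;
    audio.play();
  };
  audio.addEventListener("ended", playNext);
  playNext();
}

// Browser usage (assumes the form and audio element from the question):
// document.getElementById("say-form").addEventListener("submit", (event) => {
//   event.preventDefault();
//   playWords(document.getElementById("audio"), ["dog", "egg", "what", "big"]);
// });
```

Registering the "ended" handler once per run and removing it at the end avoids stacking a new handler on every form submission.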

How to get HTML code source from another site

How do I get the HTML code of another site that requires cookies to be enabled?
I just need to parse this page: www.fx-trend.com/pamm/rating/
I'm using JavaScript with jQuery (jQuery Mobile) and sometimes PHP (I prefer to use JS).
Here is a sample with PHP:
<?php
$url = 'url';
$html = file_get_html($url);
//$html = file_get_contents($url);
echo $html;
?>
Here is a sample with JS:
How to get data with JavaScript from another server?
OR
$(this).load(url);
alert($(this)); //returns object Object
Server answer:
Cookies must be enabled in your browser! Try to clear all cookies, if
cookies are enabled.
Code samples are welcome.
Try using cURL with cookies enabled. The code sample below is taken from this page.
<?php
/* STEP 1. let’s create a cookie file */
$ckfile = tempnam ("/tmp", "CURLCOOKIE");
/* STEP 2. visit the homepage to set the cookie properly */
$ch = curl_init ("url");
curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt ($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$output = curl_exec ($ch);
var_dump($output);
Edit: You might have to fake a browser by changing the default user agent header.
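
What "enabling cookies" means for an HTTP client can be sketched in a few lines of JavaScript: remember the cookies a response sets and send them back on the next request. The class name `SimpleCookieJar` and the sample cookie values are made up for illustration, and attributes such as Path, Domain and Expires are ignored here, which a real client must not do.

```javascript
// Minimal cookie-jar sketch: remembers name=value pairs from Set-Cookie
// headers and replays them in a Cookie header.
class SimpleCookieJar {
  constructor() {
    this.cookies = new Map();
  }

  // Accepts the raw Set-Cookie header values of one response.
  store(setCookieHeaders) {
    for (const header of setCookieHeaders) {
      const pair = header.split(";")[0]; // drop attributes such as Path
      const separator = pair.indexOf("=");
      if (separator === -1) continue; // not a name=value pair
      const name = pair.slice(0, separator).trim();
      const value = pair.slice(separator + 1).trim();
      this.cookies.set(name, value);
    }
  }

  // Value for the Cookie request header of the follow-up request.
  header() {
    return [...this.cookies].map(([name, value]) => `${name}=${value}`).join("; ");
  }
}

const jar = new SimpleCookieJar();
jar.store(["PHPSESSID=abc123; path=/; HttpOnly", "lang=en; Max-Age=3600"]);
console.log(jar.header()); // PHPSESSID=abc123; lang=en
```

This is roughly the bookkeeping that the cookie file in the cURL sample takes care of between requests.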
