Press Clippings : Script to download all press clippings - javascript

<?php
ignore_user_abort(true);
set_time_limit(0); // disable the time limit for this script
$path = "https://vibrantgujarat.com/pressclippingsnew.htm"; // change the path to fit your website's document structure
$dl_file = preg_replace("([^\w\s\d\-_~,;:\[\]\(\).]|[\.]{2,})", '', $_GET['download_file']); // simple file name validation
$dl_file = filter_var($dl_file, FILTER_SANITIZE_URL); // remove (more) invalid characters
$fullPath = $path.$dl_file;
if ($fd = fopen($fullPath, "r")) {
    $fsize = filesize($fullPath);
    $path_parts = pathinfo($fullPath);
    $ext = strtolower($path_parts["extension"]);
    switch ($ext) {
        case "pdf":
            header("Content-type: application/pdf");
            header("Content-Disposition: attachment; filename=\"".$path_parts["basename"]."\""); // use 'attachment' to force a file download
            break;
        // add more headers for other content types here
        default:
            header("Content-type: application/octet-stream");
            header("Content-Disposition: filename=\"".$path_parts["basename"]."\"");
            break;
    }
    header("Content-length: $fsize");
    header("Cache-control: private"); // use this to open files directly
    while (!feof($fd)) {
        $buffer = fread($fd, 2048);
        echo $buffer;
    }
    fclose($fd);
}
exit;
I have a page with more than 500 press clippings, each with a date, name, media name and image path.
I want to download all of them with a script, but I don't know how to write the download script.
Here is the link.
Any help would be great.
Thank you.

Check out the following function. It is not working fully; you will need to try out some changes to it.
function saveImageAs() {
    var images = document.getElementsByTagName("img");
    for (var i = 0; i < images.length; i++) {
        var imgOrURL = images[i].src;
        window.win = open(imgOrURL);
        // note: execCommand("SaveAs") only works in old Internet Explorer
        setTimeout('win.document.execCommand("SaveAs")', 0);
    }
}

<?php
set_time_limit(0);
// File to save the contents to
$fp = fopen('download.zip', 'w+');
$url = "https://vibrantgujarat.com/pressclippingsnew.htm";
// Here is the file we are downloading; replace spaces with %20
$ch = curl_init(str_replace(" ", "%20", $url));
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
// Give curl the file pointer so that it can write to it
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch); // get curl response
// Done
curl_close($ch);
fclose($fp);
This will help you download the contents of the first page; a follow-up sketch for saving the images themselves is below.
Hope this helps.
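As a rough follow-up sketch, you could then parse the downloaded HTML and save each image. This assumes allow_url_fopen is enabled, that the clippings appear as plain <img> tags on that page, and that relative paths resolve against the site root; the clippings/ directory name is also an assumption.
<?php
set_time_limit(0);

// Fetch the page and parse it for <img> tags.
$pageUrl = "https://vibrantgujarat.com/pressclippingsnew.htm";
$html = file_get_contents($pageUrl);

$doc = new DOMDocument();
@$doc->loadHTML($html); // suppress warnings from malformed markup

@mkdir('clippings');
foreach ($doc->getElementsByTagName('img') as $img) {
    $src = $img->getAttribute('src');
    if ($src === '') {
        continue;
    }
    // Resolve relative paths against the site root (an assumption about the site's structure).
    $absolute = preg_match('#^https?://#i', $src)
        ? $src
        : 'https://vibrantgujarat.com/' . ltrim($src, '/');
    copy($absolute, 'clippings/' . basename($absolute));
}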


load OTHER LINK if not available

$mp3Linkger = wp_get_attachment_url($mp3_file_id);
$mp3Link = wp_get_attachment_url($mp3_file_id);
$mp3Link = str_replace( 'example.COM', 'static.example.COM', $mp3Link );
$playerTag = '[audio mp3="'.$mp3Linkger.'"][/audio]';
In the above code:
$playerTag loads the link.
$mp3Linkger is the URL that actually gets played.
I want to load $mp3Link if $mp3Linkger is not available.
"Not available" meaning a down server, a 404 error, and so on.
Update:
The approaches suggested so far slow down the site's loading speed:
function check_url($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($ch);
    $headers = curl_getinfo($ch);
    curl_close($ch);
    return $headers['http_code'];
}
$check_url_status = check_url($mp3Linkger);
if ($check_url_status == '200') {
    $playerTag = '[audio mp3="'.$mp3Linkger.'"][/audio]';
} else {
    $playerTag = '[audio mp3="'.$mp3Link.'"][/audio]';
}
I want this check to happen when the user clicks on the link ($playerTag).
That is, if link A is not available, link B will be loaded.
You should take a look at @fopen().
fopen — Opens file or URL
Mode 'r' — Open for reading only; places the file pointer at the beginning of the file.
Source: https://www.php.net/manual/en/function.fopen.php
<?php
/**
 * Check if the CDN's URL is valid; if not, use a fallback.
 */
$test = @fopen( '_Your_CDN_URL_goes_here_', 'r' );
if ( $test !== false ) {
    // CDN's URL is valid
    fclose( $test );
    $url = '_Your_CDN_URL_goes_here_';
} else {
    // CDN's URL isn't valid
    $fallback = '_Your_fallback_URL_goes_here_';
}
?>
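Applied to the shortcode variables from the question above ($mp3Linkger as the primary URL, $mp3Link as the alternate), a minimal sketch could look like the following. Note that this check still runs on every page load rather than on click, which is the speed concern raised in the question.
<?php
// Sketch: use the primary attachment URL if it is reachable, otherwise fall back.
$test = @fopen($mp3Linkger, 'r');
if ($test !== false) {
    fclose($test);
    $playerTag = '[audio mp3="' . $mp3Linkger . '"][/audio]';
} else {
    $playerTag = '[audio mp3="' . $mp3Link . '"][/audio]';
}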

PHP. saving many pages as png from the command line

I access a page via GET (some of its contents are loaded using jQuery); on document.ready the page gets saved as a PNG.
I want to call this page from the command line, using a for loop inside the command to save multiple PNGs.
How can I do it?
If I run this in the browser it works fine, but the idea is not to do it manually, one by one for each GLN code.
curl did not work; or am I using it wrong?
<script>
    @isset($saveCode)
        $("#btnPng").click();
    @endisset

    $("#btnPng").click(function () {
        var selected_date = $('#selectReportDate').find(':selected').val();
        var selected_gln = $('#selectAccount').find(':selected').val();
        html2canvas($("#printable"), {
            onrendered: function (canvas) {
                var url = canvas.toDataURL();
                $("<a>", {
                    href: url,
                    download: selected_date + selected_gln
                })
                .on("click", function () { $(this).remove(); })
                .appendTo("body")[0].click();
            }
        });
    });
</script>
Command handle() code:
public function handle()
{
    $date = WeeklyTopSheetsData::max('report_date');
    $accounts = REF_GA_GLN::Select('gln')->orderBy('account_name')->get();

    foreach ($accounts as $account) {
        $auxURL = 'http://localhost:8000/topsheet/' . $account['gln'] . '/' . $date . '/1';
        $ch = curl_init();
        echo $auxURL;
        // set URL and other appropriate options
        curl_setopt($ch, CURLOPT_URL, $auxURL);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        // grab URL and pass it to the browser
        curl_exec($ch);
        // close cURL resource, and free up system resources
        curl_close($ch);
    }
    echo ' FIN';
}
}
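One thing to note: curl only fetches the raw HTML and never executes the page's JavaScript, so the html2canvas click handler is never run. A rough alternative sketch, assuming a headless Chrome/Chromium binary is available on the machine (the "chrome" binary name and the screenshots/ directory are assumptions), is to shell out to it from the loop and capture the screenshot directly:
// Sketch of an alternative loop inside handle(): render each report with headless Chrome.
@mkdir('screenshots');
foreach ($accounts as $account) {
    $auxURL = 'http://localhost:8000/topsheet/' . $account['gln'] . '/' . $date . '/1';
    $output = 'screenshots/' . $account['gln'] . '_' . $date . '.png';
    $cmd = 'chrome --headless --disable-gpu'
         . ' --window-size=1280,1024'
         . ' --screenshot=' . escapeshellarg($output)
         . ' ' . escapeshellarg($auxURL);
    shell_exec($cmd);
}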

Multiple PDF downloads

I have a local DB table in which around 15,000 PDF links are saved. I want to download all of the PDFs with one click, but my problem is that the PDFs open instead of downloading. I was trying this method:
items = Array.from(document.getElementsByTagName("a"));
items.forEach(function(item) {
    link = item.href;
    if (link.substr(link.length - 4) == ".pdf") {
        filename = link.replace(/^.*[\\\/]/, '');
        item.download = filename;
        item.click();
    }
});
You cannot download all of the files with only one click. Instead, you can use the ZipArchive class in PHP: make one zip file of all the available PDFs and download that.
$files = array('pdf1.pdf', 'pdf2.pdf');
$zipname = 'file.zip';
$zip = new ZipArchive;
$zip->open($zipname, ZipArchive::CREATE);
foreach ($files as $file) {
    $zip->addFile($file);
}
$zip->close();
And headers like:
header('Content-Type: application/zip');
header('Content-disposition: attachment; filename='.$zipname);
header('Content-Length: ' . filesize($zipname));
readfile($zipname);
Thanks for your reply. This code works for me with ZipArchive:
$files = array('pdflink', 'pdflink');
$zip = new ZipArchive();
$tmp_file = tempnam('.', '');
$zip->open($tmp_file, ZipArchive::CREATE);
foreach ($files as $file) {
    $download_file = file_get_contents($file);           // fetch the remote PDF
    $zip->addFromString(basename($file), $download_file); // add it to the archive
}
$zip->close();
header('Content-disposition: attachment; filename=file.zip');
header('Content-type: application/zip');
readfile($tmp_file);
unlink($tmp_file); // clean up the temporary file
?>

php : how to get all hyperlinks from a specific div of a given page?

I'm trying to get all the link URLs of the news items in a certain div of this website.
When I view the page source there is nothing to get, yet the data is displayed on the page.
Could anyone who understands PHP, Array() and JS help me, please?
This is my code to get the content:
$html = file_get_contents("https://qc.yahoo.com/");
if ($html === FALSE) {
    die("?");
}
echo $html;

$html = new DOMDocument();
@$html->loadHtmlFile('https://qc.yahoo.com/');
$xpath = new DOMXPath($html);
$nodelist = $xpath->query("//div[@id='news_moreTopStories']//a/@href");
foreach ($nodelist as $n) {
    echo $n->nodeValue."\n";
}
You can get all the links from the divs you specify; just make sure you put the right div id in [@id='news_moreTopStories']. You're already using XPath to query the divs, so you don't need a ton of code, just that portion (a small helper sketch follows below).
http://php.net/manual/en/class.domxpath.php
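For reference, here is a minimal sketch of that approach wrapped in a small helper function; error handling is omitted, and the div id is simply the one from the question.
<?php
// Return all href values found inside a given div of a page (minimal sketch).
function get_div_links($pageUrl, $divId)
{
    $doc = new DOMDocument();
    @$doc->loadHtmlFile($pageUrl); // suppress warnings from malformed HTML
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("//div[@id='$divId']//a/@href");

    $links = array();
    foreach ($nodes as $node) {
        $links[] = $node->nodeValue;
    }
    return $links;
}

print_r(get_div_links('https://qc.yahoo.com/', 'news_moreTopStories'));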
Assuming you want to extract all anchor tags with their hyperlinks from the given page:
There are certain problems with doing file_get_contents on that URL:
the compression of the response, i.e. gzip, and
the SSL verification of the URL.
To overcome the first problem (gzip compression) we'll use cURL, as @gregn3 suggested in his answer, but he missed cURL's ability to automatically decompress gzipped content.
For the second problem, you can either follow this guide or disable SSL verification via cURL's curl_setopt options.
The code which will extract all the links from the given page is:
<?php
$url = "https://qc.yahoo.com/";
# download resource
$c = curl_init($url);
curl_setopt($c, CURLOPT_HTTPHEADER, ["Accept-Encoding:gzip"]);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($c, CURLOPT_ENCODING, "gzip");
curl_setopt($c, CURLOPT_VERBOSE, 1);
curl_setopt($c, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($c, CURLOPT_SSL_VERIFYHOST, 0);
$content = curl_exec($c); // get curl response
curl_close($c);
$links = preg_match_all("/href=\"([^\"]+)\"/i", $content, $matches);
# output results
echo "url = " . htmlspecialchars($url) . "<br>";
echo "links found (" . count($matches[1]) . "):" . "<br>";
$n = 0;
foreach ($matches[1] as $link)
{
    $n++;
    echo "$n: " . htmlspecialchars($link) . "<br>";
}
But if you want to do advanced HTML parsing, then you'll need to use PHP Simple HTML DOM Parser. In PHP Simple HTML DOM you can select the div using jQuery-style selectors and fetch the anchor tags (a rough sketch follows below). Here are its documentation & API manual.
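For completeness, a rough sketch of the same extraction with PHP Simple HTML DOM Parser; it assumes you have downloaded simple_html_dom.php and that $content holds the decompressed HTML fetched by the cURL code above.
<?php
// Sketch using PHP Simple HTML DOM Parser (simple_html_dom.php must be downloaded separately).
include 'simple_html_dom.php';

$dom = str_get_html($content); // $content: decompressed HTML from the cURL request above
foreach ($dom->find('div#news_moreTopStories a') as $a) {
    echo htmlspecialchars($a->href) . "<br>";
}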
To find all links in HTML you could use preg_match_all().
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);
The URL https://qc.yahoo.com/ uses gzip compression, so you have to detect that and decompress the content using the function gzdecode(). (It must be available in your PHP version.)
The gzip compression is indicated by the Content-Encoding: gzip HTTP header. You have to check that header, so you must use curl or a similar method to retrieve the headers.
(file_get_contents() will not give you the HTTP headers... it only downloads the gzip-compressed content. You need to detect that it is compressed, but for that you need to read the headers.)
Here is a complete example:
<?php
$url = "https://qc.yahoo.com/";

# download resource
$c = curl_init ($url);
curl_setopt ($c, CURLOPT_HEADER, true);
curl_setopt ($c, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec ($c);
$hsize = curl_getinfo ($c, CURLINFO_HEADER_SIZE);
curl_close ($c);

# separate headers from content
$headers = substr ($content, 0, $hsize);
$content = substr ($content, $hsize);

# check if content is compressed with gzip
$gzip = 0;
$headers = preg_split ('/\r?\n/', $headers);
foreach ($headers as $h)
{
    $pieces = preg_split ("/:/", $h, 2);
    $pieces2 = (count ($pieces) > 1);
    $enc = $pieces2 && (preg_match ("/content-encoding/i", $pieces[0]));
    $gz = $pieces2 && (preg_match ("/gzip/i", $pieces[1]));
    if ($enc && $gz)
    {
        $gzip = 1;
        break;
    }
}

# unzip content if gzipped
if ($gzip)
{
    $content = gzdecode ($content);
}

# find links
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);

# output results
echo "url = " . htmlspecialchars ($url) . "<br>";
echo "links found (" . count ($matches[1]) . "):" . "<br>";
$n = 0;
foreach ($matches[1] as $link)
{
    $n++;
    echo "$n: " . htmlspecialchars ($link) . "<br>";
}

Accessing and printing HTML source code using PHP or JavaScript

I am trying to access and then print (or just be able to use) the source code of any website using PHP. I am not very experienced and am now thinking I might need to use JS to accomplish this. So far, the code below accesses the source code of a web page and displays the web page... What I want it to do instead is display the source code. Essentially, and most importantly, I want to be able to store the source code in some sort of variable so I can use it later. And eventually read it line-by-line - but this can be tackled later.
$url = 'http://www.google.com';

function get_data($url)
{
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

echo get_data($url); // print and echo do the same thing in this scenario.
Consider using file_get_contents() instead of curl. You can then display the code on your page by replacing every opening bracket (<) with &lt; and then outputting it to the page.
<?php
$code = file_get_contents('http://www.google.com');
$code = str_replace('<', '&lt;', $code);
echo $code;
?>
Edit:
Looks like curl is actually faster than FGC, so ignore that suggestion. The rest of my post still stands. :)
You should try printing the result between <pre></pre> tags:
echo '<pre>' . get_data($url) . '</pre>';
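On its own, <pre> only preserves whitespace; the browser will still parse the tags, so combining it with htmlspecialchars() (as the answer further below suggests) is what actually makes the source visible:
echo '<pre>' . htmlspecialchars(get_data($url)) . '</pre>';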
I rewrote your function. It can return the source with or without line numbers.
<?php
function get_data($url, $Addlines = false){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    $content = curl_exec($ch);
    $content = htmlspecialchars($content); // prevents the browser from parsing the HTML
    curl_close($ch);
    if ($Addlines == true){
        $content = explode("\n", $content);
        $lines = '';
        $Count = 0;
        foreach ($content as $Line){
            $lines .= 'Line '.$Count.': '.$Line.'<br />';
            $Count++;
        }
        return $lines;
    } else {
        $content = nl2br($content);
        return $content;
    }
}
echo get_data('https://www.google.com/', true); // Source code with line numbers
echo get_data('https://www.google.com/'); // Source code without line numbers
?>
Hope it gets you on your way.
Add a Content-Type: text/plain header:
header("Content-Type: text/plain");
Use htmlspecialchars() in PHP to print the source code.
In your code, use
return htmlspecialchars($data);
instead of
return $data;
