I'm trying to get the URLs of all the news links inside a particular div on this website.
I looked at the page source to find the links, but there is nothing there,
even though the data is displayed on the page.
Could anyone who understands PHP, arrays and JS help me, please?
This is my code to get the content:
$html = file_get_contents("https://qc.yahoo.com/");
if ($html === FALSE) {
die("?");
}
echo $html;
$html = new DOMDocument();
@$html->loadHtmlFile('https://qc.yahoo.com/');
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@id='news_moreTopStories']//a/@href" );
foreach ($nodelist as $n){
echo $n->nodeValue."\n";
}
You can get all the links from the divs you specify. Make sure you put the div id in the [@id='news_moreTopStories'] part of the query. You're using XPath to query the divs, so you don't need a ton of code, just this portion.
http://php.net/manual/en/class.domxpath.php
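As a minimal sketch of that XPath approach, the snippet below runs the same query against an inline HTML string instead of the live page, so it works offline. The sample markup and the extract_links() helper are made up for illustration; only the news_moreTopStories id comes from the question.

```php
<?php
// Hypothetical helper: returns the href of every <a> inside the div with the given id.
function extract_links($html, $divId)
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = array();
    // $divId is interpolated directly: fine for a fixed id, not for untrusted input.
    foreach ($xpath->query("//div[@id='" . $divId . "']//a/@href") as $attr) {
        $links[] = $attr->nodeValue;
    }
    return $links;
}

// Made-up sample markup standing in for the downloaded page.
$sample = '<html><body>'
    . '<div id="news_moreTopStories"><ul>'
    . '<li><a href="/news/one.html">One</a></li>'
    . '<li><a href="/news/two.html">Two</a></li>'
    . '</ul></div>'
    . '<div id="other"><a href="/ignored.html">Ignored</a></div>'
    . '</body></html>';

$links = extract_links($sample, 'news_moreTopStories');
print_r($links);
```

If the div is filled in by JavaScript after the page loads, it will not be present in the HTML that PHP downloads, and the query will simply return nothing.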
Assuming you want to extract all anchor tags with their hyperlinks from the given page.
There are two problems with calling file_get_contents on that URL:
Compressed content encoding, i.e. gzip.
SSL verification of the URL.
To overcome the first problem, we'll use cURL, as @gregn3 suggested in his answer. But his answer doesn't use cURL's ability to decompress gzipped content automatically.
For the second problem, you can either follow this guide or disable SSL verification through cURL's curl_setopt options.
The code that extracts all the links from the given page is:
<?php
$url = "https://qc.yahoo.com/";
# download resource
$c = curl_init ($url);
curl_setopt($c, CURLOPT_HTTPHEADER, ["Accept-Encoding:gzip"]);
curl_setopt ($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($c, CURLOPT_ENCODING , "gzip");
curl_setopt($c, CURLOPT_VERBOSE, 1);
curl_setopt($c, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($c, CURLOPT_SSL_VERIFYHOST, 0);
$content = curl_exec ($c);
curl_close ($c);
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);
# output results
echo "url = " . htmlspecialchars ($url) . "<br>";
echo "links found (" . count ($matches[1]) . "):" . "<br>";
$n = 0;
foreach ($matches[1] as $link)
{
$n++;
echo "$n: " . htmlspecialchars ($link) . "<br>";
}
But if you want to do more advanced HTML parsing, you'll need something like PHP Simple HTML DOM Parser. With it you can select the div using jQuery-style selectors and fetch the anchor tags. Here are its documentation and API manual.
To find all links in HTML you could use preg_match_all().
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);
That URL, https://qc.yahoo.com/, uses gzip compression, so you have to detect that and decompress the content with gzdecode(). (The function must be available in your PHP installation.)
The gzip compression is indicated by the Content-Encoding: gzip HTTP header. You have to check that header, so you must use curl or a similar method to retrieve the headers.
(file_get_contents() returns only the body, which here is the gzip-compressed content. To know that it is compressed you need to read the headers.)
Here is a complete example:
<?php
$url = "https://qc.yahoo.com/";
# download resource
$c = curl_init ($url);
curl_setopt ($c, CURLOPT_HEADER, true);
curl_setopt ($c, CURLOPT_RETURNTRANSFER, true);
$content = curl_exec ($c);
$hsize = curl_getinfo ($c, CURLINFO_HEADER_SIZE);
curl_close ($c);
# separate headers from content
$headers = substr ($content, 0, $hsize);
$content = substr ($content, $hsize);
# check if content is compressed with gzip
$gzip = 0;
$headers = preg_split ('/\r?\n/', $headers);
foreach ($headers as $h)
{
$pieces = preg_split ("/:/", $h, 2);
$pieces2 = (count ($pieces) > 1);
$enc = $pieces2 && (preg_match ("/content-encoding/i", $pieces[0]) );
$gz = $pieces2 && (preg_match ("/gzip/i", $pieces[1]) );
if ($enc && $gz)
{
$gzip = 1;
break;
}
}
# unzip content if gzipped
if ($gzip)
{
$content = gzdecode ($content);
}
# find links
$links = preg_match_all ("/href=\"([^\"]+)\"/i", $content, $matches);
# output results
echo "url = " . htmlspecialchars ($url) . "<br>";
echo "links found (" . count ($matches[1]) . "):" . "<br>";
$n = 0;
foreach ($matches[1] as $link)
{
$n++;
echo "$n: " . htmlspecialchars ($link) . "<br>";
}
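The header check in that example can be tried without touching the network. The sketch below fakes a response with gzencode() and runs the same detect-then-decode steps; the is_gzip_encoded() helper and the sample header block are invented for the demonstration.

```php
<?php
// Hypothetical helper: true if the raw header block contains "Content-Encoding: gzip".
function is_gzip_encoded($rawHeaders)
{
    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) === 2
            && strcasecmp(trim($parts[0]), 'Content-Encoding') === 0
            && stripos($parts[1], 'gzip') !== false) {
            return true;
        }
    }
    return false;
}

// Simulated response: made-up headers plus a gzip-compressed body.
$headers = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Encoding: gzip\r\n\r\n";
$body    = gzencode('<a href="http://example.com/">example</a>');

// Decompress only when the headers say the body is gzipped.
if (is_gzip_encoded($headers)) {
    $body = gzdecode($body);
}

preg_match_all('/href="([^"]+)"/i', $body, $matches);
print_r($matches[1]);
```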
I'm in the middle of some research. How do I unwrap the div with the ID #fromWhere in the PHP script below? All I was able to do was addClass. Whenever I try to unwrap it, it unwraps almost every other element except itself. Any suggestions on how I can achieve this?
<?php ob_start(); ?>
<div id="fromWhere" >19977</div>
<?php $contents = ob_get_contents(); ?>
In my effort to unwrap it, I changed the code to look like the version below, because it worked when I tried targeting the div from an external script.
<?php ob_start(); ?>
<script> $('#fromWhere').unwrap(); </script>
<div id="fromWhere" >19977</div>
<?php $contents = ob_get_contents(); ?>
Everything:
PHP
<?php ob_start(); ?>
<script> $('#fromWhere').unwrap(); </script>
<div id="fromWhere" > </div>
<?php $contents = ob_get_contents(); ?>
<?php
$params = array(
'origin' => $contents,
'destination' => um_user('postal_zip_code'),
'sensor' => 'true',
'units' => 'imperial'
);
$params_string='';
// Join parameters into URL string
foreach($params as $var => $val){
$params_string .= '&' . $var . '=' . urlencode($val);
}
// Request URL
$url = "http://maps.googleapis.com/maps/api/directions/json?".ltrim($params_string, '&');
// Make our API request
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$return = curl_exec($curl);
curl_close($curl);
// Parse the JSON response
$directions = json_decode($return);
//echo"<pre>";
//print_r($directions);
// Show the total distance
echo '<p><strong>Total distance:</strong> ' . $directions->routes[0]->legs[0]->distance->text . '</p>';
?>
<div id class="distance"></div>
jQuery that inserts the content into #fromWhere
$(document).ready(function() {
var from = $('#show-here #from').text();
$(".approved #fromWhere").text(from);
});
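One thing worth keeping in mind here: the PHP runs on the server before the page reaches the browser, so the script tag inside the buffer is captured as plain text and nothing in it executes at that point. Also, jQuery's .unwrap() removes the parent of the matched element rather than the element itself, which would explain other elements disappearing. If the aim is only to get the value without its surrounding div, a sketch like the following strips the markup in PHP instead. It assumes the value is already known on the server; text that jQuery inserts later in the browser cannot be read this way.

```php
<?php
ob_start();
// Same effect as the inline HTML between ob_start() and ob_get_contents() in the question.
echo '<div id="fromWhere">19977</div>';

// ob_get_clean() returns the buffer and discards it, so the div is not sent to the browser.
$contents = ob_get_clean();

// Remove the tags and surrounding whitespace, leaving only the text.
$origin = trim(strip_tags($contents));

echo $origin;
```

The cleaned $origin value could then be used as the 'origin' entry of $params in place of the raw buffer.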
I am attempting to capture a latitude and longitude, pass them to Google Maps to return some data, then extract the zip code from the response and use that.
When I call it directly in a browser, the response on the page looks like:
{"zip":"90029"}
But when I attempt to view it in Chrome's developer tools, it states "This request has no response data". So it seems as though my JSON is not being encoded properly, or I am not returning it to the page properly in a JSON format type header.
I've tried various things, and nothing seems to work properly in order to get it return the JSON data.
Here's my present code:
header('Cache-Control: no-cache, must-revalidate');
header('Expires: Mon, 26 Jul 1997 05:00:00 GMT');
header('Content-type: application/json');
$now = @date("Ymd-His");
$myFile = "loc" . $now . ".txt";
$fh = fopen("locations/" . $myFile, 'w') or die("can't open file");
$lat = $_REQUEST['lat'];
$long = $_REQUEST['long'];
$type = $_REQUEST['type'];
$phone = $_REQUEST['phone'];
$loc = $_REQUEST['typed_location'];
$url = "http://maps.googleapis.com/maps/api/geocode/json?latlng=$lat,$long&sensor=false";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_ENCODING, "");
$curlData = curl_exec($curl);
curl_close($curl);
$address = json_decode($curlData, true);
$temp_data = $address['results'][0]['formatted_address'];
$temp = explode(", ", $temp_data);
$temp_2 = explode(" ", $temp[2]);
$zip = $temp_2[1];
fwrite($fh, $zip);
fclose($fh);
$array = array("zip" => $zip);
$t = json_encode($array);
echo $t;
For anyone else looking to solve this problem - when using cross domains, be sure to allow for the Access-Control by setting the following as a header:
header("access-control-allow-origin: *");
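Separately from the header, the ZIP lookup in the question splits formatted_address on commas and spaces, which breaks as soon as an address has a different number of parts. Geocoding responses also carry an address_components list in which each entry has a types array, so a sketch like this picks the postal_code entry directly. The sample JSON is a trimmed, made-up response and find_postal_code() is a hypothetical helper.

```php
<?php
// Hypothetical helper: returns the postal code of the first result, or null if none is present.
function find_postal_code(array $geocode)
{
    if (empty($geocode['results'][0]['address_components'])) {
        return null;
    }
    foreach ($geocode['results'][0]['address_components'] as $component) {
        if (in_array('postal_code', $component['types'], true)) {
            return $component['long_name'];
        }
    }
    return null;
}

// Trimmed, made-up response standing in for the string returned by curl_exec().
$sampleJson = '{"status":"OK","results":[{"address_components":['
    . '{"long_name":"Los Angeles","short_name":"LA","types":["locality","political"]},'
    . '{"long_name":"90029","short_name":"90029","types":["postal_code"]}'
    . ']}]}';

$zip = find_postal_code(json_decode($sampleJson, true));

echo json_encode(array('zip' => $zip)); // prints {"zip":"90029"}
```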
I paid a programmer to make a shop basket script to work with Spreadshirt API. Everything is working perfectly, except that the basket keeps emptying itself. I think the session is lost at some point so the script creates another BasketId.
I tried to find out whether there was a specific reason it was happening, without any success... I can't reproduce the bug. It just happens randomly, for no apparent reason. Closing the browser, restarting Apache or even the whole web server won't cause the session to be lost.
I've got two different scripts working with cookies on the same domain and they don't have any problem (one is a cookie for the admin login session and the other is a cookie that saves the user's last viewed articles on the shop).
I tried all the solutions I found on Google without any success: editing php.ini, forcing ini settings through PHP, the .htaccess way, ...
Here's the "sessions" part of my phpinfo: http://gyazo.com/168e2144ddd9ee368a05754dfd463021
shop-ajax.php (session handling at line 18)
ini_set('session.cookie_domain', '.mywebsite.com' );
header("Pragma: no-cache");
header("Cache-Control: no-store, no-cache, max-age=0, must-revalidate");
$language = addslashes($_GET['l']);
$shopid = addslashes($_GET['shop']);
// if($_SERVER['HTTP_X_REQUESTED_WITH'] != 'XMLHttpRequest') {
// die("no direct access allowed");
// }
if(!session_id()) {
$lifetime=60 * 60 * 24 * 365;
$domain = ".mywebsite.com";
session_set_cookie_params($lifetime,"/",$domain);
@session_start();
}
// Configuration
$config['ShopSource'] = "com";
$config['ShopId'] = $shopid;
$config['ShopKey'] = "*****";
$config['ShopSecret'] = "*****";
/*
* add an article to the basket
*/
if (isset($_POST['size']) && isset($_POST['appearance']) && isset($_POST['quantity'])) {
/*
* create an new basket if not exist
*/
if (!isset($_SESSION['basketUrl'])) {
/*
* get shop xml
*/
$stringApiUrl = 'http://api.spreadshirt.'.$config['ShopSource'].'/api/v1/shops/' . $config['ShopId'];
$stringXmlShop = oldHttpRequest($stringApiUrl, null, 'GET');
if ($stringXmlShop[0]!='<') die($stringXmlShop);
$objShop = new SimpleXmlElement($stringXmlShop);
if (!is_object($objShop)) die('Basket not loaded');
/*
* create the basket
*/
$namespaces = $objShop->getNamespaces(true);
$basketUrl = createBasket('net', $objShop, $namespaces);
$_SESSION['basketUrl'] = $basketUrl;
$_SESSION['namespaces'] = $namespaces;
/*
* get the checkout url
*/
$checkoutUrl = checkout($_SESSION['basketUrl'], $_SESSION['namespaces']);
// basket language workaround
if ($language=="fr") {
if (!strstr($checkoutUrl,'/fr')) {
$checkoutUrl = str_replace("spreadshirt.com","spreadshirt.com/fr",$checkoutUrl);
}
}
$_SESSION['checkoutUrl'] = $checkoutUrl;
}
/*
Workaround for not having the appearance id :(
*/
if ($_POST['appearance']==0) {
$stringApiArticleUrl = 'http://api.spreadshirt.'.$config['ShopSource'].'/api/v1/shops/' . $config['ShopId'].'/articles/'.intval($_POST['article']).'?fullData=true';
$stringXmlArticle = oldHttpRequest($stringApiArticleUrl, null, 'GET');
if ($stringXmlArticle[0]!='<') die($stringXmlArticle);
$objArticleShop = new SimpleXmlElement($stringXmlArticle);
if (!is_object($objArticleShop)) die('Article not loaded');
$_POST['appearance'] = intval($objArticleShop->product->appearance['id']);
}
/*
* article data to be sent to the basket resource
*/
$data = array(
'articleId' => intval($_POST['article']),
'size' => intval($_POST['size']),
'appearance' => intval($_POST['appearance']),
'quantity' => intval($_POST['quantity']),
'shopId' => $config['ShopId']
);
/*
* add to basket
*/
addBasketItem($_SESSION['basketUrl'] , $_SESSION['namespaces'] , $data);
$basketData = prepareBasket();
echo json_encode(array("c" => array("u" => $_SESSION['checkoutUrl'],"q" => $basketData[0],"l" => $basketData[1])));
}
// no call, just read basket if not empty
if (isset($_GET['basket'])) {
if (array_key_exists('basketUrl',$_SESSION) && !empty($_SESSION['basketUrl'])) {
$basketData = prepareBasket();
echo json_encode(array("c" => array("u" => $_SESSION['checkoutUrl'],"q" => $basketData[0],"l" => $basketData[1])));
} else {
echo json_encode(array("c" => array("u" => "","q" => 0,"l" => "")));
}
}
function prepareBasket() {
$intInBasket=0;
if (isset($_SESSION['basketUrl'])) {
$basketItems=getBasket($_SESSION['basketUrl']);
if(!empty($basketItems)) {
foreach($basketItems->basketItems->basketItem as $item) {
$intInBasket += $item->quantity;
}
}
}
$l = "";
$pQ = parse_url($_SESSION['checkoutUrl']);
if (preg_match("#^basketId\=([0-9a-f\-])*$#i", $pQ['query'])) {
$l = $pQ['query'];
}
return array($intInBasket,$l);
}
// Additional functions
function addBasketItem($basketUrl, $namespaces, $data) {
global $config;
$basketItemsUrl = $basketUrl . "/items";
$basketItem = new SimpleXmlElement('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<basketItem xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://api.spreadshirt.net">
<quantity>' . $data['quantity'] . '</quantity>
<element id="' . $data['articleId'] . '" type="sprd:article" xlink:href="http://api.spreadshirt.'.$config['ShopSource'].'/api/v1/shops/' . $data['shopId'] . '/articles/' . $data['articleId'] . '">
<properties>
<property key="appearance">' . $data['appearance'] . '</property>
<property key="size">' . $data['size'] . '</property>
</properties>
</element>
<links>
<link type="edit" xlink:href="http://' . $data['shopId'] .'.spreadshirt.' .$config['ShopSource'].'/-A' . $data['articleId'] . '"/>
<link type="continueShopping" xlink:href="http://' . $data['shopId'].'.spreadshirt.'.$config['ShopSource'].'"/>
</links>
</basketItem>');
$header = array();
$header[] = createAuthHeader("POST", $basketItemsUrl);
$header[] = "Content-Type: application/xml";
$result = oldHttpRequest($basketItemsUrl, $header, 'POST', $basketItem->asXML());
}
function createBasket($platform, $shop, $namespaces) {
$basket = new SimpleXmlElement('<basket xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://api.spreadshirt.net">
<shop id="' . $shop['id'] . '"/>
</basket>');
$attributes = $shop->baskets->attributes($namespaces['xlink']);
$basketsUrl = $attributes->href;
$header = array();
$header[] = createAuthHeader("POST", $basketsUrl);
$header[] = "Content-Type: application/xml";
$result = oldHttpRequest($basketsUrl, $header, 'POST', $basket->asXML());
$basketUrl = parseHttpHeaders($result, "Location");
return $basketUrl;
}
function checkout($basketUrl, $namespaces) {
$basketCheckoutUrl = $basketUrl . "/checkout";
$header = array();
$header[] = createAuthHeader("GET", $basketCheckoutUrl);
$header[] = "Content-Type: application/xml";
$result = oldHttpRequest($basketCheckoutUrl, $header, 'GET');
$checkoutRef = new SimpleXMLElement($result);
$refAttributes = $checkoutRef->attributes($namespaces['xlink']);
$checkoutUrl = (string)$refAttributes->href;
return $checkoutUrl;
}
/*
* functions to build headers
*/
function createAuthHeader($method, $url) {
global $config;
$time = time() *1000;
$data = "$method $url $time";
$sig = sha1("$data ".$config['ShopSecret']);
return "Authorization: SprdAuth apiKey=\"".$config['ShopKey']."\", data=\"$data\", sig=\"$sig\"";
}
function parseHttpHeaders($header, $headername) {
$retVal = array();
$fields = explode("\r\n", preg_replace('/\x0D\x0A[\x09\x20]+/', ' ', $header));
foreach($fields as $field) {
if (preg_match('/(' . $headername . '): (.+)/m', $field, $match)) {
return $match[2];
}
}
return $retVal;
}
function getBasket($basketUrl) {
$header = array();
$basket = "";
if (!empty($basketUrl)) {
$header[] = createAuthHeader("GET", $basketUrl);
$header[] = "Content-Type: application/xml";
$result = oldHttpRequest($basketUrl, $header, 'GET');
$basket = new SimpleXMLElement($result);
}
return $basket;
}
function oldHttpRequest($url, $header = null, $method = 'GET', $data = null, $len = null) {
switch ($method) {
case 'GET':
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
if (!is_null($header)) curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
break;
case 'POST':
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_POST, true); //not createBasket but addBasketItem
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
break;
}
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
?>
There are also two other parts of the script: a form to add a sample T-shirt to the basket (example.php) and a script that makes the AJAX calls (shop-controller.js). I can post them if needed, but they contain no session handling.
Update: maybe the problem is not related to sessions. The BasketId is lost, but PHPSESSID stays the same in the browser cookies.
I ran the following test over the last 3 days (with different computers and browsers):
Empty the browser cookies, then start a new session during the afternoon.
Add 1 item to the basket, write down the BasketId, and check the browser cookies to write down the PHPSESSID.
Usually around midnight, the basket empties itself.
PHPSESSID stays the same in my browser cookies, even after the basket empties itself.
However, the BasketId is not the same: the one used during the afternoon is lost and a new one is generated.
Server is CentOS 5.9 - PHP Version 5.2.9 (from OVH). Dedicated server on a dedicated IP.
First you need to find out whether the problem lies in session garbage collection or in a logic error within the code. For that, you can add a marker to the session:
// Add this right after session_start()
if (!isset($_SESSION['mySessionCheck'])) {
$_SESSION['mySessionCheck'] = "This session (" . session_id() . ") started " . date("Y-m-d H:i:s");
}
// For HTML pages, add this:
echo '<!-- ' . $_SESSION['mySessionCheck'] . ' -->';
// For AJAX pages, add "mySessionCheck" to the JSON response:
echo json_encode(
array(
"c" => array(
"u" => $_SESSION['checkoutUrl'],
"q" => $basketData[0],
"l" => $basketData[1]
),
"mySessionCheck" => $_SESSION['mySessionCheck']
)
);
If this message changes at the same time the basket empties, then you'll know for sure it's a problem with PHP sessions.
In that case, there are a few things you can try:
1) You are doing
$lifetime=60 * 60 * 24 * 365;
$domain = ".mywebsite.com";
session_set_cookie_params($lifetime,"/",$domain);
@session_start();
But according to a user contributed note from PHP.net docs:
PHP's Session Control does not handle session lifetimes correctly when using session_set_cookie_params().
So you may try using setcookie() instead:
$lifetime=60 * 60 * 24 * 365;
session_start();
setcookie(session_name(),session_id(),time()+$lifetime);
Even though it's a 4-year-old note, as pointed out in the comments, I tested it and it still happens (I'm on PHP 5.5.7, Windows Server 2008, IIS/7.5). Only setcookie() produced the HTTP headers that change the expiry date (example with $lifetime set to 600):
Set-Cookie: PHPSESSID=(the id); expires=Mon, 22-Jun-2015 15:03:17 GMT; Max-Age=600
2) If you're using a Debian server or some derivative, it uses a cron job to clear out PHP sessions, so you might try:
Increasing server's configured maxlifetime;
Saving your sessions somewhere else;
Using memcached.
3) To find out if there is some process clearing your sessions, you can place a watch on the directory where the session files are stored (actual path varies from server to server, use session_save_path to find out the location on yours). I'm no server admin, but I've read you can use auditctl for that, just make sure you log who made the changes to your files.
4) If you don't have access to server configuration, or don't want to depend on server config (good if you switch hosts), you can implement your own session handler. Check out this example by Pedro Gimeno.
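To give an idea of what option 4 involves, here is a minimal sketch of a file-backed handler built on PHP's SessionHandlerInterface. It assumes PHP 7 or later (the interface does not exist on the PHP 5.2 mentioned in the question), and the class name, directory and file naming are all made up for illustration; it is not the linked example.

```php
<?php
// Hypothetical file-backed handler; class name and storage layout are illustrative only.
class FileSessionHandler implements SessionHandlerInterface
{
    private $dir;

    public function __construct($dir)
    {
        $this->dir = rtrim($dir, '/');
        if (!is_dir($this->dir)) {
            mkdir($this->dir, 0700, true);
        }
    }

    private function path($id)
    {
        // Drop any character that is not valid in a session id before building a path.
        return $this->dir . '/sess_' . preg_replace('/[^A-Za-z0-9,-]/', '', $id);
    }

    public function open($savePath, $name): bool { return true; }

    public function close(): bool { return true; }

    public function read($id): string
    {
        $file = $this->path($id);
        return is_file($file) ? (string) file_get_contents($file) : '';
    }

    public function write($id, $data): bool
    {
        return file_put_contents($this->path($id), $data, LOCK_EX) !== false;
    }

    public function destroy($id): bool
    {
        $file = $this->path($id);
        return !is_file($file) || unlink($file);
    }

    public function gc($maxLifetime): int
    {
        // Delete files not modified within $maxLifetime seconds; return how many were removed.
        $removed = 0;
        foreach (glob($this->dir . '/sess_*') ?: array() as $file) {
            if (filemtime($file) + $maxLifetime < time() && unlink($file)) {
                $removed++;
            }
        }
        return $removed;
    }
}

$handler = new FileSessionHandler(sys_get_temp_dir() . '/my_app_sessions');
// In a real script, register it before session_start():
// session_set_save_handler($handler, true);
```

Because the files live in a directory the application chooses, a system cron job that cleans the default session path should not touch them; expiry is then governed only by the gc() method and session.gc_maxlifetime.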
Put @session_start(); only once, at the top of every script.
Also put it at the top of your AJAX script.
For example:
@session_start();
// you may use session script here or header file
include("header.php");
//some code. you may use session script here or header file
include("main.php");
//-----------next code
I'm posting here, even though this is an old question, in case someone else runs into this problem. Check session.gc_maxlifetime in php.ini, or print ini_get('session.gc_maxlifetime');. You can set it in php.ini or in your PHP script. On my PHP version the default is 1440 seconds; I changed it to 1 month, which is enough in my case.
Also, after starting the session you can refresh the cookie:
setcookie(session_name(),session_id(),time() + $sessionLifetime, "", "", false, true);
I hope this helps.
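Before changing anything, it can help to see which values are actually in effect. The sketch below only reads the relevant settings with ini_get(); the session_lifetime_report() helper is hypothetical and the list of keys is just the ones that seemed relevant here.

```php
<?php
// Hypothetical helper: collects the settings that decide when session data is discarded.
function session_lifetime_report()
{
    $keys = array(
        'session.gc_maxlifetime',  // seconds after which server-side data may be collected
        'session.gc_probability',  // with gc_divisor: chance that gc runs on a request
        'session.gc_divisor',
        'session.cookie_lifetime', // lifetime of the browser cookie (0 = until browser closes)
        'session.save_path',       // where the default handler stores session files
    );
    $report = array();
    foreach ($keys as $key) {
        $report[$key] = ini_get($key);
    }
    return $report;
}

foreach (session_lifetime_report() as $key => $value) {
    echo $key . ' = ' . var_export($value, true) . "\n";
}
```

Note that the cookie lifetime and gc_maxlifetime are independent: a long-lived cookie does not keep the server-side data alive once garbage collection has removed it.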
In my case, I replaced session_destroy(); with session_unset(); and the problem was solved.
I am trying to access and then print (or just be able to use) the source code of any website using PHP. I am not very experienced and am now thinking I might need to use JS to accomplish this. So far, the code below accesses the source code of a web page and displays the web page... What I want it to do instead is display the source code. Essentially, and most importantly, I want to be able to store the source code in some sort of variable so I can use it later. And eventually read it line-by-line - but this can be tackled later.
$url = 'http://www.google.com';
function get_data($url)
{
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
echo get_data($url); //print and echo do the same thing in this scenario.
Consider using file_get_contents() instead of curl. You can then display the code on your page by replacing every opening bracket (<) with &lt; and then outputting it to the page.
<?php
$code = file_get_contents('http://www.google.com');
$code = str_replace('<', '&lt;', $code);
echo $code;
?>
Edit:
Looks like curl is actually faster than FGC, so ignore that suggestion. The rest of my post still stands. :)
You should try to print the result between <pre></pre> tags;
echo '<pre>' . get_data($url) . '</pre>';
I rewrote your function. The function can return the source with lines or without lines.
<?php
function get_data($url, $Addlines = false){
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
$content = curl_exec($ch);
$content = htmlspecialchars($content); // Prevents the browser from parsing the HTML
curl_close($ch);
if ($Addlines == true){
$content = explode("\n", $content);
$lines = ''; // start with an empty string so .= has something to append to
$Count = 0;
foreach ($content as $Line){
$lines .= 'Line '.$Count.': '.$Line.'<br />';
$Count++;
}
return $lines;
} else {
$content = nl2br($content);
return $content;
}
}
echo get_data('https://www.google.com/', true); // Source code with lines
echo get_data('https://www.google.com/'); // Source code without lines
?>
Hope it gets you on your way.
Add a header Content-Type: text/plain
header("Content-Type: text/plain");
Use htmlspecialchars() in php to print the source code.
In your code, use
return htmlspecialchars($data);
instead of
return $data;
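Since the question also mentions storing the source in a variable and reading it line by line later, here is a small sketch of that step. It works on a hard-coded string standing in for the result of get_data($url), and source_lines() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: splits page source into lines, each escaped for safe display.
function source_lines($source)
{
    // \R matches any newline sequence (\n, \r\n or \r).
    $lines = preg_split('/\R/', $source);
    return array_map('htmlspecialchars', $lines);
}

// Stand-in for the string returned by get_data($url).
$source = "<html>\n<body>\n<p>Hello & welcome</p>\n</body>\n</html>";

$lines = source_lines($source);
foreach ($lines as $number => $line) {
    echo ($number + 1) . ': ' . $line . "\n";
}
```

Keeping the unescaped $source in its own variable means it can still be parsed or searched later, while the escaped copy is used only for display.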