I started a little personal project just for fun. I hope posting what I'm doing here doesn't break any of the local rules; if so, let me know and I'll take the question down. No need to flag me for anything.
I'm trying to pull the background image URL of my Chromium homepage. For reference, the URL is https://www.mystart.com/new-tab/newtab/. When going to this page, nice background images are loaded. I'm trying to grab those images for personal, not commercial, use.
What I've traced down is that the page listed above calls out to another, similar page: https://www.mystart.com/new-tab/newtab/newtab/. On lines #1622 through #1636 of that page's source, two significant lines read:
var fastload = JSON.parse(localStorage.getItem('FASTLOAD_WALLPAPER_557b2c52a6fde1413ac3a48a'))
...
var url = fastload.info.cache_url || fastload.info.data_uri || fastload.info.image;
The value assigned to url is the URL of the background image. If I drop into the Chromium console and run console.log(url), I see the exact data I'm trying to scrape. I'm wondering how to do that through Python, since the actual text value of url never appears in the fetched page source.
I have looked all over to try to find the localStorage object definition, with no luck. I'm pulling the page with result = requests.get("https://www.mystart.com/new-tab/newtab/newtab/") and then looking through result.text. I've also tried using BeautifulSoup to parse through things (not that this is really any different), but I'm still not getting the results I'm looking for.
Being a hobbyist coder, I feel like I'm missing something simple. I've searched for answers, but I must be using the wrong keywords: I'm finding plenty of answers for parsing URLs that appear in the page text, but none for reading the contents of a JavaScript variable.
If you look at the requests being made, there is a JSON response with info for 350 images. The image_id is used in the URL, e.g.
https://gallery.mystartcdn.com/mystart/images/<image_id>.jpeg
So for id=154_david-wilson-moab:
https://gallery.mystartcdn.com/mystart/images/154_david-wilson-moab.jpeg
Parse the JSON and build the URL for each image, as in the sketch below.
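A minimal sketch of that mapping, in JavaScript since you already have the browser console open (it ports to Python in a couple of lines). The shape of the JSON, an array of objects each carrying an image_id field, is an assumption here; check the actual response in the network tab.
var CDN_BASE = 'https://gallery.mystartcdn.com/mystart/images/';

// images: the parsed JSON array, assumed shape [{ image_id: '...' }, ...]
function imageUrls(images) {
    return images.map(function (img) {
        return CDN_BASE + img.image_id + '.jpeg';
    });
}

// imageUrls([{ image_id: '154_david-wilson-moab' }])
// -> ['https://gallery.mystartcdn.com/mystart/images/154_david-wilson-moab.jpeg']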
Note: this is not an answer to your question as asked, but it looks like an XY problem; this solves the underlying problem of retrieving the image URLs.
Related
I want to get the screenshots from PageSpeed Insights. Using the API, I used some code that I found here: https://embed.plnkr.co/plunk/c7fAFx, but it doesn't work.
Please help me! I am learning to code.
Why doesn't the linked code work?
Well, because it is ancient and attempts to use version 1 of the PageSpeed Insights API.
The API is currently on version 5, and v1 no longer exists as a public API, which is why the code does not work.
How to recreate the functionality of this App?
As you are learning to code I will lay out the steps for you and then you can research how to do each step and use that to learn.
I will warn you as a beginner there is a lot to learn here. However on the flip side if you manage to work out how to do the below you will have a good first project that has covered multiple areas of JS development.
As you have marked this "JavaScript" I have assumed you want to do this in the browser.
This is fine up until the point where you want to save the images, as you will have to work out how to ZIP them, which is probably the most difficult part.
I have highlighted the steps you need to learn / implement in bold
1. First call the API:
The current URL for Page Speed Insights API is:
https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://yoursite.com
Just change url=https://yoursite.com to any site you want to gather the images from.
For a small amount of requests a day you do not need to worry about an API key.
However if you do already have an API key just add &key=yourAPIKey to the end of the URL (replacing the yourAPIKey part obviously :-P).
You want to make an AJAX call to the API URL first (steps 1 and 2 are sketched together at the end of step 2).
2. Parse the response
When you get a response, it is going to be a large chunk of JSON.
You need to parse the JSON response and turn it into a JavaScript Object or Array you can work with.
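A minimal sketch of steps 1 and 2 together, using fetch as the AJAX call (one of several ways to make it); the url= value is the only part you need to change:
var endpoint = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://yoursite.com';

fetch(endpoint)
    .then(function (response) { return response.json(); }) // step 2: parse the JSON
    .then(function (lighthouseResults) {
        console.log(lighthouseResults); // the object the remaining steps work with
    })
    .catch(function (err) { console.error('API call failed', err); });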
3. Find the relevant parts
So once you have a JavaScript Object you can work with you are looking for "final-screenshot" and "screenshot-thumbnails".
These are located under "audits".
So, for example, if you parsed into an object called lighthouseResults, you would be looking for lighthouseResults['audits']['final-screenshot'] or lighthouseResults['audits']['screenshot-thumbnails'].
"final-screenshot" contains how the site looked after it was loaded, so if you just want that you want this element.
This contains an image that is base64 encoded (lighthouseResults['audits']['final-screenshot']['details']['data']).
"screenshot-thumbnails" is the part you want if you want the "filmstrip" of how the site loads over time. This contains a list of the thumbnails base64 encoded.
To access each of these you need to loop over the items located at lighthouseResults['audits']['screenshot-thumbnails']['details']['items'] and take the ['data'] part of each item.
Find the parts that you want and store them in a variable, as in the sketch below.
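A sketch of step 3, following the answer's naming: lighthouseResults is the parsed object that contains the audits section (in the raw v5 response this section sits under the lighthouseResult key).
var audits = lighthouseResults['audits'];

// the single "after load" screenshot, as a base64-encoded image string
var finalShot = audits['final-screenshot']['details']['data'];

// the filmstrip: one base64 string per thumbnail
var thumbnails = audits['screenshot-thumbnails']['details']['items']
    .map(function (item) { return item['data']; });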
4a. Decode the image(s)
Once you have the image(s) in a variable, they are still base64-encoded strings. You need to convert them into usable JPG images.
To do this you need to base64 decode each image.
For now I would just display them in the browser once they are decoded.
Learn how to decode a base64-encoded image (one way is sketched below).
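A minimal sketch of a browser-side decode: atob turns the base64 string into binary, which can then be packed into a Blob. It assumes base64Data holds just the base64 part, with any data:image/jpeg;base64, prefix already stripped.
var binary = atob(base64Data); // base64Data: a string from step 3
var bytes = Uint8Array.from(binary, function (ch) { return ch.charCodeAt(0); });
var blob = new Blob([bytes], { type: 'image/jpeg' });
// blob is now a real image you can save, zip, or show via URL.createObjectURL(blob)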
4b. Alternative to decoding the image
As the images are base64 encoded they can be displayed directly in a browser without decoding first.
You can just add an image whose src is the base64 image string you gathered in step 3.
If you just want to display the images this is much easier.
Add images to the screen and set the src to the base64 image string you have from step 3, as sketched below.
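A sketch of step 4b; the strings this API returns usually carry the data:image/... prefix already, but the check below is harmless either way:
var img = document.createElement('img');
img.src = finalShot.indexOf('data:image/') === 0
    ? finalShot
    : 'data:image/jpeg;base64,' + finalShot;
document.body.appendChild(img);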
Saving the images
Now you said in a comment you want to save the images. Although this can be done via JavaScript it is probably a little advanced for starting out.
If you want to save the images you really want to be doing that server side.
However if you do want to download the images (filmstrip) in the browser then you want to look into a zip utility such as jszip.js.
The beauty of this is they normally want you to convert the images to base64 first before zipping them, so it may not actually be that difficult!
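As a hedged sketch with JSZip (the file names here are made up), you would add each base64 thumbnail from step 3 to the zip and then trigger a download:
var zip = new JSZip();
thumbnails.forEach(function (data, i) {
    // JSZip accepts base64 strings directly; strip any data: prefix first
    zip.file('thumbnail-' + i + '.jpg', data.replace(/^data:image\/\w+;base64,/, ''), { base64: true });
});
zip.generateAsync({ type: 'blob' }).then(function (blob) {
    var a = document.createElement('a');
    a.href = URL.createObjectURL(blob);
    a.download = 'filmstrip.zip';
    a.click();
});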
I'm implementing a small image-uploading function in my web page, nothing too fancy, and to that end I think vgy.me is a good tool. From what I understand, we can upload an image to the site via a POST method in a form. It returns a JSON response for every image uploaded, which contains a link to the image among other things (important, because I intend to use that link later). There's even a helpful little example of the same on its API page (link).
My question is: how can I get that JSON response using vanilla JavaScript? My initial searches have turned up techniques that are server-side, which I obviously can't implement, because it isn't my server. Is there a way to use the default POST method of HTML to get the JSON value, or have I misinterpreted the instructions?
I'm not using the jQuery code given on the page, because I've no knowledge of any JavaScript framework, and I'd rather not simply copy and paste if I can help it.
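One framework-free way is to build the POST body yourself with FormData and read the JSON reply via fetch. This is only a sketch: the endpoint and the file field name ('file') are assumptions, so check vgy.me's API page for the exact values.
var fileInput = document.querySelector('input[type="file"]');
var form = new FormData();
form.append('file', fileInput.files[0]); // field name: see the API docs

fetch('https://vgy.me/upload', { method: 'POST', body: form })
    .then(function (response) { return response.json(); })
    .then(function (json) {
        console.log(json); // the response object, with the image link inside
    })
    .catch(function (err) { console.error('Upload failed', err); });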
I am trying to make a website that pulls content from a wiki page down into my page.
Before anyone says it is illegal to scrape a website: mind you, this is a wiki site, and at the bottom of each page of that site there is:
Content is available under Attribution-Noncommercial-Share Alike 3.0 Unported.
Meaning I am free to use and REUSE the info that is provided to me.
This is the wiki page: http://wiki.mabinogiworld.com/
Basically, I am trying to take the server online status table directly and put it into my webpage, but at the same time I want to keep it updated, so it has to re-fetch the table each time the webpage is refreshed.
With this I ran into the cross-domain issue, and found something related to YQL that seems to be able to help me, but I still can't figure it out.
This is what I did so far:
YUI().use("yql", function (Y)
{
    var query = 'SELECT * FROM html WHERE url="http://wiki.mabinogiworld.com/" and xpath="//div/table"';
    Y.YQL(query, function (results)
    {
        // results.query.results.table holds the list of matched tables
        var temp = results.query.results.table;
        var size = temp.length;
        for (var i = 0; i < size; i++)
        {
            //Loop through the result and find the exact table I want
        }
    }); // close the Y.YQL callback
}); // close the YUI().use callback
With the above code (the loop was too messy, so I cut it out) I am able to get the exact table I want, with all its sub-columns and rows, but it is returned in a structure that I have no idea how to translate back into HTML.
What can I do to get the table from the wiki page and put it onto my webpage? And what is the type of the results variable, anyway? I can't seem to use it in any way other than plain property access.
Thank you.
Try doing something like what is posted here: YQL JSON script not returning?
Basically, it makes cross-domain AJAX possible with the help of YQL.
Source: http://net.tutsplus.com/tutorials/javascript-ajax/quick-tip-cross-domain-ajax-request-with-yql-and-jquery/
Well, if you really want to keep the formatting and the style of the table: make your own table, put your own style onto it, and then extract the info out of YQL and populate the table. That way it can be done with your method. YQL is really useful; I started playing with it a bit and find it very powerful.
Not sure if that would violate the copyright rules or not though, since you are indeed reusing the data in your own format.
YQL Solution
First off, your XPath query is way too broad. Looking at the wiki page's source, I came up with this:
//div[@id='mw-content-text']/table//table[@class='center']
Unfortunately, the table that you want doesn't have an ID on it, so selecting tables with a center class was the best I could do. This returns 5 different tables; you want the first one. I tried to use the "first element" predicate (table[@class='center'][1]), but that didn't seem to do anything. Notice that the XML in the <results> element is straight XHTML that you could dump into your page. (That's assuming that you're requesting the results as XML, not JSON.)
I found Yahoo's YQL Console really helpful. It allows you to fine tune your query before trying to incorporate it with Javascript to parse the results.
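Plugged into your snippet, the only change is the query string (mind the quote escaping):
var query = 'SELECT * FROM html WHERE url="http://wiki.mabinogiworld.com/"' +
    ' and xpath="//div[@id=\'mw-content-text\']/table//table[@class=\'center\']"';
Y.YQL(query, function (results)
{
    // results.query.results.table now holds just the five candidate tables;
    // take the first one
});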
jQuery Solution
This isn't the optimal solution, but it circumvents the need to parse XML in Javascript or convert JSON to HTML. You can do an AJAX call to get the HTML and then strip out everything besides the table:
var scrapeUrl = 'http://www.example.com'; // include the protocol, or the URL is treated as relative
$.ajax({
    type: "GET",
    url: scrapeUrl,
    success: function (html) {
        var $scrapedElement = $(html).find("h1");
        $("#scrapedDataDiv").html($scrapedElement);
    },
    error: function () {
        alert("Problem getting table");
    }
});
In this example, the code downloads the page at www.example.com and scrapes out all of the h1 tags, thanks to jQuery's handy selectors. The h1 tags are then placed in a div with the id scrapedDataDiv.
Obviously, you still have to deal with XSS/Same Origin issues. You can do this by setting up a proxy on your server.
I am trying to pass a single piece of information (using a query string) to my Facebook page tab application.
For example, if the user clicks on this URL:
http://apps.facebook.com/myappname/?app_data=mydata
I would want to be able to access 'mydata' in the app.
From the reading I've done, Facebook does not allow GET requests, but it's possible to do this using app_data and signed_request.
However, I have not been able to find any information on how to set this up using the JavaScript SDK (is that even possible?) and .NET; ideally, I would be able to implement this with just JavaScript. I have no idea how to set up and read data using a signed_request, and the documentation around signed requests is confusing me more than helping. I would really like simple instructions on how to implement this feature.
EDIT:
I think I've almost figured it out. In case anyone else is looking for an answer to this, I put what I did so far below. Also, if you see any room for improvement please let me know. I don't claim this is perfect by any means, but it works.
First, the url needs to be the page tab url, not the direct url to the app (like I posted above):
http://www.facebook.com/pages/PageName/########?sk=app_#########&app_data=mydata
Here is the javascript code that is working for me:
//get the value of the signed request (here, from a hidden field populated server-side) and split it
var signedRequest = $('#mainContent_hfSigned').val().split(".");
//decode the base64 payload (window.atob does not work on IE - needs to be replaced)
var decodedJson = window.atob(signedRequest[1]);
//parse the JSON to gain access to the parameters
var jsonParams = jQuery.parseJSON(decodedJson);
//append the app_data variable to check it's being read properly
$('.message').append('Your app_data param is "' + jsonParams.app_data + '"');
On jsfiddle:
http://jsfiddle.net/C3xsm/1/
One thing I know I still need to do is replace atob with a base64url decoder for javascript. I'm thinking about using this one:
http://code.google.com/p/stringencoders/source/browse/trunk/javascript/base64.js
If it works well, I'll update it here. Or if anyone knows of something that works better, please let me know.
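For reference, a base64url decode is just an alphabet swap plus padding in front of a normal base64 decode, so a minimal sketch can stay small (older IE would still need an atob polyfill underneath):
function base64UrlDecode(input) {
    // swap the URL-safe characters back to the standard alphabet
    var base64 = input.replace(/-/g, '+').replace(/_/g, '/');
    // pad to a multiple of four so atob accepts it
    while (base64.length % 4) {
        base64 += '=';
    }
    return window.atob(base64);
}
var decodedJson = base64UrlDecode(signedRequest[1]);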
ideally, I would be able to implement this with just JavaScript
That's not possible, because, as you already found out, the signed_request parameter is POSTed to your page, and JavaScript itself has no access to POST parameters.
The documentation on the signed_request parameter has instructions on how to parse/decode it server-side; the example is in PHP, but it should be easy to transfer the basic algorithm (if you can call it that) to your .NET environment.
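If you end up decoding server-side in Node rather than .NET, a hedged sketch of the documented algorithm (split, base64url-decode the payload, verify the HMAC-SHA256 signature with your app secret) could look like this; appSecret is a placeholder for your own value:
var crypto = require('crypto');

function parseSignedRequest(signedRequest, appSecret) {
    var parts = signedRequest.split('.');
    var sig = Buffer.from(parts[0].replace(/-/g, '+').replace(/_/g, '/'), 'base64');
    var payload = parts[1];
    var data = JSON.parse(
        Buffer.from(payload.replace(/-/g, '+').replace(/_/g, '/'), 'base64').toString('utf8'));
    // verify the signature before trusting anything in the payload
    var expected = crypto.createHmac('sha256', appSecret).update(payload).digest();
    if (sig.length !== expected.length || !crypto.timingSafeEqual(sig, expected)) {
        throw new Error('Bad signed_request signature');
    }
    return data; // data.app_data holds the value passed on the URL
}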
If I'm loading the data for the markers from a database, do I write the output queried from the DB into a JavaScript file, or is there a cleaner way of doing it?
Thanks.
Yeah, writing to a file is a good way to do it. Just write the data as JSON. Your file would look like:
var map = {waypoints:[...]};
And then you can do:
for (var i = 0; i < map.waypoints.length; ++i) {
    addWaypoint(map.waypoints[i]);
}
I actually do some static caching of nodes using this method: http://www.trailbehind.com/site_media/javascript/gen/national-parks.js
We use that set of National Parks a lot, so we cache it. But we also have urls where you can fetch JSON for a node on the fly, such as: http://www.trailbehind.com/map/node/7538973/632/735/
This URL gets the map for node 7538973, and specifies the dimensions of their map in pixels as well.
The needed JavaScript can of course be wrapped in whatever language you prefer to use; see e.g. pymaps for a Python example. While pymaps actually inserts the JS code into an HTML template, if you're writing a web app you can perfectly well choose to serve that JS code on the fly at an appropriate URL and use that URL in a <script> tag in your pages.
Depending on the size of your application, you may want to consider printing out plain javascript.
I have a map that uses server-side clustering, so markers update frequently. I found that parsing JSON markers slowed the app significantly, and simply wasn't necessary.
If speed is an issue, I'd suggest removing all of the unnecessary layers possible (JSON, AJAX, etc.). If it's not, you'll be just fine with JSON, which is cleaner.
I agree with Andrew's answer (+1).
I guess the only point I would add is that rather than including some server side generated JavaScript, you could use an AJAX request to grab that data. Something like:
var request = new Request.JSON({
    url: 'get_some_json.php',
    onSuccess: function (data) {
        // do stuff with the data
    }
}).get();
(This is a Mootools AJAX thing, but you could use any kind of AJAX request object).
Edit: ChrisB makes a good point about the performance of parsing JSON responses, and re-reading my answer I certainly didn't make myself clear. I think AJAX requests are suitable for re-requesting data based on parameters generated by user interaction. An example use case might be a user filtering the data displayed on the map: you might grab the filtered data via an AJAX/JSON request rather than reloading the page.