Just for fun, I'm trying to implement a "15 puzzle", but with 16 image tiles (cut from a single artist photo) instead.
The project is split into two sides: a Python CGI script that performs the Last.fm query and splits the image into Y x Z chunks. When the Python script finishes, it outputs a JSON string that contains the location (on the server), the extension, etc.
{"succes": true, "content": {"nrofpieces": 16, "size": {"width": 1096, "height": 961}, "directoryname": "Mako", "extension": "jpeg"}}
On the other side is an HTML/JS/(CSS) combo that queries the CGI script for the images.
$(document).ready(function () {
    // Cached element lookups (Dutch: artiest = artist, rijen = rows,
    // kolommen = columns, speelveld = playing field)
    var artiest = $("#artiest");
    var rijen = $("#rijen");
    var kolommen = $("#kolommen");
    var speelveld = $("#speelveld");
    var search = $("#search");

    $("#buttonClick").click(function () {
        var artiestZ = artiest.val();
        var rijenZ = rijen.val();
        var kolommenZ = kolommen.val();
        // Let jQuery build (and URL-encode) the query string
        var params = { artiest: artiestZ, rijen: rijenZ, kolommen: kolommenZ };
        $.getJSON("http://localhost:8000/cgi-bin/cgiScript.py", params, function (jsonString) {
            if (jsonString.success === true) {
                console.log(jsonString);
                var baseUrl = "http://localhost:8000/";
                var extension = jsonString.content.extension;
                var url = baseUrl + jsonString.content.directoryname + "/";
                var amountX = rijenZ;
                var amountY = kolommenZ;
                for (var i = 0; i < amountX; i += 1) {
                    for (var p = 0; p < amountY; p += 1) {
                        // Tile files are named <directoryname><row>_<column>.<extension>
                        var img = new Image();
                        img.setAttribute("src", url + jsonString.content.directoryname + i + "_" + p + "." + extension);
                        document.getElementById("speelveld").appendChild(img);
                    }
                }
            } else {
                // Search failed. Deal with it.
            }
        });
    });
});
where the various IDs refer to the corresponding HTML elements (text fields, buttons, and divs).
Below is a screenshot of the folder that contains the image files.
Now, coming to the point: all the generated img tags' src attributes look correct, yet some images fail to load while others load fine. I also noticed that the failures occur at 2-second intervals. Is there some kind of timeout involved?
All this is being run on a local machine, so disk speed and CPU shouldn't really affect the matter. Also, from what I understand, the code that creates the img tags runs in the getJSON callback, meaning it only runs once getJSON has finished and received a reply.
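One thing I could try to narrow this down (a diagnostic sketch I haven't wired in yet; the helper name is made up) is attaching onload/onerror handlers with timestamps to every tile, so I can see exactly which requests fail and when:

// Hypothetical diagnostic helper: log success/failure per tile with a timestamp
function loadTile(src) {
    var img = new Image();
    img.onload = function () {
        console.log(Date.now() + " loaded: " + src);
    };
    img.onerror = function () {
        console.log(Date.now() + " FAILED: " + src);
    };
    img.setAttribute("src", src);
    document.getElementById("speelveld").appendChild(img);
}

If the failures cluster at fixed intervals, that would point at the server (refusing or queueing connections) rather than at the markup.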
Does the great Stack Overflow community have an idea what's happening here?
To share my knowledge/experience with the great Stack Overflow community:
Small backstory
After progressing a bit further into the project, I started running into various issues, ranging from JSON parsing to missing Access-Control-Allow-Origin: * headers, which made it very hard to get the Ajax request (client ==> Python CGI) working.
In the meantime I also started developing on my main desktop (which, for some reason, either has massive issues with Python versioning or none at all). Because the terminal on my desktop runs Python 3.4+, there was no CGIHTTPServer module. After a small amount of digging, I found that CGIHTTPServer had been merged into http.server. Yet when running plain old python -m http.server, I noticed the CGI script wouldn't run; it would just be displayed as text. Of course, I had forgotten the --cgi option.
Main solution
The times I was successfully using CGIHTTPServer, I had trouble: the images wouldn't load, as described above. I suspect the module simply couldn't handle that many simultaneous requests, meaning that when Y x Z requests suddenly came in, it would struggle to deliver all the data ==> connection refused.
Since switching to python -m http.server --cgi, no problems whatsoever. Currently working on a Bootstrap grid for all those images!
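As an aside: if you are stuck on a server that chokes on bursts of parallel requests, a client-side workaround (an untested sketch of mine, not part of the original fix) is to load the tiles strictly one after another, so the server never sees more than one outstanding request:

// Hypothetical helper: request tile URLs sequentially to avoid request bursts
function loadSequentially(urls, container) {
    if (urls.length === 0) return;
    var img = new Image();
    img.onload = img.onerror = function () {
        loadSequentially(urls.slice(1), container); // move on to the next tile
    };
    img.src = urls[0];
    container.appendChild(img);
}

Switching servers made this unnecessary in my case, but it can keep a struggling single-threaded server alive.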
Thanks @Lashane and @Ruud.
Related
I am trying to scrape data from an interactive map (looking to get crime data for a county). I am using R (rvest) and trying to use PhantomJS as well. I'm new to web scraping, so I don't yet really understand how all the elements work together (but I'm trying to get there).
The problem, I believe, is that after I run PhantomJS and load the resulting HTML with R's rvest package, I end up with more scripts and no clear data in the HTML. My code is below.
writeLines("var url = 'http://www.google.com';
var page = new WebPage();
var fs = require('fs');
page.open(url, function (status) {
just_wait();
});
function just_wait() {
setTimeout(function() {
fs.write('cool.html', page.content, 'w');
phantom.exit();
}, 2500);
}
", con = "scrape.js")
A function that takes in the URL that I want to scrape:
s_scrape <- function(url = "https://gis.adacounty.id.gov/apps/crimemapper/",
js_path = "scrape.js",
phantompath = "/Users/alihoop/Documents/phantomjs/bin/phantomjs"){
# this section will replace the url in scrape.js to whatever you want
lines <- readLines(js_path)
lines[1] <- paste0("var url ='", url ,"';")
writeLines(lines, js_path)
command = paste(phantompath, js_path, sep = " ")
system(command)
}
Execute the js_scrape() function and get an HTML file saved as "cool.html":
js_scrape()
Where I get stuck is what to do next with the R code below:
map_data <- read_html('cool.html') %>%
html_nodes('script')
The output I get back via PhantomJS is just scripts again. I'm looking for help on how to proceed when what I'm faced with (in my mind) is JavaScript nested inside JavaScript(?)
Thank you!
This site uses JavaScript to make queries to the server. One solution is to reproduce the REST request and read the returned JSON directly. This avoids the need for PhantomJS.
In your browser's developer tools, looking through the XHR requests, you will find one or more named "query", with a link similar to: "https://gisapi.adacounty.id.gov/arcgis/rest/services/CrimeMapper/CrimeMapperWAB/FeatureServer/11/query?f=json&where=1%3D1&returnGeometry=true&spatialRel=esriSpatialRelIntersects&outFields=*&outSR=102100&resultOffset=0&resultRecordCount=1000"
Read this JSON response directly and convert it to a list using the jsonlite package:
library(jsonlite)
output<-jsonlite::fromJSON("https://gisapi.adacounty.id.gov/arcgis/rest/services/CrimeMapper/CrimeMapperWAB/FeatureServer/11/query?f=json&where=1%3D1&returnGeometry=true&spatialRel=esriSpatialRelIntersects&outFields=*&outSR=102100&resultOffset=0&resultRecordCount=1000")
output$features
Find the first number in the link (11 in this case): "FeatureServer/11/query?f=json". This number determines which crime type you query the server for. I found it can take values from 0 to 11: enter 0 for arson, 4 for drugs, 11 for vandalism, etc.
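The same request can be reproduced from any HTTP client, not only R. As an illustration, a rough Node.js sketch (the query string is the one above; only the layer number changes, here 4 for drugs):

// Hypothetical Node.js version of the same REST query
var https = require('https');
var url = 'https://gisapi.adacounty.id.gov/arcgis/rest/services/CrimeMapper' +
          '/CrimeMapperWAB/FeatureServer/4/query?f=json&where=1%3D1' +
          '&returnGeometry=true&spatialRel=esriSpatialRelIntersects' +
          '&outFields=*&outSR=102100&resultOffset=0&resultRecordCount=1000';

https.get(url, function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
        var output = JSON.parse(body);
        console.log(output.features.length + ' features'); // same as output$features in R
    });
});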
I'm building a website right now in which I'm trying to execute JavaScript code to obtain specific elements of a totally separate site's web page, given its URL. I've figured out how to achieve this with Selenium WebDriver on my laptop in Python:
driver = webdriver.Firefox()
driver.get(url)
price = driver.execute_script("""
    var priceElements = document.getElementById('priceblock_ourprice');
    var prices = [];
    for (var i = 0; i < 3; i++) { prices.push(priceElements.children[i].innerHTML); }
    var price = prices[1] + '.' + prices[2];
    return price;
""")
driver.close()
where it opens the Firefox browser, runs the JS, and then finds the price value I want. I know there is also a headless way to do it that doesn't need to physically open a browser, but I'm trying to figure this out for the case of doing something similar on the website I'm building.
Is there some way I can achieve this result with the code running on the Apache web server instead of on my personal machine? I'm just not sure whether this is possible without the machine it's running on having a browser it can open.
I hope I'm being clear enough about what I'm doing, but if anyone needs clarification I'll be happy to answer any questions about this situation.
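For reference, the headless route mentioned above does not need a display at all, so it can run on a server. A sketch using Puppeteer, which drives headless Chrome (only the element id comes from my code above; the rest is illustrative):

// Hypothetical headless version of the same extraction
const puppeteer = require('puppeteer');

async function getPrice(url) {
    const browser = await puppeteer.launch(); // headless by default
    const page = await browser.newPage();
    await page.goto(url);
    const price = await page.evaluate(() => {
        const el = document.getElementById('priceblock_ourprice');
        const prices = [];
        for (let i = 0; i < 3; i++) prices.push(el.children[i].innerHTML);
        return prices[1] + '.' + prices[2];
    });
    await browser.close();
    return price;
}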
I did something similar with lxml and urllib.
Here is an example:
from lxml import html
import urllib  # Python 2; on Python 3, use urllib.request instead

pageLink = ""
page = urllib.urlopen(pageLink)
pageString = page.read()
pageContent = html.fromstring(pageString)
imgLink = pageContent.xpath('//img[@id="img-1"]/@src')[0]
I've seen some answers to this that refer the asker to other libraries (like PhantomJS), but I'm here wondering if it is at all possible to do this in just Node.js.
Consider my code below. It requests a web page using request, then uses cheerio to explore the DOM and scrape the page for data. It works flawlessly, and if everything had gone as planned, I believe it would have output a file exactly as I imagined it.
The problem is that the page I am requesting builds the table I'm looking at asynchronously, using either Ajax or JSONP; I'm not entirely sure how .jsp pages work.
So here I am, trying to find a way to "wait" for this data to load before I scrape it for my new file.
var cheerio = require('cheerio'),
    request = require('request'),
    fs = require('fs');

// Go to the page in question
request({
    method: 'GET',
    url: 'http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp'
}, function(err, response, body) {
    if (err) return console.error(err);

    // Tell Cheerio to load the HTML
    $ = cheerio.load(body);

    // Create an empty object to write to the file later
    var toSort = {};

    // Iterate over the DOM and fill the toSort object
    $('#emb table td.list_right').each(function() {
        var row = $(this).parent();
        toSort[$(this).text()] = {
            [$("#lastdate").text()]: $(row).find(".idx1").html(),
            [$("#currdate").text()]: $(row).find(".idx2").html()
        };
    });

    // Write/overwrite a new file
    var stream = fs.createWriteStream("/tmp/shipping.txt");
    var toWrite = "";

    stream.once('open', function(fd) {
        toWrite += "{\r\n";
        for (i in toSort) {
            toWrite += "\t" + i + ": { \r\n";
            for (j in toSort[i]) {
                toWrite += "\t\t" + j + ":" + toSort[i][j] + ",\r\n";
            }
            toWrite += "\t" + "}, \r\n";
        }
        toWrite += "}";
        stream.write(toWrite);
        stream.end();
    });
});
The expected result is a text file with information formatted like a JSON object.
It should look something like several instances of this:
"QINHUANGDAO - GUANGZHOU (50,000-60,000DWT)": {
"2016-09-29": 26.7,
"2016-09-30": 26.8,
},
But since the name is the only thing that doesn't load asynchronously (the dates and values do), I get a messed-up object.
I actually tried just adding a setTimeout at various places in the code. The script will only be touched by developers who can afford to rerun it if it fails a few times, so while not ideal, even a setTimeout (of up to maybe 5 seconds) would be good enough.
It turns out the setTimeouts don't work. I suspect that once I request the page, I'm stuck with a snapshot of it "as is" when I receive it; I'm not actually looking at a live page whose dynamic content I can wait for.
I've considered investigating how to intercept the packets as they come in, but I don't understand HTTP well enough to know where to start.
The setTimeout will not make any difference even if you increase it to an hour. The problem is that you are making a request against this URL:
http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp
and the server returns the HTML, which only references the JS and CSS imports. That is the end of it in your case: you just have the HTML, and that's it. A browser, on the other hand, knows how to parse the HTML document, understand the JavaScript scripts it references, and execute them; this is exactly your problem. Your program never runs the scripts the HTML refers to. You need to find or write a scraper that is able to run JavaScript. I just found this similar issue on Stack Overflow:
Web-scraping JavaScript page with Python
The answer there suggests https://github.com/niklasb/dryscrape, and it seems that this tool is able to run JavaScript. It is written in Python, though.
You are trying to scrape the original page, which doesn't include the data you need.
When the page is loaded, the browser evaluates the JS code it includes, and this code knows where and how to get the data.
The first option is to evaluate the same code, as PhantomJS does.
The other (and you seem to be interested in this one) is to investigate the page's network activity and understand which additional requests you should perform to get the data you need.
In your case, these are:
http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast?SpecifiedDate=&jc=jsonp1475577615267&_=1475577619626
and
http://index.chineseshipping.com.cn/servlet/allGetCurrentComposites?date=Tue%20Oct%2004%202016%2013:40:20%20GMT+0300%20(MSK)&jc=jsonp1475577615268&_=1475577620325
In both requests:
_ is a cache-busting parameter to prevent caching.
jc is the name of a JS wrapper function which should be invoked with the result (see https://en.wikipedia.org/wiki/JSONP).
So, by scraping the table template at http://www1.chineseshipping.com.cn/en/indices/cbcfinew.jsp and performing the two additional requests, you will be able to combine the results into the same data structure you see in the browser.
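Since both responses are JSONP rather than plain JSON, the wrapper function call has to be stripped before parsing. A sketch using the same request module as in the question (the jc value is arbitrary here, and the unwrapping regex is my own):

var request = require('request');

// Hypothetical fetch of the first endpoint above, with a fresh decache value
var url = 'http://index.chineseshipping.com.cn/servlet/cbfiDailyGetContrast' +
          '?SpecifiedDate=&jc=cb&_=' + Date.now();

request(url, function (err, response, body) {
    if (err) return console.error(err);
    // body looks like cb({...}); strip the wrapper and parse the JSON inside
    var json = body.replace(/^[^(]*\(/, '').replace(/\);?\s*$/, '');
    var data = JSON.parse(json);
    console.log(data);
});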
I am using the Ace JavaScript text editor and need to load keywords into the DynHighlightRules to provide highlighting for the imported keywords. I have highlighting working for static keywords via
editor.getSession().setMode("ace/mode/highlightRules")
but I need to import new rules after the editor is rendered. I found a great solution which works perfectly on an Apache server but not on the Web.py Python server. I believe this is because the template page is not at the root level of the server. Has anyone deployed the Ace editor under Web.py and solved this issue?
OK, I found a solution to my problem. It's a workaround, and not the way I originally intended to solve the issue. My first attempts were to embed an Ajax call inside the "ace.define" function, but it wouldn't be processed correctly and parts would be missing, causing errors. Then I tried to dynamically import the keywords, but couldn't make that work in the Python environment. Finally I thought to just wrap the whole thing in the success callback of the Ajax call, and now it works exactly right. I guess that when embedded inside the ace function, the timing between the Ajax events and the other parts of the definition was out of sync.
So the answer, in short, is to wrap the whole mode definition in the Ajax success callback.
$.ajax({
    url: "/readUserCreatedKeywords",
    type: "POST",
    success: function(response) {
        var keywordsString = "";
        var tmpArr = response.split(",");
        var tmpArrLen = tmpArr.length;
        var s = 0;

        // Clean the array and collect the keywords (even indices) into one
        // "|"-separated string, as the highlight rules expect
        while (s < tmpArrLen) {
            tmpArr[s] = tmpArr[s].replace("u'", "").replace("[", "")
                                 .replace("'", "").replace("(u", "")
                                 .replace(")", "").replace("]", "")
                                 .replace("(", "").replace(" ", "");
            if (s % 2 == 0) { // store even values in keywords
                keywordsString += tmpArr[s] + "|";
            }
            s++;
        }

        ace.define("ace/mode/python_highlight_rules",
                   ["require", "exports", "module", "ace/lib/oop", "ace/mode/text_highlight_rules"],
                   function(require, exports, module) {
            // ... lots of code ...
            exports.Mode = Mode;
        });
    }
});
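One more note (my own addition, not part of the original fix): after the rules are defined, you will likely need to apply the mode again so the editor actually picks up the new rules, using the same setMode call as in the question (the exact mode path depends on your setup):

// Inside the same success callback, after ace.define(...)
editor.getSession().setMode("ace/mode/python");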
On IIS6, I can use WMI to list available websites, like this:
var iis = GetObject("winmgmts://localhost/root/MicrosoftIISv2");
var query = "SELECT * FROM IIsWebServerSetting";

// get the list of virtual servers
var results = iis.ExecQuery(query);
for (var e = new Enumerator(results); !e.atEnd(); e.moveNext()) {
    var site = e.item();
    // site.Name                  // W3SVC/1, W3SVC/12378398, etc
    // site.Name.substr(6)        // 1, 12378398, etc
    // site.ServerComment         // "Default Web Site", "Site2", etc
    // site.ServerBindings(0).Port  // 80, 8080, etc
}
I know I can run this script on IIS7 if I have previously installed the IIS6 Compatibility Pack.
Is it possible to get the list of WebSites without requiring the compatibility pack as a pre-requisite?
I know I can run AppCmd to do this from the command line:
\Windows\system32\inetsrv\appcmd list sites
But... can I run that from a custom action in an MSI?
And... if not, how can I do the equivalent thing (list websites on IIS7) from JavaScript?
EDIT
Here's how I tried running the command from within JavaScript:
function GetWebSites_IIS7() {
    var ParseOneLine = function(oneLine) {
        // ... a bunch of regex parsing here ...
    };

    LogMessage("GetWebSites_IIS7() ENTER");

    var shell = new ActiveXObject("WScript.Shell");
    var windir = shell.Environment("system")("windir");
    // aka Session.Property("%WINDIR%")
    var appcmd = windir + "\\system32\\inetsrv\\appcmd.exe list sites";
    var oExec = shell.Exec(appcmd);
    var sites = [];
    while (!oExec.StdOut.AtEndOfStream) {
        var oneLine = oExec.StdOut.ReadLine();
        var line = ParseOneLine(oneLine);
        LogMessage("  site: " + line.name);
        sites.push(line);
    }
    return sites;
}
This works, but it briefly pops up a visible console window which then disappears; it doesn't look very polished. I think I can avoid the console window by using shell.Run() instead of shell.Exec(), but shell.Run() doesn't give access to stdout, so I would have to redirect the output to a temporary file and then read it back. I haven't tried that yet. It may also introduce some security issues; I'll have to see.
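For reference, here's roughly what I think the redirect-to-temp-file version would look like (an untested sketch using standard WSH objects; appcmd is the command string built above):

// Hypothetical: run appcmd hidden via shell.Run, capturing stdout in a temp file
var shell = new ActiveXObject("WScript.Shell");
var fso = new ActiveXObject("Scripting.FileSystemObject");
var tmpFile = fso.GetSpecialFolder(2) + "\\" + fso.GetTempName(); // 2 = %TEMP%
var cmd = "%comspec% /c " + appcmd + " > \"" + tmpFile + "\"";

shell.Run(cmd, 0, true); // 0 = hidden window, true = wait for completion

var ts = fso.OpenTextFile(tmpFile, 1); // 1 = ForReading
var output = ts.ReadAll();
ts.Close();
fso.DeleteFile(tmpFile);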
Related:
Where and how should my CustomAction create and read a temporary file?
Yes, you can run appcmd from a custom action, the same way you run any custom action that launches an exe. First off, you should author DirectorySearch/FileSearch elements to find the full path to the executable. Next, add a custom action with an ExeCommand attribute. You're probably trying to get feedback to the user, so leave it immediate. Also, think about using QuietExec so the console window isn't shown to your users.
By the way, if my guess is correct, you're trying to do something like this. Hope this helps.