How to retrieve results from a search from several pages - javascript

I need to collect data from a website that lets me download the results of a query, but only the results currently displayed on the page.
I have no experience whatsoever with JavaScript or any real programming. A friend told me that I might be able to do it with the "Custom JavaScript for websites" add-on for Chrome.
I managed to get it to download the file I want for each page using:
document.getElementById('dContent').value = 'full';
document.getElementById('submit-download').click();
I still need to manually change to the next page. I tried to do this automatically by adding a cycle:
for (i = 1; i < 20; i++) {
location.href='https://www.myurl.com/search?query=keyword&page=' +i;
document.getElementById('dContent').value = 'full';
document.getElementById('submit-download').click();
}
But it doesn't seem to work*. My Google skills and limited knowledge only took me this far. Am I doing something wrong? I wonder whether the add-on actually allows this; I reckon it might not work because a new page is being loaded. Is there other software that I might use for this purpose?
Thank you in advance for your help.
The URL for the queries follows the page=1, page=2, ... pattern for the results.

setTimeout(function () {
document.getElementById('dContent').value = 'full';
document.getElementById('submit-download').click();
// Read the current page number from the end of the query string
var match = location.search.match(/(\d+)$/);
var page = match ? parseInt(match[1], 10) : NaN;
if (!isNaN(page) && page < 20) {
location.href = 'https://www.myurl.com/search?query=keyword&page=' + (page + 1);
}
}, 1000);
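The loop in the question cannot work because assigning location.href starts a navigation: the lines after it never act on the newly loaded page. The snippet above instead handles one page per run and assumes the add-on runs the script again after each page load. A minimal sketch of the same idea, with the "next page" step pulled into a helper (nextPageUrl is a hypothetical name, not part of the add-on) that uses the standard URL API instead of a regular expression:

```javascript
// Hypothetical helper: returns the URL of the next results page, or null
// when maxPage is reached or the URL has no numeric "page" parameter.
function nextPageUrl(currentUrl, maxPage) {
  var url = new URL(currentUrl);
  var page = parseInt(url.searchParams.get('page'), 10);
  if (isNaN(page) || page >= maxPage) {
    return null;
  }
  url.searchParams.set('page', String(page + 1));
  return url.toString();
}

// Browser-only part; the element IDs are the ones from the question.
if (typeof document !== 'undefined') {
  setTimeout(function () {
    document.getElementById('dContent').value = 'full';
    document.getElementById('submit-download').click();
    var next = nextPageUrl(location.href, 20);
    if (next) {
      location.href = next; // the script is expected to run again on the new page
    }
  }, 1000);
}
```

The one-second delay is kept from the answer; whether it is long enough for the download to start before navigating is an assumption that may need tuning.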

Related

Can I run Javascript code on a given URL via Python or PHP on the server side?

I'm building a website right now in which I'm trying to execute Javascript code to obtain specific elements of a totally separate site's web page given its URL. I've figured out how to use Selenium.Webdriver on my laptop in Python to achieve this by:
driver = webdriver.Firefox()
driver.get(url)
price = driver.execute_script("var priceElements = document.getElementById('priceblock_ourprice'); var prices = []; for (var i = 0; i < 3; i++) { prices.push(priceElements.children[i].innerHTML); } var price = prices[1] + '.' + prices[2]; return price;")
driver.close()
where it opens up the Firefox browser, runs the JS and then finds this price value that I want. I know there is also a way to do it which is headless and it won't need to physically open up a browser, but I'm trying to figure this out for the case of doing something similar on the website that I'm building.
Is there some way that I can achieve this result so that, instead of running on my personal machine, the code runs on the Apache web server? I'm just not sure if this is possible without the machine it's running on having a browser that it can open.
I hope I'm clear enough on what I'm doing, but if anyone needs clarification then I'll be happy to answer any questions you may have about this situation.
I did something similar with "lxml" and "urllib".
Here is an example:
from lxml import html
import urllib.request
pageLink = ""
# In Python 3, urlopen lives in urllib.request
page = urllib.request.urlopen(pageLink)
pageString = page.read()
pageContent = html.fromstring(pageString)
# XPath selects attributes with "@", not "#"
imgLink = pageContent.xpath('//img[@id="img-1"]/@src')[0]

Retrieving data from twitter with JavaScript

I am using this js/jquery to scroll down a twitter page:
var f = function(i) { if(i < 999999999999) {
window.scrollTo(0,document.body.scrollHeight);
setTimeout(function(){f(i+1)}, 1000)
}
}
f(1)
By doing this, I would like to save the web page and then scrape it. The problem is that my browser stops working because of the huge amount of data. So I want to know if there's any way to save the page while the scrolling is still in progress.

Share webpages on social media with counter

I'm creating a website that's going to have hundreds of pages. I want each page to be shareable on Facebook and Twitter. I've already created these buttons but I also want to have their respective share counters next to my share buttons. I don't want to use the standard Facebook method they provide because the coding looks bloated.
Right, so after doing some research, I found this example on codepen.
This looks like exactly what I want - very simple!
However, I need some clarification and basic help with how this javascript code works:
var permalink = 'http://codepen.io';
var getTwitterCount = function () {
$.getJSON('http://urls.api.twitter.com/1/urls/count.json?url='+permalink+'&callback=?', function(data){
var twitterShares = data.count;
$('.twitter .share-count').text(twitterShares);
});
};
getTwitterCount();
var getFacebookCount = function () {
$.getJSON('http://graph.facebook.com/?ids='+permalink+'&callback=?',
function(data){
var facebookShares = data[permalink].shares;
$('.facebook .share-count').text(facebookShares);
});
};
getFacebookCount();
This bit of code:
var permalink = 'http://codepen.io';
Does this have to be:
1) the url of the actual page I want shared, eg: http://www.example.com/page-1/
OR
2) Must this be the root of the domain name, eg: http://www.example.com/
?
Or am I missing something else?
If the answer is #1 above, then that means I have to include + edit this line for each page which isn't ideal because I have all my javascript code + plugins in ONE .js file to reduce http requests, so I'd prefer it that I don't have to add this javascript on-page for every page.
It would be the page that you want to share, but you could avoid editing a separate variable for each page by setting it to something like document.location.href, for example.
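A minimal sketch of that suggestion, so the same shared .js file works on every page. The endpoint is the one from the question and may no longer behave as it did in the CodePen example; buildFacebookCountUrl is a hypothetical helper name, and encoding the page URL before putting it in the query string is an added precaution, not something the original code did.

```javascript
// Hypothetical helper: builds the count request URL from the question
// for any page URL, encoding it so it is safe inside a query string.
function buildFacebookCountUrl(pageUrl) {
  return 'http://graph.facebook.com/?ids=' + encodeURIComponent(pageUrl) + '&callback=?';
}

// Browser-only part: use the address of the page the script is running on.
if (typeof document !== 'undefined' && typeof $ !== 'undefined') {
  var permalink = document.location.href;
  $.getJSON(buildFacebookCountUrl(permalink), function (data) {
    // The response is keyed by the page URL, as in the question's code.
    var facebookShares = data[permalink].shares;
    $('.facebook .share-count').text(facebookShares);
  });
}
```

With this in the shared file, nothing has to be edited per page: each page reports its own address.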

MOSS 07 editform.aspx fails to commit and goes to blank html page

Found solution from Microsoft Blog... see below
OK, to start I don't like the word random but I cannot find any correlation in test cases for this problem so I am going to use random to describe parts of this problem.
The setup: I have a list where I have created a customized UI for EditForm.aspx and NewForm.aspx. I use the same JS file and JavaScript for both of them. I have added a Google map to help illustrate the location selection. I have added extra code to the "OK" button for some dynamic validation. I have done a lot of dynamic menu things as well. All users use IE 9 and the site is on a MOSS 2007 server.
The problem: Only on EditForm.aspx, clicking OK "randomly" results in an immediate white screen. The form is not saved, and when I view the source of the white screen I find a blank HTML page.
What I have tried to find this problem:
- I tried to narrow down the user and computer this happens on and found that it happens for everyone on every computer (once again, "randomly").
- I tried disabling the code that is prepended to the "OK" button.
- I tried stepping through the code with IE9's external script debugger and found no errors.
I can provide the code, but it is a bit long and I really do not know where to begin. So I can provide it if needed.
Thanks for the help ahead of time.
Edit:
This is the code re-wiring my OK button (I reset the value to "Save" earlier):
var okBtns = $('input[value="Save"]')
$.each(okBtns, function(index,value){
okFunction=$(value).attr('onclick');
$(value).attr('onclick','return false;')
$(value).bind('click', function(){
if ($('#'+StatusBox).val()=='Draft') {$('#'+StatusBox).val('New Request')}
var err = clickOKbutton();
if(err==0) {okFunction()};
});
});
This is the clickOKbutton function, which is the code prepended to the original SharePoint operations:
function clickOKbutton()
{
//all of the input validation I could ever wish for!!!!
var NoteVal = ''
var NameAry = $('#'+PersonnelBox).parent().children(":first").children("SPAN").children("SPAN");
$.each(NameAry, function(index,value){
var $n=$(value).html();
if(NoteVal.length==0) {NoteVal=$n} else {NoteVal=NoteVal+';'+$n};
});
//$('#'+AddNotes).val(NoteVal);
var plh = $('#'+PersonnelBox).parent().html()
userNameTx = $('#zz8_Menu').text();
userNameTx = userNameTx.replace('Welcome ','');
$.each(OICUsers, function(i,v){
// read the status directly: st is only assigned further down in this function
if(plh.indexOf(v) > -1 && $('#'+StatusBox).val()=='New Request'){
$('#'+StatusBox).val('OIC Bypassed')
$('#'+CommentsBox).val('OIC is travling on this TDY/TAD and cannot approve. So this request is bypassing the "OIC Approval" step')
}
});
/*userNameTx = $('#zz8_Menu').text();
userNameTx = userNameTx.replace('Welcome ','')
$('#' + ModBox).closest('TR').show();*/
var message=''
message = detectFieldChanges(AllFieldsArray,AllOrgValArray,"Draft,New Request,Modified")
if(message.length>0){
$('#'+ModBox).val(message);
AutoResizeTextarea(ModBox);
}
message = detectFieldChanges(ValFieldsArray,OrgValuesArray,"Draft,New Request,Modified,OIC Approved,OIC Bypassed,Pending RFI,Ready for COS")
userNameTx = $('#zz8_Menu').text();
userNameTx = userNameTx.replace('Welcome ','');
if(message.length>0&&$.inArray(userNameTx,COSUsers)==-1){
$('#'+StatusBox).val('Modified').change();
$('#'+StatusLongBox).val('Modified').change();
}
//Subject box
var pb = NoteVal;
var ep = $('#'+ExtPersonnel).val();
var ab = $('#'+AddressBox).val();
var sd = $('#'+sDateBox).val();
var ed = $('#'+eDateBox).val();
var st = $('#'+StatusBox).val();
var p = pb+';'+ep;
var p = p.replace(/mossaspnetmembershipprovider:/g,'');
var p = p.slice(0,-1);
var ad = ab+' '+sd+' to '+ed;
var s = 'eTDY | '+st+' - '+p+' - '+ad;
if(s.length>255){
var l = s.length-255;
p = p.substring(0,p.length-l);
s = 'eTDY | '+p+' - '+ad;
}
$('#'+Subject).val(s);
//check Lat/Lng value
if($('#'+LatBox).val()=='' || $('#'+LngBox).val()==''){
//alert("Cannot continue unless the Lat Lng has a vallid coordinate");
if($('#LatLngError').length==0){
errorHTML='<br><span class="ms-error" id="LatLngError">You must specify a value for Lat and Lng</span>'
$('#'+AddressBox).closest('TD').append(errorHTML)
}
return -1
}
return 0
};
It is messy but hopefully you can make sense of it.
Edit 2:
I think I have tracked the randomness down... I completely turned off all custom code and still have the problem. I then tried comparing a working record with a non-working record. Everything looked normal until I got to the field with a multiple-people picker. If I have more than 2 people in that field, the record saves normally the first time, but when I go to modify a record with more than 2 people in the people picker field, it causes this problem. I am going to do some more research and will post my results.
Edit 3:
http://blogs.msdn.com/b/jorman/archive/2009/12/22/mystery-of-the-sharepoint-white-screens.aspx
This problem all boils down to IIS configuration and the Impersonation Level. Apparently our server admins decided to change it without telling anyone.
Usually, when you get [seemingly random] behavior from a web page (especially in MOSS), it means that you have ambiguous events defined on the page. Usually, I get this when I add some kind of JScript to a button or form on_submit.
Without seeing your code, I can't really narrow it down further than that. I recommend: look for JavaScript events on your HTML form or on your button click events or look for anchor [a] tags that point to nowhere (href=#) but have javascript. Then decide to do it (strictly) the HTML way (forms, submit buttons) or the javascript way, (no forms, no asp:button) and un-wire the other.
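One way to read the advice about un-wiring is to end up with a single click path rather than an inline onclick plus a separately bound handler. The sketch below is only an illustration of that idea: makeSingleHandler is a hypothetical helper, and it assumes the original save action is available as a function and that validation returns 0 on success, as clickOKbutton does in the question.

```javascript
// Hypothetical helper: combine validation and the original action
// into one handler, so only one code path can submit the form.
function makeSingleHandler(validate, originalAction) {
  return function (event) {
    if (validate() !== 0) {
      // Validation failed: stop here so nothing else submits the form.
      if (event && typeof event.preventDefault === 'function') {
        event.preventDefault();
      }
      return false;
    }
    return originalAction(event);
  };
}

// Possible wiring with jQuery, as in the question (not executed here):
// $(button).bind('click', makeSingleHandler(clickOKbutton, originalSave));
```

Here originalSave stands for whatever the button did before re-wiring; how to obtain it as a function depends on the page and is not shown.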

Why does dojo.xhrGet need different kinds of URL to work on different computers (PC/Mac)?

I'm writing a Greasemonkey script for somebody else. He is a moderator and I am not, and the script will help him do some moderating things.
The script works for me, as far as it can (since I am not a mod).
But even the things that work for me are not working for him.
I checked his versions of the Greasemonkey plugin and Firefox, and he is up to date.
The only thing that's really different is that I'm on a Mac and he is on a PC, but I wouldn't think that would be a problem.
This is one of the functions that is not working for him. He does get the first and third GM_log messages, but not the second one ("got some (1) ...").
kmmh.trackNames = function(){
GM_log("starting to get names from the first "+kmmh.topAmount+" page(s) from leaderboard.");
kmmh.leaderboardlist = [];
for (var p=1; p<=(kmmh.topAmount); p++){
var page = "http://www.somegamesite.com/leaderboard?page="+ p;
var boardHTML = "";
dojo.xhrGet({
url: page,
sync: true,
load: function(response){
boardHTML = response;
GM_log("got some (1) => "+boardHTML.length);
},
handleAs: "text"
});
GM_log("got some (2) => "+boardHTML.length);
//create dummy div and place leaderboard html in there
var dummy = dojo.create('div', { innerHTML: boardHTML });
//search through it
var searchN = dojo.query('.notcurrent', dummy).forEach(function(node,index){
if(index >= 10){
kmmh.leaderboardlist.push(node.textContent); // add names to array
}
});
}
GM_log("all names from "+ kmmh.topAmount +" page(s) of leaderboard ==> "+ kmmh.leaderboardlist);
};
Does anyone have any idea what could be causing this?
EDIT: I know I had to write according to what he would see on his mod screen, so I asked him to copy and paste the source of the pages and so on. Besides that, this part of the script does not depend on being a mod or not.
I got everything else working for him. Just this function still doesn't work on either of his PCs.
EDIT2 (changed question): OK, so after some more trial and error, I got it to work, but it's still weird.
When I removed the www part of the URL that's being used in dojo.xhrGet(), I finally got the same error he got. So I had him add www to his, and now it works.
The odd thing is that he now uses a script with the URL containing "www" and I'm using a script with a URL without "www"...
so for me:
var page = "http://somegamesite.com/leaderboard?page="+ p;
and for him:
var page = "http://www.somegamesite.com/leaderboard?page="+ p;
Why don't you have him try logging into an account that is not a moderator account, so that you eliminate one of the variables from your problem space?
It's possible that the DOM of the page is different for a moderator than for a regular user. If you're making assumptions about the page as a regular user that are not true as a moderator, that could cause problems.
I suspect that to fix it, you may need access to a moderator account so you can more easily replicate the behavior.
Oops. It turns out that this game site is accessible at www.gamesite.com as well as gamesite.com (without the www part). This caused the problem.
Sorry to bother you all.
I'll go hide in shame now...
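This is consistent with the browser's same-origin policy: a page loaded from somegamesite.com and one loaded from www.somegamesite.com count as different origins, so a request to the "other" host is most likely blocked. A minimal sketch that avoids hard-coding the host by building the URL from wherever the page was actually loaded (leaderboardUrl is a hypothetical helper; the path is the one from the question):

```javascript
// Hypothetical helper: build the leaderboard URL on a given origin,
// e.g. "http://www.somegamesite.com" or "http://somegamesite.com".
function leaderboardUrl(origin, pageNumber) {
  return origin + '/leaderboard?page=' + pageNumber;
}

// In the userscript, take the origin from the current page so the
// request always goes to the same host the user is browsing:
// var origin = window.location.protocol + '//' + window.location.host;
// var page = leaderboardUrl(origin, p);
```

With this, the same script should work for both users regardless of which form of the address each of them visits.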
