Search Increasing URL For Text - javascript

For stekhn, here's the proper link: var location = "http://www.roblox.com/Trade/inventoryhandler.ashx?filter=0&userid=" + i + "&page=1&itemsPerPage=14";
I'm trying to create a JavaScript script that can search through a user's inventory, detect whether it contains what I'm looking for, and output the userID if it does.
If I type in bluesteel, I need a JavaScript script which will search through http://snackyrite.com/site.ashx?userid=1 and detect whether the text 'bluesteel' is on it. If it is, I need it to display the user ID, which is 1.
You may be thinking that's easy and that I could easily find a script for that. Well, there's a catch: my objective isn't only to search userid=1; I need it to search from userid=1 up to userid=45356.
If the word 'bluesteel' is found on userid=5, userid=3054 and userid=12 (these are just examples), I need it to display 5, 3054 and 12 (the IDs) on the same page the script was run from.
This is the script I've tried, but the userid won't increase (I'm not sure how to do that).
var location = "http://snackyrite.com/site.ashx?userid=1";
if (location.indexOf("bluesteel") > -1) {
  output.userid
}
I do apologize; JavaScript isn't my best.

Use a loop:
for (var i = 1; i <= 45356; i++) {
  var loc = "http://snackyrite.com/site.ashx?userid=" + i;
  // get contents of loc
  if (contents.indexOf("bluesteel") > -1) {
    console.log(i);
  }
}
Since getting the contents will presumably use AJAX, the if will probably be in the callback function. See Javascript infamous Loop issue? for how to write the loop so that i will be preserved in the callback function.
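As a sketch of what that might look like with jQuery (my assumption; any AJAX helper works the same way), an IIFE preserves the current value of i for each callback. Note that the browser's same-origin policy will block these requests unless the page is served from the same site:
for (var i = 1; i <= 45356; i++) {
  (function (id) { // the IIFE captures the current value of i as `id`
    $.get("http://snackyrite.com/site.ashx?userid=" + id, function (contents) {
      if (contents.indexOf("bluesteel") > -1) {
        // append the matching user ID to the page the script was run from
        document.body.innerHTML += id + "<br>";
      }
    });
  })(i);
}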

This kind of web scraping can't be done in the Browser (client-side JavaScript).
I would suggest building a scraper with Node.js.
Install Node.js
Install request: npm i request
Install cheerio: npm i cheerio
Create a file scraper.js
Run node scraper.js
Code for scraper.js
// Import the scraping libraries
var request = require("request");
var cheerio = require("cheerio");

// Array for the user IDs which match the query
var matches = [];

// Do this for all possible users; `let` preserves the correct value of i
// in each request callback (with `var`, every callback would see 45357)
for (let i = 1; i <= 45356; i++) {
  var location = "http://snackyrite.com/site.ashx?userid=" + i;
  request(location, function (error, response, body) {
    if (!error) {
      // Load the website content
      var $ = cheerio.load(body);
      var bodyText = $("body").text();
      // Search the website content for bluesteel
      if (bodyText.indexOf("bluesteel") > -1) {
        console.log("Found bluesteel in inventory of user ", i);
        // Save the user ID, if bluesteel was found
        matches.push(i);
      }
    // Something went wrong
    } else {
      console.log(error.message);
    }
  });
}

// Note: request() is asynchronous, so this line runs before the
// responses arrive; matches will still be empty at this point
console.log("All users with bluesteel in inventory: ", matches);
The above code seems kind of complicated, but I think this is the way it should be done. Of course you can use any other scraping tool or library.
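One caveat: because request() is asynchronous, the summary line runs before any response has arrived. A minimal sketch of one way to wait for completion, using a simple counter (my addition, not from the original answer; also note that firing 45,356 requests at once will likely hit connection limits, so a real scraper should throttle them):
var request = require("request");
var cheerio = require("cheerio");

var matches = [];
var pending = 45356;

for (let i = 1; i <= 45356; i++) {
  request("http://snackyrite.com/site.ashx?userid=" + i, function (error, response, body) {
    if (!error && cheerio.load(body)("body").text().indexOf("bluesteel") > -1) {
      matches.push(i);
    }
    // Print the summary only after the last request has finished
    if (--pending === 0) {
      console.log("All users with bluesteel in inventory: ", matches);
    }
  });
}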

Related

Parsing progress live from console output - NodeJS

Link to a similar problem that has no answers, but written in C
I'm using NodeJS to parse output from ark-server-tools, which is a layer on top of SteamCMD. What I'd like to do is parse the progress of the update and assign it to a variable, which I'll return from a GET endpoint that the client can use to check the progress of the update.
I put the log results of an update into a file to run my code against, which I've put in a PasteBin for brevity.
update.js
app.get('/update', function(req, res) {
  var toReturn;
  var outputSoFar;
  var total;
  var startPos;
  var endPos = 0;
  //var proc = spawn('arkmanager', ['update', '--safe']);
  var proc = spawn('./update-log.sh'); // for testing purposes
  proc.stdout.on('data', function(data) {
    outputSoFar += data.toString();
    // if server is already updated
    if (outputSoFar.indexOf('Your server is already up to date!') !== -1) {
      toReturn = 'Server is already up-to-date.';
    }
    // find update progress
    if (outputSoFar.indexOf('progress:') !== -1) {
      for (var line in outputSoFar.split('\n')) {
        console.log('found progress');
        startPos = outputSoFar[line].indexOf('progress:', endPos) + 10; // get the value right after progress:_, which should be a number
        endPos = outputSoFar[line].indexOf(' (', startPos); // find the end of this value, which is signified by space + (
        console.log(outputSoFar[line].substring(startPos, endPos).trim());
        updatePercent = outputSoFar[line].substring(startPos, endPos).trim(); // returned to the `checkUpdateProgress` endpoint
      }
      toReturn = 'Updating...';
    }
  });
  proc.stderr.on('data', function(data) {
    console.log(data);
  });
  proc.on('close', function (code, signal) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.write(JSON.stringify(toReturn));
    res.end();
  });
});

/*
 * Returns progress of an update
 */
app.get('/updateProgress', function(req, res) {
  console.log('updatePercent: ' + updatePercent);
  res.send(JSON.stringify(updatePercent));
});
A couple of questions:
1) Is this the best way to architect my RESTful API? One call for the action of updating and another for checking the progress of the update?
2) I'd love a better way to test the function, as echoing the console log returns the data in one piece, as opposed to a data stream. How do I do this?
3) I'm pretty sure the parsing function itself isn't quite right, but I'm having a hard time testing it because of #2.
If you want to take a look at the project in its entirety, here's the repo.
Thanks in advance for your help!
For one of your questions:
Is this the best way to architect my RESTful API? One call for the
action of updating and another for checking the progress of the
update?
As implemented now, I don't think your service can support concurrent requests correctly. updatePercent is a shared global variable. If I hit the /update endpoint with a single client, it will start the ./update-log.sh command.
If I request /update again, it will start another update and overwrite the global updatePercent. There doesn't seem to be anything mapping an updatePercent to the correct process.
Additionally, there could be serious performance issues with each request spawning a new process. Node might be able to handle hundreds or thousands of concurrent connections on a single thread, but each request here spawns a new process; that's something to profile.
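One way to address the shared-global issue, sketched here as my own illustration rather than code from the question: have /update generate a job ID, track each job's progress in a map keyed by that ID, and let the client poll with the ID. The progress regex is a guess at the arkmanager output format:
var crypto = require('crypto');
var spawn = require('child_process').spawn;

var jobs = {}; // jobId -> { percent, done }

app.get('/update', function (req, res) {
  var jobId = crypto.randomBytes(8).toString('hex');
  jobs[jobId] = { percent: '0', done: false };
  var proc = spawn('arkmanager', ['update', '--safe']);
  proc.stdout.on('data', function (data) {
    var m = /progress: *([\d.]+)/.exec(data.toString()); // hypothetical pattern
    if (m) {
      jobs[jobId].percent = m[1];
    }
  });
  proc.on('close', function () {
    jobs[jobId].done = true;
  });
  res.json({ jobId: jobId }); // client polls /updateProgress?id=<jobId>
});

app.get('/updateProgress', function (req, res) {
  var job = jobs[req.query.id];
  res.json(job || { error: 'unknown job' });
});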

How to get google search output in the google application script environment?

If I use the following function to get the Google output:
function myFunction() {
  var post_url, result;
  post_url = "http://www.google.com/search?q=stack+overflow";
  result = UrlFetchApp.fetch(post_url);
  Logger.log(result);
}
it doesn't work.
P.S.
Sorry, I had to explore some dependencies.
I took an example:
function scrapeGoogle() {
  var response = UrlFetchApp.fetch("http://www.google.com/search?q=labnol");
  var myRegexp = /<h3 class=\"r\">([\s\S]*?)<\/h3>/gi;
  var elems = response.getContentText().match(myRegexp);
  for (var i in elems) {
    var title = elems[i].replace(/(^\s+)|(\s+$)/g, "")
                        .replace(/<\/?[^>]+>/gi, "");
    Logger.log(title);
  }
}
and it works. Then I began to make some modifications, and I noticed that when I have some error in the code, it gives me this error:
Request failed for http://www.google.com/search?q=labnol returned code 503.
So I did some research; run without errors, the solution works. But when I began to wrap it into a function in a library, it started throwing the 503 error every time!
I'm very surprised by this behavior...
Here is a short video just for demonstration: https://youtu.be/Lem9eiIVY0I
P.P.S.
Oh! I've committed some violations, so the Google engine sent me to a stop list.
So I ran this:
function scrapeGoogle() {
  var options = {
    'muteHttpExceptions': true
  };
  var response = UrlFetchApp.fetch("http://www.google.com/search?q=labnol", options);
  Logger.log(response);
}
and get
About this page. Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. Why did this happen?
As I see it, I have to use some special Google service to get the search output without being blocked?
You can use simple regex to extract Google search results.
var regex = /<h3 class=\"r\">([\s\S]*?)<\/h3>/gi;
var items = response.getContentText().match(regex);
Alternatively, you can use the IMPORTXML function in Sheets:
=IMPORTXML(GOOGLE_URL, "//h3[@class='r']")
See: Scrape Google Search with Sheets
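If you need search results programmatically without tripping Google's anti-bot checks, one sanctioned route is the Custom Search JSON API called via UrlFetchApp. A sketch, where YOUR_API_KEY and YOUR_SEARCH_ENGINE_ID are placeholders for credentials you would create in the Google Cloud and Programmable Search Engine consoles:
function searchViaApi(query) {
  var url = 'https://www.googleapis.com/customsearch/v1'
      + '?key=YOUR_API_KEY'
      + '&cx=YOUR_SEARCH_ENGINE_ID'
      + '&q=' + encodeURIComponent(query);
  var response = UrlFetchApp.fetch(url);
  var results = JSON.parse(response.getContentText());
  // Each item carries the result title and link, among other fields
  (results.items || []).forEach(function (item) {
    Logger.log(item.title + ' - ' + item.link);
  });
}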

What is the best way to parse html in google apps script

var page = UrlFetchApp.fetch(contestURL);
var doc = XmlService.parse(page);
The above code gives a parse error when used; however, if I replace the XmlService class with the deprecated Xml class, with the lenient flag set, it parses the HTML properly.
var page = UrlFetchApp.fetch(contestURL);
var doc = Xml.parse(page, true);
The problem is mostly caused by the lack of CDATA around the JavaScript part of the HTML, and the parser complains with the following error:
The entity name must immediately follow the '&' in the entity reference.
Even if I remove all the <script>(.*?)</script> using regex, it still complains because the <br> tags aren't closed.
Is there a clean way of parsing HTML into a DOM tree?
I ran into this exact same problem. I was able to circumvent it by first using the deprecated Xml.parse, since it still works, then selecting the body XmlElement, then passing its XML string into the new XmlService.parse method:
var page = UrlFetchApp.fetch(contestURL);
var doc = Xml.parse(page, true);
var bodyHtml = doc.html.body.toXmlString();
doc = XmlService.parse(bodyHtml);
var root = doc.getRootElement();
Note: This solution may not work if the old Xml.parse is completely removed from Google Scripts.
In 2021, the best way to parse HTML on the .gs side that I know of is...
Click + next to Library
Enter 1ReeQ6WO8kKNxoaA_O0XEQ589cIrRvEBA9qcWpNqdOP17i47u6N9M5Xh0
Click "Look up"
Click Add
Sample usage:
const contentText = UrlFetchApp.fetch('https://www.somesite.com/').getContentText();
const $ = Cheerio.load(contentText);
$('.some-class').first().text();
That's it -- this is probably the closest we'll get to doing jQuery-like DOM selection in GAS. The .first() is important or else you may extract more content than you expected (think of it as using querySelector() instead of querySelectorAll()).
Credit where credit is due: https://github.com/tani/cheeriogs
As of May 2020, you can now use the Cheerio library for Google Apps Script to do this.
Returns the content of Wikipedia's Main Page
const content = getContent_('https://en.wikipedia.org');
const $ = Cheerio.load(content);
Logger.log($('#mp-right').text());
Returns the content of the first paragraph <p> of Wikipedia's Main Page
const content = getContent_('https://en.wikipedia.org');
const $ = Cheerio.load(content);
Logger.log($('p').first().text());
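The getContent_ helper isn't defined in these snippets; presumably (my assumption) it is just a thin wrapper around UrlFetchApp, something like:
// Assumption: getContent_ simply fetches the page body as text
function getContent_(url) {
  return UrlFetchApp.fetch(url).getContentText();
}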
To add to your project:
Select Resources - Libraries... in the Google Apps Script editor. Enter the project key 1ReeQ6WO8kKNxoaA_O0XEQ589cIrRvEBA9qcWpNqdOP17i47u6N9M5Xh0 in the Add a library field, and click "Add". Select the highest version number, and click "Save".
I found that the best way to parse HTML in Google Apps Script is to avoid using XmlService.parse or Xml.parse. XmlService.parse doesn't work well with bad HTML code from certain websites.
Here's a basic example of how you can parse any website easily without using XmlService.parse or Xml.parse. In this example, I am retrieving a list of presidents from "wikipedia.org/wiki/President_of_the_United_States"
with regular JavaScript document.getElementsByTagName(), and pasting the values into my Google spreadsheet.
1- Create a new Google Sheet;
2- Click the menu Tools > Script editor... to open a new tab with the code editor window and copy the following code into your Code.gs:
function onOpen() {
  var ui = SpreadsheetApp.getUi();
  ui.createMenu("Parse Menu")
    .addItem("Parse", "parserMenuItem")
    .addToUi();
}

function parserMenuItem() {
  var sideBar = HtmlService.createHtmlOutputFromFile("test");
  SpreadsheetApp.getUi().showSidebar(sideBar);
}

function getUrlData(url) {
  var doc = UrlFetchApp.fetch(url).getContentText();
  return doc;
}

function writeToSpreadSheet(data) {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheets()[0];
  var row = 1;
  for (var i = 0; i < data.length; i++) {
    var x = data[i];
    var range = sheet.getRange(row, 1);
    range.setValue(x);
    row = row + 1;
  }
}
3- Add an HTML file to your Apps Script project. Open the Script Editor and choose File > New > Html File, and name it 'test'. Then copy the following code into your test.html:
<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
    <input id="mButon" type="button" value="Click here to get list" onclick="parse()">
    <div hidden id="mOutput"></div>
  </body>
  <script>
    window.onload = onOpen;

    function onOpen() {
      var url = "https://en.wikipedia.org/wiki/President_of_the_United_States";
      google.script.run.withSuccessHandler(writeHtmlOutput).getUrlData(url);
      document.getElementById("mButon").style.visibility = "visible";
    }

    function writeHtmlOutput(x) {
      document.getElementById('mOutput').innerHTML = x;
    }

    function parse() {
      var list = document.getElementsByTagName("area");
      var data = [];
      for (var i = 0; i < list.length; i++) {
        var x = list[i];
        data.push(x.getAttribute("title"));
      }
      google.script.run.writeToSpreadSheet(data);
    }
  </script>
</html>
4- Save your gs and html files and go back to your spreadsheet. Reload your spreadsheet. Click on "Parse Menu" - "Parse". Then click on "Click here to get list" in the sidebar.
Xml.parse() has an option to turn on lenient parsing, which helps when parsing HTML. Note that the Xml service is deprecated however, and the newer XmlService doesn't have this functionality.
For simple tasks such as grabbing one value from a webpage, you could use a regular expression. Regex is notoriously bad for parsing HTML as there's all sorts of weird cases it can get tripped up, but if you're confident about the HTML you're accessing this can sometimes be the simplest way.
Here's an example that fetches the contents of the page's <title> tag:
var page = UrlFetchApp.fetch(contestURL);
var regExp = new RegExp("<title>(.*)</title>", "gi");
var result = regExp.exec(page.getContentText());
// [1] is the match group when using parenthesis in the pattern
var value = result ? result[1] : 'No title found';
I know it is not exactly what the OP asked, but I found this question when I was looking for some HTML parsing options, so it might be useful for others as well.
There is an easy-to-use library for TEXT parsing. It's useful if you want to get only one piece of information from the HTML (XML) code.
EDIT 2021: The script library ID is:
1Mc8BthYthXx6CoIz90-JiSzSafVnT6U3t0z_W3hLTAX5ek4w0G_EIrNw
It works like in the example below:
function getData() {
  var url = "https://chrome.google.com/webstore/detail/signaturesatori-central-s/fejomcfhljndadjlojamaklegghjnjfn?hl=en";
  var fromText = '<span class="e-f-ih" title="';
  var toText = '">';
  var content = UrlFetchApp.fetch(url).getContentText();
  var scraped = Parser
    .data(content)
    .from(fromText)
    .to(toText)
    .build();
  Logger.log(scraped);
  return scraped;
}
If you are using the Cheerio library for Google Apps Script:
Source code: https://github.com/tani/cheeriogs
Library page (⭐ star it!)
Installation by library ID:
1ReeQ6WO8kKNxoaA_O0XEQ589cIrRvEBA9qcWpNqdOP17i47u6N9M5Xh0
A function to get current emojis from unicode.org:
function getEmojis() {
  var t = new Date();
  var url = 'https://unicode.org/emoji/charts/full-emoji-list.html';
  var fetch = UrlFetchApp.fetch(url);
  var contentText = fetch.getContentText();
  //console.log(new Date() - t);

  // Cheerio
  var $ = Cheerio.load(contentText);
  var data = [];
  $("table > tbody > tr").each((index, element) => {
    var row = [];
    $(element).find("td").each((index, child) => {
      row.push($(child).text());
    });
    if (row.length > 0) {
      data.push(row);
    }
  });
  //console.log(data);
  //console.log(new Date() - t);

  // Result
  return data;
}
↑ The sample code shows how to parse the table and put it into a 2D array.
It may be used as a custom function.
Bonus
Parsing the site may be a time-consuming operation, and you may reach the limit.
Here's a test file with a full version of the script:
https://docs.google.com/spreadsheets/d/1iO7YjYWyfseQu_YCfRbGDPg7NskOgMu_iO1iGjr7KxY/edit#gid=93365395
↑ It uses CacheService to reduce the number of calls.
Natively there's no way unless you do what you already tried, which won't work if the HTML doesn't conform to the XML format.
There are two options:
a) One is to use JavaScript's string functions. First locate your tag using string.indexOf() and then extract the data you want using string.substring() (see the sketch after this list).
b) The other option is to make use of the Xml Service.
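A minimal sketch of option (a), pulling a page's <title> with nothing but string functions (the function name here is just for illustration):
function getTitle(url) {
  var html = UrlFetchApp.fetch(url).getContentText();
  // Locate the opening tag, then read up to the closing tag
  var start = html.indexOf('<title>');
  if (start === -1) return null;
  start += '<title>'.length;
  var end = html.indexOf('</title>', start);
  return end === -1 ? null : html.substring(start, end).trim();
}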
It's not possible to create an HTML DOM server-side in Apps Script. Using regular expressions is likely your best option, at least for simple parsing.

How can I list websites on IIS7, from script, without using IIS6 compat pack (WMI veneer)

On IIS6, I can use WMI to list available websites, like this:
var iis = GetObject("winmgmts://localhost/root/MicrosoftIISv2");
var query = "SELECT * FROM IIsWebServerSetting";

// get the list of virtual servers
var results = iis.ExecQuery(query);
for (var e = new Enumerator(results); !e.atEnd(); e.moveNext()) {
  var site = e.item();
  // site.Name                   // W3SVC/1, W3SVC/12378398, etc
  // site.Name.substr(6)         // 1, 12378398, etc
  // site.ServerComment          // "Default Web Site", "Site2", etc
  // site.ServerBindings(0).Port // 80, 8080, etc
}
I know I can run this script on IIS7, if I have previously installed the IIS6 Compatibility Pack.
Is it possible to get the list of WebSites without requiring the compatibility pack as a pre-requisite?
I know I can run AppCmd to do this from the command line:
\Windows\system32\inetsrv\appcmd list sites
But... can I run that from a custom action in an MSI?
And... if not, how can I do the equivalent thing (list websites on IIS7) from javascript?
EDIT
Here's how I tried running the command from within JavaScript.
function GetWebSites_IIS7() {
  var ParseOneLine = function(oneLine) {
    ...a bunch of regex parsing here....
  };

  LogMessage("GetWebSites_IIS7() ENTER");

  var shell = new ActiveXObject("WScript.Shell");
  var windir = shell.Environment("system")("windir");
  // aka Session.Property("%WINDIR%")
  var appcmd = windir + "\\system32\\inetsrv\\appcmd.exe list sites";
  var oExec = shell.Exec(appcmd);
  var sites = [];
  while (!oExec.StdOut.AtEndOfStream) {
    var oneLine = oExec.StdOut.ReadLine();
    var line = ParseOneLine(oneLine);
    LogMessage("  site: " + line.name);
    sites.push(line);
  }
  return sites;
}
This works, but it briefly pops up a visible console window, which then disappears. It doesn't look very polished. I think I can avoid the console window by using shell.Run() instead of shell.Exec(), but shell.Run() doesn't give access to stdout, so I would have to redirect the output to a temporary file, then read the output. I haven't tried that yet. That may introduce some security issues; I'll have to see.
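For what it's worth, a sketch (untested, as noted above) of that shell.Run() variant: run the command with a hidden window, redirect stdout to a temporary file, then read the file back.
function GetAppCmdOutput() {
  var shell = new ActiveXObject("WScript.Shell");
  var fso = new ActiveXObject("Scripting.FileSystemObject");
  var windir = shell.Environment("system")("windir");
  // Build a temp file path to capture stdout
  var tmp = fso.BuildPath(fso.GetSpecialFolder(2).Path, fso.GetTempName());
  var cmd = "cmd /c " + windir + "\\system32\\inetsrv\\appcmd.exe list sites > \"" + tmp + "\"";
  shell.Run(cmd, 0, true); // 0 = hidden window, true = wait for exit
  var stream = fso.OpenTextFile(tmp, 1); // 1 = ForReading
  var output = stream.AtEndOfStream ? "" : stream.ReadAll();
  stream.Close();
  fso.DeleteFile(tmp);
  return output;
}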
Related:
Where and how should my CustomAction create and read a temporary file?
Yes, you can run appcmd from a custom action the same way you run any custom action that launches an exe. First off, you should author DirectorySearch/FileSearch elements to find the full path to the executable. Next, add a custom action with an ExeCommand attribute. You're probably trying to get feedback to the user, so leave it immediate. Also, think about using QuietExec in order not to show a console window to your users.
By the way, if my guess is correct, you're trying to do something like this. Hope this helps.

Change query argument of jQuery suggest plugin

This question is kind of crappy because I'm trying to get around some limitations:
The current JS sends an AJAX query with the following code:
jQuery('#searchbox').suggest('/?live=1');
What the server gets is the following query string:
?live=1&q=searchstring
Problem:
The server expects the query parameter to be named 's=', not 'q='.
I have to use the existing scripts, so what I'm trying to do is find a way to change 'q=' to 's=' in JavaScript, without altering the existing suggest plugin or the PHP search script.
Thanks.
The only way you can do that is by modifying the plugin. If you want to avoid changing the script permanently and need this only for one page, override the function temporarily:
$.suggest.suggest = function() {
  var q = $.trim($input.val());
  if (q.length >= options.minchars) {
    cached = checkCache(q);
    if (cached) {
      displayItems(cached['items']);
    } else {
      // This is the line we're changing: send the term as 's' instead of 'q'
      $.get(options.source, {s: q}, function(txt) {
        $results.hide();
        var items = parseTxt(txt, q);
        displayItems(items);
        addToCache(q, items, txt.length);
      });
    }
  } else {
    $results.hide();
  }
};
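Since the override above references variables that are private to the plugin's closure ($input, options, and so on), it may not work as-is from outside the plugin. Another option, assuming jQuery 1.5 or later, is a global prefilter that renames the parameter before the request is sent; at prefilter time the GET data is still the serialized string "q=searchstring":
// Rename the 'q' parameter to 's' for requests going to the suggest endpoint
$.ajaxPrefilter(function (options) {
  if (options.url.indexOf('live=1') !== -1 && typeof options.data === 'string') {
    options.data = options.data.replace(/(^|&)q=/, '$1s=');
  }
});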
