I'm trying to find plaintext JSON within a webpage using JavaScript. The JSON appears as plain text in the browser, but it may be split across separate HTML elements. Example:
<div>
{"kty":"RSA","e":"AQAB","n":"mZT_XuM9Lwn0j7O_YNWN_f7S_J6sLxcQuWsRVBlAM3_5S5aD0yWGV78B-Gti2MrqWwuAhb_6SkBlOvEF8-UCHR_rgZhVR1qbrxvQLE_zpamGJbFU_c1Vm8hEAvMt9ZltEGFS22BHBW079ebWI3PoDdS-DJvjjtszFdnkIZpn4oav9fzz0
</div>
<div>
xIaaxp6-qQFjKXCboun5pto59eJnn-bJl1D3LloCw7rSEYQr1x5mxhIxAFVVsNGuE9fjk0ueTDcMUbFLPYn6PopDMuN0T1B2D1Y8ClItEVbVDFb-mRPz8THJ_gexJ8C20n8m-pBlpL4WyyPuY2ScDugmfG7UnBGrDmS5w"}
</div>
I've tried to use this RegEx.
{"?\w+"?:[^}<]+(?:(?:(?:<\/[^>]+>)[^}<]*(?:<[^>]+>)+)*[^}<]*)*}
But it fails on nested JSON.
I could also use JavaScript to count the { and } characters to find where the JSON actually ends, but there must be a better option than this slow and clumsy approach.
Many thanks
Update:
Perhaps there isn't a better way to do this. Below is my current code (a bit verbose, but probably needed):
let regex = /{\s*"\w+"\s*:/g;
// Consider both open and close curly brackets
let brackets = /[{}]/g;
let arr0, arr;
// Try to parse every matching JSON
arr0 = regex.exec(body);
if (arr0 === null) { // Nothing found
  return new Promise(resolve => resolve());
}
try {
  brackets.lastIndex = regex.lastIndex; // After beginning of current JSON
  let count = 1;
  // Count { and } to find the end of the JSON.
  // Note: braces inside string values are counted too.
  while ((count !== 0) && ((arr = brackets.exec(body)) !== null)) {
    count += (arr[0] === "{" ? 1 : -1);
  }
  // If exec() ran out of matches, lastIndex was reset to 0: the JSON is incomplete
  if (count !== 0) {
    throw new Error("Unbalanced braces: no complete JSON found");
  }
  // Complete JSON found when count === 0
  let lastIdx = brackets.lastIndex;
  let json = body.substring(arr0.index, lastIdx);
  try {
    let parsed = JSON.parse(json);
    // Process the JSON here to get the original message
  } catch (error) {
    console.log(error);
  }
  ...
} catch (err) {
  console.log(err);
}
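The brace counter above also counts { and } that appear inside string values (for example "note":"a } b"), which ends the match too early or too late. A minimal sketch of a string-aware scanner, assuming the input is already plain text with the tags removed (for example taken from an element's textContent); extractJsonObjects is a hypothetical helper name:

```javascript
// Hypothetical helper: scan plain text and return every balanced {...}
// span that JSON.parse accepts. Braces inside JSON strings are ignored.
function extractJsonObjects(text) {
  const results = [];
  let start = text.indexOf("{");
  while (start !== -1) {
    let depth = 0;
    let inString = false;
    let escaped = false;
    let end = -1;
    for (let i = start; i < text.length; i++) {
      const ch = text[i];
      if (inString) {
        if (escaped) {
          escaped = false;          // character after a backslash is literal
        } else if (ch === "\\") {
          escaped = true;
        } else if (ch === '"') {
          inString = false;         // closing quote of a string
        }
      } else if (ch === '"') {
        inString = true;            // opening quote of a string
      } else if (ch === "{") {
        depth++;
      } else if (ch === "}") {
        depth--;
        if (depth === 0) {
          end = i;                  // matching close brace found
          break;
        }
      }
    }
    let parsed;
    let ok = false;
    if (end !== -1) {
      try {
        parsed = JSON.parse(text.slice(start, end + 1));
        ok = true;
      } catch (e) {
        // balanced braces, but not valid JSON
      }
    }
    if (ok) {
      results.push(parsed);
      start = text.indexOf("{", end + 1);   // continue after this object
    } else {
      start = text.indexOf("{", start + 1); // retry from the next brace
    }
  }
  return results;
}

const found = extractJsonObjects('prefix {"kty":"RSA","note":"a } b","nested":{"x":1}} suffix');
console.log(found.length); // 1
```

For the split key in the question, the line break between the two div elements lands inside the "n" value, and JSON does not allow raw line breaks inside strings, so that break still has to be removed before parsing.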
That's not possible in a clean way, but it might be possible to take a parent element's innerText and parse that:
console.log(JSON.parse(document.getElementById('outer').innerText.replace(/\s/g, '')));
<div id="outer">
<div>
{"kty":"RSA","e":"AQAB","n":"mZT_XuM9Lwn0j7O_YNWN_f7S_J6sLxcQuWsRVBlAM3_5S5aD0yWGV78B-Gti2MrqWwuAhb_6SkBlOvEF8-UCHR_rgZhVR1qbrxvQLE_zpamGJbFU_c1Vm8hEAvMt9ZltEGFS22BHBW079ebWI3PoDdS-DJvjjtszFdnkIZpn4oav9fzz0
</div>
<div>
xIaaxp6-qQFjKXCboun5pto59eJnn-bJl1D3LloCw7rSEYQr1x5mxhIxAFVVsNGuE9fjk0ueTDcMUbFLPYn6PopDMuN0T1B2D1Y8ClItEVbVDFb-mRPz8THJ_gexJ8C20n8m-pBlpL4WyyPuY2ScDugmfG7UnBGrDmS5w"}
</div>
</div>
But it's likely to fail sometimes; for example, removing all whitespace also removes spaces inside string values.
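Removing every whitespace character turns a value like "my key" into "mykey". A gentler option, assuming the only unwanted whitespace is the line break (plus indentation) introduced where the text was split across elements, is to delete just those line breaks. joinSplitText is a hypothetical helper name, and a plain string stands in for innerText so the sketch runs outside a browser:

```javascript
// Hypothetical helper: delete line breaks together with the spaces or tabs
// around them, but keep ordinary spaces inside a line.
function joinSplitText(text) {
  return text.replace(/[ \t]*\r?\n[ \t]*/g, "");
}

// Stand-in for element.innerText: the "n" value is split over two lines.
const innerText = '{"kty":"RSA","label":"my key","n":"mZT_XuM9\n  xIaaxp6"}';

const parsed = JSON.parse(joinSplitText(innerText));
console.log(parsed.label); // my key  (a blanket /\s/g replace would give mykey)
console.log(parsed.n);     // mZT_XuM9xIaaxp6
```

This still assumes the split never falls exactly on a space that belongs to the value; if it does, that space is lost as well.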
I'm trying to compare a name retrieved from a JSON object with a name as it exists on a Google Sheet. Try as I might, I can't get a comparison that returns true.
I've tried:
IndexOf
localeCompare
==
===
I've tried
key===value
String(key)===String(value) and
String(key).valueOf()===String(value).valueOf().
I've also tried calling trim() on everything to make sure there is no leading/trailing whitespace (there isn't, as confirmed by comparing their lengths).
As you can see from the screenshot, the values of key and value are exactly the same.
Any pointers would be gratefully received. This has held up my project for days!
This is not a solution, but it might help find the problem. Perhaps the characters in one string or the other are not what you think. Visually they look the same, but what if one has a tab instead of a space? Try this: list the character codes for each string and see whether any character has a different value.
I've added another option that eliminates the for loop, thanks to @TheMaster.
Script (Option 1)
function test() {
try {
let text = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Test").getRange("A1").getValue();
console.log(text);
let code = [];
for( let i=0; i<text.length; i++ ) {
code.push(text.charCodeAt(i))
}
console.log(code.join());
}
catch(err) {
console.log(err);
}
}
Script (Option 2)
function test() {
try {
let text = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Test").getRange("A1").getValue();
console.log(text);
let code = [];
[...text].forEach( char => code.push(char.charCodeAt(0)) );
console.log(code.join());
}
catch(err) {
console.log(err);
}
}
Execution log
7:57:53 AM Notice Execution started
7:57:56 AM Info Archie White
7:57:56 AM Info 65,114,99,104,105,101,32,87,104,105,116,101
7:57:54 AM Notice Execution completed
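Instead of comparing two lists of codes by eye, a small helper can report where the two strings first differ. A frequent culprit is a non-breaking space (code 160) where a normal space (code 32) is expected. This is plain JavaScript, so it should run in Apps Script or Node; firstDifference and normalizeName are hypothetical helper names, and which characters to normalize is an assumption about your data:

```javascript
// Hypothetical helper: report the first index where two strings differ,
// with the UTF-16 code unit of each side (NaN past the end of the shorter one).
function firstDifference(a, b) {
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    if (a.charCodeAt(i) !== b.charCodeAt(i)) {
      return { index: i, codeA: a.charCodeAt(i), codeB: b.charCodeAt(i) };
    }
  }
  return null; // the strings are identical
}

// Hypothetical normalization: non-breaking spaces become plain spaces,
// Unicode is normalized to NFC, and surrounding whitespace is trimmed.
function normalizeName(s) {
  return String(s).replace(/\u00A0/g, " ").normalize("NFC").trim();
}

const fromJson = "Archie\u00A0White"; // prints exactly like "Archie White"
const fromSheet = "Archie White";
console.log(fromJson === fromSheet);                               // false
console.log(firstDifference(fromJson, fromSheet));                 // { index: 6, codeA: 160, codeB: 32 }
console.log(normalizeName(fromJson) === normalizeName(fromSheet)); // true
```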
Compare
function compare() {
const json = '{"values":[["good","bad","ugly"]]}';
const obj = JSON.parse(json);
const ss = SpreadsheetApp.getActive();
const sh = ss.getSheetByName("Sheet0");
const [hA,...vs] = sh.getDataRange().getDisplayValues();
let matches = [];
vs.forEach((r,i) => {
r.forEach((c,j) => {
let idx = obj.values[0].indexOf(c);
if(~idx) {
matches.push({row:i+1,col:j+1,index: idx});
}
})
})
Logger.log(JSON.stringify(matches))
}
Execution log
9:31:50 AM Notice Execution started
9:31:52 AM Info [{"row":1,"col":1,"index":0},{"row":2,"col":1,"index":1},{"row":3,"col":1,"index":2}]
9:31:51 AM Notice Execution completed
Sheet1:
COL1
good
bad
ugly
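In compare(), ~idx is shorthand for idx !== -1: ~-1 is 0 (falsy), and ~n is non-zero for every valid index. The same matching logic can be tried without a spreadsheet by passing a plain 2D array in place of getDisplayValues(). findMatches is a hypothetical name, and, like the script above, it assumes the first row is a header:

```javascript
// Hypothetical helper mirroring compare(): values is a 2D array like the one
// getDisplayValues() returns; candidates is the array to search in.
function findMatches(values, candidates) {
  const [, ...rows] = values; // skip the header row, as compare() does
  const matches = [];
  rows.forEach((row, i) => {
    row.forEach((cell, j) => {
      const idx = candidates.indexOf(cell);
      if (idx !== -1) { // equivalent to if (~idx)
        matches.push({ row: i + 1, col: j + 1, index: idx });
      }
    });
  });
  return matches;
}

const sheetValues = [["COL1"], ["good"], ["bad"], ["ugly"]];
const obj = JSON.parse('{"values":[["good","bad","ugly"]]}');
console.log(JSON.stringify(findMatches(sheetValues, obj.values[0])));
// [{"row":1,"col":1,"index":0},{"row":2,"col":1,"index":1},{"row":3,"col":1,"index":2}]
```

Because the header is skipped, row counts data rows; the actual spreadsheet row is row + 1.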
Purpose:
The user inputs information. The script checks whether appendJSON.json has any contents and returns either the contents of the .json file or [].
Problem:
When appendJSON.json is empty, the comparison findNote === '' never succeeds, so the if statement never reaches return [].
If there is something in appendJSON.json, the else branch runs as intended and returns findNote.
Attempts:
I tried comparing findNote to '', null and undefined. Same outcome: what was returned was nothing.
1st code block: accepts input, then checks the .json file for any contents.
function addNote(argv) {
  const newSubmission = argv;
  const getLibrary = fetchNotes().toString();
  // log to see what is coming back from fetchNotes()
  log(getLibrary);
}
2nd code block, fetchNotes():
const fs = require("fs");
function fetchNotes() {
const findNote = fs.readFileSync("./appendJSON.json");
if (findNote === "") {
return [];
} else {
return findNote;
}
}
I've tried slightly refactoring the code to remove the else statement:
function fetchNotes() {
const findNote = fs.readFileSync("./appendJSON.json");
if (findNote === "") {
return [];
}
return findNote;
}
Since you're not providing the encoding option, readFileSync returns a Buffer, not a string. A Buffer will never be === to ''. You probably wanted:
const findNote = fs.readFileSync('./appendJSON.json', 'utf8');
...but that assumes the contents are UTF-8, not (say) Windows-1252 or ISO-8859-1. Be sure you don't assume incorrectly, or characters outside the ASCII range will come out corrupted.
Side note: It seems quite odd to return an empty array if the file is empty, but a Buffer or string if it isn't.
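Putting the answer together, a minimal sketch of fetchNotes that reads the file as UTF-8 text and parses it, so it returns an array in both cases. The filePath parameter, the assumption that the file holds a JSON array, and treating a missing file (ENOENT) as empty are all additions for illustration:

```javascript
const fs = require("fs");

// Sketch: always return an array, never a Buffer.
// Assumes the file contains a JSON array whenever it is not empty.
function fetchNotes(filePath = "./appendJSON.json") {
  let text;
  try {
    text = fs.readFileSync(filePath, "utf8"); // with an encoding: a string, not a Buffer
  } catch (err) {
    if (err.code === "ENOENT") return []; // assumption: missing file means no notes
    throw err;
  }
  if (text.trim() === "") return []; // empty or whitespace-only file
  return JSON.parse(text);
}
```

Returning parsed JSON in both branches also addresses the side note: callers always get an array.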
selectedContentWrap: HTML nodes.
htmlVarTag: a string.
How do I check whether the HTML element exists in the nodes?
htmlVarTag is a string, and I don't understand how to convert it so I can check whether there is a tag like that, so that if there is, I can remove it.
Here is the output of the nodes stored in selectedContentWrap:
var checkingElement = $scope.checkIfHTMLinside(selectedContentWrap,htmlVarTag );
$scope.checkIfHTMLinside = function(selectedContentWrap,htmlVarTag){
var node = htmlVarTag.parentNode;
while (node != null) {
if (node == selectedContentWrap) {
return true;
}
node = node.parentNode;
}
return false;
}
Well, if you could paste the contents of selectedContentWrap I would be able to test this code, but I think this would work:
// Returns true if any node in selectedContentWrap has the tag name htmlVarTag
var checkIfHTMLinside = function(selectedContentWrap,htmlVarTag){
  for (const item of selectedContentWrap) {
    if (item.nodeName.toLowerCase() == htmlVarTag.toLowerCase()){
      return true;
    }
  }
  return false;
}
Simplest is to use angular.element, which provides a subset of jQuery-compatible methods (jqLite's find() is limited to lookups by tag name, which is what is needed here):
$scope.checkIfHTMLinside = function(selectedContentWrap,htmlVarTag){
  // use filter() on the array and return the filtered array's length as a boolean
  return selectedContentWrap.filter(function(str){
    // return the length of the found tag collection as a boolean
    return angular.element('<div>').append(str).find(htmlVarTag).length
  }).length > 0;
};
It's still not 100% clear whether the objective is to look only for a specific tag or for any tags (i.e. to distinguish from text only).
Or, as casually mentioned, to actually remove the tag.
If you want to remove the tag, it's not clear whether you simply want to unwrap it or also remove its content... both are easily achieved using angular.element.
Try using: node.innerHTML and checking against that
Is it just me, or do I always figure it out 20 minutes after posting a question on Stack Overflow...
The answer is that selectedContentWrap already holds a list of nodes, so all I need to do is compare them; a simple for loop with an if will do.
To compare the names I just need to use .nodeName, as that works cross-browser (correct me if I'm wrong).
Some devs suggest using a "dictionary of tag names and anonymous closures" instead, but I couldn't find anything about it. If anyone has this library, could you please post it to the question?
Here is my code:
var node = selectedContentWrap;
console.log('node that is selectedwrapper', selectedContentWrap)
for (var i = 0; i < selectedContentWrap.length; i++) {
console.log('tag name is ',selectedContentWrap[i].nodeName);
var temptagname = selectedContentWrap[i].nodeName; // for debugging
if(selectedContentWrap[i].nodeName == 'B' ){
console.log('contains element B');
}
}
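The same loop can be written with Array.from and some(), which accepts arrays and NodeLists alike and stops at the first match. containsTag is a hypothetical name, and plain objects with a nodeName stand in for DOM nodes so the sketch runs outside a browser. Comparing in upper case matters because nodeName is upper case for HTML elements in an HTML document but keeps its original case in XML documents:

```javascript
// Hypothetical helper: true if any node in the list has the given tag name.
// Accepts an array or an array-like such as a NodeList; case-insensitive.
function containsTag(nodes, tagName) {
  const wanted = tagName.toUpperCase();
  return Array.from(nodes).some(
    node => typeof node.nodeName === "string" && node.nodeName.toUpperCase() === wanted
  );
}

// Stand-ins for DOM nodes; text nodes report "#text" as their nodeName.
const selectedContentWrap = [{ nodeName: "#text" }, { nodeName: "B" }, { nodeName: "I" }];
console.log(containsTag(selectedContentWrap, "b"));    // true
console.log(containsTag(selectedContentWrap, "span")); // false
```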
First of all, I am not an expert in JavaScript; in fact, I am a newbie.
I know PHP, which has the regex functions preg_match() and preg_match_all(); the latter gets all occurrences of a pattern.
On the internet I found many resources that show how to get all occurrences in a string. But when I do several regex matches, it looks ugly to me.
This is what I found on the internet:
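var fileList = []
var matches
// Create each regex once, outside the loop: a literal in the loop condition is
// a new object on every pass, so lastIndex resets and the loop never ends.
var itemRegex = /<item id="(.*?)" href="(.*?)" media-type="(?:.*?)"\/>/g
while ((matches = itemRegex.exec(data)) !== null) {
  fileList.push({id: matches[1], file: matches[2]})
}
var fileOrder = []
var itemrefRegex = /<itemref idref="(.*?)"\/>/g
while ((matches = itemrefRegex.exec(data)) !== null) {
  fileOrder.push({id: matches[1]})
}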
Is there a more elegant way other than this code?
Using regexes on HTML is generally held to be a bad idea, because regexes lack the power to reliably match arbitrarily nested a^n b^n structures such as balanced parentheses or HTML/XML open/close tags. It's also trivially easy to get data out of the DOM in JavaScript without treating it as a string; that's what the DOM is for. For example:
let mapOfIDsToFiles = Array.from(document.querySelectorAll('item'))
  .reduce((obj, item) => {
    // <item> is not a standard HTML element, so read href as an attribute
    obj[item.id] = item.getAttribute('href');
    return obj;
  }, {});
This has the added advantage of being much faster, simpler, and more robust. DOM access is slow, but you'll be accessing the DOM anyway to get the HTML you run your regexes over.
Modifying built-in prototypes like String.prototype is generally held to be a bad idea, because it can cause random breakages with third-party code that defines the same function but differently, or if the JavaScript standard gets updated to include that function but it works differently.
UPDATE
If the data is already a string, you can easily turn it into a DOM element without affecting the page:
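let div = document.createElement('div');
div.innerHTML = data;
div.querySelectorAll('item'); // gives you all the item elements
As long as you don't append it to the document, it's just a JavaScript object in memory. (If data is untrusted, note that event-handler attributes such as onerror on an img can still fire on a detached element; parsing with DOMParser avoids that.)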
UPDATE 2
Yes, this also works for XML but converting it to DOM is slightly more complicated:
// define the function differently if IE, both do the same thing
let parseXML = (typeof window.DOMParser !== 'undefined' && typeof window.XMLDocument !== 'undefined') ?
  xml => ( new window.DOMParser() ).parseFromString(xml, 'text/xml') :
  xml => {
    let xmlDoc = new window.ActiveXObject('Microsoft.XMLDOM');
    xmlDoc.async = false;
    xmlDoc.loadXML(xml);
    return xmlDoc;
  };
let xmlDoc = parseXML(data).documentElement;
let items = Array.from(xmlDoc.querySelectorAll('item'));
Note that if the parse fails (i.e. your document was malformed) then you will need to check for the error document like so:
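// check for error document (DOMParser does not throw on malformed XML)
(() => {
  // Firefox makes <parsererror> the root; Chrome inserts it inside the root,
  // so search the whole document rather than a fixed position
  let errorNode = xmlDoc.ownerDocument.getElementsByTagName('parsererror')[0];
  if (errorNode) {
    throw new Error(errorNode.textContent);
  }
})();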
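I came up with the idea of creating a method on String.
I wrote a String.prototype method that simplifies things for me:
String.prototype.getMatches = function(regex, callback) {
  // exec() only advances with the g flag; without it this loop would never end
  if (!regex.global) throw new TypeError('getMatches requires a global (g) regex')
  regex.lastIndex = 0 // start at the beginning even if the regex was used before
  var matches = []
  var match
  while ((match = regex.exec(this)) !== null) {
    if (callback)
      matches.push(callback(match))
    else
      matches.push(match)
  }
  return matches
}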
Now I can get all matches in a more elegant way. It also resembles PHP's preg_match_all() function.
var fileList = data.getMatches(/<item id="(.*?)" href="(.*?)" media-type="(?:.*?)"\/>/g, function(matches) {
return {id: matches[1], file: matches[2]}
})
var fileOrder = data.getMatches(/<itemref idref="(.*?)"\/>/g, function(matches) {
return matches[1]
})
I hope this helps you too.
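On engines that support ES2020, String.prototype.matchAll gives a preg_match_all-style result without modifying String.prototype. It requires the g flag and throws a TypeError otherwise, which also rules out the endless loop a non-global regex causes in an exec() loop. A sketch; getMatches here is a hypothetical standalone function and the sample data string is invented for illustration:

```javascript
// Standalone alternative to extending String.prototype.
// matchAll returns an iterator of match arrays; Array.from maps each one.
function getMatches(text, regex, mapFn = m => m) {
  return Array.from(text.matchAll(regex), mapFn);
}

const data =
  '<item id="c1" href="ch1.xhtml" media-type="application/xhtml+xml"/>' +
  '<item id="c2" href="ch2.xhtml" media-type="application/xhtml+xml"/>' +
  '<itemref idref="c2"/><itemref idref="c1"/>';

const fileList = getMatches(
  data,
  /<item id="(.*?)" href="(.*?)" media-type="(?:.*?)"\/>/g,
  m => ({ id: m[1], file: m[2] })
);
const fileOrder = getMatches(data, /<itemref idref="(.*?)"\/>/g, m => m[1]);

console.log(fileList);  // [ { id: 'c1', file: 'ch1.xhtml' }, { id: 'c2', file: 'ch2.xhtml' } ]
console.log(fileOrder); // [ 'c2', 'c1' ]
```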
Trying to convert the following function to CoffeeScript:
var parse = function (elem) {
for each(var subelem in elem) {
if (subelem.name() !== null ) {
console.log(subelem.name());
if (subelem.children().length() > 0) {
parse(subelem);
}
} else {
console.log(subelem);
}
}
};
var xml = new XML(content);
parse(xml);
It merely prints the element tags and any text to the console.
Tried using:
parse = (elem) ->
if elem.name()?
console.log elem.name()
if elem.children().length() > 0
parse subelem for own elkey, subelem of elem
else
console.log elem
xml = new XML content
parse subelem for own elkey, subelem of xml
But it never seems to parse anything under the root XML node, and it ends up in infinite recursion, continuously printing the root node's tag until it blows up. Any ideas as to what I am doing wrong? Thanks.
Hmm. I tested this, and the issue seems to go away if you drop the own keyword, which adds a hasOwnProperty check. Somehow, the first child of each element seems to pass that check, while others fail it. I'm a bit mystified by this, but there's your answer.