Converting href perl variables to normal scalar variables - javascript

I have these two variables that I am trying to compare. They both have the same value, however, one is a href variable - meaning, it's being read from a file like this
<a href=http://google.com>Variable</a>
It's read like this, but displayed as an anchor tag in the browser, so when I go to compare a value using print "$collect_zids{$key} --> $temp";I see in the browser as
Variable --> Variable
How it appears in the browser. One text another link.
I'm assuming these two values are different hence why this code does not run
if($collect_zids{$key} eq $from_picture){
print "<h1>Hello</h1>";
}
Is there a way I can convert the href variable into a normal scalar variable so that I can compare them?
Thanks!
P.S. I think Javascript might be the only way, however, I don't have any experience with it.

There is no such thing as an "href variable". You have two scalar variables. One contains plain text and the other contains HTML. Your task is to extract the text inside the HTML <a> tag from the HTML variable and to compare that text with the text from the plain text variable.
One way to do that would be to remove the HTML from the HTML variable.
my $html = '<a href=http://google.com>Variable</a>';
my $text = 'Variable';
$html =~ s/<.+?>//g;
if ($html eq $text) {
say "Equal";
} else {
say "Not Equal [$html/$text]";
}
But it cannot be emphasised enough that parsing HTML using a regular expression is very fragile and is guaranteed not to work in many cases. Far better to use a real HTML parser. HTML::Strip is made for this very purpose.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use HTML::Strip;
my $html = '<a href=http://google.com>Variable</a>';
my $text = 'Variable';
my $parser = HTML::Strip->new;
$html = $parser->parse($html);
if ($html eq $text) {
say "Equal";
} else {
say "Not Equal [$html/$text]";
}
It's also worth pointing out that this is answered in the Perl FAQ
How do I remove HTML from a string?
Use HTML::Strip, or HTML::FormatText which not only removes HTML but
also attempts to do a little simple formatting of the resulting plain
text.
Update: In a comment, you say
I have no way of using these methods since I am not explicitly defining the variable.
Which is clearly not true. How a variable is initialised has no bearing whatsoever on how you can use it.
I assume your HTML text is in the variable $from_picture, so you would strip the HTML with code like this:
my $parser = HTML::Strip->new;
my $stripped = $parser->parse($from_picture);
if($collect_zids{$key} eq $stripped){
print "<h1>Hello</h1>";
}
I have no idea where you got the idea that you couldn't use my solution because I was directly initialising the variables, where you were reading the data from a file. An important skill in programming is the ability to see through complex situations and extract the relevant details. It appears you need to do some more work in this area :-)

I found the answer using the Perl module HTML::FormatText;
use HTML::FormatText;
my $formatter = HTML::FormatText->new();
my $string = HTML::FormatText->format_file("path_to_the_file"); #$string variable to hold the result and the path must be for a file.
After using the HTML::FormatText module, I was able to get the raw string that was being read, instead of it being interpreted as HTML. So, I was getting <a href=http://google.com>Variable</a> returned, instead of just Variable. After getting the raw string, I could use regex to extract the parts that I needed.
Credit to - https://metacpan.org/pod/HTML::FormatText

Related

Setting an element value using HTML entities

I'm having an issue trying to set an input field's value when using HTML entities in that they are coming out literally as " rather than ".
Here is the code I am using:
document.getElementById('inputSurname').setAttribute('value', 'test"""');
in which the output is test""" though I want the output to be test""".
It doesn't look like a double-encoding issue since in the source code I am seeing it the same way I have set it here.
I know I could decode the value from its HTML entity format though this is something I want to avoid if possible for security.
Any help would be much appreciated :)
Try this:
document.getElementById('inputSurname').value = 'test"""';
Or if you want to keep &quot:
function myFunction() {
document.getElementById('myText').value = replaceQuot('test&quot&quot&quot');
}
function replaceQuot(string){
return string.replace('&quot', '""');
}
Or you can use escape characters.
document.getElementById("inputSurname").setAttribute("value", "test\"\"\"\"");
Well you could just write the new value as 'test"""'.
For other characters however, I'm going to refer you to this answer: HTML Entity Decode

Extracting inline javascript var in html source to a PHP array

There is some data that I need to get from a local crawled page. It's inline javascript like this:
<script type="text/javascript">
//<![CDATA[
var graph_data = {"query":{"cid":["13908"],"timestamp_from":1402531200,"timestamp_till":1402531200,"views":1138942,"data":
etc, the variable is very long but you get the idea. I want to put "graph_data" into an array called $data. What is the best way to do this? I should add this is all being done by me locally & I don't need to execute any javascript code, just extract the data.
I have make a suggestion purely as an idea with some code worth trying!
As suggested, the DOM Document may provide this much easier, but here's another possible solution for extracting...
I'm guessing that the var graph_data is a JSON string that you want to extract from the HTML page so that you can store as a PHP var. The problem is that your code doesn't show how the variable ends but I'm going to assume that a new line break would be the way to identify the end of the variable being set. It may be a semi-colon though and if it is, instead of "\r\n" try ";"
// Assuming your html code is stored in $html...
preg_match("/var graph_data[^\{]*?(\{[^\r\n]+?)/is",$html,$match);
print "<pre>";
print_r($match);
print "</pre>";
This should result in $match[1] storing the part you need.
You can try passing that into json_decode() but that's touching some wood with crossed fingers.
Good luck...

Can something help me to see how to deal with single quote escaping in the following scenario

We write js programs for clients which allow them to craft the display text. Here is what we did
We have a raw js file which replaced those strings with tokens, for example
month = [_MonthToken_];
name = '_NameToken_';
and have a xml file to allow user to specify the text like
<xml>
<token name="MonthToken">'Jan','Feb','March'</token>
<token name="NameToken">Alice</token>
</xml>
and have a generator to replace the token with the text and generate the final js file.
month = ['Jan','Feb','March'];
name = 'Alice';
However, I found there is a bug in this scenario. When somebody specifies the name to be "D'Angelo" (for example.) the js will run into a error because the name variable will become
name='D'Angelo'
We have thought of several ways to fix the problem but none of which are perfect.
We may ask our clients to escape the characters, may it seems not appropriate given that they may not know js and there are more cases to escape (", ), which could make them unhappy :|
We also think of changing the generator to escape ', but sometimes the text may be replacing an array, the single quote there should not be escaped. (there are other cases, we may detect it case by case, but it is tedious)
We may have done something wrong for the whole scenario/architecture. but we don't want to change that unless we have confirmed that it is definitely necessary.
So, is there any solution? I will look into every ideas. Thank you in advanced!
(I may also need a better title :P)
I think your xml schema is poor designed, and this is the root cause of your problems.
Basically, you are forcing the author of the xml to put Javascript code inside of the name="MonthToken" element, while you pretend that she can do this without Javascript syntax knowledgement. I guess that you are planning to use eval on the parsed element content to build month and name variables.
The problem you discovered it's not the only one: you also are subject to Javascript code injection: what if a user forge an element such as:
<token name="MonthToken">alert('put some evil instruction here')</token>
I would suggest to change the xml schema in this way:
<xml>
<token name="MonthToken">Jan</token>
<token name="MonthToken">Feb</token>
<token name="MonthToken">March</token>
<token name="NameToken">Alice</token>
</xml>
Then in your generator, you'll have to parse each MonthToken element content, and add it to the month array. Do the same for the name variable.
In this way:
You don't use eval, and so you have no possibility of code injection
Your user doesn't no more have to know how to quote month names
You automatically handle quotes or apostrophe in names, because you are not using them as js code.
If you want month variable to become a string when user enter just a month, then simply transform the variable: with something similar to this:
if (month.length == 1) {
month = month[0];
}

Javascript creating an HTML object vs creating an HTML string

Pretty simple question that I couldn't find an answer to, maybe because it's a non-issue, but I'm wondering if there is a difference between creating an HTML object using Javascript or using a string to build an element. Like, is it a better practice to declare any HTML elements in JS as JS objects or as strings and let the browser/library/etc parse them? For example:
jQuery('<div />', {'class': 'example'});
vs
jQuery('<div class="example></div>');
(Just using jQuery as an example, but same question applies for vanilla JS as well.)
It seems like a non-issue to me but I'm no JS expert, and I want to make sure I'm doing it right. Thanks in advance!
They're both "correct". And both are useful at different times for different purposes.
For instance, in terms of page-speed, these days it's faster to just do something like:
document.body.innerHTML = "<header>....big string o' html text</footer>";
The browser will spit it out in an instant.
As a matter of safety, when dealing with user-input, it's safer to build elements, attach them to a documentFragment and then append them to the DOM (or replace a DOM node with your new version, or whatever).
Consider:
var userPost = "My name is Bob.<script src=\"//bad-place.com/awful-things.js\"></script>",
paragraph = "<p>" + userPost + "</p>";
commentList.innerHTML += paragraph;
Versus:
var userPost = "My name is Bob.<script src=\"//bad-place.com/awful-things.js\"></script>",
paragraph = document.createElement("p");
paragraph.appendChild( document.createTextNode(userPost) );
commentList.appendChild(paragraph);
One does bad things and one doesn't.
Of course, you don't have to create textNodes, you could use innerText or textContent or whatever (the browser will create the text node on its own).
But it's always important to consider what you're sharing and how.
If it's coming from anywhere other than a place you trust (which should be approximately nowhere, unless you're serving static pages, in which case, why are you building html?), then you should keep injection in mind -- only the things you WANT to be injected should be.
Either can be preferable depending on your particular scenario—ie, if everything is hard-coded, option 2 is probably better, as #camus said.
One limitation with the first option though, is that this
$("<div data-foo='X' />", { 'class': 'example' });
will not work. That overload expects a naked tag as the first parameter with no attributes at all.
This was reported here
1/ is better if your attribubes depends on variables set before calling the $ function , dont have to concatenate strings and variables. Aside from that fact ,since you can do both , and it's just some js code somebody else wrote , not a C++ DOM API hardcoded in the browser...

JavaScript multiline strings and templating?

I have been wondering if there is a way to define multiline strings in JavaScript like you can do in languages like PHP:
var str = "here
goes
another
line";
Apparently this breaks up the parser. I found that placing a backslash \ in front of the line feed solves the problem:
var str = "here\
goes\
another\
line";
Or I could just close and reopen the string quotes again and again.
The reason why I am asking because I am making JavaScript based UI widgets that utilize HTML templates written in JavaScript. It is painful to type HTML in strings especially if you need to open and close quotes all the time. What would be a good way to define HTML templates within JavaScript?
I am considering using separate HTML files and a compilation system to make everything easier, but the library is distributed among other developers so that HTML templates have to be easy to include for the developers.
No thats basically what you have to do to do multiline strings.
But why define the templates in javascript anwyay? why not just put them into a file and have a ajax call load them up in a variable when you need them?
For instantce (using jquery)
$.get('/path/to/template.html', function(data) {
alert(data); //will alert the template code
});
#slebetman, Thanks for the detailed example.
Quick comment on the substitute_strings function.
I had to revise
str.replace(n,substitutions[n]);
to be
str = str.replace(n,substitutions[n]);
to get it to work. (jQuery version 1.5? - it is pure javascript though.)
Also when I had below situation in my template:
$CONTENT$ repeated twice $CONTENT$ like this
I had to do additional processing to get it to work.
str = str.replace(new RegExp(n, 'g'), substitutions[n]);
And I had to refrain from $ (regex special char) as the delimiter and used # instead.
Thought I would share my findings.
There are several templating systems in javascript. However, my personal favorite is one I developed myself using ajax to fetch XML templates. The templates are XML files which makes it easy to embed HTML cleanly and it looks something like this:
<title>This is optional</title>
<body><![CDATA[
HTML content goes here, the CDATA block prevents XML errors
when using non-xhtml html.
<div id="more">
$CONTENT$ may be substituted using replace() before being
inserted into $DOCUMENT$.
</div>
]]></body>
<script><![CDATA[
/* javascript code to be evaled after template
* is inserted into document. This is to get around
* the fact that this templating system does not
* have its own turing complete programming language.
* Here's an example use:
*/
if ($HIDE_MORE$) {
document.getElementById('more').display = 'none';
}
]]></script>
And the javascript code to process the template goes something like this:
function insertTemplate (url_to_template, insertion_point, substitutions) {
// Ajax call depends on the library you're using, this is my own style:
ajax(url_to_template, function (request) {
var xml = request.responseXML;
var title = xml.getElementsByTagName('title');
if (title) {
insertion_point.innerHTML += substitute_strings(title[0],substitutions);
}
var body = xml.getElementsByTagName('body');
if (body) {
insertion_point.innerHTML += substitute_strings(body[0],substitutions);
}
var script = xml.getElementsByTagName('script');
if (script) {
eval(substitute_strings(script[0],substitutions));
}
});
}
function substitute_strings (str, substitutions) {
for (var n in substitutions) {
str.replace(n,substitutions[n]);
}
return str;
}
The way to call the template would be:
insertTemplate('http://path.to.my.template', myDiv, {
'$CONTENT$' : "The template's content",
'$DOCUMENT$' : "the document",
'$HIDE_MORE$' : 0
});
The $ sign for substituted strings is merely a convention, you may use % of # or whatever delimiters you prefer. It's just there to make the part to be substituted unambiguous.
One big advantage to using substitutions on the javascript side instead of server side processing of the template is that this allows the template to be plain static files. The advantage of that (other than not having to write server side code) is that you can then set the caching policy for the template to be very aggressive so that the browser only needs to fetch the template the first time you load it. Subsequent use of the template would come from cache and would be very fast.
Also, this is a very simple example of the implementation to illustrate the mechanism. It's not what I'm using. You can modify this further to do things like multiple substitution, better handling of script block, handle multiple content blocks by using a for loop instead of just using the first element returned, properly handling HTML entities etc.
The reason I really like this is that the HTML is simply HTML in a plain text file. This avoids quoting hell and horrible string concatenation performance issues that you'll usually find if you directly embed HTML strings in javascript.
I think I found a solution I like.
I will store templates in files and fetch them using AJAX. This works for development stage only. For production stage, the developer has to run a compiler once that compiles all templates with the source files. It also compiles JavaScript and CSS to be more compact and it compiles them to a single file.
The biggest problem now is how to educate other developers doing that. I need to build it so that it is easy to do and understand why and what are they doing.
You could also use \n to generate newlines. The html would however be on a single line and difficult to edit. But if you generate the JS using PHP or something it might be an alternative

Categories