JS: Regex - replace parts of domains in full text - javascript

I'm confronted with a problem. I have a whole HTML page stored in one variable and I would like to change certain URLs that meet a criterion.
We are looking only for a certain top-level domain, let's say ".XYZ", and if we find it we would like the ending to become ".XYZ.ABC", i.e. just append ".ABC".
For example link would be changed to link
But if domain already has ".abc" ending we should leave it alone.
I would like to change text of all links, so src, href and also js values like var link="123.xyz/1.bmp";
Other examples
www.123.xyz -> www.123.xyz.abc
abc.xyz/1.bmp -> abc.xyz.abc/1.bmp
eee.xyz.abc -> eee.xyz.abc

Something like this should work:
text.replace(/\.xyz(?!\.abc)/g, ".xyz.abc")
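A slightly stricter variant of the one-liner above, as a minimal sketch. It assumes matching should be case-insensitive (the question writes both ".XYZ" and ".abc") and that ".xyz" must end at a word boundary so that something like ".xyzzy" is left alone. The function name addAbcSuffix is made up for illustration.

```javascript
// Append ".abc" to every ".xyz" that is not already followed by ".abc".
// Assumptions: case-insensitive matching; ".xyz" must end at a word boundary.
function addAbcSuffix(text) {
  // "$&" re-inserts the matched text, so the original casing of ".xyz" is kept.
  return text.replace(/\.xyz\b(?!\.abc\b)/gi, "$&.abc");
}

console.log(addAbcSuffix("www.123.xyz"));   // www.123.xyz.abc
console.log(addAbcSuffix("abc.xyz/1.bmp")); // abc.xyz.abc/1.bmp
console.log(addAbcSuffix("eee.xyz.abc"));   // eee.xyz.abc
```

Like the original pattern, this looks at plain text only, so a file name such as "notes.xyz" inside a path would be rewritten as well.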

Related

How to find a unique string within html and wrap it with a tag, but exclude links and urls

I'm looking for a way to find a specific string in the visible text of a page and wrap that string in <em> tags. I have tried using HTML Agility Pack and had some success with a Regex.Replace, but if the string appears within a URL or an image name it also gets replaced, which I do not want because it breaks the link or image URL.
An example attempt:
var markup = Encoding.UTF8.GetString(buffer);
var replaced = Regex.Replace(markup, "product-xs", " <em>product</em>-xs", RegexOptions.IgnoreCase);
var output = Encoding.UTF8.GetBytes(replaced);
_stream.Write(output, 0, output.Length);
This does not work as it would replace an <a href="product/product-xs"> with <a href="product/<em>product</em>-xs"> - which I don't want.
The string is coming from a text string value within a CMS so the user can't wrap the words there and ideally, I want to catch all instances of the word that are already published.
Ideally I would want to exclude <title> tags, <img> tags and <a> tags, everything else should get the wrapped tag.
Before I used the HTML Agility Pack, a fellow front end dev tried it with JavaScript but that had an unexpected impact on dropdown menus.
If you need any more info, just ask.
You can use HTML Agility Pack to select only the text nodes (i.e. the text that exists between any two tags) with a bit of XPath and modify them like this.
Looking only inside body excludes <title>, <meta>, etc. The not(self::script) predicate skips script tags; you can exclude other tags (such as a) in the same way, or check the parent node inside the loop.
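// Visit every text node under <body> except those directly inside <script>.
foreach (HtmlNode node in htmlDoc.DocumentNode.SelectNodes("//body//*[not(self::script)]/text()"))
{
    // Build a replacement node with the target string wrapped in <em>.
    var newNode = htmlDoc.CreateTextNode(node.InnerText.Replace("product-xs", "<em>product</em>-xs"));
    node.ParentNode.ReplaceChild(newNode, node);
}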
I've used a simple replace; a regex will work fine too. It is probably best to check the performance of each approach and choose whichever works best for your use case.

Jump to a string on a website without an anchor

Is it possible, probably using javascript, to jump to a particular sentence (string) via a link on a website?
Like an anchor, only without the anchor in HTML.
This is an example from the search results; the search string was "content directory":
Result: planets/ on line 20: <br>
If you create a folder within the content directory (e.g. <code class="hljs lua">content/<span class="hljs-built_in">sub</span></code>) and...
After the link has been opened, the browser should jump to this line (the line number is of course only the one from the searched text file) and, ideally, highlight the search string.
I found a useful script: http://www.seabreezecomputers.com/tips/find6.htm
It offers an on-page search button.

How do I allow <img> and <a> tags for innerHTML, but no others? (Making a forum)

I am currently programming a forum using only javascript (No JQuery please). I am doing very well, however, there is one issue I would love help with.
Currently I am getting the post from a database, assigning it to variable MainPost, and then attaching it to a div via a text node:
var theDiv = document.getElementById("MainBody");
var content = document.createTextNode(MainPost);
theDiv.appendChild(content);
This is working quite well, however, I would LOVE to be able to do this:
document.getElementById("MainBody").innerHTML += MainPost;
But I know this would allow people to use ANY html tag they want, even something like "script" followed by javascript code. This would be bad for business, obviously, but I do like the idea of allowing posters to use the "img" tag as well as the "a href" tags. Is there a way to somehow disable all tags except these two for the innerHTML?
Thank you all so much for any help you can offer.
Ok, the first thought that came to my mind when I read this question was to find a regular expression to exclude a specific string in a word. Simple search gave a lot of results from SO.
Starting point - To remove all the HTML tags from a string (from this answer):
var regex = /(<([^>]+)>)/ig
, body = "<p>test</p>"
, result = body.replace(regex, "");
console.log(result);
To exclude a string you would do something like this (again from all the source mentioned above):
(?!StringToBeExcluded)
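A small illustration of how a negative lookahead behaves, as a sketch with made-up sample text: the pattern below matches whole words but refuses any word that starts with "foo".

```javascript
// \b anchors at the start of a word; (?!foo) rejects the position if "foo" follows.
var wordsNotStartingWithFoo = "foo bar foobar baz".match(/\b(?!foo)\w+/g);
console.log(wordsNotStartingWithFoo); // [ 'bar', 'baz' ]
```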
Since you want to exclude the <a href and <img tags, a suitable regex in your case could be:
(<(?![\/]?a)(?![\/]?img)([^>]+)>)
Explanation:
Think of it as three parts in succession (the first two are lookaheads, which assert but do not capture):
(?![\/]?a) : Negative lookahead asserting that what follows < is not the string "a", optionally prefixed by one forward slash. This spares the <a href> and </a> tags, but note that it also spares every other tag whose name starts with "a", such as <abbr> or <audio>.
(?![\/]?img) : Same as 1, except that it looks for the string "img". I don't know why I allowed the </img> tag; <img> doesn't have a closing tag. You could remove the [\/]? bit from it to fix this.
([^>]+) : Matches one or more characters that are not >, i.e. the rest of the tag up to its closing >. Since / is not >, this covers both opening and closing tags.
All three parts lie between < and >. You might want to try a regex demo that I've created incorporating them to ignore all HTML elements except the image and link tags.
Sidenote - I haven't thoroughly given this regex a try. Feel free to play around with it and tweak it according to your needs. In any case, I hope this gets you started in the right direction.
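The pattern above can be tightened so that only tags named exactly a or img survive. This is a minimal sketch, not a vetted sanitizer: the function name stripDisallowedTags is made up for illustration, and attributes are left untouched, so something like onerror on an img or a javascript: URL in an href would still get through and needs separate handling.

```javascript
// Remove every tag except <a ...>, </a> and <img ...>.
// Assumption: input is a plain HTML string; attributes are NOT sanitized here.
function stripDisallowedTags(html) {
  // (?!\/?a[\s>])    spares <a ...> and </a>, but not <abbr>, <audio>, ...
  // (?!img[\s\/>])   spares <img ...> and <img/>, but not </img>
  var disallowed = /<(?!\/?a[\s>])(?!img[\s\/>])[^>]+>/gi;
  return html.replace(disallowed, "");
}

var sample = '<p>Hi <a href="x">link</a> <img src="y.png"> <script>alert(1)</script><abbr>t</abbr></p>';
console.log(stripDisallowedTags(sample));
// Hi <a href="x">link</a> <img src="y.png"> alert(1)t
```

The tag contents (here the text alert(1)) remain as plain text; only the tags themselves are removed.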

javascript or jquery method of getting the source of an image if.. matching a certain pattern

I have a very old site with lots of files; for years it was put together and organized by year in a particular fashion. I am now upgrading the site, but one section, "Articles", has close to 1,000 files, and I don't want to go through them and manually edit one or two images in each.
So I am hoping I can figure out a way to match the src attribute: when the source starts with ../imgs/, I want to find the actual file name being used and just change the source. My problem is finding the src attribute, matching it to the directory, and then getting the image name off the end of it, when the full path could be like
../imgs/i2012/image-file.png, ../imgs/i2011/image-file.png, ../imgs/i2010/image-file.png, ../imgs/i20xx/image-file.png ("image-file.png" just being an arbitrary example)
It's hard to visualize an image tag, so I am adding one here for reference:
<img src="../imgs/i2012/example-image.png" alt="example-image" vspace="5" hspace="5" align="left" />
To elaborate: not all images on the site stopped working, only the ones whose src attribute looks like the one above, with a different i20xx folder and file name. So I am trying to figure out how to take that src attribute, check whether it contains ../imgs/, and if it does, grab the "example-image.png" part from it so I can apply a new URL to the src attribute using JS. Changing the attribute isn't my problem; matching it, extracting the file name, and appending that file name to the new URL is.
You can use the attribute-starts-with selector to match any images with a source starting with ../imgs/, and then iterate over those images, collecting each filename into an array:
var images = [];
$('img[src^="../imgs/"]').each(function(i, el) {
    images.push( $(el).attr('src').split('/').pop() );
});
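To go one step further and actually rewrite the src, the filename extraction can be wrapped in a small helper. This is a sketch under assumptions: rewriteImgSrc and the "/assets/articles/" base URL are hypothetical names chosen for illustration, not anything from the question.

```javascript
// Return a new src for images under ../imgs/, keeping only the file name.
// Any src that does not start with ../imgs/ is returned unchanged.
function rewriteImgSrc(src, newBase) {
  if (src.indexOf("../imgs/") !== 0) {
    return src;
  }
  var fileName = src.split("/").pop(); // e.g. "example-image.png"
  return newBase + fileName;
}

console.log(rewriteImgSrc("../imgs/i2012/example-image.png", "/assets/articles/"));
// /assets/articles/example-image.png

// In the page, with jQuery loaded, this could be applied as:
// $('img[src^="../imgs/"]').attr('src', function (i, src) {
//   return rewriteImgSrc(src, "/assets/articles/");
// });
```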

How to activate class based on anchor tag #id in URL

A while ago I noticed a page that highlighted certain areas of the page by getting the #some_random_id from the URL.
For instance, /mypage-destination/#codex_destination_5 would obviously drop you down to the page area in question, but then also highlight the area so you don't miss it.
I looked into it, and I cannot seem to find a way to extract the # anchor destination from the URL.
document.getElementById(window.location.hash.substring(1)).style.backgroundColor = "#aaa";
You can use window.location.hash to get the id linked in the URL (which would return #codex_destination_5 using your example).
You can then use substring(1) to omit the hash symbol (#codex_destination_5 becomes codex_destination_5).
Using this as the id would be the next logical step.
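The one-liner throws if the URL has no hash or no element carries that id. A slightly more defensive version might look like the sketch below; the function name highlightHashTarget is made up, and the document is passed in as a parameter only so the function can be exercised outside a browser.

```javascript
// Highlight the element whose id matches the URL hash, if there is one.
// Returns the highlighted element, or null when nothing was highlighted.
function highlightHashTarget(hash, doc) {
  var id = hash.charAt(0) === "#" ? hash.substring(1) : hash;
  if (!id) {
    return null; // no hash in the URL
  }
  var target = doc.getElementById(id);
  if (!target) {
    return null; // hash does not match any element
  }
  target.style.backgroundColor = "#aaa";
  return target;
}

// In a browser:
// highlightHashTarget(window.location.hash, document);
```

To toggle a CSS class instead of setting the colour directly, the assignment could be swapped for target.classList.add with a class name of your choice.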
