Inserting Next.js Links into an html string - javascript

I have an html string that I retrieve from an API.
const htmlString = "<p>Hello World</p>";
I'm using React's dangerouslySetInnerHTML to display that content on my page.
Before displaying it, I'd like to process it and put a link on "World", but not a plain <a> tag: I need an actual Next.js <Link> component, so a string replace probably won't do the job.
The result I want to achieve in jsx is:
const processed = <p>Hello <Link href="/my-route"><a>World</a></Link></p>
I've thought of using React.createElement but I'm not sure how to interpolate the content inside the string.
Any ideas how I could achieve this?

Hello<Link href="https://github.com">World</Link>
You don't need to nest another anchor tag inside a Link: since Next.js 13, <Link> renders an <a> element itself. This should work fine.
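One way to approach the interpolation problem from the question, sketched under assumptions rather than taken from the answer above: split the text on the target word and hand each match to a factory that builds the element. The names `interleave` and `makeLink` are hypothetical. In a Next.js component, `makeLink` would typically be something like `(word, i) => React.createElement(Link, { href: "/my-route", key: i }, word)`, and the returned array can be rendered as children. This only covers plain text, so the HTML string would still have to be parsed into text nodes first.

```javascript
// Split `text` on every occurrence of `word` and replace each occurrence
// with whatever `makeLink` returns (a React element in a real component).
function interleave(text, word, makeLink) {
  const parts = text.split(word);
  const result = [];
  parts.forEach((part, i) => {
    if (part !== "") result.push(part);
    // Between two consecutive parts there was exactly one occurrence of `word`.
    if (i < parts.length - 1) result.push(makeLink(word, i));
  });
  return result;
}

// Stand-in factory so the sketch runs without React or Next.js installed.
const children = interleave("Hello World", "World", (word, i) => ({
  type: "Link",
  props: { href: "/my-route", key: i },
  children: word,
}));
console.log(children);
```

Keeping the element factory as a parameter means the splitting logic can be checked on its own, without rendering anything.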

Related

How to find a unique string within html and wrap it with a tag, but exclude links and urls

I'm looking for a way to find a specific string in the visible text of a page and wrap that string in <em> tags. I have tried using HTML Agility Pack and had some success with Regex.Replace, but if the string appears within a URL it also gets replaced, which I do not want. If it is within an image name it gets replaced too, and this obviously breaks the link or image URL.
An example attempt:
var markup = Encoding.UTF8.GetString(buffer);
var replaced = Regex.Replace(markup, "product-xs", " <em>product</em>-xs", RegexOptions.IgnoreCase);
var output = Encoding.UTF8.GetBytes(replaced);
_stream.Write(output, 0, output.Length);
This does not work, as it would replace <a href="product/product-xs"> with <a href="product/<em>product</em>-xs">, which I don't want.
The string is coming from a text string value within a CMS so the user can't wrap the words there and ideally, I want to catch all instances of the word that are already published.
Ideally I would want to exclude <title> tags, <img> tags and <a> tags, everything else should get the wrapped tag.
Before I used the HTML Agility Pack, a fellow front end dev tried it with JavaScript but that had an unexpected impact on dropdown menus.
If you need any more info, just ask.
You can use HTML Agility Pack to select only the text nodes (i.e. the text that exists between any two tags) with a bit of XPath and modify them like this.
Looking only in body will exclude <title>, <meta> etc. The not() excludes script tags; you can exclude others in the same way (or check the parent node in the loop).
// SelectNodes returns null when nothing matches, so guard before iterating.
var textNodes = htmlDoc.DocumentNode.SelectNodes("//body//*[not(self::script)]/text()");
if (textNodes != null)
{
    foreach (HtmlNode node in textNodes)
    {
        var newNode = htmlDoc.CreateTextNode(node.InnerText.Replace("product-xs", "<em>product</em>-xs"));
        node.ParentNode.ReplaceChild(newNode, node);
    }
}
I've used a simple replace; a regex will work fine too. It's probably best to check the performance of each approach and choose whichever works best for your use case.

Localizing html content in vue

I have been using the v-html directive to render HTML in my Vue pages.
But I encountered a string which was a proper html file and it contained text kind of like this:
<html xmlns="https://www.w3.org/1999/xhtml">
<head>
....
<style>
</style>
</head>
<body style="....">
</body>
</html>
The problem is that I have the v-html on a div, but this code affects the whole page: its styling is applied everywhere and not only to that specific div.
I tried adding "scope" to the style tags but it did not work. Maybe because there's also an inline style attribute on body?
I need to find a way to make the HTML affect only the div it is in, and not the whole page.
Your best bet would probably be to take better control of the HTML added using v-html. I would suggest parsing it first and keeping only the contents of the <body> tag. You could do it using a regex, but it would be easier using a DOM parser library. Example with DomParser:
const DomParser = require("dom-parser");
const parser = new DomParser();

export default {
  // ...
  computed: {
    html() {
      // This data should come from your server.
      const rawHtml = "<html><body><div>test</div></body></html>";
      const dom = parser.parseFromString(rawHtml);
      // Keep only what is inside <body>, dropping <head> and its <style> tags.
      return dom.getElementsByTagName("body")[0].innerHTML;
    }
  }
}
Please note that it is an oversimplified solution as it does not handle the case where there is no <body> tag.
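For the regex route mentioned above, a minimal sketch that also covers the missing <body> case could look like the following. The function name `extractBodyHtml` is hypothetical, and the pattern assumes the body's attributes contain no literal ">" character, so treat it as an illustration rather than a robust parser.

```javascript
// Return the markup between <body ...> and </body>, or the input unchanged
// when no <body> element is found (for example, an HTML fragment).
function extractBodyHtml(rawHtml) {
  const match = /<body\b[^>]*>([\s\S]*?)<\/body\s*>/i.exec(rawHtml);
  return match ? match[1] : rawHtml;
}

console.log(extractBodyHtml("<html><body><div>test</div></body></html>"));
console.log(extractBodyHtml("<div>fragment only</div>"));
```

Because the <head> section is discarded, any <style> blocks it contains are dropped along with the inline style on <body>.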
First, you should be very careful when using external HTML with v-html, as it can make your site vulnerable to various sorts of attacks (see Vue docs).
Now, if you trust the HTML source, the other problem is how to embed it without affecting your own site. There is a special element for this case, <iframe>. It is not without risk, and you should definitely read up on how to make it safe, but it should solve your problem because it sandboxes the external HTML so that it does not affect your site.
https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Other_embedding_technologies

Insert schema data inside html tag with javascript

I am wondering if there is a way to insert text, schema data in particular, into an HTML div tag using JavaScript. I know there are methods for modifying existing values inside a tag such as class, href, title, but I can't seem to find a way to add something new.
Basically I have <div id="main"> and I want to modify it to be <div id="main" itemtype="http://schema.org/SomeCategory" itemscope> and to be able to remove it later.
The context for such a need is using fetch / js to replace parts of webpages rather than reloading the entire page. The product pages use the schema notation, whereas general info pages do not, though all templates use the "main" div.
It turns out that the innerHTML property of the body element lets me change the actual div tags using a string replace. So it is simple:
const input = '<div id="main">';
const output = '<div id="main" itemtype="http://schema.org/SomeCategory" itemscope>';
document.body.innerHTML = document.body.innerHTML.replace(input, output);
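Assigning to document.body.innerHTML re-parses the whole page and discards event listeners attached to the existing elements, so a less invasive sketch is to set the attributes on the element directly with the standard setAttribute and removeAttribute DOM methods. The helper names `addSchema` and `removeSchema` are hypothetical.

```javascript
// Mark an element as a schema.org item by setting microdata attributes.
function addSchema(element, itemType) {
  element.setAttribute("itemscope", ""); // boolean attribute: empty value
  element.setAttribute("itemtype", itemType);
}

// Undo addSchema so the element is a plain container again.
function removeSchema(element) {
  element.removeAttribute("itemscope");
  element.removeAttribute("itemtype");
}

// Assumed usage in a browser:
// const main = document.getElementById("main");
// addSchema(main, "http://schema.org/SomeCategory");
// removeSchema(main);
```

Since the element itself is never replaced, this can be called each time fetch swaps in a product page or a general info page.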

Ionic 2/ angular 2 : string replaced with HTML tag isn't rendered properly

I have an array of contents in my JSON which contain URLs as plain text. I want to find those text URLs and replace them with links. What's actually happening is that the URL does get replaced with an <a> tag with the proper URL in href="", but the <a> tag is still rendered as text and not as HTML.
urlifyContent() {
  let urlRegex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
  for (let index = 0; index < this.message.Content.length; ++index) {
    this.message.Content[index] = this.message.Content[index].replace(urlRegex, `"<a href='$1'target='_blank'>$1</a>"`);
  }
}
According to your comment you are using {{message.Content}} to bind the html you created using your urlifyContent() method. This will not work, because {{}} will parse the value to a string, and not HTML.
To add HTML to your page, you should use the [innerHTML] directive:
<div [innerHTML]="message.Content"></div>
However, it's worth mentioning that this won't parse any Angular components, directives, or bindings present in that HTML snippet. It should just be plain HTML.
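As a standalone illustration of the replacement step, the sketch below turns plain-text URLs into anchor markup that could then be bound with [innerHTML]. The function name `urlify` is hypothetical, the pattern is modelled on the one in the question, and the input is assumed to be trusted text since nothing else in it is escaped.

```javascript
// Matches http, https, ftp and file URLs embedded in plain text.
const URL_REGEX = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gi;

// Wrap every URL found in `text` in an <a> tag that opens in a new tab.
function urlify(text) {
  return text.replace(URL_REGEX, '<a href="$1" target="_blank">$1</a>');
}

console.log(urlify("Docs are at http://example.org/a for now."));
```

The result is still a string, which is why it has to go through [innerHTML] rather than {{ }} interpolation to be rendered as markup.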

jQuery parse HTML without loading images

I load HTML from other pages to extract and display data from that page:
$.get('http://example.org/205.html', function (html) {
  console.log( $(html).find('#c1034') );
});
That does work but because of the $(html) my browser tries to load images that are linked in 205.html. Those images do not exist on my domain so I get a lot of 404 errors.
Is there a way to parse the page like $(html) but without loading the whole page into my browser?
Actually if you look in the jQuery documentation it says that you can pass the "owner document" as the second argument to $.
So what we can then do is create a virtual document so that the browser does not automatically load the images present in the supplied HTML:
var ownerDocument = document.implementation.createHTMLDocument('virtual');
$(html, ownerDocument).find('.some-selector');
Use a regex to remove all <img> tags:
html = html.replace(/<img[^>]*>/g,"");
Sorry for resuscitating an old question, but this is the first result when searching for how to try to stop parsed html from loading external assets.
I took Nik Ahmad Zainalddin's answer; however, it has a weakness in that anything between two separate <script> blocks gets wiped out.
<script>
</script>
Inert text
<script>
</script>
In the above example Inert text would be removed along with the script tags. I ended up doing the following instead:
html = html.replace(/<\s*(script|iframe)[^>]*>(?:[^<]*<)*?\/\1>/g, "").replace(/(<(\b(img|style|head|link)\b)(([^>]*\/>)|([^\7]*(<\/\2[^>]*>)))|(<\bimg\b)[^>]*>|(\b(background|style)\b=\s*"[^"]*"))/g, "");
Additionally I added the capability to remove iframes.
Hope this helps someone.
Parsing HTML the following way will load images automatically:
var wrapper = document.createElement('div'),
    html = '.....';
wrapper.innerHTML = html;
If you use DOMParser to parse the HTML instead, the images will not be loaded automatically. See https://github.com/panzi/jQuery-Parse-HTML/blob/master/jquery.parsehtml.js for details.
You could either use jQuery's remove() method on the selected image elements
console.log( $(html).find('img').remove().end().find('#c1034') );
or remove them from the HTML string. Something like
console.log( $(html.replace(/<img[^>]*>/g,"")) );
Regarding background images, you could do something like this:
$(html).filter(function() {
  return $(this).css('background-image') !== '';
}).remove();
The following regex removes all occurrences of <img>, <head>, <link>, <script> and <style> elements, as well as background and style attributes, from the HTML string returned by the Ajax load.
html = html.replace(/(<(\b(img|style|script|head|link)\b)(([^>]*\/>)|([^\7]*(<\/\2[^>]*>)))|(<\bimg\b)[^>]*>|(\b(background|style)\b=\s*"[^"]*"))/g,"");
Test regex: https://regex101.com/r/nB1oP5/1
I wish there were a better workaround (other than using a regex replace).
Instead of removing all img elements altogether, you can use the following regex to delete all src attributes instead:
html = html.replace(/src="[^"]*"/ig, "");
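A variation on the same idea, offered as a sketch rather than as part of the answer above: instead of deleting the src attribute, rename it to data-src on <img> tags only, so the URL is still available if the image needs to be loaded later. The function name `deferImages` and the choice of data-src are assumptions.

```javascript
// Rename src to data-src on <img> tags so that parsing the markup does not
// trigger image requests; other elements such as <script> are left alone.
function deferImages(html) {
  return html.replace(/(<img\b[^>]*?)\ssrc=/gi, "$1 data-src=");
}

const sample = '<p><img class="a" src="x.png"><img src="y.png"></p>';
console.log(deferImages(sample));
```

Like the other regex-based answers here, this works on the raw string and so must run before the HTML is handed to $(html).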
