Hi JavaScript developers, I am facing an issue with bolding the parts of a string that are enclosed in square brackets.
I have a dynamically generated string that looks like this:
[Number] years of practice in accounting and financial administration. Showcased skills in [Area of Expertise]
I want to bold every substring enclosed in square brackets, such as [Number] and [Area of Expertise] in the string above.
That is, Number and Area of Expertise must be bold.
Remember that the string is dynamic.
Please see the snippet below.
Also, it's a good idea to HTML-escape the string before doing the replacement, and generally before inserting dynamic content into the DOM.
const s = '[Number] years of practice in accounting and financial administration. Showcased skills in [Area of Expertise]';
const html = s.replace(/\[([^\]]+)\]/g, '<b>$1</b>');
console.log(html);
// => <b>Number</b> years of practice in accounting and financial administration. Showcased skills in <b>Area of Expertise</b>
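The advice to escape the string first can be sketched as follows. This is a minimal illustration, not part of the original answer: the helper names escapeHtml and boldBracketed are hypothetical, and the escape table covers only the five characters that are special in HTML text and attribute values.

```javascript
// Hypothetical helper: escape the characters that are special in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Escape first, then wrap each [bracketed] part in <b> tags.
// Square brackets are not touched by escapeHtml, so the pattern still matches.
function boldBracketed(text) {
  return escapeHtml(text).replace(/\[([^\]]+)\]/g, '<b>$1</b>');
}

console.log(boldBracketed('[Number] years of <i>practice</i>'));
// => <b>Number</b> years of &lt;i&gt;practice&lt;/i&gt;
```

Escaping before the replacement means the only real tags in the result are the <b> tags added by the replacement itself.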
So I'm messing around with this API, and its description field has the links hard coded as HTML anchor tags (like Bunch of words), so the raw tags show up exactly like that in my browser. How would I display the description so it looks normal in my app?
Here is the API https://api.coingecko.com/api/v3/coins/bitcoin
and this is just a simple way I got the description to display
const Tokens = ({coin}) => {
  return (
    <p>{coin.description.en}</p>
  )
}
This ends up showing all the a tags as literal text in the browser instead of rendering them as clickable links:
Peercoin, Primecoin, and so on.\r\n\r\nThe cryptocurrency then took off with the innovation of the turing-complete smart contract by Ethereum which led to the development of other amazing projects such as EOS, Tron, and even crypto-collectibles such as CryptoKitties.",
Is there a way to display the description so that it looks like a normal paragraph with hyperlinks instead of literally showing the hard coded tags?
Also, if I only wanted to show like the first two sentences, how would I cut out the rest of the paragraph?
Like Jacob said, you can use the dangerouslySetInnerHTML prop, which takes an object whose __html key holds the HTML string.
const apiResponse = `Peercoin, Primecoin, and so on.\r\n\r\nThe cryptocurrency then took off with the innovation of the turing-complete smart contract by Ethereum which led to the development of other amazing projects such as EOS, Tron, and even crypto-collectibles such as CryptoKitties.`
...
<p dangerouslySetInnerHTML={{__html: apiResponse}}></p>
However, \r and \n won't be rendered as line breaks by HTML, which collapses whitespace. You can replace every line break with a <br> tag like so:
const cleanedAPIResponse = apiResponse.replace(/\r\n|\r|\n/g, "<br />");
...
<p dangerouslySetInnerHTML={{__html: cleanedAPIResponse}}></p>
Note: a regular expression with the g flag is needed here, because replace with a string pattern only replaces the first occurrence. FYI, \r is known as a 'carriage return' and \n as a 'line feed'.
If you wanted only the first two sentences, an idea could be to search for the second instance of '.' in the API response. Then you can truncate the rest of the string literal, and append the appropriate closing tags based on the appearance of the tags going from left to right and which do not have matching closing tags in the string already.
I’m using cheeriojs to scrape content off a webpage, with the following HTML.
<p>
Although the PM's office could neither confirm nor deny this, the spokesperson, John Doe said the meeting took place on Sunday.
<br>
<br>
“The outcome will be made public in due course,” John said in an SMS yesterday.
<br>
<br>
</p>
I’m able to reach the content of interest, by class and id tags, as follows:
$('.top-stories .line.more').each(function(i, el){
  // Do something…
  let content = $(this).next().html();
});
Once I’ve captured the content of interest, I “clean” it up using regular expressions, as below:
let cleanedContent = content.split(/<br>/).join(' \n ');
This inserts a newline wherever an empty tag (<br>) is matched. So far all is good, until I look at the cleaned content below:
Although the PM's office could neither confirm nor deny this, the spokesperson, Saima Shaanika said the meeting took place on Friday.
“The outcome will be made public in due course,”
It appears that punctuation marks, and perhaps some other characters, are stored according to their unicode codes. I may be wrong on this, and would welcome some correction to this line of thought.
Assuming that they are stored as unicode codes, is there a module that I could pass the “cleanedContent” variable, through to convert the unicodes to human readable punctuation marks/characters?
Should this not be possible, is there a better way of using cheeriojs that would avoid this? I'm totally open to the notion that I'm not using cheeriojs correctly, and would love some direction as to new approaches I could try instead.
One way I can think of is writing a module containing several unicode codes and their corresponding characters, then looking for matches and replacing each matched code with the corresponding human-readable character. I have some intuitive feeling that someone's already done this or something similar. I'd rather not try to reinvent the wheel.
Thanks in advance.
Cheerio uses htmlparser2 internally.
Because of this, you can use htmlparser2's decodeEntities option when loading the HTML string, which lets you configure how HTML entities should be treated.
Example:
const $ = cheerio.load('<ul id="fruits">...</ul>', {
  decodeEntities: false
});
Relevant docs:
Cheerio
htmlparser2
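If the entities have already ended up in a plain string, the lookup-and-replace idea from the question can be sketched without any library. This is a hedged illustration only: decodeHtmlEntities and NAMED_ENTITIES are hypothetical names, and the table covers just five named entities plus numeric references, whereas real HTML defines many more.

```javascript
// Hypothetical lookup table: only a handful of the named entities HTML defines.
const NAMED_ENTITIES = new Map([
  ['amp', '&'], ['lt', '<'], ['gt', '>'], ['quot', '"'], ['apos', "'"],
]);

// Decode numeric references (&#8220; or &#x201C;) and the named entities above.
// Anything unrecognized is left exactly as it was.
function decodeHtmlEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1].toLowerCase() === 'x';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // String.fromCodePoint throws above U+10FFFF, so guard the range.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    const char = NAMED_ENTITIES.get(body);
    return char === undefined ? match : char;
  });
}

console.log(decodeHtmlEntities('PM&apos;s office: &#x201C;in due course&#8221;'));
// => PM's office: “in due course”
```

A maintained entity-decoding package would cover the full entity list; the sketch only shows the mechanism.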
Background
I have burned myself out looking for this answer. The closest working code I could find was from StackEdit, specifically the Markdown.Converter.js script, copied below. It is a pretty heavy-hitting regular expression though; my regex for finding **, for example, takes almost 1/5 of the steps, and I don't need this much extra support.
function _DoItalicsAndBold(text) {
  // <strong> must go first:
  text = text.replace(/([\W_]|^)(\*\*|__)(?=\S)([^\r]*?\S[\*_]*)\2([\W_]|$)/g,"$1<strong>$3</strong>$4");
  text = text.replace(/([\W_]|^)(\*|_)(?=\S)([^\r\*_]*?\S)\2([\W_]|$)/g,"$1<em>$3</em>$4");
  return text;
}
Question
I'm trying to make my own very simple markdown script that makes these transformations:
* ---> Italics
** ---> Bold
__ ---> Underline
So far I can find all uses of ** (two stars, bold text) with this regex:
/(\*\*)(?:(?=(\\?))\2.)*?\1/g
However, I cannot for the life of me figure out how to match only * (single star, italicized text) with one regular expression. If I decide to go further I may have to distinguish between _ and __ as well.
Can someone point me in the right direction on how to properly write the regular expressions that will do this?
Update / Clarification of OP's Question
I am aware of parsers, and I am afraid that this question is going to be derailed from the point. I am not asking for parser help (but I do welcome and appreciate it); I am looking specifically for regular expression help. If it helps people get away from parser answers, here is another example. Let's say I have an app that looks for strings inside double quotes and pulls them out to make tags or something. I want to avoid troll users trying to mess things up or sneak things by me, so if they use double double quotes I should just ignore it and not bother making a tag out of it. Example:
In this "sentence" my regex would match "sentence" and use other code I'm not showing you to pull out only the word: sentence.
Now if someone does double double quotes I just ignore it because no match was found. Meaning the inner word should not be found as a match in this instance.
In this ""sentence"" I have two double quotes around the word sentence and it should be completely ignored now. I don't even care about ignoring the outer double quotes and matching on the inner ones. I want no match in this case.
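One possible direction for both cases, offered as a sketch rather than a definitive answer, is to guard each delimiter with a negative lookbehind and a negative lookahead so that a doubled delimiter never starts or ends a match. The names SINGLE_STAR, LONE_QUOTED, italicize and quotedWords are hypothetical, and the patterns assume a JavaScript engine with lookbehind support (for example a recent Node.js).

```javascript
// A star that is neither preceded nor followed by another star, then text
// without stars, then a closing star that is not followed by another star.
const SINGLE_STAR = /(?<!\*)\*(?!\*)([^*]+)\*(?!\*)/g;

// The same idea for double quotes: "word" matches, ""word"" does not.
const LONE_QUOTED = /(?<!")"([^"]+)"(?!")/g;

function italicize(text) {
  return text.replace(SINGLE_STAR, '<em>$1</em>');
}

function quotedWords(text) {
  return Array.from(text.matchAll(LONE_QUOTED), (m) => m[1]);
}

console.log(italicize('some *italic* and **bold** text'));
// => some <em>italic</em> and **bold** text
console.log(quotedWords('In this "sentence" my regex would match'));
// => [ 'sentence' ]
console.log(quotedWords('In this ""sentence"" nothing should match'));
// => []
```

The same guard should extend to _ versus __ by swapping the delimiter character in the pattern.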
I'm trying to match a string against a pattern, where the match can contain substrings that follow the same pattern.
Here's an example string:
Nicaragua [[NOTE|note|Congo was a member of ICCROM from 1999 and Nicaragua from 1971. Both were suspended by the ICCROM General Assembly in November 2013 having omitted to pay contributions for six consecutive calendar years (ICCROM [[Statutes|s|url|www.iccrom.org/about/statutes/]], article 9).]]. Another [[link|url|google.com]] that might appear.
and here's the pattern:
[[display_text|code|type|content]]
So, what I want is to get the string within the brackets, and then look for more strings that match the pattern within the top-level one.
and what I want to match is this:
[[NOTE|s|note|Congo was a member of ICCROM from 1999 and Nicaragua from 1971. Both were suspended by the ICCROM General Assembly in November 2013 having omitted to pay contributions for six consecutive calendar years (ICCROM [[Statutes|s|url|www.iccrom.org/about/statutes/]], article 9).]]
1.1 [[Statutes|s|url|www.iccrom.org/about/statutes/]]
[[link|s|url|google.com]]
I was using this /(\[\[.*]])/ but it gets everything until the last ]].
What I want is to be able to identify the matched strings and convert them to HTML elements, where |note| becomes a blockquote tag and |url| an a tag. So, a blockquote tag can have a link tag inside it.
BTW, I'm using CoffeeScript to do that.
Thanks in advance.
In general, regex is not good at dealing with nested expressions. If you use greedy patterns, they'll match too much, and if you use non-greedy patterns, as @bjfletcher suggests, they'll match too little, stopping inside the outer content. The "traditional" approach here is a token-based parser, where you step through characters one by one and build an abstract syntax tree (AST) which you then reformat as desired.
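The character-by-character idea can be sketched with a stack of open positions. This is a simplified illustration that only finds the balanced [[...]] spans and their nesting depth; it does not build a full AST, and the function name findTokens is hypothetical.

```javascript
// Hypothetical scanner: returns every balanced [[...]] span with its depth.
// Inner spans are reported before the outer span that contains them.
function findTokens(text) {
  const tokens = [];
  const starts = []; // positions of "[[" that have not been closed yet
  for (let i = 0; i < text.length - 1; i += 1) {
    const pair = text.slice(i, i + 2);
    if (pair === '[[') {
      starts.push(i);
      i += 1; // skip the second bracket of the pair
    } else if (pair === ']]' && starts.length > 0) {
      const start = starts.pop();
      tokens.push({ depth: starts.length, text: text.slice(start, i + 2) });
      i += 1;
    }
  }
  return tokens;
}

console.log(findTokens('A [[a|s|note|x [[b|s|url|u]] y]] and [[c|s|url|v]]'));
// => [ { depth: 1, text: '[[b|s|url|u]]' },
//      { depth: 0, text: '[[a|s|note|x [[b|s|url|u]] y]]' },
//      { depth: 0, text: '[[c|s|url|v]]' } ]
```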
One slightly hacky approach I've used here is to convert the string to a JSON string, and let the JSON parser do the hard work of converting into nested objects: http://jsfiddle.net/t09q783d/1/
function toPoorMansAST(s) {
  // Escape double quotes, as they'll cause problems otherwise. This converts them
  // to the JSON escape sequence \u0022, which is safe for JSON parsing.
  s = s.replace(/"/g, '\\u0022');
  // Transform to a JSON string!
  s =
    // Wrap in array delimiters
    ('["' + s + '"]')
    // Replace token starts
    .replace(/\[\[([^\|]+)\|([^\|]+)\|([^\|]+)\|/g,
      '",{"display_text":"$1","code":"$2","type":"$3","content":["')
    // Replace token ends
    .replace(/\]\]/g, '"]},"');
  return JSON.parse(s);
}
This gives you an array of strings and structured objects, which you can then run through a formatter to spit out the HTML you'd like. The formatter is left as an exercise for the user :).
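One way the formatter might look is sketched below. The output shape is an assumption on my part: url nodes use their content as the href and display_text as the link text, note nodes wrap their content in a blockquote and ignore display_text, and nothing is HTML-escaped. The name formatNodes is hypothetical.

```javascript
// Hypothetical formatter for the array produced by toPoorMansAST:
// plain strings pass through, objects are rendered by their `type`.
function formatNodes(nodes) {
  return nodes.map((node) => {
    if (typeof node === 'string') return node;
    const inner = formatNodes(node.content); // recurse into nested tokens
    if (node.type === 'url') {
      return `<a href="${inner}">${node.display_text}</a>`;
    }
    if (node.type === 'note') {
      return `<blockquote>${inner}</blockquote>`;
    }
    return inner; // unknown type: keep only the content
  }).join('');
}

const ast = [
  'Nicaragua ',
  { display_text: 'NOTE', code: 's', type: 'note', content: [
    'See ICCROM ',
    { display_text: 'Statutes', code: 's', type: 'url',
      content: ['www.iccrom.org/about/statutes/'] },
    ', article 9.',
  ] },
  '.',
];
console.log(formatNodes(ast));
// => Nicaragua <blockquote>See ICCROM <a href="www.iccrom.org/about/statutes/">Statutes</a>, article 9.</blockquote>.
```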
It seems + is not the right operator to handle the concatenation of strings in JavaScript. What are some alternatives to handle both the LTR and RTL cases?
The problem is that + is not the right operator to concatenate strings at all. Or maybe it is, but concatenating strings is an internationalization bug.
Instead of simply concatenating them, one should actually format them. So what you should actually do, is use placeholders:
var somePattern = "This language is written {0}.";
var someMessage = somePattern.format("LTR");
This way, the translator would be able to re-order the sentence, including word order. And I believe it solves your problem.
For formatting function, let me quote this excellent answer:
String.prototype.format = function() {
  var args = arguments;
  return this.replace(/\{(\d+)\}/g, function() {
    return args[arguments[1]];
  });
};
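To illustrate the point about re-ordering, here is a small sketch of the same placeholder technique written as a standalone function instead of a String.prototype extension. The name formatMessage and the sample patterns are hypothetical; placeholders without a matching argument are left untouched.

```javascript
// Hypothetical standalone variant of the placeholder formatter above.
function formatMessage(pattern, ...args) {
  return pattern.replace(/\{(\d+)\}/g, (match, index) => {
    const i = Number(index);
    return i < args.length ? String(args[i]) : match;
  });
}

// The pattern, not the calling code, decides where each argument goes,
// so a translated pattern can put the placeholders in a different order.
console.log(formatMessage('{0} was written by {1}.', 'The report', 'Ana'));
// => The report was written by Ana.
console.log(formatMessage('{1} wrote {0}.', 'the report', 'Ana'));
// => Ana wrote the report.
```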
EDIT: Adding information about directionality marks.
Sometimes, when you have multiple placeholders, you may lose control of the string direction, i.e. {0}/{1} would still be shown as first/second instead of the desired second/first. To fix this, you would add a strong directionality mark to the pattern, for example &rlm;{0}/{1}. &rlm; is an HTML entity that resolves to Unicode code point U+200F, the right-to-left mark, which has strong right-to-left directionality.
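When the pattern lives in a JavaScript string rather than in HTML markup, the same code point U+200F can be written as an escape sequence. A tiny sketch, with the hypothetical names RLM and markRtl, and with the mark placed at the start of the pattern as an assumption:

```javascript
// U+200F RIGHT-TO-LEFT MARK written as a JavaScript escape sequence.
const RLM = '\u200F';

// Hypothetical helper: prefix a pattern with the mark.
function markRtl(pattern) {
  return RLM + pattern;
}

const pattern = markRtl('{0}/{1}');
console.log(pattern.codePointAt(0).toString(16)); // => 200f
```

The mark is invisible when rendered; it only influences how the bidirectional algorithm orders the neighbouring text.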
Actually, assuming both strings are localized and you want the string on the right to be displayed logically after the string on the left, then + sometimes works fine. Strings in languages such as Arabic should be displayed RTL (right to left) on the screen, but the character ordering is still meant to be LTR (left to right) in memory. So the + operator is logically consistent to use for generating an 'ordered list' of terms in any language.
But there are also scenarios where + does not solve the problem correctly. There are scenarios where the correct solution is to follow the grammar of the containing language. For instance, are you really embedding an English word in an Arabic sentence? Or vice versa? Regardless, the solution here is to do string formatting, where the localized containing sentence has a placeholder for the foreign term, like {0}.
The third case is when there is no grammatical relationship because there are just two separate sentences. In this case there is no correct ordering, e.g. if you have an English sentence displayed in front of an Arabic sentence. An English speaker will probably read the sentences LTR (left sentence first, then right). An Arabic speaker will probably read the sentences RTL. Either way it's unclear to everyone which order the author intended the sentences to be read in. :)