React/Javascript display API data shows hard coded links instead of hyperlink? - javascript

So I'm messing around with this API and for the description it has the links hard coded like <a href="...">Bunch of words</a>, so it just shows exactly that on my browser. How would I display the description to look normal in my app?
Here is the API https://api.coingecko.com/api/v3/coins/bitcoin
and this is just a simple way I got the description to display
const Tokens = ({coin}) => {
return (
<p>{coin.description.en}</p>
)
}
This would end up showing all the a tags on the browser instead of converting them into a clickable link
Peercoin, Primecoin, and so on.\r\n\r\nThe cryptocurrency then took off with the innovation of the turing-complete smart contract by Ethereum which led to the development of other amazing projects such as EOS, Tron, and even crypto-collectibles such as CryptoKitties.",
Is there a way to display the description so that it looks like a normal paragraph with hyperlinks instead of literally showing the hard coded tags?
Also, if I only wanted to show like the first two sentences, how would I cut out the rest of the paragraph?

Like Jacob said, you can use the dangerouslySetInnerHTML prop by utilizing interpolated string literals.
const apiResponse = `Peercoin, Primecoin, and so on.\r\n\r\nThe cryptocurrency then took off with the innovation of the turing-complete smart contract by Ethereum which led to the development of other amazing projects such as EOS, Tron, and even crypto-collectibles such as CryptoKitties."`
...
<p dangerouslySetInnerHTML={{__html: apiResponse}}></p>
However, HTML collapses \r and \n into ordinary whitespace, so the line breaks won't be rendered. You can replace each line break with a <br /> tag, which HTML will understand, like so:
const cleanedAPIResponse = apiResponse.replace(/\r?\n/g, "<br />");
...
<p dangerouslySetInnerHTML={{__html: cleanedAPIResponse}}></p>
Note: replace with a plain string pattern only replaces the first match, hence the regular expression with the g flag. FYI, \r is known as a 'carriage return'.
If you wanted only the first two sentences, an idea could be to search for the second instance of '.' in the API response. Then you can truncate the rest of the string literal, and append the appropriate closing tags based on the appearance of the tags going from left to right and which do not have matching closing tags in the string already.
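A rough sketch of a simpler variant of this idea: strip the tags first, then keep the first two sentences as plain text, so no closing tags need to be re-balanced (at the cost of losing the links). The name firstSentences is hypothetical, and treating '.', '!' and '?' as sentence ends is an assumption that will miscount abbreviations such as "e.g.".

```javascript
// Hypothetical helper: returns the first `count` sentences of an HTML
// string as plain text. Tags are removed, so links are lost.
function firstSentences(html, count = 2) {
  const text = html
    .replace(/<[^>]*>/g, "") // drop anything that looks like a tag
    .replace(/\s+/g, " ")    // collapse \r\n and runs of spaces
    .trim();
  // Naive sentence split: text up to and including ., ! or ?
  const sentences = text.match(/[^.!?]+[.!?]+/g) || [text];
  return sentences.slice(0, count).join("").trim();
}
```

The result is plain text, so it can go straight into <p>{firstSentences(coin.description.en)}</p> without dangerouslySetInnerHTML.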

Related

Replacing three dots between square brackets JS

I'm having trouble with blog post excerpts.
I'm using Wordpress as a headless CMS and it returns the post excerpt in a specific format.
It looks like <p>some text here [...]</p> and I'm trying to write one regular expression that will get rid of the paragraph tags and those brackets with the dots in between.
I ended up with something like
excerpt.replace(/<p>|<\/p>/g, '') and it works with paragraphs but I can't find any solution to get rid of those three dots in one regular expression..
Is that possible at all?
Ok - I didn't notice one thing xD
Wordpress returns […] instead of [...]..
Now everything works perfectly with something like
const parsedExcerpt = excerpt.replace(/<p>|<\/p>|\r?\n|\r|\[…]/g, '');
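For anyone unsure which form of the ellipsis their API returns, here is a small sketch that handles several of them. The function name cleanExcerpt is made up for illustration, and covering the &hellip; entity form is an assumption on my part, not something shown in the question.

```javascript
// Hypothetical helper: strips <p> tags, line breaks, and a bracketed
// ellipsis written as "[...]", "[…]" or "[&hellip;]".
function cleanExcerpt(excerpt) {
  return excerpt
    .replace(/<\/?p>/g, "")                      // opening and closing <p>
    .replace(/\[(?:…|\.\.\.|&hellip;)\]/g, "")   // any bracketed ellipsis form
    .replace(/\r?\n|\r/g, "")                    // line breaks
    .trim();
}
```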

How to detect and remove unwanted lines from a string?

I am working on a project in which I have to extract text data from a PDF.
I am able to extract text from the PDF, but the extracted text sometimes contains lines which I would like to strip from it.
Here's an example of unwanted lines -
ISBN 0-7225-3293-8. = CONTENTS = Part One Part Two Epilogue
Page 1 / 94
And here's an example of good lines (which I'd like to keep) -
Dusk was falling as the boy arrived with his herd at an abandoned church.
I wanted to sleep a little longer, he thought. He had had the same dream that night as a week ago
Different PDFs can produce different unwanted lines.
How can I detect them?
Option 1 - Give the computer a rule: If you are able to narrow down what content you would like to keep, you can filter your results based on that. The obvious criterion that stands out to me is the exclusion of special characters.
So let's say you agree that all "good lines" will be free of special characters ('/', '-', and '=', for example). If a line DOES contain one of these characters, you know you can remove it from the content you are keeping. This could be done in a for loop containing an if condition that looks something like this:
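// Assumes `text` holds the text extracted from the PDF.
var lineArray = text.split(/\r?\n/); // each line becomes an element of the array
for (var cnt = 0; cnt < lineArray.length; cnt++)
{
var line = lineArray[cnt];
// Blank out any line containing one of the special characters.
if (line.includes("/") || line.includes("-") || line.includes("="))
lineArray[cnt] = "";
}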
At the end of this code you could simply get all the text within the array and it would no longer contain the unwanted lines. If there are unwanted lines however, that are virtually indistinguishable by characters, length, positioning etc. the previous approach begins to break down on some of the trickier lines.
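The same rule can also be written as a filter, which drops the unwanted lines instead of blanking them. This is only a sketch: keepGoodLines is a hypothetical name, and the character list is the example one from above, so a good line containing a hyphenated word would be dropped too.

```javascript
// Example rule from above: a line is "bad" if it contains /, - or =.
const SPECIAL_CHARS = /[\/=-]/;

// Hypothetical helper: returns the text with bad and empty lines removed.
function keepGoodLines(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "" && !SPECIAL_CHARS.test(line))
    .join("\n");
}
```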
This is because there is no rule you can give the computer to distinguish between the good and the bad without giving it a brain such as yours that recognizes parts of speech and sentence structure. In which case you might consider option 2, which is just that.
Option 2- Give the computer a brain: Given that the text you want to remove will more or less be incoherent documentation based on what you have shown us, an open source (or purchased) natural language processor may be what you are looking for.
I found a good beginner's intro at http://myreaders.info/10_Natural_Language_Processing.pdf with some information that might be of use to you. From the source,
"Linguistics is the science of language. Its study includes:
sounds (phonology),
word formation (morphology),
sentence structure (syntax),
meaning (semantics), and understanding (pragmatics) etc.
Syntactic Analysis : Here the analysis is of words in a sentence to know the grammatical structure of the sentence. The words are transformed into structures that show how the words relate to each others. Some word sequences may be rejected if they violate the rules of the language for how words may be combined. Example: An English syntactic analyzer would reject the sentence say : 'Boy the go the to store.' "
Using some sort of NLP, you can discover whether a given section of text contains a sentence or some incoherent rambling. This test could then be used as a filter in your program for what you would like to keep or remove.
Side note- As it appears your sample text is not just sentences but literature, sometimes characters will speak in sentence fragments as part of their nature given by the author. In this case, you could add a separate condition that if the text is contained within two quotations and has no special characters, you want to keep the text regardless.
In the end NLP may be more work than you require or that you want to do, in which case Option 1 is likely going to be your best bet. On the other hand, it may be just the thing you are looking for. Whatever the case or if you decide you need some combination of the two, best of luck! I hope this answer helps.

Remove unicode characters from Cheerio.js content

I’m using cheeriojs to scrape content off a webpage, with the following HTML.
<p>
Although the PM's office could neither confirm nor deny this, the spokesperson, John Doe said the meeting took place on Sunday.
<br>
<br>
“The outcome will be made public in due course,” John said in an SMS yesterday.
<br>
<br>
</p>
I’m able to reach the content of interest, by class and id tags, as follows:
$('.top-stories .line.more').each(function(i, el){
//Do something…
let content = $(this).next().html();
});
Once I’ve captured the content of interest, I “clean” it up using regular expressions, as below:
let cleanedContent = content.split(/<br>/).join(' \n ');
Inserting a newline where an empty tag (<br>) is matched. So far all is good, until I look at the cleaned content below:
Although the PM&apos;s office could neither confirm nor deny this, the spokesperson, Saima Shaanika said the meeting took place on Friday.
“The outcome will be made public in due course,”
It appears that punctuation marks, and perhaps some other characters, are stored according to their unicode codes. I may be wrong on this, and would welcome some correction to this line of thought.
Assuming that they are stored as unicode codes, is there a module that I could pass the “cleanedContent” variable, through to convert the unicodes to human readable punctuation marks/characters?
Should this not be possible, is there a better implementation of cheeriojs that would avoid this? I'm totally open to the notion that I'm not using cheeriojs correctly, and would love some direction as to new approaches I could try instead.
One way I can think of is writing a module containing several unicode codes and their corresponding characters, then looking for matches and replacing each matched code with the corresponding human-readable character. I have an intuitive feeling that someone's already done this or something similar. I'd rather not reinvent the wheel.
Thanks in advance.
Cheerio uses htmlparser2 internally.
Because of this, you can use htmlparser2's decodeEntities option when loading the HTML string, which allows you to configure how HTML entities should be treated.
Example:
const $ = cheerio.load('<ul id="fruits">...</ul>', {
decodeEntities: false
});
Relevant docs:
Cheerio
htmlparser2
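If the entities are already sitting in a string, as with the cleanedContent variable in the question, they can also be decoded by hand. The sketch below only illustrates that idea and is not part of Cheerio or htmlparser2. The function name and the tiny table of named entities are assumptions, and a maintained entity-decoding package would cover far more names.

```javascript
// Assumed, deliberately small table of named entities.
const NAMED_ENTITIES = new Map([
  ["amp", "&"], ["lt", "<"], ["gt", ">"],
  ["quot", '"'], ["apos", "'"], ["nbsp", "\u00A0"],
]);

// Hypothetical helper: decodes &name;, &#123; and &#x1F; style entities.
// Unknown or out-of-range entities are left untouched.
function decodeEntities(str) {
  return str.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1].toLowerCase() === "x";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    const key = body.toLowerCase();
    return NAMED_ENTITIES.has(key) ? NAMED_ENTITIES.get(key) : match;
  });
}
```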

Putting content from <pre> into array using Javascript

We have report pages that append a <pre> at the bottom of pages that lists information line by line.
Example:
<pre>
Site Report Info
This is where any error will appear as a query string of numbers: 938109283091238109281092
This is where the account ID will be.
This is where the account reference pin will be.
So on...
So on...
So on...
So on...
So on...
</pre>
Using javascript or jquery, perhaps regex, how can I place all of this into one array where each line is an array element? I assume regex, since the way to determine the lines is by identifying line breaks \n ?
var lines = $('pre').text().split('\n') should do the trick.
You don't need to use jQuery to get the text, of course, but if you're doing web programming, jQuery is pretty ubiquitous.
You may also want to trim the results to get rid of extra whitespace (or not, depending on your application):
var lines = $('pre').text().split('\n').map(function(l) { return l.trim(); });
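Without jQuery, the same thing might look like the sketch below. The function name preTextToLines is made up, and dropping empty lines is an extra assumption beyond what was asked.

```javascript
// Hypothetical helper: splits text into trimmed, non-empty lines.
// Handles both \n and \r\n line endings.
function preTextToLines(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}
```

In a browser you would call it as preTextToLines(document.querySelector('pre').textContent).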

Is there a way to automatically control orphaned words in an HTML document?

I was wondering if there's a way to automatically control orphaned words in an HTML file, possibly by using CSS and/or Javascript (or something else, if anyone has an alternative suggestion).
By 'orphaned words', I mean singular words that appear on a new line at the end of a paragraph. For example:
"This paragraph ends with an undesirable orphaned
word."
Instead, it would be preferable to have the paragraph break as follows:
"This paragraph no longer ends with an undesirable
orphaned word."
While I know that I could manually correct this by placing an HTML non-breaking space (&nbsp;) between the final two words, I'm wondering if there's a way to automate the process, since manual adjustments like this can quickly become tedious for large blocks of text across multiple files.
Incidentally, the CSS2.1 properties orphans (and widows) only apply to entire lines of text, and even then only for the printing of HTML pages (not to mention the fact that these properties are largely unsupported by most major browsers).
Many professional page layout applications, such as Adobe InDesign, can automate the removal of orphans by automatically adding non-breaking spaces where orphans occur; is there any sort of equivalent solution for HTML?
You can avoid orphaned words by replacing the space between the last two words in a sentence with a non-breaking space (&nbsp;).
There are plugins out there that do this, for example jqWidon't or this jquery snippet.
There are also plugins for popular frameworks (such as typogrify for django and widon't for wordpress) that essentially do the same thing.
I know you wanted a javascript solution, but in case someone finds this page while looking for a solution for emails (where Javascript isn't an option), I decided to post my solution.
Use CSS white-space: nowrap. So what I do is surround the last two or three words (or wherever I want the "break" to be) in a span, add an inline CSS (remember, I deal with email, make a class as needed):
<td>
I don't <span style="white-space: nowrap;">want orphaned words.</span>
</td>
In a fluid/responsive layout, if you do it right, the last few words will break to a second line until there is room for those words to appear on one line.
Read more about the white-space property on this link: http://www.w3schools.com/cssref/pr_text_white-space.asp
EDIT: 12/19/2015 - Since this isn't supported in Outlook, I've been adding a non-breaking space between the last two words in a sentence. It's less code, and supported everywhere.
EDIT: 2/20/2018 - I've discovered that the Outlook App (iOS and Android) doesn't support the &nbsp; entity, so I've had to combine both solutions, e.g.:
<td>
I don't <span style="white-space:nowrap;">want orphaned&nbsp;words.</span>
</td>
In short, no. This is something that has driven print designers crazy for years, but HTML does not provide this level of control.
If you absolutely positively want this, and understand the speed implications, you can try the suggestion here:
detecting line-breaks with jQuery?
That is the best solution I can imagine, but that does not make it a good solution.
I see there are 3rd party plugins suggested, but it's simpler to do it yourself. If all you want to do is replace the last space character with a non-breaking space, it's almost trivial:
const unorphanize = (str) => {
let iLast = str.lastIndexOf(' ');
let stArr = str.split('');
stArr[iLast] = '&nbsp;';
return stArr.join('')
}
I suppose this may miss some unique cases but it's worked for all my use cases. The caveat is that you can't just plug the output in where text would go; you have to set innerHTML = unorphanize(text) or otherwise parse it.
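One way around that caveat, as a sketch: insert the actual non-breaking space character (U+00A0) instead of the entity, so the result can be assigned to textContent or rendered as ordinary text. The name unorphanizeText is hypothetical.

```javascript
// Hypothetical variant: uses the U+00A0 character rather than an HTML
// entity, so the output is safe to use as plain text.
const unorphanizeText = (str) => {
  const iLast = str.lastIndexOf(" ");
  if (iLast === -1) return str; // single word: nothing to join
  return str.slice(0, iLast) + "\u00A0" + str.slice(iLast + 1);
};
```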
If you want to handle it yourself, without jQuery, you can write a javascript snippet to replace the text, if you're willing to make a couple assumptions:
A sentence always ends with a period.
You always want to replace the whitespace before the last word with &nbsp;
Assuming you have this html (which is styled to break right before "end" in my browser...monkey with the width if needed):
<div id="articleText" style="width:360px;color:black; background-color:Yellow;">
This is some text with one word on its own line at the end.
<p />
This is some text with one word on its own line at the end.
</div>
You can create this javascript and put it at the end of your page:
<script type="text/javascript">
reformatArticleText();
function reformatArticleText()
{
var div = document.getElementById("articleText");
div.innerHTML = div.innerHTML.replace(/\s(\S*)\./g, "&nbsp;$1.");
}
</script>
The regex simply finds all instances (using the g flag) of a whitespace character (\s) followed by any number of non-whitespace characters (\S) followed by a period. It creates a back-reference to the non-whitespace characters that you can use in the replacement text.
You can use a similar regex to include other end punctuation marks.
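As a sketch of that last point, here is a version that also treats '!' and '?' as sentence ends. It works on a plain-text string and inserts the U+00A0 character rather than the entity. The function name is made up, and like the original it assumes every sentence ends with one of those marks.

```javascript
// Hypothetical helper: replaces the whitespace before the last word of
// each sentence (ending in ., ! or ?) with a non-breaking space.
function bindLastWords(text) {
  return text.replace(/\s(\S*[.!?])/g, "\u00A0$1");
}
```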
If third-party JavaScript is an option, one can use typogr.js, a JavaScript "typogrify" implementation. This particular filter is called, unsurprisingly, Widont.
<script src="https://cdnjs.cloudflare.com/ajax/libs/typogr/0.6.7/typogr.min.js"></script>
<script>
document.body.innerHTML = typogr.widont(document.body.innerHTML);
</script>
</body>
