Detecting new lines in markdown string


I am processing markdown files in my Node.js application, so my markdown is held as a string. I am trying to determine the difference between markdown like:
```Javascript
var code_block = something;
and
```
var code_block = something;
so I am approaching the issue like:
var language = markdown_string.substr(0, markdown_string.search("\n"));
console.log("Language: " + language);
So I am searching the string for the text between the ``` and the newline. However, the \n isn't being found, so the result contains the rest of the file. If I search for a blank space instead, the var gets included, so my string doesn't seem to have anything detectable between the end of the backticks (or the language) and the next line.
Is this correct? Can you see any way I can pick up the rest of the top line after the triple backticks but before the var on the next line?

I have found a workaround: splitting the string into chars, i.e. var chars = markdown.split("");, then looping through the next 15 values until chars[i] equals \n, and using that index to substring the necessary part of the string. It works, but is a bit messy ...
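
A less messy alternative might be a single regular expression anchored at the opening fence. The following is only a sketch, assuming the string starts with the triple backticks; since the actual file isn't shown, it also assumes the line may end in \n, \r\n, or a lone \r. The names FENCE, OPENING_FENCE and fenceLanguage are made up for illustration.

```javascript
// Build the fence marker with repeat() so this sample can itself sit inside a fenced block.
const FENCE = "`".repeat(3);

// Matches the opening fence at the start of the string, captures the rest of
// that line, and accepts \n, \r\n, or a lone \r as the line ending (assumption).
const OPENING_FENCE = new RegExp("^" + FENCE + "([^\\r\\n]*)(?:\\r\\n|\\r|\\n)");

// Hypothetical helper: returns the language label, "" if the fence has none,
// or null if the string does not start with a fence followed by a line break.
function fenceLanguage(markdown) {
  const match = OPENING_FENCE.exec(markdown);
  return match ? match[1].trim() : null;
}

console.log(JSON.stringify(fenceLanguage(FENCE + "Javascript\nvar code_block = something;"))); // "Javascript"
console.log(JSON.stringify(fenceLanguage(FENCE + "\r\nvar code_block = something;")));         // ""
```

An empty string then means "fence without a language", which is the distinction the question is after.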

Related

Regex to convert markdown to html

My goal is to take a markdown text and create the necessary bold/italic/underline html tags.
Looked around for answers, got some inspiration but I'm still stuck.
I have the following typescript code, the regex matches the expression including the double asterisk:
var text = 'My **bold\n\n** text.\n'
var bold = /(?=\*\*)((.|\n)*)(?<=\*\*)/gm
var html = text.replace(bold, '<strong>$1</strong>');
console.log(html)
Now the result of this is: My <strong>**bold\n\n**</strong> text.
Everything is great aside from the leftover double asterisk.
I also tried to remove them in a later 'replace' statement, but this creates further issues.
How can I ensure they are removed properly?

With your pattern (?=\*\*)((.|\n)*)(?<=\*\*) you assert (not match) with (?=\*\*) that there is ** directly to the right.
Then directly after that, you capture the ** using ((.|\n)*) so then it becomes part of the match.
Then at the end you assert again with (?<=\*\*) that there is ** directly to the left, but ((.|\n)*) has already matched it.
This way you end up with all the ** in the match.
You don't need lookarounds at all, as you are already using a capture group.
In Javascript you could match the ** on the left and right and capture any character in a capture group:
\*\*([^]*?)\*\*
But I would suggest using a dedicated parser to parse markdown instead of using a regex.
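
As a rough illustration of the pattern above applied to the string from the question (a sketch only, not a full markdown converter; note that [^] is JavaScript-specific syntax for "any character including newlines"):

```javascript
const text = "My **bold\n\n** text.\n";

// Match the asterisks on both sides, but capture only what lies between them.
const bold = /\*\*([^]*?)\*\*/g;

const html = text.replace(bold, "<strong>$1</strong>");
console.log(JSON.stringify(html)); // "My <strong>bold\n\n</strong> text.\n"
```

Because the asterisks are matched outside the capture group, they are consumed by the replacement and no second cleanup pass is needed.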

Just make another call to replaceAll, replacing the ** with an empty string.
var text = 'My **bold\n\n** text.\n'
var bold = /(?=\*\*)((.|\n)*)(?<=\*\*)/gm
var html = text.replace(bold, '<strong>$1</strong>');
html = html.replaceAll(/\*\*/gm,'');
console.log(html)

Searching + Replacing a raw string with double backslashes using regex outputs only a single backslash

I am hoping to get some insight into this issue I am having. I couldn't really find any other questions like this doing the same thing. I am using python 3.7.
I am pulling in contents of strings using regex which then will replace other parts of a JavaScript file I am reading in. I am familiar with raw strings when reading data in, but when I go to replace certain parts containing the double backslash only one is printed. I know it is escaping the string due to the backslash on the \", but I am at a loss with this. Adding a third "\" to it making it "\\\" will solve the issue, but I cannot do that due to the type of data I am working with.
For example, I want to replace all instances that the string "Ch" is found in a file I am reading in with "\\" using regex. This new data is then outputted to a new file.
How do I go about replacing certain content with the string '"\\"' ensuring that nothing is escaped and only "\\" is outputted?
Simplified sample code for testing:
import re
string = R'"\\"'
Converted_Text = re.sub("Ch", ' ' + string, Converted_Text)
with open('output.js','w') as w:
    w.write(Converted_Text.strip().replace('\n',''))
Sample file being read in:
var test = Ex("Temp") + Ch + wn;
var num1= 1;
var num2 =2;
var sum = num1+num2;
var blah, meh;

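You can first replace Ch with \\Ch and then replace Ch with \\. The backslash is lost because re.sub processes backslash escapes in the replacement string; passing a function as the replacement, e.g. lambda m: ' ' + string, inserts the text literally.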

RegEx working in JavaScript but not in C#

I currently have a working WordWrap function in Javascript that uses RegEx. I pass the string I want wrapped and the length I want to begin wrapping the text, and the function returns a new string with newlines inserted at appropriate locations in the string as shown below:
wordWrap(string, width) {
    let newString = string.replace(
        new RegExp(`(?![^\\n]{1,${width}}$)([^\\n]{1,${width}})\\s`, 'g'), '$1\n'
    );
    return newString;
}
For consistency purposes I won't go into, I need to use an identical or similar RegEx in C#, but I am having trouble successfully replicating the function. I've been through a lot of iterations of this, but this is what I currently have:
private static string WordWrap(string str, int width)
{
    Regex rgx = new Regex("(?![^\\n]{ 1,${" + width + "}}$)([^\\n]{1,${" + width + "}})\\s");
    MatchCollection matches = rgx.Matches(str);
    string newString = string.Empty;
    if (matches.Count > 0)
    {
        foreach (Match match in matches)
        {
            newString += match.Value + "\n";
        }
    }
    else
    {
        newString = "No matches found";
    }
    return newString;
}
This inevitably ends up finding no matches regardless of the string and length I pass. I've read that the RegEx used in JavaScript is different than the standard RegEx functionality in .NET. I looked into PCRE.NET but have had no luck with that either.
Am I heading in the right general direction with this? Can anyone help me convert the first code block in JavaScript to something moderately close in C#?
edit: For those looking for more clarity on what the working function does and what I am looking for the C# function to do: What I am looking to output is a string that has a newline (\n) inserted at the width passed to the function. One thing I forgot to mention (but really isn't related to my issue here) is that the working JavaScript version finds the end of the word so it doesn't cut up the word. So for example this string:
"This string is really really long so we want to use the word wrap function to keep it from running off the page.\n"
...would be converted to this with the width set to 20:
"This string is really \nreally long so we want \nto use the word wrap \nfunction to keep it \nfrom running off the \npage.\n"
Hope that clears it up a bit.

The JavaScript and C# regex engines are different. Each language has its own regex implementation, so regex behaviour is language dependent: a pattern that works in one language will not necessarily work in another.
For example, C# has long supported named groups, while JavaScript only gained them in ES2018.
So you will find multiple differences between the regex support of these two languages.

There are issues with the way you've translated the regex pattern from a JavaScript string to a C# string.
You have extra whitespace in the C# version, and you've also left in the $ symbols and the extra curly brackets that are part of the template-literal interpolation syntax in the JavaScript version (they are not part of the actual regex pattern).
You have:
"(?![^\\n]{ 1,${" + width + "}}$)([^\\n]{1,${" + width + "}})\\s"
when what I believe you want is:
"(?![^\\n]{1," + width + "}$)([^\\n]{1," + width + "})\\s"

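To see why the ${ and } do not belong in the pattern, it may help to print what the JavaScript template literal actually evaluates to. This is just a sketch with an example width of 20; the variable names fromTemplate and fromConcat are made up for illustration.

```javascript
const width = 20; // example value

// The template literal from the question: ${width} is interpolation, not regex syntax.
const fromTemplate = `(?![^\\n]{1,${width}}$)([^\\n]{1,${width}})\\s`;

// The same pattern built by plain concatenation, which is the shape the C# string should take.
const fromConcat = "(?![^\\n]{1," + width + "}$)([^\\n]{1," + width + "})\\s";

console.log(fromTemplate);                // (?![^\n]{1,20}$)([^\n]{1,20})\s
console.log(fromTemplate === fromConcat); // true
```

The printed pattern contains no ${...} at all, so the C# string should concatenate the width directly into the {1,N} quantifier.
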
Most efficient way to grab XML tag from file with JavaScript and Regex

I'm doing some more advanced automation on iOS devices and simulators for an enterprise application. The automation is written in browserless Javascript. One of the methods works on the device but not on the simulator, so I need to code a workaround. For the curious, it's UIATarget.localTarget().frontMostApp().preferencesValueForKey(key).
What we need to do is read a path to a server (which varies) from a plist file on disk. As a workaround on the simulator, I've used the following lines to locate the plist file containing the preferences:
// Get the alias of the user who's logged in
var result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/whoami", [], 5).stdout;
// Remove the extra newline at the end of the alias we got
result = result.replace('\n',"");
// Find the location of the plist containing the server info
result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/find", ["/Users/"+result+"/Library/Application Support/iPhone Simulator", "-name", "redacted.plist"], 100);
// For some reason we need a delay here
UIATarget.localTarget().delay(.5);
// Results are returned in a single string separated by newline characters, so we can split it into an array
// This array contains all of the folders which have the plist file under the Simulator directory
var plistLocations = result.stdout.split("\n");
...
// For this example, let's just assume we want slot 0 here to save time
var plistBinaryLocation = plistLocations[0];
var plistXMLLocation = plistBinaryLocation + ".xml";
result = UIATarget.localTarget().host().performTaskWithPathArgumentsTimeout("/usr/bin/plutil", ["-convert","xml1", plistBinaryLocation,"-o", plistXMLLocation], 100);
From here, I think the best way to get the contents is to cat or grep the file, since we can't read the file directly from disk. However, I'm having trouble getting the syntax down. Here's an edited snippet of the plist file I'm reading:
<key>server_url</key>
<string>http://pathToServer</string>
There are a bunch of key/string pairs in the file, where the server_url key is unique. Ideally I'd do something like a lookbehind, but because JavaScript doesn't appear to support it, I figured I'd just get the pair from the file and whittle it down a bit later.
I can search for the key with this:
// This line works
var expression = new RegExp(escapeRegExp("<key>server_url</key>"));
if(result.stdout.match(expression))
{
    UIALogger.logMessage("FOUND IT!!!");
}
else
{
    UIALogger.logMessage("NOPE :(");
}
Where the escapeRegExp method looks like this:
function escapeRegExp(str)
{
    var result = str.replace(/([()[{*+.$^\\|?])/g, '\\$1');
    UIALogger.logMessage("NEW STRING: " + result);
    return result;
}
Also, this line returns a value (but gets the wrong line):
var expression = new RegExp(escapeRegExp("<string>(.*?)</string>"));
However, when you put the two together, it (the Regex syntax) works on the terminal but doesn't work in code:
var expression = new RegExp(escapeRegExp("<key>server_url</key>[\s]*<string>(.*?)</string>"));
What am I missing? I also tried grep and egrep without any luck.

There are two problems affecting you here getting the regex to work in your JavaScript code.
First, you are escaping the whole regex expression string, which means that your capturing (.*?) and your whitespace ignoring [\s]* will also be escaped and won't be evaluated the way you're expecting. You need to escape the XML parts and add in the regex parts without escaping them.
Second, the whitespace-ignoring part, [\s]*, is falling prey to JavaScript's normal string escaping rules: the "\s" turns into "s" in the output. You need to escape that backslash by writing "\\s" so that it stays as \s in the string that you pass to construct the regular expression.
I've built a working script that I've verified in the UI Automation engine itself. It should extract and print out the expected URL:
var testString = "" +
    "<plistExample>\n" +
    "   <key>dont-find-me</key>\n" +
    "   <string>bad value</string>\n" +
    "   <key>server_url</key>\n" +
    "   <string>http://server_url</string>\n" +
    "</plistExample>";
function escapeRegExp(str)
{
    var result = str.replace(/([()[{*+.$^\\|?])/g, '\\$1');
    UIALogger.logMessage("NEW STRING: " + result);
    return result;
}
var strExp = escapeRegExp("<key>server_url</key>") + "[\\s]*" + escapeRegExp("<string>") + "(.*)" + escapeRegExp("</string>");
UIALogger.logMessage("Expression escaping only the xml parts:" + strExp);
var exp = new RegExp(strExp);
var match = testString.match(exp);
UIALogger.logMessage("Match: " + match[1]);
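I should point out, though, that the XML parts contain no regex metacharacters. The only candidates for escaping are the forward slashes in the XML closing tags, and those strictly need it only in a regex literal, not in a string passed to new RegExp. That means that you don't need your escapeRegExp() function and can write the expression you want like this: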
var exp = new RegExp("<key>server_url<\/key>[\\s]*<string>(.*)<\/string>");
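
For anyone who wants to try this outside UI Automation, here is a rough standalone sketch using a regex literal and console.log in place of UIALogger. The sample XML and the helper name extractServerUrl are made up for illustration.

```javascript
// Inside a string literal the backslash in "\s" is dropped, which is the second problem above.
console.log("[\s]*" === "[s]*"); // true

// Hypothetical sample data mirroring the plist snippet from the question.
const plistXml = [
  "<plistExample>",
  "   <key>dont-find-me</key>",
  "   <string>bad value</string>",
  "   <key>server_url</key>",
  "   <string>http://pathToServer</string>",
  "</plistExample>",
].join("\n");

// A regex literal avoids string escaping entirely; only the "/" needs a backslash here.
function extractServerUrl(xml) {
  const match = xml.match(/<key>server_url<\/key>\s*<string>(.*?)<\/string>/);
  return match ? match[1] : null; // null when the key is absent
}

console.log(extractServerUrl(plistXml)); // http://pathToServer
```

Returning null instead of indexing match[1] directly avoids an exception when the key is missing from the file.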

Parsing Text with jQuery

I'm attempting to parse a text string with jQuery and to make a variable out of it. The string is below:
Publications Deadlines: armadllo
I'm trying to just get everything past "Publications Deadlines: ", so it includes whatever the name is, regardless of how long or how many words it is.
I'm getting the text via the jQuery .text() function like so:
$('.label_im_getting').text()
I feel like this may be a simple solution that I just can't put together. Traditional JS is fine as well if it's more efficient than JQ!

Try this,
First part
str = $.trim($('.label_im_getting').text().split(':')[0]);
Second part
str = $.trim($('.label_im_getting').text().split(':')[1]);

var string = input.split(':'); // splits into parts at each ':'
string = string[1]; // take the second part
string = string.replace(/ /g, ''); // removes all the spaces
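
Since the name may be several words long, and could in principle contain a colon itself, a variant that cuts at the first colon and only trims the ends might be safer. This is a plain-JavaScript sketch; textAfterLabel is a made-up helper name, and in the page you would pass it $('.label_im_getting').text().

```javascript
// Hypothetical helper: returns everything after the first ':' with surrounding
// whitespace trimmed, or "" when there is no colon at all.
function textAfterLabel(text) {
  const idx = text.indexOf(":");
  return idx === -1 ? "" : text.slice(idx + 1).trim();
}

console.log(textAfterLabel("Publications Deadlines: armadllo")); // armadllo
```

Unlike removing every space, trim() keeps the spaces between words in a multi-word name.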
