Unicode 'not perpendicular' symbol - javascript

According to this source, we have a neat way to show parallel, not parallel and perpendicular symbols. What I'm looking for, though, is a crossed-out perpendicular symbol, i.e. a 'not perpendicular' symbol. I know it's very rare, but its usage can easily be justified from a mathematical point of view, hence my question.
But it doesn't seem to be in the Unicode character list.
How can I add the symbol and render it with HTML and JavaScript?

(Turning my comment into an answer.)
You can use Unicode's combining characters, e.g.:
U+27C2 (PERPENDICULAR) followed by either
U+0338 (COMBINING LONG SOLIDUS OVERLAY):
⟂̸
or U+0337 (COMBINING SHORT SOLIDUS OVERLAY):
⟂̷
Or some other character that you find fitting.
Naturally, this only works with fonts that support the characters and with renderers (browsers, in this case) that handle combining characters properly. Then again, font limitations can cause problems with or without combining characters.
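As a minimal sketch of the approach above, the combined symbol can be built as an ordinary string and assigned as text content. The helper name negate and the sample text "AB ... CD" are made up for illustration; whether the overlay lines up well still depends on the font.

```javascript
// Code points taken from the answer above.
const PERPENDICULAR = "\u27C2";          // U+27C2 PERPENDICULAR
const LONG_SOLIDUS_OVERLAY = "\u0338";   // U+0338 COMBINING LONG SOLIDUS OVERLAY
const SHORT_SOLIDUS_OVERLAY = "\u0337";  // U+0337 COMBINING SHORT SOLIDUS OVERLAY

// Hypothetical helper: append a combining overlay to a base symbol.
// The combining character must come directly after the base character.
function negate(symbol, overlay = LONG_SOLIDUS_OVERLAY) {
  return symbol + overlay;
}

const notPerpendicular = negate(PERPENDICULAR);
const notPerpendicularShort = negate(PERPENDICULAR, SHORT_SOLIDUS_OVERLAY);

// In a browser, insert it as text. In static HTML the same pair can be
// written with numeric character references: &#x27C2;&#x0338;
if (typeof document !== "undefined") {
  const span = document.createElement("span");
  span.textContent = `AB ${notPerpendicular} CD`;
  document.body.appendChild(span);
}
```

The result is still two code points, so string length and cursor movement may treat it as two units even though it renders as one glyph.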

Related

General (rough) algorithm for transliterating a RTL language to a LTR language

I am starting to think about how to transliterate an RTL string (e.g. Arabic, Hebrew) to an LTR string (i.e. the romanization of the sounds/letters). It's relatively straightforward if it's LTR -> LTR, but mentally trickier for RTL -> LTR. For LTR -> LTR, you could have a simple mapping from each letter in A to each letter in B. Maybe multiple A's combine to make one B in some cases, or a single A makes a chain of Bs.
a b
- -
X 1
YZ 2
ABC 3
D 456
E 78
Then given a string like XYZYZDDEABC you would get 122456456783. Basic enough, though the actual algorithm would be a bit tricky because it might have to lookahead and have a prioritization on the elements. But this is the gist of it.
Now for a RTL -> LTR transformation, I'm confused on two levels. First, how do you iterate through a RTL string? The characters are actually in LTR order, correct? It's just the visual layout in browsers and such which makes it RTL. So from a code perspective, your RTL language is actually read LTR (it's not like we have to do anything in reverse or anything). Just making sure I'm interpreting this correctly. That would mean I can just do like the above LTR -> LTR transformation for all intents and purposes.
If it's not like that, and there's something else to consider, I would like to know generally how to do this. If a language is needed for a demo, then JavaScript would be good.
You're correct. Text is stored in "logical order", which is the order it would be typed (or, in most cases, the order in which it is spoken). So you don't need to take directionality into account during transliteration.
Note that in many writing systems, including both Arabic and Hebrew, numbers are written "big-endian", with the most significant digit on the left. They are also typed in this order, meaning that the text is actually bidirectional. That is also the case when texts of different directionality are mixed together, such as when names written in Latin script are included in an Arabic or Hebrew document. Fortunately, you don't need to worry about that either, unless you're writing a Unicode renderer. (If you are, you'd need to read Unicode Standard Annex #9, the Unicode Bidirectional Algorithm, which goes into all the details of bidirectional rendering.)
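To make this concrete, here is a rough sketch of the mapping idea from the question, using greedy longest-match lookup as one possible prioritization. The function name transliterate is made up, and the small Hebrew table is a toy mapping for illustration only, not a real romanization standard. The same loop handles both inputs because the string is walked in logical order.

```javascript
// Greedy longest-match transliteration: at each position, try the longest
// key first; copy characters that have no mapping unchanged.
function transliterate(text, mapping) {
  const keys = [...mapping.keys()].sort((a, b) => b.length - a.length);
  let out = "";
  let i = 0;
  while (i < text.length) {
    const key = keys.find((k) => text.startsWith(k, i));
    if (key === undefined) {
      // No mapping: copy one whole code point (may be two UTF-16 units).
      const ch = String.fromCodePoint(text.codePointAt(i));
      out += ch;
      i += ch.length;
    } else {
      out += mapping.get(key);
      i += key.length;
    }
  }
  return out;
}

// The table from the question.
const table = new Map([
  ["X", "1"], ["YZ", "2"], ["ABC", "3"], ["D", "456"], ["E", "78"],
]);
const ltrResult = transliterate("XYZYZDDEABC", table); // "122456456783"

// Toy Hebrew table (hypothetical). The input is written with escapes so the
// logical order is visible: shin, lamed, vav, final mem.
const hebrew = new Map([
  ["\u05E9", "sh"], ["\u05DC", "l"], ["\u05D5", "o"], ["\u05DD", "m"],
]);
const rtlResult = transliterate("\u05E9\u05DC\u05D5\u05DD", hebrew); // "shlom"
```

No reversal step is needed for the Hebrew input: index 0 is the first letter typed, even though it is displayed on the right.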

How to represent a fraction using JavaScript and jQuery [duplicate]

Is it possible to use any fraction symbol on a website, represented as ¼ rather than 1/4 for example?
From what I've gathered, these are the only ones I can use:
½
⅓ ⅔
¼ ¾
Is this right, and why is that? The reason I ask is that I've done a Google web search and can't seem to locate any others, e.g. 2/4.
You can try MathJax (http://www.mathjax.org/), a JavaScript library for rendering math formulas, if that is what you want.
The image below displays all unicode-defined fraction symbols. Each of them is treated as one single character. You can use all of them freely, of course, but if you want more, e.g. 123/321, then you should look out for a library that can create fractions dynamically.
An option for doing so would be using LaTeX. There is another question (with very good answers) on how to do this.
Image from http://symbolcodes.tlt.psu.edu/bylanguage/mathchart.html#fractions
As I understand it, HTML5 includes MathML, which can represent any fraction you want.
While searching the Unicode table I also found these: ⅑ ⅒ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞.
A web page is built up with text, and that text is encoded in a certain character set. The character set you select decides on which characters can be displayed. This also means that characters or symbols that don't exist in the character set cannot be displayed.
As shown in Michael's answer, Unicode defines symbols for a number of fractions. These can be displayed without resorting to tricks such as small bitmaps generated on the server or client to show the desired fraction, or, as indicated by mohammad mohsenipur, a JavaScript library that transforms TeX or MathML.
There are several possibilities:
Use special characters for fractions. Not possible for 2/4, for example, and font support is problematic for all but the three most common (vulgar) fractions you found.
Use markup like <sup>2</sup>/<sub>4</sub>. Probably messes up your line spacing, and does not look particularly good.
Construct a fraction using some CSS for positioning and size control, with the fraction slash character instead of the common slash. Rather awkward really, I would say.
Use the OpenType "frac" feature. Rather limited support in browsers and especially in fonts.
MathJax, e.g. \(\frac{2}{4}\), or some more elaborate TeX code to produce a different style of fraction.
MathML. Verbose, and browser support for MathML inside HTML could be better.
These are explained more and illustrated in my page “Math in HTML (and CSS)”, section Fractions.
The choice thus depends on many factors. It also depends on the font quite a lot. I suggest that you test the different options using the font family declaration you intend to use. Despite the many alternatives, you might end up with using just the simple linear notation like 2/4.
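As a small sketch of the first option combined with a fallback: look up a precomposed Unicode fraction when one exists, and otherwise join the numbers with the fraction slash (U+2044). The function name formatFraction is made up, and the table only covers the fractions listed in this thread. How the fallback renders depends heavily on the font.

```javascript
// Precomposed fraction characters mentioned in this thread.
const VULGAR_FRACTIONS = new Map([
  ["1/2", "½"], ["1/3", "⅓"], ["2/3", "⅔"], ["1/4", "¼"], ["3/4", "¾"],
  ["1/5", "⅕"], ["2/5", "⅖"], ["3/5", "⅗"], ["4/5", "⅘"],
  ["1/6", "⅙"], ["5/6", "⅚"],
  ["1/8", "⅛"], ["3/8", "⅜"], ["5/8", "⅝"], ["7/8", "⅞"],
  ["1/9", "⅑"], ["1/10", "⅒"],
]);

const FRACTION_SLASH = "\u2044"; // U+2044 FRACTION SLASH

// Hypothetical helper: prefer the single-character form, otherwise fall
// back to numerator + fraction slash + denominator.
function formatFraction(numerator, denominator) {
  const precomposed = VULGAR_FRACTIONS.get(`${numerator}/${denominator}`);
  return precomposed ?? `${numerator}${FRACTION_SLASH}${denominator}`;
}

const quarter = formatFraction(1, 4);    // "¼"
const twoFourths = formatFraction(2, 4); // "2" + U+2044 + "4"
```

The fraction is deliberately not reduced here, since the question asks for 2/4 as written; reducing to lowest terms first would be a separate design choice.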

Greek fonts for Katex

I succeeded in using KaTeX on my blog instead of MathJax. However, some of the equations contain Greek symbols, and KaTeX does not include the fonts for rendering the Greek characters.
(MathJax is very good at rendering the Greek letters.)
Are there KaTeX fonts available to render an equation that contains Greek characters? How do I use these fonts (how do I include them together with the KaTeX script on my site)?
For example the equation
$$h\nu_0 = h\nu + E_k + W \tag{1}$$
(ν is \nu) renders fine with MathJax but not with KaTeX.
KaTeX doesn't currently support Greek letters as input, though as the comment says, \nu does work. See this issue for more details: Symbol unicode replacement doesn’t work
Different formulae-rendering js libs behave in one of 3 different ways:
process \pi and tolerate π (MathJax; MathQuill, although the result is somewhat different)
process \pi but don't tolerate π (jsMath, KaTeX)
don't process \pi and tolerate π (jqMath)
Unfortunately, as Ben has answered, KaTeX is not one of those that tolerate raw Greek characters. However, you may try some pre-parsing to "fix" this, in a manner like this: before
<script>renderMathInElement(document.body,{delimiters:
[{left: "$", right: "$", display: false}]
});</script>
add a replacement step like the one described here (replace π with \pi and so on). You should, however, modify the replaceTextOnPage function proposed there so that it replaces all Greek letters in one pass rather than running replaceTextOnPage once per letter. Other optimizations are possible too, since that solution is general-purpose but you know where to expect formulae on your pages.
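A minimal sketch of such a single-pass replacement, assuming a KaTeX version that rejects raw Greek input as described above. The names GREEK_TO_TEX and replaceGreek are made up, and the table is deliberately incomplete. This operates on the TeX source string; applying it to the right text nodes before calling renderMathInElement is left out.

```javascript
// Partial table of Greek letters and their TeX commands (extend as needed).
const GREEK_TO_TEX = {
  "α": "\\alpha", "β": "\\beta", "γ": "\\gamma", "δ": "\\delta",
  "θ": "\\theta", "λ": "\\lambda", "μ": "\\mu", "ν": "\\nu",
  "π": "\\pi", "ρ": "\\rho", "σ": "\\sigma", "τ": "\\tau", "ω": "\\omega",
  "Δ": "\\Delta", "Σ": "\\Sigma", "Ω": "\\Omega",
};

// One character class covering every key, so the text is scanned once.
const GREEK_PATTERN = new RegExp(`[${Object.keys(GREEK_TO_TEX).join("")}]`, "g");

// Replace each raw Greek letter with its command. The trailing space keeps
// a following letter from being glued onto the command name
// (e.g. "νx" must not become "\nux").
function replaceGreek(tex) {
  return tex.replace(GREEK_PATTERN, (ch) => GREEK_TO_TEX[ch] + " ");
}

const fixed = replaceGreek("hν_0 = hν + E_k + W");
// fixed is: h\nu _0 = h\nu  + E_k + W
```

The extra spaces are harmless in math mode, but this naive version would also rewrite Greek letters inside \text{...}, so it is only suitable where formulae contain no Greek prose.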

Remove Jargon but keep real characters

I'm getting bombarded by spam with posts like the one below, so what would be the best and most efficient way to remove all the jargon from something like this:
<textarea id="comment">ȑ̉̽ͧ̔͆ͦ̊͛̿͗҉̷̢̧̫̗̗͎͈͕e̷̪͓̼̼̣̻̻͙͔̳̘̗͙̬̱͎ͭ̃͗ͩͯͥͬ̂ͧ͐͌̑̅͢͜ͅd̴̦̺̖̣͎̲̥͕̗̺̯̤͗ͬ͌ͧ̓͒ͭ́̋ͩͥ͊̇̓̌ͫ̃́́͠</textarea>
I'm assuming RegEx, but what exactly are those things called, and how would they be referenced in a RegExp? The problem lies within a <textarea> tag: upon retrieving the value, I'd like to be able to remove all that jargon from the value and have it display only the real characters, which in this case should be "red".
Allowing other kinds of Unicode characters is essential, but not characters that stack on top of each other.
Zalgo waits behind the wall.
You want to filter out combining characters, such as the diacritical marks listed here.
You should be able to get away with a simple character class pattern match, i.e.:
// The g flag replaces every match; without it only the first mark is removed.
fooString.replace(/[\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]/g, "");
If you want to limit content to one combination per character (not that this really alleviates all negative side-effects), you could simply use
// Keep the first combining mark of each run and drop the rest (g flag: every run).
fooString.replace(/([\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f])[\u0300-\u036f\u0483-\u0489\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]*/g, "$1");
EDIT: Added a number of other combining character ranges. This is most likely still not exhaustive.
Removing combining diacriticals will make input of some languages (such as Vietnamese) difficult or impossible, so you should reconsider.
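One possible compromise, sketched here as an assumption rather than a tested anti-spam measure: normalize to NFC first, so that legitimate letter-plus-accent sequences collapse into precomposed characters where those exist, then cap the number of combining marks left in any run. This uses the Unicode property escape \p{M} (all combining marks) instead of hand-listed ranges, which requires an engine that supports the u flag with property escapes. The function name capCombiningMarks and the default limit of 2 are arbitrary.

```javascript
// Keep at most maxMarks combining marks per run; drop the excess.
function capCombiningMarks(input, maxMarks = 2) {
  // NFC folds sequences such as e + U+0302 + U+0301 into the single
  // precomposed character U+1EBF, so they do not count as stacked marks.
  const composed = input.normalize("NFC");
  // Match a run of more than maxMarks marks and keep only the first maxMarks.
  const excessRun = new RegExp(`(\\p{M}{${maxMarks}})\\p{M}+`, "gu");
  return composed.replace(excessRun, "$1");
}

// Four stacked marks on a digit are cut down to one.
const trimmed = capCombiningMarks("1\u0300\u0301\u0302\u0303", 1);

// Decomposed Vietnamese "ế" survives even with a limit of zero,
// because NFC turns it into one precomposed character first.
const vietnamese = capCombiningMarks("e\u0302\u0301", 0);
```

Scripts whose combinations have no precomposed forms can still be affected by a low limit, so the cap should be chosen with the expected languages in mind.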

Why is the Alternative symbol of the ECMAScript RegExp grammar left recursive?

I cannot for the life of me figure out why Alternative is left recursive. It really throws a wrench into my parser.
Alternative ::
[empty]
Alternative Term
Here is a note in the semantics portion of the spec that is not exactly clear. Maybe the reasoning would be revealed once I understand this?
NOTE Consecutive Terms try to simultaneously match consecutive portions of the input String. If the left Alternative, the right Term, and the sequel of the regular expression all have choice points, all choices in the sequel are tried before moving on to the next choice in the right Term, and all choices in the right Term are tried before moving on to the next choice in the left Alternative.
What kind of parser can properly handle a left recursive grammar?
Because for certain types of parser left-recursion is much better (e.g. for yacc - see section 6.2 here for an explanation).
If it's causing trouble for your particular parser, then by all means swap it over - it doesn't affect the definition of the language in any way.
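For a hand-written recursive-descent parser, one common way to "swap it over" is to replace the left recursion with a loop while still building the left-nested tree the grammar describes. The sketch below is a simplification made up for illustration: each Term is a single literal character, and an Alternative ends at "|" or ")". The real Term production (quantifiers, groups, escapes, assertions) is not handled.

```javascript
// Alternative :: [empty] | Alternative Term
// Parsed iteratively: start from the empty Alternative and fold each new
// Term onto the left, which yields the same shape as the left-recursive rule.
function parseAlternative(pattern, start = 0) {
  let node = { type: "Alternative", left: null, term: null }; // [empty]
  let pos = start;
  while (pos < pattern.length && pattern[pos] !== "|" && pattern[pos] !== ")") {
    // Simplified Term: one literal character.
    const term = { type: "Term", char: pattern[pos] };
    node = { type: "Alternative", left: node, term };
    pos += 1;
  }
  return { node, pos };
}

// "ab|c": the first Alternative covers "ab" and stops at the "|".
const parsed = parseAlternative("ab|c");
// parsed.node.term.char is "b", parsed.node.left.term.char is "a",
// and parsed.node.left.left is the empty Alternative.
```

The loop accepts exactly the same strings as the recursive rule; only the way the parser walks them changes.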
