php.js serialize function breaking on entity characters - UTF8 cause?

OK, on my commerce platform, shopping cart data is stored as a serialized array in the session. An issue popped up today where one of the item's size options had special characters for 1/4 and 1/2 sizes, e.g. 7¼, 7½, etc. However, when viewing the order that a customer placed for this item, the size value showed up like so: 7Â½
Troubleshooting 101: I took to the site and placed the exact same order the customer placed... oddly, everything worked as expected and the issue did not replicate. Then I noticed that the order was placed via a "phone order", which is a back-end script that allows the store staff to compile a new order and charge / finalize it all on one screen. The system does this using jQuery and Ajax to keep everything on-screen without needing multiple "checkout pages".
Anyway, placed the same order over the phone order script and the problem reproduced just fine. Taking a deeper look into things, I noticed that on the "site" side where customers place orders, the cart data is serialized using the PHP function, then base64_encoded and stored "COMPRESSED" as such in the database with the order.
On the phone order side, the serialization is done via a php.js serialize function that is supposed to emulate the PHP serialize function identically. The serialized data is then POSTed to a handler along with the customer's information etc. The CC is charged and the order is saved to the database just like on the site side.
Looking into it further, I compared the two "serialized" strings from both sides for the same exact order and there is a difference. On php's side, the value 7½ is shown as "s:2:7½", the Javascript version is shown as "s:3:7½"...
I've verified that my site's character encoding is Western (iso-8859-1) on all pages involved in this process.
I was able to work around the issue... once the "bad" serialized object is passed to PHP to save the order, I unserialize it, then pass the resulting array/object off to a recursive function that runs a converter function (char2html) on every value. It converts any special character to its entity name code, essentially ½ becomes &frac12; and Â becomes &Acirc;. I then use str_replace to get rid of any "&Acirc;" strings, then run it through another one of my functions (html2char), which converts any &entityNames; back to their actual character.
Once that function is finished running, the object/array is re-serialized using PHP's serialize function and everything works perfectly, no more Acirc chars showing up on the orders placed on the phone order script.
So specifically I'm looking to figure out if there is some way I can force convert the UTF8 version of the ½ character in javascript to the 1 byte version (rather than the two byte UTF8 version that javascript is seeing), before I pass it to the serialize function in hopes that the serialize function will return the correct string, absent UTF8 characters.
That also being said there is some UTF8ness going on in the php.js serialize function but I'm not advanced enough in my JS coding to figure out if there is something INSIDE the serialize function doing this... or how to change it if there is.
Function reference:
http://phpjs.org/functions/serialize:508
Thoughts?
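The length mismatch described above (s:2 from PHP versus s:3 from php.js) is consistent with one side counting ISO-8859-1 bytes and the other counting UTF-8 bytes. Below is a minimal sketch of that difference, assuming every character in the cart data fits in ISO-8859-1. The helper names are hypothetical and are not part of php.js.

```javascript
// Hypothetical helpers for illustration; none of these are part of php.js.

// Number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(str) {
  let bytes = 0;
  for (const ch of str) { // iterates by code point
    const cp = ch.codePointAt(0);
    if (cp < 0x80) bytes += 1;
    else if (cp < 0x800) bytes += 2;
    else if (cp < 0x10000) bytes += 3;
    else bytes += 4;
  }
  return bytes;
}

// Number of bytes in ISO-8859-1: one per character, valid only up to U+00FF.
function latin1ByteLength(str) {
  for (const ch of str) {
    if (ch.codePointAt(0) > 0xff) {
      throw new RangeError("Character outside ISO-8859-1: " + ch);
    }
  }
  return str.length;
}

// Builds a PHP-style serialized string token using the chosen length function.
function serializeString(str, byteLength) {
  return 's:' + byteLength(str) + ':"' + str + '";';
}

const size = "7\u00bd"; // "7½"
console.log(serializeString(size, utf8ByteLength));   // s:3:"7½";
console.log(serializeString(size, latin1ByteLength)); // s:2:"7½";
```

If the pages really are served and stored as ISO-8859-1, counting one byte per character as in latin1ByteLength would match what PHP's serialize produces for the same text.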

Related

How to generate a new order number each time a PDF file is opened?

I would like to create a sequential order number in the header that is generated automatically each time a customer opens the file. Can I achieve this with JavaScript functions, or some other way? In the attached screenshot, this is the number that should be generated each time the file is opened.

I presume that you are using Acrobat Pro to create the PDF form.
The quick and easy way to do this is to auto generate an order number based on the current date and time. Create a text field in your form (I've called mine "ordernumber"), double click it and go to the calculate tab then insert the following two lines into the custom calculation script box:
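// Look up the order-number text field by name.
var f = this.getField("ordernumber");
// In util.printd, "mm" is the month and "MM" the minutes; "HH" is the 24-hour clock.
f.value = util.printd("yyyy/ddmm/HHMMss", new Date());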
This will give you a unique order code (unless someone creates two orders in the same second!). You can change around the year (yyyy), day (dd), etc to make something that you like as a format.
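To try out a format before putting it into the form, the same idea can be sketched in plain JavaScript outside Acrobat. This assumes the layout year / day and month / hour, minute and second; the function names are hypothetical and nothing here uses the Acrobat API.

```javascript
// Left-pads a number with zeros to two digits.
function pad2(n) {
  return String(n).padStart(2, "0");
}

// Builds an order number of the form yyyy/ddmm/HHMMss from a Date (local time).
function orderNumberFromDate(date) {
  return (
    date.getFullYear() + "/" +
    pad2(date.getDate()) + pad2(date.getMonth() + 1) + "/" +
    pad2(date.getHours()) + pad2(date.getMinutes()) + pad2(date.getSeconds())
  );
}

// 3 May 2024, 14:15:07 local time
console.log(orderNumberFromDate(new Date(2024, 4, 3, 14, 15, 7))); // 2024/0305/141507
```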
If the order number needs to conform to an existing format or align with other systems then you would need to get the PDF to access an external database or something like that which would be a bit more complicated and beyond my knowledge.

It depends on whether your order number has to be unique only, or whether order numbers have to be consecutive.
In the first case, @Chris' answer pretty much gives the solution; you may be fiddling around with the base data, but that's it.
If the number has to be consecutive, there is a possibility if the use of the form can be limited to one single computer. In this case, you would create a Persistent Global Variable (which is a variable that is written back to the system, and can be reused the next time you open the document). See Acrobat JavaScript documentation for code samples. When you open the document, you read in that number, increment it and feed it into your order number field, and write it back.
If the number has to be consecutive, and the order form is used by several users, you will have to maintain the order number externally (which means, on a server). In this case, it might be even better to have a server-side order management, where the user may enter some base data, and then gets the prefilled order form made available.
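The read, increment and write-back cycle for the single-computer case can be sketched as follows. This is only an illustration: `store` stands in for Acrobat's `global` object so the snippet runs anywhere, and the variable name "orderCounter" is made up. In Acrobat the variable would additionally have to be marked persistent (the documentation describes `global.setPersistent` for this).

```javascript
// `store` is a stand-in for Acrobat's `global` object (assumption for this sketch).
function nextOrderNumber(store, name) {
  const previous = Number(store[name]) || 0; // first run: no stored value yet
  const next = previous + 1;
  store[name] = next;                        // write the new value back
  return next;
}

const fakeGlobal = {};
console.log(nextOrderNumber(fakeGlobal, "orderCounter")); // 1
console.log(nextOrderNumber(fakeGlobal, "orderCounter")); // 2
```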

Best way in Javascript to check if string is inside an huge txt/csv?

My real world problem is: users of my mobile app type their city and I have to make sure it really exists and that it is correctly written (case-insensitive, so these are correct: New York, NEW york, new york. This is not correct: newyork).
There are online APIs that work quite well (Google Geocode API for example) but:
- After a very small number of requests, you have to pay (2,500/day right now)
- Users must be connected to the internet
That's why I thought that an offline, local solution would be better. There are many websites (like Maxmind) where you can download a list containing every city in the world. I could embed this huge txt/csv right inside my application and do a string search locally (it's a big file, OK, but not that big. It's just a one-time download of something like 30-40MB of uncompressed .txt)
I'm trying to avoid jQuery at all costs and I don't want to use any PHP/MySQL solutions (even if fulltext indexes could be handy), that's why I'm trying to do all this just using javascript.
Given a string as input, let's say "city3", what's the best/fastest way to check if it's inside an (external) huge list like:
city1,
city2,
city3,
city4,
[...]
After solving this (big) problem: if there are no exact matches, is there a way to search for the correct city without freezing the device for 10 minutes?
In the example before, lets say the user types "cit y3" or "cyty3" or "cìty3": can any js function tell him that he might be looking for "city3"? Is this kind of search too slow in this scenario?
Thanks

If speed is an issue then I would recommend loading the data into a JavaScript object and performing an in-memory search rather than repeatedly scanning a big blob of text in a file.
Try formatting the data into JSON with the city names as keys, that will give you good search performance.
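A minimal sketch of that in-memory lookup, assuming the list has one city per line with an optional trailing comma as in the question. A Map is used here in place of a plain JSON object with city names as keys; the function names are hypothetical.

```javascript
// Normalizes case and surrounding whitespace so "NEW york" matches "New York".
function normalizeCity(name) {
  return name.trim().toLowerCase();
}

// Builds a lookup table from raw text with one city per line.
function buildCityIndex(text) {
  const index = new Map();
  for (const line of text.split(/\r?\n/)) {
    const city = line.replace(/,\s*$/, "").trim(); // drop the trailing comma
    if (city) index.set(normalizeCity(city), city);
  }
  return index;
}

const cityIndex = buildCityIndex("New York,\nLos Angeles,\nAhmedabad,\n");
console.log(cityIndex.has(normalizeCity("NEW york"))); // true
console.log(cityIndex.has(normalizeCity("newyork")));  // false
```

The index is built once after the file is loaded; each later check is a single keyed lookup instead of a scan over the whole text.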

A workaround is to create a database, either SQL or NoSQL, and query it from your JavaScript code using jQuery's JSON functions.
For a SQL database, the ideal choice would be either MySQL or MariaDB (an enhanced, drop-in replacement for MySQL).
With this solution you will probably need a backend such as PHP to fetch the data from your database, convert it to JSON, and then retrieve it from your JavaScript using the jQuery library's $.getJSON function.
For a NoSQL database, the ideal choice would be MongoDB.
With this solution you can fetch your data directly from JavaScript, also with the $.getJSON function.
Example for MongoDB Provided Here

If you don't want to use a database, I think you can do this:
- First, instead of using one big file, split it into several files (you can write a script for this and run it just once to split the big file). In each file put the cities that start with the same two letters: for example, one file for cities that start with aa, a second file for cities that start with ab.
- Then, for each city, check its first letters and search only inside the matching file.
For example, if you need to search for the city "Ahmedabad", it will search only in the file with cities that start with Ah. This is probably not the best solution, since at the end you have 421 files instead of 1, but the search will be faster.
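The bucketing idea can be sketched in memory, with each bucket standing in for one of the small files. This assumes the first two letters of the lower-cased name are used as the key; the function names are hypothetical.

```javascript
// Bucket key: the first two letters of the city name, lower-cased.
function bucketKey(city) {
  return city.trim().toLowerCase().slice(0, 2);
}

// Groups cities by bucket key; each group corresponds to one small file.
function splitIntoBuckets(cities) {
  const buckets = new Map();
  for (const city of cities) {
    const key = bucketKey(city);
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(city.trim().toLowerCase());
  }
  return buckets;
}

// Searches only the one bucket that could contain the city.
function cityExists(buckets, city) {
  const bucket = buckets.get(bucketKey(city)) || [];
  return bucket.includes(city.trim().toLowerCase());
}

const buckets = splitIntoBuckets(["Ahmedabad", "Ahvaz", "Berlin"]);
console.log(cityExists(buckets, "ahmedabad")); // true
console.log(cityExists(buckets, "Aachen"));    // false
```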

Should I worry that using GET in a form element doesn't automatically URL-encode angle brackets?

So I decided to use GET in my form element, point it to my cshtml page, and found (as expected) that it automatically URL encodes any passed form values.
I then, however, decided to test if it encodes angle brackets and surprisingly found that it did not when the WebMatrix validator threw a server error warning me about a potentially dangerous value being passed.
I said to myself, "Okay, then I guess I'll use Request.Unvalidated["searchText"] instead of Request.QueryString["searchText"]." Then, as any smart developer who uses Request.Unvalidated does, I tried to make sure that I was being extra careful, but I honestly don't know much about inserting JavaScript into URLs so I am not sure if I should worry about this or not. I have noticed that it encodes apostrophes, quotations, parentheses, and many other JavaScript special characters (actually, I'm not even sure if an angle bracket even has special meaning in JavaScript OR URLs, but it probably does in one, if not both. I know it helps denote a List in C#, but in any event you can write script tags with it if you could find a way to get it on the HTML page, so I guess that's why WebMatrix's validator screams at me when it sees them).
Should I find another way to submit this form, whereas I can intercept and encode the user data myself, or is it okay to use Request.Unvalidated in this instance without any sense of worry?
Please note, as you have probably already noticed, my question comes from a WebMatrix C#.net environment.
Bonus question (if you feel like saving me some time and you already know the answer off the top of your head): If I use Request.Unvalidated will I have to URL-decode the value, or does it do that automatically like Request.QueryString does?
---------------------------UPDATE----------------------------
Since I know I want neither a YSOD nor a custom error page to appear simply because a user included angle brackets in their "searchText", I know I have to use Request.Unvalidated either way, and I know I can encode whatever I want once the value reaches the cshtml page.
So I guess the question really becomes: Should I worry about possible XSS attacks (or any other threat for that matter) inside the URL based on angle brackets alone?
Also, in case this is relevant:
Actually, the value I am using (i.e. "searchText") goes straight to a cshtml page where the value is run through a (rather complex) SQL query that queries many tables in a database (using both JOINS and UNIONS, as well as Aliases and function-based calculations) to determine the number of matches found against "searchText" in each applicable field. Then I remember the page locations of all of these matches, determine a search results order based on relevance (determined by type and number of matches found) and finally use C# to write the search results (as links, of course) to a page.
And I guess it is important to note that the database values could easily contain angle brackets. I know it's safe so far (thanks to HTML encoding), but I suppose it may not be necessary to actually "search" against them. I am confused as to how to proceed for maximum security and functional expectations, but if I choose one way or the other, I may not know I made the wrong decision until it is much too late...

URL and special characters
The url http://test.com/?param="><script>alert('xss')</script> is "benign" until it is read and then:
printed in a template: Hello @param. (Potential reflected/persisted XSS)
or used in JavaScript: divContent.innerHTML = '<a href="' + window.location.href + ... (Potential DOM XSS)
Otherwise, the browser doesn't evaluate the query string as html/script.
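For the JavaScript case, one common precaution is to HTML-encode the value before it is placed into markup. Below is a minimal sketch with a hand-written helper (escapeHtml is a hypothetical name, not a built-in). It covers text and quoted attribute values only and is not a complete XSS defence.

```javascript
// Replaces the characters that have special meaning in HTML with entities.
// "&" is handled first so that already-produced entities are not encoded twice.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const param = "\"><script>alert('xss')</script>";
console.log(escapeHtml(param));
// &quot;&gt;&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;
```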
Request.Unvalidated/Request.QueryString
You should use Request.Unvalidated["searchText"] if you are expecting to receive special characters.
For example : <b>User content</b><p>Some text...</p>
If your application is working as expected with QueryString["searchText"], you should keep it since it validates for potential XSS.
Ref: http://msdn.microsoft.com/en-us/library/system.web.httprequest.unvalidated.aspx

What is the best way to store a field that supports markdown in my database when I need to render both HTML and "simple text" views?

I have a database and I have a website front end. I have a field in my front end that is plain text now, but I want it to support markdown. I am trying to figure out the right way to store it in my database, because I have various views that need to be supported (PDF reports, web pages, Excel files, etc.).
My concern is that since some of those views don't support HTML, I don't just want to have an HTML version of this field.
Should I store 2 copies (one text only and one HTML?), or should I store HTML and try to remove the HTML tags on the fly when I am rendering out to Excel, for example?
I need to figure out correct format (or formats) to store in the database to be able to render both:
HTML, and
Regular text (with no markdown or HTML syntax)
Any suggestions would be appreciated as I don't want to go down the wrong path. My point is that I don't want to show any HTML tags or markdown syntax in my Excel output.

Decide like this:
Store the original data (text with markdown).
Generate the derived data (HTML and plaintext) on the fly.
Measure the performance:
If it's acceptable, you're done, woohoo!
If not, cache the derived data.
Caching can be done in many ways... you can generate the derived data immediately, and store it in the database, or you can initially store NULLs and do the generation lazily (when and if it's needed). You can even cache it outside the database.
But whatever you do, make sure the cache is never "stale" - i.e. when the original data changes, the derived data in the cache must be re-generated or at least marked as "dirty" somehow. One way to do that is via triggers.
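The "dirty" marking can be sketched at application level like this. It is an illustration only: CachedField is a hypothetical class, and `render` stands for whatever markdown converter is actually used.

```javascript
// `render` is a hypothetical function that converts the source to a derived format.
class CachedField {
  constructor(source, render) {
    this._source = source;
    this._render = render;
    this._cached = null;
    this._dirty = true; // nothing generated yet
  }

  get source() {
    return this._source;
  }

  set source(value) {
    this._source = value;
    this._dirty = true; // derived data is now stale
  }

  // Regenerates the derived data lazily, only when it is missing or stale.
  get derived() {
    if (this._dirty) {
      this._cached = this._render(this._source);
      this._dirty = false;
    }
    return this._cached;
  }
}

let renderCalls = 0;
const field = new CachedField("hello", (s) => { renderCalls += 1; return s.toUpperCase(); });
console.log(field.derived, field.derived, renderCalls); // HELLO HELLO 1
field.source = "bye";
console.log(field.derived, renderCalls);                // BYE 2
```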

You need to store your data in a canonical format. That is, in one true format within your database. It sounds like this format should be a text column that contains markdown. That answers the database-design part of your question.
Then, depending on what format you need to export, you should take the canonical format and convert it to the required output format. This might be just outputting the markdown text, or running it through some sort of parser to remove the markdown or convert it to HTML.
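As a rough illustration of the "remove the markdown" conversion, here is a regex-based sketch that handles only a few inline constructs (images, links, headings, bold, italics, inline code). The function name is hypothetical, and a real markdown parser would be the safer choice for anything beyond simple text.

```javascript
// Strips a small subset of markdown syntax and keeps the readable text.
function markdownToPlainText(markdown) {
  return markdown
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1") // images: keep the alt text
    .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")  // links: keep the link text
    .replace(/^#{1,6}\s+/gm, "")              // heading markers
    .replace(/(\*\*|__)(.+?)\1/g, "$2")       // bold
    .replace(/(\*|_)(.+?)\1/g, "$2")          // italics
    .replace(/`([^`]*)`/g, "$1");             // inline code
}

const canonical = "# Title\nSome **bold** and *italic* text with a [link](http://example.com).";
console.log(markdownToPlainText(canonical));
// Title
// Some bold and italic text with a link.
```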

Most everyone seems to be saying to just store the data as HTML in the database and then process it to turn it into plain text. In my opinion there are some downsides to that:
You will likely need application code to strip the HTML and extract the plain text. Imagine if you did this in SQL Server. What if you want to write a stored procedure/query that has the plain text version? How do you extract plain text in SQL? It's possible with a function, but it's a lot of work.
Processing the HTML blob can be slow. I would imagine for small HTML blobs it will be very fast, but there is certainly more overhead than just reading a plain text field.
HTML parsers don't always work well/they can be complex. The idea is that your users can be very creative and insert blobs that won't work well with your parser. I know from experience that it's not always trivial to extract plain text from HTML well.
I would propose what most email providers do:
Store a rich text/HTML version and a plain text version. Two fields in the database.
As is the use case with email providers, the users might want those two fields to have different content.
You can write a UI function that lets the user enter in HTML and then transforms it via the application into a plain text version. This gives the user a nice starting point and they can massage/edit the plain text version before saving to the database.

1. Always store the source; in your case it is markdown.
2. Also store the formats that are frequently used.
3. Use on-demand conversion/rendering for less frequently used formats.
Explanation:
1. Always have the source. You may need it for various purposes, e.g. the same input can be edited, audit trail, debugging, etc.
2. No overhead for processor/RAM if the same format is frequently requested; you are trading it for disk storage, which is cheap compared to the former.
3. Occasional overhead, see #2.

I would suggest storing it in HTML format, since it is the richest one in this case, and removing the tags when obtaining the data for other formats (such as PDF, LaTeX or whatever). In the following question you'll find a way to remove tags easily.
Regular expression to remove HTML tags
From my point of view, storing data (original and downgraded) in two separate fields is a waste of space, but also an integrity problem, since one of the fields could be -in theory- modified without changing the second one.
Good luck!

I think that what I'd do - if storage is not an issue - would be store the canonical version, but automatically generate from it, in persisted, computed fields, whatever other versions one might need. You want the fields to be persisted because it's pointless doing the conversion every time you need the data. And you want them to be computed because you don't want them to get out of synch with the canonical version.
In essence this is using the database as a cache for the other versions, but a cache that guarantees you data integrity.

Passing an ActionScript JPG Byte Array to Javascript (and eventually to PHP)

Our web application has a feature which uses Flash (AS3) to take photos using the user's web cam, then passes the resulting byte array to PHP where it is reconstructed and saved on the server.
However, we need to be able to take this web application offline, and we have chosen Gears to do so. The user takes the app offline, performs his tasks, then when he's reconnected to the server, we "sync" the data back with our central database.
We don't have PHP to interact with Flash anymore, but we still need to allow users to take and save photos. We don't know how to save a JPG that Flash creates in a local database. Our hope was that we could save the byte array, a serialized string, or somehow actually persist the object itself, then pass it back to either PHP or Flash (and then PHP) to recreate the JPG.
We have tried:
- passing the byte array to Javascript instead of PHP, but javascript doesn't seem to be able to do anything with it (the object seems to be stripped of its methods)
- stringifying the byte array in Flash, and then passing it to Javascript, but we always get the same string:
ÿØÿà
Now we are thinking of serializing the string in Flash, passing it to Javascript, then on the return route, passing that string back to Flash which will then pass it to PHP to be reconstructed as a JPG. (whew). Since no one on our team has extensive Flash background, we're a bit lost.
Is serialization the way to go? Is there a more realistic way to do this? Does anyone have any experience with this sort of thing? Perhaps we can build a javascript class that is the same as the byte array class in AS?

I'm not sure why you would want to use Javascript here. Anyway, the string you pasted looks like the beginning of a JPG header. The problem is that a JPG will for sure contain NULs (characters with 0 as its value). This will most likely truncate the string (as it seems to be the case with the sample you posted). If you want to "stringify" the JPG, the standard approach is encoding it as Base 64.
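To illustrate why Base 64 avoids the truncation, here is a small hand-written encoder applied to the first bytes of a typical JPG header, which include a NUL. The function name is hypothetical; in practice an existing Base64 library on the ActionScript or JavaScript side would do the same job.

```javascript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes an array of byte values (0-255) as a Base64 string.
function bytesToBase64(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasSecond = i + 1 < bytes.length;
    const hasThird = i + 2 < bytes.length;
    // Pack up to three bytes into one 24-bit group.
    const group =
      (bytes[i] << 16) |
      ((hasSecond ? bytes[i + 1] : 0) << 8) |
      (hasThird ? bytes[i + 2] : 0);
    out += BASE64_ALPHABET[(group >> 18) & 63];
    out += BASE64_ALPHABET[(group >> 12) & 63];
    out += hasSecond ? BASE64_ALPHABET[(group >> 6) & 63] : "="; // padding
    out += hasThird ? BASE64_ALPHABET[group & 63] : "=";         // padding
  }
  return out;
}

// FF D8 FF E0 is "ÿØÿà"; the 00 that follows is the NUL that cuts the plain string short.
const jpgHeader = [0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10];
console.log(bytesToBase64(jpgHeader)); // /9j/4AAQ
```

The encoded text contains only ASCII letters, digits, "+", "/" and "=", so it can be passed between Flash and JavaScript as an ordinary string.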
If you want to persist data locally, however, there's a way to do it in Flash. It's simple, but it has some limitations.
You can use a local Shared Object for this. By default, there's a 100 KB limit, which is rather inadequate for image files; you could ask the user to allot more space to your app, though. In any case, I'd try to store the image as JPG, not the raw pixels, since the difference in size is very significant.
Shared Objects will handle serialization / deserialization for you transparently. There are some caveats: not every object can really be serialized; for starters, it has to have a parameterless constructor; DisplayObjects such as Sprites, MovieClips, etc., won't work. It's possible to serialize a ByteArray, however, so you could save your JPGs locally (if the user allows for the extra space). You should use AMF3 as the encoding scheme (which is the default, I think); also, you should map the class you're serializing with registerClassAlias to preserve the type of the serialized object (otherwise it will be treated as an Object object). You only need to do it once in the app life cycle, but it must be done before any read / write to the Shared Object.
Something along the lines of:
registerClassAlias("flash.utils.ByteArray",ByteArray);
I'd use Shared Objects rather than Javascript. Just keep in mind that you'll most likely have to ask the user to give you more space for storing the images (which seems reasonable enough if you're allowing them to work offline), and that the user could delete the data at any time (just like he could delete their browser's cookies).
Edit
I realize I didn't really pay much attention to the "we have chosen Gears to do so" part of your question.
In that case, you could give the base 64 approach a try to pass the data to JS. From the ActionScript side it's easy (grab one of the many available Base64 encoders/decoders out there), and I assume the Gears API must have an encoder / decoder available already (or at least it shouldn't be hard to find one). At that point you'll probably have to turn that into a Blob and store it to disk (maybe using the BlobAPI, but I'm not sure as I don't have experience with Gears).
