Parse base64 string coming fron javascript front end (data-uri, rfc2397) - javascript

My javascript frontend is sending the base64 encoded string:
...`
I need to get just the base64 data, that means the iVBORw0KGgoAAAANSUhEUgAAAM8AAADkCAIAAACwiOf9AAAAA3NCSVQICAjb4U/gAAAgAElEQVR4nO.... Basically the data:image/png;base64, needs to be removed. Is there some standard python library that I can use to perform this operation, or must I roll my own regexes?
The base64 library just offers support for encoding/decoding, which is not what I need: I want to keep the base64 encoded data, just without the prefix.

For reference for others, I have prepared a small library for this:
_compiled = False
_compiled1 = None
_compiled2 = None
def compile_it():
global _compiled, _compiled1, _compiled2
if not _compiled:
regex1 = r'^data:(?P<mediatype>[^\;]*);base64,(?P<data>.*)'
regex2 = r'^data:(?P<mediatype>[^\;]*),(?P<data>.*)'
_compiled = True
_compiled1 = re.compile(regex1)
_compiled2 = re.compile(regex2)
def clean_data_uri(data_in):
# Clean base64 data coming from the frontend
# ... -> iVBORw0KGgoAAAA...
# As specified in RFC 2397
# http://stackoverflow.com/q/20517429/647991
# http://en.wikipedia.org/wiki/Data_URI_scheme
# http://tools.ietf.org/html/rfc2397
# Format is : data:[<mediatype>][;base64],<data>
compile_it()
try:
m = _compiled1.match(data_in)
success = True
base64 = True
mediatype = m.group('mediatype')
data = m.group('data')
except:
try:
m = _compiled2.match(data_in)
success = True
base64 = False
mediatype = m.group('mediatype')
data = m.group('data')
except Exception, e:
log.warning('clean_data_uri > Not possible to parse data_in > %s', e)
success = False
base64 = False
mediatype = None
data = None
if not success:
log.error('clean_data_uri > Problems splitting data')
return success, mediatype, base64, data

Related

I want to decompress a GZIP string with JavaScript

I have this GZIPed string: H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==
I created that with this website: http://www.txtwizard.net/compression
I have tried using pako to ungzip it.
import { ungzip } from 'pako';
const textEncoder = new TextEncoder();
const gzipedData = textEncoder.encode("H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==");
console.log('gzipeddata', gzipedData);
const ungzipedData = ungzip(gzipedData);
console.log('ungziped data', ungzipedData);
The issue is that Pako throws the error: incorrect header check
What am I missing here?
A JSbin
The "H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==" is a base64 encoded string, you first need to decode that into a buffer.
textEncoder.encode just encodes that base64 encoded string into a byte stream.
How to do that depend on whether you are in a browser or on nodejs.
node.js version
To convert the unzipped data to a string you further have use new TextDecoder().decode()
For node you will use Buffer.from(string, 'base64') to decode the base64 encoded string:
import { ungzip } from 'pako';
// decode the base64 encoded data
const gzipedData = Buffer.from("H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==", "base64");
console.log('gzipeddata', gzipedData);
const ungzipedData = ungzip(gzipedData);
console.log('ungziped data', new TextDecoder().decode(ungzipedData));
browser version
In the browser, you have to use atob, and you need to convert the decoded data to an Uint8Array using e.g. Uint8Array.from.
The conversion I used was taken from Convert base64 string to ArrayBuffer, you might need to verify if that really works in all cases.
// decode the base64 encoded data
const gezipedData = atob("H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==")
const gzipedDataArray = Uint8Array.from(gezipedData, c => c.charCodeAt(0))
console.log('gzipeddata', gzipedDataArray);
const ungzipedData = pako.ungzip(gzipedDataArray);
console.log('ungziped data', new TextDecoder().decode(ungzipedData));
<script src="https://cdnjs.cloudflare.com/ajax/libs/pako/2.0.4/pako.min.js"></script>
This string is base64-encoded.
You first need to decode it to a buffer:
const gzippedString = 'H4sIAAAAAAAA//NIzcnJVyguSUzOzi9LLUrLyS/XUSjJSMzLLlZIyy9SSMwpT6wsVshIzSnIzEtXBACs78K6LwAAAA==';
const gzippedBuffer = new Buffer(gzippedString, 'base64');
Then you can un(g)zip it:
const unzippedBuffer = ungzip(gzippedBuffer);
The result on ungzip is a Unit8Array. If you want to convert it back to a string you'll need to decode it again:
const unzippedString = new TextDecoder('utf8').decode(unzipped);

Add = or == to a base64 if necessary

I am using the Ionic Choose plugin to take a file in an app, this one in the resolve returns me a base64 of the file I chose, the problem is that this base64 never ends with = or == when necessary in some cases.
When trying to send this base64 to the server it returns an error because the = or == is missing at the end of the string.
So I get the base64
this.chooser.getFile()
.then((file) => {
let base64 = file.dataURI;
})
.catch((err) => {
console.log(err);
});
Can I somehow detect if base64 needs either = or == to complete the string?
Base64 'blocks' consists of 4 characters, so if the final block contains less than 4 characters it gets padded with "=" sign, so it can have anywhere from 0 to 2 "=" signs in them, that's why the length of the dataURI needs to be a multiple of 4, You can do something like this:
let base64 = file.dataURI;
const len = base64.length;
const modifiedBase64 = base64 + "=".repeat(len%4);

Base64 in Javascript

I send some image from java.
I need to do URLEncoder because I need to send the Base64 data in the URL
String base64Document = Base64.encode(documentBytes);
base64Document = URLEncoder.encode(base64Document, "UTF-8"); // This class contains static methods for converting a String to the application/x-www-form-urlencoded MIMEformat
But when I load this data I dont have any image.
I need a way to decode this.
My code works for canvas but not for image
console.log("Encoded" + vm.documentData.base64Document);
vm.documentData.base64Document = urlDecode(vm.documentData.base64Document);
console.log("Decoded" + vm.documentData.base64Document);
// Decode Safe URL Base64
function urlDecode(encoded) {
encoded = encoded.replace(/-/g, '+').replace(/_/g, '/');
while (encoded.length % 4)
encoded += '=';
return new ArrayBuffer(encoded || '', 'base64').toString('utf8');
};
html
<img class="document" src="data:image/png;base64,{{vm.documentData.base64Document}}" />
GET data:image/png;base64,[object ArrayBuffer] net::ERR_INVALID_URL
ArrayBuffer.prototype.toString will always return [object ArrayBuffer].
If you need to convert it into a string, I suggest to read this answer: Converting between strings and ArrayBuffers

Pull variable value from javascript source using BeautifulSoup4 Python

I'm newbie in python programming. I'm learning beautifulsoup to scrap website.
I want to extract and store the value of "stream" to my variable.
My Python code as follows :
import bs4 as bs #Importing BeautifulSoup4 Python Library.
import urllib.request
import requests
import json
import re
headers = {'User-Agent':'Mozilla/5.0'}
url = "http://thoptv.com/partners/mhdTVlive/Core.php?level=1200&channel=Dsports_HD"
page = requests.get(url)
soup = bs.BeautifulSoup(page.text,"html.parser")
pattern = re.compile('var stream = (.*?);')
scripts = soup.find_all('script')
for script in scripts:
if(pattern.match(str(script.string))):
data = pattern.match(script.string)
links = json.loads(data.groups()[0])
print(links)
This is the source javascript code to get the stream url value.
https://content.jwplatform.com/libraries/oncyToRO.js'>if( navigator.userAgent.match(/android/i)||
navigator.userAgent.match(/webOS/i)||
navigator.userAgent.match(/iPhone/i)||
navigator.userAgent.match(/iPad/i)||
navigator.userAgent.match(/iPod/i)||
navigator.userAgent.match(/BlackBerry/i)||
navigator.userAgent.match(/Windows Phone/i)) {var stream =
"http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=ibxIPxc6rkq1yIUJb4RlEV&pxe=1504146411&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw";}else{var
stream =
"http://hd.simiptv.com:8080//index.m3u8?key=VIoVSsGRLRouHWGNo1epzX&exp=932213423&domain=thoptv.stream&id=461";}jwplayer("THOPTVPlayer").setup({"title":
'thoptv.stream',"stretching":"exactfit","width": "100%","file":
none,"height": "100%","skin": "seven","autostart": "true","logo":
{"file":"https://i.imgur.com/EprI2uu.png","margin":"-0",
"position":"top-left","hide":"false","link":"http://mhdtvlive.co.in"},"androidhls":
true,});jwplayer("THOPTVPlayer").onError(function(){jwplayer().load({file:"http://content.jwplatform.com/videos/7RtXk3vl-52qL9xLP.mp4",image:"http://content.jwplatform.com/thumbs/7RtXk3vl-480.jpg"});jwplayer().play();});jwplayer("THOPTVPlayer").onComplete(function(){window.location
= window.location.href;});jwplayer("THOPTVPlayer").onPlay(function(){clearTimeout(theTimeout);});
I need to extract the url from stream.
var stream = "http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=ibxIPxc6rkq1yIUJb4RlEV&pxe=1504146411&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw";}
Rather then thinking complicated with regex, if the link is the only dynamically changing part, you can split the string with some known separating tokens.
x = """
https://content.jwplatform.com/libraries/oncyToRO.js'>if( navigator.userAgent.match(/android/i)|| navigator.userAgent.match(/webOS/i)|| navigator.userAgent.match(/iPhone/i)|| navigator.userAgent.match(/iPad/i)|| navigator.userAgent.match(/iPod/i)|| navigator.userAgent.match(/BlackBerry/i)|| navigator.userAgent.match(/Windows Phone/i)) {var stream = "http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=ibxIPxc6rkq1yIUJb4RlEV&pxe=1504146411&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw";}else{var stream = "http://hd.simiptv.com:8080//index.m3u8?key=VIoVSsGRLRouHWGNo1epzX&exp=932213423&domain=thoptv.stream&id=461";}jwplayer("THOPTVPlayer").setup({"title": 'thoptv.stream',"stretching":"exactfit","width": "100%","file": none,"height": "100%","skin": "seven","autostart": "true","logo": {"file":"https://i.imgur.com/EprI2uu.png","margin":"-0", "position":"top-left","hide":"false","link":"http://mhdtvlive.co.in"},"androidhls": true,});jwplayer("THOPTVPlayer").onError(function(){jwplayer().load({file:"http://content.jwplatform.com/videos/7RtXk3vl-52qL9xLP.mp4",image:"http://content.jwplatform.com/thumbs/7RtXk3vl-480.jpg"});jwplayer().play();});jwplayer("THOPTVPlayer").onComplete(function(){window.location = window.location.href;});jwplayer("THOPTVPlayer").onPlay(function(){clearTimeout(theTimeout);});
"""
left1, right1 = x.split("Phone/i)) {var stream =")
left2, right2 = right1.split(";}else")
print(left2)
# "http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=ibxIPxc6rkq1yIUJb4RlEV&pxe=1504146411&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw"
pattern.match() matches the pattern from the beginning of the string. Try using pattern.search() instead - it will match anywhere within the string.
Change your for loop to this:
for script in scripts:
data = pattern.search(script.text)
if data is not None:
stream_url = data.groups()[0]
print(stream_url)
You can also get rid of the surrounding quotes by changing the regex pattern to:
pattern = re.compile('var stream = "(.*?)";')
so that the double quotes are not included in the group.
You might also have noticed that there are two possible stream variables depending on the accessing user agent. For tablet like devices the first would be appropriate, while all other user agents should use the second stream. You can use pattern.findall() to get all of them:
>>> pattern.findall(script.text)
['"http://ssrigcdnems01.cdnsrv.jio.com/jiotv.live.cdn.jio.com/Dsports_HD/Dsports_HD_800.m3u8?jct=LEurobVVelOhbzOZ6EkTwr&pxe=1571716053&st=AQIC5wM2LY4SfczRaEwgGl4Dyvly_3HihdlD_Oduojk5Kxs.*AAJTSQACMDIAAlNLABQtNjUxNDEwODczODgxNzkyMzg5OQACUzEAAjYw*"', '"http://hd.simiptv.com:8080//index.m3u8?key=vaERnLJswnWXM8THmfvDq5&exp=944825312&domain=thoptv.stream&id=461"']
this code works for me
import bs4 as bs #Importing BeautifulSoup4 Python Library.
import urllib.request
import requests
import json
headers = {'User-Agent':'Mozilla/5.0'}
url = "http://thoptv.com/partners/mhdTVlive/Core.php?
level=1200&channel=Dsports_HD"
page = requests.get(url)
soup = bs.BeautifulSoup(page.text,"html.parser")
scripts = soup.find_all('script')
out = list()
for c, i in enumerate(scripts): #go over list
text = i.text
if(text[:2] == "if"): #if the (if) comes first
for count, t in enumerate(text): # then we have reached the correct item in the list
if text[count] == "{" and text[count + 1] == "v" and text[count + 5] == "s": # and if this is here that stream is set
tmp = text[count:] # add this to the tmp varible
break # and end
co = 0
for m in tmp: #loop over the results from prev. result
if m == "\"" and co == 0: #if string is starting
co = 1 #set count to "true" 1
elif m == "\"" and co == 1: # if it is ending stop
print(''.join(out)) #results
break
elif co == 1:
# as long as we are looping over the rigth string
out.append(m) #add to out list
pass
result = ''.join(out) #set result
it basicly filters the string manuely.
but if we use user1767754 method (brilliant by the way) we will end up something like this:
import bs4 as bs #Importing BeautifulSoup4 Python Library.
import urllib.request
import requests
import json
headers = {'User-Agent':'Mozilla/5.0'}
url = "http://thoptv.com/partners/mhdTVlive/Core.php?level=1200&channel=Dsports_HD"
page = requests.get(url)
soup = bs.BeautifulSoup(page.text,"html.parser")
scripts = soup.find_all('script')
x = scripts[3].text
left1, right1 = x.split("Phone/i)) {var stream =")
left2, right2 = right1.split(";}else")
print(left2)

Decompress gzip and zlib string in javascript

I want to get compress layer data from tmx file . Who knows libraries for decompress gzip and zlib string in javascript ? I try zlib but it doesn't work for me . Ex , layer data in tmx file is :
<data encoding="base64" compression="zlib">
eJztwTEBAAAAwqD1T20JT6AAAHgaCWAAAQ==
</data>
My javascript code is
var base64Data = "eJztwTEBAAAAwqD1T20JT6AAAHgaCWAAAQ==";
var compressData = atob(base64Data);
var inflate = new Zlib.Inflate(compressData);
var output = inflate.decompress();
It runs with displays message error "unsupported compression method" . But I try decompress with online tool as http://i-tools.org/gzip , it returns correct string.
Pako is a full and modern Zlib port.
Here is a very simple example and you can work from there.
Get pako.js and you can decompress byteArray like so:
<html>
<head>
<title>Gunzipping binary gzipped string</title>
<script type="text/javascript" src="pako.js"></script>
<script type="text/javascript">
// Get datastream as Array, for example:
var charData = [31,139,8,0,0,0,0,0,0,3,5,193,219,13,0,16,16,4,192,86,214,151,102,52,33,110,35,66,108,226,60,218,55,147,164,238,24,173,19,143,241,18,85,27,58,203,57,46,29,25,198,34,163,193,247,106,179,134,15,50,167,173,148,48,0,0,0];
// Turn number array into byte-array
var binData = new Uint8Array(charData);
// Pako magic
var data = pako.inflate(binData);
// Convert gunzipped byteArray back to ascii string:
var strData = String.fromCharCode.apply(null, new Uint16Array(data));
// Output to console
console.log(strData);
</script>
</head>
<body>
Open up the developer console.
</body>
</html>
Running example: http://jsfiddle.net/9yH7M/
Alternatively you can base64 encode the array before you send it over as the Array takes up a lot of overhead when sending as JSON or XML. Decode likewise:
// Get some base64 encoded binary data from the server. Imagine we got this:
var b64Data = 'H4sIAAAAAAAAAwXB2w0AEBAEwFbWl2Y0IW4jQmziPNo3k6TuGK0Tj/ESVRs6yzkuHRnGIqPB92qzhg8yp62UMAAAAA==';
// Decode base64 (convert ascii to binary)
var strData = atob(b64Data);
// Convert binary string to character-number array
var charData = strData.split('').map(function(x){return x.charCodeAt(0);});
// Turn number array into byte-array
var binData = new Uint8Array(charData);
// Pako magic
var data = pako.inflate(binData);
// Convert gunzipped byteArray back to ascii string:
var strData = String.fromCharCode.apply(null, new Uint16Array(data));
// Output to console
console.log(strData);
Running example: http://jsfiddle.net/9yH7M/1/
To go more advanced, here is the pako API documentation.
I can solve my problem by zlib . I fix my code as below
var base64Data = "eJztwTEBAAAAwqD1T20JT6AAAHgaCWAAAQ==";
var compressData = atob(base64Data);
var compressData = compressData.split('').map(function(e) {
return e.charCodeAt(0);
});
var inflate = new Zlib.Inflate(compressData);
var output = inflate.decompress();
For anyone using Ruby on Rails, who wants to send compressed encoded data to the browser, then uncompress it via Javascript on the browser, I've combined both excellent answers above into the following solution. Here's the Rails server code in my application controller which compresses and encodes a string before sending it the browser via a #variable to a .html.erb file:
require 'zlib'
require 'base64'
def compressor (some_string)
Base64.encode64(Zlib::Deflate.deflate(some_string))
end
Here's the Javascript function, which uses pako.min.js:
function uncompress(input_field){
base64data = document.getElementById(input_field).innerText;
compressData = atob(base64data);
compressData = compressData.split('').map(function(e) {
return e.charCodeAt(0);
});
binData = new Uint8Array(compressData);
data = pako.inflate(binData);
return String.fromCharCode.apply(null, new Uint16Array(data));
}
Here's a javascript call to that uncompress function, which wants to unencode and uncompress data stored inside a hidden HTML field:
my_answer = uncompress('my_hidden_field');
Here's the entry in the Rails application.js file to call pako.min.js, which is in the /vendor/assets/javascripts directory:
//= require pako.min
And I got the pako.min.js file from here:
https://github.com/nodeca/pako/tree/master/dist
All works at my end, anyway! :-)
I was sending data from a Python script and trying to decode it in JS. Here's what I had to do:
Python
import base64
import json
import urllib.parse
import zlib
...
data_object = {
'_id': '_id',
...
}
compressed_details = base64.b64encode(zlib.compress(bytes(json.dumps(data_object), 'utf-8'))).decode("ascii")
urlsafe_object = urllib.parse.quote(str(compressed_details))#.replace('%', '\%') # you likely don't need this last part
final_URL = f'https://my.domain.com?data_object={urlsafe_object}'
...
JS
// npm install this
import pako from 'pako';
...
const urlParams = new URLSearchParams(window.location.search);
const data_object = urlParams.get('data_object');
if (data_object) {
const compressedData = Uint8Array.from(window.atob(data_object), (c) => c.charCodeAt(0));
originalObject = JSON.parse(pako.inflate(compressedData, { to: 'string' }));
};
...

Categories