Validating a URL in Node.js - javascript

I want to validate a URL of the types:
www.google.com
http://www.google.com
google.com
using a single regular expression, is it achievable? If so, kindly share a solution in JavaScript.
Please note I only expect the underlying protocols to be HTTP or HTTPS. Moreover, the main question on hand is how can we map all these three patterns using one single regex expression in JavaScript? It doesn't have to check whether the page is active or not. If the value entered by the user matches any of the above listed three cases, it should return true on the other hand if it doesn't it should return false.

There is no need to use a third party library.
To check if a string is a valid URL
const URL = require("url").URL;
const stringIsAValidUrl = (s) => {
try {
new URL(s);
return true;
} catch (err) {
return false;
}
};
stringIsAValidUrl("https://www.example.com:777/a/b?c=d&e=f#g"); //true
stringIsAValidUrl("invalid"); //false
Edit
If you need to restrict the protocol to a set of allowed protocols, you can do something like this:
const { URL, parse } = require('url');
const stringIsAValidUrl = (s, protocols) => {
try {
new URL(s);
const parsed = parse(s);
return protocols
? parsed.protocol
? protocols.map(x => `${x.toLowerCase()}:`).includes(parsed.protocol)
: false
: true;
} catch (err) {
return false;
}
};
stringIsAValidUrl('abc://www.example.com:777/a/b?c=d&e=f#g', ['http', 'https']); // false
stringIsAValidUrl('abc://www.example.com:777/a/b?c=d&e=f#g'); // true
Edit
Because parse is deprecated, the code can be simplified a little further. As for the issue that a protocol-only string tests as true: this utility function is a template, and you can easily adapt it to your use case. That issue is covered by a simple test of url.host !== ""
const { URL } = require('url');
const stringIsAValidUrl = (s, protocols) => {
try {
const url = new URL(s);
return protocols
? url.protocol
? protocols.map(x => `${x.toLowerCase()}:`).includes(url.protocol)
: false
: true;
} catch (err) {
return false;
}
};
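Note that new URL throws for scheme-less input such as www.google.com or google.com, so the template above returns false for two of the three forms in the question. Below is a minimal sketch of one way to accept them. It assumes it is acceptable to default to http:// when no scheme is given and to require a dot in the host name. The helper name isHttpUrlLoose and both rules are illustrative assumptions, not part of the answer above.

```javascript
// Hypothetical helper (name and rules are assumptions for illustration).
function isHttpUrlLoose(input) {
  if (typeof input !== 'string' || input.trim() === '') return false;
  const s = input.trim();
  // Prepend a default scheme only when the input does not start with "scheme://".
  const candidate = /^[a-z][a-z0-9+.-]*:\/\//i.test(s) ? s : `http://${s}`;
  let url;
  try {
    url = new URL(candidate);
  } catch {
    return false; // not parseable as a URL at all
  }
  // Only HTTP and HTTPS are accepted, as the question requires.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  // Assumption: a usable host contains at least one dot (this rejects "localhost").
  return url.hostname.includes('.');
}

console.log(isHttpUrlLoose('www.google.com'));        // true
console.log(isHttpUrlLoose('http://www.google.com')); // true
console.log(isHttpUrlLoose('google.com'));            // true
console.log(isHttpUrlLoose('ftp://google.com'));      // false
```

This is deliberately permissive: it checks shape only, not whether the host exists.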

There's a package called valid-url
var validUrl = require('valid-url');
var url = "http://bla.com"
if (validUrl.isUri(url)){
console.log('Looks like an URI');
}
else {
console.log('Not a URI');
}
Installation:
npm install valid-url --save
If you want a simple REGEX - check this out

The "valid-url" npm package did not work for me. It reported an invalid URL as valid. What worked for me was "url-exists"
const urlExists = require("url-exists");
urlExists(myurl, function(err, exists) {
if (exists) {
res.send('Good URL');
} else {
res.send('Bad URL');
}
});

Using the url module seems to do the trick.
Node.js v15.8.0 Documentation - url module
const url = require('url');
try {
const myURL = new URL(imageUrl);
} catch (error) {
console.log(`${Date().toString()}: ${error.input} is not a valid url`);
return res.status(400).send(`${error.input} is not a valid url`);
}

Another easy way is to use the Node.js DNS module.
The DNS module provides a way of performing name resolutions, and with it you can verify if the url is valid or not.
const dns = require('dns');
const url = require('url');
const lookupUrl = "https://stackoverflow.com";
const parsedLookupUrl = url.parse(lookupUrl);
dns.lookup(parsedLookupUrl.protocol ? parsedLookupUrl.host
: parsedLookupUrl.path, (error,address,family)=>{
console.log(error || !address ? lookupUrl + ' is an invalid url!'
: lookupUrl + ' is a valid url: ' + ' at ' + address);
}
);
That way you can check if the url is valid and if it exists.
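The same idea can be sketched with the promise-based DNS API and the WHATWG URL class instead of url.parse. This is a rough sketch under stated assumptions: the helper names hostnameOf and hostResolves are hypothetical, and a successful lookup only shows that the host name resolves, not that the rest of the URL is well formed or that a page exists.

```javascript
const dns = require('dns');

// Hypothetical helper: extract the host name, tolerating a missing scheme.
function hostnameOf(input) {
  const candidate = input.includes('://') ? input : `http://${input}`;
  try {
    return new URL(candidate).hostname;
  } catch {
    return null; // not parseable as a URL
  }
}

// Hypothetical helper: resolves to true when the host name resolves via DNS.
async function hostResolves(input) {
  const hostname = hostnameOf(input);
  if (!hostname) return false;
  try {
    await dns.promises.lookup(hostname);
    return true;
  } catch {
    return false; // lookup failed (unknown host, no network, ...)
  }
}

// Usage (requires network access):
// hostResolves('https://stackoverflow.com').then(console.log);
```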

I am currently having the same problem, and Pouya's answer will do the job just fine. The only reason I won't be using it is because I am already using the NPM package validate.js and it can handle URLs.
As you can see from the documentation, the URL validator uses a regular expression based on this gist, so you can use it without using the whole package.
I am not a big fan of Regular Expressions, but if you are looking for one, it is better to go with a RegEx used in popular packages.
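For the three forms in the question specifically, a single pattern with an optional scheme is enough. The following is a simplified sketch only: it is not the regular expression from validate.js or the gist, the names SIMPLE_HTTP_URL and matchesSimpleHttpUrl are made up for illustration, and it ignores IP addresses, internationalized domains, user info and many other valid URL shapes.

```javascript
// Illustrative pattern (an assumption, NOT the validate.js regex):
//   optional http:// or https://
//   one or more dot-terminated labels, then an alphabetic TLD of 2+ letters
//   optional :port and optional /path (which may carry a query or fragment)
const SIMPLE_HTTP_URL =
  /^(https?:\/\/)?([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}(:\d{1,5})?(\/\S*)?$/i;

function matchesSimpleHttpUrl(s) {
  return SIMPLE_HTTP_URL.test(s);
}

console.log(matchesSimpleHttpUrl('www.google.com'));        // true
console.log(matchesSimpleHttpUrl('http://www.google.com')); // true
console.log(matchesSimpleHttpUrl('google.com'));            // true
console.log(matchesSimpleHttpUrl('google'));                // false
```

A well-tested pattern from a maintained package remains the safer choice for production input.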

Related

Retrieve a pdf's bookmark data using VanillaJS/Node.js

I'm trying to retrieve a pdf's meta data, looking specifically for a bookmark's page number using VanillaJS/node.js with no libraries. The file is located locally on the desktop.
I found this bit of code in another answer but it only returns the length of the document. I have tried to change the regex to look for letters, but it then returns an array of 500000 letters.
Is it even possible? If libraries are required, does anyone know of one that can do this?
Thanks
const fs = require('fs').promises
let rawData = await fs.readFile(fullPath, 'utf8', (err, data) => {
if (err) {
console.error('test error', err);
return;
}
});
async function pdfDetails(data) {
return new Promise(done => {
let Pages2 = data.match(/[a-zA-Z]/g);
let regex = /<xmp.*?:(.*?)>(.*?)</g;
let meta = [{
Pages
}];
let matches = regex.exec(data);
while (matches != null) {
matches.shift();
meta.push({
[matches.shift()]: matches.shift()
});
matches = regex.exec(data);
}
done(meta);
});
}
let details = await pdfDetails(rawData)
console.log(details)
Due to the difficulty of using vanilla JS, and problems with libraries that may have worked (due to node version conflicts), I ended up using PDFTron services.

Access_token works in localhost not in server

I am using the following code in my application to check whether some headers are provided. The code works fine on localhost but not when the application is deployed to the server. Basically, I am trying to check whether the headers are present in the request. On the server, I keep getting "Invalid request". When I pass accesstoken instead of access_token, the request goes through successfully. So by changing if ((request.headers.access_token && request.headers.refresh_token && request.headers.id_token) || request.headers.token)
to
the code works. My question is: why is this happening?
const Hapi = require('hapi');
const Path = require('path');
const axios = require('axios');
var tokenValidation = function (request, reply) {
if ((request.headers.access_token && request.headers.refresh_token && request.headers.id_token) || request.headers.token) {
if (request.headers.access_token != undefined) {
//do something
}
else {
return reply.continue();
}
} else
return reply.continue();
}
else {
var err = Boom.badRequest('Invalid request.');
reply(err);
}
}
server.ext('onRequest', tokenValidation);
Missing (disappearing) HTTP Headers
If you do not explicitly set underscores_in_headers on;, NGINX will silently drop HTTP headers with underscores (which are perfectly valid according to the HTTP standard). This is done in order to prevent ambiguities when mapping headers to CGI variables as both dashes and underscores are mapped to underscores during that process.
https://www.nginx.com/resources/wiki/start/topics/tutorials/config_pitfalls/#missing--28disappearing-29-http-headers
We have to explicitly set underscores_in_headers on in NGINX; otherwise headers containing underscores will be ignored.

JavaScript crypto hashing translation to Python 3

I am trying to make a Python 3 application to download weather data from my account at http://www.osanywhereweather.com. I have found JavaScript source code that does exactly this at https://github.com/zrrrzzt/osanywhereweather. I am assuming that the github code works. When inspecting the source of osanywhereweather.com, it seems to me that the github code resembles that very much.
I am new to Python 3 and I have never coded in JavaScript, and I know nothing about cryptography. I have, however, done a fair share of coding over the last 35 or so years, so I read code fairly well. I therefore thought it would be relatively easy to translate the github JavaScript code to Python 3. I was wrong, it seems.
The code of interest is the part of the code that hashes e-mail and password together with a "challenge" received from osanywhereweather.com in order to authenticate me.
I have not been able to test the JavaScript code, but as I said I think it compares well with the source of the osanywhereweather.com page. By analyzing the traffic in my web browser, I can see the information exchanged between osanywhereweather.com and my browser, so that I have got a consistent set of challenge and saltedHash.
When trying to create the same saltedHash based on the corresponding challenge with my Python 3 code, I get a different result.
I have tried internet searches to see if I can find out what I'm doing wrong, but to no avail. If anyone is proficient in JavaScript, Python and cryptography and is able to point out what I'm doing wrong, I would indeed be grateful.
JavaScript code:
'use strict';
var crypto = require('crypto');
function osaHash(email, password) {
var shasum = crypto.createHash('sha1').update(email);
var e = '$p5k2$2710$' + shasum.digest('hex').toString().substring(0, 8);
var res = crypto.pbkdf2Sync(password, e, 1e4, 32, 'sha256');
var r = res.toString('base64').replace(/\+/g, '.');
return e + '$' + r;
}
function createHash(opts, callback) {
if (!opts) {
return callback(new Error('Missing required input: options'), null);
}
if (!opts.email) {
return callback(new Error('Missing required param: options.email'), null);
}
if (!opts.password) {
return callback(new Error('Missing required param: options.password'), null);
}
if (!opts.challenge) {
return callback(new Error('Missing required param: options.challenge'), null);
}
var hash = osaHash(opts.email, opts.password);
var hmac = crypto.createHmac('sha1', hash).update(opts.challenge);
var saltedHash = hmac.digest('hex');
return callback(null, saltedHash);
}
module.exports = createHash;
Python 3 code:
import hmac
import hashlib
import base64
e_mail = 'me#mydomain.com'
password = 'Secret'
''' challenge is received from osanywhereweather.com '''
challenge = '15993b900f954e659a016cf073ef90c1'
shasum = hashlib.new('sha1')
shasum.update(e_mail.encode())
shasum_hexdigest = shasum.hexdigest()
shasum_substring = shasum_hexdigest[0:8]
e = '$p5k2$2710$' + shasum_substring
res = hashlib.pbkdf2_hmac('sha256',password.encode(),e.encode(),10000,32)
r = base64.b64encode(res,b'./')
hashstr = str(e) + '$' + str(r)
hmac1 = hmac.new(challenge.encode(), hashstr.encode(), 'sha1')
saltedHash = hmac1.hexdigest()
hashstr = str(e) + '$' + str(r)
In the above line, str(r) will give you: "b'ZogTXTk8T72jy01H9G6Y0L7mjHHR7IG0VKMcWZUbVqQ='".
You need to use r.decode() to get "ZogTXTk8T72jy01H9G6Y0L7mjHHR7IG0VKMcWZUbVqQ=".
hashstr = str(e) + '$' + r.decode()
UPDATE 1
The arguments to hmac.new should be fixed (the key comes first, then the message):
hmac1 = hmac.new(hashstr.encode(), challenge.encode(), 'sha1')
UPDATE 2
According to OP's comment, OP doesn't need to do the following.
Another thing is that crypto.pbkdf2Sync does not seem to respect the digest argument. It always seems to use the sha1 digest (at least on my system, NodeJS 0.10.25). So you need to specify sha1 on the Python side:
res = hashlib.pbkdf2_hmac('sha1', password.encode(), e.encode(), 10000, 32)
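The argument order fixed in UPDATE 1 can be checked from the Node.js side. In the JavaScript code from the question, the derived hash string is the HMAC key and the challenge is the message. A small sketch follows; the helper name hmacSha1Hex is hypothetical and the hashstr value is a made-up placeholder, not a real derived hash.

```javascript
const crypto = require('crypto');

// Hypothetical helper mirroring crypto.createHmac('sha1', key).update(message).
function hmacSha1Hex(key, message) {
  return crypto.createHmac('sha1', key).update(message).digest('hex');
}

const hashstr = '$p5k2$2710$deadbeef$placeholder';      // placeholder value (assumption)
const challenge = '15993b900f954e659a016cf073ef90c1';   // challenge from the question

// Same roles as the JavaScript original: key = hashstr, message = challenge.
const correct = hmacSha1Hex(hashstr, challenge);
// Roles swapped, as in the first Python attempt: a different digest.
const swapped = hmacSha1Hex(challenge, hashstr);

console.log(correct === swapped); // false
```

In Python the matching call is hmac.new(key, msg, digestmod), so the hash string must be passed first.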
Based on falsetru's response, the following Python 3 code has been verified to work with the osanywhereweather.com site:
import hmac
import hashlib
import base64
e_mail = 'me#mydomain.com'
password = 'Secret'
''' challenge is received from osanywhereweather.com '''
challenge = '15993b900f954e659a016cf073ef90c1'
shasum = hashlib.new('sha1')
shasum.update(e_mail.encode())
shasum_hexdigest = shasum.hexdigest()
shasum_substring = shasum_hexdigest[0:8]
e = '$p5k2$2710$' + shasum_substring
res = hashlib.pbkdf2_hmac('sha256',password.encode(),e.encode(),10000,32)
r = base64.b64encode(res,b'./')
hashstr = str(e) + '$' + r.decode()
hmac1 = hmac.new(hashstr.encode(), challenge.encode(), 'sha1')
saltedHash = hmac1.hexdigest()
Thank you to falsetru!

How to pull url file extension out of url string using javascript

How do I find the file extension of a URL using javascript?
example URL:
http://www.adobe.com/products/flashplayer/include/marquee/design.swf?width=792&height=294
I just want the 'swf' of the entire URL.
I need it to find the extension if the url was also in the following format
http://www.adobe.com/products/flashplayer/include/marquee/design.swf
Obviously this URL does not have the parameters behind it.
Anybody know?
Thanks in advance
function get_url_extension( url ) {
return url.split(/[#?]/)[0].split('.').pop().trim();
}
example:
get_url_extension('https://example.com/folder/file.jpg');
get_url_extension('https://example.com/fold.er/fil.e.jpg?param.eter#hash=12.345');
outputs ------> jpg
Something like this maybe?
var fileName = 'http://localhost/assets/images/main.jpg';
var extension = fileName.split('.').pop();
console.log(extension, extension === 'jpg');
The result you see in the console is.
jpg true
if for some reason you have a url like this something.jpg?name=blah or something.jpg#blah then you could do
extension = extension.split(/\#|\?/g)[0];
drop in
var fileExtension = function( url ) {
return url.split('.').pop().split(/\#|\?/)[0];
}
For the extension you could use this function:
function ext(url) {
// Remove everything to the last slash in URL
url = url.substr(1 + url.lastIndexOf("/"));
// Break URL at ? and take first part (file name, extension)
url = url.split('?')[0];
// Sometimes URL doesn't have ? but #, so we should also do the same for #
url = url.split('#')[0];
// Now keep only the extension (from the last dot onward)
return url.substr(url.lastIndexOf("."));
}
Or shorter:
function ext(url) {
return (url = url.substr(1 + url.lastIndexOf("/")).split('?')[0]).split('#')[0].substr(url.lastIndexOf("."))
}
Examples:
ext("design.swf")
ext("/design.swf")
ext("http://www.adobe.com/products/flashplayer/include/marquee/design.swf")
ext("/marquee/design.swf?width=792&height=294")
ext("design.swf?f=aa.bb")
ext("../?design.swf?width=792&height=294&.XXX")
ext("http://www.example.com/some/page.html#fragment1")
ext("http://www.example.com/some/dynamic.php?foo=bar#fragment1")
Note:
File extension is provided with dot (.) at the beginning. So if result.charAt(0) != "." there is no extension.
This is the answer:
var extension = path.match(/\.([^\./\?]+)($|\?)/)[1];
Take a look at regular expressions. Specifically, something like /([^.]+.[^?])\?/.
// Gets file extension from URL, or return false if there's no extension
function getExtension(url) {
// Extension starts after the first dot after the last slash
var extStart = url.indexOf('.',url.lastIndexOf('/')+1);
if (extStart==-1) return false;
var ext = url.substr(extStart+1),
// end of extension must be one of: end-of-string or question-mark or hash-mark
extEnd = ext.search(/$|[?#]/);
return ext.substring (0,extEnd);
}
url.split('?')[0].split('.').pop()
usually #hash is not part of the url but treated separately
This method works fine :
function getUrlExtension(url) {
try {
return url.match(/^https?:\/\/.*[\\\/][^\?#]*\.([a-zA-Z0-9]+)\??#?/)[1]
} catch (ignored) {
return false;
}
}
You can use the (relatively) new URL object to help you parse your url. The property pathname is especially useful because it returns the url path without the hostname and parameters.
let url = new URL('http://www.adobe.com/products/flashplayer/include/marquee/design.swf?width=792&height=294');
// the .pathname property returns the path
url.pathname; // returns "/products/flashplayer/include/marquee/design.swf"
// now get the file name
let filename = url.pathname.split('/').reverse()[0]
// returns "design.swf"
let ext = filename.split('.').pop(); // take the part after the LAST dot, so "a.min.js" gives "js"
// returns 'swf'
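In Node.js the same approach can lean on the path module for the last step. This is a minimal sketch assuming absolute URLs as input; the helper name urlExtension is hypothetical. path.posix.extname is used because URL paths always use forward slashes, regardless of the OS the script runs on.

```javascript
const path = require('path');

// Hypothetical helper: returns the extension without the dot, or '' if there is none.
function urlExtension(input) {
  // new URL throws on relative or malformed input; the query and hash are
  // not part of pathname, so they cannot leak into the result.
  const { pathname } = new URL(input);
  return path.posix.extname(pathname).slice(1).toLowerCase();
}

console.log(urlExtension('http://www.adobe.com/products/flashplayer/include/marquee/design.swf?width=792&height=294')); // "swf"
console.log(urlExtension('https://example.com/fold.er/fil.e.jpg?param.eter#hash=12.345')); // "jpg"
console.log(urlExtension('https://example.com/fold.er/file')); // ""
```

Dots in directory names or in the query string do not affect the result, because only the last path segment is inspected.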
var doc = document.location.toString().substring(document.location.toString().lastIndexOf("/"))
alert(doc.substring(doc.lastIndexOf(".")))
const getUrlFileType = (url: string) => {
const u = new URL(url)
const ext = u.pathname.split(".").pop()
return ext === "/"
? undefined
: ext.toLowerCase()
}
function ext(url){
var ext = url.substr(url.lastIndexOf('/') + 1),
ext = ext.split('?')[0],
ext = ext.split('#')[0],
dot = ext.lastIndexOf('.');
return dot > -1 ? ext.substring(dot + 1) : '';
}
If you can use npm packages, File-type is another option.
They have browser support, so you can do this (taken from their docs):
const FileType = require('file-type/browser');
const url = 'https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg';
(async () => {
const response = await fetch(url);
const fileType = await FileType.fromStream(response.body);
console.log(fileType);
//=> {ext: 'jpg', mime: 'image/jpeg'}
})();
It works for gifs too!
Actually, I'd like to improve this answer so that it supports # too:
const extExtractor = (url: string): string =>
url.split('?')[0].split('#')[0].split('.').pop() || '';
This function returns the file extension in any case.
If you want to use this solution, note that recent versions of these packages use the ES module import/export syntax.
If your project uses CommonJS (const/require), you should downgrade to an older version.
I used:
"got": "11.8.5","file-type": "16.5.4",
const FileType = require('file-type');
const got = require('got');
const url ='https://upload.wikimedia.org/wikipedia/en/a/a9/Example.jpg';
(async () => {
const stream = got.stream(url);
console.log(await FileType.fromStream(stream));
})();
var fileExtension = function( url ) {
var length = url.split('?', 1);
return length
}
document.write("the url is :"+length);

How to determine the OS path separator in JavaScript?

How can I tell in JavaScript what path separator is used in the OS where the script is running?
Use the path module in Node.js; path.sep returns the platform-specific path separator.
example
path.sep // on *nix evaluates to a string equal to "/"
Edit: As per Sebas's comment below, to use this, you need to add this at the top of your js file:
const path = require('path')
AFAIR, you can always use / as a path separator, even on Windows.
Quote from http://bytes.com/forum/thread23123.html:
So, the situation can be summed up
rather simply:
All DOS services since DOS 2.0 and all Windows APIs accept either forward
slash or backslash. Always have.
None of the standard command shells (CMD or COMMAND) will accept forward
slashes. Even the "cd ./tmp" example
given in a previous post fails.
The Correct Answer
Yes, all OSes accept CD ../ or CD ..\ or CD .. regardless of how you pass in separators. But what about reading a path back? How would you know whether it is, say, a 'windows' path, with ' ' and \ allowed?
The Obvious 'Duh!' Question
What happens when you depend on, for example, the installation directory %PROGRAM_FILES% (x86)\Notepad++. Take the following example.
var fs = require('fs'); // file system module
var targetDir = 'C:\\Program Files (x86)\\Notepad++'; // target installer dir (backslashes must be escaped in JS strings)
// read all files in the directory
fs.readdir(targetDir, function(err, files) {
if(!err){
for(var i = 0; i < files.length; ++i){
var currFile = files[i];
console.log(currFile);
// ex output: 'C:\Program Files (x86)\Notepad++\notepad++.exe'
// attempt to print the parent directory of currFile
var fileDir = getDir(currFile);
console.log(fileDir);
// output is empty string, ''...what!?
}
}
});
function getDir(filePath){
if(filePath !== '' && filePath != null){
// this will fail on Windows, and work on Others
return filePath.substring(0, filePath.lastIndexOf('/') + 1);
}
}
What happened!?
fileDir is being set to the substring between indices 0 and 0 (lastIndexOf('/') is -1 in C:\Program Files\Notepad\Notepad++.exe), resulting in the empty string.
The Solution...
This includes code from the following post: How do I determine the current operating system with Node.js
var myGlobals = { isWin: false, isOsX:false, isNix:false };
Server side detection of OS.
// this var could likely be a global or available to all parts of your app
if(/^win/.test(process.platform)) { myGlobals.isWin=true; }
else if(process.platform === 'darwin'){ myGlobals.isOsX=true; }
else if(process.platform === 'linux') { myGlobals.isNix=true; }
Browser side detection of OS
var appVer = navigator.appVersion;
if (appVer.indexOf("Win")!=-1) myGlobals.isWin = true;
else if (appVer.indexOf("Mac")!=-1) myGlobals.isOsX = true;
else if (appVer.indexOf("X11")!=-1) myGlobals.isNix = true;
else if (appVer.indexOf("Linux")!=-1) myGlobals.isNix = true;
Helper Function to get the separator
function getPathSeparator(){
if(myGlobals.isWin){
return '\\';
}
else if(myGlobals.isOsX || myGlobals.isNix){
return '/';
}
// default to *nix system.
return '/';
}
// modifying our getDir method from above...
Helper function to get the parent directory (cross platform)
function getDir(filePath){
if(filePath !== '' && filePath != null){
// uses the separator of the detected platform, so it works on Windows too
return filePath.substring(0, filePath.lastIndexOf(getPathSeparator()) + 1);
}
}
getDir() must be intelligent enough to know which separator it's looking for.
You can get really slick and check for both, for example if the user is inputting a path via the command line.
// in the body of getDir() ...
var sepIndex = filePath.lastIndexOf('/');
if(sepIndex == -1){
sepIndex = filePath.lastIndexOf('\\');
}
// include the trailing separator
return filePath.substring(0, sepIndex+1);
You can also use the 'path' module and path.sep as stated above, if you want to load a module for such a simple task. Personally, I think it is sufficient to just check the information from the process that is already available to you.
var path = require('path');
var fileSep = path.sep; // returns '\\' on windows, '/' on *nix
And That's All Folks!
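For comparison, Node's built-in path module already covers the parent-directory case. The sketch below uses path.win32 and path.posix, which apply Windows and POSIX rules respectively on any OS, so the results do not depend on where the script runs. The example paths are assumptions for illustration. Unlike getDir() above, path.dirname does not keep the trailing separator.

```javascript
const path = require('path');

// Windows rules, regardless of the OS running this script.
const winDir = path.win32.dirname('C:\\Program Files (x86)\\Notepad++\\notepad++.exe');
// POSIX rules, regardless of the OS running this script.
const nixDir = path.posix.dirname('/usr/local/bin/node');

console.log(winDir); // C:\Program Files (x86)\Notepad++
console.log(nixDir); // /usr/local/bin

// In application code, plain path.dirname(filePath) follows the current platform.
```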
As already answered here, you can find the OS specific path separator with path.sep to manually construct your path. But you can also let path.join do the job, which is my preferred solution when dealing with path constructions:
Example:
const path = require('path');
const directory = 'logs';
const file = 'data.json';
const path1 = `${directory}${path.sep}${file}`;
const path2 = path.join(directory, file);
console.log(path1); // Shows "logs\data.json" on Windows
console.log(path2); // Also shows "logs\data.json" on Windows
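To reproduce both outputs without switching machines, path.win32 and path.posix expose the same functions with fixed platform rules. A short sketch (the variable names are illustrative):

```javascript
const path = require('path');

const directory = 'logs';
const file = 'data.json';

// Always joins with a backslash, even when run on Linux or macOS.
const winPath = path.win32.join(directory, file);
// Always joins with a forward slash, even when run on Windows.
const nixPath = path.posix.join(directory, file);

console.log(winPath); // logs\data.json
console.log(nixPath); // logs/data.json
```

Plain path.join picks one of these two behaviours based on the platform the process is running on.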
Just use "/", it works on all OS's as far as I know.
