Equivalent of simple storage API for restartless firefox extension - javascript

Is there an equivalent to this API or a way to call it from a restartless extension? I need to store a few strings between browser sessions.
I have found this, but it seems too complicated for simple string storage. Does the SS (simple-storage) API use the same thing behind the scenes?

The simple-storage/localStorage APIs suck because of synchronous file I/O.
There are alternatives such as IndexedDB which can be used from chrome/add-on code quite easily.
You can also use localStorage in your add-on (there is no need to use the SDK simple-storage API). However, you should not use window.localStorage in overlays, because that storage would be shared between add-ons, and you cannot use window.localStorage in bootstrap.js and/or JS code modules, because there simply is no window there. But you can construct a storage object yourself.
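// Assumes the usual add-on preamble is already in scope:
//   const { classes: Cc, interfaces: Ci, utils: Cu } = Components;
//   Cu.import("resource://gre/modules/Services.jsm");
function getStorage(uri) {
  if (!(uri instanceof Ci.nsIURI)) {
    uri = Services.io.newURI(uri, null, null);
  }
  // localStorage is keyed by origin, so build a principal for the URI.
  let principal = Cc["@mozilla.org/scriptsecuritymanager;1"].
    getService(Ci.nsIScriptSecurityManager).
    getNoAppCodebasePrincipal(uri);
  let dsm = Cc["@mozilla.org/dom/localStorage-manager;1"].
    getService(Ci.nsIDOMStorageManager);
  return dsm.createStorage(principal, "");
}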
var s1 = getStorage("chrome://my-addon/content/whatever.xul"); // does not actually have to point to a resource.
The usual limitations of localStorage apply (quotas and such).
BTW: The code also lets you access the localStorage of websites, e.g. getStorage("http://stackoverflow.com/");.

You can import any SDK module into normal restartless extensions this way:
const { devtools } = Cu.import("resource://gre/modules/devtools/Loader.jsm", {});
const { require } = devtools;
let ss = require('sdk/simple-storage');

You could use Session store API (nsISessionStore):
const ss = Cc["@mozilla.org/browser/sessionstore;1"].getService(Ci.nsISessionStore);
ss.setGlobalValue("my-extension-few-strings", "blah blah blah");
const fewStrings = ss.getGlobalValue("my-extension-few-strings");
// fewStrings === "blah blah blah";
ss.deleteGlobalValue("my-extension-few-strings");
Session store is shared across all extensions, so choose unique names for stored values (e.g. prefix all key names with your extension name). Unlike simple-storage and localStorage, it is also not limited in size.
P.S. setGlobalValue, getGlobalValue and deleteGlobalValue are not documented anywhere.
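The key-prefixing advice can be wrapped in a small helper so no call site forgets the prefix. This is a minimal sketch, assuming any object that exposes the three undocumented `setGlobalValue`/`getGlobalValue`/`deleteGlobalValue` methods (inside Firefox you would pass the `ss` object from the snippet above). `namespacedStore` and `fakeSessionStore` are hypothetical; the fake exists only so the sketch runs outside Firefox, and its "empty string for a missing key" behaviour is an assumption, not documented session store behaviour.

```javascript
// Hypothetical helper: wraps a session-store-like object so that every key
// is automatically prefixed with the extension's name.
function namespacedStore(store, prefix) {
  const key = (name) => `${prefix}.${name}`;
  return {
    set(name, value) { store.setGlobalValue(key(name), String(value)); },
    get(name) { return store.getGlobalValue(key(name)); },
    remove(name) { store.deleteGlobalValue(key(name)); },
  };
}

// In-memory stand-in for nsISessionStore, for demonstration only.
// Returning "" for a missing key is an assumption made for this fake.
const fakeSessionStore = {
  data: new Map(),
  setGlobalValue(k, v) { this.data.set(k, v); },
  getGlobalValue(k) { return this.data.has(k) ? this.data.get(k) : ""; },
  deleteGlobalValue(k) { this.data.delete(k); },
};

const myStore = namespacedStore(fakeSessionStore, "my-extension");
myStore.set("few-strings", "blah blah blah");
console.log(myStore.get("few-strings")); // blah blah blah
console.log([...fakeSessionStore.data.keys()]); // [ 'my-extension.few-strings' ]
```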


Serve different cache versions using the same URL through cloudflare worker

There's a very common problem I have seen from many people who serve different versions of their site to mobile and desktop visitors; many themes have this feature. The issue is that Cloudflare caches the same page regardless of the user's device, causing mix-ups and inconsistencies between the desktop and mobile versions.
The most common solution is to move the mobile version to a separate URL, but in my case I want to use the same URL and have the Cloudflare cache work properly for both desktop and mobile.
I found this very nice guide showing how to fix this issue; however, the worker code seems to be outdated, and I had to modify some parts to make it work.
I created a new subdomain for my workers and then assigned the route to my site so it starts running.
The worker is caching everything; however, it does not keep separate cached versions for each device type.
async function run(event) {
  const { request } = event;
  const cache = caches.default;
  // Read the user agent of the request (the header may be missing).
  const ua = request.headers.get('user-agent') || '';
  let uaValue;
  if (/mobile/i.test(ua)) {
    uaValue = 'mobile';
  } else {
    uaValue = 'desktop';
  }
  console.log(uaValue);
  // Construct a new request object which distinguishes the cache key by device
  // type.
  const url = new URL(request.url);
  url.searchParams.set('ua', uaValue);
  const newRequest = new Request(url, request);
  let response = await cache.match(newRequest);
  if (!response) {
    // Use the original request object when fetching the response from the
    // server to avoid passing on the query parameters to our backend.
    response = await fetch(request, { cf: { cacheTtl: 14400 } });
    // Store the cached response with our extended query parameters.
    event.waitUntil(cache.put(newRequest, response.clone()));
  }
  return response;
}
addEventListener('fetch', (event) => {
  event.respondWith(run(event));
});
It is detecting the right user agent, but it should be storing two separate cached versions according to the added query string...
I think I may be missing some configuration; I don't know why it's not working as expected. As it is right now, my mobile and desktop cached versions still get mixed up.
The problem here is that fetch() itself already does normal caching, independent of your use of the Cache API around it. So fetch() might still return a cached response that is for the wrong UA.
If you could make your back-end ignore the query parameter, then you could include the query in the request passed to fetch(), so that it correctly caches the two results differently. (Enterprise customers can use custom cache keys as a way to accomplish this without changing the URL.)
If you do that, then you can also remove the cache.match() and cache.put() calls since fetch() itself will handle caching.
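A minimal sketch of that approach, assuming the origin really does ignore an extra `ua` query parameter: the device class is folded into the URL handed to `fetch()`, so Cloudflare's normal cache stores mobile and desktop responses under different keys, and the Cache API calls disappear. `deviceClass`, `deviceCacheUrl` and `handle` are hypothetical names; only the two pure helpers are exercised outside a Worker.

```javascript
// Hypothetical helper: classify a User-Agent header value (may be null).
function deviceClass(userAgent) {
  return /mobile/i.test(userAgent || '') ? 'mobile' : 'desktop';
}

// Build the URL passed to fetch(). searchParams.set() overwrites any `ua`
// value the client sent, so visitors cannot pick the other variant.
function deviceCacheUrl(requestUrl, userAgent) {
  const url = new URL(requestUrl);
  url.searchParams.set('ua', deviceClass(userAgent));
  return url.toString();
}

// Worker handler (not executed here): no cache.match()/cache.put(),
// fetch() itself caches per URL, and the URL now includes the device class.
async function handle(request) {
  const target = deviceCacheUrl(request.url, request.headers.get('user-agent'));
  return fetch(new Request(target, request), { cf: { cacheTtl: 14400 } });
}

console.log(deviceCacheUrl('https://example.com/page?x=1', 'Mozilla/5.0 (iPhone) Mobile/15E148'));
// https://example.com/page?x=1&ua=mobile
```

It would be registered the same way as in the question, e.g. `addEventListener('fetch', (event) => event.respondWith(handle(event.request)));`.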

Overriding Browser APIs

I am developing a webextension in javascript for Firefox, Chrome etc.
It is designed to prevent the users browser from being fingerprinted.
Since the majority of information used to build browser fingerprints comes from JavaScript APIs in the browser itself, is it possible to change/spoof the values that common APIs might return from within a webextension/add-on?
If this is not directly possible, is there any way to control what values these APIs return to the website doing the fingerprinting, to protect the user's privacy?
Examples of APIs I am talking about are:
user agent
screen print
color depth
current resolution
available resolution
device XDPI
device YDPI
plugin list
font list
local storage
session storage
timezone
language
system language
cookies
canvas print
You can try using Object.defineProperty():
The Object.defineProperty() method defines a new property directly on an object, or modifies an existing property on an object, and returns the object.
console.log(window.screen.colorDepth); // 24
Object.defineProperty(window.screen, 'colorDepth', {
value: 'hello world',
configurable: true
});
console.log(window.screen.colorDepth); // hello world
In the above we're using Object.defineProperty to change the value of the property window.screen.colorDepth. This is where you would spoof the values using whatever method you want. You can use this same logic to modify whichever properties you want to spoof (navigator.userAgent, for example).
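Instead of a fixed `value`, `Object.defineProperty` also accepts a `get` function, which lets the spoofed value be computed (for example, picked from a set of common values) on each read. A minimal sketch, runnable outside a browser: `FakeScreen` is a hypothetical stand-in for `window.screen`, mirroring the fact that Web IDL attributes such as `colorDepth` are getters on the prototype, and `spoofGetter` is a hypothetical helper. Because the own property shadows the prototype getter and is `configurable`, deleting it restores the original.

```javascript
// Stand-in for window.screen: the property is an inherited accessor,
// not an own data property.
class FakeScreen {
  get colorDepth() { return 24; }
}
const fakeScreen = new FakeScreen();

// Hypothetical helper: shadow a property with a getter on the instance.
function spoofGetter(target, prop, makeValue) {
  Object.defineProperty(target, prop, {
    get: makeValue,
    configurable: true, // allows later redefinition or deletion
  });
}

console.log(fakeScreen.colorDepth); // 24
spoofGetter(fakeScreen, 'colorDepth', () => 32);
console.log(fakeScreen.colorDepth); // 32

// Removing the own property makes the prototype getter visible again.
delete fakeScreen.colorDepth;
console.log(fakeScreen.colorDepth); // 24
```

Returning a plausible number, rather than a string like 'hello world', keeps the spoofed property from being an unusual value in its own right.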
But there is a separation between the page's global object and the extension's global object. You should be able to overcome that by injecting the script into the document:
var code = function() {
console.log(window.screen.colorDepth); // 24
Object.defineProperty(window.screen, 'colorDepth', {
value: 'hello world',
configurable: true
});
console.log(window.screen.colorDepth); // hello world
};
var script = document.createElement('script');
script.textContent = '(' + code + ')()';
(document.head||document.documentElement).appendChild(script);
See here and here for more info.
You can download a working Chrome extension using the above code here (unzip the folder, navigate to chrome://extensions in Chrome, and drop the folder into the window).
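The `'(' + code + ')()'` trick keeps only the function's source text, so variables from the extension's scope do not travel with it into the page. A minimal sketch of passing values explicitly, assuming they are JSON-serializable: `buildInjectedSource` and `spoofColorDepth` are hypothetical names, and the function writes to `globalThis.screen` so the same source also runs in a test outside the browser.

```javascript
// Serialize a function and its arguments into a self-invoking source string.
// Arguments are embedded as JSON literals (JSON-serializable values only).
function buildInjectedSource(fn, args = []) {
  const argList = args.map((a) => JSON.stringify(a)).join(', ');
  return '(' + fn.toString() + ')(' + argList + ');';
}

// Hypothetical function meant to run in the page's context.
function spoofColorDepth(depth) {
  Object.defineProperty(globalThis.screen, 'colorDepth', {
    get: () => depth,
    configurable: true,
  });
}

const source = buildInjectedSource(spoofColorDepth, [32]);
console.log(source.endsWith(')(32);')); // true
// In the content script this string would be used as the script text:
//   script.textContent = source;
```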

Detect Search Crawlers via JavaScript

I am wondering how I would go about detecting search crawlers. The reason I ask is that I want to suppress certain JavaScript calls if the user agent is a bot.
I have found an example of how to detect a certain browser, but am unable to find examples of how to detect a search crawler:
/MSIE (\d+\.\d+);/.test(navigator.userAgent); //test for MSIE x.x
Examples of search crawlers I want to block:
Google
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Googlebot/2.1 (+http://www.google.com/bot.html)
Baidu
Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
BaiDuSpider
This is the regex the Ruby agent_orange library uses to test whether a user agent looks like a bot. You can narrow it down for specific bots by referencing the bot user agent list here:
/bot|crawler|spider|crawling/i
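To see how that pattern handles the crawler strings listed in the question, here is a quick check; the last entry is an ordinary Chrome user agent added for contrast and is not from the question.

```javascript
const agentOrangeBotRe = /bot|crawler|spider|crawling/i;

// Sample user agents from the question, plus an ordinary desktop browser.
const samples = [
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
  'Googlebot/2.1 (+http://www.googlebot.com/bot.html)',
  'Baiduspider+(+http://www.baidu.com/search/spider.htm)',
  'BaiDuSpider',
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
];

// No `g` flag, so test() keeps no lastIndex state between calls.
for (const ua of samples) {
  console.log(agentOrangeBotRe.test(ua), ua);
}
// Only the last (Chrome) entry logs false.
```

The `i` flag is what lets the single alternative `spider` catch both "Baiduspider" and "BaiDuSpider".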
For example you have some object, util.browser, you can store what type of device a user is on:
util.browser = {
bot: /bot|googlebot|crawler|spider|robot|crawling/i.test(navigator.userAgent),
mobile: ...,
desktop: ...
}
Try this. It's based on the crawler list available at https://github.com/monperrus/crawler-user-agents
var botPattern = "(googlebot\/|bot|Googlebot-Mobile|Googlebot-Image|Google favicon|Mediapartners-Google|bingbot|slurp|java|wget|curl|Commons-HttpClient|Python-urllib|libwww|httpunit|nutch|phpcrawl|msnbot|jyxobot|FAST-WebCrawler|FAST Enterprise Crawler|biglotron|teoma|convera|seekbot|gigablast|exabot|ngbot|ia_archiver|GingerCrawler|webmon |httrack|webcrawler|grub.org|UsineNouvelleCrawler|antibot|netresearchserver|speedy|fluffy|bibnum.bnf|findlink|msrbot|panscient|yacybot|AISearchBot|IOI|ips-agent|tagoobot|MJ12bot|dotbot|woriobot|yanga|buzzbot|mlbot|yandexbot|purebot|Linguee Bot|Voyager|CyberPatrol|voilabot|baiduspider|citeseerxbot|spbot|twengabot|postrank|turnitinbot|scribdbot|page2rss|sitebot|linkdex|Adidxbot|blekkobot|ezooms|dotbot|Mail.RU_Bot|discobot|heritrix|findthatfile|europarchive.org|NerdByNature.Bot|sistrix crawler|ahrefsbot|Aboundex|domaincrawler|wbsearchbot|summify|ccbot|edisterbot|seznambot|ec2linkfinder|gslfbot|aihitbot|intelium_bot|facebookexternalhit|yeti|RetrevoPageAnalyzer|lb-spider|sogou|lssbot|careerbot|wotbox|wocbot|ichiro|DuckDuckBot|lssrocketcrawler|drupact|webcompanycrawler|acoonbot|openindexspider|gnam gnam spider|web-archive-net.com.bot|backlinkcrawler|coccoc|integromedb|content crawler spider|toplistbot|seokicks-robot|it2media-domain-crawler|ip-web-crawler.com|siteexplorer.info|elisabot|proximic|changedetection|blexbot|arabot|WeSEE:Search|niki-bot|CrystalSemanticsBot|rogerbot|360Spider|psbot|InterfaxScanBot|Lipperhey SEO Service|CC Metadata Scaper|g00g1e.net|GrapeshotCrawler|urlappendbot|brainobot|fr-crawler|binlar|SimpleCrawler|Livelapbot|Twitterbot|cXensebot|smtbot|bnf.fr_bot|A6-Indexer|ADmantX|Facebot|Twitterbot|OrangeBot|memorybot|AdvBot|MegaIndex|SemanticScholarBot|ltx71|nerdybot|xovibot|BUbiNG|Qwantify|archive.org_bot|Applebot|TweetmemeBot|crawler4j|findxbot|SemrushBot|yoozBot|lipperhey|y!j-asr|Domain Re-Animator Bot|AddThis)";
var re = new RegExp(botPattern, 'i');
var userAgent = navigator.userAgent;
if (re.test(userAgent)) {
console.log('the user agent is a crawler!');
}
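Because the pattern above is written as raw regex, names such as "grub.org" or "Mail.RU_Bot" contain an unescaped `.`, which matches any character, so the list is slightly looser than it reads. A minimal sketch of keeping the names as plain strings and escaping them before joining; `escapeRegExp` and `buildBotRegExp` are hypothetical helpers and the name list is a small sample, not the full list.

```javascript
// Escape regex metacharacters so each name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: one case-insensitive RegExp from plain names.
function buildBotRegExp(names) {
  return new RegExp(names.map(escapeRegExp).join('|'), 'i');
}

const botRe = buildBotRegExp(['googlebot/', 'bingbot', 'grub.org', 'Mail.RU_Bot', 'y!j-asr']);

console.log(botRe.test('Mozilla/5.0 (compatible; Googlebot/2.1)')); // true
console.log(botRe.test('grubXorg')); // false, because '.' is escaped
```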
The following regex will match the biggest search engines according to this post.
/bot|google|baidu|bing|msn|teoma|slurp|yandex/i
.test(navigator.userAgent)
The matched search engines are:
Baidu
Bingbot/MSN
DuckDuckGo (duckduckbot)
Google
Teoma
Yahoo!
Yandex
Additionally, I've added bot as a catchall for smaller crawlers/bots.
This might help to detect robots' user agents while also keeping things more organized:
Javascript
const detectRobot = (userAgent) => {
const robots = new RegExp([
/bot/,/spider/,/crawl/, // GENERAL TERMS
/APIs-Google/,/AdsBot/,/Googlebot/, // GOOGLE ROBOTS
/mediapartners/,/Google Favicon/,
/FeedFetcher/,/Google-Read-Aloud/,
/DuplexWeb-Google/,/googleweblight/,
/bing/,/yandex/,/baidu/,/duckduck/,/yahoo/, // OTHER ENGINES
/ecosia/,/ia_archiver/,
/facebook/,/instagram/,/pinterest/,/reddit/, // SOCIAL MEDIA
/slack/,/twitter/,/whatsapp/,/youtube/,
/semrush/, // OTHER
].map((r) => r.source).join("|"),"i"); // BUILD REGEXP + "i" FLAG
return robots.test(userAgent);
};
Typescript
const detectRobot = (userAgent: string): boolean => {
const robots = new RegExp(([
/bot/,/spider/,/crawl/, // GENERAL TERMS
/APIs-Google/,/AdsBot/,/Googlebot/, // GOOGLE ROBOTS
/mediapartners/,/Google Favicon/,
/FeedFetcher/,/Google-Read-Aloud/,
/DuplexWeb-Google/,/googleweblight/,
/bing/,/yandex/,/baidu/,/duckduck/,/yahoo/, // OTHER ENGINES
/ecosia/,/ia_archiver/,
/facebook/,/instagram/,/pinterest/,/reddit/, // SOCIAL MEDIA
/slack/,/twitter/,/whatsapp/,/youtube/,
/semrush/, // OTHER
] as RegExp[]).map((r) => r.source).join("|"),"i"); // BUILD REGEXP + "i" FLAG
return robots.test(userAgent);
};
Use on server:
const userAgent = req.get('user-agent');
const isRobot = detectRobot(userAgent);
Use on "client" / some phantom browser a bot might be using:
const userAgent = navigator.userAgent;
const isRobot = detectRobot(userAgent);
Overview of Google crawlers:
https://developers.google.com/search/docs/advanced/crawling/overview-google-crawlers
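The `detectRobot` functions above rebuild the RegExp on every call. A minimal variant, assuming the same approach, compiles it once at module load and guards against a missing header (Express's `req.get()` returns `undefined` when the header is absent, and `RegExp.prototype.test` would then test the string "undefined"). The pattern list here is only a subset of the one above; `ROBOT_RE` is a hypothetical name.

```javascript
// Compiled once when the module loads, not per request.
const ROBOT_RE = new RegExp([
  /bot/, /spider/, /crawl/,                       // general terms
  /APIs-Google/, /AdsBot/, /Googlebot/,           // Google robots
  /bing/, /yandex/, /baidu/, /duckduck/, /yahoo/, // other engines
].map((r) => r.source).join('|'), 'i');

// Falls back to '' so a missing header is treated as "not a robot".
const detectRobot = (userAgent) => ROBOT_RE.test(userAgent || '');

console.log(detectRobot('Mozilla/5.0 (compatible; bingbot/2.0)')); // true
console.log(detectRobot(undefined)); // false
```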
The isTrusted property could help you.
The isTrusted read-only property of the Event interface is a Boolean
that is true when the event was generated by a user action, and false
when the event was created or modified by a script or dispatched via
EventTarget.dispatchEvent().
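e.g.:
function isCrawler(event) {
  // isTrusted is true for real user input; events faked by a script
  // via dispatchEvent() are untrusted.
  return !event.isTrusted;
}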
⚠ Note that IE isn't compatible.
Read more from doc: https://developer.mozilla.org/en-US/docs/Web/API/Event/isTrusted
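A small demonstration of the behaviour described above: an event created with `new Event()` and sent through `dispatchEvent()` arrives with `isTrusted === false`. Node.js (15+) provides DOM-style `EventTarget` and `Event` globals, so this runs outside a browser as well; in a page the same holds for any element.

```javascript
const target = new EventTarget();
let seenTrusted = null;

target.addEventListener('click', (event) => {
  // Script-created events are never trusted; only genuine user input
  // (in a browser) produces isTrusted === true.
  seenTrusted = event.isTrusted;
});

target.dispatchEvent(new Event('click'));
console.log(seenTrusted); // false
```

A crawler driving a real browser through an automation protocol can still produce trusted input, so this only catches scripts that synthesize events themselves.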
People might like to check out the new navigator.webdriver property, which allows bots to inform you that they are bots:
https://developer.mozilla.org/en-US/docs/Web/API/Navigator/webdriver
The webdriver read-only property of the navigator interface indicates whether the user agent is controlled by automation.
It defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, for example, so that alternate code paths can be triggered during automation.
It is supported by all major browsers and respected by major browser automation software like Puppeteer. Users of automation software can of course disable it, and so it should only be used to detect "good" bots.
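Since `navigator.webdriver` only catches co-operating automation, a sketch might combine it with a user-agent check. `looksAutomated` is a hypothetical helper; it takes any navigator-like object so it can be tested outside a browser, and in a page you would call `looksAutomated(navigator)`. The default pattern is the agent_orange regex quoted earlier.

```javascript
// Hypothetical helper: true if the browser reports automation or the
// user agent matches a bot pattern.
function looksAutomated(nav, pattern = /bot|crawler|spider|crawling/i) {
  if (nav.webdriver === true) return true;
  return pattern.test(nav.userAgent || '');
}

console.log(looksAutomated({ webdriver: true, userAgent: 'Mozilla/5.0 Chrome/120.0' })); // true
console.log(looksAutomated({ webdriver: false, userAgent: 'Mozilla/5.0 Chrome/120.0' })); // false
```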
I combined some of the above and removed some redundancy. I use this in .htaccess on a semi-private site:
(google|bot|crawl|spider|slurp|baidu|bing|msn|teoma|yandex|java|wget|curl|Commons-HttpClient|Python-urllib|libwww|httpunit|nutch|biglotron|convera|gigablast|archive|webmon|httrack|grub|netresearchserver|speedy|fluffy|bibnum|findlink|panscient|IOI|ips-agent|yanga|Voyager|CyberPatrol|postrank|page2rss|linkdex|ezooms|heritrix|findthatfile|Aboundex|summify|ec2linkfinder|facebook|slack|instagram|pinterest|reddit|twitter|whatsapp|yeti|RetrevoPageAnalyzer|sogou|wotbox|ichiro|drupact|coccoc|integromedb|siteexplorer|proximic|changedetection|WeSEE|scrape|scaper|g00g1e|binlar|indexer|MegaIndex|ltx71|BUbiNG|Qwantify|lipperhey|y!j-asr|AddThis)
The "test for MSIE x.x" example is just code for testing the userAgent against a Regular Expression. In your example the Regexp is the
/MSIE (\d+\.\d+);/
part. Just replace it with your own Regexp you want to test the user agent against. It would be something like
/Google|Baidu|Baiduspider/.test(navigator.userAgent)
where the vertical bar is the "or" operator, matching the user agent against all of your mentioned robots. For more information about regular expressions, you can refer to this site, since JavaScript uses Perl-style regular expressions.
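One thing to watch with that pattern: without the `i` flag the alternatives are case-sensitive, so the question's "BaiDuSpider" does not match `Baidu`. The `Baiduspider` alternative is also redundant, since any string containing it already contains `Baidu`. A quick comparison:

```javascript
const caseSensitive = /Google|Baidu|Baiduspider/;
const caseInsensitive = /google|baidu/i;

console.log(caseSensitive.test('BaiDuSpider'));   // false: "BaiDu" is not "Baidu"
console.log(caseInsensitive.test('BaiDuSpider')); // true
console.log(caseSensitive.test('Baiduspider+(+http://www.baidu.com/search/spider.htm)')); // true
```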
I found this isbot package, which has a built-in isbot() function. It seems to me that the package is properly maintained and kept up to date.
USAGE:
const isBot = require('isbot');
...
isBot(req.get('user-agent'));
Package: https://www.npmjs.com/package/isbot

Identify tab that made request in Firefox Addon SDK

I'm using the Firefox Addon SDK to build something that monitors and displays the HTTP traffic in the browser, similar to HTTPFox or Live HTTP Headers. I am interested in identifying which tab in the browser (if any) generated each request.
Using the observer-service I am monitoring for "http-on-examine-response" events. I have code like the following to identify the nsIDomWindow that generated the request:
const observer = require("observer-service"),
{Ci} = require("chrome");
function getTabFromChannel(channel) {
try {
var noteCB= channel.notificationCallbacks ? channel.notificationCallbacks : channel.loadGroup.notificationCallbacks;
if (!noteCB) { return null; }
var domWin = noteCB.getInterface(Ci.nsIDOMWindow);
return domWin.top;
} catch (e) {
dump(e + "\n");
return null;
}
}
function logHTTPTraffic(sub, data) {
sub.QueryInterface(Ci.nsIHttpChannel);
var tab = getTabFromChannel(sub);
console.log(tab);
}
observer.add("http-on-examine-response", logHTTPTraffic);
Mostly cribbed from the documentation for how to identify the browser that generated the request. Some is also taken from the Google PageSpeed Firefox addon.
Is there a recommended or preferred way to go from the nsIDOMWindow object domWin to a tab element in the SDK tabs module?
I've considered something hacky like scanning the tabs list for one with a URL that matches the URL for domWin, but then I have to worry about multiple tabs having the same URL.
You have to keep using the internal packages. From what I can tell, the getTabForWindow() function in the api-utils/lib/tabs/tab.js package does exactly what you want. Untested code:
var tabsLib = require("sdk/tabs/tab.js");
return tabsLib.getTabForWindow(domWin.top);
The API has changed since this was originally asked/answered...
It should now (as of 1.15) be:
return require("sdk/tabs/utils").getTabForWindow(domWin.top);
As of Addon SDK version 1.13 change:
var tabsLib = require("tabs/tab.js");
to
var tabsLib = require("sdk/tabs/helpers.js");
If anyone still cares about this:
Although the Addon SDK is being deprecated in favor of the newer WebExtensions API, I want to point out that
var a_tab = require("sdk/tabs/utils").getTabForContentWindow(window)
returns a different 'tab' object than the one you would typically get by using
worker.tab in a PageMod.
For example, a_tab will not have the 'id' attribute, but it does have a linkedPanel property that's similar to the 'id' attribute.

Can javascript access a filesystem? [duplicate]

This question already has answers here:
Local file access with JavaScript
(14 answers)
Closed 8 years ago.
I was pretty sure the answer was NO, hence Google Gears, Adobe AIR, etc.
If I was right, then how does http://tiddlywiki.com work? It is persistent and written in javascript. It is also just a single HTML file that has no external (serverside) dependencies. WTF? Where/how does it store its state?
TiddlyWiki has several methods of saving data, depending on which browser is used, as you can see in the source.
If ActiveX is enabled, it uses Scripting.FileSystemObject.
On Gecko-based browsers, it tries to use UniversalXPConnect.
If Java is enabled, it uses the TiddlySaver Java applet.
If Java LiveConnect is enabled, it tries to use Java's file classes.
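Trying several saving methods in turn amounts to a simple fallback chain. A minimal sketch of just the selection logic, assuming driver objects shaped roughly like TiddlyWiki's own `tiddlySaver` driver (`isAvailable()`, `saveFile(path, content)`); `pickSaver`, `saveWithFallback` and the stand-in drivers are hypothetical, and no real file access happens here.

```javascript
// Return the first driver that reports itself available, so drivers
// should be listed in order of preference.
function pickSaver(drivers) {
  return drivers.find((driver) => driver.isAvailable()) || null;
}

function saveWithFallback(drivers, path, content) {
  const saver = pickSaver(drivers);
  if (!saver) {
    throw new Error('No saving method is available in this browser');
  }
  return saver.saveFile(path, content);
}

// Stand-in drivers for demonstration only.
const written = [];
const drivers = [
  { name: 'activeX', isAvailable: () => false, saveFile: () => false },
  {
    name: 'javaApplet',
    isAvailable: () => true,
    saveFile: (path, content) => { written.push([path, content]); return true; },
  },
];

console.log(pickSaver(drivers).name); // javaApplet
saveWithFallback(drivers, '/tmp/wiki.html', '<html></html>');
```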
HTML5's File[1], FileWriter[2], and FileSystem[3] APIs are available in the latest Developer channel of Google Chrome. The FileSystem API lets you read/write to a sandbox filesystem within a space the browser knows about. You cannot, for example, open 'My Pictures' folder on the user's local FS and read/write to that. That's something in the works, but it won't be ready for a while. Example of writing a file:
// Chrome exposed this API with a webkit prefix.
window.requestFileSystem = window.requestFileSystem || window.webkitRequestFileSystem;

function errorHandler(e) {
  console.log('File system error: ' + e.name);
}

window.requestFileSystem(
  window.TEMPORARY, // persistent vs. temporary storage
  1024 * 1024, // 1MB. Size (bytes) of needed space
  initFs, // success callback
  errorHandler // opt. error callback, denial of access
);

function initFs(fs) {
  fs.root.getFile('logFile.txt', {create: true}, function(fileEntry) {
    fileEntry.createWriter(function(writer) { // FileWriter
      writer.onwrite = function(e) {
        console.log('Write completed.');
      };
      writer.onerror = function(e) {
        console.log('Write failed: ' + e.toString());
      };
      // BlobBuilder was superseded by the Blob constructor.
      var blob = new Blob(['Lorem ipsum'], {type: 'text/plain'});
      writer.write(blob);
    }, errorHandler);
  }, errorHandler);
}
Check out this HTML5 Storage slide deck for more code snippets.
It uses a Java applet reference like this:
drivers.tiddlySaver = {
name: "tiddlySaver",
deferredInit: function() {
if(!document.applets["TiddlySaver"] && !$.browser.mozilla && !$.browser.msie && document.location.toString().substr(0,5) == "file:") {
$(document.body).append("<applet style='position:absolute;left:-1px' name='TiddlySaver' code='TiddlySaver.class' archive='TiddlySaver.jar' width='1' height='1'></applet>");
}
},
isAvailable: function() {
return !!document.applets["TiddlySaver"];
},
loadFile: function(filePath) {
var r;
try {
if(document.applets["TiddlySaver"]) {
r = document.applets["TiddlySaver"].loadFile(javaUrlToFilename(filePath),"UTF-8");
return (r === undefined || r === null) ? null : String(r);
}
} catch(ex) {
}
return null;
},
saveFile: function(filePath,content) {
try {
if(document.applets["TiddlySaver"])
return document.applets["TiddlySaver"].saveFile(javaUrlToFilename(filePath),"UTF-8",content);
} catch(ex) {
}
return null;
}
}
Technically you can do
netscape.security.PrivilegeManager.enablePrivilege('UniversalBrowserWrite');
in a netscape-compatible browser (Firefox, Mozilla, Netscape), and it will ask the user* whether or not to allow filesystem access, but this is not portable.
*once per browser process
Can javascript access a filesystem?
Not outside of the sandbox area mentioned above, to the best of my knowledge. However, it can access a signed Java applet that has callable public methods which can get to all files. I have done it, and it works fine and is cross-browser.
The signing part is somewhat involved and for professional use you might need to pay for a code signing certificate which authorises your identity. Get it from some place like Verisign. That way users at least know who the applet is written by (if that helps). You can sign it yourself for free but one of those "possible security risk" popups will occur at first use for authorisation by the user.
You would think that such signed applets for file writing would exist already for download but I couldn't find any via searching. If they did, you could just plug it in your page, learn the API and off you go.
The answer is indeed NO. Java applets and the dreaded ActiveX plugins are usually used if this is required.
