I need to understand what the XDATA protocol is. I searched the internet but couldn't find anything helpful. We are an REU team trying to add a sensor to an interface that only accepts sensors using the XDATA protocol.
Looking for the same information, this is the best document I could find:
ftp://aftp.cmdl.noaa.gov/user/jordan/iMet-1-RSB%20Radiosonde%20XDATA%20Daisy%20Chaining.pdf
Basically, an XDATA packet is ASCII-formatted and looks like this:
xdata=01010123456789abcdef
Where:
"xdata=" - header
"01" - instrument ID
"01" - instrument position in daisy chain (more than one instrument can be connected)
the rest is data
Another document mentions that packet should be terminated with CR/LF: ftp://aftp.cmdl.noaa.gov/user/jordan/XDATA%20Packet%20Example.pdf
Please note that XDATA doesn't seem to specify the data structure. It appears that the data processing is up to the ground station.
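For illustration, here is a minimal sketch (in JavaScript, with field names of my own choosing) of how a ground-station script might split such a packet into its fields; the data portion is left as raw hex since, as noted above, its structure is instrument-specific:
// Minimal XDATA packet parser (assumes the ASCII framing described above).
// The payload is returned as raw hex because its layout depends on the instrument.
function parseXdata(line) {
  const m = /^xdata=([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]*)\r?\n?$/.exec(line);
  if (!m) return null;          // not an XDATA packet
  return {
    instrumentId: m[1],         // e.g. "01"
    daisyChainIndex: m[2],      // position in the daisy chain
    dataHex: m[3],              // instrument-specific payload
  };
}
console.log(parseXdata("xdata=01010123456789abcdef\r\n"));
// -> { instrumentId: "01", daisyChainIndex: "01", dataHex: "0123456789abcdef" }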
Since you didn't specify the radiosonde and instrument you plan to use, I won't write too much here, but you can go to http://www.esrl.noaa.gov/gmd/ozwv/wvap/sw.html where there is a little bit more information about the protocol (but not too much).
I'm the creator of the docs and website linked in the previous answer. Blagus summarized it well. Any instrument that you want to send data through an xdata-compatible radiosonde (typically the iMet-1-RSB) to the ground should output 3.3V (3V works too) UART serial packets at 9600 baud (8-N-1, typically no flow control) according to the protocol. We keep track of a list of instrument ID numbers here to avoid conflicts; feel free to contact us to add a new instrument in the future (while prototyping, you can just make one up).
You can also "daisy chain" several xdata instruments together. Any incoming xdata packets have their DC index incremented and then are forwarded down the chain. When xdata packets reach the radiosonde they are stripped of their header info, a CRC is calculated and appended, then they are transmitted as binary FSK radio data to the antenna on the ground. The antenna/preamp is connected to a receiver and then SkySonde Server/Client can decode the data (if using an iMet radiosonde) from the receiver's audio output.
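As a rough sketch (not the radiosonde's actual firmware, just the string handling implied above), the daisy-chain forwarding rule could look like this:
// Sketch of the daisy-chain rule: bump the chain index and pass the packet along.
function forwardXdata(packet) {
  const m = /^xdata=([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})(.*)$/.exec(packet.trim());
  if (!m) return packet;                           // not XDATA: forward unchanged
  const nextIndex = (parseInt(m[2], 16) + 1).toString(16).padStart(2, "0");
  return `xdata=${m[1]}${nextIndex}${m[3]}\r\n`;   // re-terminate with CR/LF
}
console.log(forwardXdata("xdata=01010123456789abcdef\r\n"));
// -> "xdata=01020123456789abcdef\r\n"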
I'm creating a browser game. I know browsers aren't really safe, but I was wondering if there's any workaround for my issue.
A player kills a monster and the platform sends an ID to my backend like so:
Axios({
  url: this.server + "reward",
  headers: { token: "foo123", charToken: "bar123" },
  method: "POST",
  data: {
    id: 10001, // Monster ID
    value: 10  // How many monsters were killed
  },
});
The problem is, I see no possible way to prevent a user from just sending arbitrary requests to the server, claiming he actually did this level 300 times in a second and getting 300x the reward items/exp.
I thought about requesting a token before sending this reward request, but this just makes it harder and doesn't really solve anything.
Your current design
As it currently stands, the design you've implied here won't let you mitigate your concern about replay attacks on your game. A helpful way to think about this is to frame it in the context of a "conversational" HTTP request/response paradigm. Currently, what you're allowing is something akin to:
Your client tells your server, "Hey, I've earned this reward!"
Your server, having no way to verify this, responds: "Great! I've added that reward to your account."
As you guessed, this paradigm doesn't provide very much protection against even a slightly motivated attacker.
Target design
What your design should enforce is something more along the lines of the following:
Your clients should only be able to ask the server for something. ("I've chosen to attack monster #123, what was the result?")
Your server should respond to that request with the information pertinent to the result of that ask.
"You're not in range of that enemy; you can't attack them."
"That enemy was already dispatched by you."
"That enemy was already dispatched by another player previously."
"You can't attack another enemy while you're in combat with another."
"You missed, and the enemy still has 100hp. The enemy struck back and hit you; you now have 90hp."
"You hit the enemy, and the enemy now has 1hp. The enemy struck back and missed; you still have 100hp."
"You hit the enemy, and the enemy now has 0hp. You've earned 5 coins as a reward."
In this design, your server would be the gatekeeper to all this information, and the data in particular that attackers would seek to modify. Instead of implicitly trusting that the client hasn't been modified, your server would be calculating all this on its own and simply sending the results of those calculations back to the client for display after recording it in a database or other storage medium.
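As a hedged sketch (Express, with made-up helpers like resolveAttack standing in for your own game logic), the "ask the server" version of the flow might look like this:
// Sketch only: the client asks to attack; the server decides the outcome.
// lookupPlayerFromToken, loadMonster, inRange, resolveAttack and saveOutcome are placeholders.
const express = require("express");
const app = express();
app.use(express.json());

app.post("/attack", (req, res) => {
  const player = lookupPlayerFromToken(req.headers.token);
  const monster = loadMonster(req.body.monsterId);
  if (!player || !monster) return res.status(400).json({ error: "unknown player or monster" });
  if (monster.hp <= 0) return res.json({ result: "already-dead" });
  if (!inRange(player, monster)) return res.json({ result: "out-of-range" });

  const outcome = resolveAttack(player, monster);  // server computes damage, loot, XP...
  saveOutcome(outcome);                            // ...and records it before replying
  res.json(outcome);                               // the client only renders what it's told
});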
Other items for your consideration
Sequential identifiers
While your server might still coordinate all these actions and calculate the results of said actions itself, a sufficiently motivated attacker could still find ways to exploit your backend, since you've used predictably incremented values to identify your back-end entities. A small amount of study of your endpoints along with a short script could successfully yield unintended in-game riches.
When creating entities that you'd prefer players not be able to enumerate (read: it probably is all of them), you can use something akin to a UUID to generate difficult-to-predict unique identifiers. Instead of one enemy being #3130, and the next one being #3131, your enemies are now known internally in your application as 5e03e6b9-1ec2-45f6-927d-794e13d9fa82 and 31e95709-2187-4b02-a521-23b874e10a03. While these aren't, by mathematical definition, reliably cryptographically secure, this makes guessing the identifiers themselves several orders of magnitude more difficult than sequential integers.
I generally allow my database to generate UUIDs automatically for every entity I create so I don't have to think about it in my backend implementation; support for this out of the box will be dependent on the RDBMS you've chosen. As an example in SQL Server, you'd have your id field set to the type UNIQUEIDENTIFIER and have a default value of NEWID().
If you do choose to generate these for whatever reason in Node.js (as you've tagged), something like uuidjs/uuid can generate these for you.
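For example, with the uuid package a v4 (random) identifier is a one-liner; the exact import style depends on your setup:
// Generate a hard-to-guess identifier in Node.js using the uuid package.
const { v4: uuidv4 } = require("uuid");

const enemyId = uuidv4();
console.log(enemyId); // e.g. "31e95709-2187-4b02-a521-23b874e10a03"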
Rate limiting
You haven't mentioned anything about it in your question (whether you've already used it or not), but you really should be enforcing a rate limit for users of your endpoints that's reasonable for your game. Is it really plausible that your user JoeHacker could attack 15 times in the span of a second? No, probably not.
The way this would be accomplished varies based on whatever back-end server you're running. For Express-based servers, several packages including the aptly-named express-rate-limit will get the job done in a relatively simple yet effective manner.
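A minimal sketch with express-rate-limit (the window and limit values are arbitrary, and `app` is your existing Express application; tune the numbers to what's plausible for your game):
// Sketch: cap how often a client can hit the attack/reward endpoints.
const rateLimit = require("express-rate-limit");

const attackLimiter = rateLimit({
  windowMs: 1000, // 1-second window
  max: 3,         // at most 3 requests per window per client
});

app.use("/attack", attackLimiter);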
What is actually meant by the direction of data flow?
Consider the composition pattern:
I have a class A, and class A creates an instance of another class B when class A itself is instantiated.
Class A holds public data accessible to both class A and class B.
The instance of class B is instantiated with data from class A.
The instance of class B calls a method within class A to manipulate data belonging to the instance of class A.
What is the "data" in the data flow considered to be? The data held by a class, or the hierarchy and permissions of classes?
For example,
A child class should not be able to call parent class methods on instances of the parent class.
What is actually meant by the direction of data flow?
Let us consider a specific type of software, so that we can limit the number of data flows we discuss to the larger ideas.
There are two big divisions of data flow in telecom transport.
A. The Transport HW carries east/west data ... from one piece of data transport equipment to another. Typically, this data flow is too fast for software to deal with directly, and the data flows in both directions simultaneously.
B. The telecom transport data control is called north/south data. This flow contains 5 software data flows. The flow is to/from the local hw, and from/to an operations user or host.
There are typically 5 flows in B (telecom transport data control):
status_update -- software periodically reads hw status information (often once per second) and delivers the info captured to local 'fast' storage (where other commands can find it for display or delivery to host)
alarms-update -- very much like status update (i.e. a periodic read), but only alarms. Alarms have duration and timeouts, and are reasonably complex.
pm-update -- very much like status update (i.e. a periodic read), but the software collects summary counts of specific activities ... how many bytes output, how many seconds-in-error, etc. This also has round robin 15 minute time buckets, timeouts and other complications.
configuration control -- user applied commands can change the operational configuration. The sw, in response to user command, changes specific hw config registers. Most T1 / E1 hardware can run in either mode, and the user is required to configure each sub-system at startup.
provisioning control -- the user applies commands that enable (or disable) the availability of a specific hardware type to 'transport' east/west data. (think customer billable service)
Each of these flows, from an architectural approach, may be demand-pull OR supply-push, but probably not both.
Example of demand-pull: status update (direction of flow is north, from hw to local storage)
On most of the systems I worked on, the status-update was triggered off the system clock, and, in a typical demand-pull, the clock event triggered a read of all the status conditions from the hw. This collected info is typically stashed into local 'fast' storage, where other commands can find it for display or delivery to host.
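As a loose sketch (readHwStatus and fastStore are placeholders for the real register reads and local cache), a clock-driven demand-pull looks roughly like:
// Demand-pull sketch: the clock triggers the read; data flows north (hw -> local storage).
setInterval(() => {
  const status = readHwStatus();     // pull everything the hw exposes right now
  fastStore.put("status", status);   // stash it where display/host commands can find it
}, 1000);                            // "often once per second"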
Examples of supply-push: configuration (direction of flow is south, from user to hw)
Any config (or provisioning) command is asynchronous to the system clock (because humans don't like, and are not good at, syncing). Thus, an action is triggered when the user presses the enter key, and the supply-pushed data (the command parameters) flows to the hardware.
For supply-push, there are sometimes coordination efforts, i.e. no config (nor prov) changes are allowed while specific other things are going on, but this is typically handled with a simple mutex.
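And a rough supply-push sketch (configMutex, hwBusy and writeHwConfig are placeholders), where the user's enter key, not the clock, triggers the flow and a simple mutex guards it:
// Supply-push sketch: a user command pushes config south (user -> hw).
async function applyConfigCommand(params) {
  await configMutex.acquire();       // no config/prov changes while other work is in flight
  try {
    if (hwBusy()) throw new Error("configuration change not allowed right now");
    writeHwConfig(params);           // the pushed data: the command parameters
  } finally {
    configMutex.release();
  }
}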
Summary - at the above architectural level, the flows may seem simpler than they actually can be.
For example, sometimes, to read during a demand-pull, the software must 'tickle' some feature of the hw, and that 'tickle' feels like the 'wrong direction' ... but in this case, the 'tickle' is not part of the data flow, just overhead to accomplish the flow by extracting / pulling the data out.
Similarly, to write config data, the sw must sometimes determine if the hw will allow the change to that next configuration, and perhaps that is checked by reading from the hw. These reads feel like the wrong direction, but again, it is not part of the data flow, just some flow overhead.
This duality happens at many levels.
I can't speak as much to desktop software, but I have seen demand-pull and supply-push in many places. Perhaps applied somewhat less rigorously, in my view, but that is more likely down to my lack of experience with large desktop applications.
I am new to this area, so the question may seem strange. However, before asking I read a bunch of introductory articles about the key points of machine learning and the moving parts of neural networks, including this very useful one: What is machine learning. Basically, as I understand it, a trained NN is (correct me if this is wrong):
a set of connections between neurons (possibly self-connected, possibly with gates, etc.)
the activation probabilities formed on each connection.
Both things are adjusted during training to fit the expected output as closely as possible. Then, what we do with a trained NN is load the test subset of data into it and check how well it performs. But what happens if we're happy with the test results and we want to store the training results and not run training again later when the dataset gets new values?
So my question is: is that learned knowledge stored anywhere other than RAM? Can it be dumped (think of object serialisation, in a way) so that you don't need to re-train your NN on the data you get tomorrow or later?
Right now I am trying to make a simple demo with my dataset using synaptic.js, but I could not spot that kind of save-the-training concept in the project's wiki.
That library is just an example; if you reference some Python lib, that would be good too!
With regards to storing it via synaptic.js:
This is quite easy to do! Synaptic actually has built-in support for it, and there are two ways to go about it.
If you want to use the network without training it again
This will create a standalone function from your network; you can use it anywhere in JavaScript without requiring synaptic.js! Wiki
var standalone = myNetwork.standalone();
If you want to modify the network later on
Just convert your network to a JSON. This can be loaded up anytime again with synaptic.js! Wiki
// Export the network to JSON, which you can save as plain text
var exported = myNetwork.toJSON();
// Convert the JSON back to a usable network
var imported = Network.fromJSON(exported);
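The exported JSON is a plain object, so you can persist it wherever you like; for instance (a sketch assuming Node.js), write it to disk and load it back later:
// Persist the trained network to disk and restore it later (Node.js sketch).
var fs = require("fs");

fs.writeFileSync("network.json", JSON.stringify(myNetwork.toJSON()));

// ...later, in another process:
var restored = Network.fromJSON(JSON.parse(fs.readFileSync("network.json", "utf8")));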
I will assume in my answer that you are working with a simple multi-layer perceptron (MLP), although my answer is applicable to other networks too.
The purpose of 'training' an MLP is to find the correct synaptic weights that minimise the error on the network output.
When a neuron is connected to another neuron, its input is given a weight. The neuron performs a function, such as the weighted sum of all inputs, and then outputs the result.
Once you have trained your network, and found these weights, you can verify the results using a validation set.
If you are happy that your network is performing well, you simply record the weights that you applied to each connection. You can store these weights wherever you like (along with a description of the network structure) and then retrieve them later. There is no need to re-train the network every time you would like to use it.
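Concretely, the learned state is just numbers, so any serialization works; for example (a generic sketch, not tied to a particular library):
// The trained state is the structure plus the weights; JSON is enough to store it.
var trainedModel = {
  layers: [4, 8, 3],                         // network structure: neurons per layer
  weights: [/* one weight matrix per pair of adjacent layers */],
  biases: [/* one bias vector per non-input layer */],
};

var saved = JSON.stringify(trainedModel);    // dump it to a file, a database, anywhere
var loaded = JSON.parse(saved);              // rebuild the network from it later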
Hope this helps.
We are investigating using Breeze for field deployment of some tools. The scenario is this -- an auditor will visit sites in the field, where most of the time there will be no -- or very degraded -- internet access. Rather than replicate our SQL database on all the laptops and tablets (if that's even possible), we are hoping to use Breeze to cache the data and then store it locally so it is accessible when there is not a usable connection.
Unfortunately, Breeze seems to choke when caching any significant amount of data. Generally on Chrome it's somewhere between 8 and 13 MB worth of entities (as measured by the HTTP response headers). This can change a bit depending on how many tabs I have open and such, but I have not been able to move that by more than 10%. The error I get is that the Chrome tab crashes and tells me to reload. The error is reproducible: I download the data in 100K chunks and it fails on the same read every time, and works fine if I stop it after the previous read. When I change the page size, it always fails within the same range.
Is this a limitation of Breeze, or Chrome? Or Windows? I tried it on Firefox, and it handles even less data before the whole browser crashes. IE fares a little better, but none of them do great.
Looking at performance in task manager, I get the following:
IE goes from 250MB memory usage to 1.7GB during the caching process and caches a total of about 14MB before throwing an out-of-memory error.
Chrome goes from 206MB memory usage to about 850MB while caching a total of around 9MB.
Firefox goes from around 400MB to about 750MB and manages to cache about 5MB before the whole program crashes.
I can calculate how much will be downloaded with any selection criteria, but I cannot find a way to calculate how much data can be handled by any specific browser instance. This makes using Breeze for offline auditing close to useless.
Has anyone else tackled this problem yet? What are the best approaches to handling something like this? I've thought of several things, but none of them are ideal. Any ideas would be appreciated.
ADDED at Steve Schmitt's request:
Here are some helpful links:
Metadata
Entity Diagram (pdf) (and html and edmx)
The first query, just to populate the tags on the page, runs quickly and downloads minimal data:
var query = breeze.EntityQuery
    .from("Countries")
    .orderBy("Name")
    .expand("Regions.Districts.Seasons, Regions.Districts.Sites");
Once the user has selected the Sites he or she wishes to cache, the following two queries are kicked off (this used to be one query, but I broke it into two hoping it would be less of a burden on resources; it didn't help). The first query (usually 2-3K entities and about 2MB) runs as expected. Some combination of the predicates listed is used to filter the data.
var qry = breeze.EntityQuery
    .from("SeasonClients")
    .expand("Client,Group.Site,Season,VSeasonClientCredit")
    .orderBy("DistrictId,SeasonId,GroupId,ClientId");
var p = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p1 = breeze.Predicate("SeasonId", "==", SeasonId);
var p2 = breeze.Predicate("DistrictId", "==", DistrictId);
var p3 = breeze.Predicate("Group.Site.SiteId", "in", SiteIds);
After the first query runs, the second query (below) runs, also using some combination of the predicates listed to filter the data. At about 9MB, it will have about 50K rows to download. When the total download burden between the two queries is between 10MB and 13MB, the browsers crash.
var qry = breeze.EntityQuery
    .from("Repayments")
    .orderBy('SeasonId,ClientId,RepaymentDate');
var p1 = breeze.Predicate("District.Region.CountryId", "==", CountryId);
var p2 = breeze.Predicate("SeasonId", "==", SeasonId);
var p3 = breeze.Predicate("DistrictId", "==", DistrictId);
var p4 = breeze.Predicate("SiteId", "in", SiteIds);
Thanks for the interest, Steve. You should know that the Entity Relationships are inherited and currently in production supporting the majority of the organization's operations, so as few changes as possible to that would be best. Also, the hope is to grow this from a reporting application to one with which data entry can be done in the field (so, as I understand it, using projections to limit the data wouldn't work).
Thanks for the interest, and let me know if there is anything else you need.
Here are some suggestions based on my experience building an offline-capable web application using Breeze. Some or all of these might not make sense for your use cases...
1. Identify which entity types need to be editable vs. which are used to fill drop-downs etc. Load non-editable data using the noTracking query option and cache them in localStorage yourself using JSON.stringify. This avoids the overhead of coercing the data into entities, change tracking, etc. Good candidates for this approach in your model might be entity types like Country, Region, District, Site, etc.
2. If possible, provide a facility in your application for users to identify which records they want to "take offline". This way you don't need to load and cache everything, which can get quite expensive depending on the number of relationships, entities, properties, etc.
3. In conjunction with suggestion #2, avoid loading all the editable data at once and avoid using the same EntityManager instance to load each set of data. For example, if the Client entity is something that needs to be editable out in the field without a connection, create a new EntityManager, load a single client (expanding any children that also need to be editable) and cache this data separately from other clients.
4. Cache the breeze metadata once. When calling exportEntities the includeMetadata argument should be false. More info on this here.
5. To create new EntityManager instances make use of the createEmptyCopy method (see the sketch right after this list).
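A rough sketch of suggestions 1, 3 and 4 together (the entity names come from your model, but the exact expands, predicates and storage keys are illustrative; manager is your existing EntityManager and clientId comes from the user's offline selection):
// Sketch: cache reference data without change tracking, and export one client's
// editable graph from its own EntityManager (metadata cached separately, once).
var lookupsQuery = breeze.EntityQuery
    .from("Countries")
    .expand("Regions.Districts.Sites")
    .noTracking();

manager.executeQuery(lookupsQuery).then(function (data) {
  localStorage.setItem("lookups", JSON.stringify(data.results)); // plain objects, not entities
});

// One EntityManager per client the user takes offline:
var clientManager = manager.createEmptyCopy();
var clientQuery = breeze.EntityQuery
    .from("SeasonClients")
    .where("ClientId", "==", clientId)
    .expand("Client,VSeasonClientCredit");

clientManager.executeQuery(clientQuery).then(function () {
  // second argument false -> don't re-export the metadata with every client
  localStorage.setItem("client:" + clientId, clientManager.exportEntities(null, false));
});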
EDIT:
I want to respond to this comment:
Say I have a client who has bills and payments. That client is in a group, in a site, in a region, in a country. Are you saying that the client, payment, and bill information might each have their own EM, while the location hierarchy might be in a 4th EM with no-tracking? Then when I refer to them, I wire up the relationships as needed using LINQs on the different EMs (give me all the bills for customer A, give me all the payments for customer A)?
It's a bit of a judgement call in terms of deciding how to separate things out. Some of what I'm suggesting might be overkill, it really depends on the amount of data and the way your application is used.
Assuming you don't need to edit groups, sites, regions and countries while offline, the first thing I'd do would be to load the list of groups using the noTracking option and cache them in localStorage for offline use. Then do the same for sites, regions and countries. Keep in mind, entities loaded with the noTracking option aren't cached in the entity manager so you'll need to grab the query result, JSON.stringify it and then call localStorage.setItem. The intent here is to make sure your application always has access to the list of groups, sites, regions, etc so that when you display a form to edit a client entity you'll have the data you need to populate the group, site, region and country select/combobox/dropdown.
Assuming the user has identified the subset of clients they want to work with while offline, I'd then load each of these clients one at a time (including their payment and bill information but not expanding their group, site, region, country) and cache each client+payments+bills set using entityManager.exportEntities. Reasoning here is it doesn't make sense to load several clients plus their payments and bills into the same EntityManager each time you want to edit a particular client. That could be a lot of unnecessary overhead, but again, this is a bit of a judgement call.
@Jeremy's answer was excellent and very helpful, but it didn't actually answer the question, which I was starting to think was unanswerable, or at least the wrong question. However, @Steve in the comments gave me the most appropriate information for this question.
It is neither Breeze nor the browser, but rather Knockout. Apparently the Knockout wrapper around the Breeze entities uses all that memory (at least while loading the entities, and in my environment). As described above, Knockout/Breeze would crap out after reading around 5MB of data, causing Chrome to crash with over 1.7GB of memory usage (from a pre-download memory usage of around 300MB). Rewriting the app in AngularJS eliminated the problem. So far I have been able to download over 50MB from the exact same EF6 model into Breeze/Angular, and total Chrome memory usage never went above 625MB.
I will be testing larger payloads, but 50 MB more than satisfies my needs for the moment. Thanks everyone for your help.
Poor performance of autocomplete fields reduces their usefulness. If the client-side implementation has to call an endpoint that does heavy db lookup, the response time can easily get frustrating.
One neat approach comes from AWS Case Study: IMDb. It used to come with a diagram (no longer available), but in a nutshell a prediction tree would be generated and stored for every combination that can resolve in a meaningful way. E.g. resolutions for sta would include Star Wars, Star Trek, and Sylvester Stallone, which will be stored, but stb will not resolve to anything meaningful and will not be stored.
To get the lowest possible latency, all possible results are pre-calculated with a document for every combination of letters in search. Each document is pushed to Amazon Simple Storage Service (Amazon S3) and thereby to Amazon CloudFront, putting the documents physically close to the users. The theoretical number of possible searches to calculate is mind-boggling—a 20-character search has 23 x 10^30 combinations—but in practice, using IMDb's authority on movie and celebrity data can reduce the search space to about 150,000 documents, which Amazon S3 and Amazon CloudFront can distribute in just a few hours. IMDb creates indexes in several languages with daily updates for datasets of over 100,000 movie and TV titles and celebrity names.
How would a similarly performant experience be achieved with private data? E.g. autocompleting client names, job IDs, invoice numbers... Storing different documents/decision trees for separate users sounds expensive, especially if some of the data (client names?) could be available to multiple users.
You're right that such a workload requires some special optimizations.
You can use a ready-made search engine like Apache Lucene or Solr (which is a REST API wrapper around Lucene).
This engine is optimized for full-text search and can work with private data.
Work steps:
Install Solr (or Lucene)
Design a schema for storing the information (what fields and what types of searches you need)
Load your data into it (via batch operations or on an update basis)
Query it using Solr's query language (similar to a Google search).
At this point you can add special restrictions based on user_id or any other parameter in addition to the original user query, so private data won't be mixed between users.
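For instance (a sketch: the "clients" core and the name/owner_id fields are assumptions about your schema), the per-user restriction can go into a filter query that is appended server-side, so it never mixes with the user-typed part:
// Sketch: query a Solr core for autocomplete suggestions, restricted to the current user.
async function suggestClients(prefix, userId) {
  const params = new URLSearchParams({
    q: "name:" + prefix + "*",     // what the user typed
    fq: "owner_id:" + userId,      // private-data restriction, appended server-side
    rows: "10",
    wt: "json",
  });
  const res = await fetch("http://localhost:8983/solr/clients/select?" + params);
  const body = await res.json();
  return body.response.docs;       // the matching client records
}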
I actually agree with CGI. The best solution is a 3rd-party search engine; anything else is trying to build your own search engine. I'm really not sure from your post what hardware is at your disposal, so I'll give a possible low-budget solution for when all you've got is LAMP hosting.
So in your PHP code you would run a query like this (using a prepared statement, assuming a PDO connection in $pdo, so that $search can't inject SQL):
$stmt = $pdo->prepare("SELECT * FROM Clients WHERE `name` LIKE ? ORDER BY popularity DESC LIMIT 0,100");
$stmt->execute(['%' . $search . '%']); // bind the search term rather than concatenating it into the SQL
Then increment the popularity column for every record that is found via the "search engine."
On the front end (let's say you're using Dojo) you could do something like...
<script>
    require(["dojo/on", "dojo/dom", "dojo/request/xhr", "dojo/domReady!"], function (on, dom, xhr) {
        var searchCheck; // timer handle, declared so it isn't an implicit global
        on(dom.byId('txtSearch'), "change", function (evt) {
            if (searchCheck !== undefined) clearTimeout(searchCheck);
            searchCheck = setTimeout(function () { // keep from flooding XHR
                xhr("fetch-json-results.php", {
                    query: { search: dom.byId('txtSearch').value }, // send the typed text to the backend
                    handleAs: "json"
                }).then(function (data) {
                    // update txtSearch combo store
                });
            }, 500);
        });
    });
</script>
<input id="txtSearch" type="text" data-dojo-type="dijit/form/ComboBox" data-dojo-props="intermediateChanges:true">
This would be a low-tech, low-budget (LAMP) equivalent answer.