I am working with PHP and heatmap.js to generate a heat map.
I was thinking of going down the path of allowing the user to upload a floor-map JPG file initially and then letting them add sensor names to different locations on the floor map.
Once the sensor locations are specified, I need to save that configuration to an XML file. Once I have this set of information (img_id, [sensorid1,x1,y1], [sensorid2,x2,y2],..,[sensoridn,xn,yn]), I can query my database for the latest sensor values and then display them as a heat map on the image (at each sensor's x and y coordinates) in real time.
I would like to know if saving the configuration as XML is the right way of doing it. Is there a better way of temporarily storing the information using JavaScript/PHP?
There are likely a bunch of ways to solve this. My preference would be for JSON, as it is natively supported by JavaScript and PHP. It is also MUCH easier to read and write.
When you say "saving", what do you mean? If you need it to be stored server side, then creating DB entities that the data structure can be mapped to and stored in will be far better than trying to create files server-side. Depending on how the app gets hosted, you may not have permission to do that, and if your server ever goes away you could lose that data (however, there are safe ways to create files using a service like AWS S3). Storing it in a database not only gives you a single place to worry about backups, but also lets you query the data in interesting and powerful ways (SQL etc.) easily, without having to figure out how to do that for files with every new query.
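For example, the whole configuration could be one JSON object that you POST to a PHP script and json_decode() into DB rows on the server. A minimal sketch, where the field names and the save_config.php endpoint are just placeholders rather than anything heatmap.js requires:

    // Sensor configuration as a plain JSON-serializable object
    const config = {
      img_id: "floor_3",
      sensors: [
        { id: "sensor_01", x: 120, y: 340 },
        { id: "sensor_02", x: 410, y: 95 }
      ]
    };

    // Hypothetical endpoint; on the PHP side you would json_decode() the body
    // and insert one row per sensor into your table.
    fetch("save_config.php", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(config)
    })
      .then(res => res.json())
      .then(result => console.log("saved:", result));

The same object can be handed straight to your heat-map rendering code once the latest sensor values come back from the database.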
We're working with an API-based data provider that allows us to analyze large sets of GIS data in relation to provided GeoJSON areas and specified timestamps. When the data is aggregated by our provider, it can be marked as complete and alert our service via a callback URL. From there, we have a list of the reports we've run with their relevant download links. One of the reports we need to work with is a TSV file with 4 columns, and looks like this:
deviceId | timestamp | lat | lng
Sometimes, if the area we're analyzing is large enough, these files can be 60+ GB. The download link points to a zipped version of the files, so we can't read them directly from the download URL. We're trying to get the data in this TSV grouped by deviceId and sorted by timestamp so we can route along road networks using the lat/lng in our routing service. We've used JavaScript for most of our application so far, but this service poses unique problems that may require additional software and/or languages.
Curious how others have approached the problem of handling and processing data of this size.
We've tried downloading the file, piping it into a ReadStream, and allocating all the available cores on the machine to process batches of the data individually. This works, but it's not nearly as fast as we would like (even with 36 cores).
From Wikipedia:
Tools that correctly read ZIP archives must scan for the end of central directory record signature, and then, as appropriate, the other, indicated, central directory records. They must not scan for entries from the top of the ZIP file, because ... only the central directory specifies where a file chunk starts and that it has not been deleted. Scanning could lead to false positives, as the format does not forbid other data to be between chunks, nor file data streams from containing such signatures.
In other words, if you try to do it without looking at the end of the zip file first, you may end up accidentally including deleted files. So you can't trust streaming unzippers. However, if the zip file hasn't been modified since it was created, perhaps streaming parsers can be trusted. If you don't want to risk it, then don't use a streaming parser. (Which means you were right to download the file to disk first.)
To some extent it depends on the structure of the zip archive: if it consists of many moderately sized files, and if they can all be processed independently, then you don't need to have very much of it in memory at any one time. On the other hand, if you try to process many files in parallel then you may run into the limit on the number of file handles that can be open. But you can get around this using something like a queue.
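If you do end up reading entries straight from an archive on disk, a library that walks the central directory instead of scanning from the top (yauzl in Node is one example) lets you stream one entry at a time, so very little of the archive is in memory at once. A rough sketch, assuming a local archive path and your own per-file handler:

    const yauzl = require("yauzl");

    // "data.zip" is a placeholder path; processTsvStream() is a hypothetical
    // handler of yours that returns a writable stream for one TSV entry.
    yauzl.open("data.zip", { lazyEntries: true }, (err, zipfile) => {
      if (err) throw err;
      zipfile.readEntry();
      zipfile.on("entry", entry => {
        if (/\/$/.test(entry.fileName)) {
          zipfile.readEntry();                              // directory entry, skip
          return;
        }
        zipfile.openReadStream(entry, (err, readStream) => {
          if (err) throw err;
          readStream.on("end", () => zipfile.readEntry()); // next entry when done
          readStream.pipe(processTsvStream(entry.fileName));
        });
      });
    });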
You say you have to sort the data by device ID and timestamp. That's another part of the process that can't be streamed. If you need to sort a large list of data, I'd recommend you save it to a database first; that way you can make it as big as your disk will allow, but also structured. You'd have a table where the columns are the columns of the TSV. You can stream from the TSV file into the database, and also index the database by deviceId and timestamp. And by this I mean a single index that uses both of those columns, in that order.
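A sketch of that approach in Node, assuming better-sqlite3 and placeholder file names (any SQL database and driver would do the same job): stream the TSV line by line, insert in batches inside a transaction, and put a single composite index on (deviceId, timestamp):

    const fs = require("fs");
    const readline = require("readline");
    const Database = require("better-sqlite3");   // assumption: SQLite is acceptable

    const db = new Database("pings.db");
    db.exec(`CREATE TABLE IF NOT EXISTS pings (
               deviceId TEXT, timestamp INTEGER, lat REAL, lng REAL);
             CREATE INDEX IF NOT EXISTS idx_device_time
               ON pings (deviceId, timestamp);`);

    const insert = db.prepare(
      "INSERT INTO pings (deviceId, timestamp, lat, lng) VALUES (?, ?, ?, ?)");
    const insertMany = db.transaction(rows => rows.forEach(r => insert.run(...r)));

    const rl = readline.createInterface({
      input: fs.createReadStream("report.tsv"),    // placeholder path
      crlfDelay: Infinity
    });

    let batch = [];
    rl.on("line", line => {
      const [deviceId, timestamp, lat, lng] = line.split("\t");
      if (deviceId === "deviceId") return;         // skip a header row if present
      batch.push([deviceId, Number(timestamp), Number(lat), Number(lng)]);
      if (batch.length >= 10000) { insertMany(batch); batch = []; }
    });
    rl.on("close", () => { if (batch.length) insertMany(batch); });

Once it's loaded, "grouped by deviceId and sorted by timestamp" is just SELECT * FROM pings ORDER BY deviceId, timestamp, and the composite index keeps per-device lookups cheap.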
If you want a distributed infrastructure, maybe you could store different device IDs on different disks with different CPUs etc ("sharding" is the word you want to google). But I don't know whether this will be faster. It would speed up the disk access. But it might create a bottleneck in network connections, through either latency or bandwidth, depending on how interconnected the different device IDs are.
Oh, and if you're going to be running multiple instances of this process in parallel, don't forget to create separate databases, or at the very least add another column to the database to distinguish separate instances.
Does anyone know how I can parse LinkedIn accounts? Or any tool (not paid)?
For example:
I will look for "Software Engineer" from Dallas, TX.
The tool would automatically pick all candidates from LinkedIn, or for example the first 100 candidates, and store their First Name, Last Name, LinkedIn link and Experience in an Excel document (or from a specific company).
Should it be done through an API, or is there a specific account type which allows this? Or does anyone know of tools which will help to do this? Or a script?
I need to parse a large number of candidates, 100+ maybe 1000+, and store them.
I have multiple thoughts about implementation, but I feel that it's 100% already implemented.
https://developer.linkedin.com/docs/rest-api
Use the LinkedIn APIs to fetch data and process it however you would like. I don't know how many 'private' fields you can get access to, but names seem to be there.
I use Node.js to process Excel data - the xlsx package is a very good option, but it only allows synchronous execution, so you would have to spawn another process. It also has a filter function, so you can do whatever you want with it.
The problem that I faced with parsing large data into Excel is that an Excel file is a compressed XML format, so it takes a long time to parse for both reading and writing. A faster option would be to create and read CSV, which Excel can naturally open as well.
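As a rough sketch of the CSV route (the candidates array and the output file name stand in for whatever your scraper or API call actually returns):

    const fs = require("fs");

    // Stream rows straight to a CSV file instead of building an .xlsx
    // workbook in memory; Excel opens the result directly.
    const out = fs.createWriteStream("candidates.csv");
    out.write("First Name,Last Name,LinkedIn Link,Experience\n");

    const csvEscape = value =>
      `"${String(value).replace(/"/g, '""')}"`;    // quote fields, double any quotes

    for (const c of candidates) {                  // `candidates` is hypothetical input
      out.write([c.firstName, c.lastName, c.link, c.experience]
        .map(csvEscape).join(",") + "\n");
    }
    out.end();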
I have an app that indexes thousands of files and stores information about the files and how they relate in JSON format on the user's computer. I'm using JavaScript and IndexedDB. The important points are: the data isn't stored in a central database I control, it must be in JSON format, and there's lots of data.
As I add more features in the future, it's likely I'll want to change the JSON format, e.g. adding new fields, renaming fields, or normalising data that wasn't normalised before.
I haven't released the app yet, and I'm nervous about doing so because 1) if I change the data format, I have to be careful I don't break loading of data in the previous format, and 2) having to account for old data formats will slow down how aggressively I can change the app.
Are there any strategies I can use to lessen the impact file format changes have on my development speed and risk of bugs?
That's why you have to specify a version when you open the database. Then if your schema changes, increment the version and write code in your onupgradeneeded handler to deal with altering the stored data from old versions.
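A minimal sketch of what that looks like (the database name, store name and the field rename are made-up examples):

    // Bumping the version from 1 to 2 triggers onupgradeneeded, which is the
    // only place schema changes and migrations are allowed to run.
    const request = indexedDB.open("file-index", 2);

    request.onupgradeneeded = event => {
      const db = event.target.result;
      if (event.oldVersion < 1) {
        db.createObjectStore("files", { keyPath: "id" });
      }
      if (event.oldVersion < 2) {
        // Example migration: rename a field on every stored record.
        const store = event.target.transaction.objectStore("files");
        store.openCursor().onsuccess = e => {
          const cursor = e.target.result;
          if (!cursor) return;
          const file = cursor.value;
          file.sizeBytes = file.size;              // hypothetical rename
          delete file.size;
          cursor.update(file);
          cursor.continue();
        };
      }
    };

    request.onsuccess = event => {
      const db = event.target.result;
      // normal usage from here on
    };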
What dumbmatter said, but another thing to consider is to store a version field in the object itself. Read this in first, then dynamically determine how to interpret the object's other fields.
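A small sketch of that lazy, per-record approach (field names are hypothetical):

    // Each stored object carries its own format version, so old records can be
    // upgraded on the fly when they are read back in.
    function normalize(record) {
      const version = record.version || 1;
      if (version < 2) {
        record.sizeBytes = record.size;            // hypothetical v1 -> v2 change
        delete record.size;
      }
      // future migrations chain on here: if (version < 3) { ... }
      record.version = 2;
      return record;
    }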
I have a flat data file in the form of XML, but there isn't currently a real Windows viewer for the file. I decided to create a simple application with Node-WebKit, just for basic viewing - the data file won't need to be written to by the application.
My problem is, I don't know the proper way to read a large file. The data file is a backup of phone SMS and MMS messages, and the MMS entries contain Base64 image strings where applicable - so the file gets pretty big with large amounts of images (generally around 250 MB). I didn't create/format the original data in the file, so I can't modify its structure.
So, the question is - assuming I already have a way to parse the XML into JavaScript objects, should I:
a) Parse the entire file when the application is first run, storing an array of objects in memory for the duration of the applications lifetime, or
b) Read through the entire file each time I want to extract a conversation (all of the messages with a specific outgoing or incoming number), and only store that data in memory, or
c) Employ some alternate, more efficient, solution that I don't know about yet.
Convert your XML data into an SQLite db. SQLite is NOT memory based by default. Query the db when you need the data, problem solved :)
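A rough sketch of that conversion in Node, assuming a streaming XML parser such as sax plus better-sqlite3; the element and attribute names depend on your backup format, so treat them as placeholders:

    const fs = require("fs");
    const sax = require("sax");                    // streaming XML parser
    const Database = require("better-sqlite3");

    const db = new Database("messages.db");
    db.exec(`CREATE TABLE IF NOT EXISTS sms (
               address TEXT, date INTEGER, body TEXT);
             CREATE INDEX IF NOT EXISTS idx_sms_address ON sms (address);`);

    const insert = db.prepare(
      "INSERT INTO sms (address, date, body) VALUES (?, ?, ?)");

    // Many SMS backup formats store each message as attributes on an <sms>
    // element; adjust names to match your file.
    const parser = sax.createStream(true);
    parser.on("opentag", node => {
      if (node.name === "sms") {
        insert.run(node.attributes.address,
                   Number(node.attributes.date),
                   node.attributes.body);
      }
    });
    parser.on("end", () => console.log("conversion done"));

    fs.createReadStream("backup.xml").pipe(parser);   // placeholder path

From there the viewer only ever pulls the conversation it needs, e.g. SELECT * FROM sms WHERE address = ? ORDER BY date, instead of holding 250 MB of parsed objects in memory.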
I'm coding a website that involves storing very simple data on the server - just a very long list of names with no additional data. As this data is so simple, I don't really want to use MySQL (it would be a bit too clunky), so I'm asking what's the best way to store very simple data on the server.
I would definitely favour speed over anything else, and easy access to the data via JavaScript and AJAX would be very good too, as the rest of the site is coded in JavaScript/jQuery. I don't really care if the data can be viewed freely (it will be available anyway), as long as it can't be changed by unauthorised users.
There are a lot of things to think about with this.
Is the information the same for all users, with just a single set that applies to everyone? Or is there a separate set of data for each user?
How is the data going to be served to the client? My guess here is that you would have a web service or similar that returns JSON.
From a security standpoint, do you want someone to be able to just "grab" the data and run?
Personally I find that a database is often a better choice, but otherwise I would use an XML file. Keep in mind, though, that you have to be careful with loading/reading XML files when serving web requests, to prevent any potential file-locking issues.
Use an XML file that is web-accessible. Then you can query the XML file from the browser if need be, and still parse/write it in PHP. You'll want to use the flock function in PHP to make sure that two instances of a page don't try to write to the file at the same time.
Write it to a file and save the data as a serialized object. This way, when you read in the data, it's instantly accessible as the variable type you need (array, object, etc.). This will be faster than XML parsing.