I have a strong background in JS frameworks. JavaScript is a single-threaded language, and when code execution hits an async task, the event loop plays the important role there.
Now I've moved into Python/Django and things are going well, but I'm very curious about how Python handles asynchronous tasks. Whenever I need to make a database query in Django, I simply use a queryset method like:
def func(id):
    do_something()
    user = User.objects.get(id=id)
    do_something_with_user(user)

func(some_id)
another_func()
I've used a queryset method that fetches something from the database, and I've called it without an await keyword. What will Python's behaviour be there? Does it block the execution below until the queryset returns, or does it continue executing the rest of the code on another thread?
Also, whenever we need to make an external request from Python, we use a requests library method without an await keyword:
r = requests.get("https://api.jsonplaceholder.com/users")
# will the code below execute only once the response of the request arrives?
do_something()
.
.
.
do_another_thing()
Does Python block the execution of code until the response arrives?
In JS, an async task is handled by the event loop while the rest of the code executes normally:
console.log("first")
fetch("https://api.jsonplaceholder.com/users").then(() => console.log("third"))
console.log("second")
The logs in the console will be:
first
second
third
How does Python handle these kinds of things under the hood?
The await keyword is used inside an asynchronous function when it calls another asynchronous function. Calling that inner function does not run it immediately; it returns a coroutine (it shows up as a coroutine object when debugging). To run, a coroutine must be scheduled on the event loop, and using the
await keyword releases control back to the event loop until the coroutine completes. Instead of awaiting a coroutine, you can also set up a task with the asyncio package to schedule a coroutine to run concurrently. See documentation
asyncio.create_task(coro, *, name=None, context=None)
When used correctly, asyncio does not block execution but rather ensures that each piece of work happens at the right time; it is cooperative scheduling on one thread, not CPU parallelism. Using a method like requests.get() should not create any forthcoming errors, because you are accessing values over the internet and not locally, but note that it is still a blocking call while the response is in flight. Network requests move essentially at the speed of light, whereas database requests on the server are limited by the speed of the SSD or HDD. This was more of an issue on old drives where you are literally waiting for a disk to spin, but properly managing these local resources is still an issue that has to be solved for an asynchronous application to work. When making calls to the SQL database, make sure to use the @database_sync_to_async decorator above the regular 'synchronous' functions that your asynchronous function calls:
from channels.db import database_sync_to_async
For more information, check out the Django documentation on asynchronous support.
Related
I know how node.js executes code asynchronously without blocking the main thread of execution by using the event-loop for scheduling the asynchronous tasks, but I'm not clear on how the main thread actually decides to put aside a piece of code for asynchronous execution.
(Basically what indicates that this piece of code should be executed asynchronously and not synchronously, what are the differentiating factors?)
And also, what are the asynchronous and synchronous APIs provided by node?
There is a mistake in your assumption when you ask:
what indicates that this piece of code should be executed asynchronously and not synchronously
The mistake is thinking that some code is executed asynchronously. This is not true.
Javascript (including node.js) executes all code synchronously. There is no part of your code that is executed asynchronously.
So at first glance that is the answer to your question: there is no such thing as asynchronous code execution.
Wait, what?
But what about all the asynchronous stuff?
Like I said, node.js (and javascript in general) executes all code synchronously. However, javascript is able to wait for certain events asynchronously. There is no asynchronous code execution; however, there is asynchronous waiting.
What's the difference between code execution and waiting?
Let's look at an example. For the sake of clarity I'll use pseudocode in a fake made-up language to remove any confusion from javascript syntax. Let's say we want to read from a file. This fake language supports both synchronous and asynchronous waiting:
Example 1. Synchronously wait for the drive to return bytes from the file
data = readSync('filename.txt');
// the line above will pause the execution of code until all the
// bytes have been read
Example 2. Asynchronously wait for the drive to return bytes from the file
// Since it is asynchronous we don't want the read function to
// pause the execution of code. Therefore we cannot return the
// data. We need a mechanism to accept the returned value.
// However, unlike javascript, this fake language does not support
// first-class functions. You cannot pass functions as arguments
// to other functions. However, like Java and C++ we can pass
// objects to functions
class MyFileReaderHandler inherits FileReaderHandler {
    method callback (data) {
        // process file data in here!
    }
}
myHandler = new MyFileReaderHandler()
asyncRead('filename.txt', myHandler);
// The function above does not wait for the file read to complete
// but instead returns immediately and allows other code to execute.
// At some point in the future when it finishes reading all data
// it will call the myHandler.callback() function passing it the
// bytes from the file.
As you can see, asynchronous I/O is not special to javascript. It has existed long before javascript in C++ and even C libraries dealing with file I/O, network I/O and GUI programming. In fact it has existed even before C. You can do this kind of logic in assembly (and indeed that's how people design operating systems).
What's special about javascript is that due to its functional nature (first-class functions), the syntax for passing some code you want to be executed in the future is simpler:
asyncRead('filename.txt', (data) => {
    // process data in here
});
or in modern javascript can even be made to look like synchronous code:
async function main () {
    const data = await asyncReadPromise('filename.txt');
}
main();
What's the difference between waiting and executing code? Don't you need code to check on events?
Actually, you need exactly 0% CPU time to wait for events. You just need to execute some code to register some interrupt handlers, and the CPU hardware (not software) will call your interrupt handler when the interrupt occurs. And all manner of hardware is designed to trigger interrupts: keyboards, hard disks, network cards, USB devices, PCI devices etc.
Disk and network I/O are even more efficient because they also use DMA. These are hardware memory readers/writers that can copy large chunks (kilobytes/megabytes) of memory from one place (eg. hard disk) to another place (eg. RAM). The CPU actually only needs to set up the DMA controller and then is free to do something else. Once the DMA controller completes the transfer it will trigger an interrupt which will execute some device driver code which will inform the OS that some I/O event has completed which will inform node.js which will execute your callback or fulfill your Promise.
All the above use dedicated hardware instead of executing instructions on the CPU to transfer data. Thus waiting for data takes 0% CPU time.
So the parallelism in node.js has more to do with how many PCIe lanes your CPU supports than how many CPU cores it has.
You can, if you want to, execute javascript asynchronously
Like any other language, modern javascript has multi-threading support in the form of webworkers in the browser and worker_threads in node.js. But this is regular multi-threading like any other language where you deliberately start another thread to execute your code asynchronously.
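For instance, here is a minimal sketch of that kind of deliberate multi-threading in Node.js using the built-in worker_threads module (the fibonacci function is just a stand-in for some CPU-bound work):
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
    // Main thread: start a worker and stay free to handle other events.
    const worker = new Worker(__filename, { workerData: 40 });
    worker.on('message', (result) => console.log('fib(40) =', result));
    console.log('worker started, main thread keeps going');
} else {
    // Worker thread: do the CPU-bound work and post the result back.
    const fib = (n) => (n < 2 ? n : fib(n - 1) + fib(n - 2));
    parentPort.postMessage(fib(workerData));
}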
I've studied sync and async in JavaScript. I'm going to make a crawling program using Puppeteer.
There are many code examples of crawling in Puppeteer.
But, I have one question: Why do they use async in basic Puppeteer example scripts?
Can't I use sync programming in Puppeteer? Is there an issue that I don't know about that makes async necessary?
It doesn't seem useful if I don't use multiple threads (multi-crawling).
For starters, I recommend reading How the single threaded non blocking IO model works in Node.js. This thread motivates the callback and promise-based models Node provides for achieving concurrency.
Whenever the Node process needs to access an out-of-process resource such as the file system or a network socket (as Puppeteer does to communicate with the browser it's connected to), there are two options:
Block the whole process and wait for the response, as fs.readFileSync does.
Use a promise or a callback to be notified of the response and go about other things, as fs.readFile (either via callback or fs.promises) and Puppeteer do.
The first option is a poor choice, with the only advantage being easier syntax to write. Blocking the thread to wait for a resource is like ordering a pizza, then doing nothing until the pizza arrives. You might as well read a book or water your plants while you wait.
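A small sketch of those two options with Node's fs module (the file name is made up):
const fs = require('fs');

// Option 1: block the whole process until the bytes arrive.
const menu = fs.readFileSync('menu.txt', 'utf8');
console.log(menu.length);

// Option 2: ask for the file, then go do something else; the promise
// resolves when the bytes arrive.
fs.promises.readFile('menu.txt', 'utf8')
    .then((contents) => console.log(contents.length));
console.log('reading a book while the file loads...');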
Historically, callbacks were originally the only way to write concurrent code in Node. Eventually, promises and .then() chains arrived, which were better, but still posed readability burdens. With the advent of async/await, it's no longer difficult to write asynchronous code that reads like synchronous code. Synchronous APIs like fs's *Sync functions that alias an asynchronous API are historical artifacts. It's normal that Puppeteer doesn't offer page.waitForSelectorSync, page.$evalSync, etc.
Now, it's understandable to think that Puppeteer's asynchronous API is pointless in a simple, straight-line script since your Node process doesn't have anything else to do while awaiting responses, but having to type await for each call is the least evil of the available design options for the API.
Simply not awaiting promises isn't an option even when a script is a single sequence of straight-line code. Without await, ordering of operations/results becomes nondeterministic as each promise runs concurrently, independent of the others. This interleaving would be unintended in sequential code, but is a useful tool in cases when concurrency is desired.
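To illustrate with a sketch outside of Puppeteer (plain fs.promises, invented file names), dropping await turns a fixed sequence into a race:
const fs = require('fs');

async function ordered() {
    await fs.promises.writeFile('a.txt', 'first');   // finishes before...
    await fs.promises.writeFile('b.txt', 'second');  // ...this one starts
}

async function unordered() {
    fs.promises.writeFile('a.txt', 'first');   // both writes are now in
    fs.promises.writeFile('b.txt', 'second');  // flight concurrently, in no
                                               // guaranteed order, and the
                                               // function returns before
                                               // either has completed
}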
For the authors of an asynchronous API where almost all calls are accessing an external resource, as is the case with Puppeteer, the options are:
Write and maintain two versions of the API, a synchronous and an asynchronous version. No libraries that I know of do this -- it's a major pain with little benefit and plenty of room for misuse.
Write and maintain a synchronous API only to cater to the simple use case at the expense of making the library virtually unusable for anyone that cares about concurrency. Clearly, this is horrible design, like forcing everyone who orders a pizza (in the above real-world example) to do nothing until it arrives.
Write and maintain one asynchronous API, and make clients who don't care about concurrency in a particular program have to write await in front of all the calls. That's what Puppeteer does.
Incidentally, the fact that the browser is in a separate process tends to cause all manner of confusion in Puppeteer beginners. For example, the fact that data is serialized and deserialized (converted to a string) on every call to page.evaluate (and family) means that you can't pass complex structures like DOM nodes across the inter-process gap. You can't access variables you've defined in Node from the body of an evaluate callback without passing them as arguments to the evaluate call, and these variables need to be able to respond correctly to JSON.stringify() (that is, be serializable).
Just 13 hours before this post, someone asked node.js puppeteer "document is not defined" -- they were trying to access the browser process' document object inside of Node.
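Here's a rough sketch of crossing that gap correctly (assuming Puppeteer is installed; the URL and selector are arbitrary):
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com');

    const selector = 'h1';   // lives in the Node process
    const heading = await page.evaluate((sel) => {
        // This body runs in the browser process, where `document` exists but
        // Node variables don't -- `selector` must be passed in as an argument.
        return document.querySelector(sel).textContent;
    }, selector);

    console.log(heading);    // a plain string survives the JSON round trip;
                             // a DOM node would not
    await browser.close();
})();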
If you're on Windows, try running a simple Puppeteer Node script that doesn't close the browser, then look at your task manager. On Linux, you can run ps -a. You'll see that there's a Chromium browser and a Node process. The two processes communicate over a socket, which has much higher latency than intra-process communication and involves the operating system's network stack. Every Puppeteer call provides an opportunity for concurrency that'd be lost if Puppeteer's API was synchronous.
Understanding the inter-process gap is critical to success in Puppeteer because it motivates why the API calls are asynchronous, and helps clarify which code is executing in which process.
async is very important for data fetching/crawling. Imagine this case: you have one .book-container element, but the book data inside it only appears on the UI later, after an API fetch.
const scraperObject = {
    url: 'http://book-store.com',
    scraper(browser) {
        let page = browser.newPage();
        page.goto(this.url);
        page.waitForSelector('.book-container');
        page.waitForSelector('.book');
        // TODO: save book data after this
    }
};
With this code snippet, it will run like this:
page.goto(this.url): go to the page with a certain URL
page.waitForSelector('.book-container'): no await here, so it tries to get the .book-container element immediately (of course, it won't be there yet, because the page may still be loading, for example due to a slow network)
page.waitForSelector('.book'): similarly, it tries to get the book data immediately (even though .book-container is not in the HTML yet)
To solve this problem, we should use async/await to WAIT for the elements to be ready in the HTML:
const scraperObject = {
    url: 'http://book-store.com',
    async scraper(browser) {
        let page = await browser.newPage();
        await page.goto(this.url);
        await page.waitForSelector('.book-container');
        await page.waitForSelector('.book');
        // TODO: save book data after this
    }
};
Explaining it again with async/await:
page.goto(this.url): go to the page with a certain URL and wait till the page has loaded
page.waitForSelector('.book-container'): wait till the .book-container element appears in the HTML
page.waitForSelector('.book'): wait till the .book element appears in the HTML (which tells us the API's data has arrived)
I thought that they were basically the same thing — writing programs that split tasks between processors (on machines that have 2+ processors). Then I'm reading this, which says:
Async methods are intended to be non-blocking operations. An await expression in an async method doesn't block the current thread while the awaited task is running. Instead, the expression signs up the rest of the method as a continuation and returns control to the caller of the async method.
The async and await keywords don't cause additional threads to be created. Async methods don't require multithreading because an async method doesn't run on its own thread. The method runs on the current synchronization context and uses time on the thread only when the method is active. You can use Task.Run to move CPU-bound work to a background thread, but a background thread doesn't help with a process that's just waiting for results to become available.
and I'm wondering whether someone can translate that to English for me. It seems to draw a distinction between asynchronicity (is that a word?) and threading and imply that you can have a program that has asynchronous tasks but no multithreading.
Now I understand the idea of asynchronous tasks such as the example on pg. 467 of Jon Skeet's C# In Depth, Third Edition
async void DisplayWebsiteLength ( object sender, EventArgs e )
{
    label.Text = "Fetching ...";
    using ( HttpClient client = new HttpClient() )
    {
        Task<string> task = client.GetStringAsync("http://csharpindepth.com");
        string text = await task;
        label.Text = text.Length.ToString();
    }
}
The async keyword means "This function, whenever it is called, will not be called in a context in which its completion is required for everything after its call to be called."
In other words, writing it in the middle of some task
int x = 5;
DisplayWebsiteLength();
double y = Math.Pow((double)x,2000.0);
, since DisplayWebsiteLength() has nothing to do with x or y, will cause DisplayWebsiteLength() to be executed "in the background", like
processor 1 | processor 2
-------------------------------------------------------------------
int x = 5; | DisplayWebsiteLength()
double y = Math.Pow((double)x,2000.0); |
Obviously that's a stupid example, but am I correct or am I totally confused or what?
(Also, I'm confused about why sender and e aren't ever used in the body of the above function.)
Your misunderstanding is extremely common. Many people are taught that multithreading and asynchrony are the same thing, but they are not.
An analogy usually helps. You are cooking in a restaurant. An order comes in for eggs and toast.
Synchronous: you cook the eggs, then you cook the toast.
Asynchronous, single threaded: you start the eggs cooking and set a timer. You start the toast cooking, and set a timer. While they are both cooking, you clean the kitchen. When the timers go off you take the eggs off the heat and the toast out of the toaster and serve them.
Asynchronous, multithreaded: you hire two more cooks, one to cook eggs and one to cook toast. Now you have the problem of coordinating the cooks so that they do not conflict with each other in the kitchen when sharing resources. And you have to pay them.
Now does it make sense that multithreading is only one kind of asynchrony? Threading is about workers; asynchrony is about tasks. In multithreaded workflows you assign tasks to workers. In asynchronous single-threaded workflows you have a graph of tasks where some tasks depend on the results of others; as each task completes it invokes the code that schedules the next task that can run, given the results of the just-completed task. But you (hopefully) only need one worker to perform all the tasks, not one worker per task.
It will help to realize that many tasks are not processor-bound. For processor-bound tasks it makes sense to hire as many workers (threads) as there are processors, assign one task to each worker, assign one processor to each worker, and have each processor do the job of nothing else but computing the result as quickly as possible. But for tasks that are not waiting on a processor, you don't need to assign a worker at all. You just wait for the message to arrive that the result is available and do something else while you're waiting. When that message arrives then you can schedule the continuation of the completed task as the next thing on your to-do list to check off.
So let's look at Jon's example in more detail. What happens?
Someone invokes DisplayWebSiteLength. Who? We don't care.
It sets a label, creates a client, and asks the client to fetch something. The client returns an object representing the task of fetching something. That task is in progress.
Is it in progress on another thread? Probably not. Read Stephen's article on why there is no thread.
Now we await the task. What happens? We check to see if the task has completed between the time we created it and we awaited it. If yes, then we fetch the result and keep running. Let's suppose it has not completed. We sign up the remainder of this method as the continuation of that task and return.
Now control has returned to the caller. What does it do? Whatever it wants.
Now suppose the task completes. How did it do that? Maybe it was running on another thread, or maybe the caller that we just returned to allowed it to run to completion on the current thread. Regardless, we now have a completed task.
The completed task asks the correct thread -- again, likely the only thread -- to run the continuation of the task.
Control passes immediately back into the method we just left at the point of the await. Now there is a result available so we can assign text and run the rest of the method.
It's just like in my analogy. Someone asks you for a document. You send away in the mail for the document, and keep on doing other work. When it arrives in the mail you are signalled, and when you feel like it, you do the rest of the workflow -- open the envelope, pay the delivery fees, whatever. You don't need to hire another worker to do all that for you.
In-browser Javascript is a great example of an asynchronous program that has no multithreading.
You don't have to worry about multiple pieces of code touching the same objects at the same time: each function will finish running before any other javascript is allowed to run on the page. (Update: Since this was written, JavaScript has added async functions and generator functions. These functions do not always run to completion before any other javascript is executed: whenever they reach a yield or await keyword, they yield execution to other javascript, and can continue execution later, similar to C#'s async methods.)
However, when doing something like an AJAX request, no code is running at all, so other javascript can respond to things like click events until that request comes back and invokes the callback associated with it. If one of these other event handlers is still running when the AJAX request gets back, its handler won't be called until they're done. There's only one JavaScript "thread" running, even though it's possible for you to effectively pause the thing you were doing until you have the information you need.
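As a browser-side sketch (the button id is hypothetical): while the request is pending, no JavaScript runs for it, so other handlers stay responsive, and the fetch callback simply queues behind whatever handler happens to be running when the response arrives.
document.querySelector('#refresh').addEventListener('click', () => {
    console.log('click handled while the request is still pending');
});

fetch('https://api.jsonplaceholder.com/users')
    .then((response) => response.json())
    .then((users) => {
        // Queued on the event loop; if a click handler is mid-run when the
        // response arrives, this runs only after that handler returns.
        console.log(users.length, 'users');
    });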
In C# applications, the same thing happens any time you're dealing with UI elements--you're only allowed to interact with UI elements when you're on the UI thread. If the user clicked a button, and you wanted to respond by reading a large file from the disk, an inexperienced programmer might make the mistake of reading the file within the click event handler itself, which would cause the application to "freeze" until the file finished loading because it's not allowed to respond to any more clicking, hovering, or any other UI-related events until that thread is freed.
One option programmers might use to avoid this problem is to create a new thread to load the file, and then tell that thread's code that when the file is loaded it needs to run the remaining code on the UI thread again so it can update UI elements based on what it found in the file. Until recently, this approach was very popular because it was what the C# libraries and language made easy, but it's fundamentally more complicated than it has to be.
If you think about what the CPU is doing when it reads a file at the level of the hardware and Operating System, it's basically issuing an instruction to read pieces of data from the disk into memory, and to hit the operating system with an "interrupt" when the read is complete. In other words, reading from disk (or any I/O really) is an inherently asynchronous operation. The concept of a thread waiting for that I/O to complete is an abstraction that the library developers created to make it easier to program against. It's not necessary.
Now, most I/O operations in .NET have a corresponding ...Async() method you can invoke, which returns a Task almost immediately. You can add callbacks to this Task to specify code that you want to have run when the asynchronous operation completes. You can also specify which thread you want that code to run on, and you can provide a token which the asynchronous operation can check from time to time to see if you decided to cancel the asynchronous task, giving it the opportunity to stop its work quickly and gracefully.
Until the async/await keywords were added, C# was much more obvious about how callback code gets invoked, because those callbacks were in the form of delegates that you associated with the task. In order to still give you the benefit of using the ...Async() operation, while avoiding complexity in code, async/await abstracts away the creation of those delegates. But they're still there in the compiled code.
So you can have your UI event handler await an I/O operation, freeing up the UI thread to do other things, and more-or-less automatically returning to the UI thread once you've finished reading the file--without ever having to create a new thread.
Native javascript is single threaded and synchronous. Only a few objects can run asynchronously and get added to the callback queue such as HTTP requests, timers and events. These asynchronous objects are a result of the actual javascript environment and not javascript itself. setTimeout() seems to be the go to for examples for asynchronous code. That function gets moved to the web API container and then eventually the callback queue. There doesn't seem to be a way to write asynchronous code in javascript that doesn't involve using objects that get moved to the Web API container. I can write my own custom objects with callbacks, but the most that will do is structure it to run in the proper order. You can never write javascript code that runs in parallel that doesn't rely on those objects.
This is my understanding of how it works, if I am wrong please correct me.
setTimeout and setInterval are just a convenient way to trigger some asynchronous behaviour for an example. They are available in every common JavaScript implementation, not just in the browser environment.
But all other sources of async code depend on some external process. When making an HTTP request, your javascript thread tells the browser to make a request (what headers, what url, etc). The browser, according to its own compiled internals, then formats the request, sends it, waits for a response, and eventually adds an item to javascript's event loop to be processed next time the event loop runs. File system access and database queries are two other common examples of async code that depends on external processes (the OS, and a database, respectively)
How javascript handles async code in a single threaded process is all down to this event loop. This event loop, in pseudocode, is basically this:
while (queue.waitForMessage()) {
    queue.processNextMessage();
}
setTimeout tells the environment to pop something into that queue at some point in the future. But the processing of that queue is single threaded. Only one event message can be handled at a time, but any number can be added to that queue.
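A tiny sketch of that queue in action: even a zero-millisecond timer only runs after the currently executing script finishes and the loop picks the message up.
console.log('start');

setTimeout(() => {
    console.log('timer callback');   // queued; handled on a later loop turn
}, 0);

console.log('end');
// logs: start, end, timer callback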
You can get true concurrency with workers, but this basically adds a new javascript process that is itself single threaded, and has a method of communicating with messages to and from the main javascript process.
When using the Node.js *Sync functions (I understand you shouldn't, and the reasons why), what (if any) background processes continue to run?
For example, if I'm using http.createServer, and from one of the requests I call fs.writeFileSync(), will the server continue to serve new clients whilst that write is in progress (not just accept the connection, but process the entire request)? I.e. would writeFileSync() block the entire process, or just the current call chain?
Basically, the effect of using any *Sync function is that it blocks the process just as long-running JavaScript does. While the process is waiting for a *Sync call to complete, it just sits and does nothing. If you use the asynchronous counterparts of those functions, the process can handle other pending events, such as new http requests, while it waits for them to complete.
If you have an http server running and you call a *Sync function, all new http requests that arrive while the *Sync function is running will be queued and handled only after your current JavaScript and the *Sync function have finished executing.
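A rough sketch of the difference inside an http server (the file path and payload are made up):
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
    // Blocking version: no other request is processed until this write ends.
    fs.writeFileSync('/tmp/log.txt', 'hit\n');

    // Non-blocking alternative: other requests can be handled while the
    // write is in progress.
    // fs.writeFile('/tmp/log.txt', 'hit\n', (err) => { if (err) throw err; });

    res.end('ok');
}).listen(3000);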
Here's a quote from node documentation:
In busy processes, the programmer is strongly encouraged to use the asynchronous versions of these calls. The synchronous versions will block the entire process until they complete--halting all connections.