

JAVA IO and NIO understanding

Sep 14, 2018 am 09:23 AM

Thanks to Netty, I picked up some knowledge about asynchronous IO. NIO in Java is a supplement to the original IO. This article records the underlying implementation principles of IO in Java and introduces the zerocopy technique.

IO essentially means moving data into and out of buffers. For example, when a user program initiates a read operation, the resulting "syscall read" system call moves data into a buffer; when it initiates a write operation, the resulting "syscall write" system call moves data out of a buffer (sending it to the network or writing it to a disk file).

The process above looks simple, but how the operating system implements it underneath is quite involved. Precisely because of the different implementations, there is an ordinary way to transfer files (let's call it ordinary IO for now), and there are also techniques for large file transfers or bulk data transfers, such as zerocopy.

The flow of the entire IO process is as follows:

1) The programmer writes code to create a buffer (this is a user buffer), then calls the read() method in a while loop to read data, triggering the "syscall read" system call:

byte[] b = new byte[4096];
int read;
long total = 0;
while ((read = inputStream.read(b)) >= 0) {
    total += read;
    // other code...
}


2) When the read() method executes, a lot is actually happening underneath:

① The kernel sends a command to the disk controller: read the data from certain disk blocks. (The kernel issues a command to the disk controller hardware to fetch the data from disk.)

② Under DMA control, the data on disk is read into the kernel buffer. (The disk controller writes the data directly into a kernel memory buffer by DMA.)

③ The kernel copies the data from the kernel buffer to the user buffer. (The kernel copies the data from the temporary buffer in kernel space into the user-space buffer.)

The user buffer here is the byte[] array we allocated with new in our code.

What can be analyzed from the above steps?

• For the operating system, the JVM is just a user process living in user-mode space. Processes in user space cannot operate the underlying hardware directly, and IO operations require access to hardware such as disks. Therefore IO operations must be completed with the kernel's help (via interrupts and traps), which means a switch from user mode to kernel mode.

• When we allocate a byte[] array in code, we usually pick an arbitrary size at will: new byte[128], new byte[1024], new byte[4096]...

For disk reads, however, the disk is not accessed in arbitrary sizes: each access reads one or several whole disk blocks (disk access is expensive, and we also trust the principle of locality). Hence the need for an "intermediate buffer", the kernel buffer: data is first read from disk into the kernel buffer, then moved from the kernel buffer to the user buffer.

This is why the first read operation often feels slow while subsequent reads are fast: for subsequent reads, the data is likely already in the kernel buffer, so it only needs to be copied from the kernel buffer to the user buffer, with no underlying disk read involved. Of course that is fast.

The kernel tries to cache and/or prefetch data, so the data being requested by the process may already be available in kernel space. 
If so, the data requested by the process is copied out.  
If the data isn’t available, the process is suspended while the kernel goes about bringing the data into memory.


If the data is not available, the process is suspended and must wait for the kernel to fetch the data from disk into the kernel buffer.

One might then ask: why doesn't DMA read the disk data directly into the user buffer? On one hand, the kernel buffer mentioned above serves as an intermediate buffer to reconcile the "arbitrary size" of the user buffer with the fixed size of each disk-block read. On the other hand, the user buffer lives in user-mode space, while DMA reads involve the underlying hardware, and hardware generally cannot directly access user-mode space (probably a restriction imposed by the OS).

In summary, because DMA cannot access user space (the user buffer) directly, ordinary IO operations must move data back and forth between the user buffer and the kernel buffer, which limits IO speed in some programs. Is there a solution?

Yes: direct memory-mapped IO, which is the memory-mapped file mentioned in Java NIO (related also to direct memory; these express similar ideas). The kernel-space buffer and the user-space buffer are mapped onto the same region of physical memory.

Its main features are as follows:

① There is no need to issue read or write system calls to operate on the file: the user process sees the file data as memory, so there is no need to issue read() or write() system calls.

② When the user process accesses a "memory-mapped file" address, a page fault is generated automatically, and the underlying OS takes care of loading the data from disk into memory. (On paged memory management, see: Some understandings of memory allocation and memory management.)

As the user process touches the mapped memory space, page faults will be generated automatically to bring in the file data from disk.  
If the user modifies the mapped memory space, the affected page is automatically marked as dirty and will be subsequently  
flushed to disk to update the file.

This is the memory-mapped buffer (MappedByteBuffer) mentioned in Java NIO, which is similar to Java NIO's direct buffer. A MappedByteBuffer can be created via the map() method of java.nio.channels.FileChannel.

Operating on a file through a memory-mapped buffer is much faster than reading it through ordinary IO, and even faster than operating on the file through a FileChannel, because there are no explicit system calls (read, write), and the OS also automatically caches some file pages (memory pages).
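A minimal sketch of creating a MappedByteBuffer via FileChannel.map(), as described above. The temp-file name and demo payload are invented for illustration; once the file is mapped, reading bytes from the buffer triggers no read() system calls — page faults bring the data in.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedReadDemo {
    public static void main(String[] args) throws IOException {
        // Create a small temp file to map (demo data, hypothetical content).
        Path p = Files.createTempFile("mmap-demo", ".bin");
        Files.write(p, "hello mmap".getBytes(StandardCharsets.UTF_8));

        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            // map() asks the OS to map the file into memory; touching the
            // buffer later causes page faults instead of read() syscalls.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            System.out.println(new String(bytes, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```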

Introduction to zerocopy technology

Having seen the underlying implementation of IO above, understanding zerocopy is easy. IBM has an article titled "Efficient data transfer through zero copy" that covers it thoroughly; it is very good, and the notes below record my own understanding based on it.

The goal of zerocopy is to improve the performance of IO-intensive Java applications. As described earlier in this article, IO operations require data to be copied frequently between the kernel buffer and the user buffer; zerocopy reduces the number of these copies and also reduces the number of context switches (between user mode and kernel mode).

For example, one operation most web applications perform is: accept a user request -> read data from local disk -> data enters the kernel buffer -> is copied to the user buffer -> is copied back to the kernel (socket) buffer -> is sent over the socket.

Each copy of data between the kernel buffer and the user buffer consumes CPU cycles and memory bandwidth; zerocopy effectively reduces the number of these copies.

Each time data traverses the user-kernel boundary, it must be copied, which consumes CPU cycles and memory bandwidth.
Fortunately, you can eliminate these copies through a technique called, appropriately enough, zero copy.

So how does it do that?

We know that the JVM gives the Java language cross-platform consistency by hiding the implementation details of the underlying operating system, so Java code cannot easily use OS-specific tricks directly.

Implementing zerocopy requires, first, operating-system support, and second, corresponding interfaces in the JDK class library. Fortunately, since JDK 1.4 the JDK has supported NIO: the transferTo() method of java.nio.channels.FileChannel can transfer bytes directly to a writable channel (WritableByteChannel), without moving the bytes through user-program space (the user buffer).

You can use the transferTo() method to transfer bytes directly from the channel on which it is invoked to
another writable byte channel, without requiring data to flow through the application.

Let's analyze in detail what a classic web server (say, a file server) does: read a file from disk and send it over the network (a socket) to the client.

File.read(fileDesc, buf, len);
Socket.send(socket, buf, len);
From the code it is just two steps: first, read the file into buf; second, send the data in buf through the socket. However, these two steps require four context switches (between user mode and kernel mode) and four copy operations to complete.
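In Java, the two-step pattern above might look like the following sketch. The file name is invented, and a ByteArrayOutputStream stands in for the socket's output stream so the example is self-contained; with a real socket, each read()/write() pair crosses the user-kernel boundary exactly as described.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TraditionalCopyDemo {
    // Each iteration does a read() (kernel buffer -> user buffer)
    // followed by a write() (user buffer -> kernel/socket buffer).
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096]; // the user-space buffer
        int n;
        while ((n = in.read(buf)) >= 0) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("copy-demo", ".txt");
        Files.write(p, "payload".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the socket stream
        try (InputStream in = Files.newInputStream(p)) {
            copy(in, sink);
        } finally {
            Files.deleteIfExists(p);
        }
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```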

① The first context switch happens when read() executes: the server is about to read the file from disk, which triggers a sys_read() system call. We switch from user mode to kernel mode, and the action completed is: DMA reads the data from disk into the kernel buffer (this is also the first copy).

② The second context switch happens when read() returns (which also shows that read() is a blocking call): the data has been successfully read from disk into the kernel buffer. We return from kernel mode to user mode, and the action completed is: copying the data from the kernel buffer to the user buffer (the second copy).

③ The third context switch happens when send() executes: the server is about to send the data out. We switch from user mode to kernel mode, and the action completed is: copying the data from the user buffer to the kernel (socket) buffer (the third copy).

④ The fourth context switch happens when send() returns. (Here send() can return asynchronously: the thread returns from send() immediately after calling it, and the remaining copying and sending are left to the underlying operating system.) We return from kernel mode to user mode, and the action completed is: moving the data from the kernel buffer to the protocol engine (the fourth copy).

I don't fully understand the "protocol engine" here, but from the diagrams in the article it appears to be the NIC (Network Interface Card) buffer.

The following passage is well worth reading; it explains once more why the kernel buffer is needed.

Use of the intermediate kernel buffer (rather than a direct transfer of the data
into the user buffer) might seem inefficient. But intermediate kernel buffers were
introduced into the process to improve performance. Using the intermediate
buffer on the read side allows the kernel buffer to act as a "readahead cache"
when the application hasn't asked for as much data as the kernel buffer holds.
This significantly improves performance when the requested data amount is less
than the kernel buffer size. The intermediate buffer on the write side allows the write to complete asynchronously.
A core point is that the kernel buffer improves performance. Huh, isn't that strange? Earlier we said that precisely because the kernel buffer (intermediate buffer) was introduced, data gets copied back and forth, which reduces efficiency.

Let’s first take a look at why it says that the kernel buffer improves performance.

For read operations, the kernel buffer acts as a "readahead cache": when the user program only reads a small amount at a time, the operating system first reads a large block from disk into the kernel buffer, and the user program takes away only a small part (it may allocate just new byte[128]). On the next read, the data can be fetched directly from the kernel buffer, and the operating system does not need to touch the disk again, because the data the user wants is already in the kernel buffer. This is the reason mentioned earlier why subsequent read() calls are noticeably faster than the first one. From this perspective, the kernel buffer really does improve the performance of read operations.

Now look at the write side: writes can complete "asynchronously". When the user program calls write(dest[]), it tells the operating system to write the contents of the dest[] array to file XX, and write() returns. The operating system then quietly copies the user buffer (dest[]) into the kernel buffer in the background and later writes the kernel buffer's data to disk. As long as the kernel buffer is not full, the user's write returns quickly. This is essentially the asynchronous flush-to-disk strategy.
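A small sketch of that asynchronous-flush behavior, with an invented temp file: write() returns once the bytes are in the kernel's page cache, and FileChannel.force() is how an application blocks until the data actually reaches the device when it needs durability.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AsyncFlushDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("flush-demo", ".txt");
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            // write() returns once the bytes sit in the kernel buffer (page cache);
            // the OS flushes them to disk later, asynchronously.
            ch.write(ByteBuffer.wrap("durable?".getBytes(StandardCharsets.UTF_8)));
            // force(true) blocks until data and metadata reach the device --
            // opting out of the asynchronous flush when durability matters.
            ch.force(true);
        }
        System.out.println(new String(Files.readAllBytes(p), StandardCharsets.UTF_8));
        Files.deleteIfExists(p);
    }
}
```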

(In fact, seen this way, the once-tangled question of the difference between synchronous IO, asynchronous IO, blocking IO, and non-blocking IO matters less: these concepts are just different perspectives on the same problem. Blocking vs. non-blocking is about the thread itself; synchronous vs. asynchronous is about the thread and the external events that affect it.) [For a more thorough and incisive explanation, see this series: Inter-System Communication (3) - IO Communication Models and Java Practice, Part 1]

Since the kernel buffer is so powerful and perfect, why do we still need zerocopy?

Unfortunately, this approach itself can become a performance bottleneck if the size of the data requested
is considerably larger than the kernel buffer size. The data gets copied multiple times among the disk, kernel buffer,
and user buffer before it is finally delivered to the application.
Zero copy improves performance by eliminating these redundant data copies.
Now zerocopy finally makes its debut. When the data to be transferred is much larger than the kernel buffer, the kernel buffer becomes a bottleneck. This is why zerocopy suits large file transfers. Why does it become a bottleneck? I think a big reason is that it can no longer play its role as a "buffer" when the volume of data transferred is too large.

Let’s take a look at how zerocopy technology handles file transfer.

When transferTo() is called, we switch from user mode to kernel mode. The actions completed are: DMA reads the data from disk into the read buffer (the first data copy); then, still within kernel space, the data is copied from the read buffer to the socket buffer (the second data copy); finally the data is copied from the socket buffer to the NIC buffer (the third data copy). Then we return from kernel mode to user mode.

The whole process involves only three data copies and two context switches. It may feel like only one copy was saved, but the user-space buffer is no longer involved at all.

Of these three copies, only one (the second, from the read buffer to the socket buffer) requires CPU intervention, whereas the traditional approach above needed four copies, including the two CPU copies across the user-kernel boundary.

This is an improvement: we've reduced the number of context switches from four to two and reduced the number of data copies
from four to three (only one of which involves the CPU)
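The transferTo() path can be sketched as follows. The temp-file name is invented, and a channel wrapped around a ByteArrayOutputStream stands in for a SocketChannel so the example is self-contained; note that the kernel-level zero-copy fast path (e.g. sendfile on Linux) only applies when the target is a real socket or file channel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TransferToDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("transfer-demo", ".bin");
        Files.write(p, "zero copy".getBytes(StandardCharsets.UTF_8));

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // In a real server this would be a SocketChannel.
        WritableByteChannel target = Channels.newChannel(sink);

        try (FileChannel src = FileChannel.open(p, StandardOpenOption.READ)) {
            long pos = 0, size = src.size();
            // transferTo() may move fewer bytes than requested, so loop;
            // no byte ever passes through a user-space buffer here.
            while (pos < size) {
                pos += src.transferTo(pos, size - pos, target);
            }
        } finally {
            Files.deleteIfExists(p);
        }
        System.out.println(new String(sink.toByteArray(), StandardCharsets.UTF_8));
    }
}
```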

If zerocopy could only accomplish this much, it would be just so-so.

We can further reduce the data duplication done by the kernel if the underlying network interface card supports
gather operations. In Linux kernels 2.4 and later, the socket buffer descriptor was modified to accommodate this requirement.
This approach not only reduces multiple context switches but also eliminates the duplicated data copies that
require CPU involvement.
In other words, if the underlying network hardware and operating system support it, the number of data copies and CPU interventions can be reduced further.

There are then only two copies and two context switches, and both copies are DMA copies requiring no CPU intervention (more rigorously: almost none).

The entire process is as follows:

The user program executes transferTo(), causing a system call and a switch from user mode to kernel mode. The action completed is: DMA copies the data from disk to the read buffer.

A descriptor marks the address and length of the data to transfer, and DMA moves the data directly from the read buffer to the NIC buffer; the copy requires no CPU intervention.


