How JavaScript works: under the hood of the V8 engine

Today we'll look under the hood of the V8 JavaScript engine and figure out how exactly JavaScript is executed.

In the previous article we learned how the browser is structured and got a high-level overview of Chrome. Let's recap a bit so we're ready to dive in here.

Background

Web standards are a set of rules that the browser implements. They define and describe aspects of the World Wide Web.

W3C is an international community that develops open standards for the web. They make sure that everyone follows the same guidelines and doesn't have to support dozens of completely different environments.

A modern browser is quite a complicated piece of software, with a codebase of around ten million lines of code. So it's split into a lot of modules responsible for different parts of the logic.

And two of the most important parts of a browser are the JavaScript engine and the rendering engine.

Blink is a rendering engine responsible for the whole rendering pipeline, including DOM trees, styles, events, and V8 integration. It parses the DOM tree, resolves styles, and determines the visual geometry of all the elements.

While continuously monitoring dynamic changes via animation frames, Blink paints the content on the screen. The JS engine is a big part of the browser, but we haven't gotten into those details yet.

JavaScript Engine 101

A JavaScript engine executes and compiles JavaScript into native machine code. Every major browser has developed its own JS engine: Google's Chrome uses V8, Safari uses JavaScriptCore, and Firefox uses SpiderMonkey.

We'll work with V8 in particular because of its use in Node.js and Electron, but other engines are built in a similar way.

Each step will include a link to the code responsible for it, so you can get familiar with the codebase and continue your research beyond this article.

We'll work with a mirror of V8 on GitHub, since it provides a convenient and well-known UI for navigating the codebase.

Preparing the source code

First, V8 needs to download the source code. This can happen via the network, a cache, or service workers.

Once the code is received, we need to change it in a way that the compiler can understand. This process is called parsing, and it consists of two parts: the scanner and the parser itself.

The scanner takes the JS file and converts it to a list of known tokens. There's a list of all the JS tokens in the keywords.txt file.
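
As a rough illustration of what the scanner produces, here's a toy tokenizer in JavaScript. This is not V8's real scanner, just a sketch with simplified token categories:

```javascript
// A toy scanner sketch: splits a tiny JS snippet into known tokens.
// Illustrative only; V8's real scanner handles far more cases.
const KEYWORDS = new Set(['function', 'let', 'return']);

function tokenize(source) {
  const raw = source.match(/[A-Za-z_$][\w$]*|\d+|[(){};=]/g) || [];
  return raw.map((text) => {
    if (KEYWORDS.has(text)) return { type: 'KEYWORD', text };
    if (/^\d+$/.test(text)) return { type: 'NUMBER', text };
    if (/^[A-Za-z_$]/.test(text)) return { type: 'IDENTIFIER', text };
    return { type: 'PUNCTUATOR', text };
  });
}

console.log(tokenize('let bar = 1;').map((t) => t.type).join(' '));
// KEYWORD IDENTIFIER PUNCTUATOR NUMBER PUNCTUATOR
```

Each token here carries just a type and its text; the real scanner also records source positions so later stages can report errors and support debugging.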

The parser picks the tokens up and creates an abstract syntax tree (AST): a tree representation of the source code. Each node of the tree denotes a construct occurring in the code.

Let's have a look at a simple example:

```
function foo() {
  let bar = 1;
  return bar;
}
```

This code will produce the following tree structure:

You can execute this code by doing a preorder traversal (root, left, right):

  1. Declare the foo function.
  2. Declare the bar variable.
  3. Assign 1 to bar.
  4. Return bar from the function.
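
To make the tree concrete, here's what a simplified AST for that function might look like as plain data, together with the preorder walk described above. The node names loosely follow the ESTree convention; V8's internal AST is different:

```javascript
// A simplified, hand-written AST for: function foo() { let bar = 1; return bar; }
// Node names loosely follow the ESTree convention; V8's internal AST differs.
const ast = {
  type: 'FunctionDeclaration',
  id: { type: 'Identifier', name: 'foo' },
  body: [
    {
      type: 'VariableDeclaration',
      kind: 'let',
      id: { type: 'Identifier', name: 'bar' },
      init: { type: 'NumericLiteral', value: 1 },
    },
    {
      type: 'ReturnStatement',
      argument: { type: 'Identifier', name: 'bar' },
    },
  ],
};

// A preorder walk over statement nodes: visit the root first, then its children in order.
function preorder(node, visit) {
  visit(node);
  for (const child of node.body || []) preorder(child, visit);
}

const order = [];
preorder(ast, (n) => order.push(n.type));
console.log(order.join(' -> '));
// FunctionDeclaration -> VariableDeclaration -> ReturnStatement
```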

You'll also see VariableProxy, an element that connects the abstract variable to a place in memory. The process of resolving VariableProxy is called scope analysis.

In our case, the result of the process is all the VariableProxys pointing to the same bar variable.

The Just-in-Time (JIT) paradigm

To execute code, generally, the programming language needs to be transformed into machine code. There are several approaches to how and when this transformation can happen.

The most common way of transforming the code is ahead-of-time compilation. It works exactly as it sounds: the code is transformed into machine code before the execution of the program, during the compilation stage.

This approach is used by many programming languages such as C++, Java, and others.

On the other side of the table, we have interpretation: each line of the code will be executed at runtime. This approach is usually taken by dynamically typed languages like JavaScript and Python because it’s impossible to know the exact type before execution.

Because ahead-of-time compilation can assess all the code together, it can provide better optimization and eventually produce more performant code. Interpretation, on the other hand, is simpler to implement, but it’s usually slower than the compiled option.

To transform the code faster and more effectively for dynamic languages, a new approach was created called Just-in-Time (JIT) compilation. It combines the best from interpretation and compilation.

While using interpretation as a base method, V8 can detect functions that are used more frequently than others and compile them using type information from previous executions.

However, there is a chance that the type might change. We need to de-optimize the compiled code and fall back to interpretation instead (after that, we can recompile the function once we get new type feedback).
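
That interpret, optimize, deoptimize cycle can be sketched as a toy state machine in JavaScript. The call-count threshold and tier names here are invented for illustration; V8's real heuristics are much more sophisticated:

```javascript
// A toy sketch of the JIT lifecycle: interpret, optimize when hot,
// and deoptimize when a type assumption breaks. Purely illustrative.
const HOT_THRESHOLD = 3; // invented number; V8 uses its own heuristics

function makeJitFunction(body) {
  let calls = 0;
  let optimizedFor = null; // the type we speculated on, if any

  return function (arg) {
    calls += 1;
    if (optimizedFor !== null && typeof arg !== optimizedFor) {
      optimizedFor = null; // deoptimize: the assumption broke
      calls = 0;           // start gathering fresh type feedback
    }
    if (optimizedFor === null && calls >= HOT_THRESHOLD) {
      optimizedFor = typeof arg; // "compile" using the observed type
    }
    return { result: body(arg), tier: optimizedFor ? 'optimized' : 'interpreted' };
  };
}

const double = makeJitFunction((x) => x + x);
console.log(double(1).tier);   // interpreted
console.log(double(2).tier);   // interpreted
console.log(double(3).tier);   // optimized (the function became hot)
console.log(double('a').tier); // interpreted (deoptimized on a string)
```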

Let's explore each part of JIT compilation in more detail.

Interpreter

V8 uses an interpreter called Ignition. Initially, it takes an abstract syntax tree and generates byte code.

Byte code instructions also have metadata, such as source line positions for future debugging. Generally, byte code instructions match the JS abstractions.

Now let's take our example and generate byte code for it manually:

```
LdaSmi #1 // load the small integer 1 into the accumulator
Star r0   // store the accumulator to register r0 (bar)
Ldar r0   // load register r0 (bar) into the accumulator
Return    // return the accumulator
```

Ignition has something called an accumulator — a place where you can store/read values.

The accumulator avoids the need for pushing and popping the top of the stack. It’s also an implicit argument for many byte codes and typically holds the result of the operation. Return implicitly returns the accumulator.
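
The four instructions above can be modeled as a tiny accumulator machine. This is a hand-rolled sketch, not Ignition's actual dispatch mechanism:

```javascript
// A minimal accumulator machine for the four byte codes above.
// Illustrative only; Ignition's real handlers are generated low-level code.
function execute(bytecode) {
  let accumulator;
  const registers = [];

  for (const [op, operand] of bytecode) {
    switch (op) {
      case 'LdaSmi': accumulator = operand; break;            // load a small integer
      case 'Star':   registers[operand] = accumulator; break; // store acc -> register
      case 'Ldar':   accumulator = registers[operand]; break; // load register -> acc
      case 'Return': return accumulator;                      // implicitly returns acc
      default: throw new Error(`Unknown byte code: ${op}`);
    }
  }
}

// The byte code for our example: let bar = 1; return bar;
const result = execute([
  ['LdaSmi', 1],
  ['Star', 0],
  ['Ldar', 0],
  ['Return'],
]);
console.log(result); // 1
```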

You can check out all the available byte code in the corresponding source code. If you’re interested in how other JS concepts (like loops and async/await) are presented in byte code, I find it useful to read through these test expectations.

Execution

After the generation, Ignition will interpret the instructions using a table of handlers keyed by the byte code. For each byte code, Ignition can look up corresponding handler functions and execute them with the provided arguments.

As we mentioned before, the execution stage also provides the type feedback about the code. Let’s figure out how it’s collected and managed.

First, we should discuss how JavaScript objects can be represented in memory. In a naive approach, we can create a dictionary for each object and link it to the memory.

However, we usually have a lot of objects with the same structure, so it would not be efficient to store lots of duplicated dictionaries.

To solve this issue, V8 separates the object's structure from the values themselves, using Object Shapes (or Maps, internally) and a vector of values in memory.

For example, we create an object literal:

```
let c = { x: 3 }
let d = { x: 5 }
c.y = 4
```

In the first line, it will produce a shape Map[c] that has the property x with an offset 0.

In the second line, V8 will reuse the same shape for a new variable.

After the third line, it will create a new shape Map[c1] for the property y with an offset of 1 and link it to the previous shape Map[c].

In the example above, you can see that each object can have a link to the object shape where for each property name, V8 can find an offset for the value in memory.

Object shapes are essentially linked lists. So if you access c.x, V8 goes to the head of the list, finds y there, moves to the connected shape, finally finds x, and reads its offset. Then it goes to the memory vector and returns the first element from it.
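
Here's that shape chain and lookup as a toy model in JavaScript. The structure is invented for illustration; V8's Maps are internal C++ objects:

```javascript
// A toy model of object shapes: each shape records one property's offset
// and links back to its parent shape. Illustrative only.
function createShape(parent, property, offset) {
  return { parent, property, offset };
}

const shapeC  = createShape(null, 'x', 0);   // Map[c]:  x at offset 0
const shapeC1 = createShape(shapeC, 'y', 1); // Map[c1]: y at offset 1, links to Map[c]

// Property lookup walks the shape list from the head until it finds the name,
// then uses the recorded offset to index into the object's vector of values.
function lookup(shape, values, name) {
  for (let s = shape; s !== null; s = s.parent) {
    if (s.property === name) return values[s.offset];
  }
  return undefined;
}

const cValues = [3, 4]; // c = { x: 3 }; c.y = 4
console.log(lookup(shapeC1, cValues, 'x')); // 3 (walks past y, finds x at offset 0)
console.log(lookup(shapeC1, cValues, 'y')); // 4
```

The linear walk in lookup is exactly why long shape chains make property access expensive, which is the problem inline caches solve.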

As you can imagine, in a big web app you’ll see a huge number of connected shapes. At the same time, it takes linear time to search through the linked list, making property lookups a really expensive operation.

To solve this problem, V8 uses the Inline Cache (IC). It memorizes information on where to find properties on objects to reduce the number of lookups.

You can think about it as a listening site in your code: it tracks all CALL, STORE, and LOAD events within a function and records all shapes passing by.

The data structure for keeping IC is called Feedback Vector. It’s just an array to keep all ICs for the function.

function load(a) { return a.key; }

For the function above, the feedback vector will look like this:

[{ slot: 0, icType: LOAD, value: UNINIT }]

It’s a simple function with only one IC that has a type of LOAD and value of UNINIT. This means it’s uninitialized, and we don’t know what will happen next.

Let’s call this function with different arguments and see how Inline Cache will change.

```
let first = { key: 'first' } // shape A
let fast = { key: 'fast' }   // the same shape A
let slow = { foo: 'slow' }   // new shape B

load(first)
load(fast)
load(slow)
```

After the first call of the load function, our inline cache will get an updated value:

[{ slot: 0, icType: LOAD, value: MONO(A) }]

That value now becomes monomorphic, which means this cache can only resolve to shape A.

After the second call, V8 will check the IC's value and it'll see that it’s monomorphic and has the same shape as the fast variable. So it will quickly return offset and resolve it.

The third time, the shape is different from the stored one. So V8 will manually resolve it and update the value to a polymorphic state with an array of two possible shapes.

[{ slot: 0, icType: LOAD, value: POLY[A,B] }]

Now every time we call this function, V8 needs to check not only one shape but iterate over several possibilities.
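
The UNINIT, MONO, and POLY transitions we just walked through can be sketched like this. The state names mirror the article; the implementation is a made-up toy:

```javascript
// A toy inline cache for a single LOAD site. Illustrative only.
function createLoadIC() {
  return { icType: 'LOAD', state: 'UNINIT', shapes: [] };
}

function recordShape(ic, shape) {
  if (ic.shapes.includes(shape)) return; // cache hit: nothing new to learn
  ic.shapes.push(shape);
  ic.state = ic.shapes.length === 1 ? 'MONO' : 'POLY';
}

const ic = createLoadIC();
recordShape(ic, 'A'); // load(first)
console.log(ic.state); // MONO
recordShape(ic, 'A'); // load(fast): same shape, stays monomorphic
console.log(ic.state); // MONO
recordShape(ic, 'B'); // load(slow): new shape, becomes polymorphic
console.log(ic.state); // POLY
```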

For the faster code, you can initialize objects with the same type and not change their structure too much.

Note: You can keep this in mind, but don’t do it if it leads to code duplication or less expressive code.

Inline caches also keep track of how often they're called, to decide whether a function is a good candidate for the optimizing compiler, Turbofan.

Compiler

Ignition only gets us so far. If a function gets hot enough, it will be optimized by the compiler, Turbofan, to make it faster.

Turbofan takes byte code from Ignition and type feedback (the Feedback Vector) for the function, applies a set of reductions based on it, and produces machine code.

As we saw before, type feedback doesn’t guarantee that it won’t change in the future.

For example, suppose Turbofan optimized code based on the assumption that some addition operation always adds integers.

But what would happen if it received a string instead? That triggers a process called deoptimization: V8 throws away the optimized code, goes back to the interpreted code, resumes execution, and updates the type feedback.
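
As a sketch of that idea, here's a toy "optimized" add with a guard that bails out to a generic path. The guard-and-bail-out structure is the point; the real thing happens inside generated machine code:

```javascript
// A toy speculative fast path: the optimized code assumes both operands
// are integers and bails out (deoptimizes) otherwise. Illustrative only.
function genericAdd(a, b) {
  return a + b; // the slow, fully general path
}

function makeOptimizedAdd(onDeopt) {
  return function optimizedAdd(a, b) {
    // Guard: the assumption that was baked in at compile time.
    if (!Number.isInteger(a) || !Number.isInteger(b)) {
      onDeopt();               // throw away this fast path...
      return genericAdd(a, b); // ...and resume in the generic version
    }
    return (a + b) | 0; // fast integer addition
  };
}

let deopts = 0;
const add = makeOptimizedAdd(() => { deopts += 1; });
console.log(add(2, 3), deopts);     // 5 0
console.log(add('2', '3'), deopts); // 23 1
```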

Summary

In this article, we discussed JS engine implementation and the exact steps of how JavaScript is executed.

To summarize, let’s have a look at the compilation pipeline from the top.

We’ll go over it step by step:

  1. It all starts with getting JavaScript code from the network.
  2. V8 parses the source code and turns it into an Abstract Syntax Tree (AST).
  3. Based on that AST, the Ignition interpreter can start to do its thing and produce bytecode.
  4. At that point, the engine runs the code and collects type feedback.
  5. To make it run faster, the byte code can be sent, together with the feedback data, to the optimizing compiler. The optimizing compiler makes certain assumptions based on it and then produces highly-optimized machine code.
  6. If, at some point, one of the assumptions turns out to be incorrect, the optimizing compiler de-optimizes and goes back to the interpreter.

That's it! If you have any questions about a specific stage or want to know more details about it, you can dive into the source code or hit me up on Twitter.

Further reading

  • "Life of a Script" video from Google
  • A crash course in JIT compilers from Mozilla
  • A nice explanation of inline caches in V8
  • A great dive into Object Shapes