Commits · 40dc893c372c81c687eca2d0b964220a8f8aeab4 · many-archive / Suyu

Apr 19, 2019
- Merge pull request #2374 from lioncash/pagetable · 40dc893c
  bunnei authored Apr 19, 2019
```
core: Reorganize boot order
```
  40dc893c
Apr 18, 2019
- Merge pull request #2397 from lioncash/thread-unused · 83b830eb
  bunnei authored Apr 17, 2019
```
kernel/thread: Remove unused guest_handle member variable
```
  83b830eb
- Merge pull request #2318 from ReinUsesLisp/sampler-cache · 42940625
  bunnei authored Apr 17, 2019
```
gl_sampler_cache: Port sampler cache to OpenGL
```
  42940625
- Merge pull request #2348 from FernandoS27/guest-bindless · 5bd5140b
  bunnei authored Apr 17, 2019
```
Implement Bindless Textures on Shader Decompiler and GL backend
```
  5bd5140b
Apr 17, 2019
- Merge pull request #2315 from ReinUsesLisp/severity-decompiler · 0cfbd332
  bunnei authored Apr 16, 2019
```
shader_ir/decode: Reduce the severity of common assertions
```
  0cfbd332
- Merge pull request #2384 from ReinUsesLisp/gl-state-clear · 21d498bc
  bunnei authored Apr 16, 2019
```
gl_rasterizer: Apply just the needed state on Clear
```
  21d498bc
- Merge pull request #2405 from lioncash/qt · be6b9e2d
  bunnei authored Apr 16, 2019
```
CMakeLists: Define QT_USE_QSTRINGBUILDER for the Qt target
```
  be6b9e2d
- Merge pull request #2092 from ReinUsesLisp/stg · 1b83f255
  bunnei authored Apr 16, 2019
```
shader/memory: Implement STG and global memory flushing
```
  1b83f255
- Merge pull request #2376 from lioncash/const · 2654eb65
  bunnei authored Apr 16, 2019
```
yuzu/configure_hotkey: Minor changes
```
  2654eb65
- Merge pull request #2401 from lioncash/guard · 382fbbb1
  bunnei authored Apr 16, 2019
```
common/{lz4_compression, zstd_compression}: Add missing header guards
```
  382fbbb1
Apr 16, 2019
- Merge pull request #2382 from lioncash/table · 9186f76b
  bunnei authored Apr 15, 2019
```
service: Update service function tables
```
  9186f76b
- Merge pull request #2393 from lioncash/svc · fc641565
  bunnei authored Apr 15, 2019
```
kernel/svc: Implement svcMapProcessCodeMemory/svcUnmapProcessCodeMemory
```
  fc641565
- Merge pull request #2398 from lioncash/boost · a7c3275b
  bunnei authored Apr 15, 2019
```
kernel/thread: Remove BoostPriority()
```
  a7c3275b
- Merge pull request #2399 from FernandoS27/fermi-fix · c1e35d11
  bunnei authored Apr 15, 2019
```
Correct Pitch in Fermi2D
```
  c1e35d11
Apr 15, 2019

CMakeLists: Define QT_USE_QSTRINGBUILDER for the Qt target · d28bb56c

Lioncash authored Apr 15, 2019

This is a compile definition introduced in Qt 4.8 for reducing the total
potential number of strings created when performing string
concatenation. This allows for less memory churn.

This can be read about here:
https://blog.qt.io/blog/2011/06/13/string-concatenation-with-qstringbuilder/

For a change that isn't source-compatible, we only had one occurrence
that actually needed to have its type clarified, which is pretty good, as
far as transitioning goes.

d28bb56c

svc: Specify handle value in thread's name · 3283aa1e
Lioncash authored Apr 15, 2019
```
Allows the handle to be seen alongside the entry point.
```
3283aa1e
common/{lz4_compression, zstd_compression}: Add missing header guards · 4620ed47
Lioncash authored Apr 15, 2019
```
These two files were missing the #pragma once directive.
```
4620ed47
Correct Pitch in Fermi2D · bf561e43
Fernando Sahmkow authored Apr 15, 2019

bf561e43

kernel/thread: Remove BoostPriority() · e3566e6c

Lioncash authored Apr 15, 2019

This is a holdover from Citra that currently remains unused, so it can
be removed from the Thread interface.

e3566e6c

Apr 14, 2019
- kernel/thread: Remove unused guest_handle member variable · 09caf8a7
  Lioncash authored Apr 14, 2019
```
This member variable is entirely unused. It was only set but never
actually utilized. Given that, we can remove it to get rid of noise in
the thread interface.
```
  09caf8a7
- shader_ir: Implement STG, keep track of global memory usage and flush · 5c280e6f
  ReinUsesLisp authored Feb 07, 2019
  
  5c280e6f
- Merge pull request #2378 from lioncash/ro · 1f4dfb39
  bunnei authored Apr 13, 2019
```
ldr: Minor amendments to IPC-related parameters
```
  1f4dfb39
- Merge pull request #2373 from FernandoS27/z32 · c9454c84
  bunnei authored Apr 13, 2019
```
Set Pixel Format to Z32 if its R32F and depth compare enabled, and Implement format ZF32_X24S8
```
  c9454c84
- Merge pull request #2357 from zarroboogs/force-30fps-mode · 6088898b
  bunnei authored Apr 13, 2019
```
Add a toggle to force 30FPS mode
```
  6088898b
- Merge pull request #2381 from lioncash/fs · a788c861
  bunnei authored Apr 13, 2019
```
fsp_srv: Minor cleanup related changes
```
  a788c861
- Merge pull request #2386 from ReinUsesLisp/shader-manager · ee2206a1
  bunnei authored Apr 13, 2019
```
gl_shader_manager: Move code to source file and minor clean up
```
  ee2206a1
- Merge pull request #2017 from jroweboy/glwidget · 065f83c6
  bunnei authored Apr 13, 2019
```
Frontend: Migrate to QOpenGLWindow and support shared contexts
```
  065f83c6
- Merge pull request #2389 from FreddyFunk/rename-gamedir · ee3f5764
  bunnei authored Apr 13, 2019
```
ui_settings: Rename game directory variables
```
  ee3f5764
Apr 13, 2019

kernel/svc: Implement svcUnmapProcessCodeMemory · 4d293bb5

Lioncash authored Apr 12, 2019

Essentially performs the inverse of svcMapProcessCodeMemory. This unmaps
the aliasing region first, then restores the general traits of the
aliased memory.

What this entails is:

- Restoring Read/Write permissions to the VMA.
- Restoring its memory state to reflect it as a general heap memory region.
- Clearing the memory attributes on the region.
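
The steps listed above can be sketched roughly as follows. This is a toy
model, not the emulator's code: `Vma`, `AddressSpace`, `UnmapCodeMemory`
and the enum values are all hypothetical names invented for illustration,
and the state the source region is in while the alias exists is not
modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-ins for illustration only.
enum class Permission { None, ReadWrite, ReadExecute };
enum class MemoryState { Heap, ModuleCode };
enum class MemoryAttribute { None, Locked };

struct Vma {
    Permission permissions;
    MemoryState state;
    MemoryAttribute attribute;
};

// Toy address space: base address -> region descriptor.
using AddressSpace = std::map<std::uint64_t, Vma>;

// Undoes a code-memory mapping: drops the alias at dst, then restores src.
bool UnmapCodeMemory(AddressSpace& space, std::uint64_t dst, std::uint64_t src) {
    if (dst == src) {
        return false; // An alias cannot be its own source.
    }

    const auto alias = space.find(dst);
    const auto source = space.find(src);
    if (alias == space.end() || source == space.end()) {
        return false; // Nothing to undo.
    }

    // 1. Unmap the aliasing region first.
    space.erase(alias);

    // 2. Restore the general traits of the aliased (source) memory.
    source->second.permissions = Permission::ReadWrite;
    source->second.state = MemoryState::Heap;
    source->second.attribute = MemoryAttribute::None;
    return true;
}
```

The lookups happen before anything is erased, so a failed call leaves the
toy address space untouched.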

4d293bb5

kernel/svc: Implement svcMapProcessCodeMemory · 76a24656

Lioncash authored Apr 11, 2019

This is utilized for mapping code modules into memory. Notably, the
ldr service would call this in order to map objects into memory.

76a24656

Merge pull request #2391 from lioncash/scope · b42595fa
bunnei authored Apr 12, 2019
```
common/scope_exit: Replace std::move with std::forward in ScopeExit()
```
b42595fa
Merge pull request #2392 from lioncash/swap · 0faf7b17
bunnei authored Apr 12, 2019
```
common/swap: Minor cleanup and improvements to byte swapping functions
```
0faf7b17

Apr 12, 2019

Fix Clang Format · 382722b9
FreddyFunk authored Apr 12, 2019

382722b9

common/swap: Improve codegen of the default swap fallbacks · 0d8ef2d3

Lioncash authored Apr 11, 2019

Uses arithmetic that compilers can more easily recognize and optimize.
For example, rather than shifting the halves of the value and then
swapping and combining them, we can swap them in place.

For the original swap32 code on x86-64, clang 8.0 would generate:

    mov     ecx, edi
    rol     cx, 8
    shl     ecx, 16
    shr     edi, 16
    rol     di, 8
    movzx   eax, di
    or      eax, ecx
    ret

while GCC 8.3 would generate the ideal:

    mov     eax, edi
    bswap   eax
    ret

Now both generate the same optimal output.

MSVC used to generate the following with the old code:

    mov     eax, ecx
    rol     cx, 8
    shr     eax, 16
    rol     ax, 8
    movzx   ecx, cx
    movzx   eax, ax
    shl     ecx, 16
    or      eax, ecx
    ret     0

Now MSVC also generates a similar and equally optimal result to clang/GCC:

    bswap   ecx
    mov     eax, ecx
    ret     0

====

In the swap64 case, for the original code, clang 8.0 would generate:

    mov     eax, edi
    bswap   eax
    shl     rax, 32
    shr     rdi, 32
    bswap   edi
    or      rax, rdi
    ret

(almost there, but still missing the mark)

while, again, GCC 8.3 would generate the more ideal:

    mov     rax, rdi
    bswap   rax
    ret

Now clang generates the optimal sequence for this fallback as well.

This is a case where MSVC unfortunately falls short. Despite the new
code, it still generates a doozy of an output:

    mov     r8, rcx
    mov     r9, rcx
    mov     rax, 71776119061217280
    mov     rdx, r8
    and     r9, rax
    and     edx, 65280
    mov     rax, rcx
    shr     rax, 16
    or      r9, rax
    mov     rax, rcx
    shr     r9, 16
    mov     rcx, 280375465082880
    and     rax, rcx
    mov     rcx, 1095216660480
    or      r9, rax
    mov     rax, r8
    and     rax, rcx
    shr     r9, 16
    or      r9, rax
    mov     rcx, r8
    mov     rax, r8
    shr     r9, 8
    shl     rax, 16
    and     ecx, 16711680
    or      rdx, rax
    mov     eax, -16777216
    and     rax, r8
    shl     rdx, 16
    or      rdx, rcx
    shl     rdx, 16
    or      rax, rdx
    shl     rax, 8
    or      rax, r9
    ret     0

which is pretty unfortunate.
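
To make the two fallback shapes discussed above concrete, here is a minimal
sketch. The function names are hypothetical and the bodies are only one
plausible reading of "swapping the halves and combining them" versus
"swapping in place"; they are not copied from common/swap, and no claim is
made here about what any particular compiler emits for them.

```cpp
#include <cassert>
#include <cstdint>

// Halves-based shape: byte-swap each 16-bit half, then recombine.
constexpr std::uint16_t swap16_halves(std::uint16_t v) {
    return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}

constexpr std::uint32_t swap32_halves(std::uint32_t v) {
    return (static_cast<std::uint32_t>(swap16_halves(static_cast<std::uint16_t>(v))) << 16) |
           swap16_halves(static_cast<std::uint16_t>(v >> 16));
}

// In-place shape: mask each byte and shift it straight to its final slot.
constexpr std::uint32_t swap32_inplace(std::uint32_t v) {
    return ((v & 0xFF000000u) >> 24) | ((v & 0x00FF0000u) >> 8) |
           ((v & 0x0000FF00u) << 8) | ((v & 0x000000FFu) << 24);
}

constexpr std::uint64_t swap64_inplace(std::uint64_t v) {
    return ((v & 0xFF00000000000000ULL) >> 56) | ((v & 0x00FF000000000000ULL) >> 40) |
           ((v & 0x0000FF0000000000ULL) >> 24) | ((v & 0x000000FF00000000ULL) >> 8) |
           ((v & 0x00000000FF000000ULL) << 8) | ((v & 0x0000000000FF0000ULL) << 24) |
           ((v & 0x000000000000FF00ULL) << 40) | ((v & 0x00000000000000FFULL) << 56);
}

// Both shapes compute the same result; only the expression form differs.
static_assert(swap32_halves(0x11223344u) == 0x44332211u, "halves-based swap32");
static_assert(swap32_inplace(0x11223344u) == 0x44332211u, "in-place swap32");
static_assert(swap64_inplace(0x0102030405060708ULL) == 0x0807060504030201ULL,
              "in-place swap64");
```

Pasting either shape into a compiler explorer is an easy way to compare the
generated code across clang, GCC and MSVC versions.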

0d8ef2d3

core/core: Move process execution start to System's Load() · 612e1388

Lioncash authored Apr 09, 2019

This gives us significantly more control over where in the
initialization process we start execution of the main process.

Previously we were running the main process before the CPU or GPU
threads were initialized (not good). This amends execution to start
after all of our threads are properly set up.

612e1388

core/process: Remove unideal page table setting from LoadFromMetadata() · 32a6ceb4

Lioncash authored Apr 09, 2019

This was initially required due to the split code path in how the
initial main process instance was initialized. We used to initialize
the process like:

Init() {
    main_process = Process::Create(...);
    kernel.MakeCurrentProcess(main_process.get());
}

Load() {
    const auto load_result = loader.Load(*kernel.GetCurrentProcess());
    if (load_result != Loader::ResultStatus::Success) {
        // Handle error here.
    }
    ...
}

which presented a problem.

Setting a created process as the main process would set the page table
for that process as the main page table. This is fine... until we get to
the part where the page table can have its size changed in the Load()
function via NPDM metadata, which can dictate either a 32-bit, 36-bit,
or 39-bit usable address space.

Now that we have full control over the process' creation in load, we can
simply set the initial process as the main process after all the loading
is done, reflecting the potential page table changes without any
special-casing behavior.

We can also remove the cache flushing within LoadModule(), as execution
wouldn't have even begun yet during all usages of this function, now
that we have the initialization order cleaned up.
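
The ordering problem described above can be illustrated with a small
model. Everything here is a simplifying assumption made for the sketch:
the `Process` and `Kernel` types, the idea that making a process current
takes a snapshot of its address space width, and the default of 39 bits
are all hypothetical and do not reflect the real classes.

```cpp
#include <cassert>

// Hypothetical, heavily simplified types for illustration only.
struct Process {
    unsigned address_space_bits = 39; // Placeholder default (an assumption).
};

struct Kernel {
    unsigned active_page_table_bits = 0;

    // Making a process current snapshots its page table configuration.
    void MakeCurrentProcess(const Process& process) {
        active_page_table_bits = process.address_space_bits;
    }
};

// Stand-in for loading, where metadata may dictate 32, 36, or 39 bits.
void LoadFromMetadata(Process& process, unsigned bits_from_metadata) {
    process.address_space_bits = bits_from_metadata;
}

// Old ordering: make the process current first, load afterwards.
// The kernel ends up holding a stale page table width.
unsigned OldBootOrder(unsigned bits_from_metadata) {
    Kernel kernel;
    Process process;
    kernel.MakeCurrentProcess(process);
    LoadFromMetadata(process, bits_from_metadata);
    return kernel.active_page_table_bits;
}

// New ordering: finish loading, then make the process current.
unsigned NewBootOrder(unsigned bits_from_metadata) {
    Kernel kernel;
    Process process;
    LoadFromMetadata(process, bits_from_metadata);
    kernel.MakeCurrentProcess(process);
    return kernel.active_page_table_bits;
}
```

In this model the old ordering would need a special-case fix-up after
loading, while the new ordering picks up the metadata-driven width
without one.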

32a6ceb4

core/core: Move main process creation into Load() · a4b0a855

Lioncash authored Apr 09, 2019

Now that we have dependencies on the initialization order, we can move
the creation of the main process to a more sensible area: where we
actually load in the executable data.

This allows localizing the creation and loading of the process in one
location, making the initialization of the process much nicer to trace.

a4b0a855

video_core/gpu: Create threads separately from initialization · 6d055119

Lioncash authored Apr 09, 2019

Like with CPU emulation, we generally don't want to fire off the threads
immediately after the relevant classes are initialized. We want to do
this only after all necessary data is done loading.

This splits the thread creation into its own interface member function
to allow controlling when these threads in particular get created.
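
As a rough illustration of the pattern described above, the following
sketch keeps construction and thread creation as separate steps. The
`Worker` class and its member names are hypothetical and are not the
emulator's GPU or CPU manager interfaces.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical class for illustration: constructing it starts nothing.
class Worker {
public:
    Worker() = default;
    ~Worker() { Shutdown(); }

    Worker(const Worker&) = delete;
    Worker& operator=(const Worker&) = delete;

    // Creates the thread explicitly, once the caller has finished loading.
    void StartThreads() {
        if (thread.joinable()) {
            return; // Already started.
        }
        stop_requested = false;
        thread = std::thread([this] { Run(); });
    }

    // Asks the thread to stop and waits for it to finish.
    void Shutdown() {
        stop_requested = true;
        if (thread.joinable()) {
            thread.join();
        }
    }

    bool IsRunning() const { return thread.joinable(); }

private:
    // Placeholder work loop; spins until a stop is requested.
    void Run() {
        while (!stop_requested) {
            std::this_thread::yield();
        }
    }

    std::atomic<bool> stop_requested{false};
    std::thread thread;
};
```

A caller would construct the object during initialization, perform all
loading, and only then call `StartThreads()`.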

6d055119

core/cpu_core_manager: Create threads separately from initialization. · f2331a80

Lioncash authored Apr 09, 2019

Our initialization process is a little wonkier than one would expect
when it comes to code flow. We initialize the CPU last, whereas on
hardware the CPU obviously needs to come first or nothing else would
work, and we have code that adds checks to get around this.

For example, in the page table setting code, we check to see if the
system is turned on before we even notify the CPU instances of a page
table switch. This results in dead code (at the moment), because the
only time a page table switch will occur is when the system is *not*
running, preventing the emulated CPU instances from being notified of a
page table switch in a convenient manner (technically the code path
could be taken, but we don't emulate the process creation svc handlers
yet).

This moves the thread creation into its own member function of the core
manager and restores a little order (and predictability) to our
initialization process.

Previously, in the multi-threaded cases, we'd kick off several threads
before even the main kernel process was created and ready to execute (gross!).
Now the initialization process is like so:

Initialization:
  1. Timers
  2. CPU
  3. Kernel
  4. Filesystem stuff (kind of gross, but can be amended trivially)
  5. Applet stuff (ditto in terms of being kind of gross)
  6. Main process (will be moved into the loading step in a following
     change)
  7. Telemetry (this should be initialized last in the future).
  8. Services (4 and 5 should ideally be alongside this).
  9. GDB (gross. Uses namespace scope state. Needs to be refactored into a
     class or booted altogether).
  10. Renderer
  11. GPU (will also have its threads created in a separate step in a
      following change).

Which... isn't *ideal* per se; however, pulling the wonky intertwining
of CPU state initialization out of this mix gets rid of most of the
footguns when it comes to our initialization process.

f2331a80

Merge pull request #2235 from ReinUsesLisp/spirv-decompiler · ea80e2bc
bunnei authored Apr 11, 2019
```
vk_shader_decompiler: Implement a SPIR-V decompiler
```
ea80e2bc