  1. 19 Apr, 2019 1 commit
  2. 18 Apr, 2019 3 commits
  3. 17 Apr, 2019 6 commits
  4. 16 Apr, 2019 4 commits
  5. 15 Apr, 2019 5 commits
  6. 14 Apr, 2019 9 commits
  7. 13 Apr, 2019 4 commits
  8. 12 Apr, 2019 8 commits
    • Fix Clang Format · 382722b9
      FreddyFunk authored
    • common/swap: Improve codegen of the default swap fallbacks · 0d8ef2d3
      Lioncash authored
      Uses arithmetic that compilers can more readily identify for
      optimization. For example, rather than shifting the halves of the value
      and then swapping and combining them, we can swap the bytes in place.
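Here is a sketch of the two shapes of swap32 fallback in C++17. The commit message does not show the source, so this assumes two things. First, the old version was built from two 16-bit swaps, which fits the `rol cx, 8` / `shl` / `or` sequence below. Second, the new version uses one mask-and-shift term per byte. The names `swap32_halves` and `swap32_masks` are hypothetical. The `static_assert`s only check that both forms compute the same value. They say nothing about which instructions a given compiler emits.

```cpp
#include <cstdint>

// Swap the two bytes of a 16-bit value.
constexpr std::uint16_t swap16(std::uint16_t data) noexcept {
    return static_cast<std::uint16_t>((data >> 8) | (data << 8));
}

// Assumed old shape: byte-swap each 16-bit half, then exchange the halves
// with a shift and an OR.
constexpr std::uint32_t swap32_halves(std::uint32_t data) noexcept {
    return (static_cast<std::uint32_t>(swap16(static_cast<std::uint16_t>(data))) << 16) |
           swap16(static_cast<std::uint16_t>(data >> 16));
}

// Assumed new shape: move every byte straight to its mirrored position.
// Each term is an independent mask-and-shift, a pattern compilers commonly
// recognize as a single byte swap.
constexpr std::uint32_t swap32_masks(std::uint32_t data) noexcept {
    return ((data & 0xFF000000U) >> 24) | ((data & 0x00FF0000U) >> 8) |
           ((data & 0x0000FF00U) << 8) | ((data & 0x000000FFU) << 24);
}

// Both variants must agree; only the shape of the arithmetic differs.
static_assert(swap32_halves(0x12345678U) == 0x78563412U);
static_assert(swap32_masks(0x12345678U) == 0x78563412U);
```

Compiling each function on its own, for example on Compiler Explorer, is a simple way to compare the code each compiler generates.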
      
      For example, for the original swap32 code on x86-64, Clang 8.0 would generate:
      
          mov     ecx, edi
          rol     cx, 8
          shl     ecx, 16
          shr     edi, 16
          rol     di, 8
          movzx   eax, di
          or      eax, ecx
          ret
      
      while GCC 8.3 would generate the ideal:
      
          mov     eax, edi
          bswap   eax
          ret
      
      Now both generate the same optimal output.
      
      MSVC used to generate the following with the old code:
      
          mov     eax, ecx
          rol     cx, 8
          shr     eax, 16
          rol     ax, 8
          movzx   ecx, cx
          movzx   eax, ax
          shl     ecx, 16
          or      eax, ecx
          ret     0
      
      Now MSVC also generates a result that differs slightly from Clang/GCC's but is equally optimal:
      
          bswap   ecx
          mov     eax, ecx
          ret     0
      
      ====
      
      In the swap64 case, for the original code, Clang 8.0 would generate:
      
          mov     eax, edi
          bswap   eax
          shl     rax, 32
          shr     rdi, 32
          bswap   edi
          or      rax, rdi
          ret
      
      (almost there, but still missing the mark)
      
      while, again, GCC 8.3 would generate the more ideal:
      
          mov     rax, rdi
          bswap   rax
          ret
      
      Now Clang also generates the optimal sequence for this fallback.
      
      This is a case where MSVC unfortunately falls short: despite the new
      code, it still generates a doozy of an output:
      
          mov     r8, rcx
          mov     r9, rcx
          mov     rax, 71776119061217280
          mov     rdx, r8
          and     r9, rax
          and     edx, 65280
          mov     rax, rcx
          shr     rax, 16
          or      r9, rax
          mov     rax, rcx
          shr     r9, 16
          mov     rcx, 280375465082880
          and     rax, rcx
          mov     rcx, 1095216660480
          or      r9, rax
          mov     rax, r8
          and     rax, rcx
          shr     r9, 16
          or      r9, rax
          mov     rcx, r8
          mov     rax, r8
          shr     r9, 8
          shl     rax, 16
          and     ecx, 16711680
          or      rdx, rax
          mov     eax, -16777216
          and     rax, r8
          shl     rdx, 16
          or      rdx, rcx
          shl     rdx, 16
          or      rax, rdx
          shl     rax, 8
          or      rax, r9
          ret     0
      
      which is pretty unfortunate.
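The 64-bit case follows the same pattern. This is again a hedged sketch. It assumes the old fallback combined two swap32 calls, which fits the two 32-bit `bswap`s in the original Clang output above. It also assumes the new one uses eight independent mask-and-shift terms. `swap64_halves` and `swap64_masks` are illustrative names, not the ones in common/swap.

```cpp
#include <cstdint>

// Per-byte 32-bit swap, used as a building block for the "halves" variant.
constexpr std::uint32_t swap32(std::uint32_t data) noexcept {
    return ((data & 0xFF000000U) >> 24) | ((data & 0x00FF0000U) >> 8) |
           ((data & 0x0000FF00U) << 8) | ((data & 0x000000FFU) << 24);
}

// Assumed old shape: swap each 32-bit half, then exchange the halves.
constexpr std::uint64_t swap64_halves(std::uint64_t data) noexcept {
    return (static_cast<std::uint64_t>(swap32(static_cast<std::uint32_t>(data))) << 32) |
           swap32(static_cast<std::uint32_t>(data >> 32));
}

// Assumed new shape: one mask-and-shift term per byte, each moving a byte
// directly to its mirrored position.
constexpr std::uint64_t swap64_masks(std::uint64_t data) noexcept {
    return ((data & 0xFF00000000000000ULL) >> 56) | ((data & 0x00FF000000000000ULL) >> 40) |
           ((data & 0x0000FF0000000000ULL) >> 24) | ((data & 0x000000FF00000000ULL) >> 8) |
           ((data & 0x00000000FF000000ULL) << 8) | ((data & 0x0000000000FF0000ULL) << 24) |
           ((data & 0x000000000000FF00ULL) << 40) | ((data & 0x00000000000000FFULL) << 56);
}

static_assert(swap64_halves(0x0123456789ABCDEFULL) == 0xEFCDAB8967452301ULL);
static_assert(swap64_masks(0x0123456789ABCDEFULL) == 0xEFCDAB8967452301ULL);
```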
    • core/core: Move process execution start to System's Load() · 612e1388
      Lioncash authored
      This gives us significantly more control over where in the
      initialization process we start execution of the main process.
      
      Previously we were running the main process before the CPU or GPU
      threads were initialized (not good). This amends execution to start
      after all of our threads are properly set up.
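Below is a minimal, hypothetical model of that ordering. `CpuManager`, `Gpu`, `Process`, `System` and `BootExample` are stand-ins invented for illustration, not yuzu's real classes. `Load()` creates the process, starts the host threads, and only then begins execution. `Process::Run()` refuses to start if the threads are missing, which is the situation the old order allowed.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins: each only records whether its threads exist.
struct CpuManager {
    bool threads_started = false;
    void StartThreads() { threads_started = true; }
};

struct Gpu {
    bool threads_started = false;
    void Start() { threads_started = true; }
};

struct Process {
    bool running = false;
    // Refuse to begin execution until every host thread it needs exists.
    void Run(const CpuManager& cpu, const Gpu& gpu) {
        if (!cpu.threads_started || !gpu.threads_started) {
            throw std::logic_error("main process started before CPU/GPU threads");
        }
        running = true;
    }
};

struct System {
    CpuManager cpu;
    Gpu gpu;
    std::unique_ptr<Process> main_process;

    // Load() owns the whole sequence: create, load, start threads, then run.
    void Load() {
        main_process = std::make_unique<Process>();
        // (Executable data would be loaded into main_process here.)
        gpu.Start();
        cpu.StartThreads();
        main_process->Run(cpu, gpu);
    }
};

// Usage sketch: with the new ordering, the main process starts cleanly.
void BootExample() {
    System system;
    system.Load();
    assert(system.main_process->running);
}
```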
    • core/process: Remove unideal page table setting from LoadFromMetadata() · 32a6ceb4
      Lioncash authored
      This was initially required due to the split code path in how the
      initial main process instance was initialized. We used to initialize
      the process like so:
      
          Init() {
              main_process = Process::Create(...);
              kernel.MakeCurrentProcess(main_process.get());
          }

          Load() {
              const auto load_result = loader.Load(*kernel.GetCurrentProcess());
              if (load_result != Loader::ResultStatus::Success) {
                  // Handle error here.
              }
              ...
          }
      
      which presented a problem.
      
      Setting a created process as the main process would set the page table
      for that process as the main page table. This is fine... until we get to
      the point where the page table can have its size changed in the Load()
      function via NPDM metadata, which can dictate a 32-bit, 36-bit, or
      39-bit usable address space.
      
      Now that we have full control over the process' creation in Load(), we can
      simply set the initial process as the main process after all the loading
      is done, reflecting the potential page table changes without any
      special-casing behavior.
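The ordering problem can be modeled in a few lines. This is a simplified, hypothetical sketch. It assumes the kernel captures the page table's address-space width at the moment a process is made current. The default width of 39 bits and the names `ApplyNpdmAddressSpace`, `OldOrder` and `NewOrder` are illustrative only. Making the process current before loading leaves the kernel with a stale width. Doing it after loading picks up whatever the NPDM metadata requested.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical process: its final width is only known after NPDM parsing.
struct Process {
    std::uint32_t address_space_bits = 39;  // assumed default, illustration only
};

// Hypothetical kernel: installing a process's page table as the main one
// snapshots the size it has at that moment.
struct Kernel {
    std::uint32_t active_page_table_bits = 0;

    void MakeCurrentProcess(const Process& process) {
        active_page_table_bits = process.address_space_bits;
    }
};

// Stand-in for applying NPDM metadata that requests an address-space width.
void ApplyNpdmAddressSpace(Process& process, std::uint32_t npdm_bits) {
    assert(npdm_bits == 32 || npdm_bits == 36 || npdm_bits == 39);
    process.address_space_bits = npdm_bits;
}

// Old order: make current first, then load, so the kernel keeps a stale width.
std::uint32_t OldOrder(std::uint32_t npdm_bits) {
    Process process;
    Kernel kernel;
    kernel.MakeCurrentProcess(process);
    ApplyNpdmAddressSpace(process, npdm_bits);
    return kernel.active_page_table_bits;
}

// New order: load first, then make current, so the kernel sees the final width.
std::uint32_t NewOrder(std::uint32_t npdm_bits) {
    Process process;
    Kernel kernel;
    ApplyNpdmAddressSpace(process, npdm_bits);
    kernel.MakeCurrentProcess(process);
    return kernel.active_page_table_bits;
}
```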
      
      We can also remove the cache flushing within LoadModule(): now that the
      initialization order is cleaned up, execution has not yet begun at any
      point where this function is called.
    • core/core: Move main process creation into Load() · a4b0a855
      Lioncash authored
      Now that the initialization order respects our dependencies, we can move
      the creation of the main process to a more sensible place: where we
      actually load in the executable data.
      
      This keeps the creation and loading of the process in one place,
      making the initialization of the process much easier to trace.
    • video_core/gpu: Create threads separately from initialization · 6d055119
      Lioncash authored
      As with CPU emulation, we generally don't want to fire off the threads
      immediately after the relevant classes are initialized; we want to do
      this only after all necessary data has finished loading.
      
      This splits the thread creation into its own interface member function
      to allow controlling when these threads in particular get created.
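Splitting thread creation out of initialization is a two-phase pattern. Here is a sketch under stated assumptions. `ThreadedUnit`, `Initialize()`, `StartThreads()` and `Shutdown()` are hypothetical names, and each worker only increments a counter instead of doing real GPU work. `Initialize()` only records configuration. No `std::thread` exists until `StartThreads()` is called after loading finishes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class ThreadedUnit {
public:
    // Phase 1: set up state only. No host threads are spawned here, so the
    // rest of the system can finish loading first.
    void Initialize(unsigned count) { thread_count = count; }

    // Phase 2: spawn the workers once all required data is loaded.
    void StartThreads() {
        assert(threads.empty() && "StartThreads() should only be called once");
        for (unsigned i = 0; i < thread_count; ++i) {
            threads.emplace_back([this] { work_done.fetch_add(1); });
        }
    }

    bool HasThreads() const { return !threads.empty(); }

    // Joins every worker and returns how many of them ran.
    unsigned Shutdown() {
        for (auto& thread : threads) {
            thread.join();
        }
        threads.clear();
        return work_done.load();
    }

    ~ThreadedUnit() { Shutdown(); }

private:
    unsigned thread_count = 0;
    std::vector<std::thread> threads;
    std::atomic<unsigned> work_done{0};
};
```

Keeping the two phases separate means the caller, rather than the constructor, decides when host threads appear.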
    • core/cpu_core_manager: Create threads separately from initialization. · f2331a80
      Lioncash authored
      Our initialization process is a little wonkier than one would expect when
      it comes to code flow. We initialize the CPU last, as opposed to real
      hardware, where the CPU obviously needs to come first (otherwise nothing
      else would work), and we have code that adds checks to work around this.
      
      For example, in the page table setting code, we check to see if the
      system is turned on before we even notify the CPU instances of a page
      table switch. This results in dead code (at the moment), because the
      only time a page table switch will occur is when the system is *not*
      running, preventing the emulated CPU instances from being notified of a
      page table switch in a convenient manner (technically the code path
      could be taken, but we don't emulate the process creation svc handlers
      yet).
      
      This moves thread creation into its own member function of the core
      manager and restores a little order (and predictability) to our
      initialization process.
      
      Previously, in the multi-threaded cases, we'd kick off several threads
      before even the main kernel process was created and ready to execute (gross!).
      Now the initialization process is like so:
      
      Initialization:
        1. Timers
      
        2. CPU
      
        3. Kernel
      
        4. Filesystem stuff (kind of gross, but can be amended trivially)
      
        5. Applet stuff (ditto in terms of being kind of gross)
      
        6. Main process (will be moved into the loading step in a following
                         change)
      
        7. Telemetry (this should be initialized last in the future).
      
        8. Services (4 and 5 should ideally be alongside this).
      
        9. GDB (gross. Uses namespace scope state. Needs to be refactored into a
                class or booted altogether).
      
        10. Renderer
      
        11. GPU (will also have its threads created in a separate step in a
                 following change).
      
      Which... isn't *ideal* per se; however, taking the wonky intertwining
      of CPU state initialization out of this mix gets rid of most of the
      footguns in our initialization process.
    • Merge pull request #2235 from ReinUsesLisp/spirv-decompiler · ea80e2bc
      bunnei authored
      vk_shader_decompiler: Implement a SPIR-V decompiler