Commit graph

599 commits

Author SHA1 Message Date
Richard Henderson cd538f0b7e
tcg: Initialize cpu_env generically
This is identical for each target. So, move the initialization to
common code. Move the variable itself out of tcg_ctx and name it
cpu_env to minimize changes within targets.

This also means we can remove tcg_global_reg_new_{ptr,i32,i64},
since there are no longer global-register temps created by targets.

Backports commit 1c2adb958fc07e5b3e81ed21b801c04a15f41f4f from qemu
2018-03-15 15:49:19 -04:00
Lioncash 72c18027a6
cpu: Unicorn-ify the qemu_tcg_mttcg_enabled() macro
Gets rid of reliance on a non-existent variable
2018-03-14 12:10:29 -04:00
Emilio G. Cota 3fe9866ffe
osdep: introduce qemu_mprotect_rwx/none
Backports commit 5fa64b3130af9a45e7e2a904bde1f8cfb72be5c9 from qemu
2018-03-14 12:10:28 -04:00
Emilio G. Cota 5ad6116f20
tcg: allocate optimizer temps with tcg_malloc
Groundwork for supporting multiple TCG contexts.

While at it, also allocate temps_used directly as a bitmap of the
required size, instead of using a bitmap of TCG_MAX_TEMPS via
TCGTempSet.

Performance-wise we lose about 1.12% in a translation-heavy workload
such as booting+shutting down debian-arm:

Performance counter stats for 'taskset -c 0 arm-softmmu/qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=die-on-boot.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

exec time (s) Relative slowdown wrt original (%)
---------------------------------------------------------------
original 20.213321616 0.
tcg_malloc 20.441130078 1.1270214
TCGContext 20.477846517 1.3086662
g_malloc 20.780527895 2.8061013

The other two alternatives shown in the table are:
- TCGContext: embed temps[TCG_MAX_TEMPS] and TCGTempSet used_temps
in TCGContext. This is simple enough but it isn't faster than using
tcg_malloc; moreover, it wastes memory.
- g_malloc: allocate/deallocate both temps and used_temps every time
tcg_optimize is executed.

Backports commit 34184b071817b4f9edbfd1aa2225c196f05a0947 from qemu
2018-03-14 12:10:28 -04:00
Emilio G. Cota 66fa401871
exec-all: rename tb_free to tb_remove
We don't really free anything in this function anymore; we just remove
the TB from the binary search tree.

Backports commit be1e01171b556807198c84feac7cf4bca0d904c2 from qemu
2018-03-13 16:20:41 -04:00
Emilio G. Cota f7c984d21f
translate-all: use a binary search tree to track TBs in TBContext
This is a prerequisite for supporting multiple TCG contexts, since
we will have threads generating code in separate regions of
code_gen_buffer.

For this we need a new field (.size) in struct tb_tc to keep
track of the size of the translated code. This field uses a size_t
to avoid adding a hole to the struct, although really an unsigned
int would have been enough.

The comparison function we use is optimized for the common case:
insertions. Profiling shows that upon booting debian-arm, 98%
of comparisons are between existing tb's (i.e. a->size and b->size
are both !0), which happens during insertions (and removals, but
those are rare). The remaining cases are lookups. From reading the glib
sources we see that the first key is always the lookup key. However,
the code does not assume this to always be the case because this
behaviour is not guaranteed in the glib docs. However, we embed
this knowledge in the code as a branch hint for the compiler.

Note that tb_free does not free space in the code_gen_buffer anymore,
since we cannot easily know whether the tb is the last one inserted
in code_gen_buffer. The next patch in this series renames tb_free
to tb_remove to reflect this.

Performance-wise, lookups in tb_find_pc are the same as before:
O(log n). However, insertions are O(log n) instead of O(1), which
results in a small slowdown when booting debian-arm:

Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel img/arm/aarch32-current-linux-kernel-only.img \
-append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

- Before:

8048.598422 task-clock (msec) # 0.931 CPUs utilized ( +- 0.28% )
16,974 context-switches # 0.002 M/sec ( +- 0.12% )
0 cpu-migrations # 0.000 K/sec
10,125 page-faults # 0.001 M/sec ( +- 1.23% )
35,144,901,879 cycles # 4.367 GHz ( +- 0.14% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
65,758,252,643 instructions # 1.87 insns per cycle ( +- 0.33% )
10,871,298,668 branches # 1350.707 M/sec ( +- 0.41% )
192,322,212 branch-misses # 1.77% of all branches ( +- 0.32% )

8.640869419 seconds time elapsed ( +- 0.57% )

- After:
8146.242027 task-clock (msec) # 0.923 CPUs utilized ( +- 1.23% )
17,016 context-switches # 0.002 M/sec ( +- 0.40% )
0 cpu-migrations # 0.000 K/sec
18,769 page-faults # 0.002 M/sec ( +- 0.45% )
35,660,956,120 cycles # 4.378 GHz ( +- 1.22% )
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
65,095,366,607 instructions # 1.83 insns per cycle ( +- 1.73% )
10,803,480,261 branches # 1326.192 M/sec ( +- 1.95% )
195,601,289 branch-misses # 1.81% of all branches ( +- 0.39% )

8.828660235 seconds time elapsed ( +- 0.38% )

Backports commit 2ac01d6dafabd4a726254eea98824c798d416ee4 from qemu
2018-03-13 16:18:29 -04:00
Richard Henderson 35e551dc45
tcg: Remove CF_IGNORE_ICOUNT
Now that we have curr_cflags, we can include CF_USE_ICOUNT
early and then remove it as necessary.

Backports commit 416986d3f97329655e30da7271a2d11c6d707b06 from qemu
2018-03-13 15:28:47 -04:00
Richard Henderson f04beeea78
tcg: Add CF_LAST_IO + CF_USE_ICOUNT to CF_HASH_MASK
These flags are used by target/*/translate.c,
and affect code generation.

Backports commit 0cf8a44c2f56ba884c2f6db47d27fbb24975daa3 from qemu
2018-03-13 15:25:09 -04:00
Emilio G. Cota c384da2f47
tcg: convert tb->cflags reads to tb_cflags(tb)
Convert all existing readers of tb->cflags to tb_cflags, so that we
use atomic_read and therefore avoid undefined behaviour in C11.

Note that the remaining setters/getters of the field are protected
by tb_lock, and therefore do not need conversion.

Luckily all readers access the field via 'tb->cflags' (so no foo.cflags,
bar->cflags in the code base), which makes the conversion easily
scriptable:

FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h \
accel/tcg/translator.c | cut -f1 -d':' | sort | uniq)

perl -pi -e 's/([^.>])tb->cflags/$1tb_cflags(tb)/g' $FILES
perl -pi -e 's/([a-z->.]*)(->|\.)tb->cflags/tb_cflags($1$2tb)/g' $FILES

Then manually fixed the few errors that checkpatch reported.

Compile-tested for all targets.

Backports commit c5a49c63fa26e8825ad101dfe86339ae4c216539 from qemu
2018-03-13 14:57:51 -04:00
Richard Henderson d6ca4d59dc
tcg: Include CF_COUNT_MASK in CF_HASH_MASK
Backports commit cdfef1715c779eb528d633e8b76cbc8a10e71ac8 from qemu
2018-03-13 14:42:42 -04:00
Richard Henderson 5d360366e9
tcg: Add CPUState cflags_next_tb
We were generating code during tb_invalidate_phys_page_range,
check_watchpoint, cpu_io_recompile, and (seemingly) discarding
the TB, assuming that it would magically be picked up during
the next iteration through the cpu_exec loop.

Instead, record the desired cflags in CPUState so that we request
the proper TB so that there is no more magic.

Backports commit 9b990ee5a3cc6aa38f81266fb0c6ef37a36c45b9 from qemu
2018-03-13 14:39:43 -04:00
Emilio G. Cota b5961a139b
tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.

Note that the declaration of parallel_cpus is brought to exec-all.h
to be able to define there the "curr_cflags" inline.

Backports commit 4e2ca83e71b51577b06b1468e836556912bd5b6e from qemu
2018-03-13 14:32:43 -04:00
Emilio G. Cota 6bc05eeee4
tb hash: track translated blocks with qht
Having a fixed-size hash table for keeping track of all translation blocks
is suboptimal: some workloads are just too big or too small to get maximum
performance from the hash table. The MRU promotion policy helps improve
performance when the hash table is a little undersized, but it cannot
make up for severely undersized hash tables.

Furthermore, frequent MRU promotions result in writes that are a scalability
bottleneck. For scalability, lookups should only perform reads, not writes.
This is not a big deal for now, but it will become one once MTTCG matures.

The appended fixes these issues by using qht as the implementation of
the TB hash table. This solution is superior to other alternatives considered,
namely:

- master: implementation in QEMU before this patchset
- xxhash: before this patch, i.e. fixed buckets + xxhash hashing + MRU.
- xxhash-rcu: fixed buckets + xxhash + RCU list + MRU.
MRU is implemented here by adding an intermediate struct
that contains the u32 hash and a pointer to the TB; this
allows us, on an MRU promotion, to copy said struct (that is not
at the head), and put this new copy at the head. After a grace
period, the original non-head struct can be eliminated, and
after another grace period, freed.
- qht-fixed-nomru: fixed buckets + xxhash + qht without auto-resize +
no MRU for lookups; MRU for inserts.
The appended solution is the following:
- qht-dyn-nomru: dynamic number of buckets + xxhash + qht w/ auto-resize +
no MRU for lookups; MRU for inserts.

The plots below compare the considered solutions. The Y axis shows the
boot time (in seconds) of a debian jessie image with arm-softmmu; the X axis
sweeps the number of buckets (or initial number of buckets for qht-autoresize).
The plots in PNG format (and with errorbars) can be seen here:
http://imgur.com/a/Awgnq

Each test runs 5 times, and the entire QEMU process is pinned to a
single core for repeatability of results.

Host: Intel Xeon E5-2690

28 ++------------+-------------+-------------+-------------+------------++
A***** + + + master **A*** +
27 ++ * xxhash ##B###++
| A******A****** xxhash-rcu $$C$$$ |
26 C$$ A******A****** qht-fixed-nomru*%%D%%%++
D%%$$ A******A******A*qht-dyn-mru A*E****A
25 ++ %%$$ qht-dyn-nomru &&F&&&++
B#####% |
24 ++ #C$$$$$ ++
| B### $ |
| ## C$$$$$$ |
23 ++ # C$$$$$$ ++
| B###### C$$$$$$ %%%D
22 ++ %B###### C$$$$$$C$$$$$$C$$$$$$C$$$$$$C$$$$$$C
| D%%%%%%B###### @E@@@@@@ %%%D%%%@@@E@@@@@@E
21 E@@@@@@E@@@@@@F&&&@@@E@@@&&&D%%%%%%B######B######B######B######B######B
+ E@@@ F&&& + E@ + F&&& + +
20 ++------------+-------------+-------------+-------------+------------++
14 16 18 20 22 24
log2 number of buckets

Host: Intel i7-4790K

14.5 ++------------+------------+-------------+------------+------------++
A** + + + master **A*** +
14 ++ ** xxhash ##B###++
13.5 ++ ** xxhash-rcu $$C$$$++
| qht-fixed-nomru %%D%%% |
13 ++ A****** qht-dyn-mru @@E@@@++
| A*****A******A****** qht-dyn-nomru &&F&&& |
12.5 C$$ A******A******A*****A****** ***A
12 ++ $$ A*** ++
D%%% $$ |
11.5 ++ %% ++
B### %C$$$$$$ |
11 ++ ## D%%%%% C$$$$$ ++
| # % C$$$$$$ |
10.5 F&&&&&&B######D%%%%% C$$$$$$C$$$$$$C$$$$$$C$$$$$C$$$$$$ $$$C
10 E@@@@@@E@@@@@@B#####B######B######E@@@@@@E@@@%%%D%%%%%D%%%###B######B
+ F&& D%%%%%%B######B######B#####B###@@@D%%% +
9.5 ++------------+------------+-------------+------------+------------++
14 16 18 20 22 24
log2 number of buckets

Note that the original point before this patch series is X=15 for "master";
the little sensitivity to the increased number of buckets is due to the
poor hashing function in master.

xxhash-rcu has significant overhead due to the constant churn of allocating
and deallocating intermediate structs for implementing MRU. An alternative
would be do consider failed lookups as "maybe not there", and then
acquire the external lock (tb_lock in this case) to really confirm that
there was indeed a failed lookup. This, however, would not be enough
to implement dynamic resizing--this is more complex: see
"Resizable, Scalable, Concurrent Hash Tables via Relativistic
Programming" by Triplett, McKenney and Walpole. This solution was
discarded due to the very coarse RCU read critical sections that we have
in MTTCG; resizing requires waiting for readers after every pointer update,
and resizes require many pointer updates, so this would quickly become
prohibitive.

qht-fixed-nomru shows that MRU promotion is advisable for undersized
hash tables.

However, qht-dyn-mru shows that MRU promotion is not important if the
hash table is properly sized: there is virtually no difference in
performance between qht-dyn-nomru and qht-dyn-mru.

Before this patch, we're at X=15 on "xxhash"; after this patch, we're at
X=15 @ qht-dyn-nomru. This patch thus matches the best performance that we
can achieve with optimum sizing of the hash table, while keeping the hash
table scalable for readers.

The improvement we get before and after this patch for booting debian jessie
with arm-softmmu is:

- Intel Xeon E5-2690: 10.5% less time
- Intel i7-4790K: 5.2% less time

We could get this same improvement _for this particular workload_ by
statically increasing the size of the hash table. But this would hurt
workloads that do not need a large hash table. The dynamic (upward)
resizing allows us to start small and enlarge the hash table as needed.

A quick note on downsizing: the table is resized back to 2**15 buckets
on every tb_flush; this makes sense because it is not guaranteed that the
table will reach the same number of TBs later on (e.g. most bootup code is
thrown away after boot); it makes sense to grow the hash table as
more code blocks are translated. This also avoids the complication of
having to build downsizing hysteresis logic into qht.

Backports commit 909eaac9bbc2ed4f3a82ce38e905b87d478a3e00 from qemu
2018-03-13 14:16:26 -04:00
Lioncash e45c294405
Backport qht hashtable 2018-03-13 13:55:30 -04:00
Kevin Wolf 025e354370
qdict: Introduce qdict_rename_keys()
A few block drivers will need to rename .bdrv_create options for their
QAPIfication, so let's have a helper function for that.

Backports commit bcebf102ccc3c6db327f341adc379fdf0673ca6b from qemu
2018-03-12 10:11:48 -04:00
Lioncash a81439c7ca
exec: Drop unnecessary code for unicorn
The dirty memory code isn't strictly necessary
2018-03-12 10:11:46 -04:00
Alexey Kardashevskiy b90333a531
memory: Share special empty FlatView
This shares an cached empty FlatView among address spaces. The empty
FV is used every time when a root MR renders into a FV without memory
sections which happens when MR or its children are not enabled or
zero-sized. The empty_view is not NULL to keep the rest of memory
API intact; it also has a dispatch tree for the same reason.

On POWER8 with 255 CPUs, 255 virtio-net, 40 PCI bridges guest this halves
the amount of FlatView's in use (557 -> 260) and dispatch tables
(~800000 -> ~370000). In an unrelated experiment with 112 non-virtio
devices on x86 ("-M pc"), only 4 FlatViews are alive, and about ~2000
are created at startup.

Backports commit 092aa2fc65b7a35121616aad8f39d47b8f921618 from qemu
2018-03-11 22:34:28 -04:00
Alexey Kardashevskiy 1fd8b64072
memory: Get rid of address_space_init_shareable
Since FlatViews are shared now and ASes not, this gets rid of
address_space_init_shareable().

This should cause no behavioural change.

Backports commit b516572f31c0ea0937cd9d11d9bd72dd83809886 from qemu
2018-03-11 22:12:38 -04:00
Alexey Kardashevskiy f2c72dc278
memory: Share FlatView's and dispatch trees between address spaces
This allows sharing flat views between address spaces (AS) when
the same root memory region is used when creating a new address space.
This is done by walking through all ASes and caching one FlatView per
a physical root MR (i.e. not aliased).

This removes search for duplicates from address_space_init_shareable() as
FlatViews are shared elsewhere and keeping as::ref_count correct seems
an unnecessary and useless complication.

This should cause no change and memory use or boot time yet.

Backports commit 967dc9b1194a9281124b2e1ce67b6c3359a2138f from qemu
2018-03-11 22:05:44 -04:00
Alexey Kardashevskiy d9bc1bcc8c
memory: Rename mem_begin/mem_commit/mem_add helpers
This renames some helpers to reflect better what they do.

This should cause no behavioural change.

Backports commit 8629d3fcb77e9775e44d9051bad0fb5187925eae from qemu
2018-03-11 21:36:50 -04:00
Alexey Kardashevskiy aa2b76b4e8
memory: Switch memory from using AddressSpace to FlatView
FlatView's will be shared between AddressSpace's and subpage_t
and MemoryRegionSection cannot store AS anymore, hence this change.

In particular, for:

typedef struct subpage_t {
MemoryRegion iomem;
- AddressSpace *as;
+ FlatView *fv;
hwaddr base;
uint16_t sub_section[];
} subpage_t;

struct MemoryRegionSection {
MemoryRegion *mr;
- AddressSpace *address_space;
+ FlatView *fv;
hwaddr offset_within_region;
Int128 size;
hwaddr offset_within_address_space;
bool readonly;
};

This should cause no behavioural change.

Backports commit 166206845f7fd75e720e6feea0bb01957c8da07f from qemu
2018-03-11 21:21:37 -04:00
Lioncash 1591f208c0
memory: Move AddressSpaceDispatch from AddressSpace to FlatView
As we are going to share FlatView's between AddressSpace's,
and AddressSpaceDispatch is a structure to perform quick lookup
in FlatView, this moves ASD to FlatView.

After previosly open coded ASD rendering, we can also remove
as->next_dispatch as the new FlatView pointer is stored
on a stack and set to an AS atomically.

flatview_destroy() is executed under RCU instead of
address_space_dispatch_free() now.

This makes mem_begin/mem_commit to work with ASD and mem_add with FV
as later on mem_add will be taking FV as an argument anyway.

This should cause no behavioural change.

Backports commit 66a6df1dc6d5b28cc3e65db0d71683fbdddc6b62 from qemu
2018-03-11 20:40:24 -04:00
Marc-André Lureau aee9f7327f
machine: use class base init generated name
machine_class_base_init() member name is allocated by
machine_class_base_init(), but not freed by
machine_class_finalize().  Simply freeing there doesn't work,
because DEFINE_PC_MACHINE() overwrites it with a literal string.

Fix DEFINE_PC_MACHINE() not to overwrite it, and add the missing
free to machine_class_finalize().

Backports commit 8ea753718b2d1a42e9ce7b8db9f5e4e1f330e827 from qemu
2018-03-11 16:54:40 -04:00
Lioncash 8648b1df4f
include/elf: Update elf.h to commit f71a8eaffba3271cf7cdad95572f6996f7523a5b 2018-03-11 15:34:35 -04:00
Eduardo Habkost 7c7bb4c6d1
machine: Eliminate QEMUMachine and qemu_register_machine()
The struct is not used anymore and can be eliminated.

Backports commit 3b53e45f43825caaaf4fad6a5b85ce6a9949ff02 from qemu
2018-03-11 15:22:25 -04:00
Andreas Färber 048aaf05ca
Revert use of DEFINE_MACHINE() for registrations of multiple machines
The script used for converting from QEMUMachine had used one
DEFINE_MACHINE() per machine registered. In cases where multiple
machines are registered from one source file, avoid the excessive
generation of module init functions by reverting this unrolling.

Backports commit 8a661aea0e7f6e776c6ebc9abe339a85b34fea1d from qemu
2018-03-11 15:17:17 -04:00
Eduardo Habkost a7f59d7771
Use DEFINE_MACHINE() to register all machines
Convert all machines to use DEFINE_MACHINE() instead of QEMUMachine
automatically using a script.

Backports commit e264d29de28c5b0be3d063307ce9fb613b427cc3 from qemu
2018-03-11 15:12:46 -04:00
Eduardo Habkost 426b961644
machine: DEFINE_MACHINE() macro
The macro will allow easy registration of a TYPE_MACHINE subclass, using
only the machine name and a MachineClass initialization function as
parameter.

Backports commit ed0b6de343448d1014b53bcf541041373322fa1c from qemu
2018-03-11 14:42:12 -04:00
Eduardo Habkost 46e1c5482b
machine: Set MachineClass::name automatically
Now all TYPE_MACHINE subclasses use MACHINE_TYPE_NAME to generate the
class name. So instead of requiring each subclass to set
MachineClass::name manually, we can now set it automatically at the
TYPE_MACHINE class_base_init() function.

Backports commit 98cec76a7076c4a38e16f1a9de170a7942b3be54 from qemu
2018-03-11 14:38:58 -04:00
Eduardo Habkost 0261df973b
machine: Ensure all TYPE_MACHINE subclasses have the right suffix
Now that all non-abstract TYPE_MACHINE subclasses have the -machine
suffix, add an assert to ensure this will be always true.

Backports commit dcb3d601115eed77aef543fe3a920adc17544e06 from qemu
2018-03-11 14:30:38 -04:00
Eduardo Habkost df4cfe6804
machine: MACHINE_TYPE_NAME macro
The macro will be useful to ensure the machine class names follow the
right format to make machine class lookup by class name work correctly.

Backports commit c84a8f01b2a5d8bf98c447796d4a747333a5b1fd from qemu
2018-03-11 13:44:26 -04:00
Eduardo Habkost 940d2371ea
machine: Remove unused fields from QEMUMachine
This removes the following fields from QEMUMachine: family, alias,
reset, hot_add_cpu, units_per_default_bus, no_serial, no_parallel,
use_virtcon, use_sclp, no_floppy, no_cdrom, default_display,
compat_props, and hw_version.

The only users of those fields were already converted to use QOM and
MachineClass directly, so they are not needed anymore.

Backports commit d48f4fa69eb3efb03a2efe2e4606a97a17cf222f from qemu
2018-03-09 14:26:23 -05:00
Eduardo Habkost 12acb995fa
pc: Don't use QEMUMachine anymore
Now that we have a DEFINE_PC_MACHINE helper macro that just requires an
initialization function, it is trivial to convert them to register a QOM
machine class directly, instead of using QEMUMachine.

Backports commit 865906f7fdadd2732441ab158787f81f6a212bfe from qemu
2018-03-09 14:22:43 -05:00
Eduardo Habkost b65a3ece3b
machine: Remove unused fields from QEMUMachine
This removes the following fields from QEMUMachine: family, alias,
reset, hot_add_cpu, units_per_default_bus, no_serial, no_parallel,
use_virtcon, use_sclp, no_floppy, no_cdrom, default_display,
compat_props, and hw_version.

The only users of those fields were already converted to use QOM and
MachineClass directly, so they are not needed anymore.

Backports commit d48f4fa69eb3efb03a2efe2e4606a97a17cf222f from qemu
2018-03-09 13:41:30 -05:00
Marc-André Lureau 9ec040b74d
bus: simplify name handling
Simplify a bit the code by using g_strdup_printf() and store it in a
non-const value so casting is no longer needed, and ownership is
clearer.

Backports commit f73480c36f49562556b80bb5bf8acc45e20dcca1 from qemu
2018-03-09 13:02:15 -05:00
Thomas Huth af3cd62c4b
Introduce DEVICE_CATEGORY_CPU for CPU devices
Now that CPUs show up in the help text of "-device ?",
we should group them into an appropriate category.

Backports commit ba31cc7226ebcee639f18faa90c1542bd364fba3 from qemu
2018-03-09 13:00:32 -05:00
Peter Maydell 4149e877c4
configure: Drop ancient Solaris 9 and earlier support
Solaris 9 was released in 2002, its successor Solaris 10 was
released in 2005, and Solaris 9 was end-of-lifed in 2014.
Nobody has stepped forward to express interest in supporting
Solaris of any flavour, so removing support for the ancient
versions seems uncontroversial.

In particular, this allows us to remove a use of 'uname'
in configure that won't work if you're cross-compiling.

Backports commit 91939262ffcd3c85ea6a4793d3029326eea1d649 from qemu
2018-03-09 12:14:21 -05:00
Richard Henderson 7e327aaf84
util: Introduce include/qemu/cpuid.h
Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the
supplied <cpuid.h> does not contain the bit_AVX2 define that we use
when detecting whether the routine can be enabled.

Introduce a qemu-specific header that uses the compiler's definition
of __cpuid et al, but supplies any missing bit_* definitions needed.
This avoids introducing any extra ifdefs to util/bufferiszero.c, and
allows quite a few to be removed from tcg/i386/tcg-target.inc.c.

Backports commit 5dd8990841a9e331d9d4838a116291698208cbb6 from qemu
2018-03-09 12:12:00 -05:00
Peter Maydell 6d0e83d218
Drop remaining bits of ia64 host support
We dropped support for ia64 host CPUs in the 2.11 release (removing
the TCG backend for it, and advertising the support as being
completely removed in the changelog).  However there are a few bits
and pieces of code still floating about.  Remove those, too.

We can drop the check in configure for "ia64 or hppa host?"
entirely, because we don't support hppa hosts either any more.

Backports commit b1cef6d02f84bd842fb94a6109ad4e2ad873e8e5 from qemu
2018-03-09 11:54:57 -05:00
Markus Armbruster 3277400723
qapi: Move qapi-schema.json to qapi/, rename generated files
Move qapi-schema.json to qapi/, so it's next to its modules, and all
files get generated to qapi/, not just the ones generated for modules.

Consistently name the generated files qapi-MODULE.EXT:
qmp-commands.[ch] become qapi-commands.[ch], qapi-event.[ch] become
qapi-events.[ch], and qmp-introspect.[ch] become qapi-introspect.[ch].
This gets rid of the temporary hacks in scripts/qapi/commands.py,
scripts/qapi/events.py, and scripts/qapi/common.py.

Backports commit eb815e248f50cde9ab86eddd57eca5019b71ca78 from qemu
2018-03-09 11:35:11 -05:00
Markus Armbruster 5500a5e912
Include less of the generated modular QAPI headers
In my "build everything" tree, a change to the types in
qapi-schema.json triggers a recompile of about 4800 out of 5100
objects.

The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h,
qapi-types.h. Each of these headers still includes all its shards.
Reduce compile time by including just the shards we actually need.

To illustrate the benefits: adding a type to qapi/migration.json now
recompiles some 2300 instead of 4800 objects. The next commit will
improve it further.

Backports commit 9af2398977a78d37bf184d6ff6bd04c72bfbf006 from qemu
2018-03-09 10:06:19 -05:00
Laurent Vivier 5fa3a97549
softfloat: use floatx80_infinity in softfloat
Since f3218a8 ("softfloat: add floatx80 constants")
floatx80_infinity is defined but never used.

This patch updates floatx80 functions to use
this definition.

This allows to define a different default Infinity
value on m68k: the m68k FPU defines infinity with
all bits set to zero in the mantissa.

Backports commit 0f605c889ca3fe9744166ad4149d0dff6dacb696 from qemu
2018-03-09 01:34:45 -05:00
Laurent Vivier b42fcb5496
softfloat: export some functions
Move fpu/softfloat-macros.h to include/fpu/

Export floatx80 functions to be used by target floatx80
specific implementations.

Exports:
propagateFloatx80NaN(), extractFloatx80Frac(),
extractFloatx80Exp(), extractFloatx80Sign(),
normalizeFloatx80Subnormal(), packFloatx80(),
roundAndPackFloatx80(), normalizeRoundAndPackFloatx80()

Also exports packFloat32() that will be used to implement
m68k fsinh, fcos, fsin, ftan operations.

Backports commit 88857aca93f6ec8f372fb9c8201394b0e5582034 from qemu
2018-03-09 01:22:00 -05:00
Alex Bennée 4b2577537b
arm/translate-a64: add FP16 FR[ECP/SQRT]S to simd_three_reg_same_fp16
As some of the constants here will also be needed
elsewhere (specifically for the upcoming SVE support) we move them out
to softfloat.h.

Backports commit 026e2d6ef74000afb9049f46add4b94f594c8fb3 from qemu
2018-03-08 15:47:34 -05:00
Alex Bennée a02b9b81a9
arm/translate-a64: add FP16 FMULA/X/S to simd_three_reg_same_fp16
Backports commit 2deb992b767d28035fac3b374c7730494ff0b43d from qemu

Also backports the fp16 changes introduced in commit f566c0474a9b9bbd9ed248607e4007e24d3358c0
2018-03-08 15:42:48 -05:00
Alex Bennée e56ed38819
include/exec/helper-head.h: support f16 in helper calls
This allows us to explicitly pass float16 to helpers rather than
assuming uint32_t and dealing with the result. Of course they will be
passed in i32 sized registers by default.

Backports commit 35737497008aeabce5dc381a41d3827bec486192 from qemu
2018-03-08 12:28:05 -05:00
Alex Bennée 283abedc68
fpu/softfloat: re-factor sqrt
This is a little bit of a departure from softfloat's original approach
as we skip the estimate step in favour of a straight iteration. There
is a minor optimisation to avoid calculating more bits of precision
than we need however this still brings a performance drop, especially
for float64 operations.

Backports commit c13bb2da9eedfbc5886c8048df1bc1114b285fb0 from qemu
2018-03-08 12:23:54 -05:00
Alex Bennée e2fb4b40c3
fpu/softfloat: re-factor compare
The compare function was already expanded from a macro. I keep the
macro expansion but move most of the logic into a compare_decomposed.

Backports commit 0c4c90929143a530730e2879204a55a30bf63758 from qemu
2018-03-08 12:21:20 -05:00
Alex Bennée c38b64f8a9
fpu/softfloat: re-factor minmax
Let's do the same re-factor treatment for minmax functions. I still
use the MACRO trick to expand but now all the checking code is common.

Backports commit 89360067071b1844bf745682e18db7dde74cdb8d from qemu
2018-03-08 12:18:35 -05:00
Alex Bennée 9b296329f6
fpu/softfloat: re-factor scalbn
This is one of the simpler manipulations you could make to a floating
point number.

Backports commit 0bfc9f195209593e91a98cf2233753f56a2e5c02 from qemu
2018-03-08 12:16:19 -05:00