Commit graph

501 commits

Author SHA1 Message Date
Emilio G. Cota 8552d95c52
exec-all: extract tb->tc_* into a separate struct tc_tb
In preparation for adding tc.size to be able to keep track of
TB's using the binary search tree implementation from glib.

Backports commit e7e168f41364c6e83d0f75fc1b3ce7f9c41ccf76 from qemu
2018-03-05 02:57:22 -05:00
Emilio G. Cota 5fc83f3eb2
exec-all: introduce TB_PAGE_ADDR_FMT
And fix the following warning when DEBUG_TB_INVALIDATE is enabled
in translate-all.c:

CC mipsn32-linux-user/accel/tcg/translate-all.o
/data/src/qemu/accel/tcg/translate-all.c: In function ‘tb_alloc_page’:
/data/src/qemu/accel/tcg/translate-all.c:1201:16: error: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘tb_page_addr_t {aka unsigned int}’ [-Werror=format=]
printf("protecting code page: 0x" TARGET_FMT_lx "\n",
^
cc1: all warnings being treated as errors
/data/src/qemu/rules.mak:66: recipe for target 'accel/tcg/translate-all.o' failed
make[1]: *** [accel/tcg/translate-all.o] Error 1
Makefile:328: recipe for target 'subdir-mipsn32-linux-user' failed
make: *** [subdir-mipsn32-linux-user] Error 2
cota@flamenco:/data/src/qemu/build ((18f3fe1...) *$)$

Backports commit 67a5b5d2f6eb6d3b980570223ba5c478487ddb6f from qemu
2018-03-05 02:49:44 -05:00
Emilio G. Cota b4a7d8b773
exec-all: bring tb->invalid into tb->cflags
This gets rid of a hole in struct TranslationBlock.

Backports commit 84f1c148da2b35fbb5a436597872765257e8914e from qemu
2018-03-05 02:46:21 -05:00
Emilio G. Cota 210d13ec49
tcg: consolidate TB lookups in tb_lookup__cpu_state
This avoids duplicating code. cpu_exec_step will also use the
new common function once we integrate parallel_cpus into tb->cflags.

Note that in this commit we also fix a race, described by Richard Henderson
during review. Think of this scenario with threads A and B:

(A) Lookup succeeds for TB in hash without tb_lock
(B) Sets the TB's tb->invalid flag
(B) Removes the TB from tb_htable
(B) Clears all CPU's tb_jmp_cache
(A) Store TB into local tb_jmp_cache

Given that order of events, (A) will keep executing that invalid TB until
another flush of its tb_jmp_cache happens, which in theory might never happen.
We can fix this by checking the tb->invalid flag every time we look up a TB
from tb_jmp_cache, so that in the above scenario, next time we try to find
that TB in tb_jmp_cache, we won't, and will therefore be forced to look it
up in tb_htable.

Performance-wise, I measured a small improvement when booting debian-arm.
Note that inlining pays off:

Performance counter stats for 'taskset -c 0 qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=jessie.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

Before:
18714.917392 task-clock # 0.952 CPUs utilized ( +- 0.95% )
23,142 context-switches # 0.001 M/sec ( +- 0.50% )
1 CPU-migrations # 0.000 M/sec
10,558 page-faults # 0.001 M/sec ( +- 0.95% )
53,957,727,252 cycles # 2.883 GHz ( +- 0.91% ) [83.33%]
24,440,599,852 stalled-cycles-frontend # 45.30% frontend cycles idle ( +- 1.20% ) [83.33%]
16,495,714,424 stalled-cycles-backend # 30.57% backend cycles idle ( +- 0.95% ) [66.66%]
76,267,572,582 instructions # 1.41 insns per cycle
12,692,186,323 branches # 678.186 M/sec ( +- 0.92% ) [83.35%]
263,486,879 branch-misses # 2.08% of all branches ( +- 0.73% ) [83.34%]

19.648474449 seconds time elapsed ( +- 0.82% )

After, w/ inline (this patch):
18471.376627 task-clock # 0.955 CPUs utilized ( +- 0.96% )
23,048 context-switches # 0.001 M/sec ( +- 0.48% )
1 CPU-migrations # 0.000 M/sec
10,708 page-faults # 0.001 M/sec ( +- 0.81% )
53,208,990,796 cycles # 2.881 GHz ( +- 0.98% ) [83.34%]
23,941,071,673 stalled-cycles-frontend # 44.99% frontend cycles idle ( +- 0.95% ) [83.34%]
16,161,773,848 stalled-cycles-backend # 30.37% backend cycles idle ( +- 0.76% ) [66.67%]
75,786,269,766 instructions # 1.42 insns per cycle
12,573,617,143 branches # 680.708 M/sec ( +- 1.34% ) [83.33%]
260,235,550 branch-misses # 2.07% of all branches ( +- 0.66% ) [83.33%]

19.340502161 seconds time elapsed ( +- 0.56% )

After, w/o inline:
18791.253967 task-clock # 0.954 CPUs utilized ( +- 0.78% )
23,230 context-switches # 0.001 M/sec ( +- 0.42% )
1 CPU-migrations # 0.000 M/sec
10,563 page-faults # 0.001 M/sec ( +- 1.27% )
54,168,674,622 cycles # 2.883 GHz ( +- 0.80% ) [83.34%]
24,244,712,629 stalled-cycles-frontend # 44.76% frontend cycles idle ( +- 1.37% ) [83.33%]
16,288,648,572 stalled-cycles-backend # 30.07% backend cycles idle ( +- 0.95% ) [66.66%]
77,659,755,503 instructions # 1.43 insns per cycle
12,922,780,045 branches # 687.702 M/sec ( +- 1.06% ) [83.34%]
261,962,386 branch-misses # 2.03% of all branches ( +- 0.71% ) [83.35%]

19.700174670 seconds time elapsed ( +- 0.56% )

Backports commit f6bb84d53110398f4899c19dab4e0fe9908ec060 from qemu
2018-03-05 02:42:46 -05:00
Emilio G. Cota 68ddc0cb08
exec-all: fix typos in TranslationBlock's documentation
Backports commit eb5e2b9e3b141de0c435eedc31c26cbbdefbee1b from qemu
2018-03-05 02:10:28 -05:00
Peter Xu 0741c3880a
qom: provide root container for internal objs
We have object_get_objects_root() to keep user created objects, however
no place for objects that will be used internally. Create such a
container for internal objects.

Backports commit 7c47c4ead75d0b733ee8f2f51fd1de0644cc1308 from qemu
2018-03-05 01:16:50 -05:00
Peter Xu 4956effd11
bitmap: provide to_le/from_le helpers
Provide helpers to convert bitmaps to little endian format. It can be
used when we want to send one bitmap via network to some other hosts.

One thing to mention is that, these helpers only solve the problem of
endianess, but it does not solve the problem of different word size on
machines (the bitmaps managing same count of bits may contains different
size when malloced). So we need to take care of the size alignment issue
on the callers for now.

Backports commit d7788151a0807d5d2d410e3f8944d8c8a651f8d2 from qemu
2018-03-05 01:11:13 -05:00
Peter Xu 3d5fa79305
bitmap: introduce bitmap_count_one()
Count how many bits set in the bitmap.

Backports commit fc7deeea26af3d08f45bad85b8bd3fc3d790a090 from qemu
2018-03-05 01:08:29 -05:00
Cornelia Huck 6f9b7a9363
cpu: drop old comments describing members
These comments are obviously stale.

Backports commit 6fda014e1a65474c4877b36cc42e8a0f377817a4 from qemu
2018-03-05 00:03:31 -05:00
Eric Blake 3017797f7d
osdep.h: Prohibit disabling assert() in supported builds
We already have several files that knowingly require assert()
to work, sometimes because refactoring the code for proper
error handling has not been tackled yet; there are probably
other files that have a similar situation but with no comments
documenting the same. In fact, we have places in migration
that handle untrusted input with assertions, where disabling
the assertions risks a worse security hole than the current
behavior of losing the guest to SIGABRT when migration fails
because of the assertion. Promote our current per-file
safety-valve to instead be project-wide, and expand it to also
cover glib's g_assert().

Note that we do NOT want to encourage 'assert(side-effects);'
(that is a bad practice that prevents copy-and-paste of code to
other projects that CAN disable assertions; plus it costs
unnecessary reviewer mental cycles to remember whether a project
special-cases the crippling of asserts); and we would LIKE to
fix migration to not rely on asserts (but that takes a big code
audit). But in the meantime, we DO want to send a message
that anyone that disables assertions has to tweak code in order
to compile, making it obvious that they are taking on additional
risk that we are not going to support. At the same time, leave
comments mentioning NDEBUG in files that we know still need to
be scrubbed, so there is at least something to grep for.

It would be possible to come up with some other mechanism for
doing runtime checking by default, but which does not abort
the program on failure, while leaving side effects in place
(unlike how crippling assert() avoids even the side effects),
perhaps under the name q_verify(); but it was not deemed worth
the effort (developers should not have to learn a replacement
when the standard C macro works just fine, and it would be a lot
of churn for little gain). The patch specifically uses #error
rather than #warn so that a user is forced to tweak the header
to acknowledge the issue, even when not using a -Werror
compilation.

Backports commit 262a69f4282e44426c7a132138581d400053e0a1 from qemu
2018-03-05 00:01:57 -05:00
Richard Henderson bc23bab79d
tcg/s390: Use constant pool for movi
Split out maybe_out_small_movi for use with other operations
that want to add to the constant pool.

Backports commit 28eef8aaece5e83df4568d9842ab9611ec130b2c from qemu
2018-03-04 22:32:04 -05:00
Richard Henderson 31b8b67cd3
tcg: Move USE_DIRECT_JUMP discriminator to tcg/cpu/tcg-target.h
Replace the USE_DIRECT_JUMP ifdef with a TCG_TARGET_HAS_direct_jump
boolean test. Replace the tb_set_jmp_target1 ifdef with an unconditional
function tb_target_set_jmp_target.

While we're touching all backends, add a parameter for tb->tc_ptr;
we're going to need it shortly for some backends.

Move tb_set_jmp_target and tb_add_jump from exec-all.h to cpu-exec.c.

Backports commit a85833933628384d74ec412024d55cf012640287 from qemu
2018-03-04 21:52:35 -05:00
Peter Maydell 2070ef1c37
boards.h: Define new flag ignore_memory_transaction_failures
Define a new MachineClass field ignore_memory_transaction_failures.
If this is flag is true then the CPU will ignore memory transaction
failures which should cause the CPU to take an exception due to an
access to an unassigned physical address; the transaction will
instead return zero (for a read) or be ignored (for a write). This
should be set only by legacy board models which rely on the old
RAZ/WI behaviour for handling devices that QEMU does not yet model.
New board models should instead use "unimplemented-device" for all
memory ranges where the guest will attempt to probe for a device that
QEMU doesn't implement and a stub device is required.

We need this for ARM boards, where we're about to implement support for
generating external aborts on memory transaction failures. Too many
of our legacy board models rely on the RAZ/WI behaviour and we
would break currently working guests when their "probe for device"
code provoked an external abort rather than a RAZ.

Backports commit ed860129acd3fcd0b1e47884e810212aaca4d21b from qemu
2018-03-04 21:27:15 -05:00
Lluís Vilanova ed7225e685
tcg: Add generic translation framework
Backports commit bb2e0039dc07177f928f9fe24758967da02d60a2 from qemu
2018-03-04 14:31:16 -05:00
Paolo Bonzini 6997a5a090
gen-icount: check cflags instead of use_icount global
Backports commit cd42d5b23691ad73edfd6dbcfc935a960a9c5a65 from qemu
2018-03-04 14:26:26 -05:00
Lluís Vilanova 3a196c62ae
target: [tcg] Use a generic enum for DISAS_ values
Used later. An enum makes expected values explicit and
bounds the value space of switches.

Backports commit 77fc6f5e28667634916f114ae04c6029cd7b9c45 from qemu
2018-03-04 14:08:43 -05:00
Richard Henderson b8a16f841a
tcg: Add generic DISAS_NORETURN
This will allow some amount of cleanup to happen before
switching the backends over to enum DisasJumpType.

Backports commit 5dc66895b0113034cd37fd5e65911d7959fc26a9 from qemu
2018-03-04 13:49:18 -05:00
Peter Maydell c44d323359
cpu: Define new cpu_transaction_failed() hook
Currently we have a rather half-baked setup for allowing CPUs to
generate exceptions on accesses to invalid memory: the CPU has a
cpu_unassigned_access() hook which the memory system calls in
unassigned_mem_write() and unassigned_mem_read() if the current_cpu
pointer is non-NULL. This was originally designed before we
implemented the MemTxResult type that allows memory operations to
report a success or failure code, which is why the hook is called
right at the bottom of the memory system. The major problem with
this is that it means that the hook can be called even when the
access was not actually done by the CPU: for instance if the CPU
writes to a DMA engine register which causes the DMA engine to begin
a transaction which has been set up by the guest to operate on
invalid memory then this will casue the CPU to take an exception
incorrectly. Another minor problem is that currently if a device
returns a transaction error then this won't turn into a CPU exception
at all.

The right way to do this is to have allow the CPU to respond
to memory system transaction failures at the point where the
CPU specific code calls into the memory system.

Define a new QOM CPU method and utility function
cpu_transaction_failed() which is called in these cases.
The functionality here overlaps with the existing
cpu_unassigned_access() because individual target CPUs will
need some work to convert them to the new system. When this
transition is complete we can remove the old cpu_unassigned_access()
code.

Backports commit 0dff0939f6fc6a7abd966d4295f06a06d7a01df9 from qemu
2018-03-04 13:11:50 -05:00
Peter Maydell 26c8f31d9e
memory.h: Move MemTxResult type to memattrs.h
Move the MemTxResult type to memattrs.h. We're going to want to
use it in cpu/qom.h, which doesn't want to include all of
memory.h. In practice MemTxResult and MemTxAttrs are pretty
closely linked since both are used for the new-style
read_with_attrs and write_with_attrs callbacks, so memattrs.h
is a reasonable home for this rather than creating a whole
new header file for it.

Backports commit 3114d092b1740f9db9aa559aeb48ee387011e1da from qemu
2018-03-04 13:10:47 -05:00
Eduardo Habkost 382022929e
cpu: cpu_by_arch_id() helper
The helper can be used for CPU object lookup using the CPU's
arch-specific ID (the one returned by CPUClass::get_arch_id()).

Backports commit 5ce46cb34eecec0bc94a4b1394763f9a1bbe20c3 from qemu
2018-03-04 12:16:39 -05:00
Alexey Kardashevskiy e723b8dd49
memory: Open code FlatView rendering
We are going to share FlatView's between AddressSpace's and per-AS
memory listeners won't suit the purpose anymore so open code
the dispatch tree rendering.

Since there is a good chance that dispatch_listener was the only
listener, this avoids address_space_update_topology_pass() if there is
no registered listeners; this should improve starting time.

This should cause no behavioural change.

Backports commit 1b04a1580917d9e41fd37ca62cbff9b4bf061e96 from qemu
2018-03-04 02:06:48 -05:00
Eric Blake be742759b0
osdep: Fix ROUND_UP(64-bit, 32-bit)
When using bit-wise operations that exploit the power-of-two
nature of the second argument of ROUND_UP(), we still need to
ensure that the mask is as wide as the first argument (done
by using a ternary to force proper arithmetic promotion).
Unpatched, ROUND_UP(2ULL*1024*1024*1024*1024, 512U) produces 0,
instead of the intended 2TiB, because negation of an unsigned
32-bit quantity followed by widening to 64-bits does not
sign-extend the mask.

Broken since its introduction in commit 292c8e50 (v1.5.0).
Callers that passed the same width type to both macro parameters,
or that had other code to ensure the first parameter's maximum
runtime value did not exceed the second parameter's width, are
unaffected, but I did not audit to see which (if any) existing
clients of the macro could trigger incorrect behavior (I found
the bug while adding a new use of the macro).

While preparing the patch, checkpatch complained about poor
spacing, so I also fixed that here and in the nearby DIV_ROUND_UP.

Backports commit 33a599667a9e70588483a31286dfff8cfc27d513 from qemu
2018-03-04 01:54:09 -05:00
Michael S. Tsirkin fd472c53c6
Revert "cpu: add APIs to allocate/free CPU environment"
This reverts commit e2a7f28693aea7e194ec1435697ec4feb24f8a6f.

This was not supposed to go upstream yet. Reverting.

Backports commit cde0a63ad721dbb538419a00f9405587680be436 from qemu
2018-03-04 01:42:49 -05:00
Michael S. Tsirkin 71bf994214
cpu: add APIs to allocate/free CPU environment
These will be implemented and then used by follow-up patches.

Backports commit e2a7f28693aea7e194ec1435697ec4feb24f8a6f from qemu
2018-03-04 01:39:09 -05:00
Lluís Vilanova 32b3c3815d
tcg: Pass generic CPUState to gen_intermediate_code()
Needed to implement a target-agnostic gen_intermediate_code()
in the future.

Backports commit 9c489ea6bed134fecfd556b439c68bba48fbe102 from qemu
2018-03-03 23:34:18 -05:00
Richard Henderson fc52eea5e2
tcg: Expand glue macros before stringifying helper names
Backports commit 44368ac62dc5ba014b68b2c1a8ec6fedc3242a5d from qemu
2018-03-03 23:07:21 -05:00
Alex Bennée 7d02489baf
include/exec/exec-all: document common exit conditions
As a precursor to later patches attempt to come up with a more
concrete wording for what each of the common exit cases would be.

Backports commit df0311e634828fdc99ca59352aef68503d631aad from qemu
2018-03-03 22:31:28 -05:00
Peter Maydell 3bd5694a0a
memory: Rename memory_region_init_rom() and _rom_device() to _nomigrate()
Rename memory_region_init_rom() to memory_region_init_rom_nomigrate()
and memory_region_init_rom_device() to
memory_region_init_rom_device_nomigrate().

Backports commit b59821a95bd1d7cb4697fd7748725c910582e0e7 from qemu
2018-03-03 22:29:01 -05:00
Peter Maydell 7b0027a828
memory: Rename memory_region_init_ram() to memory_region_init_ram_nomigrate()
Rename memory_region_init_ram() to memory_region_init_ram_nomigrate().
This leaves the way clear for us to provide a memory_region_init_ram()
which does handle migration.

Backports commit 1cfe48c1ce219b60a9096312f7a61806fae64ab3 from qemu
2018-03-03 22:25:39 -05:00
Peter Maydell 152c56f6a9
memory: Document that the RAM MR initializers do not handle migration
The various functions for initializing RAM MemoryRegions do not do
anything to cause the data in the MemoryRegion to be migrated.
Note in their documentation comments that this is the responsibility
of the caller.

(We will shortly add a new function that *does* do this for you.)

Backports commit a5c0234bb2754f5248e67929a34c843dbe039da5 from qemu
2018-03-03 22:20:32 -05:00
Peter Maydell 3c2d3d8363
include/hw/boards.h: Document memory_region_allocate_system_memory()
Add a documentation comment for memory_region_allocate_system_memory().

In particular, the reason for this function's existence and the
requirement on board code to call it exactly once are non-obvious.

Backports commit 09ad643823dcda0a86eddce1291c28d0ccb09a3b from qemu
2018-03-03 22:18:49 -05:00
Igor Mammedov fe4152c6a5
qom: enforce readonly nature of link's check callback
link's check callback is supposed to verify/permit setting it,
however currently nothing restricts it from misusing it
and modifying target object from within.
Make sure that readonly semantics are checked by compiler
to prevent callback's misuse.

Backports commit 8f5d58ef2c92d7b82d9a6eeefd7c8854a183ba4a from qemu
2018-03-03 22:17:20 -05:00
Pranith Kumar d0a70720a3
Revert "exec.c: Fix breakpoint invalidation race"
Now that we have proper locking after MTTCG patches have landed, we
can revert the commit. This reverts commit

a9353fe897ca2687e5b3385ed39e3db3927a90e0.

Backports commit 406bc339b0505fcfc2ffcbca1f05a3756e338a65 from qemu
2018-03-03 22:14:35 -05:00
Yang Zhong 1135db176f
tcg: add CONFIG_TCG guards in headers
Add CONFIG_TCG around TLB-related functions and structure declarations.
Some of these functions are defined in ./accel/tcg/cputlb.c, which will
not be linked in if TCG is disabled, and have no stubs; therefore, their
callers will also be compiled out for --disable-tcg.

Backports commit b11ec7f2e44b285a3967d629b55d1a6970b06787 from qemu
2018-03-03 21:37:52 -05:00
Yang Zhong d70c141675
tcg: move page_size_init() function
translate-all.c will be disabled if tcg is disabled in the build,
so page_size_init() function and related variables will be moved
to exec.c file.

Backports commit a0be0c585f5dcc4d50a37f6a20d3d625c5ef3a2c from qemu
2018-03-03 21:30:08 -05:00
Thomas Huth cf5d583ef0
cpu: Introduce a wrapper for tlb_flush() that can be used in common code
Commit 1f5c00cfdb8114c ("qom/cpu: move tlb_flush to cpu_common_reset")
moved the call to tlb_flush() from the target-specific reset handlers
into the common code qom/cpu.c file, and protected the call with
"#ifdef CONFIG_SOFTMMU" to avoid that it is called for linux-user
only targets. But since qom/cpu.c is common code, CONFIG_SOFTMMU is
*never* defined here, so the tlb_flush() was simply never executed
anymore. Fix it by introducing a wrapper for tlb_flush() in a file
that is re-compiled for each target, i.e. in translate-all.c.

Backports commit 2cd53943115be5118b5b2d4b80ee0a39c94c4f73 from qemu
2018-03-03 21:24:55 -05:00
Emilio G. Cota f66e74d65b
tcg: consistently access cpu->tb_jmp_cache atomically
Some code paths can lead to atomic accesses racing with memset()
on cpu->tb_jmp_cache, which can result in torn reads/writes
and is undefined behaviour in C11.

These torn accesses are unlikely to show up as bugs, but from code
inspection they seem possible. For example, tb_phys_invalidate does:
/* remove the TB from the hash list */
h = tb_jmp_cache_hash_func(tb->pc);
CPU_FOREACH(cpu) {
if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
atomic_set(&cpu->tb_jmp_cache[h], NULL);
}
}
Here atomic_set might race with a concurrent memset (such as the
ones scheduled via "unsafe" async work, e.g. tlb_flush_page) and
therefore we might end up with a torn pointer (or who knows what,
because we are under undefined behaviour).

This patch converts parallel accesses to cpu->tb_jmp_cache to use
atomic primitives, thereby bringing these accesses back to defined
behaviour. The price to pay is to potentially execute more instructions
when clearing cpu->tb_jmp_cache, but given how infrequently they happen
and the small size of the cache, the performance impact I have measured
is within noise range when booting debian-arm.

Note that under "safe async" work (e.g. do_tb_flush) we could use memset
because no other vcpus are running. However I'm keeping these accesses
atomic as well to keep things simple and to avoid confusing analysis
tools such as ThreadSanitizer.

Backports commit f3ced3c59287dabc253f83f0c70aa4934470c15e from qemu
2018-03-03 21:12:36 -05:00
Emilio G. Cota 1a4e5da043
gen-icount: use tcg_ctx.tcg_env instead of cpu_env
We are relying on cpu_env being defined as a global, yet most
targets (i.e. all but arm/a64) have it defined as a local variable.
Luckily all of them use the same "cpu_env" name, but really
compilation shouldn't break if the name of that local variable
changed.

Fix it by using tcg_ctx.tcg_env, which all targets set in their
translate_init function. This change also helps paving the way
for the upcoming "translation loop common to all targets" work.

Backports commit 53f6672bcf57d82b794a2cc3a3469be7d35c8653 from qemu
2018-03-03 21:08:58 -05:00
Laurent Vivier 4e8e8572c3
softfloat: define floatx80_round()
Add a function to round a floatx80 to the defined precision
(floatx80_rounding_precision)

Backports commit 0f72129281765ed64d26353284059f2bdcde7a23 from qemu
2018-03-03 20:57:27 -05:00
Marc-André Lureau ca25248ecd
object: add uint property setter/getter
Backports commit 3152779cd63ba41331ef41659406f65b03e7911a from qemu
2018-03-03 18:43:17 -05:00
Marc-André Lureau 6ca6050206
qnum: add uint type
In order to store integer values between INT64_MAX and UINT64_MAX, add
a uint64_t internal representation.

Backports commit 61a8f418b26a2d974e38e4ae55020aca8d402d88 from qemu
2018-03-03 18:37:56 -05:00
Marc-André Lureau a57d8a5b50
qapi: Remove visit_start_alternate() parameter promote_int
Before the previous commit, parameter promote_int = true made
visit_start_alternate() with an input visitor avoid QTYPE_QINT
variants and create QTYPE_QFLOAT variants instead. This was used
where QTYPE_QINT variants were invalid.

The previous commit fused QTYPE_QINT with QTYPE_QFLOAT, rendering
promote_int useless and unused.

Backports commit 60390d2dc85ffade8981ca41e02335cb07353a6d from qemu
2018-03-03 18:34:35 -05:00
Marc-André Lureau dd77730d49
qapi: merge QInt and QFloat in QNum
We would like to use a same QObject type to represent numbers, whether
they are int, uint, or floats. Getters will allow some compatibility
between the various types if the number fits other representations.

Add a few more tests while at it.

Backports commit 01b2ffcedd94ad7b42bc870e4c6936c87ad03429 from qemu
2018-03-03 18:16:28 -05:00
Markus Armbruster e9174563be
qapi: Document intended use of @name within alternate visits
Backports commit ed0ba0f47e8cb6d924db0a54090bbb7b095fe9ea from qemu
2018-03-03 17:37:12 -05:00
Markus Armbruster 5ab0d5af81
qapi: New QAPI_CLONE_MEMBERS()
QAPI_CLONE() returns a newly allocated QAPI object. Inconvenient when
we want to clone into an existing object. QAPI_CLONE_MEMBERS() does
exactly that.

Backports commit 4626a19c86c30d96cedbac2bd44ef8103303cb37 from qemu
2018-03-03 17:36:02 -05:00
Eric Blake 734778da93
qobject: Add helper macros for common scalar insertions
Rather than making lots of callers wrap a scalar in a QInt, QString,
or QBool, provide helper macros that do the wrapping automatically.

Update the Coccinelle script to make mass conversions easy, although
the conversion itself will be done as a separate patches to ease
review and backport efforts.

Backports commit a92c21591b5bb9543996538f14854ca6b528318b from qemu
2018-03-03 17:33:30 -05:00
Richard Henderson 68275ba6f3
tcg/arm: Use indirect branch for goto_tb
Backports commit 3fb53fb4d12f2e7833bd1659e6013237b130ef20 from qemu
2018-03-03 17:11:18 -05:00
Emilio G. Cota d3ada2feb5
tcg: allocate TB structs before the corresponding translated code
Allocating an arbitrarily-sized array of tbs results in either
(a) a lot of memory wasted or (b) unnecessary flushes of the code
cache when we run out of TB structs in the array.

An obvious solution would be to just malloc a TB struct when needed,
and keep the TB array as an array of pointers (recall that tb_find_pc()
needs the TB array to run in O(log n)).

Perhaps a better solution, which is implemented in this patch, is to
allocate TB's right before the translated code they describe. This
results in some memory waste due to padding to have code and TBs in
separate cache lines--for instance, I measured 4.7% of padding in the
used portion of code_gen_buffer when booting aarch64 Linux on a
host with 64-byte cache lines. However, it can allow for optimizations
in some host architectures, since TCG backends could safely assume that
the TB and the corresponding translated code are very close to each
other in memory. See this message by rth for a detailed explanation:

https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html
Subject: Re: GSoC 2017 Proposal: TCG performance enhancements

Backports commit 6e3b2bfd6af488a896f7936e99ef160f8f37e6f2 from qemu
2018-03-03 17:05:49 -05:00
Emilio G. Cota 7d0440dec4
tb-hash: improve tb_jmp_cache hash function in user mode
Optimizations to cross-page chaining and indirect branches make
performance more sensitive to the hit rate of tb_jmp_cache.
The constraint of reserving some bits for the page number
lowers the achievable quality of the hashing function.

However, user-mode does not have this requirement. Thus,
with this change we use for user-mode a hashing function that
is both faster and of better quality than the previous one.

Measurements:

Note: baseline (i.e. speedup == 1x) is QEMU v2.9.0.

- SPECint06 (test set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

2.2x +-+--------------------------------------------------------------------------------------------------------------+-+
| |
| jr |
2x +jr+multhash +....................................................+++++...................................+-+
| jr+hash |$$$ |
| |$+$ |
| ### $ |
1.8x +-+......................................................................#|#.$...................................+-+
| ++#+# $ |
| |# # $ |
1.6x +-+....................................................................***.#.$....................++$$$..........+-+
| $$$ *+* # $ |$+$ |
| ++$$$ ### $ * * # $ +++|$ $ |
| ++###+$ # # $ * * # $ ### ****## $ |
1.4x +-+...................***+#.$.........***.#.$..........................*.*.#.$...........#+#$$.*++*|#.$..........+-+
| *+* # $ * * # $ * * # $ # # $ * *+# $ |
| * * # $ +++++ * * # $ * * # $ *** # $ * * # $ ###$$ |
1.2x +-+...................*.*.#.$.***##$$.*.*.#.$..........................*.*.#.$.........*.*.#.$.*..*.#.$.***+#+$..+-+
| * * # $ *+* # $ * * # $ +++ * * # $ ++###$$ * * # $ * * # $ * * # $ |
| ***##$$ * * # $ * * # $ * * # $ ***##$$ ++### * * # $ *** #+$ * * # $ * * # $ * * # $ |
| *+*+#+$ ***##$$$ * * # $ * * # $ * * # $ *+* # $ ++####$$ ***+# * * # $ * * # $ * * # $ * * # $ * * # $ |
1x +-++-*+*+#+$+*+*+#-+$+*+*-#+$+*+*+#+$+*+*+#+$+*-*+#+$+***++#+$+*+*+#$$+*+*+#+$+*+*+#+$+*+*-#+$+*+-*+#+$+*+*+#+$-++-+
| * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ |
| * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ * * # $ |
0.8x +-+--***##$$-***##$$$-***##$$-***##$$-***##$$-***##$$-***###$$-***##$$-***##$$-***##$$-***##$$-****##$$-***##$$--+-+
astar bzip2 gcc gobmk h264ref hmmlibquantum mcf omnetpperlbench sjengxalancbmk hmean
png: http://imgur.com/4UXTrEc

Here I also tried the hash function suggested by Paolo ("multhash"):

return ((uint64_t) (pc * 2654435761) >> 32) & (TB_JMP_CACHE_SIZE - 1);

As you can see it is just as good as the other new function ("hash"),
which is what I ended up going with.

- SPECint06 (train set), x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

2.6x +-+--------------------------------------------------------------------------------------------------------------+-+
| |
| jr ### |
2.4x +jr+hash...........................................................................................#.#...........+-+
| # # |
| # # |
2.2x +-+................................................................................................#.#...........+-+
| # # |
| # # |
2x +-+................................................................................................#.#...........+-+
| **** # |
| * * # |
1.8x +-+.............................................................................................*..*.#...........+-+
| +++ * * # |
| #### #### * * # |
1.6x +-+......................................####.............................#..#.****..#..........*..*.#...........+-+
| +++ #++# **** # * * # #### * * # |
| ### # # * * # * * # # # * * # |
1.4x +-+...................****+#..........****..#..........................*..*..#.*..*..#....#..#..*..*.#...........+-+
| *++* # * * # * * # * * # *** # * * # #### |
| * * # #### * * # * * # * * # * * # * * # **** # |
1.2x +-+...................*..*.#..****++#.*..*..#..........................*..*..#.*..*..#..*.*..#..*..*.#..*..*..#..+-+
| ****### * * # * * # * * # * * # * * # * * # * * # * * # |
| * * # ***### * * # * * # * * # ****## * * # * * # * * # * * # * * # |
1x +-+--****###--***###--****##--****###-****###--***###--***###--****##--****###-****###--***###--****##--****###--+-+
astar bzip2 gcc gobmk h264ref hmmlibquantum mcf omnetpperlbench sjengxalancbmk hmean
png: http://imgur.com/ArCbHqo

- NBench, x86_64-linux-user. Host: Intel i7-6700K @ 4.00GHz

1.12x +-+-------------------------------------------------------------------------------------------------------------+-+
| |
| jr +++ |
1.1x +jr+hash...........................................................####.........................................+-+
| +++#| # |
| | #++# |
1.08x +-+................................+++................+++.+++..*****..#.........................................+-+
| | +++ | | * | * # |
| | | | | *+++* # |
1.06x +-+................................****###.............|...|...*...*..#.........................+++.............+-+
| *| * |# ****### * * # | |
| *| *++# *| * |# * * # #### |
1.04x +-+................................*++*..#............*|.*.|#..*...*..#........................#.|#.............+-+
| * * # *++*++# * * # +++#++# |
| * * # * * # * * # | # # +++#### |
1.02x +-+................................*..*..#......+++...*..*..#..*...*..#.....................****..#..*****++#...+-+
| +++ * * # +++ | * * # * * # +++ *| * # *+++* # |
| +++ | +++ +++ ++++++ * * # *****### * * # * * # | +++ ++++++ *++* # * * # |
1x +-++-+++++####++****###++++-+####+-*++*++#-+*+++*-+#++*++*++#++*+-+*++#+-+++####-+*****###++*++*++#++*+-+*++#+-++-+
| *****| # *++* |# *****| # * * # * *++# * * # * * # **** |# * * # * * # * * # |
| * | *| # * *++# * | *++# * * # * * # * * # * * # *| *++# * * # * * # * * # |
0.98x +-+...*.|.*++#..*..*..#..*+++*..#..*..*..#..*...*..#..*..*..#..*...*..#..*++*..#..*...*..#..*..*..#..*...*..#...+-+
| *+++* # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # |
| * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # |
0.96x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
ASSIGNMENT BITFIELD FOURFP EMULATION HUFFMAN LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT hmean
png: http://imgur.com/ZXFX0hJ

- NBench, arm-linux-user. Host: Intel i7-4790K @ 4.00GHz

1.3x +-+-------------------------------------------------------------------------------------------------------------+-+
| #### |
| jr # # +++ |
1.25x +jr+hash.....................#..#...........................................####................................+-+
| # # # # |
| # # # # |
1.2x +-+..........................#..#...........................................#..#................................+-+
| # # # # |
| # # # # |
1.15x +-+..........................#..#...........................................#..#................................+-+
| # # #### # # |
| # # # # # # |
1.1x +-+..........................#..#..................................#..#.....#..#................................+-+
| # # # # # # +++ |
| # # #### # # # # #### |
1.05x +-+..........................#..#...............#..#.....####......#..#.....#..#.........................#..#...+-+
| # # # # # # # # # # +++ # # |
| +++ ***** # #### ***** # # # +++# # **** # ****### # # |
1x +-++-+*****###++****+++++*+-+*++#+-****++#-+*+++*-+#+++++#++#++*****++#+-*++*++#-+*****-++++*++*++#++*****++#+-++-+
| * * # * * | * * # * * # * * # **** # * * # * * # * *### * *++# * * # |
| * * # * *### * * # * * # * * # * * # * * # * * # * * # * * # * * # |
0.95x +-+...*...*..#..*..*.|#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#..*..*..#..*...*..#...+-+
| * * # * * |# * * # * * # * * # * * # * * # * * # * * # * * # * * # |
| * * # * * |# * * # * * # * * # * * # * * # * * # * * # * * # * * # |
0.9x +-+---*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###--****###--*****###---+-+
ASSIGNMENT BITFIELD FOURFP EMULATION HUFFMAN LU DECOMPOSITIONEURAL NNUMERIC SOSTRING SORT hmean
png: http://imgur.com/FfD27ey

Backports commit 6f1653180f5701c6a8f1b35b89a80b1e3260928e from qemu
2018-03-03 14:11:29 -05:00
Emilio G. Cota 8f4f15e5f5
tcg: Introduce goto_ptr opcode and tcg_gen_lookup_and_goto_ptr
Instead of exporting goto_ptr directly to TCG frontends, export
tcg_gen_lookup_and_goto_ptr(), which calls goto_ptr with the pointer
returned by the lookup_tb_ptr() helper. This is the only use case
we have for goto_ptr and lookup_tb_ptr, so having this function is
very convenient. Furthermore, it trivially allows us to avoid calling
the lookup helper if goto_ptr is not implemented by the backend.

Backports commit cedbcb01529cb6cf9a2289cdbebbc63f6149fc18 from qemu
2018-03-02 21:05:18 -05:00