Commit graph

98 commits

Author SHA1 Message Date
Pavel Krajcevski 9144db4de6 Actually pass block coordinates to shape selection function 2014-03-22 19:25:21 -04:00
Pavel Krajcevski 891e2cfee8 Formatting 2014-03-22 19:24:51 -04:00
Pavel Krajcevski 9f259744de Get rid of comment 2014-03-21 20:36:54 -04:00
Pavel Krajcevski e936cce0cb More refactoring.
Change RGBACluster to be a class that only really persists once per block.
When we switch shapes and do operations on them, then we really only need
to change which points in the block are accessed. We don't need to do this
very often, so just change the mask whenever we need it. This brings us back
closer to our original performance, but we're still not where we were when
we started refactoring.
2014-03-21 20:27:00 -04:00
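The mask idea described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual RGBACluster class: the name `MaskedCluster`, its methods, and the 16-bit mask layout (bit i selects texel i of a 4x4 block) are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a per-block cluster: the 16 texels of a 4x4
// block are copied once, and a 16-bit mask selects the current subset.
class MaskedCluster {
 public:
  explicit MaskedCluster(const std::array<uint32_t, 16> &pixels)
    : m_Pixels(pixels), m_Mask(0xFFFF) { }

  // Switching shapes only swaps the mask; no pixel data is copied.
  void SetMask(uint16_t mask) { m_Mask = mask; }

  // Number of texels selected by the current mask.
  uint32_t GetNumPoints() const {
    uint32_t n = 0;
    for (uint32_t i = 0; i < 16; i++) {
      n += (m_Mask >> i) & 1u;
    }
    return n;
  }

  // Returns the idx-th texel among those selected by the current mask.
  uint32_t GetPoint(uint32_t idx) const {
    assert(idx < GetNumPoints());
    for (uint32_t i = 0; i < 16; i++) {
      if ((m_Mask >> i) & 1u) {
        if (idx == 0) return m_Pixels[i];
        idx--;
      }
    }
    return 0;  // Not reached when the precondition holds.
  }

 private:
  std::array<uint32_t, 16> m_Pixels;
  uint16_t m_Mask;
};
```

The linear scan in `GetPoint` keeps the sketch short; a real implementation would likely cache the selected indices whenever the mask changes.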
Pavel Krajcevski cf937f2ad3 Refactor shape and mode selection
We suffered another performance hit. This time it comes from the fact
that we're copying around a lot of data based on what partition we're
choosing. We can get rid of this a tad by only copying the data that we
need once and then using getters/setters that selectively pull from
an array based on our shape index.
2014-03-21 18:02:02 -04:00
Pavel Krajcevski 26e816b3db Add settings for BPTC compression 2014-03-21 12:45:47 -04:00
Pavel Krajcevski 6954d7b154 Refactor RGBAEndpoints
Changed the RGBAEndpoints to use the vector/matrix classes in
FasTCBase. This caused a ~20ms performance hit on an 8-core machine
which is likely due to the compiler having difficulty compiling away
some procedure call overheads. Upon profiling, the biggest bottleneck
is still by far the QuantizedError function, so any and all further
optimization should be focused on that.
2014-03-21 01:21:07 -04:00
Pavel Krajcevski e06f60c536 Fix some compiler warnings. 2014-03-21 01:14:36 -04:00
Pavel Krajcevski c6948e8421 Merge branch 'master' into ModularizeBPTC 2014-02-27 14:20:50 -05:00
Pavel Krajcevski 1a5b748b2c Check for C++11 types in base library 2014-01-30 13:55:55 -05:00
Pavel Krajcevski c37dca1068 Split calculation of compression parameters from packing them. 2014-01-21 16:23:18 -05:00
Pavel Krajcevski ea953979fe Move bitstream to FasTC base lib 2014-01-21 15:04:39 -05:00
Pavel Krajcevski f12ee09f7e Some formatting and rearrange the BPTC code to be more structured like the others 2014-01-21 14:46:25 -05:00
Pavel Krajcevski 3734d643a6 Fix some compiler warnings on MSVC 2013-12-02 12:52:44 -05:00
Pavel Krajcevski 7359f9e758 Some compilers treat hex literals as unsigned, which causes problems 2013-11-19 14:54:59 -05:00
Pavel Krajcevski 6794a0fffb Add hooks to NVTT bc7_export library if present on the user's machine. Assumes that all of the cross platform problems are fixed for incorporation into FasTC... Otherwise the options to use NVTT are ignored. 2013-11-19 12:03:03 -05:00
Pavel Krajcevski a80944901e Refactor CompressionJob struct.
In order to better facilitate the change from block stream order to non-block stream order,
a lot of changes were introduced to the way that we feed texture data to the compressors. This
data is embodied in the CompressionJob struct. We have made it so that the compression job
points to both the in and out pointers for our compressed and uncompressed data. Furthermore,
we have made sure that the struct also contains the format that it's compressing for, so that if
any threading programs would like to chop up a compression job into smaller chunks based on the
format, it doesn't need to know the format explicitly, it just needs to know certain properties
about the format.

Moreover, the user can now define the start and end pixels from which we would like to compress
to. We can compress subsets of data by changing the in and out pointers and the width and height
values. The compressors will read data linearly until they reach the out pixels based on the width
of the given pixel.
2013-11-08 16:31:19 -05:00
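A rough sketch of how a job struct like the one described here lets a threading layer split work using only format properties. Everything below is hypothetical: the `Job` and `BlockInfo` types, the `Format` enum, and `SplitIntoBlockRows` are illustrative names rather than the real CompressionJob interface. It also assumes RGBA8 input in row-major order and image dimensions that are multiples of the block size.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical format list; both formats use 4x4 texel blocks.
enum class Format { BPTC, DXT1 };

struct BlockInfo { uint32_t width, height, bytes; };

inline BlockInfo GetBlockInfo(Format f) {
  switch (f) {
    case Format::BPTC: return {4, 4, 16};  // 128 bits per block
    case Format::DXT1: return {4, 4, 8};   // 64 bits per block
  }
  return {4, 4, 16};
}

// Hypothetical job description: in/out pointers, dimensions, and format.
struct Job {
  Format format;
  const uint8_t *in;   // Uncompressed RGBA8 pixels, row-major.
  uint8_t *out;        // Compressed blocks, row-major by block.
  uint32_t width;
  uint32_t height;
};

// Splits a job into chunks of whole block rows. Only the in/out pointers
// and the height change, so the caller never inspects the format directly.
std::vector<Job> SplitIntoBlockRows(const Job &job, uint32_t blockRowsPerChunk) {
  std::vector<Job> chunks;
  if (blockRowsPerChunk == 0) return chunks;

  const BlockInfo bi = GetBlockInfo(job.format);
  const uint32_t blocksPerRow = job.width / bi.width;
  const uint32_t chunkHeight = blockRowsPerChunk * bi.height;

  for (uint32_t y = 0; y < job.height; y += chunkHeight) {
    Job c = job;
    c.in = job.in + static_cast<size_t>(y) * job.width * 4;
    c.out = job.out + static_cast<size_t>(y / bi.height) * blocksPerRow * bi.bytes;
    c.height = std::min(chunkHeight, job.height - y);
    chunks.push_back(c);
  }
  return chunks;
}
```

Each chunk can then be handed to a separate worker, since every chunk reads and writes a disjoint range of the two buffers.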
Pavel Krajcevski f70b26a47f Change interface of compression/decompression jobs. 2013-11-06 18:55:53 -05:00
Pavel Krajcevski 8e76d149ba Remove a bunch of code that assumes that we get our pixel data in block stream order... 2013-11-06 18:23:19 -05:00
Pavel Krajcevski 289bcc9d44 Make the block index for the stat function the pointer reinterpreted as an integer. This way we know exactly what block it is because we simply need to sort the stats in the output log. 2013-09-28 22:39:27 -04:00
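As an illustration of the idea in this commit, the sketch below derives a sortable key from a block's address. `BlockStat`, `BlockKey`, and `SortStats` are hypothetical names. It also assumes a flat address space in which converting pointers into the same buffer to `uintptr_t` preserves their order, which holds on common platforms but is implementation-defined.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stat record keyed by the address of the block it describes.
struct BlockStat {
  uintptr_t blockKey;
  double value;
};

// The block pointer reinterpreted as an integer serves as the block index.
inline uintptr_t BlockKey(const uint8_t *block) {
  return reinterpret_cast<uintptr_t>(block);
}

// Sorting by key recovers block order no matter which thread logged first.
inline void SortStats(std::vector<BlockStat> &stats) {
  std::sort(stats.begin(), stats.end(),
            [](const BlockStat &a, const BlockStat &b) {
              return a.blockKey < b.blockKey;
            });
}
```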
Pavel Krajcevski baab69dc99 Fix some MSVC compiler snafus 2013-09-28 22:21:31 -04:00
Pavel Krajcevski f1924bd221 Try to send a single string that encompasses a stat to the stream so that when we do synchronization it will crunch the entire string at once. 2013-09-28 21:43:25 -04:00
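One way to read this commit: format the whole stat into a single string first, then hand it to the stream in one write. The sketch below is an assumption about the approach, with hypothetical names (`FormatStat`, `WriteStat`) and a made-up line layout. A single insertion is still not guaranteed to be atomic across threads, so the caller is assumed to hold whatever lock protects the log.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Builds the complete log line up front so it reaches the stream as one unit.
inline std::string FormatStat(uintptr_t blockKey, const std::string &name,
                              double value) {
  std::ostringstream line;
  line << blockKey << " " << name << " " << value << "\n";
  return line.str();
}

// Assumes the caller already holds the lock guarding `log`.
inline void WriteStat(std::ostream &log, uintptr_t blockKey,
                      const std::string &name, double value) {
  log << FormatStat(blockKey, name, value);
}
```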
Pavel Krajcevski dcf389d346 Merge PVRTC compressor into split library. 2013-09-27 17:30:16 -04:00
Pavel Krajcevski e0ec005ac8 Fix link problems 2013-09-18 14:00:53 -04:00
Pavel Krajcevski 29bd1368e6 Fix a few compiler warnings and add the BPTCEncoder license. 2013-09-15 14:56:09 -04:00
Pavel Krajcevski 28cf254fe5 Initial decoupling of base library from core library. Includes a few formatting changes as well. 2013-09-13 19:36:37 -04:00
Pavel Krajcevski 9fe7a08422 Fix a bunch of errors incurred from refactoring. 2013-08-27 14:39:31 -04:00
Pavel Krajcevski 03a7934644 Get rid of evil tabs once and forever (from cpp/h files) 2013-08-26 16:54:08 -04:00
Pavel Krajcevski 0304bd4187 Refactor a bunch of things to reinforce a bunch of style rules. 2013-08-26 16:11:39 -04:00
Pavel Krajcevski 25eba39870 Change the name of everything to FasTC 2013-08-22 18:35:01 -04:00
Pavel Krajcevski f1f1294b2e Add tab formatting. 2013-08-22 18:33:42 -04:00
Pavel Krajcevski 921c3e9f16 Add comments to BC7CompressionMode.h 2013-08-22 18:33:41 -04:00
Pavel Krajcevski b072d10b6c Multiply single pixel error by number of pixels in the partition 2013-04-08 17:03:14 -04:00
Pavel Krajcevski d23125e14c Another bug fix.
In the previous commit, we simply accommodated for alpha errors when compressing single color partitions. In fact, the issue was a bit more grievous: we weren't computing the proper error term at all! This fixed that function so that we emphasize the error metrics induced by *squaring* the error in each channel and then returning that as a measurement of the acceptability of using a single color compression for that partition.
2013-04-08 16:44:15 -04:00
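The error term described here, combined with the per-pixel scaling from the commit that follows it, might look something like the sketch below. It is an illustration under assumptions, not the function that was actually fixed: the `RGBA` struct and both function names are hypothetical, and every channel is weighted equally.

```cpp
#include <cstdint>

// Hypothetical 8-bit RGBA texel.
struct RGBA { uint8_t r, g, b, a; };

// Sum of squared per-channel differences between two texels.
inline uint32_t SquaredChannelError(const RGBA &x, const RGBA &y) {
  const int32_t dr = static_cast<int32_t>(x.r) - static_cast<int32_t>(y.r);
  const int32_t dg = static_cast<int32_t>(x.g) - static_cast<int32_t>(y.g);
  const int32_t db = static_cast<int32_t>(x.b) - static_cast<int32_t>(y.b);
  const int32_t da = static_cast<int32_t>(x.a) - static_cast<int32_t>(y.a);
  return static_cast<uint32_t>(dr * dr + dg * dg + db * db + da * da);
}

// Every texel in a single-color partition has the same error, so the
// partition's error is the one-texel error times the number of texels.
inline uint32_t SingleColorPartitionError(const RGBA &original,
                                          const RGBA &encoded,
                                          uint32_t numPixels) {
  return numPixels * SquaredChannelError(original, encoded);
}
```

Because alpha is part of the sum, a mode that can only store opaque colors is penalized heavily when the source texels are transparent.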
Pavel Krajcevski ff18e8f33e Bug fix
When the compressor recognizes that a shape is a single color, it determines
an optimal encoding for that color. However, only the error in the single
pixel was returned as the error for the overall shape. This caused problems
with modes that do not support alpha and shapes that do have alpha.
2013-03-30 11:16:32 -04:00
Pavel Krajcevski f825b28051 Single color partition with alpha bugfix.
When we detect that a partition has a single color in each subset, we can generate almost an exact representation of this value for most compression modes. However, when we were doing this subset matching, we were ignoring the error introduced by modes that had completely opaque representations against data that had transparent pixels. This bug fix essentially includes this error in our "best fit" calculations and makes everything work out for the better.
2013-03-19 11:58:21 -04:00
Pavel Krajcevski 6f6ca2d867 Another bug fix.
With the old code, it was possible that we skipped a compression with unlucky
preemption of our threads. I'm not exactly sure why, but that caused deadlock
(livelock?) in some very unfortunate circumstances. This new algorithm should
work regardless of how many threads execute at once and should also prevent
textures in the compression job list from being skipped. This algorithm seems
to be an improvement on low-core count machines (around 4 cores), but it is
slower on high-core count machines (40 cores or more)...
2013-03-11 16:20:52 -04:00
Pavel Krajcevski 9c48aaa7f2 Remove unused ResetTestAndSet function 2013-03-11 15:10:15 -04:00
Pavel Krajcevski da44e58160 Actual bug fix 2013-03-11 15:08:44 -04:00
Pavel Krajcevski cd17ddaa0b Add check for Clang. 2013-03-11 14:51:32 -04:00
Pavel Krajcevski fa56d37080 Fix a few bugs in our atomic compression algorithm 2013-03-11 14:41:25 -04:00
Pavel Krajcevski ae2324153d Repurpose the rest of our scaffolding to use Compression Jobs 2013-03-09 13:36:39 -05:00
Pavel Krajcevski 435f935de3 Update atomics compression algorithm
In general, we want to use this algorithm only with self-contained compression
lists. As such, we've added all of the proper synchronization primitives in
the list object itself. That way, different threads that are working on the
same list will be able to communicate. Ideally, this should reduce the
number of user-space context switches that happen. Whether or not this is
faster than the other synchronization algorithms that we've tried remains
to be seen...
2013-03-09 13:34:10 -05:00
Pavel Krajcevski 1aa62003b9 Apparently rand() returns zero too. Avoid that. 2013-03-07 02:43:08 -05:00
Pavel Krajcevski 42e75a5e4c Fix debug image comparison to make sure that the difference in our images takes into account alpha. 2013-03-07 02:35:40 -05:00
Pavel Krajcevski 3d1d1e359f Actually, it turns out the min/max thing was an MSVC issue. 2013-03-06 20:57:05 -05:00
Pavel Krajcevski 599ded49d1 Remove global scope min/max 2013-03-06 20:38:00 -05:00
Pavel Krajcevski bacf327246 Fix MSVC compiler errors with the atomics 2013-03-06 19:57:20 -05:00
Pavel Krajcevski 342614a6ec Fix the horribly wrong check for atomic support with MSVC 2013-03-06 19:56:38 -05:00
Pavel Krajcevski 53fe825e49 Add first pass of atomic implementation.
This is a first pass of what I believe to be a not too terrible
implementation of a cooperative thread-based compressor. The idea is
simple... If a compressor is invoked with the same parameters on multiple
threads, then the threads cooperate via an atomic counter to compress the
texture. Each thread can take as long as possible until the texture is finished.

If a caller calls a compression routine that has different parameters, then
it will help the current compression finish before starting on its own compression. In this
way, we can split the textures up among the threads and guarantee that we maximize the
resource usage between them. I.e. this becomes more efficient:

Thread 1:    Thread 2:   ...   Thread N:
  tex0         texN              tex(N-1)N
  tex1         texN+1            tex(N-1)N+1
  ..           ..                ..
  texN-1       tex2N-1           texN*N-1

I have not tested this for bugs, so I'm still not completely convinced that it is deadlock-free
although it should be...
2013-03-06 18:47:15 -05:00
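The cooperative scheme described in this commit can be sketched as below: every thread that joins a job claims the next unprocessed block from a shared atomic counter until none remain. This is only an illustration under assumptions. It uses `std::atomic` and `std::thread` from C++11, which may not match the primitives the original code relied on. `SharedJob`, `CompressBlock`, `HelpCompress`, and `RunCooperative` are hypothetical names, and the "compression" is a placeholder transform.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical job shared by all cooperating threads.
struct SharedJob {
  std::vector<uint32_t> in;
  std::vector<uint32_t> out;
  std::atomic<size_t> nextBlock{0};
};

// Placeholder for the real per-block compression routine.
inline uint32_t CompressBlock(uint32_t block) { return block * 2u + 1u; }

// Claims blocks one at a time until the job is exhausted. fetch_add hands
// each index to exactly one thread, so no block is skipped or done twice.
void HelpCompress(SharedJob &job) {
  for (;;) {
    const size_t idx = job.nextBlock.fetch_add(1, std::memory_order_relaxed);
    if (idx >= job.in.size()) return;
    job.out[idx] = CompressBlock(job.in[idx]);
  }
}

// Runs the job on numThreads workers and verifies every output slot.
bool RunCooperative(size_t numBlocks, unsigned numThreads) {
  SharedJob job;
  job.in.resize(numBlocks);
  job.out.assign(numBlocks, 0);
  for (size_t i = 0; i < numBlocks; i++) {
    job.in[i] = static_cast<uint32_t>(i);
  }

  std::vector<std::thread> workers;
  for (unsigned t = 0; t < numThreads; t++) {
    workers.emplace_back([&job] { HelpCompress(job); });
  }
  for (std::thread &w : workers) w.join();  // join() publishes the writes.

  for (size_t i = 0; i < numBlocks; i++) {
    if (job.out[i] != CompressBlock(job.in[i])) return false;
  }
  return true;
}
```

A thread that arrives with different parameters could call `HelpCompress` on the job already in flight before starting its own, which is the helping behavior the message describes. The sketch leaves out the bookkeeping needed to detect when the last block has actually finished.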