Comments on Mechanical Sympathy: Processor Affinity - Part 1

sha (2016-12-23):
This comment has been removed by the author.

Martin Thompson (2016-12-21):
Google Code has been archived, so I moved them to GitHub.
sha (2016-12-20):
Hi Martin, the links to the source code (Sender and Receiver) are broken. Could you please update them?
stachu (2016-11-18):
Hi, great article, thanks! Has part II been released? It sounds like you were going to describe some interesting stuff, such as interrupt handling.
Cheers, Michał

Martin Thompson (2016-09-24):
No simple answer here. You need to consider control groups, isolcpus, and other config options.
Ivan Mushketyk (2016-09-23):
Hi Martin, how will this work if a process has more than one thread? Will it pin all threads, or only the main thread?

Alex Lam (2014-11-19):
Thanks. I tried running the Java programs with dummy0, but the receiver did not receive anything, even though I turned off SELinux. After I changed to lo, it all worked. Thanks!

Martin Thompson (2014-11-17):
Should be fine if you are connected to a network. Dummy works well even if a network is not connected.
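For reference, a dummy interface like the one discussed here can be created as follows (requires root; the name dummy0 and the address are illustrative):

```shell
# Load the dummy driver and bring up a virtual interface that needs
# no physical network, useful for local send/receive tests:
modprobe dummy
ip link add dummy0 type dummy
ip addr add 192.168.50.1/24 dev dummy0
ip link set dummy0 up
```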
Alex Lam (2014-11-17):
Hi, for the dummy interface part, can I just use the lo interface and 127.0.0.1 instead?
Alex

Martin Thompson (2012-03-29):
Use the model-specific registers (MSRs) for your CPU to get all the data you need on how the process is executing. A cheap way is "perf stat" on Linux. I have seen the OS schedule a thread to execute on another core too readily. This is worst with Linux; Windows, BSD, and OS X do much better. Being scheduled onto another core is even worse than having another thread partially pollute your warm cache.

If you have other threads running on that core, they can cause the cache pollution you point out. For low-latency applications you do not want this to happen, which may mean over-provisioning cores.

Matt (2012-03-29):
You have observational evidence that pinning helps, which is good, but you assign the cause to accumulated processor state. How did you reach that conclusion?

I base that question on the following: when the next thread is scheduled to run, all the processor registers, cache lines, etc. will be loaded for that thread, effectively flushing all of your current thread's state (indeed, the OS should save all that state for you).
This will continue for subsequent threads until your thread is rescheduled to run on that processor.

Regards,
Matt
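The "perf stat" suggestion above can be tried as below. Event names vary by kernel, so check perf list; Sender here stands in for whichever program you are measuring. Comparing a pinned and an unpinned run shows whether migrations are actually happening:

```shell
# cpu-migrations counts how often the scheduler moved the task between
# cores; cache-misses hints at how much warm state was lost.
perf stat -e task-clock,context-switches,cpu-migrations,cache-misses \
    taskset -c 1 java Sender

# Same workload, unpinned, for comparison:
perf stat -e task-clock,context-switches,cpu-migrations,cache-misses \
    java Sender
```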
Smartdreamer (2011-11-28):
Martin, this is a great post.

You finish by mentioning "In the next article of this series [...]", and as the title suggests, there should be a Part 2. Where is it? Eagerly waiting for it.

Keep up the great work!

Martin Thompson (2011-07-29):
Dedicating a CPU to interrupt handling can be a very valid technique for certain types of workload. It is one of the points I plan to cover in the next instalment of this series.

Xin Wang (2011-07-29):
How about processor affinity for interrupts? Do you think it is good practice to dedicate one CPU to interrupt handling?

billywhizz (2011-07-20):
I've used taskset in the past to pin init and everything under it to one core, and then have my "soft-realtime" processes pinned to the other cores on the box. This way the OS shouldn't interfere with any of your application processes. The idea is to always have at least one core dedicated to the OS.
Linux containers and cgroups are also well worth investigating.

Martin Thompson (2011-07-20):
taskset is the cheap and cheerful means of setting affinity. Other means exist, such as cgroups, which can be used to contain OS threads and avoid contention with the cores assigned to specific tasks. I used taskset for a quick illustration of what is possible.

Martin Thompson (2011-07-20):
When sharing the same L2 cache, I'm assuming you are using a pre-Nehalem Intel processor such as Penryn? If so, you are seeing the benefit of exchanging data via the L2 rather than the L3 cache as in my test. This will obviously be faster between two cores, but it does not scale to more cores as well as the Nehalem processors do. Most processors now have a three-level cache with only the third level shared, if you discount hyper-threading.
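A sketch of the cgroups and isolcpus options mentioned here, assuming a cgroup-v2 layout; the paths, core numbers, and group name are illustrative, and root is required:

```shell
# Keep the scheduler off cores 2-3 at boot (kernel command line):
#   isolcpus=2,3

# Or fence latency-sensitive work into a cpuset cgroup at runtime:
echo "+cpuset" > /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/latency
echo 2-3 > /sys/fs/cgroup/latency/cpuset.cpus
echo $$   > /sys/fs/cgroup/latency/cgroup.procs
```

isolcpus keeps the general scheduler away from those cores entirely, while a cpuset only constrains the processes you place in the group; the two are often combined.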
Stephen Souness (2011-07-20):
Hi Martin. No doubt you will already have this in mind for a future post, but I am curious about what sort of constraints you have in place to ensure that other threads are not utilising the resources of the CPUs that the sender and receiver processes (obviously single-threaded) have affinity to.

billywhizz (2011-07-19):
I see the same results here. Also, if you make sure to pin the two processes to cores that share the same L2 cache, you get double the throughput over two cores on different L2 caches. I presume this is the overhead of the cache interconnect?