Math is hard…

I haven’t had much time to work on this for the last month or so because work got a little crazy. Now that the Big Project is over, I’ve started back into this.

My first attempt at the triangulation is more complex than I initially expected. First I take each sound pair, and using the skew and the locations of the microphone, calculate a pair of bearings originating from the midpoint between those two microphones. Then I take those midpoints and bearings to calculate intersections, and I get four intersections for each pair of midpoints, and there can be up to 6 midpoints, therefore 15 midpoint pairs, and 60 locations. However, I ignore midpoint pairs that are too close together (2 meters or less), and also discard any location more than a couple of kilometers away as being a false signal, but that still leaves between 5 and 25 locations.

Next I try to cluster the locations by finding locations that are within 100m of each other. This isn’t working quite as well as I had hoped, but that is likely due to poor data quality. I’ve just got got the microphones scattered around my office, rather than at the corners of my property, but the algorithm is assuming the mics are at their final location. Because of the relatively low skew times, the calculations are all coming out as if the sounds originate in the center of my house.

I’ve got to work on the enclosures so I can get the microphones to their proper locations. Unfortunately the fireworks that were pretty common near the 4th of July have abated, so I will have relatively few data points to work with.

Only the good skew young

Maybe I was too hasty in thinking that my curve matching algorithm wasn’t going to be useful. I had a few loud booms go off tonight with 4 mics scattered around the office and the windows open. Mics 1 and 4 were by the open window, and mics 2 and 3 were by my computer about 6 ft. (1.8 m) away. All 4 detected at least one of the big echoing booms. My algorithm gave reasonable offsets and reasonable waveform matches once skewed.

All 4 skews were in the 5 to 7 ms range, which is right about what I’d expect for microphones that were 6ft. apart on the direction of travel (6ft/1127fps = 0.005324 seconds, or 5.3 ms). Now to be fair, this was pretty much a best-case scenario for my algorithm. These were long, low rumbling booms rather than simply a loud crack. Still.. Mics 1 and 4 heard the crack at the beginning and mics 2 and 3 didn’t, and it didn’t throw off the algorithm.

To skew or not to skew

After writing up my (computationally intensive) code to measure the skew between the signals from two microphones, I’ve made a discovery. It works great for stuff with complex, low-frequency sounds like my chair creaking, but not so well in other cases. For sustained, constant frequency sounds (like beeps) it gets confused about which of several possible alignments are “best”. Take for example this short beep as heard by two adjacent microphones:

My visual best fit says the green waveform needs to be shifted a few hundred microseconds to the right, and that these were almost in alignment already. However, my algorithm shifted it ~13,000 microseconds to the left.

It did make the wave peaks line up, but since this is a more or less steady tone, that happens every couple of milliseconds. I’m also sure it maximized my fit function, but to my eye the overall envelopes don’t match nearly as well. I think there are two factors working against my algorithm here. First, the waveforms weren’t complete–the beginning of the waveforms was cut off by different amounts in the different samples. I’ve taken measures to reduce the likelihood of that happening, but I can’t eliminate it altogether. Second, this was a fairly steady tone–as I already mentioned, and there were lots of “pretty good” fits that it had to choose from.

The other situation that it doesn’t handle well is more problematic. It appears that for short, sharp sounds–like a clap, whip crack, fireworks or gunshots–there is too much high-frequency information that the two mics will sample differently, and since my sampling rate is about 20kHz, I really can only differentiate frequencies below about 10kHz (5kHz for a good fit). See the Nyquist-Shannon theorem for a more complete discussion as to why. So, when I have a signal with a lot of high-frequency information, I can’t really match it effectively. Take this example of a clap when the mics where a few feet apart (1-2 meters):

The apparent shift shouldn’t need to be large, but the algorithm doesn’t pay attention to that, and it came up with a fit that looked like:

This is a much worse fit according to my eye. I think a better technique in this case it to line up the beginning of the loud sounds, but I need to come up with a way to identify those algorithmically. I’ll probably use some heuristic like looking at the time of the first samples to fall significantly further from the mean than I’d been seeing previously, but that requires that I have a nice quiet section before the sound happens. I’ve taken steps to try to make sure that I have that (by sending the prior buffer as well when I detect an anomaly), but it doesn’t always work out as you can see in the purple curve.

Mind the gap

The downside to increasing the sample rate is that I also increased the timing error that accumulates during a sampling buffer. Look at these sequential data buffers:

There’s a gap of more than 400 microseconds between the last measurement of the first packet and the first measuement of the second. That gap isn’t real though. There’s actually about a 42-43 microsecond gap in real time, but because I send the measurement interval as a whole number of microseconds between messages, there’s a fraction of a microsecond that gets lost to truncation. In this case, the actual interval of 42.72 microseconds gets truncated to 42 microseconds when sent to the server, and that means that there’s about a 370 microsecond error by the end of the packet (0.72 microseconds * 512 measurements in the packet).

Currently the measurement packet has a 22 bytes of header, including both the timestamp of the beginning of the packet (8 bytes) and the number of microseconds between measurements (2 bytes). I could redesign the measurement packet so that the same two bytes pass 100ths of microseconds rather than whole microseconds, and that would allow up to 655.35 microseconds as a measurement interval without changing the overhead of the packet. (I’ve only got about 1450 bytes to work with in a UDP packet that’s going to travel over WiFi and Ethernet, so I’m trying to be frugal with headers and leave as much space as possible for actual measurements.)

Crank up the frequency

I have been assuming that approximately 10kHz was about the maximum sampling rate I could achieve, but it turns out I was very wrong. So far I’ve gotten up to approximately 20kHz and am not seeing any degradation in performance. I’m not sure how high I can (or should) crank this up for optimal system performance, but I’ve already increased the precision from 1 reading every 85 microseconds to 1 every 42 microseconds. Now if only my clock were that accurate.

And now for the time and temperature…

I’ve been noticing that the microphone that has the DHT-11 temperature sensor consistently under-performs the other microphone in terms of how well is stays in sync with the NTP server. I have documented previously that trying to read a non-existent sensor caused major sync issues, but I now know that even if the sensor is working properly, it still throws the sync off slightly.

On mic 001 (the one with the temperature sensor). I was seeing the average offset being somewhere around +/-1300 microseconds, whereas on mic 004 (the one without the sensor or code to read it). I was typically seeing offsets of +/-300-400 microseconds (1/4 to 1/3 as large). So I disabled the DHT-11 on mic 001, and within 15 minutes the average offset was +/- 400-500 microseconds, and the timing of received sound waves was much more in sync.

And zooming in on that first big positive peak you can see that they’re only about 400 microseconds apart, which is pretty good.

It’s not the 100 to 200 microsecond offset I’m looking for, but I can live with this level of error.

Now what am I going to do with the temperature sensor? I still need to be able to measure ambient air temperature to calculate the speed of sound accurately, but it was never a requirement that I have 4 of them or that they be co-located with microphones. I have some spare ESP8266 feathers now, and while they’re not good for the microphones, I can easily re-purpose them to being a couple of temperature sensors. I’ll play with low-power deep-sleep and have them wake up every 30 seconds or so to check the temp and report in. That should give me a fairly accurate and current air temp.

Another timing bug squashed

I spent several hours trying to figure out why my circuit on the second M0 microcontroller was having significant timing issues when the first M0 wasn’t. The second M0 didn’t seem to exhibit the problem in my NTP test rig either. I finally figured it out. As you can see below, when I built the second M0, I didn’t have a second DHT-11 temperature and humidity sensor (the blue box–I forgot to order a second one), but I kept the code the same. If I use the preprocessor flag I put in my code to disable all the DHT related calls, the timing issues go away.

Here’s my theory as to what’s happening. The DHT11 product page on Adafruit’s site says that reading the chip requires careful timing. To get the timing right, I suspect that it disables interrupts while reading the chip. If the chip is there then this only takes a few microseconds, and nobody is the wiser. However, if the chip isn’t there, it has to wait for some timeout to happen, and that means interrupts are disabled for a while, and that throws off the millis() and micros() timing functions, which won’t increment while interrupts are disabled. Since my NTP library uses micros() calls frequently to calculate the time since the last server sync, it was accumulating significant errors which were causing the readings to go all over the place.

By the time I’d figured this out, I’d already ripped out a lot of the hairy math I had implemented for calculating gradual skews and correction factors, and replaced it with slightly less hairy math. It’s far more straightforward and easy to understand, and the only downside is that a lot of it is now floating point math, which is slower on some microprocessors.

I’m still seeing about a 1-2 millisecond skew between the readings on the two microphones, and I’m going to have to find a way to adjust for that.

Okay results with 2 microphones

I fixed the previously mentioned time jump problem, at least well enough.

I also now have 2 (roughly) identical microphones set up, and while the initial results were frightening, they settled in and starting behaving reasonably.

My first attempt was right after I had plugged them both in, and the result was this:

There was about 90-100 millisecond skew in their readings, which since they were right next to each other, seemed very odd. It’s the kind of delta I would expect if they were 50 meters (165 ft) apart. I thought that maybe my time sync wasn’t nearly as stable as I thought it was, but I decided to give them some time to settle-in and converge, and that helped. After 10 seconds of being on, I got this result:

This was much more encouraging. The waveforms are almost synced. If I zoom in on the peaks, I can see a better estimate of their delta:

The actual time difference was about 400-500 microseconds, and at least part of that (maybe 100-200 microseconds) might be due to them being 6 inches apart. This is much closer to the kind of synchronization I’m going to need for this project.

Update: Celebration premature it turns out. I still have serious NTP sync issues that are causing the two microphones to fall out of sync. Ripped out my previous hairy math and trying to fix it now, but still having problems with the corrections oscillating wildly.

Timing problems

I definitely still have timing problems. I happened to catch a squeak from my office chair (hurrah for squeaky chairs) which spanned two captured buffers. Here’s what I looks like:

As you can clearly see, there’s a definite backwards time jump. With a little more gnuplot-fu, I managed to get a clearer picture that shows a ~6.5 ms backwards jump.

There are several sources of error that I need to track down for this.

1. Assuming the timer interrupt is perfect
2. Assuming that time never jumps backward
3. Assuming that every reading takes exactly the same amount of time

My first poor assumption is that the timer interrupt fires exactly every 100 microseconds (as it would for a perfect 10kHz signal). However, if it were actually (say) every 110 microseconds (aproximately a 9kHz signal), then over 512 readings (the size of the buffer), it would have a cumulative error of 5120 microseconds, or ~5 milliseconds. A negative 6.5 ms error would indicate am actual time of about 87 microseconds and a clock frequency of about 11.5 kHz. I can correct for this on average by calculating the time it takes to read a whole buffer and dividing by the size of the buffer. I’ll do that next.

My second poor assumption is that my timer never jumps backwards, though as I’ve documented, it does, though I’ve only observed a few 10’s of microsecond jump. That doesn’t mean it doesn’t jump more at other times however, and I need to find a way to model that.

My third poor assumption is that every reading takes the same time, whether that’s 100 micros, 110 micros, or 90 micros. This is the hardest to measure, because my clock just isn’t accurate enough to measure a 10 microsecond skew. I suspect that I’m just going to have to punt on this one and assume the buffer wide average is good enough. It’s probably good enough to account for the drift in clock frequency with temperature.

First look at the data

I did some simple plotting of measurements taken by a single microphone. I clapped several times a few feet away from it. The first two claps were quite loud:

Full scale on the ADC is 12 bits, or 0-4095 units. Each unit represents about 3.3V/4095 = 0.8 millivolts. You can see that when idle it hovers around 2090, which is approximately half scale, so that’s good. A very loud sound sends it almost full scale (+/- 1500 units), which is also good. The RMS variation when quiet is around 10-20 units, which indicates there’s not a lot of noise in the system.

I’m trying to decide how to identify the time of the arrival of the wavefront. I could pick the reading that is furthest from the mean value and key off of the time of that, and I think this is what I’ll try first. I’m figuring that this translates into the peak of the initial attack, and that will be close to the same position in all the microphones.

The other strategy is to just pick the first reading that is more than handful of standard deviations away from the mean, but that requires a more sophisticated analysis of the data leading up to the wavefront, and in the case where the wavefront is at the beginning of the packet of data, I may not have enough data in those few points to get a meaningful statistical picture, which means I need to get long-term averages from the microphone controllers themselves, which means more computation on the microcontrollers.

Quieter sounds (soft clapping) produced simpler and easier to interpret waveforms:

Here you can see that there is only really one big spike with a 300 unit deviation, and so keying off of that single value is likely the right strategy.

I’m going to have to wait till I have 2 microphones to see if I can synchronize their detection.