This is the little-known story of what recently happened to the SORCE spacecraft and how it was nursed back to mostly operational status over a period of weeks, after nearly dying in the cold of space due to what appears to be a software glitch. First, some background.
The Solar Radiation and Climate Experiment (SORCE) is a NASA-sponsored satellite mission that is providing state-of-the-art measurements of incoming x-ray, ultraviolet, visible, near-infrared, and total solar radiation. The measurements provided by SORCE specifically address long-term climate change, natural variability and enhanced climate prediction, and atmospheric ozone and UV-B radiation. These measurements are critical to studies of the Sun, its effect on our Earth system, and its influence on humankind.
The SORCE spacecraft was launched on January 25, 2003 on a Pegasus XL launch vehicle to provide NASA’s Earth Science Enterprise (ESE) with precise measurements of solar radiation. It launched into a 645 km, 40-degree orbit and is operated by the Laboratory for Atmospheric and Space Physics (LASP) at the University of Colorado (CU) in Boulder, Colorado, USA. It continues the precise measurements of total solar irradiance (TSI) that began with the ERB instrument in 1979 and have continued to the present with the ACRIM series of measurements. SORCE also provides measurements of the solar spectral irradiance from 1 nm to 2000 nm, accounting for 95% of the spectral contribution to TSI.
SORCE carries four instruments: the Spectral Irradiance Monitor (SIM), the Solar Stellar Irradiance Comparison Experiment (SOLSTICE), the Total Irradiance Monitor (TIM), and the XUV Photometer System (XPS).
What happened: The spacecraft went into Safe Hold on Sunday, Sept. 26th. The failure appears to have been triggered by a zero-length data packet that the control software was unable to handle, scrambling its operation. Two of three reaction wheels (like gyros) failed, the Sun-aligned attitude needed to charge the solar cells was lost, the batteries discharged, and the temperature of the spacecraft’s internal electronics plunged to as low as -30°C. Recovery of the spacecraft took three weeks of work and the support of 82 ground tracking stations, along with data relays via NASA’s TDRS network.
From the SORCE weekly status reports from 9/23 – 10/13:
SORCE experienced an OBC reset at 2010/269-17:57:40 due to the MU sending a CCSDS packet with a length of zero. An OBC reset results in the satellite regressing to Safehold, and performing basic power and attitude maintenance on the APE processor.
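The report doesn’t say exactly how the zero-length packet tripped up the OBC, but the general defense is to validate a packet’s declared length before acting on it. Here is a minimal, hypothetical sketch in C — not SORCE’s actual flight code — of the kind of guard that rejects a malformed CCSDS packet rather than letting it propagate; the 1024-byte limit and the function name are invented for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch -- not SORCE flight code.  Illustrates a defensive
 * length check that keeps a zero-length or otherwise malformed packet
 * from reaching downstream processing. */

#define CCSDS_PRIMARY_HDR_LEN  6u      /* CCSDS primary header is 6 octets */
#define MAX_PKT_DATA_LEN       1024u   /* assumed mission-specific limit   */

typedef enum { PKT_OK, PKT_ERR_TOO_SHORT, PKT_ERR_BAD_LENGTH } pkt_status_t;

pkt_status_t validate_packet(const uint8_t *buf, size_t buf_len)
{
    if (buf == NULL || buf_len < CCSDS_PRIMARY_HDR_LEN) {
        return PKT_ERR_TOO_SHORT;   /* can't even read the primary header */
    }

    /* Octets 4-5 of the primary header hold "packet data length minus one",
     * so the smallest legal data field is a single octet. */
    size_t data_len = (size_t)((buf[4] << 8) | buf[5]) + 1u;

    if (buf_len != CCSDS_PRIMARY_HDR_LEN + data_len ||
        data_len > MAX_PKT_DATA_LEN) {
        return PKT_ERR_BAD_LENGTH;  /* reject and count the error, don't reset */
    }
    return PKT_OK;
}
```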
The following activities were performed to recover the observatory.
- Spacecraft time was jammed (force-set) and data recorded after the anomaly was dumped
- The OBC was reset to resync with the 1553 bus
- OBC patches 6.8, 6.9 and 7.0 were loaded
- All the spacecraft tables and RTSs were reloaded
- SORCE was then commanded out of safehold and back to OBC control on DOY 269.
At the next contact after exiting safehold, the spacecraft was found to be operating on one reaction wheel with lower than expected power margins. The flight operations team manually shed RWA 3 and both star trackers, turned off the transmitter, and commanded safehold.
Poor pointing performance led to a low battery state of charge, so the APE power charging tables were changed to charge at a higher value. To recover from this configuration and a hybrid flight software configuration, the following actions were performed:
- RWA FSW process was reinitialized to clear the faults.
- The OBC AC control tables were poked for three-wheel control.
- The OBC was reset to clear any lingering issues which might have existed
- OBC patches 6.8 and 6.9 were loaded. OBC 7.0 for two-wheel control was not loaded at this time due to questions about its performance.
- All the spacecraft tables and RTSs were reloaded
- The RWA over-speed check was disabled in table 108. It was determined that RWA 4 had an over-speed fault which led to the transition to one-wheel control in contingency mode (see the sketch after this list).
- SORCE was then commanded out of safehold and back to OBC control on DOY 272.
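To illustrate what “disabling RWA over-speed in table 108” means in practice, here is a speculative sketch of table-driven fault masking. The table number comes from the status report, but the structure, field names, and limit are invented; the real SORCE flight software layout is not public:

```c
/* Hypothetical sketch of table-driven fault masking -- structure and
 * names are illustrative only, not the actual SORCE FSW. */
#include <stdint.h>
#include <stdbool.h>

#define NUM_RWA 4

typedef struct {
    bool     overspeed_check_enabled[NUM_RWA];  /* e.g. "table 108" entries */
    uint16_t overspeed_limit_rpm;
} fault_table_t;

static fault_table_t fault_table = {
    .overspeed_check_enabled = { true, true, true, true },
    .overspeed_limit_rpm     = 5000u,            /* assumed limit */
};

/* Called each control cycle: a wheel is only declared failed if its
 * over-speed check is enabled, so the ground can clear the table entry
 * to stop an autonomous drop to fewer-wheel control. */
bool rwa_overspeed_fault(unsigned wheel, uint16_t speed_rpm)
{
    if (wheel >= NUM_RWA || !fault_table.overspeed_check_enabled[wheel]) {
        return false;                            /* check masked: no fault */
    }
    return speed_rpm > fault_table.overspeed_limit_rpm;
}
```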
Because of the degraded battery, a phased approach was taken to re-warm the satellite back to operational temperatures. To duty-cycle the battery heater around eclipse, as is done in normal operations, a special ATS was loaded that included commands to power off the heater in eclipse. Over the next several orbits after exiting safehold, the Star Tracker heaters and Instrument Bench heaters were enabled.
- Star Tracker 1 and Star Tracker 2 were powered on DOY 273.
- SORCE was also commanded to normal pointing mode on DOY 273.
With duty cycling and the instruments off, the battery’s operating temperature was cooler than desired. The redundant battery heater set point was raised to cycle between 1 and 2 °C, which improved battery performance.
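As an illustration of the thermostat logic described above — cycling between 1 and 2 °C, with the heater forced off in eclipse to protect the degraded battery — here is a hedged sketch; the function and the eclipse-inhibit flag are assumptions, not the actual SORCE ATS or flight code:

```c
/* Hypothetical sketch of set-point hysteresis plus an eclipse inhibit,
 * loosely modeled on the description in the status report. */
#include <stdbool.h>

#define HEATER_ON_BELOW_C   1.0f   /* turn on below 1 deg C  */
#define HEATER_OFF_ABOVE_C  2.0f   /* turn off above 2 deg C */

bool battery_heater_command(float battery_temp_c, bool in_eclipse, bool heater_is_on)
{
    if (in_eclipse) {
        return false;                       /* ATS commands heater off in eclipse */
    }
    if (battery_temp_c < HEATER_ON_BELOW_C) {
        return true;                        /* below lower set point: heat        */
    }
    if (battery_temp_c > HEATER_OFF_ABOVE_C) {
        return false;                       /* above upper set point: off         */
    }
    return heater_is_on;                    /* within the band: hold last state   */
}
```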

The instrument suite was recovered as follows:
- MU turned on and FSW patched and configured for normal operations on DOY 274
- TIM was turned on and science operations began on DOY 275
- SOLSTICE B was turned on and science operations began on DOY 275
- SIM B was turned on and science operations began on DOY 276
- On DOY 277 SOLSTICE A was turned on. Due to a known “feature” in which an instrument turn-off command followed by a second turn-off command causes the instrument to power off again after a subsequent turn-on command, the SOLSTICE A turn-on was not successfully completed (a speculative sketch of how such a quirk can arise follows this list). Successful turn-on and return to science was completed on DOY 278.
- XPS was turned on and began taking science on DOY 279.
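The status report doesn’t explain the mechanism behind that turn-off “feature,” but one speculative way such a quirk can arise is if turn-off requests are queued or counted rather than treated as a simple level. The sketch below is purely illustrative and is not the SORCE implementation:

```c
/* Purely speculative sketch: a second, stale OFF request lingers and is
 * acted on after the next ON command -- NOT the actual SORCE code. */
#include <stdbool.h>

static int  pending_off_requests = 0;
static bool instrument_powered   = false;

void cmd_instrument_off(void)
{
    pending_off_requests++;          /* each OFF command is queued */
    instrument_powered = false;
}

void cmd_instrument_on(void)
{
    instrument_powered = true;
    if (pending_off_requests > 1) {  /* an extra OFF is still pending...   */
        instrument_powered = false;  /* ...and powers the instrument back off */
    }
    pending_off_requests = 0;
}
```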
Read the entire summary at the SORCE weekly status reports from 9/23 – 10/13
The most recent SORCE weekly status report on 11/18/2010 is a bit more encouraging, as they finally got the SIM B instrument back online and the temperature/heater/available battery power situation seems to be managed now.
Here’s TSI data from SORCE, regularly plotted by Dr. Leif Svalgaard along with other solar data (click to enlarge).
h/t’s to Leif Svalgaard and Harold Ambler
Jim Owen says:
November 23, 2010 at 9:43 am
@DesertYote –
When this design mentality hits NASA we will be in big trouble.
Condolences are in order. I once worked in that kind of environment, doing IV&V on the Hubble Space Telescope C3 software systems. About 6 million lines of code. Interesting, frustrating, time-consuming, and ultimately impossible to eliminate ALL the bugs. It took 3 years and 8 or 9 test cycles to convince the management that “their” take on the robustness of the multiple computer interfaces was totally wrong.
Your statement above is true – except for the timing. It happened sometime around 1990.
###
I was afraid that might have been the case, but I was hoping! I bailed about that time, right after the GAO witch hunts. BTW, most of my work involved C3 also, TDRSS, GRO/COBE, GOES9/10, Venus Mapper, Mars Explorer.
Of course these problems would not occur if they used DOS 3.1. That was bulletproof.
sarc off/
@ian middleton –
lol!! What they use on the Shuttle system is “almost” that advanced. And one of the 2 HST computers was the same model – leftover ’60s-era MMS computers. That’s why the Shuttles have to reload the computers in-orbit – so they can get back “home”.
@DesertYote –
Mine was Nimbus (1-4), Landsat (all of them), UARS, HST, TDRSS (Space Network) and a couple black programs that I still can’t talk about.
And Leif is right – it’s NOT a friendly environment up there. We flew the first GPS unit on Landsat 4. It wasn’t built with hardened components, so turning it on was always a crapshoot re: how long it would operate before crashing. IIRC, the longest it stayed up was about 15 minutes (although I could be off by a few minutes there). I know for sure that every time we turned it on, it was down for at least a week while we reworked the ground database and the “airborne” software. GPS has come a long way since those days. So have the science instruments – but only because of the knowledge base that we built back in the “bad old days”. One should never talk about past failures without realizing that those failures were the “learning curve” for today’s successes.
I remember using SRAMs with packaging that emitted alpha particles – but we didn’t find out until we started looking into unexplained crashes. “But not to worry,” the vendor said. “The failure rate is something times ten to the minus something!” So, we multiplied the number of chips in our system by the failure rate, and sure enough it was supposed to fail every half hour.
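For what it’s worth, the arithmetic the commenter describes is just per-chip failure rate times chip count; the numbers below are made up to reproduce the “every half hour” figure, since the real ones are deliberately left vague:

```c
#include <stdio.h>

int main(void)
{
    /* Made-up numbers for illustration only */
    double per_chip_rate = 1.0e-3;   /* assumed failures per chip-hour        */
    int    chips         = 2000;     /* assumed number of SRAM chips on board */

    double system_rate  = per_chip_rate * chips;   /* expected failures/hour  */
    double mtbf_minutes = 60.0 / system_rate;      /* mean time between fails */

    printf("Expected failure roughly every %.0f minutes\n", mtbf_minutes);
    return 0;
}
```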
I have great respect for the folks who send their software and hardware into space – it’s not an easy proposition.
Every sw/hw testing group should have at least one clueless ijit on staff, to do whatever comes naturally. After indulging in howls of outrage after each crash, they’d actually get important improvements in reliability.
🙂
All swell that ends well…. 😉
Good thing they got it going…. an’ a gripping yarn it be too.
DesertYoghurt said: “The fact that there was an unhanded zero length error indicates that it originated from a subsystem that was beveled to be incapable of sending a zero length data packet, as normally these would have at least on byte by design.”
Hey bud, it looks like your packets are getting mangled a bit. You should know better than most never to bevel the on byte of your packets once you’re in orbit!