
Alarm and error documentation page

This page presents new and current observers with the myriad (or miriad ;) ) of problems they may face and, more importantly, the solutions that worked.

If you come across an error whilst observing, no matter how small, please add:

  • The alarm (without date/time) that alerted you to the problem, or, if there was no alarm, the diagnostic that revealed it;
  • The solution that worked.

A real-life example:

Auto-stow sequencing (no alarm) | ALARM: error fl -1 previous source in this schedule not reached before new source was commanded

I noticed the wind speed had dropped to 7.5 km/h but the antenna hadn't come out of stow. I halted the schedule and restarted the drives. This did not fix the problem: the alarm 'ALARM: error fl -1 previous source in this schedule not reached before new source was commanded' sounded every scan, and the FS started trying to record data while the antenna was stowed. The solution was to:

      halt
      disk_record=off
      antenna=off
      terminate
      fs                  (restart the Field System from a terminal)
      antenna=open
      antenna=operate
      proc=r4774hb
      schedule=r4774hb

Final note: before this, the log showed 'Auto-stow release failed', whereas all other wind stows ended with 'Auto-stow released succeeded'.
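If you suspect the same failure, you can check how the last wind stow ended by searching the FS log for the auto-stow messages quoted above. This is a minimal, hypothetical bash helper (`last_autostow_status` is not an existing FS tool); it assumes the log is a plain-text file and that the relevant lines contain the text 'Auto-stow release'. The log location (typically /usr2/log on an FS machine) is also an assumption, so check your station's setup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the outcome of the most recent auto-stow release in an FS log.
# Assumes matching lines contain 'Auto-stow release' and that failures contain 'failed'.
last_autostow_status() {
  local log="$1" line
  # Case-insensitive match; keep only the last occurrence in the file.
  line=$(grep -i 'auto-stow release' "$log" | tail -n 1)
  if [[ -z "$line" ]]; then
    echo "none"            # no auto-stow messages found
  elif [[ "${line,,}" == *failed* ]]; then
    echo "failed"          # last release attempt failed: expect a stuck antenna
  else
    echo "ok"              # last release attempt succeeded
  fi
}
```

For example, `last_autostow_status /usr2/log/r4774hb.log` (file name assumed) prints `failed` if the most recent release attempt failed, `ok` if it succeeded, or `none` if the log has no auto-stow messages.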

Antenna stuck (no alarm) | ALARM: error fl -1 previous source in this schedule not reached before new source was commanded

  After a power outage, the antenna got stuck once the system recovered:
      halt
      disk_record=off
      antenna=off
      terminate
      in timeke, open the HMI, click 'reset drives' and wait until both comms statuses are green
      fs
      proc=mv004ke
      setup01
      antenna=open
      antenna=operate
      source=stow

The antenna then moved, and I restarted the schedule. The generator status was 'in auto-off', so I assumed power had been restored without any problems.

Antenna stuck | Warning: error st -27 computer acu time difference exceeded 0.25 seconds, see value above.

Normally this error can be ignored, but there was an additional error:

WARNING: error st  -10 rampepochtime not -1 when preparing to load, check antenna is in "operate".

The error presented as drive status 'not ready', even though the drives were green in the HMI. The HMI was unresponsive and the drives could not be turned off. antenna=operate and antenna=open both returned 'Antenna OK' but had no effect. The clock on the right of the HMI interface on cnsshb (CURRENT TIME UT1) was not updating.

The solution was to select the 'reboot central' button under Resets in the HMI. This reset the clock connection to the antenna and allowed the drives to communicate again.

rfpcn errors (time-outs) etc.

ALARM: error s5 -104 rfpcn: time-out, connection closed

If you receive a persistent “rfpcn: error opening, rfpic probably not running, see above for error” report, or notice that the recording is well behind the summary file and the backlog is growing, you may need to restart Rxmon.

Log in to the pcfs, become root with su and run the following commands:

pcfsyg:~$  su
pcfsyg:~#  /etc/init.d/Monica.Rxmon stop
pcfsyg:~#  ps -ef | grep Rxmon    # confirm Rxmon has stopped (only the grep itself should be listed)
pcfsyg:~#  /etc/init.d/Monica.Rxmon start

If the start command worked, you will see the parameter listing.

The “/etc/init.d/Monica.Rxmon start” command may fail with “ERROR on binding: Address already in use”. If so, wait a minute and try again. If it still doesn't work, restart Monica:

pcfsyg:~#  /etc/init.d/Monica.monica stop
pcfsyg:~#  /etc/init.d/Monica.monica start
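The Rxmon restart steps above can be sketched as a small bash script, run as root on the pcfs. This is a hypothetical helper, not an existing station script: it assumes the init scripts return once Rxmon has started (rather than running it in the foreground) and that `pgrep -f Rxmon` identifies the running process. `INIT_DIR`, `RETRY_WAIT` and the 2 s settle time are illustrative choices, not station settings.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the Rxmon restart procedure; run as root on the pcfs.
# INIT_DIR and RETRY_WAIT are illustrative knobs (defaults follow this page).
INIT_DIR="${INIT_DIR:-/etc/init.d}"
RETRY_WAIT="${RETRY_WAIT:-60}"   # "wait a minute and repeat an attempt"

# True if any process has "Rxmon" in its command line (assumed to identify Rxmon).
rxmon_running() {
  pgrep -f Rxmon >/dev/null
}

restart_rxmon() {
  local attempt
  "$INIT_DIR/Monica.Rxmon" stop
  sleep 2   # arbitrary settle time before checking
  if rxmon_running; then
    echo "Rxmon still running after stop; check with: ps -ef | grep Rxmon" >&2
    return 1
  fi
  # Two start attempts, waiting between them (e.g. after "Address already in use").
  for attempt in 1 2; do
    if "$INIT_DIR/Monica.Rxmon" start && rxmon_running; then
      return 0
    fi
    if (( attempt == 1 )); then
      sleep "$RETRY_WAIT"
    fi
  done
  # Still not running: restart Monica, as this page suggests.
  echo "Rxmon did not start; restarting Monica" >&2
  "$INIT_DIR/Monica.monica" stop
  "$INIT_DIR/Monica.monica" start
}
```

Source the file and call `restart_rxmon`. If it falls through to the Monica restart, its exit status is that of the Monica start script, so still check the parameter listing by hand.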

ALARM: Large difference between formatter and maser delays. Check for stability of new offset

This came up on the first scan of the schedule. The telescope compares maserdelay and clkoff before every scan, and if they differ by more than 0.5 µs this alarm is raised. I immediately checked the maser with

   maserke

in a terminal, and it was all green. After that I went to “ke” (pcfske) and checked

   fmset

and saw a large delay between the dbbc and the pcfs. I halted the schedule:

   halt
   disk_record=off
    

(in eremote)

and typed “s”, “y”, “s”, “y” into fmset to resync. After the resync the offset was still there, so I opened a terminal and typed:

   vncviewer dbbcke

On the dbbc, I closed the 'DBBC Control v104_2.exe' window and reopened the program from the desktop. It asked whether to reconfigure, and I answered “y”. When it had finished, I typed

   dbbc=pps_sync

into eremote and resynced in fmset again. After this the offset was stable, so I restarted the schedule with

   cont
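The pre-scan check described at the start of this section (an alarm when maserdelay and clkoff differ by more than 0.5 µs) amounts to a simple threshold test. A minimal bash sketch, assuming both values are available as numbers in microseconds; `delay_mismatch` is a hypothetical name, and exactly which quantities the station software compares is an assumption based on the alarm text.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: compare two delays given in microseconds against a limit (default 0.5 us).
# Prints the absolute difference and exits 0 only when it exceeds the limit (the alarm condition).
# bash has no floating-point arithmetic, so awk does the maths.
delay_mismatch() {
  local maserdelay_us="$1" clkoff_us="$2" limit_us="${3:-0.5}"
  awk -v m="$maserdelay_us" -v c="$clkoff_us" -v lim="$limit_us" 'BEGIN {
    d = m - c
    if (d < 0) d = -d          # absolute difference
    printf "%.3f\n", d         # show the difference to the operator
    exit !(d > lim)            # exit status 0 = over the limit
  }'
}
```

For example, `delay_mismatch 1.2 0.1` prints `1.100` and exits 0 (alarm condition), while `delay_mismatch 1.0 0.8` prints `0.200` and exits 1.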

ALARM: Field System Time-Out ... etc

As the alarm says, the first step is to log on to the corresponding pcfs, e.g.

   vncviewer pcfske:1

Or use the VNC pull-downs in the menu. Once inside, it should be fairly obvious whether the field system is running. If you are unsure, open a new terminal window and type

   fs

If the FS is already running, this returns 'field system already running'; if not, it starts the field system. If the field system IS running but frozen, close the terminal window it is running in and restart it.

If the field system is running but cannot connect to the e-remote control (erc) AND/OR the log monitor, this also raises the ALARM, by design. Log errors are meant to be pushed from the pcfs into the erc and finally into the log monitor; if this alarm goes off, that stream has been broken or inactive for too long.

Check the e-remote control. If it is not working, it may need a restart or a complete reboot. Check whether inputs from the erc reach the fs AND the log monitor. A command like

   disk_pos

should return a response from both. If they don't respond, the eremote is frozen and requires a reboot. If they do, check the timeout setting: if there are 15 minutes before the next scan but the timeout is set to 10 minutes, the alarm will go off.
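To see whether the stream has been quiet for longer than the timeout, you can compare the age of the most recent log write against the timeout setting. A bash sketch, assuming GNU `stat` and `date` (as on a Linux pcfs) and using the modification time of the FS log as a proxy for activity; the real alarm is raised elsewhere in the pcfs, erc, log monitor chain, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: whole minutes since a file was last modified (GNU stat assumed).
log_idle_minutes() {
  local log="$1" now mtime
  now=$(date +%s)
  mtime=$(stat -c %Y "$log") || return 1   # modification time, seconds since the epoch
  echo $(( (now - mtime) / 60 ))
}

# Succeeds if the log has been idle longer than the timeout (in minutes),
# i.e. the situation in which a time-out alarm would be expected.
timeout_would_fire() {
  local log="$1" timeout_min="$2" idle
  idle=$(log_idle_minutes "$log") || return 2
  (( idle > timeout_min ))
}
```

For example, `timeout_would_fire /usr2/log/<logfile> 10 && echo "idle longer than 10 min"` (log path assumed) flags the 15-minute gap described above.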

ERROR st -28 UT1TimeOffset non-zero, see value above / ERROR st -27 Computer time difference exceeded 0.25 seconds, see value above (persistent beeping)

Most of the notes on the wiki state that this error is benign; however, it is very annoying. To fix it (FOR NON-RADIOASTRON EXPERIMENTS ONLY), you must ssh into time__.

In the HMI GUI, go to the 'SETTINGS' tab. There is a 'TIME OFFSET' box in the middle. Change the 'UT1-UTC' value to 0.0 (it will most likely be blank) and save the changes.

Restart the field system for the changes to take effect. Shout out to Arwin for figuring this one out.

Ceduna Network Down

First ssh into hobart with X forwarding enabled:

ssh -X oper@hobart.phys.utas.edu.au

From a terminal on hobart, VNC into the 'manager' PC:

vncviewer manager

Left-click the network connections icon in the bottom right, then right-click 'CEDUNA-NETWORK' and connect. You may need to click 'dial' in a pop-up window. Once the connection is established, open TightVNC (it should be in the Start menu) and VNC into the router PC; the details should already be filled in, so just press connect. This PC does not use the standard password (ask for it in the group chat and remember or write it down).

Check the status of the 3G network on the status page, then navigate to system tools on the left. Click the reboot sub-menu, then click reboot and confirm.

Start pinging the Ceduna gateway (131.217.61.129) from your own PC (not in the VNC window). In a terminal or command prompt:

ping 131.217.61.129
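If your own PC runs Linux, you can leave a loop polling the gateway while the router reboots instead of watching a continuous ping. A minimal bash sketch, assuming iputils `ping` (where `-W` is a reply timeout in seconds; other platforms differ); `wait_for_host` and `probe_host` are hypothetical names.

```bash
#!/usr/bin/env bash
# One echo request with a 2 s reply timeout; exit status 0 if a reply came back.
probe_host() {
  ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Poll a host up to $2 times, sleeping $3 seconds between attempts.
wait_for_host() {
  local host="$1" tries="${2:-60}" interval="${3:-5}" i
  for (( i = 1; i <= tries; i++ )); do
    if probe_host "$host"; then
      echo "$host reachable after $i attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "$host still unreachable after $tries attempts" >&2
  return 1
}
```

For example, `wait_for_host 131.217.61.129 120 5` polls every 5 s, up to 120 times (at least 10 minutes), and reports as soon as the gateway answers.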

After the router finishes rebooting, click on the status menu (top left) and check the 3G status. Ideally, the pings will start reaching the Ceduna gateway; if so, skip the next step. If not, the VPN may need fixing: run putty.exe (on the desktop), ssh into 192.168.1.61, log in as 'physics' (again a different password) and run the following command:

clear crypto ipsec client ezvpn

Once the connection is established (you can ping the gateway), close the router PC VNC window, then go to network connections on 'manager', right-click CEDUNA-NETWORK and click disconnect. This is important.

Now you are done!

Brett is the expert on this setup, so if anything is not as described above, he is the best contact person.

Not really the ideal spot for this, but I'm not sure I have the ability to add to the RA wiki. -Tiege 14/08/17

Last modified: 2017/09/27 06:29 by Jamie McCallum