HCAL Operations
HCAL Twiki:
https://twiki.cern.ch/twiki/bin/viewauth/CMS/HCALWikiHome
HCAL Contact List:
https://twiki.cern.ch/twiki/bin/view/CMS/HcalContactList
DOC on call:
https://twiki.cern.ch/twiki/bin/view/CMS/HcalDOCHowTo
DAQ info:
https://indico.cern.ch/event/519748/
On Call Cheat Sheet
Setting up Tunnels
Follow the HCAL Connectivity twiki to get tunnels working for the lxplus, 904, and p5 networks. On Windows, the best choice is to run Linux in VirtualBox. If you are already at CERN and connecting to lxplus from a Windows computer, PuTTY works well too.
Setup Environment
Bash files
Snippets
Getting the HCAL config files at Point 5 (should be the same for 904):
$ cfgcvs checkout HcalCfg
The commands are very similar to svn:
$ cvs commit -m "Comment here"
$ cvs tag -F pro DTC.cfg
MCH Commands
telnet hcal-mch-20 (host names such as hcal-mch-20 follow the pattern hcal-card-crate-additionaloption, as in HcalCfg/uTCA/connection.xml)
>> show_fru
(State M4 means the card is on)
>> shutdown <fru_number>
(It is a good idea to run show_fru again to make sure the state of that card has gone to M1)
>> fru_start <fru_number>
>> exit
To shut down all cards:
>> shutdown all
To reboot (this also restarts the MCH, so you will be disconnected; wait ten seconds or so, then telnet back in to check the status):
>> reboot
Run Control Websites
P5:
http://cmsrc-hcal.cms:16000/rcms/gui/servlet/RunGroupChooserServlet
CMS TOP:
http://cmsrc-top.cms:10000/rcms/gui/servlet/RunGroupChooserServlet
904:
http://cms904rc-hcal.cms904:16000/rcms/gui/servlet/RunGroupChooserServlet
H2:
http://cmshcalTB02:16000/rcms/gui/servlet/RunningConfigurationServlet
Building 28:
http://cmshcal21:16000/rcms/gui/servlet/RunningConfigurationServlet
The Configuration Chooser sets up the run. Once you are on the configuration screen, do the following in order:
Set enable parameters
Initialize
Configure
Start
Stop the run before killing it.
uHTR
The uHTR tool is pretty straightforward. The main thing to watch out for is that the front and back FPGA firmware must not be mixed up (and they can be…). The firmware flavor must also match between front and back: both the detector part (HF, HBHE) and the number after it, which is the link speed (1600 = 1.6 Gbps, 4800 = 4.8 Gbps). A matching pair is shown below:
uhtr_front_HBHE1600_1_00_07.mcs.xz
uhtr_back_HBHE1600_1_00_00.mcs.xz
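The naming rule above can be checked mechanically before loading firmware. The following is a minimal sketch, assuming file names always follow the uhtr_<side>_<flavor>_<version>.mcs.xz pattern of the two examples. The function names fw_side, fw_flavor and check_fw_pair are hypothetical helpers and are not part of uHTRtool.

```shell
# Hypothetical helpers; assume names look like uhtr_<side>_<flavor>_<version>.mcs.xz

# Print the side field ("front" or "back") of a firmware file name.
fw_side() (
  name=${1##*/}        # drop any leading directory
  name=${name#uhtr_}   # drop the "uhtr_" prefix
  printf '%s\n' "${name%%_*}"
)

# Print the flavor field (detector + link speed, e.g. HBHE1600).
fw_flavor() (
  name=${1##*/}
  name=${name#uhtr_}
  name=${name#*_}      # drop "<side>_"
  printf '%s\n' "${name%%_*}"
)

# Succeed only if $1 is a front image, $2 is a back image, and both share a flavor.
check_fw_pair() {
  if [ "$(fw_side "$1")" != front ] || [ "$(fw_side "$2")" != back ]; then
    echo "front/back images are swapped or misnamed" >&2
    return 1
  fi
  if [ "$(fw_flavor "$1")" != "$(fw_flavor "$2")" ]; then
    echo "flavor mismatch: $(fw_flavor "$1") vs $(fw_flavor "$2")" >&2
    return 1
  fi
}
```

Usage: check_fw_pair uhtr_front_HBHE1600_1_00_07.mcs.xz uhtr_back_HBHE1600_1_00_00.mcs.xz && echo "pair OK"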
Correct Use for P5/904:
$ uHTRtool.exe -c crate:slot (uHTRtool.exe -c 52:10)
Not sure yet what the shell script does (uHTRtool.sh)
AMC13
Connecting to an AMC13:
There is a shell script at p5 and 904 that will use the connection file and then the -i option:
AMC13Tool2.exe -c ~hcalsw/uTCA.connections.pro.xml -i hcal.crate$1.amc13 ${@:2}
This is the p5 script. The 904 script is almost identical, but its connection file is located at hcalsw/HcalCfg/uTCA/connections.xml. Just use this to connect:
$ ~hcalsw/bin/AMC13Tool2.sh crate#
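The wrapper line shown above takes the crate number as its first argument and passes everything after it to AMC13Tool2.exe unchanged. A small sketch that only prints the command it would run makes the expansion visible. amc13_cmd and the CONN_FILE variable are hypothetical names, and the crate number in the usage line is just an example.

```shell
# Hypothetical dry-run helper: prints the command, does not run AMC13Tool2.exe.
# The connection file path is the p5 one quoted above, kept as a literal string.
CONN_FILE='~hcalsw/uTCA.connections.pro.xml'

amc13_cmd() (
  crate=$1   # first argument: crate number
  shift      # remaining arguments are passed through, like ${@:2} in the script
  echo AMC13Tool2.exe -c "$CONN_FILE" -i "hcal.crate${crate}.amc13" "$@"
)
```

Usage: amc13_cmd 22 prints AMC13Tool2.exe -c ~hcalsw/uTCA.connections.pro.xml -i hcal.crate22.amc13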
6/24/16
To add the AMC13Tool2.sh script to hcalsw, this was used:
$ sudo -u hcalsw then_a_command
Example to edit the shell script:
$ sudo -u hcalsw emacs AMC13Tool2.sh
Log Files
Go to the 904 or p5 network:
$ ssh cms904rc-hcal (for 904)
$ ssh cmsrc-hcal (for p5)
Run the Handsaw script (located in ~hcalsw/bin/):
$ Handsaw.pl /var/log/rcms/hcalpro/Logs_hcalpro.xml
$ tail -f /var/log/rcms/hcalpro/Logs_hcalpro.xml | Handsaw.pl
The tail version streams errors out as they are logged. It is probably the best one to watch.
Parsing through old Logs:
Copy the log file you want to look at. Old logs are usually compressed (.gz). To decompress one:
$ gunzip file.gz
Once you have the log file, you can use Handsaw and less to look through them:
$ Handsaw.pl logfile.xml | less -R
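If you would rather leave the copied file compressed, the two steps above can be combined. This is a sketch that relies on Handsaw.pl reading from standard input, as it does in the tail -f example. view_gz_log and LOG_FILTER are hypothetical names, and LOG_FILTER exists only so that the filter can be swapped out.

```shell
# Hypothetical helper: decompress to stdout and filter, leaving file.gz untouched.
# LOG_FILTER defaults to Handsaw.pl (assumed to be on PATH, e.g. from ~hcalsw/bin).
view_gz_log() {
  gunzip -c "$1" | "${LOG_FILTER:-Handsaw.pl}"
}
```

Usage: view_gz_log file.gz | less -R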
Elog + Shiftlist
Main Page:
https://cmsonline.cern.ch/webcenter/portal/cmsonline/pages_common?wc.contentSource=
Direct to Elog:
https://cmsonline.cern.ch/webcenter/portal/cmsonline/pages_common/elog
Elog:
Common -> Elog -> Subsystems -> Hcal -> Hcal, Hcal904, etc
Shiftlist:
Common -> Shiftlist
Random
Killing Stale Xdaq Processes
When a run is destroyed, some of its processes may be left running. These will interfere with the next run and cause errors on initialize.
Processes NOT to kill:
root 35246 18.7 7.3 6499672 2399100 ? Ssl Apr20 11439:33 /opt/xdaq/bin/xdaq.exe -h srv-s2f17-19-01.cms -p 9950 -u file.append:/var/log/hcal.xaad.log -e /opt/xdaq/share/hcal/profile/xaad.profile -z hcal
root 35247 8.2 0.2 4319652 91404 ? Ssl Apr20 5045:51 /opt/xdaq/bin/xdaq.exe -h srv-s2f17-19-01.cms -p 9999 -u file.append:/var/log/hcal.jobcontrol.log -e /opt/xdaq/share/hcal/profile/jobcontrol.profile -z hcal
Example of stale xdaq process to kill:
hcalpro 41955 9.1 0.2 4261576 93072 ? Sl 22:37 1:01 /opt/xdaq/bin/xdaq.exe -h hcalutca01.cms -p 15002 -s 294342 -u xml://cmsrc-hcal.cms:16010 -l INFO
Stale XDAQ example:
hcalpro 13038 4.1 0.3 2588468 122480 ? Sl Sep15 71:14 /opt/xdaq/bin/xdaq.exe -h hcalutca01.cms -p 16789 -u xml://cmsrc-hcal.cms:16010 -l INFO
Do not kill processes that run under root, such as jobcontrol, controlhub, xaad, etc. Be careful with pkill and similar tools that match on the name xdaq, since they will pick up those processes as well.
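One way to stay clear of the root-owned processes is to filter on the owner before killing anything. The sketch below reads ps aux output and prints the PIDs of xdaq.exe processes that are not owned by root. xdaq_candidates is a hypothetical name, and a non-root owner only makes a process a candidate, so check each one by eye before killing it.

```shell
# Hypothetical filter: stdin is `ps aux` output (user in column 1, PID in column 2).
# Prints the PIDs of xdaq.exe processes whose owner is not root.
xdaq_candidates() {
  awk '$1 != "root" && /\/xdaq\.exe/ { print $2 }'
}
```

Usage: ps aux | xdaq_candidates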
~hcalsw/bin/dump_all.sh
See if the relevant card is pingable.
If the uHTR or AMC13 is not pingable:
-Restart sysmgr:
Using script on hcalutca01 machines:
~hcalsw/bin/restart_sysmgr.sh
Basically ~hcalsw/bin/ contains all the magical scripts to do most things.
Crate Locations and FEDS:
http://cmsdoc.cern.ch/cms/HCAL/document/CountingHouse/Crates/Crate_interfaces_transition.htm
Dump: ~hcalsw/bin/dump_all.sh
ps aux (just to check what is there; if there is no stale xdaq, it should be fine to just run sudo service xdaqd restart)
sudo service xdaqd stop
pgrep xdaq (just to check what is there)
pkill xdaq
sudo service xdaqd start
THESE CHANGE QUITE OFTEN:
Restart Services 904
I have begun working on a service fix script at 904. To run it, type:
sudo -u hcalpro ~hcalpro/scripts/Service_fix.sh
The currently functioning options are:
-tomcat
-sysmgr
An option that may be working is:
-ccmserver
Please use this and report any issues or desired functionality by email or elog.
Restart System Manager p5
Here is the start system manager command. It should be used sparingly. In most cases (such as swapping uHTRs), the system manager will not need to be restarted.
sudo -u hcalpro sysmgr ~hcalsw/config_files/Sysmgr/sysmgrHCAL.conf
--
DanielArcaro - 21 Jun 2017