Hi,
For some time I have noticed that for some services, whether the check is an external command or a built-in one (check_http), the Status Information field shows not a result but the string "(No output on stdout) stderr:", as in the screenshot below:
http://www.pictureshack.us/images/58549_nagios_screen.jpg
Is there a fix for this?
Regards,
Tom
Does it occur all the time? We haven't had any problems in our test environments so far. It seems that the Nagwin 4 check mechanism is sensitive to how check commands are formed, especially where backslashes and quotes are involved. We recommend running your plugins in a bash shell to verify that they work.
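Since the problem is intermittent, a single manual run may not show it. Below is a minimal sketch, assuming Python 3.7+ is available on the machine, that runs a check command repeatedly and records every run that printed nothing on stdout. The function name `count_empty_stdout` and its parameters are made up for illustration. Note that it launches the plugin directly, not through the Nagios worker process, so a clean result here would not rule out a problem on the Nagios side.

```python
import subprocess
import sys


def count_empty_stdout(command, runs=20, timeout=30):
    """Run `command` (a list of arguments) `runs` times.

    Returns a list of (run_index, return_code, stderr_text) tuples, one for
    each run that produced no stdout. Empty stdout is the condition that
    Nagios reports as "(No output on stdout)".
    """
    empty = []
    for index in range(runs):
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
        if not result.stdout.strip():
            empty.append((index, result.returncode, result.stderr.strip()))
    return empty


# Stand-in command so the sketch runs anywhere: a child Python process that
# prints one status line. Replace it with the real plugin command line,
# for example the expanded form of "$USER1$/check_http -I <address>".
demo_command = [sys.executable, "-c", "print('OK - demo output')"]
failures = count_empty_stdout(demo_command, runs=5)
print(f"{len(failures)} of 5 runs produced no stdout")
```

If the plugin never comes back empty over a few hundred runs but Nagios still reports the message, that would point away from the plugin itself.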
Hi,
Unfortunately it does not occur every time. The same command randomly returns either a proper status update or "(No output on stdout) stderr:". Sometimes it happens and at the next check the status is updated properly. The problem happens randomly for all commands, such as the samples below. Nothing in the logs says why or what happened, so I have no clue what the reason could be. As you can see in the screenshot, it also happened for localhost, which is included in the Nagwin package with the standard commands and services.
Is there any way to debug it?
Sample commands:
check_http (this is the original command):
# 'check_http' command definition
define command{
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
and also this:
define command{
command_name check_java
; Check via java
command_line c:/nagios/ICW/java/jre7/bin/java.exe -jar $USER3$plugins/CheckJava.jar $USER3$plugins/ "$ARG1$" "$ARG2$"
}
Hi,
I have enabled full debugging, and here is the part of the log where the problem occurs:
Making callbacks (type 6)...
handle_timed_event() end
Making callbacks (type 1)...
** Event Check Loop
Next Event Time: Wed Jan 22 09:10:34 2014
Current/Max Service Checks: 1/0 (inf% saturation)
## Polling 1500ms; sockets=6; events=18; iobs=0x80051660
Processing check result for service 'Physical memory' on host 'localhost'
handle_async_service_check_result()
** Handling check result for service 'Physical memory' on host 'localhost' from 'Core Worker 4872'...
HOST: localhost, SERVICE: Physical memory, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: (No output on stdout) stderr:
Parsing check output...
Short Output: (No output on stdout) stderr:
Long Output: NULL
Perf Data: NULL
ST: HARD CA: 1 MA: 4 CS: 0 LS: 0 LHS: 0
Service is OK.
Service did not change state.
Rescheduling next check of service at Wed Jan 22 09:15:26 2014
get_next_valid_time()
_get_matching_timerange()
schedule_service_check()
Scheduling a non-forced, active check of service 'Physical memory' on host 'localhost' @ Wed Jan 22 09:15:26 2014
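To see how often this happens and for which services, the debug log can be scanned for result lines like the "HOST: ..., SERVICE: ..., OUTPUT: ..." line above. This is only a rough sketch: the regular expression is derived from that single line, so it assumes the other result lines have the same layout, and the names `RESULT_LINE` and `count_missing_output` are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the check-result line in the debug log excerpt.
# search() is used below so that any timestamp prefix is tolerated.
RESULT_LINE = re.compile(
    r"HOST: (?P<host>.*?), SERVICE: (?P<service>.*?), CHECK TYPE: .*?"
    r"RETURN CODE: (?P<code>-?\d+), OUTPUT: (?P<output>.*)"
)


def count_missing_output(lines):
    """Count check results per (host, service) whose output is missing."""
    counts = Counter()
    for line in lines:
        match = RESULT_LINE.search(line)
        if match and match.group("output").startswith("(No output on stdout)"):
            counts[(match.group("host"), match.group("service"))] += 1
    return counts


# The sample is the result line from the log excerpt. For a real log, pass
# an open file object instead of this list.
sample = [
    "HOST: localhost, SERVICE: Physical memory, CHECK TYPE: Active, "
    "OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, "
    "RETURN CODE: 0, OUTPUT: (No output on stdout) stderr:",
]
for (host, service), hits in count_missing_output(sample).items():
    print(f"{host} / {service}: {hits}")
```

Comparing the counts per service against the time of day might show whether the failures cluster around busy periods.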
Hmm. What's the command behind "Physical memory"?
Hi,
Unfortunately I was blocked from posting the whole log in this forum. Below is the point where 'Physical memory' was called. This command and service are the standard ones delivered with Nagwin.
Log:
Attempting to run scheduled check of service 'Physical memory' on host 'localhost': check options=0, latency=0.000000
run_async_service_check()
check_service_check_viability()
check_time_against_period()
_get_matching_timerange()
check_service_dependencies()
Making callbacks (type 6)...
Checking service 'Physical memory' on host 'localhost'...
get_raw_command_line_r()
Raw Command Input: $USER1$/check_pdm --memory physical --warning $ARG1$ --critical $ARG2$
process_macros_r()
Service:
# Define a service to check the physical memory usage the local machine.
# Critical if less than 90% of swap is in use, warning if 80% is in use
define service{
use local-service,srv-pnp ; Name of service template to use
host_name localhost
service_description Physical memory
check_command check_local_physical_mem!80!90
}
Command:
# 'check_local_pyhsical_mem' command definition
define command{
command_name check_local_physical_mem
command_line $USER1$/check_pdm --memory physical --warning $ARG1$ --critical $ARG2$
}
I use the same plugin in my production system without any problem. Do you run Nagwin on a dedicated machine, or are there other processes as well? If you click on a service, you will be able to see some stats about the performance of the service - like
As the problem occurs from time to time, it may be related to the load on the Nagwin machine.
Nagwin is running on a dedicated VM. However, there are some additional processes such as a web server and a DB server, but only for Nagwin reporting purposes.
I am not sure this is related to load, because the logs say that the plugin ended normally. The only thing missing is stdout. If there were a timeout, the whole check would fail, wouldn't it?
Here is the line:
'memory in use'=1214MB; 'memory usage'=29%;80;90; 'memory total'=4095MB;
As I wrote before, it works here, which is not a big relief for you, I know. However, this should be something related to your system, as the messages come and go. Can you post the performance info (top right corner, tactical view)? My production system reports like:
Thanks for that. Indeed, not a big relief :)
I would like to investigate further but have no idea where to start. This column (Status Information) is very important for us; our proof of concept for Nagwin depends on this result, as does the further purchase of this product for production.
Currently Nagwin runs on my dev machine, Windows 7 in a VM.
Anyway here is this report:
Monitoring Performance
--------------------------------------------------------
Service Check Execution Time: 0.09 / 15.48 / 5.518 sec
Service Check Latency: 0.00 / 0.00 / 0.000 sec
Host Check Execution Time: 0.06 / 4.17 / 1.096 sec
Host Check Latency: 0.00 / 0.00 / 0.000 sec
# Active Host / Service Checks: 4 / 6
# Passive Host / Service Checks: 0 / 2
Hmm. As this problem appears to be a random one, I wonder if you can check the number of network connections on your system: Windows 7 has a hard-coded limit of 20 (it was 10 in XP!). My production Nagwin is on a 2008 server system, and I've never had that kind of problem as of Nagios 4.0.2.
Hi again,
My machine does not have this limit, but I assume this would not be the source of the problem anyway, because there are only 8 services. Nevertheless, I also tried enabling notifications, and there I got this error as well, which is very bad:
***** Nagios *****
Notification Type: PROBLEM
Service: Selenium test which fail
Host: QlikView
Address: srvqliviewpoc.ops.adr.ch.glencore.com
State: CRITICAL
Date/Time: Thu Jan 23 18:57:57 WEST 2014
Additional Info:
(No output on stdout) stderr:
Windows 7 machines have that limit, and you run other services on that machine as well. This is a plugin issue. Can you describe what you run as a plugin?
We have installed the latest Nagwin on a Windows 7 Home Premium computer and got the same symptoms as you, with the message "(No output on stdout) stderr" randomly showing up. Some tweaks in the Nagios code didn't improve the situation either.
We then made fresh installations of Nagwin on Windows 7 64-bit Enterprise, Windows 2008 64-bit Standard, Windows 2012 64-bit Standard and Windows XP 32-bit. All of them except XP have been working as expected for many hours of operation without the message above. We have observed some of these messages on XP.
It seems that the problem may be OS-related, and we have no idea why this happens. What we can recommend is to test Nagwin on an appropriate platform.
I have installed the latest Nagwin on a Windows Server 2012 machine and I also see this problem, exactly as the original poster described it.
I have only just installed Nagwin and already have this problem with the out-of-the-box checks for localhost. I have not yet added any further hosts.
I would be glad for any hints on how to solve this problem.
Which Nagwin version do you run? Do you mean 2.1.0 by the latest?
I see - no. I have 2.0.1 and I just saw you now offer 2.1.0. Does it make a difference for this issue? Can I simply upgrade by installing over the old setup? (I saw no upgrade installation hints on the web site.)
Please follow the steps below for an upgrade:
I did the upgrade - the problem is unchanged.
Additional problem: the Nagwin service now terminates with the error message "trial period expired".
Apparently you put the wrong installer into my download package. Please help.
We have now released Nagwin 2.2.0, containing Nagios 4.0.6 with improvements for this issue. Please download it via your account page. We recommend following the procedure for a clean upgrade:
The problem seems to be solved now. Thanks!