Problem with Status Information field: "(No output on stdout) stderr:"

tomasz
Joined: 10.01.2014 - 17:26

Hi,

For some time now I have noticed that some services, whether they use an external command or a built-in one (check_http), show the string "(No output on stdout) stderr:" in the Status Information field instead of a result, as shown in the screenshot below:

http://www.pictureshack.us/images/58549_nagios_screen.jpg

Is there a fix for this?

Regards,
Tom

 

itefix
Joined: 01.05.2008 - 21:33

Does it occur all the time? We haven't had any problems in our own or test environments so far. It seems that the Nagwin 4 check mechanism is sensitive to how check commands are formed, especially where backslashes and quotes are used. We recommend running your plugins in a bash shell to verify that they work.
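
One way to do such a check repeatedly, outside of Nagios, is sketched below. It is only a sketch and assumes Python 3 is available on the monitoring machine, which this thread does not state; `count_empty_runs` and the placeholder command are illustrative names, not part of Nagwin. It runs a command several times and counts how often stdout comes back empty.

```python
import subprocess
import sys


def count_empty_runs(command, runs=20, timeout=60):
    """Run `command` (a list of arguments) `runs` times.

    Returns (empty_stdout_runs, exit_codes) so that intermittent
    empty output can be spotted without going through Nagios.
    """
    empty = 0
    exit_codes = []
    for _ in range(runs):
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
        exit_codes.append(result.returncode)
        # Count runs where the plugin printed nothing (or only whitespace)
        if not result.stdout.strip():
            empty += 1
    return empty, exit_codes


if __name__ == "__main__":
    # Placeholder command (assumption): replace it with the fully expanded
    # plugin command line you want to test.
    cmd = [sys.executable, "-c", "print('OK - placeholder check')"]
    empty, codes = count_empty_runs(cmd, runs=5)
    print(f"{empty} of {len(codes)} runs produced no stdout; exit codes: {codes}")
```

If a plugin never produces empty output when run this way but still does under Nagwin, that would point at the check mechanism rather than at the plugin.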

tomasz

Hi,

Unfortunately, it does not occur every time: the same command randomly returns either a proper status update or "(No output on stdout) stderr:". Sometimes it happens, and at the next check the status is updated properly. The problem affects all commands at random, such as the samples below. Nothing in the logs says why or what happened, so I have no clue what the reason could be. As you can see in the screenshot, it also happened for localhost, which is included in the Nagwin package with standard commands and services.

Is there any way to debug it?

 

Sample commands:

check_http (this is the original command):

# 'check_http' command definition
define command{

       command_name    check_http
       command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
       }

and also this:

define command{
       command_name    check_java
       ; Check via java
       command_line    c:/nagios/ICW/java/jre7/bin/java.exe -jar $USER3$plugins/CheckJava.jar $USER3$plugins/ "$ARG1$" "$ARG2$"
       }

tomasz

Hi,

I have enabled full debugging, and here is the part of the log that shows the problem:

Making callbacks (type 6)...

handle_timed_event() end

Making callbacks (type 1)...

** Event Check Loop

Next Event Time: Wed Jan 22 09:10:34 2014

Current/Max Service Checks: 1/0 (inf% saturation)

## Polling 1500ms; sockets=6; events=18; iobs=0x80051660

Processing check result for service 'Physical memory' on host 'localhost'

handle_async_service_check_result()

** Handling check result for service 'Physical memory' on host 'localhost' from 'Core Worker 4872'...

HOST: localhost, SERVICE: Physical memory, CHECK TYPE: Active, OPTIONS: 0, SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, OUTPUT: (No output on stdout) stderr: 

Parsing check output...

Short Output: (No output on stdout) stderr:

Long Output:  NULL

Perf Data:    NULL

ST: HARD  CA: 1  MA: 4  CS: 0  LS: 0  LHS: 0

Service is OK.

Service did not change state.

Rescheduling next check of service at Wed Jan 22 09:15:26 2014

get_next_valid_time()

_get_matching_timerange()

schedule_service_check()

Scheduling a non-forced, active check of service 'Physical memory' on host 'localhost' @ Wed Jan 22 09:15:26 2014 
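
The "HOST: ..., RETURN CODE: ..., OUTPUT: ..." line in the debug log above can be filtered mechanically to see which services are affected and how often. A minimal sketch, assuming Python 3 and that result lines follow the format shown in this excerpt; the function name and the second sample line are made up for illustration.

```python
import re
from collections import Counter

# Matches the check-result summary line as it appears in the excerpt above
RESULT_LINE = re.compile(
    r"HOST: (?P<host>.*?), SERVICE: (?P<service>.*?), CHECK TYPE: .*?"
    r"RETURN CODE: (?P<rc>-?\d+), OUTPUT: (?P<output>.*)"
)
MARKER = "(No output on stdout)"


def empty_output_checks(lines):
    """Return (host, service, return_code) for every result line whose
    output starts with the '(No output on stdout)' marker."""
    hits = []
    for line in lines:
        match = RESULT_LINE.search(line)
        if match and match.group("output").startswith(MARKER):
            hits.append(
                (match.group("host"), match.group("service"), int(match.group("rc")))
            )
    return hits


if __name__ == "__main__":
    sample = [
        # Copied from the log excerpt in this post
        "HOST: localhost, SERVICE: Physical memory, CHECK TYPE: Active, OPTIONS: 0, "
        "SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, "
        "OUTPUT: (No output on stdout) stderr: ",
        # Invented line with normal output, for contrast only
        "HOST: localhost, SERVICE: PING, CHECK TYPE: Active, OPTIONS: 0, "
        "SCHEDULED: Yes, RESCHEDULE: Yes, EXITED OK: Yes, RETURN CODE: 0, "
        "OUTPUT: PING OK",
    ]
    per_service = Counter((host, service) for host, service, _ in empty_output_checks(sample))
    for (host, service), count in per_service.items():
        print(f"{host} / {service}: {count} empty result(s)")
```

To use it on a real log, pass the open debug log file to `empty_output_checks` instead of `sample`.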

itefix

Hmm. What's the command behind "Physical memory" ?

tomasz

Hi,

Unfortunately, the forum blocked me from posting the whole log. Below is the point where 'Physical memory' was called. This command and service are the standard ones delivered with Nagwin.

Log:

Attempting to run scheduled check of service 'Physical memory' on host 'localhost': check options=0, latency=0.000000

run_async_service_check()

check_service_check_viability()

check_time_against_period()

_get_matching_timerange()

check_service_dependencies()

Making callbacks (type 6)...

Checking service 'Physical memory' on host 'localhost'...

get_raw_command_line_r()

Raw Command Input: $USER1$/check_pdm --memory physical --warning $ARG1$ --critical $ARG2$

process_macros_r()

Service:

# Define a service to check the physical memory usage the local machine. 

# Critical if less than 90% of swap is in use, warning if 80% is in use

define service{
       use                             local-service,srv-pnp         ; Name of service template to use
       host_name                       localhost
       service_description             Physical memory
       check_command                   check_local_physical_mem!80!90
       }

Command:

# 'check_local_pyhsical_mem' command definition

define command{
       command_name    check_local_physical_mem
       command_line    $USER1$/check_pdm --memory physical --warning $ARG1$ --critical $ARG2$
       }

itefix

I use the same plugin in my production system without any problem. Do you run Nagwin on a dedicated machine, or are there other processes as well? If you click on a service, you will be able to see some statistics about its performance, such as:

Check Latency / Duration: 0.000 / 20.578 seconds

As the problem occurs from time to time, it may be related to the load on the Nagwin machine.

tomasz

Nagwin is running on a dedicated VM. There are some additional processes, such as a web server and a DB server, but they are there for Nagwin reporting purposes.

I am not sure this is related to load, because the logs say that the plugin ended normally. The only thing missing is stdout. If there were a timeout, the whole check would fail, wouldn't it?

Here is the line:

Performance Data:

'memory in use'=1214MB; 'memory usage'=29%;80;90; 'memory total'=4095MB;

itefix

As I wrote before, it works here, which is not a big relief for you, I know. However, since the messages come and go, this should be something related to your system. Can you post the performance info (top right corner of the tactical overview)? My production system reports:

Monitoring Performance
Service Check Execution Time: 0.03 / 20.64 / 2.068 sec
Service Check Latency: 0.00 / 0.00 / 0.000 sec
Host Check Execution Time: 4.05 / 4.06 / 4.050 sec
Host Check Latency: 0.00 / 0.00 / 0.000 sec
# Active Host / Service Checks: 6 / 49
# Passive Host / Service Checks: 0 / 0
tomasz

Thanks. Indeed, not a big relief :)

I would like to investigate further but have no idea where to start. This column (Status Information) is very important for us: our proof of concept for Nagwin depends on it, and so does a further purchase of the product for production.

Currently Nagwin runs on my dev machine, Windows 7 in a VM.

Anyway, here is the report:

Monitoring Performance
--------------------------------------------------------
Service Check Execution Time: 0.09 / 15.48 / 5.518 sec
Service Check Latency: 0.00 / 0.00 / 0.000 sec
Host Check Execution Time: 0.06 / 4.17 / 1.096 sec
Host Check Latency: 0.00 / 0.00 / 0.000 sec
# Active Host / Service Checks: 4 / 6
# Passive Host / Service Checks: 0 / 2

itefix

Hmm. As this problem appears to be a random one, I wonder if you can check the number of network connections on your system: Windows 7 has a hard-coded limit of 20 (it was 10 in XP!). My production Nagwin is on a 2008 server system, and I've never had that kind of problem as of Nagios 4.0.2.

tomasz

Hi again,

My machine does not have this limit, and I assume it would not be the source of the problem anyway, because there are only 8 services. Nevertheless, I also tried enabling notifications and got this error there too, which is very bad:

***** Nagios *****

Notification Type: PROBLEM

Service: Selenium test which fail

Host: QlikView

Address: srvqliviewpoc.ops.adr.ch.glencore.com

State: CRITICAL

Date/Time: Thu Jan 23 18:57:57 WEST 2014

Additional Info:

(No output on stdout) stderr:

itefix

Windows 7 machines have that limit, and you run other services on that machine as well. This is a plugin issue. Can you describe what you run as a plugin?

itefix

We installed the latest Nagwin on a Windows 7 Home Premium computer and got the same symptoms as you, with the message "(No output on stdout) stderr" showing up randomly. Some tweaks in the Nagios code didn't improve the situation either.

We then made fresh installations of Nagwin on Windows 7 64-bit Enterprise, Windows 2008 64-bit Standard, Windows 2012 64-bit Standard and Windows XP 32-bit. All of them except XP worked as expected for many hours of operation without the message above. On XP we observed some occurrences of the message.

It seems that the problem may be OS-related, and we have no idea why this happens. What we can recommend is to test Nagwin on an appropriate platform.

 

wuesten_fuchs
Joined: 20.02.2014 - 21:16

I have installed the latest Nagwin on a Windows Server 2012 machine, and I see this problem exactly as the original poster described it.

I have only just installed Nagwin and already have this problem with the out-of-the-box checks for localhost. I have not yet added any further hosts.

I would be glad for any hints on how to solve this problem.

 

itefix

Which Nagwin version do you run? Do you mean 2.1.0 by "the latest"?

wuesten_fuchs

I see - no. I have 2.0.1 and I just saw you now offer 2.1.0. Does it make a difference for this issue? Can I simply upgrade by installing over the old setup? (I saw no upgrade installation hints on the web site.)

itefix

Please follow the steps below for an upgrade:

  • Take a backup of the directories etc/nagwin and var/opt
  • Upgrade Nagwin
  • Restore etc/nagwin and var/opt

 

wuesten_fuchs

I did the upgrade; the problem is unchanged.

Additional problem: the Nagwin services now terminate with the error message "trial period expired".

Apparently you put the wrong installer into my download package. Please help.

 

itefix

We have now released Nagwin 2.2.0, which contains Nagios 4.0.6 with improvements for this issue. Please download it via your account page. We recommend following this procedure for a clean upgrade:

 

  • Take a backup of etc/nagios, var/opt and plugins
  • Uninstall Nagwin
  • Terminate any Nagwin-related processes that the uninstaller could not terminate
  • Remove the installation directory
  • Install Nagwin
  • Restore etc/nagios, var/opt and plugins
  • Start the services
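
The backup and restore steps in the list above could be scripted roughly as follows. This is a sketch, assuming Python 3.8 or later; the function names are hypothetical, and the installation and backup paths must be supplied by you. The uninstall, install and service steps are still done by hand.

```python
import shutil
from pathlib import Path

# Directories named in the upgrade procedure, relative to the Nagwin
# installation directory
BACKUP_DIRS = ["etc/nagios", "var/opt", "plugins"]


def backup_nagwin(install_dir, backup_dir, names=BACKUP_DIRS):
    """Copy the listed sub-directories of install_dir into backup_dir.

    Returns the names that were actually copied; missing ones are skipped.
    Fails if a target directory already exists, so that an older backup
    is never silently overwritten.
    """
    install_dir, backup_dir = Path(install_dir), Path(backup_dir)
    copied = []
    for name in names:
        source = install_dir / name
        if not source.is_dir():
            continue
        shutil.copytree(source, backup_dir / name)
        copied.append(name)
    return copied


def restore_nagwin(backup_dir, install_dir, names=BACKUP_DIRS):
    """Copy the backed-up directories over a fresh installation."""
    backup_dir, install_dir = Path(backup_dir), Path(install_dir)
    restored = []
    for name in names:
        source = backup_dir / name
        if not source.is_dir():
            continue
        # dirs_exist_ok (Python 3.8+) merges into directories that the
        # new installation has already created
        shutil.copytree(source, install_dir / name, dirs_exist_ok=True)
        restored.append(name)
    return restored
```

For example, `backup_nagwin("C:/nagwin", "D:/nagwin-backup")` before uninstalling and `restore_nagwin("D:/nagwin-backup", "C:/nagwin")` after reinstalling; both paths are placeholders.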
wuesten_fuchs

The problem seems to be solved now. Thanks!