Today I had to explain to a customer how you need to target a rule or monitor to a specific computer group.
This is actually not a very intuitive process, and if you are used to working with MOM 2005 the process is different and can have big implications for the behavior of the rule / monitor you’ve created.
This is the only correct way to target a rule or monitor at a select group of servers.
Open your console, go to the Authoring pane and navigate to Rules. Right-click > Create a new rule…
In the “Create Rule Wizard” select the desired rule. In this example I’m going to create an Event Based rule in the NT Event Log (Alert).
CAUTION: make sure to change the destination management pack to a custom management pack and NOT the default management pack.
Give the rule a name and click the “Select” button just behind Rule Target:
Here you need to target a class that you are certain all the servers you want to target are part of. In this case I choose “Windows Server”, but if, for example, you are sure they are all SQL servers you can target the “SQL Server” class.
Once you have selected the appropriate class, click OK, but don’t click Next yet.
Make sure the “Rule is enabled” tick box is off!
Now choose the event log to target your rule at. In our case it’s the Application log.
Next up is the filter. In this example I’m searching for Event ID 150 created by the source “Eventcreate”.
Next thing is to specify the information that will be generated by the alert:
Now click create.
So far the rule has been created but is disabled. The next thing we need to do is create our group which contains the specific set of servers which need to be targeted. In the Authoring pane choose “Groups” > Right click > choose “Create a new Group…”
Choose a name for the group and again CHANGE the destination management pack from the default to a custom management pack.
NOTE: Choose the same management pack you want to create your override in later on. It’s not possible to reference one unsealed management pack from another unsealed management pack, so either store both your group and your override in the same management pack, or seal the management pack your group is created in.
The next option is to specify the explicit group members.
There are actually 2 approaches to populating the group (which can be combined).
The first one is to specify the explicit members of the group. They will always be included in the group no matter what criteria you specify later on. The disadvantage is that if you install a new server which needs to be targeted, you have to include it here manually.
The second approach to populating your group is dynamic inclusion rules. These rules use a set of conditions to add servers, for example all servers that are SQL servers based on their class, or all servers whose name starts with “SERVER0”.
You can also specify servers to be included in this group which reside in another group.
Finally, you can specifically deny objects from being included in the group:
When you are confident you have included all the servers in the group, click Create.
At this point go back to the Authoring pane > Rules and search for your newly created rule.
In this example you can see our newly created rule in disabled state:
Right click the rule and choose Overrides > Override the Rule > For a Group…
Now choose the group we created earlier on:
In the override parameters, locate the “Enabled” parameter and tick the box in the “Override” column. In the Override Value column choose “True”, then click Apply and OK.
At this point the rule we have created is targeted only at the servers you’ve added to the computer group and is not enabled on all the other servers. This is in fact a totally different approach from the way of working in MOM 2005.
This is because computer groups (the class of objects that are computer groups) only exist on the RMS. If you target a rule directly at a computer group, it will try to collect info from the RMS instead of from the computers you intended.
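If you want to double-check from the Operations Manager Command Shell that the rule itself stays disabled by default (the override only enables it for the group), a quick sketch like this works; the display name is just an example and the OpsMgr 2007 snap-in is assumed to be loaded and connected to your management group:

# The rule's own Enabled setting should show false; the override enables it per group
Get-Rule | Where-Object { $_.DisplayName -eq "My event 150 rule" } | Format-Table DisplayName, Enabled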
Recently I got a mail from a user stating he’s not receiving his reports via mail anymore. They were created way back and normally these reports fall into my “set it and forget it” category…
When I checked the scheduled reports pane, I instantly noticed that all the reports were showing an error, as shown below:
The error message “The Subscription Contains parameter values that are not valid” is shown in the status field.
During my search on the web the most common solution was to recreate the report, which I did for one. But because there are about 20 reports, it would be a lot of work to recreate them all, only to risk them breaking again without knowing when and why.
So the next step in my troubleshooting was to see whether I could fill in the missing parameters in the report, which resides in a custom management pack holding all these special reports.
When I opened the report to run it, I noticed the following: Data Aggregation and Histogram are greyed out and it’s impossible to change them.
When I tried to run the report the following error message came up:
So there is an issue with the ‘Data Aggregation’ parameter. There’s no way to troubleshoot any further in the SCOM environment, so we’ll have to dig deeper and turn our attention to the underlying SQL Server Reporting Services (SRS) install.
Connect to the SRS server and open up SQL Server Management Studio.
Note: If you’re not sure where your SRS install resides, navigate to SCOM console > Administration > Reporting. The Reporting Server URL is filled in there, so you can retrieve the server name / alias from it.
Make sure you select “Reporting Services” in the Server Type and select the server name you’ve retrieved from your console.
Navigate to Home > “Your management pack” > reports > Subscriptions.
In this example we’re troubleshooting the “PROD3_IOReport”.
Right click and choose view report.
The web browser opens and will generate the report. However in this case the following error shows up:
Didn’t we have an issue with the “DataAggregation”? The error above shows we have an issue with our “ManagementGroupId”.
Let’s take a look at the report properties to find out.
Right click the report and choose Properties.
The familiar SQL properties page pops up.
Behind “ManagementGroupID” (the sixth item in the screenshot above) it’s indicated that there are multiple values… We only have one management group, so why would there be multiple?
If you open the value you get a drop-down box with the 2 IDs listed.
So which one is the correct one…
I opened a newly created report in the same management pack (the one I recreated to solve the issue with the first report) and there only one ID is listed:
This report works with all its parameters, so this ID is the correct one for our management group.
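If you want to verify which GUID really belongs to your management group, you can also read it through the SCOM SDK; a minimal sketch, assuming the SDK assembly is available (for instance on the RMS) and using an example server name:

# Load the SCOM 2007 SDK and connect to the management group
[void][System.Reflection.Assembly]::LoadWithPartialName("Microsoft.EnterpriseManagement.OperationsManager")
$mg = New-Object Microsoft.EnterpriseManagement.ManagementGroup("RMSSERVER01")   # example RMS name
$mg.Id   # this GUID should match the single ID listed in the working report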
The next step is deleting the “wrong” ID from my report parameters and clicking OK:
Now we go back to our SCOM console and check the report once more.
Open the report and now it’s possible to change the Data Aggregation and Histogram again.
After clicking “run” the report is generated successfully.
So all we need to do is change the parameters in our scheduled report.
Navigate back to the scheduled reports list, right click the report and choose edit.
Check the parameters and fill in the correct Data Aggregation / Histogram settings (and check the other settings as well while you’re at it).
Click finish and check back at the scheduled report view.
The report has gone from error to “Ready” and will be processed when the scheduled time arrives…
In this particular case the issue apparently arose when agents were temporarily multi-homed to a test environment, and that test environment was deleted afterwards.
Although this was a mistake on our side, I posted this blog post to illustrate that the error message in SCOM did not point to the real problem, which was hidden in the SRS installation. This threw me off while troubleshooting because I was focusing on the wrong error, and it cost me a lot of valuable troubleshooting time.
I’ve posted my experience to save you some time when troubleshooting this issue.
One of the most common frustrations I face (and I’m sure I’m not alone) is the fact that from time to time things get saved in the Default Management Pack.
It’s so easy to forget to change the destination management pack while creating rules / monitors and just click Next. We all know that once you’ve created the rule it’s not possible to change the management pack anymore…
It’s best practice not to write anything to your Default Management Pack, but it’s always selected by default…
You have a couple of options here.
To avoid this common mistake / lapse of attention, I make it a habit to rename my Default Management Pack’s display name to something eye-catching, so I notice it before clicking Next while creating a rule / monitor.
Open the SCOM console and navigate to Administration > Management packs and right click your Default Management Pack
Choose Properties in the menu:
Change the name of your Default Management Pack. In my case I always put “DO NOT WRITE TO” in capitals before the name.
And click apply.
This in fact changes the display name of your management pack but not the management pack ID. It’s not possible to change the ID (it’s greyed out), so your management pack will still hold all its dependencies…
At this point the Default Management Pack is still selected by default when creating a rule, but there’s a nice message in capitals just above the Next button.
This small modification has already saved me a lot of (additional) headaches by reminding me to switch to a different management pack when creating a rule / monitor…
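As a side note, if you want to check what has already slipped into the Default Management Pack, a quick look from the Command Shell works as well. This is a sketch; it assumes the OpsMgr snap-in is loaded and you are connected to your management group:

# The Default Management Pack has the internal name Microsoft.SystemCenter.OperationsManager.DefaultUser
$defaultMP = Get-ManagementPack | Where-Object { $_.Name -eq "Microsoft.SystemCenter.OperationsManager.DefaultUser" }
$defaultMP.GetRules()    | Format-Table DisplayName   # rules saved in the Default Management Pack
$defaultMP.GetMonitors() | Format-Table DisplayName   # monitors saved in the Default Management Pack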
Just recently I needed to connect SCOM to an in-house written problem management system. There was no connector available and it was also not possible to generate tickets directly in the system itself.
The only way the system receives input is via mail containing specific keywords.
Most of the keywords could be mapped to standard alert fields like severity, time raised, description….
BUT (of course there’s a but) the system also required other fields to decide on proper escalation, severity,…
The system previously used on site was MOM 2005. In MOM 2005 there’s an option to define these fields globally so you can use them in whatever alert you want to raise.
This is in MOM2005:
Open Operator Console > Administration > Global Settings > Custom Alert Fields
Up to 5 Custom Fields can be named here:
These fields can then be addressed in the rules you create and populated with the text you define:
When you select Custom Fields, the labels you specified in the Custom Alert Fields settings are shown, and you can pass info through the rule into the alert’s fields:
While setting up the same connection from SCOM to the problem management system, I found out it’s not possible to define the custom fields globally in SCOM like we did in MOM.
BUT there are 10 (!) custom alert fields available in the SCOM database to use as you please.
So how can we use these fields and fill them with the proper parameters to give the problem management system the correct data to do its magic?
Well, through an event-based alert generating rule:
Open the SCOM console and navigate to Authoring > Rules > right-click in the right pane and choose New Rule…
Note: Always change your destination management pack to something OTHER than your Default Management Pack.
In this example I’ll be creating a new rule for event ID 145 in the application log to show where exactly the custom fields are.
Fill in:
Fill in the Event log name where you want to look for the event. In our case this is the Application log.
We want to look for an event ID which is equal to 145 in this case.
And in this screen there’s the option to add Custom alert field parameters.
Unfortunately it’s not possible to label them as you could in MOM, so you need to keep a description at hand of what you want to fill in in which custom field.
The custom fields are written to the SCOM database and can be used in PowerShell or other scripting languages.
The property in PowerShell is $_.CustomField1 through $_.CustomField10.
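For example, reading the custom fields back from alerts in the Command Shell could look like this (a sketch; it assumes the OpsMgr snap-in is loaded and you are connected to your management group):

# Show the custom fields of all new alerts
Get-Alert -Criteria "ResolutionState = 0" | ForEach-Object {
    $_.Name
    $_.CustomField1    # static text passed by the rule
    $_.CustomField2
    $_.CustomField10
}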
By passing the info here I was able to deliver the correct data to the problem management system to do its magic in the background…
Just recently I had a situation at a client where we needed to rethink the notification possibilities due to various issues. I’ve developed a PowerShell script to gain more control over the notification process.
My Client is using an in-house developed and maintained problem management system installed on a mainframe platform.
The alerts which need escalation are detected in SCOM and then sent by mail to a Lotus Notes system. The data is then read through a connector between the mainframe system and the Lotus Notes database. The mail is scrubbed, and through a series of scripts on the mainframe the key fields of the mail are detected and filled in in the ticket…
So far so good… BUT because of the mix of different systems there was an issue with encoding. The mails were sent in UTF-8 and correctly decoded when viewed in the Lotus Notes client, but stayed encoded in the Lotus Notes database, and therefore the scrubbed text was all scrambled and unusable for the problem management system.
After various attempts to mail in different encoding formats I decided to rethink the notification and detach it from the SCOM system to get more freedom in testing.
The following Powershell script together with a custom notification channel did the trick:
It’s constructed in 3 sections: preparation + composing the file, mailing, and error handling for reporting purposes.
You can Download the script here.
First of all we are preparing everything to execute the script.
The areas in yellow need to be customized for your environment.
Variables which need customization:
$rootMS: Used to read the RMS name. If the RMS is a single server you can use the first method; mine is on a cluster, so I filled in the name manually to avoid issues with the RPC server when reading the name through WMI.
$NotifiedResState: Just pick any number that is not already in use. We’ll have to create this resolution state in SCOM afterwards.
$CultureInfo: Make sure you fill in the correct locale info to get the date / time format correct. For a list of all culture info check here: Table
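To give you an idea, the preparation section could look roughly like the sketch below. This is not the literal script; the cluster name and locale are example values you need to replace:

# Load the OpsMgr snap-in and connect to the management group
Add-PSSnapin "Microsoft.EnterpriseManagement.OperationsManager.Client" -ErrorAction SilentlyContinue
# Method 1: read the RMS name through WMI (single-server RMS)
# $rootMS = (Get-WmiObject -Class Win32_ComputerSystem).DNSHostName
# Method 2: clustered RMS, fill in the name manually to avoid RPC issues
$rootMS = "RMSCLUSTER.domain.local"
Set-Location "OperationsManagerMonitoring::"
New-ManagementGroupConnection -ConnectionString:$rootMS | Out-Null
Set-Location $rootMS

$NotifiedResState = 15                                                 # the custom resolution state created in SCOM
$CultureInfo = New-Object System.Globalization.CultureInfo("nl-BE")    # example locale for date / time formatting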
In this part of the script we read all the desired elements of the alert and write them to a TXT file. You could leave the TXT file out and just build a string, but I prefer to keep the TXT files as a backup to check whether a ticket was raised at any given time.
Variables which need customization:
$strResolutionState: Because the resolution state is stored as a number in the database and not as the word itself, we need to translate the number into the correct word. This way we get the resolution state name in our mail instead of the number. You need to fill in the resolution state number you’ve chosen earlier plus the text you’ve associated with it in SCOM. Check below for how to implement this setting in SCOM.
$strobjectname: Because not all the desired info was in the alert, I had to use 3 custom fields to get the mails to contain compliant info for the custom-made problem management system. CustomField2 holds the NetBIOS name. Because I don’t need the full name (servername.domain.local) but just the server name, I split the name and use only the first part in the variable $Objectname.
$FilePath: The file path is constructed from 2 parameters of the alert to create a unique name and avoid overwriting an existing TXT file. You need to use the time the event was raised, because if you use the Get-Date function to get the current date and time, the script can generate 2 files if the time changes while it runs.
Of course you can adapt the fields and structure to your liking, but for our problem management system this format had to be followed strictly to be able to scrub the mail.
Note: CustomField1 and CustomField3 contain static text passed by the alert generating rule.
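Put together, the composing section could look something like the sketch below. $alert is assumed to have been retrieved earlier with Get-Alert using the alert ID passed in from the notification channel, and the folder, state number/text and field layout are examples to adapt:

# Translate the numeric resolution state into the text defined in SCOM
$strResolutionState = switch ($alert.ResolutionState) {
    0       { "New" }
    15      { "Ticketed" }        # the custom resolution state created earlier
    255     { "Closed" }
    default { $alert.ResolutionState }
}

# CustomField2 holds the NetBIOS name; keep only the host part of servername.domain.local
$strobjectname = $alert.CustomField2
$Objectname    = $strobjectname.Split(".")[0]

# Build a unique file name from two alert properties so an existing file is never overwritten
$FilePath = "D:\Tickets\" + $alert.Id + "_" + $alert.TimeRaised.ToString("yyyyMMddHHmmss") + ".txt"

# Write the fields in the fixed format the problem management system expects
"Server: "      + $Objectname          | Out-File -FilePath $FilePath
"Severity: "    + $alert.Severity      | Out-File -FilePath $FilePath -Append
"State: "       + $strResolutionState  | Out-File -FilePath $FilePath -Append
"Description: " + $alert.Description   | Out-File -FilePath $FilePath -Append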
In the last part of the script you need to send out the mail to your destination.
I’m using static parameters here because the destination will not change that often. However, if you have multiple destinations it’s best to use a variable and pass it in when running the notification command from SCOM.
Variables which need customization:
$Sender: the From email address.
$OKRecipient: the email address the mail is sent to when everything went fine.
$strOKSubject: the subject for the mail when everything went fine.
$ErrRecipient: the email address the error mail is sent to.
$strErrSubject: the subject for the error mail.
$strErrBody: a small body to notify that something went wrong along the way.
Note: due to the encoding issues in my customer’s environment, I’ve used a command line mail utility which I’ve used quite often and which is platform independent: blat. It’s a lightweight mail utility which can be downloaded here: Blat Download
More info on Blat can be found here: Blat Info
The install + config info for blat on the RMS is at the end of the blog post.
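The mailing section itself then boils down to a couple of blat calls, reusing the variables from the composing section above. This is a sketch with example addresses, subjects and install path, so adapt it to your environment:

$Sender       = "scom@yourdomain.local"
$OKRecipient  = "problemmgmt@yourdomain.local"
$strOKSubject = "SCOM ticket request - " + $Objectname

# Send the composed TXT file as the mail body through blat (installed and configured on the RMS)
& "C:\Program Files\Blat\blat.exe" $FilePath -to $OKRecipient -f $Sender -s $strOKSubject

if ($LASTEXITCODE -ne 0) {
    # Something went wrong: warn the admins with a short error mail
    $ErrRecipient  = "scomadmins@yourdomain.local"
    $strErrSubject = "Ticketing script failed"
    $strErrBody    = "Sending the ticket mail for alert " + $alert.Id + " failed with blat exit code " + $LASTEXITCODE
    & "C:\Program Files\Blat\blat.exe" - -to $ErrRecipient -f $Sender -s $strErrSubject -body $strErrBody
}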
Last but not least I’m writing an event in the event log for successful and unsuccessful script runs. This can be used to set up alerting in SCOM to give you a quick warning when the ticketing is not working anymore.
At the end we unload the snap-in to have a clean system and avoid error messages when running the script the next time:
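A sketch of that last part, using an example event source and event ID which you can pick yourself (creating the source needs to be done only once, with admin rights):

$EventSource = "SCOM Ticketing Script"
if (-not [System.Diagnostics.EventLog]::SourceExists($EventSource)) {
    [System.Diagnostics.EventLog]::CreateEventSource($EventSource, "Application")
}
# Write an information event on success (pick your own event IDs to alert on in SCOM)
[System.Diagnostics.EventLog]::WriteEntry($EventSource, "Ticket mail sent for alert " + $alert.Id, [System.Diagnostics.EventLogEntryType]::Information, 9500)

# Unload the snap-in so the next run starts from a clean state
Remove-PSSnapin "Microsoft.EnterpriseManagement.OperationsManager.Client" -ErrorAction SilentlyContinue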
In order to use this script some things need to be configured in your SCOM environment + on your RMS:
Install Blat:
Any tips or hints on improving this script are always welcome…
Sometimes it’s necessary to launch a custom script or other action after an alert is detected. This can be any executable script or program.
In my particular case I’m using this to launch scripts when an alert is detected, to properly escalate the alert and perform additional tasks on it.
So how do you make sure that the script you intend to run will actually run when a predefined alert is raised?
By creating a Command notification channel and subscription…
Let’s start with setting up the command notification channel.
Note: I’m using my script Create_Ticket.Ps1 as documented here. The parameters I’m passing are useful for this script but you can pass many more parameters according to your needs.
First of all, open the notification channels via SCOM console > Administration > Notifications > Channels.
Right-click in the right pane > choose New > Command…
In the settings tab you fill in what you want to run:
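As an example (the paths and script location are mine, so adapt them to your environment), the three fields could look like this; the alert ID is handed to the script through the $Data/Context/DataItem/AlertId$ substitution parameter:

Full path of the command file: C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe
Command line parameters: -Command "& 'D:\Scripts\Create_Ticket.ps1' '$Data/Context/DataItem/AlertId$'"
Startup folder for the command line: D:\Scripts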
Click Finish.
At this point your Command Notification Channel is set up. The next thing you need to configure is the trigger which will run this Command Notification Channel. This is done by creating a Subscriber:
Open the SCOM console and navigate to Administration > Notifications > Subscribers.
Right click in the right pane and choose New…
Fill in a name for the Subscriber
Leave “always send notifications” selected or specify a time window (e.g. during business hours only) and click Next.
Click Add to add a subscriber address to the list. The following window appears:
Fill in the address name and click next
Leave the always send notifications setting or change according to your needs.
Click Finish and you have configured your Command to run whenever you subscribe to an alert with this channel.
Sometimes it’s useful to make your own Custom Alert Resolution States to further classify your alerts in the console and use these states to trigger different actions using various scripts.
I’ll be posting some scripts which use these custom alert resolution states, so I’m documenting here how to configure them.
Open your SCOM console and select the Administration pane > Settings > Alerts.
Click new…
Type in the resolution state display name and choose a unique ID. Click OK.
And we are done.
Not much to it, but it makes life a little easier when you want to classify different alerts.
In the next series of blog posts I’ll frequently be using these custom alert resolution states to classify and report on different types of alerts.
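Once the state exists, a script can move alerts into it. A minimal sketch from the Command Shell (state number 15 and the comment are examples, and the OpsMgr snap-in is assumed to be loaded and connected):

$alert = Get-Alert -Criteria "ResolutionState = 0" | Select-Object -First 1   # grab one new alert
$alert.ResolutionState = 15                                                   # the ID of the custom state
$alert.Update("Resolution state changed by script")                           # write the change back with a comment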
Recently I got a question from a customer about moving the OpsDB data warehouse (DW) to another drive because the disk on which it was originally installed was not big enough. In fact they wanted to move the DW to an iSCSI disk to boost performance.
To verify whether this would be an issue or a straightforward move, I did some browsing in the biggest manual out there… the internet!
However, all that came up were moves from one server to another, not from one drive to another on the same server…
I did some testing in my lab and thought I’d share the outcome with you.
First of all, it’s your DW you are tampering with. Make sure you have proper backups of your database and read the entire blog post before proceeding, just to be on the safe side. It would be a shame to lose all your data older than 8 days (if that is your grooming setting) because of a bad manipulation.
Ok enough said. Let’s get things started.
These are the steps I followed and in my case everything went smoothly without any problems.
First of all (again), take backups of your database, and secondly plan SCOM downtime. To be absolutely sure there’s no interference with or blocking of the DW database, you need to shut down your RMS and any MS and GW servers in your environment (or at least in the management group the DW is part of). Some sources just drop the connections to the database, which is an option as well, but I prefer the first approach. In my opinion it’s safer this way.
Connect to the SQL server where your DW resides and open up Microsoft SQL Server Management Studio:
Open up the connection to your DW. In my case it resides on my VSERVER05.
Again better safe than sorry. Backing up!
The DW can be very big, so the backup may take some time.
When it’s finished, shut down your environment. This means the RMS, MS and GW servers. It sounds like a draconian measure, but it ensures that your environment is completely shut down and no queries are made against the database.
When this is done we can proceed to move the database.
Take the DW offline by right clicking it and choosing “Take Offline”
A small dialog will pop up, and if all goes well it will eventually tell you the database was taken offline successfully. Notice the red arrow on the DW database.
Now take the ReportServer$OpSDBDW and ReportServer$OPSDBDWTempDB databases offline as well. Note that these databases can have a different name in your environment or might not be present at all.
Note: My OpsdbDW is installed in a separate SQL instance. Be cautious with restarting your SQL service, as this impacts all databases under that instance.
When all the databases are offline they can be detached. This is done by right-clicking the database > Tasks > “Detach”.
Choose the option to drop the connections to the database and hit OK.
Now we can copy (yes, copy) the data. Again, better safe than sorry: make a copy of the data rather than moving it.
After the copy is done we are going to attach the copied DW to SQL.
Right click Databases and click Attach:
Select your dbase and attach:
In this case I’m moving my DW from E: to F: drive.
NOTE: The correct log file is not selected automatically. Make sure you select it manually by clicking the icon behind the path in the lower section.
When the attach completes successfully, you will see that the databases have been moved to your new drive.
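If you want to double-check that the files really live on the new drive before starting SCOM again, you can query the file locations. This is a sketch using plain ADO.NET from PowerShell, with example server, instance and database names:

$conn = New-Object System.Data.SqlClient.SqlConnection("Server=VSERVER05\OPSDBDW;Database=master;Integrated Security=SSPI")
$conn.Open()
$cmd = $conn.CreateCommand()
$cmd.CommandText = "SELECT name, physical_name FROM sys.master_files WHERE database_id = DB_ID('OperationsManagerDW')"
$reader = $cmd.ExecuteReader()
while ($reader.Read()) { $reader["name"] + "  " + $reader["physical_name"] }   # data and log file paths
$reader.Close()
$conn.Close()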
Start your SCOM environment again by starting your RMS first and then any MS and/or GW servers you might have.
Just to be on the safe side, verify whether you’re able to generate a report in the reporting view of your console with data older than 7 days (if your grooming settings are different, adjust this so you test with a report containing data older than your grooming setting).
If all goes well, you have now successfully moved your database to another drive and you are free to delete the original files in the old location.