Natural-Adabas

Tuesday, June 20, 2006

Design of Adabas

ADABAS DESIGN
Database systems often involve complex data structures and data handling procedures that can
be designed and used only by persons with extensive knowledge and experience. ADABAS has
a remarkably simple structure by comparison, yet it provides significant advantages for
operational efficiency, ease of design, definition, and database evolution.
ADABAS Entities
In ADABAS, a “field” is the smallest logical unit of information (e.g., current salary) that may
be defined and referenced by the user. A “record” is a collection of related fields that make up
a complete unit of information (e.g., all the payroll data for a single employee). A “file” is a
group of related records that have the same format (with some exceptions; see page 23). A
“database” is a group of related files.
ADABAS Limits
The table below shows the maximum number that mainframe ADABAS supports for each
entity:
Entity                      Maximum
Databases                   65,535
Blocks per database         2,147,483,646 using 4-byte RABNs
Files per database          the lower of 5,000 or the Associator block size minus one
Records per file            4,294,967,294 using 4-byte ISNs
Fields per record           926
Uncompressed record length  depends on the operating system
Compressed record length    Data Storage block size
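The two largest figures in the table sit just below powers of two, which fits the 4-byte identifiers they are tied to. The check below is plain arithmetic on the published limits; the source does not say which values are unavailable, so no reason is given for the gap.

```python
# Limits quoted in the table above.
MAX_RECORDS_PER_FILE = 4_294_967_294   # using 4-byte ISNs
MAX_BLOCKS_PER_DB = 2_147_483_646      # using 4-byte RABNs

# Distance of each limit from the nearest power of two.
isn_gap = 2**32 - MAX_RECORDS_PER_FILE
rabn_gap = 2**31 - MAX_BLOCKS_PER_DB
print(isn_gap, rabn_gap)
```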
ADABAS Concepts and Facilities 2
ADABAS Space Management
The disk storage space allocated to a single ADABAS database is segmented into “logical”
ADABAS files. A certain part of the overall space within the database is allocated to each logical
file. When the space is filled with records from the file, ADABAS automatically allocates more
space to the file from the common free space pool. This dynamic space allocation, together with
the dynamic recovery of released space, allows ADABAS databases to run without intervention
for long periods of time.
The distribution of database space across disk drives can be controlled by “physically”
segmenting it into multiple independent datasets. When all physical database space is filled,
more datasets can be allocated dynamically, or the size of existing datasets can be increased so
that new physical files can be loaded without reorganizing the entire database.
Database Components
To support the separation of data and access structures, the ADABAS nucleus uses three
database components:
- Data Storage for compressed data
- Associator for data management and retrieval
- Work, a scratch area for complex search criteria, etc.
Data Storage
Data Storage is divided into “blocks”, each identified by a 3- or 4-byte relative ADABAS block
number or “RABN” that identifies the block’s physical location relative to the beginning of the
component. Data Storage blocks contain one or more physical records and a padding area to
absorb the expansion of records in the block.
A logical identifier stored in the first four bytes of each physical record is the only control
information stored in the data block. This internal sequence number or “ISN” uniquely identifies
each record and never changes. When a record is added, it is assigned an ISN equal to the highest
existing ISN plus one. When a record is deleted, its ISN is reused only if you instruct ADABAS
to do so. Reusing ISNs reduces system overhead during some searches and is recommended for
files with records that are frequently added and deleted.
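The ISN rules above can be sketched as a toy allocator. This is an illustration in Python, not ADABAS code: the class name is hypothetical, "highest existing ISN" is read as the highest ISN ever assigned, and handing out the lowest released ISN first is an assumption, since the text does not say which released ISN is reused.

```python
class IsnAllocator:
    """Toy model of ISN assignment for one file (illustrative only)."""

    def __init__(self, reuse_isns=False):
        self.reuse_isns = reuse_isns
        self.highest = 0         # highest ISN assigned so far (assumption)
        self.released = set()    # ISNs freed by deleted records

    def add_record(self):
        if self.reuse_isns and self.released:
            # Assumption: reuse the lowest released ISN first.
            isn = min(self.released)
            self.released.remove(isn)
            return isn
        self.highest += 1
        return self.highest

    def delete_record(self, isn):
        self.released.add(isn)


def demo(reuse_isns):
    """Add three records, delete the second, then add one more."""
    alloc = IsnAllocator(reuse_isns)
    first_three = [alloc.add_record() for _ in range(3)]
    alloc.delete_record(first_three[1])
    return alloc.add_record()


print(demo(reuse_isns=False))  # a new ISN above the highest one
print(demo(reuse_isns=True))   # the released ISN is handed out again
```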
For each file, between 0 and 99 percent (default 10 percent) of each block can be allocated as
padding, based on the amount and type of updating expected. This reserved space permits records
to expand without migrating to another block and thus helps to minimize system overhead.
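As a rough worked example of the padding factor, the sketch below computes how many bytes of a block remain for loading records. The 5,000-byte block size is hypothetical, and rounding the padding down to a whole byte is an assumption.

```python
def usable_bytes(block_size, padding_percent=10):
    """Bytes of a block left for loading records; the rest is padding."""
    if not 0 <= padding_percent <= 99:
        raise ValueError("padding factor must be between 0 and 99 percent")
    padding = block_size * padding_percent // 100   # rounded down (assumption)
    return block_size - padding


print(usable_bytes(5000))        # default 10 percent padding
print(usable_bytes(5000, 25))    # heavier updating expected
```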
[Figure: three Data Storage blocks (RABNs 1 to 3), each holding records identified by their
ISNs, followed by a padding area]
Figure 2–1: ADABAS Data Storage Blocks
Free Space and Space Reusage
If records become too large for their blocks, they migrate to new locations. When a record
migrates or is deleted, free space is opened in the data block between the last record and the
padding area. Figure 2–2 shows free space created when the record with ISN 0401 becomes too
large for the block and migrates to another block:
[Figure: a data block before and after migration. Before: records 0300, 0401, and 0533,
followed by the padding area. After: records 0300 and 0533, followed by free space and the
padding area]
Figure 2–2: Free Data Storage Space Created by Record Migration
You can instruct ADABAS to reuse free space. Reusing space saves computer time, since
ADABAS then reads fewer physical blocks during searches. It is recommended for all files.
Compression
Data compression significantly reduces the amount of storage required. It also permits the
transmission of more information per physical transfer, resulting in greater I/O efficiency.
ADABAS retains data records in compressed form. It defines and executes compression at the
field level. Three compression options are supported: default compression, null suppression,
and fixed format. The last two options are added as field options and are discussed on page 29.
“Default compression” deletes trailing blanks in alphanumeric fields and leading zeros in binary
fields. An inclusive length byte (ILB) at the beginning of the field indicates the total number
of stored bytes, including the ILB. Thus, if “Susan” is entered in a “first-name” field defined
with a 20-character length and default compression, its stored size will be six bytes: five bytes
for the letters of the name, plus one byte for the ILB. In addition, empty fields in a record are
not stored; an empty field is replaced by a one-byte empty field counter (EFC). ADABAS can
store up to 63 contiguous empty fields in a single hexadecimal byte.
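A minimal sketch of default compression as described above, written in Python for illustration. The function names are hypothetical, ASCII stands in for whatever character encoding the system actually uses, and the empty field counter is not modeled.

```python
def compress_alpha(value: str) -> bytes:
    """Alphanumeric field: drop trailing blanks, then prefix an
    inclusive length byte (ILB) that counts itself."""
    stripped = value.rstrip(" ").encode("ascii")   # encoding is an assumption
    ilb = len(stripped) + 1
    if ilb > 255:
        raise ValueError("value too long for a one-byte ILB")
    return bytes([ilb]) + stripped


def compress_binary(value: bytes) -> bytes:
    """Binary field: drop leading zero bytes, then prefix the ILB."""
    stripped = value.lstrip(b"\x00")
    return bytes([len(stripped) + 1]) + stripped


first_name = "Susan".ljust(20)             # field defined with length 20
stored = compress_alpha(first_name)
print(len(first_name), "->", len(stored))  # 20 -> 6
```

The stored form of "Susan" is six bytes: the ILB with value 6, then the five letters, which matches the example in the text.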
Many ADABAS files require only 50% to 60% of the space used for the raw data. Even with
the addition of approximately 25% for the access structures stored in the Associator, ADABAS
storage requirements are still less than those required for traditional file storage or for DBMSs
that do not use data compression.
[Figure: a record with the values JONES, 222 MAIN ST, and 00008000. Before compression the
three fields occupy 15, 22, and 8 bytes (45 bytes in total); as ADABAS compressed data they
occupy 6, 12, and 4 bytes (22 bytes in total)]
Figure 2–3: An Example of ADABAS Default Compression
Associator
The Associator is an organizational unit used for storing the structures required to access data
in Data Storage. It contains
- a control block for the database as a whole and control blocks for each file;
- all tables needed to control and maintain the database, including a Field Definition Table or
  “FDT” (see page 25) for each file and coupling lists for physically coupled files (see page 21);
- an inverted list for each descriptor in each file of the database and an Address Converter for
  each file.
Inverted Lists
An inverted list, which is used to resolve ADABAS search commands and read records in logical
sequence, is built and maintained for each field in an ADABAS file that is designated as a key
field or “descriptor” (see page 28). It is called an “inverted” list because it is organized by
descriptor value rather than by ISN. The list comprises the Normal Index (NI) and as many as
14 Upper Indexes (UI).
The Normal Index (NI) of the inverted list for a particular descriptor has an entry for each value.
The entry contains the value itself, the number of records in which the value occurs, and the ISNs
of those records.
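The structure of a Normal Index entry (value, count, ISNs) can be sketched in Python as follows. The sample records and the function name are hypothetical, and keeping the ISNs of each entry in ascending order is an assumption made for readability.

```python
from collections import defaultdict


def build_normal_index(records, field):
    """records maps ISN -> dict of field values. Returns a list of
    (value, count, isns) entries ordered by descriptor value."""
    isns_by_value = defaultdict(list)
    for isn in sorted(records):
        value = records[isn].get(field)
        if value is not None:
            isns_by_value[value].append(isn)
    return [(value, len(isns), isns)
            for value, isns in sorted(isns_by_value.items())]


# Hypothetical customer records keyed by ISN.
customers = {
    2: {"city": "Zurich"},
    3: {"city": "London"},
    6: {"city": "Zurich"},
    96: {"city": "New York"},
}
index = build_normal_index(customers, "city")
for entry in index:
    print(entry)
```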
To increase search efficiency, Upper Index (UI) levels are automatically created by ADABAS
as required, each level to manage the next lower level index. The first level UI, like the NI it
manages, contains entries for only one descriptor in each index block. All other UI levels contain
entries for all descriptors in each index block. UIs require a minimal amount of space: two
blocks is the minimum.
Note:
The ADABAS Direct Access Method (ADAM) facility permits the retrieval of records directly
from Data Storage without accessing the inverted lists. The Data Storage block number in
which a record is located is calculated using a randomizing algorithm based on the ADAM key
of the record. The use of ADAM is completely transparent to application programs and query
and report writer facilities. See page 42 for more information.
Figure 2–4 shows a typical Normal Index for the descriptor “city” in a customer file.
Value     Count  ISNs
London    27     3 . . .
New York  61     96 . . .
Zurich    31     2 6 23 76 . . .
. . .
Figure 2–4: A Normal Index
The example indicates that there are 31 records with the “city” Zurich (the ISNs of these records
are 2,6,23,76...).
Address Converter
The Address Converter determines the physical location of a record. It is an index that maps the
logical identifier of a record (that is, the ISN) to the relative ADABAS block number (RABN)
of the Data Storage block where the record is stored.
The Address Converter contains a list of RABNs in ISN order. Only the RABNs are actually
stored in the Address Converter; the ISNs are identified by their relative position.
Figure 2–5 shows the relationship between an inverted list, the Address Converter, and Data
Storage. For example, to determine the physical location of the record whose ISN is 6, ADABAS
uses the ISN as an index into the Address Converter. The sixth entry in the Address Converter
is 2. Therefore, ISN 6 is located in physical block 2 in Data Storage for this file.
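The lookup just described amounts to indexing a list by ISN. In the sketch below the Address Converter contents are hypothetical, chosen only so that the sixth entry is 2 as in the example; the function name is also an assumption.

```python
# Hypothetical Address Converter for one file: entry i (counting from 1)
# holds the RABN of the record with ISN i. Only RABNs are stored; the ISN
# is implied by the position of the entry.
address_converter = [1, 1, 1, 3, 3, 2, 2]


def rabn_for_isn(converter, isn):
    """Return the Data Storage block number that holds the given ISN."""
    if not 1 <= isn <= len(converter):
        raise KeyError(f"ISN {isn} is not in the Address Converter")
    return converter[isn - 1]


print(rabn_for_isn(address_converter, 6))  # 2
```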
[Figure: the Associator, containing the inverted list for CITY (London, New York, and Zurich
with their counts and ISNs) and the Address Converter, shown beside Data Storage blocks 1 to 3.
The ISNs listed for Zurich are looked up in the Address Converter, which gives the Data Storage
block holding each record]
Figure 2–5: ADABAS Access Technique
When a record moves or is deleted, ADABAS updates the Address Converter automatically and
transparently.
Since the ISN for a record never changes, and its physical block address is stored only in the
Address Converter entry, the record itself may be moved in Data Storage with only one update
to the Address Converter required and with no extension to the access path of the record.
Even if a record has many descriptors defined, the inverted list for each descriptor need not be
modified because it contains ISNs.
This process explains how ADABAS is able to perform simple and complex searches quickly
and efficiently without storing pointer information in Data Storage.
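The following sketch illustrates why a record can move with a single Address Converter update. All names and data are hypothetical, and dictionaries stand in for the real structures purely for brevity.

```python
inverted_list = {"Zurich": [2, 6]}       # descriptor value -> ISNs
address_converter = {2: 1, 6: 2}         # ISN -> RABN
data_storage = {                         # RABN -> {ISN: record}
    1: {2: "record A"},
    2: {6: "record B"},
    3: {},
}


def migrate(isn, new_rabn):
    """Move a record to another block; only the Address Converter changes."""
    old_rabn = address_converter[isn]
    data_storage[new_rabn][isn] = data_storage[old_rabn].pop(isn)
    address_converter[isn] = new_rabn


def find(value):
    """Resolve a search: inverted list -> Address Converter -> Data Storage."""
    return [data_storage[address_converter[isn]][isn]
            for isn in inverted_list.get(value, [])]


before = find("Zurich")
migrate(6, 3)
after = find("Zurich")
print(before == after, address_converter[6])
```

The inverted list is never touched by `migrate`, because it refers to records by ISN rather than by block address.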
Work
The Work area stores information in three parts:
- Part 1 stores data protection information required by the routines for Autorestart and
  Autobackout. See page 47 for more information.
- Part 2 stores intermediate results (ISN lists) of search commands.
- Part 3 stores final results (ISN lists) of search commands.
Other Components
Sort and Temp Areas
Certain ADABAS utilities (ADAINV, ADALOD) require two additional datasets, Sort and
Temp, for sorting and intermediate storage of data. Certain functions of the ADAORD utility
require the Temp dataset for intermediate storage.
The sizes of Temp and Sort vary according to the utility function to be executed. These datasets
can be allocated during the job and then released, or permanent datasets can be allocated and
reused.
Logs
ADABAS uses the following optional logs:
- The “command log” (CLOG) records information from the control block of each ADABAS
  command that is issued. The CLOG provides an audit trail and can be used for debugging and
  for monitoring the use of resources. Single or dual datasets can be used (dual datasets are
  recommended).
- The “protection log” (PLOG) records before- and after-images of records and other elements
  when changes are made to the database. It is used to recover the database (up to the last
  completed transaction or “ET”) after restart. Single or dual datasets can be used (dual datasets
  are recommended).
- The “recovery log” (RLOG) records additional information that the ADABAS Recovery Aid
  uses to construct a recovery job stream. Dual datasets are required for recovery logging. See the
  ADARAI utility discussion on page 53 for more information.
Database Files
Each database contains system files and data files. A data file is generally created for each record
structure required; that is, for each set of related fields identified.
Files are loaded into the database using the ADALOD utility. A file number must be unique in
the database and not greater than the maximum file number defined for the database in the
MAXFILES parameter. For a Checkpoint, Security, system file, or physically coupled file, the
number cannot be greater than 255; other files including a trigger file can have two-byte file
numbers. File numbers are assigned by the user in any sequence.
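The file-number rules above can be expressed as a small check. This is a sketch only: the function, the labels "COUPLED", "TRIGGER", and "DATA", and the lower bound of 1 are assumptions made for illustration and do not describe ADALOD's own validation.

```python
# Kinds of file whose number cannot be greater than 255, per the text.
ONE_BYTE_KINDS = {"CHECKPOINT", "SECURITY", "SYSFILE", "COUPLED"}


def file_number_problems(number, maxfiles, used, kind="DATA"):
    """Return a list of rule violations for a proposed file number."""
    problems = []
    if number < 1:
        problems.append("file number must be positive")  # assumption
    if number > maxfiles:
        problems.append("file number exceeds MAXFILES")
    if number in used:
        problems.append("file number is already in use")
    if kind in ONE_BYTE_KINDS and number > 255:
        problems.append("file number must be 255 or lower for this kind of file")
    return problems


print(file_number_problems(300, maxfiles=5000, used={1, 2}, kind="CHECKPOINT"))
print(file_number_problems(300, maxfiles=5000, used={1, 2}, kind="TRIGGER"))
```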
System Files
The ADALOD utility’s file type parameter identifies the file as one of the following ADABAS
system files:
CHECKPOINT  ADABAS Checkpoint file
SECURITY    ADABAS Security file
SYSFILE     ADABAS system file (a file that cannot be deleted)
TRIGGER     ADABAS trigger file
Coupled Files
File coupling allows you to select, using a single search command, records from one file that
are related (coupled) to records containing specified values in a second file.
Physical Coupling
Any two files with file numbers 255 or lower may be physically coupled if a common
“descriptor” (see page 28) with identical format and length definitions is present in both files.
A single file may be coupled with up to 18 other files, but only one coupling relationship may
exist between any two files at any one time. A file may not be coupled to itself.
When files are coupled, coupling lists are created in the Associator for each file being coupled.
File coupling is bidirectional rather than hierarchical in that two coupling lists are created for
each coupling relationship with each list containing the ISNs that are coupled to the other file.
Once the physical coupling lists have been created, any key field in either file may be used
in a search criterion.
Physical coupling may add a considerable amount of overhead if the files involved are
frequently updated. The coupling lists must be updated if a record in either of the files is added
or deleted, or if the descriptor used as the basis for the coupling is updated in either file.
Physical coupling may be useful for information retrieval systems in which
- files seldom change;
- the additional overhead of the coupling lists is insignificant compared with the increased ease
  of formulating queries; or
- files are small and primarily query-oriented.
Logical or “Soft” Coupling
Multiple files may also be queried by specifying the field to be used for interfile linkage in the
search criteria. ADABAS then performs all necessary search, read, and internal list matching
operations.
This technique is called logical or “soft” coupling because it does not require the files to be
physically coupled. Although logical coupling requires read commands, it is normally more
efficient because it avoids the increased overhead of coupling lists.
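Conceptually, logical coupling is a search on one file followed by a match on the linking field of the other. The sketch below illustrates the idea with in-memory dictionaries; the file contents, field names, and function are hypothetical and say nothing about how ADABAS performs the matching internally.

```python
# Hypothetical files keyed by ISN; "cust_no" is the interfile linkage field.
customers = {
    1: {"cust_no": 100, "city": "Zurich"},
    2: {"cust_no": 200, "city": "London"},
}
orders = {
    1: {"cust_no": 100, "item": "bolt"},
    2: {"cust_no": 200, "item": "nut"},
    3: {"cust_no": 100, "item": "gear"},
}


def soft_couple(search_file, search_field, search_value, target_file, link_field):
    """Return ISNs in target_file linked to records in search_file
    that satisfy search_field == search_value."""
    # Step 1: search the first file and read the linking values.
    link_values = {rec[link_field] for rec in search_file.values()
                   if rec[search_field] == search_value}
    # Step 2: match those values against the linking field of the second file.
    return sorted(isn for isn, rec in target_file.items()
                  if rec[link_field] in link_values)


zurich_orders = soft_couple(customers, "city", "Zurich", orders, "cust_no")
print(zurich_orders)
```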
Structuring Files to Enhance Performance
An ADABAS database with one file for each record type supports any application functions
required of it and is the easiest to manipulate for interactive queries, but it may not yield the best
performance:
- As the number of ADABAS files increases, the number of ADABAS calls increases. Each
  ADABAS call requires interpretation, validation and, in multiuser mode, supervisor call (SVC)
  and queuing overhead.
- In addition to the input/output (I/O) operations necessary for accessing at least one index,
  Address Converter, and Data Storage block from each file, the “one file per record type”
  structure requires buffer pool space. If sufficient buffer space is not available, blocks are
  overwritten that may be needed for a later request.
The number of ADABAS files used by critical programs can be reduced by
- using multiple-value fields and periodic groups (see page 26);
- linking physical files into a single logical (expanded) file;
- including more than one type of record in an ADABAS file;
- including records for more than one category of user in an ADABAS (multiclient) file; and
- controlling data duplication and the resulting high resource usage.
Expanded Files
If you have a large number of records of a single type, you may need to spread the records over
multiple physical files.
To reduce the number of files accessed, ADABAS allows you to link multiple physical files
containing records of the same format together as a single logical file. This file structure is called
an “expanded file” and the physical files comprising it are the “component files”. An expanded
file can comprise up to 128 component files, each with a unique range of logical ISNs. An
expanded file cannot exceed 4,294,967,294 records.
Note:
Since ADABAS version 6 supports larger file sizes and a greater number of ADABAS physical
files and databases, the need for expanded files has, in most cases, been removed.
Although an application program addresses the logical file (the address of the file is the number
of the expanded file’s base component or “anchor” file), ADABAS selects the correct
component file based on the data in a field defined as the “criterion” field. The data in this field
has characteristics unique to records in only one component file. When an application updates
the expanded file, ADABAS looks at the data in the criterion field in the record to be written
to determine which component file to update. When reading expanded file data, ADABAS uses
the logical ISN as the key to finding the correct component file.
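The two routing rules (criterion field on update, logical ISN on read) can be sketched as follows. The component file numbers, ISN ranges, and the use of a "year" field as the criterion are all hypothetical.

```python
# Hypothetical expanded file with two component files. File 10 is the
# anchor file; each component owns a unique range of logical ISNs and a
# range of criterion-field values.
COMPONENTS = [
    {"file": 10, "isns": range(1, 1001), "years": range(1990, 2000)},
    {"file": 11, "isns": range(1001, 2001), "years": range(2000, 2010)},
]


def component_for_write(record):
    """On update, the criterion field decides which component file is used."""
    for comp in COMPONENTS:
        if record["year"] in comp["years"]:
            return comp["file"]
    raise LookupError("no component file accepts this criterion value")


def component_for_read(isn):
    """On read, the logical ISN decides which component file is used."""
    for comp in COMPONENTS:
        if isn in comp["isns"]:
            return comp["file"]
    raise LookupError("ISN is outside every component's range")


print(component_for_write({"year": 2003}))  # 11
print(component_for_read(42))               # 10
```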
Multiple Record Types in One File
Multiple record types can be defined within a single physical record; each record type is a logical
record composed of a subset of the fields defined for the file. Fields that do not belong to a given
type are null-suppressed.
Record types can be identified to ADABAS by
- defining a record type field with values to differentiate one type from another; or
- using values of an existing field to differentiate type; for example, to differentiate two types,
  a value of zero for a field common to both types might identify one type and any non-zero value
  for the same field might identify the other type.
Multiclient Files
Records for multiple users or groups of users can be stored in a single ADABAS physical file
defined as “multiclient”. The multiclient feature divides the physical file into multiple logical
files by attaching an internal owner ID to each record.
The owner ID is assigned to a user ID. A user ID can have only one owner ID, but an owner ID
can belong to more than one user. Each user can access only the subset of records that is
associated with the user’s owner ID.
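The owner ID rule can be pictured as a filter applied to every request. The sketch below uses hypothetical user IDs, owner IDs, and records to show that users sharing an owner ID see the same subset of the physical file.

```python
# Hypothetical mapping: each user ID has exactly one owner ID, but an
# owner ID can belong to several users.
OWNER_OF_USER = {"alice": "A", "bob": "A", "carol": "B"}

# Hypothetical multiclient file: ISN -> (owner ID, record data).
records = {
    1: ("A", "payroll 2005"),
    2: ("B", "inventory"),
    3: ("A", "payroll 2006"),
}


def visible_isns(user_id):
    """ISNs of the records associated with the user's owner ID."""
    owner = OWNER_OF_USER[user_id]
    return sorted(isn for isn, (rec_owner, _) in records.items()
                  if rec_owner == owner)


print(visible_isns("alice"))  # [1, 3]
print(visible_isns("carol"))  # [2]
```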
Note:
For any installed external security package such as RACF or CA-TOP SECRET, a user is still
identified by either NATURAL ETID or LOGON ID.
All database requests to multiclient files are handled by the ADABAS nucleus.
Controlled Data Redundancy
“Physical” redundancy increases storage requirements but may also enhance performance and
decrease complexity. For example, if a database stores customer and order information in a
Customer-Orders file and product descriptions in an Inventory file, and a program that generates
invoices requires product descriptions in addition to customer-order data, it might enhance
performance to store a duplicate copy of the product descriptions in the Customer-Orders file.
“Logical” redundancy also increases storage demands while decreasing complexity. It involves
storing in one file the results of a process on data in another file; thus, the duplicate data is
implied by the content of another file, rather than being physically stored in two places.
Physical and logical redundancy cause update programs to run more slowly. The duplicate
updates required when changes in one file affect records in another file may degrade
performance severely. Redundancy should be used only for static data or data that is updated
rarely. You can control data redundancy by using multiple-value fields, periodic groups, and
multiple record types within a file.
Records and Field Definitions
In ADABAS, the record structure and the content of each field in a physical file are described
in a Field Definition Table or “FDT”, which is stored in the Associator. There is one FDT for
each database file. The FDT is used by ADABAS during the execution of ADABAS commands
to determine the logical structure and characteristics of any given field (or group) in the file.
The FDT lists the fields of the file in physical record order; provides a “quick index” to the file’s
records; and defines the file’s fields, sub/superfields, and sub-/super-/hyper- and phonetic
descriptors. A minimum of one and a maximum of 926 field definitions may be specified.
Information about each field includes the level, name, length, format, options, and special field
and descriptor attributes.
FIELD DESCRIPTION TABLE

LEVEL I NAME I LENGTH I FORMAT I OPTIONS I PARENT OF
------I------I--------I--------I---------I----------------
  1   I  AA  I    8   I   A    I DE,UQ   I
  1   I  AB  I        I        I         I
  2   I  AC  I   20   I   A    I NU      I
  2   I  AE  I   20   I   A    I DE      I SUPERDE,PHONDE
  2   I  AD  I   20   I   A    I NU      I
  1   I  AF  I    1   I   A    I FI      I
  1   I  AG  I    1   I   A    I FI      I
  1   I  AH  I    6   I   U    I DE      I
  1   I  A2  I        I        I         I
  1   I  AO  I    6   I   A    I DE      I SUBDE,SUPERDE
  1   I  AQ  I        I        I PE      I
  2   I  AR  I    3   I   A    I NU      I SUPERDE
  2   I  AS  I    5   I   P    I NU      I SUPERDE
  1   I  A3  I        I        I         I
  2   I  AU  I    2   I   U    I         I SUPERDE
  2   I  AV  I    2   I   U    I NU      I SUPERDE
Figure 2–6: Field Definition Table
Record Structure
The order of the fields listed in the FDT determines the structure of the record and the efficiency
of retrieval. The following factors should be considered when ordering fields:
- Fields that will be accessed frequently should be ordered first in the FDT. This technique reduces
  CPU time because ADABAS does not have to read the whole record when retrieving a field.
- Fields that will frequently be accessed together should be assigned to a “group” field.
- Fields that will always be accessed together should be defined as a single field. This technique
  may inhibit compression and query language use; however, it decreases processing time by
  providing more efficient internal processing and shorter Format Buffers.
- If appropriate, fields that will frequently be empty should be ordered together in the FDT and
  set to use default compression or null suppression.
- Numeric fields should be loaded in the format in which they will be used most often.
Field Levels
When two or more consecutive fields in the FDT are frequently accessed together, you can
reference them together by defining a group field. Other than its level and ADABAS short name,
a group field has no attributes defined. It immediately precedes its member fields in the FDT.
A higher field “level” number is used to assign the member fields to the group field. ADABAS
supports up to seven field levels. User programs can access each member field individually, or
all member fields together by referencing the group field.
For example, in Figure 2–6 on page 25, field AB is defined as a group field and assigned to level
1. Fields AC, AE, and AD are assigned to level 2, indicating that they belong to group field AB.
The next field, AF, is assigned to level 1, indicating that it is not part of the AB group. User
programs can access AC, AE, and AD individually, or together by referencing the group field
AB.
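The way level numbers assign member fields to a group can be sketched with the first few FDT entries from the example. The function name is hypothetical; the rule it applies is the one stated above: members follow the group field immediately and have a higher level number.

```python
# (level, short name) pairs in FDT order, taken from the example above.
FDT = [(1, "AA"), (1, "AB"), (2, "AC"), (2, "AE"), (2, "AD"), (1, "AF")]


def group_members(fdt, group):
    """Fields that follow `group` in the FDT with a higher level number."""
    names = [name for _, name in fdt]
    start = names.index(group)
    group_level = fdt[start][0]
    members = []
    for level, name in fdt[start + 1:]:
        if level <= group_level:
            break            # back at the group's own level: group ends
        members.append(name)
    return members


print(group_members(FDT, "AB"))  # ['AC', 'AE', 'AD']
print(group_members(FDT, "AF"))  # []
```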
A group field can be defined as a “periodic” group field if it is composed of fields that can
have more than one value (for example, group field AQ in Figure 2–6); see page 30.
Field Names
A field is identified to ADABAS by a two-character “short” name, which must be unique within
a file. The name must begin with an alphabetic character, followed by a letter or numeral;
special characters are not allowed, and the combinations E0–E9 are reserved. ADABAS assigns
short names to fields automatically, although you can choose to assign them yourself. ADABAS
uses the short names internally and accesses fields by them.
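The naming rules can be captured in a short check. This is a sketch with a hypothetical function name; accepting lower-case letters and comparing names case-sensitively are assumptions, since the text does not address case.

```python
import re

# First character alphabetic, second a letter or numeral, nothing else.
_SHORT_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]")


def is_valid_short_name(name, existing=()):
    """Check a proposed short name against the rules described above."""
    if not _SHORT_NAME.fullmatch(name):
        return False
    if name[0].upper() == "E" and name[1].isdigit():
        return False             # E0 to E9 are reserved
    return name not in existing  # must be unique within the file


for candidate in ("AA", "A2", "E5", "2A", "A-", "ABC"):
    print(candidate, is_valid_short_name(candidate))
```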
Field Length and Data Format
Field values are fixed or variable in length and can be in alphanumeric, binary, fixed-point,
floating-point, or packed/unpacked decimal formats.
The length (expressed in bytes) and format (expressed as a one-character code) of a field define
the standards (defaults) to be used by ADABAS during command processing. They are used
when the field is read/updated unless the user specifies an override.
If standard length is zero for a field, the field is assumed to be a variable-length field. Standard
format must be specified for a field. The format specified determines the type of default
compression to be performed on the field.
The maximum field lengths that may be specified depend on the “format” value:
Format  Format Description                                 Maximum Length
A       Alphanumeric (left-justified); see also the long   253 bytes
        alphanumeric (LA) option on page 29
B       Binary (right-justified, unsigned/positive)        126 bytes
F       Fixed Point (right-justified, signed; positive     4 bytes (always exactly 2 or 4 bytes)
        value in normal form; negative value in two’s
        complement form)
G       Floating Point (normalized form, signed)           8 bytes (always exactly 4 or 8 bytes)
P       Packed Decimal (right-justified, signed)           15 bytes
U       Unpacked Decimal (right-justified, signed)         29 bytes
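The length limits in the table can be checked mechanically. In the sketch below the function name is hypothetical, a standard length of zero is taken to mean variable length as stated earlier, and rejecting zero for the F and G formats is an assumption based on the "always exactly" wording.

```python
MAX_LENGTH = {"A": 253, "B": 126, "F": 4, "G": 8, "P": 15, "U": 29}
EXACT_LENGTHS = {"F": {2, 4}, "G": {4, 8}}   # "always exactly" formats


def is_valid_length(fmt, length):
    """Check a standard field length (in bytes) against its format."""
    if fmt not in MAX_LENGTH:
        raise ValueError(f"unknown format code {fmt!r}")
    if fmt in EXACT_LENGTHS:
        return length in EXACT_LENGTHS[fmt]
    return 0 <= length <= MAX_LENGTH[fmt]    # 0 means variable length


print(is_valid_length("A", 20))   # True
print(is_valid_length("A", 254))  # False
print(is_valid_length("F", 3))    # False
```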
Field Options
Options are specified using two-character codes, which may be specified in any order, separated
by a comma.
Code  Option
DE    Field is to be a descriptor (key).
UQ    Field is to be a unique descriptor; that is, for each record in the file, the descriptor
      must have a different value.
FI    Field is to have a fixed storage length; values are stored without an internal length
      byte, are not compressed, and cannot be longer than the defined field length.
LA    An alphanumeric, variable-length field may contain a value up to 16,381 bytes long.
NU    Null values occurring in the field are to be suppressed.
MU    Field may contain up to 191 values in a single record.
PE    Group field with member fields that have multiple (up to 191) occurrences in a
      given record.
NC    Field may contain a null value that satisfies the SQL interpretation of a field
      having no value; that is, the field’s value is not defined (not counted).
NN    Field defined with NC option must always have a value defined; it cannot contain
      an SQL null (not null).
Descriptor Options DE and UQ
A “descriptor” is a search key. The DE option indicates that the field is to be a descriptor. The
UQ option can only be specified if DE is also specified; it indicates that the DE field is to have
a different (i.e., unique) value for each record in the file. Entries are made in the Associator’s
inverted list for DE fields, adding disk space and processing overhead requirements.
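Two dependencies between option codes are stated above: UQ needs DE, and NN applies only to fields defined with NC. The sketch below checks a comma-separated option string for those two rules; the function name is hypothetical, and other combinations that a real system may reject are not covered.

```python
KNOWN_OPTIONS = {"DE", "UQ", "FI", "LA", "NU", "MU", "PE", "NC", "NN"}


def option_problems(options):
    """Return rule violations for a comma-separated option string."""
    codes = [code.strip() for code in options.split(",") if code.strip()]
    problems = [f"unknown option {code}" for code in codes
                if code not in KNOWN_OPTIONS]
    if "UQ" in codes and "DE" not in codes:
        problems.append("UQ requires DE")
    if "NN" in codes and "NC" not in codes:
        problems.append("NN requires NC")
    return problems


print(option_problems("DE,UQ"))  # []
print(option_problems("UQ"))     # ['UQ requires DE']
print(option_problems("NN"))     # ['NN requires NC']
```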
Any field can be used within a selection criterion. When a field that is used extensively as a
search criterion is defined as a descriptor (key), the selection process is considerably faster since
ADABAS is able to access the descriptor’s values directly from the inverted list without reading
any records from Data Storage.
A descriptor field can be used as a sort key in a search command, as a way of controlling a logical
sequential read process (ascending or descending values), or as the basis for file coupling.
Any field and any number of fields in a file can be defined as descriptors. When a multiple-value
field or a field in a periodic group is defined as a descriptor, multiple key values are generated
for the record. Key searches may be limited to particular occurrences of a periodic group.
Because the inverted list requires disk space and update overhead, the descriptor option should
be used judiciously, particularly if the file is large and the field that is being considered as a
descriptor is updated frequently. For instance, the inverted list for a periodic group used as a
descriptor may be very large because each occurrence is stored.
A descriptor may be defined at the time a file is created, or later by using an ADABAS utility.
Because the definition of a descriptor is independent of and has no effect on the record structure,
descriptors may be created or deleted at any time without the need for database restructuring
or reorganization.
A portion of a field may be defined as a “subdescriptor”; combinations of fields or portions
thereof may be defined as a “superdescriptor”; a user-supplied algorithm may be the basis of
a “hyperdescriptor”; and a “sounds-like” encoding algorithm may be the basis of a “phonetic
descriptor”, which may be customized for specific language requirements. See page 31 for more
information.
Data Compression Options FI and NU
Default data compression is described on page 16. At the field level, additional compression can
be specified (null suppression option) or all compression can be disabled (fixed storage option).
“Null suppression” (NU) differs from default compression in that searches on descriptor fields
defined with null suppression do not return records in which the descriptor field is empty.
Fields defined as “fixed format” (FI) do not include a length byte and are not compressed. This
option actually saves storage space for one-byte fields or fields that are nearly always full (e.g.,
a field containing the social security number).
Long Alpha Option LA
The long alphanumeric (LA) option can only be specified for variable-length alphanumeric
fields; i.e., “A”-format fields having a length of zero. With the LA option, such an alphanumeric
field can contain a value up to 16,381 bytes long.
An alpha field with the LA option is compressed in the same way as an alpha field without the
option. The maximum length that a field with LA option can actually have is restricted by the
block size where the compressed record is stored.
MU and PE Options and Field Types
ADABAS supports two basic field types: elementary fields and multiple-value fields. An
“elementary” field has only one value per record. A “multiple-value” (MU) field can have up
to 191 values, or occurrences, in a single record. Each multiple-value field has a “binary
occurrence counter” (BOC) that stores the number of occurrences.
A “periodic” (PE) group field defines consecutive fields in the FDT that repeat together in a
record. Like the members of a non-periodic group field, PE members immediately follow the
PE group field, have a higher level number than the PE field, and can be accessed both
individually and as a group. Each PE has a BOC that stores the number of occurrences.
A periodic group may be repeated up to 191 times per record and may contain one or more
multiple-value fields. Occurrences or values that are not used require no storage space.
ADABAS thus supports four field types:
                 Single Value per Record   Multiple Values per Record
Single Field     Elementary                MU
Multiple Fields  Group                     PE
Figure 2–7 illustrates the four field types in a single record structure.
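[Figure: a customer record containing an elementary field (CUSTOMER NUMBER 19811), a group
field NAME (FIRST NAME Laura, LAST NAME Cagnetti), and a periodic group ADDRESS (STREET, CITY,
STATE, ZIP) with a PE BOC of 3. STREET is a multiple-value field with its own MU BOC in each
occurrence of the periodic group:
  1. MU BOC 1: 118 Glade; Erie, PA 16509
  2. MU BOC 2: 271 Larue, P.O. Box 88; Cincinnati, OH 45211
  3. MU BOC 2: 733 Hall, P.O. Box 7; Easton, PA 19014]
Figure 2–7: The ADABAS Field Types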
A PE cannot be nested within another PE. Nesting an MU within a PE, as shown in Figure 2–7,
is permitted but complicates programming by introducing a two-dimensional array. It also has
implications for data access: when ADABAS accesses the periodic group, it returns only the first
occurrence of the MU for each occurrence of the PE returned.
The unique characteristic of the periodic group, and the reason for choosing this structure, is
its ability to maintain the order of occurrences. If a periodic group originally contains three
occurrences and the first or second occurrence is later deleted, the deleted occurrence is set
to nulls; the third occurrence remains in the third position. Multiple-value fields handle
leading null entries differently: the individual values in a multiple-value field do not retain
positional integrity if one of the values is removed.
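The difference in positional behavior can be sketched with two small Python helpers. They are hypothetical illustrations of the rule just described, not ADABAS calls: a deleted PE occurrence is represented by None, and a deleted MU value is simply removed so that later values move up.

```python
def delete_pe_occurrence(occurrences, index):
    """Null out one periodic-group occurrence; later occurrences keep their position."""
    result = list(occurrences)
    result[index] = None  # None stands in for an occurrence set to nulls
    return result


def delete_mu_value(values, index):
    """Remove one multiple-value entry; later values shift down one position."""
    return values[:index] + values[index + 1:]


pe_after = delete_pe_occurrence(["first", "second", "third"], 0)
mu_after = delete_mu_value(["first", "second", "third"], 0)

print(pe_after)  # the third occurrence is still at position 3
print(mu_after)  # the third value has moved to position 2
```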
SQL Compatibility Options NC and NN
Because Software AG’s mainframe ADABAS SQL Server (ESQ) and other query products based on
Structured Query Language (SQL) require SQL-compatible null representation, special data
definition options are included in ADABAS.
A field designated with the NC (not counted) option may contain a null value that satisfies the
SQL interpretation of a field having no value. An NC field containing a null means that no field
value has been entered; that is, the field’s value is not defined.
This undefined state differs from a null value assigned to a non-NC field for which no value has
been specified: a non-NC field’s null means the value in the field is either zero or blank,
depending on the field’s format.
The NN (not null) option can be specified only for NC-defined fields. It indicates that an NC
field must always have a value defined; it cannot contain an SQL null. This ensures that the field
cannot be left undefined when a record is either created or updated. The field value may be zero
or blank, however.
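The interaction of the NC and NN options can be summarized in a short Python sketch. The function name, its parameters, and the use of None for an SQL null are assumptions made for illustration only; an empty string stands in for a blank alphanumeric value.

```python
def stored_value(value, numeric, nc=False, nn=False):
    """Return what a field would hold when `value` is supplied (None = no value given).

    nc: field is defined with the NC option (may hold an SQL null)
    nn: field is defined with the NN option (must always have a value)
    """
    if nn and not nc:
        raise ValueError("NN can be specified only for NC-defined fields")
    if value is not None:
        return value                 # zero or blank are legitimate values
    if nc:
        if nn:
            raise ValueError("an NC,NN field cannot be left undefined")
        return None                  # SQL null: the value is not defined
    return 0 if numeric else ""      # non-NC null: zero or blank, by format


print(stored_value(None, numeric=True))            # non-NC numeric field
print(stored_value(None, numeric=True, nc=True))   # NC field left undefined
print(stored_value(0, numeric=True, nc=True, nn=True))
```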
Special Field and Descriptor Attributes
Parent Of
The FDT indicates whether a field is a “parent” field for a sub/superfield, sub/superdescriptor,
hyperdescriptor, or phonetic descriptor as described in the following section.
Special Descriptors
Information about any special fields and descriptors (subdescriptors, subfields,
superdescriptors, superfields, phonetic descriptors, and hyperdescriptors) in the file is
maintained in the Special Descriptor Table or “SDT” part of the FDT.
SPECIAL DESCRIPTOR TABLE

 TYPE  I NAME I LENGTH I FORMAT I OPTIONS  I STRUCTURE
–––––––I––––––I––––––––I––––––––I––––––––––I––––––––––––––––
 SUPER I H1   I    4   I   B    I DE,NU    I AU ( 1 –  2)
       I      I        I        I          I AV ( 1 –  2)
 SUB   I S1   I    4   I   A    I DE       I AO ( 1 –  4)
 SUPER I S2   I   26   I   A    I DE       I AO ( 1 –  6)
       I      I        I        I          I AE ( 1 – 20)
 SUPER I S3   I   12   I   A    I DE,NU,PE I AR ( 1 –  3)
       I      I        I        I          I AS ( 1 –  9)
 PHON  I PH   I        I        I          I PH = PHON(AE)
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Figure 2–8: Special Descriptor Table
Along with the name, length, format, and specified options of each special field and descriptor,
this table provides the following information:
Column      Explanation
TYPE        SUB     Subfield/subdescriptor
            SUPER   Superfield/superdescriptor
            PHON    Phonetic descriptor
            HYPER   Hyperdescriptor
STRUCTURE   The component fields and field bytes of the sub-, super-, or hyperdescriptor.
            Phonetic fields show the equivalent alphanumeric elementary fields.
Subfield / Superfield
A portion of a field (“subfield”) or any combination of fields (“superfield”) may be defined as
an elementary field (see page 30). Subfields and superfields may be used for read operations
only. They may only be changed by updating the original fields.
Subdescriptor
A “subdescriptor” is part of a single field used as a descriptor. The field from which the
subdescriptor is derived may or may not be an elementary descriptor (see page 28). If the search
criteria involve a range of values contained in the first “n” bytes of an alphanumeric field or
the last “n” bytes of a numeric field, a subdescriptor may be defined using only the relevant bytes
of the field. A subdescriptor allows you to increase the efficiency of a search by specifying a
single value rather than a range of values.
For example, if the first two bytes of a five-byte field refer to a geographical region and you want
to retrieve all records for region 11 without using a subdescriptor, you would have to search for
all records in the range 11000–11999. If you define a subdescriptor comprising the first two
bytes of the field, you could search for all records with 11 in the subdescriptor.
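The region example can be sketched in Python as follows. The sample records, the ISN numbering, and the helper name are invented for illustration; the dictionary only imitates an index keyed by the subdescriptor value and does not reflect how ADABAS stores its inverted lists.

```python
from collections import defaultdict

# ISN -> value of the five-byte field (sample data, invented for this sketch)
records = {1: "11000", 2: "11542", 3: "11999", 4: "12000", 5: "10999"}


def subdescriptor_value(value, first_byte=1, last_byte=2):
    """Derive the subdescriptor from bytes first_byte..last_byte (1-based, inclusive)."""
    return value[first_byte - 1:last_byte]


# Without a subdescriptor: search the whole range 11000-11999.
by_range = sorted(isn for isn, value in records.items()
                  if "11000" <= value <= "11999")

# With a subdescriptor: index the first two bytes, then look up one value.
sub_index = defaultdict(list)
for isn, value in sorted(records.items()):
    sub_index[subdescriptor_value(value)].append(isn)
by_subdescriptor = sub_index["11"]

print(by_range, by_subdescriptor)
```

Both searches select the same records; the subdescriptor version replaces the range comparison with a lookup of the single value 11.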
Superdescriptor
A “superdescriptor” combines all or parts of 2–20 fields. The fields from which the
superdescriptor is derived may or may not be elementary descriptors. When search criteria
involve values for a combination of fields, using a superdescriptor is more efficient than using
a combination of several elementary descriptors.
For example, to search for customers by last name within regions, you could create a
superdescriptor by combining the first two bytes (i.e., the geographical region indicator) of the
five-byte customer number field and the entire customer last name field.
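A minimal Python sketch of this superdescriptor follows. The 20-byte last-name length, the sample customers, and the function name are assumptions for illustration; the sorted list merely imitates an index ordered by the combined value.

```python
LAST_NAME_LENGTH = 20  # assumed field length, not taken from the text


def super_key(customer_number, last_name):
    """Combine bytes 1-2 of the customer number with the whole last-name field."""
    padded_name = last_name.ljust(LAST_NAME_LENGTH)[:LAST_NAME_LENGTH]
    return customer_number[:2] + padded_name


# ISN -> (customer number, last name); sample data invented for this sketch
customers = {
    1: ("11042", "CAGNETTI"),
    2: ("11977", "SMITH"),
    3: ("12001", "CAGNETTI"),
}

# One index entry per record, ordered by region and then by last name.
index = sorted((super_key(number, name), isn)
               for isn, (number, name) in customers.items())

# A single combined value finds CAGNETTI in region 11.
wanted = super_key("11000", "CAGNETTI")
hits = [isn for key, isn in index if key == wanted]

# All customers of region 11, already in last-name order.
region_11 = [isn for key, isn in index if key.startswith("11")]

print(hits, region_11)
```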
Phonetic Descriptor
A “phonetic” descriptor may be defined and used to search for all records that contain similar
phonetic values. The phonetic value of a descriptor is determined by an internal algorithm based
on the first 20 bytes of the field value with only alphabetic values being considered (numeric
values, special characters and blanks are ignored).
Hyperdescriptor
The hyperdescriptor option can be used to generate descriptor values based on a user-supplied
algorithm. Up to 31 different hyperdescriptors can be defined for a single physical ADABAS
database. Each hyperdescriptor must be named by an appropriate HEXnn ADARUN statement
parameter in the job where it is used.
With hyperdescriptors, “fuzzy” matching is possible; i.e., retrieving data based on similar
rather than on exact search criteria. Hyperdescriptors allow multiple virtual indexes, meaning
that several different search index entries can be made for a single data field.
Hyperdescriptors can be used to implement “n”-component superdescriptors, derived keys, or
other key constructs. Using hyperdescriptors, it is possible to develop applications that are
simpler and more flexible than applications based on a strictly normalized relational structure.
One application area for hyperdescriptors is name processing. For example, the name
SCHROEDER could be stored not only with the index SCHROEDER itself, but also with the
“virtual” indexes SCHRODER, SCHRADER, SCHRÖDER or any other variation of the name.
Thus, although only the name SCHROEDER is physically stored in the data area of the database,
multiple search indexes exist to the data. If, subsequently, a search is made for the name
SCHRODER, the record SCHROEDER will be found.
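The idea of several index entries pointing to one stored value can be sketched in Python. The variant rule below (replacing OE with O, A, or Ö) is a toy stand-in for a user-supplied algorithm, chosen only because it reproduces the variations named above; it is not how a real hyperdescriptor exit is written, and the dictionary is not the ADABAS index format.

```python
from collections import defaultdict


def name_variants(name):
    """Toy user-supplied algorithm: generate the index values for one name."""
    variants = {name}
    if "OE" in name:
        for replacement in ("O", "A", "Ö"):
            variants.add(name.replace("OE", replacement))
    return variants


# ISN -> name as physically stored (sample data, invented for this sketch)
stored = {1: "SCHROEDER", 2: "MILLER"}

# Several "virtual" index entries may point to the same stored record.
index = defaultdict(set)
for isn, name in stored.items():
    for variant in name_variants(name):
        index[variant].add(isn)

# Searching for a variant finds the record stored under the original spelling.
found = [stored[isn] for isn in sorted(index.get("SCHRODER", ()))]

print(sorted(name_variants("SCHROEDER")), found)
```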
A more sophisticated application area for hyperdescriptors is fingerprint matching, in which
typical characteristics of fingerprints can form the basis of a fuzzy matching algorithm; i.e., the
original fingerprint is stored in the database, but any number of search indexes can be made to
the fingerprint, based on an algorithm that allows small-scale deviations from the original.
