CS350 Intro Computer Systems Homework

Homework 7 on Structures, Stack Overflow (Buffer Overflow), and Floating-Point Arithmetic

  1. For each of the following structure declarations, determine the offset of each field, the total size of the structure, and its alignment requirement for x86-64. See Practice Problem 3.44 on page 275 for reference. Writing C code is the best way to answer the question.
    struct P1 { char a; int b; char c; int d; };
    struct P2 { long a; char b; int c; char d; };
    struct P3 { char a[2]; short b[5]; };
    struct P4 { char *a[4]; short b[3]; };
    struct P5 { struct P2 a; struct P3 b[3]; };
      
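    Struct | Offset a | Offset b | Offset c | Offset d | Total size | Alignment requirement
    -------+----------+----------+----------+----------+------------+----------------------
    P1     |          |          |          |          |            |
    P2     |          |          |          |          |            |
    P3     |          |          |          |          |            |
    P4     |          |          |          |          |            |
    P5     |          |          |          |          |            |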

  2. The following code is similar to the code shown in Practice Problem 3.48 on page 288.
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <limits.h>
    
    int len(char *s) {
      return strlen(s);
    }
    
    void lptoa(char *s, long *p) {
      long val = *p;
      sprintf(s,"%ld",val);
    }
    
    long longlen(long x) {
      long v;
      char buf[8];
      v = x;
      lptoa(buf,&v);
      return len(buf);
    }
    
    int main() {
      longlen(INT_MAX-1);
    }
      
    Compile the code with and without the stack protector.

    Fill the table with appropriate values. Leave the entries empty if not applicable.

    Function | Item                                                   | No stack protector | Stack protector
    ---------+--------------------------------------------------------+--------------------+----------------
    (all)    | gcc flag                                               |                    |
    len      | assembly for allocating stack                          |                    |
             | stack size in decimal                                  |                    |
             | assembly for freeing stack                             |                    |
    lptoa    | assembly for allocating stack                          |                    |
             | stack size in decimal                                  |                    |
             | assembly for freeing stack                             |                    |
             | "char *s" address relative to rsp after entering lptoa |                    |
             | "long *p" address relative to rsp after entering lptoa |                    |
             | "val" address relative to rsp after entering lptoa     |                    |
    longlen  | assembly for allocating stack                          |                    |
             | stack size in decimal                                  |                    |
             | assembly for freeing stack                             |                    |
             | "x" address relative to rsp after entering longlen     |                    |
             | "v" address relative to rsp after entering longlen     |                    |
             | "buf" address relative to rsp after entering longlen   |                    |
    canary   | register name                                          |                    |
             | address relative to rsp                                |                    |
             | value                                                  |                    |
             | assembly for erasing canary value                      |                    |
             | assembly for canary cross check                        |                    |

  3. Do Problem 3.72 on page 323.
  4. Pages 36-37 of the lecture notes "08-machine-data.pdf" show how xmm registers are used for simple floating-point arithmetic such as fadd, dadd, and dincr, which increments *p (p is a pointer to a double variable) by the value v.

    For the Wed section, you must follow the aforementioned pages.

    For both sections, read Section 3.11, Floating-Point Code.

    Write a C function for each of the following assembly listings. cvtsd2ss converts a scalar double-precision value (a double) to scalar single precision (a float), while cvtss2sd converts a scalar single-precision value (a float) to scalar double precision (a double).

    1.       unknown1:
      	subss	%xmm1, %xmm0
      	ret
          
    2.       unknown2:
      	subsd	%xmm1, %xmm0
      	ret
          
    3.       unknown3:
      	movaps	%xmm0, %xmm1
      	cvtsd2ss	(%rdi), %xmm0
      	addss	%xmm0, %xmm1
      	cvtss2sd	%xmm1, %xmm2
      	movsd	%xmm2, (%rdi)
      	ret
          
    4.       unknown4:
      	movsd	(%rdi), %xmm1
      	movapd	%xmm1, %xmm2
      	subsd	%xmm0, %xmm2
      	movsd	%xmm2, (%rdi)
      	movapd	%xmm1, %xmm0
      	ret
          
    5.       unknown5:
      	pxor	%xmm0, %xmm0
      	movl	$0, %eax
      	jmp	.L13
      .L14:
      	movss	(%rdx,%rax,4), %xmm1
      	mulss	(%rsi,%rax,4), %xmm1
      	addss	%xmm1, %xmm0
      	addq	$1, %rax
      .L13:
      	cmpq	%rdi, %rax
      	jb	.L14
      	rep ret
          
    6.       unknown6:
      	pxor	%xmm0, %xmm0
      	movl	$0, %eax
      	jmp	.L10
      .L11:
      	movsd	(%rdx,%rax,8), %xmm1
      	mulsd	(%rsi,%rax,8), %xmm1
      	addsd	%xmm1, %xmm0
      	addq	$1, %rax
      .L10:
      	cmpq	%rdi, %rax
      	jb	.L11
      	rep ret
          

  5. SSE/AVX/AVX512 programming: write assembly statements for overlapping two images of equal size to produce a new image.

    See the attached figure, which shows the four steps that overlap two images (B and E) to produce G. Given images B and E, you are asked to produce image G with no blue background. Assume each image is 1K x 1K pixels, or 1 MB, where each pixel is one byte representing one of 256 colors. For a 1024 * 1024 (1 MB) image, a simple-minded loop requires about 1 million (1K * 1K) iterations, one per pixel, which is prohibitively expensive for large images. Besides GPGPU programming with CUDA and OpenCL, one solution is SSE/AVX/AVX512, whose instructions operate on 16, 32, or 64 bytes at a time. With SSE/AVX/AVX512, each 1024-byte row can be processed in 64, 32, or 16 iterations (1024/16, 1024/32, or 1024/64), so the entire image of 1024 rows requires only 64K, 32K, or 16K iterations instead of 1 million.

    The assembly instructions required for this problem are variations of PCMPEQ, ANDP, ANDNP, and POR, which are documented in the Intel manual (https://software.intel.com/en-us/articles/intel-sdm#combined). For example, the AVX512 instructions that overlap the two images are variations of the following:

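    	VPCMPEQD k1 {k2}, zmm2, zmm3/m512/m32bcst -> compare (result is a bit mask in k1)
    	VANDPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst -> and
    	VANDNPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst -> and-not ((NOT zmm2) AND zmm3)
    	VPORQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst -> or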
          
    Assuming the 8-bit, 256-color map below, write the four assembly instructions that form the loop body. The loop iterates 64K, 32K, or 16K times for SSE, AVX, or AVX512, respectively. You may use register xmm2, ymm2, or zmm2 to hold a temporary value.