11493 aggr needs support for multiple pseudo rx groups
Portions contributed by: Dan McDonald <danmcd@joyent.com>
Reviewed by: Patrick Mooney <patrick.mooney@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>

          --- old/usr/src/uts/common/io/aggr/aggr_grp.c
          +++ new/usr/src/uts/common/io/aggr/aggr_grp.c
[24 lines elided]
  25   25  
  26   26  /*
  27   27   * IEEE 802.3ad Link Aggregation -- Link Aggregation Groups.
  28   28   *
  29   29   * An instance of the structure aggr_grp_t is allocated for each
  30   30   * link aggregation group. When created, aggr_grp_t objects are
  31   31   * entered into the aggr_grp_hash hash table maintained by the modhash
  32   32   * module. The hash key is the linkid associated with the link
  33   33   * aggregation group.
  34   34   *
  35      - * A set of MAC ports are associated with each association group.
       35 + * Each aggregation contains a set of ports. The port is represented
       36 + * by the aggr_port_t structure. A port consists of a single MAC
       37 + * client which has exclusive (MCIS_EXCLUSIVE) use of the underlying
       38 + * MAC. This client is used by the aggr to send and receive LACP
       39 + * traffic. Each port client takes on the same MAC unicast address --
       40 + * the address of the aggregation itself (taken from the first port by
       41 + * default).
  36   42   *
  37      - * Aggr pseudo TX rings
  38      - * --------------------
  39      - * The underlying ports (NICs) in an aggregation can have TX rings. To
  40      - * enhance aggr's performance, these TX rings are made available to the
  41      - * aggr layer as pseudo TX rings. The concept of pseudo rings are not new.
  42      - * They are already present and implemented on the RX side. It is called
  43      - * as pseudo RX rings. The same concept is extended to the TX side where
  44      - * each TX ring of an underlying port is reflected in aggr as a pseudo
  45      - * TX ring. Thus each pseudo TX ring will map to a specific hardware TX
  46      - * ring. Even in the case of a NIC that does not have a TX ring, a pseudo
  47      - * TX ring is given to the aggregation layer.
       43 + * The MAC client that hangs off each aggr port is not your typical
       44 + * MAC client. Not only does it have exclusive control of the MAC, but
       45 + * it also has no Tx or Rx SRSes. An SRS is designed to queue and
       46 + * fanout traffic among L4 protocols; but the aggr is an intermediary,
       47 + * not a consumer. Instead of using SRSes, the aggr puts the
       48 + * underlying hardware rings into passthru mode and ships packets up
       49 + * via a direct call to aggr_recv_cb(). This allows aggr to enforce
       50 + * LACP while passing all other traffic up to clients of the aggr.
  48   51   *
       52 + * Pseudo Rx Groups and Rings
       53 + * --------------------------
       54 + *
       55 + * It is imperative for client performance that the aggr provide as
       56 + * many MAC groups as possible. In order to use the underlying HW
       57 + * resources, aggr creates pseudo groups to aggregate the underlying
       58 + * HW groups. Every HW group gets mapped to a pseudo group; and every
       59 + * HW ring in that group gets mapped to a pseudo ring. The pseudo
       60 + * group at index 0 combines all the HW groups at index 0 from each
       61 + * port, etc. The aggr's MAC then creates normal MAC groups and rings
       62 + * out of these pseudo groups and rings to present to the aggr's
       63 + * clients. To the clients, the aggr's groups and rings are absolutely
       64 + * no different than a NIC's groups or rings.
       65 + *
       66 + * Pseudo Tx Rings
       67 + * ---------------
       68 + *
       69 + * The underlying ports (NICs) in an aggregation can have Tx rings. To
       70 + * enhance aggr's performance, these Tx rings are made available to
       71 + * the aggr layer as pseudo Tx rings. The concept of pseudo rings is
       72 + * not new. They are already present and implemented on the Rx side.
       73 + * The same concept is extended to the Tx side where each Tx ring of
       74 + * an underlying port is reflected in aggr as a pseudo Tx ring. Thus
       75 + * each pseudo Tx ring will map to a specific hardware Tx ring. Even
       76 + * in the case of a NIC that does not have a Tx ring, a pseudo Tx ring
       77 + * is given to the aggregation layer.
       78 + *
  49   79   * With this change, the outgoing stack depth looks much better:
  50   80   *
  51   81   * mac_tx() -> mac_tx_aggr_mode() -> mac_tx_soft_ring_process() ->
  52   82   * mac_tx_send() -> aggr_ring_tx() -> <driver>_ring_tx()
  53   83   *
  54      - * Two new modes are introduced to mac_tx() to handle aggr pseudo TX rings:
       84 + * Two new modes are introduced to mac_tx() to handle aggr pseudo Tx rings:
  55   85   * SRS_TX_AGGR and SRS_TX_BW_AGGR.
  56   86   *
  57   87   * In SRS_TX_AGGR mode, mac_tx_aggr_mode() routine is called. This routine
  58      - * invokes an aggr function, aggr_find_tx_ring(), to find a (pseudo) TX
       88 + * invokes an aggr function, aggr_find_tx_ring(), to find a (pseudo) Tx
  59   89   * ring belonging to a port on which the packet has to be sent.
  60   90   * aggr_find_tx_ring() first finds the outgoing port based on L2/L3/L4
  61      - * policy and then uses the fanout_hint passed to it to pick a TX ring from
       91 + * policy and then uses the fanout_hint passed to it to pick a Tx ring from
  62   92   * the selected port.
  63   93   *
  64   94   * In SRS_TX_BW_AGGR mode, mac_tx_bw_mode() function is called where
  65   95   * bandwidth limit is applied first on the outgoing packet and the packets
  66   96   * allowed to go out would call mac_tx_aggr_mode() to send the packet on a
  67      - * particular TX ring.
       97 + * particular Tx ring.
  68   98   */
  69   99  
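As a rough illustration of the "Pseudo Rx Groups and Rings" mapping described in the block comment above, the following standalone sketch models how the pseudo group at index g collects the rings of HW group g from every port. Every type and name here (sketch_port_t, sketch_pseudo_group_t, sketch_add_port(), and the SKETCH_ limits) is a hypothetical stand-in chosen for this example; none of them are the aggr_port_t or aggr_pseudo_rx_group_t definitions used by the driver, and no MAC framework calls are modeled.

```c
#include <assert.h>

#define	SKETCH_MAX_GROUPS	4	/* hypothetical per-port HW group limit */
#define	SKETCH_MAX_RINGS	16	/* hypothetical pseudo rings per group */

/* Hypothetical port: only records how many rings each HW group has. */
typedef struct sketch_port {
	unsigned int	sp_id;
	unsigned int	sp_ring_cnt[SKETCH_MAX_GROUPS];
} sketch_port_t;

/* One pseudo ring remembers which port and HW ring it stands for. */
typedef struct sketch_pseudo_ring {
	int		spr_inuse;
	unsigned int	spr_port_id;
	unsigned int	spr_hw_ring;
} sketch_pseudo_ring_t;

typedef struct sketch_pseudo_group {
	unsigned int		spg_index;
	unsigned int		spg_ring_cnt;
	sketch_pseudo_ring_t	spg_rings[SKETCH_MAX_RINGS];
} sketch_pseudo_group_t;

/*
 * Map every ring of the port's HW group at index pg->spg_index onto a
 * free pseudo ring slot. Returns 0 on success, -1 if the slots run out.
 */
int
sketch_add_port(sketch_pseudo_group_t *pg, const sketch_port_t *port)
{
	unsigned int g_idx = pg->spg_index;

	assert(g_idx < SKETCH_MAX_GROUPS);

	for (unsigned int r = 0; r < port->sp_ring_cnt[g_idx]; r++) {
		unsigned int j;

		/* Find the first free pseudo ring slot. */
		for (j = 0; j < SKETCH_MAX_RINGS; j++) {
			if (!pg->spg_rings[j].spr_inuse)
				break;
		}
		if (j == SKETCH_MAX_RINGS)
			return (-1);

		pg->spg_rings[j].spr_inuse = 1;
		pg->spg_rings[j].spr_port_id = port->sp_id;
		pg->spg_rings[j].spr_hw_ring = r;
		pg->spg_ring_cnt++;
	}
	return (0);
}
```

Adding two ports whose HW group 1 has two and three rings respectively would leave pseudo group 1 with five pseudo rings, each pointing back at one specific port and HW ring.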
  70  100  #include <sys/types.h>
  71  101  #include <sys/sysmacros.h>
  72  102  #include <sys/conf.h>
  73  103  #include <sys/cmn_err.h>
  74  104  #include <sys/disp.h>
  75  105  #include <sys/list.h>
  76  106  #include <sys/ksynch.h>
  77  107  #include <sys/kmem.h>
[36 lines elided]
 114  144  static boolean_t aggr_grp_capab_check(aggr_grp_t *, aggr_port_t *);
 115  145  static uint_t aggr_grp_max_sdu(aggr_grp_t *);
 116  146  static uint32_t aggr_grp_max_margin(aggr_grp_t *);
 117  147  static boolean_t aggr_grp_sdu_check(aggr_grp_t *, aggr_port_t *);
 118  148  static boolean_t aggr_grp_margin_check(aggr_grp_t *, aggr_port_t *);
 119  149  
 120  150  static int aggr_add_pseudo_rx_group(aggr_port_t *, aggr_pseudo_rx_group_t *);
 121  151  static void aggr_rem_pseudo_rx_group(aggr_port_t *, aggr_pseudo_rx_group_t *);
 122  152  static int aggr_pseudo_disable_intr(mac_intr_handle_t);
 123  153  static int aggr_pseudo_enable_intr(mac_intr_handle_t);
 124      -static int aggr_pseudo_start_ring(mac_ring_driver_t, uint64_t);
      154 +static int aggr_pseudo_start_rx_ring(mac_ring_driver_t, uint64_t);
      155 +static void aggr_pseudo_stop_rx_ring(mac_ring_driver_t);
 125  156  static int aggr_addmac(void *, const uint8_t *);
 126  157  static int aggr_remmac(void *, const uint8_t *);
 127  158  static int aggr_addvlan(mac_group_driver_t, uint16_t);
 128  159  static int aggr_remvlan(mac_group_driver_t, uint16_t);
 129  160  static mblk_t *aggr_rx_poll(void *, int);
 130  161  static void aggr_fill_ring(void *, mac_ring_type_t, const int,
 131  162      const int, mac_ring_info_t *, mac_ring_handle_t);
 132  163  static void aggr_fill_group(void *, mac_ring_type_t, const int,
 133  164      mac_group_info_t *, mac_group_handle_t);
 134  165  
[224 lines elided]
 359  390          }
 360  391  
 361  392          /*
 362  393           * Update port's state.
 363  394           */
 364  395          port->lp_state = AGGR_PORT_STATE_ATTACHED;
 365  396  
 366  397          aggr_grp_multicst_port(port, B_TRUE);
 367  398  
 368  399          /*
 369      -         * Set port's receive callback
      400 +         * The port client doesn't have an Rx SRS; instead of calling
      401 +         * mac_rx_set() we set the client's flow callback directly.
      402 +         * This datapath is used only when the port's driver doesn't
      403 +         * support MAC_CAPAB_RINGS. Drivers with ring support will
      404 +         * deliver traffic to the aggr via ring passthru.
 370  405           */
 371      -        mac_rx_set(port->lp_mch, aggr_recv_cb, port);
      406 +        mac_client_set_flow_cb(port->lp_mch, aggr_recv_cb, port);
 372  407  
 373  408          /*
 374  409           * If LACP is OFF, the port can be used to send data as soon
 375  410           * as its link is up and verified to be compatible with the
 376  411           * aggregation.
 377  412           *
 378  413           * If LACP is active or passive, notify the LACP subsystem, which
 379  414           * will enable sending on the port following the LACP protocol.
 380  415           */
 381  416          if (grp->lg_lacp_mode == AGGR_LACP_OFF)
[9 lines elided]
 391  426  {
 392  427          boolean_t link_state_changed = B_FALSE;
 393  428  
 394  429          ASSERT(MAC_PERIM_HELD(grp->lg_mh));
 395  430          ASSERT(MAC_PERIM_HELD(port->lp_mh));
 396  431  
 397  432          /* update state */
 398  433          if (port->lp_state != AGGR_PORT_STATE_ATTACHED)
 399  434                  return (B_FALSE);
 400  435  
 401      -        mac_rx_clear(port->lp_mch);
      436 +        mac_client_clear_flow_cb(port->lp_mch);
 402  437  
 403  438          aggr_grp_multicst_port(port, B_FALSE);
 404  439  
 405  440          if (grp->lg_lacp_mode == AGGR_LACP_OFF)
 406  441                  aggr_send_port_disable(port);
 407  442          else
 408  443                  aggr_lacp_port_detached(port);
 409  444  
 410  445          port->lp_state = AGGR_PORT_STATE_STANDBY;
 411  446  
[118 lines elided]
 530  565   */
 531  566  static int
 532  567  aggr_grp_add_port(aggr_grp_t *grp, datalink_id_t port_linkid, boolean_t force,
 533  568      aggr_port_t **pp)
 534  569  {
 535  570          aggr_port_t *port, **cport;
 536  571          mac_perim_handle_t mph;
 537  572          zoneid_t port_zoneid = ALL_ZONES;
 538  573          int err;
 539  574  
 540      -        /* The port must be int the same zone as the aggregation. */
      575 +        /* The port must be in the same zone as the aggregation. */
 541  576          if (zone_check_datalink(&port_zoneid, port_linkid) != 0)
 542  577                  port_zoneid = GLOBAL_ZONEID;
 543  578          if (grp->lg_zoneid != port_zoneid)
 544  579                  return (EBUSY);
 545  580  
 546  581          /*
 547      -         * lg_mh could be NULL when the function is called during the creation
 548      -         * of the aggregation.
      582 +         * If we are creating the aggr, then there is no MAC handle
      583 +         * and thus no perimeter to hold. If we are adding a port to
      584 +         * an existing aggr, then the perimeter of the aggr's MAC must
      585 +         * be held.
 549  586           */
 550  587          ASSERT(grp->lg_mh == NULL || MAC_PERIM_HELD(grp->lg_mh));
 551  588  
 552      -        /* create new port */
 553  589          err = aggr_port_create(grp, port_linkid, force, &port);
 554  590          if (err != 0)
 555  591                  return (err);
 556  592  
 557  593          mac_perim_enter_by_mh(port->lp_mh, &mph);
 558  594  
 559      -        /* add port to list of group constituent ports */
      595 +        /* Add the new port to the end of the list. */
 560  596          cport = &grp->lg_ports;
 561  597          while (*cport != NULL)
 562  598                  cport = &((*cport)->lp_next);
 563  599          *cport = port;
 564  600  
 565  601          /*
 566  602           * Back reference to the group it is member of. A port always
 567  603           * holds a reference to its group to ensure that the back
 568  604           * reference is always valid.
 569  605           */
[61 lines elided]
 631  667  
 632  668          /*
 633  669           * No slot for this new RX ring.
 634  670           */
 635  671          if (j == MAX_RINGS_PER_GROUP)
 636  672                  return (EIO);
 637  673  
 638  674          ring->arr_flags |= MAC_PSEUDO_RING_INUSE;
 639  675          ring->arr_hw_rh = hw_rh;
 640  676          ring->arr_port = port;
      677 +        ring->arr_grp = rx_grp;
 641  678          rx_grp->arg_ring_cnt++;
 642  679  
 643  680          /*
 644  681           * The group is already registered, dynamically add a new ring to the
 645  682           * mac group.
 646  683           */
 647  684          if ((err = mac_group_add_ring(rx_grp->arg_gh, j)) != 0) {
 648  685                  ring->arr_flags &= ~MAC_PSEUDO_RING_INUSE;
 649  686                  ring->arr_hw_rh = NULL;
 650  687                  ring->arr_port = NULL;
      688 +                ring->arr_grp = NULL;
 651  689                  rx_grp->arg_ring_cnt--;
 652  690          } else {
 653      -                mac_hwring_setup(hw_rh, (mac_resource_handle_t)ring,
 654      -                    mac_find_ring(rx_grp->arg_gh, j));
      691 +                /*
      692 +                 * This must run after the MAC is registered.
      693 +                 */
      694 +                ASSERT3P(ring->arr_rh, !=, NULL);
      695 +                mac_hwring_set_passthru(hw_rh, (mac_rx_t)aggr_recv_cb,
      696 +                    (void *)port, (mac_resource_handle_t)ring);
 655  697          }
 656  698          return (err);
 657  699  }
 658  700  
 659  701  /*
 660  702   * Remove the pseudo RX ring of the given HW ring handle.
 661  703   */
 662  704  static void
 663  705  aggr_rem_pseudo_rx_ring(aggr_pseudo_rx_group_t *rx_grp, mac_ring_handle_t hw_rh)
 664  706  {
 665      -        aggr_pseudo_rx_ring_t   *ring;
 666      -        int                     j;
      707 +        for (uint_t j = 0; j < MAX_RINGS_PER_GROUP; j++) {
      708 +                aggr_pseudo_rx_ring_t *ring = rx_grp->arg_rings + j;
 667  709  
 668      -        for (j = 0; j < MAX_RINGS_PER_GROUP; j++) {
 669      -                ring = rx_grp->arg_rings + j;
 670  710                  if (!(ring->arr_flags & MAC_PSEUDO_RING_INUSE) ||
 671  711                      ring->arr_hw_rh != hw_rh) {
 672  712                          continue;
 673  713                  }
 674  714  
 675  715                  mac_group_rem_ring(rx_grp->arg_gh, ring->arr_rh);
 676  716  
 677  717                  ring->arr_flags &= ~MAC_PSEUDO_RING_INUSE;
 678  718                  ring->arr_hw_rh = NULL;
 679  719                  ring->arr_port = NULL;
      720 +                ring->arr_grp = NULL;
 680  721                  rx_grp->arg_ring_cnt--;
 681      -                mac_hwring_teardown(hw_rh);
      722 +                mac_hwring_clear_passthru(hw_rh);
 682  723                  break;
 683  724          }
 684  725  }
 685  726  
 686  727  /*
 687  728   * Create pseudo rings over the HW rings of the port.
 688  729   *
 689  730   * o Create a pseudo ring in rx_grp per HW ring in the port's HW group.
 690  731   *
 691  732   * o Program existing unicast filters on the pseudo group into the HW group.
 692  733   *
 693  734   * o Program existing VLAN filters on the pseudo group into the HW group.
 694  735   */
 695  736  static int
 696  737  aggr_add_pseudo_rx_group(aggr_port_t *port, aggr_pseudo_rx_group_t *rx_grp)
 697  738  {
 698      -        aggr_grp_t              *grp = port->lp_grp;
 699  739          mac_ring_handle_t       hw_rh[MAX_RINGS_PER_GROUP];
 700  740          aggr_unicst_addr_t      *addr, *a;
 701  741          mac_perim_handle_t      pmph;
 702  742          aggr_vlan_t             *avp;
 703      -        int                     hw_rh_cnt, i = 0, j;
      743 +        uint_t                  hw_rh_cnt, i;
 704  744          int                     err = 0;
      745 +        uint_t                  g_idx = rx_grp->arg_index;
 705  746  
 706      -        ASSERT(MAC_PERIM_HELD(grp->lg_mh));
      747 +        ASSERT(MAC_PERIM_HELD(port->lp_grp->lg_mh));
      748 +        ASSERT3U(g_idx, <, MAX_GROUPS_PER_PORT);
 707  749          mac_perim_enter_by_mh(port->lp_mh, &pmph);
 708  750  
 709  751          /*
 710      -         * This function must be called after the aggr registers its MAC
 711      -         * and its Rx group has been initialized.
      752 +         * This function must be called after the aggr registers its
      753 +         * MAC and its Rx groups have been initialized.
 712  754           */
 713  755          ASSERT(rx_grp->arg_gh != NULL);
 714  756  
 715  757          /*
 716  758           * Get the list of the underlying HW rings.
 717  759           */
 718      -        hw_rh_cnt = mac_hwrings_get(port->lp_mch,
 719      -            &port->lp_hwgh, hw_rh, MAC_RING_TYPE_RX);
      760 +        hw_rh_cnt = mac_hwrings_idx_get(port->lp_mh, g_idx,
      761 +            &port->lp_hwghs[g_idx], hw_rh, MAC_RING_TYPE_RX);
 720  762  
 721      -        if (port->lp_hwgh != NULL) {
 722      -                /*
 723      -                 * Quiesce the HW ring and the MAC SRS on the ring. Note
 724      -                 * that the HW ring will be restarted when the pseudo ring
 725      -                 * is started. At that time all the packets will be
 726      -                 * directly passed up to the pseudo Rx ring and handled
 727      -                 * by MAC SRS created over the pseudo Rx ring.
 728      -                 */
 729      -                mac_rx_client_quiesce(port->lp_mch);
 730      -                mac_srs_perm_quiesce(port->lp_mch, B_TRUE);
 731      -        }
 732      -
 733  763          /*
 734  764           * Add existing VLAN and unicast address filters to the port.
 735  765           */
 736  766          for (avp = list_head(&rx_grp->arg_vlans); avp != NULL;
 737  767              avp = list_next(&rx_grp->arg_vlans, avp)) {
 738      -                if ((err = aggr_port_addvlan(port, avp->av_vid)) != 0)
      768 +                if ((err = aggr_port_addvlan(port, g_idx, avp->av_vid)) != 0)
 739  769                          goto err;
 740  770          }
 741  771  
 742  772          for (addr = rx_grp->arg_macaddr; addr != NULL; addr = addr->aua_next) {
 743      -                if ((err = aggr_port_addmac(port, addr->aua_addr)) != 0)
      773 +                if ((err = aggr_port_addmac(port, g_idx, addr->aua_addr)) != 0)
 744  774                          goto err;
 745  775          }
 746  776  
 747  777          for (i = 0; i < hw_rh_cnt; i++) {
 748  778                  err = aggr_add_pseudo_rx_ring(port, rx_grp, hw_rh[i]);
 749  779                  if (err != 0)
 750  780                          goto err;
 751  781          }
 752  782  
 753      -        port->lp_rx_grp_added = B_TRUE;
 754  783          mac_perim_exit(pmph);
 755  784          return (0);
 756  785  
 757  786  err:
 758  787          ASSERT(err != 0);
 759  788  
 760      -        for (j = 0; j < i; j++)
      789 +        for (uint_t j = 0; j < i; j++)
 761  790                  aggr_rem_pseudo_rx_ring(rx_grp, hw_rh[j]);
 762  791  
 763  792          for (a = rx_grp->arg_macaddr; a != addr; a = a->aua_next)
 764      -                aggr_port_remmac(port, a->aua_addr);
      793 +                aggr_port_remmac(port, g_idx, a->aua_addr);
 765  794  
 766  795          if (avp != NULL)
 767  796                  avp = list_prev(&rx_grp->arg_vlans, avp);
 768  797  
 769  798          for (; avp != NULL; avp = list_prev(&rx_grp->arg_vlans, avp)) {
 770  799                  int err2;
 771  800  
 772      -                if ((err2 = aggr_port_remvlan(port, avp->av_vid)) != 0) {
      801 +                if ((err2 = aggr_port_remvlan(port, g_idx, avp->av_vid)) != 0) {
 773  802                          cmn_err(CE_WARN, "Failed to remove VLAN %u from port %s"
 774  803                              ": errno %d.", avp->av_vid,
 775  804                              mac_client_name(port->lp_mch), err2);
 776  805                  }
 777  806          }
 778  807  
 779      -        if (port->lp_hwgh != NULL) {
 780      -                mac_srs_perm_quiesce(port->lp_mch, B_FALSE);
 781      -                mac_rx_client_restart(port->lp_mch);
 782      -                port->lp_hwgh = NULL;
 783      -        }
 784      -
      808 +        port->lp_hwghs[g_idx] = NULL;
 785  809          mac_perim_exit(pmph);
 786  810          return (err);
 787  811  }
 788  812  
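The error path of aggr_add_pseudo_rx_group() above undoes only the work that completed before the failure. A minimal standalone sketch of that unwind pattern follows. The names sketch_apply_all(), sketch_ctx_t, sketch_demo_apply() and sketch_demo_undo() are hypothetical and exist only for this example. As a simplification, the sketch undoes steps in reverse order, whereas the function above walks the rings and unicast addresses forward and the VLAN list backward.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step callbacks: apply returns 0 on success, else an error. */
typedef int (*sketch_apply_fn)(void *ctx, size_t idx);
typedef void (*sketch_undo_fn)(void *ctx, size_t idx);

/*
 * Apply steps 0..n-1 in order. If step i fails, undo steps i-1..0 and
 * return the error. The failed step is assumed to leave nothing behind.
 */
int
sketch_apply_all(void *ctx, size_t n, sketch_apply_fn apply,
    sketch_undo_fn undo)
{
	size_t i;
	int err = 0;

	assert(apply != NULL && undo != NULL);

	for (i = 0; i < n; i++) {
		if ((err = apply(ctx, i)) != 0)
			break;
	}
	if (err == 0)
		return (0);

	/* Only steps [0, i) succeeded, so only those are rolled back. */
	while (i > 0)
		undo(ctx, --i);
	return (err);
}

/* Example context: a bitmask of applied steps plus one failing index. */
typedef struct sketch_ctx {
	unsigned int	sc_applied;	/* bit i is set while step i is applied */
	size_t		sc_fail_at;	/* step index that fails */
} sketch_ctx_t;

int
sketch_demo_apply(void *arg, size_t idx)
{
	sketch_ctx_t *ctx = arg;

	assert(idx < 32);
	if (idx == ctx->sc_fail_at)
		return (5);	/* arbitrary nonzero error for the example */
	ctx->sc_applied |= 1U << idx;
	return (0);
}

void
sketch_demo_undo(void *arg, size_t idx)
{
	sketch_ctx_t *ctx = arg;

	assert(idx < 32);
	ctx->sc_applied &= ~(1U << idx);
}
```

With sc_fail_at set to 2 and four steps requested, steps 0 and 1 are applied, step 2 fails, and both earlier steps are rolled back before the error is returned.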
 789  813  /*
 790  814   * Destroy the pseudo rings mapping to this port and remove all VLAN
 791  815   * and unicast filters from this port. Even if there are no underlying
 792  816   * HW rings we must still remove the unicast filters to take the port
 793  817   * out of promisc mode.
 794  818   */
 795  819  static void
 796  820  aggr_rem_pseudo_rx_group(aggr_port_t *port, aggr_pseudo_rx_group_t *rx_grp)
 797  821  {
 798      -        aggr_grp_t              *grp = port->lp_grp;
 799  822          mac_ring_handle_t       hw_rh[MAX_RINGS_PER_GROUP];
 800  823          aggr_unicst_addr_t      *addr;
 801      -        mac_group_handle_t      hwgh;
 802  824          mac_perim_handle_t      pmph;
 803      -        int                     hw_rh_cnt, i;
      825 +        uint_t                  hw_rh_cnt;
      826 +        uint_t                  g_idx = rx_grp->arg_index;
 804  827  
 805      -        ASSERT(MAC_PERIM_HELD(grp->lg_mh));
      828 +        ASSERT(MAC_PERIM_HELD(port->lp_grp->lg_mh));
      829 +        ASSERT3U(g_idx, <, MAX_GROUPS_PER_PORT);
      830 +        ASSERT3P(rx_grp->arg_gh, !=, NULL);
 806  831          mac_perim_enter_by_mh(port->lp_mh, &pmph);
 807  832  
 808      -        if (!port->lp_rx_grp_added)
 809      -                goto done;
      833 +        hw_rh_cnt = mac_hwrings_idx_get(port->lp_mh, g_idx, NULL, hw_rh,
      834 +            MAC_RING_TYPE_RX);
 810  835  
 811      -        ASSERT(rx_grp->arg_gh != NULL);
 812      -        hw_rh_cnt = mac_hwrings_get(port->lp_mch,
 813      -            &hwgh, hw_rh, MAC_RING_TYPE_RX);
 814      -
 815      -        for (i = 0; i < hw_rh_cnt; i++)
      836 +        for (uint_t i = 0; i < hw_rh_cnt; i++)
 816  837                  aggr_rem_pseudo_rx_ring(rx_grp, hw_rh[i]);
 817  838  
 818  839          for (addr = rx_grp->arg_macaddr; addr != NULL; addr = addr->aua_next)
 819      -                aggr_port_remmac(port, addr->aua_addr);
      840 +                aggr_port_remmac(port, g_idx, addr->aua_addr);
 820  841  
 821  842          for (aggr_vlan_t *avp = list_head(&rx_grp->arg_vlans); avp != NULL;
 822  843              avp = list_next(&rx_grp->arg_vlans, avp)) {
 823  844                  int err;
 824  845  
 825      -                if ((err = aggr_port_remvlan(port, avp->av_vid)) != 0) {
      846 +                if ((err = aggr_port_remvlan(port, g_idx, avp->av_vid)) != 0) {
 826  847                          cmn_err(CE_WARN, "Failed to remove VLAN %u from port %s"
 827  848                              ": errno %d.", avp->av_vid,
 828  849                              mac_client_name(port->lp_mch), err);
 829  850                  }
 830  851          }
 831  852  
 832      -        if (port->lp_hwgh != NULL) {
 833      -                port->lp_hwgh = NULL;
 834      -
 835      -                /*
 836      -                 * First clear the permanent-quiesced flag of the RX srs then
 837      -                 * restart the HW ring and the mac srs on the ring. Note that
 838      -                 * the HW ring and associated SRS will soon been removed when
 839      -                 * the port is removed from the aggr.
 840      -                 */
 841      -                mac_srs_perm_quiesce(port->lp_mch, B_FALSE);
 842      -                mac_rx_client_restart(port->lp_mch);
 843      -        }
 844      -
 845      -        port->lp_rx_grp_added = B_FALSE;
 846      -done:
      853 +        port->lp_hwghs[g_idx] = NULL;
 847  854          mac_perim_exit(pmph);
 848  855  }
 849  856  
 850  857  /*
 851  858   * Add a pseudo TX ring for the given HW ring handle.
 852  859   */
 853  860  static int
 854  861  aggr_add_pseudo_tx_ring(aggr_port_t *port,
 855  862      aggr_pseudo_tx_group_t *tx_grp, mac_ring_handle_t hw_rh,
 856  863      mac_ring_handle_t *pseudo_rh)
[83 lines elided]
 940  947          mac_perim_handle_t      pmph;
 941  948          int                     hw_rh_cnt, i = 0, j;
 942  949          int                     err = 0;
 943  950  
 944  951          ASSERT(MAC_PERIM_HELD(grp->lg_mh));
 945  952          mac_perim_enter_by_mh(port->lp_mh, &pmph);
 946  953  
 947  954          /*
 948  955           * Get the list of the underlying HW rings.
 949  956           */
 950      -        hw_rh_cnt = mac_hwrings_get(port->lp_mch,
 951      -            NULL, hw_rh, MAC_RING_TYPE_TX);
      957 +        hw_rh_cnt = mac_hwrings_get(port->lp_mch, NULL, hw_rh,
      958 +            MAC_RING_TYPE_TX);
 952  959  
 953  960          /*
 954  961           * Even if the underlying NIC does not have TX rings, we
 955  962           * still make a pseudo TX ring for that NIC with NULL as
 956  963           * the ring handle.
 957  964           */
 958  965          if (hw_rh_cnt == 0)
 959  966                  port->lp_tx_ring_cnt = 1;
 960  967          else
 961  968                  port->lp_tx_ring_cnt = hw_rh_cnt;
[85 lines elided]
1047 1054  }
1048 1055  
1049 1056  static int
1050 1057  aggr_pseudo_enable_intr(mac_intr_handle_t ih)
1051 1058  {
1052 1059          aggr_pseudo_rx_ring_t *rr_ring = (aggr_pseudo_rx_ring_t *)ih;
1053 1060          return (mac_hwring_enable_intr(rr_ring->arr_hw_rh));
1054 1061  }
1055 1062  
1056 1063  /*
1057      - * Here we need to start the pseudo-ring. As MAC already ensures that the
1058      - * underlying device is set up, all we need to do is save the ring generation.
1059      - *
1060      - * Note, we don't end up wanting to use the underlying mac_hwring_start/stop
1061      - * functions here as those don't actually stop and start the ring, they just
1062      - * quiesce the ring. Regardless of whether the aggr is logically up or not, we
1063      - * want to make sure that we can receive traffic for LACP.
     1064 + * Start the pseudo ring. Since the pseudo ring is just an abstraction
     1065 + * over an actual HW ring, the real task is to start the underlying HW
     1066 + * ring.
1064 1067   */
1065 1068  static int
1066      -aggr_pseudo_start_ring(mac_ring_driver_t arg, uint64_t mr_gen)
     1069 +aggr_pseudo_start_rx_ring(mac_ring_driver_t arg, uint64_t mr_gen)
1067 1070  {
     1071 +        int err;
1068 1072          aggr_pseudo_rx_ring_t *rr_ring = (aggr_pseudo_rx_ring_t *)arg;
1069 1073  
     1074 +        err = mac_hwring_start(rr_ring->arr_hw_rh);
     1075 +
     1076 +        if (err != 0)
     1077 +                return (err);
     1078 +
1070 1079          rr_ring->arr_gen = mr_gen;
1071      -        return (0);
     1080 +        return (err);
1072 1081  }
1073 1082  
1074 1083  /*
     1084 + * Stop the pseudo ring. Since the pseudo ring is just an abstraction
     1085 + * over an actual HW ring, the real task is to stop the underlying HW
     1086 + * ring.
     1087 + */
     1088 +static void
     1089 +aggr_pseudo_stop_rx_ring(mac_ring_driver_t arg)
     1090 +{
     1091 +        aggr_pseudo_rx_ring_t *rr_ring = (aggr_pseudo_rx_ring_t *)arg;
     1092 +
     1093 +        /*
     1094 +         * The rings underlying the default group must stay up to
     1095 +         * continue receiving LACP traffic. We would normally never
     1096 +         * stop the default Rx rings because of the primary MAC
     1097 +         * client; but aggr's primary MAC client doesn't call
     1098 +         * mac_unicast_add() and thus mi_active is 0 when the last
     1099 +         * non-primary client is deleted.
     1100 +         */
     1101 +        if (rr_ring->arr_grp->arg_index != 0)
     1102 +                mac_hwring_stop(rr_ring->arr_hw_rh);
     1103 +}
     1104 +
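The start and stop routines above can be modeled in isolation to show the one asymmetry between them: every pseudo ring starts its HW ring, but rings in the default group (index 0) are never stopped, so LACP traffic keeps flowing. This is only a sketch under stated assumptions. sketch_hw_ring_t, sketch_rx_ring_t and both functions are hypothetical, and the shr_start_err field merely simulates a failing HW start rather than reflecting anything in the MAC framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical HW ring: running state plus a simulated start error. */
typedef struct sketch_hw_ring {
	bool	shr_running;
	int	shr_start_err;	/* nonzero makes the simulated start fail */
} sketch_hw_ring_t;

/* Hypothetical pseudo Rx ring layered over one HW ring. */
typedef struct sketch_rx_ring {
	unsigned int		srr_grp_index;	/* owning pseudo group index */
	unsigned long long	srr_gen;	/* saved ring generation */
	sketch_hw_ring_t	*srr_hw;
} sketch_rx_ring_t;

/* Start the HW ring first; record the generation only on success. */
int
sketch_start_rx_ring(sketch_rx_ring_t *ring, unsigned long long gen)
{
	assert(ring->srr_hw != NULL);

	if (ring->srr_hw->shr_start_err != 0)
		return (ring->srr_hw->shr_start_err);

	ring->srr_hw->shr_running = true;
	ring->srr_gen = gen;
	return (0);
}

/* Stop the HW ring unless it backs the default group (index 0). */
void
sketch_stop_rx_ring(sketch_rx_ring_t *ring)
{
	assert(ring->srr_hw != NULL);

	if (ring->srr_grp_index != 0)
		ring->srr_hw->shr_running = false;
}
```

Stopping a ring that belongs to group 1 clears its running flag, while the same call on a group 0 ring leaves the HW ring running.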
     1105 +/*
1075 1106   * Add one or more ports to an existing link aggregation group.
1076 1107   */
1077 1108  int
1078 1109  aggr_grp_add_ports(datalink_id_t linkid, uint_t nports, boolean_t force,
1079 1110      laioc_port_t *ports)
1080 1111  {
1081      -        int rc, i, nadded = 0;
     1112 +        int rc;
     1113 +        uint_t port_added = 0;
     1114 +        uint_t grp_added;
1082 1115          aggr_grp_t *grp = NULL;
1083 1116          aggr_port_t *port;
1084 1117          boolean_t link_state_changed = B_FALSE;
1085 1118          mac_perim_handle_t mph, pmph;
1086 1119  
1087      -        /* get group corresponding to linkid */
     1120 +        /* Get the aggr corresponding to linkid. */
1088 1121          rw_enter(&aggr_grp_lock, RW_READER);
1089 1122          if (mod_hash_find(aggr_grp_hash, GRP_HASH_KEY(linkid),
1090 1123              (mod_hash_val_t *)&grp) != 0) {
1091 1124                  rw_exit(&aggr_grp_lock);
1092 1125                  return (ENOENT);
1093 1126          }
1094 1127          AGGR_GRP_REFHOLD(grp);
1095 1128  
1096 1129          /*
1097      -         * Hold the perimeter so that the aggregation won't be destroyed.
     1130 +         * Hold the perimeter so that the aggregation can't be destroyed.
1098 1131           */
1099 1132          mac_perim_enter_by_mh(grp->lg_mh, &mph);
1100 1133          rw_exit(&aggr_grp_lock);
1101 1134  
1102      -        /* add the specified ports to group */
1103      -        for (i = 0; i < nports; i++) {
1104      -                /* add port to group */
     1135 +        /* Add the specified ports to the aggr. */
     1136 +        for (uint_t i = 0; i < nports; i++) {
     1137 +                grp_added = 0;
     1138 +
1105 1139                  if ((rc = aggr_grp_add_port(grp, ports[i].lp_linkid,
1106 1140                      force, &port)) != 0) {
1107 1141                          goto bail;
1108 1142                  }
     1143 +
1109 1144                  ASSERT(port != NULL);
1110      -                nadded++;
     1145 +                port_added++;
1111 1146  
1112 1147                  /* check capabilities */
1113 1148                  if (!aggr_grp_capab_check(grp, port) ||
1114 1149                      !aggr_grp_sdu_check(grp, port) ||
1115 1150                      !aggr_grp_margin_check(grp, port)) {
1116 1151                          rc = ENOTSUP;
1117 1152                          goto bail;
1118 1153                  }
1119 1154  
1120 1155                  /*
1121 1156                   * Create the pseudo ring for each HW ring of the underlying
1122 1157                   * port.
1123 1158                   */
1124 1159                  rc = aggr_add_pseudo_tx_group(port, &grp->lg_tx_group);
1125 1160                  if (rc != 0)
1126 1161                          goto bail;
1127      -                rc = aggr_add_pseudo_rx_group(port, &grp->lg_rx_group);
1128      -                if (rc != 0)
1129      -                        goto bail;
1130 1162  
     1163 +                for (uint_t j = 0; j < grp->lg_rx_group_count; j++) {
     1164 +                        rc = aggr_add_pseudo_rx_group(port,
     1165 +                            &grp->lg_rx_groups[j]);
     1166 +
     1167 +                        if (rc != 0)
     1168 +                                goto bail;
     1169 +
     1170 +                        grp_added++;
     1171 +                }
     1172 +
1131 1173                  mac_perim_enter_by_mh(port->lp_mh, &pmph);
1132 1174  
1133 1175                  /* set LACP mode */
1134 1176                  aggr_port_lacp_set_mode(grp, port);
1135 1177  
1136 1178                  /* start port if group has already been started */
1137 1179                  if (grp->lg_started) {
1138 1180                          rc = aggr_port_start(port);
1139 1181                          if (rc != 0) {
1140 1182                                  mac_perim_exit(pmph);
1141 1183                                  goto bail;
1142 1184                          }
1143 1185  
1144 1186                          /*
1145 1187                           * Turn on the promiscuous mode over the port when it
1146 1188                           * is requested to be turned on to receive the
1147      -                         * non-primary address over a port, or the promiscous
     1189 +                         * non-primary address over a port, or the promiscuous
1148 1190                           * mode is enabled over the aggr.
1149 1191                           */
1150 1192                          if (grp->lg_promisc || port->lp_prom_addr != NULL) {
1151 1193                                  rc = aggr_port_promisc(port, B_TRUE);
1152 1194                                  if (rc != 0) {
1153 1195                                          mac_perim_exit(pmph);
1154 1196                                          goto bail;
1155 1197                                  }
1156 1198                          }
1157 1199                  }
↓ open down ↓ 14 lines elided ↑ open up ↑
1172 1214          /* update the MAC address of the constituent ports */
1173 1215          if (aggr_grp_update_ports_mac(grp))
1174 1216                  link_state_changed = B_TRUE;
1175 1217  
1176 1218          if (link_state_changed)
1177 1219                  mac_link_update(grp->lg_mh, grp->lg_link_state);
1178 1220  
1179 1221  bail:
1180 1222          if (rc != 0) {
1181 1223                  /* stop and remove ports that have been added */
1182      -                for (i = 0; i < nadded; i++) {
     1224 +                for (uint_t i = 0; i < port_added; i++) {
     1225 +                        uint_t grp_remove;
     1226 +
1183 1227                          port = aggr_grp_port_lookup(grp, ports[i].lp_linkid);
1184 1228                          ASSERT(port != NULL);
     1229 +
1185 1230                          if (grp->lg_started) {
1186 1231                                  mac_perim_enter_by_mh(port->lp_mh, &pmph);
1187 1232                                  (void) aggr_port_promisc(port, B_FALSE);
1188 1233                                  aggr_port_stop(port);
1189 1234                                  mac_perim_exit(pmph);
1190 1235                          }
     1236 +
1191 1237                          aggr_rem_pseudo_tx_group(port, &grp->lg_tx_group);
1192      -                        aggr_rem_pseudo_rx_group(port, &grp->lg_rx_group);
     1238 +
     1239 +                        /*
     1240 +                         * Only the last port could have a partial set
     1241 +                         * of groups added.
     1242 +                         */
     1243 +                        grp_remove = (i + 1 == port_added) ? grp_added :
     1244 +                            grp->lg_rx_group_count;
     1245 +
     1246 +                        for (uint_t j = 0; j < grp_remove; j++) {
     1247 +                                aggr_rem_pseudo_rx_group(port,
     1248 +                                    &grp->lg_rx_groups[j]);
     1249 +                        }
     1250 +
1193 1251                          (void) aggr_grp_rem_port(grp, port, NULL, NULL);
1194 1252                  }
1195 1253          }
1196 1254  
1197 1255          mac_perim_exit(mph);
1198 1256          AGGR_GRP_REFRELE(grp);
1199 1257          return (rc);
1200 1258  }
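
Review note: the rollback rule in the bail path above ("only the last port could have a partial set of groups added") can be sketched in isolation. This is a minimal sketch, not code from the patch: `groups_to_remove()` is a hypothetical helper whose parameters mirror `port_added`, `grp_added`, and `grp->lg_rx_group_count` from the diff.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the patch): decide how many pseudo RX
 * groups to tear down for the i-th successfully added port.
 *
 *   port_added      number of ports added before the failure
 *   grp_added       RX groups added for the port being processed at failure
 *   rx_group_count  RX groups the aggr exposes (lg_rx_group_count)
 */
static unsigned int
groups_to_remove(unsigned int i, unsigned int port_added,
    unsigned int grp_added, unsigned int rx_group_count)
{
	assert(i < port_added);

	/* Every port except the last one received the full set of groups. */
	return ((i + 1 == port_added) ? grp_added : rx_group_count);
}
```

With three ports added, four groups per aggr, and a failure after the third port received two groups, ports 0 and 1 each remove four groups and port 2 removes two. Since `grp_added` is reset to 0 at the top of every loop iteration, it may be worth double-checking the path where `aggr_grp_add_port()` itself fails for a later port: `port_added` then still names the previous, fully populated port while `grp_added` is already 0.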
1201 1259  
1202 1260  static int
↓ open down ↓ 141 lines elided ↑ open up ↑
1344 1402          grp->lg_lacp_done = B_FALSE;
1345 1403          grp->lg_tx_notify_done = B_FALSE;
1346 1404          grp->lg_lacp_head = grp->lg_lacp_tail = NULL;
1347 1405          grp->lg_lacp_rx_thread = thread_create(NULL, 0,
1348 1406              aggr_lacp_rx_thread, grp, 0, &p0, TS_RUN, minclsyspri);
1349 1407          grp->lg_tx_notify_thread = thread_create(NULL, 0,
1350 1408              aggr_tx_notify_thread, grp, 0, &p0, TS_RUN, minclsyspri);
1351 1409          grp->lg_tx_blocked_rings = kmem_zalloc((sizeof (mac_ring_handle_t *) *
1352 1410              MAX_RINGS_PER_GROUP), KM_SLEEP);
1353 1411          grp->lg_tx_blocked_cnt = 0;
1354      -        bzero(&grp->lg_rx_group, sizeof (aggr_pseudo_rx_group_t));
     1412 +        bzero(&grp->lg_rx_groups,
     1413 +            sizeof (aggr_pseudo_rx_group_t) * MAX_GROUPS_PER_PORT);
1355 1414          bzero(&grp->lg_tx_group, sizeof (aggr_pseudo_tx_group_t));
1356 1415          aggr_lacp_init_grp(grp);
1357 1416  
1358      -        grp->lg_rx_group.arg_untagged = 0;
1359      -        list_create(&(grp->lg_rx_group.arg_vlans), sizeof (aggr_vlan_t),
1360      -            offsetof(aggr_vlan_t, av_link));
1361      -
1362 1417          /* add MAC ports to group */
1363 1418          grp->lg_ports = NULL;
1364 1419          grp->lg_nports = 0;
1365 1420          grp->lg_nattached_ports = 0;
1366 1421          grp->lg_ntx_ports = 0;
1367 1422  
1368 1423          /*
1369 1424           * If key is not specified by the user, allocate the key.
1370 1425           */
1371 1426          if ((key == 0) && ((key = (uint32_t)id_alloc(key_ids)) == 0)) {
↓ open down ↓ 1 lines elided ↑ open up ↑
1373 1428                  goto bail;
1374 1429          }
1375 1430          grp->lg_key = key;
1376 1431  
1377 1432          for (i = 0; i < nports; i++) {
1378 1433                  err = aggr_grp_add_port(grp, ports[i].lp_linkid, force, &port);
1379 1434                  if (err != 0)
1380 1435                          goto bail;
1381 1436          }
1382 1437  
     1438 +        grp->lg_rx_group_count = 1;
     1439 +
     1440 +        for (i = 0, port = grp->lg_ports; port != NULL;
     1441 +            i++, port = port->lp_next) {
     1442 +                uint_t num_rgroups;
     1443 +
     1444 +                mac_perim_enter_by_mh(port->lp_mh, &mph);
     1445 +                num_rgroups = mac_get_num_rx_groups(port->lp_mh);
     1446 +                mac_perim_exit(mph);
     1447 +
     1448 +                /*
     1449 +                 * Utilize all the groups in a port. If some ports
     1450 +                 * have fewer groups than others, then traffic destined
     1451 +                 * for the same unicast address may be HW classified
     1452 +                 * on some ports but SW classified by aggr when
     1453 +                 * arriving on other ports.
     1454 +                 */
     1455 +                grp->lg_rx_group_count = MAX(grp->lg_rx_group_count,
     1456 +                    num_rgroups);
     1457 +        }
     1458 +
1383 1459          /*
     1460 +         * There could be cases where the hardware provides more
     1461 +         * groups than aggr can support. Make sure we never go above
     1462 +         * the max aggr can support.
     1463 +         */
     1464 +        grp->lg_rx_group_count = MIN(grp->lg_rx_group_count,
     1465 +            MAX_GROUPS_PER_PORT);
     1466 +
     1467 +        ASSERT3U(grp->lg_rx_group_count, >, 0);
     1468 +        for (i = 0; i < MAX_GROUPS_PER_PORT; i++) {
     1469 +                grp->lg_rx_groups[i].arg_index = i;
     1470 +                grp->lg_rx_groups[i].arg_untagged = 0;
     1471 +                list_create(&(grp->lg_rx_groups[i].arg_vlans),
     1472 +                    sizeof (aggr_vlan_t), offsetof(aggr_vlan_t, av_link));
     1473 +        }
     1474 +
     1475 +        /*
1384 1476           * If no explicit MAC address was specified by the administrator,
1385 1477           * set it to the MAC address of the first port.
1386 1478           */
1387 1479          grp->lg_addr_fixed = mac_fixed;
1388 1480          if (grp->lg_addr_fixed) {
1389 1481                  /* validate specified address */
1390 1482                  if (bcmp(aggr_zero_mac, mac_addr, ETHERADDRL) == 0) {
1391 1483                          err = EINVAL;
1392 1484                          goto bail;
1393 1485                  }
1394 1486                  bcopy(mac_addr, grp->lg_addr, ETHERADDRL);
1395 1487          } else {
1396 1488                  bcopy(grp->lg_ports->lp_addr, grp->lg_addr, ETHERADDRL);
1397 1489                  grp->lg_mac_addr_port = grp->lg_ports;
1398 1490          }
1399 1491  
1400      -        /* set the initial group capabilities */
     1492 +        /* Set the initial group capabilities. */
1401 1493          aggr_grp_capab_set(grp);
1402 1494  
1403 1495          if ((mac = mac_alloc(MAC_VERSION)) == NULL) {
1404 1496                  err = ENOMEM;
1405 1497                  goto bail;
1406 1498          }
1407 1499          mac->m_type_ident = MAC_PLUGIN_IDENT_ETHER;
1408 1500          mac->m_driver = grp;
1409 1501          mac->m_dip = aggr_dip;
1410 1502          mac->m_instance = grp->lg_key > AGGR_MAX_KEY ? (uint_t)-1 : grp->lg_key;
↓ open down ↓ 14 lines elided ↑ open up ↑
1425 1517                  grp->lg_mh = NULL;
1426 1518                  goto bail;
1427 1519          }
1428 1520  
1429 1521          mac_perim_enter_by_mh(grp->lg_mh, &mph);
1430 1522  
1431 1523          /*
1432 1524           * Update the MAC address of the constituent ports.
1433 1525           * None of the ports are attached at this time, so the link state of
1434 1526           * the aggregation will not change.
     1527 +         *
     1528 +         * All ports take on the primary MAC address of the aggr
     1529 +         * (lg_addr). At this point, none of the ports are attached;
     1530 +         * thus the link state of the aggregation will not change.
1435 1531           */
1436 1532          link_state_changed = aggr_grp_update_ports_mac(grp);
1437 1533          ASSERT(!link_state_changed);
1438 1534  
1439      -        /* update outbound load balancing policy */
     1535 +        /* Update outbound load balancing policy. */
1440 1536          aggr_send_update_policy(grp, policy);
1441 1537  
1442      -        /* set LACP mode */
     1538 +        /* Set LACP mode. */
1443 1539          aggr_lacp_set_mode(grp, lacp_mode, lacp_timer);
1444 1540  
1445 1541          /*
1446 1542           * Attach each port if necessary.
1447 1543           */
1448 1544          for (port = grp->lg_ports; port != NULL; port = port->lp_next) {
1449 1545                  /*
1450      -                 * Create the pseudo ring for each HW ring of the underlying
1451      -                 * port. Note that this is done after the aggr registers the
1452      -                 * mac.
     1546 +                 * Create the pseudo ring for each HW ring of the
     1547 +                 * underlying port. Note that this is done after the
     1548 +                 * aggr registers its MAC.
1453 1549                   */
1454      -                VERIFY(aggr_add_pseudo_tx_group(port, &grp->lg_tx_group) == 0);
1455      -                VERIFY(aggr_add_pseudo_rx_group(port, &grp->lg_rx_group) == 0);
     1550 +                VERIFY3S(aggr_add_pseudo_tx_group(port, &grp->lg_tx_group),
     1551 +                    ==, 0);
     1552 +
     1553 +                for (i = 0; i < grp->lg_rx_group_count; i++) {
     1554 +                        VERIFY3S(aggr_add_pseudo_rx_group(port,
     1555 +                            &grp->lg_rx_groups[i]), ==, 0);
     1556 +                }
     1557 +
1456 1558                  if (aggr_port_notify_link(grp, port))
1457 1559                          link_state_changed = B_TRUE;
1458 1560  
1459 1561                  /*
1460 1562                   * Initialize the callback functions for this port.
1461 1563                   */
1462 1564                  aggr_port_init_callbacks(port);
1463 1565          }
1464 1566  
1465 1567          if (link_state_changed)
↓ open down ↓ 261 lines elided ↑ open up ↑
1727 1829                   * it is called from inside aggr_grp_rem_port() after the
1728 1830                   * port has been detached. The reason is that
1729 1831                   * aggr_rem_pseudo_tx_group() removes one ring at a time
1730 1832                   * and if there is still traffic going on, then there
1731 1833                   * is the possibility of aggr_find_tx_ring() returning a
1732 1834                   * removed ring for transmission. Once the port has been
1733 1835                   * detached, that port will not be used and
1734 1836                   * aggr_find_tx_ring() will not return any rings
1735 1837                   * belonging to it.
1736 1838                   */
1737      -                aggr_rem_pseudo_rx_group(port, &grp->lg_rx_group);
     1839 +                for (i = 0; i < grp->lg_rx_group_count; i++)
     1840 +                        aggr_rem_pseudo_rx_group(port, &grp->lg_rx_groups[i]);
1738 1841  
1739 1842                  /* remove port from group */
1740 1843                  rc = aggr_grp_rem_port(grp, port, &mac_addr_changed,
1741 1844                      &link_state_changed);
1742 1845                  ASSERT(rc == 0);
1743 1846                  mac_addr_update = mac_addr_update || mac_addr_changed;
1744 1847                  link_state_update = link_state_update || link_state_changed;
1745 1848          }
1746 1849  
1747 1850  bail:
↓ open down ↓ 84 lines elided ↑ open up ↑
1832 1935          /* detach and free MAC ports associated with group */
1833 1936          port = grp->lg_ports;
1834 1937          while (port != NULL) {
1835 1938                  cport = port->lp_next;
1836 1939                  mac_perim_enter_by_mh(port->lp_mh, &pmph);
1837 1940                  if (grp->lg_started)
1838 1941                          aggr_port_stop(port);
1839 1942                  (void) aggr_grp_detach_port(grp, port);
1840 1943                  mac_perim_exit(pmph);
1841 1944                  aggr_rem_pseudo_tx_group(port, &grp->lg_tx_group);
1842      -                aggr_rem_pseudo_rx_group(port, &grp->lg_rx_group);
     1945 +                for (uint_t i = 0; i < grp->lg_rx_group_count; i++)
     1946 +                        aggr_rem_pseudo_rx_group(port, &grp->lg_rx_groups[i]);
1843 1947                  aggr_port_delete(port);
1844 1948                  port = cport;
1845 1949          }
1846 1950  
1847 1951          mac_perim_exit(mph);
1848 1952  
1849 1953          kmem_free(grp->lg_tx_blocked_rings,
1850 1954              (sizeof (mac_ring_handle_t *) * MAX_RINGS_PER_GROUP));
1851 1955          /*
1852 1956           * Wait for the port's lacp timer thread and its notification callback
1853 1957           * to exit before calling mac_unregister() since both need to access
1854 1958           * the mac perimeter of the grp.
1855 1959           */
1856 1960          aggr_grp_port_wait(grp);
1857 1961  
1858 1962          VERIFY(mac_unregister(grp->lg_mh) == 0);
1859 1963          grp->lg_mh = NULL;
1860 1964  
1861      -        list_destroy(&(grp->lg_rx_group.arg_vlans));
     1965 +        for (uint_t i = 0; i < MAX_GROUPS_PER_PORT; i++) {
     1966 +                list_destroy(&(grp->lg_rx_groups[i].arg_vlans));
     1967 +        }
1862 1968  
1863 1969          AGGR_GRP_REFRELE(grp);
1864 1970          return (0);
1865 1971  }
1866 1972  
1867 1973  void
1868 1974  aggr_grp_free(aggr_grp_t *grp)
1869 1975  {
1870 1976          ASSERT(grp->lg_refs == 0);
1871 1977          ASSERT(grp->lg_port_ref == 0);
↓ open down ↓ 345 lines elided ↑ open up ↑
2217 2323                  } else {
2218 2324                          return (B_FALSE);
2219 2325                  }
2220 2326          }
2221 2327          case MAC_CAPAB_NO_NATIVEVLAN:
2222 2328                  return (!grp->lg_vlan);
2223 2329          case MAC_CAPAB_NO_ZCOPY:
2224 2330                  return (!grp->lg_zcopy);
2225 2331          case MAC_CAPAB_RINGS: {
2226 2332                  mac_capab_rings_t *cap_rings = cap_data;
     2333 +                uint_t ring_cnt = 0;
2227 2334  
     2335 +                for (uint_t i = 0; i < grp->lg_rx_group_count; i++)
     2336 +                        ring_cnt += grp->lg_rx_groups[i].arg_ring_cnt;
     2337 +
2228 2338                  if (cap_rings->mr_type == MAC_RING_TYPE_RX) {
2229 2339                          cap_rings->mr_group_type = MAC_GROUP_TYPE_STATIC;
2230      -                        cap_rings->mr_rnum = grp->lg_rx_group.arg_ring_cnt;
2231      -
2232      -                        /*
2233      -                         * An aggregation advertises only one (pseudo) RX
2234      -                         * group, which virtualizes the main/primary group of
2235      -                         * the underlying devices.
2236      -                         */
2237      -                        cap_rings->mr_gnum = 1;
     2340 +                        cap_rings->mr_rnum = ring_cnt;
     2341 +                        cap_rings->mr_gnum = grp->lg_rx_group_count;
2238 2342                          cap_rings->mr_gaddring = NULL;
2239 2343                          cap_rings->mr_gremring = NULL;
2240 2344                  } else {
2241 2345                          cap_rings->mr_group_type = MAC_GROUP_TYPE_STATIC;
2242 2346                          cap_rings->mr_rnum = grp->lg_tx_group.atg_ring_cnt;
2243 2347                          cap_rings->mr_gnum = 0;
2244 2348                  }
2245 2349                  cap_rings->mr_rget = aggr_fill_ring;
2246 2350                  cap_rings->mr_gget = aggr_fill_group;
2247 2351                  break;
↓ open down ↓ 18 lines elided ↑ open up ↑
2266 2370  }
2267 2371  
2268 2372  /*
2269 2373   * Callback function for MAC layer to register groups.
2270 2374   */
2271 2375  static void
2272 2376  aggr_fill_group(void *arg, mac_ring_type_t rtype, const int index,
2273 2377      mac_group_info_t *infop, mac_group_handle_t gh)
2274 2378  {
2275 2379          aggr_grp_t *grp = arg;
2276      -        aggr_pseudo_rx_group_t *rx_group;
2277      -        aggr_pseudo_tx_group_t *tx_group;
2278 2380  
2279      -        ASSERT(index == 0);
2280 2381          if (rtype == MAC_RING_TYPE_RX) {
2281      -                rx_group = &grp->lg_rx_group;
     2382 +                aggr_pseudo_rx_group_t *rx_group = &grp->lg_rx_groups[index];
     2383 +
2282 2384                  rx_group->arg_gh = gh;
2283 2385                  rx_group->arg_grp = grp;
2284 2386  
2285 2387                  infop->mgi_driver = (mac_group_driver_t)rx_group;
2286 2388                  infop->mgi_start = NULL;
2287 2389                  infop->mgi_stop = NULL;
2288 2390                  infop->mgi_addmac = aggr_addmac;
2289 2391                  infop->mgi_remmac = aggr_remmac;
2290 2392                  infop->mgi_count = rx_group->arg_ring_cnt;
2291 2393  
2292 2394                  /*
2293 2395                   * Always set the HW VLAN callbacks. They are smart
2294 2396                   * enough to know when a port has HW VLAN filters to
2295 2397                   * program and when it doesn't.
2296 2398                   */
2297 2399                  infop->mgi_addvlan = aggr_addvlan;
2298 2400                  infop->mgi_remvlan = aggr_remvlan;
2299 2401          } else {
2300      -                tx_group = &grp->lg_tx_group;
     2402 +                aggr_pseudo_tx_group_t *tx_group = &grp->lg_tx_group;
     2403 +
     2404 +                ASSERT3S(index, ==, 0);
2301 2405                  tx_group->atg_gh = gh;
2302 2406          }
2303 2407  }
2304 2408  
2305 2409  /*
2306 2410   * Callback function for MAC layer to register all rings.
2307 2411   */
2308 2412  static void
2309 2413  aggr_fill_ring(void *arg, mac_ring_type_t rtype, const int rg_index,
2310 2414      const int index, mac_ring_info_t *infop, mac_ring_handle_t rh)
2311 2415  {
2312 2416          aggr_grp_t      *grp = arg;
2313 2417  
2314 2418          switch (rtype) {
2315 2419          case MAC_RING_TYPE_RX: {
2316      -                aggr_pseudo_rx_group_t  *rx_group = &grp->lg_rx_group;
     2420 +                aggr_pseudo_rx_group_t  *rx_group;
2317 2421                  aggr_pseudo_rx_ring_t   *rx_ring;
2318 2422                  mac_intr_t              aggr_mac_intr;
2319 2423  
2320      -                ASSERT(rg_index == 0);
2321      -
2322      -                ASSERT((index >= 0) && (index < rx_group->arg_ring_cnt));
     2424 +                rx_group = &grp->lg_rx_groups[rg_index];
     2425 +                ASSERT3S(index, >=, 0);
     2426 +                ASSERT3S(index, <, rx_group->arg_ring_cnt);
2323 2427                  rx_ring = rx_group->arg_rings + index;
2324 2428                  rx_ring->arr_rh = rh;
2325 2429  
2326 2430                  /*
2327 2431                   * Entrypoint to enable interrupt (disable poll) and
2328 2432                   * disable interrupt (enable poll).
2329 2433                   */
2330 2434                  aggr_mac_intr.mi_handle = (mac_intr_handle_t)rx_ring;
2331 2435                  aggr_mac_intr.mi_enable = aggr_pseudo_enable_intr;
2332 2436                  aggr_mac_intr.mi_disable = aggr_pseudo_disable_intr;
2333 2437                  aggr_mac_intr.mi_ddi_handle = NULL;
2334 2438  
2335 2439                  infop->mri_driver = (mac_ring_driver_t)rx_ring;
2336      -                infop->mri_start = aggr_pseudo_start_ring;
2337      -                infop->mri_stop = NULL;
     2440 +                infop->mri_start = aggr_pseudo_start_rx_ring;
     2441 +                infop->mri_stop = aggr_pseudo_stop_rx_ring;
2338 2442  
2339 2443                  infop->mri_intr = aggr_mac_intr;
2340 2444                  infop->mri_poll = aggr_rx_poll;
2341 2445  
2342 2446                  infop->mri_stat = aggr_rx_ring_stat;
2343 2447                  break;
2344 2448          }
2345 2449          case MAC_RING_TYPE_TX: {
2346 2450                  aggr_pseudo_tx_group_t  *tx_group = &grp->lg_tx_group;
2347 2451                  aggr_pseudo_tx_ring_t   *tx_ring;
↓ open down ↓ 66 lines elided ↑ open up ↑
2414 2518  
2415 2519  static int
2416 2520  aggr_addmac(void *arg, const uint8_t *mac_addr)
2417 2521  {
2418 2522          aggr_pseudo_rx_group_t  *rx_group = (aggr_pseudo_rx_group_t *)arg;
2419 2523          aggr_unicst_addr_t      *addr, **pprev;
2420 2524          aggr_grp_t              *grp = rx_group->arg_grp;
2421 2525          aggr_port_t             *port, *p;
2422 2526          mac_perim_handle_t      mph;
2423 2527          int                     err = 0;
     2528 +        uint_t                  idx = rx_group->arg_index;
2424 2529  
2425 2530          mac_perim_enter_by_mh(grp->lg_mh, &mph);
2426 2531  
2427 2532          if (bcmp(mac_addr, grp->lg_addr, ETHERADDRL) == 0) {
2428 2533                  mac_perim_exit(mph);
2429 2534                  return (0);
2430 2535          }
2431 2536  
2432 2537          /*
2433 2538           * Insert this mac address into the list of mac addresses owned by
↓ open down ↓ 6 lines elided ↑ open up ↑
2440 2545                          return (EEXIST);
2441 2546                  }
2442 2547                  pprev = &addr->aua_next;
2443 2548          }
2444 2549          addr = kmem_alloc(sizeof (aggr_unicst_addr_t), KM_SLEEP);
2445 2550          bcopy(mac_addr, addr->aua_addr, ETHERADDRL);
2446 2551          addr->aua_next = NULL;
2447 2552          *pprev = addr;
2448 2553  
2449 2554          for (port = grp->lg_ports; port != NULL; port = port->lp_next)
2450      -                if ((err = aggr_port_addmac(port, mac_addr)) != 0)
     2555 +                if ((err = aggr_port_addmac(port, idx, mac_addr)) != 0)
2451 2556                          break;
2452 2557  
2453 2558          if (err != 0) {
2454 2559                  for (p = grp->lg_ports; p != port; p = p->lp_next)
2455      -                        aggr_port_remmac(p, mac_addr);
     2560 +                        aggr_port_remmac(p, idx, mac_addr);
2456 2561  
2457 2562                  *pprev = NULL;
2458 2563                  kmem_free(addr, sizeof (aggr_unicst_addr_t));
2459 2564          }
2460 2565  
2461 2566          mac_perim_exit(mph);
2462 2567          return (err);
2463 2568  }
2464 2569  
2465 2570  static int
↓ open down ↓ 24 lines elided ↑ open up ↑
2490 2595                          continue;
2491 2596                  }
2492 2597                  break;
2493 2598          }
2494 2599          if (addr == NULL) {
2495 2600                  mac_perim_exit(mph);
2496 2601                  return (EINVAL);
2497 2602          }
2498 2603  
2499 2604          for (port = grp->lg_ports; port != NULL; port = port->lp_next)
2500      -                aggr_port_remmac(port, mac_addr);
     2605 +                aggr_port_remmac(port, rx_group->arg_index, mac_addr);
2501 2606  
2502 2607          *pprev = addr->aua_next;
2503 2608          kmem_free(addr, sizeof (aggr_unicst_addr_t));
2504 2609  
2505 2610          mac_perim_exit(mph);
2506 2611          return (err);
2507 2612  }
2508 2613  
2509 2614  /*
2510 2615   * Search for VID in the Rx group's list and return a pointer if
↓ open down ↓ 15 lines elided ↑ open up ↑
2526 2631  /*
2527 2632   * Accept traffic on the specified VID.
2528 2633   *
2529 2634   * Persist VLAN state in the aggr so that ports added later will
2530 2635   * receive the correct filters. In the future it would be nice to
2531 2636   * allow aggr to iterate its clients instead of duplicating state.
2532 2637   */
2533 2638  static int
2534 2639  aggr_addvlan(mac_group_driver_t gdriver, uint16_t vid)
2535 2640  {
2536      -        aggr_pseudo_rx_group_t  *rx_group = (aggr_pseudo_rx_group_t *)gdriver;
     2641 +        aggr_pseudo_rx_group_t  *rx_group = (aggr_pseudo_rx_group_t *)gdriver;
2537 2642          aggr_grp_t              *aggr = rx_group->arg_grp;
2538 2643          aggr_port_t             *port, *p;
2539 2644          mac_perim_handle_t      mph;
2540 2645          int                     err = 0;
2541 2646          aggr_vlan_t             *avp = NULL;
     2647 +        uint_t                  idx = rx_group->arg_index;
2542 2648  
2543 2649          mac_perim_enter_by_mh(aggr->lg_mh, &mph);
2544 2650  
2545 2651          if (vid == MAC_VLAN_UNTAGGED) {
2546 2652                  /*
2547 2653                   * Aggr is both a MAC provider and MAC client. As a
2548 2654                   * MAC provider it is passed MAC_VLAN_UNTAGGED by its
2549 2655                   * client. As a client itself, it should pass
2550 2656                   * VLAN_ID_NONE to its ports.
2551 2657                   */
↓ open down ↓ 9 lines elided ↑ open up ↑
2561 2667                  mac_perim_exit(mph);
2562 2668                  return (0);
2563 2669          }
2564 2670  
2565 2671          avp = kmem_zalloc(sizeof (aggr_vlan_t), KM_SLEEP);
2566 2672          avp->av_vid = vid;
2567 2673          avp->av_refs = 1;
2568 2674  
2569 2675  update_ports:
2570 2676          for (port = aggr->lg_ports; port != NULL; port = port->lp_next)
2571      -                if ((err = aggr_port_addvlan(port, vid)) != 0)
     2677 +                if ((err = aggr_port_addvlan(port, idx, vid)) != 0)
2572 2678                          break;
2573 2679  
2574 2680          if (err != 0) {
2575 2681                  /*
2576 2682                   * If any of these calls fail then we are in a
2577 2683                   * situation where the ports have different HW state.
2578 2684                   * There's no reasonable action the MAC client can
2579 2685                   * take in this scenario to rectify the situation.
2580 2686                   */
2581 2687                  for (p = aggr->lg_ports; p != port; p = p->lp_next) {
2582 2688                          int err2;
2583 2689  
2584      -                        if ((err2 = aggr_port_remvlan(p, vid)) != 0) {
     2690 +                        if ((err2 = aggr_port_remvlan(p, idx, vid)) != 0) {
2585 2691                                  cmn_err(CE_WARN, "Failed to remove VLAN %u"
2586 2692                                      " from port %s: errno %d.", vid,
2587 2693                                      mac_client_name(p->lp_mch), err2);
2588 2694                          }
2589 2695  
2590 2696                  }
2591 2697  
2592 2698                  if (vid == VLAN_ID_NONE)
2593 2699                          rx_group->arg_untagged--;
2594 2700  
↓ open down ↓ 10 lines elided ↑ open up ↑
2605 2711          mac_perim_exit(mph);
2606 2712          return (err);
2607 2713  }
2608 2714  
2609 2715  /*
2610 2716   * Stop accepting traffic on this VLAN if it's the last use of this VLAN.
2611 2717   */
2612 2718  static int
2613 2719  aggr_remvlan(mac_group_driver_t gdriver, uint16_t vid)
2614 2720  {
2615      -        aggr_pseudo_rx_group_t  *rx_group = (aggr_pseudo_rx_group_t *)gdriver;
     2721 +        aggr_pseudo_rx_group_t  *rx_group = (aggr_pseudo_rx_group_t *)gdriver;
2616 2722          aggr_grp_t              *aggr = rx_group->arg_grp;
2617 2723          aggr_port_t             *port, *p;
2618 2724          mac_perim_handle_t      mph;
2619 2725          int                     err = 0;
2620 2726          aggr_vlan_t             *avp = NULL;
     2727 +        uint_t                  idx = rx_group->arg_index;
2621 2728  
2622 2729          mac_perim_enter_by_mh(aggr->lg_mh, &mph);
2623 2730  
2624 2731          /*
2625 2732           * See the comment in aggr_addvlan().
2626 2733           */
2627 2734          if (vid == MAC_VLAN_UNTAGGED) {
2628 2735                  vid = VLAN_ID_NONE;
2629 2736                  rx_group->arg_untagged--;
2630 2737  
↓ open down ↓ 10 lines elided ↑ open up ↑
2641 2748                  goto done;
2642 2749          }
2643 2750  
2644 2751          avp->av_refs--;
2645 2752  
2646 2753          if (avp->av_refs > 0)
2647 2754                  goto done;
2648 2755  
2649 2756  update_ports:
2650 2757          for (port = aggr->lg_ports; port != NULL; port = port->lp_next)
2651      -                if ((err = aggr_port_remvlan(port, vid)) != 0)
     2758 +                if ((err = aggr_port_remvlan(port, idx, vid)) != 0)
2652 2759                          break;
2653 2760  
2654 2761          /*
2655 2762           * See the comment in aggr_addvlan() for justification of the
2656 2763           * use of VERIFY here.
2657 2764           */
2658 2765          if (err != 0) {
2659 2766                  for (p = aggr->lg_ports; p != port; p = p->lp_next) {
2660 2767                          int err2;
2661 2768  
2662      -                        if ((err2 = aggr_port_addvlan(p, vid)) != 0) {
     2769 +                        if ((err2 = aggr_port_addvlan(p, idx, vid)) != 0) {
2663 2770                                  cmn_err(CE_WARN, "Failed to add VLAN %u"
2664 2771                                      " to port %s: errno %d.", vid,
2665 2772                                      mac_client_name(p->lp_mch), err2);
2666 2773                          }
2667 2774                  }
2668 2775  
2669 2776                  if (avp != NULL)
2670 2777                          avp->av_refs++;
2671 2778  
2672 2779                  if (vid == VLAN_ID_NONE)
↓ open down ↓ 598 lines elided ↑ open up ↑
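
Review note: the way aggr_grp_create() sizes `lg_rx_group_count` (start at 1, take the maximum RX group count over all ports, then clamp to what aggr supports) can be illustrated with a small standalone sketch. The function name `rx_group_count()` and its array-based interface are assumptions made for illustration; the patch itself walks `grp->lg_ports` and calls `mac_get_num_rx_groups()` under each port's perimeter, and clamps to `MAX_GROUPS_PER_PORT`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the sizing rule: port_groups[i] stands in for
 * the number of RX groups reported by port i, and max_groups stands in
 * for MAX_GROUPS_PER_PORT (its real value is not shown in the diff).
 */
static unsigned int
rx_group_count(const unsigned int *port_groups, size_t nports,
    unsigned int max_groups)
{
	unsigned int count = 1;		/* an aggr always has one group */

	assert(max_groups > 0);

	/* Use as many groups as the best-equipped port offers. */
	for (size_t i = 0; i < nports; i++) {
		if (port_groups[i] > count)
			count = port_groups[i];
	}

	/* Never exceed what the aggr can represent. */
	if (count > max_groups)
		count = max_groups;

	return (count);
}
```

For ports reporting 1, 4, and 2 groups with a limit of 8, the sketch yields 4. As the comment in the diff notes, ports with fewer groups than this count would then see some unicast addresses classified in software by aggr rather than in hardware.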